mini-html-parser

Mini HTML parser for web workers and node.js. Parses the input and builds a simplified DOM tree in a single pass. Intended for well-formed HTML.

Installation

With node.js:

$ npm install mini-html-parser

In the browser (with component):

$ component install matthewmueller/mini-html-parser
  • Development: 16kb
  • Minified + gzipped: 4kb

Example

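// Assumes the module exports the Parser function described under API.
var Parser = require('mini-html-parser');
var html = '<h1>some title</h1><p>this is a <em>post</em> from <a href="http://mat.io">mat.io</a>.</p>';
var parser = Parser(html);
var dom = parser.parse();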
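
Because parse() is documented to return an Error object on failure, callers may want to check the result before using it. The following is a minimal sketch of one way to do that; `parseOrThrow` is a hypothetical helper written for this example, not part of the library, and it accepts any object with a `parse()` method.

```javascript
// Hypothetical helper, not part of mini-html-parser: converts the documented
// "parse() returns an Error" convention into a thrown exception.
function parseOrThrow(parser) {
  var result = parser.parse();
  if (result instanceof Error) throw result;
  return result;
}

// Possible usage, assuming the module exports the Parser function:
//   var Parser = require('mini-html-parser');
//   var dom = parseOrThrow(Parser('<p>hello</p>'));
```

Throwing keeps the failure from being mistaken for a DOM tree further down the call chain; code that prefers the return-value style can use the same `instanceof Error` check inline.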

API

Parser(html)

Create a parser for the given html string.

Parser#parse()

Parse the html string and return a simplified DOM object. The tree is built from the node types listed below. If the parser fails to parse the html string, parse() returns an Error object.

element:

{
  nodeName: 'A',
  nodeType: 1,
  childNodes: [...],
  previousSibling: ...,
  nextSibling: ...,
  parentNode: ...
}

text:

{
  nodeName: '#text',
  nodeType: 3,
  nodeValue: '...',
  previousSibling: ...,
  nextSibling: ...,
  parentNode: ...
}

comment:

{
  nodeName: '#comment',
  nodeType: 8,
  nodeValue: '...',
  previousSibling: ...,
  nextSibling: ...,
  parentNode: ...
}

document fragment:

{
  nodeName: '#fragment',
  nodeType: 11,
  nodeValue: null,
  childNodes: [...],
  previousSibling: null,
  nextSibling: null,
  parentNode: null
}
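
The node shapes above are enough to walk the tree with plain recursion. As an illustration, here is a small sketch that collects the text of a subtree. `textContent` is a hypothetical helper, and the tree is built by hand in the documented shape (sibling and parent links omitted for brevity) rather than taken from real parser output.

```javascript
// Concatenate the nodeValue of every text node (nodeType 3) under `node`,
// in document order. Comment nodes (nodeType 8) contribute nothing.
function textContent(node) {
  if (node.nodeType === 3) return node.nodeValue;
  var children = node.childNodes || [];
  var out = '';
  for (var i = 0; i < children.length; i++) {
    out += textContent(children[i]);
  }
  return out;
}

// Hand-built example tree: a fragment holding <p>a <em>b</em><!--note--></p>.
var fragment = {
  nodeName: '#fragment', nodeType: 11, nodeValue: null,
  childNodes: [{
    nodeName: 'P', nodeType: 1,
    childNodes: [
      { nodeName: '#text', nodeType: 3, nodeValue: 'a ' },
      { nodeName: 'EM', nodeType: 1, childNodes: [
        { nodeName: '#text', nodeType: 3, nodeValue: 'b' }
      ] },
      { nodeName: '#comment', nodeType: 8, nodeValue: 'note' }
    ]
  }]
};

console.log(textContent(fragment)); // prints "a b"
```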

TODO

  • handle other node types (doctype, etc.)
  • benchmark

This library won't parse X...

This is not a full-blown XML parser. Its error handling is minimal, and it is best suited for well-formed HTML. It uses regular expressions for its matching, which can lead to errors on unusual input. For more information on this topic, read this: http://stackoverflow.com/a/1732454/376773
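
To make the regex caveat concrete, here is a small sketch of how a simple tag pattern can misread markup. The `openTag` pattern below was written for this example only; it is an assumption and not one of the regular expressions this library actually uses.

```javascript
// Naive opening-tag pattern, for illustration only (NOT the library's regex):
// tag name in group 1, raw attribute text up to the first ">" in group 2.
var openTag = /<(\w+)([^>]*)>/;

var plain = '<a href="http://mat.io">mat.io</a>';
var tricky = '<a title="1 > 0">link</a>';

var plainAttrs = openTag.exec(plain)[2];
var trickyAttrs = openTag.exec(tricky)[2];

console.log(JSON.stringify(plainAttrs));  // " href=\"http://mat.io\""
console.log(JSON.stringify(trickyAttrs)); // " title=\"1 " (cut off at the ">" inside the attribute value)
```

The second match stops at the `>` inside the quoted attribute value, so the attribute text is truncated. Input of this kind is where a regex-based approach may go wrong.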

Credits

Many of the regular expressions, and much of the inspiration, came from John Resig's Pure JavaScript HTML Parser.

License

MIT
