Compact JavaScript HTML parser.
- Target and Development Environments
- Functions and Features
- Development and test
- Links
- License
- Works in a wide range of environments (but is slow) because it does not use RegExp
- Written in Closure Script
- About 7KB including handler
HTML document fragments written by web designers generally work correctly.
- The document tree is constructed correctly even when the optional end tag is omitted for the following elements:
caption, dd, li, td, dt, th, p, rb, rp, rt, html, head, colgroup, optgroup, option, tbody, thead, tfoot, tr, rbc, rtc
- Broken document fragments in conditional comments can also be parsed.
<!--[if IE 8]> </div><br clear=both><div> <![endif]-->
- Element missing end tag: an "auto-closing end tag" (an end tag that is not present in the document, may not be omitted for that element, and is not supplied by another start tag closing the element) is identified by the isImplicit flag (onParseEndTag).
- Element missing start tag: an end tag with no corresponding start tag is reported with the isMissingStartTag flag set to true (onParseEndTag).
- Time Slice Execution
- Parsing Stop
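The end-tag handling described in the list above (optional end tags, the isImplicit flag, and the isMissingStartTag flag) can be illustrated with a small open-element stack. This is a minimal sketch under stated assumptions, not es2-html-parser's implementation or API: only the two flag names come from this README, while walkTags, the token shape, the callback signature, and the reduced CLOSED_BY table are hypothetical and exist only for this example.

```javascript
// Illustrative sketch only: NOT es2-html-parser's implementation or API.
// Assumed for the example: walkTags, the token shape, the callback signature,
// and this heavily reduced table. Void elements such as <br> are not handled.

// Open element -> start tags that close it when its end tag was omitted.
const CLOSED_BY = {
  li: ["li"],
  dt: ["dt", "dd"],
  dd: ["dt", "dd"],
  p: ["p", "div", "ul", "table"],
  tr: ["tr"],
  td: ["td", "th", "tr"],
  th: ["td", "th", "tr"],
  option: ["option", "optgroup"],
};

// In this sketch, an end tag may be omitted only for elements in the table.
const canOmitEnd = (name) =>
  Object.prototype.hasOwnProperty.call(CLOSED_BY, name);

function walkTags(tokens, onEndTag) {
  const stack = []; // names of currently open elements, innermost last

  // Synthesize an end tag for the innermost open element. It is flagged
  // isImplicit only when that element's end tag may not be omitted.
  const closeTop = () => {
    const name = stack.pop();
    onEndTag(name, { isImplicit: !canOmitEnd(name), isMissingStartTag: false });
  };

  for (const { type, name } of tokens) {
    if (type === "start") {
      // A new start tag may close open elements whose end tag was omitted.
      while (
        stack.length > 0 &&
        canOmitEnd(stack[stack.length - 1]) &&
        CLOSED_BY[stack[stack.length - 1]].includes(name)
      ) {
        closeTop();
      }
      stack.push(name);
    } else {
      const at = stack.lastIndexOf(name);
      if (at === -1) {
        // End tag without a matching open element.
        onEndTag(name, { isImplicit: false, isMissingStartTag: true });
        continue;
      }
      // Close everything nested inside the element being ended.
      while (stack.length - 1 > at) closeTop();
      stack.pop();
      onEndTag(name, { isImplicit: false, isMissingStartTag: false });
    }
  }
  // End of input: every element still open gets a synthesized end tag.
  while (stack.length > 0) closeTop();
}

// Tag tokens for: <ul><li>a<li>b</ul></span><div><b>
const tokens = [
  { type: "start", name: "ul" },
  { type: "start", name: "li" },
  { type: "start", name: "li" },
  { type: "end", name: "ul" },
  { type: "end", name: "span" },
  { type: "start", name: "div" },
  { type: "start", name: "b" },
];

const events = [];
walkTags(tokens, (name, flags) => events.push({ name, ...flags }));
console.log(events);
```

In this sketch the two li elements are closed without any flag because their end tags are optional, the stray span end tag is reported with isMissingStartTag, and the unclosed b and div are reported with isImplicit at the end of input. The real parser's tag tables and callback arguments may differ; see src/js/example/*.js for the actual handler shape.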
- Unlike parse5, missing <html>, <head>, and <body> tags are not supplemented to create a complete HTML document.
- For invalid documents such as <table><p>, the structure of the resulting tree differs from the specification.
- XHTML is not well tested.
- Newline characters in <pre>, <listing>, and <textarea> are not removed.
git clone https://github.com/ECMAScript2/es2-html-parser
cd es2-html-parser
npm i
gulp dist
npm run test
See src/js/example/*.js for how to write a handler. A SAX-style API is provided.
See test/*.js for how to use the parser.
- Original code by Erik Arvidsson, HTML parser by John Resig (ejohn.org): an early JavaScript HTML parser; compact code, but useful in most cases
- pettanR / webframework / js / 02_Dom / 09_HTMLParser.js Based on John Resig's code, without regular expressions
- html.json Project using es2-html-parser
ES2 HTML Parser is licensed under the MIT License.