Added bench.js
taken from astro/node-expat
fb55 committed Feb 8, 2012
1 parent e7cb57f commit f406be9
Showing 2 changed files with 121 additions and 11 deletions.
32 changes: 21 additions & 11 deletions README.md
@@ -5,15 +5,6 @@ A forgiving HTML/XML/RSS parser written in JS for NodeJS. The parser can handle
##Installing
npm install htmlparser2

##How is this different from [node-htmlparser](https://github.com/tautologistics/node-htmlparser)?
This is a fork of the project above. The main difference is that this is just intended to be used with node (it runs on other platforms using [browserify](https://github.com/substack/node-browserify)). Besides, the code is much better structured, has less duplication and is remarkably faster than the original.

The parser now provides a callback interface close to [sax.js](https://github.com/isaacs/sax-js) (originally intended for [readabilitySAX](https://github.com/fb55/readabilitysax)). I also fixed a couple of bugs & included some pull requests for the original project (e.g. [RDF feed support](https://github.com/tautologistics/node-htmlparser/pull/35)).

The support for location data and verbose output was removed a couple of versions ago. It's still available in the [verbose branch](https://github.com/FB55/node-htmlparser/tree/verbose) (if you really need it, for whatever reason that may be).

The `DefaultHandler` and the `RssHandler` were renamed to clarify their purpose (to `DomHandler` and `FeedHandler`). The old names are still available when requiring `htmlparser2`, so your code should work as expected.

##Usage

```javascript
@@ -56,7 +47,26 @@ Read more about the DomHandler in the [wiki](https://github.com/FB55/node-htmlpa
##Parsing RSS/RDF/Atom Feeds
```javascript
new htmlparser.FeedHandler(function (error, feed) {
new htmlparser.FeedHandler(function(<error> error, <object> feed){
...
});
```
```
##Performance
Using a slightly modified version of [node-expat](https://github.com/astro/node-expat)'s `bench.js`, I measured the following results (on a late-2010 MacBook):

* [htmlparser](https://github.com/tautologistics/node-htmlparser): 51779 el/s
* [sax.js](https://github.com/isaacs/sax-js): 53169 el/s
* [node-expat](https://github.com/astro/node-expat): 103388 el/s
* [htmlparser2](https://github.com/fb55/node-htmlparser): 118614 el/s

The test may be found in `tests/bench.js`.

##How is this different from [node-htmlparser](https://github.com/tautologistics/node-htmlparser)?
This is a fork of the project above. The main difference is that this is just intended to be used with node (it runs on other platforms using [browserify](https://github.com/substack/node-browserify)). Besides, the code is much better structured, has less duplication and is remarkably faster than the original.

The parser now provides a callback interface close to [sax.js](https://github.com/isaacs/sax-js) (originally intended for [readabilitySAX](https://github.com/fb55/readabilitysax)). I also fixed a couple of bugs & included some pull requests for the original project (e.g. [RDF feed support](https://github.com/tautologistics/node-htmlparser/pull/35)).

The support for location data and verbose output was removed a couple of versions ago. It's still available in the [verbose branch](https://github.com/FB55/node-htmlparser/tree/verbose).

The `DefaultHandler` and the `RssHandler` were renamed to clarify their purpose (to `DomHandler` and `FeedHandler`). The old names are still available when requiring `htmlparser2`, so your code should work as expected.
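
The backward-compatible renaming described above can be pictured as plain aliasing on the exported object. What follows is a minimal sketch under that assumption, not htmlparser2's actual source: the two constructor bodies and the `exported` object are hypothetical stand-ins.

```javascript
// Hypothetical stand-ins for the real handler constructors.
function DomHandler(callback) {
    this.callback = callback;
}
function FeedHandler(callback) {
    this.callback = callback;
}

// Assumed shape of the module's exports: the new names first...
var exported = {
    DomHandler: DomHandler,
    FeedHandler: FeedHandler
};
// ...then the old names as aliases pointing at the same constructors.
exported.DefaultHandler = exported.DomHandler;
exported.RssHandler = exported.FeedHandler;

// Code written against the old names keeps working.
var handler = new exported.DefaultHandler(function () {});
console.log(handler instanceof exported.DomHandler); // true
```

Because both names refer to the same function, `instanceof` checks and prototype changes behave identically whichever name a caller uses.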
100 changes: 100 additions & 0 deletions tests/bench.js
@@ -0,0 +1,100 @@
/*
var node_xml = require("node-xml");
function NodeXmlParser() {
    var parser = new node_xml.SaxParser(function(cb) { });
    this.parse = function(s) {
        parser.parseString(s);
    };
}
var p = new NodeXmlParser();
*//*
var libxml = require("libxmljs");
function LibXmlJsParser() {
    var parser = new libxml.SaxPushParser(function(cb) { });
    this.parse = function(s) {
        parser.push(s, false);
    };
}
var p = new LibXmlJsParser();
*//*
var sax = require('sax');
function SaxParser() {
    var parser = sax.parser();
    this.parse = function(s) {
        parser.write(s);
    }
}
var p = new SaxParser();
*//*
var expat = require('node-expat');
function ExpatParser() {
    var parser = new expat.Parser();
    this.parse = function(s) {
        parser.parse(s, false);
    };
}
var p = new ExpatParser();
*//*
var htmlparser = require('htmlparser');
function HtmlParser() {
    var handler = new htmlparser.DefaultHandler();
    var parser = new htmlparser.Parser(handler);
    this.parse = function(s) {
        parser.parseComplete(s);
    };
}
var p = new HtmlParser();
*/
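var htmlparser2 = require('htmlparser2/lib/Parser.js');

// Provide no-op callbacks; otherwise the parser could
// optimize the work away and skew the results.
var emptyCBs = {
    onopentagname: function(){},
    onattribute: function(){},
    ontext: function(){},
    onclosetag: function(){}
};

function HtmlParser2() {
    var parser = new htmlparser2(emptyCBs);
    this.parse = function(s) {
        parser.write(s);
    };
}

var p = new HtmlParser2();

// Open a root element so that every following element is one of its children.
p.parse("<r>");
var nEl = 0;
// Parse one element per tick and count it.
// Note: on Node >= 0.10 a recursive process.nextTick starves timers,
// so setImmediate would be needed there for the interval below to fire.
(function d() {
    p.parse("<foo bar='baz'>quux</foo>");
    nEl++;
    process.nextTick(d);
})();

// Once per second, report the number of parsed elements and reset the counter.
var its = [];
setInterval(function() {
    console.log(nEl + " el/s");
    its.push(nEl);
    nEl = 0;
}, 1e3);

// On Ctrl+C, print the average over all recorded seconds.
process.on('SIGINT', function () {
    // The initial value 0 keeps reduce from throwing when no sample was recorded yet.
    var total = its.reduce(function(sum, v){
        return sum + v;
    }, 0);
    var average = its.length ? total / its.length : 0;
    console.log("Average:", average, "el/s");
    process.exit(0);
});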
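
The measurement idea in `bench.js` (feed the same small element to a parser over and over, then report elements per second) can also be sketched as a self-contained function that runs for a fixed window rather than until SIGINT. This is only an illustration under stated assumptions: `makeStubParser` and `measureThroughput` are hypothetical names, the stub parser merely counts opening tags, and the loop is synchronous, unlike the `process.nextTick` loop in the committed file.

```javascript
// Hypothetical stand-in for a real parser: it only counts opening tags.
function makeStubParser() {
    var seen = 0;
    return {
        write: function (chunk) {
            for (var i = 0; i < chunk.length; i++) {
                // 60 is "<", 47 is "/": count "<" that does not start a closing tag.
                if (chunk.charCodeAt(i) === 60 && chunk.charCodeAt(i + 1) !== 47) {
                    seen++;
                }
            }
        },
        count: function () {
            return seen;
        }
    };
}

// Calls parse(chunk) repeatedly for at least durationMs milliseconds
// and reports how many calls were made per second.
// durationMs is assumed to be positive; 0 would divide by zero.
function measureThroughput(parse, chunk, durationMs) {
    var elements = 0;
    var start = Date.now();
    var elapsed = 0;
    do {
        // Parse in batches so the clock is read only once per 1000 calls.
        for (var i = 0; i < 1000; i++) {
            parse(chunk);
            elements++;
        }
        elapsed = Date.now() - start;
    } while (elapsed < durationMs);
    return {
        elements: elements,
        elapsedMs: elapsed,
        perSecond: elements / (elapsed / 1000)
    };
}

var stub = makeStubParser();
var result = measureThroughput(function (s) {
    stub.write(s);
}, "<foo bar='baz'>quux</foo>", 200);
console.log(Math.round(result.perSecond) + " el/s");
```

To compare real parsers, the function passed as `parse` would wrap each library's own write or parse call, as the commented-out wrappers in `bench.js` do.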
