Fast, real-time extraction of information from web pages (HTML, text, etc.) using node-dom, based on given criteria (example: retrieving the real-time price of a product)


Node.js implementation of Extract Widget bot using, and


Real-time extraction of information from web pages (HTML, text, etc.) based on given criteria.

It can be used either as a server/API, in which case the parameters are passed in the URL, or directly as an independent node.js module.

The difference from node-gadgets is that, for performance reasons, node-bot does not return the full gadgets, only the relevant information (shopbot example: searching for "nike lebron 9" returns the real-time price of the shoes on the Nike store's web site).

Install :

	npm install node-bot

or

	git clone
	cd node-bot
	npm link .

Complementary modules : node-ewa

Note: node-ewa is not a public module for now, so you can only use node-bot's server/API mode.

Use :

getelements.js :

As a module :

	var getElements = require('node-bot').getElements;
	var $E = encodeURIComponent;
	var response = {
		end: function (gadgets) {
			// output format, see below
		}
	};
	// with node-googleSearch (the url is looked up from the search string)
	var params = 'search=' + $E('nike shoes') + '&name=' + $E('nike_shoes') + '&regexp=' + $E('\\$|€') + '&nbmax=20';
	// or, without node-googleSearch (the url is given directly)
	var params = 'url=,pdp,ctr-inline/cid-1/pid-417121/pgid-437002&search=' + $E('nike shoes') + '&name=' + $E('nike_shoes') + '&regexp=' + $E('\\$|€') + '&nbmax=20';
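Building the parameter string by hand, as above, is easy to get wrong: a stray quote or an unquoted value breaks the statement. A minimal sketch of a hypothetical `buildParams` helper (not part of node-bot) that encodes every value with `encodeURIComponent`, the same function as `$E`, and joins the pairs with `&`:

```javascript
// Hypothetical helper, not part of node-bot: turns an object into the
// "key=value&key=value" string, encoding each value like $E does.
function buildParams(options) {
  return Object.keys(options)
    .filter(function (key) { return options[key] !== undefined; }) // skip unset parameters
    .map(function (key) { return key + '=' + encodeURIComponent(String(options[key])); })
    .join('&');
}

// with node-googleSearch: no url parameter, it is looked up from the search string
var params = buildParams({
  search: 'nike shoes',
  name: 'nike_shoes',
  regexp: '\\$|€', // JS literal for the regexp \$|€
  nbmax: 20
});
console.log(params); // search=nike%20shoes&name=nike_shoes&regexp=%5C%24%7C%E2%82%AC&nbmax=20
```

Without node-googleSearch, add a `url` key to the object; its value is then encoded as well.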


As a server/API :

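	var http = require('http'),
		URL = require('url'),
		getElements = require('node-bot').getElements;

	var handleRequest = function (request, response) {
		var qs = URL.parse(request.url);
		if (qs.pathname === '/getelements') {
			// qs.query holds the parameters (url, name, search, regexp, nbmax):
			// hand them to getElements together with response (call not shown here)
		}
	};

	http.createServer(handleRequest).listen(8080); // 8080 is only an example port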
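How the handler reads the individual parameters is not shown above. A minimal sketch, assuming the WHATWG `URL` class that recent Node.js versions expose as a global, of a hypothetical `readParams` helper that extracts the documented parameters and applies the documented `nbmax` default of 100:

```javascript
// Hypothetical helper, not part of node-bot: extracts the documented parameters
// from request.url, e.g. '/getelements?name=nike_shoes&nbmax=20'.
function readParams(requestUrl) {
  // request.url is only a path, so a dummy base is needed to build a URL object
  var parsed = new URL(requestUrl, 'http://localhost');
  var query = parsed.searchParams; // get() returns decoded values, or null when absent
  var nbmax = parseInt(query.get('nbmax'), 10);
  return {
    pathname: parsed.pathname,
    url: query.get('url'),       // null: the url must be found with node-googleSearch
    name: query.get('name'),
    search: query.get('search'),
    regexp: query.get('regexp'),
    nbmax: isNaN(nbmax) ? 100 : nbmax // documented default
  };
}

var p = readParams('/getelements?name=nike_shoes&search=nike%20shoes&regexp=%5C%24%7C%E2%82%AC&nbmax=20');
console.log(p.regexp); // \$|€
console.log(p.nbmax);  // 20
```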


To call it directly :

with node-googleSearch

http://myserver:myport/getelements?name=nike_shoes&search='nikestore nike lebron9'&regexp=$|€&nbmax=20

without node-googleSearch

http://myserver:myport/getelements?url=,pdp,ctr-inline/cid-1/pid-417121/pgid-437002&name=nike_shoes&search='nikestore nike lebron9'&regexp=$|€&nbmax=20

Example with encoded parameters, retrieving the price of the "lebron9" shoes on the Nike store:

with node-googleSearch


without node-googleSearch


To call it from a script :

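	var $E = encodeURIComponent;
	var xscript = document.createElement('SCRIPT');
	// add the url parameter if you already know it (then node-googleSearch is not used)
	var params = 'name=nike_shoes' + '&search=' + $E('nike shoes nikestore') + '&regexp=' + $E('\\$|€') + '&nbmax=20';
	xscript.src = 'http://myserver:myport/getelements?' + params;
	// xscript.onload (or onreadystatechange) --> do what you have to do with the output
	document.getElementsByTagName('head')[0].appendChild(xscript);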

Output format (see details below): nike_shoes=(Array containing the gadgets), where 'nike_shoes' corresponds to the 'name' parameter.

So to use it you can do:

	xscript.onload = function () {
		// for example, if the 'name' parameter depends on some variable in your code
		var res = window['nike_shoes' + this.shoe_number];
		if (res.length > 0) {
			// ... use the gadgets
		}
		// or simply, with a fixed name
		if (nike_shoes.length > 0) {
			// ... use the gadgets
		}
	};
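Outside a browser, for example from another Node.js process, the script returned by the server can be evaluated with Node's `vm` module. A minimal sketch, assuming the response body is a single assignment such as `nike_shoes=[...];` as the output format above suggests (the sample body below is made up); only do this with a server you trust, since `vm` is not a security boundary:

```javascript
var vm = require('vm');

// Made-up response body following the documented format: name=(Array of gadgets)
var body = "nike_shoes=[[null,null,null,'Nike LeBron 9','','http://www.example.com/','$160.00','<b>$160.00</b>']];";

// Run the script in a fresh context; the assignment becomes a property of `sandbox`.
var sandbox = {};
vm.runInNewContext(body, sandbox);

var gadgets = sandbox.nike_shoes;
console.log(gadgets.length); // 1
console.log(gadgets[0][6]);  // $160.00 (price is the 7th field of each entry)
```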

See example here: (see the API code at the end of the HTML file)

Note: if your regexp contains "\" and you pass it through a JS string (example above: $E('\\$|€')), make sure to double it.

Note 2: make sure that your files and browser use UTF-8 encoding.
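The backslash-doubling rule from the first note can be checked in plain Node.js. The sketch below uses illustrative strings only and shows why a single backslash silently breaks the regexp `\$|€`:

```javascript
// Intended regexp: \$|€ (a literal dollar sign or a euro sign).
var doubled = '\\$|€'; // "\\" in a JS string literal is ONE backslash
var single = '\$|€';   // "\$" is just "$": the backslash is silently dropped

console.log(doubled); // \$|€
console.log(single);  // $|€

// Without its backslash, "$" is the end-of-input anchor, which matches any text.
console.log(new RegExp(doubled).test('no price here')); // false
console.log(new RegExp(single).test('no price here'));  // true

// The doubled form is what should be passed to $E (encodeURIComponent):
console.log(encodeURIComponent(doubled)); // %5C%24%7C%E2%82%AC
```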

Parameters :

url: the URL of the site you want to extract gadgets from. If absent, the URL is retrieved with node-googleSearch using the value of the search string (example: "nikestore nike shoes" returns the first URL returned by Google Search for this string).

name: the name of the global variable that will contain the output in its 'gadgets' property (example: nike_shoes.gadgets).

regexp: while building the DOM, node-dom uses this regular expression to detect the objects you are looking for (example: regexp=$|€ means you are looking for gadgets in the page that are related to a price in $ or €).

search: once the gadgets have been selected with the regexp, they are filtered based on the value of the search field (example: the URL found for "nikestore nike shoes" can contain products other than shoes; node-bot returns only the results matching "nike shoes").

nbmax: an important parameter for performance. It sets a limit on the weight of the searched gadgets so that node-bot does not spend time processing irrelevant gadgets. The default value is 100; the recommended value is 20.
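The two-step selection described by `regexp` and `search` can be illustrated with a rough sketch. This is not node-bot's code: node-dom works on real DOM nodes while here each candidate is a plain string, the "every search word must appear" rule is an assumption, and `nbmax` is not modeled since the meaning of a gadget's "weight" is not specified here. `selectGadgets` and the sample fragments are hypothetical.

```javascript
// Hypothetical stand-in for the regexp + search selection, on plain strings.
function selectGadgets(fragments, regexpSource, search) {
  var regexp = new RegExp(regexpSource);
  var words = search.toLowerCase().split(/\s+/).filter(Boolean);
  return fragments
    // 1. regexp: keep the fragments related to a price in $ or €
    .filter(function (text) { return regexp.test(text); })
    // 2. search: keep only those containing every search word (assumed rule)
    .filter(function (text) {
      var lower = text.toLowerCase();
      return words.every(function (word) { return lower.indexOf(word) !== -1; });
    });
}

// hypothetical fragments of a store page
var fragments = [
  'Nike LeBron 9 basketball shoes',
  'Nike LeBron 9 basketball shoes $160.00',
  'Nike Dri-FIT socks $14.00',
  'Nike running shoes 90 €',
  'Free shipping on orders over $100'
];

var result = selectGadgets(fragments, '\\$|€', 'nike shoes');
console.log(result); // the $160.00 shoes and the 90 € shoes
```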

Output :

The output is an Array whose entries have the form:

[gadget html,width,height,gadget name,reserved,base,price,html of regexp object]

The output is not in JSON format for now, for historical reasons and for backward compatibility with existing projects (TODO).

The first three fields of each entry (gadget html, width, height) are not filled by node-bot.
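Until a JSON format exists, positional access such as `entry[6]` for the price is easy to get wrong. A minimal sketch that maps each entry onto named fields; the field names and the `toObject` helper are hypothetical, and representing the unfilled first three fields as `null` in the made-up sample is an assumption:

```javascript
// Hypothetical field names, in the documented order; node-bot returns plain arrays.
var FIELDS = ['html', 'width', 'height', 'gadgetName', 'reserved', 'base', 'price', 'regexpHtml'];

function toObject(entry) {
  var obj = {};
  FIELDS.forEach(function (field, i) { obj[field] = entry[i]; });
  return obj;
}

// made-up entry; the first three fields are not filled by node-bot
var entry = [null, null, null, 'Nike LeBron 9', '', 'http://www.example.com/', '$160.00', '<b>$160.00</b>'];
var gadget = toObject(entry);
console.log(gadget.price); // $160.00
```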

See documentation for more details.

Tunnelling with node-Tor :



Tests and API :

jCore server: http://my_server/getelements?params

Example with node-googleSearch :


Example without node-googleSearch :


You can use the API on the jCore server: http://my_server (if for any unforeseen reason the server is down, please report it).

The links above might not return a correct result due to changes on the Nike store web site; in that case you can try:

Webble project: (quick test: click OK, then click the first link that appears)

Example of API code and usage, retrieving the real-time price of the babyliss homelight product on different merchant sites: (click "acheter maintenant" ("buy now"), then wait for the prices to be displayed in green)

See tests.txt in ./test