Skip to content
A node server and module which allows for cross-domain page scraping on web documents with JSONP or POST.
JavaScript HTML Shell
Branch: master
Clone or download

Latest commit

Fetching latest commit…
Cannot retrieve the latest commit at this time.

Files

Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
bin
docs
lib
tests
.gitignore
README.md
index.js
package.json

README.md

noodle

noodle is a Node.js server and module for querying and scraping data from web documents. It features:

{
  "url": "https://github.com/explore",
  "selector": "ol.ranked-repositories h3 a",
  "extract": "href"
}

Features

  • Cross domain document querying (html, json, xml, atom, rss feeds)
  • Server supports querying via JSONP and JSON POST
  • Multiple queries per request
  • Access to queried server headers
  • Allows for POSTing to web documents
  • In memory caching for query results and web documents

Server quick start

Setup

$ npm install noodlejs

or

$ git clone git@github.com:dharmafly/noodle.git
$ cd noodle
$ npm install

Start the server by running the binary

$ bin/noodle-server
Noodle node server started
├ process title  node-noodle
├ process pid    4739
└ server port    8888

You may specify a port number as an argument

$ bin/noodle-server 9090
Noodle node server started
├ process title  node-noodle
├ process pid    4739
└ server port    9090

Noodle as a node module

If you are interested in the node module just run npm install noodlejs, require it and check out the noodle api

var noodle = require('noodlejs');

noodle.query({
  url:      'https://github.com/explore',
  selector: 'ol.ranked-repositories h3 a',
  extract:  'href'
})
.then(function (results) {
  console.log(results);
});

Tests

The noodle tests create a temporary server on port 8889 which the automated tests tell noodle to query against.

To run tests you can use the provided binary from the noodle package root directory:

$ cd noodle
$ bin/tests

Contribute

Contributors and suggestions welcomed.

You can’t perform that action at this time.