Move examples into visual range.
mape committed Dec 31, 2010
1 parent d6f7e7b commit 7129f9c
Showing 1 changed file (README.md) with 60 additions and 58 deletions.
Via [npm](http://github.com/isaacs/npm):

    $ npm install scraper

## Examples

### Simple
First argument is a URL as a string, second is a callback which exposes a jQuery object.

    var scraper = require('scraper');
    scraper('http://search.twitter.com/search?q=javascript', function(err, jQuery) {
        if (err) {throw err}

        jQuery('.msg').each(function() {
            console.log(jQuery(this).text().trim()+'\n');
        });
    });

First argument is an object containing settings for the "request" instance used for the fetch, second is a callback which exposes a jQuery object.

        }
      , function(err, $) {
            if (err) {throw err}

            $('.msg').each(function() {
                console.log($(this).text().trim()+'\n');
            });

First argument is an array containing either strings or objects, second is a callback which exposes a jQuery object.

        ]
      , function(err, $, urlInfo) {
            if (err) {throw err;}

            console.log('Messages from: '+urlInfo.href);
            $('.msg').each(function() {
                console.log($(this).text().trim()+'\n');
            });
        }
    );

## Arguments

### First (required)
Contains the info about which page or pages will be scraped.

#### string
    'http://www.nodejs.org'
**or**

#### request object
    {
        'uri': 'http://search.twitter.com/search?q=nodejs'
      , 'headers': {
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)'
        }
    }
**or**

#### Array (if you want to do fetches on multiple URLs)
    [
        urlString
      , urlString
      , requestObject
      , urlString
    ]
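
For illustration, a concrete mixed array could look like this (the URLs are assumed examples reused from above):

    [
      'http://www.nodejs.org'
    , {
          'uri': 'http://search.twitter.com/search?q=nodejs'
        , 'headers': {
              'User-Agent': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)'
          }
      }
    , 'http://search.twitter.com/search?q=javascript'
    ]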

### Second (optional)
The callback that allows you to use the data retrieved from the fetch.

    function(err, $, urlInfo) {
        if (err) {throw err;}
        /* Showing the data within urlInfo:
        { href: 'http://search.twitter.com/search?q=javascript',
          protocol: 'http:',
          slashes: true,
          host: 'search.twitter.com',
          hostname: 'search.twitter.com',
          search: '?q=javascript',
          query: 'q=javascript',
          pathname: '/search',
          port: 80 }
        */

        console.log('Messages from: '+urlInfo.href);
        $('.msg').each(function() {
            console.log($(this).text().trim()+'\n');
        });
    }

### Third (optional)
This argument is an object containing settings for the fetcher overall.

* **reqPerSec**: float; throttles your fetches so you don't hammer the server you are scraping (see the sketch below)
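
As a rough sketch of how all three arguments fit together (illustrative only, not taken from the package docs beyond the option name above; the value 0.5, one request every two seconds, is just an assumed example):

    var scraper = require('scraper');

    scraper(
        [
          'http://search.twitter.com/search?q=nodejs'
        , 'http://search.twitter.com/search?q=javascript'
        ]
      , function(err, $, urlInfo) {
            if (err) {throw err;}

            console.log('Messages from: '+urlInfo.href);
            $('.msg').each(function() {
                console.log($(this).text().trim()+'\n');
            });
        }
      , { 'reqPerSec': 0.5 }    // assumed throttle: one fetch every two seconds
    );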

## Depends on
* [tmpvar](https://github.com/tmpvar/)'s [jsdom](https://github.com/tmpvar/jsdom)
* [mikeal](https://github.com/mikeal/)'s [request](https://github.com/mikeal/node-utils/tree/master/request)
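
As a rough sketch of how these two pieces fit together (an assumption about the general approach, not scraper's actual source; the jQuery script URL is an assumed example):

    var request = require('request');
    var jsdom   = require('jsdom');

    // Fetch the page body with request, then let jsdom parse it and inject
    // jQuery so the callback receives a jQuery object bound to that page.
    function fetchAndParse(uri, callback) {
        request({ 'uri': uri }, function(err, response, body) {
            if (err) { return callback(err); }
            jsdom.env(body, ['http://code.jquery.com/jquery-1.4.4.min.js'], function(errors, window) {
                if (errors) { return callback(errors); }
                callback(null, window.jQuery);
            });
        });
    }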