RSS feeds of comics sites usually contain links to a webpage but not the strip images.
This module iterates over the items in a feed and parses the linked webpages to create a new feed with the comic strips embedded.
Supported websites:
- GoComics
- Dilbert.com
- Explosm.net (credits to eguendelman)
The list of parsers is meant to be extensible; see Parsers.
PRs are welcome.
Inspired by gocomics-scrape and re-implemented using Node.
npm install comics-feed
comics-feed [.rss|url]
Turns a feed that only links to each strip's webpage into a feed with the strips embedded (rendered by Firefox).
parsers/*.js will be loaded automatically by parserFactory as of 0.0.9.
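As a rough illustration of what automatic loading could look like, here is a minimal sketch. It is not the actual parserFactory code: `loadParsers` and `findParser` are hypothetical helper names, and the sketch assumes each file in the directory exports a single parser object via `module.exports`.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of comics-feed): require every .js file in
// `dir` and keep only the exports that expose match() and scrape().
function loadParsers(dir) {
  return fs.readdirSync(dir)
    .filter((file) => path.extname(file) === '.js')
    .sort() // deterministic order, so the first matching parser is predictable
    .map((file) => require(path.resolve(dir, file)))
    .filter((p) => p && typeof p.match === 'function' && typeof p.scrape === 'function');
}

// Hypothetical helper: return the first parser whose match() accepts the
// page URL, or null when no parser handles the site.
function findParser(parsers, pageUrl) {
  const siteUrl = new URL(pageUrl); // assumption: match() only needs fields such as `hostname`
  return parsers.find((p) => p.match(siteUrl)) || null;
}
```

With a layout like this, adding support for a new site would only mean dropping one more file into the parsers directory.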
A parser should have this interface:
/**
* Parser = {
* name,
* match(),
* scrape()
* }
*
* match():
* @param {Object} siteUrl parsed url for the comic strips site
* @returns {Boolean} true if this parser can handle the site
*
* scrape():
* @param {String} baseUrl url of the webpage containing the comic strip
* @param {Object} $ [cheerio](http://matthewmueller.github.io/cheerio/) object containing the parsed page
* @param {Function} callback callback function to return the scraped info
*
* callback:
* @param {Object} error error object if one occurs
* @param {String} imgUrl URL for the strip's image
*
*/
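A parser following this interface might look like the sketch below. The site name, hostname and `og:image` selector are made up for illustration and do not correspond to any parser shipped with this module. It also assumes that `siteUrl` exposes a `hostname` property and that a parser file exports its object via `module.exports`.

```javascript
'use strict';

// Hypothetical parser for an imaginary site (example-comics.test).
const exampleParser = {
  name: 'example-comics',

  // Assumption: siteUrl is a parsed URL object with a `hostname` property.
  match(siteUrl) {
    return /(^|\.)example-comics\.test$/i.test(siteUrl.hostname || '');
  },

  // $ is the cheerio object for the page; only $(selector).attr(name) is used.
  scrape(baseUrl, $, callback) {
    // Assumption: the imaginary site advertises the strip in an og:image meta tag.
    const src = $('meta[property="og:image"]').attr('content');
    if (!src) {
      return callback(new Error('No strip image found on ' + baseUrl));
    }
    let imgUrl;
    try {
      // Resolve relative image paths against the page URL.
      imgUrl = new URL(src, baseUrl).href;
    } catch (err) {
      return callback(err);
    }
    callback(null, imgUrl);
  }
};

module.exports = exampleParser;
```

Reporting a missing image through the callback's error argument, rather than throwing, keeps the failure attached to the single feed item being scraped.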
See test/live.js
- allow parsers to return custom description
- error handling
  - invalid URL
  - malformed feed
  - scraping error
- add pubDate for items
- re-entrance
- module globals cleanup
heroku-comics-feed uses this module to provide a subscribable RSS service.