leesei/node-comics-feed

node-comics-feed


RSS feeds of comics sites usually contain links to a webpage but not the strip images.
This module iterates over the items in a feed and parses the linked webpages to create a new feed with embedded comic strips.

Supported websites:

  • GoComics
  • Dilbert.com
  • Explosm.net (credits to eguendelman)

The list of parsers is meant to be extensible; see Parsers.
PRs are welcome.

Inspired by gocomics-scrape and re-implemented using Node.

Usage

npm install comics-feed
comics-feed [.rss|url]

Turns this

[Screenshot: Before]

into this

[Screenshot: After]

(rendered by Firefox)

Parsers

parsers/*.js will be loaded automatically by parserFactory as of 0.0.9.
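
As an illustration of directory-based loading, here is a minimal sketch. It is not the module's actual parserFactory code; `loadParsers` and `isParser` are hypothetical names, and the sketch assumes each file exports a single parser object with the `name`, `match` and `scrape` members described in the interface below.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical check: does the exported object look like a parser?
function isParser(candidate) {
  return Boolean(candidate) &&
    typeof candidate.name === 'string' &&
    typeof candidate.match === 'function' &&
    typeof candidate.scrape === 'function';
}

// Hypothetical loader: require every .js file in `dir` and keep the
// exports that satisfy isParser(). Files are sorted so that the load
// order is deterministic.
function loadParsers(dir) {
  return fs.readdirSync(dir)
    .filter((file) => path.extname(file) === '.js')
    .sort()
    .map((file) => require(path.resolve(dir, file)))
    .filter(isParser);
}
```

Under these assumptions, dropping a new file into the parsers directory would be enough to register it, with no central list to edit.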

A parser should have this interface:

/**
 * Parser = {
 *   name,
 *   match(),
 *   scrape()
 * }
 *
 * match():
 * @param {Object}   siteUrl  parsed URL of the comic strip site
 * Returns a boolean indicating whether this parser can handle the site
 *
 * scrape():
 * @param {String}   baseUrl  url of the webpage containing the comic strip
 * @param {Object}   $        [cheerio](http://matthewmueller.github.io/cheerio/) object containing the parsed page
 * @param {Function} callback callback function to return the scraped info
 *
 * callback:
 * @param {Object}   error    error object if one occurs
 * @param {String}   imgUrl   URL for the strip's image 
 *
 */
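
A parser following the interface above might look like the sketch below. The site `comics.example.com` and the name `exampleParser` are made up for illustration. The sketch also assumes that `siteUrl` exposes a `hostname` property (as the result of Node's `url.parse()` does), that the page advertises the strip through an `og:image` meta tag, and that a parser file exports its object via `module.exports`; none of this is confirmed by the module itself.

```javascript
// Hypothetical parser for an imaginary site, comics.example.com.
const exampleParser = {
  name: 'example',

  // Assumption: siteUrl has a `hostname` property, as url.parse() output does.
  // Returns true for comics.example.com and any of its subdomains.
  match(siteUrl) {
    return /(^|\.)comics\.example\.com$/i.test(siteUrl.hostname || '');
  },

  // Extract the strip image URL from the parsed page and report it
  // through the Node-style callback(error, imgUrl).
  scrape(baseUrl, $, callback) {
    // Assumption: the strip image is exposed via an Open Graph meta tag.
    const imgUrl = $('meta[property="og:image"]').attr('content');
    if (!imgUrl) {
      return callback(new Error('no strip image found on ' + baseUrl));
    }
    // Resolve relative image paths against the page URL.
    callback(null, new URL(imgUrl, baseUrl).href);
  }
};

module.exports = exampleParser;
```

Reporting a missing image through the callback's `error` argument, rather than throwing, keeps the failure inside the callback contract so the caller can decide how to handle that feed item.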

Tested on

See test/live.js

TODO

  • allow parsers to return a custom description
  • error handling
    • invalid URL
    • malformed feed
    • scraping error
  • add pubDate for items
  • re-entrance
  • module globals cleanup

SaaS on Heroku

heroku-comics-feed uses this module to provide a subscribable RSS service.
