script that crawls the request database and my school library CD catalog and returns matches.

recursive web crawler that scrapes request titles from the request database and CD titles from a school library catalog to check for matches.

make sure the mongodb server is running!
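
one quick way to confirm that something is accepting connections before starting a crawl is a plain TCP check. this is only a sketch: it assumes the server is on localhost at port 27017 (the address used in the mongo command in this README), the function name is made up for illustration, and an open port does not prove that the listener is actually MongoDB.

```python
import socket


def mongod_is_listening(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise.

    Hypothetical helper, not part of this repository. It only checks that the
    port is open; it does not speak the MongoDB protocol.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if mongod_is_listening():
        print("port 27017 is open, the MongoDB server appears to be running")
    else:
        print("nothing is listening on localhost:27017, start the MongoDB server first")
```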

to run the scraper, go into the outer what_crawler directory and run

scrapy crawl

to run the library scraper, go into the outer library_crawler directory and run

scrapy crawl bobcat

after all of the titles have been scraped, run

mongo localhost:27017/whatRequests --quiet aggregator.js
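
the matching step can be pictured as comparing two lists of titles. the sketch below is not the logic in aggregator.js, which runs inside the mongo shell; it is a standalone illustration of one possible approach (normalize each title, then look for titles present in both lists). the function names and sample titles are made up.

```python
import re


def normalize(title):
    """Lowercase, drop punctuation, and collapse whitespace so near-identical titles compare equal."""
    cleaned = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(cleaned.split())


def find_matches(request_titles, library_titles):
    """Return the request titles whose normalized form also appears among the library titles."""
    library_keys = {normalize(t) for t in library_titles}
    return [t for t in request_titles if normalize(t) in library_keys]


# Made-up sample data, for illustration only.
requests = ["Kind of Blue", "OK Computer", "Blue Train"]
library = ["kind of blue!", "Blue  Train", "Abbey Road"]

print(find_matches(requests, library))  # ['Kind of Blue', 'Blue Train']
```

exact matching after normalization misses titles that differ by more than case, punctuation, or spacing (subtitles, edition notes), so a real comparison may need something looser.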

To change the scraper to run on a different library catalog, the library spider will need to be modified. Changes will need to be made to allowed_domains, start_urls, and the two XPaths, which can be found using the Firebug copy XPath feature or the XPath Chrome extension.

If you're using Windows, you will need to install pywin32 because of a bug in Twisted.