Box Office Mojo Scraper - Scrapy

Running The Scraper

Place a csv file containing the titleId of every movie you want to scrape in the project root.
Pass that csv file's path to the read_csv() call in spider.py.
To resume a paused run, set the rowsScraped variable in spider.py to the number of rows already scraped.
Run scrapy crawl mojo -L WARN -o master_dataset.csv. The -o option appends the results to master_dataset.csv if it already exists, and -L WARN limits log output to warnings and errors.
Use aggregator.py to join the scraped results with another csv file on the titleId column.
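
The ID-loading and resume steps above can be sketched roughly as follows. This is a minimal illustration, not the code in spider.py: the project reads the file with a read_csv() call and keeps the offset in rowsScraped, while this sketch uses the standard-library csv module instead. The titleId column header, the helper names load_title_ids and title_url, and the Box Office Mojo URL pattern are assumptions made for the example.

```python
import csv
import os
import tempfile


def load_title_ids(path, rows_scraped=0):
    """Read titleIds from a csv file and skip the first rows_scraped of them.

    Assumes the file has a header row with a 'titleId' column (hypothetical).
    rows_scraped plays the same role as the rowsScraped variable in spider.py.
    """
    with open(path, newline="") as f:
        ids = [row["titleId"] for row in csv.DictReader(f)]
    return ids[rows_scraped:]


def title_url(title_id):
    # Assumed URL pattern for a Box Office Mojo title page.
    return f"https://www.boxofficemojo.com/title/{title_id}/"


if __name__ == "__main__":
    # Write a small throwaway csv so the sketch runs on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
        tmp.write("titleId\ntt0000001\ntt0000002\ntt0000003\n")
        path = tmp.name
    try:
        # Treat the first two IDs as already scraped.
        for tid in load_title_ids(path, rows_scraped=2):
            print(title_url(tid))
    finally:
        os.remove(path)
```

Running it prints only the URL for tt0000003. Slicing by a row count assumes each ID yields exactly one output row; because Scrapy fetches pages concurrently, items are not guaranteed to be written in the same order as the ID file, so the offset is an approximation.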

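A join on titleId, as described for aggregator.py, might look something like the sketch below. The join type and column handling of the real aggregator.py are not documented here, so this assumes an inner join in which, for clashing non-key columns, the second file's value wins; join_on_title_id and read_rows are hypothetical names. With pandas, a comparable inner join is DataFrame.merge(other, on="titleId").

```python
import csv
import io


def read_rows(text):
    """Parse csv text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_title_id(left_rows, right_rows, key="titleId"):
    """Inner-join two lists of dict rows on a shared key column."""
    # Index the right-hand rows by key; a later duplicate key overwrites an earlier one.
    index = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = index.get(row[key])
        if match is not None:
            # Merge both rows; on a clash of non-key columns the right-hand value wins.
            joined.append({**row, **match})
    return joined


if __name__ == "__main__":
    scraped = read_rows("titleId,gross\ntt1,100\ntt2,250\n")
    other = read_rows("titleId,genre\ntt2,Drama\ntt3,Comedy\n")
    print(join_on_title_id(scraped, other))
```

Only tt2 appears in both inputs, so the result is a single merged row containing its gross and genre.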