GitHub - acoffman/dvd_scrape: Fetches a CSV file containing DVD information from the web, parses it, saves it to a database, and scrapes the web for cover art.

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
lib		lib
.gitignore		.gitignore
README		README

Repository files navigation

This is part of a larger project that I hope to complete someday. As it stands, 
this script will pull a ZIP file from the web and unzip it. It then parses the 
CSV file inside into a mysql database using ActiveRecord. The script then cycles through
the database and scrapes Amazon for cover art for each DVD.

The script "works" in the sense that it does everything that it is supposed to and 
won't crash. It is not optimized, tested very thouroughly or refactored much at all.

When I have some additional time, all of those things will get done, and I plan on writing
some RSpec tests for it as well (I know thats out of order!) 

The script is also very slow, it will take hours to run the large CSV file (about 400,000 values)
The download, unzip, parsing, and database creation are fairly fast, taking about 7 minutes for the whole process
on the large file. Fetching the artwork for the DVDs is the bottleneck, taking several seconds per DVD.