
Rap Genius Trackback Scraper

This is the tool we used to scrape 178k URLs in 15 minutes in order to find which pages were hosting potentially spammy Rap Genius links. Given a list of URLs to scrape, it creates aggregate information that identifies the spammiest sites for manual review.
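As a rough illustration of the aggregation step, the sketch below groups scraped pages by host and ranks the hosts by total link count. It is not the repository's code: the `spammiest_hosts` method, the `:url` and `:link_count` keys, and the sample data are all hypothetical, and the real tool's criteria for "spammy" are not described here.

```ruby
require "uri"

# Hypothetical helper, not taken from the repository.
# pages: array of hashes like { url: "http://...", link_count: 3 },
# where link_count is an assumed per-page count of suspicious links.
def spammiest_hosts(pages, limit: 10)
  counts = Hash.new(0)

  pages.each do |page|
    host = begin
      URI.parse(page[:url]).host
    rescue URI::InvalidURIError
      nil
    end
    next if host.nil? # skip malformed or relative URLs

    counts[host] += page[:link_count]
  end

  # Highest total first; ties are broken alphabetically by host.
  counts.sort_by { |host, count| [-count, host] }.first(limit)
end

# Made-up sample data for illustration only.
sample_pages = [
  { url: "http://blog.example.com/a", link_count: 3 },
  { url: "http://blog.example.com/b", link_count: 2 },
  { url: "http://other.example.org/x", link_count: 1 },
  { url: "not a url", link_count: 9 }
]

ranked = spammiest_hosts(sample_pages)
```

Ranking by host rather than by page keeps the manual review list short, since a single site often contributes many pages.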

For more details on the motivation and background for this repository, see the blog post on Rap Genius.


You can run the scrape process using a set of sample data in vendor/urls.txt. To get started:

$ bundle install && rake db:create db:migrate urls:import
$ gem install foreman
$ mkdir tmp
$ foreman start worker

Then, once the pages have all been scraped (i.e., Page.unscraped.count == 0):

# from the console


