Latest commit ed95662 Sep 18, 2012 Albert Wavering Removing build folder
bin Updated project files Sep 18, 2012
conf
dist/lib Updated project files Sep 18, 2012
lib Updated project files Sep 18, 2012
src Updated project files Sep 18, 2012
test/java/org/commoncrawl/hadoop/mapred Updated project files Sep 18, 2012
.DS_Store
README-Amazon-AMI
README.md Revised to reflect integrated code. Sep 18, 2012
VERSION Updated project files Sep 18, 2012
build.properties Updated project files Sep 18, 2012
build.xml Updated project files Sep 18, 2012

CC-Bill-Tracker

These MapReduce functions use Common Crawl data to examine how congressional legislation spreads across the internet.

Program Tasks:

  1. Count the number of pages that mention a bill in any of its forms
  2. Record the domains of pages that mention a bill in any of its forms, and output the 50 domains that mention it most often, each with its count of mentioning pages
  3. Output the 50 most frequent words across all pages that mention a bill in any of its forms, excluding a set of 100 very common words

These functions are called from the file TotalAnalysis.
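
The three tasks can be illustrated with a minimal in-memory sketch in plain Java. This is not the project's Hadoop code: the class `BillTrackerSketch`, its method names, the URL-to-text map used as input, and the case-insensitive substring matching of bill forms are all assumptions made for illustration. A real job would presumably distribute the same counting across mappers and reducers.

```java
import java.net.URI;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from the repository.
public class BillTrackerSketch {

    /** True if the text contains any form of the bill's name (assumed: case-insensitive substring match). */
    static boolean mentionsBill(String text, List<String> billForms) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String form : billForms) {
            if (lower.contains(form.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /** Task 1: count the pages that mention the bill. The map is page URL -> page text. */
    static long countMentioningPages(Map<String, String> pages, List<String> billForms) {
        return pages.values().stream().filter(t -> mentionsBill(t, billForms)).count();
    }

    /** Task 2: the domains with the most mentioning pages, with their page counts. */
    static List<Map.Entry<String, Long>> topDomains(
            Map<String, String> pages, List<String> billForms, int limit) {
        Map<String, Long> counts = new HashMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            if (!mentionsBill(page.getValue(), billForms)) {
                continue;
            }
            // URI.create throws IllegalArgumentException on malformed URLs;
            // real crawl data would need that case handled.
            String host = URI.create(page.getKey()).getHost();
            if (host != null) {
                counts.merge(host, 1L, Long::sum);
            }
        }
        return topEntries(counts, limit);
    }

    /** Task 3: the most frequent words on mentioning pages, excluding stop words. */
    static List<Map.Entry<String, Long>> topWords(
            Map<String, String> pages, List<String> billForms, Set<String> stopWords, int limit) {
        Map<String, Long> counts = new HashMap<>();
        for (String text : pages.values()) {
            if (!mentionsBill(text, billForms)) {
                continue;
            }
            // Assumed tokenization: lowercase, split on anything that is not a-z or 0-9.
            for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!word.isEmpty() && !stopWords.contains(word)) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return topEntries(counts, limit);
    }

    /** Sort by count descending, break ties alphabetically, keep the first `limit` entries. */
    static List<Map.Entry<String, Long>> topEntries(Map<String, Long> counts, int limit) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

With `limit` set to 50 and a stop-word set of 100 common words, `topDomains` and `topWords` correspond to tasks 2 and 3. Substring matching is the simplest possible choice and will over-match short bill identifiers, so a stricter matcher may be needed in practice.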