Python 28 33

cc-mrjob

forked from Smerity/cc-mrjob

Demonstration of using Python to process the Common Crawl dataset with the mrjob framework

nutch

forked from Aloisius/nutch

CommonCrawl test version of Nutch

Python 14 7

gzipstream

forked from Smerity/gzipstream

gzipstream allows Python to process multi-part gzip files from a streaming source
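Common Crawl WARC files are typically compressed one record per gzip member, so a single decompressor stops at the first member boundary. The sketch below illustrates the multi-member idea using only the standard library's `zlib` module. It is not gzipstream's API: the function name `iter_gzip_members` and the `chunk_size` parameter are hypothetical and chosen for illustration.

```python
import gzip
import io
import zlib
from typing import BinaryIO, Iterator

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def iter_gzip_members(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Hypothetical helper: yield decompressed bytes from concatenated gzip members.

    Reads the stream in fixed-size chunks, so it works on non-seekable sources.
    """
    decomp = zlib.decompressobj(GZIP_WBITS)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = chunk
        while data:
            out = decomp.decompress(data)
            if out:
                yield out
            if decomp.eof:
                # This member is finished; any leftover bytes start the next member.
                data = decomp.unused_data
                decomp = zlib.decompressobj(GZIP_WBITS)
            else:
                # All input was consumed by the current member.
                data = b""
    # Emit anything still buffered for the final member.
    tail = decomp.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    # Two independent gzip members back to back, as in a per-record WARC.gz file.
    members = gzip.compress(b"first record\n") + gzip.compress(b"second record\n")
    # A tiny chunk size forces member boundaries to fall in the middle of a read.
    print(b"".join(iter_gzip_members(io.BytesIO(members), chunk_size=7)))
    # prints b'first record\nsecond record\n'
```

Resetting the decompressor on `eof` and feeding it `unused_data` is what lets the reader cross member boundaries without seeking.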

example-warc-java

cc-warc-examples

forked from Smerity/cc-warc-examples

CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop

commoncrawl-crawler

The CommonCrawl crawler engine and related MapReduce code

commoncrawl

CommonCrawl Project Repository

Python 0 6

example-traitor

forked from norvigaward/2012-naward13

Java 2 3

example-apprankings

forked from norvigaward/2012-naward07

Java 0 2

example-javascriptusage

forked from norvigaward/2012-naward18

Java 0 2

example-companyfootprints

forked from norvigaward/2012-naward05

Java 1 1

example-europeanjob

forked from norvigaward/2012-naward15

Java 0 2

example-languageentropy

forked from norvigaward/2012-naward09

Java 0 3

example-babel2012

forked from norvigaward/2012-naward25

example-bill-tracker

forked from awavering/CC-Bill-Tracker

These MapReduce functions use Common Crawl data to track the spread of congressional legislation on the internet

JavaScript 1 5

example-ismoneyrootevil

forked from joyita/IsMoneyTheRootOfAllEvil

example-wikientities

forked from chrishan/wikientities

Linking entities in the CommonCrawl dataset to Wikipedia concepts

commoncrawl-examples

A library of examples showing how to use the Common Crawl corpus.
