- cc-warc-examples 35 CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop
- tf-ham 29 A TensorFlow implementation of "Learning Efficient Algorithms with Hierarchical Attentive Memory"
- right_whale_hunt 27 Annotated faces for NOAA Right Whale Recognition Kaggle competition
- pubcrawl 16 *Deprecated* A short and sweet Python web crawler using Redis as the process queue, seen set, and Memcache-style rate limiter for robots.txt
- cs205_ga 16 How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce