PyCommonCrawl

A Python interface for Common Crawl.

INSTALL

pip3 install pycommoncrawl

USAGE

from pycommoncrawl.common_crawl_data_accessor import CommonCrawlDataAccessor

common_crawl_data_accessor = CommonCrawlDataAccessor()

# Iterate by line
for line in common_crawl_data_accessor.get_raw_resource_data("WAT"):
    print(line)

# Iterate by WARC block
for warc in common_crawl_data_accessor.get_raw_resource_data_per_warc("WAT"):
    print(warc["Content-Length"])
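The two loops above differ in granularity. The first yields raw lines, and the second groups lines into WARC blocks whose headers (such as `Content-Length`) can be looked up by name. The following is a minimal sketch of that grouping. It is not pycommoncrawl's implementation. It assumes a locally downloaded, gzip-compressed WAT file, and `iter_lines` and `iter_records` are hypothetical helper names. The library's own data source and return types may differ.

```python
import gzip
from typing import Dict, Iterator, Tuple

# Hypothetical helpers, NOT part of pycommoncrawl. They assume a WAT file
# that has already been downloaded to local disk.


def iter_lines(path: str) -> Iterator[str]:
    """Yield each decoded line of a gzipped WAT file, without its line ending."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line.rstrip("\r\n")


def iter_records(path: str) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, block) for each WARC record in a gzipped file.

    A record begins with a version line such as "WARC/1.0", followed by
    "Name: value" header lines, a blank line, and exactly Content-Length
    bytes of content. Records are separated by blank lines.
    """
    with gzip.open(path, "rb") as f:
        while True:
            line = f.readline()
            if not line:
                return  # end of file
            if not line.startswith(b"WARC/"):
                continue  # skip blank separator lines between records
            headers: Dict[str, str] = {}
            # Read header lines until the blank line that ends the header.
            for raw in iter(f.readline, b""):
                raw = raw.rstrip(b"\r\n")
                if not raw:
                    break
                name, _, value = raw.decode("utf-8").partition(":")
                headers[name.strip()] = value.strip()
            # Content-Length tells us exactly how many bytes the block holds.
            length = int(headers.get("Content-Length", "0"))
            yield headers, f.read(length)
```

For example, `for headers, block in iter_records("example.warc.wat.gz"): print(headers["Content-Length"])` mirrors the second loop above. The filename is a placeholder. Reading `Content-Length` lets the parser take the block exactly, rather than scanning for the next `WARC/` line. Python's `gzip` module also reads concatenated gzip members transparently, which matters because WARC files are commonly compressed record by record.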