Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 

PyCommonCrawl

A python interface for Common Crawl.

INSTALL

pip3 install pycommoncrawl

USAGE

from pycommoncrawl.common_crawl_data_accessor import CommonCrawlDataAccessor

common_crawl_data_accessor = CommonCrawlDataAccessor()

# Iterate by line
for line in common_crawl_data_accessor.get_raw_resource_data("WAT"):
    print(line)

# Iterate by WARC bloc
for warc in common_crawl_data_accessor.get_raw_resource_data_per_warc("WAT"):
    print(warc["Content-Length"])

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages