Index for WET files? #11
Hi, I hope I am posting this question in the right place...
I found the index for WARC files at http://index.commoncrawl.org/CC-MAIN-2016-18// but is there a similar index for WET files?
If not, is there any way I could convert the WARC object address to a WET object address?
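Unfortunately, we do not provide an index to WET files. It's easy to derive the location of a WET (or WAT) file from the path of a WARC file: replace the /warc/ directory in the path with /wet/ (or /wat/), and the .warc.gz suffix with .warc.wet.gz (or .warc.wat.gz).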
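As a rough illustration, the path rewrite could be sketched in Python like this. The function name `warc_to_derived_path` and the sample path are made up for the example (the segment and file names are placeholders, not a real crawl file); the sketch only assumes the usual Common Crawl layout in which the file sits in a `/warc/` directory and ends in `.warc.gz`.

```python
def warc_to_derived_path(warc_path: str, kind: str = "wet") -> str:
    """Map a WARC file path to the matching WET or WAT path.

    Hypothetical helper: swaps the /warc/ directory for /wet/ or /wat/
    and the .warc.gz suffix for .warc.wet.gz or .warc.wat.gz.
    """
    if kind not in ("wet", "wat"):
        raise ValueError("kind must be 'wet' or 'wat'")
    suffix = ".warc.gz"
    if "/warc/" not in warc_path or not warc_path.endswith(suffix):
        raise ValueError("not a path of the form .../warc/<name>.warc.gz")
    # Split on the last /warc/ so earlier path components are left alone.
    head, _, name = warc_path.rpartition("/warc/")
    return f"{head}/{kind}/{name[:-len(suffix)]}.warc.{kind}.gz"


# Placeholder path, in the shape of the "filename" field of an index record.
example = (
    "crawl-data/CC-MAIN-2016-18/segments/1234567890123.45/warc/"
    "CC-MAIN-20160501000000-00000-ip-10-0-0-1.ec2.internal.warc.gz"
)
print(warc_to_derived_path(example, "wet"))
print(warc_to_derived_path(example, "wat"))
```

Paths that do not contain a `/warc/` directory are rejected rather than guessed at, since this sketch has no way of telling whether a derived file exists for them.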
The Common Crawl index also provides offsets into the WARC file, which could be used to estimate the offsets in the WET file.