Python library for reading and writing warc files
This branch is 4 commits ahead, 1 commit behind internetarchive:master
warc: Python library to work with WARC files

WARC (Web ARChive) is a file format for storing web crawls.

http://bibnum.bnf.fr/WARC/

The warc library makes it easy to work with WARC files. For example, to print the target URI and content length of every record in a file:

import warc

f = warc.open("test.warc")
for record in f:
    # Each record exposes its WARC headers by name.
    print("%s %s" % (record['WARC-Target-URI'], record['Content-Length']))
f.close()
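To make clear what is being iterated over, here is a minimal sketch of the record layout itself, using only the standard library. It is not how the warc library is implemented. It assumes an uncompressed stream in which every record starts with a ``WARC/`` version line, follows it with ``Name: value`` headers and a blank line, and then carries exactly ``Content-Length`` bytes of payload. The names ``iter_warc_records`` and ``make_record`` are hypothetical helpers invented for this illustration, and the sample records are simplified (a conforming record also carries fields such as ``WARC-Record-ID`` and ``WARC-Date``).

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream.

    Hypothetical helper for illustration; not part of the warc library.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # skip the blank lines that separate records
        if not version.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % version)
        headers = {}
        for line in iter(stream.readline, b""):
            line = line.strip()
            if not line:
                break  # a blank line ends the header block
            # Split on the first colon only, so URIs in the value stay intact.
            name, _, value = line.partition(b":")
            headers[name.decode("utf-8")] = value.strip().decode("utf-8")
        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_record(uri, payload):
    """Build a simplified WARC record for the demo (assumed sample data)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        "WARC-Target-URI: %s\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % (uri, len(payload))
    ).encode("utf-8")
    # Each record is terminated by two CRLF pairs.
    return head + payload + b"\r\n\r\n"


sample = (
    make_record("http://example.com/a", b"hello")
    + make_record("http://example.com/b", b"hi there")
)

for headers, payload in iter_warc_records(io.BytesIO(sample)):
    print("%s %s" % (headers["WARC-Target-URI"], headers["Content-Length"]))
```

The sketch treats header names as case-sensitive and does not handle gzip-compressed files or multi-line header values. For real archives, use the library.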

Documentation

The documentation of the warc library is available at http://warc.readthedocs.org/.

License

This software is licensed under the GPL v2. See the LICENSE file for details.
