# warc-extractor

Tool to extract web pages from warc.gz files and write their content as documents. Each line of the output file contains one document.

# How to use

    $ python warcParser.py PATH_DATASET INITIAL_FILE_COUNTER
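
The extraction described above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the code in `warcParser.py`: the function names `iter_warc_records` and `extract_documents` are hypothetical, each document is assumed to be the body of an HTTP `response` record with whitespace collapsed so that it fits on one line, and the `PATH_DATASET` and `INITIAL_FILE_COUNTER` arguments are not modeled.

```python
import gzip


def iter_warc_records(path):
    """Yield (headers, payload) for each record in a .warc.gz file."""
    # gzip.open reads concatenated gzip members transparently, which is
    # how .warc.gz files are usually laid out (one member per record).
    with gzip.open(path, "rb") as stream:
        while True:
            line = stream.readline()
            if not line:
                return  # end of file
            if not line.startswith(b"WARC/"):
                continue  # skip blank separator lines between records
            headers = {}
            for raw in iter(stream.readline, b""):
                raw = raw.strip()
                if not raw:
                    break  # a blank line ends the WARC header block
                name, _, value = raw.partition(b":")
                key = name.strip().lower().decode("utf-8", "replace")
                headers[key] = value.strip().decode("utf-8", "replace")
            # Read exactly the number of bytes the record declares.
            payload = stream.read(int(headers.get("content-length", 0)))
            yield headers, payload


def extract_documents(warc_path, out_path):
    """Write one HTTP response body per line; return the number written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for headers, payload in iter_warc_records(warc_path):
            if headers.get("warc-type") != "response":
                continue  # ignore warcinfo, request and metadata records
            # The payload is a full HTTP response; drop its header block.
            _, _, body = payload.partition(b"\r\n\r\n")
            text = body.decode("utf-8", errors="replace")
            # Assumption: collapse whitespace so the document fits on one line.
            out.write(" ".join(text.split()) + "\n")
            count += 1
    return count
```

Calling `extract_documents("sample.warc.gz", "docs.txt")` would write one line per response record to `docs.txt`; both file names are placeholders.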