Use capture time for warcinfo WARC-Date and timestamp in WARC filename #2
The [WARC-Date](https://github.com/iipc/warc-specifications/blob/gh-pages/specifications/warc-format/warc-1.1/index.md#warc-date-mandatory) is defined as "The timestamp shall represent the instant that data capture for record creation began." For request and response records that is obviously the time a request was made or a response was received, respectively. For warcinfo records Common Crawl WARC files
A monthly crawl is fetched over 8-9 days, but the content of a WARC file always relates to one segment, which is fetched within 2 hours. The warcinfo WARC-Date should therefore indicate the time when fetching/capturing starts.
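As a rough illustration of the proposal, the sketch below derives both the warcinfo WARC-Date and the filename timestamp from a single capture start time. The function names, the `EXAMPLE` prefix, the 14-digit timestamp layout and the filename pattern are assumptions made for this example, not the actual implementation.

```python
from datetime import datetime, timezone


def warc_date(capture_start: datetime) -> str:
    """Format a capture start time as a UTC ISO 8601 WARC-Date value."""
    return capture_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def filename_timestamp(capture_start: datetime) -> str:
    """Format the same instant as a compact timestamp (assumed 14-digit layout)."""
    return capture_start.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def warc_filename(prefix: str, capture_start: datetime, serial: int) -> str:
    """Build a WARC filename; the pattern is hypothetical, for illustration only."""
    return f"{prefix}-{filename_timestamp(capture_start)}-{serial:05d}.warc.gz"


# Hypothetical start of fetching for one segment (not the time the file is written).
segment_start = datetime(2017, 5, 22, 12, 30, 0, tzinfo=timezone.utc)

print(warc_date(segment_start))
print(warc_filename("EXAMPLE", segment_start, 0))
```

Because both values come from the same `segment_start`, the warcinfo record and the filename agree on when capturing began, regardless of when the file is finally written.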
Implemented, included in May 2017 crawl: