Support payload digest of revisit records #15

Open
Arkiver2 opened this issue Nov 24, 2016 · 0 comments

Currently, warcat raises the following error when verifying revisit records from a deduplicated WARC:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/warcat/tool.py", line 282, in action
    action(record)
  File "/usr/local/lib/python3.4/dist-packages/warcat/tool.py", line 298, in verify_payload_digest
    raise VerifyProblem('Bad payload digest.', '5.9')
warcat.tool.VerifyProblem: ('Bad payload digest.', '5.9', True)

The payload digest of a revisit record should be the payload digest of the record that the revisit record points to. See section 6.7.2 on page 15 (page 21 in the PDF) of http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf:

To report the payload digest used for comparison, a 'revisit' record using this profile shall include a WARC-Payload-Digest field, with a value of the digest that was calculated on the payload.
(...)
For records using this profile, the payload is defined as the original payload content whose digest value was unchanged.

Currently, warcat reports a payload digest error for these records. It would be nice if it instead looked in the WARC for the record that the revisit record refers to. If that record is in the WARC, compare the revisit record's payload digest with that record's payload digest. If the record is not in the WARC, emit a warning or info message saying that the record the revisit record refers to is not in the WARC.
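
To make the proposed behaviour concrete, here is a minimal sketch of the check. It does not use warcat's real API: records are modelled as plain dicts of WARC header fields, and `check_revisit_digest` and `records_by_id` are hypothetical names. It also assumes that the original record carries its own WARC-Payload-Digest field, so the comparison is header against header rather than a recomputed digest.

```python
import logging

logger = logging.getLogger(__name__)


def check_revisit_digest(revisit, records_by_id):
    """Compare a revisit record's payload digest with the referred-to record.

    `revisit` is a dict of WARC header fields (hypothetical model, not
    warcat's record class). `records_by_id` maps WARC-Record-ID values to
    header dicts for every record found in the same WARC.

    Returns True if the digests match, False if they differ, and None if
    the referred-to record is not in the WARC.
    """
    target_id = revisit.get('WARC-Refers-To')
    target = records_by_id.get(target_id)

    if target is None:
        # The original record is not in this WARC: warn, do not fail.
        logger.warning(
            'Record %s referred to by revisit record %s is not in the WARC.',
            target_id, revisit.get('WARC-Record-ID'))
        return None

    # Assumption: both records carry a WARC-Payload-Digest header.
    return (revisit.get('WARC-Payload-Digest')
            == target.get('WARC-Payload-Digest'))
```

A verifier could index the WARC once by WARC-Record-ID, then call this for each record whose WARC-Type is `revisit`, treating `None` as a warning rather than a `VerifyProblem`.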
