"No Checksum" different from DROID #338

Open
EveWright opened this issue Feb 15, 2022 · 4 comments
Comments

@EveWright

When running the checksum expression (MD5) against DROID outputs, I have found that DROID generates the checksum ‘d41d8cd98f00b204e9800998ecf8427e’ for certain files whereas CSV Validator generates “No Checksum”, so the check fails. I understand this checksum is the MD5 of an empty (zero-length) string, so DROID and CSV Validator are telling me the same thing. However, because it is reported as a fail, I need to remove these ‘No Checksum’ files from my CSV file before using CSV Validator for integrity checking.

In a sense this is a good thing, as it highlights ‘blank’ files in our collections that require review, but I would find it more straightforward if DROID and CSV Validator displayed the same output for this type of file.

@DavidUnderdown

Normally you'd only get “No Checksum” if CSV Validator is looking at a folder rather than a file. I would expect to see the checksum for an empty string returned (the same as DROID) when you have a zero-byte file. Do you have any example files you could share?
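To make the expected behaviour concrete, here is a minimal Python sketch of it. It is not CSV Validator's actual code; the function name `file_checksum` is hypothetical, and it only uses the standard `hashlib` and `os` modules. A folder gets “No Checksum”, while any file, including a zero-byte one, gets a real digest.

```python
import hashlib
import os


def file_checksum(path: str, algorithm: str = "md5") -> str:
    """Hypothetical sketch: folders yield "No Checksum"; files (even empty ones) yield a digest."""
    if os.path.isdir(path):
        # A folder has no content to hash.
        return "No Checksum"
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks; a zero-byte file simply contributes no data,
        # leaving the digest of empty input.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

For a zero-byte file this returns d41d8cd98f00b204e9800998ecf8427e with MD5 and e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 with `algorithm="sha256"`, which are the digests of empty input and the values DROID reports.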

@EveWright
Author

Hi David, apologies for the significant delay in getting back to you. You can find some example files attached.
Blank Files.zip

@DavidUnderdown

Curious. I am seeing the same behaviour, though I'm sure that in the past CSV Validator returned the defined checksum for an empty string in these circumstances (I've built such checks into CSV Schema before). I wondered if it was somehow down to using MD5 rather than SHA-256, but the same behaviour occurs for a SHA-256 checksum test as well.

This does seem to be a bug, but I suspect it's happening in the underlying library we are using, so the behaviour may be harder to change. In the short term, note that you will always get the checksum d41d8cd98f00b204e9800998ecf8427e when MD5 is passed an empty string (see https://en.wikipedia.org/wiki/MD5#MD5_hashes). So rather than deleting the rows from your CSV file, you could find and replace d41d8cd98f00b204e9800998ecf8427e with NO CHECKSUM. That way, any other checks you are doing on the metadata for the lines relating to those files would still take place.
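If you would rather script the find-and-replace than do it by hand, a minimal Python sketch follows. Several things here are assumptions, not part of DROID or CSV Validator. The function name `mark_empty_checksums` is hypothetical. The checksum column is assumed to be called `HASH` (check your own DROID export). All fields are re-quoted on output. Unlike a plain text replace, it only touches the checksum column. It also works for SHA-256 by computing the empty-input digest for whichever algorithm you pass.

```python
import csv
import hashlib
import io


def mark_empty_checksums(csv_text: str, column: str = "HASH",
                         algorithm: str = "md5",
                         marker: str = "NO CHECKSUM") -> str:
    """Hypothetical helper: swap the empty-input digest in `column` for `marker`.

    `column` defaults to "HASH", an assumed DROID column name; adjust to your file.
    """
    # e.g. d41d8cd98f00b204e9800998ecf8427e for MD5
    empty_digest = hashlib.new(algorithm, b"").hexdigest()
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Compare case-insensitively in case the digest was written in upper case.
        if (row.get(column) or "").lower() == empty_digest:
            row[column] = marker
        writer.writerow(row)
    return out.getvalue()
```

The marker defaults to David's suggested NO CHECKSUM. Whether CSV Validator compares it case-sensitively against its own “No Checksum” output is not confirmed here, so it is left as a parameter.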

@EveWright
Author

That's really helpful, thanks David.
