-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SHA-1 manifest file format does not work with shasum #69
Comments
https://tools.ietf.org/html/draft-kunze-bagit-14#section-2.1.3 specifies that the form is Feel free to submit a pull request for this feature. |
Sorry I was referring to the statement But I get your point that BagIt is only for BagIt. |
Right, but if you read that section it says
Notice it doesn't have the |
As a similar newcomer to the project, that was my impression as well: people might have had existing use of If you have a use-case where it makes sense to use one of these tools rather than a command-line bagit utility, it might be good to expound upon it here so it could be reflected in the spec. It seems like it could be confusing if it was only implemented in bagit-java and someone's workflow breaks if a different tool was used at some point. |
@acdha I'm not really sure I have a good use case. We were using this library to format our exports as "bags" and looking at the manifest it appeared that we could have people use normal tools like |
So I'll close this. Thanks for the quick response all! |
@acdha @johnscancella: A little context might help here: we're adding support for Bags to Fedora's new import/export tool. So mostly it's developers trying to quickly verify the Bags we're exporting with whatever tools we have on-hand. I think this is a use case worth considering, though — I suspect many people who generate Bags have Bag-specific validation tools, but some may not. And even if you do have Bag-specific tools, it may be more convenient to do a quick check with other tools like md5sum/shasum. So far, all of the non-Bag tools we've tried have failed (hence this ticket, and also https://jira.duraspace.org/browse/FCREPO-2354?focusedCommentId=53229&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-53229). Because the manifest files look very similar to the md5sum/shasum output files, this is pretty confusing. My reading of the spec makes me think that using lowercase letters in the checksums, and adding an asterisk before the path name, would still be valid. They would also then be compatible with a wider variety of validation tools. |
@escowles I looked at your ticket. While adding a space will work some of the time, it would be better to use the *, because bagit expects to calculate the checksum over the bytes of the file (which is what the * in the md5sum tools means). By not doing it in binary mode we have encountered some strange results, so better to just always use *. We talk more about this in the bagit specification here What this means is that we would have to change the bagit specification to always put * in front of the path. @acdha what are your feelings about this? As for the case of the checksum lettering, I think that is an issue with the Ruby implementation. The library of Congress does not maintain the ruby implementation, so you will have to file a bug with them. |
@johnscancella I think it's worth considering updating the spec to be clearer in this area. Ideally, the format would be compatible with md5sum/shasum, using " " or " *" as the separator between the checksum and filename. In the meantime, I think it would be nice to use " *" as the separator, to be more compatible with common tools, and that would still be valid. Also, we've filed a bug against the Ruby BagIt library we were using for not handling checksums with uppercase letters: tipr/bagit#27 — and the method we were using to convert a binary checksum into a hex string was what was creating uppercase letters, so we can easily change that locally. |
I'm 👍 on always using lowercase checksums since that should work with everything and I believe we already have tests for both in https://github.com/loc-rdc/bagit-conformance-suite/ as well as bagit-python. (Good thing to check, however) I'm a somewhat reluctant 👎 on a leading asterisk by default since while that's clearly the best way to avoid problems with text mode on Windows (I hope nobody is bagging stuff on VMS but it's a big world…) it hasn't been required in the past and I would be surprised if didn't break some existing tools. Playing conservatively, I'd be tempted to suggest that we ensure that the separator by default is two spaces since that will work with both any complaint BagIt library and with the GNU coreutils |
@escowles Speaking of Ruby BagIt, have you / @tipr seen https://github.com/loc-rdc/bagit-conformance-suite/ before? We haven't announced it but @johnscancella @dbrunton and I are now officially spending time supporting the spec and implementations which we use, and one of my goals is to have that suite available when https://github.com/loc-rdc/bagitspec/pull/1 lands. I've been planning to email the larger community as soon as we reach internal consensus but mostly it's been adding more edge-case handling & pedantic warnings rather than major changes. |
When submitting an issue please include:
Running 5.0.0-BETA
OS X 10.11.5
Please format it in the given when then style
For example (from link above):
Given
When
Then
We are using this library to develop an import/export feature using BagIt bags. However in the test implementation the manifest-SHA1.txt that is generated cannot be parsed by OSX's
shasum
command.For example the file appears as follows.
However
I can't find a standard on what the format of this a SHA-1 file is supposed to be, but it seems like there is supposed to be an asterisk before the filename. In the BagIt spec that has been specified as optional, but if I add either an additional space or an asterisk before the file path then the lines are read perfectly.
ie.
This fails to parse
But this
or this
is readable as valid formatted SHA-1 checksum lines
The text was updated successfully, but these errors were encountered: