Handle old Archivematica METS documents #53
@cole, can you take a look? I've added 1954131; this came up while I was updating artefactual/archivematica-storage-service#438 to use metsrw. I've tweaked the FSEntry constructor a bit so it doesn't error. I'm also going to need 22e17ff, since the transfer METS still uses old-style fptrs. 22e17ff will matter in the future too, because AM needs to work with old-style fptrs when importing vintage AIPs.
Uh, this is not as easy as I thought...
Yay, support for fptrs! This looks good. I've commented on a few minor bits, but I'd be happy to merge this as is (once you address or ignore the py3 string/bytes thing) if you prefer.
This commit updates `_parse_tree_structmap` so `FSEntry` objects are also created when the parser runs into direct `fptr` elements, e.g.:

    <div TYPE="directory" LABEL="objects">
      <fptr FILEID="001.csv-0c0576c3-e82e-4501-9aef-1f3d2d35aeda"/>
      <fptr FILEID="002.mpg-722de890-2a3c-4343-bb69-f2b29894d9a7"/>
      <fptr FILEID="003.mpg-423247e7-c27d-4b5a-a64a-b02833f2212d"/>
    </div>
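To illustrate the idea, here is a minimal sketch using the standard library's `xml.etree.ElementTree`, not metsrw's actual `_parse_tree_structmap`. It assumes the elements carry the standard METS namespace (`http://www.loc.gov/METS/`), unlike the unprefixed snippet above, and the helper `collect_fptrs`, the `notes.txt` item and the `file-notes-txt` FILEID are hypothetical. An `fptr` is picked up whether it sits directly under a directory `div` (old style) or inside its own per-file `div`.

```python
import xml.etree.ElementTree as ET

# Standard METS namespace; ElementTree stores tags as "{namespace}local".
METS_NS = "http://www.loc.gov/METS/"
NS = {"mets": METS_NS}
DIV = f"{{{METS_NS}}}div"
FPTR = f"{{{METS_NS}}}fptr"


def collect_fptrs(div):
    """Return (LABEL of enclosing div, FILEID) pairs for every fptr under div.

    Hypothetical helper: fptrs directly under a directory div (old style)
    and fptrs nested in per-file item divs are handled the same way.
    """
    label = div.get("LABEL")
    found = []
    for child in div:
        if child.tag == FPTR:
            # Record the FILEID together with the div that directly holds it.
            found.append((label, child.get("FILEID")))
        elif child.tag == DIV:
            # Recurse into subdirectory and item divs.
            found.extend(collect_fptrs(child))
    return found


# Hypothetical structMap mixing an old-style direct fptr with an item div.
SAMPLE = """
<mets:structMap xmlns:mets="http://www.loc.gov/METS/">
  <mets:div TYPE="directory" LABEL="objects">
    <mets:fptr FILEID="001.csv-0c0576c3-e82e-4501-9aef-1f3d2d35aeda"/>
    <mets:div TYPE="Item" LABEL="notes.txt">
      <mets:fptr FILEID="file-notes-txt"/>
    </mets:div>
  </mets:div>
</mets:structMap>
"""

root = ET.fromstring(SAMPLE.strip())
pairs = collect_fptrs(root.find("mets:div", NS))
for label, file_id in pairs:
    print(label, file_id)
```

For the direct `fptr`, the only label available is the directory's, so a real parser would presumably look the file up in the `fileSec` by FILEID to get its name.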
In Py2, `FSEntry.path` is a byte string. This commit updates the constructor so it encodes text-type paths with the `utf-8` codec; the default `ascii` codec is problematic because it fails on any non-ASCII character. In Py3, `FSEntry.path` is Unicode, and that still needs to be fixed.
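A minimal sketch of the encoding rule described above, written for Py3. The helper `encode_path` is hypothetical and not part of metsrw; it only shows why explicitly encoding text with UTF-8 succeeds where the ASCII codec fails.

```python
def encode_path(path):
    """Return path as UTF-8 bytes (hypothetical helper, not metsrw's API).

    Bytes (and None) pass through unchanged; text is encoded explicitly
    with UTF-8 instead of ASCII, which rejects non-ASCII characters.
    """
    if path is None or isinstance(path, bytes):
        return path
    return path.encode("utf-8")


print(encode_path("objects/café.txt"))  # b'objects/caf\xc3\xa9.txt'

# The same path cannot be represented with the ASCII codec.
try:
    "objects/café.txt".encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii fails:", exc.reason)  # ordinal not in range(128)
```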
This avoids the following issue:

    >       return ' '.join([str(x) for x in texts])
                             ^^^^^^
    E       UnicodeEncodeError: 'ascii' codec can't encode characters in position 42-44: ordinal not in range(128)

This is only fixed in Py2; the Py3 code path will need more work overall.
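The commit doesn't show the fix itself, so the following is only a sketch of one way to avoid calling `str()` on each item, assuming `texts` may mix UTF-8 bytes, text and other values. The helper `join_texts` is hypothetical. It targets Py3, where `str(b"...")` doesn't raise but silently produces `"b'...'"`, so bytes are decoded explicitly.

```python
def join_texts(texts):
    """Join a mix of UTF-8 bytes, text and other values with spaces.

    Hypothetical helper: bytes are decoded as UTF-8 rather than passed to
    str(), which in Py3 would yield their repr (e.g. "b'caf\\xc3\\xa9'").
    """
    parts = []
    for x in texts:
        if isinstance(x, bytes):
            parts.append(x.decode("utf-8"))
        else:
            parts.append(str(x))
    return " ".join(parts)


print(join_texts([b"caf\xc3\xa9", "ñandú", 42]))  # café ñandú 42
```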
Tests are passing, but coverage decreased. I'll merge.
This pull request fixes some issues found while reading METS documents generated by old versions of Archivematica.
Connects to archivematica/Issues#24.