Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle old Archivematica METS documents #53

Merged
merged 8 commits into from Mar 8, 2019

Conversation

sevein
Copy link
Member

@sevein sevein commented Nov 30, 2018

This pull request brings fixes for some issues found while trying to read METS documents generated by old versions of Archivematica.

Connects to archivematica/Issues#24.

@sevein sevein added the WIP label Nov 30, 2018
@sevein sevein self-assigned this Nov 30, 2018
@sevein sevein force-pushed the dev/issue-24-handle-old-aips branch 6 times, most recently from dfc5dca to eec8691 Compare December 6, 2018 22:51
@sevein sevein force-pushed the dev/issue-24-handle-old-aips branch from eec8691 to 1f4b7fd Compare January 17, 2019 20:08
@sevein sevein removed the WIP label Mar 8, 2019
@sevein
Copy link
Member Author

sevein commented Mar 8, 2019

@cole, can you take a look?

I've added 1954131, this came up when I was updating artefactual/archivematica-storage-service#438 so it uses metsrw. I've tweaked the FSEntry constructor a bit so it doesn't error.

Also, I'm going to need 22e17ff since the transfer METS is still using old-style fptrs. 22e17ff is going to be relevant in the future because AM needs to be able to work with old-style fptrs when importing vintage AIPs.

@sevein sevein force-pushed the dev/issue-24-handle-old-aips branch from 1954131 to cefa538 Compare March 8, 2019 17:46
@sevein
Copy link
Member Author

sevein commented Mar 8, 2019

Uh this is not as easy as I thought...

Copy link
Contributor

@cole cole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, support for fptrs! This looks good. I commented on a few minor bits, but I'd be happy merging this as is (as soon as you address or ignore the py3 string/bytes thing) if you prefer.

metsrw/fsentry.py Outdated Show resolved Hide resolved
tests/test_mets.py Outdated Show resolved Hide resolved
tests/test_mets.py Outdated Show resolved Hide resolved
metsrw/mets.py Show resolved Hide resolved
@sevein sevein force-pushed the dev/issue-24-handle-old-aips branch 2 times, most recently from 4968085 to 158351b Compare March 8, 2019 20:10
This commit updates `_parse_tree_structmap` so `FSEntry` objects are
created also when the parser runs into direct `fptr` elements, e.g.:

    <div TYPE="directory" LABEL="objects">
        <fptr FILEID="001.csv-0c0576c3-e82e-4501-9aef-1f3d2d35aeda"/>
        <fptr FILEID="002.mpg-722de890-2a3c-4343-bb69-f2b29894d9a7"/>
        <fptr FILEID="003.mpg-423247e7-c27d-4b5a-a64a-b02833f2212d"/>
    </div>
@sevein sevein force-pushed the dev/issue-24-handle-old-aips branch from 158351b to 985047f Compare March 8, 2019 20:18
In Py2, ``FSEntry.path`` uses binary. This commit updates the constructor so
it uses the `utf-8` encoder when we're encoding text type. The default encoder
is `ascii` which is problematic.

In Py3, ``FSEntry.path`` is using Unicode and that needs to be fixed.
@sevein sevein force-pushed the dev/issue-24-handle-old-aips branch from 985047f to f7ff13f Compare March 8, 2019 20:20
@sevein sevein requested a review from cole March 8, 2019 20:23
@sevein sevein force-pushed the dev/issue-24-handle-old-aips branch from ae2950e to a4f0edc Compare March 8, 2019 21:03
This avoids the following issue:

    > return ' '.join([str(x) for x in texts])
                       ^^^^^^
    E UnicodeEncodeError: 'ascii' codec can't encode characters in
      position 42-44: ordinal not in range(128)

Only fixed in Py2. Py3 codepath will need more work overall.
@sevein sevein force-pushed the dev/issue-24-handle-old-aips branch from a4f0edc to f06008d Compare March 8, 2019 21:34
@sevein
Copy link
Member Author

sevein commented Mar 8, 2019

Tests are passing but coverage decreased. I'll merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants