Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode xlink:href attributes when reading METS #51

Merged
merged 1 commit into from Sep 25, 2018

Conversation

jrwdunham
Copy link
Contributor

A previous commit introduced URL-encoding of xlink:href values when writing METS. This commit introduces the corresponding URL-decoding when reading METS. Also adds some simple tests to display and confirm expected behaviour.

Connected to archivematica/Issues#206

@jrwdunham jrwdunham self-assigned this Sep 25, 2018
@jrwdunham jrwdunham added the bug label Sep 25, 2018
Copy link
Contributor

@ross-spencer ross-spencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No significant concerns here Joel. If anything triggers in my comments, feel free to address them, but otherwise, LGTM.

'validate',
'xsd_error_log_string',
'xsd_validate',
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Learning a bit about this recently, this seems like good practice.

except ValueError:
raise exceptions.MetsError(
'Value "{}" (for attribute xlink:href) is not a valid'
' URL.'.format(self.path))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems sensible too.

except ValueError:
raise exceptions.MetsError(
'Value "{}" (for attribute xlink:href) is not a valid'
' URL.'.format(self.target))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly, MetsError is the base exception for the module. ParseError is the exception that might normally be raised when reading a METS file. I think this is fine, but something like MetsEncodeError might read clearer for exceptions where we're encoding data. I think I prefer the verb-y-ness so I think it's just a preference thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

metsrw/utils.py Outdated
@@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-
from six.moves.urllib.parse import quote_plus, urlparse, urlunparse
from six.moves.urllib.parse import quote_plus, unquote_plus, urlparse, urlunparse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this just creeps over the line-length limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

"""Decode percent encoding introduced per RFC3986
https://tools.ietf.org/html/rfc3986#section-2.1.
"""
return _urlendecode(url, unquote_plus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever!

assert url == metsrw.urldecode(metsrw.urlencode(url))
for url in BAD_URLS:
with pytest.raises(ValueError):
metsrw.urlencode(url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are useful tests to have. Thanks for adding them Joel.

Copy link
Contributor

@ross-spencer ross-spencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 (tests look good, and pass locally)

A previous commit introduced URL-encoding of xlink:href values when
writing METS. This commit introduces the corresponding URL-decoding when
reading METS. Also adds some simple tests to display and confirm
expected behaviour.
@jrwdunham jrwdunham force-pushed the dev/issue-206-url-decode-xlink-href branch from d4280dc to 35c2579 Compare September 25, 2018 23:05
@jrwdunham jrwdunham merged commit 35c2579 into master Sep 25, 2018
@jrwdunham jrwdunham deleted the dev/issue-206-url-decode-xlink-href branch September 25, 2018 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants