New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decode xlink:href attributes when reading METS #51
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No significant concerns here Joel. If anything triggers in my comments, feel free to address them, but otherwise, LGTM.
'validate', | ||
'xsd_error_log_string', | ||
'xsd_validate', | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Learning a bit about this recently, this seems like good practice.
except ValueError: | ||
raise exceptions.MetsError( | ||
'Value "{}" (for attribute xlink:href) is not a valid' | ||
' URL.'.format(self.path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems sensible too.
except ValueError: | ||
raise exceptions.MetsError( | ||
'Value "{}" (for attribute xlink:href) is not a valid' | ||
' URL.'.format(self.target)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read this correctly, MetsError
is the base exception for the module. ParseError
is the exception that might normally be raised when reading a METS file. I think this is fine, but something like MetsEncodeError
might read clearer for exceptions where we're encoding data. I think I prefer the verb-y-ness so I think it's just a preference thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
metsrw/utils.py
Outdated
@@ -1,5 +1,5 @@ | |||
# -*- coding: utf-8 -*- | |||
from six.moves.urllib.parse import quote_plus, urlparse, urlunparse | |||
from six.moves.urllib.parse import quote_plus, unquote_plus, urlparse, urlunparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this just creeps over the line-length limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
"""Decode percent encoding introduced per RFC3986 | ||
https://tools.ietf.org/html/rfc3986#section-2.1. | ||
""" | ||
return _urlendecode(url, unquote_plus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever!
assert url == metsrw.urldecode(metsrw.urlencode(url)) | ||
for url in BAD_URLS: | ||
with pytest.raises(ValueError): | ||
metsrw.urlencode(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are useful tests to have. Thanks for adding them Joel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 (tests look good, and pass locally)
A previous commit introduced URL-encoding of xlink:href values when writing METS. This commit introduces the corresponding URL-decoding when reading METS. Also adds some simple tests to display and confirm expected behaviour.
d4280dc
to
35c2579
Compare
A previous commit introduced URL-encoding of xlink:href values when writing METS. This commit introduces the corresponding URL-decoding when reading METS. Also adds some simple tests to display and confirm expected behaviour.
Connected to archivematica/Issues#206