Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xsd:hexBinary added to XSDToPython #386

Closed
wants to merge 1 commit into from
Closed

xsd:hexBinary added to XSDToPython #386

wants to merge 1 commit into from

Conversation

bcogrel
Copy link
Contributor

@bcogrel bcogrel commented Apr 27, 2014

Literal("b234", datatype=(XSD + "hexBinary")).toPython() now returns a string and not a literal.

@gromgull
Copy link
Member

The test-failure is probably unrelated, but what is the use of this? if we should support hexBinary, should not have some way of "serializing/unserializing" hex-strings to bytes/str objects?

@bcogrel
Copy link
Contributor Author

bcogrel commented Apr 27, 2014

Good question.

My use case is assigning a RSA modulus (like this one http://www.w3.org/wiki/WebID#Adding_a_certificate).

The status quo of using the same string in Python and in RDF is good for this use case.
To me, it seems more natural to manipulate this string value

modulus = 'b0dfc17e0969569a813d97b3bc42b434b35ecdf45e786d6db1462604f2f8d800b4d61b56c494cf02667dccdb2a227f259b3b5d99b50553d50a13328b7a34a46f0d1d68713ddfef6ee02da87db6ee794781cddcb61946943359b23c1ce15a7eb10f4dedf1e00ed2ee62986c42bd1ba6bc9174d2869ac78801b7c2b1dc8a1f73ac99f94aa0ae5bbcb7494dc3ff15af01fdc92b98574854518bbc58601c56fc62771f843acdd495e602875729ad0813485620ede2d83050e3607d9a92b47c67c8adf341b5ab0b9926388c33dd3c60117b8aebea46f655f049e871acfbac57aec61f6a66b250ae8e9e05aba2b3a8379012122a5b6e5310d26ce1bc3e2a4c80168aa5'

assert(modulus == format(int(modulus, 16), "x")
>>> True

than the following representation which starts with '0x':

print hex(int(modulus, 16))
>>> '0xb0dfc17e0969569a813d97b3bc42b434b35ecdf45e786d6db1462604f2f8d800b4d61b56c494cf02667dccdb2a227f259b3b5d99b50553d50a13328b7a34a46f0d1d68713ddfef6ee02da87db6ee794781cddcb61946943359b23c1ce15a7eb10f4dedf1e00ed2ee62986c42bd1ba6bc9174d2869ac78801b7c2b1dc8a1f73ac99f94aa0ae5bbcb7494dc3ff15af01fdc92b98574854518bbc58601c56fc62771f843acdd495e602875729ad0813485620ede2d83050e3607d9a92b47c67c8adf341b5ab0b9926388c33dd3c60117b8aebea46f655f049e871acfbac57aec61f6a66b250ae8e9e05aba2b3a8379012122a5b6e5310d26ce1bc3e2a4c80168aa5L'

NB: In Python 2.7.6 there is a final L character that makes this string be an invalid hexadecimal value.

modulus2 = hex(int(modulus, 16))
int(modulus2, 16)
>>>ValueError: invalid literal for int() with base 16:

In this use case, I have no interest in using the decimal value.

In my patch, it returns a unicode, not a str like hex() and format(). Do you think this is a problem?

I am not used to use bytes(). What would be its interest?

This patch only concerns the translation XSD -> Python. In the other direction, it could be interesting to prevent users from serializing xsd:hexBinary values starting with "0x", which is not valid.

@joernhees
Copy link
Member

nb: restarted the travis job, passed

in my opinion there are two ways to treat this: hexbinary means that it's actually a hex representation of some binary, so we could either return those bytes (after a call to binascii.unhexlify ) or we could just return a hexstring of those bytes.
Examples:

l = Literal('6ac3b6726e', datatype=XSD.hexBinary)
l.toPython() == 'j\xc3\xb6rn' == u'jörn'.encode('utf-8')  # this would be bytes
l.toPython() == '6ac3b6726e'  # this would be hexstring

I'd favor of bytes option as otherwise one could just call str(l) or unicode(l):

In [3]: str(l)
Out[3]: '6ac3b6726e'

In [4]: unicode(l)
Out[4]: u'6ac3b6726e'

In [5]: binascii.unhexlify(str(l))
Out[5]: 'j\xc3\xb6rn'

we should have a testcase for this as well...

@bcogrel
Copy link
Contributor Author

bcogrel commented Apr 28, 2014

Which version of Python did you used? I cannot reproduce your results for str(l) and unicode(l) in Python 2.7.6 and Python 3.4.0.

In 2.7.6:

l = u'jörn'.encode('utf-8')
str(l)
>>> 'j\xc3\xb6rn'
unicode(l)
>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

In 3.4.0:

l = u'jörn'.encode('utf-8')
l
>>> b'j\xc3\xb6rn'
str(l)
>>> "b'j\\xc3\\xb6rn'"

I would argue in favor of returning a hexstring of those bytes.

  1. This hexstring can represent any kind of data (eg. a long in my use case or a string in the previous comment) so it is better to stay away from any encoding assumption.
  2. The returned value is the same than the initial one.
value = '6ac3b6726e'
l = Literal(value, datatype=XSD.hexBinary)
l.toPython() == value
>>> True

@joernhees
Copy link
Member

i used python 2.7.6 but your problem is that you defined l as something completely different, in my case it was l = Literal('6ac3b6726e', datatype=XSD.hexBinary), that's why you got errors...

my point is: with your code, why not just use str()?

value = '6ac3b6726e'
l = Literal(value, datatype=XSD.hexBinary)
str(l) == value
>>> True

@bcogrel
Copy link
Contributor Author

bcogrel commented Apr 28, 2014

Oh, sorry for that.

My preference in using toPython() instead of str() is because toPython() is generic and works as I expected for all other literal types I tried (date, integer, string, etc.).

I started to prototype an ORM (Object RDF Mapper) that use this generic method. See for instance:
https://gitlab.bcgl.fr/benjamin/ld-orm/blob/master/ld_orm/parsing/value.py#L69

Am I using toPython() correctly or am I expecting something unintended?

Here my test example using a hexstring value:
https://gitlab.bcgl.fr/benjamin/ld-orm/blob/master/tests/model_test.py#L207 ,
https://gitlab.bcgl.fr/benjamin/ld-orm/blob/master/tests/model_test.py#L231 and
https://gitlab.bcgl.fr/benjamin/ld-orm/blob/master/tests/model_test.py#L389

@bcogrel
Copy link
Contributor Author

bcogrel commented Apr 30, 2014

Thank you for your comments, you convince me. I have opened a new pull request (#388) where I propose an implementation for the bytes option. I will close this pull request.

@bcogrel bcogrel closed this Apr 30, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants