Trailing backslash in literal causes from_n3 to throw exception #546
this is a tricky one as it keeps confusing my brain with several encoding/escaping layers from python and n3 itself... py2.7 vs. 3.4 doesn't seem to be part of the problem here, which is why i'll just use py2.7. To start off let me just put what i think you want in a
As you can see the n3 representation already needs to escape the backslash. With that file we can do this:
In [1]: import rdflib
INFO:rdflib:RDFLib Version: 4.2.1
In [2]: g = rdflib.Graph()
In [3]: g.parse('foo.n3', format='n3')
Out[3]: <Graph identifier=Nb7a7399152c14612a6443bdb3c96453d (<class 'rdflib.graph.Graph'>)>
In [4]: list(g)
Out[4]:
[(rdflib.term.URIRef(u'foo:s'),
rdflib.term.URIRef(u'foo:p'),
rdflib.term.Literal(u'Sample string with trailing backslash\\', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#string')))]
In [5]: lit = list(g)[0][2]
In [6]: lit
Out[6]: rdflib.term.Literal(u'Sample string with trailing backslash\\', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#string'))
In [7]: lit.n3()
Out[7]: u'"Sample string with trailing backslash\\\\"^^<http://www.w3.org/2001/XMLSchema#string>'
In [8]: print lit
Sample string with trailing backslash\
In [9]: print lit.n3()
"Sample string with trailing backslash\\"^^<http://www.w3.org/2001/XMLSchema#string>
Actually line 7 is the most interesting one, as it shows that to represent the n3 string in python we need to double escape it. I kind of abuse Now your version used
In [1]: sample = "\"Sample string with trailing backslash\\\"^^xsd:string"
In [2]: sample
Out[2]: '"Sample string with trailing backslash\\"^^xsd:string'
In [3]: print sample
"Sample string with trailing backslash\"^^xsd:string
The last one actually shows one of the problems: your version lacks another
In [4]: sample = "\"Sample string with trailing backslash\\\\\"^^xsd:string"
In [5]: sample
Out[5]: '"Sample string with trailing backslash\\\\"^^xsd:string'
Why "one of the problems"? Well, because it still doesn't work 💃 👊 :
In [6]: from rdflib.util import from_n3
INFO:rdflib:RDFLib Version: 4.2.1
In [7]: from_n3(sample)
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-7-2ac3b89358bc> in <module>()
----> 1 from_n3(sample)
/usr/local/lib/python2.7/site-packages/rdflib/util.pyc in from_n3(s, default, backend, nsm)
181 # Hack: this should correctly handle strings with either native unicode
182 # characters, or \u1234 unicode escapes.
--> 183 value = value.encode("raw-unicode-escape").decode("unicode-escape")
184 return Literal(value, language, datatype)
185 elif s == 'true' or s == 'false':
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 37: \ at end of string
Great, and that's where it now becomes fishy...
In [1]: 'foo\\\\bar'.encode('raw-unicode-escape')
Out[1]: 'foo\\\\bar'
In [2]: 'foo\\\\bar'.encode('raw-unicode-escape').decode('unicode-escape')
Out[2]: u'foo\\bar'
In [3]: 'foo\\\\bar'.decode('unicode-escape')
Out[3]: u'foo\\bar'
In [4]: 'foo\\bar'.decode('unicode-escape')
Out[4]: u'foo\x08ar'
(The last one is obviously wrong.) I'll make a pull request (up for discussion) to fix that, see #548.
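The decoding behaviour above can be reproduced without rdflib. The following is a minimal Python 3 sketch using only the standard library; `reduce_escapes` is a hypothetical name for illustration, and it only mimics the single line shown in the traceback, not the rest of `from_n3`.

```python
# Minimal sketch (Python 3, standard library only).
# `reduce_escapes` is a hypothetical helper that mimics the round trip
# from the traceback: encode("raw-unicode-escape").decode("unicode-escape").

def reduce_escapes(value):
    """Apply the same encode/decode round trip as the line in the traceback."""
    return value.encode("raw-unicode-escape").decode("unicode-escape")


# Two backslash characters collapse into one.
print(repr(reduce_escapes("foo\\\\bar")))

# One backslash followed by "b" is interpreted as the \b (backspace) escape.
print(repr(reduce_escapes("foo\\bar")))

# A single trailing backslash is an incomplete escape sequence, so decoding fails.
try:
    reduce_escapes("trailing backslash\\")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```

If the text handed to this step has already had its N3 escapes reduced once, a literal whose value ends in a backslash arrives with exactly one trailing backslash, which appears to be the situation in the traceback.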
for more inconsistencies of |
fix double reduction of \ escapes in from_n3, fixes #546
Steps to reproduce (Python 3.4):
I am aware of the trailing backslash problem with raw strings: https://docs.python.org/2/faq/design.html#why-can-t-raw-strings-r-strings-end-with-a-backslash
I would like to know whether this is a rdflib bug and, if not, how I should handle my string before passing it to from_n3.
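To keep the escaping layers apart, it may help to build the N3 text from the plain value programmatically and then count the backslashes at each layer. The sketch below assumes Python 3 and uses only the standard library; `escape_n3_string` is a hypothetical helper that handles only backslashes and double quotes (not newlines or other escapes). It illustrates the layers and is not a workaround for the exception on rdflib versions that still reduce escapes twice.

```python
# Minimal sketch (Python 3, standard library only).
# `escape_n3_string` is a hypothetical helper, not part of rdflib.

def escape_n3_string(value):
    """Escape backslashes first, then double quotes, for a quoted N3 string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


# The plain value ends in ONE backslash character
# (written as two in Python source because this is not a raw string).
value = "Sample string with trailing backslash\\"

# The N3 text needs TWO backslash characters before the closing quote.
n3_text = '"' + escape_n3_string(value) + '"^^xsd:string'

print(value)          # the value itself: one trailing backslash
print(n3_text)        # the N3 text: two backslashes
print(repr(n3_text))  # the Python source form of the N3 text: four backslashes
```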