from_n3 erroneously unescapes \xhh #549

Closed
joernhees opened this issue Nov 22, 2015 · 1 comment · Fixed by #1343
Labels
bug Something isn't working discussion enhancement New feature or request parsing Related to a parsing. testing

@joernhees
Member

While exploring #546 I realized that from_n3 is more liberal than the following helper, which is based on the full n3 parser. The helper should only be used for testing: it is obviously much slower (it builds one full graph, store included, per parse), and it does not cover cases such as quoted graphs or lists, which from_n3 can actually parse:

In [1]: import rdflib
INFO:rdflib:RDFLib Version: 4.2.2-dev

In [2]: def own_from_n3(s):
   ...:     n3_s = '<foo:s> <foo:p> ' + s.encode('utf-8') + '.\n'
   ...:     g = rdflib.Graph()
   ...:     g.parse(data=n3_s, format='n3')
   ...:     return list(g)[0][2]
   ...:

This already builds on #548, so trailing backslashes work in both versions:

In [3]: s = '"trailing slash\\\\"'

In [4]: rdflib.util.from_n3(s)
Out[4]: rdflib.term.Literal(u'trailing slash\\')

In [5]: own_from_n3(s)
Out[5]: rdflib.term.Literal(u'trailing slash\\')

At the very least, from_n3 incorrectly unescapes \\xf6 to u'\xf6', i.e. u'ö'. The n3 spec explicitly mentions that \xhh escapes are unused (http://www.w3.org/TeamSubmission/n3/#escaping):

In [6]: s = u'"jörn"'

In [7]: rdflib.util.from_n3(s)
Out[7]: rdflib.term.Literal(u'j\xf6rn')

In [8]: own_from_n3(s)
Out[8]: rdflib.term.Literal(u'j\xf6rn')

In [9]: s = '"j\\xf6rn"'

In [10]: rdflib.util.from_n3(s)
Out[10]: rdflib.term.Literal(u'j\xf6rn')

In [11]: own_from_n3(s)
  File "<string>", line unknown
BadSyntax


In [12]: s = '"j\\u00f6rn"'

In [13]: rdflib.util.from_n3(s)
Out[13]: rdflib.term.Literal(u'j\xf6rn')

In [14]: own_from_n3(s)
Out[14]: rdflib.term.Literal(u'j\xf6rn')

In [15]: s = '"j\\U000000f6rn"'

In [16]: rdflib.util.from_n3(s)
Out[16]: rdflib.term.Literal(u'j\xf6rn')

In [17]: own_from_n3(s)
Out[17]: rdflib.term.Literal(u'j\xf6rn')

In [18]: s = u'"j\\u00f6öörn"'

In [19]: rdflib.util.from_n3(s)
Out[19]: rdflib.term.Literal(u'j\xf6\xf6\xf6rn')

In [20]: own_from_n3(s)
Out[20]: rdflib.term.Literal(u'j\xf6\xf6\xf6rn')
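
To make the expected behaviour concrete, here is a minimal sketch of a strict unescaper for the body of an n3 string literal (the part between the quotes). It is not rdflib code: the name `strict_n3_unescape` and the table of single-character escapes are assumptions made for illustration. It decodes \uXXXX and \UXXXXXXXX, and raises on any escape it does not know, including \xhh:

```python
import re

# Single-character escapes. This table is an assumption modelled on the
# usual n3/Turtle string escapes; it is not copied from rdflib.
_SIMPLE = {
    "t": "\t", "n": "\n", "r": "\r", "b": "\b", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}

# One escape sequence: \uXXXX, \UXXXXXXXX, or a backslash followed by at
# most one arbitrary character (zero characters means a dangling backslash).
_ESCAPE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.?))", re.DOTALL
)


def strict_n3_unescape(body):
    """Unescape the body of an n3 string literal (hypothetical helper).

    Raises ValueError for escapes outside the table above, which
    includes the two-hex-digit form written as backslash-x-h-h.
    """
    def replace(match):
        hex4, hex8, char = match.groups()
        if hex4 is not None:
            return chr(int(hex4, 16))
        if hex8 is not None:
            return chr(int(hex8, 16))
        if char in _SIMPLE:
            return _SIMPLE[char]
        raise ValueError("unsupported escape: \\" + char)

    return _ESCAPE.sub(replace, body)
```

With this sketch, `strict_n3_unescape('j\\u00f6rn')` returns `'jörn'`, while `strict_n3_unescape('j\\xf6rn')` raises `ValueError` instead of silently unescaping, which matches the BadSyntax result of the full parser above.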

Further inconsistencies between from_n3 and the full n3 parser should be investigated. I'd also like to introduce some tests for this.
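
One possible shape for such tests is a table-driven comparison that runs the same inputs through two parsers and reports where they disagree. The sketch below is only an illustration under stated assumptions: `find_disagreements`, `outcome`, `lenient_unescape` and `strict_unescape` are hypothetical names, and the two decoders are stdlib stand-ins for from_n3 and the full-parser helper, not rdflib functions. The cases are the string bodies from the session above:

```python
import re


def lenient_unescape(body):
    # Stand-in for a liberal decoder: Python's own escape rules, which
    # also accept the two-hex-digit \xhh form.
    return body.encode("raw_unicode_escape").decode("unicode_escape")


# A backslash-x escape: an odd number of backslashes followed by "x".
_XHH = re.compile(r"(?<!\\)(?:\\\\)*\\x")


def strict_unescape(body):
    # Stand-in for a strict parser: reject \xhh, otherwise decode as above.
    if _XHH.search(body):
        raise ValueError("the xhh escape is not allowed")
    return lenient_unescape(body)


def outcome(parse, text):
    # Normalise a parser call to ("ok", value) or ("error", exception name).
    try:
        return ("ok", parse(text))
    except Exception as exc:
        return ("error", type(exc).__name__)


def find_disagreements(cases, first, second):
    # Return the inputs on which the two parsers do not behave identically.
    return [c for c in cases if outcome(first, c) != outcome(second, c)]


CASES = [
    "trailing slash\\\\",
    "jörn",
    "j\\xf6rn",
    "j\\u00f6rn",
    "j\\U000000f6rn",
    "j\\u00f6öörn",
]

if __name__ == "__main__":
    for case in find_disagreements(CASES, lenient_unescape, strict_unescape):
        print("parsers disagree on:", repr(case))
```

In a real test the two stand-ins would be replaced by rdflib.util.from_n3 and a wrapper around the full parser such as own_from_n3, and the expectation would be an empty list of disagreements for the inputs both are meant to support.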

@joernhees joernhees added bug Something isn't working enhancement New feature or request parsing Related to a parsing. testing discussion labels Nov 22, 2015
@joernhees joernhees added this to the rdflib 4.2.2 milestone Nov 22, 2015
@joernhees
Member Author

I suspect this also goes all the way back to the things mentioned in #192.

@joernhees joernhees modified the milestones: rdflib 5.0.1, rdflib 4.2.2 Jan 30, 2016
@white-gecko white-gecko modified the milestones: rdflib 5.0.1, rdflib 5.1.0 Mar 16, 2020
@white-gecko white-gecko modified the milestones: rdflib 5.1.0, rdflib 6.0.0 May 1, 2020