from_n3 fails on numeric shortcuts #1769

Closed
jpmccu opened this issue Mar 22, 2022 · 6 comments · Fixed by #1771


@jpmccu
Contributor

jpmccu commented Mar 22, 2022

From the turtle spec [1]:

| Data Type | Abbreviated | Lexical | Description |
|---|---|---|---|
| xsd:integer | -5 | "-5"^^xsd:integer | Integer values may be written as an optional sign and a series of digits. Integers match the regular expression `[+-]?[0-9]+`. |
| xsd:decimal | -5.0 | "-5.0"^^xsd:decimal | Arbitrary-precision decimals may be written as an optional sign, zero or more digits, a decimal point and one or more digits. Decimals match the regular expression `[+-]?[0-9]*\.[0-9]+`. |
| xsd:double | 4.2E9 | "4.2E9"^^xsd:double | Double-precision floating point values may be written as an optionally signed mantissa with an optional decimal point, the letter "e" or "E", and an optionally signed integer exponent. The exponent matches the regular expression `[+-]?[0-9]+` and the mantissa one of these regular expressions: `[+-]?[0-9]+\.[0-9]+`, `[+-]?\.[0-9]+` or `[+-]?[0-9]`. |

Testing this, I get:

>>> import rdflib.util
>>> rdflib.util.from_n3("-5")
rdflib.term.BNode('-5')
>>> rdflib.util.from_n3("-5.0")
rdflib.term.BNode('-5.0')
>>> rdflib.util.from_n3("4.2E9")
rdflib.term.BNode('4.2E9')

It does seem to work on positive integers:

>>> rdflib.util.from_n3("5")
rdflib.term.Literal('5', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer'))

[1] https://www.w3.org/TR/turtle/#literals
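
For reference, the three shortcut forms in the table can be told apart with a few regular expressions. The following is a minimal, standard-library-only sketch, not rdflib's implementation. The function name `numeric_shortcut_datatype` is hypothetical, and the patterns are assumed to mirror the INTEGER, DECIMAL and DOUBLE productions of the Turtle grammar.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Patterns assumed to follow the INTEGER, DECIMAL and DOUBLE productions
# of the Turtle grammar; each must match the whole input string.
_INTEGER = re.compile(r"[+-]?[0-9]+")
_DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
_DOUBLE = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+")


def numeric_shortcut_datatype(text):
    """Return the XSD datatype IRI for a Turtle numeric shortcut, or None."""
    if _INTEGER.fullmatch(text):
        return XSD + "integer"
    if _DECIMAL.fullmatch(text):
        return XSD + "decimal"
    if _DOUBLE.fullmatch(text):
        return XSD + "double"
    return None


for sample in ("-5", "-5.0", "4.2E9", "5"):
    print(sample, numeric_shortcut_datatype(sample))
```

Under these patterns, all four inputs from the report are recognised as numeric literals, including the signed and exponent forms, so none of them would fall through to a blank-node interpretation.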

@ghost ghost mentioned this issue Mar 23, 2022
@aucampia
Member

aucampia commented Mar 27, 2022

I would advise avoiding from_n3 for the following reasons:

  1. It is not used by the actual n3 parser or any other part of rdflib as far as I can tell.
  2. n3 is not an accepted standard: https://www.w3.org/TeamSubmission/n3/
  3. the grammar by which from_n3 operates is not well defined.

These may be fixable problems, but ultimately the best way to parse n3 is to use the n3 parser. If this problem is present there (I'm adding tests for it now), then we should fix it. Essentially, though, we are maintaining two n3 parsers at the moment, and it may be better to maintain just one.

I would say a much better option is to mark this function as deprecated and remove it in the next major version of RDFLib, unless we can unify the parsing done by the n3 parser and from_n3.

CC: @gjhiggins

aucampia added a commit to aucampia/rdflib that referenced this issue Mar 27, 2022
This changeset only adds tests for existing functionality.
The tests verify that the problem reported in RDFLib#1769
does not exist in actual parsers.
@aucampia
Member

I have added tests to confirm that the cases in question work correctly with the actual parsers:

All of the added tests pass without issue, so I would recommend using the actual parser for your parsing needs instead.

@jpmccu
Contributor Author

jpmccu commented Mar 27, 2022 via email

@aucampia
Member

aucampia commented Mar 27, 2022

> Is there an example of how I can use the parser to parse individual terms?

I would advise trying something like this:

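from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

# Illustrative values, assumed here so the snippet runs on its own.
EGNS = Namespace("http://example.com/")
literal_string, format = "-5", "turtle"
expected_literal = Literal("-5", datatype=XSD.integer)

g = Graph()
g.parse(
    data=f"""<{EGNS.subject}> <{EGNS.predicate}> {literal_string} .""",
    format=format,
)
triples = list(g.triples((None, None, None)))
assert len(triples) == 1
assert expected_literal == triples[0][2]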

So essentially, create a graph template into which you can plug your term as an object, then parse it and take the object from the resulting triple.

EDIT: just be wary of potential injection cases, as maliciously crafted strings could result in unexpected behaviour, for example creating multiple triples. I'm not sure it is that relevant here, but it is better to raise the possibility just in case.
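
The template approach and the injection caveat can be combined into one small helper. This is only a sketch under stated assumptions: `single_triple_document`, `parse_term` and the two example IRIs are hypothetical names chosen for illustration, and the guard simply rejects any input that yields anything other than exactly one triple.

```python
EXAMPLE_SUBJECT = "http://example.com/subject"      # placeholder IRI (assumption)
EXAMPLE_PREDICATE = "http://example.com/predicate"  # placeholder IRI (assumption)


def single_triple_document(term_text):
    """Embed one term as the object of a throwaway triple."""
    return f"<{EXAMPLE_SUBJECT}> <{EXAMPLE_PREDICATE}> {term_text} ."


def parse_term(term_text, fmt="turtle"):
    """Parse a single term with rdflib's parser and return the resulting node."""
    # Imported here so that single_triple_document works without rdflib.
    from rdflib import Graph

    graph = Graph()
    graph.parse(data=single_triple_document(term_text), format=fmt)
    triples = list(graph)
    # Anything other than exactly one triple means the input was not a
    # single term (for example, an injected "." followed by more triples).
    if len(triples) != 1:
        raise ValueError(f"expected exactly one triple, got {len(triples)}")
    return triples[0][2]
```

Calling `parse_term("-5")` should then go through the same Turtle parser that the added tests exercise. Note that the one-triple check also rejects legitimate terms that expand to several triples, such as collections, which seems an acceptable trade-off for parsing individual literals.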

@ghost

ghost commented Mar 27, 2022

> I would advise avoiding from_n3 for the following reasons:
>
> 1. It is not used by the actual n3 parser or any other part of rdflib as far as I can tell.
> 2. n3 is not an accepted standard: https://www.w3.org/TeamSubmission/n3/
> 3. the grammar by which `from_n3` operates is not well defined.

All good reasons for general avoidance. However, deprecating it might be premature: from_n3 does seem to have one use that currently cannot be replicated elsewhere, that of expanding CURIEs, as discussed in #626.
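
To make "expanding CURIEs" concrete: a CURIE such as `rdfs:label` is a prefix and a local name, and expansion replaces the prefix with its namespace IRI. The sketch below is plain Python, not rdflib code. `expand_curie` and `PREFIXES` are hypothetical names, and the dictionary stands in for whatever prefix bindings a namespace manager would hold.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:local' to a full IRI using a prefix-to-namespace mapping."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    try:
        namespace = prefixes[prefix]
    except KeyError:
        raise ValueError(f"unknown prefix: {prefix!r}") from None
    return namespace + local


# Illustrative binding only; a real application would take these from its graph.
PREFIXES = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}
label_iri = expand_curie("rdfs:label", PREFIXES)
```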

@aucampia aucampia linked a pull request Apr 9, 2022 that will close this issue
@aucampia
Member

aucampia commented Apr 9, 2022

This was fixed by @gjhiggins in #1771

@aucampia aucampia closed this as completed Apr 9, 2022