from_n3 fails on numeric shortcuts #1769

jpmccu · 2022-03-22T18:40:43Z

From the turtle spec [1]:

| Data Type | Abbreviated | Lexical | Description |
| --- | --- | --- | --- |
| xsd:integer | -5 | "-5"^^xsd:integer | Integer values may be written as an optional sign and a series of digits. Integers match the regular expression "[+-]?[0-9]+". |
| xsd:decimal | -5.0 | "-5.0"^^xsd:decimal | Arbitrary-precision decimals may be written as an optional sign, zero or more digits, a decimal point and one or more digits. Decimals match the regular expression "[+-]?[0-9]*.[0-9]+". |
| xsd:double | 4.2E9 | "4.2E9"^^xsd:double | Double-precision floating point values may be written as an optionally signed mantissa with an optional decimal point, the letter "e" or "E", and an optionally signed integer exponent. The exponent matches the regular expression "[+-]?[0-9]+" and the mantissa one of these regular expressions: "[+-]?[0-9]+.[0-9]+", "[+-]?.[0-9]+" or "[+-]?[0-9]". |
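
For reference, the three shorthand forms in the table above can be told apart with plain regular expressions. The following is only a rough sketch using the standard library, not how rdflib does it: the function name `classify_numeric_shorthand` is hypothetical, the decimal point is escaped so it matches a literal ".", and the last mantissa alternative is read as "one or more digits", which is an assumption about the intent of the quoted text.

```python
import re
from typing import Optional

# Patterns adapted from the quoted table; the "." is escaped so that it
# matches a literal decimal point rather than any character.
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
# Assumption: the bare-integer mantissa is "[0-9]+" (one or more digits).
DOUBLE = re.compile(r"[+-]?(?:[0-9]+\.[0-9]+|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+")


def classify_numeric_shorthand(token: str) -> Optional[str]:
    """Return the XSD datatype name implied by a numeric shorthand, or None."""
    if DOUBLE.fullmatch(token):
        return "xsd:double"
    if DECIMAL.fullmatch(token):
        return "xsd:decimal"
    if INTEGER.fullmatch(token):
        return "xsd:integer"
    return None


for token in ["-5", "-5.0", "4.2E9", "5", "abc"]:
    print(token, "->", classify_numeric_shorthand(token))
```

Because `fullmatch` is used, the order of the checks does not affect the result; a signed token such as "-5" is treated the same way as "5".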

Testing this, I get:

>>> rdflib.util.from_n3("-5")
rdflib.term.BNode('-5')
>>> rdflib.util.from_n3("-5.0")
rdflib.term.BNode('-5.0')
>>> rdflib.util.from_n3("4.2E9")
rdflib.term.BNode('4.2E9')

It does seem to work on positive integers:

>>> rdflib.util.from_n3("5")
rdflib.term.Literal('5', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer'))

[1] https://www.w3.org/TR/turtle/#literals


aucampia · 2022-03-27T10:37:29Z

I would advise avoiding from_n3 for the following reasons:

1. It is not used by the actual n3 parser or any other part of rdflib as far as I can tell.
2. n3 is not an accepted standard: https://www.w3.org/TeamSubmission/n3/
3. The grammar by which from_n3 operates is not well defined.

These may be fixable problems, but ultimately the best way to parse n3 is to use the n3 parser. If this problem is present there (I'm adding tests for it now), then we should fix it. Essentially, though, we are maintaining two n3 parsers at the moment, and it may be better to maintain just one.

I would say a much better option is to mark this function as deprecated and remove it in the next major version of RDFLib, unless we can unify the parsing done by the n3 parser and from_n3.

CC: @gjhiggins

This changeset only adds tests for existing functionality. The tests verify that the problem reported in RDFLib#1769 does not exist in the actual parsers.

aucampia · 2022-03-27T12:17:50Z

I have added tests to confirm that the cases in question work correctly with the actual parsers:

Add tests for the parsing of literals for the turtle family of formats #1778

All of the added tests pass without issue, so I would recommend using the actual parser for your parsing needs instead.

jpmccu · 2022-03-27T12:28:51Z

Is there an example of how I can use the parser to parse individual terms?

-- Jamie McCusker (she/they), Director, Data Operations, Tetherless World Constellation, Rensselaer Polytechnic Institute, http://tw.rpi.edu

aucampia · 2022-03-27T12:33:50Z

> Is there an example of how I can use the parser to parse individual terms?

I would advise trying something like this:

rdflib/test/test_parsers/test_parser_turtlelike.py

Lines 92 to 99 in e06db40

    
    g = Graph()
    g.parse(
        data=f"""<{EGNS.subject}> <{EGNS.predicate}> {literal_string} .""",
        format=format,
    )
    triples = list(g.triples((None, None, None)))
    assert len(triples) == 1
    assert expected_literal == triples[0][2]

So essentially, create a graph template into which you can plug your term as an object, then parse it and take the object from the resulting triple.

EDIT: just be wary of potential injection cases, as maliciously crafted strings could result in unexpected behaviour, for example creating multiple triples. I'm not sure it is that relevant here, but it is better to raise the possibility just in case.

ghost · 2022-03-27T14:19:21Z

> I would advise avoiding from_n3 for the following reasons:
>
> 1. It is not used by the actual n3 parser or any other part of rdflib as far as I can tell.
> 2. n3 is not an accepted standard: https://www.w3.org/TeamSubmission/n3/
> 3. the grammar by which `from_n3` operates is not well defined.

All good reasons for general avoidance. However, deprecating it might be premature: from_n3 does seem to have one use that currently cannot be replicated elsewhere, namely expanding CURIEs, as discussed in #626.

aucampia · 2022-04-09T14:22:13Z

This was fixed by @gjhiggins in #1771

ghost mentioned this issue Mar 23, 2022

Fix for issue1769 #1771

Merged

8 tasks

aucampia mentioned this issue Mar 27, 2022

Add tests for the parsing of literals for the turtle family of formats #1778

Merged

4 tasks

aucampia linked a pull request Apr 9, 2022 that will close this issue

Fix for issue1769 #1771

Merged

8 tasks

aucampia closed this as completed Apr 9, 2022
