Why does a query that works in Jena 3.16 throw an error in Jena 4.10? #2102

704998200 · 2023-11-23T03:28:16Z

Version

4.10

Question

This query works in Jena 3.16

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX brick:   <https://brickschema.org/schema/Brick#>  
SELECT ?room  
WHERE {       
?room brick:isPartOf <https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port[zn]>.  
}

But it throws an error in Jena 4.10:
[line: 5, col: 31] Bad IRI: 'https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port[zn]': <https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port[zn]> Code: 0/ILLEGAL_CHARACTER in FRAGMENT: The character violates the grammar rules for URIs/IRIs.

I know the problem is the '[zn]'.

I tried to escape '[zn]', but that also throws an error:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX brick: <https://brickschema.org/schema/Brick#>  
SELECT ?room  
WHERE {     
   ?room brick:isPartOf <https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port\[zn\]>.
}

Here is a fragment of the TTL data:

<https://brickschema.org/schema/1.0.2/building_example#building:gtc/rooms/1.H.1/room>
        rdf:type        brick:Room;
        rdfs:label      "1.H.1";
        brick:isPartOf  <https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/3/port[zn]> .

How can I fix it?

rvesse · 2023-11-23T10:07:57Z

Jena 4.x enforces certain URI validity tests more strictly, because allowing bad URIs into the system always leads to problems down the road.

Since this is a URI it should be URL encoded appropriately (i.e. Java/SPARQL backslash escapes are not suitable here), and that needs to happen in both your data and your queries.

In general the way your URIs are structured looks strange. You seem to be trying to put a lot of "structure" into the URI fragment (the portion after the #) when you should really be using / based URL construction to achieve this, and your URIs seem to conflate schema concepts with instances of those concepts.

Your data should probably have URIs more like https://yourdata.com/building/gtc/rooms/1.H.1/room, with appropriate rdf:type declarations as in your example. Here yourdata.com stands in for some appropriate URI for the instances of the data, rather than the schema concept URI you use currently.

neumarcx · 2023-11-23T10:09:14Z

Indeed, the IRI validation for SPARQL queries has changed since 3.16. You have to get your data into the correct format. If this is not possible, you have to URL encode the path component.

afs · 2023-11-23T12:39:40Z

It's a bug IMO. The Turtle parser accepts the data with a warning. There ought to be consistency between the SPARQL parser and the Turtle parser.

That said - the data is not legal. [ and ] are not legal in a URI except for IPv6 host addresses.

No amount of escaping will change that.
URIs do not have a \ escape.
Using \u005B doesn't work.

%-encoding, as @rvesse mentions, replaces the character [ with %5B. It really does put the three characters %, 5, B into the URI, which then must match the data.

Not using [, ], or using %-encoding consistently is the best solution.
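As a minimal sketch of what consistent %-encoding means here (Python is used only for illustration; `percent_encode_brackets` is a hypothetical helper, not part of Jena), the same replacement has to be applied to the IRIs in the data and to the IRIs in the query, because the encoded and unencoded strings are different IRIs:

```python
def percent_encode_brackets(iri: str) -> str:
    """Replace literal '[' and ']' with their %-encoded forms (%5B, %5D)."""
    return iri.replace("[", "%5B").replace("]", "%5D")


raw = "https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/3/port[zn]"
encoded = percent_encode_brackets(raw)

print(encoded)         # ends with port%5Bzn%5D
print(encoded == raw)  # False: the encoded form is a different string
```

A query using the encoded form only matches data that was also stored in the encoded form.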

Even if Jena allowed it, the current data is likely to cause problems elsewhere.

The SPARQL query can work with the string form of the URI in the data, in a potentially inefficient manner:

      .... brick:isPartOf  ?X .
      FILTER ( str(?X) = "https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/3/port[zn]")
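For illustration, the fragment above can be expanded into a full query. The sketch below assumes the prefix and SELECT clause from the original question; `str_filter_query` is a hypothetical helper that only builds the query text and does not call Jena. Because the IRI is passed as a string literal rather than between < and >, the query parser never has to validate it as an IRI:

```python
def str_filter_query(iri: str) -> str:
    """Build a SPARQL query that compares str(?X) against the IRI as a plain string."""
    # Escape backslashes and double quotes so the value is a valid SPARQL string literal.
    literal = iri.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX brick: <https://brickschema.org/schema/Brick#>\n"
        "SELECT ?room\n"
        "WHERE {\n"
        "  ?room brick:isPartOf ?X .\n"
        f'  FILTER ( str(?X) = "{literal}" )\n'
        "}\n"
    )


print(str_filter_query(
    "https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/3/port[zn]"
))
```

The cost is that the pattern matches every brick:isPartOf triple first and then filters on the string form, which is why it is potentially inefficient.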

afs · 2023-11-24T09:56:48Z

Side note: Parsing in SPARQL and parsing in Turtle are significantly different in the way dubious (error or warning) IRIs are treated.

The W3C specs define the IRI token as

IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'

and then expect further checking for the legality of the string that matches that rule.

Jena's SPARQL parser, ARQ, uses that rule (via javacc) then performs IRI validation.

Jena's Turtle parser uses a custom tokenizer and does more limited checking on the characters between < and > , then performs IRI validation. Because the Turtle tokenizer is custom, the messages are more human-meaningful.

Any IRI validation has to parse the string so it duplicates the character exclusion rules of IRIREF.

This is all known and intended by the W3C working groups - both specs intentionally did not include the full RFC3986/3987 grammar. It is quite large and it would have to be modified for UCHAR. UCHAR escapes mean later checks are necessary anyway. It does not fit well with a standard parser/tokenizer split.

An effect is that { (not the UCHAR way of doing that) is illegal surface syntax in SPARQL (an error that stops the parser) but a warning in Turtle.
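The two-stage checking described above can be sketched roughly as follows. This is an illustration in Python, not Jena's implementation: the regular expression is a transcription of the Turtle IRIREF rule, and `brackets_outside_authority` is a deliberately simplified, hypothetical stand-in for full RFC 3986/3987 validation that only looks for [ and ] after the authority part.

```python
import re

# Stage 1: the token rule. Turtle IRIREF: '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
IRIREF = re.compile(
    r'<(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*>'
)
UCHAR = re.compile(r'\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})')


def matches_iriref(token: str) -> bool:
    """True if the whole token matches the IRIREF surface syntax."""
    return IRIREF.fullmatch(token) is not None


def decode_uchar(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they denote."""
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def brackets_outside_authority(iri: str) -> bool:
    """Stage 2 (simplified): '[' and ']' are only legal in an IPv6 host."""
    rest = iri
    if "://" in iri:
        after = iri.split("://", 1)[1]
        end = re.search(r"[/?#]", after)  # the authority ends at the first /, ? or #
        rest = after[end.start():] if end else ""
    return "[" in rest or "]" in rest


bad = "<https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port[zn]>"
print(matches_iriref(bad))                    # True: [ and ] pass the token rule
print(brackets_outside_authority(bad[1:-1]))  # True: rejected by the later check
print(matches_iriref("<http://example/a{b}>"))  # False: { fails at the token level

escaped = r"<http://example/port\u005Bzn\u005D>"
print(matches_iriref(escaped))                                      # True
print(brackets_outside_authority(decode_uchar(escaped[1:-1])))      # True
```

The last two lines show why \u005B does not help: the escape passes the token rule, but after decoding it is the same [ character and fails the same later check.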

704998200 added the question label Nov 23, 2023

afs self-assigned this Nov 24, 2023

afs added the bug label Nov 24, 2023

afs added a commit to afs/jena that referenced this issue Dec 1, 2023

apacheGH-2102: Parse and accept [], with warning

99d85f7

afs mentioned this issue Dec 1, 2023

GH-2102: Parse and accept [], with warning #2118

Merged

3 tasks

afs added a commit to afs/jena that referenced this issue Dec 1, 2023

apacheGH-2102: Parse and accept [], with warning

f6969b9

afs added a commit to afs/jena that referenced this issue Dec 1, 2023

apacheGH-2102: Parse and accept [], with warning

30f6c57

afs added a commit to afs/jena that referenced this issue Dec 1, 2023

apacheGH-2102: Parse and accept [], with warning

0466550

afs closed this as completed in #2118 Dec 1, 2023

afs added a commit that referenced this issue Dec 1, 2023

GH-2102: Parse and accept [], with warning

8cc0a5e

Aklakan pushed a commit to Aklakan/jena that referenced this issue Dec 2, 2023

apacheGH-2102: Parse and accept [], with warning

a24d3e9

cnanjo pushed a commit to fhircat/jena that referenced this issue Mar 2, 2024

apacheGH-2102: Parse and accept [], with warning

d11ebf9
