Why does a query that works in Jena 3.16 throw an error in Jena 4.10? #2102

Closed
704998200 opened this issue Nov 23, 2023 · 4 comments · Fixed by #2118

704998200 commented Nov 23, 2023

Version

4.10

Question

This query works in Jena 3.16

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX brick:   <https://brickschema.org/schema/Brick#>  
SELECT ?room  
WHERE {       
?room brick:isPartOf <https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port[zn]>.  
}

But it throws an error in Jena 4.10:
[line: 5, col: 31] Bad IRI: 'https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port[zn]': <https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port[zn]> Code: 0/ILLEGAL_CHARACTER in FRAGMENT: The character violates the grammar rules for URIs/IRIs.

I know the problem is '[zn]'.

I tried to escape '[zn]', but that also throws an error:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>  
PREFIX brick: <https://brickschema.org/schema/Brick#>  
SELECT ?room  
WHERE {     
   ?room brick:isPartOf <https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port\[zn\]>.
}

Here is the fragment of the TTL data

<https://brickschema.org/schema/1.0.2/building_example#building:gtc/rooms/1.H.1/room>
        rdf:type        brick:Room;
        rdfs:label      "1.H.1";
        brick:isPartOf  <https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/3/port[zn]> .

How can I fix it?

@rvesse
Member

rvesse commented Nov 23, 2023

Jena 4.x enforces certain URI validity checks more strictly, because allowing bad URIs into the system always leads to problems down the road.

Since this is a URI it should be URL encoded appropriately (i.e. Java/SPARQL backslash escapes are not suitable here), and that needs to happen in both your data and your queries.

In general the way your URIs are structured looks strange. You seem to be trying to put a lot of "structure" into the URI fragment (the portion after the #) when you should really be using / based URL construction to achieve this, and your URIs seem to conflate schema concepts with instances of those concepts.

Your data should probably have URIs more like https://yourdata.com/building/gtc/rooms/1.H.1/room, with appropriate rdf:type declarations as in your example. Here yourdata.com stands in for an appropriate base URI for your instance data, rather than the schema concept URI you use currently.

@neumarcx
Contributor

Indeed, the IRI validation for SPARQL queries has changed since 3.16. You have to get your data into the correct format. If that is not possible, you have to URL encode the path component.

@afs
Member

afs commented Nov 23, 2023

It's a bug IMO. The Turtle parser accepts the data with a warning. There ought to be consistency between the SPARQL parser and the Turtle parser.

That said - the data is not legal. [ and ] are not legal in a URI except for IPv6 host addresses.

No amount of escaping will change that.
URIs do not have a \ escape.
Using \u005B doesn't work.
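
To make the rule concrete, here is a minimal Python sketch that checks the fragment of a URI against the fragment grammar of RFC 3986. It covers ASCII only (RFC 3987 additionally allows non-ASCII characters in IRIs), the function name fragment_is_legal is made up for illustration, and it is not how Jena performs its validation.

```python
import re

# RFC 3986: fragment = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
# ASCII-only sketch; '[' and ']' are not in any of these sets.
FRAGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def fragment_is_legal(uri: str) -> bool:
    """Return True if the part after the first '#' matches the RFC 3986 fragment rule."""
    _, _, fragment = uri.partition("#")
    return FRAGMENT.fullmatch(fragment) is not None


base = "https://brickschema.org/schema/1.0.2/building_example#"
print(fragment_is_legal(base + "building:gtc/vavs/2/port[zn]"))      # False
print(fragment_is_legal(base + "building:gtc/vavs/2/port%5Bzn%5D"))  # True
```

The raw brackets fail the check, while the percent-encoded form passes.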

%-encoding, as @rvesse mentions, replaces the character [ with %5B. It really does put the 3 characters %, 5, B into the URI, which then must match the data.

Not using [, ], or using %-encoding consistently is the best solution.
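
As a rough illustration of consistent %-encoding, the following Python sketch rewrites the square brackets in a URI string. The helper name encode_brackets is hypothetical, and the same rewrite would have to be applied to both the data and the queries, because the encoded URI is a different URI from the original one.

```python
def encode_brackets(uri: str) -> str:
    """Replace '[' with %5B and ']' with %5D.

    Assumption: the URI has no IPv6 host, which is the one place
    where literal square brackets are legal and must be kept.
    """
    return uri.replace("[", "%5B").replace("]", "%5D")


original = "https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port[zn]"
encoded = encode_brackets(original)
print(encoded)
# https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/2/port%5Bzn%5D
```

After the rewrite, the strings with and without encoding no longer compare equal, so a query using the encoded form only matches data that was encoded the same way.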

Even if Jena allowed it, the current data is likely to cause problems elsewhere.

The SPARQL query can work with the string form of the URI in the data in a potentially inefficient manner:

      .... brick:isPartOf  ?X .
      FILTER ( str(?X) = "https://brickschema.org/schema/1.0.2/building_example#building:gtc/vavs/3/port[zn]")

afs self-assigned this Nov 24, 2023
afs added the bug label Nov 24, 2023
@afs
Member

afs commented Nov 24, 2023

Side note: Parsing in SPARQL and parsing in Turtle are significantly different in the way dubious (error or warning) IRIs are treated.

The W3C specs define the IRI token as

IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'

and then expect further checking for the legality of the string that matches that rule.
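
The token rule can be approximated with a regular expression to see what it does and does not reject. This is a Python sketch of the grammar rule only, not Jena's tokenizer; the example.org IRIs and the name matches_iriref are made up for illustration.

```python
import re

# UCHAR: \uXXXX or \UXXXXXXXX escapes.
UCHAR = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"
# Excluded characters: #x00-#x20 and < > " { } | ^ ` \
IRIREF = re.compile(r'<(?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r')*>')


def matches_iriref(token: str) -> bool:
    """Return True if the whole token matches the IRIREF rule."""
    return IRIREF.fullmatch(token) is not None


print(matches_iriref("<https://example.org/x#port[zn]>"))     # True: '[' is not excluded by the token rule
print(matches_iriref("<https://example.org/x#port{zn}>"))     # False: '{' is excluded
print(matches_iriref(r"<https://example.org/x#port\[zn\]>"))  # False: '\' is only allowed as part of a UCHAR
```

Square brackets pass the token rule, which is why they are only caught by the later IRI validation step, while a backslash escape such as \[ is already rejected at the token level.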

Jena's SPARQL parser, ARQ, uses that rule (via javacc) then performs IRI validation.

Jena's Turtle parser uses a custom tokenizer and does more limited checking on the characters between < and > , then performs IRI validation. Because the Turtle tokenizer is custom, the messages are more human-meaningful.

Any IRI validation has to parse the string so it duplicates the character exclusion rules of IRIREF.

This is all known and intended by the W3C working groups: both specs intentionally did not include the full RFC 3986/3987 grammar. It is quite large and it would have to be modified for UCHAR. UCHAR escapes mean later checks are necessary anyway. It does not fit well with a standard parser/tokenizer split.

One effect is that a literal { (as opposed to writing it as a UCHAR escape) is illegal surface syntax in SPARQL (an error that stops the parser) but only a warning in Turtle.

afs added a commit to afs/jena that referenced this issue Dec 1, 2023
@afs afs closed this as completed in #2118 Dec 1, 2023
afs added a commit that referenced this issue Dec 1, 2023
Aklakan pushed a commit to Aklakan/jena that referenced this issue Dec 2, 2023
cnanjo pushed a commit to fhircat/jena that referenced this issue Mar 2, 2024