parseQuery decode the query string which cause invalid URI #1957

Closed
kfrawee opened this issue May 18, 2022 · 7 comments · Fixed by #1959

Comments

kfrawee commented May 18, 2022

I have the following ttl file:

@prefix : <https://www.example.co/reserved/language#> .

<https://www.example.co/reserved/root> :_id "01G39WKRH76BGY5D3SKDHJP2SX" ;
    :transcript%20data [ :_id "01G39WKRH7JYRX78X7FG4RCNYF" ;
            :_key "transcript%20data" ;
            :value "value" ;
            :value_id "01G39WKRH7PVK1DXQHWT08DZA8" ] .

And I have the following query:

q = """
PREFIX : <https://www.example.co/reserved/language#>

    SELECT  ?o 
    WHERE { ?s :transcript%20data/:value ?o . }
""" 

When I query the graph parsed from the ttl file, I get the following error:
https://www.example.co/reserved/language#transcript data does not look like a valid URI, trying to serialize this will break.
As you can see, parseQuery has decoded the "%20" to a space " ", which produces an invalid URI, so the _is_valid_uri function returns False when the URI is passed to it.
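
The decoding step can be reproduced outside rdflib. The following is a minimal sketch, not rdflib's actual code path: `decode_local_name` and `looks_like_valid_uri` are hypothetical helper names, and the set of forbidden characters is an assumption made for illustration.

```python
import re

# Matches one percent-encoded octet such as "%20".
PERCENT_RE = re.compile(r"%[0-9A-Fa-f]{2}")

# Assumed set of characters that make a URI invalid (illustrative only).
INVALID_URI_CHARS = '<>" {}|\\^`'


def decode_local_name(local_name: str) -> str:
    """Replace every %XX escape with the character it encodes."""
    return PERCENT_RE.sub(lambda m: chr(int(m.group(0)[1:], 16)), local_name)


def looks_like_valid_uri(uri: str) -> bool:
    """Return False if the URI contains any character from the assumed invalid set."""
    return not any(ch in uri for ch in INVALID_URI_CHARS)


namespace = "https://www.example.co/reserved/language#"
encoded = namespace + "transcript%20data"
decoded = namespace + decode_local_name("transcript%20data")

print(looks_like_valid_uri(encoded))  # True
print(looks_like_valid_uri(decoded))  # False
```

The encoded form passes the check, while the decoded form fails because it contains a literal space.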

I've tested the query on different SPARQL engines and it is valid and works as expected.
So, what do you advise to make the query valid and get the required results?

I am using rdflib version 6.1.1 on macOS Monterey 12.4.

Author

kfrawee commented May 18, 2022

@aucampia @gjhiggins

@aucampia
Member

@kfrawee will look when I have time

ghost commented May 18, 2022

I have the following ttl file:
...
So, what do you advise to make the query valid and get the required results?

Looks like an issue with the SPARQL parser so it might be some time before it's fixed. In the interim, you can work around the problem by replacing %20 with _
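
A minimal sketch of that interim workaround, assuming a plain string replacement is acceptable; `replace_encoded_spaces` is a hypothetical helper name, not part of rdflib.

```python
def replace_encoded_spaces(text: str) -> str:
    """Replace every literal "%20" in the text with an underscore."""
    return text.replace("%20", "_")


query = """
PREFIX : <https://www.example.co/reserved/language#>

SELECT ?o
WHERE { ?s :transcript%20data/:value ?o . }
"""

patched_query = replace_encoded_spaces(query)
print(patched_query)
```

The same replacement has to be applied to the Turtle data so that both sides use the same predicate name. Note that a blanket replacement also rewrites string literals such as the `:_key` value, which may or may not be wanted.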

ghost commented May 19, 2022

fwiw:

  1. I think the causal regexp is
    PN_LOCAL = Regex(
  2. dunno if it's of any practical use in this instance but it may be worth taking note of / comparing with the example SPARQL 1.1 parser in the pyastbuilder repos

ghost commented May 19, 2022

Ah, I was close but not close enough.

I'm still exploring the ramifications but if you're prepared to make a small edit to

def _hexExpand(match):
there is a temporary workaround that would enable you to work with the original data:

 def _hexExpand(match):
-    return chr(int(match.group(0)[1:], 16))
+    if match.group(0) == "%20":
+        return match.group(0)
+    else:
+        return chr(int(match.group(0)[1:], 16))
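
The effect of this edit can be checked in isolation. The sketch below copies the two function bodies from the diff above under hypothetical names (`hex_expand_original`, `hex_expand_patched`) and applies them with `re.sub`; the regular expression and the way the function is invoked are assumptions for illustration, not a description of how rdflib wires it up.

```python
import re

# Matches one percent-encoded octet such as "%20" (assumed pattern).
PERCENT_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def hex_expand_original(match):
    # Behavior before the edit: decode every escape.
    return chr(int(match.group(0)[1:], 16))


def hex_expand_patched(match):
    # Behavior after the edit: leave "%20" untouched, decode everything else.
    if match.group(0) == "%20":
        return match.group(0)
    else:
        return chr(int(match.group(0)[1:], 16))


def expand(text, expander):
    """Apply the given expander to every percent escape in the text."""
    return PERCENT_RE.sub(expander, text)


print(expand("transcript%20data", hex_expand_original))  # transcript data
print(expand("transcript%20data", hex_expand_patched))   # transcript%20data
print(expand("a%3Cb", hex_expand_patched))               # a<b
```

The last line shows the limit of the workaround: any escape other than "%20" is still decoded.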

Author

kfrawee commented May 19, 2022

@gjhiggins

if match.group(0) == "%20":

It is not specific to "%20". I am using urllib.parse.quote to URL-encode the ttl and queries, so the same behavior happens to the rest of the characters.
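
To illustrate the point, here is a short sketch of what `urllib.parse.quote` produces for a few names. The sample names other than "transcript data" are made up for this example, and `safe=""` is chosen here so that no character is exempt from encoding.

```python
from urllib.parse import quote, unquote

# Sample names; only the first one comes from the data in this issue.
names = ["transcript data", "a<b", "50% off"]

# Percent-encode every character that is not unreserved.
encoded = [quote(name, safe="") for name in names]
print(encoded)  # ['transcript%20data', 'a%3Cb', '50%25%20off']

# Decoding every escape restores the original, un-encoded names.
print([unquote(item) for item in encoded])  # ['transcript data', 'a<b', '50% off']
```

Each name yields different escapes, so a special case for "%20" alone leaves the others affected.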

ghost commented May 19, 2022

It is not specific to "%20" ... the same behavior happens to the rest of the characters.

Yes, indeed so. I addressed the entire set of percent-encoded reserved characters in PR #1959, which I submitted.

@aucampia aucampia linked a pull request May 21, 2022 that will close this issue