Write a Proper SparQL parser using Antlr4 #236

Closed
joka921 opened this issue Apr 16, 2019 · 10 comments

@joka921
Member

joka921 commented Apr 16, 2019

This subsumes many of the other issues below.

  • First step:
    • Implement a SPARQL parser that supports exactly the same subset as the current one,
      but with a better structure and correct/automated lexing.
@joka921 joka921 self-assigned this Apr 16, 2019
@niklas88
Member

@manonthegithub might also be interested in your progress on this, so that we don't duplicate the work.

@niklas88
Member

@joka921 @manonthegithub on the internal fork we do have an ANTLR SPARQL grammar for the completion script that could be used for this.

@joka921
Member Author

joka921 commented Apr 16, 2019

I have seen this grammar and am already using it

@niklas88
Member

Adding to this: the current SPARQL parser also breaks if there isn't a space before the . at the end of a triple, which is often the case in Wikidata examples.

@niklas88
Member

niklas88 commented Jun 6, 2019

Ok, I tried quick-fixing the . issue because it comes up so often. It turns out SPARQL is quite weird here, because the . may also appear inside literals, prefixed names and IRIs.

For example the following query works (in Blazegraph):

SELECT ?item WHERE {
  ?item wdt:P31 wd:Q2934.?item wdt:P39 wd:Q41240317
}

Reversing the second triple with ^, the following also works:

SELECT ?item WHERE {
  ?item wdt:P31 wd:Q2934. wd:Q41240317 ^wdt:P39 ?item
}

However, removing the space after the . breaks parsing, even though no space is needed at the same position in the first query, where the ? disambiguates. So yeah, we really should use a proper parser that handles this weirdness naturally.
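To illustrate why the . behaves this way, here is a minimal Python sketch of a greedy tokenizer. It is not QLever's or Blazegraph's lexer. The regular expressions are an ASCII-only approximation of the SPARQL 1.1 terminals for prefixed names, in which a local name may contain . and : but may not end with a dot, and the names `TOKEN` and `tokenize` are made up for this example.

```python
import re

# ASCII-only approximation of the SPARQL 1.1 terminals PN_PREFIX and PN_LOCAL.
# The real grammar also allows many Unicode ranges and escapes (PLX).
# A '.' may appear inside both, but never as the last character.
PN_PREFIX = r"[A-Za-z](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?"
PN_LOCAL = r"[A-Za-z0-9_:](?:[A-Za-z0-9_.:-]*[A-Za-z0-9_:-])?"

TOKEN = re.compile(
    rf"""
    (?P<VAR>[?$][A-Za-z0-9_]+)
  | (?P<PNAME>(?:{PN_PREFIX})?:(?:{PN_LOCAL})?)
  | (?P<WORD>[A-Za-z]+)
  | (?P<PUNCT>[.^{{}}])
  | (?P<WS>\s+)
    """,
    re.VERBOSE,
)


def tokenize(query):
    """Split a query fragment into tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected character {query[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append(m.group())
        pos = m.end()
    return tokens


# '?' cannot continue a local name, so the dot becomes its own token.
print(tokenize("wd:Q2934.?item"))         # ['wd:Q2934', '.', '?item']
# With a space, the dot cannot be followed by more name characters.
print(tokenize("wd:Q2934. wd:Q41240317"))  # ['wd:Q2934', '.', 'wd:Q41240317']
# Without the space, the greedy match swallows the dot into one long name.
print(tokenize("wd:Q2934.wd:Q41240317"))   # ['wd:Q2934.wd:Q41240317']
```

Under these assumptions, the third input yields a single prefixed name instead of two triples, which would explain why the query fails without the space.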

@niklas88
Member

niklas88 commented Jun 6, 2019

@joka921 note that the current ANTLR grammar doesn't support the predicate paths that #244 will soon add. I'll look into this, so beware that there will be some changes.

@niklas88
Member

@floriankramer just a note that this would also add # comments which aren't supported by the new lexer either.

@floriankramer
Member

@niklas88 Adding those to the lexer would be relatively easy, though: simply consume everything up to and including the next newline when a # is found outside of another token type.
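A rough Python sketch of that idea, not the actual QLever lexer: IRIs and short string literals are matched first so that a # inside them is left alone, and any other # discards the rest of the line. The patterns are approximations of the SPARQL 1.1 terminals (long triple-quoted strings are omitted), and `strip_comments` is a hypothetical helper name. Each comment is replaced by a single newline so that neighbouring tokens stay separated.

```python
import re

# Approximations of IRIREF and the short string literals from the SPARQL 1.1
# grammar. Long ''' / """ strings are left out to keep the sketch short.
IRIREF = r'<[^<>"{}|^`\\\x00-\x20]*>'
DQ_STRING = r'"(?:[^"\\\n\r]|\\.)*"'
SQ_STRING = r"'(?:[^'\\\n\r]|\\.)*'"
PROTECTED = re.compile("|".join([IRIREF, DQ_STRING, SQ_STRING]))


def strip_comments(query):
    """Remove # comments that start outside IRIs and string literals."""
    out = []
    pos = 0
    while pos < len(query):
        m = PROTECTED.match(query, pos)
        if m:
            # Copy IRIs and strings verbatim, including any '#' they contain.
            out.append(m.group())
            pos = m.end()
        elif query[pos] == "#":
            # Consume up to and including the next newline (or end of input).
            end = query.find("\n", pos)
            pos = len(query) if end == -1 else end + 1
            out.append("\n")
        else:
            out.append(query[pos])
            pos += 1
    return "".join(out)


query = 'SELECT ?x WHERE { # pick x\n ?x <http://ex.org/a#b> "#1" }'
print(strip_comments(query))
```

A less-than sign as in `?x < 5` is not mistaken for an IRI here, because the IRI pattern does not allow spaces.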

@joka921
Member Author

joka921 commented May 15, 2022

Update:

We are finally making progress on this. We already have a complete grammar, and it is now assigned to @Qup42.

@hannahbast
Member

This has been done and it was indeed a milestone for QLever.
