Support for SPARQL CDTs (lists and maps as literals) #2518

Closed
hartig opened this issue Jun 4, 2024 · 7 comments · Fixed by #2501 or #2695
Labels
enhancement Incrementally add new feature

Comments

@hartig
Contributor

hartig commented Jun 4, 2024

Version

5.1.0

Feature

The lack of built-in support for generic types of composite values such as lists and maps has been a long-standing issue for RDF and SPARQL. Together with a few other colleagues at the Amazon Neptune team, we have developed an approach to represent lists and maps as literals in RDF data and to extend SPARQL with features related to such literals. These extensions of SPARQL include:

  • an aggregation function to produce these composite values (FOLD),
  • functions to operate on these composite values in expressions, and
  • a new operator (UNFOLD) to unfold such composite values into their individual components.
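
To make the pairing of FOLD and UNFOLD concrete, here is a minimal Python sketch of the intended behavior over in-memory solution mappings. It is only a conceptual model, not the Jena implementation and not the formal definition from the specification: the helper names `fold` and `unfold`, the dict-based solutions, and the `"list"` key are assumptions made for illustration, and edge cases (empty lists, null elements, errors, maps) are deliberately not modeled.

```python
from collections import defaultdict


def fold(solutions, group_var, value_var):
    """Aggregate: collect the value_var of each group into one list value.

    Hypothetical helper; the result key "list" is an illustrative name.
    """
    groups = defaultdict(list)
    for sol in solutions:
        groups[sol[group_var]].append(sol[value_var])
    return [{group_var: key, "list": values} for key, values in groups.items()]


def unfold(solutions, list_var, elem_var):
    """Operator: emit one solution per element of the list bound to list_var.

    Simplification: edge cases covered by the specification are not modeled.
    """
    out = []
    for sol in solutions:
        for elem in sol[list_var]:
            extended = dict(sol)       # keep the existing bindings
            extended[elem_var] = elem  # bind the element to the new variable
            out.append(extended)
    return out


rows = [
    {"author": "alice", "title": "A"},
    {"author": "alice", "title": "B"},
    {"author": "bob", "title": "C"},
]

folded = fold(rows, "author", "title")        # one row per author, titles as a list
unfolded = unfold(folded, "list", "title")    # back to one row per (author, title)
```

In this sketch, unfolding the folded rows recovers the original author/title pairs, which is the intuition behind treating the two features as counterparts.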

We have created a complete formal specification of the approach and a comprehensive test suite for implementers, which can be found in our GitHub repo: https://github.com/awslabs/SPARQL-CDTs

Perhaps before you dive into the aforementioned specification, you may take a look at our short paper, in which we provide a slightly more extensive motivation for this work and a (very!) brief summary of the approach. After that, Section 2 of the specification provides a more detailed informal description of the different parts of the approach.

I am happy to answer any questions that you may have, both about the approach in general and about the idea to integrate the approach into Jena. Also, if you have issues with some parts of the specification, feel free to create an issue in the aforementioned GitHub repo. (And in case you are wondering: yes, we are planning to file the approach as a SPARQL Enhancement Proposal (SEP) for the SPARQL-dev Community Group.)

Are you interested in contributing a solution yourself?

Yes

@hartig added the enhancement label on Jun 4, 2024
@rvesse
Member

rvesse commented Jun 4, 2024

See PR #2501

@afs
Member

afs commented Aug 28, 2024

Please continue discussion on this contribution on this issue.

The PR has been merged to branch gh2518-cdt.

@afs
Member

afs commented Aug 29, 2024

@olaf - in cleaning the ingested branch code, I found that main.jj does not build with javacc (I have version 7.0.12).

There are two uses of Unfold and two declarations of Unfold.

This is not due to the ingestion/squash process onto the Jena branch - those links are to the PR #2501.

I don't know how that could have happened but the generated java code did not correspond to the JavaCC input. The problem is now fixed.

I've also added a Builder for OpUnfold so algebra using that operator can be written out and read in again. The command line tool qparse now works (it parses queries and also checks that they print out in a form that equals the input and also have the same algebra).
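
The write-out/read-in property mentioned above can be illustrated with a toy round-trip check. This is only a sketch of the idea in Python, not Jena's Builder or writer code: the `write` and `read` helpers and the operator tree with its tag names are hypothetical and do not claim to match Jena's actual algebra syntax.

```python
def write(node):
    """Serialize a nested tuple/str tree as an S-expression string."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(write(child) for child in node) + ")"


def read(text):
    """Parse a string produced by write() back into a nested tuple tree."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token               # plain atom
        children = []
        while tokens[pos] != ")":
            children.append(parse())
        pos += 1                       # skip the closing ")"
        return tuple(children)

    return parse()


# Hypothetical operator tree; tag names are illustrative only.
algebra = ("unfold", "?list", "?elem", ("bgp", ("triple", "?s", "?p", "?list")))

text = write(algebra)
round_tripped = read(text)

# The round-trip property: reading back what was written gives an equal tree,
# and writing that tree again gives the same text.
assert round_tripped == algebra
assert write(round_tripped) == text
```

A check of this shape only passes when every operator has both a writer and a reader, which is why a missing builder for a new operator shows up as a round-trip failure.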

I've finished cleaning up the code (warnings, some white space things I noticed) for now.

Please test when you have some time.

I'll keep the branch in-step with the main branch. There are some parser changes for RDF 1.2 in the pipeline.

@hartig
Contributor Author

hartig commented Aug 31, 2024

@afs

in cleaning the ingested branch code, I found that main.jj does not build with javacc (I have version 7.0.12).

There are two uses of Unfold and two declarations of Unfold.

I didn't notice myself. I guess that this was an artifact of my earlier attempt to rebase my branch, which was a several-hours project after more than a year of divergence between the branches.

The problem is now fixed.

Thanks!

Thanks also for the other changes (Builder for OpUnfold, cleaning of warnings, etc). I looked through all your commits and they are fine.

Please test when you have some time.

Done. Everything works as expected! That is, i) mvn clean package runs through without problems for that branch, ii) arq.rdftests runs our SPARQL CDTs test suite without issues, and iii) arq.arq can be used to run SPARQL queries that use our CDT-related features.

So, from my side, this can be merged now.

@afs
Member

afs commented Aug 31, 2024

Thanks.

I've finished aligning the code. I'll squash/tidy the additional commits and do a few timing tests to make sure nothing major has happened.

I'm busy most of this coming week and this isn't a trivial set of changes, so there will be a short delay.

@afs
Member

afs commented Sep 10, 2024

@hartig -- Thank you for the contribution!

It's now merged into the main branch and will be in the next Jena release.
It is also in development snapshots from now on.

@hartig
Contributor Author

hartig commented Sep 10, 2024

Great news! Thanks a lot @afs for your help on getting this contribution merged!!
