
Proof of concept for SPARQL UPDATE #916

Open
wants to merge 30 commits into base: master
Conversation

hannahbast
Member

This is not yet meant for production but as a proof of concept. At the time of the creation of this PR, there are commands `insert` and `delete` for inserting or deleting a single triple via the API. The triple is located in each of the permutations, and the location is stored in an internal data structure (of a new class `DeltaTriples`). There are already extensive unit tests for this functionality, and it seems to work correctly.

There is no code yet that actually takes the inserted or deleted triples into account when processing queries. But that is actually relatively little work (when reading a block from a permutation, just augment the result by the delta triples stored for that block). The hard part, it seems, is locating the triples.
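
For illustration, here is a rough sketch of what the interface of such a class could look like. Only the class name `DeltaTriples` is taken from this PR; all other identifiers and types below are placeholders, not the actual code.

```cpp
// Hypothetical sketch, NOT the actual interface from this PR.
#include <array>
#include <cstdint>
#include <set>

using Id = uint64_t;                 // assumption: opaque internal ID type
using IdTriple = std::array<Id, 3>;  // assumption: a triple of IDs

class DeltaTriples {
 public:
  // Insert a single triple: locate it in each permutation and remember
  // the location so that index scans can later merge it in.
  void insertTriple(const IdTriple& triple);

  // Delete a single triple: locate it in each permutation and mark it
  // as deleted for subsequent index scans.
  void deleteTriple(const IdTriple& triple);

 private:
  // The triples inserted and deleted so far, kept ordered.
  std::set<IdTriple> insertedTriples_;
  std::set<IdTriple> deletedTriples_;
};
```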

Hannah Bast added 6 commits March 15, 2023 21:11
1. Add support for URL parameters "insert=..." and "delete=..." for
   inserting or deleting a single triple.

2. Stub of new class `DeltaTriples` that maintains the set of inserted
   and deleted triples.

3. First working implementation of a method `findTripleInPermutation`
   that, for a given triple and a given permutation, finds the matching
   block in that permutation and the right position in that block.
There is now a test that checks for all existing triples whether the
found location is correct (by checking `id2` and `id3` at the found
position in the found block; note that the block does not store an
explicit `id1` for a given position).

The `findTripleInPermutation` method is still (very) inefficient in that
it goes through the complete relation metadata in order to find the
sequence of `id1`s relevant for a block. This will be fixed in the next
commit.
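
Conceptually, locating a triple boils down to a binary search over the per-block boundaries, followed by a search inside the decompressed block. A rough sketch of the block search is shown below; the identifiers are placeholders (the actual code derives the block boundaries from the relation metadata, which is exactly the inefficiency mentioned above).

```cpp
// Conceptual sketch with placeholder names, not the actual code.
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Id = uint64_t;
using IdTriple = std::array<Id, 3>;

struct BlockBoundaries {
  IdTriple lastTriple;  // assumption: the largest triple stored in the block
};

// Return the index of the first block whose last triple is >= `triple`,
// i.e. the block in which `triple` is located (or would be inserted).
size_t findBlockIndex(const std::vector<BlockBoundaries>& blocks,
                      const IdTriple& triple) {
  auto it = std::lower_bound(
      blocks.begin(), blocks.end(), triple,
      [](const BlockBoundaries& block, const IdTriple& t) {
        return block.lastTriple < t;
      });
  return static_cast<size_t>(it - blocks.begin());
}
```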

Note: the previous commit lacked the new files `DeltaTriples.h`,
`DeltaTriples.cpp`, and `DeltaTriplesTest.cpp`.
1. Refactored `DeltaTriples::locateTripleInAllPermutations` (the central
   method of this class).

2. Wrote a test that checks all triples that are contained in the index
   as well as a slightly modified version of each triple that is not in
   the index. The test checks that each triple has been located at
   exactly the right position in all permutations.

   (This is harder than it seems because a lot of things can go wrong
   and we do not have the relation `Id`s for the blocks explicitly, but
   only implicitly via the relation metadata.)

3. The method `locateTripleInAllPermutations` now inserts the results
   into proper data structures that can then be used conveniently in an
   index scan (writing that code is the next step, but it should be
   relatively straightforward now).
1. The internal data structure for each permutation now stores a
   `std::set` (which is ordered) for the triples located at a certain
   position of a certain block.

2. For each triple that is inserted or deleted, remember the iterator
   into the corresponding `std::set` for each permutation. This is
   important for undoing an insertion or deletion (for example, when
   re-inserting a previously deleted triple); see the sketch after this
   list.

3. The unit tests now test insertion and deletion directly. The
   following cases are covered now: deletion of triples that exist in
   the index, insertion of triples that did not exist in the index,
   deletion of previously inserted triples, and re-insertion of
   previously deleted triples.

4. Move the visualization code (that shows the contents of a particular
   block where a triple has been located) out of the implementation of
   the `DeltaTriples` class and into `DeltaTriplesTest`.
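
A rough sketch of the data layout described in points 1 and 2 above; all names are placeholders and not the actual members of `DeltaTriples`.

```cpp
// Hypothetical sketch of the idea: ordered sets of located delta triples
// per block, plus remembered iterators for cheap undo.
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <tuple>

using Id = uint64_t;
using IdTriple = std::array<Id, 3>;

// A delta triple together with the row index in the block before which
// it belongs (names are placeholders).
struct TripleAtPosition {
  size_t rowIndexInBlock;
  IdTriple triple;
  bool isInsertion;  // false for a deletion
  bool operator<(const TripleAtPosition& other) const {
    return std::tie(rowIndexInBlock, triple) <
           std::tie(other.rowIndexInBlock, other.triple);
  }
};

// Per permutation: for each block index, the ordered set of delta
// triples located in that block.
using TriplesPerBlock = std::map<size_t, std::set<TripleAtPosition>>;

// For undoing an insertion or deletion: remember the iterator into the
// corresponding set for each of the six permutations, so that the entry
// can simply be erased again.
using UndoHandles = std::array<std::set<TripleAtPosition>::iterator, 6>;
```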
@codecov

codecov bot commented Mar 19, 2023

Codecov Report

Patch coverage: 85.29% and project coverage change: -1.97 ⚠️

Comparison is base (df93d56) 74.07% compared to head (48578d3) 72.11%.

❗ Current head 48578d3 differs from pull request most recent head 8910f61. Consider uploading reports for the commit 8910f61 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #916      +/-   ##
==========================================
- Coverage   74.07%   72.11%   -1.97%     
==========================================
  Files         254      247       -7     
  Lines       23997    24389     +392     
  Branches     3021     3156     +135     
==========================================
- Hits        17776    17587     -189     
- Misses       5010     5449     +439     
- Partials     1211     1353     +142     
Impacted Files Coverage Δ
src/global/Id.h 84.84% <ø> (ø)
src/index/CompressedRelation.h 100.00% <ø> (ø)
src/index/ConstantsIndexBuilding.h 100.00% <ø> (ø)
src/index/Index.h 100.00% <ø> (+25.00%) ⬆️
src/index/IndexImpl.h 57.14% <0.00%> (-9.53%) ⬇️
src/index/IndexMetaData.h 67.56% <ø> (+1.85%) ⬆️
src/util/AllocatorWithLimit.h 95.16% <ø> (ø)
src/index/Index.cpp 46.39% <22.22%> (-3.61%) ⬇️
src/index/DeltaTriples.cpp 68.90% <68.90%> (ø)
src/index/DeltaTriples.h 83.33% <83.33%> (ø)
... and 9 more

... and 88 files with indirect coverage changes

☔ View full report in Codecov by Sentry.

It complained about an assignment in the condition of an `if`.
hannahbast pushed a commit that referenced this pull request Mar 19, 2023
1. The block size used to be `1 << 23` (over 8M), which is too large,
   since we always need to decompress at least one whole block, even
   when reading only a few triples. It's now 100'000, which still has a
   relatively small overall space consumption.

2. Add member `_col2LastId` to block data because we need it for the
   delta triples (#916).
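
A sketch of why this member is useful for the delta triples; only the name `_col2LastId` is taken from the commit message, the struct name and the other fields are assumptions for illustration.

```cpp
// Hypothetical sketch of per-block metadata (placeholder names).
#include <cstdint>
#include <tuple>

using Id = uint64_t;

struct BlockMetadataSketch {
  Id _col0FirstId, _col0LastId;  // first/last value of the first column
  Id _col1FirstId, _col1LastId;  // first/last value of the second column
  Id _col2LastId;                // last value of the third column (new)
};

// With `_col2LastId`, the complete last triple of a block is known from
// the metadata alone, so a delta triple can be compared against the
// block boundary without decompressing the block.
bool tripleIsAfterBlock(Id id1, Id id2, Id id3,
                        const BlockMetadataSketch& block) {
  return std::tie(id1, id2, id3) > std::tie(block._col0LastId,
                                            block._col1LastId,
                                            block._col2LastId);
}
```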
Hannah Bast added 8 commits March 21, 2023 13:25
1. Each permutation now has information about the delta triples per
   block and where exactly they are located in that permutation

2. As a proof of concept, the LOG already shows the number of delta
   triples found in the blocks relevant for the scan.
1. Improved the names. In particular, `TripleWithPosition` is now called
   `LocatedTriple`, and the related classes were renamed analogously.

2. Put `LocatedTriple` and the associated classes in their own file
   `LocatedTriple.h`. Also moved a function there that used to be a
   helper lambda in one of the `DeltaTriples` methods.

TODO: The `locatedTripleInPermutation` method should also be moved to
`LocatedTriple.h` and there should be a `test/LocatedTripleTest` with
all the associated tests.
1. The code for locating triples in an individual permutation is now no
   longer in the (already too big) class `DeltaTriples` but in separate
   classes in the files `LocatedTriples.{h,cpp}`. The corresponding
   tests are now in a new file `test/LocatedTriplesTest.cpp`. I used the
   opportunity to improve the code in several respects.

2. First attempt at writing a function that, for a given block, merges
   the relevant delta triples into it. Tried to do it in place without
   using an extra array, until I realized that this leads to a very hard
   algorithmic problem, which most likely can't be solved efficiently in
   practice. At least it's clear now that the best approach is to first
   decompress the block into a temporary array and then merge that
   temporary array and the relevant delta triples into the pre-allocated
   portion of the result `IdTable`; see the sketch after this list.
   That's what I will implement next and it shouldn't be hard.

3. When testing the merging, it helps to output the columns of an
   `IdTable`. For that, values like `VocabIndex:15` are rather
   impractical to read, so I abbreviated these to a one-letter prefix,
   as in `V:15`.
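
A minimal sketch of the "decompress, then merge" idea from point 2 above. All identifiers are placeholders; this is not the actual `mergeTriples` signature, and handling of duplicates and partial blocks is omitted.

```cpp
// Minimal sketch under assumed types: merge sorted block rows with
// sorted delta triples into a preallocated result buffer.
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Id = uint64_t;
using Row = std::array<Id, 2>;  // assumption: a block row stores (id2, id3)

// Deletions remove the matching block row, insertions add a new row.
// `result` must be preallocated large enough by the caller.
size_t mergeBlockWithDeltas(
    const std::vector<Row>& blockRows,
    const std::vector<std::pair<Row, bool>>& deltas,  // bool: is insertion
    std::vector<Row>& result) {
  size_t i = 0, j = 0, out = 0;
  while (i < blockRows.size()) {
    // Apply all deltas that belong before or at the current block row.
    while (j < deltas.size() && i < blockRows.size() &&
           deltas[j].first <= blockRows[i]) {
      const auto& [row, isInsertion] = deltas[j++];
      if (isInsertion) {
        result[out++] = row;  // inserted triple
      } else if (row == blockRows[i]) {
        ++i;  // deleted triple: skip the matching block row
      }
    }
    if (i < blockRows.size()) result[out++] = blockRows[i++];
  }
  // Remaining insertions located after the last block row.
  for (; j < deltas.size(); ++j) {
    if (deltas[j].second) result[out++] = deltas[j].first;
  }
  return out;  // number of rows written
}
```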
There is now a method `LocatedTriplesPerBlock::mergeTriples` that merges
the delta triples into a given (possibly partial) block. There are unit
tests for the following cases: full block with unrestricted `id1` and
`id2`, full block with restricted `id1`, partial block with restricted
`id1`, partial block with restricted `id1` and `id2`, and the latter
with only a single column.

Removed the previous complicated version that tried to do it in place.
Implemented it and wrote some unit tests.

TODO: If the first block of the scan is incomplete, delta triples are
currently ignored (with a warning). There is no fundamental problem with
adding this, but it first requires some refactoring of the original code
to avoid code duplication.
joka921 pushed a commit that referenced this pull request Apr 4, 2023
1. The block size used to be 1 << 23 (over 8M), which is too large, since we always need to decompress at least one whole block, even when reading only a few triples. It's now 500'000, which still has a relatively small overall space consumption.

2. Add member _col2LastId to block data because we will need it for #916 and it's nice if our live indexes already have this information so that we can play around with this PR without having to rebuild indexes.

3. Renamed the data members in `CompressedRelation.h` such that they have `trailingUnderscores_`.

4. Unrelated fix in IndexTestHelpers.h: The test TTL file was not deleted after the test, now it is.
Hannah Bast added 8 commits April 4, 2023 12:31
The variant of `CompressedRelation::scan` with two `Id`s fixed now also
considers delta triples. This was significantly more complicated than
for the variant with only one `Id` fixed and required quite a bit of
refactoring.

TODO: For the first incomplete block when only a single `Id` is fixed,
delta triples are still not considered. That should be easy to add
though.
There was still one case missing: the possibly incomplete block at the
beginning when only a single `Id` is fixed. Now delta triples are also
considered for these blocks.

TODO: The unit test works for a permutation with a single relation, but
there is still a problem when there are multiple relations.
I am not convinced though that it was a bug.
@sonarcloud

sonarcloud bot commented Apr 17, 2023

SonarCloud Quality Gate failed.

Bugs: 1 (rating E)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 84 (rating A)

Coverage: no information
Duplication: 0.0%

Hannah Bast added 2 commits May 24, 2023 22:19
[incomplete, intermediate commit so that I can switch branches]
One test in `LocatedTriplesTest` still fails because writing the files
for a PSO permutation and then reading from it no longer works as it
did. I hope that Johannes or Julian can help me.
hannahbast pushed a commit that referenced this pull request Jun 9, 2023
This is the first part of a series of PRs split off from the large
proof-of-concept PR #916, which realizes SPARQL 1.1 Update.
@sonarcloud

sonarcloud bot commented Jun 9, 2023

SonarCloud Quality Gate failed.

Bugs: 1 (rating E)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 79 (rating A)

Coverage: no information
Duplication: 0.0%

Qup42 added a commit to Qup42/qlever that referenced this pull request Apr 29, 2024
@arcangelo7

arcangelo7 commented May 20, 2024

Hi Hannah,

Thank you for the updates on the proof of concept for SPARQL UPDATE. For the OpenCitations project, the update functionality has become a priority as we are looking to migrate our OpenCitations Meta data to Qlever. We would like to inquire about the current progress on this feature.

Specifically, we are interested in understanding if it is possible to add triples via SPARQL using the insert and delete commands mentioned. From the PR description, it is not clear if these operations can be performed directly via SPARQL or if they are limited to inserting or deleting one triple at a time through the API.

Could you please provide more details on this? Any updates or clarifications on the state of the SPARQL UPDATE feature would be greatly appreciated.
