High-level overview of indexing #1056

turbolent · 2023-08-18T02:39:30Z

Hi there!

I came across QLever while researching implementations of RDF graph databases.

I have read the "Knowledge-Base Index" section of the CIKM'17 paper and "Engines and indexing" in the 2023 book chapter, and I tried reading through src/index (especially IndexImpl), but it is still a bit unclear to me what the high-level process is for indexing an input file of unsorted triples (e.g. from a Turtle or N-Triples file).

Could you please share the high-level steps that are performed, ideally without omitting optimizations such as parallel processing?

For example, after parsing a triple from the input file, IDs are derived for its parts. How are the individual permutations generated? Are all triples first written in ID representation to a temporary location (e.g. a temporary file or a memory-mapped data structure), and then, for each permutation/index, is all the data in ID representation re-sorted and re-read/processed to create that permutation's index?
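To make the question concrete, the process guessed at above could be sketched roughly as follows. This is only an in-memory illustration of that guess, not QLever's actual implementation: the function names (`build_vocabulary`, `encode`, `build_permutations`) are hypothetical, the ID assignment by sorted term order is an assumption, and a real indexer would presumably use external sorting on disk rather than Python lists.

```python
from itertools import permutations


def build_vocabulary(triples):
    """Assign a dense integer ID to every distinct term (assumed: sorted order)."""
    terms = sorted({term for triple in triples for term in triple})
    return {term: term_id for term_id, term in enumerate(terms)}


def encode(triples, vocab):
    """Replace each term by its ID; this is the 'ID representation'."""
    return [tuple(vocab[term] for term in triple) for triple in triples]


def build_permutations(id_triples):
    """Re-sort the same ID triples once per ordering of (S, P, O)."""
    index = {}
    for order in permutations(range(3)):
        name = "".join("SPO"[i] for i in order)
        # Reorder the columns of every triple, then sort lexicographically.
        index[name] = sorted(tuple(t[i] for i in order) for t in id_triples)
    return index


# Unsorted input triples, as they might come out of a Turtle/N-Triples parser.
triples = [
    ("<b>", "<knows>", "<a>"),
    ("<a>", "<knows>", "<b>"),
    ("<a>", "<name>", '"Alice"'),
]

vocab = build_vocabulary(triples)
id_triples = encode(triples, vocab)
index = build_permutations(id_triples)

for name, rows in index.items():
    print(name, rows)
```

In this sketch every permutation is produced by a full re-sort of the same ID triples, which is exactly the part I am unsure about: whether the real indexer does something like this, or derives some permutations more cheaply from others.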

Such a high-level explanation might be useful as developer documentation and could help onboard new contributors.
