High-level overview of indexing #1056

Open
turbolent opened this issue Aug 18, 2023 · 0 comments

Hi there!

I came across QLever while researching implementations of RDF graph databases.

I read the "Knowledge-Base Index" section of the CIKM'17 paper and the "Engines and indexing" part of the 2023 book chapter, and I tried to read through src/index (especially IndexImpl). Even so, it is still unclear to me what the high-level process is for indexing an input file of unsorted triples (e.g. a Turtle or N-Triples file).

Could you please share the high-level steps that are performed, ideally without omitting optimizations such as parallel processing?

For example, after a triple is parsed from the input file, IDs are derived for its parts. How are the individual permutations then generated? Are all triples first written in ID representation to a temporary location (e.g. a temporary file or memory-mapped data structure), and then, for each permutation/index, re-sorted and re-read to build that permutation's index?
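
To make the question concrete, the process hypothesized above can be sketched as a toy, in-memory Python program. This is only an illustration of the guess in this issue, not QLever's actual implementation: the function names `assign_ids` and `build_permutations` are hypothetical, IDs are assigned in first-seen order purely for simplicity, and a plain list stands in for the temporary file or memory-mapped structure.

```python
from itertools import permutations


def assign_ids(triples):
    """Map each distinct term to an integer ID (first-seen order, an assumption)."""
    vocab = {}
    id_triples = []
    for triple in triples:
        # setdefault returns the existing ID or stores the next free one.
        id_triples.append(tuple(vocab.setdefault(term, len(vocab)) for term in triple))
    return vocab, id_triples


def build_permutations(id_triples):
    """Re-sort the same ID triples once per permutation (SPO, SOP, PSO, ...)."""
    index = {}
    for order in permutations(range(3)):
        name = "".join("SPO"[i] for i in order)
        # Triples stay in (s, p, o) layout; only the sort key is permuted.
        index[name] = sorted(id_triples, key=lambda t, o=order: tuple(t[i] for i in o))
    return index


# Hypothetical unsorted input, already parsed into (subject, predicate, object).
triples = [("b", "p", "c"), ("a", "q", "b"), ("a", "p", "c")]
vocab, id_triples = assign_ids(triples)
index = build_permutations(id_triples)
```

In this sketch every permutation requires a full re-sort of the whole ID-triple list, which is exactly the cost the question asks about. For inputs larger than memory, the in-memory `sorted` call would presumably have to be replaced by an external (on-disk) sort.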

Such a high-level explanation might be useful as developer documentation and could help onboard new contributors.
