-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speeding up the first phase of Index Building #302
Commits on Jan 27, 2020
-
Speeding up the first phase of Index Building
- Identified the writing of Triple elements to hash maps for deduplication as a bottleneck. - Speed this up by writing to multiple hash maps at once and merging them afterwards - Also concurrently pipelined all the other steps in the first index building phase. - That way the TurtleParsing now becomes the bottleneck (Teaser: This can be sped up by 30% using compile time regexes, so this might become even faster in a later PR). - This was done by Implementing an abstract, templated BatchedPipeline that abstracts over the creation and transformation of values in a pipeline that allows to control the degree of concurrency used on each level and between the different levels. - This pipeline infrastructure was heavily unit-tested to ensure its correctness since it internally uses quite some template magic. - This Commit also introduces the absl::flat_hash_map that is faster than the dense_hash_map used before. But in doubt I can also revert this change since in the meantime the parallelism is what helps us more than the faster hash maps. But I have thought, that we wanted to try out those anyway. - This can already be reviewed, especially the BatchedPipeline.h file. Before merging I would suggest merging the Unicode PR, because there is some merging work to be done (but not too much) between those PRs.
Configuration menu - View commit details
-
Copy full SHA for 68196f3 - Browse repository at this point
Copy the full SHA 68196f3View commit details -
Configuration menu - View commit details
-
Copy full SHA for a8d47fb - Browse repository at this point
Copy the full SHA a8d47fbView commit details -
Set the default Collation Level at IndexBuild time to IDENTICAL to en…
…sure total ordering.
Configuration menu - View commit details
-
Copy full SHA for e712839 - Browse repository at this point
Copy the full SHA e712839View commit details -
Included Unicode Normalization To ensure the reproducability of Index…
… Builds TODO: this also appears in another Branch, after merging, rewrite the history?
Configuration menu - View commit details
-
Copy full SHA for 8781f06 - Browse repository at this point
Copy the full SHA 8781f06View commit details -
Adressed the changes from Niklas' Review.
I would like to test the performance before merging this, as it should be at least somewhat fast.
Configuration menu - View commit details
-
Copy full SHA for 0f3dfbc - Browse repository at this point
Copy the full SHA 0f3dfbcView commit details -
We don't collate using ICU directly, but create SortKeys for every El…
…ement when we first see it - This boosts up the speed in our parallel pipeline.
Configuration menu - View commit details
-
Copy full SHA for 0e9934d - Browse repository at this point
Copy the full SHA 0e9934dView commit details -
replace google::sparsehash by absl::flat_hash_map
It is much faster, especially for large hash maps Replaced the default implementation of ad_utility::HashMap. We no longer need a default-key provider. absl strictly randomizes the iteration order of the hash map, so some unit tests had to be changed.
Configuration menu - View commit details
-
Copy full SHA for cc711ec - Browse repository at this point
Copy the full SHA cc711ecView commit details -
Only use 50M triples per Partial vocabulary for the wikidata build, t…
…o account for the overhead of the SortKeys.
Configuration menu - View commit details
-
Copy full SHA for d22e798 - Browse repository at this point
Copy the full SHA d22e798View commit details
Commits on Jan 28, 2020
-
Made the QueryPlanner return deterministic plans for the unit tests
- (mostly) In the unit tests there sometimes are Execution(Sub)Trees that are equivalent and return the same costEstimate(). The query planner must deterministically decide between them to make the current unit tests pass. - This is not the case with the newly integrated absl::flat_hash_map which purposely randomizes its iteration order. - with this commit, The QueryPlanner detects that it is in the unit test mode (no ExecutionContext assigned) and then deterministically chooses the alternative with the smaller cache key on equal cost. - Some of the unit tests had to be adapted to match this behavior.
Configuration menu - View commit details
-
Copy full SHA for 7a77acd - Browse repository at this point
Copy the full SHA 7a77acdView commit details -
Configuration menu - View commit details
-
Copy full SHA for 72e1aef - Browse repository at this point
Copy the full SHA 72e1aefView commit details
Commits on Jan 29, 2020
-
Configuration menu - View commit details
-
Copy full SHA for 2c68abc - Browse repository at this point
Copy the full SHA 2c68abcView commit details -
Configuration menu - View commit details
-
Copy full SHA for 0d32451 - Browse repository at this point
Copy the full SHA 0d32451View commit details
Commits on Jan 30, 2020
-
QueryPlannerTest is no longer implementation-dependent.
Previously the size estimate dummys for Execution trees used std::hash<string> which is implementation-defined. Thus the QueryPlannerTest failed on some platforms without indicating a bug in the QueryPlanner or QLever in general. Now we use deterministic estimates.
Configuration menu - View commit details
-
Copy full SHA for 0cbcf8b - Browse repository at this point
Copy the full SHA 0cbcf8bView commit details -
QueryPlannerTest is no longer implementation-dependent.
Previously the size estimate dummys for Execution trees used std::hash<string> which is implementation-defined. Thus the QueryPlannerTest failed on some platforms without indicating a bug in the QueryPlanner or QLever in general. Now we use deterministic estimates.
Configuration menu - View commit details
-
Copy full SHA for 3783f38 - Browse repository at this point
Copy the full SHA 3783f38View commit details