New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pattern Trick for Objects #366
base: master
Are you sure you want to change the base?
Pattern Trick for Objects #366
Conversation
7264422
to
945b7b8
Compare
TODO:
|
I just realized the following phenomenon regarding the efficiency of the "normal" pattern trick + I assume this also applies for the new code. Namely, consider the following query on Fbeasy:
This query is easy using patterns. For each entity, we just need to compute the size of its pattern. We do not even have to materialized the patterns for that. However, that is what seems to happen and hence the query takes very long, even on Fbeasy. It's an important query because it gives a natural way to order entities for any knowledge base., without assuming that any other predicate exists that is a good measure of popularity |
b95592d
to
ab7b49f
Compare
* When an input to the HasPredicateScan from a subquery has no predicates associated with it the result computation would terminate early. * The full scan always used the subject metadata to acquire its output size
@hannahbast Those types of queries should now be supported in this pr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I like this very much. There are some general remarks:
-
This PR is too long. It took me 3 days to review, and contains three seperate features (Objects, Counting, Compression) that each are non-trivial. Please split up features in the future to make the review process more interactive.
-
The general functionality is fine, I had some remarks on software architecture.
-
For some reason you suddenly switched to
snake_case
whereas the rest of QLever usescamelCase
. I think we should stick with a naming scheme for consistency. -
@hannahbast suggested keeping the original
ql:has-predicate
like you did, but additionally adding the aliasql:subject-has-predicate
to match yourql:object-has-predicate
for consistency.
If you have any questions or remarks, don't hesitate to contact me via comments here or telephone.
- num_cols: 2 | ||
- selected: ["?entity", "?count"] | ||
- contains_row: ["<Australia>", "5"] | ||
- contains_row: ["<Astronomer>", "4"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also like an e2e test for the ql:object-has-predicate
where the ?entity is already constrained (e.g. Geographer-Object-Has-Predicate)
_predicateVarName("predicate"), | ||
_countVarName("count"), | ||
_count_for(CountType::SUBJECT) {} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer setting the defaults directly with the member definition, it is error-prone to always manually specify the defaults for members that are not initialized by constructor arguments. E.G the _subjectColumnIndex
Always seems to be 0 unless it is not specified.
_predicateVarName("predicate"), | ||
_countVarName("count"), | ||
_count_for(CountType::SUBJECT) {} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
_predicateVarName("predicate"), | ||
_countVarName("count"), | ||
_count_for(CountType::SUBJECT) {} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
|
||
// _____________________________________________________________________________ | ||
void EntityCountPredicates::setCountFor(CountType count_for) { | ||
_count_for = count_for; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
countFor (camelCase)
|
||
// This is not inside a templated method as the PSO metadata is based upon | ||
// HashMaps which need to be treated differently | ||
TripleVec::bufreader_type reader(*vocabData->idTriples); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment, and I don't think it is necessary. We don't deal with the PSO metadata here at all.
Id langPredLowerBound, Id langPredUpperBound, IndexMetaDataHmap& meta_data, | ||
const std::string& filename_base) { | ||
// This is not inside a templated method as the PSO metadata is based upon | ||
// HashMaps which need to be treated differently |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean: We only use this with PSO and PSO is always MetaDataHmap and thus we need no template?
I think this can be left out, it automatically breaks the compilation if we change this.
The only important INFO is that meta_data
has to be sorted by the P column.
for (size_t i = 0; i < _predicate_local_to_global_ids.size(); ++i) { | ||
_predicate_global_to_local_ids.try_emplace( | ||
_predicate_local_to_global_ids[i], i); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate code block here, can be removed.
_predicate_global_to_local_ids, &_object_meta_data, _maxNumPatterns, | ||
langPredLowerBound, langPredUpperBound, meta_data, file); | ||
|
||
_initialized = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this_initialized
variable, is it checked later on?
It also gets true if we are halfway initialized (e.g. Objects but no subjects)
"The requested feature requires a loaded patterns file (" | ||
"do not specify the --no-patterns option for this to work)"); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only skimmed this last part, I assume it is mostly a copy of the original pattern creation with adapted templated types.
In general we should use a proper symmetric serialization mechanism in the future to avoid mistakes with the manual read and write calls.
This pr will do two things: