Pattern Trick for Objects #366

Open
wants to merge 12 commits into
base: master
from

Conversation

floriankramer
Member

This PR will do two things:

  • Reduce the number of bytes used to represent predicate ids within patterns and the hasPredicate relation
  • Add an object based hasPredicate relation which also supports the pattern trick
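As a rough illustration of the first point: predicate ids can be stored more compactly by mapping each global id that actually occurs as a predicate to a small, dense local id. The sketch below is only an assumption about how such a mapping might look. The type widths (`GlobalId` as 64 bit, `LocalId` as 32 bit) and the class name `PredicateIdMap` are hypothetical and not taken from this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical type names and widths, chosen for this sketch only.
using GlobalId = uint64_t;  // full-width vocabulary id
using LocalId = uint32_t;   // compact id used inside patterns

class PredicateIdMap {
 public:
  // Return the local id of `global`, assigning the next free local id
  // if this predicate has not been seen before.
  LocalId getOrAdd(GlobalId global) {
    auto [it, inserted] = globalToLocal_.try_emplace(
        global, static_cast<LocalId>(localToGlobal_.size()));
    if (inserted) {
      localToGlobal_.push_back(global);
    }
    return it->second;
  }

  // Translate a local id back to the global id (throws if out of range).
  GlobalId toGlobal(LocalId local) const { return localToGlobal_.at(local); }

  std::size_t size() const { return localToGlobal_.size(); }

 private:
  std::vector<GlobalId> localToGlobal_;
  std::unordered_map<GlobalId, LocalId> globalToLocal_;
};
```

Since the number of distinct predicates is usually far smaller than the number of entities, the local ids fit into fewer bytes than the global ones.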

@floriankramer floriankramer force-pushed the reversed_pattern_trick branch 5 times, most recently from 7264422 to 945b7b8 Compare December 4, 2020 13:00
@floriankramer
Member Author

floriankramer commented Dec 4, 2020

TODO:

  • Ignore literals when generating the pattern trick for objects
  • Implement usage of the pattern trick for objects data
  • testing

@hannahbast
Member

I just realized the following phenomenon regarding the efficiency of the "normal" pattern trick, and I assume it also applies to the new code. Consider the following query on Fbeasy:

SELECT ?entity (COUNT(?predicate) AS ?count) WHERE {
  ?entity ql:has-predicate ?predicate
}
GROUP BY ?entity 
ORDER BY DESC(?count)

This query is easy using patterns. For each entity, we just need to compute the size of its pattern. We do not even have to materialize the patterns for that. However, materializing them is what seems to happen, and hence the query takes very long, even on Fbeasy.

It's an important query because it gives a natural way to order entities for any knowledge base, without assuming that any other predicate exists that is a good measure of popularity.
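The counting idea described above can be sketched as follows. This is a minimal sketch under stated assumptions, not QLever's actual code: the layout (`patterns` as a vector of predicate lists, `entityToPattern` as a dense entity-to-pattern-id array) and the function name `countPredicatesPerEntity` are hypothetical, and entities that are not covered by any pattern would need separate handling.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical types for this sketch.
using PatternId = uint32_t;
using Pattern = std::vector<uint32_t>;  // distinct predicate ids of a pattern

// Returns (entity, count) pairs ordered by descending count.
// The predicates of an entity are never expanded: only the size of the
// entity's pattern is looked up.
std::vector<std::pair<std::size_t, std::size_t>> countPredicatesPerEntity(
    const std::vector<Pattern>& patterns,
    const std::vector<PatternId>& entityToPattern) {
  // Compute each pattern's size once.
  std::vector<std::size_t> patternSize(patterns.size());
  for (std::size_t i = 0; i < patterns.size(); ++i) {
    patternSize[i] = patterns[i].size();
  }

  std::vector<std::pair<std::size_t, std::size_t>> result;
  result.reserve(entityToPattern.size());
  for (std::size_t e = 0; e < entityToPattern.size(); ++e) {
    result.emplace_back(e, patternSize[entityToPattern[e]]);
  }

  // ORDER BY DESC(?count); ties keep their entity order.
  std::stable_sort(
      result.begin(), result.end(),
      [](const auto& a, const auto& b) { return a.second > b.second; });
  return result;
}
```

The work per entity is a single array lookup, so the cost is dominated by the final sort rather than by the total number of (entity, predicate) pairs.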

@floriankramer
Member Author

@hannahbast Those types of queries should now be supported in this PR.

@floriankramer floriankramer marked this pull request as ready for review February 12, 2021 13:42
Member

@joka921 joka921 left a comment

In general I like this very much. There are some general remarks:

  • This PR is too long. It took me 3 days to review, and it contains three separate features (Objects, Counting, Compression) that are each non-trivial. Please split up features in the future to make the review process more interactive.

  • The general functionality is fine, I had some remarks on software architecture.

  • For some reason you suddenly switched to snake_case whereas the rest of QLever uses camelCase. I think we should stick with a naming scheme for consistency.

  • @hannahbast suggested keeping the original ql:has-predicate like you did, but additionally adding the alias ql:subject-has-predicate to match your ql:object-has-predicate for consistency.

If you have any questions or remarks, don't hesitate to contact me via comments here or by telephone.

- num_cols: 2
- selected: ["?entity", "?count"]
- contains_row: ["<Australia>", "5"]
- contains_row: ["<Astronomer>", "4"]
Member

I would also like an e2e test for ql:object-has-predicate where ?entity is already constrained (e.g. Geographer-Object-Has-Predicate).

_predicateVarName("predicate"),
_countVarName("count"),
_count_for(CountType::SUBJECT) {}

Member

I would prefer setting the defaults directly with the member definition. It is error-prone to manually specify the defaults in every constructor for members that are not initialized by constructor arguments. E.g., _subjectColumnIndex always seems to be 0 unless it is specified explicitly.
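A minimal sketch of the suggestion, using C++11 default member initializers. The class name `EntityCountPredicatesSketch`, the `OBJECT` enumerator, the accessors, and the single-argument constructor are hypothetical and only serve the illustration; the member names and default values are taken from the snippet under review (with `_countFor` in camelCase).

```cpp
#include <cstddef>
#include <string>

// SUBJECT appears in the reviewed code; OBJECT is an assumption.
enum class CountType { SUBJECT, OBJECT };

class EntityCountPredicatesSketch {
 public:
  // No initializer list needed: all defaults come from the declarations.
  EntityCountPredicatesSketch() = default;

  // A constructor only mentions the members it actually sets.
  explicit EntityCountPredicatesSketch(std::size_t subjectColumnIndex)
      : _subjectColumnIndex(subjectColumnIndex) {}

  std::size_t subjectColumnIndex() const { return _subjectColumnIndex; }
  const std::string& predicateVarName() const { return _predicateVarName; }
  const std::string& countVarName() const { return _countVarName; }
  CountType countFor() const { return _countFor; }

 private:
  // Defaults live next to the member definitions, so every constructor
  // picks them up automatically.
  std::size_t _subjectColumnIndex = 0;
  std::string _predicateVarName = "predicate";
  std::string _countVarName = "count";
  CountType _countFor = CountType::SUBJECT;
};
```

With this layout, adding a further constructor cannot silently leave a member without its default.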

_predicateVarName("predicate"),
_countVarName("count"),
_count_for(CountType::SUBJECT) {}

Member

Same here

_predicateVarName("predicate"),
_countVarName("count"),
_count_for(CountType::SUBJECT) {}

Member

Same here


// _____________________________________________________________________________
void EntityCountPredicates::setCountFor(CountType count_for) {
_count_for = count_for;
Member

countFor (camelCase)


// This is not inside a templated method as the PSO metadata is based upon
// HashMaps which need to be treated differently
TripleVec::bufreader_type reader(*vocabData->idTriples);
Member

I don't understand this comment, and I don't think it is necessary. We don't deal with the PSO metadata here at all.

Id langPredLowerBound, Id langPredUpperBound, IndexMetaDataHmap& meta_data,
const std::string& filename_base) {
// This is not inside a templated method as the PSO metadata is based upon
// HashMaps which need to be treated differently
Member

Do you mean: We only use this with PSO, PSO is always MetaDataHmap, and thus we need no template?
I think this can be left out; the compilation breaks automatically if we change this.
The only important information is that meta_data has to be sorted by the P column.

for (size_t i = 0; i < _predicate_local_to_global_ids.size(); ++i) {
_predicate_global_to_local_ids.try_emplace(
_predicate_local_to_global_ids[i], i);
}
Member

Duplicate code block here, can be removed.

_predicate_global_to_local_ids, &_object_meta_data, _maxNumPatterns,
langPredLowerBound, langPredUpperBound, meta_data, file);

_initialized = true;
Member

What is the purpose of this _initialized variable? Is it checked later on?
It also becomes true if we are only halfway initialized (e.g. objects but no subjects).

"The requested feature requires a loaded patterns file ("
"do not specify the --no-patterns option for this to work)");
}
}
Member

I only skimmed this last part; I assume it is mostly a copy of the original pattern creation with adapted templated types.
In general, we should use a proper symmetric serialization mechanism in the future to avoid mistakes with the manual read and write calls.
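One possible shape of such a symmetric mechanism, as a sketch only: a single `serialize` function template describes the field order once and is instantiated with either a writing or a reading archive, so the two directions cannot drift apart. All names here (`Writer`, `Reader`, `PatternFileHeader`, `serialize`) are hypothetical and not part of QLever, and the binary layout is simply the host's in-memory representation.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>
#include <vector>

// Writing archive: copies the raw bytes of a value to the stream.
struct Writer {
  std::ostream& out;
  template <typename T>
  void raw(T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "raw bytes only");
    out.write(reinterpret_cast<const char*>(&value), sizeof(T));
  }
};

// Reading archive: same interface, opposite direction.
struct Reader {
  std::istream& in;
  template <typename T>
  void raw(T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "raw bytes only");
    in.read(reinterpret_cast<char*>(&value), sizeof(T));
  }
};

// Hypothetical file header used only for this illustration.
struct PatternFileHeader {
  uint32_t version = 0;
  uint64_t numPatterns = 0;
  std::vector<uint32_t> predicateIds;
};

// The field order is written down exactly once and shared by both
// directions.
template <typename Archive>
void serialize(Archive& ar, PatternFileHeader& header) {
  ar.raw(header.version);
  ar.raw(header.numPatterns);
  uint64_t n = header.predicateIds.size();
  ar.raw(n);                       // written, or overwritten when reading
  header.predicateIds.resize(n);   // no-op when writing
  for (auto& id : header.predicateIds) {
    ar.raw(id);
  }
}
```

Calling `serialize` with a `Writer` on a `std::stringstream` and then with a `Reader` on the same stream round-trips the header; a production version would additionally check the stream state and validate the size read from the file.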


3 participants