New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a FullRelationScan operation. #164
Added a FullRelationScan operation. #164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM though I did add some questions. Will be interesting to see what kind of performance we get from this.
// ____________________________________________________________ | ||
ConstIterator cend() const { return _map.end(); } | ||
|
||
// ____________________________________________________________ | ||
Iterator end() { return _map.end(); } | ||
|
||
// ____________________________________________________________ | ||
ConstIterator end() const { return _map.end(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already cend()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use a for loop to iterate through the meta data right now (for (auto e : meta)
) and that seems to only use begin
and end
, but I needed a constant version of those as I only had a reference to the constant metadata. std::vector
seems to also have a constant end
and a separate also constant cend
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about for (const auto& e: meta) {…}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niklas88 : the for (const auto...)
overload uses the const overload to begin
and end
.
So they are necessary for Florians use case.
@floriankramer : Niklas is right, MetaData should by all means be accessed in a const
way.
Btw: By a second thought I don't think you will have much fun with this implementation since iterating over the MetaData takes a very long time. I think storing the ids of all subjects, objects and predicates in a sorted way on disk would be much faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joka921 Is there currently any non constant access? The Index only returns a constant reference.
I tried running the current implementation on the wikidata full dataset on galera and aborted the query after it took more than a full minute, so we definitely need to precompute this, preferably during the index building.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@floriankramer I think @joka921 was only talking about the for (auto e: meta)
in your comment which didn't make the constness explicit.
@@ -215,12 +215,18 @@ class MetaDataWrapperHashMap { | |||
// __________________________________________________________________ | |||
Iterator begin() { return _map.begin(); } | |||
|
|||
// __________________________________________________________________ | |||
ConstIterator begin() const { return _map.begin(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use cbegin()
even though in std::vector
there is also a const begin()
see https://en.cppreference.com/w/cpp/container/vector/begin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ad_utility::HashMap
does not have a cbegin, and we can't add it using a simple using statement, (that leads to g++ complaining that there is no matching member).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, ok this was just style nitpicking anyway
return os.str(); | ||
} | ||
|
||
size_t FullRelationScan::getResultWidth() const { return 2; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct when there is not COUNT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The count is always returned by the operation. When it is not selected it is discarded during the json assembly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about when using this as a subquery e.g. to join it with wikibase:claim
?
src/engine/FullRelationScan.cpp
Outdated
std::to_string(static_cast<int>(type))); | ||
} | ||
|
||
for (const std::pair<size_t, const FullRelationMetaData&> elem : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these if
s do basically the same except for which meta data is used and using a different type for predicates. I think this could be handled in a templated method which just gets called with index.getXyzMeta()
and index.getNofXyz()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/index/Index.h
Outdated
return _psoMeta; | ||
} else { | ||
AD_THROW(ad_semsearch::Exception::CHECK_FAILED, | ||
"Can only use the spo metadta if all 6 permutations " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? I mean we still have _psoMetah
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just forgot to remove the check after copying the method. It's gone now.
45b702f
to
53111c3
Compare
Given the bad performance of this implementation on large datasets, and the possiblity of extracting similar data to what this provided using schema:name I'll close this pr for now. |
This implements a FullRelationScan opertion, which allows for queries of the form:
as we had discussed. The current implementation is based upon the
FullRelationMetadata
stored in the index. It does not do any optimizations (e.g. applying prefix filters when scanning the metadata or caching a prefix of the result and applyingLIMIT
early).The FullRelationScan is used if all of these three conditions are met by the query: