New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented Several internal relations needed for QLeverUI autocomplete #112
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work, having everything needed for successful auto completion inside QLever is definitely important to us. Haven't looked at the whole PR yet but wanted to leave some preliminary comments
src/engine/QueryPlanner.cpp
Outdated
@@ -366,7 +367,7 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) const { | |||
// but not necessarily optimal. | |||
// TODO: Adjust so that the optimal place for the operation is found. | |||
if (pq._distinct) { | |||
QueryExecutionTree distinctTree(*final._qet.get()); | |||
QueryExecutionTree distinctTree(* final._qet.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be a space here, are you using clang-format
already? Also you should install this pre-commit hook to make sure you only commit correctly formatted code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did use the hook and clang-format, but it was version 3.8. Updated to 6.0 now.
src/engine/QueryPlanner.cpp
Outdated
@@ -431,7 +432,7 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) const { | |||
|
|||
final._qet.get()->setTextLimit(getTextLimit(pq._textLimit)); | |||
LOG(DEBUG) << "Done creating execution plan.\n"; | |||
return *final._qet.get(); | |||
return * final._qet.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there shouldn't be a space here either
src/engine/QueryExecutionTree.h
Outdated
case ResultTable::ResultType::ENTITY_TYPE: { | ||
switch (row[validIndices[j].first]) { | ||
case 0: | ||
os << "subject"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these really be string literals instead of a custom URI? These could even use the ql:
prefix to match the predicates and could be stored in the Vocabulary
just like any other value.
src/engine/QueryPlanner.cpp
Outdated
tree.setOperation(QueryExecutionTree::SCAN_STATS, statScan); | ||
seeds.push_back(plan); | ||
} | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block is the exact same as the one above, why do we need both?
@@ -571,7 +572,63 @@ vector<QueryPlanner::SubtreePlan> QueryPlanner::seedWithScansAndText( | |||
ad_semsearch::Exception::BAD_QUERY, | |||
"Triples should have at least one variable. Not the case in: " + | |||
node._triple.asString()); | |||
} else if (node._variables.size() == 1) { | |||
} | |||
if (node._triple._p == NUM_TRIPLES_PREDICATE || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are unrelated so I'm not sure if these should be in the same if
src/engine/QueryPlanner.cpp
Outdated
auto& tree = *plan._qet.get(); | ||
if (isVariable(node._triple._s)) { | ||
std::shared_ptr<Operation> statScan( | ||
new StatScan(_qec, statId, StatScan::ScanType::POS_BOUND_O)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the statId
is just a flag for telling the StatScan
what operation to actually do, right? Why not have different Operation
s for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StatScan will always perform a scan on the same file(s) regardless of the statId.
statId mainly affects the result type of the stat's column
LOCAL_VOCAB | ||
LOCAL_VOCAB, | ||
// An integer in range 0,1,2 | ||
ENTITY_TYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it makes sense to have a special entry type just for this. Why not just add the right URIs to the vocabulary during index construction and then just return the IDs like for any other KB entry?
src/engine/StatScan.cpp
Outdated
void StatScan::computePSOfreeS(ResultTable* result) const { | ||
result->_nofColumns = 2; | ||
result->_resultTypes.push_back(ResultTable::ResultType::KB); | ||
if (_statId == Id(0) || _statId == Id(2)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the different types of stats share enough code to warrant being in the same operation
src/index/Index.cpp
Outdated
metaData.setup(_totalVocabularySize, FullRelationMetaData::empty, | ||
fileName + MMAP_FILE_SUFFIX); | ||
} | ||
if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong formatting
src/index/Index.cpp
Outdated
@@ -80,6 +81,18 @@ void Index::createFromFile(const string& filename, bool allPermutations) { | |||
// also perform unique for first permutation | |||
createPermutation<IndexMetaDataHmap>(&idTriples, Permutation::Pso, true); | |||
createPermutation<IndexMetaDataHmap>(&idTriples, Permutation::Pos); | |||
if (_entityStats) { | |||
ExtVec stats = computeEntityStats(idTriples); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might have some overlap with computeExpensiveStatistics()
in this PR
Also can you give a small summary why these relations need to be implemented as their own |
The relations do not need to be implemented as their own at all. I just separated them in order to keep the actual KB data "clean". I don't see a problem in storing them as normal triples if you think that would be the better alternative. |
@jbuerklin as you already know from our mail yesterday, we have discussed this again and believe your original plan of a special |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this does not work with a text index.( see comments)
- ql:num-triples counts the number of triples each entity occurs in - ql:num-occurrences counts the number of text records each entity occurs in - ql:entity-type returns for every entity wether it is a subject, predicate or object (or any combination of those. The relation is not functional)
Added predicates parse the pso pair index instead of the input nt/tsv file
Given that nothing happened in this pr since september last year, is this pr still active or can we close it? |
These relations are written to their own index files ".index.stats.pso" in order to cleanly separate them from the actual KB index
Both IndexBuilderMain and ServerMain need to be called with -s flag in order for this to work.