Implemented several internal relations needed for QLeverUI autocomplete #112

Closed
wants to merge 10 commits into from

jbuerklin
Contributor

  • ql:num-triples counts the number of triples each entity occurs in
  • ql:num-occurrences counts the number of text records each entity occurs in
  • ql:entity-type returns, for each entity, whether it is a subject, predicate or object (or any combination of those; the relation is not functional)

These relations are written to their own index files (".index.stats.pso") to cleanly separate them from the actual KB index.
Both IndexBuilderMain and ServerMain need to be called with the -s flag for this to work.
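To make the semantics of the three relations concrete, here is a minimal sketch, assuming a hypothetical in-memory representation where an ID is a plain `uint64_t`, a triple is three IDs, and a text record is the list of entity IDs it mentions. The names `IdTriple`, `numTriples`, `numOccurrences` and `entityTypes` are illustrative only and are not QLever's API; the role codes 0/1/2 follow the `ENTITY_TYPE` comment ("An integer in range 0,1,2") in this PR's diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical representation: plain integer IDs, triples as {s, p, o}.
using Id = std::uint64_t;
using IdTriple = std::array<Id, 3>;

// ql:num-triples: number of triples each entity occurs in. An entity that
// appears twice in the same triple (e.g. as subject and object) counts once.
std::map<Id, std::size_t> numTriples(const std::vector<IdTriple>& triples) {
  std::map<Id, std::size_t> counts;
  for (const IdTriple& t : triples) {
    const std::set<Id> distinct(t.begin(), t.end());
    for (Id e : distinct) ++counts[e];
  }
  return counts;
}

// ql:num-occurrences: number of text records each entity occurs in,
// where each record is given as the list of entity IDs it mentions.
std::map<Id, std::size_t> numOccurrences(
    const std::vector<std::vector<Id>>& records) {
  std::map<Id, std::size_t> counts;
  for (const std::vector<Id>& record : records) {
    const std::set<Id> distinct(record.begin(), record.end());
    for (Id e : distinct) ++counts[e];
  }
  return counts;
}

// ql:entity-type: one (entity, role) row per role an entity plays, with
// 0 = subject, 1 = predicate, 2 = object. The relation is not functional,
// so one entity can appear in up to three rows.
std::set<std::pair<Id, int>> entityTypes(const std::vector<IdTriple>& triples) {
  std::set<std::pair<Id, int>> rows;
  for (const IdTriple& t : triples) {
    for (std::size_t role = 0; role < t.size(); ++role) {
      rows.insert({t[role], static_cast<int>(role)});
    }
  }
  return rows;
}
```

Because entity-type yields a set of (entity, role) rows rather than a single value per entity, a query like `?x ql:entity-type ?t` can return several rows for the same `?x`.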

@niklas88 niklas88 self-requested a review August 27, 2018 10:11
Member

@niklas88 niklas88 left a comment

Thanks for your work; having everything needed for successful autocompletion inside QLever is definitely important to us. I haven't looked at the whole PR yet, but wanted to leave some preliminary comments.

@@ -366,7 +367,7 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) const {
// but not necessarily optimal.
// TODO: Adjust so that the optimal place for the operation is found.
if (pq._distinct) {
- QueryExecutionTree distinctTree(*final._qet.get());
+ QueryExecutionTree distinctTree(* final._qet.get());
Member

There shouldn't be a space here. Are you using clang-format already? Also, you should install this pre-commit hook to make sure you only commit correctly formatted code.

Contributor Author

I did use the hook and clang-format, but it was version 3.8. Updated to 6.0 now.

@@ -431,7 +432,7 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) const {

final._qet.get()->setTextLimit(getTextLimit(pq._textLimit));
LOG(DEBUG) << "Done creating execution plan.\n";
- return *final._qet.get();
+ return * final._qet.get();
Member

There shouldn't be a space here either.

case ResultTable::ResultType::ENTITY_TYPE: {
switch (row[validIndices[j].first]) {
case 0:
os << "subject";
Member

Should these really be string literals instead of a custom URI? These could even use the ql: prefix to match the predicates and could be stored in the Vocabulary just like any other value.

tree.setOperation(QueryExecutionTree::SCAN_STATS, statScan);
seeds.push_back(plan);
}
{
Member

This block is exactly the same as the one above; why do we need both?

@@ -571,7 +572,63 @@ vector<QueryPlanner::SubtreePlan> QueryPlanner::seedWithScansAndText(
ad_semsearch::Exception::BAD_QUERY,
"Triples should have at least one variable. Not the case in: " +
node._triple.asString());
} else if (node._variables.size() == 1) {
}
if (node._triple._p == NUM_TRIPLES_PREDICATE ||
Member

These are unrelated, so I'm not sure they should be in the same if statement.

auto& tree = *plan._qet.get();
if (isVariable(node._triple._s)) {
std::shared_ptr<Operation> statScan(
new StatScan(_qec, statId, StatScan::ScanType::POS_BOUND_O));
Member

So the statId is just a flag for telling the StatScan what operation to actually do, right? Why not have different Operations for this?

Contributor Author

StatScan always performs a scan on the same file(s), regardless of the statId. The statId mainly affects the result type of the stat's column.

- LOCAL_VOCAB
+ LOCAL_VOCAB,
+ // An integer in range 0,1,2
+ ENTITY_TYPE
Member

I'm not sure it makes sense to have a special entry type just for this. Why not just add the right URIs to the vocabulary during index construction and then just return the IDs like for any other KB entry?
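A rough sketch of the suggested alternative, assuming a hypothetical vocabulary that is a sorted, deduplicated vector of strings whose position is the Id. The role IRIs (`ql:subject`, `ql:predicate`, `ql:object`) and the helpers `addRoleIris`, `idOf` and `roleId` are made up for illustration and are not part of QLever.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Id = std::uint64_t;

// Hypothetical IRIs for the three roles (actual names would be up to the project).
const std::array<std::string, 3> kRoleIris = {"ql:subject", "ql:predicate",
                                              "ql:object"};

// Index construction: add the role IRIs to the vocabulary before Ids are
// assigned, then keep it sorted and free of duplicates.
void addRoleIris(std::vector<std::string>& vocab) {
  vocab.insert(vocab.end(), kRoleIris.begin(), kRoleIris.end());
  std::sort(vocab.begin(), vocab.end());
  vocab.erase(std::unique(vocab.begin(), vocab.end()), vocab.end());
}

// Look up a word's Id by binary search in the sorted vocabulary.
Id idOf(const std::vector<std::string>& vocab, const std::string& word) {
  const auto it = std::lower_bound(vocab.begin(), vocab.end(), word);
  assert(it != vocab.end() && *it == word);
  return static_cast<Id>(it - vocab.begin());
}

// Query time: the entity-type column holds ordinary KB Ids
// (role 0 = subject, 1 = predicate, 2 = object).
Id roleId(const std::vector<std::string>& vocab, int role) {
  return idOf(vocab, kRoleIris.at(static_cast<std::size_t>(role)));
}
```

The IRIs have to be added before Ids are assigned, because inserting into a sorted vocabulary shifts the Id of every later word. With that done, the entity-type column can be printed like any other KB column, without a dedicated result type.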

void StatScan::computePSOfreeS(ResultTable* result) const {
result->_nofColumns = 2;
result->_resultTypes.push_back(ResultTable::ResultType::KB);
if (_statId == Id(0) || _statId == Id(2)) {
Member

I'm not sure the different types of stats share enough code to warrant being in the same operation.

metaData.setup(_totalVocabularySize, FullRelationMetaData::empty,
fileName + MMAP_FILE_SUFFIX);
}
if
Member

Wrong formatting.

@@ -80,6 +81,18 @@ void Index::createFromFile(const string& filename, bool allPermutations) {
// also perform unique for first permutation
createPermutation<IndexMetaDataHmap>(&idTriples, Permutation::Pso, true);
createPermutation<IndexMetaDataHmap>(&idTriples, Permutation::Pos);
if (_entityStats) {
ExtVec stats = computeEntityStats(idTriples);
Member

I think this might have some overlap with computeExpensiveStatistics() in this PR

@niklas88
Member

Also, can you give a short summary of why these relations need to be implemented as their own Operations instead of simply being computed during index construction and stored as normal triples?

@niklas88 niklas88 added this to Needs review in QLever Aug 31, 2018
@jbuerklin
Contributor Author

The relations do not need to be implemented as their own Operations at all. I just separated them to keep the actual KB data "clean". I don't see a problem with storing them as normal triples if you think that would be the better alternative.
Doing so would probably also resolve most of the comments you made earlier.

@niklas88
Member

niklas88 commented Sep 5, 2018

@jbuerklin As you already know from our email yesterday, we discussed this again and believe that your original plan of a special Operation makes sense and is actually easier and less intrusive than adding the predicates during index construction. We also think the code stays better isolated if you keep sharing one class for all 3 predicates. That said, can you briefly explain why you are switching on statId and why it is an Id? Also, why can't you reuse the ResultType values used by GROUP BY instead of adding a new type? The use of Id should be strictly restricted to actual entity IDs, and if it's just to tell the StatScan which operation is currently needed, I feel an enum would be better suited.
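The enum suggested here could look roughly like the following sketch. Everything in it is hypothetical: `StatType`, `statTypeFor` and `statColumnType` are illustrative names; `ResultType` is a local stand-in for `ResultTable::ResultType`, with `VERBATIM` assumed (not verified against the code) to be the type for plain integer values; the predicate strings are the prefixed names from the PR description; and mapping ql:entity-type to `KB` assumes the role IRIs are stored in the vocabulary, as proposed earlier in the review.

```cpp
#include <cassert>
#include <string>

// Hypothetical replacement for the Id-typed statId.
enum class StatType { NumTriples, NumOccurrences, EntityType };

// Hypothetical stand-in for ResultTable::ResultType (subset only).
enum class ResultType { KB, VERBATIM, LOCAL_VOCAB };

// Map a stats predicate to the statistic it requests.
StatType statTypeFor(const std::string& predicate) {
  if (predicate == "ql:num-triples") return StatType::NumTriples;
  if (predicate == "ql:num-occurrences") return StatType::NumOccurrences;
  assert(predicate == "ql:entity-type");
  return StatType::EntityType;
}

// Result type of the stat column. Switching over an enum class lets the
// compiler (-Wswitch) warn when a new StatType is added but not handled.
ResultType statColumnType(StatType type) {
  switch (type) {
    case StatType::NumTriples:
    case StatType::NumOccurrences:
      return ResultType::VERBATIM;  // plain counts
    case StatType::EntityType:
      return ResultType::KB;  // role IRIs looked up in the vocabulary
  }
  return ResultType::KB;  // unreachable; keeps -Wreturn-type quiet
}
```

Unlike an `Id`, the enum cannot be confused with an entity ID or compared against one by accident, and the set of supported statistics is spelled out in one place.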

Member

@joka921 joka921 left a comment

Currently this does not work with a text index (see comments).

src/index/Index.cpp (resolved review thread)
src/index/Index.cpp (outdated, resolved review thread)
jbuerklin and others added 10 commits September 22, 2018 14:37
- ql:num-triples counts the number of triples each entity occurs in
- ql:num-occurrences counts the number of text records each entity occurs in
- ql:entity-type returns for every entity whether it is a subject, predicate or object (or any combination of those; the relation is not functional)
Added predicates parse the pso pair index instead of the input nt/tsv file
@floriankramer
Member

Given that nothing has happened in this PR since September last year, is this PR still active, or can we close it?

@niklas88 niklas88 closed this Mar 14, 2019
QLever automation moved this from Review to Done Mar 14, 2019
Labels: None yet
Projects: QLever (Done)

4 participants