Implemented Several internal relations needed for QLeverUI autocomplete #112

jbuerklin · 2018-08-25T15:33:14Z

ql:num-triples counts the number of triples each entity occurs in.
ql:num-occurrences counts the number of text records each entity occurs in.
ql:entity-type returns for each entity whether it is a subject, predicate or object (or any combination of those). The relation is not functional.

These relations are written to their own index files (".index.stats.pso") in order to cleanly separate them from the actual KB index.
Both IndexBuilderMain and ServerMain need to be called with the -s flag for this to work.
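To make the semantics of ql:num-triples and ql:entity-type concrete, here is a minimal sketch of how both could be derived from a list of ID triples. It is not the code from this PR: the types `Triple`, `EntityRole`, `EntityStats` and the function `computeEntityStatsSketch` are hypothetical stand-ins, and counting a triple only once per distinct entity is an assumption about the intended meaning of "occurs in".

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins; QLever's real Id and triple types differ.
using Id = std::uint64_t;
using Triple = std::array<Id, 3>;  // {subject, predicate, object}

enum class EntityRole { Subject = 0, Predicate = 1, Object = 2 };

struct EntityStats {
  // ql:num-triples: entity -> number of triples it occurs in.
  std::map<Id, std::size_t> numTriples;
  // ql:entity-type: entity -> set of roles (not functional, so a set).
  std::map<Id, std::set<EntityRole>> roles;
};

EntityStats computeEntityStatsSketch(const std::vector<Triple>& triples) {
  EntityStats stats;
  for (const Triple& t : triples) {
    // Assumption: a triple counts once per distinct entity, even if the
    // entity fills several positions of the same triple.
    std::set<Id> distinct(t.begin(), t.end());
    for (Id id : distinct) {
      ++stats.numTriples[id];
    }
    stats.roles[t[0]].insert(EntityRole::Subject);
    stats.roles[t[1]].insert(EntityRole::Predicate);
    stats.roles[t[2]].insert(EntityRole::Object);
  }
  return stats;
}
```

An entity that appears as both subject and object ends up with two entries in `roles`, which is why the relation is described as not functional. ql:num-occurrences would need the text records as input and is left out of the sketch.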

niklas88

Thanks for your work; having everything needed for successful autocompletion inside QLever is definitely important to us. I haven't looked at the whole PR yet but wanted to leave some preliminary comments.

niklas88 · 2018-08-27T10:15:07Z

src/engine/QueryPlanner.cpp

@@ -366,7 +367,7 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) const {
  // but not necessarily optimal.
  // TODO: Adjust so that the optimal place for the operation is found.
  if (pq._distinct) {
-    QueryExecutionTree distinctTree(*final._qet.get());
+    QueryExecutionTree distinctTree(* final._qet.get());


There shouldn't be a space here. Are you using clang-format already? You should also install this pre-commit hook to make sure you only commit correctly formatted code.

jbuerklin · 2018-09-01

I did use the hook and clang-format, but it was version 3.8. I have updated to 6.0 now.

niklas88 · 2018-08-27T10:15:28Z

src/engine/QueryPlanner.cpp

@@ -431,7 +432,7 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) const {

  final._qet.get()->setTextLimit(getTextLimit(pq._textLimit));
  LOG(DEBUG) << "Done creating execution plan.\n";
-  return *final._qet.get();
+  return * final._qet.get();


There shouldn't be a space here either.

niklas88 · 2018-08-27T15:33:10Z

src/engine/QueryExecutionTree.h

+          case ResultTable::ResultType::ENTITY_TYPE: {
+            switch (row[validIndices[j].first]) {
+              case 0:
+                os << "subject";


Should these really be string literals instead of a custom URI? These could even use the ql: prefix to match the predicates and could be stored in the Vocabulary just like any other value.

niklas88 · 2018-08-27T15:37:52Z

src/engine/QueryPlanner.cpp

+            tree.setOperation(QueryExecutionTree::SCAN_STATS, statScan);
+            seeds.push_back(plan);
+          }
+          {


This block is exactly the same as the one above. Why do we need both?

niklas88 · 2018-08-27T15:38:22Z

src/engine/QueryPlanner.cpp

@@ -571,7 +572,63 @@ vector<QueryPlanner::SubtreePlan> QueryPlanner::seedWithScansAndText(
            ad_semsearch::Exception::BAD_QUERY,
            "Triples should have at least one variable. Not the case in: " +
                node._triple.asString());
-      } else if (node._variables.size() == 1) {
+      }
+      if (node._triple._p == NUM_TRIPLES_PREDICATE ||


These are unrelated, so I'm not sure they should be in the same if.

niklas88 · 2018-08-27T15:40:04Z

src/engine/QueryPlanner.cpp

+          auto& tree = *plan._qet.get();
+          if (isVariable(node._triple._s)) {
+            std::shared_ptr<Operation> statScan(
+                new StatScan(_qec, statId, StatScan::ScanType::POS_BOUND_O));


So the statId is just a flag for telling the StatScan what operation to actually do, right? Why not have different Operations for this?

jbuerklin · 2018-09-01

StatScan will always perform a scan on the same file(s) regardless of the statId.
The statId mainly affects the result type of the stat's column.

niklas88 · 2018-08-27T15:43:31Z

src/engine/ResultTable.h

-    LOCAL_VOCAB
+    LOCAL_VOCAB,
+    // An integer in range 0,1,2
+    ENTITY_TYPE


I'm not sure it makes sense to have a special entry type just for this. Why not just add the right URIs to the vocabulary during index construction and then just return the IDs like for any other KB entry?
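The suggestion above can be sketched as follows: the three entity types become ordinary vocabulary entries, so a result column can hold plain IDs and no special result type is needed. This is only an illustration under stated assumptions. The class `SketchVocabulary` and the names "ql:subject", "ql:predicate" and "ql:object" are hypothetical; neither the real Vocabulary interface nor the actual URIs are specified in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Id = std::size_t;  // hypothetical stand-in for QLever's Id

// Hypothetical names for the entity-type values (assumption).
const std::vector<std::string> kEntityTypeUris = {"ql:object", "ql:predicate",
                                                  "ql:subject"};

// A toy sorted vocabulary: the ID of a word is its position in sorted order.
class SketchVocabulary {
 public:
  explicit SketchVocabulary(std::vector<std::string> words)
      : words_(std::move(words)) {
    // Add the entity-type values at construction time, like any other word.
    words_.insert(words_.end(), kEntityTypeUris.begin(),
                  kEntityTypeUris.end());
    std::sort(words_.begin(), words_.end());
    words_.erase(std::unique(words_.begin(), words_.end()), words_.end());
  }

  // Binary search for a word; returns no value if the word is unknown.
  std::optional<Id> getId(const std::string& word) const {
    auto it = std::lower_bound(words_.begin(), words_.end(), word);
    if (it == words_.end() || *it != word) return std::nullopt;
    return static_cast<Id>(it - words_.begin());
  }

  const std::string& wordAt(Id id) const { return words_.at(id); }

 private:
  std::vector<std::string> words_;
};
```

With this layout the export code would resolve an entity-type ID through the same lookup as any other KB entry, instead of switching on the integers 0, 1 and 2.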

niklas88 · 2018-08-27T15:50:35Z

src/engine/StatScan.cpp

+void StatScan::computePSOfreeS(ResultTable* result) const {
+  result->_nofColumns = 2;
+  result->_resultTypes.push_back(ResultTable::ResultType::KB);
+  if (_statId == Id(0) || _statId == Id(2)) {


I'm not sure the different types of stats share enough code to warrant being in the same operation.

niklas88 · 2018-08-27T15:52:02Z

src/index/Index.cpp

-    metaData.setup(_totalVocabularySize, FullRelationMetaData::empty,
-                   fileName + MMAP_FILE_SUFFIX);
-  }
+  if


Wrong formatting.

niklas88 · 2018-08-27T16:00:46Z

src/index/Index.cpp

@@ -80,6 +81,18 @@ void Index::createFromFile(const string& filename, bool allPermutations) {
  // also perform unique for first permutation
  createPermutation<IndexMetaDataHmap>(&idTriples, Permutation::Pso, true);
  createPermutation<IndexMetaDataHmap>(&idTriples, Permutation::Pos);
+  if (_entityStats) {
+    ExtVec stats = computeEntityStats(idTriples);


I think this might have some overlap with computeExpensiveStatistics() in this PR.

niklas88 · 2018-08-29T07:16:53Z

Also, can you give a short summary of why these relations need to be implemented as their own Operations instead of simply being computed during index construction and stored as normal triples?

jbuerklin · 2018-09-01T12:53:57Z

The relations do not need to be implemented as their own Operations at all. I just separated them in order to keep the actual KB data "clean". I don't see a problem with storing them as normal triples if you think that would be the better alternative.
Doing so would probably also resolve most of the comments you made earlier.

niklas88 · 2018-09-05T09:15:43Z

@jbuerklin as you already know from our mail yesterday, we have discussed this again and believe your original plan of a special Operation makes sense and is actually easier and less intrusive than adding the predicates during index construction. We also think it keeps the code better isolated if you keep sharing one class for all 3 predicates. That said, can you give a short explanation of why you are switching on statId and why it is an Id? Also, why can't you reuse the ResultType fields used by GROUP BY instead of adding a new type? The use of Id should be strictly restricted to actual entity IDs. If it's just to tell the StatScan which operation is currently needed, I feel an enum would be better suited.
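The enum suggestion could look roughly like the sketch below. It is an illustration, not code from the PR: `StatKind`, `SketchResultType`, `statKindFromPredicate` and `resultTypeFor` are hypothetical names, and mapping the two count relations to a verbatim (plain number) result type is an assumption about what the existing ResultType fields would offer.

```cpp
#include <optional>
#include <string>

// Hypothetical replacement for the Id-valued statId flag.
enum class StatKind { NumTriples, NumOccurrences, EntityType };

// Hypothetical stand-in for ResultTable::ResultType; only KB and
// ENTITY_TYPE appear in this PR, VERBATIM is an assumption.
enum class SketchResultType { KB, VERBATIM, ENTITY_TYPE };

// Map a predicate from the query to the stat it selects, if any.
std::optional<StatKind> statKindFromPredicate(const std::string& predicate) {
  if (predicate == "ql:num-triples") return StatKind::NumTriples;
  if (predicate == "ql:num-occurrences") return StatKind::NumOccurrences;
  if (predicate == "ql:entity-type") return StatKind::EntityType;
  return std::nullopt;  // not one of the added predicates
}

// The stat kind only decides the result type of the stat column.
SketchResultType resultTypeFor(StatKind kind) {
  switch (kind) {
    case StatKind::NumTriples:
    case StatKind::NumOccurrences:
      return SketchResultType::VERBATIM;  // plain counts
    case StatKind::EntityType:
      return SketchResultType::ENTITY_TYPE;
  }
  return SketchResultType::KB;  // unreachable for valid enum values
}
```

Compared with comparisons such as `_statId == Id(0)`, a scoped enum lets the compiler warn about unhandled cases in the switch and keeps Id reserved for actual entity IDs.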

joka921

Currently this does not work with a text index (see comments).

src/index/Index.cpp


floriankramer · 2019-03-11T10:58:12Z

Given that nothing has happened in this PR since September last year, is it still active or can we close it?

niklas88 self-requested a review August 27, 2018 10:11

niklas88 reviewed Aug 27, 2018

niklas88 added this to Needs review in QLever Aug 31, 2018

joka921 requested changes Sep 16, 2018

jbuerklin and others added 10 commits September 22, 2018 14:37

- fixed IndexBuilderMain command line args

1f5445c

entity stats use createPermutationPair

7b27144

entity stats can be created without rebuilding the index

e2453bb

addEntityStats uses vocabMap.find instead of vocab.getId

0e2410b

renamed entityStats to addedPredicates, StatScan to AddedPredicateScan

059ea6d

Fixed JSON output for ql:entity-type

297bb8b

fixed failing computeAddedPredicatesTest

4087821

fixed JSON output for entity types

67cbace

Fixed the error with added predicates and text index

7b24faf

Added predicates parse the pso pair index instead of the input nt/tsv file

niklas88 closed this Mar 14, 2019

QLever automation moved this from Review to Done Mar 14, 2019
