
Added the basics for detailed query processing time tracking. #187

Merged: 14 commits merged into ad-freiburg:master on Feb 4, 2019

Conversation

floriankramer (Member, Author):

I'm just creating a PR now to avoid duplicate work, and to get some early feedback on whether this is how we want our detailed query timing information to be structured (especially regarding caching).
The next step from my point of view would be to modify the computeResult methods of all operations to access their RuntimeInformation and fill in the missing data.
This implementation currently does not report information for the subtree of a cached result; instead it only shows the root node of that cached result and sets a flag indicating that the subtree was cached.

niklas88 (Member) left a comment:

Thanks, this already looks great. I like the direction this is going; it's pretty much what I imagined.

private:
std::string _descriptor;
ad_utility::HashMap<std::string, std::string> _details;
double _time;
niklas88 (Member):

Couldn't we use a std::chrono::duration here? I think that would be cleaner and aligns with the changes @joka921 made in #179. I'm not sure if you can cherry-pick his ad_utility::Timer changes as is though.
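
For illustration, a minimal sketch of what the suggested change could look like (the class is reduced to a placeholder; whether the duration type below matches the Timer changes in #179 is an assumption):

#include <chrono>
#include <string>

// Sketch only: the _time member typed with std::chrono instead of a raw double.
class RuntimeInformationSketch {
 private:
  std::string _descriptor;
  std::chrono::duration<double> _time{0};  // elapsed time, seconds as floating point
};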

floriankramer (Member, Author):

I'd suggest waiting until his PR is merged before we change this, to keep the history of PRs clean. If we really want the Timer changes now, I think creating a separate PR for that and merging it first would be cleaner.

@@ -71,6 +150,9 @@ class Operation {
// only at innermost failure of a recursive call
throw ad_semsearch::AbortException("WEIRD_EXCEPTION");
}
timer.stop();
niklas88 (Member):

Am I seeing this right: when an exception is thrown, the timer is not stopped?

floriankramer (Member, Author):

When an exception is thrown the timer is destroyed (it only exists in the local scope) and the runtimeInfo is not updated, so the time will be 0. My reasoning was that if something went wrong in this subtree the runtimeInfo would not be interesting, as the timing of the error is normally not important. We could, however, also set the time and a failure flag in the case of exceptions.
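
As a standalone illustration of that alternative (placeholder types, not the PR's Operation or Timer classes), the elapsed time could be recorded and a failure flag set even when the computation throws:

#include <chrono>
#include <map>
#include <string>

struct RuntimeInfoSketch {
  double timeSeconds = 0;
  std::map<std::string, std::string> details;
};

template <typename Compute>
void timedCompute(RuntimeInfoSketch* info, Compute&& compute) {
  auto start = std::chrono::steady_clock::now();
  auto recordTime = [&]() {
    info->timeSeconds =
        std::chrono::duration<double>(std::chrono::steady_clock::now() - start)
            .count();
  };
  try {
    compute();
  } catch (...) {
    recordTime();                      // record the time up to the failure
    info->details["failed"] = "true";  // mark this subtree as failed
    throw;                             // re-throw so the caller still aborts
  }
  recordTime();
}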

out << "\"was_chached\" : " << _wasCached << ", ";
out << "\"details\" : {";
auto it = _details.begin();
while (it != _details.end()) {
niklas88 (Member):

This could be a for() loop, making the it iterator local to the block so that we don't need the it2 name. Maybe that one could be a for loop too.

floriankramer (Member, Author):

I'm using a while loop right now to make it easier to detect when to print another comma. With a for loop it would either have to be of the form for (auto it = begin(); it != end(); ) { ...; ++it; ... }, or the iterator would need to be advanced by one whenever we check whether we are more than one element from the end (e.g. if (it + 1 != end()) { ... }).

niklas88 (Member):

I wouldn't mind if (it + 1 != end()) {}, but I understand your reasoning.
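
For reference, a sketch of the for-loop variant (not the PR's code, and using std::map as a stand-in for ad_utility::HashMap); since hash-map iterators are not random access, std::next(it) takes the place of it + 1:

#include <iostream>
#include <iterator>
#include <map>
#include <string>

void printDetails(std::ostream& out,
                  const std::map<std::string, std::string>& details) {
  out << "\"details\" : {";
  for (auto it = details.begin(); it != details.end(); ++it) {
    out << "\"" << it->first << "\" : \"" << it->second << "\"";
    if (std::next(it) != details.end()) {
      out << ", ";  // a comma between entries, but not after the last one
    }
  }
  out << "}";
}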

niklas88 (Member) left a comment:

I wonder if we can keep the const qualifiers if we have computeResult() return a std::pair<ResultTable, RuntimeInformation>.

const CompactStringVector<size_t, Id>& patterns,
const size_t subjectColumn) {
const CompactStringVector<size_t, Id>& patterns, const size_t subjectColumn,
RuntimeInformation& runtimeInfo) {
niklas88 (Member):

Hmm, the Google style guide actually forbids non-const references unless an API needs them; however, we haven't really enforced that and they are scattered around the code base.

floriankramer (Member, Author):

I've changed the parameter to be a pointer instead.

floriankramer (Member, Author):

If we want computeResult to be const, we could not have the RuntimeInformation as a member, unless we return a RuntimeInformation object from computeResult and then copy it over to the _runtimeInfo member of Operation. Otherwise, if computeResult were const, we would have to pass the RuntimeInformation in as a parameter, but that parameter would be a member of the object (the _runtimeInfo member), which would then make the call non-const (I think).
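
For what it's worth, a sketch of the pair-returning alternative with placeholder types (an illustration of the idea, not the PR's final design):

#include <utility>

struct ResultTableSketch {};
struct RuntimeInformationSketch {};

class OperationSketch {
 public:
  void getResult() {
    // computeResult() stays const because it no longer writes to a member;
    // the non-const caller stores the returned runtime information.
    auto [table, info] = computeResult();
    _runtimeInfo = std::move(info);
    (void)table;  // the real code would go on to use/cache the table
  }

 private:
  virtual std::pair<ResultTableSketch, RuntimeInformationSketch> computeResult()
      const = 0;
  RuntimeInformationSketch _runtimeInfo;
};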

floriankramer (Member, Author):

The current version should be feature complete now. I've implemented the RuntimeInformation for all operations and also made it part of the cache (that way it is not a member of the ResultTable, where it would only be meaningful if the ResultTable were part of the cache), so that cached results still carry the information about their original operations and execution time. Beyond your comments so far, are there any more features we want for this PR?
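
Roughly, storing the runtime information alongside the result in the cache could look like this (placeholder types, not the actual cache classes):

#include <memory>

struct ResultTableSketch {};
struct RuntimeInformationSketch {};

// A cache entry that carries the runtime information of the operation that
// originally produced the result, so cached results keep their timing data.
struct CacheValueSketch {
  std::shared_ptr<ResultTableSketch> resultTable;
  RuntimeInformationSketch runtimeInfo;
};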

niklas88 (Member) commented Jan 30, 2019:

@floriankramer I don't think we need more features for this PR; I'll have a look in a minute.
That said, one thing I'm taking another look at is the info from the Pattern Trick; in the last meeting we had some doubt whether the conceptual costs are correct at the moment. Also, great idea adding them to the RuntimeInformation. I think this will be a really, really useful tool.

niklas88 (Member) left a comment:

Great work as always, thank you! Postponing the merge until I've got this tested on some of our instances.

niklas88 (Member) commented Jan 30, 2019:

@floriankramer it's running on wikidata-full and this is pretty darn awesome. One thing I noticed: maybe we should output two times, just for convenience (the second one could be computed from the current output):
a) total_time (currently printed) and b) operation_time, which is the total_time minus the sum of the times of the children. This would allow us to immediately see how much time the operation itself took. What do you think?
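
As a sketch of that derivation (placeholder struct, not the PR's class):

#include <vector>

struct RuntimeInfoNode {
  double totalTime = 0;                   // seconds, includes the children
  std::vector<RuntimeInfoNode> children;  // runtime info of the child operations
};

// operation_time = total_time minus the time already accounted for by the children.
double operationTime(const RuntimeInfoNode& node) {
  double childTime = 0;
  for (const auto& child : node.children) {
    childTime += child.totalTime;
  }
  return node.totalTime - childTime;
}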

niklas88 (Member) commented Jan 30, 2019:

@floriankramer I've added a commit for the operation_time and it is running on wikidata-full; however, something seems wrong. For example, in the Mountains > 8000 m query from our UI I get a negative operation_time. It looks okay on other queries though.

floriankramer (Member, Author):

The numbers can be negative for cached results, as the time of the cached result itself is only the time of the cache lookup, while the time of the subresults is still their old computation time. I've pushed a fix that ignores the children's time when computing the operation_time for a cached RuntimeInformation.
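
Extending the placeholder from the earlier sketch with a cached flag, the described fix amounts to something like this (again only an illustration):

#include <vector>

struct RuntimeInfoNode {
  double totalTime = 0;                   // for a cached node: just the cache-lookup time
  bool wasCached = false;
  std::vector<RuntimeInfoNode> children;  // cached nodes keep their original child times
};

double operationTime(const RuntimeInfoNode& node) {
  // The children of a cached node still report their original computation
  // times, so subtracting them would yield a negative operation_time.
  if (node.wasCached) {
    return node.totalTime;
  }
  double childTime = 0;
  for (const auto& child : node.children) {
    childTime += child.totalTime;
  }
  return node.totalTime - childTime;
}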

niklas88 (Member):

@floriankramer nice find, however I think that's not it for the following query:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX psn: <http://www.wikidata.org/prop/statement/value-normalized/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subj ?label ?coord ?elev WHERE
{ 
  ?subj wdt:P31 wd:Q8502 .
  ?subj p:P2044 ?elev_s .
  ?elev_s psn:P2044 ?elev_v .
  ?elev_v wikibase:quantityAmount ?elev .
  ?subj wdt:P625 ?coord .
  ?subj rdfs:label ?label .

  FILTER(?elev > 8000.0) .
  FILTER langMatches(lang(?label), "en") .  
}
ORDER BY DESC(?elev)

I'm getting the following; note the first join:

OrderBy on DESC(?elev)
total_time: 0.446333s
operation_time: 0.000164986s
cached: false

  Join on ?subj
  total_time: 0.446168s
  operation_time: -0.001973s
  cached: false

    Join on ?subj
    total_time: 0.445591s
    operation_time: 0.00822499s
    cached: false

      Join on ?subj
      total_time: 0.385394s
      operation_time: 0.000215009s
      cached: false

        IndexScan ?subj @en@<http://www.w3.org/2000/01/rdf-schema#label> ?label
        total_time: 0.307385s
        operation_time: 0.307385s
        cached: false

        Sort on ASC(?subj)
        total_time: 0.077794s
        operation_time: 0.000105999s
        cached: false

          Join on ?elev_s
          total_time: 0.077688s
          operation_time: 0.000151003s
          cached: false

            IndexScan ?subj <http://www.wikidata.org/prop/P2044> ?elev_s
            total_time: 0.011821s
            operation_time: 0.011821s
            cached: false

            Sort on ASC(?elev_s)
            total_time: 0.065716s
            operation_time: 9.60007e-05s
            cached: false

              FILTER ?elev > 8000.0
              total_time: 0.06562s
              operation_time: 0.00285999s
              cached: false

                Join on ?elev_v
                total_time: 0.06276s
                operation_time: 0.044705s
                cached: false

                  IndexScan ?elev_s <http://www.wikidata.org/prop/statement/value-normalized/P2044> ?elev_v
                  total_time: 0.011871s
                  operation_time: 0.011871s
                  cached: false

                  IndexScan ?elev_v <http://wikiba.se/ontology#quantityAmount> ?elev
                  total_time: 0.006184s
                  operation_time: 0.006184s
                  cached: false

      IndexScan ?subj <http://www.wikidata.org/prop/direct/P625> ?coord
      total_time: 0.051972s
      operation_time: 0.051972s
      cached: false

    IndexScan ?subj <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q8502>
    total_time: 0.00255s
    operation_time: 0.00255s
    cached: false

floriankramer (Member, Author):

I'll have to look into that tomorrow. If I had to guess, though, I would say it is connected to the early execution of IndexScans that only return one column. That might mess with the cached flag and lead to the time being that of an index computation, while the parent operation (the join) actually did a cache read, which would reduce the parent operation's overall time.

floriankramer (Member, Author):

The latest commit should fix the problem with negative operation times. The problem was that the IndexScan was computed early, as it only returns a single column. The time reported in the RuntimeInformation of that IndexScan was then that of the first execution, which did an actual scan, while the time measured by the join was that of a cache access. The latest commit changes the way IndexScans handle their RuntimeInformation, making them store the initial runtime as a detail and then reset their time, which should lead to the cached time being used.
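
In rough pseudocode, inside the IndexScan's result computation (names follow the PR's vocabulary, scanTime is a placeholder for the measured scan time, and the exact code differs):

// Only on the first, early execution of the one-column IndexScan: keep the
// real scan time as a detail and reset the node's own time, so that later
// cache hits report only the (much smaller) lookup time.
if (firstRun) {
  runtimeInfo.addDetail("InitialScanTime", std::to_string(scanTime));
  runtimeInfo.setTime(0);
}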

niklas88 (Member) commented Jan 31, 2019:

@floriankramer this is running on wikidata-full now and the times seem correct. However, it looks like the "InitialScanTime" either isn't accessible in the cached operation or isn't printed. This is a minor detail though. I think the problem is that for the IndexScan the RuntimeInformation might not be stored in the cache.

By the way, we shouldn't mess with the server after 16:00 because Hannah presents it in Dagstuhl.

The performance digest is much nicer to read and provides better
information. Alternatively we could also only do this for "Using Key:"
niklas88 (Member) commented Feb 1, 2019:

OK, I found another issue: the strings aren't escaped in RuntimeInformation::toJson(). I've fixed that.
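
For reference, a minimal escaping helper in the spirit of that fix (a sketch, not the actual implementation in RuntimeInformation::toJson()):

#include <string>

std::string escapeJsonString(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:   out += c;
    }
  }
  return out;
}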

// runtime information time to ensure the resulting runtime
// information trees operation runtimes are actually correct.
if (firstRun) {
runtimeInfo.addDetail("IntialScanTime",
niklas88 (Member):
@floriankramer ok, this doesn't work as is. The problem is that in getResult() the newResult->_runtimeInfo is set by copying, and only after that is this detail added. Later, during the actual query, we then retrieve the old copy which doesn't have the detail in it.

niklas88 (Member):

Maybe we should only do the runtimeInfo.setTime(0) below, and then, when taking a RuntimeInformation from the cache, always save the original times in the details?

floriankramer (Member, Author):

That's a good idea.
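
A sketch of that idea with placeholder names (the real code lives in the cache-lookup path): when a RuntimeInformation is taken from the cache, its original time is preserved as a detail before the node is relabeled with the cache-lookup time.

#include <map>
#include <string>

struct RuntimeInfoSketch {
  double timeSeconds = 0;
  bool wasCached = false;
  std::map<std::string, std::string> details;
};

void markRetrievedFromCache(RuntimeInfoSketch* info, double cacheLookupSeconds) {
  // Keep the original computation time visible in the details...
  info->details["original_total_time"] = std::to_string(info->timeSeconds);
  // ...and report only the cache-lookup time for this (now cached) node.
  info->wasCached = true;
  info->timeSeconds = cacheLookupSeconds;
}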

for all Operations not just for IndexScans
niklas88 merged commit d8bf511 into ad-freiburg:master on Feb 4, 2019