
Added the basics for detailed query processing time tracking. #187

Merged: 14 commits merged into ad-freiburg:master on Feb 4, 2019

Conversation

floriankramer (Member, Author):

I'm just creating a PR now to avoid duplicate work, and to get some early feedback on whether this is how we want our detailed query timing information to be structured (especially regarding caching).
The next step from my point of view would be to modify the computeResult methods of all operations to access their RuntimeInformation and fill in the missing data.
This implementation currently does not report information for the subtree of a cached result; instead it only shows the root node of that cached result and sets a flag indicating that the subtree was cached.

niklas88 (Member) left a comment:

Thanks, this already looks great. I like the direction this is going; it's pretty much what I imagined.

private:
std::string _descriptor;
ad_utility::HashMap<std::string, std::string> _details;
double _time;
niklas88 (Member):

Couldn't we use a std::chrono::duration here? I think that would be cleaner and aligns with the changes @joka921 made in #179. I'm not sure if you can cherry-pick his ad_utility::Timer changes as is though.
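
For illustration, a minimal sketch of what the suggested change could look like (the class is reduced to a placeholder; whether the duration type below matches the Timer changes in #179 is an assumption):

#include <chrono>
#include <string>

// Sketch only: the _time member typed with std::chrono instead of a raw double.
class RuntimeInformationSketch {
 private:
  std::string _descriptor;
  std::chrono::duration<double> _time{0};  // elapsed time, seconds as floating point
};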

floriankramer (Member, Author):

I'd suggest waiting until his PR is merged before we change this, to keep the history of PRs clean. If we really want the Timer changes now, I think creating a separate PR for that and merging it first would be cleaner.

@@ -71,6 +150,9 @@ class Operation {
// only at innermost failure of a recursive call
throw ad_semsearch::AbortException("WEIRD_EXCEPTION");
}
timer.stop();
niklas88 (Member):

Am I seeing this right: when an exception is thrown, the timer is not stopped?

floriankramer (Member, Author):

When an exception is thrown the timer is destroyed (it only exists in the local scope) and the runtimeInfo is not updated, so the time will be 0. My reasoning was that if something went wrong in this subtree the runtimeInfo would not be interesting, as the timing of the error is normally not important. We could, however, also set the time and a failure flag in the case of exceptions.
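
As a standalone illustration of that alternative (placeholder types, not the PR's Operation or Timer classes), the elapsed time could be recorded and a failure flag set even when the computation throws:

#include <chrono>
#include <map>
#include <string>

struct RuntimeInfoSketch {
  double timeSeconds = 0;
  std::map<std::string, std::string> details;
};

template <typename Compute>
void timedCompute(RuntimeInfoSketch* info, Compute&& compute) {
  auto start = std::chrono::steady_clock::now();
  auto recordTime = [&]() {
    info->timeSeconds =
        std::chrono::duration<double>(std::chrono::steady_clock::now() - start)
            .count();
  };
  try {
    compute();
  } catch (...) {
    recordTime();                      // record the time up to the failure
    info->details["failed"] = "true";  // mark this subtree as failed
    throw;                             // re-throw so the caller still aborts
  }
  recordTime();
}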

out << "\"was_chached\" : " << _wasCached << ", ";
out << "\"details\" : {";
auto it = _details.begin();
while (it != _details.end()) {
niklas88 (Member):

This could be a for() loop, making the it iterator local to the block so that we don't need the it2 name. Maybe that one could be a for loop too.

floriankramer (Member, Author):

I'm using a while loop right now to make it easier to detect when to print another comma. With a for loop it would either have to be of the form for (auto it = begin(); it != end(); ) { ...; ++it; ... }, or the iterator would need to be advanced by one whenever we check whether we are more than one element from the end (e.g. if (it + 1 != end()) { ... }).

niklas88 (Member):

I wouldn't mind if (it + 1 != end()) {}, but I understand your reasoning.
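
For reference, a sketch of the for-loop variant (not the PR's code, and using std::map as a stand-in for ad_utility::HashMap); since hash-map iterators are not random access, std::next(it) takes the place of it + 1:

#include <iostream>
#include <iterator>
#include <map>
#include <string>

void printDetails(std::ostream& out,
                  const std::map<std::string, std::string>& details) {
  out << "\"details\" : {";
  for (auto it = details.begin(); it != details.end(); ++it) {
    out << "\"" << it->first << "\" : \"" << it->second << "\"";
    if (std::next(it) != details.end()) {
      out << ", ";  // a comma between entries, but not after the last one
    }
  }
  out << "}";
}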

niklas88 (Member) left a comment:

I wonder if we can keep the const qualifiers if we have computeResult() return a std::pair<ResultTable, RuntimeInformation>.

const CompactStringVector<size_t, Id>& patterns,
const size_t subjectColumn) {
const CompactStringVector<size_t, Id>& patterns, const size_t subjectColumn,
RuntimeInformation& runtimeInfo) {
niklas88 (Member):

Hmm, the Google style guide actually forbids non-const references unless an API needs them; however, we haven't really enforced that and they are scattered around the code base.

floriankramer (Member, Author):

I've changed the parameter to be a pointer instead.

floriankramer (Member, Author):

If we want computeResult to be const, we could not have the RuntimeInformation as a member, unless we return a RuntimeInformation object from computeResult and then copy it over to the _runtimeInfo member of Operation. Otherwise, if computeResult were const, we would have to pass the RuntimeInformation in as a parameter, but that parameter would be a member of the object (the _runtimeInfo member), which would then make the call non-const (I think).
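
For what it's worth, a sketch of the pair-returning alternative with placeholder types (an illustration of the idea, not the PR's final design):

#include <utility>

struct ResultTableSketch {};
struct RuntimeInformationSketch {};

class OperationSketch {
 public:
  void getResult() {
    // computeResult() stays const because it no longer writes to a member;
    // the non-const caller stores the returned runtime information.
    auto [table, info] = computeResult();
    _runtimeInfo = std::move(info);
    (void)table;  // the real code would go on to use/cache the table
  }

 private:
  virtual std::pair<ResultTableSketch, RuntimeInformationSketch> computeResult()
      const = 0;
  RuntimeInformationSketch _runtimeInfo;
};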

floriankramer (Member, Author):

The current version should be feature complete now. I've implemented the RuntimeInformation for all operations and also made it part of the cache (that way it is not a member of the ResultTable, where it would only be meaningful if the ResultTable were part of the cache), so that cached results still carry the information about their original operations and execution time. Beyond your comments so far, are there any more features we want for this PR?
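
Roughly, storing the runtime information alongside the result in the cache could look like this (placeholder types, not the actual cache classes):

#include <memory>

struct ResultTableSketch {};
struct RuntimeInformationSketch {};

// A cache entry that carries the runtime information of the operation that
// originally produced the result, so cached results keep their timing data.
struct CacheValueSketch {
  std::shared_ptr<ResultTableSketch> resultTable;
  RuntimeInformationSketch runtimeInfo;
};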

niklas88 (Member) commented Jan 30, 2019:

@floriankramer I don't think we need more features for this PR; I'll have a look in a minute.
That said, one thing I'm taking another look at is the info from the Pattern Trick; in the last meeting we had some doubt whether the conceptual costs are correct at the moment. Also, great idea adding them to the RuntimeInformation. I think this will be a really, really useful tool.

niklas88 (Member) left a comment:

Great work as always, thank you! Postponing the merge until I've got this tested on some of our instances.

niklas88 (Member) commented Jan 30, 2019:

@floriankramer it's running on wikidata-full and this is pretty darn awesome. One thing I noticed: maybe we should output two times, just for convenience (the second one could be computed from the current output):
a) total_time (currently printed) and b) operation_time, which is the total_time minus the sum of the times of the children. This would allow us to immediately see how much time the operation itself took. What do you think?
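
As a sketch of that derivation (placeholder struct, not the PR's class):

#include <vector>

struct RuntimeInfoNode {
  double totalTime = 0;                   // seconds, includes the children
  std::vector<RuntimeInfoNode> children;  // runtime info of the child operations
};

// operation_time = total_time minus the time already accounted for by the children.
double operationTime(const RuntimeInfoNode& node) {
  double childTime = 0;
  for (const auto& child : node.children) {
    childTime += child.totalTime;
  }
  return node.totalTime - childTime;
}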

niklas88 (Member) commented Jan 30, 2019:

@floriankramer I've added a commit for the operation_time and it is running on wikidata-full; however, something seems wrong. For example, in the Mountains > 8000 m query from our UI I get a negative operation_time. It looks okay on other queries though.

floriankramer (Member, Author):

The numbers can be negative for cached results, as the time of the cached result itself is only the time of the cache lookup, while the time of the subresults is still their old computation time. I've pushed a fix that ignores the children's time when computing the operation_time for a cached RuntimeInformation.
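
Extending the placeholder from the earlier sketch with a cached flag, the described fix amounts to something like this (again only an illustration):

#include <vector>

struct RuntimeInfoNode {
  double totalTime = 0;                   // for a cached node: just the cache-lookup time
  bool wasCached = false;
  std::vector<RuntimeInfoNode> children;  // cached nodes keep their original child times
};

double operationTime(const RuntimeInfoNode& node) {
  // The children of a cached node still report their original computation
  // times, so subtracting them would yield a negative operation_time.
  if (node.wasCached) {
    return node.totalTime;
  }
  double childTime = 0;
  for (const auto& child : node.children) {
    childTime += child.totalTime;
  }
  return node.totalTime - childTime;
}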

niklas88 (Member):

@floriankramer nice find, however I think that's not it for the following query:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX psn: <http://www.wikidata.org/prop/statement/value-normalized/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subj ?label ?coord ?elev WHERE
{ 
  ?subj wdt:P31 wd:Q8502 .
  ?subj p:P2044 ?elev_s .
  ?elev_s psn:P2044 ?elev_v .
  ?elev_v wikibase:quantityAmount ?elev .
  ?subj wdt:P625 ?coord .
  ?subj rdfs:label ?label .

  FILTER(?elev > 8000.0) .
  FILTER langMatches(lang(?label), "en") .  
}
ORDER BY DESC(?elev)

I'm getting the following; note the first join:

OrderBy on DESC(?elev)
total_time: 0.446333s
operation_time: 0.000164986s
cached: false

  Join on ?subj
  total_time: 0.446168s
  operation_time: -0.001973s
  cached: false

    Join on ?subj
    total_time: 0.445591s
    operation_time: 0.00822499s
    cached: false

      Join on ?subj
      total_time: 0.385394s
      operation_time: 0.000215009s
      cached: false

        IndexScan ?subj @en@<http://www.w3.org/2000/01/rdf-schema#label> ?label
        total_time: 0.307385s
        operation_time: 0.307385s
        cached: false

        Sort on ASC(?subj)
        total_time: 0.077794s
        operation_time: 0.000105999s
        cached: false

          Join on ?elev_s
          total_time: 0.077688s
          operation_time: 0.000151003s
          cached: false

            IndexScan ?subj <http://www.wikidata.org/prop/P2044> ?elev_s
            total_time: 0.011821s
            operation_time: 0.011821s
            cached: false

            Sort on ASC(?elev_s)
            total_time: 0.065716s
            operation_time: 9.60007e-05s
            cached: false

              FILTER ?elev > 8000.0
              total_time: 0.06562s
              operation_time: 0.00285999s
              cached: false

                Join on ?elev_v
                total_time: 0.06276s
                operation_time: 0.044705s
                cached: false

                  IndexScan ?elev_s <http://www.wikidata.org/prop/statement/value-normalized/P2044> ?elev_v
                  total_time: 0.011871s
                  operation_time: 0.011871s
                  cached: false

                  IndexScan ?elev_v <http://wikiba.se/ontology#quantityAmount> ?elev
                  total_time: 0.006184s
                  operation_time: 0.006184s
                  cached: false

      IndexScan ?subj <http://www.wikidata.org/prop/direct/P625> ?coord
      total_time: 0.051972s
      operation_time: 0.051972s
      cached: false

    IndexScan ?subj <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q8502>
    total_time: 0.00255s
    operation_time: 0.00255s
    cached: false

floriankramer (Member, Author):

I'll have to look into that tomorrow. If I had to guess, though, I would say it is connected to the early execution of IndexScans that only return one column. That might mess with the cached flag and lead to the time being that of an index computation, while the parent operation (the join) actually did a cache read, which would reduce the parent operation's overall time.

floriankramer (Member, Author):

The latest commit should fix the problem with negative operation times. The problem was that the IndexScan was computed early, as it only returns a single column. The time reported in the RuntimeInformation of that IndexScan was then that of the first execution, which did an actual scan, while the time measured by the join was that of a cache access. The latest commit changes the way IndexScans handle their RuntimeInformation, making them store the initial runtime as a detail and then reset their time, which should lead to the cached time being used.
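
In rough pseudocode, inside the IndexScan's result computation (names follow the PR's vocabulary, scanTime is a placeholder for the measured scan time, and the exact code differs):

// Only on the first, early execution of the one-column IndexScan: keep the
// real scan time as a detail and reset the node's own time, so that later
// cache hits report only the (much smaller) lookup time.
if (firstRun) {
  runtimeInfo.addDetail("InitialScanTime", std::to_string(scanTime));
  runtimeInfo.setTime(0);
}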

niklas88 (Member) commented Jan 31, 2019:

@floriankramer this is running on wikidata-full now and the times seem correct. However, it looks like the "InitialScanTime" either isn't accessible in the cached operation or isn't printed. This is a minor detail though. I think the problem is that for the IndexScan the RuntimeInformation might not be stored in the cache.

By the way, we shouldn't mess with the server after 16:00 because Hannah presents it in Dagstuhl.

The performance digest is much nicer to read and provides better
information. Alternatively we could also only do this for "Using Key:"
niklas88 (Member) commented Feb 1, 2019:

OK, I found another issue: the strings aren't escaped in RuntimeInformation::toJson(). I've fixed that.
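
For reference, a minimal escaping helper in the spirit of that fix (a sketch, not the actual implementation in RuntimeInformation::toJson()):

#include <string>

std::string escapeJsonString(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:   out += c;
    }
  }
  return out;
}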

// runtime information time to ensure the resulting runtime
// information trees operation runtimes are actually correct.
if (firstRun) {
runtimeInfo.addDetail("IntialScanTime",
niklas88 (Member):
@floriankramer ok, this doesn't work as is. The problem is that in getResult() the newResult->_runtimeInfo is set by copying, and only after that is this detail added. Later, during the actual query, we then retrieve the old copy which doesn't have the detail in it.

niklas88 (Member):

Maybe we should only do the runtimeInfo.setTime(0) below, and then, when taking a RuntimeInformation from the cache, always save the original times in the details?

floriankramer (Member, Author):

That's a good idea.
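
A sketch of that idea with placeholder names (the real code lives in the cache-lookup path): when a RuntimeInformation is taken from the cache, its original time is preserved as a detail before the node is relabeled with the cache-lookup time.

#include <map>
#include <string>

struct RuntimeInfoSketch {
  double timeSeconds = 0;
  bool wasCached = false;
  std::map<std::string, std::string> details;
};

void markRetrievedFromCache(RuntimeInfoSketch* info, double cacheLookupSeconds) {
  // Keep the original computation time visible in the details...
  info->details["original_total_time"] = std::to_string(info->timeSeconds);
  // ...and report only the cache-lookup time for this (now cached) node.
  info->wasCached = true;
  info->timeSeconds = cacheLookupSeconds;
}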

for all Operations not just for IndexScans
niklas88 merged commit d8bf511 into ad-freiburg:master on Feb 4, 2019