New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added the basics for detailed query processing time tracking. #187
Added the basics for detailed query processing time tracking. #187
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this already looks great, I like the direction this is going, pretty much what I imagined.
src/engine/Operation.h
Outdated
private: | ||
std::string _descriptor; | ||
ad_utility::HashMap<std::string, std::string> _details; | ||
double _time; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest waiting until his pr is merged before we change this to keep the history of pr's clean. If we really want the Timer changes now I think creating a separate pr for that and then merging that first would be cleaner.
@@ -71,6 +150,9 @@ class Operation { | |||
// only at innermost failure of a recursive call | |||
throw ad_semsearch::AbortException("WEIRD_EXCEPTION"); | |||
} | |||
timer.stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So am I seeing this right, when an exception is thrown the timer is not stopped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When an exception is thrown the timer will be deleted (as it only exists in the local scope) and the runtimeInfo
will not be updated (so the time will be 0). My reasoning was, that in the case that something went wrong in this subtree the runtimeInfo
would not be interesting, as the timing of the error is normally not important. We could also change the time and then probably set a failure flag in the case of exceptions though.
src/engine/Operation.h
Outdated
out << "\"was_chached\" : " << _wasCached << ", "; | ||
out << "\"details\" : {"; | ||
auto it = _details.begin(); | ||
while (it != _details.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a for()
loop making the it
iterator local to the block so we don't need the it2
name. Maybe that too could be a for
loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using a while loop right now to make the detection of when to print another comma easier. When using a for loop it would have to be of the form for (auto it = begin(); it != end; ) {... ; ++it ...}
or the iterator would need to be increased by one whenever we check if we are more than one before the end (e.g. if (it + 1 != end() { ... }
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't mind if (it + 1 != end()) {}
but I understand your reasoning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can keep the const
qualifiers if we have computeResult()
return a std::pair<ResultTable, RuntimeInformation>
.
const CompactStringVector<size_t, Id>& patterns, | ||
const size_t subjectColumn) { | ||
const CompactStringVector<size_t, Id>& patterns, const size_t subjectColumn, | ||
RuntimeInformation& runtimeInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the Google style guide actually forbids non-const references unless an API needs them, however we haven't really enforced that and they are scattered around the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the parameter to be a pointer instead.
If we want to have |
dee210f
to
2005498
Compare
The current version should be feature complete now. I've implemented the |
@floriankramer I don't think we need more features for this PR, I'll have a look in a minute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work as always, thank you! Postponing the merge until I've got this tested on some of our instances.
… into performance_digests
@floriankramer it's running on wikidata-full and this is pretty darn awesome. One thing I noticed is that maybe we should output two times just for convenience (it could be computed from the current output). |
@floriankramer I've added a commit for the |
The numbers can be negative for cached results, as the time of the cached result itself is unly the time of the cache lookup, while the time of the subresults is still their old computation time. I've pushed a fix that ignores the childrens time when computing the |
@floriankramer nice finde, however I think that's not it for the following query:
I'm getting the following, note the first join
|
I'll have to look into that tomorrow. If I had to guess though I would say it is connected with the early execution of IndexScans that only return one column. That might mess with the cached flag and leaed to the time being that of an index computation while the parent operation (the join) actually did a cache read which would reduce the parent operations overall time. |
The latest commmit shoud fix the problem with negative operation times. The problem was that the IndexScan was computed early, as it only returns a single column. The time reported in the RuntimeInformation of that index scan was then that of the first execution, which did an actual scan, while the time measured by the join was that of a cache access. The latest commit modified the way IndexScans handle their RuntimeInformation, making them store the initial runtime as a detail and then reset their time, which should lead to the cached time being used. |
@floriankramer this is running on wikidata-full now, the times seem correct now. However it looks like the "InitialScanTime" isn't accessible in the cached operation or isn't printed. This is a minor detail though. I think the problem is that with the By the way, we shouldn't mess with the server after 16:00 because Hannah presents it in Dagstuhl. |
The performance digest is much nicer to read and provides better information. Alternatively we could also only do this for "Using Key:"
Ok I found another issue, the strings aren't escaped in |
src/engine/IndexScan.cpp
Outdated
// runtime information time to ensure the resulting runtime | ||
// information trees operation runtimes are actually correct. | ||
if (firstRun) { | ||
runtimeInfo.addDetail("IntialScanTime", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@floriankramer ok this doesn't work as is. The problem is that in getResult()
the newResult->_runtimeInfo
is set via copying and only after that this detail is added. Then later during the actual query we retrieve the old copy which doesn't have the detail in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should only do the runtimeInfo.setTime(0)
below and then when taking a RuntimeInformation
from the cache always save the original times in the details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea.
for all Operations not just for IndexScans
I'm just creating a pr now to avoid duplicate work, and to get some early feedback on whether or not this is how we want our detailed query timing information to be structured (especially regarding caching).
The next step from my point of view would be to modify all computeResult methods of all operations to access their
RuntimeInformation
and fill in the missing data.This implementation currently would not report information for the subtree of a cached result, but instead only show the root node of that cached result and have a flag set that indicates that that subtree was cached.