Implementation of a More Flexible Cache #318
Conversation
@joka921: thanks a lot :-) I have five questions:
I finally did the evaluation compared to the current master; you can see the results here: http://qlever.informatik.uni-freiburg.de/evaluation?dir=output.2020-03-19 There is practically no difference, great :-)
Looks great in general, only found some small nitpicks
CMakeLists.txt
@@ -82,6 +82,10 @@ if (USE_PARALLEL)
add_definitions("-D_PARALLEL_SORT")
endif()

if (USE_TREE_BASED_CACHED)
That `D` at the end looks like a typo
src/util/Cache.h
// either to the just created element or an existing element.
//
// This needs to happen in a single operation because otherwise we may find
// an item in the cash now but it may already be deleted when trying to
An old one but s/cash/cache/
src/util/Cache.h
//
// This needs to happen in a single operation because otherwise we may find
// an item in the cash now but it may already be deleted when trying to
// retrieve it in another operation.
double space
src/util/Cache.h
}

// tryEmplacePinned acts like tryEmplace but if the pin paramater is true
// also marks the emplaced element so that it can only be removed explicitly
This was actually my mistake in the original code, but let's fix it now: there is no "pinned" parameter. That was the original idea, but varargs made it impossible.
src/util/Cache.h
// TODO(schnelle): It would be nice to use std::shared_mutex to only exclude
// multiple writers.
// Sadly there is currently no easy way to upgrade a shared_lock to an
// exclusive lock. TUtfhus using a shared_lock here would conflict with moving
also mine s/TUtfhus/Also/
*
* TreeBasedPQ is based on a balanced tree (std::multiset). It has stronger
* asymptotic guarantees, but the constant factor of the log(n) operations might
* be bigger because of the more complex tree structure
I'm really curious what the real-world performance of this will be. My gut feeling is that with today's CPUs the bad caching behaviour of (non-B) trees will make this fare significantly worse than the HeapBasedPQ.
That might be true, but it might also be that we only access relatively expensive cache entries (rather long std::vectors), so the actual access time to the cache does not really matter.
src/util/PriorityQueue.h
InternalScore mScore;
Value mValue;
// keep Track of whether this Node is currently a part of the PQ or not
// (because of pop or erase operations
unclosed )
ad_utility::Timer timer;
timer.start();
auto& cache = _executionContext->getQueryTreeCache();
const string cacheKey = asString();
This function was mostly only moved from the .h to the .cpp file because of cyclic dependencies.
New Code starts at this marker.
}
});
}

New Code ends here.
LGTM
I think since there are quite a few fixup commits it would make a lot of sense to do a `git rebase --interactive master` and squash at least these fixups. Ideally we would have only building commits, but we haven't done that in the past, so…
- One uses a std::priority_queue (binary heap) and inserts duplicates on `updateKey` (no strict `log n` guarantee on many updateKeys, but better caching behavior).
- The other one uses a balanced tree (std::multiset) with additional indirection, but an asymptotic `log n` guarantee.
- Both need multiple indirections via shared_ptrs.
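One common way to realize the heap-based variant described above is a lazy-deletion scheme. The following is a minimal, hypothetical C++ sketch (class and member names are invented, not taken from the PR): `updateKey` pushes a duplicate heap entry with the new score, and `pop` skips entries whose inserted score is no longer current.

```cpp
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch, not the PR's interface: updateKey pushes a duplicate
// entry; pop() discards entries whose inserted score no longer matches the
// element's current score.
template <typename Score, typename Value>
class LazyHeapPQ {
 public:
  struct Node {
    Score score;  // the current, authoritative score
    Value value;
  };
  using NodePtr = std::shared_ptr<Node>;

  NodePtr insert(Score score, Value value) {
    auto node = std::make_shared<Node>(Node{score, std::move(value)});
    _heap.emplace(std::move(score), node);
    return node;
  }

  // O(log n) per call, but each update leaves one stale entry in the heap.
  void updateKey(Score newScore, const NodePtr& node) {
    node->score = newScore;
    _heap.emplace(std::move(newScore), node);
  }

  // Removes and returns the element with the smallest score, skipping stale
  // duplicates created by updateKey. Returns nullptr if the queue is empty.
  // Note: this simple validity check assumes scores do not repeat (true for
  // LRU-style timestamps); a real implementation would track validity
  // explicitly.
  NodePtr pop() {
    while (!_heap.empty()) {
      auto [insertedScore, node] = _heap.top();
      _heap.pop();
      if (insertedScore == node->score) return node;
    }
    return nullptr;
  }

 private:
  // Each heap entry remembers the score it was inserted with.
  using HeapEntry = std::pair<Score, NodePtr>;
  struct CompareByScore {
    bool operator()(const HeapEntry& a, const HeapEntry& b) const {
      return a.first > b.first;  // smallest score on top (min-heap)
    }
  };
  std::priority_queue<HeapEntry, std::vector<HeapEntry>, CompareByScore> _heap;
};
```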
- The two different priority queues are used under the hood.
- Each element is associated with a score; if the capacity is exceeded, the element with the lowest score is deleted.
- The initial computation and the updating of the key on element access are freely configurable via template parameters.
- The (previously hard-coded) LRU cache is now implemented as a partial specialization of the general cache.
- The cache also allows the retrieval of statistics like the number of cached/pinned elements and their total size.
- Previously we could only pin all subtrees of a query computation in the cache.
- We also need to pin the sizes of all IndexScan descendants of a pinned result, since they are needed by the QueryPlanner when trying to reuse a pinned result "for free".
- Same idea as Facebook's folly::Synchronized.
- Combines a type (typically a data structure) and a mutex.
- Operations on the type are not possible without locking the mutex.
- If an operation to be performed on the data treats it as const and the mutex supports shared locking, a shared lock can be requested explicitly for efficiency.
- This is a comfortable way to avoid race conditions on shared data structures.
- Use this type to make the stored IndexScan sizes of the pinned cache thread-safe in an efficient manner (they are only read in hot paths, thus a shared lock suffices).
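As an illustration of the folly-style idea described above, here is a minimal, hypothetical sketch (class and method names are invented, not the interface added in this PR): the wrapped value is only reachable from a callable that runs while the lock is held, and const access takes a shared lock.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical sketch of a folly::Synchronized-style wrapper.
template <typename T, typename Mutex = std::shared_mutex>
class Synchronized {
  T _data;
  mutable Mutex _mutex;

 public:
  template <typename... Args>
  explicit Synchronized(Args&&... args) : _data(std::forward<Args>(args)...) {}

  // Exclusive access for operations that modify the data.
  template <typename F>
  auto withWriteLock(F f) {
    std::unique_lock lock(_mutex);
    return f(_data);
  }

  // Shared access for read-only operations; several readers may run in parallel.
  template <typename F>
  auto withReadLock(F f) const {
    std::shared_lock lock(_mutex);
    return f(_data);
  }
};

// Usage, e.g. for sizes that are written rarely but read in hot paths:
// Synchronized<std::map<std::string, size_t>> pinnedSizes;
// pinnedSizes.withWriteLock([](auto& m) { m["someScan"] = 42; });
// size_t n = pinnedSizes.withReadLock([](const auto& m) { return m.size(); });
```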
Generalizes the thread-safe cache used in QLever to be able to use other strategies than LRU.
Contains:
Two different implementations of a priority queue with updateKey (with different tradeoffs; we need to measure the performance of both; they can be used as drop-in replacements for each other).
Generalization of the Cache class to use a priority queue to decide which values to pop when the capacity is exceeded. Uses a separate Score type and two customizable functions for initially computing a score and for updating the score on access of an element.
The LRU cache is then implemented as an instantiation of this class, using the timestamp of the last access as the Score (see the sketch after this list).
Exhaustive documentation and unit tests for the priority queues.
The LRU Cache unit tests still all pass.
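To make the description above concrete, here is a condensed, hypothetical sketch (names and the eviction-by-linear-scan shortcut are not from the PR, which uses a priority queue and a partial specialization): the cache takes two callables that compute the initial score and update the score on access, evicts the element with the lowest score when the capacity is exceeded, and LRU falls out by using the timestamp of the last access as the score.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of a score-based cache. Pinning and statistics are
// omitted; eviction uses a linear scan to keep the sketch short.
template <typename Key, typename Value, typename Score,
          typename ScoreOnInsert, typename ScoreOnAccess>
class FlexibleCache {
  struct Entry {
    Value value;
    Score score;
  };
  std::map<Key, Entry> _entries;
  size_t _capacity;
  ScoreOnInsert _scoreOnInsert;  // computes the initial score of a value
  ScoreOnAccess _scoreOnAccess;  // recomputes the score when a value is read

 public:
  FlexibleCache(size_t capacity, ScoreOnInsert onInsert, ScoreOnAccess onAccess)
      : _capacity(capacity),
        _scoreOnInsert(std::move(onInsert)),
        _scoreOnAccess(std::move(onAccess)) {}

  void insert(Key key, Value value) {
    // If the capacity is exceeded, the element with the lowest score is deleted.
    if (_entries.size() >= _capacity) evictLowestScore();
    Score score = _scoreOnInsert(value);
    _entries.insert_or_assign(std::move(key),
                              Entry{std::move(value), std::move(score)});
  }

  Value* get(const Key& key) {
    auto it = _entries.find(key);
    if (it == _entries.end()) return nullptr;
    // Accessing an element updates its score.
    it->second.score = _scoreOnAccess(it->second.value, it->second.score);
    return &it->second.value;
  }

 private:
  void evictLowestScore() {
    if (_entries.empty()) return;
    auto lowest = _entries.begin();
    for (auto it = std::next(_entries.begin()); it != _entries.end(); ++it) {
      if (it->second.score < lowest->second.score) lowest = it;
    }
    _entries.erase(lowest);
  }
};

// LRU as an instantiation: the score is simply the timestamp of the last
// access, so the element accessed longest ago has the lowest score.
using Timestamp = std::chrono::steady_clock::time_point;
inline auto makeLRUCache(size_t capacity) {
  auto stamp = [](const auto&...) { return std::chrono::steady_clock::now(); };
  return FlexibleCache<std::string, std::string, Timestamp, decltype(stamp),
                       decltype(stamp)>(capacity, stamp, stamp);
}
```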
TODO: @hannahbast Can you run the evaluation using this PR on the existing Wikidata-Full index, to see if we measure any impact in a real-world scenario?
Another pass can be made by additionally specifying `-DUSE_TREE_BASED_CACHE` to cmake. This uses the alternative implementation of the priority queue.
@niklas88 Could you have a look at the code? It turned out more difficult than I thought to find a consistent interface.