
Implementation of a More Flexible Cache #318

Merged
merged 5 commits into ad-freiburg:master from f.FlexibleCache on Apr 13, 2020

Conversation

joka921
Member

@joka921 joka921 commented Mar 7, 2020

Generalizes the thread-safe cache used in QLever so that it can use eviction strategies other than LRU.
Contains:

  • Two different implementations of a priority queue with updateKey (with different tradeoffs; the performance of both needs to be measured, and they can be used as drop-in replacements for each other).

  • Generalization of the Cache class to use a priority queue to decide which values to evict when the capacity is exceeded. Uses a separate Score type and two customizable functions: one for initially computing a score and one for updating the score on access of an element.

  • The LRU cache is then implemented as an instantiation of this class using the timestamp of the last access as the Score.

  • Exhaustive documentation and unit tests for the priority queues.

  • The LRU Cache unit tests still all pass.

TODO: @hannahbast Can you run the evaluation using this PR on the existing Wikidata-Full index, to see whether it has any measurable impact in a real-world scenario?
Another pass can be made by additionally passing -DUSE_TREE_BASED_CACHE to cmake.
This uses the alternative implementation of the priority queue.

@niklas88 Could you have a look at the code? It turned out to be more difficult than I thought to find a consistent interface.

@hannahbast
Member

hannahbast commented Mar 7, 2020

@joka921: thanks a lot :-) I have five questions:

  1. Which of the currently five open PRs should I (or do I have to) merge to run the evaluation? The most recent index was built with the pipelined index build and the faster turtle parser.

  2. Can you briefly characterize the two different priority queues, so that I can give my docker images proper names?

  3. Does caching (and pinning) of subresults change with this PR? In the current master, all intermediate results of a query are cached (and pinned, if the final result is pinned).

  4. Is caching a result in memory vs. on disk addressed in this PR?

  5. You mention the new interface. Can you very briefly explain how the cache is used now in comparison to how it was used in the current master?

@joka921
Member Author

joka921 commented Mar 8, 2020

@hannahbast

  1. It should be enough to only use this PR. The unicode stuff is already merged, and the other PRs do not change the index structure.

  2. The "default" PQ is internally called HeapBased. It uses a binary heap (std::priority_queue) and inserts a duplicate on updateKey, which might be bad for performance if we perform many changeKey operations compared to insert/pop. The other one, which you activate with the cmake flag, is TreeBased. It uses a balanced tree (std::multiset), which has log n for all operations (unless there are more than a constant number of entries with exactly the same score) but possibly a higher constant overhead because of the additional level of indirection.

  3. No, that is a separate issue that will be addressed separately.

  4. Same answer as 3.

  5. Maybe I did not express myself clearly enough. The interface and behavior of the LRUCache do not change with this PR, so QLever should behave exactly the same. The difference is that the LRUCache is now implemented as an instantiation of a more general FlexibleCache that can also support strategies other than LRU. This is only a building block for later experimenting easily with more advanced caching strategies.

@hannahbast
Member

I finally did the evaluation compared to the current master. You can see the results here:

http://qlever.informatik.uni-freiburg.de/evaluation?dir=output.2020-03-19

There is practically no difference, great :-)


@niklas88 niklas88 left a comment


Looks great in general, I only found some small nitpicks.

CMakeLists.txt Outdated
@@ -82,6 +82,10 @@ if (USE_PARALLEL)
add_definitions("-D_PARALLEL_SORT")
endif()

if (USE_TREE_BASED_CACHED)
Member


That D at the end looks like a typo

src/util/Cache.h Outdated
// either to the just created element or an existing element.
//
// This needs to happen in a single operation because otherwise we may find
// an item in the cash now but it may already be deleted when trying to
Member


An old one but s/cash/cache/

src/util/Cache.h Outdated
//
// This needs to happen in a single operation because otherwise we may find
// an item in the cash now but it may already be deleted when trying to
// retrieve it in another operation.
Member


double space

src/util/Cache.h Outdated
}

// tryEmplacePinned acts like tryEmplace but if the pin paramater is true
// also marks the emplaced element so that it can only be removed explicitly
Member


This was actually my mistake in the original code, but let's fix it now: there is no "pin parameter". That was the original idea, but varargs made it impossible.

src/util/Cache.h Outdated
// TODO(schnelle): It would be nice to use std::shared_mutex to only exclude
// multiple writers.
// Sadly there is currently no easy way to upgrade a shared_lock to an
// exclusive lock. TUtfhus using a shared_lock here would conflict with moving
Member


also mine s/TUtfhus/Also/

*
* TreeBasedPQ is based on a balanced tree (std::multiset). It has stronger
* asymptotic guarantees, but the constant factor of the log(n) operations might
* be bigger because of the more complex tree structure
Member


I'm really curious what the real-world performance of this will be. My gut feeling is that with today's CPUs the bad caching behaviour of (non-B) trees will make this fare significantly worse than the HeapBasedPQ.

Member Author


That might be true, but it might also be that we only access relatively expensive cache entries (rather long std::vectors), so the actual access time to the cache does not really matter.

InternalScore mScore;
Value mValue;
// keep Track of whether this Node is currently a part of the PQ or not
// (because of pop or erase operations
Member


unclosed )

ad_utility::Timer timer;
timer.start();
auto& cache = _executionContext->getQueryTreeCache();
const string cacheKey = asString();
Member Author


This function was mostly just moved from the .h to the .cpp file because of cyclic dependencies.
New code starts at this marker.

}
});
}

Member Author


New code ends here.


@niklas88 niklas88 left a comment


LGTM

I think since there are quite a few fixup commits it would make a lot of sense
to do a git rebase --interactive master and squash at least these fixups.

Ideally we would have only building commits, but we haven't done that in the past,
so…

- One uses a std::priority_queue (binary heap) and inserts duplicates
  on `updateKey` (no strict `log n` guarantee on many updateKeys, but
  better caching behavior).

- The other one uses a balanced tree (std::multiset) with additional
  indirection, but an asymptotic `log n` guarantee.

- Both need multiple indirections via shared_ptrs.
- The two different Priority Queues are used under the hood.
- Each element is associated with a score; if the capacity is exceeded,
  the element with the lowest score is deleted.
- The initial computation + the updating of the key on element access
  are freely configurable via template parameters.
- The (previously hard-coded) LRU cache is now implemented as a partial
  specialization of the general cache.
- The cache also allows the retrieval of statistics like
  the number of cached/pinned elements and their total size.
- Previously we could only pin all subtrees of a query computation
  in the cache.

- We also need to pin the sizes of all IndexScan descendants of a
  pinned result, since they are needed by the QueryPlanner when trying
  to reuse a pinned result "for free".
- Same idea as Facebook's folly::Synchronized.
- Combines a type (typically a data structure) and a mutex.
- Operations on the type are not possible without locking the mutex.
- If an operation to be performed on the data treats it as const and
  the mutex supports shared locking, one can explicitly request a
  shared lock for efficiency.

- This is a comfortable way to avoid race conditions on shared data
  structures.

- Use this type to make the stored IndexScan sizes of the pinned cache
  thread-safe in an efficient manner (they are only read in hot paths,
  so a shared lock suffices).
@niklas88 niklas88 merged commit ec8c914 into ad-freiburg:master Apr 13, 2020
@joka921 joka921 deleted the f.FlexibleCache branch May 8, 2021 09:08