
Implementation of a More Flexible Cache #318

Merged
merged 5 commits into ad-freiburg:master from f.FlexibleCache on Apr 13, 2020

Conversation

joka921
Member

@joka921 joka921 commented Mar 7, 2020

Generalizes the thread-safe cache used in QLever so that it can use eviction strategies other than LRU.
Contains:

  • Two different implementations of a priority queue with updateKey (with different tradeoffs; the performance of both needs to be measured, and they can be used as drop-in replacements for each other).

  • Generalization of the Cache class to use a priority queue to decide which values to evict when the capacity is exceeded. Uses a separate Score type and two customizable functions: one for initially computing a score and one for updating the score on access of an element.

  • The LRU cache is then implemented as an instantiation of this class using the timestamp of the last access as the Score.

  • Exhaustive documentation and unit tests for the priority queues.

  • The LRU Cache unit tests still all pass.

TODO: @hannahbast Can you run the evaluation using this PR on the existing Wikidata-Full index, to see whether it has any measurable impact in a real-world scenario?
Another pass can be made by additionally passing -DUSE_TREE_BASED_CACHE to cmake.
This uses the alternative implementation of the priority queue.

@niklas88 Could you have a look at the code? It turned out to be more difficult than I thought to find a consistent interface.

@hannahbast
Member

hannahbast commented Mar 7, 2020

@joka921: thanks a lot :-) I have five questions:

  1. Which of the currently five open PRs should I (or do I have to) merge to run the evaluation? The most recent index was built with the pipelined index build and the faster turtle parser.

  2. Can you briefly characterize the two different priority queues, so that I can give my docker images proper names?

  3. Does caching (and pinning) of subresults change with this PR? In the current master, all intermediate results of a query are cached (and pinned, if the final result is pinned).

  4. Is caching a result in memory vs. on disk addressed in this PR?

  5. You mention the new interface. Can you very briefly explain how the cache is used now in comparison to how it was used in the current master?

@joka921
Member Author

joka921 commented Mar 8, 2020

@hannahbast

  1. It should be enough to only use this PR. The unicode stuff is already merged, and the other PRs do not change the index structure.

  2. The "default" PQ is internally called HeapBased. It uses a binary heap (std::priority_queue) and inserts a duplicate on updateKey, which might be bad for performance if we perform many changeKey operations compared to insert/pop. The other one, which you activate with the cmake flag, is TreeBased. It uses a balanced tree (std::multiset), which has log n for all operations (unless there are more than a constant number of entries with exactly the same score) but possibly a higher constant overhead because of the additional level of indirection.

  3. No, that is a separate issue that will be addressed separately.

  4. Same answer as 3.

  5. Maybe I did not express myself clearly enough. The interface and behavior of the LRUCache do not change with this PR, so QLever should behave exactly the same. The difference is that the LRUCache is now implemented as an instantiation of a more general FlexibleCache that can also support strategies other than LRU. This is only a building block for later experimenting easily with more advanced caching strategies.

@hannahbast
Member

I finally did the evaluation compared to the current master. You can see the results here:

http://qlever.informatik.uni-freiburg.de/evaluation?dir=output.2020-03-19

There is practically no difference, great :-)


@niklas88 niklas88 left a comment


Looks great in general, I only found some small nitpicks.

CMakeLists.txt Outdated
@@ -82,6 +82,10 @@ if (USE_PARALLEL)
add_definitions("-D_PARALLEL_SORT")
endif()

if (USE_TREE_BASED_CACHED)
Member


That D at the end looks like a typo

src/util/Cache.h Outdated
// either to the just created element or an existing element.
//
// This needs to happen in a single operation because otherwise we may find
// an item in the cash now but it may already be deleted when trying to
Member


An old one but s/cash/cache/

src/util/Cache.h Outdated
//
// This needs to happen in a single operation because otherwise we may find
// an item in the cash now but it may already be deleted when trying to
// retrieve it in another operation.
Member


double space

src/util/Cache.h Outdated
}

// tryEmplacePinned acts like tryEmplace but if the pin paramater is true
// also marks the emplaced element so that it can only be removed explicitly
Member


This was actually my mistake in the original code, but let's fix it now: there is no "pin parameter". That was the original idea, but varargs made it impossible.

src/util/Cache.h Outdated
// TODO(schnelle): It would be nice to use std::shared_mutex to only exclude
// multiple writers.
// Sadly there is currently no easy way to upgrade a shared_lock to an
// exclusive lock. TUtfhus using a shared_lock here would conflict with moving
Member


also mine s/TUtfhus/Also/

*
* TreeBasedPQ is based on a balanced tree (std::multiset). It has stronger
* asymptotic guarantees, but the constant factor of the log(n) operations might
* be bigger because of the more complex tree structure
Member


I'm really curious what the real-world performance of this will be. My gut feeling is that with today's CPUs the bad caching behaviour of (non-B) trees will make this fare significantly worse than the HeapBasedPQ.

Member Author


That might be true, but it might also be that we only access relatively expensive cache entries (rather long std::vectors), so the actual access time to the cache does not really matter.

InternalScore mScore;
Value mValue;
// keep Track of whether this Node is currently a part of the PQ or not
// (because of pop or erase operations
Member


unclosed )

ad_utility::Timer timer;
timer.start();
auto& cache = _executionContext->getQueryTreeCache();
const string cacheKey = asString();
Member Author


This function was mostly just moved from the .h to the .cpp file because of cyclic dependencies.
New code starts at this marker.

}
});
}

Member Author


New code ends here.


@niklas88 niklas88 left a comment


LGTM

I think since there are quite a few fixup commits it would make a lot of sense
to do a git rebase --interactive master and squash at least these fixups.

Ideally we would have only building commits, but we haven't done that in the past,
so…

- One uses a std::priority_queue (binary heap) and inserts duplicates
  on `updateKey` (no strict `log n` guarantee on many updateKeys, but
  better caching behavior).

- The other one uses a balanced tree (std::multiset) with additional
  indirection, but an asymptotic `log n` guarantee.

- Both need multiple indirections via shared_ptrs.
- The two different Priority Queues are used under the hood.
- Each element is associated with a score; if the capacity is exceeded,
  the element with the lowest score is deleted.
- The initial computation + the updating of the key on element access
  are freely configurable via template parameters.
- The (previously hard-coded) LRU cache is now implemented as a partial
  specialization of the general cache.
- The cache also allows the retrieval of statistics like
  the number of cached/pinned elements and their total size.
- Previously we could only pin all subtrees of a query computation
  in the cache.

- We also need to pin the sizes of all IndexScan descendants of a
  pinned result, since they are needed by the QueryPlanner when trying
  to reuse a pinned result "for free".
- Same idea as Facebook's folly::Synchronized.
- Combines a type (typically a data structure) and a mutex.
- Operations on the type are not possible without locking the mutex.
- If an operation to be performed on the data treats it as const and
  the mutex supports shared locking, one can explicitly request a
  shared lock for efficiency.

- This is a comfortable way to avoid race conditions on shared data
  structures.

- Use this type to make the stored IndexScan sizes of the pinned cache
  thread-safe in an efficient manner (they are only read in hot paths,
  so a shared lock suffices).
@niklas88 niklas88 merged commit ec8c914 into ad-freiburg:master Apr 13, 2020
@joka921 joka921 deleted the f.FlexibleCache branch May 8, 2021 09:08