
Caching Improvements (Old version with memory size limit, discard) #279

Closed
wants to merge 2 commits into from

Conversation

niklas88
Member

The theory behind caching aborted Operations was that they would fail again anyway. However, under memory pressure Operations may fail because allocations fail, and they will succeed again once the memory pressure drops; if we kept them in the cache, we would never try again.

While we are at it, improve the wording and use an extra abort() method on Operation.

As a next step we should limit the cache by the amount of memory used rather than by the number of cached results.
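A minimal sketch of the intended behavior of the first commit; the names (ResultCache, Operation::compute(), the key handling) are placeholders for illustration and not the actual QLever classes: when a computation fails under memory pressure, abort() removes the operation's entry from the cache so a later request recomputes it instead of reusing a cached failure.

```cpp
#include <memory>
#include <new>
#include <string>
#include <unordered_map>

struct Result {};

// Toy stand-in for the real cache; names are placeholders, not QLever API.
class ResultCache {
 public:
  void insert(const std::string& key, std::shared_ptr<Result> r) {
    _map[key] = std::move(r);
  }
  void erase(const std::string& key) { _map.erase(key); }

 private:
  std::unordered_map<std::string, std::shared_ptr<Result>> _map;
};

class Operation {
 public:
  Operation(ResultCache& cache, std::string key)
      : _cache(cache), _key(std::move(key)) {}

  std::shared_ptr<Result> compute() {
    try {
      auto result = std::make_shared<Result>(/* expensive computation */);
      _cache.insert(_key, result);
      return result;
    } catch (const std::bad_alloc&) {
      // Allocation failed under memory pressure: abort instead of keeping a
      // "failed" entry, so the operation is retried once memory frees up.
      abort();
      throw;
    }
  }

  // The extra abort() method: make sure nothing for this operation stays in
  // the cache.
  void abort() { _cache.erase(_key); }

 private:
  ResultCache& _cache;
  std::string _key;
};
```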

The second commit: instead of limiting the number of elements in the cache, limit the total memory used by the cached elements. This currently uses a free template function through which elements report their size in memory. For the results in QLever we currently ignore the memory used by RuntimeInformation, and we also ignore the memory used by the cache data structure itself.
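A condensed sketch of what the second commit describes, under assumed names (memoryUsage, MemoryLimitedLruCache) that are illustrative rather than the actual implementation: a free template function lets cached values report their size in memory, and an insertion evicts least-recently-used entries until the new total fits the memory budget.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Free template function through which values report their size in memory
// (assumed name; the function in the PR may look different).
template <typename T>
std::size_t memoryUsage(const T& value) {
  return sizeof(value);
}

// Example overload for a concrete value type.
inline std::size_t memoryUsage(const std::vector<int>& v) {
  return sizeof(v) + v.capacity() * sizeof(int);
}

template <typename Key, typename Value>
class MemoryLimitedLruCache {
 public:
  explicit MemoryLimitedLruCache(std::size_t capacityBytes)
      : _capacityBytes(capacityBytes) {}

  void insert(const Key& key, Value value) {
    if (auto it = _map.find(key); it != _map.end()) {
      // Replacing an existing entry: drop the old one first.
      _usedBytes -= it->second.sizeBytes;
      _lru.erase(it->second.lruIt);
      _map.erase(it);
    }
    std::size_t size = memoryUsage(value);
    if (size > _capacityBytes) return;  // Can never fit, don't cache at all.
    // Evict least-recently-used entries until the new value fits. This loop
    // is what makes an insertion O(n) in the worst case: a value whose size
    // approaches the capacity evicts (almost) everything else.
    while (!_lru.empty() && _usedBytes + size > _capacityBytes) {
      evictOldest();
    }
    _lru.push_front(key);
    _map.emplace(key, Entry{std::move(value), _lru.begin(), size});
    _usedBytes += size;
  }

  Value* get(const Key& key) {
    auto it = _map.find(key);
    if (it == _map.end()) return nullptr;
    // Mark as most recently used.
    _lru.splice(_lru.begin(), _lru, it->second.lruIt);
    return &it->second.value;
  }

 private:
  struct Entry {
    Value value;
    typename std::list<Key>::iterator lruIt;
    std::size_t sizeBytes;
  };

  void evictOldest() {
    auto it = _map.find(_lru.back());
    _usedBytes -= it->second.sizeBytes;
    _map.erase(it);
    _lru.pop_back();
  }

  std::size_t _capacityBytes;
  std::size_t _usedBytes = 0;
  std::list<Key> _lru;
  std::unordered_map<Key, Entry> _map;
};
```

The eviction loop in insert() is the operation the comment below measures: a single large insertion can trigger a sweep of the whole cache.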
@niklas88
Member Author

I'm really not happy with the second commit. I think limiting the memory size makes insertions into the cache fundamentally O(n) in the worst case, where n is the number of elements in the cache, which can be quite large if we have many small elements: when a new value whose size approaches (or exceeds) the capacity is added, we are forced to evict everything else. In a first test on Wikidata Full, even with a cache size of 20 GB, completion times rise from 70 ms on average to over 128 ms. Even more problematic: when we have several large tables that are accessed often but don't all fit in the cache, we will constantly recompute those that didn't fit.

With an element-based capacity, on the other hand, we can simply evict the longest-untouched value in O(1) for any new value once we reach capacity.
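For contrast, here is a count-limited LRU cache in the same sketchy style (again illustrative, not QLever code): once at capacity, each insertion evicts exactly one entry, the longest-untouched one, so eviction stays O(1) no matter how large the cached values are.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value>
class CountLimitedLruCache {
 public:
  explicit CountLimitedLruCache(std::size_t maxEntries)
      : _maxEntries(maxEntries) {}

  void insert(const Key& key, Value value) {
    if (auto it = _index.find(key); it != _index.end()) {
      // Replace an existing entry.
      _entries.erase(it->second);
      _index.erase(it);
    } else if (!_entries.empty() && _entries.size() >= _maxEntries) {
      // Single O(1) eviction of the longest-untouched entry.
      _index.erase(_entries.back().first);
      _entries.pop_back();
    }
    _entries.emplace_front(key, std::move(value));
    _index[key] = _entries.begin();
  }

  Value* get(const Key& key) {
    auto it = _index.find(key);
    if (it == _index.end()) return nullptr;
    // Mark as most recently used.
    _entries.splice(_entries.begin(), _entries, it->second);
    return &it->second->second;
  }

 private:
  std::size_t _maxEntries;
  std::list<std::pair<Key, Value>> _entries;
  std::unordered_map<Key, typename std::list<std::pair<Key, Value>>::iterator>
      _index;
};
```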

What do you think @hannahbast @floriankramer?

@niklas88 niklas88 changed the title Caching Improvements (WIP) Caching Improvements (Old version with memory size limit, discard) Sep 6, 2019
@niklas88 niklas88 closed this Sep 25, 2019
@niklas88 niklas88 deleted the feat_cache_improve branch October 1, 2019 15:52