F.memory limited cache #348
Conversation
Force-pushed from e946213 to d1efd1b
Force-pushed from d1efd1b to d7ef57b
Needs quite a bit of commenting and one restructuring,
but I found no obvious mistakes.
Force-pushed from b9914d8 to 096e997
- There is now also a configurable limit on the actual size of cache elements, including a limit on the maximal size of a single cache element.
- Pinned elements also count towards the size.
- Results are only inserted into the cache once they are fully computed.
- If a result is requested from the cache while it is still being computed, this computation can be awaited. This functionality existed before, but is now encapsulated in a separate CacheAdapter class. This encapsulation allows us to test this functionality, which was previously untested.
- Removed thread-safety from the cache class itself, as it can be added easily using ad_utility::Synchronized.
Force-pushed from 096e997 to 86a516c
Code review from a 1:1 of Hannah and Johannes.
Better check the performance again after fixing the Server.h bug.
src/util/Cache.h
Outdated
 public:
  using key_type = Key;
  using value_type = Value;
Are you sure that these are better names? Key and Value look like canonical and easily understandable type names to me.
I agree, but keeping these aliases allows easier interaction with the STL, which often expects
types with exactly these names.
Ok, but please add a comment then:
// For easier interaction with the STL, which often uses key_type and value_type.
…annah on these tests, and make them pass deterministically.
Still missing: Cache size which is settable from the commandline.
Added commandline support for the cache configuration and renamed several constants.
Thanks a lot for the revision. Here are some more comments, mostly concerned with naming and documentation.
Dockerfile
Outdated
ENV CACHE_SIZE 30
ENV MAX_SIZE_SINGLE_CACHE_ELEMENT 5
ENV CACHE_NUM_VALUES 1000
CACHE_MAX_SIZE_GB
CACHE_MAX_SIZE_GB_SINGLE_ENTRY
CACHE_MAX_NUM_ENTRIES
src/ServerMain.cpp
Outdated
{"cache-size-in-gb", required_argument, NULL, 'c'},
{"cache-max-size-single-element", required_argument, NULL, 'e'},
{"cache-max-num-values", required_argument, NULL, 'k'},
cache-max-size-gb
cache-max-size-gb-single-entry
cache-max-num-entries
(From a user perspective, entry is more natural for a cache than "element" or "value", and it should be consistent)
src/ServerMain.cpp
Outdated
<< "Maximum amount of memory that the cache (pinned and non-pinned "
   "values) is allowed to consume. This will still count towards the "
   "limit specified by the -m option"
Maximum memory size in GB for all cache entries (pinned and non-pinned). Note that the cache is [not?] part of the amount of memory limited by --memory-for-queries [but can be used on top of that?].
src/ServerMain.cpp
Outdated
<< "(Intermediate) results that are larger than this limit will never "
   "be cached"
Maximum size in GB for a single cache entry. In other words, results larger than this will never be cached.
src/ServerMain.cpp
Outdated
   "be cached"
<< endl;
cout << " " << std::setw(20) << "k, cache-max-num-values" << std::setw(1)
     << "The number of (Intermediate) results that can be stored in the "
Maximum number of entries in the cache. If exceeded, remove least-recently used entries from the cache if possible. Note that this condition and the size limit specified via --cache-max-size-gb both have to hold (logical AND).
src/util/ConcurrentCache.h
Outdated
// values that are currently being computed.
// the bool tells us whether this result will be pinned in the cache
No newline here, and sentences should always be properly capitalized:
// Map of cache entries (key-value pairs) currently being computed. The bool tells us whether this result will be pinned in the cache.
test/CMakeLists.txt
Outdated
add_executable(CacheAdapterTest ConcurrentCacheTest.cpp)
add_test(CacheAdapterTest CacheAdapterTest)
target_link_libraries(CacheAdapterTest gtest_main ${CMAKE_THREAD_LIBS_INIT} absl::flat_hash_map)
Why still the name "Adapter" here?
test/ConcurrentCacheTest.cpp
Outdated
};
}

using SimpleAdapter =
SimpleConcurrentLruCache
test/ConcurrentCacheTest.cpp
Outdated
t.start();
// Fake computation that takes 100ms and returns value "3", which is then
// stored under key 3.
auto res = a.computeOnce(3, waiting_function("3"s, 100));
I think there was a "res -> result", "res2 -> result2" comment
test/ConcurrentCacheTest.cpp
Outdated
// note: This test might fail on a single-threaded system.
// now the background computation should be ongoing and registered as
// "in progress"
Is there any way to test whether the system we are running on supports threads?
I have found a way which should work in a single-threaded scenario.
I unfortunately cannot comment on your remark about the userError vs. assertion discussion. Our scenario is the following: A user creates a cache, and adds the key "first".
Another quick 1:1 with Johannes to finish this PR.
Thanks for all the renaming!
src/engine/Operation.cpp
Outdated
// When we pin a final result only, we also need to remember the sizes of all
// involved IndexScans with two bound columns. If we don't do this, the query
// planner will otherwise trigger their computation even if it is unneeded
// because the final result can be found in the cache.
if (pinChildIndexScanSizes) {
Rename variable to pinFinalResultButNotSubtrees
src/engine/Operation.cpp
Outdated
// When we pin a final result only, we also need to remember the sizes of all
// involved IndexScans with two bound columns. If we don't do this, the query
// planner will otherwise trigger their computation even if it is unneeded
// because the final result can be found in the cache.
// When we pin the final result but no subtrees, we need to remember the sizes of all involved index scans that have only one free variable. Note that these index scans are executed already during query planning because they have to be executed anyway, for any query plan. If we don't remember these sizes here, future queries that take the result from the cache would redo these index scans. Note that we do not need to remember the multiplicity (and distinctness) because the multiplicity for an index scan with a single free variable is always 1.
// Signal from one thread to another that a certain event has occurred.
// TODO<C++20>: In C++20 this can be a std::atomic_flag which has wait() and
// notify() functions.
class ConcurrentSignal {
Add another flag that signals whether it's OK for the computation to end, to ensure that certain computations are indeed concurrent (so that we can be sure the behavior in the concurrent case is actually tested).
test/ConcurrentCacheTest.cpp
Outdated
  }
  std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));
  return result;
};
}

auto wait_and_throw_function(size_t milliseconds,
                             std::atomic<bool>* f = nullptr) {
                             ConcurrentSignal* f = nullptr) {
f -> signal
The cache now has a memory limit.
Currently it is hardcoded via global/Constants.h:
30 GiB total cache size and
5 GiB maximum for a single cache element.