
F.memory limited cache #348

Merged
merged 11 commits into ad-freiburg:master from f.memoryLimitedCache on Apr 16, 2021
Conversation

@joka921 joka921 (Member) commented Oct 8, 2020

The cache now has a memory limit.

Currently it is hardcoded via global/Constants.h:

- 30 GiB total cache size
- 5 GiB maximum for a single cache element

@joka921 joka921 (Member Author) left a comment

Needs quite a bit of commenting and one restructuring,
but I found no obvious mistakes myself.

- There is now also a configurable limit on the actual size
  of cache elements, including a limit on the maximal size
  of a single cache element
- Pinned elements also count towards the size.
- Results are only inserted into the cache once they are fully
  computed.
- If a result is requested from the cache, while it is still
  being computed, this computation can be awaited. This
  functionality existed before, but is now encapsulated
  in a separate CacheAdapter class. This encapsulation allows
  us to test this functionality which was previously untested.
- Removed thread-safety from the cache class itself,
  as it can be added easily using ad_utility::Synchronized.
@hannahbast hannahbast (Member) left a comment

Code review from a 1:1 of Hannah and Johannes.

Better check the performance again after fixing the Server.h bug.

src/util/Cache.h
Comment on lines 84 to 86
public:
using key_type = Key;
using value_type = Value;
Member:

Are you sure that these are better names? Key and Value look like canonical and well understandable type names to me.

Member Author:

I agree, but keeping these aliases allows easier interaction with the STL, which often assumes types with exactly these names.

Member:

Ok, but please add a comment then:

// For easier interaction with the STL, which often uses key_type and value_type.

…annah on these tests, and make them pass deterministically.
Still missing: Cache size which is settable from the commandline.
Added commandline support for the cache configuration and renamed several constants.
@hannahbast hannahbast (Member) left a comment

Thanks a lot for the revision. Here are some more comments, mostly concerned with naming and documentation.

Dockerfile
Comment on lines 38 to 40
ENV CACHE_SIZE 30
ENV MAX_SIZE_SINGLE_CACHE_ELEMENT 5
ENV CACHE_NUM_VALUES 1000
Member:

CACHE_MAX_SIZE_GB
CACHE_MAX_SIZE_GB_SINGLE_ENTRY
CACHE_MAX_NUM_ENTRIES

Comment on lines 34 to 36
{"cache-size-in-gb", required_argument, NULL, 'c'},
{"cache-max-size-single-element", required_argument, NULL, 'e'},
{"cache-max-num-values", required_argument, NULL, 'k'},
Member:

cache-max-size-gb
cache-max-size-gb-single-entry
cache-max-num-entries

(From a user perspective, entry is more natural for a cache than "element" or "value", and it should be consistent)

Comment on lines 65 to 67
<< "Maximum amount of memory that the cache (pinned and non-pinned "
"values) is allowed to consume. This will still count towards the "
"limit specified by the -m option"
Member:

Maximum memory size in GB for all cache entries (pinned and non-pinned). Note that the cache is [not?] part of the amount of memory limited by --memory-for-queries [but can be used on top of that?].

Comment on lines 71 to 72
<< "(Intermediate) results that are larger than this limit will never "
"be cached"
Member:

Maximum size in GB for a single cache entry. In other words, results larger than this will never be cached.

"be cached"
<< endl;
cout << " " << std::setw(20) << "k, cache-max-num-values" << std::setw(1)
<< "The number of (Intermediate) results that can be stored in the "
Member:

Maximum number of entries in the cache. If exceeded, remove least-recently used entries from the cache if possible. Note that this condition and the size limit specified via --cache-max-size-gb both have to hold (logical AND).

Comment on lines 201 to 202
// values that are currently being computed.
// the bool tells us whether this result will be pinned in the cache
Member:

No newline, and sentences should always be properly capitalized.

// Map of cache entries (key-value pairs) currently being computed. The bool tells us whether this result will be pinned in the cache.

Comment on lines 13 to 15
add_executable(CacheAdapterTest ConcurrentCacheTest.cpp)
add_test(CacheAdapterTest CacheAdapterTest)
target_link_libraries(CacheAdapterTest gtest_main ${CMAKE_THREAD_LIBS_INIT} absl::flat_hash_map)
Member:

Why still the name "Adapter" here?

};
}

using SimpleAdapter =
Member:

SimpleConcurrentLruCache

t.start();
// Fake computation that takes 100ms and returns value "3", which is then
// stored under key 3.
auto res = a.computeOnce(3, waiting_function("3"s, 100));
Member:

I think there was a "res -> result", "res2 -> result2" comment

Comment on lines 183 to 185
// note: This test might fail on a single-threaded system.
// now the background computation should be ongoing and registered as
// "in progress"
Member:

Is there any way to test whether the system we are running on supports threads?

Member Author:

I have found a way which should work in a single-threaded scenario.

@joka921 joka921 (Member Author) commented Apr 12, 2021

Unfortunately I cannot reply directly to your remark from the userError vs. assertion discussion.

Our scenario is the following: a user creates a cache and adds the key "first".
Then they add the key "first" again. Why should this trigger an assertion? I think this is a normal use case, similar to std::vector::at, and we have the following possibilities:

  • ignore the second request
  • overwrite the value at the duplicate key
  • signal the conflict to the user (a returned flag or an exception [chosen here] are the most common ways of doing so)

@hannahbast hannahbast (Member) left a comment

Another quick 1:1 with Johannes to finish this PR.

Thanks for all the renaming!

// When we pin a final result only, we also need to remember the sizes of all
// involved IndexScans with two bound columns. If we don't do this, the query
// planner will otherwise trigger their computation even if it is unneeded
// because the final result can be found in the cache.
if (pinChildIndexScanSizes) {
Member:

Rename variable to pinFinalResultButNotSubtrees

Comment on lines 59 to 62
// When we pin a final result only, we also need to remember the sizes of all
// involved IndexScans with two bound columns. If we don't do this, the query
// planner will otherwise trigger their computation even if it is unneeded
// because the final result can be found in the cache.
Member:

// When we pin the final result but no subtrees, we need to remember the sizes of all involved index scans that have only one free variable. Note that these index scans are executed already during query planning because they have to be executed anyway, for any query plan. If we don't remember these sizes here, future queries that take the result from the cache would redo these index scans. Note that we do not need to remember the multiplicity (and distinctness) because the multiplicity for an index scan with a single free variable is always 1.

// Signal from one thread to another that a certain event has occurred.
// TODO<C++20>: In C++20 this can be a std::atomic_flag which has wait() and
// notify() functions.
class ConcurrentSignal {
Member:

Add another flag that signals whether it's OK for the computation to end (to ensure that certain computations are indeed concurrent so that we can be sure that the behavior in the case of concurrency is indeed tested)

}
std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));
return result;
};
}

auto wait_and_throw_function(size_t milliseconds,
                             ConcurrentSignal* f = nullptr) {
Member:

f -> signal

@joka921 joka921 merged commit 00a633a into ad-freiburg:master Apr 16, 2021
@joka921 joka921 deleted the f.memoryLimitedCache branch April 16, 2021 19:37