feat(server): add oom guard #1650

Merged: 5 commits merged from oom_guard into main on Aug 8, 2023
Conversation

adiholden
Collaborator

fixes #1634

  1. Add flag maxmemory_ratio.
  2. When current used memory exceeds maxmemory_limit * maxmemory_ratio, commands flagged DENYOOM return an OOM error (see the sketch below).
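
A minimal, standalone sketch of the guard condition (the real check lives in Service::DispatchCommand, shown in the diff further down; the helper name here is hypothetical and the inputs are simplified):

#include <cstdint>

// Standalone illustration of the OOM guard: commands marked DENYOOM are
// rejected once used memory crosses the configured multiple of the limit.
bool ShouldDenyOom(uint64_t used_memory, uint64_t max_memory_limit,
                   double maxmemory_ratio, bool cmd_denies_oom) {
  return cmd_denies_oom && used_memory > max_memory_limit * maxmemory_ratio;
}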

@@ -226,6 +228,8 @@ class ServerState { // public struct - to allow initialization.

absl::flat_hash_map<std::string, base::Histogram> call_latency_histos_;
uint32_t thread_index_ = 0;
uint64_t used_mem_ = 0;
uint64_t last_chached_used_current_ = 0;
Collaborator

Typo (chached -> cached)

Collaborator

Further, I'd rename it to used_mem_last_update_ or something along those lines

@@ -130,6 +130,8 @@ class ServerState { // public struct - to allow initialization.
gstate_ = s;
}

uint64_t GetCachedUsedMemory(uint64_t now_ns);
Collaborator

Cache can mean different things. As a start, I'd add a comment describing what this method does. Reading this first, I thought that this might be the memory of the items in cache (as in cache mode).
Perhaps call it GetLastUsedMemory()? I'd even call it GetUsedMemory() and comment that it might be stale for some short period of time.
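
For illustration, the declaration plus comment could read something like this (wording and name are hypothetical, not the final code):

// Returns this thread's view of the process used memory. The value is
// refreshed from used_mem_current at most once per kCacheEveryNs, so it may
// be stale for a short period of time.
uint64_t GetUsedMemory(uint64_t now_ns);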

@@ -71,6 +71,15 @@ void ServerState::Destroy() {
state_ = nullptr;
}

uint64_t ServerState::GetCachedUsedMemory(uint64_t now_ns) {
uint64_t kCacheEveryNs = 1000;
Collaborator

Make it constexpr to justify the k notation? :)
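
That is, something along these lines (same value, just declared constexpr):

constexpr uint64_t kCacheEveryNs = 1000;  // the k prefix usually marks a compile-time constant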

uint64_t kCacheEveryNs = 1000;
if (now_ns > last_chached_used_current_ + kCacheEveryNs) {
last_chached_used_current_ = now_ns;
used_mem_ = used_mem_current;
Collaborator

Maybe use used_mem_current.load(...) to be clear that you're caching an atomic var? It was an odd method to read before I realized that :)

@@ -71,6 +71,15 @@ void ServerState::Destroy() {
state_ = nullptr;
}

uint64_t ServerState::GetCachedUsedMemory(uint64_t now_ns) {
Collaborator

Why not calculate now_ns inside the method? Is it due to performance reasons?
It looks like now we call it once per DispatchCommand(), so it shouldn't be noticeable (right?)

Collaborator

I actually think it's better to pass time variables for several reasons.

  1. It's usually much harder to test code that internally relies on clock calls.
  2. Once the default is flipped and we declare that calling the clock internally is the better style, the code is immediately flooded with clock calls.
  3. Which brings us to the biggest thing: clock calls can be expensive. It depends on the hardware, virtualization, etc., but ask @dranikpg - I remember that at some point all the benchmarks he ran on his laptop were meaningless because the clock function took something like 30% of the total CPU in Dragonfly. (See the sketch after this list.)
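
A tiny sketch of the pattern (the class and names here are hypothetical; it only illustrates why injecting now_ns keeps both tests and the hot path cheap):

#include <cstdint>

// The caller samples the clock once and passes the timestamp in. Tests can
// feed any fake now_ns, and the hot path pays for a single clock call per
// dispatch instead of one per helper.
class ThrottledSampler {
 public:
  uint64_t RefreshIfStale(uint64_t now_ns, uint64_t fresh_value) {
    constexpr uint64_t kRefreshEveryNs = 1000;
    if (now_ns > last_update_ns_ + kRefreshEveryNs) {
      last_update_ns_ = now_ns;
      value_ = fresh_value;
    }
    return value_;
  }

 private:
  uint64_t last_update_ns_ = 0;
  uint64_t value_ = 0;
};

In a test, calling RefreshIfStale(0, x) and then RefreshIfStale(2000, y) exercises both branches without ever touching the real clock.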

Collaborator

Sure, then feel free to ignore

ABSL_FLAG(double, maxmemory_ratio, 1.1,
"commands with flag denyoom will return OOM when the ratio between maxmemory and used "
"memory is above "
"this value");
Collaborator

Merge with the line above?

@@ -917,6 +921,11 @@ void Service::DispatchCommand(CmdArgList args, facade::ConnectionContext* cntx)
}

uint64_t start_ns = ProactorBase::GetMonotonicTimeNs(), end_ns;
double maxmemory_ratio = GetFlag(FLAGS_maxmemory_ratio);
uint64_t used_memory = ServerState::tlocal()->GetCachedUsedMemory(start_ns);
if (used_memory > (max_memory_limit * maxmemory_ratio) && (cid->opt_mask() & CO::DENYOOM)) {
Collaborator

Also, please go over all write commands (there aren't many of them) and check which of them should have DENYOOM - I think some have it that shouldn't, and the opposite.

Collaborator Author

I did this in another PR

@@ -76,6 +76,10 @@ ABSL_FLAG(MaxMemoryFlag, maxmemory, MaxMemoryFlag{},
"Limit on maximum-memory that is used by the database. "
"0 - means the program will automatically determine its maximum memory usage. "
"default: 0");
ABSL_FLAG(double, maxmemory_ratio, 1.1,
Collaborator

Maybe call it oom_deny_ratio?

uint64_t ServerState::GetCachedUsedMemory(uint64_t now_ns) {
uint64_t kCacheEveryNs = 1000;
if (now_ns > last_chached_used_current_ + kCacheEveryNs) {
last_chached_used_current_ = now_ns;
Collaborator

chached -> cached

uint64_t kCacheEveryNs = 1000;
if (now_ns > last_chached_used_current_ + kCacheEveryNs) {
last_chached_used_current_ = now_ns;
used_mem_ = used_mem_current;
Collaborator

used_mem_current.load(memory_order_relaxed)

@@ -226,6 +228,8 @@ class ServerState { // public struct - to allow initialization.

absl::flat_hash_map<std::string, base::Histogram> call_latency_histos_;
uint32_t thread_index_ = 0;
uint64_t used_mem_ = 0;
Collaborator

Rename it to used_mem_cached_ and add a comment saying it's a thread-local cache of used_mem_current (something like the sketch below).
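
Putting the suggestions from this review together, the method might end up looking roughly like this (a sketch only, using the names proposed above; the final naming is up to the author):

uint64_t ServerState::GetUsedMemory(uint64_t now_ns) {
  constexpr uint64_t kCacheEveryNs = 1000;
  if (now_ns > used_mem_last_update_ + kCacheEveryNs) {
    used_mem_last_update_ = now_ns;
    // used_mem_cached_ is a thread-local cache of the global used_mem_current counter.
    used_mem_cached_ = used_mem_current.load(std::memory_order_relaxed);
  }
  return used_mem_cached_;
}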

fixes #1634
1. Add flag maxmemory_ratio.
2. When current used memory exceeds maxmemory_limit * maxmemory_ratio,
   commands flagged DENYOOM return an OOM error.

Signed-off-by: adi_holden <adi@dragonflydb.io>
Signed-off-by: adi_holden <adi@dragonflydb.io>
Signed-off-by: adi_holden <adi@dragonflydb.io>
Signed-off-by: adi_holden <adi@dragonflydb.io>
@@ -892,6 +895,11 @@ void Service::DispatchCommand(CmdArgList args, facade::ConnectionContext* cntx)
}

uint64_t start_ns = ProactorBase::GetMonotonicTimeNs(), end_ns;
double oom_deny_ratio = GetFlag(FLAGS_oom_deny_ratio);
uint64_t used_memory = ServerState::tlocal()->GetUsedMemory(start_ns);
Collaborator
@romange Aug 7, 2023

  1. We already have etl for ServerState::tlocal().
  2. Let's check first whether the mask has DENYOOM before fetching all the data. We are on the hot path (see the sketch after this list).
  3. @chakaz, you mentioned that GetFlag is heavy. Is that indeed the case? (I don't think we need to address that now.)
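
A sketch of point 2, reordered so non-DENYOOM commands exit the check immediately (the error reply and exact names are illustrative, not the final code):

if (cid->opt_mask() & CO::DENYOOM) {
  double oom_deny_ratio = GetFlag(FLAGS_oom_deny_ratio);
  uint64_t used_memory = ServerState::tlocal()->GetUsedMemory(start_ns);
  if (used_memory > max_memory_limit * oom_deny_ratio) {
    (*cntx)->SendError("Out Of Memory");  // illustrative reply text
    return;
  }
}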

Signed-off-by: adi_holden <adi@dragonflydb.io>
@adiholden requested a review from romange on August 8, 2023, 19:44
@adiholden merged commit 116934b into main on Aug 8, 2023
10 checks passed
@adiholden deleted the oom_guard branch on August 8, 2023, 20:26