-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(server): add oom guard #1650
Conversation
src/server/server_state.h
Outdated
@@ -226,6 +228,8 @@ class ServerState { // public struct - to allow initialization. | |||
|
|||
absl::flat_hash_map<std::string, base::Histogram> call_latency_histos_; | |||
uint32_t thread_index_ = 0; | |||
uint64_t used_mem_ = 0; | |||
uint64_t last_chached_used_current_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo (chached -> cached)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further, I'd rename it to used_mem_last_update_
or something along those lines
src/server/server_state.h
Outdated
@@ -130,6 +130,8 @@ class ServerState { // public struct - to allow initialization. | |||
gstate_ = s; | |||
} | |||
|
|||
uint64_t GetCachedUsedMemory(uint64_t now_ns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cache can mean different things. As a start, I'd add a comment describing what this method does. Reading this first, I thought that this might be the memory of the items in cache (as in cache mode).
Perhaps call it GetLastUsedMemory()
? I'd even call it GetUsedMemory()
and comment that it might be stale for some short period of time..
src/server/server_state.cc
Outdated
@@ -71,6 +71,15 @@ void ServerState::Destroy() { | |||
state_ = nullptr; | |||
} | |||
|
|||
uint64_t ServerState::GetCachedUsedMemory(uint64_t now_ns) { | |||
uint64_t kCacheEveryNs = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make is constexpr
to justify the k
notation? :)
src/server/server_state.cc
Outdated
uint64_t kCacheEveryNs = 1000; | ||
if (now_ns > last_chached_used_current_ + kCacheEveryNs) { | ||
last_chached_used_current_ = now_ns; | ||
used_mem_ = used_mem_current; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use used_mem_current.load(...)
to be clear that you're caching an atomic var? It was an odd method to read before I realized that :)
src/server/server_state.cc
Outdated
@@ -71,6 +71,15 @@ void ServerState::Destroy() { | |||
state_ = nullptr; | |||
} | |||
|
|||
uint64_t ServerState::GetCachedUsedMemory(uint64_t now_ns) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not calculate now_ns
inside the method? Is it due to performance reasons?
It looks like now we call it once per DispatchCommand()
, so it shouldn't be noticeable (right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think it's better to pass time variables for several reasons.
- It's usually much harder to test code the internally relies on clock calls.
- Once the default is flipped and we declare it's better style calling the clock internally - the code is immediately flooded with clock calls.
- Which brings us to the biggest thing - clock calls can be expensive. It depends on the hardware, virtualisation etc but ask @dranikpg - i remember that at some point all the benchmarks he did on his laptop were meaningless because the clock function took like 30% of the total CPU in Dragonfly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, then feel free to ignore
src/server/main_service.cc
Outdated
ABSL_FLAG(double, maxmemory_ratio, 1.1, | ||
"commands with flag denyoom will return OOM when the ratio between maxmemory and used " | ||
"memory is above " | ||
"this value"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge with the line above?
src/server/main_service.cc
Outdated
@@ -917,6 +921,11 @@ void Service::DispatchCommand(CmdArgList args, facade::ConnectionContext* cntx) | |||
} | |||
|
|||
uint64_t start_ns = ProactorBase::GetMonotonicTimeNs(), end_ns; | |||
double maxmemory_ratio = GetFlag(FLAGS_maxmemory_ratio); | |||
uint64_t used_memory = ServerState::tlocal()->GetCachedUsedMemory(start_ns); | |||
if (used_memory > (max_memory_limit * maxmemory_ratio) && (cid->opt_mask() & CO::DENYOOM)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also please go over all write commands (there are not many of them) and check which one of them should have DENYOOM - I think some have but they should they not and the opposite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this in another PR
src/server/main_service.cc
Outdated
@@ -76,6 +76,10 @@ ABSL_FLAG(MaxMemoryFlag, maxmemory, MaxMemoryFlag{}, | |||
"Limit on maximum-memory that is used by the database. " | |||
"0 - means the program will automatically determine its maximum memory usage. " | |||
"default: 0"); | |||
ABSL_FLAG(double, maxmemory_ratio, 1.1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call - oom_deny_ratio
?
src/server/server_state.cc
Outdated
uint64_t ServerState::GetCachedUsedMemory(uint64_t now_ns) { | ||
uint64_t kCacheEveryNs = 1000; | ||
if (now_ns > last_chached_used_current_ + kCacheEveryNs) { | ||
last_chached_used_current_ = now_ns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chached -> cached
src/server/server_state.cc
Outdated
uint64_t kCacheEveryNs = 1000; | ||
if (now_ns > last_chached_used_current_ + kCacheEveryNs) { | ||
last_chached_used_current_ = now_ns; | ||
used_mem_ = used_mem_current; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used_mem_current.load(memory_order_relaxed)
src/server/server_state.h
Outdated
@@ -226,6 +228,8 @@ class ServerState { // public struct - to allow initialization. | |||
|
|||
absl::flat_hash_map<std::string, base::Histogram> call_latency_histos_; | |||
uint32_t thread_index_ = 0; | |||
uint64_t used_mem_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used_mem_cached_
and add a comment saying it's a thread local cache of used_mem_current
fixes #1634 1. add flag maxmemory_ratio 2. When current used memory * maxmemory_ratio > maxmemory_limit denyoom commands will return oom error. Signed-off-by: adi_holden <adi@dragonflydb.io>
src/server/main_service.cc
Outdated
@@ -892,6 +895,11 @@ void Service::DispatchCommand(CmdArgList args, facade::ConnectionContext* cntx) | |||
} | |||
|
|||
uint64_t start_ns = ProactorBase::GetMonotonicTimeNs(), end_ns; | |||
double oom_deny_ratio = GetFlag(FLAGS_oom_deny_ratio); | |||
uint64_t used_memory = ServerState::tlocal()->GetUsedMemory(start_ns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- we already have
etl
forServerState::tlocal()
. - Lets check first if the mask has DENYOOM before fetching all the data. We are on the hotpath.
- @chakaz you mentioned that GetFlag is heavy. is it indeed the case? (do not think we need to address that now).
Signed-off-by: adi_holden <adi@dragonflydb.io>
fixes #1634