add keep-cache option to "unbound-control reload" to keep caches #569
Conversation
@wcawijngaards could you take care of this?
This is on my todo list, but because of more pressing stuff it got jostled away. I will have a look this/next week.
I stumbled on this PR by accident, but I very much support this option. In my setup I tried to solve this by using …
Hi @jinmeiib!
Apart from the one bug I see, the code looks good.
For testing you could update testdata/09-unbound-control.tdir with the keep-cache test cases.
From a first look I am not 100% set on the `+keep-cache` extra option for the reload command since it deviates from all the other options. Maybe a separate `reload_keep_cache` command to make it more explicit that caution is needed and that it is not just a feature of the reload command (which should always work)?
This feature is more efficient than dump-and-load, but dump-and-load bypasses the technical difficulties around the number of threads and the cache sizes. Btw, when is such a feature (also the dump-and-load variant) useful?
daemon/daemon.c (Outdated)

```diff
@@ -745,7 +789,12 @@ daemon_cleanup(struct daemon* daemon)
 	free(daemon->workers);
 	daemon->workers = NULL;
 	daemon->num = 0;
```
This should move after `daemon_clear_allocs`, otherwise memory leaks happen.
daemon->num = 0;
Ah, that's right. Thank you for catching it.
I've fixed it at commit fccb2eb.
```c
	if(!daemon->reuse_cache || daemon->need_to_exit)
		daemon_clear_allocs(daemon);
```
Fixing the memory leak. Suggested change:

```c
	if(!daemon->reuse_cache || daemon->need_to_exit)
		daemon_clear_allocs(daemon);
	daemon->num = 0;
```
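For context on why the order matters, here is a minimal, self-contained sketch; the real `daemon_clear_allocs` body is not shown in this thread, so the loop below, the `worker_allocs` field name, and the stand-in structs are assumptions for illustration only.

```c
#include <stdlib.h>

/* Stand-in types so the sketch compiles on its own; only num, reuse_cache
 * and need_to_exit appear in the diffs quoted in this thread. */
struct alloc_cache { int unused; };
struct daemon {
	struct alloc_cache** worker_allocs; /* assumed field name */
	int num;
	int reuse_cache;
	int need_to_exit;
};

static void
daemon_clear_allocs(struct daemon* daemon)
{
	int i;
	/* assumed body: walks daemon->num entries.  If daemon->num is reset
	 * to 0 before this is called, the loop runs zero times and every
	 * per-worker alloc cache (and the cache data it backs) leaks. */
	for(i = 0; i < daemon->num; i++)
		free(daemon->worker_allocs[i]);
	free(daemon->worker_allocs);
	daemon->worker_allocs = NULL;
}

static void
daemon_cleanup_tail(struct daemon* daemon)
{
	/* hence the suggestion above: clear first, reset the count afterwards */
	if(!daemon->reuse_cache || daemon->need_to_exit)
		daemon_clear_allocs(daemon);
	daemon->num = 0;
}
```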
@gthess thank you for the review!
> From a first look I am not 100% set on the `+keep-cache` extra option for the reload command since it deviates from all the other options. Maybe a separate `reload_keep_cache` command to make it more explicit that caution is needed and that it is not just a feature of the reload command (which should always work)?
I don't have a strong opinion about it. Please feel free to rename it (or do you want me to update this PR with the change?).
> This feature is more efficient than dump-and-load, but dump-and-load bypasses the technical difficulties around the number of threads and the cache sizes. Btw, when is such a feature (also the dump-and-load variant) useful?
I'd leave it to @UladzimirTrehubenka, but I guess their operation requires occasional config changes (that don't affect the reuse of the cache) and they want to avoid the initial higher latency/lower throughput of repopulating the cache when it isn't reused. I also guess they prefer "keep cache" (over dump-and-load) to minimize the downtime on reload.
Almost a year for code review =(
I would have liked to include this in the 1.17.0 release, but time was an issue :(
@gthess I'm attaching the example test tpkg to this comment.
Perfect, thanks!
LGTM now, I will make the change for an explicit `reload_keep_cache` command and add the test cases.
* nlnet/master:
  - Use an explicit 'reload_keep_cache' command and introduce test cases for NLnetLabs#569.
  - prevent memory leak in case cache isn't reused
  - add keep-cache option to 'unbound-control reload' to keep caches
This PR implements a new option, `+keep-cache`, for `unbound-control`. It allows unbound to keep the RRset and message caches on reloading the configuration.

The main idea is to move the ownership of the per-worker `alloc_cache`s from the workers to the "daemon" and keep using them when that is deemed feasible. In my understanding this would (almost) eliminate the first reason for clearing the caches as commented in `daemon.c:daemon_cleanup`. The second reason is still an issue, and there are other config changes that would make the reuse of caches impossible or infeasible. This PR handles the first two cases explicitly (changes to the number of threads and to the cache sizes; see the change and comment in `daemon_apply_cfg`), but leaves the consideration of other cases as the operator's responsibility. That's why keeping the cache is optional and non-default.

I have an automated test that confirms the change, but its style is different from the other unbound tests, so it's not included in this PR. I'm also aware that some documentation updates will be necessary if (the idea of) this PR is adopted, but I'd first like to get feedback on the concept and the current code that implements it.
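To make the ownership idea above concrete, here is a self-contained toy sketch of the lifecycle, not the PR's actual code: only `reuse_cache`, `need_to_exit`, `num`, `workers`, and `daemon_clear_allocs` come from the diffs quoted in this thread, while `worker_allocs`, `daemon_create_workers`, and the stand-in structs are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy model: the daemon owns the per-worker caches, so tearing down and
 * re-creating the workers (a reload) does not destroy cached data unless
 * reuse is off or the daemon is exiting. */
struct alloc_cache { int cached_entries; };   /* stand-in for the real alloc cache */
struct worker { struct alloc_cache* alloc; }; /* the worker borrows, never owns */

struct daemon {
	struct worker** workers;
	struct alloc_cache** worker_allocs; /* owned by the daemon, may outlive workers */
	int num;
	int reuse_cache;                    /* set by the keep-cache reload variant */
	int need_to_exit;
};

static void daemon_clear_allocs(struct daemon* d)
{
	int i;
	for(i = 0; i < d->num; i++)
		free(d->worker_allocs[i]);
	free(d->worker_allocs);
	d->worker_allocs = NULL;
}

static void daemon_create_workers(struct daemon* d, int num)
{
	int i;
	if(!d->worker_allocs) {             /* first start, or the caches were dropped */
		d->worker_allocs = calloc(num, sizeof(*d->worker_allocs));
		for(i = 0; i < num; i++)
			d->worker_allocs[i] = calloc(1, sizeof(struct alloc_cache));
	}
	d->workers = calloc(num, sizeof(*d->workers));
	for(i = 0; i < num; i++) {
		d->workers[i] = calloc(1, sizeof(struct worker));
		d->workers[i]->alloc = d->worker_allocs[i]; /* hand the surviving cache back */
	}
	d->num = num;
}

static void daemon_cleanup(struct daemon* d)
{
	int i;
	for(i = 0; i < d->num; i++)
		free(d->workers[i]);
	free(d->workers);
	d->workers = NULL;
	if(!d->reuse_cache || d->need_to_exit)  /* the condition from the PR's diff */
		daemon_clear_allocs(d);
	d->num = 0;
}

int main(void)
{
	struct daemon d = {0};
	daemon_create_workers(&d, 2);
	d.worker_allocs[0]->cached_entries = 42; /* pretend the workers cached something */
	d.reuse_cache = 1;                       /* "reload, but keep the cache" */
	daemon_cleanup(&d);
	daemon_create_workers(&d, 2);            /* same thread count, so reuse is safe */
	printf("entries after reload: %d\n", d.worker_allocs[0]->cached_entries); /* 42 */
	d.reuse_cache = 0;
	d.need_to_exit = 1;
	daemon_cleanup(&d);                      /* final shutdown frees everything */
	return 0;
}
```

The toy deliberately ignores the complications named in the description (a changed number of threads or changed cache sizes), which is exactly why keeping the cache is opt-in rather than the default.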
I ran both `make test` and `make longtest` to confirm that the change in this PR won't introduce regressions. In the latter I saw occasional failures of `tcp_req_order` and `ssl_req_order`, but they also fail in the master branch, so I don't think the failures are due to this PR.

I'm looking forward to any feedback.