add keep-cache option to "unbound-control reload" to keep caches #569

jinmeiib · 2021-11-11T18:49:56Z

This PR implements a new option +keep-cache to unbount-control. It allows unbound to keep the RRset and message caches on reloading the configuration.

The main idea is to move the ownership of the per-worker alloc_caches from workers to "daemon" and keep using them when it's deemed to be feasible. In my understanding this would (almost) eliminate the first reason for clearing the caches as commented in daemon.c:daemon_cleanup:

	/* clean up caches because
	 * a) RRset IDs will be recycled after a reload, causing collisions
	 * b) validation config can change, thus rrset, msg, keycache clear */

The second reason is still an issue. And, there are other config changes that would make the reuse of caches impossible or infeasible:

changing the number of threads
changing size of rrset or message cache (then the corresponding cache is re-created and old cache entries are invalidated anyway)
changing the forwarder address
probably more

This PR handles the first two cases explicitly (see the change and comment in daemon_apply_cfg), but leaves the consideration on other cases as the operator's responsibility. That's why keeping the cache is optional and non-default.

I have an automated test that confirms the change, but its style is different from other unbound tests, so it's not included in this PR. I'm also aware that some documentation update will be necessary if (the idea of) this PR is adopted, but I'd first like to get feedback on the concept and the current code that implements it.

I ran both make test and make longtest to confirm the change of this PR won't introduce regression. In the latter I saw occaisional failure of tcp_req_order and ssl_req_order, but they also fail in the master branch, so I don't think the failure is due to this PR.

I'm looking forward to any feedback.

UladzimirTrehubenka · 2022-06-08T09:26:48Z

@wcawijngaards could you take care?

gthess · 2022-06-08T10:03:27Z

This is in my todo list but because of more pressing stuff it got jostled away. I will have a look this/next week.

penguinpee · 2022-06-27T12:05:09Z

I stumbled on this PR by accident. But I very much support this option. In my setup I tried to solve this by using systemd jobs tied to unbound.service that do dump_cache, load_cache respectively. I'd be willing to help with testing and/or documentation, if needed.

gthess

Hi @jinmeiib!

Apart from the one bug I see, the code looks good.
For testing you could update testdata/09-unbound-control.tdir with the keep-cache test cases.

From a first look I am not 100% set on the +keep-cache extra option for the reload command since it deviates from all the other options.
Maybe a separate reload_keep_cache command to make it more explicit that caution is needed and is not just a feature of the reload command (that should always work)?

This feature is more efficient than dump-and-load, but dump-and-load bypasses the technical difficulties of the threads' number and cache sizes.

Btw, when is such a feature (also the dump-and-load variant) useful?

gthess · 2022-07-21T11:46:45Z

daemon/daemon.c

@@ -745,7 +789,12 @@ daemon_cleanup(struct daemon* daemon)
 	free(daemon->workers);
 	daemon->workers = NULL;
 	daemon->num = 0;


This should move after daemon_clear_allocs otherwise memory leaks happen.

Suggested change

daemon->num = 0;

Ah, that's right. Thank you for catching it.

I've fixed it at commit fccb2eb.

gthess · 2022-07-21T11:47:05Z

daemon/daemon.c

+	if(!daemon->reuse_cache || daemon->need_to_exit)
+		daemon_clear_allocs(daemon);


Fixing the memory leak

Suggested change

if(!daemon->reuse_cache || daemon->need_to_exit)

daemon_clear_allocs(daemon);

if(!daemon->reuse_cache || daemon->need_to_exit)

daemon_clear_allocs(daemon);

daemon->num = 0;

@gthess thank you for the review!

From a first look I am not 100% set on the +keep-cache extra option for the reload command since it deviates from all the other options. Maybe a separate reload_keep_cache command to make it more explicit that caution is needed and is not just a feature of the reload command (that should always work)?

I don't have a strong opinion about it. Please feel free to rename it (or do you want me to update this PR with the change?).

This feature is more efficient than dump-and-load, but dump-and-load bypasses the technical difficulties of the threads' number and cache sizes.

Btw, when is such a feature (also the dump-and-load variant) useful?

I'd leave it to @UladzimirTrehubenka but I guess their operation requires occasional config changes (that don't affect the reuse of the cache) and they don't want to avoid the initial higher latency/lower throughput while populating the cache when the cache isn't reused. I also guess they want to prefer "keep cache" (over dump-and-load) to minimize the down time on reload.

UladzimirTrehubenka · 2022-10-06T07:00:40Z

Almost year for code review =(

gthess · 2022-10-14T19:17:43Z

I would like to have included this on the 1.17.0 release, but time was an issue :(
The PR for me is ready except for renaming the command from reload +keep-cache to reload_keep_cache, and introducing some test cases.
@jinmeiib, how much do you want to be involved with these? I could take over but I would appreciate if you could share your current test cases and I could try incorporating them in Unbound's testing directories.

jinmeiib · 2022-11-28T14:33:06Z

@gthess I'm attaching the example test tpkg to this comment.
infoblox_unbound_reload.tar.gz

gthess · 2022-11-28T14:35:32Z

Perfect, thanks!

gthess

LGTM now, will make the change for an explicit reload_keep_cache command and add the testcases.

for #569.

'unbound-control reload' to keep caches.

* nlnet/master: - Use an explicit 'reload_keep_cache' command and introduce test cases for NLnetLabs#569. prevent memory leak in case cache isn't reused add keep-cache option to unbound-control reload to keep caches

add keep-cache option to unbound-control reload to keep caches

5b2eda2

gthess self-assigned this Jun 8, 2022

gthess requested changes Jul 21, 2022

View reviewed changes

prevent memory leak in case cache isn't reused

fccb2eb

gthess added this to the 1.17.1 milestone Nov 29, 2022

gthess approved these changes Dec 14, 2022

View reviewed changes

gthess added a commit that referenced this pull request Dec 14, 2022

- Use an explicit 'reload_keep_cache' command and introduce test cases

7716d26

for #569.

gthess added a commit that referenced this pull request Dec 14, 2022

- Merge #569 from JINMEI Tatuya: add keep-cache option to

0682d43

'unbound-control reload' to keep caches.

gthess merged commit 857d6ce into NLnetLabs:master Dec 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add keep-cache option to "unbound-control reload" to keep caches #569

add keep-cache option to "unbound-control reload" to keep caches #569

jinmeiib commented Nov 11, 2021

UladzimirTrehubenka commented Jun 8, 2022

gthess commented Jun 8, 2022

penguinpee commented Jun 27, 2022 •

edited

Loading

gthess left a comment

gthess Jul 21, 2022

jinmeiib Jul 22, 2022

jinmeiib Jul 22, 2022

gthess Jul 21, 2022

jinmeiib Jul 22, 2022

UladzimirTrehubenka commented Oct 6, 2022 •

edited

Loading

gthess commented Oct 14, 2022 •

edited

Loading

jinmeiib commented Nov 28, 2022

gthess commented Nov 28, 2022

gthess left a comment

		if(!daemon->reuse_cache \|\| daemon->need_to_exit)
		daemon_clear_allocs(daemon);

add keep-cache option to "unbound-control reload" to keep caches #569

add keep-cache option to "unbound-control reload" to keep caches #569

Conversation

jinmeiib commented Nov 11, 2021

UladzimirTrehubenka commented Jun 8, 2022

gthess commented Jun 8, 2022

penguinpee commented Jun 27, 2022 • edited Loading

gthess left a comment

Choose a reason for hiding this comment

gthess Jul 21, 2022

Choose a reason for hiding this comment

jinmeiib Jul 22, 2022

Choose a reason for hiding this comment

jinmeiib Jul 22, 2022

Choose a reason for hiding this comment

gthess Jul 21, 2022

Choose a reason for hiding this comment

jinmeiib Jul 22, 2022

Choose a reason for hiding this comment

UladzimirTrehubenka commented Oct 6, 2022 • edited Loading

gthess commented Oct 14, 2022 • edited Loading

jinmeiib commented Nov 28, 2022

gthess commented Nov 28, 2022

gthess left a comment

Choose a reason for hiding this comment

penguinpee commented Jun 27, 2022 •

edited

Loading

UladzimirTrehubenka commented Oct 6, 2022 •

edited

Loading

gthess commented Oct 14, 2022 •

edited

Loading