Fix sockproc startup race conditions after running out of memory #76
If the `auto_ssl` shared dict ran out of memory and nginx was then reloaded, a race condition existed where multiple "sockproc" processes could try to start at the same time.

While I think this situation would be unlikely to negatively affect a production system (since the race condition only occurs during nginx reloads, and one of the sockproc processes should still succeed and allow things to work), it did lead to some errors being logged, which would intermittently cause our tests to fail. It would crop up on the test following our `t/memory.t` test, since that next test would be the first reload after our test explicitly exhausts the memory.
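To make the startup guard concrete, here is a minimal sketch of a resty-lock based guard around sockproc startup, written as an nginx config fragment. It is not the project's actual code: the dict sizes, the `start_sockproc` placeholder, the `sockproc_started` flag, and the lock key are assumptions for illustration, and it assumes an OpenResty build where `resty.lock` is on the Lua package path. It keeps the lock in a small, fixed-size `auto_ssl_settings` dict so that memory pressure in the larger `auto_ssl` dict cannot evict it.

```nginx
http {
  # Larger dict: certificates and per-domain data; evictions are tolerated.
  # (Sizes here are assumed for illustration.)
  lua_shared_dict auto_ssl 1m;
  # Small dict: fixed-size bookkeeping, including the startup lock.
  lua_shared_dict auto_ssl_settings 64k;

  init_worker_by_lua_block {
    local resty_lock = require "resty.lock"
    local settings = ngx.shared.auto_ssl_settings

    -- Hypothetical placeholder for whatever actually launches sockproc.
    local function start_sockproc()
      ngx.log(ngx.NOTICE, "starting sockproc (placeholder)")
    end

    -- timeout = 0 makes lock() return immediately instead of waiting,
    -- so workers that lose the race skip startup rather than block.
    local lock, new_err = resty_lock:new("auto_ssl_settings", {
      exptime = 600,
      timeout = 0,
    })
    if not lock then
      ngx.log(ngx.ERR, "failed to create lock: ", new_err)
      return
    end

    local elapsed, lock_err = lock:lock("start_sockproc")
    if not elapsed then
      -- Another worker currently holds the lock; let it do the work.
      return
    end

    -- Illustrative flag so that a worker acquiring the lock later does not
    -- start a second process. Resetting it across reloads is not shown.
    if not settings:get("sockproc_started") then
      local ok, start_err = pcall(start_sockproc)
      if ok then
        settings:set("sockproc_started", true)
      else
        ngx.log(ngx.ERR, "failed to start sockproc: ", start_err)
      end
    end

    local unlocked, unlock_err = lock:unlock()
    if not unlocked then
      ngx.log(ngx.ERR, "failed to unlock: ", unlock_err)
    end
  }
}
```

The guard is only as reliable as the dict backing it: if the lock entry can be evicted while held, two workers can both believe they own the lock, which is why the dict holding it should never fill up.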
This is fixed by using the `auto_ssl_settings` shared dict for storing the resty-lock details (the lock prevents multiple processes from being started at once). This smaller shared dict (introduced in #68) is used for storing bits of data that won't grow in size, so we can better ensure the data will never be evicted from the cache. I'm now able to repeatedly run the test suite in a loop without hitting this edge case.

Note that we are still using the `auto_ssl` shared dict for storing resty-lock details for domain registrations, since the memory requirements for that may grow (there's a lock per domain, so it's dynamic in size). But that should be okay because, similar to the SSL certs stored in `auto_ssl`, we're okay with cache evictions for old data in those cases (along with warnings being logged).

Also possibly relevant: resty-lock currently always uses `add` for the shared dict (so it will evict old data to make room if necessary), but there's a pull request to allow using `safe_add`: openresty/lua-resty-lock#6. While we should be okay after switching things to `auto_ssl_settings` (since we should never have enough data stored in there to need evicting old data), it's something to keep an eye on.

Related to: