Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sockproc startup race conditions after running out of memory #76

Merged
merged 1 commit into from
Jun 18, 2017

Conversation

GUI
Copy link
Collaborator

@GUI GUI commented Jun 18, 2017

If the auto_ssl shared dict ran out of memory, and then nginx were reloaded, then a race condition existed where multiple "sockproc" processes could try to be started at the same time.

While I think this situation would be unlikely to affect a production system in a negative way (since the race condition only occurs during nginx reloads and 1 of the sockproc processes should still succeed and allow things to work), it did lead to some errors being logged, which would intermittently cause our tests to fail (it would crop up on the test following our t/memory.t test, since that next test would be the first reload following our test to explicitly exhaust the memory).

This is fixed by using the auto_ssl_settings shared dict for storing the resty-lock details (the lock prevents multiple processes from being started at once). This smaller shared dict (introduced in #68) is used for storing bits of data that won't grow in size so we can better ensure the data will never be evicted from the cache. I'm now able to repeatedly run the test suite in a loop without hitting this edge case.

Note that we are still using the auto_ssl shared dict for storing resty-lock details for domain registrations, since the memory requirements for that may grow (since there's a lock per domain, it's dynamic in size). But that should be okay, because similar to the SSL certs stored in auto_ssl, we're okay with cache evictions for old data in those cases (along with warnings being logged).

Also possibly relevant is that currently resty-lock always uses add for the shared dict (so it will evict old data to make room if necessary), but there's a pull request for allowing use of safe_add: openresty/lua-resty-lock#6 While we should be okay by switching things to auto_ssl_settings (since we should never have enough stored data in there to need evicting old data), it's something to keep an eye on.

Related to:

If the `auto_ssl` shared dict ran out of memory, and then nginx were
reloaded, then a race condition existed where multiple "sockproc"
processes could try to be started at the same time.

While I think this situation would be unlikely to affect a production
system in a negative way (since the race condition only occurs during
nginx reloads and 1 of the sockproc processes should still succeed
and allow things to work), it did lead to some errors being logged,
which would intermittently cause our tests to fail (it would crop up on
the test following our t/memory.t test, since that next test would be
the first reload following our test to explicitly exhaust the memory).

This is fixed by using the `auto_ssl_settings` shared dict for storing
the resty-lock details (the lock prevents multiple processes from being
started at once). This smaller shared dict (introduced in #68) is used
for storing bits of data that won't grow in size so we can better ensure
the data will never be evicted from the cache. I'm now able to
repeatedly run the test suite in a loop without hitting this edge case.

Note that we are still using the `auto_ssl` shared dict for storing
resty-lock details for domain registrations, since the memory
requirements for that may grow (since there's a lock per domain, it's
dynamic in size). But that should be okay, because similar to the SSL
certs stored in `auto_ssl`, we're okay with cache evictions for old data
in those cases (along with warnings being logged).

Also possibly relevant is that currently resty-lock always uses `add`
for the shared dict (so it will evict old data to make room if
necessary), but there's a pull request for allowing use of `safe_add`:
openresty/lua-resty-lock#6 While we should be
okay by switching things to `auto_ssl_settings` (since we should never
have enough stored data in there to need evicting old data), it's
something to keep an eye on.

Related to:
- #68 (comment)
- #48 (comment)
@GUI GUI merged commit 4cd8cc1 into master Jun 18, 2017
@GUI GUI deleted the sockproc-lock-race branch June 18, 2017 15:06
@GUI GUI added this to the v0.11.0 milestone Jun 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant