Attempt to get length of local 'certs' (a nil value) #48
So in tracing through the code, I think this should have only been possible if OpenResty's built in I've fixed lua-resty-auto-ssl so that it will log and return a proper error when those builtin functions fail (035dcba), and that update is published in the v0.10.4 release. I've also included some fixes in v0.10.4 which should hopefully handle unexpected errors like this more gracefully (so you might not have to fully restart things if an unexpected error like this crops up). That being said, I'm still not exactly sure why So could you try updating to v0.10.4 and then we can see what the logs say when those built-in functions fail? Sorry for the trouble and hassle, but hopefully with v0.10.4's logging we can get to the bottom of this. If you get better error logs from v0.10.4, could you also provide which version of OpenResty and OpenSSL you're running, and whether this seems to be happening for existing certs, or just when certs are renewed? Thanks!
I think I'm experiencing a similar error to this. Here are my logs:
About 2 minutes after this happens, the docker container becomes totally unresponsive and only a restart fixes it. I'm on v0.10.3-1 of this plugin.
Hey, I've upgraded to the latest version and now I get the following error in my logs:
This seems to happen around the time when new certificates are generated. Edit: this appears to be coming from this file: https://github.com/GUI/lua-resty-auto-ssl/blob/f7fe3e62c081d77c7d021fcc10371ebb26c97bc0/lib/resty/auto-ssl/ssl_providers/lets_encrypt.lua#L15 maybe. Edit 2: I'm also not overriding the
If the hook secret was nil, then it was leading to errors, but the backtraces were a bit odd, since they weren't reporting the right line number: #48. Still need to figure out a better way to ensure hook_secret doesn't disappear if the shared dict gets full, but this at least improves the errors to make things easier to understand.
If the shared dict ran out of memory, then the hook secret could end up getting removed, leading to errors like this: #48 (comment). By storing the hook secret as a global variable, it should no longer be subject to getting removed from the shared dict.
This should more generally help with potential issues seen in #48 where old items might get removed from the shdict if there are too many certificates for the configured amount of memory. By logging this as an error, it will hopefully make debugging and alerting to the real issue easier (need to increase shdict memory). Since we don't store duplicate certs for domains, and the certificate data is stored with a TTL, I think this should be okay to log at the error level, since if this happens, things aren't going well, since there's not enough memory for active domains.
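To make the failure mode in these commit messages concrete, here is a minimal Python model of it. This is not the project's Lua code: `ToySharedDict`, `store_certs`, the key names, and the item-count capacity are all assumptions for illustration (a real nginx shared dict is sized in bytes). The only part borrowed from OpenResty is the shape of the `set` result, `(ok, err, forcible)`, where `forcible` reports that older data was evicted to make room.

```python
from collections import OrderedDict


class ToySharedDict:
    """Toy stand-in for an nginx shared dict: fixed capacity, LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity  # counted in items here; the real dict is sized in bytes
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # reading marks the key as recently used
        return self.items[key]

    def set(self, key, value):
        """Store a value and return (ok, err, forcible)."""
        forcible = False
        if key in self.items:
            self.items.move_to_end(key)
        elif len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
            forcible = True
        self.items[key] = value
        return True, None, forcible


def store_certs(shared, domains):
    """Store one cert per domain; collect an error line for each forced eviction."""
    errors = []
    for domain in domains:
        _ok, _err, forcible = shared.set("domain:" + domain, "cert data")
        if forcible:
            errors.append("shared dict full, evicted old data while storing " + domain)
    return errors


auto_ssl = ToySharedDict(capacity=3)
auto_ssl.set("hook_server:secret", "s3cret")  # hypothetical key name
errors = store_certs(auto_ssl, ["a.example", "b.example", "c.example"])

print(auto_ssl.get("hook_server:secret"))  # the secret is gone once the dict filled up
print(errors)
```

In this model the third certificate pushes the secret out of the dict, which is the situation the commits above guard against: keeping the secret outside the evictable store, and logging forced evictions loudly so the undersized dict gets noticed.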
@domharrington: Thanks for reporting this and for the details! I think I was able to track down what was happening. The line number on the error backtrace was a bit off for some reason, but the culprit was So if this theory is correct, you might need to increase the configured size of In any case, I've released v0.10.6, which I'm hoping should fix this in a couple different ways:
I'm hoping those updates in v0.10.6 should make things less fatal and easier to track down if your configured amount of memory is too low. This might also explain some other unresponsive issues a couple other people reported, but we were never able to get to the bottom of, so thanks again for the report. Caveat: I'm not entirely certain this error is related to the original issue @gjongenelen reported.
Hey @GUI, thank you very much for investigating this and coming up with a potential solution. We do currently have Thank you very much again for taking the time to look into this. I will upgrade to 0.10.6 now. Dom
@domharrington did you experience any more issues after bumping the limit and upgrading to the latest version?
@luto nope. It's been very stable since deploying the latest update.
Great, thank you for confirming! Going to go ahead and close this then.
If the `auto_ssl` shared dict ran out of memory, and then nginx were reloaded, then a race condition existed where multiple "sockproc" processes could try to be started at the same time.
While I think this situation would be unlikely to affect a production system in a negative way (since the race condition only occurs during nginx reloads and 1 of the sockproc processes should still succeed and allow things to work), it did lead to some errors being logged, which would intermittently cause our tests to fail (it would crop up on the test following our t/memory.t test, since that next test would be the first reload following our test to explicitly exhaust the memory).
This is fixed by using the `auto_ssl_settings` shared dict for storing the resty-lock details (the lock prevents multiple processes from being started at once). This smaller shared dict (introduced in #68) is used for storing bits of data that won't grow in size so we can better ensure the data will never be evicted from the cache. I'm now able to repeatedly run the test suite in a loop without hitting this edge case.
Note that we are still using the `auto_ssl` shared dict for storing resty-lock details for domain registrations, since the memory requirements for that may grow (since there's a lock per domain, it's dynamic in size). But that should be okay, because similar to the SSL certs stored in `auto_ssl`, we're okay with cache evictions for old data in those cases (along with warnings being logged).
Also possibly relevant is that currently resty-lock always uses `add` for the shared dict (so it will evict old data to make room if necessary), but there's a pull request for allowing use of `safe_add`: openresty/lua-resty-lock#6 While we should be okay by switching things to `auto_ssl_settings` (since we should never have enough stored data in there to need evicting old data), it's something to keep an eye on.
Related to:
- #68 (comment)
- #48 (comment)
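The difference between `add` and `safe_add` described in this commit message, and why a lock kept in a crowded dict can be taken twice, can be sketched with a small Python model. This is a toy under stated assumptions rather than resty-lock's implementation: `ToySharedDict`, the key names, and the item-count capacity are hypothetical. Only the behavior is taken from the text above, namely that `add` evicts old data to make room while `safe_add` refuses when the dict is full.

```python
from collections import OrderedDict


class ToySharedDict:
    """Toy stand-in for an nginx shared dict (capacity counted in items, an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def add(self, key, value):
        """Insert if absent; when full, evict the oldest item to make room."""
        if key in self.items:
            return False, "exists", False
        forcible = len(self.items) >= self.capacity
        if forcible:
            self.items.popitem(last=False)
        self.items[key] = value
        return True, None, forcible

    def safe_add(self, key, value):
        """Insert if absent; when full, refuse rather than evict."""
        if key in self.items:
            return False, "exists"
        if len(self.items) >= self.capacity:
            return False, "no memory"
        self.items[key] = value
        return True, None


# Case 1: the lock shares a crowded dict with certificate data.
crowded = ToySharedDict(capacity=2)
first_lock = crowded.add("lock:sockproc", 1)    # process A takes the lock
crowded.add("domain:a.example", "cert")
crowded.add("domain:b.example", "cert")         # dict is full: the lock entry is evicted
second_lock = crowded.add("lock:sockproc", 1)   # process B takes the "same" lock too

# Case 2: the lock lives in a small dict that never fills up.
settings = ToySharedDict(capacity=2)
settings.add("lock:sockproc", 1)
blocked = settings.add("lock:sockproc", 1)      # the second attempt is rejected

# Case 3: safe_add on the full dict refuses instead of evicting.
refused = crowded.safe_add("domain:c.example", "cert")

print(first_lock, second_lock, blocked, refused)
```

In case 1 both attempts succeed because the lock entry was evicted in between, which mirrors the race described above. In case 2 the second attempt fails with `"exists"`, which is the property the switch to `auto_ssl_settings` is meant to preserve.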
Hi,
I've been using auto-ssl for a while now, but I regularly run into an issue.
Once in a while this error comes up, which results in a handshake error. Restarting the nginx instances solves the problem. I'm using Redis as storage.
2017/02/23 07:45:01 [error] 25#0: *3177 lua entry thread aborted: runtime error: /usr/local/openresty/lualib/ngx/ocsp.lua:54: attempt to get length of local 'certs' (a nil value)
stack traceback:
coroutine 0:
/usr/local/openresty/lualib/ngx/ocsp.lua: in function 'get_ocsp_responder_from_der_chain'
.../luajit/share/lua/5.1/resty/auto-ssl/ssl_certificate.lua:123: in function 'get_ocsp_response'
.../luajit/share/lua/5.1/resty/auto-ssl/ssl_certificate.lua:180: in function 'set_ocsp_stapling'
.../luajit/share/lua/5.1/resty/auto-ssl/ssl_certificate.lua:210: in function 'set_cert'
.../luajit/share/lua/5.1/resty/auto-ssl/ssl_certificate.lua:252: in function 'ssl_certificate'
.../local/openresty/luajit/share/lua/5.1/resty/auto-ssl.lua:70: in function 'ssl_certificate'
ssl_certificate_by_lua:2: in function <ssl_certificate_by_lua:1>, context: ssl_certificate_by_lua*, client: 10.42.205.225, server: 0.0.0.0:443
2017/02/23 07:45:01 [crit] 25#0: *3176 SSL_do_handshake() failed (SSL: error:1408A179:SSL routines:ssl3_get_client_hello:cert cb error) while SSL handshaking, client: 10.42.205.225, server: 0.0.0.0:443
What could be going wrong here?
Gino