-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
serf id should be the different when restart #1739
Comments
Same problem as #1623 |
This changes the seeding technique for LuaJIT's PRNG from using a combination of `time (s precision) + worker PID` to using OpenSSL's `RAND_bytes()`. Reasoning: in modern deployment setups such as application containers and AWS AMIs (etc...), it is a common practise to deploy from a forked VM, resulting in high chances of collision for PIDs at a seconds precision. This could result in duplicated PRNG seeds, which is ultimately the PRNG used to generate UUIDs in Kong, by the use of [lua-resty-jit-uuid](https://github.com/thibaultcha/lua-resty-jit-uuid). Solution: in order to have a higher entropy when seeding LuaJIT's PRNG, a proposed fix was to use `/dev/urandom`. This implementation however uses OpenSSL's `RAND_bytes()`, which has the advantage of returning an error if the entropy is estimated to be too low. However, this won't cover use cases where the VM has been forked, resulting in multiple VM clones with a high entropy, but equal to that of the other clones. We suggest that such deployment environment increase their cloned VMs entropy before starting Kong. Full changelog: * use OpenSSL's `RAND_bytes()` to read random bytes * truncate the final seed to 12 digits to prevent integer overflows * update fallback seeding technique (time + worker PID) to use ms precision, just in case * introduce a new `kong` lua shared dict. This dictionary's purpose is to hold essential data through Kong's lifecycle, and should eventually only be used through `safe_set()` (an abstraction for this could be envisaged later on, but is not the purpose of this patch) * chosen seeds for each worker are stored in the kong shm, and can be consulted via the `/` endpoint. There is currently no way to re-seed all the workers at once unless by sending `SIGHUP`, because only 1 worker would be receiving such a request through the Kong Admin API. * update `debug.traceback()` calls to use lvl 2 of the call stack, to show the actual caller of our patched `math.randomseed()` * update log messages to be more explicit Fix #1751 #1739 #1623
@WALL-E , I just pushed a hotfix to #1754 which should be applicable to Note: a behavior we saw in a production deployment is that forked VMs provide the same random bytes from Much appreciated! |
I test the patch, it's ok.
|
I review the patch code, I think It may not be related to the problem I found. @thibaultcha I hope you can give me some tips. |
This changes the seeding technique for LuaJIT's PRNG from using a combination of `time (s precision) + worker PID` to using OpenSSL's `RAND_bytes()`. Reasoning: in modern deployment setups such as application containers and AWS AMIs (etc...), it is a common practise to deploy from a forked VM, resulting in high chances of collision for PIDs at a seconds precision. This could result in duplicated PRNG seeds, which is ultimately the PRNG used to generate UUIDs in Kong, by the use of [lua-resty-jit-uuid](https://github.com/thibaultcha/lua-resty-jit-uuid). Solution: in order to have a higher entropy when seeding LuaJIT's PRNG, a proposed fix was to use `/dev/urandom`. This implementation however uses OpenSSL's `RAND_bytes()`, which has the advantage of returning an error if the entropy is estimated to be too low. However, this won't cover use cases where the VM has been forked, resulting in multiple VM clones with a high entropy, but equal to that of the other clones. We suggest that such deployment environment increase their cloned VMs entropy before starting Kong. Full changelog: * use OpenSSL's `RAND_bytes()` to read random bytes * truncate the final seed to 12 digits to prevent integer overflows * update fallback seeding technique (time + worker PID) to use ms precision, just in case * introduce a new `kong` lua shared dict. This dictionary's purpose is to hold essential data through Kong's lifecycle, and should eventually only be used through `safe_set()` (an abstraction for this could be envisaged later on, but is not the purpose of this patch) * chosen seeds for each worker are stored in the kong shm, and can be consulted via the `/` endpoint. There is currently no way to re-seed all the workers at once unless by sending `SIGHUP`, because only 1 worker would be receiving such a request through the Kong Admin API. * update `debug.traceback()` calls to use lvl 2 of the call stack, to show the actual caller of our patched `math.randomseed()` * update log messages to be more explicit Fix #1751 #1739 #1623
Your issue is that both of your serf ids are identical: vagrant-101_172.28.32.101:7946_465a78ad93cc432ea8369824d49506d6 Here, the bold part is supposed to distinguish several nodes that would share the same hostname. It is generated there there, using
The LuaJIT PRNG needs to be seeded, using a unique seed for each and every worker accross your cluster. If 2 seeds are shared by 2 workers, those will eventually generate identical pseudo-random numbers, and thus, identical v4 UUIDs. The patch helps increasing the entropy for the PRNG seed by relying on OpenSSL's Now that you have some more context around this issue, if you could describe what behavior you are seeing, that would help, because as of now I do not understand what issues you are facing, or if you are facing any at all. |
If you could also include details on your deployment setup, that would help too. |
deployment detail
Reproduce step
issue: random string always is 465a78ad93cc432ea8369824d49506d6, even if I reboot many times and every time I delete the file in addition, centos+vagrant+virtualbox, the same issue will be reproduced, I see 465a78ad93cc432ea8369824d49506d6 too |
I got it, the following is the key code in hotfix #1754, about this issue @kong/cmd/init.lua
kong startup script run in nginx timer phase, but at this time, the function randomseed is not run, so the seed is default value, and random string always is same value, even if I reboot many times I guess this bug was introduced in version 0.9.0, the feature New CLI. |
Why is that? I just added a debug log for the CLI's verbose mode which, if you re-apply the patch, will print the following:
The only thing you need to ensure now is that each and every node in your cluster has a unique seed. |
May be I did not express clearly. the patch is ok, I have tested it. I'm just describing the version 0.9.3, the reason for serf-id no change when restart |
Oh ok! Sorry I understood the opposite: that the patch was not working for you. Ok, I am glad this is confirmed to be working for you too then! Actually the call to Very well, this will land in 0.9.4 next week, thanks agin for trying the patch out. |
@thibaultcha Can consider deleting file(/usr/local/kong/serf/serf.id) when In the following cases, the cluster can start normally, even if vm or image included
|
I'm not sure how that would play with the Serf-side of things. It could have consequences on the clustering modes and Serf table and all of that... @thefosk? |
I opened #1778 as an attempt to fix this. |
Summary
Follow code logic, serf id should be the different when restart, but my test results do not meet expectations
Steps To Reproduce
kong start
cat /usr/local/kong/serf/serf.id
vagrant-101_172.28.32.101:7946_465a78ad93cc432ea8369824d49506d6
kong stop
rm -f /usr/local/kong/serf/serf.id
kong start
cat /usr/local/kong/serf/serf.id
vagrant-101_172.28.32.101:7946_465a78ad93cc432ea8369824d49506d6
Additional Details & Logs
The text was updated successfully, but these errors were encountered: