Set RUNNING_UNDER_SYSTEMD=true so beam.smp is PID 1
#778
base: master
michaelklishin left a comment:
RabbitMQ server script PID being 1 can lead to zombie processes, and RabbitMQ certainly can and should run under a regular PID.
Thanks @michaelklishin. I marked this as a draft so I'd have some time to dig into that. UPDATE: this commit removed code that used to detect if …
Force-pushed from 77e86f8 to 9143b5d.
Title changed from "detached=true so beam.smp is PID 1" to "RUNNING_UNDER_SYSTEMD=true so beam.smp is PID 1".
Actually, I think exporting …
See this related PR: rabbitmq/rabbitmq-server#15036
The …
This should not be a concern in containers, thanks to PID namespaces. Quoting from the Linux man page: …
The change proposed by Luke aims to address an edge case in Kubernetes where the Erlang VM has already exited but the shell process has not, and there is a delay before the shell process exits. This causes the kubelet to report the container as "still running", which in turn leaves the Pod "stuck" in the Terminating state. Correct me if I'm wrong, but I believe the Erlang VM reaps its own zombies, at least to some extent.
Erlang/OTP largely avoids zombie child processes thanks to a dedicated process used to start such children: … In addition, the BEAM VM is PID 1-aware, at least to some extent, and sets …
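The zombie concern raised in this thread comes down to reaping. A process that has exited stays in the process table as a zombie until its parent collects its exit status with `wait()`/`waitpid()`; orphaned processes are re-parented to PID 1 of their PID namespace (or to a designated subreaper), so whatever runs as PID 1 is expected to reap them. The Linux-only sketch below, assuming Python 3 and a readable `/proc`, shows a child lingering in state `Z` until its parent reaps it. It illustrates the general kernel mechanism only and says nothing about how BEAM or `erl_child_setup` actually manage their children.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state field of /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except FileNotFoundError:
        return None
    # The command name is wrapped in parentheses and may contain spaces,
    # so take the first field after the last ')'.
    return stat.rsplit(")", 1)[1].split()[0]


def zombie_then_reaped(timeout=2.0):
    """Fork a child that exits at once; report its state before and after waitpid()."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # the child exits immediately, skipping Python cleanup
    # Until the parent calls waitpid(), the exited child lingers as a zombie ("Z").
    deadline = time.monotonic() + timeout
    before = proc_state(pid)
    while before != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        before = proc_state(pid)
    # Reaping collects the exit status and frees the process-table entry.
    os.waitpid(pid, 0)
    return before, proc_state(pid)


if __name__ == "__main__":
    print(zombie_then_reaped())  # prints ('Z', None) on Linux
```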
@lukebakken I asked around, and FWIW some core team members at VMware find this to be a reasonable change for an OCI image. Most do not have an opinion ;)
Force-pushed from 9143b5d to d13ebdc.
The current Dockerfile / docker-entrypoint.sh results in the following PIDs within a running RabbitMQ container:

```
PID CMD
  1 /bin/sh /opt/rabbitmq/sbin/rabbitmq-server
 20 /opt/erlang/lib/erlang/erts-15.2.7.4/bin/beam.smp -W w ...
 26 erl_child_setup 1024
 65 /opt/erlang/lib/erlang/erts-15.2.7.4/bin/inet_gethost 4
 66 /opt/erlang/lib/erlang/erts-15.2.7.4/bin/inet_gethost 4
 76 /opt/erlang/lib/erlang/erts-15.2.7.4/bin/epmd -daemon
121 /bin/sh -s rabbit_disk_monitor
```

Note that the `rabbitmq-server` script remains running and is PID 1.

There was a [long discussion](rabbitmq/cluster-operator#2012) about the effects this particular setup could have in heavily loaded k8s environments. This prompted me to look at the `rabbitmq-server` script, where I found that this behavior can be controlled via several environment variables:

https://github.com/rabbitmq/rabbitmq-server/blob/main/deps/rabbit/scripts/rabbitmq-server#L97

Most notably, if `RUNNING_UNDER_SYSTEMD` is set to a value, the script will `exec` the Erlang VM. This PR sets that variable, which results in the following PIDs in a container:

```
PID CMD
  1 /opt/erlang/lib/erlang/erts-15.2.7.4/bin/beam.smp -W w ...
 25 erl_child_setup 1024
 64 /opt/erlang/lib/erlang/erts-15.2.7.4/bin/inet_gethost 4
 65 /opt/erlang/lib/erlang/erts-15.2.7.4/bin/inet_gethost 4
 75 /opt/erlang/lib/erlang/erts-15.2.7.4/bin/epmd -daemon
120 /bin/sh -s rabbit_disk_monitor
```

The Erlang VM already stops RabbitMQ gracefully on `SIGTERM`, so there is no change in shutdown behavior.
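The reason `beam.smp` ends up as PID 1 is that `exec` replaces the calling shell process instead of starting a new child, so the new program keeps the shell's PID. A minimal sketch, assuming a POSIX `sh` on the host and Python 3.9+, demonstrates this generic shell behavior. The two script strings are illustrative only; they are not taken from the `rabbitmq-server` script.

```python
import subprocess


def printed_pids(script: str) -> list[int]:
    """Run `script` with sh and return every PID it echoes, in order."""
    result = subprocess.run(["sh", "-c", script],
                            capture_output=True, text=True, check=True)
    return [int(tok) for tok in result.stdout.split()]


# The outer shell prints its PID, then starts an inner shell that prints its own PID.
# The trailing `true` keeps sh from quietly turning the last command into an exec.
WITHOUT_EXEC = 'echo $$; sh -c "echo \\$\\$"; true'

# With `exec`, the inner shell replaces the outer one and keeps its PID,
# just as the Erlang VM inherits PID 1 when the wrapper script execs it.
WITH_EXEC = 'echo $$; exec sh -c "echo \\$\\$"'


if __name__ == "__main__":
    outer, inner = printed_pids(WITHOUT_EXEC)
    print("without exec:", outer, inner)  # two different PIDs
    outer, inner = printed_pids(WITH_EXEC)
    print("with exec:   ", outer, inner)  # the same PID twice
```

Without `exec`, the wrapper shell stays alive as the parent, which matches the first PID listing; with `exec`, no wrapper process remains, which matches the second.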
Force-pushed from d13ebdc to 72351b5.