Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install wget to avoid the busybox wget leaking defunct processes #159

Closed
wants to merge 1 commit into from

Conversation

aleskandro
Copy link

When running busybox wget v1.31.x to request an SSL page, there is a leak of defunct processes that eventually could hit the limit for fork syscalls, leading to the unavailability of main registry process.

Installing wget from the alpine repository resolves the issue as this only affects the busybox distribution. Moreover, this should transparently resolve the issue for deployments
using wget as healthcheck/readiness/liveness probe.

xref: https://bugs.busybox.net/show_bug.cgi?id=15967

Closes #158

@milosgajdos
Copy link
Member

Please sign your commit.

When running busybox wget v1.31.x to request an SSL page, there is a
leak of defunct processes that eventually could hit the limit for fork
syscalls, leading to the unavailability of main registry process.

Installing wget from the alpine repository resolves the issue as this
only affects the busybox distribution. Moreover, this should
transparently resolve the issue for deployments
using wget as healthcheck/readiness/liveness probe.

xref: https://bugs.busybox.net/show_bug.cgi?id=15967

Closes distribution#158

Signed-off-by: aleskandro <aleskandro@redhat.com>
@aleskandro
Copy link
Author

Done!

Copy link
Member

@milosgajdos milosgajdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is busybox binary packed into alpine image kinda breaks my solar system brain -- I guess the galaxy brains that made this decision see something I can't.

Thanks for the fix. PTAL @thaJeztah

@thaJeztah
Copy link
Member

This wget binary is only used during build of the Dockerfile; as far as I can see, it's not used in the final image, unless someone decides to use it (I see some issues linked that use it in a healthcheck).

All of that is custom, so I don't think we must install wget by default; it's not part of the main requirements of the image to run, and if needed should be installed by the user (extending the image)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(see comment above)

@aleskandro
Copy link
Author

Hi @theJeztah

unless someone decides to use it (I see some issues linked that use it in a healthcheck).

All of that is custom, so I don't think we must install wget by default; it's not part of the main requirements of the image to run, and if needed should be installed by the user (extending the image)

Hi @thaJeztah, thanks for the quick reply. I get it, and the best would be to have the wget busybox applet fixed. However, as of today, there's no way for this image to execute a healthcheck of the registry as no tool is available to run HTTP(s) requests, except the bugged wget.

Is there another way users can exploit to execute an in-container healthcheck without extending the image?

The following are leaked zombie processes left by the wget currently available in the image:

└ $ docker exec -it myreg ps aux                                       
PID   USER     TIME  COMMAND
    1 root      0:00 registry serve /etc/docker/registry/config.yml
   40 root      0:00 [ssl_client]
   41 root      0:00 [ssl_client]
   74 root      0:00 [ssl_client]
   75 root      0:00 [ssl_client]
   88 root      0:00 [ssl_client]
   89 root      0:00 [ssl_client]
  102 root      0:00 [ssl_client]
  103 root      0:00 [ssl_client]
  104 root      0:00 ps aux

@milosgajdos
Copy link
Member

Is there any reason why you do in-container health checks and not leveraging some "higher leve/external/orchestration" means? I suspect the lure is the HEALTH Dockerfile instruction https://docs.docker.com/reference/dockerfile/#healthcheck?

@schildbach
Copy link

Is there any reason why you do in-container health checks and not leveraging some "higher leve/external/orchestration" means? I suspect the lure is the HEALTH Dockerfile instruction https://docs.docker.com/reference/dockerfile/#healthcheck?

Docker/Compose health checks are similar to docker exec; the referenced executable needs to be available inside the container.

@aleskandro
Copy link
Author

aleskandro commented Mar 4, 2024

Is there any reason why you do in-container health checks and not leveraging some "higher leve/external/orchestration" means? I suspect the lure is the HEALTH Dockerfile instruction https://docs.docker.com/reference/dockerfile/#healthcheck?

Docker/Compose health checks are similar to docker exec; the referenced executable needs to be available inside the container.

Podman healthchecks are similar, which is what we use in https://github.com/openshift-qe/baremetal-qe-infra/pull/60/files#diff-546a6f0a9c42e667f953f658b9b1de750fe5ae500cf753a04c4871e24f45d08cR21-R27

In k8s, for sure, one can use an httpGet liveness/readiness probe and there's no need for executing a command within the container.

@milosgajdos
Copy link
Member

@schildbach oh, I'm not saying there is anything wrong with the request - in fact I've approved the PR - I'm just curious.

@thaJeztah the current wget binary is clearly DOS-ing the service when used in healthchecks. This needs fixing.

@thaJeztah
Copy link
Member

Yeah, the busybox case should be fixed; it looks like it's not properly reaping (when running as PID1 ?). My issue is that we should try to keep the image minimal, and only install things that the image itself uses by default. While users may decide to set a healthcheck, such a healthcheck may use "any" tooling, so could be curl or nc, or even a tool not using networking at all to decide if the container is healthy for their use-case.

Most http-healthchecks I run into tend to only check on the loopback interface (as it's only an indicator if the software is ready to accept connections); TLS termination may even be outside of the container itself, and checking the container's external IP may not even be a good indicator to check from within the container itself, but of course there's many setups / scenarios possible.

A quick fix is to start the container with --init, which makes docker insert tini (https://github.com/krallin/tini) as PID1 to handle reaping;

docker run --init --name mycontainer -d --rm alpine:3.19 /bin/sleep inf
docker exec mycontainer wget --spider https://google.com
docker exec mycontainer wget --spider https://google.com
docker exec mycontainer wget --spider https://google.com

docker exec -it mycontainer ps aux
PID   USER     TIME  COMMAND
1 root      0:00 /sbin/docker-init -- /bin/sleep inf
7 root      0:00 /bin/sleep inf
40 root      0:00 ps aux

But for more specific cases, the recommendation would be to configure the image with the right options, which can be a minimal Dockerfile that installs the tools needed for the user's needs, and to configure the healthcheck, e.g.;

# syntax=docker/dockerfile:1

FROM registry:2
RUN apk add --no-cache wget
HEALTHCHECK CMD ["wget", "--spider", "http://localhost:5000"]

@thaJeztah
Copy link
Member

xref: https://bugs.busybox.net/show_bug.cgi?id=15967

Before I forget to mention; I think Alpine ships with its own fork of busybox (I recall that they carry various patches which may not (yet) have been accepted in upstream: https://git.alpinelinux.org/aports/tree/main/busybox?h=master); so it may also be worth opening a ticket in Alpine's issue tracker for this.

Why is busybox binary packed into alpine image kinda breaks my solar system brain -- I guess the galaxy brains that made this decision see something I can't.

I think the busybox binary in alpine is purely as a most-minimal "bootstrap" for the image; provide a minimal userland (a shell, and some common utilities provided by busybox), but adding Alpines package-manager to install the actual tools you may need after that.

@aleskandro
Copy link
Author

Thanks @thaJeztah, in the specific case of the registry, it is the registry itself to be the PID1. Should we consider a fix in the registry also to handle the related signals?

@thaJeztah
Copy link
Member

Should we consider a fix in the registry also to handle the related signals?

Heh, just had a lengthy chat in our internal Slack discussing this case because it's really a bit of a grey area, and to some amount a matter of "semantics";

  • --init should normally be considered an "escape hatch" for a PID1 in a container misbehaving (if process is running as PID1, it should properly handle cleanup)
  • The grey area here is that registry (as far as I can see) is not "misbehaving"; it does handle what it's expected to do; run a registry, and handle that. It's not expected to do more than that, or not expected to handle processes it didn't create.
  • The situation happening here is that something went behind its back and shoved some processes underneath it, and those processes (seemingly) expecting something else to handle reaping.

From your earlier descriptions of busybox, I still feel it's a bug in busybox (which, after all, spins up the processes); pending that to be fixed, --init can be a proper workaround (or installing a different tool for the job, such as the apk add wget). From the perspective of the registry's core responsibility, adding that functionality to the registry code would be an "enhancement", not so much a bugfix, and (TBD) could be considered out of scope of its feature-set.

If the busybox maintainer consider it to not be a bug (perhaps it was not designed to be invoked this way?), and IF using the default wget is expected to be the default way to perform health-checks in this image, then an alternative could be to include tini in the container as PID1 to make sure the image has a reaper by default.

@aleskandro
Copy link
Author

aleskandro commented Mar 5, 2024

From your earlier descriptions of busybox, I still feel it's a bug in busybox

Agreed 100%. All the discussion here is to have a solution until the bug is fixed upstream and downstream by the distros (I'd say it's in the upstream busybox, if not already fixed in the next versions, as the same issue happens on the ubuntu's build).

If the busybox maintainer consider it to not be a bug (perhaps it was not designed to be invoked this way?), and IF using the default wget is expected to be the default way to perform health-checks in this image, then an alternative could be to include tini in the container as PID1 to make sure the image has a reaper by default.

yeah, the point for the registry is that it's actually used as PID1 in this image's case. Although I agree, it would still be like a 'safety' mechanism when PID1 inherits a defunct process, as in this case.

Should I close this PR or would we change it to something that is acceptable? I think some of the thoughts in this discussion should go in the issue at least, to inform other potential users making use of wget for healthchecking an https registry.

@milosgajdos
Copy link
Member

Should I close this PR or would we change it to something that is acceptable? I think some of the thoughts in this discussion should go in the issue at least, to inform other potential users making use of wget for healthchecking an https registry.

I think we should close it as I dont think we'll reach a consensus that will lead to merging this in.

Moving some of the discussed points to the original issue sounds like a good idea but I think it might not be needed as the links to this PR are displayed in the GH UI so I'd just leave it as is.

@aleskandro aleskandro closed this Mar 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The wget binary leaks defunct ssl_client processes
4 participants