Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue causing clustermesh-apiserver/kvstoremesh to not start when run with a non-root user #31539

Merged
merged 2 commits into from
Apr 10, 2024

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Mar 21, 2024

gops needs to write data (e.g., the PID file) to the file-system, which turned out to be tricky when using scratch as base image, in case the container is then run using a non-root UID.

Let's use the most basic version of a distroless image instead, which contains:

  • ca-certificates
  • A /etc/passwd entry for a root, nonroot and nobody users
  • A /tmp directory
  • tzdata

This aligns the clustermesh-apiserver image with the Hubble Relay one, and removes the need for manually importing the CA certificates. The GOPS_CONFIG_DIR is explicitly configured to use a temporary directory, to prevent permission issues depending on the UID configured to run the entrypoint.

We explicitly configure the fsGroup as part of the podSecurityContext, to ensure that mounted files are accessible by the non-root user as well.

Finally, configure the specified clustermesh-apiserver etcd container security context for the etcd-init container as well, to make sure that they always match, and prevent issues caused by the init container creating files that cannot be read/written by the main instance later on.

Should fix #31524

Fix issue causing clustermesh-apiserver/kvstoremesh to not start when run with a non-root user

@giorio94 giorio94 added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/build Anything to do with the build, more general then area/CI needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 21, 2024
@giorio94 giorio94 requested a review from a team as a code owner March 21, 2024 11:12
@giorio94 giorio94 requested a review from borkmann March 21, 2024 11:12
@giorio94
Copy link
Member Author

/test

@giorio94 giorio94 mentioned this pull request Mar 21, 2024
1 task
@rolinh
Copy link
Member

rolinh commented Mar 21, 2024

What about switching the base image from scratch to distroless nonroot? That's what we've done for Hubble Relay and it also solves this GOPS problem without resorting to chmoding.

@giorio94
Copy link
Member Author

What about switching the base image from scratch to distroless nonroot? That's what we've done for Hubble Relay and it also solves this GOPS problem without resorting to chmoding.

Sounds reasonable to me, and it would also be consistent with Hubble Relay. The only caveat is that it wouldn't work if a user (or mutating webhook) configured a user ID different from 65532 (unless setting GOPS_CONFIG_DIR=/tmp). Did that ever cause issues for Hubble Relay? I seem to remember that at least Openshift has some automatic logic to enforce certain user IDs, but I don't know the exact details.

@giorio94
Copy link
Member Author

/cc @rolinh ^^

@rolinh
Copy link
Member

rolinh commented Mar 28, 2024

From what I remember, it's not causing issues. Well, if it is, we'll have to fix Hubble Relay as well anyways 🙂

I seem to remember that at least Openshift has some automatic logic to enforce certain user IDs

This is correct. I believe it can be overwritten via the securityContext however.

@giorio94
Copy link
Member Author

From what I remember, it's not causing issues. Well, if it is, we'll have to fix Hubble Relay as well anyways 🙂

Fair enough. I'll update this PR to mimic the Hubble Relay approach then, to ensure consistency from that point of view.

@giorio94 giorio94 marked this pull request as draft March 28, 2024 17:23
@giorio94 giorio94 force-pushed the mio/clustermesh-non-root-user branch 2 times, most recently from fb51c4f to fa426d1 Compare March 29, 2024 08:40
@giorio94 giorio94 changed the title images(clustermesh): make GOPS_CONFIG_DIR writable to all users Fix issue causing clustermesh-apiserver/kvstoremesh to not start when run with a non-root user Mar 29, 2024
@giorio94
Copy link
Member Author

Updated, back to reviewable state.

Concerning the backport to v1.15, I'm planning on preparing a slightly reduced version of the first commit using gcr.io/distroless/static-debian11:latest@sha256:046b92c933032a8ca99a66f4c79a68ac029d9a4ababd1a806a82140b3b899fd3 as base image instead (i.e., the root version) to avoid requiring the fsGroup helm change, which looks more risky for a backport.

@giorio94 giorio94 marked this pull request as ready for review March 29, 2024 08:44
@giorio94 giorio94 requested review from a team as code owners March 29, 2024 08:44
@giorio94 giorio94 added the backport/author The backport will be carried out by the author of the PR. label Mar 29, 2024
@giorio94
Copy link
Member Author

/test

gops needs to write data (e.g., the PID file) to the file-system, which
turned out to be tricky when using scratch as base image, in case the
container is then run using a non-root UID.

Let's use the most basic version of a distroless image instead, which
contains:

- ca-certificates
- A /etc/passwd entry for a root, nonroot and nobody users
- A /tmp directory
- tzdata

This aligns the clustermesh-apiserver image with the Hubble Relay one,
and removes the need for manually importing the CA certificates. The
GOPS_CONFIG_DIR is explicitly configured to use a temporary directory,
to prevent permission issues depending on the UID configured to run
the entrypoint.

Finally, we explicitly configure the fsGroup as part of the
podSecurityContext, to ensure that mounted files are accessible
by the non-root user as well.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Configure the specified clustermesh-apiserver etcd container security
context for the etcd-init container as well, to make sure that they
always match, and prevent issues caused by the init container creating
files that cannot be read/written by the main instance later on.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-non-root-user branch from fa426d1 to 0a82f93 Compare April 4, 2024 10:21
@giorio94
Copy link
Member Author

giorio94 commented Apr 4, 2024

Rebased to fix conflict.

@giorio94
Copy link
Member Author

giorio94 commented Apr 4, 2024

/test

@giorio94
Copy link
Member Author

giorio94 commented Apr 5, 2024

@squeed Gentle ping 🙏

@giorio94 giorio94 removed the request for review from borkmann April 5, 2024 08:00
Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅

@sayboras sayboras added this pull request to the merge queue Apr 10, 2024
Merged via the queue into cilium:main with commit 8521880 Apr 10, 2024
62 checks passed
@giorio94 giorio94 mentioned this pull request Apr 10, 2024
1 task
@giorio94 giorio94 added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Apr 10, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Backport pending to v1.15 in 1.15.4 Apr 10, 2024
@giorio94 giorio94 removed the backport/author The backport will be carried out by the author of the PR. label Apr 10, 2024
@asauber asauber added this to Backport pending to v1.15 in 1.15.5 Apr 11, 2024
@asauber asauber removed this from Backport pending to v1.15 in 1.15.4 Apr 11, 2024
@github-actions github-actions bot added backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels Apr 24, 2024
@nebril nebril moved this from Backport pending to v1.15 to Backport done to v1.15 in 1.15.5 May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/build Anything to do with the build, more general then area/CI area/clustermesh Relates to multi-cluster routing functionality in Cilium. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.15.5
Backport done to v1.15
Development

Successfully merging this pull request may close these issues.

kvstoremesh CrashLoopBackOff
5 participants