
securityContext set on worker pod making it difficult to use on OpenShift #126

Open
callum-stakater opened this issue Nov 15, 2022 · 5 comments
Labels
bug Something isn't working


@callum-stakater

callum-stakater commented Nov 15, 2022

Hi there!

Recently you merged a PR I opened to make the controller's securityContext configurable, which greatly simplifies deploying the operator to OpenShift clusters. That works well, but the issue I have now is that the controller sets the same securityContext on the workflow pods it spawns to run the tf workflow.

Is there any way you could make that configurable, or perhaps have the workflow pods inherit the securityContext from the controller?

This is an issue on OpenShift because OpenShift has the concept of SecurityContextConstraints (SCCs), and setting an arbitrary runAsUser is a privileged action. It is possible to work around this by creating a dedicated SCC for the service account the terraform-operator uses for the workflow pod and granting it a RunAsAny user strategy, but better practice is not to set that value at all. That lets OpenShift assign a UID from the high range allocated to the particular namespace (project).
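To make the problem concrete: judging by the error below, the task pods carry fsGroup: 2000 at pod level and an effective runAsUser: 2000 on the container. Whether the operator sets runAsUser on the container or on the pod-level securityContext is an assumption here; the error only reports the effective container path. The sketch contrasts that with an OpenShift-friendly form in which both fields are omitted. Pod, container, and image names are hypothetical placeholders.

```yaml
# Roughly what the task pods look like today (IDs taken from the error message;
# names and image are hypothetical placeholders).
apiVersion: v1
kind: Pod
metadata:
  generateName: example-workflow-setup-
spec:
  securityContext:
    fsGroup: 2000          # restricted SCC: "2000 is not an allowed group"
  containers:
    - name: task
      image: example.com/terraform-task:latest
      securityContext:
        runAsUser: 2000    # restricted SCC: must fall inside the namespace UID range
---
# OpenShift-friendly variant: both fields omitted, so the admitting SCC
# (e.g. restricted) defaults them from the namespace's allocated ranges.
apiVersion: v1
kind: Pod
metadata:
  generateName: example-workflow-setup-
spec:
  containers:
    - name: task
      image: example.com/terraform-task:latest
```

Under the restricted SCC, pods that leave these fields unset are typically defaulted to the first UID and first group of the namespace's ranges. Task pods in the same namespace would therefore still end up with matching IDs.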

```
2022-11-15T14:42:30.080Z	DEBUG	controller-runtime.manager.events	Warning	{"object": {"kind":"Terraform","namespace":"stakater-terraform","name":"simple-template-example","uid":"6561f384-b5ec-448e-a01a-10aeda23b2e9","apiVersion":"tf.isaaguilar.com/v1alpha2","resourceVersion":"1388221816"}, "reason": "PodCreateError", "message": "Could not create Pod pods \"simple-template-example-9oz3h3r3-v3-setup-\" is forbidden: unable to validate against any security context constraint: [provider \"anyuid\": Forbidden: not usable by user or serviceaccount, provider \"pipelines-scc\": Forbidden: not usable by user or serviceaccount, provider \"tekton-pipelines-scc\": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, spec.containers[0].securityContext.runAsUser: Invalid value: 2000: must be in the ranges: [1001880000, 1001889999], provider \"nonroot\": Forbidden: not usable by user or serviceaccount, provider \"noobaa\": Forbidden: not usable by user or serviceaccount, provider \"noobaa-endpoint\": Forbidden: not usable by user or serviceaccount, provider \"scc-kubecost\": Forbidden: not usable by user or serviceaccount, provider \"sonardb-scc\": Forbidden: not usable by user or serviceaccount, provider \"hostmount-anyuid\": Forbidden: not usable by user or serviceaccount, provider \"log-collector-scc\": Forbidden: not usable by user or serviceaccount, provider \"apps-fluentd-scc\": Forbidden: not usable by user or serviceaccount, provider \"iam-scc\": Forbidden: not usable by user or serviceaccount, provider \"stakater-managed-openshift-apps-keycloak-scc\": Forbidden: not usable by user or serviceaccount, provider \"machine-api-termination-handler\": Forbidden: not usable by user or serviceaccount, provider \"hostnetwork\": Forbidden: not usable by user or serviceaccount, provider \"hostaccess\": Forbidden: not usable by user or serviceaccount, provider \"rook-ceph\": Forbidden: not usable by user or serviceaccount, provider \"node-exporter\": Forbidden: not usable by user or serviceaccount, provider \"privileged\": Forbidden: not usable by user or serviceaccount, provider \"rook-ceph-csi\": Forbidden: not usable by user or serviceaccount]"}
```
@callum-stakater
Author

Actually, looking more closely at the error, .spec.securityContext.fsGroup: 2000 is also flagged, but the same reasoning described above applies here.

@isaaguilar
Collaborator

Hi @callum-stakater, I'm sorry you're having this issue. I'm not familiar with OpenShift or the concept of SecurityContextConstraints. The security context with user/group 2000/2000 is there only to solve an issue with multiple pods that share a PersistentVolume.

The ID 2000 is a completely arbitrary choice. However, the workflow has to guarantee that all the task pods get exactly the same user/group. Those IDs are used when mounting the volume so that it is readable and writable, and the mounted volume contains all the data a workflow needs to execute.
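A sketch of the shared-volume pattern described above, under stated assumptions. The claim name, mount path, pod name, and image are hypothetical, and only one task pod is shown; every other task pod in the same workflow would repeat the identical securityContext. For volume types that support ownership management, the kubelet makes the volume group-owned by the fsGroup GID and adds that GID to the container's supplemental groups. That is what lets a later pod read and write files an earlier pod created.

```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: example-workflow-plan-       # hypothetical task pod name
spec:
  securityContext:
    runAsUser: 2000      # same UID in every task pod of the workflow
    fsGroup: 2000        # kubelet sets group ownership of the volume to GID 2000
  containers:
    - name: task
      image: example.com/terraform-task:latest   # hypothetical image
      volumeMounts:
        - name: workflow-data
          mountPath: /data                       # hypothetical mount path
  volumes:
    - name: workflow-data
      persistentVolumeClaim:
        claimName: example-workflow-pvc          # hypothetical claim shared by all task pods
```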

Is it just fsGroup 2000 that can't be used? What if the id was definable in the tf resource spec which gets applied to all task pods?

@isaaguilar isaaguilar added the bug Something isn't working label Nov 16, 2022
@callum-stakater
Author

callum-stakater commented Nov 16, 2022

Not a problem, it's the constant battle in the world of OpenShift administration :)

The number 2000 isn't important here. The issue is that, because OpenShift is "secure by default", it doesn't allow setting runAsUser or fsGroup without additional SCC configuration: if you can set any number, you can set root (0), and if you can run as root and your container gets compromised, it's a bad day. So by default OpenShift assigns IDs from a high range allocated to each namespace, and forces the administrator to jump through hoops to set their own IDs, on the assumption that they then know what they are doing and understand the risks involved. (Basically, it stops us running random stuff off the internet as root :) )

In this case, though, the fsGroup/runAsUser values are defined in the operator code for a good reason (shared volumes), and that is perfectly fine and good practice for vanilla Kubernetes users. So we will extend the deployment/chart and add the SCCs we need.

That said, it does make the last PR we cut on the charts repo somewhat redundant, and misleading for future OpenShift users who end up here.
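A minimal sketch of such a dedicated SCC, assuming the task pods only ever need UID 2000 and fsGroup 2000 and use only the volume types listed. The SCC name, namespace, and service account name are hypothetical placeholders; check the chart for the service account the task pods actually run under. Pinning the IDs with MustRunAs is narrower than a RunAsAny user strategy or the built-in anyuid SCC, because the pods still cannot run as root.

```yaml
# Hypothetical dedicated SCC for the terraform-operator task pods.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: terraform-workflow-uid2000        # hypothetical name
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegedContainer: false
readOnlyRootFilesystem: false
runAsUser:
  type: MustRunAs
  uid: 2000                               # the only UID the pods may use
fsGroup:
  type: MustRunAs
  ranges:
    - min: 2000
      max: 2000                           # the only fsGroup the pods may use
supplementalGroups:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
volumes:                                  # assumed set of volume types the task pods need
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - projected
  - secret
users:
  # hypothetical namespace and service account used by the task pods
  - system:serviceaccount:terraform-operator:terraform-workflow
```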

@dan1el-k

dan1el-k commented Dec 13, 2022

We are also running the operator on OKD/OpenShift.
We usually handle this by adding the "anyuid" SCC to the existing ClusterRole (via a kustomize patch, without needing to change the upstream manifests).

```yaml
- target:
    group: rbac.authorization.k8s.io
    version: v1
    kind: ClusterRole
  patch: |-
    - op: add
      path: /rules/-
      value:
        apiGroups:
        - security.openshift.io
        resources:
        - securitycontextconstraints
        verbs:
        - use
        resourceNames:
        - anyuid
```

Alternatively, you can add the required user/group ranges directly to the namespace. This avoids permitting any additional SCC or creating a new one, because it fixes the range of user/group IDs used in that namespace.
Especially if there is only one namespace where you run both the operator and the workflows, this is probably the easiest solution, and it also doesn't contradict the SCC concept.

```yaml
kind: Namespace
apiVersion: v1
metadata:
  name: terraform-operator
  annotations:
    openshift.io/sa.scc.supplemental-groups: 2000/1
    openshift.io/sa.scc.uid-range: 2000/1
```
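Why the annotations work: they use a <first-id>/<count> format, so 2000/1 is a range containing only ID 2000. By the same reading, a namespace annotated with 1001880000/10000 would yield exactly the [1001880000, 1001889999] range quoted in the error earlier in this thread (1001880000 + 10000 - 1 = 1001889999). The built-in restricted SCC lists no explicit ranges and takes them from these annotations. The excerpt below shows only its relevant strategies and is not a complete SCC manifest.

```yaml
# Excerpt of the strategies in OpenShift's built-in restricted SCC.
# No ranges are listed here, so they come from the namespace annotations.
runAsUser:
  type: MustRunAsRange   # UID must fall in openshift.io/sa.scc.uid-range
fsGroup:
  type: MustRunAs        # group must fall in openshift.io/sa.scc.supplemental-groups
supplementalGroups:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
```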

Perhaps this helps.

@callum-stakater
Author

callum-stakater commented Dec 13, 2022 via email


3 participants