Confusing, possibly erroneous RBAC warnings ("Failed to get pod") #2228

Closed
kevincantu opened this issue Feb 13, 2020 · 6 comments

@kevincantu

I'm setting up a new cluster with Argo v2.4.3 (to run some stuff we've been happily doing on v2.2.0) on EKS v1.14.9-eks-c0eccc and v1.14.8-eks-b8860f AMIs, and while trying to drill down into some problems with getting workflow pods to work I've fallen down an RBAC rabbit hole.

What I saw

For some steps of a workflow, the pod's main container shows the step doing its job and logging sensibly about it, but the wait container's log contains something like the following:

$ kubectl logs build-distro-dfrs4-2015148610 wait -n argo
time="2020-02-13T03:03:38Z" level=info msg="Creating a docker executor"
time="2020-02-13T03:03:38Z" level=info msg="Executor (version: v2.4.3, build_date: 2019-12-06T03:35:39Z) initialized (pod: argo/build-distro-dfrs4-2015148610) with template:\n{\"name\":\"set-status\",\"arguments\":{},\"inputs\":{\"parameters\":[{\"name\":\"status\",\"value\":\"pending\"},{\"name\":\"description\",\"value\":\"The distro build is in progress\"}]},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"...\",\"args\":[...],\"resources\":{\"limits\":{\"cpu\":\"100m\",\"memory\":\"128Mi\"},\"requests\":{\"cpu\":\"50m\",\"memory\":\"64Mi\"}},\"imagePullPolicy\":\"Always\"},\"activeDeadlineSeconds\":480,\"retryStrategy\":{\"limit\":3}}"
time="2020-02-13T03:03:38Z" level=info msg="Waiting on main container"
time="2020-02-13T03:03:38Z" level=error msg="executor error: Failed to establish pod watch: unknown (get pods)\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:78\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).waitMainContainerStart\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:916\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:880\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2020-02-13T03:03:38Z" level=info msg="No output parameters"
time="2020-02-13T03:03:38Z" level=info msg="No output artifacts"
time="2020-02-13T03:03:38Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2020-02-13T03:03:38Z" level=info msg="Killing sidecars"
time="2020-02-13T03:03:38Z" level=warning msg="Failed to get pod 'build-distro-dfrs4-2015148610': pods \"build-distro-dfrs4-2015148610\" is forbidden: User \"system:serviceaccount:argo:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"argo\""
time="2020-02-13T03:03:38Z" level=error msg="Failed to kill sidecars: pods \"build-distro-dfrs4-2015148610\" is forbidden: User \"system:serviceaccount:argo:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"argo\""
time="2020-02-13T03:03:38Z" level=info msg="Alloc=4934 TotalAlloc=10056 Sys=70334 NumGC=3 Goroutines=6"
time="2020-02-13T03:03:38Z" level=fatal msg="Failed to establish pod watch: unknown (get pods)\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:78\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).waitMainContainerStart\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:916\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:880\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:40\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

The UI ends up displaying this MESSAGE:

failed to save outputs: Failed to establish pod watch: unknown (get pods)

And the most actionable warning seemed to be this:

time="2020-02-13T03:03:38Z" level=warning msg="Failed to get pod 'build-distro-dfrs4-2015148610': pods \"build-distro-dfrs4-2015148610\" is forbidden: User \"system:serviceaccount:argo:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"argo\""
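A denial message like that actually names everything a Role and RoleBinding would need to grant: the subject, verb, resource, and namespace. As a sketch (pure text processing on the log line, nothing Argo-specific), those fields can be pulled out like this, with the equivalent live-cluster probe noted in a comment:

```shell
# The equivalent one-off check against a live cluster would be:
#   kubectl auth can-i get pods -n argo --as system:serviceaccount:argo:default
# Below is pure text processing on the denial message itself.
msg='pods "build-distro-dfrs4-2015148610" is forbidden: User "system:serviceaccount:argo:default" cannot get resource "pods" in API group "" in the namespace "argo"'
subject=$(printf '%s' "$msg" | sed -n 's/.*User "\(system:serviceaccount:[^"]*\)".*/\1/p')
verb=$(printf '%s' "$msg" | sed -n 's/.*cannot \([a-z]*\) resource.*/\1/p')
resource=$(printf '%s' "$msg" | sed -n 's/.*cannot [a-z]* resource "\([^"]*\)".*/\1/p')
namespace=$(printf '%s' "$msg" | sed -n 's/.*in the namespace "\([^"]*\)".*/\1/p')
echo "subject=$subject verb=$verb resource=$resource namespace=$namespace"
```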

I believe the code running that is here:

The fix I tried

Anyway, I attempted to fix that by adding permissions to the argo:default service account, so I began with these instructions and created the following Role and RoleBinding:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-workflow
rules:
# pod get/watch is used to identify the container IDs of the current pod
# pod patch is used to annotate the step's outputs back to controller (e.g. artifact location)
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - watch
  - patch
  - list
  - create
# logs get/watch are used to get the pods logs for script outputs, and for log archival
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - get
  - watch
  - list
  - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-default-workflow
  namespace: argo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: argo-workflow
subjects:
- kind: ServiceAccount
  name: argo
  namespace: default

Results

I expected that adding this Role (in the argo namespace) and this RoleBinding would at least fix that warning.

However, with that and even more permissive attempts, I'm still seeing the warning every time.

Questions

What kind of service account, roles, and binding should I define to give my workflows permission to run?

Alternatively (in case I've got that right 😅 ), is this "forbidden" warning an erroneous red herring occurring because of something like a non-existent pod?

Thanks for taking a look!!

@kevincantu

If this shows what I think it does, the role and binding above don't look right, yet:

$ kubectl auth -n argo can-i --as system:serviceaccount:argo:default --list
Resources                                       Non-Resource URLs   Resource Names     Verbs
selfsubjectaccessreviews.authorization.k8s.io   []                  []                 [create]
selfsubjectrulesreviews.authorization.k8s.io    []                  []                 [create]
                                                [/api/*]            []                 [get]
                                                [/api]              []                 [get]
                                                [/apis/*]           []                 [get]
                                                [/apis]             []                 [get]
                                                [/healthz]          []                 [get]
                                                [/healthz]          []                 [get]
                                                [/openapi/*]        []                 [get]
                                                [/openapi]          []                 [get]
                                                [/version/]         []                 [get]
                                                [/version/]         []                 [get]
                                                [/version]          []                 [get]
                                                [/version]          []                 [get]
podsecuritypolicies.policy                      []                  [eks.privileged]   [use]

@maryala9 commented Feb 19, 2020

@kevincantu Any update on this? Even I am getting the same error: level=error msg="Failed to kill sidecars: pods \"image-validation-8csfn-1112697458\" is forbidden: User \"system:serviceaccount:default:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"default\"".
Please let me know how to fix this!

@kevincantu

@maryala9, if you look closer at my RoleBinding there, the name and namespace of that service account were flipped!

Instead of:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-default-workflow
...
subjects:
- kind: ServiceAccount
  name: argo
  namespace: default

That last bit should be:

- kind: ServiceAccount
  name: default
  namespace: argo
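The underlying rule (general Kubernetes RBAC behavior, not Argo-specific): a ServiceAccount subject is matched against the username `system:serviceaccount:<namespace>:<name>`, so flipping the two fields binds the Role to an entirely different account. A tiny sketch of the mapping:

```shell
# Build the RBAC username Kubernetes derives from a ServiceAccount
# subject: system:serviceaccount:<namespace>:<name>
sa_username() {
  # $1 = subject name, $2 = subject namespace
  printf 'system:serviceaccount:%s:%s\n' "$2" "$1"
}

flipped=$(sa_username argo default)   # the broken subject above
fixed=$(sa_username default argo)     # the corrected subject
echo "flipped=$flipped"
echo "fixed=$fixed"
```

The flipped subject yields `system:serviceaccount:default:argo`, which matches nothing in the cluster, so the binding silently grants no permissions to argo:default.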

When I fixed that, it set the permissions I wanted:

% kubectl auth reconcile --remove-extra-permissions -f my-argo-rbac.yaml
...
% kubectl auth -n argo can-i --as system:serviceaccount:argo:default --list

Resources                                       Non-Resource URLs   Resource Names     Verbs
selfsubjectaccessreviews.authorization.k8s.io   []                  []                 [create]
selfsubjectrulesreviews.authorization.k8s.io    []                  []                 [create]
pods                                            []                  []                 [get watch patch]
pods/log                                        []                  []                 [get watch]
                                                [/api/*]            []                 [get]
                                                [/api]              []                 [get]
                                                [/apis/*]           []                 [get]
                                                [/apis]             []                 [get]
                                                [/healthz]          []                 [get]
                                                [/healthz]          []                 [get]
                                                [/openapi/*]        []                 [get]
                                                [/openapi]          []                 [get]
                                                [/version/]         []                 [get]
                                                [/version/]         []                 [get]
                                                [/version]          []                 [get]
                                                [/version]          []                 [get]
podsecuritypolicies.policy                      []                  [eks.privileged]   [use]

And then workflows started working just fine!

@kevincantu

So, in full:

# give our webhook app (as default:default) permissions to create workflows
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-invocation
  namespace: argo
rules:
- apiGroups:
  - "argoproj.io"
  resources:
  - "workflows"
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-default-invocation
  namespace: argo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: argo-invocation
subjects:
- kind: ServiceAccount
  name: default
  namespace: default


# give workflows (as argo:default) permissions to run things
# see https://github.com/argoproj/argo/blob/master/docs/workflow-rbac.md
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-workflow
  namespace: argo
rules:
# pod get/watch is used to identify the container IDs of the current pod
# pod patch is used to annotate the step's outputs back to controller (e.g. artifact location)
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - watch
  - patch
# logs get/watch are used to get the pods logs for script outputs, and for log archival
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-default-workflow
  namespace: argo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: argo-workflow
subjects:
- kind: ServiceAccount
  name: default
  namespace: argo
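After applying these, one way to smoke-test the workflow binding is to probe each grant with `kubectl auth can-i`. A sketch that just prints the probes, one per verb/resource pair in the argo-workflow Role above (run them against the cluster by hand):

```shell
# Print one "kubectl auth can-i" probe per verb/resource pair granted
# by the argo-workflow Role. The commands are echoed, not executed.
sa=system:serviceaccount:argo:default
for check in 'get pods' 'watch pods' 'patch pods' 'get pods/log' 'watch pods/log'; do
  echo "kubectl auth can-i $check -n argo --as $sa"
done
```

Each probe should print `yes` once the RoleBinding's subject is correct.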

@omerfsen commented Dec 9, 2020

Thank you that is great!

@sanzenwin

Thanks, it works.
