Argo-workflows-server fail to start because of lack of permission for serviceaccount #2159

Closed
tyuhara opened this issue Jul 13, 2023 · 9 comments · Fixed by argoproj/argo-workflows#11426


@tyuhara

tyuhara commented Jul 13, 2023

Describe the bug

Just a question. Regardless of whether server.sso.rbac is enabled, permission on ServiceAccounts is required when creating a ServiceAccount or a RoleBinding in the namespaces specified in workflowNamespaces.
Because of this, I could not start argo-workflows-server: it lacked that permission.

E0713 01:37:37.489110       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:167: Failed to watch *v1.ServiceAccount: failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:argo-workflows:argo-workflows-server" cannot list resource "serviceaccounts" in API group "" at the cluster scope

{{- if .Values.server.sso.rbac.enabled }}
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - get
  - list
  - watch
{{- end }}
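
Until this is fixed, one possible workaround (a sketch, not something the chart provides) is to grant the missing permission outside the chart with a separate ClusterRole and ClusterRoleBinding. The subject below assumes the server runs as ServiceAccount `argo-workflows-server` in namespace `argo-workflows`, as in the error above; the object name `argo-workflows-server-sa-reader` is hypothetical.

```yaml
# Hypothetical workaround: let the Argo Server read ServiceAccounts cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: argo-workflows-server-sa-reader  # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["serviceaccounts"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argo-workflows-server-sa-reader  # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argo-workflows-server-sa-reader
subjects:
  # Name and namespace taken from the error message; adjust to your release.
  - kind: ServiceAccount
    name: argo-workflows-server
    namespace: argo-workflows
```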

Related helm chart

argo-workflows

Helm chart version

0.31.0

To Reproduce

Create a values.yaml like the one below and install the chart with it. Then it tries to create ServiceAccount named argo-workflows for workflow but fails because of lack of the permission.

workflow:
    create: true
    name: "argo-workflows"
  rbac:
    create: true

controller:
  workflowNamespaces:
    - namespace-a
    - namespace-b

server:
  sso:
    enabled: false

Expected behavior

Grant list/get/watch on the serviceaccounts resource regardless of whether server.sso.rbac is enabled.
That would resolve this issue.
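
A minimal sketch of the requested change, assuming the rule stays where it is in the server's ClusterRole template and only the `{{- if .Values.server.sso.rbac.enabled }}` guard around it is dropped:

```yaml
# Sketch: serviceaccounts rule rendered unconditionally in the server ClusterRole
# (the rest of the template is assumed unchanged and not shown).
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - get
  - list
  - watch
```

The trade-off is that the server can then read ServiceAccounts cluster-wide even in setups that never use the permission.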

Screenshots

No response

Additional context

No response

@tyuhara tyuhara added the bug label Jul 13, 2023
@yu-croco
Collaborator

yu-croco commented Jul 13, 2023

Hi @tyuhara

Is the values.yaml you wrote supposed to be like below?
Ref: https://github.com/argoproj/argo-helm/blob/main/charts/argo-workflows/values.yaml#L48-L56

workflow:
- create: true
- name: "argo-workflows"
+ serviceAccount:
+   create: true
+   name: "argo-workflows"
  rbac:
    create: true
...

Then it tries to create ServiceAccount named argo-workflows for workflow but fails because of lack of the permission.

I tried the following on a kind cluster and didn't get an error. I wonder if I missed something... 🤔

# values.yaml
workflow:
  serviceAccount:
    create: true
    name: "argo-workflows"
  rbac:
    create: true
controller:
  workflowNamespaces:
    - namespace-a
server:
  sso:
    enabled: false
# ServiceAccount is created
$ k get sa
NAME                                 SECRETS   AGE
argo-workflows                       0         116s
argo-workflows-server                0         116s
argo-workflows-workflow-controller   0         116s
default                              0         11m

# argo-workflows-server works well
$ k get po
NAME                                                  READY   STATUS    RESTARTS   AGE
argo-workflows-server-7c7f569f64-89v72                1/1     Running   0          2m40s
argo-workflows-workflow-controller-76778f6544-bt98k   1/1     Running   0          2m40s

$ k logs argo-workflows-server-7c7f569f64-89v72
time="2023-07-13T12:56:37.975Z" level=info msg="not enabling pprof debug endpoints"
time="2023-07-13T12:56:37.976Z" level=info authModes="[client]" baseHRef=/ managedNamespace= namespace=default secure=false ssoNamespace=default
time="2023-07-13T12:56:37.976Z" level=warning msg="You are running in insecure mode. Learn how to enable transport layer security: https://argoproj.github.io/argo-workflows/tls/"
time="2023-07-13T12:56:37.976Z" level=info msg="SSO disabled"
time="2023-07-13T12:56:37.982Z" level=info msg="Starting Argo Server" instanceID= version=v3.4.8
time="2023-07-13T12:56:37.982Z" level=info msg="Creating event controller" asyncDispatch=false operationQueueSize=16 workerCount=4
time="2023-07-13T12:56:37.985Z" level=info msg="GRPC Server Max Message Size, MaxGRPCMessageSize, is set" GRPC_MESSAGE_SIZE=104857600
time="2023-07-13T12:56:37.985Z" level=info msg="Argo Server started successfully on http://localhost:2746" url="http://localhost:2746"
time="2023-07-13T12:56:55.958Z" level=info duration="113.375µs" method=GET path=index.html size=473 status=0
time="2023-07-13T12:57:15.958Z" level=info duration="179.959µs" method=GET path=index.html size=473 status=0
time="2023-07-13T12:57:35.959Z" level=info duration="233.333µs" method=GET path=index.html size=473 status=0
time="2023-07-13T12:57:55.959Z" level=info duration="247.334µs" method=GET path=index.html size=473 status=0
time="2023-07-13T12:58:15.961Z" level=info duration="176.958µs" method=GET path=index.html size=473 status=0
time="2023-07-13T12:58:35.960Z" level=info duration="242.5µs" method=GET path=index.html size=473 status=0
time="2023-07-13T12:58:55.961Z" level=info duration="244.541µs" method=GET path=index.html size=473 status=0
time="2023-07-13T12:59:15.961Z" level=info duration="157.459µs" method=GET path=index.html size=473 status=0

@tyuhara
Author

tyuhara commented Jul 13, 2023

Hi @yu-croco, thank you for checking.
I misunderstood part of it. Here is the full values.yaml I tried.

crds:
  install: false
singleNamespace: false
workflow:
  serviceAccount:
    create: true
    name: "argo-workflows"
  rbac:
    create: true
controller:
  workflowNamespaces:
    - namespace-a
    - namespace-b
server:
  extraArgs: ["--auth-mode=sso"]
  sso:
    enabled: true
    issuer: https://accounts.google.com
    clientId:
      name: argo-workflows-sso
      key: client-id
    clientSecret:
      name: argo-workflows-sso
      key: client-secret
    redirectUrl: https://<mydomain>/oauth2/callback
    rbac:
      enabled: false

And then the server failed to start.

% k get sa
NAME                                 SECRETS   AGE
argo-workflows                       0         5m34s
argo-workflows-server                0         5m34s
argo-workflows-workflow-controller   0         5m34s
default                              0         2d3h

% k get po
NAME                                                 READY   STATUS      RESTARTS   AGE
argo-workflows-server-54d9f9cc-6nfr7                 0/1     Running     0          2m29s
argo-workflows-workflow-controller-6bddc77f5-mch5q   1/1     Running     0          2m29s

% k logs argo-workflows-server-54d9f9cc-6nfr7
time="2023-07-13T14:13:56.534Z" level=info msg="not enabling pprof debug endpoints"
time="2023-07-13T14:13:56.534Z" level=info authModes="[sso]" baseHRef=/ managedNamespace= namespace=argo-workflows secure=false ssoNamespace=argo-workflows
time="2023-07-13T14:13:56.535Z" level=warning msg="You are running in insecure mode. Learn how to enable transport layer security: https://argoproj.github.io/argo-workflows/tls/"
time="2023-07-13T14:13:56.843Z" level=info msg="SSO configuration" clientId="{{argo-workflows-oauth-credentials} client-id <nil>}" insecureSkipVerify=false issuer="https://accounts.google.com" issuerAlias=DISABLED redirectUrl="https://argo-workflows.otsukisama.net/oauth2/callback" scopes="[openid]"
E0713 14:16:09.477522       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:167: Failed to watch *v1.ServiceAccount: failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:argo-workflows:argo-workflows-server" cannot list resource "serviceaccounts" in API group "" at the cluster scope

After setting sso.rbac.enabled to true, the server started, because the serviceaccounts permission is then added to the ClusterRole.
Should sso.rbac.enabled always be true when singleNamespace: false? Sorry if I misunderstand that value 🙇

@vaelant

vaelant commented Jul 13, 2023

Chart version: 0.31.0
App version: v3.4.8

Seeing the same issue with a similar configuration. The server fails to start because it is unable to read service accounts cluster-wide.

https://argoproj.github.io/argo-workflows/argo-server-sso/#sso-rbac

This tells me that if you want to authenticate to the server with SSO without using your SSO provider for RBAC control of users within Argo Workflows, you configure it as follows:

workflow:
  serviceAccount:
    create: true
    name: "argo-workflow"
  rbac:
    create: true

controller:
  workflowNamespaces:
    - namespace-a
    - namespace-b

server:
  sso:
    rbac:
      enabled: false

The problem is that if you do that, the server's ClusterRole does not grant access to service accounts cluster-wide, so you see errors like this:

reflector.go:324] pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:167: failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:argo:argoworkflows-argo-workflows-server" cannot list resource "serviceaccounts" in API group "" at the cluster scope

reflector.go:138] pkg/mod/k8s.io/client-go@v0.24.3/tools/cache/reflector.go:167: Failed to watch *v1.ServiceAccount: failed to list *v1.ServiceAccount: serviceaccounts is forbidden: User "system:serviceaccount:argo:argoworkflows-argo-workflows-server" cannot list resource "serviceaccounts" in API group "" at the cluster scope

{{- if .Values.server.sso.rbac.enabled }}

Instead, wouldn't we want that conditional to apply when singleNamespace is set to false?
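
A hypothetical sketch of that alternative condition, assuming `.Values.singleNamespace` is the chart value shown in the values.yaml earlier in this issue; `or` and `not` are standard Go template functions usable in Helm templates. This is not the chart's actual template.

```yaml
{{- /* Hypothetical: grant the rule when SSO RBAC is on OR the chart is cluster-scoped. */}}
{{- if or .Values.server.sso.rbac.enabled (not .Values.singleNamespace) }}
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - get
  - list
  - watch
{{- end }}
```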

@driv

driv commented Jul 18, 2023

Is this an issue with the chart or with Argo Workflows?

Shouldn't the server not need to read the service accounts if rbac is disabled?

  sso:
    enabled: true
    clientId:
      key: client-id
      name: argo-server-sso
    clientSecret:
      key: client-secret
      name: argo-server-sso
    issuer: https://example.com
    rbac:
      enabled: false
    redirectUrl: https://example.com/redirect

@driv

driv commented Jul 18, 2023

If I install the chart with server.sso.rbac.enabled: true, the ClusterRole will now have:

- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - get
  - list
  - watch

The server will start fine, but I would then need to create ServiceAccounts with rules for users to be able to do anything.
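
For context, with SSO RBAC enabled the server maps each SSO user to a ServiceAccount selected by annotations on that ServiceAccount, as described in the SSO RBAC docs linked above. A minimal sketch, assuming the ID token carries a `groups` claim containing `admin`; the names `sso-admin` and `argo-workflows-user` are hypothetical, and it is the RoleBinding that actually gives the mapped user permissions in `namespace-a`.

```yaml
# Hypothetical mapping: users in the "admin" group act as this ServiceAccount.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sso-admin                 # hypothetical name
  namespace: argo-workflows       # the server's SSO namespace in the logs above
  annotations:
    # Expression evaluated against the user's ID token claims.
    workflows.argoproj.io/rbac-rule: "'admin' in groups"
    # When several ServiceAccounts match, the highest precedence is used.
    workflows.argoproj.io/rbac-rule-precedence: "1"
---
# The ServiceAccount can only do what it is bound to.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sso-admin
  namespace: namespace-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argo-workflows-user       # hypothetical ClusterRole for Argo resources
subjects:
  - kind: ServiceAccount
    name: sso-admin
    namespace: argo-workflows
```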

If I now manually edit the ConfigMap, set sso.rbac.enabled: false, and restart the server, everything works fine.
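
For reference, a minimal sketch of where that setting lives, assuming the chart renders the controller ConfigMap as `argo-workflows-workflow-controller-configmap` in namespace `argo-workflows` (both names are guesses; check your release). Only `rbac.enabled` changes; the other fields mirror the values above. Presumably this works because the ClusterRole rendered with `rbac.enabled: true` still grants the serviceaccounts rule.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Assumed name; the chart prefixes it with the release name.
  name: argo-workflows-workflow-controller-configmap
  namespace: argo-workflows
data:
  # SSO settings read by the Argo Server.
  sso: |
    issuer: https://example.com
    redirectUrl: https://example.com/redirect
    clientId:
      name: argo-server-sso
      key: client-id
    clientSecret:
      name: argo-server-sso
      key: client-secret
    rbac:
      enabled: false
```

After editing, restart the server Deployment (for example with `kubectl rollout restart`) so it picks up the change.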

@agilgur5
Member

This actually seems to be a duplicate of #1624, which was auto-closed as stale. The resolution there says to set rbac.enabled: true, but as users correctly point out here, you can turn on SSO without RBAC.

Shouldn't the server not need to read the service accounts if rbac is disabled?

I might have to check the Argo source code to see if this is accurate. It might be trying to read a default SA or something, in which case that would be a bug in the chart. If it's reading unintentionally, that would be a bug upstream in Argo itself.

@agilgur5
Copy link
Member

agilgur5 commented Jul 23, 2023

Server currently requires SA RBAC

OK, I did some tracing, and the Argo Server does currently require ServiceAccount RBAC whenever SSO is enabled.

The error occurs on start-up, which pointed to the Server's initialization code. Indeed, when SSO is enabled, a new ResourceCache is created, which is a wrapper around a k8s Informer (itself a k8s watch helper). The cache watches ServiceAccounts.
Hence, on start-up, serviceaccount RBAC is indeed currently necessary.

This is a bug upstream in Argo

Shouldn't the server not need to read the service accounts if rbac is disabled?

I say "currently" because this statement is correct. The SA cache is only used when the SSO module lists ServiceAccounts, which only occurs during RBAC authorization, which in turn only happens when SSO RBAC is enabled.

So the cache does not need to be populated whatsoever if RBAC is not enabled. Initialization should not create a ServiceAccount cache in that case.

I'll file a bug upstream and will make a PR to fix that 🙂

Secrets RBAC

Notably, the cache used to watch and list Secrets as well, until argoproj/argo-workflows#8555 changed that (Secrets of ServiceAccounts, to be specific).
That means that the current Secrets RBAC is outdated as well.

Will file a PR to update that too 😅 EDIT: See #2211

@agilgur5
Copy link
Member

Sent a PR upstream to fix this: argoproj/argo-workflows#11426

@agilgur5
Member

agilgur5 commented Aug 15, 2023

Upstream fix will be available in v3.4.10+: argoproj/argo-workflows#11552

@agilgur5 agilgur5 removed the awaiting-upstream label Aug 16, 2023

5 participants