Deleting a workflow on the workflow details page doesn't do anything #11661

Closed
2 of 3 tasks
tico24 opened this issue Aug 23, 2023 · 28 comments · Fixed by #11711
Labels
area/ui type/bug type/regression Regression from previous behavior (a specific type of bug)

Comments

@tico24
Member

tico24 commented Aug 23, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

If you go to the workflows details page and hit delete (and then confirm the delete), you are taken back to the workflows list page... where your workflow is still sat there staring at you :)

We do not have workflow archiving enabled.

Version

v3.5.0-rc1

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

any workflow

Logs from the workflow controller

I couldn't find anything useful. I'll update if I find anything.

Logs from in your workflow's wait container

n/a
@terrytangyuan
Member

cc @toyamagu-2021 This might be related to your fix

@terrytangyuan
Member

If you refresh, would it be gone?

@toyamagu-2021
Member

toyamagu-2021 commented Aug 24, 2023

I cannot reproduce this on either the master branch or 3.5.0-rc1 on a k3d cluster... 🤔
(I didn't refresh the page.)
Could it be moving to the workflows list page before the workflow is completely deleted?

[two screenshots]

@tico24
Member Author

tico24 commented Aug 24, 2023

Still very repeatable. It stays after a refresh.
[GIF of the reproduction]

My gif tool cut off the deletion popup but obviously I said 'ok' to it :)

@terrytangyuan
Member

Can you check the developer console and server logs?

@tico24
Member Author

tico24 commented Aug 24, 2023

Some console shouting. It means nothing to me.

[screenshot of console errors]

Server logs show nothing special, just that the list page is loading.

@toyamagu-2021
Member

Could you share your workflow's metadata.labels?
The following line might be returning false:

if (isWorkflowInCluster(workflow)) {
    services.workflows.delete(workflow.metadata.name, workflow.metadata.namespace).catch(setError);
}

Or please share the workflow itself (it might not be useful, though...).
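The suspected failure mode can be sketched in TypeScript: if the in-cluster guard returns false (for example because the workflow state is still undefined), the delete request is silently skipped, yet the page still navigates back to the list, which would match the reported symptom. WorkflowRef, isInClusterSketch, and handleDelete are hypothetical stand-ins for illustration, not the Argo UI's actual code.

```typescript
// Hypothetical minimal shape of a workflow object; the real UI type has many more fields.
interface WorkflowRef {
  metadata: { name: string; namespace: string; labels?: Record<string, string> };
}

// Hypothetical guard standing in for isWorkflowInCluster.
// The real check may also inspect labels; this sketch only rejects a missing workflow.
function isInClusterSketch(wf: WorkflowRef | undefined): wf is WorkflowRef {
  return wf !== undefined;
}

// Sketch of a delete handler: the request fires only when the guard passes,
// but navigation back to the list happens either way.
function handleDelete(
  wf: WorkflowRef | undefined,
  deleteWorkflow: (name: string, namespace: string) => void,
  navigateToList: () => void,
): void {
  if (isInClusterSketch(wf)) {
    deleteWorkflow(wf.metadata.name, wf.metadata.namespace);
  }
  navigateToList(); // runs even when no delete request was sent
}
```

Calling handleDelete(undefined, ...) navigates without issuing any request, so from the user's point of view the delete "did nothing".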

@tico24
Member Author

tico24 commented Aug 24, 2023

metadata:
  name: renovate-application-definitions-h7k7x
  generateName: renovate-application-definitions-
  namespace: ci
  uid: 7d878237-241a-46d6-804d-fdda0d4a5d89
  resourceVersion: '139954745'
  generation: 4
  creationTimestamp: '2023-08-24T14:41:50Z'
  labels:
    cron: 'true'
    workflows.argoproj.io/completed: 'true'
    workflows.argoproj.io/creator: system-serviceaccount-argo-argo-server
    workflows.argoproj.io/cron-workflow: renovate-application-definitions
    workflows.argoproj.io/phase: Succeeded
  annotations:
    workflows.argoproj.io/pod-name-format: v2
    workflows.argoproj.io/scheduled-time: '2023-08-24T14:41:50Z'
  ownerReferences:
    - apiVersion: argoproj.io/v1alpha1
      kind: CronWorkflow
      name: renovate-application-definitions
      uid: 0d2bb088-d382-4684-997e-13b23263e7b2
      controller: true
      blockOwnerDeletion: true
  managedFields:
    - manager: argo
      operation: Update
      apiVersion: argoproj.io/v1alpha1
      time: '2023-08-24T14:41:50Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:workflows.argoproj.io/scheduled-time: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:cron: {}
            f:workflows.argoproj.io/creator: {}
            f:workflows.argoproj.io/cron-workflow: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"0d2bb088-d382-4684-997e-13b23263e7b2"}: {}
        f:spec: {}
    - manager: workflow-controller
      operation: Update
      apiVersion: argoproj.io/v1alpha1
      time: '2023-08-24T14:43:25Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:workflows.argoproj.io/pod-name-format: {}
          f:labels:
            f:workflows.argoproj.io/completed: {}
            f:workflows.argoproj.io/phase: {}
        f:status: {}
spec:
  arguments:
    parameters:
      - name: repository_json
        value: '"pipekit/application-definitions"'
  workflowTemplateRef:
    name: renovate
status:
  phase: Succeeded
  startedAt: '2023-08-24T14:41:50Z'
  finishedAt: '2023-08-24T14:43:25Z'
  progress: 1/1
  nodes:
    renovate-application-definitions-h7k7x:
      id: renovate-application-definitions-h7k7x
      name: renovate-application-definitions-h7k7x
      displayName: renovate-application-definitions-h7k7x
      type: DAG
      templateName: main
      templateScope: local/
      phase: Succeeded
      startedAt: '2023-08-24T14:41:50Z'
      finishedAt: '2023-08-24T14:43:25Z'
      progress: 1/1
      resourcesDuration:
        cpu: 163
        memory: 311
      children:
        - renovate-application-definitions-h7k7x-4272356993
      outboundNodes:
        - renovate-application-definitions-h7k7x-4272356993
    renovate-application-definitions-h7k7x-4272356993:
      id: renovate-application-definitions-h7k7x-4272356993
      name: renovate-application-definitions-h7k7x.renovate
      displayName: renovate
      type: Pod
      templateName: renovate
      templateScope: local/
      phase: Succeeded
      boundaryID: renovate-application-definitions-h7k7x
      startedAt: '2023-08-24T14:41:50Z'
      finishedAt: '2023-08-24T14:43:15Z'
      progress: 1/1
      resourcesDuration:
        cpu: 163
        memory: 311
      outputs:
        artifacts:
          - name: main-logs
            s3:
              key: >-
                my-artifacts/2023/08/24/renovate-application-definitions-h7k7x/renovate-application-definitions-h7k7x-renovate-4272356993/main.log
        exitCode: '0'
      hostNodeName: ip-10-102-158-69.us-east-2.compute.internal
  storedTemplates:
    namespaced/renovate/main:
      name: main
      inputs: {}
      outputs: {}
      metadata: {}
      dag:
        tasks:
          - name: renovate
            template: renovate
            arguments: {}
    namespaced/renovate/renovate:
      name: renovate
      inputs: {}
      outputs: {}
      nodeSelector:
        nodegroup: arm-spot
      metadata:
        annotations:
          vault.hashicorp.com/agent-inject: 'true'
          vault.hashicorp.com/agent-inject-secret-ci-token: infrastructure/data/renovate
          vault.hashicorp.com/agent-inject-secret-docker-token: infrastructure/data/docker/pipekitdev
          vault.hashicorp.com/agent-inject-template-ci-token: |
            {{ with secret "infrastructure/data/renovate" -}}
                export RENOVATE_TOKEN="{{ .Data.data.github_PAT }}"
            {{- end }}
          vault.hashicorp.com/agent-inject-template-docker-token: |
            {{ with secret "infrastructure/data/docker/pipekitdev" -}}
                export RENOVATE_HOST_RULES="[{\"matchHost\":\"docker.io\",\"username\":\"tico24\",\"password\":\"{{ .Data.data.PAT }}\"}]"
            {{- end }}
          vault.hashicorp.com/agent-pre-populate-only: 'true'
          vault.hashicorp.com/auth-path: auth/runner
          vault.hashicorp.com/role: argo
      container:
        name: ''
        image: renovate/renovate:36.57.2
        command:
          - /bin/bash
          - '-c'
          - |
            . /vault/secrets/ci-token
            . /vault/secrets/docker-token
            renovate
        env:
          - name: LOG_LEVEL
            value: debug
          - name: RENOVATE_PLATFORM
            value: github
          - name: RENOVATE_CONFIG
            value: '{ "repositories": [ {{workflow.parameters.repository_json}} ] }'
          - name: RENOVATE_PR_CONCURRENT_LIMIT
            value: '0'
          - name: RENOVATE_BRANCH_CONCURRENT_LIMIT
            value: '0'
          - name: RENOVATE_PR_HOURLY_LIMIT
            value: '0'
          - name: RENOVATE_X_HARD_EXIT
            value: 'true'
        resources:
          requests:
            cpu: '1'
            memory: 256Mi
        imagePullPolicy: Always
  conditions:
    - type: PodRunning
      status: 'False'
    - type: Completed
      status: 'True'
  resourcesDuration:
    cpu: 163
    memory: 311
  storedWorkflowTemplateSpec:
    templates:
      - name: main
        inputs: {}
        outputs: {}
        metadata: {}
        dag:
          tasks:
            - name: renovate
              template: renovate
              arguments: {}
      - name: renovate
        inputs: {}
        outputs: {}
        nodeSelector:
          nodegroup: arm-spot
        metadata:
          annotations:
            vault.hashicorp.com/agent-inject: 'true'
            vault.hashicorp.com/agent-inject-secret-ci-token: infrastructure/data/renovate
            vault.hashicorp.com/agent-inject-secret-docker-token: infrastructure/data/docker/pipekitdev
            vault.hashicorp.com/agent-inject-template-ci-token: |
              {{ with secret "infrastructure/data/renovate" -}}
                  export RENOVATE_TOKEN="{{ .Data.data.github_PAT }}"
              {{- end }}
            vault.hashicorp.com/agent-inject-template-docker-token: |
              {{ with secret "infrastructure/data/docker/pipekitdev" -}}
                  export RENOVATE_HOST_RULES="[{\"matchHost\":\"docker.io\",\"username\":\"tico24\",\"password\":\"{{ .Data.data.PAT }}\"}]"
              {{- end }}
            vault.hashicorp.com/agent-pre-populate-only: 'true'
            vault.hashicorp.com/auth-path: auth/runner
            vault.hashicorp.com/role: argo
        container:
          name: ''
          image: renovate/renovate:36.57.2
          command:
            - /bin/bash
            - '-c'
            - |
              . /vault/secrets/ci-token
              . /vault/secrets/docker-token
              renovate
          env:
            - name: LOG_LEVEL
              value: debug
            - name: RENOVATE_PLATFORM
              value: github
            - name: RENOVATE_CONFIG
              value: '{ "repositories": [ {{workflow.parameters.repository_json}} ] }'
            - name: RENOVATE_PR_CONCURRENT_LIMIT
              value: '0'
            - name: RENOVATE_BRANCH_CONCURRENT_LIMIT
              value: '0'
            - name: RENOVATE_PR_HOURLY_LIMIT
              value: '0'
            - name: RENOVATE_X_HARD_EXIT
              value: 'true'
          resources:
            requests:
              cpu: '1'
              memory: 256Mi
          imagePullPolicy: Always
    entrypoint: main
    arguments:
      parameters:
        - name: repository_json
          value: '"pipekit/application-definitions"'
    serviceAccountName: ci
    nodeSelector:
      nodegroup: spot
    imagePullSecrets:
      - name: regcred
    ttlStrategy:
      secondsAfterCompletion: 3600
    activeDeadlineSeconds: 3600
    workflowTemplateRef:
      name: renovate
    volumeClaimGC:
      strategy: OnWorkflowCompletion
  artifactRepositoryRef:
    default: true
    artifactRepository:
      archiveLogs: true
      s3:
        endpoint: s3.amazonaws.com
        bucket: pipekit-argo-workflows
        region: us-east-2
        insecure: false
        useSDKCreds: true
        encryptionOptions: {}
        keyFormat: >-
          my-artifacts/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{workflow.name}}/{{pod.name}}
  artifactGCStatus:
    notSpecified: true

@tico24
Member Author

tico24 commented Aug 24, 2023

The same issue happens with a basic hello world though:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
  labels:
    workflows.argoproj.io/archive-strategy: "false"
  namespace: ci
  annotations:
    workflows.argoproj.io/description: |
      This is a simple hello world example.
spec:
  serviceAccountName: ci
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]

@terrytangyuan
Member

Any logs from your workflow controller?

@terrytangyuan
Member

I am not able to reproduce this either. Please check controller logs to see if the workflow deletion request is handled successfully.

@shmruin
Contributor

shmruin commented Aug 25, 2023

Interesting. I see a small error in the provided GIF when opening the workflow, shown below.

[screenshot]

Referring to this specific line in the useEffect: is there any chance that archivedWf (declared as let archivedWf: Workflow) just stays undefined, goes into setWorkflow, and is then caught by the isWorkflowInCluster condition check?

@terrytangyuan
Member

@shmruin Can you check if your controller/executor/server image versions match?

@terrytangyuan
Member

terrytangyuan commented Aug 25, 2023

@shmruin That error occurs when you visit a live workflow details page for a workflow that's been deleted from the cluster. I am improving the error message in #11674.

@terrytangyuan
Member

Also sent #11676. I suspect there are some errors that appear in the UI for a moment before redirecting to the workflow list page.

@shmruin
Contributor

shmruin commented Aug 27, 2023

@terrytangyuan Sadly, that screenshot is @tico24's, and I also fail to reproduce this error.
I think @agilgur5's review comments in #11676 make sense and could be related to this issue, but I can't be sure since I can't reproduce this situation.

@terrytangyuan
Member

terrytangyuan commented Aug 28, 2023

@tico24 Can you try the latest image? So far looks like no one else has been able to reproduce your original issue.

@tico24
Member Author

tico24 commented Aug 29, 2023

As expected, and as mentioned in the original issue, I can recreate the issue on latest too.

The controller, server, and executor were moved to latest. Here's a screenshot of the server to prove it:
[screenshot]

latest did not exist on this node, so it was freshly pulled. So no caching here either.

@terrytangyuan
Member

@tico24 Can you check workflow controller and K8s API server logs? I'd be interested to see if the request has been sent successfully.

@tico24
Member Author

tico24 commented Aug 29, 2023

Both server and controller logs say nothing special. What specifically do you want from them?

@terrytangyuan
Member

@tico24 They would indicate whether the deletion request has been received and processed. Just do a grep from the logs.
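A minimal sketch of that check in TypeScript, assuming the server writes logfmt-style lines with method=, path=, and status= fields like the ones quoted later in this thread. parseLogfmt and findWorkflowDeletes are hypothetical helpers written for illustration, not part of Argo.

```typescript
// Hypothetical helper: parse one logfmt-style line into key/value pairs.
// Handles bare values (key=value) and double-quoted values (key="a b c").
function parseLogfmt(line: string): Map<string, string> {
  const fields = new Map<string, string>();
  const pair = /([\w.]+)=("(?:[^"\\]|\\.)*"|\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pair.exec(line)) !== null) {
    const raw = match[2];
    fields.set(match[1], raw.startsWith('"') ? raw.slice(1, -1) : raw);
  }
  return fields;
}

// Hypothetical summary of one DELETE request found in the server logs.
interface DeleteLogEntry {
  path: string;
  status: number;
  duration: string;
}

// Keep only DELETE requests against the workflows API and report their HTTP status.
function findWorkflowDeletes(logText: string): DeleteLogEntry[] {
  const entries: DeleteLogEntry[] = [];
  for (const line of logText.split("\n")) {
    const fields = parseLogfmt(line);
    const path = fields.get("path");
    if (fields.get("method") === "DELETE" && path !== undefined && path.startsWith("/api/v1/workflows/")) {
      entries.push({ path, status: Number(fields.get("status")), duration: fields.get("duration") ?? "" });
    }
  }
  return entries;
}
```

An empty result means the request never reached the server; a non-2xx status means it arrived but did not complete normally.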

@agilgur5
Member

The Network tab of the browser dev tools will also show whether the request was sent.

@tico24
Member Author

tico24 commented Aug 29, 2023

I see what I'd expect to see.

server:

time="2023-08-29T16:09:21.189Z" level=info duration=4.18495ms method=DELETE path=/api/v1/workflows/ci/renovate-application-definitions-dv565 size=39 status=408
time="2023-08-29T16:09:21.189Z" level=error msg="finished unary call with code Internal" error="rpc error: code = Internal desc = getting archived workflows not supported" grpc.code=Internal grpc.method=DeleteWorkflow grpc.service=workflow.WorkflowService grpc.start_time="2023-08-29T16:09:21Z" grpc.time_ms=4.023 span.kind=server system=grpc

The controller still has nothing.

@tico24
Member Author

tico24 commented Aug 29, 2023

[screenshot]

@agilgur5
Member

agilgur5 commented Aug 29, 2023

Note the status=408 in the Server logs. A 408 gives some credence to my hypothesis from #11676, though it is not necessarily conclusive.

I'll work on a fix for that

@terrytangyuan
Member

terrytangyuan commented Aug 29, 2023

Yep, exactly what we needed to see. The request went through but was canceled in argo-server.

@agilgur5 Looking forward to your fix :-)

@agilgur5 agilgur5 added the type/regression Regression from previous behavior (a specific type of bug) label Aug 29, 2023
@agilgur5
Member

agilgur5 commented Aug 29, 2023

So I too was unable to reproduce this locally. That being said, as I mentioned on Slack, if my hypothesis is correct, then this is a race condition. Specifically, higher latency (as with a remote server vs. local server) may make this more likely. The low latency of local dev may mean that the request completes shortly before or after the page reload and so is not affected by the connection drop.

I also tried throttling in the browser dev tools, but that did not reproduce it either. That being said, I'm not entirely sure how the emulation works under the hood, and a slow request/response is not quite the same as network hops to a remote server. Server-side throttling (e.g. a sleep on the request) may be closer to the real thing, but also a bit different. I forgot to try a server-side throttle, though.

I wrote up #11711, which hopefully fixes this if my hypothesis is correct.
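The race hypothesis above can be simulated in TypeScript. This is a sketch under stated assumptions, not the change made in #11711: fakeDelete is a hypothetical stand-in for the HTTP DELETE, and aborting an AbortController stands in for the page navigation dropping the in-flight connection. Firing the request and navigating on a timer only succeeds when latency is low; waiting for the request to settle before navigating succeeds regardless of latency.

```typescript
const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for the DELETE request.
type DeleteRequestFn = (signal: AbortSignal) => Promise<void>;

// The request needs `latencyMs` to be processed and is dropped if the signal aborts first.
function fakeDelete(latencyMs: number, deleted: Set<string>, name: string): DeleteRequestFn {
  return (signal) =>
    new Promise<void>((resolve, reject) => {
      const timer = setTimeout(() => {
        deleted.add(name); // the server finished processing the delete
        resolve();
      }, latencyMs);
      signal.addEventListener("abort", () => {
        clearTimeout(timer); // connection dropped before the server finished
        reject(new Error("request aborted"));
      });
    });
}

// Racy pattern: fire the request, then navigate after a fixed delay without waiting for it.
async function deleteThenNavigateRacy(del: DeleteRequestFn, navigateAfterMs: number): Promise<void> {
  const page = new AbortController();
  const request = del(page.signal).catch(() => undefined); // the failure is swallowed
  await sleep(navigateAfterMs);
  page.abort(); // navigation tears down any request still in flight
  await request;
}

// Awaited pattern: navigate only after the request has settled.
async function deleteThenNavigateAwaited(del: DeleteRequestFn): Promise<void> {
  const page = new AbortController();
  await del(page.signal);
  page.abort(); // navigation happens after the delete completed
}
```

In this simulation a low-latency request finishes before the simulated navigation, which mirrors why local setups could not reproduce the bug while a remote server could.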

@terrytangyuan
Member

@tico24 Have you tried @agilgur5's PR yet?


5 participants