parent level memoization is broken #11612

Closed
2 of 3 tasks
Paritosh-Anand opened this issue Aug 18, 2023 · 5 comments · Fixed by #11623
Labels
area/memoization type/bug type/regression Regression from previous behavior (a specific type of bug)

Comments

@Paritosh-Anand

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

What happened

Upgrading to v3.4.10 broke parent-level memoization, which worked before the upgrade.

Scenario:

We have a few workflows with memoization enabled at the parent level of a DAG or steps template. The individual steps or tasks inside do not have memoization enabled.

Expectation:

If parent-level memoization results in a cache hit, the child tasks should not be evaluated or processed.
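The expectation above can be illustrated with a toy model. This is only a sketch of the intended semantics, not Argo's controller code: `Template`, `execute`, and the in-memory `cache` dict are hypothetical names made up for illustration, and the dict stands in for the ConfigMap cache.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Template:
    """Hypothetical stand-in for a steps/DAG template; not an Argo type."""
    name: str
    run: Optional[Callable[[], str]] = None        # leaf work, e.g. a container
    children: List["Template"] = field(default_factory=list)
    memoize_key: Optional[str] = None               # set only on the parent here


def execute(tmpl: Template, cache: Dict[str, str], executed: List[str]) -> str:
    """Run a template, consulting the cache before touching any children."""
    if tmpl.memoize_key is not None and tmpl.memoize_key in cache:
        # Cache hit on the parent: return the stored output, create no child nodes.
        return cache[tmpl.memoize_key]

    if tmpl.children:
        output = ",".join(execute(c, cache, executed) for c in tmpl.children)
    else:
        executed.append(tmpl.name)                  # record that a leaf really ran
        output = tmpl.run() if tmpl.run else ""

    if tmpl.memoize_key is not None:
        cache[tmpl.memoize_key] = output            # cache miss: save for next run
    return output


whalesay = Template(name="whalesay", run=lambda: "hello_world")
entrypoint = Template(name="entrypoint", children=[whalesay],
                      memoize_key="entrypoint-key-1")

cache: Dict[str, str] = {}
first_run: List[str] = []
second_run: List[str] = []
first_output = execute(entrypoint, cache, first_run)    # miss: whalesay runs
second_output = execute(entrypoint, cache, second_run)  # hit: nothing runs
```

In this model the second run never reaches `whalesay`. The regression reported here corresponds to the child node still being created on the second run and then never leaving Pending.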

Impact

After upgrading from v3.4.8 to v3.4.10, we observed that a task or step got stuck in the Pending state.
Because of the endlessly spinning nodes in the workflow, overall CI feedback was delayed and developers were blocked, resulting in a poor developer experience.

Version

v3.4.10

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: memoized-entrypoint-
spec:
  entrypoint: entrypoint
  templates:
  - name: entrypoint
    memoize:
      key: "entrypoint-key-1"
      cache:
        configMap:
          name: cache-top-entrypoint
    outputs:
      parameters:
        - name: url
          valueFrom:
            expression: |
              'https://argo-workflows.company.com/workflows/namepace/' + '{{workflow.name}}' + '?tab=workflow'
    steps:
      - - name: whalesay
          template: whalesay

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello_world `date` > /tmp/hello_world.txt"]
    outputs:
      parameters:
      - name: hello
        valueFrom:
          path: /tmp/hello_world.txt

Logs from the workflow controller

time="2023-08-18T12:43:00.971Z" level=info msg="node unchanged" namespace=argo nodeID=memoized-entrypoint-jr4fd-2323855477 workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:00.972Z" level=info msg="Workflow step group node memoized-entrypoint-jr4fd-4196811885 not yet completed" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:00.972Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:00.972Z" level=info msg=reconcileAgentPod namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:00.972Z" level=info msg="Workflow to be dehydrated" Workflow Size=2788
time="2023-08-18T12:43:00.977Z" level=info msg="cleaning up pod" action=terminateContainers key=argo/memoized-entrypoint-jr4fd-whalesay-2323855477/terminateContainers
time="2023-08-18T12:43:00.981Z" level=info msg="Workflow update successful" namespace=argo phase=Running resourceVersion=34091 workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.985Z" level=info msg="cleaning up pod" action=killContainers key=argo/memoized-entrypoint-jr4fd-whalesay-2323855477/killContainers
time="2023-08-18T12:43:02.996Z" level=info msg="Processing workflow" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.996Z" level=info msg="Task-result reconciliation" namespace=argo numObjs=1 workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.996Z" level=info msg="task-result changed" namespace=argo nodeID=memoized-entrypoint-jr4fd-2323855477 workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.996Z" level=info msg="node changed" namespace=argo new.message= new.phase=Succeeded new.progress=0/1 nodeID=memoized-entrypoint-jr4fd-2323855477 old.message= old.phase=Running old.progress=0/1 workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.997Z" level=info msg="Step group node memoized-entrypoint-jr4fd-4196811885 successful" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.997Z" level=info msg="node memoized-entrypoint-jr4fd-4196811885 phase Running -> Succeeded" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.997Z" level=info msg="node memoized-entrypoint-jr4fd-4196811885 finished: 2023-08-18 12:43:02.99713725 +0000 UTC" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.997Z" level=info msg="Outbound nodes of memoized-entrypoint-jr4fd-2323855477 is [memoized-entrypoint-jr4fd-2323855477]" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.997Z" level=info msg="Outbound nodes of memoized-entrypoint-jr4fd is [memoized-entrypoint-jr4fd-2323855477]" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:02.999Z" level=info msg="Saving ConfigMap cache entry" key=entrypoint-key-1 name=cache-top-entrypoint namespace=argo nodeId=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:03.010Z" level=info msg="node memoized-entrypoint-jr4fd phase Pending -> Succeeded" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:03.010Z" level=info msg="node memoized-entrypoint-jr4fd finished: 2023-08-18 12:43:03.010525125 +0000 UTC" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:03.010Z" level=info msg="TaskSet Reconciliation" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:03.010Z" level=info msg=reconcileAgentPod namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:03.010Z" level=info msg="Updated phase Running -> Succeeded" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:03.010Z" level=info msg="Marking workflow completed" namespace=argo workflow=memoized-entrypoint-jr4fd
time="2023-08-18T12:43:03.010Z" level=info msg="Workflow to be dehydrated" Workflow Size=3126
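When comparing a first run against a second (cached) run, it can help to pull only the node phase transitions out of controller logs like the ones above. The following is a minimal sketch that assumes the `key=value` layout and the `node <id> phase <old> -> <new>` message wording seen in this log. `parse_logfmt` and `phase_transitions` are hypothetical helper names, and the sample lines are copied from the log above.

```python
import re
import shlex
from typing import Dict, List, Tuple

# Matches messages such as "node <id> phase Running -> Succeeded".
PHASE_RE = re.compile(r"^node (\S+) phase (\w+) -> (\w+)$")


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split one key=value log line into a dict; bare tokens are skipped.

    shlex handles the double-quoted values; lines containing unbalanced
    quotes would need a real logfmt parser instead.
    """
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def phase_transitions(lines: List[str]) -> List[Tuple[str, ...]]:
    """Return (node, old_phase, new_phase) for every phase-change message."""
    found: List[Tuple[str, ...]] = []
    for line in lines:
        match = PHASE_RE.match(parse_logfmt(line).get("msg", ""))
        if match:
            found.append(match.groups())
    return found


SAMPLE = [
    'time="2023-08-18T12:43:00.972Z" level=info msg="Workflow to be dehydrated" Workflow Size=2788',
    'time="2023-08-18T12:43:02.997Z" level=info msg="node memoized-entrypoint-jr4fd-4196811885 phase Running -> Succeeded" namespace=argo workflow=memoized-entrypoint-jr4fd',
    'time="2023-08-18T12:43:03.010Z" level=info msg="node memoized-entrypoint-jr4fd phase Pending -> Succeeded" namespace=argo workflow=memoized-entrypoint-jr4fd',
]

transitions = phase_transitions(SAMPLE)
for node, old, new in transitions:
    print(f"{node}: {old} -> {new}")
```

A node that enters Pending and never shows a later transition in this output would be a candidate for the stuck node described in this issue.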

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=memoized-entrypoint-9j8f5
time="2023-08-18T12:44:28.669Z" level=info msg="Copying /tmp/hello_world.txt from base image layer"
time="2023-08-18T12:44:28.669Z" level=info msg="Successfully saved output parameter: hello"
time="2023-08-18T12:44:28.669Z" level=info msg="No output artifacts"
time="2023-08-18T12:44:28.670Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: memoized-entrypoint-9j8f5/memoized-entrypoint-9j8f5-whalesay-2335098331/main.log"
time="2023-08-18T12:44:28.670Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2023-08-18T12:44:28.670Z" level=info msg="Saving file to s3" bucket=my-bucket endpoint="minio:9000" key=memoized-entrypoint-9j8f5/memoized-entrypoint-9j8f5-whalesay-2335098331/main.log path=/tmp/argo/outputs/logs/main.log
time="2023-08-18T12:44:28.681Z" level=info msg="Save artifact" artifactName=main-logs duration=10.823ms error="<nil>" key=memoized-entrypoint-9j8f5/memoized-entrypoint-9j8f5-whalesay-2335098331/main.log
time="2023-08-18T12:44:28.681Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2023-08-18T12:44:28.681Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2023-08-18T12:44:28.687Z" level=info msg="Alloc=8527 TotalAlloc=14522 Sys=23661 NumGC=4 Goroutines=10"
@terrytangyuan
Member

@shmruin @Joibel Could you help take a look? I think this might be related to your changes.

@shmruin
Contributor

shmruin commented Aug 18, 2023

@terrytangyuan Yes, it seems my last commit caused this problem. I think I have found the cause, but I still need to refine the fix. Please assign this to me.

Will this be a hotfix? Should I open the PR against master or directly against the release branch? I may need some reviews.

@terrytangyuan
Member

A PR to the master branch would be fine. We can send a PR to the release branch later, once we are planning it.

@agilgur5 agilgur5 added the type/regression Regression from previous behavior (a specific type of bug) label Aug 19, 2023
@shmruin
Contributor

shmruin commented Aug 19, 2023

@Paritosh-Anand Just to be sure, is this what you are experiencing now?

The first run of your example is fine.

Screenshot 2023-08-19 1 55 56 PM

On the second run, however, it looks like the following: a node appears and stays Pending indefinitely.

Screenshot 2023-08-19 1 56 31 PM

It should originally look like this, because of caching:

Screenshot 2023-08-19 1 56 05 PM

shmruin added a commit to shmruin/argo-workflows that referenced this issue Aug 19, 2023
Signed-off-by: shmruin <meme_hm@naver.com>
@Paritosh-Anand
Author

@shmruin Thanks for looking into this. Yes, that is correct.

terrytangyuan pushed a commit that referenced this issue Aug 23, 2023
Signed-off-by: shmruin <meme_hm@naver.com>
shmruin added a commit to shmruin/argo-workflows that referenced this issue Aug 23, 2023
terrytangyuan pushed a commit that referenced this issue Aug 23, 2023
dpadhiar pushed a commit to dpadhiar/argo-workflows that referenced this issue May 9, 2024
…oproj#11623) (argoproj#11660)

Signed-off-by: shmruin <meme_hm@naver.com>
Signed-off-by: Dillen Padhiar <dillen_padhiar@intuit.com>