Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deleteDelayDuration JSON schema and validation mismatch #12631

Open
3 of 4 tasks
hobti01 opened this issue Feb 6, 2024 · 5 comments
Open
3 of 4 tasks

deleteDelayDuration JSON schema and validation mismatch #12631

hobti01 opened this issue Feb 6, 2024 · 5 comments
Assignees
Labels
area/api Argo Server API area/spec Changes to the workflow specification. solution/workaround There's a workaround, might not be great, but exists type/bug

Comments

@hobti01
Copy link

hobti01 commented Feb 6, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

I'm really not understanding the syntax for deleteDelayDuration implemented in 3.5 #11325.
The examples use

deleteDelayDuration: 1h

But the schema (https://github.com/argoproj/argo-workflows/blob/v3.5.4/api/openapi-spec/swagger.json#L14403) says it should be

deleteDelayDuration:
  duration: 1h

The schema-correct option fails with a validation error in the UI.

Is the schema wrong, the docs wrong or the code wrong? I cannot determine what is correct, although only the schema-incorrect version is submittable.

Version

3.5.4

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pod-gc-strategy-
spec:
  entrypoint: main
  podGC:
    strategy: OnPodCompletion
    deleteDelayDuration:
      duration: 30s
  templates:
  - name: main
    container:
      image: alpine:3.7
      command: [sh, -c]
      args: ["exit 0"]

Logs from the workflow controller server

No logs from the controller, the workflow cannot be submitted from the UI as a 400 error is returned from the Server.

time="2024-02-06T13:37:24.226Z" level=info duration="145.147µs" method=POST path=/api/v1/workflows/argo size=134 status=400

Logs from in your workflow's wait container

No logs, no workflow is created.

@agilgur5 agilgur5 self-assigned this Feb 6, 2024
@agilgur5 agilgur5 added area/controller Controller issues, panics area/spec Changes to the workflow specification. labels Feb 6, 2024
@agilgur5
Copy link
Member

agilgur5 commented Feb 7, 2024

But the schema (https://github.com/argoproj/argo-workflows/blob/v3.5.4/api/openapi-spec/swagger.json#L14403) says it should be

Yea I mentioned this exact thing ask in my "Notes to Reviewers" in the PR as well as in-line comment and didn't get a response.
That was one of my first PRs to Argo.

Re-reading that 6 months old PR of mine, it's supposed to marshal to a string, but the internal representation of it is what the JSON schema shows (what it unmarshals to). Unfortunately I did not know how to resolve that.

The schema-correct option fails with a validation error in the UI.

Is the schema wrong, the docs wrong or the code wrong?

The schemas, validation, and "Field Reference" page in the docs are all generated from the Go types.
So the validation seems to confirm that the JSON schema is misleading, but uh it's generated from the same place 😕
The docs examples I wrote myself in the same PR.

Notably the CRD is just a plain string -- yet similarly generated from the same place.

@agilgur5 agilgur5 added the type/support User support issue - likely not a bug label Feb 7, 2024
@agilgur5 agilgur5 changed the title deleteDelayDuration schema and validation mismatch deleteDelayDuration JSON schema and validation mismatch Feb 8, 2024
@agilgur5 agilgur5 added area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more problem/more information needed Not enough information has been provide to diagnose this issue. labels Feb 18, 2024

This comment was marked as resolved.

@github-actions github-actions bot added the problem/stale This has not had a response in some time label Mar 6, 2024

This comment was marked as resolved.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 22, 2024
@agilgur5
Copy link
Member

agilgur5 commented Jul 19, 2024

We use durations in a few places already, and this was a duration as an env var before it became part of the spec (with the exact same name), but I learned recently that apparently it's k8s convention to not expose them in fields since they are Go specific: https://github.com/kubernetes/community/blob/fb55d44/contributors/devel/sig-architecture/api-conventions.md#units

Changing this (and other fields) to deleteDelaySeconds as an integer per the k8s convention would resolve this issue as well. Although I'm not sure how large the integer can get (probably arch dependent non-bigint?), as this number could be quite large in this case (days or weeks) EDIT: Did the math, 60 * 60 * 24 * 30 = 2592000 while int32 goes up to 2147483647 and uint32 to double that: 4294967295. Even 1 year as seconds fits into a 32 bit integer: 2592000 * 12 = 31104000

@agilgur5
Copy link
Member

agilgur5 commented Aug 14, 2024

This got reported on Slack as well. Now that it's been reported twice (and unstale), I'm thinking of applying a workaround for this. The codebase has a helper ParseStringToDuration that I had considered over a year ago when making the PR, but it complicates the processing a bit as it can no longer be plain unmarshaled/marshaled. Either that or I need to figure out why the JSON schema codegen is incorrect while the CRD codegen is correct.

Changing this (and other fields) to deleteDelaySeconds as an integer per the k8s convention would resolve this issue as well.

This would be a breaking change, so while I'd like to do that with many of the durations in the codebase as well, we should fix this issue in the meantime

@agilgur5 agilgur5 reopened this Aug 14, 2024
@agilgur5 agilgur5 added solution/workaround There's a workaround, might not be great, but exists area/api Argo Server API and removed problem/stale This has not had a response in some time problem/more information needed Not enough information has been provide to diagnose this issue. type/support User support issue - likely not a bug area/controller Controller issues, panics area/gc Garbage collection, such as TTLs, retentionPolicy, delays, and more labels Aug 14, 2024
@agilgur5 agilgur5 changed the title deleteDelayDuration JSON schema and validation mismatch deleteDelayDuration JSON schema and validation mismatch Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Argo Server API area/spec Changes to the workflow specification. solution/workaround There's a workaround, might not be great, but exists type/bug
Projects
None yet
Development

No branches or pull requests

2 participants