Skip to content
This repository has been archived by the owner on Jul 15, 2024. It is now read-only.

ApplicationSet should report error conditions/status based on the result of generator/template results #71

Closed
jgwest opened this issue Dec 9, 2020 · 7 comments · Fixed by #370
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@jgwest
Copy link
Member

jgwest commented Dec 9, 2020

EDIT: See this comment for current design proposal.

When the user-provided generator/template produce invalid Argo CD Applications, we should note that error in the ApplicationSet CRD status field. See Argo CD's ApplicationStatus conditions field.

Likewise I'm sure there are other status fields/conditions that would be useful to expose as part of an ApplicationSet status subtype.

Once implemented, one such validation that would benefit from this is #68

@jgwest jgwest changed the title ApplicationSet should report error conditions/status based on the result of generator/template resuls ApplicationSet should report error conditions/status based on the result of generator/template results Dec 16, 2020
@jgwest jgwest added the enhancement New feature or request label Jan 12, 2021
@OmerKahani
Copy link
Contributor

OmerKahani commented May 18, 2021

Hi, what do you think about this structure (based on argocd ApplicationStatus)?

status:
  applications:
    - name:
      lastSync:
      condition:
        - type:
          message:
          LastTransitionTime

@jgwest
Copy link
Member Author

jgwest commented May 24, 2021

The way that Argo CD does conditions actually seems counter to current best practices around conditions: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties

The API conventions page suggests the following condition fields:

conditions:
  - type:  # Example: ErrorOccurred / ParametersGenerated / TemplateRendered
    status: # True/False/Unknown
    reason: # Single word camelcase representing the reason for the status eg MissingParameter or ErrorOccurred
    message: # Human readable message or error
    lastTransitionTime: # When this value last changed.

EDIT: I previously proposed a .status.applications field, containing conditions for each generated Application. There is concern (myself included) that this might be too long (too large a resource payload), so I've moved it to a separate issue (#347). This current comment has been updated to remove the applications field.

We should first begin by adding general conditions that communicate success or failure for the specific ApplicationSet reconciliation tasks that the ApplicationSet controller performs:

Here is an ApplicationSet with a single Application that failed on rendering the template:

status:
  conditions:
    - type: ErrorOccurred
      status: True
      reason: MissingParameter
      message: "error occurred: Template referenced a parameter that doesn't exist" 
      lastTransitionTime: (...)
    - type: ResourcesUpToDate
      status: False
      reason: ErrorOccurred
      message: "A previous error occurred"
      lastTransitionTime:  (...)

This simulates an ApplicationSet with a single Application has no errors, it sucessfully generated, templated, and applied:

status:
  conditions:
    - type: ErrorOccurred
      status: False
      lastTransitionTime: (...)
    - type: ResourcesUpToDate
      status: True 
      lastTransitionTime: (...)

And these were the condition types I used above:

condition types for the ApplicationSet as a whole (these are for status.conditions):

  • ErrorOccurred: true if an error occurred at any point in the reconciliation, false otherwise.
  • ResourcesUpToDate: true if we were successfully able to reconcile, and thus the Applications are fully up-to-date with the ApplicationSet, false otherwise

@mksha
Copy link
Contributor

mksha commented Jun 5, 2021

may be we can add the list of generated application in the status field

@ishitasequeira
Copy link
Member

@jgwest I would like to work on this.

@jnpacker
Copy link
Contributor

I agree with @mksha a condition that the generator was successful, not sure if there is a concern of a long list of clusters.

Some possible conditions:

  • Generator complete: True/False
  • Labels match NO clusters
  • List has a cluster name NOT found in Argo CD
  • CustomDecisionResource shows a cluster NOT found in Argo CD

@jgwest
Copy link
Member Author

jgwest commented Aug 18, 2021

Hey @jnpacker, yah, on the issue of the .status.applications field being way too long if you have lots of clusters, agreed. We can probably mitigate that to some degree. But in any case, lets move implementation of that part of this issue to a separate issue, so that it doesn't block adding conditions. I'll handle that.

For your condition examples, would it be possible to instead include those in the reasons field?

For example, if CustomDecisionResource shows a cluster NOT found in Argo CD:

status:
  conditions:
    - type: ErrorOccurred
      status: True
      reason: ClusterNotFound # This field is an enum, eg a fixed set of strings that may be used here to communicate various known error states
      # Q: Is there other machine-readable information it would be useful to provide, in addition to the `reason`?
      message: Unable to locate cluster 'cluster-mcclusterface'  # This field is a human readable field, that can be passed back to the user as-is
      lastTransitionTime: (...)

This way, we don't necessarily need to have conditions for individual generators. Not that I'm against conditions that are only for individual generators, but it would good to first attempt to make it work generically.

Finally, as per the Q comment in the yaml above, also I'm curious what other additional information it would be beneficial to include for your use cases 🤔.

@jnpacker
Copy link
Contributor

jnpacker commented Aug 19, 2021

A single condition would work for tracking errors. I was thinking the reverse of a list of clusters that matched though, but a list of clusters that didn't match between the decision resource and what is in Argo. For now I would represent that as a count. But I also feel this might be more of a warning condition vs an error.. if for example two clusters didn't match between the decision resource and Argos cluster list... But I could see that being an error as well.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants