Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Progressive Rollout Analysis #11932

Open
david-martin opened this issue Jan 10, 2023 · 2 comments
Open

Progressive Rollout Analysis #11932

david-martin opened this issue Jan 10, 2023 · 2 comments
Labels
appset/progressive-syncs Issues related to the ApplicationSet progressive syncs feature. enhancement New feature or request

Comments

@david-martin
Copy link

Summary

This proposal builds upon the ApplicationSet Progressive Rollouts feature added in #10048.
The idea is to extend the RollingUpdate strategy feature to allow specifying an 'analysis' configuration for checking Application health.
The 'analysis' configuration is based on the same field in the Argo Rollouts 'Rollout' resource e.g. https://github.com/argoproj/argo-rollouts/blob/master/examples/rollout-analysis-step.yaml#L35.

Motivation

The use case is a single Application that is deployed to multiple clusters for regional availability.
While it is possible to use argo-rollouts in each cluster to make a decision about 'health' of the Application (indirectly via the Rollout resource), it is limited to local context in that cluster i.e. it's not aware of other instances of the Application.

If we allow an Analysis to be declared and executed centrally, it allows for a multi cluster aware analysis to be made.
For example, my Application rollout is only 'healthy' once it is receiving a certain percentage of overall traffic in my set of Applications across all clusters (or all clusters in a region).

There are existing examples of using a custom health check for Application resources, including a AnalysisRun and Rollout, however these are limited to the context of a single Application instance in 1 cluster.

Proposal

When an Application is 'rolling out', an AnalysisTemplate, if specified, will be instantiated into an AnalysisRun.
The logic for checking if an Application is 'Healthy' would defer to the AnalysisRun status and only be seen as 'Healthy' if the AnalysisRun was successful.

Example ApplicationSet specifying an analysis configuration:

kind: ApplicationSet
metadata:
  name: example
spec:
  generators:
  - clusters:
  ...
  strategy:
    type: RollingUpdate
    rollingUpdate:
      steps:
        - matchExpressions:
            - key: rolloutWave
              operator: In
              values:
                - region-1
          maxUpdate: 1
          analysis:
            templates:
            - templateName: region-traffic-check

The main difference between the proposed feature and the existing Argo Rollouts project is that it can apply to an Application that is deployed to multiple clusters, via an ApplicationSet e.g. a multi region/multi cluster application.
It allows for declaring analysis rules centrally, using an aggregated view of metrics from all instances of the Application.
It may be necessary to pass in templated parameters, such as the cluster name, to an AnalysisRun to help with context aware queries.

The aggregated metrics is a separate problem to solve e.g. it could be implemented by prometheus and thanos.
Therefore it would make sense to defer to a configured 'provider' rather than including any metrics aggregation in the solution itself. This is in line with the providers feature in Argo Rollouts

I have previously explored a proof of concept using the ApplicationSet cluster decision generator and updating cluster placement decisions in a 'Placement' resource. With this generator based approach the analysis would happen completely outside of ArogCD and only update placement decisions as criteria is met, thereby mimicking rolling update/rollback like functionality. Although it was possible to do this, it doesn't seem like the right place for those decisions to be made. Since doing this proof of concept, the Progressive Rollouts feature landed in ArgoCD and felt like a much better fit for.

@david-martin david-martin added the enhancement New feature or request label Jan 10, 2023
@romuduck
Copy link

Top
In fact analysis run would also be great to have at application level, not just at deployment updates (like with argo rollout).

@david-martin
Copy link
Author

Another option for the strategy definition could be under a new generator rather than at the top level of the spec.
Ultimately, this is about deciding which clusters an Application should be rolled out to.
That decision will change based on a set of analysis rules defined by the user.

Here's an updated example ApplicationSet resource to show how it might look:

kind: ApplicationSet
metadata:
  name: example
spec:
  generators:
  - progressiveRollout:
    steps:
    - matchExpressions:
        - key: rolloutWave
            operator: In
            values:
            - region-1
        maxUpdate: 1
        analysis:
          templates:
          - templateName: region-1-traffic-check
    - matchExpressions:
        - key: rolloutWave
            operator: In
            values:
            - region-2
        maxUpdate: 1
        analysis:
          templates:
          - templateName: region-2-traffic-check
  template:
    metadata:
      name: '{{cluster}}-example'
    spec:
      project: "my-project"
      source:
        repoURL: https://github.com/example.git
        targetRevision: v2

In this example, the ApplicationSet controller will update existing Applications that target clusters in region-1 to have the values from the template.
All other Applications will stay unmodified.
If the analysis in the first step is successful, Applications that target clusters in region-2 will be updated.

An interesting thing to consider here is how to know the full list of clusters that the Application gets deployed to eventually.
This can be solved by iterating through each step and build up a list of clusters from each matchExpressions block.
If a cluster is not included in a step via some expression, it is not considered for Application placement.
This approach does put extra effort on the person defining the rollout steps as they need to consider carefully what rules allow for the right rollout steps and cluster inclusivity, particularly if there's a last step of 'roll out to all other clusters as we're happy after this step' (labelling can help here).

@agaudreault agaudreault added the appset/progressive-syncs Issues related to the ApplicationSet progressive syncs feature. label Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
appset/progressive-syncs Issues related to the ApplicationSet progressive syncs feature. enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants