Progressive Rollout Analysis #11932
Labels: appset/progressive-syncs, enhancement
Summary
This proposal builds upon the ApplicationSet Progressive Rollouts feature added in #10048.
The idea is to extend the RollingUpdate strategy feature to allow specifying an 'analysis' configuration for checking Application health.
The 'analysis' configuration is based on the same field in the Argo Rollouts 'Rollout' resource e.g. https://github.com/argoproj/argo-rollouts/blob/master/examples/rollout-analysis-step.yaml#L35.
Motivation
The use case is a single Application that is deployed to multiple clusters for regional availability.
While it is possible to use argo-rollouts in each cluster to make a decision about the 'health' of the Application (indirectly via the Rollout resource), that decision is limited to the local context of that cluster, i.e. it is not aware of other instances of the Application.
If we allow an Analysis to be declared and executed centrally, the analysis can be multi-cluster aware.
For example, my Application rollout is only 'healthy' once it is receiving a certain percentage of overall traffic in my set of Applications across all clusters (or all clusters in a region).
There are existing examples of using a custom health check for Application resources, including an AnalysisRun and Rollout, however these are limited to the context of a single Application instance in one cluster.
Proposal
When an Application is 'rolling out', an AnalysisTemplate, if specified, will be instantiated into an AnalysisRun.
The logic for checking if an Application is 'Healthy' would defer to the AnalysisRun status and only be seen as 'Healthy' if the AnalysisRun was successful.
Example ApplicationSet specifying an 'analysis' configuration:
The main difference between the proposed feature and the existing Argo Rollouts project is that it can apply to an Application that is deployed to multiple clusters via an ApplicationSet, e.g. a multi region/multi cluster application.
It allows for declaring analysis rules centrally, using an aggregated view of metrics from all instances of the Application.
It may be necessary to pass in templated parameters, such as the cluster name, to an AnalysisRun to help with context aware queries.
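To make the idea of passing templated parameters concrete, here is a minimal sketch of what an ApplicationSet with an 'analysis' block might look like. The 'analysis' field, its placement inside a step, the template name 'traffic-share' and the 'region' cluster label are all assumptions for illustration; none of them exist in the released ApplicationSet API. The sketch uses the 'RollingSync' strategy type as named in the released progressive syncs feature, and assumes generator parameters such as '{{name}}' would also be rendered inside the analysis args.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: guestbook
  namespace: argocd
spec:
  generators:
    - clusters: {}   # one Application per cluster registered in Argo CD
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:
            - key: region
              operator: In
              values:
                - eu-west
          # HYPOTHETICAL: proposed field, modelled on the 'analysis' step in Argo Rollouts.
          analysis:
            templates:
              - templateName: traffic-share   # hypothetical AnalysisTemplate name
            args:
              - name: cluster-name
                value: '{{name}}'   # assumes generator parameters are templated here too
  template:
    metadata:
      name: '{{name}}-guestbook'
      labels:
        # assumes each cluster secret carries a 'region' label
        region: '{{metadata.labels.region}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/argoproj/argocd-example-apps.git
        targetRevision: HEAD
        path: guestbook
      destination:
        server: '{{server}}'
        namespace: guestbook
```

Under this sketch, each Application matched by the step would only be treated as 'Healthy' once the AnalysisRun created from 'traffic-share' succeeds.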
Aggregating the metrics is a separate problem to solve, e.g. it could be implemented with Prometheus and Thanos.
Therefore it would make sense to defer to a configured 'provider' rather than including any metrics aggregation in the solution itself. This is in line with the providers feature in Argo Rollouts.
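As an illustration of deferring to a provider, the following is a sketch of an Argo Rollouts AnalysisTemplate that uses the Prometheus provider against an aggregated query endpoint such as Thanos. The endpoint address, the metric name 'http_requests_total', its 'app' and 'cluster' labels, and the 10% threshold are assumptions chosen for the example, not something the proposal specifies.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: traffic-share   # hypothetical name
spec:
  args:
    - name: cluster-name
  metrics:
    - name: traffic-share
      interval: 1m
      failureLimit: 3
      # Pass once this cluster serves at least 10% of overall traffic (assumed threshold).
      successCondition: result[0] >= 0.1
      provider:
        prometheus:
          # Assumed address of a Thanos Query endpoint aggregating all clusters.
          address: http://thanos-query.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{app="guestbook",cluster="{{args.cluster-name}}"}[5m]))
            /
            sum(rate(http_requests_total{app="guestbook"}[5m]))
```

Because the query runs against the aggregated view, the result reflects this cluster's share of traffic across all instances of the Application, which a per-cluster Rollout analysis could not see.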
I have previously explored a proof of concept using the ApplicationSet cluster decision generator and updating cluster placement decisions in a 'Placement' resource. With this generator-based approach, the analysis would happen completely outside of ArgoCD and only update placement decisions as criteria are met, thereby mimicking rolling update/rollback-like functionality. Although it was possible to do this, it doesn't seem like the right place for those decisions to be made. Since doing this proof of concept, the Progressive Rollouts feature landed in ArgoCD and felt like a much better fit.