Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Application dependencies #15280

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

Conversation

jannfis
Copy link
Member

@jannfis jannfis commented Aug 29, 2023

Closes #7437

There's some docs within this PR which describe the functionality and configuration.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.

@codecov
Copy link

codecov bot commented Aug 29, 2023

Codecov Report

Attention: Patch coverage is 60.00000% with 118 lines in your changes are missing coverage. Please review.

Project coverage is 50.07%. Comparing base (f87897c) to head (7513255).
Report is 100 commits behind head on master.

❗ Current head 7513255 differs from pull request most recent head b388917. Consider uploading reports for the commit b388917 to get more accurate results

Files Patch % Lines
controller/appcontroller.go 36.36% 52 Missing and 4 partials ⚠️
controller/sync.go 76.78% 20 Missing and 6 partials ⚠️
controller/dependencies.go 70.96% 12 Missing and 6 partials ⚠️
pkg/apis/application/v1alpha1/types.go 41.93% 17 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #15280      +/-   ##
==========================================
+ Coverage   49.73%   50.07%   +0.34%     
==========================================
  Files         274      267       -7     
  Lines       48948    46256    -2692     
==========================================
- Hits        24343    23165    -1178     
+ Misses      22230    20825    -1405     
+ Partials     2375     2266     -109     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jannfis jannfis force-pushed the feat/app-deps branch 2 times, most recently from eba17a8 to 9686dbf Compare September 13, 2023 14:45
@jannfis jannfis marked this pull request as ready for review September 18, 2023 16:41
@LS80
Copy link

LS80 commented Sep 20, 2023

Thanks for working on this! Is there a public test image I can use to try it out?

@jannfis
Copy link
Member Author

jannfis commented Sep 20, 2023

Is there a public test image I can use to try it out?

I'm afraid there's not, sorry. We usually don't build images for PRs at this stage. But you can easily build your own, by checking out this PR and running make image (needs docker or podman with docker emulation to work)

docs/user-guide/app_dependencies.md Outdated Show resolved Hide resolved
docs/user-guide/app_dependencies.md Show resolved Hide resolved
docs/user-guide/app_dependencies.md Show resolved Hide resolved
assets/swagger.json Outdated Show resolved Hide resolved
assets/swagger.json Show resolved Hide resolved
assets/swagger.json Show resolved Hide resolved
controller/appcontroller.go Outdated Show resolved Hide resolved
controller/appcontroller.go Outdated Show resolved Hide resolved
docs/user-guide/app_dependencies.md Outdated Show resolved Hide resolved
docs/user-guide/app_dependencies.md Outdated Show resolved Hide resolved
docs/user-guide/app_dependencies.md Outdated Show resolved Hide resolved

### Configure a wait timeout

You can configure the maximum time that Argo CD should wait for dependencies to be resolved and the sync to start. The timeout also applies to the wait spent in blocking for empty dependencies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this change the behavior of argocd app sync (mentioned above)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question actually, I haven't tested behaviour with the CLI yet. I'll follow up on this.

docs/user-guide/app_dependencies.md Outdated Show resolved Hide resolved
controller/appcontroller.go Outdated Show resolved Hide resolved
docs/user-guide/app_dependencies.md Show resolved Hide resolved
controller/appcontroller.go Show resolved Hide resolved
@ellakk
Copy link

ellakk commented Oct 29, 2023

I've tried this and the functionality works great and as advertised, i did however noticed the following:

After i switched to the image and CRDs i built from this branch the CPU usage went up quite a bit for the application controller:

Screenshot from 2023-10-29 17-25-47
Each line represents half a cpu core. The first increase to 1.5 cores is when i switched from ArgoCD built from the main branch to this. Initially i had nothing deployed but after adding two applications the CPU usage went up quite a bit further, each application contained a single deployment resource.

There are no errors in the log for the controller or anything else that seems useful.

@jannfis
Copy link
Member Author

jannfis commented Nov 1, 2023

Thanks, that's some valuable feedback @ellakk! I gotta dig into this, it's probably the dependency graph being build too often.

@riuvshyn
Copy link

riuvshyn commented Dec 11, 2023

Thank you for working on this!

@jannfis is this supposed to be working with applicationSets?

@csantanapr
Copy link
Member

@jannfis Would this work with apps of application-sets, the gitops-bridge and other end users use this pattern to bootstrap a new cluster with addons/plugins/operators in a specific order.
Currently there is no status to indicate that the all child apps of an application set are healthy to move on to the next application set.

@blakepettersson
Copy link
Member

@ellakk it would be helpful if there is some way to minimally reproduce your issue. Is there a way for you to supply us with an App of Apps or similar which reproduces the issue (suitably redacted if required)?

I'm really looking forward to this PR and would love to see this pushed over the finish line!

@ellakk
Copy link

ellakk commented Feb 28, 2024

@ellakk it would be helpful if there is some way to minimally reproduce your issue. Is there a way for you to supply us with an App of Apps or similar which reproduces the issue (suitably redacted if required)?

I'm really looking forward to this PR and would love to see this pushed over the finish line!

Hi, sorry for the late reply. As I wrote in my initial post, I had nothing deployed (and no gitrepos added either). All i did was build the containers from what is currently the HEAD of this branch, applied the CRDs and deployed the containers with the manifests from this branch. The CPU usage for the controller was about 40x of what i normally get from an empty ArgoCD deployment. The CPU usage then went up even further when i added two applications where one depended on the other, but the problem with the CPU was already apparent.

I can try to recreate the problem if the developer or other people cant. I no longer have the exact manifests I used but if I have to redeploy this build I will also post the manifests.

What i think would be more helpful is if someone who is also interested in this feature built and deployed ArgoCD from the head of this branch and shared their experience.

Signed-off-by: jannfis <jann@mistrust.net>
Signed-off-by: jannfis <jann@mistrust.net>
Signed-off-by: jannfis <jann@mistrust.net>
Signed-off-by: jannfis <jann@mistrust.net>
Signed-off-by: jannfis <jann@mistrust.net>
Signed-off-by: jannfis <jann@mistrust.net>
@jannfis
Copy link
Member Author

jannfis commented Mar 8, 2024

Sorry for being so late to the recent comments.

@riuvshyn I haven't tested it with ApplicationSet, but ff you define one or more dependency selectors in an ApplicationSet's Application template, I do not see a reason why it would not work.

@csantanapr The code uses an Application's sync and health state to determine whether to proceed in the dependency chain. So it all depends on that information being exposed by your Application.

@ellakk
Copy link

ellakk commented Mar 14, 2024

Just built and deployed with the latest commit from this branch and the performance issues i had are gone (both on OpenShift and K3S). Thanks a lot for the work you are putting into this @jannfis!

@jannfis
Copy link
Member Author

jannfis commented Mar 14, 2024

Thank you so much for testing again, @ellakk. Actually, I have not changed anything I thought was performance related, so potentially the iteration of the PR you tested was against a state of the master branch that had this problem.

Anyhow, thanks for confirming that you don't see that performance issue anymore. Please let me know if there's something else you notice during testing.

Signed-off-by: jannfis <jann@mistrust.net>
Signed-off-by: jannfis <jann@mistrust.net>
@mattbrandman
Copy link

mattbrandman commented Mar 17, 2024

Just wanted to say I’m so excited to see this merged soon. Should reduce my deployment for a new cluster from 10-15 steps to 3 or so! Thanks for all this awesome work @jannfis

@jannfis
Copy link
Member Author

jannfis commented Mar 19, 2024

I think this is good to go for the first iteration. The only remaining question imho would be: Should this feature be behind a feature-flag or not?

cc @crenshaw-dev @alexmt @pasha-codefresh

@jsoref
Copy link
Member

jsoref commented Mar 19, 2024

It's certainly non-trivial. But I'd want to play with it, and my default playground is https://cd.apps.argoproj.io/applications so if the feature flag wasn't enabled there, I wouldn't play with it ...

@jannfis
Copy link
Member Author

jannfis commented Mar 19, 2024

@jsoref Thanks for that thought. But I was wondering, how would you play with that feature on a read-only public instance?

@jsoref
Copy link
Member

jsoref commented Mar 19, 2024

I'd expect to be able to see all the screens and configs w/o making changes--Ideally it's enough to get a sense of the actual layouts and interactions...

@blakepettersson
Copy link
Member

Should this feature be behind a feature-flag or not?

IMO it should not, but that's just me 🙏

@jannfis
Copy link
Member Author

jannfis commented Mar 30, 2024

@CodiumAI-Agent /review

@CodiumAI-Agent
Copy link

PR Review

⏱️ Estimated effort to review [1-5]

5, because the PR introduces a significant new feature with a wide impact across multiple files, including changes to the core data models, addition of new functionalities, and updates to the documentation. The complexity and size of the changes require a thorough review to ensure quality and maintainability.

🧪 Relevant tests

Yes

🔍 Possible issues

Possible Bug: The implementation assumes that applications can only depend on other applications within the same namespace and project. This might not always be desirable or sufficient for all use cases.

Performance Concern: The recursive dependency resolution could potentially lead to performance issues for applications with a large number of dependencies or deeply nested dependency trees.

🔒 Security concerns

No

Code feedback:
relevant filecontroller/appcontroller.go
suggestion      

Consider implementing a mechanism to prevent infinite recursion or stack overflow in refreshDependencies. This could happen if there's a circular dependency between applications. A simple way to mitigate this is by keeping track of already visited applications during the recursion and stopping if a cycle is detected. [important]

relevant linefunc (ctrl *ApplicationController) refreshDependencies(app *v1alpha1.Application, state *v1alpha1.OperationState) {

relevant filecontroller/dependencies.go
suggestion      

To improve the efficiency of dependency resolution, consider caching the results of dependency lookups. This can significantly reduce the load on the Kubernetes API server in clusters with a large number of applications. [medium]

relevant linefunc (mgr *appStateManager) buildDependencyGraph(app *v1alpha1.Application, depGraph *topsort.Graph) error {

relevant filepkg/apis/application/v1alpha1/types.go
suggestion      

For better usability and future extensibility, consider adding validation for the ApplicationDependency struct. This could include checks like ensuring syncDelay and timeout are positive values and that selectors are not empty when blockOnEmpty is true. [medium]

relevant linetype ApplicationDependency struct {

relevant filedocs/user-guide/app_dependencies.md
suggestion      

Add a section on troubleshooting common issues with application dependencies, such as circular dependencies or misconfigured selectors. This can help users diagnose and resolve issues more quickly. [medium]

relevant line## Configuring Application dependencies


✨ Review tool usage guide:

Overview:
The review tool scans the PR code changes, and generates a PR review which includes several types of feedbacks, such as possible PR issues, security threats and relevant test in the PR. More feedbacks can be added by configuring the tool.

The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.

  • When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:
/review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...
[pr_reviewer]
some_config1=...
some_config2=...

See the review usage page for a comprehensive guide on using this tool.

@mattbrandman
Copy link

Hi all curious if there is a general timeline on this being merged as it looks like its passing all the tests?

@yinzara
Copy link

yinzara commented Apr 19, 2024

This is an absolutely essential feature for my business as well. Without this spinning up new environments with ArgoCD is nearly impossible as it requires knowledge of which apps are dependent on which apps to apply Application(s) in that order and let them sync as you apply them.

Signed-off-by: jannfis <jann@mistrust.net>
Signed-off-by: jannfis <jann@mistrust.net>
Copy link
Collaborator

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some thoughts.

I'm generally uncomfortable adding another ordering mechanism to Argo CD. I think that we risk moving folks away from "eventual consistency" models of deployment in k8s back to an imperative CI-type model, just built in Argo CD instead of in Jenkins. I think if we push forward with this feature, we should write documentation outlining use cases where this tool makes sense as well as a list of alternatives which could provide a better experience (controllers that enforce ordering, validating webhooks that block deployment until prerequisites are met, readiness checks which prevent apps from spinning up too soon, etc.).

I think we also need to ship the MVP with strong UI/CLI tools to allow users to answer the question "why isn't my app syncing yet?" That probably means the UI needs to display what the app is blocked on. It might even mean building a UI/CLI experience to visualize the DAG or at least list apps that are in the blocking path. Otherwise I think Argo CD support will be filled with questions about apps not syncing, and we won't be in a good position to help the reporters diagnose the issue. At the very least there should be a section in the docs titled "how to find out why my app is blocked."

I worry that the feature will mostly be used for bootstrapping clusters and that the sync-in-order feature will actually not be that useful once the initial setup is in place and might actually be more problematic than helpful. I worry that those bootstrapping use cases represent >50% of use cases. This is part of why I think the docs need to clearly outline valid use cases.

I think the docs also need a section describing the downsides of "intentional drift" and how you lose some of the benefits of GitOps when you use a tool that introduces intentional drift.

tl;dr: I'm alright with the feature as a PoC marked alpha behind a feature flag, if the docs are really really solid and if the UX is good. I still have major concerns about encouraging anti-patterns. But maybe once the feature is available, people can help us understand which use cases make sense and which should be avoided.

@@ -0,0 +1,121 @@
# Application dependencies

Starting from Argo CD v2.9, you can natively model dependencies between Applications. This new mechanism is intended to replace the prior way of modeling dependencies when using the app-of-apps pattern with sync waves on the Application custom resources.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intended to replace the prior way of modeling dependencies when using the app-of-apps pattern with sync waves

Maybe "as an alternative to"? Neither apps-of-apps nor sync waves are going anywhere, so I think that approach would still be valid.


Using the new dependency mechanism, users are not bound to the app-of-apps pattern anymore and can define dependencies between applications using application selectors to define a directed, acyclic graph for dependency modeling. This means that an Application which defines one or more dependencies do not only depend on its direct dependencies, but to the dependencies of its dependencies as well.

Application dependencies support both, manual and automatic sync policies.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Application dependencies support both, manual and automatic sync policies.
Application dependencies support both manual and automatic sync policies.

docs/user-guide/app_dependencies.md Show resolved Hide resolved
selectors:
- labelSelector:
matchLabels:
name: some-app
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: some-app
name: some-app

docs/user-guide/app_dependencies.md Show resolved Hide resolved

## Dependency selectors

Currently, the available properties to match against are `labels` and the `name` of the Application. Each selector can yield zero or more applications.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior if my selectors match zero applications? Am I considered blocked because the dependency isn't satisfied, or unblocked because there are no dependencies?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it defaults to "don't block". Should probably mention blockOnEmpty` here and maybe link to that section of the doc.

name: some-app
```

The above YAML will have the Application depend on any other Application that matches the given label selector. In this case, any application carrying a label `name` with the value of `some-app`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using name for a label example is a bit confusing.


The above YAML will have the Application depend on the Applications `some-app` and `other-app`.

The `namePattern` matcher supports shell-style glob matching, so the following example would match all Applications whose name is starting with `stage1-`:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we specify the glob engine used to evaluate these and link to its docs? We use multiple glob engines for different purposes, so being specific should be quite helpful to the user.

```yaml
spec:
dependsOn:
syncDelay: 5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename the field syncDelaySeconds?

```yaml
spec:
dependsOn:
timeout: 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe timeoutSeconds?

@simonoff
Copy link

Added some thoughts.

I'm generally uncomfortable adding another ordering mechanism to Argo CD. I think that we risk moving folks away from "eventual consistency" models of deployment in k8s back to an imperative CI-type model, just built in Argo CD instead of in Jenkins. I think if we push forward with this feature, we should write documentation outlining use cases where this tool makes sense as well as a list of alternatives which could provide a better experience (controllers that enforce ordering, validating webhooks that block deployment until prerequisites are met, readiness checks which prevent apps from spinning up too soon, etc.).

It looks like you never have head a complex apps with dynamic previews. For now we have a previews via appsets and they includes a DB, redis, etc setup. Setting of different argocd.argoproj.io/sync-wave just not working at all - as for me its not track such value withing appsets. So only way - define dependencies.

@blakepettersson
Copy link
Member

Added some thoughts.

I'm generally uncomfortable adding another ordering mechanism to Argo CD. I think that we risk moving folks away from "eventual consistency" models of deployment in k8s back to an imperative CI-type model, just built in Argo CD instead of in Jenkins. I think if we push forward with this feature, we should write documentation outlining use cases where this tool makes sense as well as a list of alternatives which could provide a better experience (controllers that enforce ordering, validating webhooks that block deployment until prerequisites are met, readiness checks which prevent apps from spinning up too soon, etc.).

I generally agree that eventual consistency should be the default way to go (and I'm sure @vfarcic would approve). I see this PR as a stepping stone to be able to enforce a specific order of deletion, which IMO is something necessary - perhaps we can get some inspiration from crossplane/crossplane#3393.

Otherwise I'm 💯 with what you're saying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Application dependencies