Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize processing managed resources in getResourceTree, ideally from quadratic to linear #18929

Closed
andrii-korotkov-verkada opened this issue Jul 4, 2024 · 4 comments · Fixed by #18972
Labels
component:api API bugs and enhancements enhancement New feature or request type:enhancement

Comments

@andrii-korotkov-verkada
Copy link
Contributor

andrii-korotkov-verkada commented Jul 4, 2024

Summary

getResourceTree has a loop through each of managed resources, calling IterateHierarchy on state cache to go over children, which ends up calling IterateHierarchy in gitops-engine, which has a loop over resources in a namespace (https://github.com/argoproj/gitops-engine/blob/master/pkg/cache/cluster.go#L973), which calls iterateChildren which has a similar loop (https://github.com/argoproj/gitops-engine/blob/master/pkg/cache/resource.go#L89). This is presumably since we only keep one-way track of child -> parent relationship, while effective traversal requires the opposite. Overall, this can result in a quadratic execution time of O(tree_size * namespace_resources_count). If we pre-construct the graph with parent -> child edges, this can be reduced to linear O(namespace_resources_count).

Motivation

getResourceTree seems to be the slowest part of reconciliation for large apps, e.g. see timing data from a build with #18926:
Screenshot 2024-07-03 at 7 16 57 PM
Screenshot 2024-07-03 at 7 17 09 PM
Getting resource tree took almost 4 minutes and was where almost all the reconciliation time went. It's an app with ~500 resources which don't even have children. And it's not the biggest cluster (staging). The biggest cluster (one of prod) can take 30-60 min to reconcile that app even with no changes.
The algorithm doesn't have to be quadratic and can be improved to linear.

Proposal

Pre-construct a graph from namespace resources parent -> child and do a linear dfs from each of managed resources, avoiding visiting same vertex twice. gitops-engine changes may not even be needed, but may be good to have anyways. Overall, this would reduce the time complexity from quadratic to linear.

A sub-optimal fix can be to configure some resource groups and kinds as not having any children, so hierarchy iteration can be skipped for them.

@andrii-korotkov-verkada
Copy link
Contributor Author

I'll try to implement this.

@andrii-korotkov-verkada
Copy link
Contributor Author

isParentOf can't be directly used to construct a graph, since it'd result in quadratic time complexity. It does some backfill, not just UID-based check https://github.com/argoproj/gitops-engine/blob/master/pkg/cache/resource.go#L36-L52, so we need to have some map of nodes based on UID and based on kind + api version + name.

@andrii-korotkov-verkada
Copy link
Contributor Author

Here's a gitops-engine PR argoproj/gitops-engine#601. It's going to be the main PR with nearly all the logic, argo-cd PR would just call a new API.

andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 7, 2024
…proj#18929)

NOTE: this PR right now refences a gitops-engine in a fork for testing purposes

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.
Also, use more lenient locking when processing events (part of gitops-engine changes).

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 7, 2024
…proj#18929)

NOTE: this PR right now refences a gitops-engine in a fork for testing purposes

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.
Also, use more lenient locking when processing events (part of gitops-engine changes).

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 7, 2024
…proj#18929)

NOTE: this PR right now refences a gitops-engine in a fork for testing purposes

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.
Also, use more lenient locking when processing events (part of gitops-engine changes).

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 7, 2024
…proj#18929)

NOTE: this PR right now refences a gitops-engine in a fork for testing purposes

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.
Also, use more lenient locking when processing events (part of gitops-engine changes).

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
@andrii-korotkov-verkada
Copy link
Contributor Author

Looks really good on live cluster. ~300ms instead of ~4m to get process managed resources for the same application!
Screenshot 2024-07-07 at 4 29 56 PM

crenshaw-dev pushed a commit to crenshaw-dev/argo-cd that referenced this issue Jul 17, 2024
…proj#18929)

NOTE: this PR right now refences a gitops-engine in a fork for testing purposes

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.
Also, use more lenient locking when processing events (part of gitops-engine changes).

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
crenshaw-dev pushed a commit to crenshaw-dev/argo-cd that referenced this issue Jul 18, 2024
…proj#18929)

NOTE: this PR right now refences a gitops-engine in a fork for testing purposes

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.
Also, use more lenient locking when processing events (part of gitops-engine changes).

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
andrii-korotkov-verkada added a commit to andrii-korotkov-verkada/argo-cd that referenced this issue Jul 19, 2024
Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
ggjulio pushed a commit to ggjulio/argo-cd that referenced this issue Jul 21, 2024
…oj#18972)

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
vfaergestad pushed a commit to vfaergestad/argo-cd that referenced this issue Jul 22, 2024
…oj#18972)

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
Signed-off-by: Vegard Færgestad <vegardf@mnemonic.no>
mpelekh added a commit to mpelekh/argo-cd that referenced this issue Aug 1, 2024
…oj#18972)

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:api API bugs and enhancements enhancement New feature or request type:enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants