feat: add concurrency to accelerate Application List and Cluster List #11884

Open · wants to merge 2 commits into master

Conversation


@bofeng96 bofeng96 commented Jan 4, 2023

Signed-off-by: Bofeng Liu bofeng.liu@tiktok.com

This PR adds concurrency to accelerate the Application List and Cluster List APIs. I use an RWMutex instead of a Mutex, since there are far more reads than writes. When checking user groups against group policies, I use a hash set to avoid a nested for loop, which is more efficient when the number of user groups is large.
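A rough sketch of that hash-based group check (the function and variable names are illustrative, not the PR's actual code): build a set of the policy's group subjects once, then test each user group with a constant-time lookup instead of scanning all subjects for every group.

```go
// Minimal sketch of the hash-based membership check; names are hypothetical.
package main

// anyGroupMatches reports whether any of the user's groups appears among the
// policy's group subjects.
func anyGroupMatches(userGroups, policySubjects []string) bool {
	// Build the subject set once: O(len(policySubjects)).
	subjectSet := make(map[string]struct{}, len(policySubjects))
	for _, s := range policySubjects {
		subjectSet[s] = struct{}{}
	}
	// Each lookup is O(1), so the whole check is linear instead of quadratic,
	// which matters when the number of user groups is large.
	for _, g := range userGroups {
		if _, ok := subjectSet[g]; ok {
			return true
		}
	}
	return false
}
```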

We have been running this patch in our production environment for one week. With nearly 7,000 applications, the initial Application List call without concurrency takes more than 30 s; with concurrency, the response time drops to under 10 s.

[screenshot]

[screenshot]

List with cache but without concurrency:
[screenshot]

List with cache and concurrency:
[screenshot]

Note on DCO:

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).

@codecov

codecov bot commented Jan 4, 2023

Codecov Report

Base: 47.35% // Head: 47.30% // Decreases project coverage by -0.04% ⚠️

Coverage data is based on head (bbf25d7) compared to base (35fdd38).
Patch coverage: 65.11% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11884      +/-   ##
==========================================
- Coverage   47.35%   47.30%   -0.05%     
==========================================
  Files         245      245              
  Lines       41549    41711     +162     
==========================================
+ Hits        19674    19732      +58     
- Misses      19890    19993     +103     
- Partials     1985     1986       +1     
Impacted Files Coverage Δ
server/cluster/cluster.go 35.88% <0.00%> (-2.10%) ⬇️
util/rbac/rbac.go 76.76% <78.57%> (-0.98%) ⬇️
server/application/application.go 28.73% <88.46%> (+0.59%) ⬆️
server/rbacpolicy/rbacpolicy.go 83.33% <100.00%> (+1.15%) ⬆️
cmd/argocd/commands/cluster.go 7.61% <0.00%> (-1.37%) ⬇️
pkg/apis/application/v1alpha1/types.go 59.02% <0.00%> (-0.44%) ⬇️
util/settings/settings.go 48.42% <0.00%> (ø)
cmd/argocd/commands/completion.go 0.00% <0.00%> (ø)
... and 2 more

☔ View full report at Codecov.

Signed-off-by: bofeng96 <bofeng.liu@tiktok.com>
@bofeng96
Author

bofeng96 commented Jan 5, 2023

Got an error message: race: limit on 8128 simultaneously alive goroutines is exceeded, dying
The race-detector build that Argo CD uses sets a limit on the number of simultaneously alive goroutines.

@alexef
Member

alexef commented Jan 5, 2023

Tagging @alexmt as he worked recently on #11250

Signed-off-by: bofeng96 <bofeng.liu@tiktok.com>
@bofeng96
Author

bofeng96 commented Jan 6, 2023

The new commit limits the number of goroutines.

@crenshaw-dev
Collaborator

@bofeng96 Intuit is going to give me time to give this a thorough review in time for 2.7. I'll circle back when it's added to my sprint, sometime in the next ~1mo.

@zachaller
Contributor

@bofeng96 I have just a few questions, partly to reproduce this and partly for my own knowledge. I was having a hard time reproducing it locally with around 7k apps, and I wanted to figure out where the bottleneck is: compute in Enforce, or latency of calls to remote clusters? Could you go into a little more detail on where you found the bottleneck to be? Do the clusters you're viewing applications for include many remote clusters with higher-latency Kubernetes API calls?

var appMap sync.Map
semaphore := make(chan struct{}, maxGoroutinesForListApplication)
for idx, a := range apps {
	semaphore <- struct{}{}
Contributor

Argo CD already imports x/sync/semaphore; I was wondering if it would be a bit cleaner to use it here instead of a hand-rolled semaphore. I'm fine either way, just pointing it out.

Member

I'd prefer using Go's weighted semaphore from x/sync/semaphore: it has a simple API, and people generally understand it.
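As a point of reference, a minimal sketch of how the list fan-out could look with the weighted semaphore from golang.org/x/sync/semaphore (the []string element type and the per-item work are placeholders; this is not the PR's implementation):

```go
package main

import (
	"context"
	"sync"

	"golang.org/x/sync/semaphore"
)

const maxGoroutinesForListApplication = 50

// listConcurrently fans the per-item work out across goroutines, bounded by a
// weighted semaphore.
func listConcurrently(ctx context.Context, apps []string) []string {
	sem := semaphore.NewWeighted(maxGoroutinesForListApplication)
	var appMap sync.Map

	for idx, a := range apps {
		if err := sem.Acquire(ctx, 1); err != nil {
			break // context cancelled
		}
		go func(idx int, a string) {
			defer sem.Release(1)
			// ... per-application RBAC check / filtering would go here ...
			appMap.Store(idx, a)
		}(idx, a)
	}

	// Acquiring the full weight blocks until every in-flight goroutine has
	// released its slot, so this doubles as the "wait for completion" step.
	_ = sem.Acquire(ctx, maxGoroutinesForListApplication)

	var out []string
	appMap.Range(func(_, v any) bool {
		out = append(out, v.(string))
		return true
	})
	return out
}
```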

}
for i := maxGoroutinesForListApplication; i > 0; i-- {
Contributor

If we use the x/sync package, this can also be replaced with a sync.WaitGroup. Again, I'm pretty impartial either way; just bringing it up because I think it's a bit easier to follow.
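For the waiting side specifically, here is a minimal sketch of keeping the channel-based semaphore but replacing the final drain loop with a sync.WaitGroup (placeholder element type and work; not the PR's code):

```go
package main

import "sync"

const maxGoroutinesForListApplication = 50

// processApps bounds concurrency with a buffered channel and waits for all
// workers with a WaitGroup instead of re-acquiring every semaphore slot.
func processApps(apps []string) {
	sem := make(chan struct{}, maxGoroutinesForListApplication)
	var wg sync.WaitGroup

	for _, a := range apps {
		a := a            // capture the loop variable (pre-Go 1.22 semantics)
		sem <- struct{}{} // acquire a slot before spawning the worker
		wg.Add(1)
		go func() {
			defer func() {
				<-sem // release the slot
				wg.Done()
			}()
			_ = a // ... per-application RBAC check / filtering would go here ...
		}()
	}

	// wg.Wait replaces the "for i := maxGoroutinesForListApplication; i > 0; i--"
	// drain loop shown in the diff above.
	wg.Wait()
}
```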

maxPodLogsToRender = 10
backgroundPropagationPolicy string = "background"
foregroundPropagationPolicy string = "foreground"
maxGoroutinesForListApplication = 50
Contributor

What was the logic for choosing 50 rather than, say, 25? Should this be user adjustable?
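If it were made adjustable, one possible sketch is reading the limit from an environment variable with a fallback; the variable name ARGOCD_APPLICATION_LIST_CONCURRENCY and the default of 50 are assumptions for illustration, not anything the PR or Argo CD defines:

```go
package main

import (
	"os"
	"strconv"
)

// listConcurrencyLimit returns the goroutine limit for the list endpoints.
// The environment variable name and the default below are hypothetical.
func listConcurrencyLimit() int {
	if v := os.Getenv("ARGOCD_APPLICATION_LIST_CONCURRENCY"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return 50 // same role as maxGoroutinesForListApplication in the diff above
}
```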

@@ -126,6 +126,7 @@ func (p *RBACPolicyEnforcer) EnforceClaims(claims jwt.Claims, rvals ...interface

// Check the subject. This is typically the 'admin' case.
// NOTE: the call to EnforceWithCustomEnforcer will also consider the default role
// TODO: better have a check here for reducing redundant enforcement, because the jwt->subject might be meaningless for Argo CD
Contributor

Can you explain this a bit more?

@bofeng96
Author

I've been busy this week. I'll take a look over the weekend.

Comment on lines -92 to +93
- func (e *Enforcer) getCabinEnforcer(project string, policy string) CasbinEnforcer {
- 	res, err := e.tryGetCabinEnforcer(project, policy)
+ func (e *Enforcer) getCasbinEnforcer(project string, policy string) CasbinEnforcer {
+ 	res, err := e.tryGetCasbinEnforcer(project, policy)
Member

👍

Comment on lines +105 to +106
if project != "" && policy == "" {
	val, ok := e.enforcerCache.Get("")
Member

I'm wondering about the logic here. Is this meant to return the cached global policy enforcer for each project that doesn't have a dedicated policy? I think a comment explaining this would be helpful.

Labels: none yet
Projects: Status: In Review (due by 2023-03-20)
Linked issues: none yet
5 participants