feat: add concurrency to accelerate Application List and Cluster List #11884

Open · wants to merge 2 commits into master

Conversation


@bofeng96 bofeng96 commented Jan 4, 2023

Signed-off-by: Bofeng Liu bofeng.liu@tiktok.com

This PR adds concurrency to accelerate the Application List and Cluster List APIs. I use an RWMutex instead of a Mutex, since there are far more reads than writes. When checking user groups against group policies, I use a hash set to avoid a nested for loop, which is more efficient when the number of user groups is large.
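A rough sketch of that hash-based group check (the function and variable names are illustrative, not the PR's actual code): build a set of the policy's group subjects once, then test each user group with a constant-time lookup instead of scanning all subjects for every group.

```go
// Minimal sketch of the hash-based membership check; names are hypothetical.
package main

// anyGroupMatches reports whether any of the user's groups appears among the
// policy's group subjects.
func anyGroupMatches(userGroups, policySubjects []string) bool {
	// Build the subject set once: O(len(policySubjects)).
	subjectSet := make(map[string]struct{}, len(policySubjects))
	for _, s := range policySubjects {
		subjectSet[s] = struct{}{}
	}
	// Each lookup is O(1), so the whole check is linear instead of quadratic,
	// which matters when the number of user groups is large.
	for _, g := range userGroups {
		if _, ok := subjectSet[g]; ok {
			return true
		}
	}
	return false
}
```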

We have been running this patch in our production environment for one week. With nearly 7,000 applications, the initial Application List call without concurrency takes more than 30 s; with concurrency, the response time drops to under 10 s.

[screenshot]

[screenshot]

List with cache but without concurrency:
[screenshot]

List with cache and concurrency:
[screenshot]

Note on DCO:

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).

@codecov

codecov bot commented Jan 4, 2023

Codecov Report

Base: 47.35% // Head: 47.30% // Decreases project coverage by -0.04% ⚠️

Coverage data is based on head (bbf25d7) compared to base (35fdd38).
Patch coverage: 65.11% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11884      +/-   ##
==========================================
- Coverage   47.35%   47.30%   -0.05%     
==========================================
  Files         245      245              
  Lines       41549    41711     +162     
==========================================
+ Hits        19674    19732      +58     
- Misses      19890    19993     +103     
- Partials     1985     1986       +1     
Impacted Files Coverage Δ
server/cluster/cluster.go 35.88% <0.00%> (-2.10%) ⬇️
util/rbac/rbac.go 76.76% <78.57%> (-0.98%) ⬇️
server/application/application.go 28.73% <88.46%> (+0.59%) ⬆️
server/rbacpolicy/rbacpolicy.go 83.33% <100.00%> (+1.15%) ⬆️
cmd/argocd/commands/cluster.go 7.61% <0.00%> (-1.37%) ⬇️
pkg/apis/application/v1alpha1/types.go 59.02% <0.00%> (-0.44%) ⬇️
util/settings/settings.go 48.42% <0.00%> (ø)
cmd/argocd/commands/completion.go 0.00% <0.00%> (ø)
... and 2 more

☔ View full report at Codecov.

Signed-off-by: bofeng96 <bofeng.liu@tiktok.com>
@bofeng96
Author

bofeng96 commented Jan 5, 2023

Got an error message: race: limit on 8128 simultaneously alive goroutines is exceeded, dying
The race-detector build that Argo CD uses sets a limit on the number of simultaneously alive goroutines.

@alexef
Member

alexef commented Jan 5, 2023

Tagging @alexmt as he worked recently on #11250

Signed-off-by: bofeng96 <bofeng.liu@tiktok.com>
@bofeng96
Author

bofeng96 commented Jan 6, 2023

The new commit limits the number of goroutines.

@crenshaw-dev
Collaborator

@bofeng96 Intuit is going to give me time to give this a thorough review in time for 2.7. I'll circle back when it's added to my sprint, sometime in the next ~1mo.

@zachaller
Contributor

@bofeng96 I have just a few questions, partly to reproduce this and partly for my own knowledge. I was having a hard time reproducing it locally with around 7k apps, and I wanted to figure out where the bottleneck is: compute in Enforce, or latency of calls to remote clusters? Could you go into a little more detail on where you found the bottleneck to be? Do the clusters you're viewing applications for include many remote clusters with higher-latency Kubernetes API calls?

var appMap sync.Map
semaphore := make(chan struct{}, maxGoroutinesForListApplication)
for idx, a := range apps {
	semaphore <- struct{}{}
Contributor

Argo CD already imports x/sync/semaphore; I was wondering if it would be a bit cleaner to use it here instead of a hand-rolled semaphore. I'm fine either way, just pointing it out.

Member

I'd prefer using Go's weighted semaphore from x/sync/semaphore: it has a simple API, and people generally understand it.
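As a point of reference, a minimal sketch of how the list fan-out could look with the weighted semaphore from golang.org/x/sync/semaphore (the []string element type and the per-item work are placeholders; this is not the PR's implementation):

```go
package main

import (
	"context"
	"sync"

	"golang.org/x/sync/semaphore"
)

const maxGoroutinesForListApplication = 50

// listConcurrently fans the per-item work out across goroutines, bounded by a
// weighted semaphore.
func listConcurrently(ctx context.Context, apps []string) []string {
	sem := semaphore.NewWeighted(maxGoroutinesForListApplication)
	var appMap sync.Map

	for idx, a := range apps {
		if err := sem.Acquire(ctx, 1); err != nil {
			break // context cancelled
		}
		go func(idx int, a string) {
			defer sem.Release(1)
			// ... per-application RBAC check / filtering would go here ...
			appMap.Store(idx, a)
		}(idx, a)
	}

	// Acquiring the full weight blocks until every in-flight goroutine has
	// released its slot, so this doubles as the "wait for completion" step.
	_ = sem.Acquire(ctx, maxGoroutinesForListApplication)

	var out []string
	appMap.Range(func(_, v any) bool {
		out = append(out, v.(string))
		return true
	})
	return out
}
```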

}
for i := maxGoroutinesForListApplication; i > 0; i-- {
Contributor

If we use the x/sync package, this can also be replaced with a sync.WaitGroup. Again, I'm pretty impartial either way; just bringing it up because I think it's a bit easier to follow.
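For the waiting side specifically, here is a minimal sketch of keeping the channel-based semaphore but replacing the final drain loop with a sync.WaitGroup (placeholder element type and work; not the PR's code):

```go
package main

import "sync"

const maxGoroutinesForListApplication = 50

// processApps bounds concurrency with a buffered channel and waits for all
// workers with a WaitGroup instead of re-acquiring every semaphore slot.
func processApps(apps []string) {
	sem := make(chan struct{}, maxGoroutinesForListApplication)
	var wg sync.WaitGroup

	for _, a := range apps {
		a := a            // capture the loop variable (pre-Go 1.22 semantics)
		sem <- struct{}{} // acquire a slot before spawning the worker
		wg.Add(1)
		go func() {
			defer func() {
				<-sem // release the slot
				wg.Done()
			}()
			_ = a // ... per-application RBAC check / filtering would go here ...
		}()
	}

	// wg.Wait replaces the "for i := maxGoroutinesForListApplication; i > 0; i--"
	// drain loop shown in the diff above.
	wg.Wait()
}
```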

maxPodLogsToRender = 10
backgroundPropagationPolicy string = "background"
foregroundPropagationPolicy string = "foreground"
maxGoroutinesForListApplication = 50
Contributor

What was the logic for choosing 50 rather than, say, 25? Should this be user adjustable?
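If it were made adjustable, one possible sketch is reading the limit from an environment variable with a fallback; the variable name ARGOCD_APPLICATION_LIST_CONCURRENCY and the default of 50 are assumptions for illustration, not anything the PR or Argo CD defines:

```go
package main

import (
	"os"
	"strconv"
)

// listConcurrencyLimit returns the goroutine limit for the list endpoints.
// The environment variable name and the default below are hypothetical.
func listConcurrencyLimit() int {
	if v := os.Getenv("ARGOCD_APPLICATION_LIST_CONCURRENCY"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return 50 // same role as maxGoroutinesForListApplication in the diff above
}
```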

@@ -126,6 +126,7 @@ func (p *RBACPolicyEnforcer) EnforceClaims(claims jwt.Claims, rvals ...interface

// Check the subject. This is typically the 'admin' case.
// NOTE: the call to EnforceWithCustomEnforcer will also consider the default role
// TODO: better have a check here for reducing redundant enforcement, because the jwt->subject might be meaningless for Argo CD
Contributor

Can you explain this a bit more?

@bofeng96
Author

I've been busy this week. I'll take a look over the weekend.

Comment on lines -92 to +93
- func (e *Enforcer) getCabinEnforcer(project string, policy string) CasbinEnforcer {
- 	res, err := e.tryGetCabinEnforcer(project, policy)
+ func (e *Enforcer) getCasbinEnforcer(project string, policy string) CasbinEnforcer {
+ 	res, err := e.tryGetCasbinEnforcer(project, policy)
Member

👍

Comment on lines +105 to +106
if project != "" && policy == "" {
	val, ok := e.enforcerCache.Get("")
Member

I'm wondering about the logic here. Is this meant to return the cached global policy enforcer for each project that doesn't have a dedicated policy? I think a comment explaining this would be helpful.

Labels: none yet
Projects: Status: In Review (due by 2023-03-20)
Linked issues: none yet
5 participants