feat: add concurrency to accelerate Application List and Cluster List #11884
base: master
Conversation
Codecov Report — Base: 47.35% // Head: 47.30% // Decreases project coverage by -0.05%
Additional details and impacted files@@ Coverage Diff @@
## master #11884 +/- ##
==========================================
- Coverage 47.35% 47.30% -0.05%
==========================================
Files 245 245
Lines 41549 41711 +162
==========================================
+ Hits 19674 19732 +58
- Misses 19890 19993 +103
- Partials 1985 1986 +1
☔ View full report at Codecov.
Signed-off-by: bofeng96 <bofeng.liu@tiktok.com>
Got an error message: race: limit on 8128 simultaneously alive goroutines is exceeded, dying
Signed-off-by: bofeng96 <bofeng.liu@tiktok.com>
New commit limits the number of goroutines.
@bofeng96 Intuit is going to give me time to give this a thorough review in time for 2.7. I'll circle back when it's added to my sprint, sometime in the next ~1mo.
@bofeng96 I have just a few questions while trying to reproduce a bit of this, partly for my personal knowledge. I was having a hard time reproducing locally with around 7k apps. I wanted to figure out where the bottleneck was and whether it was due to compute in Enforce or latency to remote clusters. Could you go into a little more detail on where you found the bottleneck to be? Do the clusters you're viewing applications for include a lot of remote clusters with higher-latency Kubernetes API calls?
var appMap sync.Map
semaphore := make(chan struct{}, maxGoroutinesForListApplication)
for idx, a := range apps {
	semaphore <- struct{}{}
argocd already imports x/sync/semaphore; I was wondering if it would be a bit cleaner to use that here vs. a self-rolled semaphore. I think I am fine either way, just pointing it out.
I'd prefer using Go's Weighted semaphore from x/sync/semaphore for its simple API, and because people generally understand it.
}
for i := maxGoroutinesForListApplication; i > 0; i-- {
If we use the x/sync package, this can also be replaced with sync.WaitGroup. Again, pretty impartial either way; just bringing it up because I think it's a bit easier to follow.
maxPodLogsToRender = 10
backgroundPropagationPolicy string = "background"
foregroundPropagationPolicy string = "foreground"
maxGoroutinesForListApplication = 50
What was the logic for choosing 50 and not, say, 25? Should this be user-adjustable?
@@ -126,6 +126,7 @@ func (p *RBACPolicyEnforcer) EnforceClaims(claims jwt.Claims, rvals ...interface

// Check the subject. This is typically the 'admin' case.
// NOTE: the call to EnforceWithCustomEnforcer will also consider the default role
// TODO: better have a check here for reducing redundant enforcement, because the jwt->subject might be meaningless for Argo CD
Can you explain this a bit more?
I've been busy this week. Will take a look over the weekend.
func (e *Enforcer) getCabinEnforcer(project string, policy string) CasbinEnforcer {
res, err := e.tryGetCabinEnforcer(project, policy)
func (e *Enforcer) getCasbinEnforcer(project string, policy string) CasbinEnforcer {
res, err := e.tryGetCasbinEnforcer(project, policy)
👍
if project != "" && policy == "" {
val, ok := e.enforcerCache.Get("")
I'm wondering about the logic here. Is this meant to return the cached global policy enforcer for each project that doesn't have a dedicated policy? I think a comment explaining this would be helpful.
Signed-off-by: Bofeng Liu bofeng.liu@tiktok.com
This PR adds concurrency to accelerate the Application List and Cluster List APIs. I use RWMutex instead of Mutex since there are more reads than writes.
When checking user groups against group policies, I use a hash set to avoid a nested for loop. This is more efficient when the number of user groups is large.
We have been running this patch in our production environment for 1 week.
With nearly 7000 applications, the initial Application List without concurrency takes more than 30s. With concurrency, we reduce the response time to under 10s.
List with cache but without concurrency:
List with cache and concurrency:
Note on DCO:
If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.
Checklist: