lower the number of max concurrent reconciles to 100 for the selection controller#1405

Closed
tzneal wants to merge 1 commit intoaws:mainfrom
tzneal:reduce-number-of-concurrent-reconciles

Conversation

@tzneal
Contributor

@tzneal tzneal commented Feb 23, 2022

1. Issue, if available:

Each unit of 'max concurrent reconciles' is turned into a goroutine launched
by controller-runtime to watch the work queue. I found this because
the race detector errors out when more than 8k goroutines are
running simultaneously.

2. Description of changes:

Lower concurrent reconciliations to 100

3. How was this change tested?

unit testing

4. Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: link to issue
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify

netlify bot commented Feb 23, 2022

✔️ Deploy Preview for karpenter-docs-prod canceled.

🔨 Explore the source changes: 02ba184

🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/62158df478ae0a00087d6a66

@tzneal
Contributor Author

tzneal commented Feb 23, 2022

10k goroutines watching the same queue seems excessive, 100 probably still is. Any reason it was 10k?

@tzneal tzneal changed the title lower the number of max reconciles to 100 for the selection controller lower the number of max concurrent reconciles to 100 for the selection controller Feb 23, 2022
@bwagner5
Contributor

10k goroutines watching the same queue seems excessive, 100 probably still is. Any reason it was 10k?

Since the selection controller responds to individual pod events, karpenter may need a lot of concurrency to prevent scheduling delays. Each pod event seen in the selection controller is sent to a batch in the provisioning controller, which returns a channel that closes when the pod has been successfully bound to a node. The selection controller blocks on this channel to verify the pod was scheduled; if not, it can run another loop to relax constraints.

The Provisioner's batching window is dynamic and can range from 1 second to a few seconds so that we efficiently pack groups of pods onto the largest nodes. The actual time to provision a node can also range from a few seconds to tens of seconds.

So although 10k may be slightly high, I don't think it's an outrageous value with the processing times building up over selection, batching, and provisioning. 100 would be too low though since this would effectively limit our pod batches to 100 pods.

Is the problem just that race detection falls over after 8k goroutines? That seems like a pretty small number to fall over on. We could go to 6k or so, since each provisioner batch window maxes out at 2,000 pods before it's flushed. 6k would then support 3 Provisioners doing 2,000-pod scale-ups at the same time, which is a pretty big scale-up. Scale-ups could obviously be bigger over a period of 10-30 seconds.

I'd like to understand the race detector limit more before we make any changes, though.

@tzneal
Contributor Author

tzneal commented Feb 23, 2022

The selection controller blocks on this channel to verify the pod was scheduled, and if not it can run another loop to relax constraints.

Ah, thanks, I hadn't noticed the blocking before. Closing this PR for now, but having 10k goroutines idling as "plan A" doesn't seem ideal.

Is the problem just that race detection falls over after 8k go routines? That seems like a pretty small number to fall over on. We could go to 6k or so since each provisioner batch window maxes out at 2000 pods before it's flushed. So 6k would support 3 Provisioners doing 2000 pod scale ups at the same time which is a pretty big scale up. Scale ups could obviously be bigger over a period of 10-30 seconds.

That was how I noticed what was occurring. There is a change that removes the limit, but so far it hasn't made it into Go 1.18 (https://go-review.googlesource.com/c/go/+/333529/).

@tzneal tzneal closed this Feb 23, 2022
@tzneal tzneal deleted the reduce-number-of-concurrent-reconciles branch March 4, 2022 17:13