Karpenter takes time to provision a node #2250

Closed
himanshurajput32 opened this issue Aug 3, 2022 · 7 comments
Labels
lifecycle/closed lifecycle/stale question Further information is requested

Comments

@himanshurajput32

Version

Karpenter: v0.10.1

Kubernetes: v1.21

Expected Behavior

Karpenter should launch a new worker node when pods are unschedulable and the cluster has no spare capacity.

Actual Behavior

Sometimes Karpenter takes 10-30 minutes to provision a new node; until then, the pods remain unscheduled.

Steps to Reproduce the Problem

This issue is intermittent, and we have not been able to reproduce it on demand.

Resource Specs and Logs

Application logs -

2022-07-20T12:50:54Z 0/8 nodes are available: 1 Insufficient memory, 1 node(s) were unschedulable, 3 Insufficient cpu, 3 node(s) didn't match Pod's node affinity/selector.
--
2022-07-20T13:00:10Z pod begins starting
2022-07-20T13:00:59Z pd promote stage deployment progression timeout (10m; progress deadline secs)
2022-07-20T13:02:40Z pod ready
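
As a rough check of the delay, the gaps between the timestamps in the application log above can be computed as follows. This is only a sketch: the event names are labels invented here for illustration, and the first timestamp is the scheduling failure shown in the log, which is not necessarily the moment the pod was created.

```python
from datetime import datetime

# Timestamps copied from the application log above.
# The dictionary keys are illustrative labels, not names used by any tool.
events = {
    "failed_scheduling": "2022-07-20T12:50:54Z",
    "pod_starting": "2022-07-20T13:00:10Z",
    "pod_ready": "2022-07-20T13:02:40Z",
}


def parse(ts: str) -> datetime:
    """Parse the UTC timestamp format used in the log lines."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def elapsed_seconds(start: str, end: str) -> int:
    """Whole seconds between two log timestamps."""
    return int((parse(end) - parse(start)).total_seconds())


wait_to_start = elapsed_seconds(events["failed_scheduling"], events["pod_starting"])
wait_to_ready = elapsed_seconds(events["failed_scheduling"], events["pod_ready"])

print(f"unschedulable -> starting: {wait_to_start // 60}m{wait_to_start % 60}s")
print(f"unschedulable -> ready:    {wait_to_ready // 60}m{wait_to_ready % 60}s")
```

For this particular occurrence the pod waited a little over nine minutes before it began starting, which is why the 10m progress deadline in the log was hit.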

Karpenter Controller logs at the same time -

2022-07-20T12:49:59.204Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:04.200Z	INFO	controller	Batched 1 pod(s) in 1.000169786s	{"commit": "10fc37b"}
2022-07-20T12:50:04.204Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:04.204Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:09.207Z	INFO	controller	Batched 1 pod(s) in 1.000241421s	{"commit": "10fc37b"}
2022-07-20T12:50:09.298Z	DEBUG	controller	Discovered subnets: [subnet-05c1258158c376ed2 (us-west-2c) subnet-0825688abc394a699 (us-west-2b) subnet-003116e009003af3b (us-west-2a)]	{"commit": "10fc37b"}
2022-07-20T12:50:09.300Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:09.300Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:14.208Z	INFO	controller	Batched 1 pod(s) in 1.000362964s	{"commit": "10fc37b"}
2022-07-20T12:50:14.212Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:14.212Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:19.208Z	INFO	controller	Batched 1 pod(s) in 1.00018368s	{"commit": "10fc37b"}
2022-07-20T12:50:19.213Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:19.213Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:24.208Z	INFO	controller	Batched 1 pod(s) in 1.000012813s	{"commit": "10fc37b"}
2022-07-20T12:50:24.213Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:24.213Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:29.209Z	INFO	controller	Batched 1 pod(s) in 1.00010796s	{"commit": "10fc37b"}
2022-07-20T12:50:29.216Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:29.216Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:34.211Z	INFO	controller	Batched 1 pod(s) in 1.000954899s	{"commit": "10fc37b"}
2022-07-20T12:50:34.215Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:34.216Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:39.211Z	INFO	controller	Batched 1 pod(s) in 1.000202283s	{"commit": "10fc37b"}
2022-07-20T12:50:39.215Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:39.215Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:44.211Z	INFO	controller	Batched 1 pod(s) in 1.000462864s	{"commit": "10fc37b"}
2022-07-20T12:50:44.216Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:44.216Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:49.212Z	INFO	controller	Batched 1 pod(s) in 1.000077565s	{"commit": "10fc37b"}
2022-07-20T12:50:49.216Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:49.216Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:54.213Z	INFO	controller	Batched 1 pod(s) in 1.000376571s	{"commit": "10fc37b"}
2022-07-20T12:50:54.217Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:54.217Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:55.849Z	INFO	controller	Batched 2 pod(s) in 1.001031003s	{"commit": "10fc37b"}
2022-07-20T12:50:55.854Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:55.854Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:50:59.213Z	INFO	controller	Batched 2 pod(s) in 1.000024757s	{"commit": "10fc37b"}
2022-07-20T12:50:59.218Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:50:59.218Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:00.848Z	INFO	controller	Batched 2 pod(s) in 1.000274314s	{"commit": "10fc37b"}
2022-07-20T12:51:00.853Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:00.853Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:04.214Z	INFO	controller	Batched 2 pod(s) in 1.000664114s	{"commit": "10fc37b"}
2022-07-20T12:51:04.222Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:04.222Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:05.850Z	INFO	controller	Batched 2 pod(s) in 1.000800108s	{"commit": "10fc37b"}
2022-07-20T12:51:05.855Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:05.855Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:09.215Z	INFO	controller	Batched 2 pod(s) in 1.000367303s	{"commit": "10fc37b"}
2022-07-20T12:51:09.220Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:09.220Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:10.850Z	INFO	controller	Batched 2 pod(s) in 1.000368933s	{"commit": "10fc37b"}
2022-07-20T12:51:10.897Z	DEBUG	controller	Discovered subnets: [subnet-05c1258158c376ed2 (us-west-2c) subnet-0825688abc394a699 (us-west-2b) subnet-003116e009003af3b (us-west-2a)]	{"commit": "10fc37b"}
2022-07-20T12:51:10.898Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:10.898Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:14.216Z	INFO	controller	Batched 2 pod(s) in 1.000066316s	{"commit": "10fc37b"}
2022-07-20T12:51:14.220Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:14.220Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:15.851Z	INFO	controller	Batched 2 pod(s) in 1.000464649s	{"commit": "10fc37b"}
2022-07-20T12:51:15.856Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:15.856Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:19.217Z	INFO	controller	Batched 2 pod(s) in 1.000396259s	{"commit": "10fc37b"}
2022-07-20T12:51:19.224Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:19.224Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:20.852Z	INFO	controller	Batched 2 pod(s) in 1.000796909s	{"commit": "10fc37b"}
2022-07-20T12:51:20.857Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:20.857Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:24.217Z	INFO	controller	Batched 2 pod(s) in 1.000775479s	{"commit": "10fc37b"}
2022-07-20T12:51:24.222Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:24.222Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:25.852Z	INFO	controller	Batched 2 pod(s) in 1.000728738s	{"commit": "10fc37b"}
2022-07-20T12:51:25.857Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:25.857Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:29.218Z	INFO	controller	Batched 2 pod(s) in 1.000394194s	{"commit": "10fc37b"}
2022-07-20T12:51:29.222Z	INFO	controller	0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:29.222Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:30.853Z	INFO	controller	Batched 1 pod(s) in 1.00098024s	{"commit": "10fc37b"}
2022-07-20T12:51:30.858Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:30.858Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:35.852Z	INFO	controller	Batched 1 pod(s) in 1.000481963s	{"commit": "10fc37b"}
2022-07-20T12:51:35.858Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:35.858Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}
2022-07-20T12:51:40.854Z	INFO	controller	Batched 1 pod(s) in 1.001049501s	{"commit": "10fc37b"}
2022-07-20T12:51:40.858Z	INFO	controller	0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity	{"commit": "10fc37b"}
2022-07-20T12:51:40.858Z	INFO	controller	Waiting for unschedulable pods	{"commit": "10fc37b"}

Note: The pods had no affinity/anti-affinity rules or nodeSelector that Karpenter could not have satisfied.
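
The pattern in the controller logs above (every batch ends with "0 pod(s) will schedule against new capacity") can be summarized with a small script. The following is a minimal sketch, assuming the log line wording shown above for Karpenter v0.10.1; other versions may phrase this line differently. The function name `summarize` and the result keys are hypothetical, chosen only for this example.

```python
import re

# Matches the scheduling-decision line as it appears in the logs above.
# Fields are tab-separated in the original output, so \s+ is used between them.
DECISION = re.compile(
    r"^(?P<ts>\S+)\s+INFO\s+controller\s+"
    r"(?P<new>\d+) pod\(s\) will schedule against new capacity, "
    r"(?P<existing>\d+) pod\(s\) against existing capacity"
)


def summarize(log_text: str) -> dict:
    """Count decisions where Karpenter planned no new capacity for pending pods."""
    decisions = 0
    without_new_capacity = 0
    first_ts = None
    last_ts = None
    for line in log_text.splitlines():
        match = DECISION.match(line.strip())
        if match is None:
            continue  # "Batched", "Waiting", and DEBUG lines are ignored
        decisions += 1
        if int(match["new"]) == 0 and int(match["existing"]) > 0:
            without_new_capacity += 1
            first_ts = first_ts or match["ts"]
            last_ts = match["ts"]
    return {
        "decisions": decisions,
        "without_new_capacity": without_new_capacity,
        "first": first_ts,
        "last": last_ts,
    }


# Three lines copied from the controller log above.
SAMPLE = "\n".join([
    '2022-07-20T12:50:04.200Z\tINFO\tcontroller\tBatched 1 pod(s) in 1.000169786s\t{"commit": "10fc37b"}',
    '2022-07-20T12:50:04.204Z\tINFO\tcontroller\t0 pod(s) will schedule against new capacity, 1 pod(s) against existing capacity\t{"commit": "10fc37b"}',
    '2022-07-20T12:50:55.854Z\tINFO\tcontroller\t0 pod(s) will schedule against new capacity, 2 pod(s) against existing capacity\t{"commit": "10fc37b"}',
])

summary = summarize(SAMPLE)
print(summary)
```

Running this over a full log export gives the time window during which Karpenter kept deciding that existing capacity was enough, which can then be compared with the scheduler's FailedScheduling events for the same pods.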

@himanshurajput32 himanshurajput32 added the bug Something isn't working label Aug 3, 2022
@tzneal
Contributor

tzneal commented Aug 3, 2022

In this case, Karpenter thought that the pod would schedule against a node already in your cluster. kube-scheduler thought that it wouldn't schedule due to one of these reasons:

  • 1 Insufficient memory
  • 1 node(s) were unschedulable
  • 3 Insufficient cpu
  • 3 node(s) didn't match Pod's node affinity/selector

In newer versions of Karpenter, you can run kubectl describe pod <pod-name> to see the events on the pod. We now record an event indicating which node we think the pod should schedule against, which makes diagnosing problems easier. There have also been scheduling fixes related to max volumes per node that could have caused this problem.

Can you provide the pod spec of the pod that wouldn't schedule?
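
To make the kube-scheduler reasons listed above easier to compare across occurrences, the event message can be split into per-reason node counts. This is a minimal sketch that only handles the message quoted in this issue; the function name `parse_scheduler_message` is hypothetical, and the naive comma split would not cope with reasons that themselves contain commas (some taint-related messages do).

```python
import re


def parse_scheduler_message(message: str) -> tuple[int, int, dict[str, int]]:
    """Split a scheduling-failure message into (available, total, reasons)."""
    head, _, tail = message.partition(":")
    match = re.fullmatch(r"\s*(\d+)/(\d+) nodes are available", head)
    if match is None:
        raise ValueError(f"unrecognized message header: {head!r}")
    reasons: dict[str, int] = {}
    for part in tail.split(","):
        part = part.strip().rstrip(".")
        if not part:
            continue
        # Each part looks like "<count> <reason text>".
        count, _, reason = part.partition(" ")
        reasons[reason] = reasons.get(reason, 0) + int(count)
    return int(match[1]), int(match[2]), reasons


# Message copied from the application log in this issue.
MESSAGE = (
    "0/8 nodes are available: 1 Insufficient memory, "
    "1 node(s) were unschedulable, 3 Insufficient cpu, "
    "3 node(s) didn't match Pod's node affinity/selector."
)

available, total, reasons = parse_scheduler_message(MESSAGE)
for reason, count in reasons.items():
    print(f"{count} node(s): {reason}")
print(f"rejected {sum(reasons.values())} of {total} nodes")
```

In this message the per-reason counts add up to all 8 nodes, so each node was rejected for exactly one listed reason and none was considered schedulable by kube-scheduler.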

@himanshurajput32
Author

@tzneal
The issue is intermittent. Unfortunately, we don't have the pod spec from when the issue occurred, because in most cases redeployment worked after some time.

We noticed that the scheduling logic changed in Karpenter v0.13.2: https://karpenter.sh/v0.13.2/concepts/#scheduling

Could this newer version resolve the case where Karpenter thinks the pod will schedule against a node already in the cluster, but kube-scheduler finds it unschedulable?

@tzneal
Contributor

tzneal commented Aug 3, 2022

It's possible, and if it does re-occur you'll be able to describe the pod to determine which node we thought it would schedule to.

@dewjam dewjam added question Further information is requested and removed bug Something isn't working labels Aug 8, 2022
@cebernardi
Contributor

What happens when Karpenter thinks that a pod should be scheduled against existing capacity, while kube-scheduler has marked it unschedulable? What schedules the pod in the end?

@tzneal
Contributor

tzneal commented Aug 15, 2022

@cebernardi If that occurs, it's a bug and you can report it. We rely on kube-scheduler to perform the scheduling.

@github-actions
Contributor

github-actions bot commented Sep 5, 2022

Labeled for closure due to inactivity in 10 days.

@ricardorqr

Any solution?
