Deleting a provisioner causes nodes to be cordoned and removed #2466

Closed
andrewhibbert opened this issue Sep 5, 2022 · 7 comments

Comments

@andrewhibbert
Contributor

Version

Karpenter: v0.16.1

Kubernetes: v1.0.0

Expected Behavior

Deleting a provisioner does not affect running nodes

Actual Behavior

Deleting a provisioner cordoned and removed nodes

karpenter-7fd86b488d-b7wfs controller 2022-09-05T17:36:53.868Z	DEBUG	controller.consolidation	Discovered EC2 instance types zonal offerings for subnets {"karpenter.sh/discovery/nonprod-shared1":"nonprod-shared1"}	{"commit": "b157d45"}


karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:40.965Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-110-3.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.006Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-107-112.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.010Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-103-16.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.013Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-109-47.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.015Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-100-26.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.016Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-100-156.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.018Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-101-228.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.020Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-102-109.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.020Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-110-16.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.039Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-107-134.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.059Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-105-194.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.064Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-108-239.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.068Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-101-199.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.072Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-108-114.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.073Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-107-207.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.074Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-102-222.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.094Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-106-164.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.113Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-106-78.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.116Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-103-245.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.122Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-108-203.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.325Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-100-70.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.385Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-100-205.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.389Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-100-70.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.423Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-101-158.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.495Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-108-160.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.553Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-100-157.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.559Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-108-213.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:41.588Z	INFO	controller.termination	Cordoned node	{"commit": "b157d45", "node": "ip-10-138-103-51.eu-west-1.compute.internal"}


karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:44.491Z	DEBUG	controller.provisioning	4 out of 509 instance types were excluded because they would breach provisioner limits	{"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:44.499Z	INFO	controller.provisioning	Found 2 provisionable pod(s)	{"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:44.499Z	INFO	controller.provisioning	Computed 1 new node(s) will fit 2 pod(s)	{"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:44.501Z	INFO	controller.provisioning	Launching node with 2 pods requesting {"cpu":"2001m","memory":"11644821760","pods":"12"} from types t3a.2xlarge, c6a.2xlarge, c5a.2xlarge, c6i.2xlarge, t3.2xlarge and 91 other(s)	{"commit": "b157d45", "provisioner": "default-ondemand1"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:44.649Z	DEBUG	controller.provisioning.cloudprovider	Discovered security groups: [sg-04993f2c13c6edf86 sg-0e4f544154d98f812]	{"commit": "b157d45", "provisioner": "default-ondemand1"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:44.650Z	DEBUG	controller.provisioning.cloudprovider	Discovered kubernetes version 1.20	{"commit": "b157d45", "provisioner": "default-ondemand1"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:44.698Z	DEBUG	controller.provisioning.cloudprovider	Discovered images: [ami-00569a4bbe9c7c3b6]	{"commit": "b157d45", "provisioner": "default-ondemand1"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:44.877Z	DEBUG	controller.provisioning.cloudprovider	Created launch template, Karpenter-nonprod-shared1-12545305923228748649	{"commit": "b157d45", "provisioner": "default-ondemand1"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:46.521Z	INFO	controller.provisioning.cloudprovider	Launched instance: i-0041eacb3a54c7f9c, hostname: ip-10-138-110-246.eu-west-1.compute.internal, type: t3a.2xlarge, zone: eu-west-1c, capacityType: on-demand	{"commit": "b157d45", "provisioner": "default-ondemand1"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:57.942Z	DEBUG	controller.provisioning	4 out of 509 instance types were excluded because they would breach provisioner limits	{"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:57.953Z	DEBUG	controller.provisioning	9 out of 509 instance types were excluded because they would breach provisioner limits	{"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:57.968Z	INFO	controller.provisioning	Found 20 provisionable pod(s)	{"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:57.968Z	INFO	controller.provisioning	Computed 2 new node(s) will fit 15 pod(s)	{"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:57.968Z	INFO	controller.provisioning	Computed 1 unready node(s) will fit 5 pod(s)	{"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:57.968Z	INFO	controller.provisioning	Launching node with 8 pods requesting {"cpu":"14901m","memory":"83182870784","pods":"18"} from types r6a.4xlarge, r5a.4xlarge, r5.4xlarge, r6i.4xlarge, r5ad.4xlarge and 13 other(s)	{"commit": "b157d45", "provisioner": "default-ondemand2"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:57.970Z	INFO	controller.provisioning	Launching node with 7 pods requesting {"cpu":"3251m","memory":"71774363904","pods":"17"} from types r6a.4xlarge, r5a.4xlarge, r6i.4xlarge, r5.4xlarge, r5ad.4xlarge and 18 other(s)	{"commit": "b157d45", "provisioner": "default-ondemand2"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:58.229Z	DEBUG	controller.provisioning.cloudprovider	Created launch template, Karpenter-nonprod-shared1-8590706432745671482	{"commit": "b157d45", "provisioner": "default-ondemand2"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:29:59.790Z	INFO	controller.provisioning.cloudprovider	Launched instance: i-055b0e1bd19beb5ba, hostname: ip-10-138-101-50.eu-west-1.compute.internal, type: r6a.4xlarge, zone: eu-west-1a, capacityType: on-demand	{"commit": "b157d45", "provisioner": "default-ondemand2"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:30:01.377Z	INFO	controller.provisioning.cloudprovider	Launched instance: i-0317ab423f0882e9b, hostname: ip-10-138-105-185.eu-west-1.compute.internal, type: r6a.4xlarge, zone: eu-west-1b, capacityType: on-demand	{"commit": "b157d45", "provisioner": "default-ondemand2"}
karpenter-7fd86b488d-b7wfs controller 2022-09-05T19:30:01.657Z	INFO	controller.termination	Deleted node	{"commit": "b157d45", "node": "ip-10-138-109-47.eu-west-1.compute.internal"}
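
A log excerpt like the one above can be tallied to see how many distinct nodes were cordoned or deleted. The following is a minimal sketch, assuming the line layout shown in the excerpt (logger name, message, then a JSON payload with a "node" field); the function name `summarize_terminations` is hypothetical and not part of Karpenter.

```python
import json
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches termination-controller lines of the shape seen in the excerpt:
#   ... INFO  controller.termination  Cordoned node  {"commit": "...", "node": "..."}
# Assumption: the JSON payload is the last field on the line.
TERMINATION_RE = re.compile(
    r"controller\.termination\s+(Cordoned|Deleted) node\s+(\{.*\})\s*$"
)


def summarize_terminations(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each action ("Cordoned", "Deleted") to the set of node names seen."""
    nodes_by_action: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        match = TERMINATION_RE.search(line)
        if not match:
            continue  # not a termination event
        action, payload = match.groups()
        try:
            node = json.loads(payload).get("node")
        except json.JSONDecodeError:
            continue  # skip lines with a truncated or malformed payload
        if node:
            # A set is used because the same node can be logged more than once.
            nodes_by_action[action].add(node)
    return dict(nodes_by_action)
```

Passing the lines of a saved log file (for example `open("karpenter.log")`, a hypothetical path) returns one set of node names per action, so repeated "Cordoned node" entries for the same node are counted once.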

Steps to Reproduce the Problem

Resource Specs and Logs

andrewhibbert added the bug (Something isn't working) label Sep 5, 2022
spring1843 removed the bug (Something isn't working) label Sep 6, 2022
@spring1843
Contributor

This behavior has been expected since v0.12.0; see #1934.
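
Since deleting a provisioner also removes the nodes it launched, it can help to list those nodes first. The following is a minimal sketch, assuming that Karpenter labels each node it launches with `karpenter.sh/provisioner-name` and that the input is the parsed output of `kubectl get nodes -o json`. The function name `nodes_for_provisioner` and the sample node names are hypothetical.

```python
from typing import List

# Assumption: Karpenter labels the nodes it launches with this key;
# verify against the version you are running.
PROVISIONER_LABEL = "karpenter.sh/provisioner-name"


def nodes_for_provisioner(node_list: dict, provisioner: str) -> List[str]:
    """Return the sorted names of nodes labeled as launched by `provisioner`.

    `node_list` is the parsed JSON of `kubectl get nodes -o json`.
    """
    names = []
    for item in node_list.get("items", []):
        metadata = item.get("metadata", {})
        labels = metadata.get("labels") or {}
        if labels.get(PROVISIONER_LABEL) == provisioner:
            names.append(metadata.get("name", ""))
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical sample shaped like `kubectl get nodes -o json` output.
    sample = {
        "items": [
            {"metadata": {"name": "node-a",
                          "labels": {PROVISIONER_LABEL: "default-ondemand1"}}},
            {"metadata": {"name": "node-b",
                          "labels": {PROVISIONER_LABEL: "default-ondemand2"}}},
        ]
    }
    print(nodes_for_provisioner(sample, "default-ondemand1"))
```

An empty result suggests that no running nodes carry that provisioner's label; a non-empty one lists the nodes likely to be cordoned and removed along with it.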

@andrewhibbert
Contributor Author

Okay. Will try and remember this for next time!

@spring1843
Contributor

I'm curious to learn more about your experience. Why was the provisioner deleted? Wasn't there another provisioner that Karpenter would use to start a new node?

Not having any provisioners seems to inherently put the cluster in an undesirable state: if the node(s) launched according to the template die for whatever reason, there are no instructions on what type of node(s) to bring up next.

@ishworg

ishworg commented Sep 8, 2022

there are no instructions on what type of node(s) to bring up next.

I'd expect reasonable, sane-default, best-effort instance types to be launched to maintain capacity. The node type(s) may or may not match; if they do not, a metric denoting the degradation should be raised.

@FernandoMiguel
Contributor

there are no instructions on what type of node(s) to bring up next.

I'd expect reasonable, sane-default, best-effort instance types to be launched to maintain capacity. The node type(s) may or may not match; if they do not, a metric denoting the degradation should be raised.

Especially if those nodes have workloads that have nowhere to go.
I think that just force-terminating them is really bad.

@olemarkus
Contributor

This is analogous to just terminating an ASG; you'd see the same behavior then. If you have other provisioners available, the workloads will have other places to go. If not, I am wondering why you are removing the provisioner to begin with.

@github-actions
Contributor

Labeled for closure due to inactivity in 10 days.
