Improve resource cleanup #225

g-gaston · 2021-09-16T15:24:00Z

When a cluster creation fails, cleaning up resources is manual, cumbersome, and error-prone.
We could automate most of this to improve the user experience when debugging.
We already have a --force-cleanup flag, but it doesn't do much. Think about everything you need to do after a failed cluster creation before you can run the CLI again; that is what we should try to add to this flow. Examples:

Delete the bootstrap cluster. We do that today, but it's not very robust. Find the cases where it doesn't work and fix them.
If we don't support more than one kind cluster running at a time (even a non-EKS-A one), add a validation for this and give instructions for deleting the extra clusters.
Clean up vSphere VMs if it's a vSphere cluster.
Clean up Docker resources if it's a Docker cluster.
Delete the <cluster-name> folder.
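
As a rough illustration of the steps above, here is a minimal shell sketch that only prints a manual cleanup sequence rather than running it. The function name eksa_cleanup_plan and its arguments are hypothetical and not part of the CLI. The bootstrap cluster naming ("<cluster-name>-eks-a-cluster") is inferred from the error output later in this thread, and the container name prefix for Docker clusters is an assumption.

```shell
# Sketch only: prints a manual cleanup sequence instead of running it.
# eksa_cleanup_plan and its arguments are hypothetical, not part of the CLI.
# Usage: eksa_cleanup_plan <cluster-name> <docker|vsphere>
eksa_cleanup_plan() {
  local name="$1" provider="$2"

  # Reject unknown providers before printing anything.
  case "$provider" in
    docker|vsphere) ;;
    *) echo "unknown provider: ${provider}" >&2; return 1 ;;
  esac

  # Assumption inferred from this thread: the bootstrap kind cluster is
  # named "<cluster-name>-eks-a-cluster".
  echo "kind delete cluster --name ${name}-eks-a-cluster"

  if [ "$provider" = "docker" ]; then
    # Assumption: the cluster's containers are prefixed with its name.
    # xargs -r (GNU) skips docker rm when nothing matches.
    echo "docker ps -aq --filter name=^${name}- | xargs -r docker rm -f"
  else
    echo "# vSphere: delete the cluster VMs by hand (not automated here)"
  fi

  # The generated files live in a folder named after the cluster.
  echo "rm -rf ./${name}"
}

eksa_cleanup_plan prod vsphere
```

Printing the commands instead of executing them keeps the sketch safe to try; review each line before running it by hand.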


vivek-koppuru · 2021-09-27T23:16:31Z

@g-gaston Is this a duplicate of #163?

jasonboche · 2021-10-14T16:01:50Z

What are the manual cleanup steps for a failed cluster deployment? My administrative machine is stuck after a failed deployment to vSphere. I've already powered off and deleted the VMs manually from vSphere.

eksctl anywhere create cluster -f eksa-cluster.yaml
Error: failed to create cluster: error creating bootstrap cluster: error executing create cluster: ERROR: failed to create cluster: node(s) already exist for a cluster with the name "prod-eks-a-cluster"
, try rerunning with --force-cleanup to force delete previously created bootstrap cluster

eksctl anywhere create cluster -f eksa-cluster.yaml --force-cleanup
Error: failed to create cluster: error deleting bootstrap cluster: management cluster in bootstrap cluster

g-gaston · 2021-10-22T15:03:38Z

@jasonboche try:

kind delete cluster --name prod-eks-a-cluster
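
Before deleting, it may help to confirm which kind clusters actually exist. The sketch below is one way to do that, assuming kind is on the PATH; both helper names are hypothetical. They read the output of `kind get clusters` (one cluster name per line) on stdin, so they can be tried without a running cluster.

```shell
# Sketch only: helper names are hypothetical.
# Both helpers read `kind get clusters` output (one name per line) on stdin.

# Succeeds if the given cluster name is listed exactly.
kind_cluster_listed() {
  grep -Fxq -- "$1"
}

# Prints every listed cluster other than the given one, which is useful
# for spotting unrelated kind clusters that may need to be deleted first.
other_kind_clusters() {
  grep -Fxv -- "$1"
}

# Usage (requires kind on PATH):
#   if kind get clusters | kind_cluster_listed prod-eks-a-cluster; then
#     kind delete cluster --name prod-eks-a-cluster
#   fi
#   kind get clusters | other_kind_clusters prod-eks-a-cluster
```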

jasonboche · 2021-10-25T17:11:09Z

Thank you kindly. I'll give that a try!

ataince · 2022-03-25T10:54:09Z

It didn't work for me :/

chrisdoherty4 · 2022-03-25T13:06:31Z

@ataince What output did you get?

ataince · 2022-03-25T13:09:32Z

I think it was a memory issue. I've increased the memory, but now it's stuck on this step:

⏳ Collecting support bundle from cluster, this can take a while {"cluster": "dev-cluster", "bundle": "dev-cluster/generated/dev-cluster-2022-03-25T12:40:36Z-bundle.yaml", "since": 1648208436523389598, "kubeconfig": "dev-cluster/dev-cluster-eks-a-cluster.kubeconfig"}

jasonboche · 2022-03-28T21:44:39Z

@ataince I'm sorry to hear that. I wrapped up this project before I was able to get to the bottom of this. I just learned the many traps to avoid so that I didn't get stuck and have to re-deploy all over again. I'll probably end up revisiting this project within the next year and my hope is by then AWS will have put in much better error trapping and clear cleanup steps that actually work. This isn't a total knock on AWS. I realize this was relatively new and uncharted territory and these types of issues go with the territory until things mature.

Jas

abhinavmpandey08 · 2022-07-08T21:16:54Z

Bumping up the priority on this one, as it's a pretty important issue that has come up multiple times, especially for cleaning up the local bootstrap cluster and the cluster-name folder.

AndreasDavour · 2022-11-14T13:38:14Z

Adding my voice. This needs to be prioritized. While figuring out how to get everything working you tear through quite a few clusters. A quick and thorough cleanup is a must.

g-gaston added this to the techdebt milestone Sep 16, 2021

vignesh-goutham added the status/notstarted label Sep 17, 2021

danbudris closed this as completed Mar 29, 2022

danbudris reopened this Mar 29, 2022

g-gaston added the kind/enhancement (New feature or request), team/cli, and area/cli (Generic EKS-A CLI features) labels Apr 25, 2022

g-gaston modified the milestones: techdebt, backlog Apr 25, 2022

abhinavmpandey08 mentioned this issue Jul 8, 2022

Improve cleanup of resources on failure #163

Closed

abhinavmpandey08 modified the milestones: backlog, next Jul 8, 2022

ndeksa assigned ptrivedi Jul 19, 2022

ndeksa unassigned ptrivedi Aug 25, 2022

drewvanstone modified the milestones: next, v0.15.0, v0.16.0 Mar 20, 2023

drewvanstone modified the milestones: v0.16.0, v0.17.0 May 22, 2023

drewvanstone added the priority/p1 (On the list but not scheduled) label May 24, 2023

ndeksa assigned ahreehong May 26, 2023

ndeksa modified the milestones: v0.17.0, backlog Jun 1, 2023

ndeksa unassigned ahreehong Jul 20, 2023

ndeksa removed this from the backlog milestone Jul 31, 2023

drewvanstone added the stale label Jan 24, 2024
