Improve resource cleanup #225

Open
g-gaston opened this issue Sep 16, 2021 · 10 comments
Labels
area/cli (Generic EKS-A CLI features), kind/enhancement (New feature or request), priority/p1 (On the list but not scheduled), stale, status/notstarted, team/cli

Comments

@g-gaston
Member

When a cluster creation fails, cleaning up resources is manual, cumbersome, and error-prone.
We could automate most of this to improve the user experience when debugging.
We already have a --force-cleanup flag, but it doesn't do much. Think about everything you need to do after a failed cluster creation before you can run the CLI again; that is what we should try to add to this flow. Examples:

  • Delete the bootstrap cluster. We do that today, but it's not very robust. Find the cases where it doesn't work and fix them.
  • If we don't support more than one kind cluster running at a time, even a non-EKS-A one, add a validation for this and give instructions for deleting the extra clusters.
  • Clean up vSphere VMs if it's a vSphere cluster.
  • Clean up Docker resources if it's a Docker cluster.
  • Delete the <cluster-name> folder.
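
As a rough illustration of the steps listed above, here is a minimal dry-run sketch that only prints the manual cleanup commands instead of running them. It is not what --force-cleanup does today. The function name `cleanup_plan` is hypothetical, and the sketch assumes (based only on the names seen in this thread) that the bootstrap kind cluster is called `<cluster-name>-eks-a-cluster`, that the generated folder is `./<cluster-name>`, and that Docker provider containers carry the cluster name in their container names.

```shell
# Dry-run sketch: prints the manual cleanup commands, runs nothing.
# Assumptions (hypothetical, not confirmed by the EKS-A CLI):
#   - bootstrap kind cluster name is "<cluster-name>-eks-a-cluster"
#   - generated files live in "./<cluster-name>"
#   - Docker provider containers include the cluster name in their names
cleanup_plan() {
  local cluster_name="$1"
  local provider="${2:-docker}"
  local kind_name="${cluster_name}-eks-a-cluster"

  # 1. Delete the local bootstrap (kind) cluster.
  echo "kind delete cluster --name ${kind_name}"

  # 2. Provider-specific leftovers.
  case "${provider}" in
    docker)
      # List candidate containers first; review them before removing anything.
      echo "docker ps -a --filter name=${cluster_name}"
      ;;
    vsphere)
      # No command is printed here: VM deletion is left as a manual step.
      echo "# vSphere: power off and delete leftover VMs for ${cluster_name}"
      ;;
  esac

  # 3. Delete the generated <cluster-name> folder.
  echo "rm -rf ./${cluster_name}"
}
```

Running `cleanup_plan prod vsphere` only prints the plan; nothing is deleted until you review the commands and run them yourself.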
@vivek-koppuru
Member

@g-gaston Is this a duplicate of #163?

@jasonboche

What are the manual cleanup steps for a failed cluster deployment? My administrative machine is stuck after a failed deployment to vSphere. I've already powered off and deleted the VMs manually from vSphere.

eksctl anywhere create cluster -f eksa-cluster.yaml
Error: failed to create cluster: error creating bootstrap cluster: error executing create cluster: ERROR: failed to create cluster: node(s) already exist for a cluster with the name "prod-eks-a-cluster"
, try rerunning with --force-cleanup to force delete previously created bootstrap cluster

eksctl anywhere create cluster -f eksa-cluster.yaml --force-cleanup
Error: failed to create cluster: error deleting bootstrap cluster: management cluster in bootstrap cluster

@g-gaston
Member Author

@jasonboche try:

kind delete cluster --name prod-eks-a-cluster
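
To confirm the delete worked, or to spot other kind clusters that might get in the way, a small filter over the output of `kind get clusters` can help. This is only a sketch: the helper name `other_kind_clusters` is hypothetical, and it assumes the cluster names arrive one per line on stdin.

```shell
# Sketch: reads kind cluster names (one per line) on stdin and prints
# every name that is NOT the expected bootstrap cluster.
# other_kind_clusters is a hypothetical helper, not part of kind or EKS-A.
other_kind_clusters() {
  local expected="$1"
  # -F: literal match, -x: whole line, -v: invert; then drop empty lines.
  # "|| true" keeps the exit status at 0 when nothing is left to print.
  grep -v -x -F -- "${expected}" | grep -v '^$' || true
}
```

For example, `kind get clusters 2>/dev/null | other_kind_clusters prod-eks-a-cluster` prints only the clusters other than the bootstrap one; empty output means no other kind clusters were reported.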

@jasonboche

Thank you kindly. I'll give that a try!

@ataince

ataince commented Mar 25, 2022

It didn't work for me :/

@chrisdoherty4
Contributor

@ataince What output did you get?

@ataince

ataince commented Mar 25, 2022

I think it was about the memory. I've increased it, but now it's stuck on this step:

⏳ Collecting support bundle from cluster, this can take a while {"cluster": "dev-cluster", "bundle": "dev-cluster/generated/dev-cluster-2022-03-25T12:40:36Z-bundle.yaml", "since": 1648208436523389598, "kubeconfig": "dev-cluster/dev-cluster-eks-a-cluster.kubeconfig"}

@jasonboche

@ataince I'm sorry to hear that. I wrapped up this project before I was able to get to the bottom of this. I just learned the many traps to avoid so that I didn't get stuck and have to re-deploy all over again. I'll probably end up revisiting this project within the next year and my hope is by then AWS will have put in much better error trapping and clear cleanup steps that actually work. This isn't a total knock on AWS. I realize this was relatively new and uncharted territory and these types of issues go with the territory until things mature.

Jas

@danbudris danbudris reopened this Mar 29, 2022
@g-gaston g-gaston added kind/enhancement New feature or request team/cli area/cli Generic EKS-A CLI features labels Apr 25, 2022
@g-gaston g-gaston modified the milestones: techdebt, backlog Apr 25, 2022
@abhinavmpandey08
Member

Bumping up the priority on this one, as it's a pretty important issue that has come up multiple times, especially for cleaning up the local bootstrap cluster and the cluster-name folder.

@AndreasDavour

Adding my voice. This needs to be prioritized. While figuring out how to get everything working you tear through quite a few clusters. A quick and thorough cleanup is a must.

@drewvanstone drewvanstone modified the milestones: next, v0.15.0, v0.16.0 Mar 20, 2023
@drewvanstone drewvanstone modified the milestones: v0.16.0, v0.17.0 May 22, 2023
@drewvanstone drewvanstone added the priority/p1 On the list but not scheduled label May 24, 2023
@ndeksa ndeksa modified the milestones: v0.17.0, backlog Jun 1, 2023
@ndeksa ndeksa removed this from the backlog milestone Jul 31, 2023