Add --skip-drain flag to delete-deployment #456

Closed
jasonkeene opened this issue Jul 27, 2018 · 5 comments
@jasonkeene

Our CI was deleting a CFCR cluster and got these errors:

Task 24732 | 13:01:08 | Deleting instances: worker/06ede323-26a8-41e5-b3a3-72b78a0f44ca (1)
Task 24732 | 13:01:08 | Deleting instances: worker/95bbcbe6-a4d8-4b78-9247-662fab4ba451 (2)
Task 24732 | 13:01:08 | Deleting instances: worker/5a0c013f-5c48-4dd1-bb46-efe8a99118df (0)
Task 24732 | 13:01:20 | Deleting instances: worker/06ede323-26a8-41e5-b3a3-72b78a0f44ca (1) (00:00:12)
                     L Error: Action Failed get_task: Task 35938b9a-255b-446c-44ca-63eb37a2dce3 result: 1 of 1 drain scripts failed. Failed Jobs: kubelet.
Task 24732 | 13:01:20 | Deleting instances: worker/95bbcbe6-a4d8-4b78-9247-662fab4ba451 (2) (00:00:12)
                     L Error: Action Failed get_task: Task d1388816-524d-48d5-55d6-4bdd1df7848e result: 1 of 1 drain scripts failed. Failed Jobs: kubelet.
Task 24732 | 13:01:20 | Deleting instances: worker/5a0c013f-5c48-4dd1-bb46-efe8a99118df (0) (00:00:12)
                     L Error: Action Failed get_task: Task 4a7cdf1a-e7c4-442a-6955-b9e12a93324a result: 1 of 1 drain scripts failed. Failed Jobs: kubelet.
Task 24732 | 13:01:20 | Error: Action Failed get_task: Task 35938b9a-255b-446c-44ca-63eb37a2dce3 result: 1 of 1 drain scripts failed. Failed Jobs: kubelet.

I do not care about draining the kubelets and just want to blow away the VMs. It would be nice if delete-deployment supported --skip-drain to allow this.

@dpb587-pivotal
Contributor

Thinking out loud... using --skip-drain on a delete-deployment could be risky. For example, perhaps you know you don't care about worker's drain... but a different, later job you may not be aware of genuinely does rely on its drain to clean up external resources. I guess it's still up to the operator to know their deployment thoroughly and whether it's actually safe to use... but it still seems easy to make a mistake.

A workaround for you might be running bosh stop [--hard] --skip-drain worker first. It's relatively safe since it targets a specific, expected subset of instances, and it works with BOSH as it exists today. Then you should be able to follow it with a normal delete-deployment.
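
A minimal sketch of that sequence (cfcr here is just a placeholder deployment name; substitute your own):

bosh -d cfcr stop --hard --skip-drain worker
bosh -d cfcr delete-deployment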

@jasonkeene
Author

I ended up using --force, which is far broader than needed. I considered using stop since it has --skip-drain, but I honestly didn't know the behavior of stop and whether it would prevent the drain scripts from running during delete-deployment. Good to know that exists.
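
For reference, roughly what I ran (again with a placeholder deployment name); --force ignores errors during deletion rather than only skipping drain:

bosh -d cfcr delete-deployment --force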

@alex-slynko
Contributor

For CFCR, I would suggest running bosh stop --hard worker and then running delete-deployment.

@voelzmo
Contributor

voelzmo commented Aug 2, 2018

I'm leaving this here, just for reference: https://github.com/cloudfoundry/bosh-notes/blob/master/proposals/deployment-steps.md

@voelzmo
Contributor

voelzmo commented Mar 17, 2020

Just to document this further: when your deployment consumes a link from an already deleted deployment, the bosh stop workaround doesn't seem to work:

✗ bosh stop -d ping-app --hard --skip-drain
Using environment '192.168.50.6' as client 'admin'

Using deployment 'ping-app'

Continue? [yN]: y

Task 345

Task 345 | 12:49:55 | Error: Link 'common_link' in job 'consul_agent' from instance group 'ping-app-front' consumes from deployment 'consul', but the deployment does not exist.

Task 345 Started  Tue Mar 17 12:49:55 UTC 2020
Task 345 Finished Tue Mar 17 12:49:55 UTC 2020
Task 345 Duration 00:00:00
Task 345 error

Changing state:
  Expected task '345' to succeed but state is 'error'

Exit code 1
