This repository has been archived by the owner on Feb 9, 2022. It is now read-only.

Cannot delete kubeprod namespace #739

Closed
dsyer opened this issue Mar 3, 2020 · 9 comments · Fixed by #875

dsyer commented Mar 3, 2020

Nothing seems to work when I want to tear down and uninstall BKPR. I followed the instructions, and the step where the namespace is deleted always times out. I also tried deleting it manually, with the same result. The only thing that worked was deleting the whole cluster. This was on GKE.

Here's a clue:

$ kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get --show-kind --ignore-not-found -n kubeprod
NAME                                                                               STATE     DOMAIN                      AGE
challenge.acme.cert-manager.io/grafana-tls-2452413783-1426328939-1827263010        pending   grafana.test.dsyer.com      16h
challenge.acme.cert-manager.io/kibana-logging-tls-142272255-928573681-417038981    pending   kibana.test.dsyer.com       16h
challenge.acme.cert-manager.io/oauth2-ingress-tls-508411700-2982213363-323180946   pending   auth.test.dsyer.com         16h
challenge.acme.cert-manager.io/prometheus-tls-2987423791-3012409273-2458947951     pending   prometheus.test.dsyer.com   16h
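
A complementary check is to ask the namespace itself what is holding it. This assumes a Kubernetes release that records namespace deletion conditions; newer releases report types such as NamespaceContentRemaining and NamespaceFinalizersRemaining. The helper name namespace_deletion_conditions is hypothetical, and this is a sketch rather than a BKPR command.

```shell
# Hypothetical helper: print the deletion conditions of a namespace that is
# stuck in Terminating. On Kubernetes releases that report them, conditions
# such as NamespaceContentRemaining and NamespaceFinalizersRemaining describe
# which leftover resources and finalizers are blocking termination.
namespace_deletion_conditions() {
  local ns="${1:-kubeprod}"
  kubectl get namespace "$ns" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
}

# Usage:
#   namespace_deletion_conditions kubeprod
```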

project-bot added this to Inbox in BKPR Mar 3, 2020

dsyer (Author) commented Mar 3, 2020

$ kubectl get -n kubeprod challenge.acme.cert-manager.io grafana-tls-2452413783-1426328939-1827263010 -o yaml
metadata:
  creationTimestamp: "2020-03-02T16:13:42Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2020-03-03T08:16:27Z"
  finalizers:
  - finalizer.acme.cert-manager.io
...
status:
  presented: true
  processing: true
  reason: 'Waiting for http-01 challenge propagation: failed to perform self check
    GET request ''http://grafana.test.dsyer.com/.well-known/acme-challenge/HWI3d63ZLii7-Rfs7Q6S-p__m_5cuawDA5Ne81RGBp4'':
    Get http://grafana.test.dsyer.com/.well-known/acme-challenge/HWI3d63ZLii7-Rfs7Q6S-p__m_5cuawDA5Ne81RGBp4:
    dial tcp: lookup grafana.test.dsyer.com on 10.12.0.10:53: no such host'
  state: pending
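
Two fields in that YAML explain the hang. The deletionTimestamp field shows that deletion has been requested. The finalizers field shows that a controller still has to release the object. The following minimal sketch assumes kubectl access to the cluster. It prints both fields for every Challenge in the namespace at once instead of fetching them one by one. The name list_challenge_finalizers is hypothetical.

```shell
# Hypothetical helper: for every cert-manager Challenge in a namespace, print
# its name, deletionTimestamp and finalizers, one tab-separated line each.
# A Challenge that has a deletionTimestamp and a non-empty finalizer list is
# waiting for a controller to remove the finalizer before it can disappear.
list_challenge_finalizers() {
  local ns="${1:-kubeprod}"
  kubectl get challenges.acme.cert-manager.io -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\n"}{end}'
}
```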

dsyer (Author) commented Mar 3, 2020

So I manually edited all those challenge resources and removed the finalizers. Then the namespace terminated and was deleted. This looks like a bug though, right?

sameersbn (Contributor) commented:

Generally the cleanup should work as expected, but it appears such issues can be encountered on Kubernetes because of third-party resources (cert-manager, in this case). I think this is a bug, but not in BKPR. We should document it in the troubleshooting guide, though.

ref:

dsyer (Author) commented Mar 4, 2020

The cert-manager issue seems to suggest they are treating it as not their problem: "it is a symptom of not following the install/upgrade instructions, which, at this point REQUIRES uninstalling cert-manager + all CRDs and CRs". Doesn't that imply we should be able to fix this in BKPR with more careful orchestration of the teardown?
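
One possible orchestration assumes, as in this report, that the cert-manager controller runs inside the namespace being deleted. The idea is to delete the Challenges first, while that controller is still alive to process finalizer.acme.cert-manager.io, and to delete the namespace only afterwards. teardown_namespace is a hypothetical helper and is not part of BKPR or kubeprod.

```shell
# Hypothetical ordered teardown (not part of BKPR). Assumes the cert-manager
# controller that owns finalizer.acme.cert-manager.io is still running in the
# namespace, so it can release its own Challenges before it is removed.
teardown_namespace() {
  local ns="${1:-kubeprod}"
  local timeout="${2:-120s}"
  # Step 1: delete all Challenges and wait for cert-manager to finalize them.
  # If this fails or times out, stop instead of deleting the namespace.
  kubectl delete challenges.acme.cert-manager.io --all -n "$ns" \
    --wait=true --timeout="$timeout" || return 1
  # Step 2: no finalizer-bearing Challenges remain, so delete the namespace.
  kubectl delete namespace "$ns" --wait=true --timeout="$timeout"
}
```

If the controller is already gone, step 1 will time out as well. In that case the finalizers have to be removed by hand, as described later in this thread.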

evelkey commented Jun 16, 2020

Maybe a --force flag would make sense for cases like this. Deletion was stuck for 4 hours for me, and I suspect it would have hung indefinitely.
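
kubeprod has no such flag. As a sketch only, a hypothetical force mode could first try a normal namespace deletion with a bounded wait. Only if that times out would it strip the Challenge finalizers, using the same JSON patch as the manual fix in this thread. force_delete_namespace and its default 5m timeout are assumptions. Removing finalizers bypasses whatever cleanup cert-manager would have performed, so it should remain a last resort.

```shell
# Hypothetical "force" mode (kubeprod has no such flag). First try a normal
# namespace deletion with a bounded wait; only if that fails or times out,
# remove the finalizers from the remaining cert-manager Challenges so the
# namespace can finish terminating.
force_delete_namespace() {
  local ns="${1:-kubeprod}"
  local timeout="${2:-5m}"
  if kubectl delete namespace "$ns" --timeout="$timeout"; then
    return 0
  fi
  # Fallback: strip finalizers one Challenge at a time (skips cert-manager's cleanup).
  kubectl get challenges.acme.cert-manager.io -n "$ns" -o name |
    while read -r res; do
      kubectl patch -n "$ns" "$res" --type=json \
        -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
    done
}
```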

javsalgar (Contributor) commented:

Hi,

In this case, the kubeprod binary does not perform any kind of cleanup or teardown, so it is up to the admin to run the appropriate kubectl/kubecfg commands. I believe the best option is to update the documentation so that an "Uninstall cert-manager" step is clearly mentioned. Would that make sense?

vsimon (Contributor) commented Jun 29, 2020

Hit this issue as well.

In case it helps anyone else: I had to manually edit each of the challenge resources listed above and delete the "finalizers:" key from its YAML. After that, the namespace terminated.

Example:

kubectl edit -n kubeprod challenge.acme.cert-manager.io/kibana-logging-tls-1131854262-1308064938-2038008363
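
The same edit can also be made non-interactively. The following minimal sketch uses a hypothetical helper, clear_finalizers, which applies a JSON merge patch that sets metadata.finalizers to null. This removes the key, just as deleting it in the editor does. --type=merge is used because custom resources such as Challenges do not support strategic merge patches.

```shell
# Hypothetical helper: non-interactive equivalent of deleting the
# "finalizers:" key in `kubectl edit`. A JSON merge patch with a null value
# removes the field from the object.
clear_finalizers() {
  if [ "$#" -ne 2 ]; then
    echo "usage: clear_finalizers NAMESPACE RESOURCE" >&2
    return 2
  fi
  local ns="$1" resource="$2"
  kubectl patch -n "$ns" "$resource" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}

# Usage:
#   clear_finalizers kubeprod challenge.acme.cert-manager.io/<name>
```

The JSON patch "remove" operation fails when /metadata/finalizers is already absent. A merge patch with null leaves the object unchanged in that case, so this helper can be rerun safely.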

javsalgar (Contributor) commented:

Hi,

Thank you very much for the tips! The community will benefit from this :)

jjo (Contributor) commented Jul 15, 2020

Confirming that the challenge.acme.cert-manager.io finalizers keep the kubeprod namespace from being removed. FYI, the command below takes care of that programmatically:

# Remove the finalizers from every Challenge left in the kubeprod namespace
kubectl get -n kubeprod challenges.acme.cert-manager.io -oname| \
   xargs -rtI{} kubectl patch -n kubeprod {} \
     --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'

bors closed this as completed in 45eb3fa Jul 16, 2020
BKPR automation moved this from Inbox to Done Jul 16, 2020

6 participants