This repository has been archived by the owner on Jan 19, 2023. It is now read-only.

Increase visibility to resources stuck in a terminating state #1408

Closed
GuessWhoSamFoo opened this issue Sep 25, 2020 · 7 comments · Fixed by #2698
Labels: enhancement (New feature or request)

Comments

@GuessWhoSamFoo
Contributor

Describe the problem/challenge you have
xref: kubernetes/kubernetes#60807

Cases where resources can be stuck in a terminating state:

Describe the solution you'd like
Users of Kubernetes who are unaware of this issue would have to find the relevant GitHub issue to fix it. Octant can provide value by suggesting cleanup of orphaned APIServices or by offering an interface to patch out an unused finalizer.
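For the idea of patching out an unused finalizer, one way such an interface could build the request is an RFC 6902 JSON Patch that first tests and then removes the finalizer at its index. This is a minimal sketch, not Octant's implementation: the helper name `remove_finalizer_patch` and the finalizer names are hypothetical, and it only covers `metadata.finalizers` (a Namespace's `spec.finalizers` is handled through its `finalize` subresource instead).

```python
import json


def remove_finalizer_patch(finalizers, name):
    """Build a JSON Patch (RFC 6902) that removes one finalizer by index.

    The leading "test" op makes the patch fail if the entry at that index
    is no longer `name` (for example, the list changed after it was read),
    so a different finalizer is not removed by mistake.
    """
    if name not in finalizers:
        raise ValueError(f"finalizer {name!r} is not present")
    path = f"/metadata/finalizers/{finalizers.index(name)}"
    return [
        {"op": "test", "path": path, "value": name},
        {"op": "remove", "path": path},
    ]


# Hypothetical finalizer list, as read from an object's metadata.
current = ["example.com/keep-me", "example.com/orphaned-controller"]
patch = remove_finalizer_patch(current, "example.com/orphaned-controller")
print(json.dumps(patch))
```

The printed JSON has the shape that `kubectl patch <kind> <name> --type json -p '...'` accepts. Removing a finalizer skips whatever cleanup it guarded, so this only makes sense once the owning controller or APIService is confirmed to be gone.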

Environment:

  • Octant version (use octant version): 0.16
  • Kubernetes version (use kubectl version): 1.18.2
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): Ubuntu
@GuessWhoSamFoo GuessWhoSamFoo added the enhancement New feature or request label Sep 25, 2020
@scothis
Contributor

scothis commented Sep 25, 2020

How do you distinguish "stuck" resources from resources that are gracefully cleaning up external state before being fully deleted?

@GuessWhoSamFoo
Contributor Author

For namespaces specifically, we can highlight the condition (kubernetes/kubernetes#82189).

Octant doesn't have to distinguish the graceful case. It can show a warning to the effect of "If this resource is not terminating as you'd expect, consider looking at the relevant finalizer/API server/controller." Maybe there is a way to be more specific.
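A warning of that kind could be generated without judging whether the deletion is stuck. A minimal sketch, assuming "terminating" is detected by `metadata.deletionTimestamp` being set (which the API server does when deletion is requested); the function name, object, and finalizer name are hypothetical.

```python
def terminating_hint(obj):
    """Return a warning string for an object that is being deleted, else None.

    It deliberately does not decide whether the deletion is stuck or just slow;
    it only points at the usual places to look.
    """
    metadata = obj.get("metadata", {})
    # Assumption: "terminating" means deletion was requested, i.e. the API
    # server has set metadata.deletionTimestamp.
    if not metadata.get("deletionTimestamp"):
        return None
    hint = ("If this resource is not terminating as you'd expect, consider "
            "looking at the relevant finalizer, API server, or controller.")
    finalizers = metadata.get("finalizers", [])
    if finalizers:
        hint += " Pending finalizers: " + ", ".join(finalizers) + "."
    return hint


example = {
    "metadata": {
        "name": "demo",
        "deletionTimestamp": "2021-07-14T18:19:44Z",
        "finalizers": ["example.com/cleanup"],  # hypothetical finalizer
    }
}
print(terminating_hint(example))
```

Listing the pending finalizers is one small step toward "being more specific" without claiming to know which one is at fault.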

@wwitzel3 wwitzel3 added this to Unsorted in Backlog Dec 15, 2020
@wwitzel3 wwitzel3 moved this from Unsorted to Themes in Backlog Jul 14, 2021
@GuessWhoSamFoo GuessWhoSamFoo removed this from Themes in Backlog Jul 14, 2021
@GuessWhoSamFoo GuessWhoSamFoo added this to To do in 0.23 via automation Jul 14, 2021
@wwitzel3
Contributor

Example of a terminating namespace status:

status:
  conditions:
  - lastTransitionTime: "2021-07-14T18:19:44Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: admission.certmanager.k8s.io/v1beta1: the server
      is currently unable to handle the request'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2021-07-14T18:19:44Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2021-07-14T18:19:44Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2021-07-14T18:19:44Z"
    message: All content successfully removed
    reason: ContentRemoved
    status: "False"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2021-07-14T18:19:44Z"
    message: All content-preserving finalizers finished
    reason: ContentHasNoFinalizers
    status: "False"
    type: NamespaceFinalizersRemaining
  phase: Terminating
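Each of these five condition types signals a problem when its status is "True" (a failure, or content/finalizers still remaining), so surfacing a terminating namespace's conditions can start by filtering for them. A minimal sketch, assuming the status has already been decoded into a dict; the function name is hypothetical and the dict below is a trimmed transcription of the status above.

```python
# Condition types from the status above; each one signals a problem
# (a failure, or content/finalizers still remaining) when its status is "True".
NAMESPACE_DELETION_CONDITIONS = {
    "NamespaceDeletionDiscoveryFailure",
    "NamespaceDeletionGroupVersionParsingFailure",
    "NamespaceDeletionContentFailure",
    "NamespaceContentRemaining",
    "NamespaceFinalizersRemaining",
}


def blocking_conditions(status):
    """Return the deletion conditions of a terminating namespace that are "True"."""
    if status.get("phase") != "Terminating":
        return []
    return [
        c for c in status.get("conditions", [])
        if c.get("type") in NAMESPACE_DELETION_CONDITIONS and c.get("status") == "True"
    ]


# The status above as a dict (lastTransitionTime and most messages omitted),
# so the example needs no YAML library.
status = {
    "phase": "Terminating",
    "conditions": [
        {"type": "NamespaceDeletionDiscoveryFailure", "status": "True",
         "reason": "DiscoveryFailed",
         "message": "Discovery failed for some groups, 1 failing: unable to retrieve the "
                    "complete list of server APIs: admission.certmanager.k8s.io/v1beta1: "
                    "the server is currently unable to handle the request"},
        {"type": "NamespaceDeletionGroupVersionParsingFailure", "status": "False",
         "reason": "ParsedGroupVersions"},
        {"type": "NamespaceDeletionContentFailure", "status": "False",
         "reason": "ContentDeleted"},
        {"type": "NamespaceContentRemaining", "status": "False",
         "reason": "ContentRemoved"},
        {"type": "NamespaceFinalizersRemaining", "status": "False",
         "reason": "ContentHasNoFinalizers"},
    ],
}

for c in blocking_conditions(status):
    print(f"{c['type']} ({c['reason']}): {c.get('message', '')}")
```

For this example only the discovery failure is surfaced, which points at the unavailable `admission.certmanager.k8s.io/v1beta1` API rather than at the namespace's own content.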

@wwitzel3
Contributor

I think this is less about guessing whether something is "stuck" and more about surfacing the conditions in a way that helps users decide whether to wait or intervene.

One idea that was discussed was to look at the lastTransitionTime of objects in the Terminating phase and use some delta from it to provide an extra layer of condition introspection.
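A sketch of the delta idea, assuming "time since the most recent `lastTransitionTime`" is the measure (the comment leaves the exact delta open). Timestamps are parsed as UTC; the helper names are hypothetical. In a real tool `now` would be `datetime.now(timezone.utc)`; it is fixed here so the output is reproducible.

```python
from datetime import datetime, timedelta, timezone

# lastTransitionTime values look like "2021-07-14T18:19:44Z" (UTC, whole seconds).
K8S_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_k8s_time(value):
    """Parse a lastTransitionTime string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, K8S_TIME_FORMAT).replace(tzinfo=timezone.utc)


def time_since_last_transition(conditions, now):
    """Elapsed time since the most recent lastTransitionTime, or None if absent."""
    times = [parse_k8s_time(c["lastTransitionTime"])
             for c in conditions if c.get("lastTransitionTime")]
    if not times:
        return None
    return now - max(times)


conditions = [
    {"type": "NamespaceDeletionDiscoveryFailure", "lastTransitionTime": "2021-07-14T18:19:44Z"},
    {"type": "NamespaceContentRemaining", "lastTransitionTime": "2021-07-14T18:19:44Z"},
]
now = datetime(2021, 7, 14, 18, 27, 44, tzinfo=timezone.utc)  # fixed "now" for the example
print(time_since_last_transition(conditions, now))  # 0:08:00
```

Using the latest transition means the clock resets whenever any condition changes, so an object that is still making progress is less likely to be flagged.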

@ftovaro
Contributor

ftovaro commented Jul 15, 2021

What about a toast that reports a stuck resource? It could link to the resource, so you don't need to keep checking a table to know its state; instead, you get an alert that something is not right.

@wwitzel3
Contributor

wwitzel3 commented Jul 15, 2021

To test this behavior, add a finalizer that you know will never finish.
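One way to set up that test is a JSON merge patch (RFC 7386) that adds a finalizer no controller handles, applied before deleting the object. A minimal sketch; the finalizer name `example.com/never-finishes` and the helper are hypothetical.

```python
import json

# Hypothetical finalizer: nothing watches for it, so nothing ever removes it.
TEST_FINALIZER = "example.com/never-finishes"


def add_finalizer_merge_patch(existing, name=TEST_FINALIZER):
    """Build a JSON merge patch (RFC 7386) that appends `name` to the finalizers.

    Merge patch replaces lists wholesale, so the existing entries are kept by
    copying them into the new list.
    """
    finalizers = list(existing)
    if name not in finalizers:
        finalizers.append(name)
    return {"metadata": {"finalizers": finalizers}}


print(json.dumps(add_finalizer_merge_patch([])))
```

Applied with `kubectl patch <kind> <name> --type merge -p '...'` and followed by `kubectl delete <kind> <name> --wait=false`, the object should stay in a terminating state until the finalizer is removed again, for example with a merge patch that sets `finalizers` back to the original list.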

  • time threshold for terminating: 5 minutes (configurable via preferences)
  • at the threshold, fetch further details
  • should be part of the existing object printer, as a conditional
  • UI suggestions
    • Object Summary
      • Add a clear callout to the existing conditions table when the threshold is met
    • Object List
      • Phase field shows an asterisk ("Terminating *") and hovering over it shows a flyout?
      • Add a column that has conditions?
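The threshold check and the "Terminating *" label from the list above can be sketched as two small functions. This is a hypothetical sketch, not Octant's printer code: the function names are made up, and `terminating_since` stands for whatever timestamp the printer treats as the start of termination (for example one derived from lastTransitionTime, as discussed earlier in the thread). The same `past_threshold` check could gate fetching further details and the callout on the conditions table.

```python
from datetime import datetime, timedelta, timezone

# Default from the notes above; intended to be overridable via preferences.
DEFAULT_TERMINATING_THRESHOLD = timedelta(minutes=5)


def past_threshold(terminating_since, now, threshold=DEFAULT_TERMINATING_THRESHOLD):
    """True once an object has been terminating for at least `threshold`."""
    return terminating_since is not None and now - terminating_since >= threshold


def phase_label(phase, terminating_since, now, threshold=DEFAULT_TERMINATING_THRESHOLD):
    """Text for the Phase column: "Terminating *" once the threshold is met."""
    if phase == "Terminating" and past_threshold(terminating_since, now, threshold):
        return "Terminating *"
    return phase


since = datetime(2021, 7, 14, 18, 19, 44, tzinfo=timezone.utc)
print(phase_label("Terminating", since, since + timedelta(minutes=4)))  # Terminating
print(phase_label("Terminating", since, since + timedelta(minutes=5)))  # Terminating *
```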

@lenriquez
Contributor

@lenriquez lenriquez self-assigned this Jul 19, 2021
@lenriquez lenriquez moved this from To do to In progress in 0.23 Jul 21, 2021
@lenriquez lenriquez moved this from In progress to Review in progress in 0.23 Jul 28, 2021
@lenriquez lenriquez removed their assignment Jul 28, 2021
@lenriquez lenriquez moved this from Review in progress to To do in 0.23 Jul 28, 2021
@wwitzel3 wwitzel3 moved this from To do to In progress in 0.23 Jul 29, 2021
@wwitzel3 wwitzel3 self-assigned this Jul 29, 2021
@wwitzel3 wwitzel3 moved this from In progress to Review in progress in 0.23 Jul 29, 2021
0.23 automation moved this from Review in progress to Done Jul 31, 2021
Projects: 0.23 (Done)
5 participants