Volume attachment not cleared before worker delete #9527

Open

linwalth opened this issue Apr 4, 2024 · 2 comments
Labels
area/storage Storage related kind/bug Bug

Comments

@linwalth

linwalth commented Apr 4, 2024

How to categorize this issue?

/area storage
/topology shoot
/kind bug

Our Gardener cluster running on OpenStack runs into reconciliation issues because volumes are not being detached from shoot workers.

What happened:
When an instance in a shoot is deleted, volume detachment is not finalized before the shoot worker node is deleted. In some cases the volume is therefore still counted as attached, and the shoot cluster runs into reconciliation errors.
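
For reference, a minimal sketch (assuming the openstacksdk Python client and a clouds.yaml entry named "mycloud", both hypothetical here) of how one could spot Cinder volumes whose attachments still point at servers that no longer exist:

```python
# Sketch: flag Cinder volumes whose attachments reference deleted servers.
# Assumes openstacksdk is installed and "mycloud" exists in clouds.yaml.
import openstack

conn = openstack.connect(cloud="mycloud")

# IDs of all servers that still exist in the project.
server_ids = {server.id for server in conn.compute.servers()}

# Walk all volumes and report attachments pointing at missing servers.
for volume in conn.block_storage.volumes(details=True):
    for attachment in volume.attachments or []:
        if attachment["server_id"] not in server_ids:
            print(f"volume {volume.id} still attached to missing server "
                  f"{attachment['server_id']} (status: {volume.status})")
```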

What you expected to happen:
An instance is only ever deleted after all Cinder volumes have been fully detached from it.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Gardener version: 1.76.3
  • Kubernetes version (use kubectl version): 1.24.5
  • Cloud provider or hardware configuration: OpenStack
@gardener-prow gardener-prow bot added area/storage Storage related kind/bug Bug labels Apr 4, 2024
@kon-angelo
Contributor

Potentially this could be an issue for https://github.com/gardener/gardener-extension-provider-openstack. But could you go into more detail (ideally a step-by-step description) of what is happening? I find the description somewhat too terse.

On instance deletion (by instance I assume an OpenStack server/node), the MCM should try to drain the node before attempting to delete it. That should take care of moving most workloads off the node, including moving the volumes once their pods have been scheduled elsewhere. Did you see issues during this process?
Were the VolumeAttachments not being deleted, or were there issues with CSI preventing the detach?
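
To help answer that, here is a minimal sketch (assuming the official kubernetes Python client and a kubeconfig pointed at the shoot) that lists VolumeAttachment objects still referencing nodes which are no longer part of the cluster:

```python
# Sketch: list VolumeAttachments that still reference nodes which no
# longer exist in the cluster. Assumes the "kubernetes" Python client
# and a working kubeconfig for the shoot cluster.
from kubernetes import client, config

config.load_kube_config()

core = client.CoreV1Api()
storage = client.StorageV1Api()

# Names of nodes that currently exist in the shoot.
node_names = {node.metadata.name for node in core.list_node().items}

# Any attachment whose node is gone points at a detach that never finished.
for va in storage.list_volume_attachment().items:
    if va.spec.node_name not in node_names:
        attached = va.status.attached if va.status else "unknown"
        print(f"VolumeAttachment {va.metadata.name}: "
              f"PV {va.spec.source.persistent_volume_name} "
              f"still bound to deleted node {va.spec.node_name}, "
              f"attached={attached}")
```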

@linwalth
Author

I am sorry, I cannot provide more context, as this ticket is already me describing a black box ;)

Given that the mechanism you described should take care of detaching/migrating all resources attached to and scheduled on that node, I think this problem might be more on the OpenStack side than on the Gardener side...

This is something that usually happens during and after OpenStack upgrades.
