Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PersistentVolumeLabel admission plugin provides wrong Azure region #124525

Open
jsafrane opened this issue Apr 25, 2024 · 10 comments
Open

PersistentVolumeLabel admission plugin provides wrong Azure region #124525

jsafrane opened this issue Apr 25, 2024 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@jsafrane
Copy link
Member

jsafrane commented Apr 25, 2024

What happened?

On Kubernetes 1.29 with PersistentVolumeLabel admission plugin enabled and no in-tree cloud provider configured in the kube-apiserver, the admission plugins adds label topology.kubernetes.io/region: "" to in-tree Azure Disk PVs.

The label is completely wrong - the Azure cloud provider is not initialized and this just provides empty string. This makes any pod that uses these PVs unschedulable.

As result, all in-tree PVs dynamically provisioned using CSI migration are not usable, because they have a wrong region.

What did you expect to happen?

When there is no cloud config provided, PersistentVolumeLabel does not add any labels to PVs. PVs created by the CSI driver + CSI migration already have correct nodeAffinity, the labels are useless.

How can we reproduce it (as minimally and precisely as possible)?

  1. Enable PersistentVolumeLabel admission in your cluster and disable any cloud providers. It does not need access to Azure, you can run it anywhere.

    Example for local-up-cluster.sh:

    ENABLE_ADMISSION_PLUGINS="NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,Priority,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction,PersistentVolumeLabel"
    hack/local-up-cluster.sh
    
  2. Create in-tree Azure Disk PV. Again, we're testing just the admission, we're not going to use the PV in a pod, so no access to Azure is needed. UUIDs were sanitized ;-)

    $ kubectl create -f <<EOF
    
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: myvol
    spec:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 5Gi
      azureDisk:
          cachingMode: ReadWrite
          diskName: pvc-ade30d58-5a51-4a62-a83a-142267b74f9e
          diskURI: /subscriptions/4347f140-6c74-4199-b4fc-1938eb6c16c0/resourceGroups/jsafrane-h-jsafrane-h-kxvpr/providers/Microsoft.Compute/disks/pvc-ade30d58-5a51-4a62-a83a-142267b74f9e
          fsType: ext4
          kind: Managed
          readOnly: false
    
    EOF
    

Actual result: wrong region label on the PV:

$ kubectl get pv --show-labels
LABELS
topology.kubernetes.io/region=

Expected result: no labels on the PV.

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Server Version: v1.29.3

Cloud provider

None

OS version

No response

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@jsafrane jsafrane added the kind/bug Categorizes issue or PR as related to a bug. label Apr 25, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 25, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jsafrane
Copy link
Member Author

/sig storage
/sig cloud-provider

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. labels Apr 25, 2024
@neolit123
Copy link
Member

/sig storage

@k8s-ci-robot k8s-ci-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 25, 2024
@jsafrane
Copy link
Member Author

@andyzhangx it would be great if Azure cloud provider did not provide any labels when it's actually not configured (i.e. the external one is used). Is there an easy way to detect that in the admission plugin in the API server?

GetCloudProvider here could return some error + the admission would ignore it (i.e. add no labels to PVs, but admit it; now the admission is denied on error)

Or GetAzureDiskLabels deep inside the cloud provider would see it's not initialized and return no labels, say

if c.Location == "" {
    // Cloud provider is not initialized.
    return nil, nil
}

Note that it affects 1.29 and older that still have the cloud provider. I am fixing 1.30 differently in #124505

@jsafrane
Copy link
Member Author

/priority important-soon

The PersistentVolumeLabel admission plugin + CSI migration on azure are basically unusable when used together. I guess it affects a small nr. of clusters, but there is no workaround.

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 25, 2024
@andyzhangx
Copy link
Member

@jsafrane the pv lable issue only exists on in-tree disk pv, right? since in-tree disk volume is already deprecated long time ago, I think we could live with that issue.

@jsafrane
Copy link
Member Author

In-tree PVs are deprecated, but still supported. If someone upgrades a fully working cluster to a version that has the external cloud provider instead of the in-tree one, they cannot use their StorageClass that was working for many releases.
CSI migration was supposed to provide guarantee that in-tree PVs and SCs keep working.

@jsafrane
Copy link
Member Author

BTW, I would prefer removing the whole PersistentVolumeLabel admission plugin instead, it's not very useful, but that's for v1.31. See #124504.

I need 1.29 and older working somehow even with the admission enabled.

@jsafrane
Copy link
Member Author

I filed #124528 to fix it as simply as possible, PTAL. Can there be a real Azure VM with topology.kubernetes.io/region: "" ?

@andyzhangx
Copy link
Member

I filed #124528 to fix it as simply as possible, PTAL. Can there be a real Azure VM with topology.kubernetes.io/region: "" ?

this empty value won't make sense, it should always be topology.kubernetes.io/region=location

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/storage Categorizes an issue or PR as relevant to SIG Storage.
Projects
None yet
Development

No branches or pull requests

4 participants