This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

Unable to upgrade a cluster from 1.19.13 -> 1.20.9 #4698

Closed
vladimirjk opened this issue Oct 15, 2021 · 6 comments
Labels
bug Something isn't working

Comments

@vladimirjk

vladimirjk commented Oct 15, 2021

Hello there. I initially created a cluster with AKS Engine running Kubernetes 1.16.9 (aks-ubuntu-18.04). Recently I started upgrading it with aks-engine upgrade; the steps 1.16.9 -> 1.18.20 -> 1.19.13 went fine, but now I cannot upgrade to 1.20.9.

I'm getting this error:

INFO[0026] Error creating upgraded master VM with index: 0 
Error: upgrading cluster: Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details." Details=[{"code":"Conflict","message":"{\r\n  \"error\": {\r\n    \"code\": \"PropertyChangeNotAllowed\",\r\n    \"message\": \"Changing property 'customData' is not allowed.\",\r\n    \"target\": \"customData\"\r\n  }\r\n}"}]

Is there anything I can do about it? Please advise.

Steps To Reproduce
I'm using this command to upgrade:

aks-engine upgrade \
  --api-model my_cluster/_output/apimodel.json \
  --resource-group my_cluster \
  --subscription-id <redacted> \
  --location <redacted> \
  --auth-method cli \
  --upgrade-version 1.20.9

Here's the relevant part of the API model containing masterProfile:

    "masterProfile": {
      "count": 3,
      "dnsPrefix": "<redacted>",
      "subjectAltNames": null,
      "vmSize": "Standard_D2_v2",
      "firstConsecutiveStaticIP": "10.255.255.5",
      "storageProfile": "ManagedDisks",
      "oauthEnabled": false,
      "preProvisionExtension": null,
      "extensions": [],
      "distro": "aks-ubuntu-18.04",
      "kubernetesConfig": {
        "kubeletConfig": {
          "--address": "0.0.0.0",
          "--anonymous-auth": "false",
          "--authentication-token-webhook": "true",
          "--authorization-mode": "Webhook",
          "--azure-container-registry-config": "/etc/kubernetes/azure.json",
          "--cgroups-per-qos": "true",
          "--client-ca-file": "/etc/kubernetes/certs/ca.crt",
          "--cloud-config": "/etc/kubernetes/azure.json",
          "--cloud-provider": "azure",
          "--cluster-dns": "10.0.0.10",
          "--cluster-domain": "cluster.local",
          "--enforce-node-allocatable": "pods",
          "--event-qps": "0",
          "--eviction-hard": "memory.available<750Mi,nodefs.available<10%,nodefs.inodesFree<5%",
          "--feature-gates": "RotateKubeletServerCertificate=true",
          "--healthz-port": "10248",
          "--image-gc-high-threshold": "85",
          "--image-gc-low-threshold": "80",
          "--image-pull-progress-deadline": "30m",
          "--keep-terminated-pod-volumes": "false",
          "--kubeconfig": "/var/lib/kubelet/kubeconfig",
          "--max-pods": "100",
          "--network-plugin": "cni",
          "--node-status-update-frequency": "10s",
          "--non-masquerade-cidr": "0.0.0.0/0",
          "--pod-infra-container-image": "mcr.microsoft.com/oss/kubernetes/pause:3.4.1",
          "--pod-manifest-path": "/etc/kubernetes/manifests",
          "--pod-max-pids": "-1",
          "--protect-kernel-defaults": "true",
          "--read-only-port": "0",
          "--register-with-taints": "node-role.kubernetes.io/master=true:NoSchedule",
          "--resolv-conf": "/run/systemd/resolve/resolv.conf",
          "--rotate-certificates": "true",
          "--streaming-connection-idle-timeout": "4h",
          "--tls-cert-file": "/etc/kubernetes/certs/kubeletserver.crt",
          "--tls-cipher-suites": "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256",
          "--tls-private-key-file": "/etc/kubernetes/certs/kubeletserver.key",
          "--v": "2",
          "--volume-plugin-dir": "/etc/kubernetes/volumeplugins"
        },
        "cloudProviderBackoffMode": ""

Expected behavior
Cluster is upgraded to 1.20.9

AKS Engine version

Version: v0.66.1
GitCommit: e371e37
GitTreeState: clean

Kubernetes version
1.19.13

Additional context

vladimirjk added the bug label Oct 15, 2021
@welcome

welcome bot commented Oct 15, 2021

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

@jadarsie
Member

What happened before INFO[0026] Error creating upgraded master VM with index: 0? Could you please include as much output as possible?

The upgrade process deletes and recreates control plane nodes one by one. Is master 0 still around? Was it deleted and recreated?
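Since masters are recreated one at a time, a quick sanity check is which master indices currently exist versus the expected count. A minimal sketch of that comparison (the VM names and expected_count below are placeholders, not real output; in practice the name list would come from something like az vm list -g <resource-group> --query "[].name" -o tsv):

```shell
#!/bin/sh
# Placeholder list of control-plane VM names; a real list would come from
# `az vm list` in the cluster's resource group. Names follow the
# k8s-master-<suffix>-<index> pattern seen in the logs.
existing="k8s-master-abc123-0
k8s-master-abc123-2"
expected_count=3

# Collect every index in [0, expected_count) with no matching VM name.
missing=""
i=0
while [ "$i" -lt "$expected_count" ]; do
  if ! printf '%s\n' "$existing" | grep -q -- "-${i}\$"; then
    missing="$missing $i"
  fi
  i=$((i + 1))
done
echo "missing master indices:${missing}"  # -> missing master indices: 1
```

A missing index here is consistent with the upgrader later logging "Found missing master VMs in the cluster" and recreating that VM.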

@vladimirjk
Author

Hello, coming back to this. I was able to produce more output, so let me post it here.

  1. First of all, I'd like to mention that even though the desired master count is 3, only two nodes are available (which looks weird).
  2. I tried to force a control-plane-only upgrade to 1.19.13 (the same version we have now):
aks-engine upgrade \
  --api-model <truncated>/_output/apimodel.json \
  --resource-group <truncated> \
  --subscription-id <truncated> \
  --force \
  --debug \
  --control-plane-only \
  --upgrade-version 1.19.13

Here's the output (truncated):

WARN[0000] The 'aks-ubuntu-16.04' distro uses Ubuntu 16.04-LTS, which is End of Life (EOL) and will no longer receive security updates 
DEBU[0000] Resolving tenantID for subscriptionID: <truncated> 
DEBU[0004] Already registered for "Microsoft.Compute"   
DEBU[0004] Already registered for "Microsoft.Storage"   
DEBU[0004] Already registered for "Microsoft.Network"   
INFO[0006] Upgrading cluster with name suffix: <truncated> 
INFO[0007] Master VM name: k8s-master-<truncated>-0, orchestrator: 1.19.13 (MasterVMs) 
INFO[0007] Master VM name: k8s-master-<truncated>-2, orchestrator: 1.19.13 (MasterVMs) 
INFO[0007] Upgrading control plane nodes to Kubernetes version 1.19.13 
INFO[0007] Master nodes StorageProfile: ManagedDisks    
INFO[0007] Prepping master nodes for upgrade...         
INFO[0007] Resource count before running NormalizeResourcesForK8sMasterUpgrade: 14 
INFO[0007] Removing Microsoft.Compute/availabilitySets dependency from [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')))] 
INFO[0007] Resource count after running NormalizeResourcesForK8sMasterUpgrade: 4 
INFO[0007] Total expected master count: 3               
INFO[0007] Master nodes that need to be upgraded: 3     
INFO[0007] Master nodes that have been upgraded: 0      
INFO[0007] Starting upgrade of master nodes...          
INFO[0007] masterNodesInCluster: 2                      
INFO[0007] Found missing master VMs in the cluster. Reconstructing names of missing master VMs for recreation during upgrade... 
INFO[0007] Expected master count: 3, Creating 1 more master VMs 
INFO[0007] Creating upgraded master VM with index: 0    
INFO[0007] Master offset: 0                             
INFO[0007] Master pool set count to: 1 temporarily during upgrade... 
INFO[0007] Starting ARM Deployment k8s-upgrade-master-0-<truncated> in resource group<truncated>. This will take some time... 
INFO[0031] Finished ARM Deployment (k8s-upgrade-master-0-<truncated>). Succeeded 
WARN[0031] VM name was empty. Skipping node condition check 
INFO[0031] Upgrading Master VM: k8s-master-<truncated>-0   
INFO[0031] deleting VM k8s-master-<truncated>-0 in resource group <truncated> ... 
INFO[0182] deleting NIC k8s-master-<truncated>-nic-0 in resource group <truncated> ... 
INFO[0195] deleting managed disk k8s-master-<truncated>-0_OsDisk_1_<truncated> in resource group <truncated>... 
INFO[0196] Master offset: 0                             
INFO[0196] Master pool set count to: 1 temporarily during upgrade... 
INFO[0196] Starting ARM Deployment k8s-upgrade-master-0-<truncated> in resource group <truncated>. This will take some time... 
INFO[0572] Finished ARM Deployment (k8s-upgrade-master-0-<truncated>). Succeeded 
INFO[0572] Master node: k8s-master-<truncated>-0 is ready  
INFO[0572] Upgrading Master VM: k8s-master-<truncated>-2   
INFO[0573] Master node: k8s-master-<truncated>-0 is ready  
INFO[0573] Master node: k8s-master-<truncated>-0 is ready  
INFO[0573] deleting VM k8s-master-<truncated>-2 in resource group <truncated> ... 
INFO[0725] deleting NIC k8s-master-<truncated>-nic-2 in resource group <truncated> ... 
INFO[0737] deleting managed disk k8s-master-<truncated>-2_OsDisk_1_<truncated> in resource group <truncated>  ... 
INFO[0739] Master offset: 2                             
INFO[0739] Master pool set count to: 3 temporarily during upgrade... 
INFO[0739] Starting ARM Deployment k8s-upgrade-master-2-<truncated> in resource group <truncated>. This will take some time... 
INFO[1048] Finished ARM Deployment (k8s-upgrade-master-2-<truncated>). Succeeded 
INFO[1049] Master node: k8s-master-<truncated>-2 is ready  
INFO[1049] Control plane upgraded successfully to Kubernetes version 1.19.13 
DEBU[1049] output: wrote <truncated>/_output/apimodel.json 
  3. Then I tried to upgrade to version 1.20.9:
aks-engine upgrade \
  --api-model <truncated>/_output/apimodel.json \
  --resource-group <truncated> \
  --subscription-id <truncated> \
  --debug \
  --upgrade-version 1.20.9

and got this output:

WARN[0000] The 'aks-ubuntu-16.04' distro uses Ubuntu 16.04-LTS, which is End of Life (EOL) and will no longer receive security updates 
DEBU[0000] Resolving tenantID for subscriptionID: <truncated> 
DEBU[0004] Already registered for "Microsoft.Compute"   
DEBU[0004] Already registered for "Microsoft.Storage"   
DEBU[0004] Already registered for "Microsoft.Network"   
INFO[0006] Upgrading cluster with name suffix: <truncated> 
INFO[0011] Master VM name: k8s-master-<truncated>-0, orchestrator: 1.19.13 (MasterVMs) 
INFO[0011] Master VM name: k8s-master-<truncated>-2, orchestrator: 1.19.13 (MasterVMs) 
INFO[0011] Upgrading control plane and all nodes to Kubernetes version 1.20.9 
INFO[0011] Master nodes StorageProfile: ManagedDisks    
INFO[0011] Prepping master nodes for upgrade...         
INFO[0011] Resource count before running NormalizeResourcesForK8sMasterUpgrade: 14 
INFO[0011] Removing Microsoft.Compute/availabilitySets dependency from [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')))] 
INFO[0011] Resource count after running NormalizeResourcesForK8sMasterUpgrade: 4 
INFO[0011] Total expected master count: 3               
INFO[0011] Master nodes that need to be upgraded: 3     
INFO[0011] Master nodes that have been upgraded: 0      
INFO[0011] Starting upgrade of master nodes...          
INFO[0011] masterNodesInCluster: 2                      
INFO[0011] Found missing master VMs in the cluster. Reconstructing names of missing master VMs for recreation during upgrade... 
INFO[0011] Expected master count: 3, Creating 1 more master VMs 
INFO[0011] Creating upgraded master VM with index: 0    
INFO[0011] Master offset: 0                             
INFO[0011] Master pool set count to: 1 temporarily during upgrade... 
INFO[0011] Starting ARM Deployment k8s-upgrade-master-0-<truncated> in resource group <truncated>. This will take some time... 
INFO[0025] Finished ARM Deployment (k8s-upgrade-master-0-<truncated>). Error: Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details." Details=[{"code":"Conflict","message":"{\r\n  \"error\": {\r\n    \"code\": \"PropertyChangeNotAllowed\",\r\n    \"message\": \"Changing property 'customData' is not allowed.\",\r\n    \"target\": \"customData\"\r\n  }\r\n}"}] 
INFO[0025] Error creating upgraded master VM with index: 0 
Error: upgrading cluster: Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details." Details=[{"code":"Conflict","message":"{\r\n  \"error\": {\r\n    \"code\": \"PropertyChangeNotAllowed\",\r\n    \"message\": \"Changing property 'customData' is not allowed.\",\r\n    \"target\": \"customData\"\r\n  }\r\n}"}]
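Note that the actionable part of that error is buried one level deep: ARM nests the real error as an escaped JSON string inside the Details message field. A minimal sketch of unwrapping it (the message variable below just reproduces the escaped string from the output above):

```shell
#!/bin/sh
# The inner error from the Details entry above, still JSON-escaped
# (literal \r\n and \" sequences, as ARM returned it).
message='{\r\n  \"error\": {\r\n    \"code\": \"PropertyChangeNotAllowed\",\r\n    \"message\": \"Changing property '\''customData'\'' is not allowed.\",\r\n    \"target\": \"customData\"\r\n  }\r\n}'

# Unescape \" so the fields are greppable, then pick out code and target.
code=$(printf '%s' "$message" | sed 's/\\"/"/g' | grep -o '"code": "[^"]*"')
target=$(printf '%s' "$message" | sed 's/\\"/"/g' | grep -o '"target": "[^"]*"')
echo "$code / $target"
```

This surfaces PropertyChangeNotAllowed on customData directly, i.e. the deployment is trying to change the VM's customData, which Azure forbids on an existing resource.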

@jadarsie
Member

Thanks @vladimirjk, the latest release (v0.67.0) includes a fix that could potentially help you. Could you please upgrade your aks-engine version and give it another try?

@vladimirjk
Author

@jadarsie Thanks! Will do and report back.

@vladimirjk
Author

@jadarsie I was able to upgrade to the latest available k8s version. Closing the issue.
Thanks for the support!
