This repository was archived by the owner on Jan 11, 2023. It is now read-only.

Custom Script Extension Error Code 14 - failed to start etcd - highly-available key value store #4168

@briandenicola

Description


Is this a request for help?
Yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:
acs-engine version
Version: v0.24.3
GitCommit: ad007b6
GitTreeState: clean

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.12.1

What happened:
Master VMSS provisioning fails: the custom script extension exits with error code 14.

{
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "VMExtensionProvisioningError",
        "message": "VM has reported a failure when processing extension 'k8s-master-22668494-vmssCSE'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=14\n[stdout]\n\n[stderr]\n\"."
      }
    ]
  }
}
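The deployment error only surfaces the exit status; the nested extension error can be pulled out of the ARM payload with `jq`. A minimal sketch, assuming the response above is saved locally (the sample payload below is abbreviated):

```shell
# Save an abbreviated copy of the ARM deployment error payload and
# extract the nested VM extension error with jq.
cat > error.json <<'EOF'
{
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "details": [
      { "code": "VMExtensionProvisioningError",
        "message": "Enable failed: ... exit status=14" }
    ]
  }
}
EOF
jq -r '.error.details[0].code' error.json   # -> VMExtensionProvisioningError
```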

What you expected to happen:
Successful cluster setup.

How to reproduce it (as minimally and precisely as possible):
I was trying to set up a cluster across Availability Zones in the Central US region.
The cluster is attached to an existing VNet and I had two agent types - one Windows and one Linux.

Here is my acs-engine config file

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorVersion": "1.12.1"
    },
    "masterProfile": {
      "count": 5,
      "dnsPrefix": "",
      "vmSize": "Standard_DS2_v2",
      "OSDiskSizeGB": 64,
      "storageProfile" : "ManagedDisks",
      "vnetSubnetId": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes-Master",
      "agentVnetSubnetID": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes",
      "firstConsecutiveStaticIP": "192.168.15.10",
      "availabilityProfile": "VirtualMachineScaleSets",
      "availabilityZones": [
        "1",
        "2"
      ]
    },
    "agentPoolProfiles": [
      {
        "name": "linuxagents",
        "count": 4,
        "vmSize": "Standard_DS2_v2",
        "OSDiskSizeGB": 64,
        "storageProfile" : "ManagedDisks",
        "vnetSubnetId": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes",
        "availabilityProfile": "VirtualMachineScaleSets",
        "availabilityZones": [
          "1",
          "2"
        ]
      },
      {
        "name": "winagents",
        "count": 4,
        "vmSize": "Standard_DS2_v2",
        "OSDiskSizeGB": 64,
        "storageProfile" : "ManagedDisks",
        "vnetSubnetId": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes",
        "availabilityProfile": "VirtualMachineScaleSets",
        "osType": "Windows",
        "availabilityZones": [
          "1",
          "2"
        ]
      }
    ],
    "windowsProfile": {
      "adminUsername": "manager",
      "adminPassword": "[redacted]",
      "windowsPublisher": "MicrosoftWindowsServer",
      "windowsOffer": "WindowsServerSemiAnnual",
      "windowsSku": "Datacenter-Core-1803-with-Containers-smalldisk",
      "imageVersion": "1803.0.20180912"
    },
    "linuxProfile": {
      "adminUsername": "manager",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "[redacted]"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "[redacted]",
      "secret": "[redacted]"
    }
  }
}

Anything else we need to know:
No NSG rules applied to the VNet/Subnet

Partial output from cluster-provision.log (full log attached)

+ '[' 1 -eq 0 ']'
+ '[' 99 -eq 100 ']'
+ sleep 5
+ for i in '$(seq 1 $retries)'
+ timeout 30 systemctl daemon-reload
+ timeout 30 systemctl restart etcd
Job for etcd.service failed because the control process exited with error code. See "systemctl status etcd.service" and "journalctl -xe" for details.
+ '[' 1 -eq 0 ']'
+ '[' 100 -eq 100 ']'
+ return 1
+ RESTART_STATUS=1
+ systemctl status etcd --no-pager -l
+ '[' 1 -ne 0 ']'
+ echo 'etcd could not be started'
etcd could not be started
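The trace above is a restart-with-retries loop (100 attempts, 5-second sleeps) giving up after etcd never comes up. A minimal sketch of the pattern; the function and variable names here are assumptions for illustration, not the exact acs-engine provision script:

```shell
#!/bin/bash
# Retry a command until it succeeds or the retry budget is exhausted.
# Returns 0 on first success, 1 if every attempt fails.
retrycmd_if_failure() {
    local retries=$1 wait_sleep=$2
    shift 2
    for i in $(seq 1 "$retries"); do
        if "$@"; then
            return 0
        fi
        if [ "$i" -eq "$retries" ]; then
            return 1
        fi
        sleep "$wait_sleep"
    done
}

# Usage as in the trace (not run here):
#   retrycmd_if_failure 100 5 timeout 30 systemctl restart etcd \
#       || echo 'etcd could not be started'
```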

Output of etcd.log
● etcd.service - etcd - highly-available key value store
Loaded: loaded (/etc/systemd/system/etcd.service; disabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Fri 2018-11-02 02:29:07 UTC; 6ms ago
Docs: https://github.com/coreos/etcd
man:etcd
Process: 29284 ExecStart=/usr/bin/etcd $DAEMON_ARGS (code=exited, status=1/FAILURE)
Main PID: 29284 (code=exited, status=1/FAILURE)

Nov 02 02:29:07 k8s-master-22668494-vmss000000 systemd[1]: Failed to start etcd - highly-available key value store.
Nov 02 02:29:07 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Unit entered failed state.
Nov 02 02:29:07 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Failed with result 'exit-code'.

systemctl status etcd.service:
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: found invalid file/dir lost+found under data dir /var/lib/etcddisk (Ignore this if you are upgrading etcd)
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: the server is already initialized as member before, starting as etcd member...
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: peerTLS: cert = /etc/kubernetes/certs/etcdpeer0.crt, key = /etc/kubernetes/certs/etcdpeer0.key, ca = , trusted-ca = /etc/kubernetes/certs/ca.crt, client-cert-auth = true
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: listening for peers on https://192.168.21.4:2380
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: listening for client requests on 127.0.0.1:2379
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: listening for client requests on 192.168.21.4:2379
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: --initial-cluster must include k8s-master-22668494-vmss000000=https://192.168.21.4:2380 given --initial-advertise-peer-urls=https://192.168.21.4:2380
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: Failed to start etcd - highly-available key value store.
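The fatal line is etcd's own check: the node advertises `https://192.168.21.4:2380` as its peer URL, but the `--initial-cluster` list passed in `DAEMON_ARGS` does not contain the matching `name=peer-url` entry for this host. A quick way to sanity-check that is a membership test over the comma-separated list; the helper below is a hypothetical illustration, not an etcd tool:

```shell
#!/bin/bash
# Return 0 if the given "name=peer-url" entry appears in an etcd
# --initial-cluster value (a comma-separated list of such entries).
initial_cluster_has_member() {
    local cluster=$1 member=$2
    case ",$cluster," in
        *",$member,"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Usage against the values from the log (not run here):
#   initial_cluster_has_member "$INITIAL_CLUSTER" \
#       "k8s-master-22668494-vmss000000=https://192.168.21.4:2380" \
#       || echo "self entry missing from --initial-cluster"
```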

journalctl -xe | more:
Subject: Unit etcd.service has failed
Defined-By: systemd
Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Unit etcd.service has failed.

The result is failed.
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Unit entered failed state.
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Failed with result 'exit-code'.
Nov 02 02:28:27 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Service hold-off time over, scheduling restart.
Nov 02 02:28:27 k8s-master-22668494-vmss000000 systemd[1]: Stopped etcd - highly-available key value store.
Subject: Unit etcd.service has finished shutting down
Defined-By: systemd
Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

Unit etcd.service has finished shutting down.
