This repository was archived by the owner on Jan 11, 2023. It is now read-only.

Custom Script Extension Error Code 14 - failed to start etcd - highly-available key value store #4168

@briandenicola

Description


Is this a request for help?
Yes

Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE

What version of acs-engine?:
acs-engine version
Version: v0.24.3
GitCommit: ad007b6
GitTreeState: clean

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.12.1

What happened:
Master VMSS provisioning fails: the custom script extension exits with error code 14.

{
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "VMExtensionProvisioningError",
        "message": "VM has reported a failure when processing extension 'k8s-master-22668494-vmssCSE'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=14\n[stdout]\n\n[stderr]\n\"."
      }
    ]
  }
}
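The deployment error only surfaces the exit status; the nested extension error can be pulled out of the ARM payload with `jq`. A minimal sketch, assuming the response above is saved locally (the sample payload below is abbreviated):

```shell
# Save an abbreviated copy of the ARM deployment error payload and
# extract the nested VM extension error with jq.
cat > error.json <<'EOF'
{
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "details": [
      { "code": "VMExtensionProvisioningError",
        "message": "Enable failed: ... exit status=14" }
    ]
  }
}
EOF
jq -r '.error.details[0].code' error.json   # -> VMExtensionProvisioningError
```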

What you expected to happen:
Successful cluster setup.

How to reproduce it (as minimally and precisely as possible):
I was trying to set up a cluster across Availability Zones in the Central US region.
The cluster is attached to an existing VNet and I had two agent types - one Windows and one Linux.

Here is my acs-engine config file

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorVersion": "1.12.1"
    },
    "masterProfile": {
      "count": 5,
      "dnsPrefix": "",
      "vmSize": "Standard_DS2_v2",
      "OSDiskSizeGB": 64,
      "storageProfile" : "ManagedDisks",
      "vnetSubnetId": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes-Master",
      "agentVnetSubnetID": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes",
      "firstConsecutiveStaticIP": "192.168.15.10",
      "availabilityProfile": "VirtualMachineScaleSets",
      "availabilityZones": [
        "1",
        "2"
      ]
    },
    "agentPoolProfiles": [
      {
        "name": "linuxagents",
        "count": 4,
        "vmSize": "Standard_DS2_v2",
        "OSDiskSizeGB": 64,
        "storageProfile" : "ManagedDisks",
        "vnetSubnetId": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes",
        "availabilityProfile": "VirtualMachineScaleSets",
        "availabilityZones": [
          "1",
          "2"
        ]
      },
      {
        "name": "winagents",
        "count": 4,
        "vmSize": "Standard_DS2_v2",
        "OSDiskSizeGB": 64,
        "storageProfile" : "ManagedDisks",
        "vnetSubnetId": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes",
        "availabilityProfile": "VirtualMachineScaleSets",
        "osType": "Windows",
        "availabilityZones": [
          "1",
          "2"
        ]
      }
    ],
    "windowsProfile": {
      "adminUsername": "manager",
      "adminPassword": "[redacted]",
      "windowsPublisher": "MicrosoftWindowsServer",
      "windowsOffer": "WindowsServerSemiAnnual",
      "windowsSku": "Datacenter-Core-1803-with-Containers-smalldisk",
      "imageVersion": "1803.0.20180912"
    },
    "linuxProfile": {
      "adminUsername": "manager",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "[redacted]"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "[redacted]",
      "secret": "[redacted]"
    }
  }
}

Anything else we need to know:
No NSG rules applied to the VNet/Subnet

Partial output from cluster-provision.log (full log attached)

+ '[' 1 -eq 0 ']'
+ '[' 99 -eq 100 ']'
+ sleep 5
+ for i in '$(seq 1 $retries)'
+ timeout 30 systemctl daemon-reload
+ timeout 30 systemctl restart etcd
Job for etcd.service failed because the control process exited with error code. See "systemctl status etcd.service" and "journalctl -xe" for details.
+ '[' 1 -eq 0 ']'
+ '[' 100 -eq 100 ']'
+ return 1
+ RESTART_STATUS=1
+ systemctl status etcd --no-pager -l
+ '[' 1 -ne 0 ']'
+ echo 'etcd could not be started'
etcd could not be started
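The trace above is a restart-with-retries loop (100 attempts, 5-second sleeps) giving up after etcd never comes up. A minimal sketch of the pattern; the function and variable names here are assumptions for illustration, not the exact acs-engine provision script:

```shell
#!/bin/bash
# Retry a command until it succeeds or the retry budget is exhausted.
# Returns 0 on first success, 1 if every attempt fails.
retrycmd_if_failure() {
    local retries=$1 wait_sleep=$2
    shift 2
    for i in $(seq 1 "$retries"); do
        if "$@"; then
            return 0
        fi
        if [ "$i" -eq "$retries" ]; then
            return 1
        fi
        sleep "$wait_sleep"
    done
}

# Usage as in the trace (not run here):
#   retrycmd_if_failure 100 5 timeout 30 systemctl restart etcd \
#       || echo 'etcd could not be started'
```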

Output of etcd.log
● etcd.service - etcd - highly-available key value store
Loaded: loaded (/etc/systemd/system/etcd.service; disabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Fri 2018-11-02 02:29:07 UTC; 6ms ago
Docs: https://github.com/coreos/etcd
man:etcd
Process: 29284 ExecStart=/usr/bin/etcd $DAEMON_ARGS (code=exited, status=1/FAILURE)
Main PID: 29284 (code=exited, status=1/FAILURE)

Nov 02 02:29:07 k8s-master-22668494-vmss000000 systemd[1]: Failed to start etcd - highly-available key value store.
Nov 02 02:29:07 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Unit entered failed state.
Nov 02 02:29:07 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Failed with result 'exit-code'.

systemctl status etcd.service:
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: found invalid file/dir lost+found under data dir /var/lib/etcddisk (Ignore this if you are upgrading etcd)
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: the server is already initialized as member before, starting as etcd member...
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: peerTLS: cert = /etc/kubernetes/certs/etcdpeer0.crt, key = /etc/kubernetes/certs/etcdpeer0.key, ca = , trusted-ca = /etc/kubernetes/certs/ca.crt, client-cert-auth = true
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: listening for peers on https://192.168.21.4:2380
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: listening for client requests on 127.0.0.1:2379
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: listening for client requests on 192.168.21.4:2379
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: --initial-cluster must include k8s-master-22668494-vmss000000=https://192.168.21.4:2380 given --initial-advertise-peer-urls=https://192.168.21.4:2380
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: Failed to start etcd - highly-available key value store.
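The fatal line is etcd's own check: the node advertises `https://192.168.21.4:2380` as its peer URL, but the `--initial-cluster` list passed in `DAEMON_ARGS` does not contain the matching `name=peer-url` entry for this host. A quick way to sanity-check that is a membership test over the comma-separated list; the helper below is a hypothetical illustration, not an etcd tool:

```shell
#!/bin/bash
# Return 0 if the given "name=peer-url" entry appears in an etcd
# --initial-cluster value (a comma-separated list of such entries).
initial_cluster_has_member() {
    local cluster=$1 member=$2
    case ",$cluster," in
        *",$member,"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Usage against the values from the log (not run here):
#   initial_cluster_has_member "$INITIAL_CLUSTER" \
#       "k8s-master-22668494-vmss000000=https://192.168.21.4:2380" \
#       || echo "self entry missing from --initial-cluster"
```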

journalctl -xe | more:
Subject: Unit etcd.service has failed
Defined-By: systemd
Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Unit etcd.service has failed.

The result is failed.
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Unit entered failed state.
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Failed with result 'exit-code'.
Nov 02 02:28:27 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Service hold-off time over, scheduling restart.
Nov 02 02:28:27 k8s-master-22668494-vmss000000 systemd[1]: Stopped etcd - highly-available key value store.
Subject: Unit etcd.service has finished shutting down
Defined-By: systemd
Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

Unit etcd.service has finished shutting down.
