Custom Script Extension Error Code 14 - failed to start etcd - highly-available key value store #4168
Description
Is this a request for help?
Yes
Is this an ISSUE or FEATURE REQUEST? (choose one):
ISSUE
What version of acs-engine?:
acs-engine version
Version: v0.24.3
GitCommit: ad007b6
GitTreeState: clean
Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes 1.12.1
What happened:
Provisioning of the master VMSS fails, with the Custom Script Extension exiting with status 14.
{
"status": "Failed",
"error": {
"code": "ResourceDeploymentFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "VMExtensionProvisioningError",
"message": "VM has reported a failure when processing extension 'k8s-master-22668494-vmssCSE'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=14\n[stdout]\n\n[stderr]\n\"."
}
]
}
}
What you expected to happen:
The cluster is set up successfully.
How to reproduce it (as minimally and precisely as possible):
I was trying to set up a cluster across Availability Zones in the Central US region.
The cluster is attached to an existing VNet and I had two agent types - one Windows and one Linux.
Here is my acs-engine config file:
{
"apiVersion": "vlabs",
"properties": {
"orchestratorProfile": {
"orchestratorType": "Kubernetes",
"orchestratorVersion": "1.12.1"
},
"masterProfile": {
"count": 5,
"dnsPrefix": "",
"vmSize": "Standard_DS2_v2",
"OSDiskSizeGB": 64,
"storageProfile" : "ManagedDisks",
"vnetSubnetId": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes-Master",
"agentVnetSubnetID": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes",
"firstConsecutiveStaticIP": "192.168.15.10",
"availabilityProfile": "VirtualMachineScaleSets",
"availabilityZones": [
"1",
"2"
]
},
"agentPoolProfiles": [
{
"name": "linuxagents",
"count": 4,
"vmSize": "Standard_DS2_v2",
"OSDiskSizeGB": 64,
"storageProfile" : "ManagedDisks",
"vnetSubnetId": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes",
"availabilityProfile": "VirtualMachineScaleSets",
"availabilityZones": [
"1",
"2"
]
},
{
"name": "winagents",
"count": 4,
"vmSize": "Standard_DS2_v2",
"OSDiskSizeGB": 64,
"storageProfile" : "ManagedDisks",
"vnetSubnetId": "/subscriptions/[redacted]/resourceGroups/DevSub02_Network_RG/providers/Microsoft.Network/virtualNetworks/DevSub02-VNet-001/subnets/Kubernetes",
"availabilityProfile": "VirtualMachineScaleSets",
"osType": "Windows",
"availabilityZones": [
"1",
"2"
]
}
],
"windowsProfile": {
"adminUsername": "manager",
"adminPassword": "[redacted]",
"windowsPublisher": "MicrosoftWindowsServer",
"windowsOffer": "WindowsServerSemiAnnual",
"windowsSku": "Datacenter-Core-1803-with-Containers-smalldisk",
"imageVersion": "1803.0.20180912"
},
"linuxProfile": {
"adminUsername": "manager",
"ssh": {
"publicKeys": [
{
"keyData": "[redacted]"
}
]
}
},
"servicePrincipalProfile": {
"clientId": "[redacted]",
"secret": "[redacted]"
}
}
}
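For reference, the master addresses implied by this config can be worked out by hand. The field name `firstConsecutiveStaticIP` suggests that masters are numbered consecutively from that address; assuming that holds here (not verified for VMSS masters), `count: 5` starting at 192.168.15.10 gives the addresses printed below. `consecutive_ips` is a hypothetical helper written for this sketch, not part of acs-engine.

```shell
#!/usr/bin/env bash
# Sketch only: list `count` consecutive IPv4 addresses starting at `first`.
# consecutive_ips is a hypothetical helper; it assumes masters are numbered
# consecutively from firstConsecutiveStaticIP, which is not verified for VMSS masters.
consecutive_ips() {
  local first=$1 count=$2
  local a b c d base n i
  # Split the dotted quad into its four octets.
  IFS=. read -r a b c d <<< "$first"
  # Pack the octets into one integer so "+ i" carries across octet boundaries.
  base=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  for (( i = 0; i < count; i++ )); do
    n=$(( base + i ))
    # Unpack back to dotted-quad notation.
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
  done
}

# Values from masterProfile above: firstConsecutiveStaticIP and count.
consecutive_ips 192.168.15.10 5
```

Running it prints 192.168.15.10 through 192.168.15.14, one per line.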
Anything else we need to know:
No NSG rules are applied to the VNet/subnet.
Partial output from cluster-provision.log (full log attached)
+ '[' 1 -eq 0 ']'
+ '[' 99 -eq 100 ']'
+ sleep 5
+ for i in '$(seq 1 $retries)'
+ timeout 30 systemctl daemon-reload
+ timeout 30 systemctl restart etcd
Job for etcd.service failed because the control process exited with error code. See "systemctl status etcd.service" and "journalctl -xe" for details.
+ '[' 1 -eq 0 ']'
+ '[' 100 -eq 100 ']'
+ return 1
+ RESTART_STATUS=1
+ systemctl status etcd --no-pager -l
+ '[' 1 -ne 0 ']'
+ echo 'etcd could not be started'
etcd could
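The trace above comes from a retry loop in the provisioning script: each attempt runs `systemctl daemon-reload` and `systemctl restart etcd` under `timeout 30`, sleeps 5 seconds after a failed attempt, and gives up with `return 1` after attempt 100. The sketch below reconstructs that pattern from the trace alone; `restart_with_retries` is a hypothetical name and the actual acs-engine script may be structured differently. The command to retry is passed as arguments so the loop can be exercised without systemd.

```shell
#!/usr/bin/env bash
# Sketch reconstructed from the xtrace; restart_with_retries is a hypothetical
# name and the real acs-engine provisioning script may differ.
# Usage: restart_with_retries <retries> <sleep_seconds> <command> [args...]
restart_with_retries() {
  local retries=$1 wait_sleep=$2 i
  shift 2
  for i in $(seq 1 "$retries"); do
    # Success on any attempt ends the loop (the trace's `[ $? -eq 0 ]` test).
    if "$@"; then
      return 0
    fi
    # Out of attempts: fail (the trace's `[ 100 -eq 100 ]` followed by `return 1`).
    if [ "$i" -eq "$retries" ]; then
      return 1
    fi
    sleep "$wait_sleep"
  done
  # Reached only if retries < 1, i.e. no attempt was made.
  return 1
}

# Caller side, as in the trace: capture the status and report failure.
# (The real caller also runs `systemctl status etcd --no-pager -l` here.)
# `false` stands in for the failing systemctl calls; 3 attempts, no sleep.
RESTART_STATUS=0
restart_with_retries 3 0 false || RESTART_STATUS=$?
if [ "$RESTART_STATUS" -ne 0 ]; then
  echo 'etcd could not be started'
fi
```

With the values seen in the trace, the call on a master would be roughly `restart_with_retries 100 5 sh -c 'timeout 30 systemctl daemon-reload; timeout 30 systemctl restart etcd'`. The exit status of `sh -c` is that of its last command, the etcd restart, which is the status the trace tests.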
Output of etcd.log
● etcd.service - etcd - highly-available key value store
Loaded: loaded (/etc/systemd/system/etcd.service; disabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Fri 2018-11-02 02:29:07 UTC; 6ms ago
Docs: https://github.com/coreos/etcd
man:etcd
Process: 29284 ExecStart=/usr/bin/etcd $DAEMON_ARGS (code=exited, status=1/FAILURE)
Main PID: 29284 (code=exited, status=1/FAILURE)
Nov 02 02:29:07 k8s-master-22668494-vmss000000 systemd[1]: Failed to start etcd - highly-available key value store.
Nov 02 02:29:07 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Unit entered failed state.
Nov 02 02:29:07 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Failed with result 'exit-code'.
cluster-provision.log
systemctl status etcd.service:
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: found invalid file/dir lost+found under data dir /var/lib/etcddisk (Ignore this if you are upgrading etcd)
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: the server is already initialized as member before, starting as etcd member...
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: peerTLS: cert = /etc/kubernetes/certs/etcdpeer0.crt, key = /etc/kubernetes/certs/etcdpeer0.key, ca = , trusted-ca = /etc/kubernetes/certs/ca.crt, client-cert-auth = true
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: listening for peers on https://192.168.21.4:2380
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: listening for client requests on 127.0.0.1:2379
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: listening for client requests on 192.168.21.4:2379
Nov 02 02:28:26 k8s-master-22668494-vmss000000 etcd[27951]: --initial-cluster must include k8s-master-22668494-vmss000000=https://192.168.21.4:2380 given --initial-advertise-peer-urls=https://192.168.21.4:2380
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: Failed to start etcd - highly-available key value store.
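The fatal line is the `--initial-cluster must include ...` check. As the message states, etcd refuses to start unless `--initial-cluster` contains an entry for its own member name whose peer URL matches `--initial-advertise-peer-urls`; here that is `k8s-master-22668494-vmss000000=https://192.168.21.4:2380`. The sketch below mirrors that rule for the single-URL case only. It is not etcd's implementation (etcd is written in Go and compares URL sets), and `member_in_initial_cluster` and the `192.0.2.x` addresses are illustrative.

```shell
#!/usr/bin/env bash
# Simplified illustration of the rule in the etcd error message; not etcd's code.
# member_in_initial_cluster is a hypothetical helper. Single-URL case only:
# real etcd compares URL sets and can resolve hostnames.
# Usage: member_in_initial_cluster <initial_cluster> <name> <peer_url>
member_in_initial_cluster() {
  # --initial-cluster is a comma-separated list of name=peerURL entries.
  # Wrapping it in commas lets one pattern match the first, middle, or last entry.
  case ",$1," in
    *",$2=$3,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Name and advertised peer URL taken from the log lines above.
name=k8s-master-22668494-vmss000000
peer_url=https://192.168.21.4:2380
# Hypothetical --initial-cluster values (192.0.2.0/24 is a documentation range).
good="$name=$peer_url,other-master=https://192.0.2.11:2380"
bad="$name=https://192.0.2.10:2380,other-master=https://192.0.2.11:2380"

member_in_initial_cluster "$good" "$name" "$peer_url" && echo "good: passes"
member_in_initial_cluster "$bad" "$name" "$peer_url" || echo "bad: --initial-cluster must include $name=$peer_url"
```

The surrounding commas are what keep a member named `ab` from matching an entry for `b`.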
journalctl -xe | more
Subject: Unit etcd.service has failed
Defined-By: systemd
Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Unit etcd.service has failed.
The result is failed.
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Unit entered failed state.
Nov 02 02:28:26 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Failed with result 'exit-code'.
Nov 02 02:28:27 k8s-master-22668494-vmss000000 systemd[1]: etcd.service: Service hold-off time over, scheduling restart.
Nov 02 02:28:27 k8s-master-22668494-vmss000000 systemd[1]: Stopped etcd - highly-available key value store.
Subject: Unit etcd.service has finished shutting down
Defined-By: systemd
Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Unit etcd.service has finished shutting down.