Skip to content
This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

VirtualMachineScaleSets support for Kubernetes #2620

Merged
merged 24 commits into from
Apr 24, 2018
Merged

Conversation

sozercan
Copy link
Member

@sozercan sozercan commented Apr 6, 2018

What this PR does / why we need it:

  • Adds support for Virtual Machine Scale sets for Kubernetes v1.10 agent pools

Which issue this PR fixes:
fixes #2403

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (ie. deploy with previous version, upgrade with this branch)

Release note:

Add Virtual Machine Scale Sets support for Kubernetes v1.10+ agent availability profile

cc @feiskyer @JiangtianLi

@ghost ghost assigned sozercan Apr 6, 2018
@ghost ghost added the in progress label Apr 6, 2018
@feiskyer
Copy link
Member

feiskyer commented Apr 7, 2018

👍 VirtualMachineScaleSets is only supported with kubernetes v1.10 and above versions. So I think we don't need to check backward compatibility.

@todoesverso
Copy link

I don't really understand why is this documented as supported if there is an open PR. Is that a bug?

https://github.com/Azure/acs-engine/blob/master/docs/clusterdefinition.md#agentpoolprofiles

@sozercan
Copy link
Member Author

sozercan commented Apr 10, 2018

@todoesverso VMSS is supported for Swarm and DCOS but not for Kubernetes yet (support is added with Kubernetes v1.10)

@PoisonousJohn
Copy link

PoisonousJohn commented Apr 11, 2018

@sozercan hey! we're trying to test this PR with kubernetes 1.10.0 and cluster autoscaler. We've got a problem with unregistered nodes.

I0410 05:35:30.594811       1 static_autoscaler.go:184] 1 unregistered nodes present
I0410 05:35:30.594853       1 utils.go:330] Removing unregistered node 
I0410 05:35:30.594952       1 azure_scale_set.go:273] Deleting vmss instances [%!q(*azure.azureRef=&{})]
E0410 05:35:30.594990       1 azure_scale_set.go:293] getLastSegment failed with error: identifier '/' not found in resource name ""
W0410 05:35:30.595004       1 utils.go:353] Failed to remove node : identifier '/' not found in resource name ""
E0410 05:35:30.595027       1 static_autoscaler.go:191] Failed to remove unregistered nodes: identifier '/' not found in resource name ""

Digging through other issues, I found this

kubernetes/autoscaler#775

Is useInstanceMetadata supposed to be false with VMSS and k8s 1.10.0?
We've tried to set useInstanceMetadata to true, but cluster fails to deploy.

@feiskyer
Copy link
Member

Is useInstanceMetadata supposed to be false with VMSS and k8s 1.10.0?

Yep, this is supported in v1.10. But there are a bug for vmss with autoscaler 1.2.0, the fix is here kubernetes/autoscaler#758. It should be included in v1.2.1.

@PoisonousJohn
Copy link

@feiskyer oh thx! We'll try it with 1.2.1.

Copy link
Contributor

@khenidak khenidak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few questions/changes but i think it is a great start

]
}
},
"servicePrincipalProfile": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us default our examples to MSI. MSI (and later EMSi when possible)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, updated

@@ -119,7 +119,11 @@
"cloudProviderRatelimitQPS": "[parameters('cloudProviderRatelimitQPS')]",
"cloudProviderRatelimitBucket": "[parameters('cloudProviderRatelimitBucket')]",
"useManagedIdentityExtension": "{{ UseManagedIdentity }}",
{{if .HasVirtualMachineScaleSets}}
"useInstanceMetadata": "false",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reason for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.10 supports setting useInstanceMetadata to both true and false, so we should keep same logic: "useInstanceMetadata": "{{ UseInstanceMetadata }}"

Copy link
Member Author

@sozercan sozercan Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

due to kubernetes/kubernetes#62611, if instanceMetadata is enabled and vmType is set to vmss, master kubelet won't start. If it's disabled, it works fine. One possible (and messy) workaround is to duplicate cloud config on the masters, so master has kubelet vmType as standard and controller-manager has vmss. Better option is to wait until 1.10.2 with above pr merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is experimental (we'll add docs later) let's change the above to "useInstanceMetadata": "{{ UseInstanceMetadata }}". Experimenters pre-1.10.2 should specify "useInstanceMetadata": "false" in their api models until then.

This way we don't have to submit a follow-up PR after 1.10.2.

@@ -55,6 +55,8 @@
{{if $index}}, {{end}}
{{if .IsWindows}}
{{template "k8s/kuberneteswinagentresourcesvmas.t" .}}
{{else if .IsVirtualMachineScaleSets}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a reason not support windows as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

windows agents should be working now

for _, agentProfile := range cs.Properties.AgentPoolProfiles {
if agentProfile.IsAvailabilitySets() {
return true
}
}
return false
},
"AnyAgentUsesVirtualMachineScaleSets": func() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't allow bootstrapping a mixed mode cluster. Either availability sets or scale sets but not both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added validation check so mixed mode clusters are not allowed

"secrets": "[variables('linuxProfileSecrets')]"
{{end}}
},
"storageProfile": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a PR recently enabling users to provision using their own images. is this included? Used currently by openshift on acs-engine @jim-minter FYI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, seems to be included, thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom images should be included now

"osDisk": {
"createOption": "FromImage",
"caching": "ReadWrite"
{{if .IsStorageAccount}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackfrancis I think we should validate against storage accounts for new clusters while allowing them for scale ops. Is this possible?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khenidak I'm not sure what you mean.

@JackQuincy can you confirm that there's no special validation/template processing context for deployment vs scale?

@khenidak
Copy link
Contributor

@kkmsft for autoscaler + VMSS work

@sozercan sozercan force-pushed the vmss branch 2 times, most recently from 90346d0 to 092547e Compare April 14, 2018 00:33
@sozercan sozercan changed the title [WIP] VirtualMachineScaleSets support for Kubernetes VirtualMachineScaleSets support for Kubernetes Apr 17, 2018
@feiskyer
Copy link
Member

@sozercan Thanks for awesome work. LGTM

@JiangtianLi Could you also have a look at this?

"subnetName": "$global:SubnetName",
"securityGroupName": "$global:SecurityGroupName",
"vnetName": "$global:VNetName",
"routeTableName": "$global:RouteTableName",
"primaryAvailabilitySetName": "$global:PrimaryAvailabilitySetName",
"primaryScaleSetName": "$global:PrimaryScaleSetName",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is primaryScaleSetName used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both vmType and primaryScaleSetName are part of the cloud provider config for Azure for v1.10.

@@ -188,11 +186,13 @@ Write-AzureConfig()
"aadClientSecret": "$AADClientSecret",
"resourceGroup": "$global:ResourceGroup",
"location": "$Location",
"vmType": "$global:VmType",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is vmType used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@@ -346,7 +345,7 @@ $KubeletCommandLine
"@
} else {
$kubeStartStr += @"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for cleaning up. :)

@@ -0,0 +1,193 @@
{{if .IsStorageAccount}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the diff of this file with kubernetesagentresourcesvmss.t?

Copy link
Member Author

@sozercan sozercan Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

74c74
<       "resourceNameSuffix" : "[variables('winResourceNamePrefix')]",
---
>       "resourceNameSuffix" : "[variables('nameSuffix')]",
131,134c131,148
<           "computerNamePrefix": "[concat(substring(variables('nameSuffix'), 0, 5), 'acs')]",
<           {{GetKubernetesWindowsAgentCustomData .}}
<           "adminUsername": "[variables('windowsAdminUsername')]",
<           "adminPassword": "[variables('windowsAdminPassword')]"
---
>           "adminUsername": "[variables('username')]",
>           "computerNamePrefix": "[variables('{{.Name}}VMNamePrefix')]",
>           {{GetKubernetesAgentCustomData .}}
>           "linuxConfiguration": {
>               "disablePasswordAuthentication": "true",
>               "ssh": {
>                 "publicKeys": [
>                   {
>                     "keyData": "[parameters('sshRSAPublicKey')]",
>                     "path": "[variables('sshKeyPath')]"
>                   }
>                 ]
>               }
>             }
>             {{if HasLinuxSecrets}}
>               ,
>               "secrets": "[variables('linuxProfileSecrets')]"
>             {{end}}
137c151,153
<           {{GetDataDisks .}}
---
>           {{if not (UseAgentCustomImage .)}}
>             {{GetDataDisks .}}
>           {{end}}
139,142c155,162
<             "offer": "[variables('agentWindowsOffer')]",
<             "publisher": "[variables('agentWindowsPublisher')]",
<             "sku": "[variables('agentWindowsSku')]",
<             "version": "[variables('agentWindowsVersion')]"
---
>             {{if UseAgentCustomImage .}}
>             "id": "[resourceId(variables('{{.Name}}osImageResourceGroup'), 'Microsoft.Compute/images', variables('{{.Name}}osImageName'))]"
>             {{else}}
>             "offer": "[variables('{{.Name}}osImageOffer')]",
>             "publisher": "[variables('{{.Name}}osImagePublisher')]",
>             "sku": "[variables('{{.Name}}osImageSKU')]",
>             "version": "[variables('{{.Name}}osImageVersion')]"
>             {{end}}
163,165c183,185
<                 "publisher": "Microsoft.Compute",
<                 "type": "CustomScriptExtension",
<                 "typeHandlerVersion": "1.8",
---
>                 "publisher": "Microsoft.Azure.Extensions",
>                 "type": "CustomScript",
>                 "typeHandlerVersion": "2.0",
169c189
<                     "commandToExecute": "[concat('powershell.exe -ExecutionPolicy Unrestricted -command \"', '$arguments = ', variables('singleQuote'),'-MasterIP ',variables('kubernetesAPIServerIP'),' -KubeDnsServiceIp ',variables('kubeDnsServiceIp'),' -MasterFQDNPrefix ',variables('masterFqdnPrefix'),' -Location ',variables('location'),' -AgentKey ',variables('clientPrivateKey'),' -AADClientId ',variables('servicePrincipalClientId'),' -AADClientSecret ',variables('servicePrincipalClientSecret'),variables('singleQuote'), ' ; ', variables('windowsCustomScriptSuffix'), '\" > %SYSTEMDRIVE%\\AzureData\\CustomDataSetupScript.log 2>&1')]"
---
>                   "commandToExecute": "[concat(variables('provisionScriptParametersCommon'),' /usr/bin/nohup /bin/bash -c \"/bin/bash /opt/azure/containers/provision.sh >> /var/log/azure/cluster-provision.log 2>&1\"')]"
178c198
<                 "type": "ManagedIdentityExtensionForWindows",
---
>                 "type": "ManagedIdentityExtensionForLinux",

Copy link
Contributor

@JiangtianLi JiangtianLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. Generally lgtm. left a few comments.

@jackfrancis
Copy link
Member

Just one more request @sozercan: could you kindly add a validation that requires instanceMetadata to be false for k8s < 1.10.2? (I know that's basically everything now but that will become useful over time.)

Thanks so much for this, looks great!

@khenidak
Copy link
Contributor

@jackfrancis that is a bit of an overkill? we can run imds on 1.8 and 1.9

@CecileRobertMichon
Copy link
Contributor

Is the description for availabilityProfile at https://github.com/Azure/acs-engine/blob/master/docs/clusterdefinition.md#agentpoolprofiles still accurate & relevant?

Supported values are VirtualMachineScaleSets (default) and AvailabilitySet. For Kubernetes clusters before k8s version 1.10, use AvailabilitySet. Otherwise, you should use VirtualMachineScaleSets, unless you need features such as dynamic attached disks

@ghost ghost added the in progress label Apr 24, 2018
@sozercan
Copy link
Member Author

sozercan commented Apr 24, 2018

@jackfrancis added validation when using VMSS with k8s < v1.10.2

@CecileRobertMichon thanks, updated

@jackfrancis
Copy link
Member

@khenidak Sorry I wasn't clear, I meant require imds = false for vmss

I see you did that in a recent commit, thanks @sozercan!

Copy link
Member

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support vmss in kubernetes deployment
9 participants