This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Generic kubelet config interface (#1854)
* demonstrating a generic kubelet config interface

* setting the desired outcome

* filling out default kubelet config

* cleanup

* docs and blacklist

* docs

* dead code

* line continuation

* punt on blacklist

* revert

* revert

* agent nodes updates

* implement read-only kubelet config properties

* deprecate KubernetesConfig.HardEvictionThreshold

* deprecated KubernetesConfig NodeStatusUpdateFrequency

* agent kubelet doesn’t use register node or taints

* moving KubernetesConfig into master/agent namespace

* using MasterProfile reference to be consistent with agents

* optimize non-user configurable values

* fixed for agent pool-only

* NonMasqueradeCidr and docs

* table markdown

* more kubelet config defaults, documented

* restored validations

* windows-specific network-policy enforcement

* added plugins volume mount

* remove docs for hardEvictionThreshold property

* work toward v1.5 support

* enforce kubelet config key ordering for consistency

and to pass tests

* really enforcing > 1.5

* make + append = twice too many

* rearranging furniture

* systemd and ${}

* using a common kubelet config

* cleaner diff

* easier easier diff

* cruft

* need kubenet override

* rationalize defaults + overrides
jackfrancis committed Dec 19, 2017
1 parent cda9631 commit 3c70413
Showing 27 changed files with 4,646 additions and 230 deletions.
53 changes: 51 additions & 2 deletions docs/clusterdefinition.md
@@ -37,15 +37,16 @@ Here are the valid values for the orchestrator types:
|dnsServiceIP|no|IP address for kube-dns to listen on. If specified must be in the range of `serviceCidr`.|
|dockerBridgeSubnet|no|The specific IP and subnet used for allocating IP addresses for the docker bridge network created on the kubernetes master and agents. Default value is 172.17.0.1/16. This value is used to configure the docker daemon using the [--bip flag](https://docs.docker.com/engine/userguide/networking/default_network/custom-docker0).|
|serviceCidr|no|IP range for Service IPs. Default is "10.0.0.0/16". This range is never routed outside of a node, so it does not need to lie within clusterSubnet or the VNet.|
|nonMasqueradeCidr|no|CIDR block to exclude from default source NAT. Default is "10.0.0.0/8".|
|enableRbac|no|Enable [Kubernetes RBAC](https://kubernetes.io/docs/admin/authorization/rbac/) (boolean - default == false) |
|enableAggregatedAPIs|no|Enable [Kubernetes Aggregated APIs](https://kubernetes.io/docs/concepts/api-extension/apiserver-aggregation/). This is required by [Service Catalog](https://github.com/kubernetes-incubator/service-catalog/blob/master/README.md). (boolean - default == false) |
|maxPods|no|The maximum number of pods per node. The minimum valid value, necessary for running kube-system pods, is 5. Default value is 30 when networkPolicy equals azure, 110 otherwise.|
|gcHighThreshold|no|Sets the --image-gc-high-threshold value on the kubelet configuration. Default is 85. [See kubelet Garbage Collection](https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/) |
|gcLowThreshold|no|Sets the --image-gc-low-threshold value on the kubelet configuration. Default is 80. [See kubelet Garbage Collection](https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/) |
|hardEvictionThreshold|no|Sets the --eviction-hard value on the kubelet configuration. Default is `memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%`. [See Hard Eviction Thresholds](https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#hard-eviction-thresholds) |
|useInstanceMetadata|no|Use the Azure cloudprovider instance metadata service for appropriate resource discovery operations. Default is `true`.|
|addons|no|Configure Kubernetes addons (currently supported: tiller, kubernetes-dashboard). See `addons` configuration below.|
|kubeletConfig|no|Configure runtime options for the kubelet. See `kubeletConfig` below.|

#### addons

`addons` describes various addons configuration. It is a child property of `kubernetesConfig`. Below is a list of currently available addons:

@@ -134,6 +135,54 @@ Additionally above, we specified a custom docker image for tiller, let's say we

Finally, the `addons.enabled` boolean property was omitted above; that's by design. If you specify a `containers` configuration, acs-engine assumes you're enabling the addon. The very first example above demonstrates a simple "enable this addon with default configuration" declaration.

#### kubeletConfig

`kubeletConfig` declares runtime configuration for the kubelet running on all master and agent nodes. It is a generic key/value object, and a child property of `kubernetesConfig`. An example custom kubelet config:

```
"kubernetesConfig": {
    "kubeletConfig": {
        "--eviction-hard": "memory.available<250Mi,nodefs.available<20%,nodefs.inodesFree<10%"
    }
}
```

See the [kubelet reference](https://kubernetes.io/docs/reference/generated/kubelet/) for the supported kubelet options.
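Because `kubeletConfig` is a generic key/value object, it maps naturally onto a Go `map[string]string`. The following is a minimal sketch of decoding it, not the actual acs-engine api model: the `KubernetesConfig` struct here is a trimmed-down, hypothetical stand-in, and `parseKubernetesConfig` is an illustrative helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KubernetesConfig is a hypothetical, trimmed-down stand-in for the api model
// type; only the kubeletConfig property is modeled here.
type KubernetesConfig struct {
	KubeletConfig map[string]string `json:"kubeletConfig,omitempty"`
}

// parseKubernetesConfig decodes the contents of a kubernetesConfig JSON object.
func parseKubernetesConfig(raw []byte) (KubernetesConfig, error) {
	var cfg KubernetesConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{
		"kubeletConfig": {
			"--eviction-hard": "memory.available<250Mi,nodefs.available<20%,nodefs.inodesFree<10%"
		}
	}`)
	cfg, err := parseKubernetesConfig(raw)
	if err != nil {
		panic(err)
	}
	// Keys are kubelet flag names, values are the flag values as strings.
	fmt.Println(cfg.KubeletConfig["--eviction-hard"])
}
```

Keeping every value as a string means new kubelet flags need no schema change, at the cost of no type checking on the values.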

Below is a list of kubelet options that acs-engine will configure by default:

|kubelet option|default value|
|---|---|
|"--pod-infra-container-image"|"pause-amd64:<version>"|
|"--max-pods"|"110"|
|"--eviction-hard"|"memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%"|
|"--node-status-update-frequency"|"10s"|
|"--image-gc-high-threshold"|"85"|
|"--image-gc-low-threshold"|"80"|
|"--non-masquerade-cidr"|"10.0.0.0/8"|
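One way to think about these defaults is as a base map that user-supplied `kubeletConfig` entries are overlaid on. The sketch below illustrates that idea under stated assumptions: `defaultKubeletConfig` and `withDefaults` are hypothetical names, only a subset of the defaults from the table is included, and the real acs-engine logic may differ.

```go
package main

import "fmt"

// defaultKubeletConfig returns a subset of the documented defaults.
// A fresh map is returned on every call so callers cannot mutate shared state.
func defaultKubeletConfig() map[string]string {
	return map[string]string{
		"--max-pods":                     "110",
		"--eviction-hard":                "memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%",
		"--node-status-update-frequency": "10s",
	}
}

// withDefaults overlays user-supplied values on the defaults: any key the user
// sets wins, and any key the user omits keeps its default value.
func withDefaults(user map[string]string) map[string]string {
	merged := defaultKubeletConfig()
	for k, v := range user {
		merged[k] = v
	}
	return merged
}

func main() {
	merged := withDefaults(map[string]string{"--max-pods": "50"})
	fmt.Println(merged["--max-pods"], merged["--node-status-update-frequency"])
}
```

With the input above, `--max-pods` takes the user value `50` while `--node-status-update-frequency` keeps its default of `10s`.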

Below is a list of kubelet options that are *not* currently user-configurable, either because a higher order configuration vector is available that enforces kubelet configuration, or because a static configuration is required to build a functional cluster:

|kubelet option|default value|
|---|---|
|"--address"|"0.0.0.0"|
|"--azure-container-registry-config"|"/etc/kubernetes/azure.json"|
|"--allow-privileged"|"true"|
|"--pod-manifest-path"|"/etc/kubernetes/manifests"|
|"--cluster-domain"|"cluster.local"|
|"--cloud-config"|"/etc/kubernetes/azure.json"|
|"--cloud-provider"|"azure"|
|"--network-plugin"|"cni"|
|"--node-labels"|(based on Azure node metadata)|
|"--cgroups-per-qos"|"false"|
|"--enforce-node-allocatable"|""|
|"--kubeconfig"|"/var/lib/kubelet/kubeconfig"|
|"--register-node" (master nodes only)|"true"|
|"--register-with-taints" (master nodes only)|"node-role.kubernetes.io/master=true:NoSchedule"|
|"--feature-gates" (agent nodes only)|"Accelerators=true"|

Treat `kubeletConfig` as a powerful, generic convenience that comes with no operational guarantees. It is a manual tuning feature that enables low-level configuration of a Kubernetes cluster.
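The non-user-configurable options and the key ordering mentioned in the commit message can be sketched together. This is an illustration under assumptions, not the acs-engine implementation: `staticKubeletConfig`, `applyStatic`, and `renderFlags` are hypothetical names, only three of the static options are listed, and silently overriding a user value (rather than rejecting it) is an assumed behavior.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// staticKubeletConfig holds a few of the non-user-configurable values
// from the table above.
var staticKubeletConfig = map[string]string{
	"--address":        "0.0.0.0",
	"--cloud-provider": "azure",
	"--cluster-domain": "cluster.local",
}

// applyStatic copies cfg and then overlays the static values, so a
// user-supplied value for a static key is ignored (assumed behavior).
func applyStatic(cfg map[string]string) map[string]string {
	out := make(map[string]string, len(cfg)+len(staticKubeletConfig))
	for k, v := range cfg {
		out[k] = v
	}
	for k, v := range staticKubeletConfig {
		out[k] = v
	}
	return out
}

// renderFlags serializes cfg as space-separated key=value pairs. Keys are
// sorted because Go map iteration order is random; sorting keeps the
// generated output stable between runs.
func renderFlags(cfg map[string]string) string {
	keys := make([]string, 0, len(cfg)) // length 0, capacity len(cfg)
	for k := range cfg {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+cfg[k])
	}
	return strings.Join(parts, " ")
}

func main() {
	user := map[string]string{"--max-pods": "50", "--cloud-provider": "gce"}
	fmt.Println(renderFlags(applyStatic(user)))
	// prints: --address=0.0.0.0 --cloud-provider=azure --cluster-domain=cluster.local --max-pods=50
}
```

A string built this way is the kind of value that could be assigned to a single environment variable and expanded on the kubelet command line.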

### masterProfile
`masterProfile` describes the settings for master configuration.

4 changes: 2 additions & 2 deletions docs/kubernetes-large-clusters.md
@@ -46,8 +46,8 @@ The following configuration parameters are available in the `properties.orchestr
"kubernetesCtrlMgrRouteReconciliationPeriod": {
"value": "1m" // how often to reconcile cloudprovider-originating node routes
},
"kubernetesNodeStatusUpdateFrequency": {
"value": "1m" // how often kubelet posts node status to master
"kubeletConfig": {
"--node-status-update-frequency": "1m" // how often kubelet posts node status to master
}
```
The [examples/largeclusters/kubernetes.json](https://github.com/Azure/acs-engine/blob/master/examples/largeclusters/kubernetes.json) api model example suggests how you might opt into these large cluster features following the guidelines above.
6 changes: 4 additions & 2 deletions examples/largeclusters/kubernetes.json
@@ -5,7 +5,6 @@
"orchestratorType": "Kubernetes",
"orchestratorRelease": "1.6",
"kubernetesConfig": {
"nodeStatusUpdateFrequency": "1m",
"ctrlMgrNodeMonitorGracePeriod": "5m",
"ctrlMgrPodEvictionTimeout": "1m",
"ctrlMgrRouteReconciliationPeriod": "1m",
@@ -16,7 +15,10 @@
"cloudProviderBackoffExponent": 1.5,
"cloudProviderRateLimit": true,
"cloudProviderRateLimitQPS": 3,
"cloudProviderRateLimitBucket": 10
"cloudProviderRateLimitBucket": 10,
"kubeletConfig": {
"--node-status-update-frequency": "1m"
}
}
},
"masterProfile": {
26 changes: 2 additions & 24 deletions parts/k8s/kubernetesagentcustomdata.yml
@@ -103,36 +103,14 @@ write_files:
permissions: "0644"
owner: "root"
content: |
KUBELET_CLUSTER_DNS={{WrapAsVariable "kubeDNSServiceIP"}}
KUBELET_API_SERVERS=https://{{WrapAsVariable "kubernetesAPIServerIP"}}:443
KUBELET_CONFIG={{GetKubeletConfigKeyVals}}
KUBELET_IMAGE={{WrapAsVariable "kubernetesHyperkubeSpec"}}
KUBELET_NETWORK_PLUGIN=kubenet
KUBELET_MAX_PODS=110
DOCKER_OPTS=
CUSTOM_CMD=/bin/true
KUBELET_REGISTER_SCHEDULABLE=true
KUBELET_NODE_LABELS={{GetAgentKubernetesLabels . "',variables('labelResourceGroup'),'"}}
KUBELET_POD_INFRA_CONTAINER_IMAGE={{WrapAsVariable "kubernetesPodInfraContainerSpec"}}
KUBELET_HARD_EVICTION_THRESHOLD={{WrapAsVariable "kubernetesHardEvictionThreshold"}}
KUBELET_NODE_STATUS_UPDATE_FREQUENCY={{WrapAsVariable "kubernetesNodeStatusUpdateFrequency"}}
KUBE_CTRL_MGR_NODE_MONITOR_GRACE_PERIOD={{WrapAsVariable "kubernetesCtrlMgrNodeMonitorGracePeriod"}}
KUBE_CTRL_MGR_POD_EVICTION_TIMEOUT={{WrapAsVariable "kubernetesCtrlMgrPodEvictionTimeout"}}
KUBE_CTRL_MGR_ROUTE_RECONCILIATION_PERIOD={{WrapAsVariable "kubernetesCtrlMgrRouteReconciliationPeriod"}}
KUBELET_IMAGE_GC_HIGH_THRESHOLD={{WrapAsVariable "gchighthreshold"}}
KUBELET_IMAGE_GC_LOW_THRESHOLD={{WrapAsVariable "gclowthreshold"}}
{{if IsKubernetesVersionGe "1.6.0"}}
KUBELET_NON_MASQUERADE_CIDR=--non-masquerade-cidr={{WrapAsVariable "kubernetesNonMasqueradeCidr"}}
KUBELET_NON_MASQUERADE_CIDR={{WrapAsVariable "kubernetesNonMasqueradeCidr"}}
KUBELET_FEATURE_GATES=--feature-gates=Accelerators=true
{{if IsKubernetesVersionTilde "1.6.x"}}
KUBELET_FIX_43704_1=--cgroups-per-qos=false
KUBELET_FIX_43704_2=--enforce-node-allocatable=
KUBELET_FIX_43704_3=""
{{end}}
{{end}}
{{if UseCloudControllerManager }}
CLOUD_PROVIDER=external
{{else}}
CLOUD_PROVIDER=azure
{{end}}

- path: "/etc/systemd/system/kubelet.service"
23 changes: 3 additions & 20 deletions parts/k8s/kuberneteskubelet.service
@@ -33,30 +33,13 @@ ExecStart=/usr/bin/docker run \
--volume=/usr/libexec/kubernetes/kubelet-plugins:/usr/libexec/kubernetes/kubelet-plugins \
${KUBELET_IMAGE} \
/hyperkube kubelet \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--require-kubeconfig \
--pod-infra-container-image="${KUBELET_POD_INFRA_CONTAINER_IMAGE}" \
--address=0.0.0.0 \
--allow-privileged=true \
${KUBELET_FIX_43704_1} \
${KUBELET_FIX_43704_2}${KUBELET_FIX_43704_3} \
--enable-server \
--pod-manifest-path=/etc/kubernetes/manifests \
--cluster-dns=${KUBELET_CLUSTER_DNS} \
--cluster-domain=cluster.local \
--node-labels="${KUBELET_NODE_LABELS}" \
--cloud-provider=${CLOUD_PROVIDER} \
--cloud-config=/etc/kubernetes/azure.json \
--azure-container-registry-config=/etc/kubernetes/azure.json \
--network-plugin=${KUBELET_NETWORK_PLUGIN} \
--max-pods=${KUBELET_MAX_PODS} \
--eviction-hard="${KUBELET_HARD_EVICTION_THRESHOLD}" \
--node-status-update-frequency=${KUBELET_NODE_STATUS_UPDATE_FREQUENCY} \
--image-gc-high-threshold=${KUBELET_IMAGE_GC_HIGH_THRESHOLD} \
--image-gc-low-threshold=${KUBELET_IMAGE_GC_LOW_THRESHOLD} \
--v=2 ${KUBELET_FEATURE_GATES} \
${KUBELET_NON_MASQUERADE_CIDR} \
--non-masquerade-cidr=${KUBELET_NON_MASQUERADE_CIDR} \
$KUBELET_CONFIG \
${KUBELET_REGISTER_NODE} ${KUBELET_REGISTER_WITH_TAINTS}

[Install]
WantedBy=multi-user.target
WantedBy=multi-user.target
15 changes: 1 addition & 14 deletions parts/k8s/kuberneteskubelet1.5.service
@@ -33,26 +33,13 @@ ExecStart=/usr/bin/docker run \
--volume=/usr/libexec/kubernetes/kubelet-plugins:/usr/libexec/kubernetes/kubelet-plugins \
${KUBELET_IMAGE} \
/hyperkube kubelet \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--require-kubeconfig \
--pod-infra-container-image="${KUBELET_POD_INFRA_CONTAINER_IMAGE}" \
--address=0.0.0.0 \
--allow-privileged=true \
--enable-server \
--enable-debugging-handlers \
--pod-manifest-path=/etc/kubernetes/manifests \
--cluster-dns=${KUBELET_CLUSTER_DNS} \
--cluster-domain=cluster.local \
--register-schedulable=${KUBELET_REGISTER_SCHEDULABLE} \
--node-labels="${KUBELET_NODE_LABELS}" \
--cloud-provider=azure \
--cloud-config=/etc/kubernetes/azure.json \
--azure-container-registry-config=/etc/kubernetes/azure.json \
--hairpin-mode=promiscuous-bridge \
--network-plugin=${KUBELET_NETWORK_PLUGIN} \
--node-status-update-frequency=${KUBELET_NODE_STATUS_UPDATE_FREQUENCY} \
--image-gc-high-threshold=${KUBELET_IMAGE_GC_HIGH_THRESHOLD} \
--image-gc-low-threshold=${KUBELET_IMAGE_GC_LOW_THRESHOLD} \
${KUBELET_CONFIG} \
--v=2 ${KUBELET_FEATURE_GATES}

[Install]
25 changes: 2 additions & 23 deletions parts/k8s/kubernetesmastercustomdata.yml
@@ -232,40 +232,19 @@ write_files:
permissions: "0644"
owner: "root"
content: |
KUBELET_CLUSTER_DNS={{WrapAsVariable "kubeDNSServiceIP"}}
KUBELET_API_SERVERS={{WrapAsVerbatim "concat('https://', variables('masterPrivateIpAddrs')[copyIndex(variables('masterOffset'))], ':443')"}}
KUBELET_CONFIG={{GetKubeletConfigKeyVals}}
KUBELET_IMAGE={{WrapAsVariable "kubernetesHyperkubeSpec"}}
KUBELET_NETWORK_PLUGIN=
KUBELET_MAX_PODS=110
DOCKER_OPTS=
KUBELET_NODE_LABELS={{GetMasterKubernetesLabels "',variables('labelResourceGroup'),'"}}
KUBELET_POD_INFRA_CONTAINER_IMAGE={{WrapAsVariable "kubernetesPodInfraContainerSpec"}}
KUBELET_HARD_EVICTION_THRESHOLD={{WrapAsVariable "kubernetesHardEvictionThreshold"}}
KUBELET_NODE_STATUS_UPDATE_FREQUENCY={{WrapAsVariable "kubernetesNodeStatusUpdateFrequency"}}
KUBE_CTRL_MGR_NODE_MONITOR_GRACE_PERIOD={{WrapAsVariable "kubernetesCtrlMgrNodeMonitorGracePeriod"}}
KUBE_CTRL_MGR_POD_EVICTION_TIMEOUT={{WrapAsVariable "kubernetesCtrlMgrPodEvictionTimeout"}}
KUBE_CTRL_MGR_ROUTE_RECONCILIATION_PERIOD={{WrapAsVariable "kubernetesCtrlMgrRouteReconciliationPeriod"}}
KUBELET_IMAGE_GC_HIGH_THRESHOLD={{WrapAsVariable "gchighthreshold"}}
KUBELET_IMAGE_GC_LOW_THRESHOLD={{WrapAsVariable "gclowthreshold"}}
{{if IsKubernetesVersionGe "1.6.0"}}
{{if HasLinuxAgents}}
KUBELET_NON_MASQUERADE_CIDR=--non-masquerade-cidr={{WrapAsVariable "kubernetesNonMasqueradeCidr"}}
KUBELET_NON_MASQUERADE_CIDR={{WrapAsVariable "kubernetesNonMasqueradeCidr"}}
KUBELET_REGISTER_NODE=--register-node=true
KUBELET_REGISTER_WITH_TAINTS=--register-with-taints={{WrapAsVariable "registerWithTaints"}}
{{end}}
{{if IsKubernetesVersionTilde "1.6.x"}}
KUBELET_FIX_43704_1=--cgroups-per-qos=false
KUBELET_FIX_43704_2=--enforce-node-allocatable=
KUBELET_FIX_43704_3=""
{{end}}
{{else}}
KUBELET_REGISTER_SCHEDULABLE={{WrapAsVariable "registerSchedulable"}}
{{end}}
{{if UseCloudControllerManager }}
CLOUD_PROVIDER=external
{{else}}
CLOUD_PROVIDER=azure
{{end}}

- path: "/etc/systemd/system/kubelet.service"
permissions: "0644"
2 changes: 0 additions & 2 deletions parts/k8s/kubernetesmastervars.t
@@ -48,8 +48,6 @@
"kubernetesReschedulerCPULimit": "[parameters('kubernetesReschedulerCPULimit')]",
"kubernetesReschedulerMemoryLimit": "[parameters('kubernetesReschedulerMemoryLimit')]",
"kubernetesPodInfraContainerSpec": "[parameters('kubernetesPodInfraContainerSpec')]",
"kubernetesNodeStatusUpdateFrequency": "[parameters('kubernetesNodeStatusUpdateFrequency')]",
"kubernetesHardEvictionThreshold": "[parameters('kubernetesHardEvictionThreshold')]",
"kubernetesCtrlMgrNodeMonitorGracePeriod": "[parameters('kubernetesCtrlMgrNodeMonitorGracePeriod')]",
"kubernetesCtrlMgrPodEvictionTimeout": "[parameters('kubernetesCtrlMgrPodEvictionTimeout')]",
"kubernetesCtrlMgrRouteReconciliationPeriod": "[parameters('kubernetesCtrlMgrRouteReconciliationPeriod')]",
14 changes: 0 additions & 14 deletions parts/k8s/kubernetesparams.t
@@ -372,20 +372,6 @@
},
"type": "string"
},
"kubernetesNodeStatusUpdateFrequency": {
{{PopulateClassicModeDefaultValue "kubernetesNodeStatusUpdateFrequency"}}
"metadata": {
"description": "Kubelet config for node status update frequency interval."
},
"type": "string"
},
"kubernetesHardEvictionThreshold": {
{{PopulateClassicModeDefaultValue "kubernetesHardEvictionThreshold"}}
"metadata": {
"description": "Kubelet Hard Eviction threshold."
},
"type": "string"
},
"kubernetesCtrlMgrNodeMonitorGracePeriod": {
{{PopulateClassicModeDefaultValue "kubernetesCtrlMgrNodeMonitorGracePeriod"}}
"metadata": {
10 changes: 8 additions & 2 deletions pkg/acsengine/const.go
@@ -36,10 +36,14 @@ const (
// DefaultInternalLbStaticIPOffset specifies the offset of the internal LoadBalancer's IP
// address relative to the first consecutive Kubernetes static IP
DefaultInternalLbStaticIPOffset = 10
// NetworkPolicyNone is the string expression for no network policy
NetworkPolicyNone = "none"
// NetworkPluginKubenet is the string expression for kubenet network plugin
NetworkPluginKubenet = "kubenet"
// DefaultNetworkPolicy defines the network policy to use by default
DefaultNetworkPolicy = "none"
DefaultNetworkPolicy = NetworkPolicyNone
// DefaultNetworkPolicyWindows defines the network policy to use by default for clusters with Windows agent pools
DefaultNetworkPolicyWindows = "none"
DefaultNetworkPolicyWindows = NetworkPolicyNone
// DefaultKubernetesNodeStatusUpdateFrequency is 10s, see --node-status-update-frequency at https://kubernetes.io/docs/admin/kubelet/
DefaultKubernetesNodeStatusUpdateFrequency = "10s"
// DefaultKubernetesHardEvictionThreshold is memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%, see --eviction-hard at https://kubernetes.io/docs/admin/kubelet/
@@ -99,6 +103,8 @@ const (
DefaultReschedulerImage = "rescheduler:v0.3.1"
// DefaultReschedulerAddonName is the name of the rescheduler addon deployment
DefaultReschedulerAddonName = "rescheduler"
// DefaultKubernetesKubeletMaxPods is the max pods per kubelet
DefaultKubernetesKubeletMaxPods = 110
)

const (
