
docs: upgrade + cluster-autoscaler notes #381

Merged · 5 commits · Jan 31, 2019
docs/topics/upgrade.md (12 additions, 2 deletions)

````diff
@@ -76,6 +76,16 @@ For example,
     --client-secret xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
 ```
 
-### What can go wrong
+## Known Limitations
 
-By its nature, the upgrade operation is long running and potentially could fail for various reasons, such as temporary lack of resources, etc. In this case, rerun the command. The `upgrade` command is idempotent, and will pick up execution from the point it failed on.
+### Manual reconciliation
+
+The upgrade operation is long running and, for large clusters, more susceptible to single operational failures. This follows from the design of upgrade, which enumerates, one at a time, through each node in the cluster; a transient Azure resource allocation error could thus interrupt the successful progression of the overall transaction. At present, the upgrade operation is implemented to "fail fast", so if a well-formed upgrade operation fails before completing, it can be manually retried by invoking the exact same command-line arguments as were sent originally. The upgrade operation will enumerate through the cluster nodes, skipping any nodes that have already been upgraded to the desired Kubernetes version. Those nodes that still match the *original* Kubernetes version will then, one at a time, be cordoned, drained, and upgraded to the desired version. Put another way, an upgrade command is designed to be idempotent across retry scenarios.
+
+### Cluster-autoscaler + VMSS
````
**Contributor:**

> Let's add a link/reference to this doc in the cluster-autoscaler doc `examples/addons/cluster-autoscaler/README.md`

**Member:**

> yep, we can add one after this is merged.

**Contributor:**

> why not add it in the same PR?

**Member:**

> sorry, I meant: add a link in the cluster-autoscaler repo back to here.
>
> And yep, in examples, it could be added in the same PR.


````diff
+There are known limitations with VMSS cluster-autoscaler scenarios and upgrade. Our current guidance is not to use `aks-engine upgrade` on clusters with `cluster-autoscaler` functionality. See [here](https://github.com/Azure/aks-engine/issues/400) for more information and to track progress on the issues related to these limitations.
+
+### Cluster-autoscaler + Availability Set
+
+We don't recommend using `aks-engine upgrade` at this time on clusters that run `cluster-autoscaler` on Availability Set (non-VMSS) agent pools.
````
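The retry guidance added above amounts to re-invoking the original command unchanged. A minimal sketch, assuming the service principal auth flow from the doc's earlier example; all values are placeholders, and exact flag names should be checked against your aks-engine version:

```sh
# Identical to the command that failed: aks-engine skips nodes already on the
# target version and resumes with the first node still on the original version.
aks-engine upgrade \
  --subscription-id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --api-model _output/mycluster/apimodel.json \
  --location westus2 \
  --resource-group mycluster-rg \
  --upgrade-version 1.12.5 \
  --client-id xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --client-secret xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```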
examples/addons/cluster-autoscaler/README.md (5 additions, 1 deletion)

````diff
@@ -7,9 +7,13 @@
 
 This is the Kubernetes Cluster Autoscaler add-on for Virtual Machine Scale Sets. Add this add-on to your json file as shown below to automatically enable cluster autoscaler in your new Kubernetes cluster.
 
+See [this document](../../../docs/topics/upgrade.md) for details on known limitations with `aks-engine upgrade` and VMSS.
+
 To use this add-on, make sure your cluster's Kubernetes version is 1.10 or above and your agent pool `availabilityProfile` is set to `VirtualMachineScaleSets`. By default, the first agent pool will autoscale the node count between 1 and 5. You can override these settings in the `config` section of the `cluster-autoscaler` add-on.
 
-> At this time, only the primaryScaleSet (the first agent pool) is monitored by the autoscaler. To configure autoscale to monitor (and scale) other node pools, you must manually edit the autoscaler YAML at `/etc/kubernetes/addons` on each master node. See the [cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md) docs for guidance.
+> At this time, only the primaryScaleSet (the first agent pool) is monitored by the autoscaler. If you use this add-on on a single agent pool cluster, later add more VMSS agent pools manually, and want the cluster-autoscaler add-on to monitor (and scale) those other node pools, you must manually edit the autoscaler YAML at `/etc/kubernetes/addons` on each master node.
+
+> If you are creating a cluster with multiple VMSS agent pools, or you plan to manually evolve your agent pool count over time, we recommend that you not use this add-on. See the [cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md) docs for guidance on how to craft a cluster-autoscaler specification that can accommodate a cluster with multiple VMSS agent pools.
 
 The following is an example:
````
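To illustrate the `config` override mentioned above, here is a minimal apimodel sketch. The `minNodes`/`maxNodes` keys are assumptions based on typical aks-engine addon configuration of this era and may differ by version; the example file in this directory is authoritative (JSON permits no inline comments, so the hedge lives here):

```json
{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
        "addons": [
          {
            "name": "cluster-autoscaler",
            "enabled": true,
            "config": {
              "minNodes": "1",
              "maxNodes": "5"
            }
          }
        ]
      }
    }
  }
}
```

With these values the first agent pool would scale between 1 and 5 nodes, matching the defaults described above.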

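For the manual edit described in the first note above, the upstream cluster-autoscaler Azure cloud provider expresses the pools to watch as repeated `--nodes=<min>:<max>:<scale set name>` arguments on the autoscaler container. A hedged sketch of the relevant Deployment fragment; the image tag and scale set names are hypothetical placeholders:

```yaml
# Fragment of a cluster-autoscaler Deployment spec (e.g. under /etc/kubernetes/addons).
# One --nodes flag per VMSS agent pool to monitor; scale set names are placeholders.
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/cluster-autoscaler:v1.13.1  # hypothetical tag; match your Kubernetes version
    command:
      - ./cluster-autoscaler
      - --v=3
      - --cloud-provider=azure
      - --nodes=1:5:k8s-agentpool1-24760417-vmss
      - --nodes=1:10:k8s-agentpool2-24760417-vmss
```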