docs: refactor AKS installation instructions
BYOCNI was initially introduced as the preferred installation method for
Cilium on AKS clusters in d8259c1, at
the cost of doubling the number of AKS tabs in Getting Started and Helm
guides.

Since then:

- More tabs have been added, making it even more complex to navigate the
  options.
- BYOCNI is now GA (Azure CLI version 2.39.0).
- [Azure CNI Powered by
  Cilium](https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium)
  has been announced, further complicating the Cilium on AKS landscape.

To reduce bloat and streamline the AKS installation instructions, we
refactor them into a single tab. We use this opportunity to strongly
encourage users to adopt BYOCNI, and to prepare for the retirement of
legacy Azure IPAM.

Even though we could easily add it to the current structure, Azure CNI
Powered by Cilium has not been introduced as another installation option
here, because the Cilium distribution used in that case is maintained
and controlled by AKS, not by the Cilium community. We felt this was
sensible given the similar situation with GKE's Dataplane V2, which is
not listed in the Cilium documentation either.

Full details of the edits, since the diff is a bit hard to parse:

- Getting Started Guides:
  - We had 2 separate AKS tabs for creating AKS clusters (one for BYOCNI
    and one for Azure IPAM); now we have a single AKS tab, and it only
    explains how to create a cluster for BYOCNI. This is the only set of
    instructions that was removed, and it was removed intentionally to
    quietly steer users who do not have a cluster yet towards BYOCNI.
  - We had 2 separate AKS tabs for installing Cilium in an AKS cluster
    (one for BYOCNI and one for Azure IPAM), but they actually contained
    exactly the same installation instructions, because the Cilium CLI
    automatically detects which mode to use based on the cluster type.
    Now we have a single AKS tab with the installation instructions up
    front, followed by sub-tabs for both modes containing the rest of
    the previous information (requirements and limitations).
  - Putting the two together: if someone already has an AKS cluster that
    was not created with BYOCNI, installation will still work, and anyone
    who intentionally wants Azure IPAM can still figure it out from the
    requirements.
- Helm:
  - We had 2 separate AKS tabs for installing Cilium in an AKS cluster
    (one for BYOCNI and one for Azure IPAM). Now we have only one AKS
    tab with sub-tabs for both modes with all the previous info we had
    (installation instructions + requirements + limitations).
- Both:
  - BYOCNI is made even more explicit as the preferred option for
    installing Cilium on AKS, since it's now GA on AKS.
  - Azure IPAM has been renamed Legacy Azure IPAM to reinforce this, and
    in preparation for the fact that we might want to stop maintaining
    it.
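The consolidated flow this refactor steers users towards can be sketched as
a dry run. `byocni_create_cmd` is a hypothetical helper that only prints
the `az` invocations for a BYOCNI cluster rather than executing them; the
cluster name, region, and node count are illustrative, and only the
`--network-plugin none` flag is essential:

```shell
#!/usr/bin/env sh
# Dry-run sketch of the BYOCNI cluster creation flow the refactored docs
# recommend. The helper prints the commands instead of running them, so
# no Azure account is required to follow along.
byocni_create_cmd() {
  name="$1"
  group="${name}-group"
  printf 'az group create --name %s -l westus2\n' "$group"
  printf 'az aks create --resource-group %s --name %s --network-plugin none --node-count 2\n' \
    "$group" "$name"
}

byocni_create_cmd "demo"
```

`--network-plugin none` is what tells AKS not to install its own CNI
plugin, leaving the cluster ready for Cilium to be installed as the CNI.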

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
nbusseneau committed Jan 25, 2023
1 parent 5008dbc commit ac49ee7
Showing 6 changed files with 109 additions and 175 deletions.
71 changes: 12 additions & 59 deletions Documentation/gettingstarted/k8s-install-default.rst
@@ -56,14 +56,7 @@ to create a Kubernetes cluster locally or using a managed Kubernetes service:

Please make sure to read and understand the documentation page on :ref:`taint effects and unmanaged pods<taint_effects>`.

.. group-tab:: AKS (BYOCNI)

.. note::

BYOCNI is the preferred way to run Cilium on AKS, however integration
with the Azure stack via the :ref:`Azure IPAM<ipam_azure>` is not
available. If you require Azure IPAM, refer to the AKS (Azure IPAM)
installation.
.. group-tab:: AKS

The following commands create a Kubernetes cluster using `Azure
Kubernetes Service <https://docs.microsoft.com/en-us/azure/aks/>`_ with
@@ -74,11 +67,6 @@ to create a Kubernetes cluster locally or using a managed Kubernetes service:
<https://docs.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli>`_
for more details about BYOCNI prerequisites / implications.

.. note::

BYOCNI requires the ``aks-preview`` CLI extension with version >=
0.5.55, which itself requires an ``az`` CLI version >= 2.32.0 .

.. code-block:: bash
export NAME="$(whoami)-$RANDOM"
@@ -94,43 +82,6 @@ to create a Kubernetes cluster locally or using a managed Kubernetes service:
# Get the credentials to access the cluster with kubectl
az aks get-credentials --resource-group "${AZURE_RESOURCE_GROUP}" --name "${NAME}"
.. group-tab:: AKS (Azure IPAM)

.. note::

:ref:`Azure IPAM<ipam_azure>` offers integration with the Azure stack
but is not the preferred way to run Cilium on AKS. If you do not
require Azure IPAM, we recommend you to switch to the AKS (BYOCNI)
installation.

The following commands create a Kubernetes cluster using `Azure
Kubernetes Service <https://docs.microsoft.com/en-us/azure/aks/>`_. See
`Azure Cloud CLI
<https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest>`_
for instructions on how to install ``az`` and prepare your account.

.. code-block:: bash
export NAME="$(whoami)-$RANDOM"
export AZURE_RESOURCE_GROUP="${NAME}-group"
az group create --name "${AZURE_RESOURCE_GROUP}" -l westus2
# Create AKS cluster
az aks create \
--resource-group "${AZURE_RESOURCE_GROUP}" \
--name "${NAME}" \
--network-plugin azure \
--node-count 2
# Get the credentials to access the cluster with kubectl
az aks get-credentials --resource-group "${AZURE_RESOURCE_GROUP}" --name "${NAME}"
.. attention::

Do NOT specify the ``--network-policy`` flag when creating the
cluster, as this will cause the Azure CNI plugin to install unwanted
iptables rules.

.. group-tab:: EKS

The following commands create a Kubernetes cluster with ``eksctl``
@@ -271,9 +222,7 @@ You can install Cilium on any Kubernetes cluster. Pick one of the options below:
cilium install
.. group-tab:: AKS (BYOCNI)

.. include:: ../installation/requirements-aks-byocni.rst
.. group-tab:: AKS

**Install Cilium:**

@@ -283,17 +232,21 @@ You can install Cilium on any Kubernetes cluster. Pick one of the options below:
cilium install --azure-resource-group "${AZURE_RESOURCE_GROUP}"
.. group-tab:: AKS (Azure IPAM)
The Cilium CLI will automatically install Cilium using one of the
following installation modes based on the ``--network-plugin``
configuration detected from the AKS cluster:

.. include:: ../installation/requirements-aks-azure-ipam.rst
.. include:: ../installation/requirements-aks.rst

**Install Cilium:**
.. tabs::

Install Cilium into the AKS cluster:
.. tab:: BYOCNI

.. code-block:: shell-session
.. include:: ../installation/requirements-aks-byocni.rst

cilium install --azure-resource-group "${AZURE_RESOURCE_GROUP}"
.. tab:: Legacy Azure IPAM

.. include:: ../installation/requirements-aks-azure-ipam.rst
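The automatic mode detection described above can be illustrated with a
small sketch. This is not the Cilium CLI's actual code, just a
hypothetical mapping from the cluster's `--network-plugin` value to the
installation mode named in the docs:

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the mode selection the Cilium CLI performs:
# it inspects the AKS cluster's network plugin and picks an install
# mode accordingly. In practice the value would come from something like:
#   az aks show -g <group> -n <name> --query networkProfile.networkPlugin -o tsv
detect_install_mode() {
  case "$1" in
    none)  echo "BYOCNI" ;;
    azure) echo "Legacy Azure IPAM" ;;
    *)     echo "unsupported" ;;
  esac
}

detect_install_mode none
```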

.. group-tab:: EKS

104 changes: 55 additions & 49 deletions Documentation/installation/k8s-install-helm.rst
@@ -81,73 +81,79 @@ Install Cilium
* Reconfigure kubelet to run in CNI mode
* Mount the eBPF filesystem

.. group-tab:: AKS (BYOCNI)
.. group-tab:: AKS

.. include:: requirements-aks-byocni.rst
.. include:: ../installation/requirements-aks.rst

**Install Cilium:**
.. tabs::

Deploy Cilium release via Helm:
.. tab:: BYOCNI

.. parsed-literal::
.. include:: ../installation/requirements-aks-byocni.rst

helm install cilium |CHART_RELEASE| \\
--namespace kube-system \\
--set aksbyocni.enabled=true \\
--set nodeinit.enabled=true
**Install Cilium:**

.. group-tab:: AKS (Azure IPAM)
Deploy Cilium release via Helm:

.. include:: requirements-aks-azure-ipam.rst
.. parsed-literal::
**Create a Service Principal:**
helm install cilium |CHART_RELEASE| \\
--namespace kube-system \\
--set aksbyocni.enabled=true \\
--set nodeinit.enabled=true
In order to allow cilium-operator to interact with the Azure API, a
Service Principal with ``Contributor`` privileges over the AKS cluster is
required (see :ref:`Azure IPAM required privileges <ipam_azure_required_privileges>`
for more details). It is recommended to create a dedicated Service
Principal for each Cilium installation with minimal privileges over the
AKS node resource group:
.. tab:: Legacy Azure IPAM

.. code-block:: shell-session
.. include:: ../installation/requirements-aks-azure-ipam.rst

AZURE_SUBSCRIPTION_ID=$(az account show --query "id" --output tsv)
AZURE_NODE_RESOURCE_GROUP=$(az aks show --resource-group ${RESOURCE_GROUP} --name ${CLUSTER_NAME} --query "nodeResourceGroup" --output tsv)
AZURE_SERVICE_PRINCIPAL=$(az ad sp create-for-rbac --scopes /subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${AZURE_NODE_RESOURCE_GROUP} --role Contributor --output json --only-show-errors)
AZURE_TENANT_ID=$(echo ${AZURE_SERVICE_PRINCIPAL} | jq -r '.tenant')
AZURE_CLIENT_ID=$(echo ${AZURE_SERVICE_PRINCIPAL} | jq -r '.appId')
AZURE_CLIENT_SECRET=$(echo ${AZURE_SERVICE_PRINCIPAL} | jq -r '.password')
**Create a Service Principal:**

.. note::
In order to allow cilium-operator to interact with the Azure API, a
Service Principal with ``Contributor`` privileges over the AKS cluster is
required (see :ref:`Azure IPAM required privileges <ipam_azure_required_privileges>`
for more details). It is recommended to create a dedicated Service
Principal for each Cilium installation with minimal privileges over the
AKS node resource group:

The ``AZURE_NODE_RESOURCE_GROUP`` node resource group is *not* the
resource group of the AKS cluster. A single resource group may hold
multiple AKS clusters, but each AKS cluster regroups all resources in
an automatically managed secondary resource group. See `Why are two
resource groups created with AKS? <https://docs.microsoft.com/en-us/azure/aks/faq#why-are-two-resource-groups-created-with-aks>`__
for more details.
.. code-block:: shell-session
This ensures the Service Principal only has privileges over the AKS
cluster itself and not any other resources within the resource group.
AZURE_SUBSCRIPTION_ID=$(az account show --query "id" --output tsv)
AZURE_NODE_RESOURCE_GROUP=$(az aks show --resource-group ${RESOURCE_GROUP} --name ${CLUSTER_NAME} --query "nodeResourceGroup" --output tsv)
AZURE_SERVICE_PRINCIPAL=$(az ad sp create-for-rbac --scopes /subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${AZURE_NODE_RESOURCE_GROUP} --role Contributor --output json --only-show-errors)
AZURE_TENANT_ID=$(echo ${AZURE_SERVICE_PRINCIPAL} | jq -r '.tenant')
AZURE_CLIENT_ID=$(echo ${AZURE_SERVICE_PRINCIPAL} | jq -r '.appId')
AZURE_CLIENT_SECRET=$(echo ${AZURE_SERVICE_PRINCIPAL} | jq -r '.password')
**Install Cilium:**
.. note::

Deploy Cilium release via Helm:
The ``AZURE_NODE_RESOURCE_GROUP`` node resource group is *not* the
resource group of the AKS cluster. A single resource group may hold
multiple AKS clusters, but each AKS cluster regroups all resources in
an automatically managed secondary resource group. See `Why are two
resource groups created with AKS? <https://docs.microsoft.com/en-us/azure/aks/faq#why-are-two-resource-groups-created-with-aks>`__
for more details.

.. parsed-literal::
This ensures the Service Principal only has privileges over the AKS
cluster itself and not any other resources within the resource group.

helm install cilium |CHART_RELEASE| \\
--namespace kube-system \\
--set azure.enabled=true \\
--set azure.resourceGroup=$AZURE_NODE_RESOURCE_GROUP \\
--set azure.subscriptionID=$AZURE_SUBSCRIPTION_ID \\
--set azure.tenantID=$AZURE_TENANT_ID \\
--set azure.clientID=$AZURE_CLIENT_ID \\
--set azure.clientSecret=$AZURE_CLIENT_SECRET \\
--set tunnel=disabled \\
--set ipam.mode=azure \\
--set enableIPv4Masquerade=false \\
--set nodeinit.enabled=true
**Install Cilium:**

Deploy Cilium release via Helm:

.. parsed-literal::
helm install cilium |CHART_RELEASE| \\
--namespace kube-system \\
--set azure.enabled=true \\
--set azure.resourceGroup=$AZURE_NODE_RESOURCE_GROUP \\
--set azure.subscriptionID=$AZURE_SUBSCRIPTION_ID \\
--set azure.tenantID=$AZURE_TENANT_ID \\
--set azure.clientID=$AZURE_CLIENT_ID \\
--set azure.clientSecret=$AZURE_CLIENT_SECRET \\
--set tunnel=disabled \\
--set ipam.mode=azure \\
--set enableIPv4Masquerade=false \\
--set nodeinit.enabled=true
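The `--set` flags above can equivalently be kept in a values file passed
to Helm with `-f values.yaml`. This is a hypothetical fragment mirroring
those flags; the placeholders stand in for the environment-specific
values gathered in the Service Principal step and must be replaced:

```yaml
# Hypothetical values.yaml equivalent of the --set flags shown above
# (placeholders must be replaced with your own Azure identifiers).
azure:
  enabled: true
  resourceGroup: "<node-resource-group>"
  subscriptionID: "<subscription-id>"
  tenantID: "<tenant-id>"
  clientID: "<client-id>"
  clientSecret: "<client-secret>"
tunnel: disabled
ipam:
  mode: azure
enableIPv4Masquerade: false
nodeinit:
  enabled: true
```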
.. group-tab:: EKS

67 changes: 22 additions & 45 deletions Documentation/installation/requirements-aks-azure-ipam.rst
@@ -1,65 +1,42 @@
To install Cilium on `Azure Kubernetes Service (AKS) <https://docs.microsoft.com/en-us/azure/aks/>`_
with Azure integration via :ref:`Azure IPAM<ipam_azure>`, perform the following
steps:

**Default Configuration:**

=============== =================== ==============
Datapath IPAM Datastore
=============== =================== ==============
Direct Routing Azure IPAM Kubernetes CRD
=============== =================== ==============

.. note::

:ref:`Azure IPAM<ipam_azure>` offers integration with the Azure stack but is
not the preferred way to run Cilium on AKS. If you do not require Azure IPAM,
we recommend you to switch to the AKS (BYOCNI) installation.

.. tip::

If you want to chain Cilium on top of the Azure CNI, refer to the guide
:ref:`chaining_azure`.

**Requirements:**

* The AKS cluster must be created with ``--network-plugin azure`` for
compatibility with Cilium. The Azure network plugin will be replaced with
Cilium by the installer.
* The AKS cluster must be created with ``--network-plugin azure``. The
Azure network plugin will be replaced with Cilium by the installer.

**Limitations:**

* All VMs and VM scale sets used in a cluster must belong to the same resource
group.
* All VMs and VM scale sets used in a cluster must belong to the same
resource group.

* Adding new nodes to node pools might result in application pods being
scheduled on the new nodes before Cilium is ready to properly manage them.
The only way to fix this is either by making sure application pods are not
scheduled on new nodes before Cilium is ready, or by restarting any unmanaged
pods on the nodes once Cilium is ready.
scheduled on the new nodes before Cilium is ready to properly manage
them. The only way to fix this is either by making sure application pods
are not scheduled on new nodes before Cilium is ready, or by restarting
any unmanaged pods on the nodes once Cilium is ready.

Ideally we would recommend node pools should be tainted with
``node.cilium.io/agent-not-ready=true:NoExecute`` to ensure application pods
will only be scheduled/executed once Cilium is ready to manage them (see
:ref:`Considerations on node pool taints and unmanaged pods <taint_effects>`
``node.cilium.io/agent-not-ready=true:NoExecute`` to ensure application
pods will only be scheduled/executed once Cilium is ready to manage them
(see :ref:`Considerations on node pool taints and unmanaged pods <taint_effects>`
for more details), however this is not an option on AKS clusters:

* It is not possible to assign custom node taints such as
``node.cilium.io/agent-not-ready=true:NoExecute`` to system node pools,
cf. `Azure/AKS#2578 <https://github.com/Azure/AKS/issues/2578>`_: only
``CriticalAddonsOnly=true:NoSchedule`` is available for our use case. To
make matters worse, it is not possible to assign taints to the initial node
pool created for new AKS clusters, cf.
``node.cilium.io/agent-not-ready=true:NoExecute`` to system node
pools, cf. `Azure/AKS#2578 <https://github.com/Azure/AKS/issues/2578>`_:
only ``CriticalAddonsOnly=true:NoSchedule`` is available for our use
case. To make matters worse, it is not possible to assign taints to
the initial node pool created for new AKS clusters, cf.
`Azure/AKS#1402 <https://github.com/Azure/AKS/issues/1402>`_.

* Custom node taints on user node pools cannot be properly managed at will
anymore, cf. `Azure/AKS#2934 <https://github.com/Azure/AKS/issues/2934>`_.
* Custom node taints on user node pools cannot be properly managed at
will anymore, cf. `Azure/AKS#2934 <https://github.com/Azure/AKS/issues/2934>`_.

* These issues prevent usage of our previously recommended scenario via
replacement of initial system node pool with
``CriticalAddonsOnly=true:NoSchedule`` and usage of additional user
node pools with ``node.cilium.io/agent-not-ready=true:NoExecute``.

We do not have a standard and foolproof alternative to recommend, hence the
only solution is to craft a custom mechanism that will work in your
environment to handle this scenario when adding new nodes to AKS clusters.
We do not have a standard and foolproof alternative to recommend, hence
the only solution is to craft a custom mechanism that will work in your
environment to handle this scenario when adding new nodes to AKS
clusters.
24 changes: 3 additions & 21 deletions Documentation/installation/requirements-aks-byocni.rst
@@ -1,23 +1,5 @@
To install Cilium on `Azure Kubernetes Service (AKS) <https://docs.microsoft.com/en-us/azure/aks/>`_
in `Bring your own CNI <https://docs.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli>`_
mode, perform the following steps:

**Default Configuration:**

=============== =================== ==============
Datapath IPAM Datastore
=============== =================== ==============
Encapsulation Cluster Pool Kubernetes CRD
=============== =================== ==============

.. note::

BYOCNI is the preferred way to run Cilium on AKS, however integration with
the Azure stack via the :ref:`Azure IPAM<ipam_azure>` is not available. If
you require Azure IPAM, refer to the AKS (Azure IPAM) installation.

**Requirements:**

* The AKS cluster must be created with ``--network-plugin none`` (BYOCNI). See
the `Bring your own CNI documentation <https://docs.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli>`_
for more details about BYOCNI prerequisites / implications.
* The AKS cluster must be created with ``--network-plugin none``. See the
`Bring your own CNI <https://docs.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli>`_
documentation for more details about BYOCNI prerequisites / implications.
14 changes: 14 additions & 0 deletions Documentation/installation/requirements-aks.rst
@@ -0,0 +1,14 @@
**Default Configuration:**

============================= =============== =================== ==============
Mode (``--network-plugin``) Datapath IPAM Datastore
============================= =============== =================== ==============
BYOCNI (``none``) Encapsulation Cluster Pool Kubernetes CRD
Legacy Azure IPAM (``azure``) Direct Routing Azure IPAM Kubernetes CRD
============================= =============== =================== ==============

Using `Bring your own CNI <https://docs.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli>`_
is the preferred way to run Cilium on `Azure Kubernetes Service (AKS) <https://docs.microsoft.com/en-us/azure/aks/>`_,
however integration with the Azure stack via the :ref:`Azure IPAM<ipam_azure>`
is not available and will only work with clusters not using BYOCNI. While still
maintained for now, this mode is considered legacy.
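The table above can be read programmatically for an existing cluster by
querying its network plugin. The `az aks show` query in the comment is
real; `aks_defaults` is a hypothetical helper that emulates the mapping
so it can be shown without an Azure account:

```shell
#!/usr/bin/env sh
# Sketch: map an AKS cluster's network plugin (as returned by
#   az aks show -g <group> -n <name> --query networkProfile.networkPlugin -o tsv
# ) to the default datapath/IPAM combination from the table above.
aks_defaults() {
  case "$1" in
    none)  echo "Encapsulation / Cluster Pool" ;;
    azure) echo "Direct Routing / Azure IPAM" ;;
    *)     echo "unknown" ;;
  esac
}

aks_defaults azure
```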
4 changes: 3 additions & 1 deletion Documentation/network/concepts/ipam/azure.rst
Expand Up @@ -12,7 +12,9 @@ Azure IPAM

.. note::

Azure IPAM is not compatible with AKS clusters created in BYOCNI mode.
While still maintained for now, Azure IPAM is considered legacy and is not
compatible with AKS clusters created in `Bring your own CNI <https://docs.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli>`_
mode. Using BYOCNI is the preferred way to install Cilium on AKS.

The Azure IPAM allocator is specific to Cilium deployments running in the Azure
cloud and performs IP allocation based on `Azure Private IP addresses
