[Feature Request] AKS support of BYO user-assigned-identity for Managed Identity support #1591

Closed
jluk opened this issue May 5, 2020 · 32 comments

Comments

@jluk
Contributor

jluk commented May 5, 2020

What happened:
In the current Managed Identity model, only AKS created identities are supported.

This blocks enterprise scenarios where a dedicated networking team provides network permissions but can't assign permissions to an identity that can be passed to an app team prior to creating the cluster. This requirement extends to any permissions that should be granted to a cluster identity prior to cluster creation.

The goal of this issue is to enable a user to bring their own user assigned identity which must have all necessary permissions to be used in the cluster, similar to bringing your own SP today.

Addresses:
#1557
#1542

@jluk jluk added this to Planned (Committed) in Azure Kubernetes Service Roadmap (Public) May 5, 2020
@jluk jluk moved this from Planned (Committed) to In Progress (Development) in Azure Kubernetes Service Roadmap (Public) May 19, 2020
@jluk jluk changed the title [Feature Request] AKS support of user-assigned-identity for Managed Identity support [Feature Request] AKS support of BYO user-assigned-identity for Managed Identity support May 29, 2020
@Tbohunek

Tbohunek commented Jun 23, 2020

Hello @jluk, thanks for a super quick response.
This is an urgent issue for us. We were using MSI on pre-Prod deployments in a scenario which is now considered BYO RT, and our future deployments are now blocked. The issue is that we should go to Prod in a matter of days, and we cannot deploy a functioning cluster now. Falling back to SPs has its own issues and would require us to redeploy the cluster once MSI is supported again.

In other words, release of BYO RT broke previously working functionality.

@TomGeske

@Tbohunek: current ETA is that we are planning to release public preview for BYO control plane MI this month.

@jluk
Contributor Author

jluk commented Jun 23, 2020

@Tbohunek the BYO RT functionality should have only impacted net-new scenarios in which you brought an existing subnet/rt to a cluster and AKS will now use the existing subnet.

Can you clarify what is breaking? If you are deploying MSI clusters today I presume you are manually adding routes to the route table AKS creates on your behalf, which should continue to work.

@Tbohunek

Tbohunek commented Jun 23, 2020

@jluk We have a subnet subnet1 with Route Table rt1 assigned. We deployed the AKS cluster cluster1; this created Route Table aks-agentpool-xxx-rt in AKS RG mc_RG_xxx, which we then script-assigned to subnet1.

Right now this is seen as a BYO RT scenario, and AKS no longer creates the aks-agentpool-xxx route table.

@Tbohunek

What I see is that you need to update the AKS deployment chain of events so that the Managed Identity is created first and given permissions on the subnet and on the BYO RT before AKS begins to create routes in the BYO RT.

That doesn't sound too hard, but I know there are also issues that AKS deployment did not create Role assignments for the SP nor did it assign the AKS-managed RT to the subnet afterwards, even though it should have. I need to verify that this actually works now. #400

@Tbohunek

Hey,
So we did a workaround (un-peer the vnet from the hub to stop BGP routing and avoid using a custom RT, deploy, then peer again). This seems to work for deploying with Managed Identities.
However, @jluk, we noticed a defect where we are not able to provision istio ingress gateways because the Managed Identity is missing permission to read the subnet.

LoadBalancer    65s (x5 over 2m20s)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  65s (x5 over 2m20s)  service-controller  Error syncing load balancer: failed to ensure load balancer: network.SubnetsClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'xxx' with object id 'xxx' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/subnets/read' over scope '/subscriptions/xxx/resourceGroups/rg-xxx/providers/Microsoft.Network/virtualNetworks/vnet-xxxx/subnets/sn-xxx' or the scope is invalid. If access was recently granted, please refresh your credentials."

Is this already known? Manually adding role assignment works for the moment.
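
For readers hitting the same 403, the manual role assignment mentioned above can be sketched as follows. This is only an illustration, not something posted in the thread: the helper names `subnet_scope` and `role_assignment_command` are hypothetical, the placeholder IDs mirror the redacted values in the error, and the choice of the built-in "Network Contributor" role is an assumption (a narrower role that includes `Microsoft.Network/virtualNetworks/subnets/read` may be enough). The script only builds and prints an `az role assignment create` command; it does not run it.

```python
import shlex


def subnet_scope(subscription_id, resource_group, vnet_name, subnet_name):
    """Build the ARM resource ID of a subnet, i.e. the scope named in the 403 error."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet_name}"
        f"/subnets/{subnet_name}"
    )


def role_assignment_command(principal_object_id, scope, role="Network Contributor"):
    """Return the Azure CLI argument list that grants `role` on `scope`.

    Assumption: "Network Contributor" is used here for illustration only.
    """
    return [
        "az", "role", "assignment", "create",
        "--assignee", principal_object_id,
        "--role", role,
        "--scope", scope,
    ]


# Placeholder values copied from the redacted error message.
scope = subnet_scope("xxx", "rg-xxx", "vnet-xxxx", "sn-xxx")
print(shlex.join(role_assignment_command("xxx", scope)))
```

The object id to pass as `--assignee` is the one reported in the error message ("The client 'xxx' with object id 'xxx'").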

@jluk
Contributor Author

jluk commented Jun 26, 2020

@Tbohunek thanks for raising, we're looking into the missing permission for managed identity cc @TomGeske

@Tbohunek

@jluk Cool, thanks. Do you also plan to include the required permission on BYO RT and Subnet, or are those two planned to be kept manual?

ckittel pushed a commit to mspnp/aks-baseline that referenced this issue Jul 9, 2020
* assign ntw contrib role to the cluster system-assigned id

Why? in the current Managed Identity model, only system assigned identities are supported.

Azure/AKS#1591
Azure/AKS#1557

* deploy Traefik version v2.2.1 as an internal load balancer

  - observe ingress resources in a0008 only
  - select user nodepool for ingress controller. This is under the premise that
everything we bring to the cluster should land in user nodepools
  - deploy traefik service as an ingress load balancer: https://docs.microsoft.com/en-us/azure/aks/internal-lb#create-an-internal-load-balancer
  - configure Traefik and workload route with TLS v1.2 or higher with SNI enabled
    * use wildcard default self signed certificate: *.bicycle.contoso.com
    * create secret with tls cert

```
                                     |            AKS             |
hello.bicycle.contoso.com          --|   Internal Load Balancer   |-> <returns 404 but has a matching certificate>
                                     |        10.10.4.X           |
bu0001a0008-00.bicycle.contoso.com --| Traefik Ingress Controller |-> aspnetweb-service:80
                                     |           Https            |
```

  - HA:
    *  traefik match number of nodes as replicas (2) 
    *  inform the scheduler that all the workload replicas are desired to be co-located with traefik pods
    * prevent from co-locating replicas of workload on a single node
    * prevent from co-locating replicas of traefik on a single node


```
  System        User          User
  Node          Node 1        Node 2

+---------+  +----------+  +----------+
|   ...   |  |traefik|1 |  |traefik|2 |
+---------+  +----------+  +----------+

             +----------+  +----------+
             |workload|1|  |workload|2|
             +----------+  +----------+
```

* Azure App Gw cert integration
@kenans

kenans commented Jul 15, 2020

@TomGeske Hey, I wonder if there's any update about the BYO control plane MI public preview so far?

@TomGeske

Yes, Preview will be available very soon. We are currently wrapping up final items like cli and docs.

@kenans

kenans commented Jul 20, 2020

@TomGeske Thank you for the information! Also I wonder if there's any plan to allow user assigned MI for Kubelet? I saw it was still "Not currently supported" from the document

@TomGeske

@kenans: Yes, that's definitely in our pipeline. The plan is to first validate bring your own control plane MI; once we are good, we will add others like Kubelet for bring your own.

@jluk jluk moved this from In Progress (Development) to Public Preview (Shipped & Improving) in Azure Kubernetes Service Roadmap (Public) Jul 20, 2020
@TomGeske

We just shipped preview for bring your own control plane managed identity: https://docs.microsoft.com/en-us/azure/aks/use-managed-identity#bring-your-own-control-plane-mi-preview. Give it a try and let us know how it goes.

@Tbohunek

Hi @TomGeske, sadly we were unable to deploy the preview version for various reasons. Do you have feedback from other customers? Does it look on track for GA soon?

Thanks

@TomGeske

@Tbohunek: technical reason?

@Tbohunek

@TomGeske nope, operational reason - the team is entering production so there is limited time for experiments. I'll let you know next week.

@jastis77

Adding an existing App Gateway to an existing cluster using the az aks enable-addons CLI command creates the ingress controller managed identity and assigns it the Contributor role on the App Gateway resource group. This is the desired behavior. However, using an ARM template to create an AKS cluster with the App Gateway add-on and an existing App Gateway resource ID does not perform the same role assignment. This impacts our effort to deploy AKS with App Gateway through CI/CD, where our service principal cannot be given the User Access Administrator role.

@TomGeske

Bring your own support for add-on MIs, like the one from AppGW, is planned for the future.

@stijnv1

stijnv1 commented Aug 27, 2020

We are trying to use a BYO user-assigned identity for AKS in combination with kubenet as the network plugin and a custom UDR.
One article mentions that managed identity in combination with custom UDR and kubenet is not supported.

Another article, though, mentions in a note that either a service principal or a managed identity can be used for custom UDR in combination with kubenet.

We tested the deployment with a BYO user-assigned identity, kubenet and custom UDR, and got an error telling us that managed identities in combination with custom UDR are not supported.
We tested this with both an ARM template deployment and the Azure CLI using the latest aks-preview extension.

Will this become a supported scenario in the future (BYO user-assigned identity in combination with kubenet and custom UDR)?

@TomGeske

@stijnv1: thanks for your feedback. That might be a validation that needs to be removed. Let me double check.
Do you have the exact error message?

@stijnv1

stijnv1 commented Aug 27, 2020

@TomGeske

Exact message we get when deploying with Azure CLI or ARM:

"code": "CustomRouteTableWithMSINotSupported",
"message": "Clusters using managed identity do not support bringing your own route table. Please see https://aka.ms/aks/customrt for more information"

@noose

noose commented Sep 9, 2020

Can you please take a look at

msrest.http_logger :     'x-ms-correlation-request-id': '842ae2a8-adcd-46f4-be73-b6c64be44057'
msrest.http_logger :     'x-ms-routing-request-id': 'FRANCESOUTH:20200909T115339Z:842ae2a8-adcd-46f4-be73-b6c64be44057'

?
We are also experiencing this problem

Deployment failed. Correlation ID: 842ae2a8-adcd-46f4-be73-b6c64be44057. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "DeploymentFailed",
        "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.",
        "details": [
          {
            "code": "BadRequest",
            "message": "{\r\n  \"code\": \"CustomRouteTableWithMSINotSupported\",\r\n  \"message\": \"Clusters using managed identity do not support bringing your own route table. Please see https://aka.ms/aks/customrt for more information\"\r\n}"
          }
        ]
      }
    ]
  }
}
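
The useful error code in a failure like this is buried two levels down, inside a JSON document that has been serialized into the `message` string of the `BadRequest` entry. A minimal sketch of unwrapping it, assuming the payload shape shown above (the helper name `innermost_errors` is hypothetical and the sample messages are abridged):

```python
import json


def innermost_errors(error):
    """Yield (code, message) for every leaf of a nested ARM deployment error."""
    details = error.get("details") or []
    if details:
        for child in details:
            yield from innermost_errors(child)
        return
    message = error.get("message", "")
    # Some leaves carry a JSON document serialized into the message string.
    try:
        inner = json.loads(message)
    except (TypeError, ValueError):
        inner = None
    if isinstance(inner, dict) and "code" in inner:
        yield from innermost_errors(inner)
    else:
        yield error.get("code"), message


# Abridged copy of the payload above; the innermost message is a JSON string.
payload = {
    "status": "Failed",
    "error": {
        "code": "ResourceDeploymentFailure",
        "message": "The resource operation completed with terminal provisioning state 'Failed'.",
        "details": [
            {
                "code": "DeploymentFailed",
                "message": "At least one resource deployment operation failed.",
                "details": [
                    {
                        "code": "BadRequest",
                        "message": '{\r\n  "code": "CustomRouteTableWithMSINotSupported",'
                                   '\r\n  "message": "Clusters using managed identity do not'
                                   ' support bringing your own route table."\r\n}',
                    }
                ],
            }
        ],
    },
}

for code, message in innermost_errors(payload["error"]):
    print(code, "-", message)
```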

@TomGeske

TomGeske commented Sep 9, 2020

Thanks for your feedback. That's still not ready. We are working on enabling bring your own route table for managed identity. Current ETA is October.

@denniszielke

Since it is now October, can you outline when MSI + kubenet + outboundType will be working?

@miwithro
Contributor

miwithro commented Oct 6, 2020

Before the end of October. I can provide a better date next week.

@adhodgson1

Hi guys,

I need a little assistance with creating an ARM template that uses the user assigned identity. Here are relevant portions of the template, the user assigned identity is being created in the same resource group as the AKS cluster:

  "variables": {
    "managedIdentityId": "[resourceId('Microsoft.ManagedIdentity/userAssignedIdentities', parameters('resourceName'))]"
  },
  "resources": [
    {
      "apiVersion": "2020-09-01",
      "type": "Microsoft.ContainerService/managedClusters",
      "location": "[resourceGroup().location]",
      "name": "[parameters('resourceName')]",
      "tags": "[parameters('tags')]",
      "identity": {
        "type": "UserAssigned",
        "userAssignedIdentities" : {
          "[variables('managedIdentityId')]": {
            "clientId": "45d7d4ee-9a58-4518-9723-4b9aa796e667",
            "principalId": "9f121e4b-3e54-4573-82ff-e8737d7c7374"
          }
        }
      },

I have also tried with the ID of the identity instead of using a variable, though the end goal will be to have all these parameterised.

Here is the error I am getting:

"code": "InvalidIdentityValues",
"message": "Invalid value for the identities '/subscriptions/<my-sub>/resourceGroups/<my-rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<my-identity-name>'. The 'UserAssignedIdentities' property keys should only be empty json objects, null or the resource exisiting property."

@adhodgson1

Fixed this. For those getting the same issue, the correct syntax is:

      "identity": {
        "type": "UserAssigned",
        "userAssignedIdentities": {
          "[variables('managedIdentityId')]": {}
        }
      }
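
The point of the fix is that each key under `userAssignedIdentities` is the identity's resource ID and each value must be an empty object; `clientId` and `principalId` must not be supplied by the template author. When templates are generated from a script, that rule can be encoded once. A minimal sketch, assuming a hypothetical helper `user_assigned_identity_block` (not part of any Azure SDK):

```python
import json


def user_assigned_identity_block(identity_resource_ids):
    """Return the ARM `identity` object for one or more user-assigned identities.

    Every value is deliberately an empty object, matching the syntax that
    resolved the InvalidIdentityValues error above.
    """
    return {
        "type": "UserAssigned",
        "userAssignedIdentities": {rid: {} for rid in identity_resource_ids},
    }


# The key may be a literal resource ID or an ARM template expression.
block = user_assigned_identity_block(["[variables('managedIdentityId')]"])
print(json.dumps({"identity": block}, indent=2))
```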

@cdunford

This feature appears to be in preview now; is there any timetable for GA?

@palma21 palma21 moved this from Public Preview (Shipped & Improving) to Generally Available (Done) in Azure Kubernetes Service Roadmap (Public) Dec 9, 2020
@palma21
Member

palma21 commented Dec 9, 2020

It is now GA

@ThorstenHans

Thanks @palma21 for sharing 🚀

@miwithro miwithro closed this as completed Dec 9, 2020
@hello-woof

@TomGeske and @miwithro, I've been trying to set up a configuration similar to #1591 (comment), and I'm still seeing the same error. Is this still in progress?

resource "azurerm_kubernetes_cluster" "aks" {
  name                    = "myaks"
  location                = azurerm_resource_group.kube.location
  kubernetes_version      = var.kube_version
  resource_group_name     = azurerm_resource_group.kube.name
  dns_prefix              = "myaks"
  private_cluster_enabled = true

  default_node_pool {
    name           = "default"
    node_count     = var.nodepool_nodes_count
    vm_size        = var.nodepool_vm_size
    vnet_subnet_id = module.knet.subnet_ids["aks"]
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    docker_bridge_cidr = var.network_docker_bridge_cidr
    dns_service_ip     = var.network_dns_service_ip
    network_plugin     = "kubenet"
    outbound_type      = "userDefinedRouting"
    service_cidr       = var.network_service_cidr
  }

  depends_on = [module.routetable]
}

@TomGeske

You would need to bring your own identity in that case. @miwithro, I couldn't find a Terraform example of how to define a bring your own identity for the control plane. Can you help?

identity {
  type = "SystemAssigned"
}

@palma21 palma21 moved this from Generally Available (Done) to Archive (GA older than 1 month) in Azure Kubernetes Service Roadmap (Public) Jan 13, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jan 17, 2021