
Custom BYO VNET/RT for Kubenet - Route Table Subnet Association #400

Closed
moomzni opened this issue May 29, 2018 · 54 comments

Comments

@moomzni

moomzni commented May 29, 2018

Hi,

When integrating an AKS cluster with a custom VNET we are seeing strange networking behaviour: if the cluster has 2 or more nodes then inter-node networking does not appear to work and pods cannot communicate with one another.

Test scenario:

  • Using Terraform (v0.11.7)
  • AzureRM provider (v1.6.0)
  • Cluster provisioned using the azurerm_kubernetes_cluster resource
  • 2 worker nodes deployed into a custom subnet (in our case a /24 within a /24 VNET)

The cluster builds fine, but it appears that whilst both minions are present in the route table within the MC_* resource group, the custom subnet is not associated; if we manually make the association, networking behaves as expected and things start working.

The next problem is that when we re-execute Terraform we overwrite this manual association of subnet to route table, because the route_table_id is not exported from the azurerm_kubernetes_cluster resource.

As a workaround we have automated the association of the subnet to the route table and added this to our subnet resource:

lifecycle {
  ignore_changes = ["route_table_id"]
}
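For context, a minimal sketch of how that lifecycle block might sit inside the subnet resource (Terraform 0.11 syntax to match the thread; the resource names, address prefix, and references here are illustrative placeholders, not taken from the original post):

```hcl
# Hypothetical subnet definition; names and CIDR are placeholders.
resource "azurerm_subnet" "aks_subnet" {
  name                 = "aks-nodes"
  resource_group_name  = "${azurerm_resource_group.aks_resource_group.name}"
  virtual_network_name = "${azurerm_virtual_network.aks_virtualnetwork.name}"
  address_prefix       = "10.1.0.0/24"

  # AKS associates its own route table with this subnet after cluster
  # creation; ignoring route_table_id stops Terraform from reverting
  # that association on the next apply.
  lifecycle {
    ignore_changes = ["route_table_id"]
  }
}
```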
@exceptorr

+1, looks like we've run into this too.

@moomzni ,

the custom subnet is not associated; if we manually make the association networking behaves as expected and stuff starts working.

Could you please share the workaround you found for this? What exactly do you modify to make AKS work with a custom VNET?

@moomzni
Author

moomzni commented Jun 7, 2018

Quite a timely prompt. To work around this we needed to manually associate the routing table (inside the MC_* resource group) with the subnet.

For us this meant creating a null resource in Terraform like the one below; this will only run on the initial creation of the cluster:

resource "null_resource" "fix_routetable" {
  provisioner "local-exec" {
    command = "az network vnet subnet update -n ${azurerm_subnet.aks_subnet.name} -g ${azurerm_resource_group.aks_resource_group.name} --vnet-name ${azurerm_virtual_network.aks_virtualnetwork.name} --route-table $(az resource list --resource-group MC_${azurerm_resource_group.aks_resource_group.name}_${azurerm_kubernetes_cluster.aks_cluster.name}_<YOUR LOCATION HERE> --resource-type Microsoft.Network/routeTables --query '[].{ID:id}' -o tsv)"
  }
  depends_on = ["azurerm_kubernetes_cluster.aks_cluster"]
}

Note you'll still need to add the lifecycle policy ignoring route_table_id wherever you define your subnet in Terraform, but this worked for us...

@exceptorr

Great, thanks @moomzni ! I'll try this. :)

Note you'll still need to add the lifecycle policy to ignore the route_table_id to where you are defining your subnet in Terraform

What will happen or break if I omit that lifecycle policy? As far as I understand, if my use case for AKS is "deploy cluster, run tests, destroy cluster", then I shouldn't care about the route table being overwritten by a Terraform re-execution, but in all other cases something will go wrong...

@kautsig

kautsig commented Jun 15, 2018

Not sure if manually attaching the route table is a sustainable solution. I created a cluster through the Azure portal, and no route table is created in the MC_* resource group.

This is also mentioned here: hashicorp/terraform-provider-azurerm#1394 - so this might be a terraform problem.

@ionutgrecu-adv365

AKS with a custom VNET, created using:

  1. Azure portal: all working as expected. No route table created.
  2. Terraform: a route table is created but not associated with the subnet. As a result, API calls fail with a timeout for port-forward:
    error: error upgrading connection: error dialing backend: dial tcp 172.21.0.5:10250: getsockopt: connection timed out
    As a workaround, manually associating the subnet fixes the problem.

@exceptorr

exceptorr commented Jun 15, 2018

Azure portal: all working as expected. No route table created.

If you chose "advanced networking" in the portal, Terraform currently does not support this. Terraform uses an old version of the Azure API, as I've been told by Azure support, so I think we just need to wait for a fix or for HashiCorp to implement the new APIs.

@kattelus

I tried to read the related issues but didn't get a clear picture of the status of this issue. Does anyone know whether this is already fixed in some version, and if not, does anyone have any idea of the ETA?

@rikribbers

I can confirm the issue is still there; we had inter-node network issues, and they were gone after manually assigning the custom subnet to the routing table and NSG.

@jcoeltjen

jcoeltjen commented Jan 24, 2019

This is also a major problem for us.
I already opened a support ticket for this: 119012225001304

This is also an issue when creating the cluster via the Azure CLI.
Even when the node count is one, a route table is created but it is never associated with the subnet.
As soon as the cluster is scaled up, the issue causes major problems.
As far as I can see, the problem cannot be traced using the deployment logs.

@jluk
Contributor

jluk commented Mar 4, 2019

Thanks for the patience folks, we are working on resolving this within AKS in a few stages. In the initial stage we will ensure the RT and NSG are associated automatically IF there are no existing RTs/NSGs on the subnet. This is to avoid introducing new conflicts which can cause trouble in clusters. ETA for this first described fix should be released in the next 3-4 weeks at the latest.

@ekarlso

ekarlso commented Mar 4, 2019

@jluk How can we use AKS with a hub-spoke model at the moment, when we bring our own route table with a VNET/subnet to redirect traffic over to an Azure Firewall?

@JunSun17

JunSun17 commented Mar 5, 2019

@ekarlso can you give more specific details on your network layout?

In general, we ask customers to bring in a subnet (reserved for AKS) which is not used by other components in the infrastructure. We also ask that the various settings used in the custom VNET, such as the service CIDR, pod CIDR, etc., are reserved to avoid conflicts. This should help customers fit the AKS cluster into their infrastructure environment without introducing network-related issues.

If you have a case that requires pre-defining a route table and NSG and associating them with the subnet used by AKS, can you share it? My guess is some merge resolution is needed, but it is really case by case, and might not be resolvable when conflicts happen. For example, if the customer's existing RT already defines a route from 10.244.0.1 to 10.240.0.5, our K8s create logic may apply another kubenet-based route from 10.244.0.1 to 10.240.0.1. I think this would end up as an unresolvable conflict.

@ekarlso

ekarlso commented Mar 7, 2019

@JunSun17 We are routing all outgoing traffic via Azure Firewall, basically 0.0.0.0/0 -> Azure Firewall IP, which requires us to modify the RT.

@deodad

deodad commented Mar 16, 2019

Expanding on @moomzni's suggestion, here's what we're doing to associate the AKS route table and network security group with the node subnet:

resource "null_resource" "assign_subnet_resources" {
  provisioner "local-exec" {
    command = "az network vnet subnet update --route-table $(az network route-table list --resource-group ${azurerm_kubernetes_cluster.k8s.node_resource_group} --query \"[].id | [0]\" -o tsv) --network-security-group $(az network nsg list -g ${azurerm_kubernetes_cluster.k8s.node_resource_group} --query \"[].id | [0]\" -o tsv) --ids ${azurerm_subnet.kubesubnet.id}"
  }

  depends_on = ["azurerm_kubernetes_cluster.k8s"]
}

Along with a lifecycle policy like this on the subnet resource:

resource "azurerm_subnet" "kubesubnet" {
  ...

  lifecycle {
    ignore_changes = [
      "network_security_group_id",
      "route_table_id",
    ]
  }
}

@jnoller jnoller added the roadmap label Apr 3, 2019
@J0F3

J0F3 commented Jul 22, 2019

Hi all,

I just ran into the same problem, but in a slightly different configuration. However, the root cause is, I think, very similar or the same.
In my case I use ARM templates to deploy a VNet, and then deploy AKS into that previously created VNet and subnet.
I noticed that when the templates are deployed for the first time, the routing table is assigned to the subnet into which AKS is deployed, so everything works as desired. But when the ARM templates are deployed a second time, the routing table is removed from the subnet again, which of course breaks the AKS cluster.
Actually, I think this is just how ARM templates work: because the routing table is not specified as a property of the subnet in the template, the deployment removes the routing table assignment. So the routing table should really be defined in the ARM template that creates the VNet and subnet. However, that is not possible, because the routing table is automatically created by AKS after the VNet has already been deployed.

So what is the current plan with this issue? Are there any plans for an actual fix beside the workarounds running some scripts after the deployment?

I think the main problem here is in the logic of how the VNet update works when ARM or Terraform code is redeployed after the VNet already exists. Meaning it is more a problem with ARM/Azure REST APIs than an actual problem in AKS. Could that be?

Thx,
Jonas

@JunSun17

@J0F3 if your ARM templates re-deploy vnet/subnet, yes currently there will be no attempt to re-associate the AKS created route table to the new subnet. This association is only done once when the AKS cluster is created, and the association is to the subnet given in the AKS data model when the cluster is being created.

If the subnet is re-deployed, it might introduce downtime, since AKS VMs will lose network during that period. If the subnet IP ranges are changed, it will introduce more issues, since the subnet is now different from what the cluster thinks it should be.

If the subnet is not changed during re-deployment, I think you can do a manual association of the route table to the new subnet. Hope it helps.

@J0F3

J0F3 commented Jul 24, 2019

Thank you @JunSun17
Yes, the manual re-association of the route table would work. However, it is very hacky, and during the deployment the AKS is in a bad state until e.g. a script runs which does the association again.
For me the behaviour of the VNet resource provider is also not really logical. When deploying something in "incremental" mode I would expect that anything existing is left unchanged. But for the subnet, for example, everything that is not specified in the template is evidently removed.

@JunSun17

@J0F3 I think the gap is that when the vnet/subnet is re-deployed, the deployment does not first read the current state of the vnet/subnet and then apply the incremental changes on top, so it is not aware of changes that happened after the initial deployment. For your workflow, a workaround may be possible if you can integrate the current state into the new template you deploy.

@ronaldmiranda

+1 please guys, fix this.

@tedescomicchidev

Hi all
As an intermediate "fix" you can deploy a linked ARM template to fetch the subnet's current route table association and then re-apply it every time you re-deploy the VNET via ARM template.

This is my linked template, which I use to get the current RouteTable.Id and make it ready via output:
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "vNetSettings": {
      "defaultValue": {
        "name": "aks-vnet",
        "aksVNETSubnet": {
          "name": "aks-vnet-nodes"
        }
      },
      "type": "Object",
      "metadata": {
        "description": ""
      }
    }
  },
  "variables": {
    "vnetname": "[parameters('vNetSettings')['name']]",
    "subnetname": "[parameters('vNetSettings')['aksVNETSubnet']['name']]"
  },
  "resources": [],
  "outputs": {
    "subnetroutetableid": {
      "type": "String",
      "value": "[if(contains(reference(resourceId(resourceGroup().name, 'Microsoft.Network/virtualNetworks/subnets', variables('vnetname'), variables('subnetname')), '2018-03-01'), 'routeTable'), reference(resourceId(resourceGroup().name, 'Microsoft.Network/virtualNetworks/subnets', variables('vnetname'), variables('subnetname')), '2018-03-01').routeTable.Id, '')]"
    }
  }
}

And this template leverages the template above via linking; it is meant to straightforwardly create a new VNET with a few subnets:
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "vNetSettings": {
      "defaultValue": {
        "name": "aks-vnet",
        "addressPrefixes": [ "10.0.0.0/8" ],
        "subnetAksCluster": { "name": "aksnodes", "CIDR": "10.110.0.0/16" },
        "subnetIngressLb": { "name": "ingresslb", "CIDR": "10.120.0.0/24" },
        "subnetAppGw": { "name": "appgw", "CIDR": "10.130.0.0/24" }
      },
      "type": "Object",
      "metadata": {
        "description": "Settings for the vNet. (See defaultValue for properties)"
      }
    }
  },
  "variables": {
    "vnetResourceAPI": "2019-04-01",
    "vnetname": "[parameters('vNetSettings')['name']]",
    "subnetname": "[parameters('vNetSettings')['subnetAksCluster']['name']]"
  },
  "resources": [
    {
      "type": "Microsoft.Network/virtualNetworks",
      "apiVersion": "[variables('vnetResourceAPI')]",
      "name": "[parameters('vNetSettings')['name']]",
      "location": "[resourceGroup().location]",
      "dependsOn": [ "linkedTemplate" ],
      "properties": {
        "addressSpace": {
          "addressPrefixes": "[parameters('vNetSettings')['addressPrefixes']]"
        },
        "subnets": [
          {
            "name": "[parameters('vNetSettings')['subnetAksCluster']['name']]",
            "properties": {
              "addressPrefix": "[parameters('vNetSettings')['subnetAksCluster']['CIDR']]",
              "routeTable": {
                "id": "[reference('linkedTemplate').outputs.subnetroutetableid.value]"
              }
            }
          },
          {
            "name": "[parameters('vNetSettings')['subnetIngressLb']['name']]",
            "properties": {
              "addressPrefix": "[parameters('vNetSettings')['subnetIngressLb']['CIDR']]"
            }
          },
          {
            "name": "[parameters('vNetSettings')['subnetAppGw']['name']]",
            "properties": {
              "addressPrefix": "[parameters('vNetSettings')['subnetAppGw']['CIDR']]"
            }
          }
        ]
      }
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2018-05-01",
      "name": "linkedTemplate",
      "properties": {
        "mode": "Incremental",
        "templateLink": {
          "uri": "https://micchitransfer.blob.core.windows.net/mobi/subnetroutetableid.json",
          "contentVersion": "1.0.0.0"
        },
        "parameters": {
          "vNetSettings": {
            "value": "[parameters('vNetSettings')]"
          }
        }
      }
    }
  ]
}

I hope this helps :-)

@subesokun

subesokun commented Aug 30, 2019

Thanks for the patience folks, we are working on resolving this within AKS in a few stages. In the initial stage we will ensure the RT and NSG are associated automatically IF there are no existing RTs/NSGs on the subnet. This is to avoid introducing new conflicts which can cause trouble in clusters. ETA for this first described fix should be released in the next 3-4 weeks at the latest.

@jluk Just tested this with a newly created AKS cluster (v1.14) and the RT got correctly assigned to the custom subnet 🎉 But unfortunately the NSG didn't get assigned to the custom subnet (only to the network interfaces of the cluster nodes). My VNet consists of two subnets, and one of them has a custom NSG assigned, but the subnet for AKS doesn't have any. Could that be an issue?

@J0F3

J0F3 commented Aug 30, 2019

@subesokun I think that's how it should work. The routing table is assigned to the subnet, and the NSG is assigned to the network interfaces of the cluster nodes. The resulting effect of the NSG is the same as if it were assigned to the subnet, but because the AKS NSG is assigned only to the network interfaces of the nodes, you still have the ability to set a custom NSG on the subnet, which is actually a good thing. We currently use that to further restrict who can communicate with the AKS cluster, or rather with the ingress controller on it.
Note that this is only true when you specify a custom VNet. If you let AKS create the VNet, then the AKS NSG is assigned to the subnet (instead of to the network interfaces of the nodes).

@subesokun

@J0F3 Thanks a lot for the feedback! Ok, then everything looks good 👍

@shahga

shahga commented Apr 10, 2020

When is this expected to see light of day?

@jluk
Contributor

jluk commented Jun 2, 2020

  • This capability has now been released as part of AKS release 2020-06-01.
  • We will share the active regions in the current rollout and close this issue when the change is global.
  • The documentation and details to resolution can be found at aka.ms/aks/customrt.

This change enables you, at create time, to bring a subnet with your own route table already associated, instead of being required to use an AKS-created route table. If no route table exists on the subnet brought at cluster create, AKS will create a route table for you and associate it with the subnet. All of the details and limitations can be found at the aka.ms link above.

If there are issues with any of the above we will field them through support requests to help you resolve any problems.
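In Terraform terms, the create-time flow described above can be sketched roughly as follows. This is a hedged illustration, not an official example: the resource names, VM size, and node count are placeholders, and it assumes an azurerm provider version from around this release, where the route table association is its own resource and a service principal is used (MSI was not yet supported for BYO RT, as discussed further down in this thread):

```hcl
# Hypothetical BYO route table flow after the 2020-06-01 AKS release.
resource "azurerm_route_table" "byo" {
  name                = "aks-byo-rt"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
}

# Associate the route table with the node subnet BEFORE cluster creation...
resource "azurerm_subnet_route_table_association" "byo" {
  subnet_id      = azurerm_subnet.aks.id
  route_table_id = azurerm_route_table.byo.id
}

# ...so that AKS respects the existing association instead of creating
# and populating its own, unassociated route table.
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks-byo-rt-demo"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "aksbyort"

  default_node_pool {
    name           = "default"
    node_count     = 2
    vm_size        = "Standard_DS2_v2"
    vnet_subnet_id = azurerm_subnet.aks.id
  }

  network_profile {
    network_plugin = "kubenet"
  }

  service_principal {
    client_id     = var.sp_client_id     # placeholder
    client_secret = var.sp_client_secret # placeholder
  }

  depends_on = [azurerm_subnet_route_table_association.byo]
}
```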

@jluk jluk assigned jluk and unassigned palma21 Jun 2, 2020
@JayDoubleu

@jluk Is this rolled out to portal and azure cli ? I don't seem to be able to replicate the new behaviour. Is there any extra steps/settings required for this to be enabled ?

@jluk
Contributor

jluk commented Jun 5, 2020

@JayDoubleu the release may not be in your region quite yet, I will be placing the regions which have the change on this issue today.

The userflow is:

  1. Setup a VNET you own to have the custom route table associated to the subnet you plan to deploy your cluster into
  2. Deploy a new cluster and pass in the VNET you wish to use similar to this doc: https://docs.microsoft.com/en-us/azure/aks/configure-kubenet#create-an-aks-cluster-in-the-virtual-network

@dmateos

dmateos commented Jun 5, 2020

@jluk we are having issues with this in the Australia East region also; it keeps deploying its own route table (not associated with the subnet) and keeps writing to that instead.

This is a private cluster using a UDR.

Can you also clarify whether --outbound-type userDefinedRouting is required for this? And if not, how is using it different from not using it?

If you use your own route table, aren't you overriding the outbound path anyway if you define a route for 0.0.0.0/0?

Basically the two opposing documents I'm looking at are
https://docs.microsoft.com/en-us/azure/aks/egress-outboundtype
https://docs.microsoft.com/en-us/azure/aks/configure-kubenet
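As an illustration of the 0.0.0.0/0 scenario being discussed, a route table like the following forces all egress through an NVA/Azure Firewall. This is a hypothetical sketch: the resource names and the firewall's private IP are placeholders, not taken from this thread:

```hcl
# Hypothetical egress route table; names and IPs are placeholders.
resource "azurerm_route_table" "egress" {
  name                = "aks-egress-rt"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  # Send all outbound traffic to the firewall's private IP.
  route {
    name                   = "default-via-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.0.1.4" # placeholder firewall IP
  }
}
```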

@dmateos

dmateos commented Jun 9, 2020

We have redeployed our cluster and are still having this issue.

At the moment we have to follow this pattern:

  • Scale up the nodes
  • Check the unbound route table for the 172.16 route it adds
  • Manually add this to our real route table assigned to the subnet.

@tombuildsstuff

@jluk is there a timeframe when 2020-06 will appear in the Swagger/Go SDK?

@jluk
Contributor

jluk commented Jun 9, 2020

Edit 6/10: there is currently an issue in the release that we are resolving; as a result this BYO RT capability will be delayed by a few more days. Apologies for the wait; we will give you new information on this by the end of the week.

@dmateos apologies for the delay; this should be available in your region of deployment (Australia) by the end of the week. If you wish to use your own egress paths, such as through an NVA, you can use the UDR feature so that AKS no longer provides an IP for the load balancer and configures the backend pool for it. A primary scenario for that feature is when you cannot have any public IPs on your cluster.

@tombuildsstuff AKS 2020-06-01 should be available by the week of 6/22. This feature does not have a dependency on that API though.

@dmateos

dmateos commented Jun 11, 2020

I think this is working fine for us as of yesterday.

We remade our test cluster, and it picked up the route table we expected and has been writing to it, with the UDRs in place.

@roben0

roben0 commented Jun 11, 2020

@jluk this problem has returned for us. An ARM template deployment with a Standard Load Balancer and outboundType = userDefinedRouting fails with error code 'ExistingRouteTableNotAssociatedWithSubnet' if no route table is associated with the subnet. However, if one is associated with the subnet, it is ignored, and AKS deploys its own route table, which is not associated with the subnet, and keeps writing to that instead.

@sivamca127

sivamca127 commented Jun 12, 2020

Hi,

We are able to map the custom subnet to the route table successfully. But I have another concern: if I want to add a custom route, at present I don't find any available option in Terraform for AKS. Also, is there any way to display the newly created route table name and VMSS name (node pool type chosen as virtual machine scale set)?

kindly advise on this.

Thanks,
Siva

@jluk
Contributor

jluk commented Jun 16, 2020

Confirming all changes are fixed and globally released. Closing as GA.

@roben0 please try a new deployment; an existing RT on the subnet you bring should be respected. If not, please open a support ticket and AKS SRE will investigate for you.

@sivamca127 you need to set the route rules with the Azure VNet resource. The route table is a resource of Azure Networking.
https://www.terraform.io/docs/providers/azurerm/r/route_table.html
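Following that pointer, a custom route can be defined on the route table from Terraform via a standalone azurerm_route resource. A minimal sketch, assuming a route table resource already exists in your configuration; all names and the address prefix here are illustrative placeholders:

```hcl
# Hypothetical custom route added to an existing route table.
resource "azurerm_route" "custom" {
  name                = "to-on-prem"                   # placeholder route name
  resource_group_name = azurerm_resource_group.rg.name
  route_table_name    = azurerm_route_table.egress.name
  address_prefix      = "192.168.0.0/16"               # placeholder prefix
  next_hop_type       = "VirtualNetworkGateway"
}
```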

@jluk jluk closed this as completed Jun 16, 2020
@roben0

roben0 commented Jun 17, 2020

Confirmed as working OK for us now. Re-deployments last night and today seem to be using the existing RT correctly. Thanks for following up.

@Tbohunek

Any chance to reopen this topic? It is a big issue that BYO RT doesn't support MSI, as it made our existing PROD deployment method impossible.
We need to have an RT that points at the NVA assigned to the subnet prior to creation; without it the deployment would fail because it cannot reach the MS repos. But with it, this is now considered BYO RT, and due to MSI it won't deploy. Not good, really.

@jluk
Contributor

jluk commented Jun 23, 2020

@Tbohunek thanks for raising your issue, we're working to patch in the MSI support for BYO RT on Kubenet. The limitation is caused by Managed Identity in its current form not allowing you to bring your own identity prior to cluster creation as you've seen.

This ticket (#1591) tracks that capability, which is on track to release in a few weeks. Could we continue the conversation about your needs on that issue?

@joshhayskr

Hello,

Would this fix impact the custom role documentation found here?:
https://docs.microsoft.com/en-us/azure/aks/kubernetes-service-principal#networking

We also have a hub and spoke model, and the enterprise is sensitive about who has full Network Contributor rights. We generally try to avoid custom roles, but have one for this case. We found we had to add the following two permissions to get through new deployments:

  • Microsoft.Network/routeTables/write
  • Microsoft.Network/routeTables/read
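A custom role carrying those extra permissions could be sketched in Terraform roughly like this. The role name, scope, and the base subnet permissions are illustrative assumptions (drawn from the kind of permissions the linked AKS doc describes), not the poster's actual role definition:

```hcl
# Hypothetical custom role; names, scope, and base actions are placeholders.
resource "azurerm_role_definition" "aks_network" {
  name        = "aks-custom-network"
  scope       = data.azurerm_subscription.primary.id
  description = "Minimal networking rights for AKS kubenet with BYO route table"

  permissions {
    actions = [
      # Base subnet permissions for AKS custom VNET scenarios...
      "Microsoft.Network/virtualNetworks/subnets/join/action",
      "Microsoft.Network/virtualNetworks/subnets/read",
      # ...plus the two the poster had to add for the custom RT scenario:
      "Microsoft.Network/routeTables/read",
      "Microsoft.Network/routeTables/write",
    ]
    not_actions = []
  }

  assignable_scopes = [
    data.azurerm_subscription.primary.id,
  ]
}
```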

@jluk
Contributor

jluk commented Jun 23, 2020

@joshhayskr that is correct; for this additional custom RT scenario you'll need the additional permissions. We will have that document updated today.

@Tbohunek

Tbohunek commented Jun 23, 2020

@jluk If we understand it right, though, the AKS deployment should create the required role assignments itself, without the user having the /roleAssignments/write action allowed on the subscription. Is that true, or do we need to do the role mapping ourselves?

@ghost ghost locked as resolved and limited conversation to collaborators Aug 7, 2020