Custom BYO VNET/RT for Kubenet - Route Table Subnet Association #400
Comments
+1, looks like we've run into this too. @moomzni ,
Could you please share the workaround you found? What exactly did you modify to make AKS work with a custom VNET? |
Quite a timely prompt - To work around this we needed to manually associate the routing table (inside the MC_* resource group) with the subnet. For us this meant creating a null resource in Terraform like the one below - this will only run on the initial creation of the cluster:
Note you'll still need to add a lifecycle policy to ignore route_table_id wherever you define your subnet in Terraform - but this worked for us... |
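(The Terraform snippet referenced above did not survive the page export. A minimal sketch of the kind of null resource described - all resource names, and the use of the az CLI via local-exec, are assumptions on my part, not @moomzni's exact code:)

```hcl
# Runs once after cluster creation: bind the AKS-generated route table
# (which lives in the MC_* node resource group) to our pre-existing subnet.
resource "null_resource" "associate_route_table" {
  # Re-run only if the cluster itself is recreated.
  triggers = {
    cluster_id = azurerm_kubernetes_cluster.aks.id
  }

  provisioner "local-exec" {
    command = <<EOT
RT_ID=$(az network route-table list \
  --resource-group ${azurerm_kubernetes_cluster.aks.node_resource_group} \
  --query "[0].id" -o tsv)
az network vnet subnet update \
  --resource-group ${azurerm_resource_group.net.name} \
  --vnet-name ${azurerm_virtual_network.vnet.name} \
  --name ${azurerm_subnet.aks.name} \
  --route-table "$RT_ID"
EOT
  }
}
```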
Great, thanks @moomzni ! I'll try this. :)
What will happen or break if I omit that lifecycle policy? As far as I understand, if my use case for AKS is "deploy cluster, run tests, destroy cluster", then I shouldn't care about the route table being overwritten by a Terraform re-execution - but in all other cases something will go wrong... |
Not sure if manually attaching the route table is a sustainable solution. I created a cluster through the Azure portal, and no route table was created in the MC_* resource group. This is also mentioned here: hashicorp/terraform-provider-azurerm#1394 - so this might be a Terraform problem. |
AKS with custom vnet, created using
|
If you have chosen "advanced networking" in the portal - Terraform currently does not support this. Terraform uses an old version of the Azure API, as I've been told by Azure support, so I think we just need to wait for a fix or for HashiCorp to implement the new APIs. |
I tried to read the related issues but didn't get a clear picture of the status of this one. Does anyone know whether this is already fixed in some version, and if not, does anyone have an idea of the ETA? |
I can confirm the issue is still there; we had inter-node network issues, and they were gone once we manually assigned the custom subnet to the routing table and NSG. |
This is a major problem for us as well, and it is also an issue when creating the cluster via the Azure CLI. |
Thanks for the patience folks, we are working on resolving this within AKS in a few stages. In the initial stage we will ensure the RT and NSG are associated automatically IF there are no existing RTs/NSGs on the subnet. This is to avoid introducing new conflicts which can cause trouble in clusters. This first fix should be released in the next 3-4 weeks at the latest. |
@jluk How can we use AKS with a hub-spoke model at the moment when you bring your own route table with a vnet/subnet to redirect traffic to an Azure Firewall? |
@ekarlso can you give more specific details on your network layout? In general, we ask customers to bring in a subnet (reserved for AKS) which is not used by other components in the infrastructure. We also ask that the various settings used in the custom vnet, such as the service CIDR, pod CIDR, etc., are reserved to avoid conflicts. This should help customers fit the AKS cluster into their infrastructure without introducing network-related issues. If you have a case that requires pre-defining a route table and NSG and associating them with the subnet used by AKS, can you share it? My guess is some merge resolution is needed, but it is really case by case, and might not be resolvable when conflicts happen. For example, if the customer's existing RT already defines a route from 10.244.0.1 to 10.240.0.5, our K8s create logic may apply another kubenet-based route from 10.244.0.1 to 10.240.0.1. I think this would end up as an unresolvable conflict. |
@JunSun17 We are basically routing all outgoing traffic via Azure Firewall, so 0.0.0.0/0 -> the firewall IP, which requires us to modify the RT. |
Expanding on @moomzni's suggestion, here's what we're doing to associate the AKS route table and network security group with the node subnet:
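(The snippet itself was lost in the export. A hedged sketch of such an association - resource names and the CLI-based approach are illustrative assumptions, not this commenter's exact code:)

```hcl
# Look up the RT and NSG that AKS generated in the node resource group,
# then attach both to the pre-existing node subnet.
resource "null_resource" "subnet_associations" {
  triggers = {
    cluster_id = azurerm_kubernetes_cluster.aks.id
  }

  provisioner "local-exec" {
    command = <<EOT
NODE_RG=${azurerm_kubernetes_cluster.aks.node_resource_group}
RT=$(az network route-table list -g "$NODE_RG" --query "[0].id" -o tsv)
NSG=$(az network nsg list -g "$NODE_RG" --query "[0].id" -o tsv)
az network vnet subnet update \
  -g ${azurerm_resource_group.net.name} \
  --vnet-name ${azurerm_virtual_network.vnet.name} \
  -n ${azurerm_subnet.aks.name} \
  --route-table "$RT" --network-security-group "$NSG"
EOT
  }
}
```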
Along with a lifecycle policy like this on the subnet resource:
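(The lifecycle policy itself did not survive the export; it typically looks something like the following - Terraform 0.12-style syntax, resource name assumed:)

```hcl
resource "azurerm_subnet" "aks" {
  # ... existing subnet configuration ...

  # Stop Terraform from reverting the associations made out of band.
  lifecycle {
    ignore_changes = [
      route_table_id,
      network_security_group_id,
    ]
  }
}
```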
|
Hi all, I just ran into the same problem, but in a slightly different constellation. However, the root cause is, I think, very similar or the same. So what is the current plan for this issue? Are there any plans for an actual fix besides the workarounds that run scripts after the deployment? I think the main problem here is the logic of how the vNet update works when ARM or Terraform code is redeployed after the vNet already exists - meaning it is more a problem with ARM/Azure REST APIs than an actual problem in AKS. Could that be? Thx, |
@J0F3 if your ARM templates re-deploy the vnet/subnet, then yes - currently there will be no attempt to re-associate the AKS-created route table to the new subnet. This association is only done once, when the AKS cluster is created, and it is to the subnet given in the AKS data model at cluster creation. If the subnet is re-deployed, it might introduce downtime, since the AKS VMs will lose network during that period. If the subnet IP ranges are changed, it will introduce more issues, since the subnet is now different from what the cluster thinks it should be. If the subnet is not changed during re-deployment, I think you can do a manual association of the route table to the new subnet. Hope it helps. |
Thank you @JunSun17 |
@J0F3 I think the gap is that a vnet/subnet re-deployment does not first read the current state of the vnet/subnet and then apply incremental changes; therefore it is not aware of changes that happened after the initial deployment. For your workflow, a workaround may be possible if you can integrate the current state into the new template you deploy. |
+1 please guys, fix this. |
Hi all. This is my linked template, which I use to get the current RouteTable.Id and expose it via an output: And this template leverages the template above via linking and is meant to create a new VNET with a few subnets straightforwardly: I hope this helps :-) |
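(The templates themselves were lost in the export. A minimal sketch of the shape such a linked lookup template might take - parameter and output names are illustrative:)

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "routeTableName": { "type": "string" }
  },
  "resources": [],
  "outputs": {
    "routeTableId": {
      "type": "string",
      "value": "[resourceId('Microsoft.Network/routeTables', parameters('routeTableName'))]"
    }
  }
}
```

The parent VNET template would then consume this via something like `[reference('routeTableLookup').outputs.routeTableId.value]` in the subnet's `routeTable.id` property, so re-deployments carry the existing association forward.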
@jluk Just tested this with a newly created AKS cluster (v1.14) and the RT got correctly assigned to the custom subnet 🎉 But unfortunately, the NSG didn't get assigned to the custom subnet (only to the network interfaces of the cluster nodes). My VNet consists of two subnets, and one of them has a custom NSG assigned, but the subnet for AKS doesn't have any. Could that be an issue? |
@subesokun I think that's how it should work. The routing table is assigned to the subnet, and the NSG is assigned to the network interfaces of the cluster nodes. The resulting effect of the NSG is the same as if it were assigned to the subnet, but since the AKS NSG is assigned only to the network interfaces of the nodes, you still have the ability to set a custom NSG on the subnet - which is actually a good thing. We currently use that to further restrict who can communicate with the AKS cluster, or rather with the ingress controller running on it. |
@J0F3 Thanks a lot for the feedback! Ok, then everything looks good 👍 |
When is this expected to see the light of day? |
This change enables, at create time, the ability to bring a subnet that already has your own route table associated with it, instead of being required to use an AKS-created route table. If no route table exists on the subnet brought to cluster create, AKS will create one for you and associate it with the subnet. All of these details and limitations can be found in the aka link above. If there are issues with any of the above, we will field them through support requests to help you resolve any problems. |
@jluk Is this rolled out to the portal and the Azure CLI? I don't seem to be able to replicate the new behaviour. Are there any extra steps/settings required for this to be enabled? |
@JayDoubleu the release may not be in your region quite yet; I will post the regions which have the change on this issue today. The user flow is:
|
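(The user-flow list did not survive the export. Based on the feature description earlier in the thread, it presumably amounts to something like the following - all resource names are placeholders:)

```sh
# 1. Associate your own route table with the subnet *before* cluster create.
az network vnet subnet update -g my-net-rg --vnet-name my-vnet -n aks-subnet \
  --route-table my-route-table

# 2. Create the cluster against that subnet; AKS should respect the existing RT
#    instead of generating its own.
az aks create -g my-aks-rg -n my-cluster \
  --network-plugin kubenet \
  --vnet-subnet-id "$(az network vnet subnet show -g my-net-rg \
      --vnet-name my-vnet -n aks-subnet --query id -o tsv)"
```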
@jluk we are having issues with this in the Australia East region also; it keeps deploying its own route table (which it doesn't associate with the subnet) and keeps writing to that instead. This is a private cluster using a UDR. Can you also clarify whether --outbound-type userDefinedRouting is required for this? And if not, how is using it different from not using it? If you use your own route table, aren't you overriding the outbound path anyway once you define a route for 0.0.0.0/0? Basically, the two opposing documents I'm looking at are |
We have redeployed our cluster and are still having this issue. At the moment we have to follow this pattern:
|
@jluk is there a timeframe when |
Edit 6/10: there is currently an issue in the release that we are resolving; as a result this BYO RT capability will have a few more days of delay. Apologies for the wait - we will give you new information on this by end of week.

@dmateos apologies for the delay; this should be available in your region of deployment (Australia) by end of week. If you wish to use your own egress paths, such as through an NVA, you can use the UDR feature to have AKS no longer provide an IP for the load balancer, and configure its backend pool yourself. A primary scenario for that feature is when you cannot have any public IPs on your cluster.

@tombuildsstuff AKS API 2020-06-01 should be available by the week of 6/22. This feature does not have a dependency on that API, though. |
I think this is working fine for us as of yesterday. We remade our test cluster, and it took the route table we expected and has been writing to it with the UDRs in place. |
@jluk this problem has returned for us. An ARM template deployment with a Standard Load Balancer and outboundType = userDefinedRouting will fail with error code 'ExistingRouteTableNotAssociatedWithSubnet' if a route table is not associated with the subnet; however, if one is associated with the subnet, it is ignored and AKS deploys its own route table, which is not associated with the subnet, and keeps writing to that instead. |
Hi, we were able to map the custom subnet to the route table successfully. But I have another concern: if I want to add a custom route, at present I don't find any available option in the Terraform AKS resource. Also, is there any way to display the newly created route table name and VMSS name (node pool type chosen as VirtualMachineScaleSets)? Kindly advise on this. Thanks, |
Confirming all changes are fixed and globally released. Closing as GA. @roben0 please try a new deployment; an existing RT on the subnet you bring should be respected. If not, please open a support ticket and AKS SRE will investigate for you. @sivamca127 you need to set the route rules with the Azure VNet resource; the route table is a resource of Azure Networking. |
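(On @sivamca127's Terraform question above: one way to reach the AKS-generated route table and scale set from Terraform is a generic resource lookup against the node resource group. A hedged sketch - the `azurerm_resources` data source requires a recent 2.x provider, and the CIDR and next-hop IP are placeholders:)

```hcl
# Look up the route table AKS created in the managed node resource group.
data "azurerm_resources" "aks_rt" {
  resource_group_name = azurerm_kubernetes_cluster.aks.node_resource_group
  type                = "Microsoft.Network/routeTables"
}

# Look up the node-pool scale set the same way.
data "azurerm_resources" "aks_vmss" {
  resource_group_name = azurerm_kubernetes_cluster.aks.node_resource_group
  type                = "Microsoft.Compute/virtualMachineScaleSets"
}

# Add a custom route to the AKS route table.
resource "azurerm_route" "egress" {
  name                   = "custom-egress"
  resource_group_name    = azurerm_kubernetes_cluster.aks.node_resource_group
  route_table_name       = data.azurerm_resources.aks_rt.resources[0].name
  address_prefix         = "10.100.0.0/16"
  next_hop_type          = "VirtualAppliance"
  next_hop_in_ip_address = "10.0.0.4"
}

# Surface the generated names asked about above.
output "aks_route_table_name" {
  value = data.azurerm_resources.aks_rt.resources[0].name
}
output "aks_vmss_name" {
  value = data.azurerm_resources.aks_vmss.resources[0].name
}
```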
Confirmed as working OK for us now. Re-deployments last night and today seem to be using the existing RT correctly. Thanks for following up. |
Any chance to reopen this topic? It is a big issue that the BYO RT doesn't support MSI as it made our existing PROD deployment method impossible. |
@Tbohunek thanks for raising your issue; we're working to patch in MSI support for BYO RT on Kubenet. The limitation is caused by Managed Identity in its current form not allowing you to bring your own identity prior to cluster creation, as you've seen. This ticket (#1591) tracks that capability, which is on track to release in a few weeks. Could we continue the conversation about your needs on that issue? |
Hello, would this fix impact the custom role documentation found here?: We also have a hub-and-spoke model, and the Enterprise is sensitive about who has full Network Contributor rights. We generally try to avoid custom roles, but have one for this case. We found we had to add the following two permissions to get through new deployments:
|
@joshhaysky that is correct; for this additional custom RT scenario you'll need additional permissions. We will have that document updated today. |
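(For anyone assembling such a role: a sketch of what the definition might look like. The subnet actions are from the existing AKS kubenet docs; the two routeTables actions are my best guess at the additions being referred to - the elided list above did not survive the export - so verify them against the updated document:)

```json
{
  "Name": "AKS Kubenet Network Role (sketch)",
  "IsCustom": true,
  "Description": "Illustrative custom role for AKS with a pre-existing subnet and route table.",
  "Actions": [
    "Microsoft.Network/virtualNetworks/subnets/join/action",
    "Microsoft.Network/virtualNetworks/subnets/read",
    "Microsoft.Network/routeTables/routes/read",
    "Microsoft.Network/routeTables/routes/write"
  ],
  "AssignableScopes": [
    "/subscriptions/<subscription-id>"
  ]
}
```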
@jluk If we get it right though, AKS deployment should map the required Role assignments itself, without the user having to set them up manually. |
Hi,
When integrating an AKS cluster with a custom VNET we are seeing strange networking behaviour: if the cluster has 2 or more nodes, inter-node networking does not appear to work and pods cannot communicate with one another.
Test scenario:
The cluster builds fine, but it appears that whilst both minions are present in the route table within the MC_* resource group, the custom subnet is not associated; if we manually make the association, networking behaves as expected and things start working.
The next problem is that when we re-execute Terraform - as the route_table_id is not exported from the azurerm_kubernetes_cluster resource - we overwrite this manual association of subnet to route table.
As a workaround we have automated the association of the subnet to the route table and added this to our subnet resource:
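(The snippet referenced here was lost in the export; the addition to the subnet resource was presumably a lifecycle block along these lines - azurerm 1.x-era syntax, names assumed:)

```hcl
resource "azurerm_subnet" "aks" {
  name                 = "aks-subnet"
  resource_group_name  = azurerm_resource_group.net.name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefix       = "10.240.0.0/16"

  # Keep subsequent plans from reverting the out-of-band association
  # of the AKS-created route table described above.
  lifecycle {
    ignore_changes = [route_table_id]
  }
}
```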