This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Template resource length limit exceeded #1159

Closed
lachie83 opened this issue Aug 2, 2017 · 21 comments

Comments

@lachie83
Member

lachie83 commented Aug 2, 2017

Is this a request for help?: Yes


Is this an ISSUE or FEATURE REQUEST? (choose one): ISSUE


What version of acs-engine?: PR #1143


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm) Kubernetes

What happened:

Trying to deploy a default cluster using the default cluster.json with the following modifications:

      "kubernetesConfig": {
        "enableRbac": true,
        "networkPolicy": "calico"
      }

FATA[0014] resources.DeploymentsClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidTemplate" Message="Deployment template validation failed: 'The template resource 'k8s-master-42902998-0' at line '1' and column '27924' is not valid: The language expression length limit exceeded. Limit: '24576' and actual: '25975'.. Please see https://aka.ms/arm-template-expressions for usage details.'."
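
To see how far over the limit the expression is, the two numbers can be pulled out of the error text. This is only a minimal sketch: the function name `parse_length_error` is hypothetical, and the regex assumes the exact wording of the message quoted above.

```python
import re

# Pattern assumes the wording of the ARM error quoted above.
_LIMIT_RE = re.compile(r"Limit: '(\d+)' and actual: '(\d+)'")


def parse_length_error(message):
    """Return (limit, actual, overage), or None if the message does not match."""
    match = _LIMIT_RE.search(message)
    if match is None:
        return None
    limit, actual = int(match.group(1)), int(match.group(2))
    return limit, actual, actual - limit


msg = ("The language expression length limit exceeded. "
       "Limit: '24576' and actual: '25975'..")
print(parse_length_error(msg))  # (24576, 25975, 1399)
```

For the error above, the expression is 1399 characters over the limit.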

I believe this is happening on the customData field on the agentPool

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:

@lachie83
Member Author

lachie83 commented Aug 2, 2017

related #18

@lachie83
Member Author

lachie83 commented Aug 2, 2017

I think as an interim solution I can try collapsing all the manifests for a single application into one YAML list. E.g., tiller-deploy and tiller-svc can become simply tiller.yaml. Thoughts?
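
A rough sketch of what collapsing could look like, assuming "one yaml list" means a single multi-document YAML file with `---` separators (it could equally mean a `kind: List` object). The helper name `combine_manifests` and the manifest bodies are placeholders, not the real tiller manifests, and the thread does not establish how much this shortens the expression.

```python
def combine_manifests(manifests):
    """Join YAML documents into one multi-document YAML string."""
    # Drop empty inputs and trim surrounding newlines before joining.
    docs = [m.strip("\n") for m in manifests if m.strip()]
    return "\n---\n".join(docs) + "\n"


# Placeholder manifests, for illustration only.
tiller_deploy = (
    "apiVersion: extensions/v1beta1\n"
    "kind: Deployment\n"
    "metadata:\n"
    "  name: tiller-deploy\n"
)
tiller_svc = (
    "apiVersion: v1\n"
    "kind: Service\n"
    "metadata:\n"
    "  name: tiller-deploy\n"
)

tiller_yaml = combine_manifests([tiller_deploy, tiller_svc])
print(tiller_yaml)
```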

@lachie83
Member Author

lachie83 commented Aug 2, 2017

Couple of thoughts on my end.

  • Provision a blob per cluster, place the file in there, and download it onto disk
  • Investigate some sort of compression?
  • Increase the max field size on ARM

@anhowe
Contributor

anhowe commented Aug 2, 2017

This is the ARM length limit:

"customData": "[concat(variables('part1'), variables('part2'), variables('part3'))]"

Also, we were diligent early on about gzipping all content, so we should just make sure content is still gzipped.
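
One way to check whether a piece of content is small enough after gzipping is to measure its encoded length. This is only a sketch: it assumes the gzipped bytes are base64-encoded before being embedded (the thread does not say so), the helper name `encoded_length` is hypothetical, and the manifest text is a made-up placeholder.

```python
import base64
import gzip

OLD_LIMIT = 24576  # limit reported in the error message in this issue


def encoded_length(text):
    """Length of text after gzip compression and base64 encoding."""
    compressed = gzip.compress(text.encode("utf-8"))
    return len(base64.b64encode(compressed))


# Placeholder content; real manifests compress less well than repeated text.
manifest = "apiVersion: v1\nkind: Service\nmetadata:\n  name: example\n" * 200

raw_len = len(manifest)
packed_len = encoded_length(manifest)
print(raw_len, packed_len, packed_len < OLD_LIMIT)
```

Base64 adds roughly a third on top of the compressed size, so the gain depends on how well the content compresses.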

@lachie83
Member Author

lachie83 commented Aug 2, 2017

confirmed that it's all gzipped.

@lachie83
Member Author

lachie83 commented Aug 2, 2017

#1162 for temp workaround

dtzar added a commit that referenced this issue Sep 26, 2017
@pidah
Contributor

pidah commented Oct 2, 2017

@lachie83 I am running into this issue on the 0.7.0 release (k8s 1.7.5). I tried deploying the exact config that previously worked on k8s 1.7.2. Is it possible for the ARM limit to be increased? We have more customizations that we have not yet applied.

@ams0

ams0 commented Oct 4, 2017

Got the same when trying to deploy calico+maxpods+rbac+aadprofile+customvnet, using the latest commit (2cd215d) and k8s 1.8.

@zimmertr

zimmertr commented Oct 4, 2017

Can confirm with k8s 1.8 + calico + rbac + custom vnet with 5b57309 as well.

@khaldoune

@lachie83
Hi,
Any updates please? I've got the same issue with acs-engine v0.8.
Thanks.

@khaldoune

@lachie83
This is the template that I used:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.7",
      "kubernetesConfig": {
         "enableRbac": true,
         "networkPolicy": "calico",
         "clusterSubnet": "10.5.4.0/23",
         "maxPods": 200,
         "serviceCidr": "10.5.8.0/23",
         "dnsServiceIP": "10.5.8.10"
      }
    },
    "masterProfile": {
      "count": 3,
      "dnsPrefix": "k8sdev8",
      "vmSize": "Standard_D2",
      "vnetSubnetId": "/subscriptions/b234e268-db65-4037-ab3e-052665760654....../resourceGroups/k8s-armv8/providers/Microsoft.Network/virtualNetworks/k8s-vnet-test/subnets/masters",
      "firstConsecutiveStaticIP": "10.5.10.6"
    },
    "agentPoolProfiles": [
      {
        "name": "aparmv0",
        "count": 2,
        "vmSize": "Standard_DS2",
        "availabilityProfile": "AvailabilitySet",
        "dnsPrefix": "",
        "vnetSubnetId": "/subscriptions/b234e268-db65-4037-ab3e-052665760654....../resourceGroups/k8s-armv8/providers/Microsoft.Network/virtualNetworks/k8s-vnet-test/subnets/pods"
      }
    ],
    "linuxProfile": {
      "adminUsername": "k8s",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "ssh-rsa xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "38d27248-...............",
      "secret": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    }
  }
}

Thanks.

@lachie83
Member Author

cc @anhowe

@anhowe
Contributor

anhowe commented Oct 12, 2017

cc @jchauncey

@jalberto

Is there a working workaround? #1162 is not working in 0.8.0

@jchauncey
Contributor

So RBAC is what seems to push this over the limit. Custom VNETs + Calico are perfectly OK.

@mikhail-manuilov

az group deployment create --name "kube-prod2" --resource-group "ContainerProd" --template-file "./_output/kube-prod2/azuredeploy.json" --parameters "./_output/kube-prod2/azuredeploy.parameters.json"

Deployment failed. Deployment template validation failed: 'The template resource 'k8s-master-23453153-0' at line '1' and column '96087' is not valid: The language expression length limit exceeded. Limit: '24576' and actual: '24759'.. Please see https://aka.ms/arm-template-expressions for usage details.'.

This happens after running acs-engine generate with the template below:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "kubernetesConfig": {
        "networkPolicy": "calico"
      }
    },
    "masterProfile": {
      "count": 3,
      "dnsPrefix": "kube-prod2",
      "vmSize": "Standard_D2_v2"
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 10,
        "vmSize": "Standard_D3_v2",
        "availabilityProfile": "AvailabilitySet"
      }
    ],
    "linuxProfile": {
      "adminUsername": "xxxx",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "ssh-rsa xxx"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "da5e64b1-xxxx-xxx-xxxx-xxxxxxxxxxxx",
      "secret": "dadd8745-xxxx-xxx-xxxx-xxxxxxxxxxxx"
    }
  }
}

@zimmertr

@jchauncey

Last I heard this was supposed to be increased on 11/10. I'm still unable to provision an instance with RBAC enabled.

What is the new ETA for the template length restriction to be increased?

@jchauncey
Contributor

So the change was supposed to finish rolling out today by 2pm PST. Let me know if you have any issues after that.

@zimmertr

zimmertr commented Nov 15, 2017

@jchauncey I provisioned a cluster today, 11/15, at 11:07AM using a template I generated yesterday at 9AM. This template was once working (~1 month ago), but is no longer working.

Looks like I'm still having the same issue.

kubectl get nodes

Error from server: client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://127.0.0.1:2379 has no leader

etcdctl cluster-health

member 33b0bb17e22c066a is unhealthy: got unhealthy result from http://172.17.38.30:2379

kubectl cluster-info

Kubernetes master is running at https://dev-we-m.redacted.com

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

kubectl cluster-info dump

Error from server: client: etcd cluster is unavailable or misconfigured; error #0: client: etcd member http://127.0.0.1:2379 has no leader

journalctl -xe

Nov 15 19:59:27 k8s-master-48084675-0 docker[7056]: W1115 19:59:27.712637    8227 status_manager.go:431] Failed to get status for pod
Nov 15 19:59:27 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost
Nov 15 19:59:29 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost
Nov 15 19:59:30 k8s-master-48084675-0 docker[7056]: I1115 19:59:30.045913    8227 kubelet_node_status.go:503] Using Node Hostname fro
Nov 15 19:59:30 k8s-master-48084675-0 docker[7056]: W1115 19:59:30.648711    8227 cni.go:196] Unable to update cni config: No network
Nov 15 19:59:30 k8s-master-48084675-0 docker[7056]: E1115 19:59:30.648803    8227 kubelet.go:2095] Container runtime network not read
Nov 15 19:59:32 k8s-master-48084675-0 etcd[3353]: failed to reach the peerURL(http://172.17.38.32:2380) of member 7f03ea887d8488f0 (G
Nov 15 19:59:32 k8s-master-48084675-0 etcd[3353]: cannot get the version of member 7f03ea887d8488f0 (Get http://172.17.38.32:2380/ver
Nov 15 19:59:33 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost
Nov 15 19:59:33 k8s-master-48084675-0 etcd[3353]: failed to reach the peerURL(http://172.17.38.31:2380) of member a4058da44ae0514c (G
Nov 15 19:59:33 k8s-master-48084675-0 etcd[3353]: cannot get the version of member a4058da44ae0514c (Get http://172.17.38.31:2380/ver
Nov 15 19:59:34 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost
Nov 15 19:59:34 k8s-master-48084675-0 docker[7056]: W1115 19:59:34.714484    8227 status_manager.go:431] Failed to get status for pod
Nov 15 19:59:34 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost
Nov 15 19:59:35 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost
Nov 15 19:59:35 k8s-master-48084675-0 docker[7056]: W1115 19:59:35.649701    8227 cni.go:196] Unable to update cni config: No network
Nov 15 19:59:35 k8s-master-48084675-0 docker[7056]: E1115 19:59:35.650205    8227 kubelet.go:2095] Container runtime network not read
Nov 15 19:59:37 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost
Nov 15 19:59:37 k8s-master-48084675-0 docker[7056]: E1115 19:59:37.050694    8227 kubelet_node_status.go:390] Error updating node sta
Nov 15 19:59:39 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost
Nov 15 19:59:40 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost
Nov 15 19:59:40 k8s-master-48084675-0 etcd[3353]: failed to reach the peerURL(http://172.17.38.32:2380) of member 7f03ea887d8488f0 (G
Nov 15 19:59:40 k8s-master-48084675-0 etcd[3353]: cannot get the version of member 7f03ea887d8488f0 (Get http://172.17.38.32:2380/ver
Nov 15 19:59:40 k8s-master-48084675-0 docker[7056]: W1115 19:59:40.651057    8227 cni.go:196] Unable to update cni config: No network
Nov 15 19:59:40 k8s-master-48084675-0 docker[7056]: E1115 19:59:40.651277    8227 kubelet.go:2095] Container runtime network not read
Nov 15 19:59:41 k8s-master-48084675-0 etcd[3353]: etcdserver: request timed out, possibly due to connection lost

@dtzar
Contributor

dtzar commented Nov 16, 2017

@zimmertr this is no longer related to the length issue and is tracked in the existing issue 1621. Closing this issue since the limit has been lifted.

@JunSun17
Collaborator

JunSun17 commented Dec 7, 2017

Confirmed the language expression length limit has been bumped from 24576 to 81920.

I also tested deployment with the inputs specified in lachie83's and mikhail-manuilov's comments and cannot reproduce this issue.
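
As a quick sanity check, the two "actual" lengths quoted in the error messages earlier in this thread can be compared against the old and new limits. A small sketch; the helper name `headroom` is hypothetical and the numbers are copied from the messages above.

```python
OLD_LIMIT = 24576
NEW_LIMIT = 81920

# "actual" lengths quoted in the two error messages in this thread
reported = {"lachie83": 25975, "mikhail-manuilov": 24759}


def headroom(actual, limit):
    """Characters left before the limit is hit (negative means over)."""
    return limit - actual


for user, actual in reported.items():
    print(user, headroom(actual, OLD_LIMIT), headroom(actual, NEW_LIMIT))
# lachie83 -1399 55945
# mikhail-manuilov -183 57161
```

Both expressions were over the old limit and sit well under the new one.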
