CKS: firewall and cluster scaling problem if the default firewall rules are deleted #11779

@baltazorbest

Description

Problem

After creating a k8s cluster and removing the default firewall rules, I can no longer scale the cluster; scaling fails with a network error:

2025-10-02 13:02:14,919 WARN [o.a.c.m.w.WebhookServiceImpl] (API-Job-Executor-36:[ctx-11ddf09f, job-9391, ctx-04c8b6be, ctx-35bd1fab, ctx-42198853]) (logid:2fbf4611) Skipping delivering event Event {"description":"{"event":"VM.START","status":"Completed"}","eventId":null,"eventType":"VM.START","eventUuid":null,"resourceType":"VirtualMachine","resourceUUID":null} to any webhook as account ID is missing
2025-10-02 13:02:14,919 WARN [o.a.c.f.e.EventDistributorImpl] (API-Job-Executor-36:[ctx-11ddf09f, job-9391, ctx-04c8b6be, ctx-35bd1fab, ctx-42198853]) (logid:2fbf4611) Failed to publish event [category: ActionEvent, type: VM.START] on bus webhookEventBus
2025-10-02 13:02:14,936 ERROR [c.c.k.c.a.KubernetesClusterScaleWorker] (API-Job-Executor-36:[ctx-11ddf09f, job-9391, ctx-04c8b6be]) (logid:2fbf4611) Scaling failed for Kubernetes cluster : my-k8s, unable to update network rules com.cloud.exception.ManagementServerException: Firewall rule for node SSH access can't be provisioned
at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterIsolatedNetworkRules(KubernetesClusterScaleWorker.java:128)
at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterNetworkRules(KubernetesClusterScaleWorker.java:176)
at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleUpKubernetesClusterSize(KubernetesClusterScaleWorker.java:388)
at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleKubernetesClusterSize(KubernetesClusterScaleWorker.java:424)
at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterScaleWorker.scaleCluster(KubernetesClusterScaleWorker.java:477)
at com.cloud.kubernetes.cluster.KubernetesClusterManagerImpl.scaleKubernetesCluster(KubernetesClusterManagerImpl.java:1767)
at jdk.internal.reflect.GeneratedMethodAccessor1219.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:105)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:52)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
at jdk.proxy3/jdk.proxy3.$Proxy517.scaleKubernetesCluster(Unknown Source)
at org.apache.cloudstack.api.command.user.kubernetes.cluster.ScaleKubernetesClusterCmd.execute(ScaleKubernetesClusterCmd.java:160)
at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:173)
at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:110)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:652)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:600)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)

Versions

OS is Ubuntu 22.04
CloudStack version is 4.20.1
K8s version is v1.33.1-calico-x86_64
Primary storage is Ceph RBD 19.2.3
Libvirt version is 8.0.0-1ubuntu7.12

The steps to reproduce the bug

  1. Create a network with any subnet (e.g., 10.10.10.1/24).
  2. Create a k8s cluster in HA mode with one worker node, using the network created in step 1.
  3. Remove the default firewall rules:
  • 0.0.0.0/0 TCP 6443 6443
  • 0.0.0.0/0 TCP 2222 2225
  4. Add new firewall rules:
  • 10.10.10.1/24 TCP 1 65534
  • 1.1.1.1/32 TCP 1 65534
  5. Try to scale the cluster to two worker nodes (see the cmk sketch after this list).
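
For reference, a minimal sketch of the same sequence driven through the CloudStack API with CloudMonkey (cmk); the UUIDs are placeholders of my own, not values from this environment:

  # List the default rules CKS created on the cluster's public IP
  cmk list firewallrules ipaddressid=<public-ip-uuid>
  # Delete the two default rules (TCP 6443 and TCP 2222-2225)
  cmk delete firewallrule id=<rule-tcp-6443-uuid>
  cmk delete firewallrule id=<rule-tcp-2222-2225-uuid>
  # Add the replacement rules
  cmk create firewallrule ipaddressid=<public-ip-uuid> protocol=tcp startport=1 endport=65534 cidrlist=10.10.10.1/24
  cmk create firewallrule ipaddressid=<public-ip-uuid> protocol=tcp startport=1 endport=65534 cidrlist=1.1.1.1/32
  # Scale from one to two worker nodes
  cmk scale kubernetescluster id=<cluster-uuid> size=2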

Result:
The scale operation fails with the error above, although the new worker instance is created.

Workaround
When using the following firewall rules instead:

  • 10.10.10.1/24 TCP 6443 6443
  • 10.10.10.1/24 TCP 2222 2225

→ Scaling the cluster works correctly.
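
In cmk terms (placeholder UUID again), the workaround amounts to recreating the two default rules with a restricted source CIDR rather than deleting them outright:

  cmk create firewallrule ipaddressid=<public-ip-uuid> protocol=tcp startport=6443 endport=6443 cidrlist=10.10.10.1/24
  cmk create firewallrule ipaddressid=<public-ip-uuid> protocol=tcp startport=2222 endport=2225 cidrlist=10.10.10.1/24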

Additional Issues Observed

  1. Opening SSH (2222–2225) and k8s management (6443) to 0.0.0.0/0 is a security risk.
  2. When the cluster enters the Alert state, it is impossible to repair it; the only actions available are stop and delete.
  • After stopping and starting the cluster, it changes state to Running. The new worker instance is created, but it does not join the Kubernetes cluster (it is not present in the node list; see the commands below).
  • However, scaling the cluster is still not possible, and deleting an individual instance also fails.
  • The only option left is to remove the entire cluster and create it again.
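
To verify the second issue, one can compare CloudStack's view of the cluster with the node list Kubernetes itself reports (placeholder UUID; kubeconfig as downloaded from the CloudStack UI, the my-k8s.conf filename being my assumption):

  # CloudStack's view of the cluster and its node VMs
  cmk list kubernetesclusters id=<cluster-uuid>
  # Kubernetes' own view: after stop/start the new worker VM is running but absent here
  kubectl --kubeconfig my-k8s.conf get nodes -o wide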

What to do about it?

No response
