Self-managed k8s cluster: Controller not updating the targets after pod mutation #476

@pmankad96

Description

I am running a self-managed (kOps-based) Kubernetes cluster.

  • Restarted deployment 1 (the read endpoint). The controller is taking a very long time to detect the new pod IPs and update the target group with them, so the targets stay unhealthy. It has now been 20+ minutes and the targets still point to stale IPs (a quick way to diff the live pod IPs against the registered targets is sketched after this list):
Admin:~/environment $ k get pods -n kops-sample-webapp-1 sample-read-endpoint-1-79db8686dd-fls6q -owide
NAME                                      READY   STATUS    RESTARTS   AGE   IP             NODE                  NOMINATED NODE   READINESS GATES
sample-read-endpoint-1-79db8686dd-fls6q   1/1     Running   0          62m   10.100.12.44   i-09c46490ebd9d7a7a   <none>           <none>
Admin:~/environment $ k get pods -n kops-sample-webapp-1 sample-read-endpoint-1-79db8686dd-7gr4t  -owide
NAME                                      READY   STATUS    RESTARTS   AGE   IP             NODE                  NOMINATED NODE   READINESS GATES
sample-read-endpoint-1-79db8686dd-7gr4t   1/1     Running   0          63m   10.100.11.72   i-06976711a4a8d5bda   <none>           <none>
Admin:~/environment $ 

Admin:~/environment $ aws vpc-lattice list-targets --target-group-identifier tg-0f64ed0c67e307b45
{
    "items": [
        {
            "id": "10.100.11.6",
            "port": 80,
            "reasonCode": "ConnectionTimeout",
            "status": "UNHEALTHY"
        },
        {
            "id": "10.100.12.108",
            "port": 80,
            "reasonCode": "ConnectionTimeout",
            "status": "UNHEALTHY"
        }
    ]
}
Admin:~/environment $
  • Restarted yet another deployment, deployment 2 (the write endpoint). Meanwhile, the new pods from the deployment 1 restart claimed the stale IPs that were still registered in the write endpoint's target group, so those targets are now marked healthy (they were never deregistered, and the health checks now succeed against the read pods). Traffic intended for the write endpoint is therefore served by the read endpoint, while traffic to the read endpoint hangs (a manual cleanup sketch also follows the list):
Admin:~/environment $ k get pods -n kops-sample-webapp-1 sample-write-endpoint-1-84cfb6f8bd-b29pn -owide                                                                                                                     
NAME                                       READY   STATUS    RESTARTS   AGE   IP              NODE                  NOMINATED NODE   READINESS GATES
sample-write-endpoint-1-84cfb6f8bd-b29pn   1/1     Running   0          88m   10.100.12.252   i-09c46490ebd9d7a7a   <none>           <none>
Admin:~/environment $ k get pods -n kops-sample-webapp-1 sample-write-endpoint-1-84cfb6f8bd-vxfhn -owide
NAME                                       READY   STATUS    RESTARTS   AGE   IP             NODE                  NOMINATED NODE   READINESS GATES
sample-write-endpoint-1-84cfb6f8bd-vxfhn   1/1     Running   0          88m   10.100.11.57   i-06976711a4a8d5bda   <none>           <none>
Admin:~/environment $ 

Admin:~/environment $ aws vpc-lattice list-targets --target-group-identifier tg-095f618fb72e3c199
{
    "items": [
        {
            "id": "10.100.12.44",
            "port": 80,
            "status": "HEALTHY"
        },
        {
            "id": "10.100.11.72",
            "port": 80,
            "status": "HEALTHY"
        }
    ]
}
Admin:~/environment $ 
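
For anyone reproducing this, a quick way to compare the live pod IPs against what Lattice has registered. The label selector app=sample-read-endpoint-1 is an assumption; substitute whatever labels the deployment actually carries:

# Current pod IPs for the read deployment (label selector assumed)
kubectl get pods -n kops-sample-webapp-1 -l app=sample-read-endpoint-1 -o jsonpath='{.items[*].status.podIP}'

# IPs currently registered in the corresponding target group
aws vpc-lattice list-targets --target-group-identifier tg-0f64ed0c67e307b45 --query 'items[].id' --output text

If the two sets differ, the controller has not reconciled the target group.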
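As a temporary workaround until the controller reconciles, the stale entries can be swapped out by hand. A sketch using the stale and new IPs from the first target group above; this only papers over the controller bug, it does not fix it:

# Deregister the stale targets shown as UNHEALTHY above
aws vpc-lattice deregister-targets --target-group-identifier tg-0f64ed0c67e307b45 --targets id=10.100.11.6,port=80 id=10.100.12.108,port=80

# Register the current read pod IPs in their place
aws vpc-lattice register-targets --target-group-identifier tg-0f64ed0c67e307b45 --targets id=10.100.12.44,port=80 id=10.100.11.72,port=80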
