azure-cni may leak IP allocations after failing to ADD them to a pod #214

@PatrickLang

Description
Is this a request for help?: No


Is this an ISSUE or FEATURE REQUEST? (choose one): Issue


Which release version?: master + cherry-pick of #212


Which component (CNI/IPAM/CNM/CNS): CNI


Which Operating System (Linux/Windows): Windows Server version 1803


Which Orchestrator and version (e.g. Kubernetes, Docker): Kubernetes


What happened:

After scaling up a replica set, some containers failed to start. When this happened, their IPs were not freed. Here's an example of the end state after scaling back down: only one pod IP should be in use on the node, but three are marked as in use in the IPAM file.

kubectl get pod -o wide
NAME                           READY     STATUS    RESTARTS   AGE       IP             NODE
psh-5d98ff98b5-qpbjv           1/1       Running   0          18h       10.240.0.141   k8s-linuxpool-13955535-1
whoami-1803-78fd64846f-lq9m7   1/1       Running   0          18h       10.240.0.99    13955k8s9001



# Run on 13955k8s9001
(get-content c:\k\azure-vnet-ipam.json | convertfrom-json).IPAM.AddressSpaces.local.Pools.'10.240.0.0/12'.Addresses


10.240.0.100 : @{ID=; Addr=10.240.0.100; InUse=False}
10.240.0.101 : @{ID=; Addr=10.240.0.101; InUse=False}
10.240.0.102 : @{ID=; Addr=10.240.0.102; InUse=False}
10.240.0.103 : @{ID=; Addr=10.240.0.103; InUse=False}
10.240.0.104 : @{ID=; Addr=10.240.0.104; InUse=False}
10.240.0.105 : @{ID=; Addr=10.240.0.105; InUse=False}
10.240.0.106 : @{ID=; Addr=10.240.0.106; InUse=False}
10.240.0.107 : @{ID=; Addr=10.240.0.107; InUse=False}
10.240.0.108 : @{ID=; Addr=10.240.0.108; InUse=False}
10.240.0.109 : @{ID=; Addr=10.240.0.109; InUse=False}
10.240.0.110 : @{ID=; Addr=10.240.0.110; InUse=False}
10.240.0.111 : @{ID=; Addr=10.240.0.111; InUse=False}
10.240.0.112 : @{ID=; Addr=10.240.0.112; InUse=False}
10.240.0.113 : @{ID=; Addr=10.240.0.113; InUse=True}
10.240.0.114 : @{ID=; Addr=10.240.0.114; InUse=False}
10.240.0.115 : @{ID=; Addr=10.240.0.115; InUse=False}
10.240.0.116 : @{ID=; Addr=10.240.0.116; InUse=False}
10.240.0.117 : @{ID=; Addr=10.240.0.117; InUse=False}
10.240.0.118 : @{ID=; Addr=10.240.0.118; InUse=False}
10.240.0.119 : @{ID=; Addr=10.240.0.119; InUse=False}
10.240.0.120 : @{ID=; Addr=10.240.0.120; InUse=False}
10.240.0.121 : @{ID=; Addr=10.240.0.121; InUse=False}
10.240.0.122 : @{ID=; Addr=10.240.0.122; InUse=False}
10.240.0.123 : @{ID=; Addr=10.240.0.123; InUse=False}
10.240.0.124 : @{ID=; Addr=10.240.0.124; InUse=False}
10.240.0.125 : @{ID=; Addr=10.240.0.125; InUse=True}
10.240.0.126 : @{ID=; Addr=10.240.0.126; InUse=False}
10.240.0.97  : @{ID=; Addr=10.240.0.97; InUse=False}
10.240.0.98  : @{ID=; Addr=10.240.0.98; InUse=False}
10.240.0.99  : @{ID=; Addr=10.240.0.99; InUse=True}
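
The mismatch can be checked programmatically as well. Below is a minimal sketch (not part of azure-cni) that parses the IPAM state file and reports addresses marked `InUse` that no running pod owns; the JSON layout (`IPAM.AddressSpaces.local.Pools.<prefix>.Addresses`, with `ID`/`Addr`/`InUse` fields) is assumed from the PowerShell output above, and the `leaked_candidates` helper name is mine:

```python
import json

def leaked_candidates(path, expected_ips):
    """Return IPAM addresses marked InUse that are not in expected_ips.

    path         -- the azure-cni IPAM state file, e.g. c:\\k\\azure-vnet-ipam.json
    expected_ips -- pod IPs actually scheduled on this node (from `kubectl get pod -o wide`)
    """
    with open(path) as f:
        state = json.load(f)
    # Structure assumed from the output above; adjust if your version differs.
    pools = state["IPAM"]["AddressSpaces"]["local"]["Pools"]
    in_use = [
        entry["Addr"]
        for pool in pools.values()
        for entry in pool["Addresses"].values()
        if entry["InUse"]
    ]
    return sorted(ip for ip in in_use if ip not in expected_ips)
```

For the state shown above, passing `{"10.240.0.99"}` (the only pod on 13955k8s9001) would flag `10.240.0.113` and `10.240.0.125` as leaked.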

What you expected to happen:

No leaks


How to reproduce it (as minimally and precisely as possible):

# cordon all Windows nodes except 1
kubectl apply -f https://raw.githubusercontent.com/PatrickLang/Windows-K8s-Samples/master/HyperVExamples/whoami-1803.yaml
kubectl scale deploy whoami-1803 --replicas=6
# wait some time, not all 6 will start successfully
kubectl scale deploy whoami-1803 --replicas=1

Anything else we need to know:

Found this while testing the fix for #195.
