Symptoms
Unexpected connectivity.
NPM logs include the error "Error running iptables-nft-save". Make sure the logs you check are not truncated.
Mitigation
- Option 1: Restart the impacted NPM Pod until its logs no longer contain the above error.
- Option 2: Delete all Network Policies.
Details
Azure NPM can incorrectly use legacy iptables instead of the nftables backend (iptables-nft), which NPM and kube-proxy use on Ubuntu 22.04 nodes (the AKS node image since Kubernetes 1.25).
There is a bug in NPM: if the first call to iptables-nft fails with a transient error, such as a segmentation fault, NPM incorrectly concludes that it should use legacy iptables. The fix corrects this detection logic and also improves telemetry to speed up future investigations.