[v1.10] daemon, node: Fix faulty router IP restoration logic #16675

Merged
merged 1 commit into cilium:v1.10 from pr/christarazi/v1.10-backport-16672 on Jun 29, 2021

Conversation

christarazi
Member

@christarazi christarazi commented Jun 28, 2021

Once this PR is merged, you can update the PR labels via:

$ for pr in 16672; do contrib/backporting/set-labels.py $pr done 1.10; done

@maintainer-s-little-helper maintainer-s-little-helper bot added the backport/1.10 and kind/backports (This PR provides functionality previously merged into master.) labels on Jun 28, 2021
@christarazi christarazi changed the title daemon, node: Fix faulty router IP restoration logic [v1.10] daemon, node: Fix faulty router IP restoration logic Jun 28, 2021
@christarazi christarazi force-pushed the pr/christarazi/v1.10-backport-16672 branch from 14de0c9 to 890ffaa on June 28, 2021 22:30
@christarazi
Member Author

christarazi commented Jun 28, 2021

test-backport-1.10

this hit #16659

@aanm
Member

aanm commented Jun 29, 2021

@aditighag
Member

k8s-1.21-kernel-4.9 failure - #16659

```
"app2-58757b7dd5-465jl" cannot curl fqdn target "vagrant-cache.ci.cilium.io"
Expected command: kubectl exec -n 202106282353k8shubbletesthubbleobservetestl3l4flow app2-58757b7dd5-465jl -- curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5 --max-time 20 http://vagrant-cache.ci.cilium.io -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'"
To succeed, but it failed:
Exitcode: 28
Err: exit status 28
Stdout:
time-> DNS: '0.022253()', Connect: '0.000000',Transfer '0.000000', total '5.000267'
Stderr:
command terminated with exit code 28
```

[ upstream commit ff63b07 ]

When running in ENI, Alibaba, or any other CRD-backed IPAM mode ("crd"), it
was very likely that `cilium_host` was assigned an additional IP upon each
Cilium restart. Below is a case where Cilium was restarted 3 times, hence
getting 3 additional router IPs:

```
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000
    link/ether 66:03:3c:07:8c:47 brd ff:ff:ff:ff:ff:ff
    inet 192.168.35.9/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 192.168.34.37/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 192.168.57.107/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::6403:3cff:fe07:8c47/64 scope link
       valid_lft forever preferred_lft forever
```

This was because in CRD-backed IPAM modes, we wait until we have fully synced
with K8s in order to derive the VPC CIDR, which becomes the pod CIDR on the
node. Since the router IP restoration logic was using a different pod CIDR
during the router IP validation check, it erroneously discarded the restored
router IP. This was observed with:

```
2021-06-25T13:59:47.816069937Z level=info msg="The router IP (192.168.135.3) considered for restoration does not belong in the Pod CIDR of the node. Discarding old router IP." cidr=10.8.0.0/16 subsys=node
```
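
For illustration, the check behind that log line boils down to a simple CIDR membership test. A minimal sketch in Go (the names `validateRestoredRouterIP` and `restoredIP`, and the 192.168.0.0/16 VPC CIDR, are assumptions for this example, not the actual Cilium symbols or values):

```
package main

import (
	"fmt"
	"net"
)

// validateRestoredRouterIP mirrors the validation described above: a
// previously used router IP is kept only if it falls inside the pod CIDR
// known at the time of the check. If that CIDR is not yet the final
// VPC-derived range, a perfectly valid router IP gets discarded.
func validateRestoredRouterIP(restoredIP net.IP, podCIDR *net.IPNet) bool {
	return podCIDR != nil && restoredIP != nil && podCIDR.Contains(restoredIP)
}

func main() {
	restored := net.ParseIP("192.168.135.3")

	// Pod CIDR seen before the K8s/CRD sync completed (from the log line above).
	_, earlyCIDR, _ := net.ParseCIDR("10.8.0.0/16")
	// Pod CIDR derived from the VPC after the sync (assumed value).
	_, vpcCIDR, _ := net.ParseCIDR("192.168.0.0/16")

	fmt.Println(validateRestoredRouterIP(restored, earlyCIDR)) // false -> router IP discarded
	fmt.Println(validateRestoredRouterIP(restored, vpcCIDR))   // true  -> router IP restored
}
```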

This is problematic because the extraneous router IPs could also be assigned
to pods, which would break pod connectivity.

The fix is to break up the router IP restoration process into 2 parts. The
first is to attempt a restoration of the IP from the filesystem
(`node_config.h`). We also fetch the router IPs from the Kubernetes
resources, since they were already retrieved earlier inside
k8s.WaitForNodeInformation().

Then, after the CRD-backed IPAM is initialized and started, i.e.
(*Daemon).startIPAM() is called, we attempt the second part. This includes
evaluating which IPs (either from the filesystem or from K8s) should be set
as the router IPs. The IPs from the filesystem take precedence. In case the
node was rebooted, the filesystem will have been wiped, so we then rely on
the IPs from the K8s resources. At this point in the daemon initialization,
we have the correct CIDR range as the pod CIDR range with which to validate
the chosen IP.
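
As a rough sketch of how this second part could be expressed (hypothetical names and structure, not the actual daemon code), with the filesystem IP taking precedence, the K8s IP as the reboot fallback, and the final choice validated against the now-correct pod CIDR:

```
package main

import (
	"fmt"
	"net"
)

// selectRouterIP implements the precedence described above: prefer the IP
// restored from the filesystem (node_config.h); if the node was rebooted and
// the filesystem was wiped, fall back to the IP from the K8s resource. The
// chosen IP is accepted only if it belongs to the pod CIDR, which by this
// point (after (*Daemon).startIPAM()) is the correct VPC-derived range.
// All names here are hypothetical, for illustration only.
func selectRouterIP(fromFS, fromK8s net.IP, podCIDR *net.IPNet) net.IP {
	candidate := fromFS
	if candidate == nil {
		candidate = fromK8s
	}
	if candidate == nil || podCIDR == nil || !podCIDR.Contains(candidate) {
		return nil // nothing valid to restore
	}
	return candidate
}

func main() {
	_, podCIDR, _ := net.ParseCIDR("192.168.0.0/16") // assumed VPC-derived pod CIDR

	// Normal restart: node_config.h survived, so its IP wins.
	fmt.Println(selectRouterIP(net.ParseIP("192.168.35.9"), net.ParseIP("192.168.34.37"), podCIDR))

	// Reboot: filesystem wiped, fall back to the IP from the K8s resource.
	fmt.Println(selectRouterIP(nil, net.ParseIP("192.168.34.37"), podCIDR))
}
```

The two calls in main() correspond to the two cases called out in the commit message: a normal restart where `node_config.h` survived, and a reboot where only the K8s resource remains.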

Fixes: beb8bde ("k8s, node: Restore router IPs (`cilium_host`) from
K8s resource")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/v1.10-backport-16672 branch from 890ffaa to 2ee471c on June 29, 2021 21:09
@christarazi christarazi marked this pull request as ready for review June 29, 2021 21:09
@christarazi christarazi requested a review from a team as a code owner June 29, 2021 21:09
@christarazi
Member Author

christarazi commented Jun 29, 2021

CI has passed previously. Force pushed to update the commit reference as the upstream PR was merged.

@aanm aanm merged commit c349779 into cilium:v1.10 Jun 29, 2021
@christarazi christarazi deleted the pr/christarazi/v1.10-backport-16672 branch June 29, 2021 22:55