Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where IP addresses of devices in unknown state are resolved as remote-node #17418

Merged
merged 1 commit into from Sep 29, 2021

Conversation

jibi
Copy link
Member

@jibi jibi commented Sep 16, 2021

In initExcludedIPs() we build a list of IPs that Cilium needs to exclude
to operate. One check to determine if an IP should be excluded is based
on the state of the net device: if the device is not up, then its IPs
are excluded.

Unfortunately, this check is not enough, as it's possible to have a
device reporting an unknown state (because its driver is missing the
operstate handling, e.g. a dummy device) while still being operational.

This commit changes the logic in initExcludedIPs() to not exclude IPs of
devices reporting an unknown state.

@jibi jibi added area/daemon Impacts operation of the Cilium daemon. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.10 labels Sep 16, 2021
@jibi jibi requested review from borkmann and a team September 16, 2021 14:08
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.5 Sep 16, 2021
@jibi
Copy link
Member Author

jibi commented Sep 16, 2021

test-me-please

Job 'Cilium-PR-K8s-1.19-kernel-5.4' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sDatapathConfig AutoDirectNodeRoutes Check connectivity with sockops and direct routing

Failure Output

FAIL: Error creating resource /home/jenkins/workspace/Cilium-PR-K8s-1.19-kernel-5.4/src/github.com/cilium/cilium/test/k8sT/manifests/l3-policy-demo.yaml: Cannot retrieve cilium pod cilium-m8vl8 policy revision: cannot get revision from json output '': could not parse JSON from command "kubectl exec -n kube-system cilium-m8vl8 -- cilium policy get -o json"

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-5.4 so I can create a new GitHub issue to track it.

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@bmcustodio bmcustodio mentioned this pull request Sep 16, 2021
@pchaigno
Copy link
Member

The pull request title will be used in release-notes, so maybe could be something as follows to clarify impact for users:

Fix bug where IP addresses of devices in unknown state are resolved as remote-node.

@jibi jibi changed the title node: don't exclude IPs from devices in unknown oper state Fix bug where IP addresses of devices in unknown state are resolved as remote-node Sep 16, 2021
pkg/node/ip_linux.go Outdated Show resolved Hide resolved
In initExcludedIPs() we build a list of IPs that Cilium needs to exclude
to operate. One check to determine if an IP should be excluded is based
on the state of the net device: if the device is not up, then its IPs
are excluded.

Unfortunately, this check is not enough, as it's possible to have a
device reporting an unknown state (because its driver is missing the
operstate handling, e.g. a dummy device) while still being operational.

This commit changes the logic in initExcludedIPs() to not exclude IPs of
devices reporting an unknown state.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
@jibi jibi force-pushed the pr/jibi/fix-init-excluded-ips branch from ecfc53b to 4000367 Compare September 17, 2021 07:50
@jibi
Copy link
Member Author

jibi commented Sep 17, 2021

test-me-please

Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with direct routing Tests LoadBalancer Connectivity to endpoint via LB

Failure Output

FAIL: Can not connect to service "http://192.168.1.146" from outside cluster (1/10)

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.16-net-next so I can create a new GitHub issue to track it.

@jibi
Copy link
Member Author

jibi commented Sep 20, 2021

/mlh new-flake Cilium-PR-K8s-1.16-net-next

👍 created #17437

edit: was already tracked in #16399

@pchaigno
Copy link
Member

Reviews are in. Tests are passing except for flake mentioned above. Marking ready to merge.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 29, 2021
@jibi jibi merged commit 6dbabed into master Sep 29, 2021
@jibi jibi deleted the pr/jibi/fix-init-excluded-ips branch September 29, 2021 11:49
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.5 Sep 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.10.5
Backport done to v1.10
Development

Successfully merging this pull request may close these issues.

None yet

5 participants