Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

devices: allow local route entries in device mngr #24608

Merged

Conversation

oblazek
Copy link
Contributor

@oblazek oblazek commented Mar 28, 2023

Related slack thread: https://cilium.slack.com/archives/C2B917YHE/p1679920009513589

Previously when --enable-runtime-device-detection was enabled the code skipped local route table.
This means if device route was only present in the local table it was skipped and no datapath program
was attached.

In case of a dummy device with e.g. anycast IP it would be necessary to remove this check.
Known unwanted links are still filtered out and the rest can be narrowed down by using --devices flag.
This PR fixes that.

Signed-off-by: Ondrej Blazek ondrej.blazek@firma.seznam.cz

Allow devices from local route table to be used for datapath programs. 

@oblazek oblazek requested a review from a team as a code owner March 28, 2023 13:17
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 28, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Mar 28, 2023
@joamaki joamaki added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Mar 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 28, 2023
@joamaki joamaki self-requested a review March 28, 2023 13:31
@joamaki
Copy link
Contributor

joamaki commented Mar 29, 2023

/test

@oblazek
Copy link
Contributor Author

oblazek commented Mar 30, 2023

ah, failure seems legit, taking a look

Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@oblazek oblazek force-pushed the ob-runtime-devices-local-route-entries branch from 9c4f518 to 6cde854 Compare March 30, 2023 11:28
@oblazek
Copy link
Contributor Author

oblazek commented Mar 30, 2023

forced pushed with a fixed test file..

@oblazek
Copy link
Contributor Author

oblazek commented Mar 31, 2023

the failures don't look related.. could be a flake?

@joestringer
Copy link
Member

If the issue is a flake, then I would expect to be able to find the test failure + symptoms documented in another issue. If there isn't an issue, then it could either be this PR or others have not yet encountered the issue such that we've had a chance to file an issue for it. It looks like there's a bunch of failures, may be worth investigating the first failure (make sure it's the first one that ran&failed) to see whether the initial failure somehow left the test cluster in a bad state that caused later failures. Either way if we think this is not related to your PR, I'd encourage you to file an issue with the failure details so that others can find that issue as well. If it's generally affecting the main branch then others will likely hit the issue and chime in on the filed issue soon enough. If you're the only one observing the issue, then that increases the odds that there's some aspect related to your changes.

@oblazek oblazek force-pushed the ob-runtime-devices-local-route-entries branch from 6cde854 to 5a05666 Compare April 11, 2023 06:06
@oblazek
Copy link
Contributor Author

oblazek commented Apr 12, 2023

got this error for the ci-eks

2023-04-11T11:19:55.991006262Z level=warning msg="Unable to synchronize EC2 VPC list" error="operation error EC2: DescribeVpcs, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://ec2.us-east-2.amazonaws.com/\": dial tcp: lookup ec2.us-east-2.amazonaws.com on 10.100.0.10:53: write udp 192.168.127.65:37304->10.100.0.10:53: write: operation not permitted" subsys=eni
2023-04-11T11:19:55.991349965Z level=fatal msg="Unable to start eni allocator" error="Initial synchronization with instances API failed" subsys=cilium-operator-aws

@oblazek
Copy link
Contributor Author

oblazek commented Apr 12, 2023

Anyone has encountered the Setup&Test failures? I can't see much in the sysdump as it doesn't include cilium-agent logs.. only operator, which seems fine.

@oblazek oblazek force-pushed the ob-runtime-devices-local-route-entries branch from 5a05666 to 9ed7e4f Compare April 12, 2023 08:53
@joamaki
Copy link
Contributor

joamaki commented Apr 17, 2023

/test

@oblazek oblazek force-pushed the ob-runtime-devices-local-route-entries branch from 9ed7e4f to 6bbb40d Compare April 17, 2023 11:55
@joestringer
Copy link
Member

joestringer commented May 5, 2023

/test

Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With native routing and endpoint routes

Failure Output

FAIL: Error deleting resource /home/jenkins/workspace/Cilium-PR-K8s-1.24-kernel-5.4/src/github.com/cilium/cilium/test/k8s/manifests/host-policies.yaml: Cannot retrieve "cilium-8crvt"'s policy revision: cannot get policy revision: ""

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.24-kernel-5.4/1996/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.24-kernel-5.4 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@joestringer
Copy link
Member

A few of the failures look familiar from a search, but some of them do not look familiar. Do you think that they could be related to the changes being made in this PR?

@oblazek
Copy link
Contributor Author

oblazek commented May 6, 2023

I will try to look deeper.. but I am unsure how are these related.. thanks Joe :)

@joestringer
Copy link
Member

Given the Jenkins outage we had last week, you may need to rebase the PR again and re-run /test in order to gather data on the failures again. The old links are expired now.

@oblazek oblazek force-pushed the ob-runtime-devices-local-route-entries branch 2 times, most recently from 4a7b867 to 22224ec Compare May 25, 2023 20:01
@oblazek
Copy link
Contributor Author

oblazek commented May 26, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Unexpected missed tail call

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/198/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@oblazek oblazek force-pushed the ob-runtime-devices-local-route-entries branch from 22224ec to 246931d Compare June 2, 2023 06:34
Allow devices from local route table to be used as a
possible network facing device (to have datapath bpf programs).

Known unwanted links are still filtered out and the rest can
be narrowed down by using `--devices` flag.

Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz>
@oblazek oblazek force-pushed the ob-runtime-devices-local-route-entries branch from 246931d to 0efc045 Compare June 12, 2023 12:45
@oblazek
Copy link
Contributor Author

oblazek commented Jun 12, 2023

rebased to be up to date.. retriggering tests

@oblazek
Copy link
Contributor Author

oblazek commented Jun 12, 2023

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 12, 2023
@borkmann borkmann merged commit 3a6b49c into cilium:main Jun 13, 2023
63 of 65 checks passed
@yingnanzhang666
Copy link
Contributor

yingnanzhang666 commented Jan 24, 2024

This means if device route was only present in the local table it was skipped and no datapath program
was attached.
In case of a dummy device with e.g. anycast IP it would be necessary to remove this check.

@oblazek I cannot see the historical slack thread. For the dummy device with e.g. anycast IP, Can we just add its IP as reserved:host into ipcache without attach datapath program?
The traffic which target is the anycast IP will also go through the bpf program attached to real interface. right? If add the IP as the reserved:host, it will also do the host firewall check in the program attached to the real interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants