
egressgw: steer traffic to the right interface using BPF #26215

Merged (6 commits) on Oct 3, 2023


@jibi (Member) commented Jun 14, 2023

see commits

Fixes: #23504

EgressGW: interface selection is now done in BPF, so --install-egress-gateway-routes is no longer needed.
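The sketch below shows roughly what BPF-based interface selection replaces. Instead of Cilium installing dedicated IP rules and routes for the egress IP, a FIB lookup consults the node's existing routing policy, and the packet is redirected if the resulting interface differs from the one it is currently on. This is a user-space model under stated assumptions, not Cilium's datapath code and not the kernel's `bpf_fib_lookup()` helper. It assumes the lookup is keyed on the egress IP as source plus the destination, and `struct rule`, `fib_lookup_model()` and `egress_redirect_target()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only: mimics a FIB lookup that honors the node's
 * source-based policy routing, then decides whether to redirect. */

struct route {
	uint32_t dst;     /* IPv4 destination prefix, host byte order */
	uint8_t dst_len;  /* prefix length, 0..32 */
	int ifindex;      /* output interface */
};

struct rule {                   /* models "from <src>/<len> lookup <table>" */
	uint32_t src;
	uint8_t src_len;
	const struct route *table;
	size_t table_len;
};

static int prefix_match(uint32_t addr, uint32_t prefix, uint8_t len)
{
	/* Avoid shifting a 32-bit value by 32 for the /0 case. */
	uint32_t mask = len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);

	return (addr & mask) == (prefix & mask);
}

/* Longest-prefix match within one table; -1 if nothing matches. */
static int table_lookup(const struct route *t, size_t n, uint32_t daddr)
{
	int best_if = -1, best_len = -1;

	for (size_t i = 0; i < n; i++) {
		if (prefix_match(daddr, t[i].dst, t[i].dst_len) &&
		    t[i].dst_len > best_len) {
			best_len = t[i].dst_len;
			best_if = t[i].ifindex;
		}
	}
	return best_if;
}

/* Walk rules in priority order: the first rule that matches the source
 * and whose table has a route for daddr decides the output interface. */
static int fib_lookup_model(const struct rule *rules, size_t nrules,
			    uint32_t saddr, uint32_t daddr)
{
	for (size_t i = 0; i < nrules; i++) {
		int oif;

		if (!prefix_match(saddr, rules[i].src, rules[i].src_len))
			continue;
		oif = table_lookup(rules[i].table, rules[i].table_len, daddr);
		if (oif >= 0)
			return oif;
	}
	return -1;
}

/* Returns the ifindex to redirect to, or 0 if the packet can stay on
 * cur_ifindex (or if no route was found). */
static int egress_redirect_target(const struct rule *rules, size_t nrules,
				  uint32_t egress_ip, uint32_t daddr,
				  int cur_ifindex)
{
	int oif = fib_lookup_model(rules, nrules, egress_ip, daddr);

	assert(cur_ifindex > 0);
	return (oif > 0 && oif != cur_ifindex) ? oif : 0;
}
```

For example, with a rule for `10.0.1.10/32` pointing at a table whose default route leaves via ifindex 3, and a main table whose default route uses ifindex 2, a packet for `1.0.0.1` that first lands on ifindex 2 gets redirect target 3, while a packet already on ifindex 3 gets 0.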

@jibi added the sig/datapath, release-note/misc, and feature/egress-gateway labels Jun 14, 2023
@jibi force-pushed the pr/jibi/egressgw-fib-lookup branch 4 times, most recently from 203cd6d to 1f1c132, June 15, 2023 08:58
Review thread on bpf/lib/nat.h (outdated, resolved)
@jibi force-pushed the pr/jibi/egressgw-fib-lookup branch 5 times, most recently from 992dc9d to d54a868, June 20, 2023 12:10
@jibi force-pushed the pr/jibi/egressgw-fib-lookup branch from d54a868 to a1b42a2, June 20, 2023 12:24
@jibi changed the base branch from main to pr/jibi/egressgw-bpf-tests June 20, 2023 12:25
Base automatically changed from pr/jibi/egressgw-bpf-tests to main June 20, 2023 15:31
@jibi force-pushed the pr/jibi/egressgw-fib-lookup branch 2 times, most recently from e1050b6 to 2fb9cea, June 20, 2023 15:37
@julianwiedmann added the kind/enhancement label Jul 13, 2023
@jibi force-pushed the pr/jibi/egressgw-fib-lookup branch 5 times, most recently from 1410713 to 157fa1c, August 4, 2023 10:10
@jibi force-pushed the pr/jibi/egressgw-fib-lookup branch 3 times, most recently from 1bef1e9 to eb762e1, August 24, 2023 08:43
@jibi force-pushed the pr/jibi/egressgw-fib-lookup branch 2 times, most recently from 47959bf to 93b6eec, September 13, 2023 10:23
@jibi deleted the pr/jibi/egressgw-fib-lookup branch October 3, 2023 08:01
@margamanterola added the release-note/minor label and removed the release-note/misc label Oct 4, 2023
@gandro (Member) commented Oct 4, 2023

Sorry for the drive-by post-merge comment/review. I just noticed this change while I was investigating how the old feature worked (looking into auto-direct-node-routes improvements). Something that confused me for a bit: The documentation still mentions this flag. Since the flag is now a no-op, should the docs be removed too?

https://docs.cilium.io/en/latest/network/egress-gateway/#eks-s-eni-mode

@julianwiedmann (Member) commented

Yep, we'll still need to go through the whole doc update dance (also mention the deprecation, etc.) 👍.

julianwiedmann added a commit to julianwiedmann/cilium that referenced this pull request Jan 29, 2024
cilium#26215 changed how we do
egressGW-specific routing on the gateway node - instead of installing
custom IP rules, we rely on the node's routing setup.
cilium#30286 then fixed up a corner case on
older kernels.

Reflect both parts in the docs.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Jan 29, 2024
joamaki pushed a commit that referenced this pull request Jan 30, 2024
[ upstream commit e777df1 ]
aanm pushed a commit that referenced this pull request Jan 31, 2024
[ upstream commit e777df1 ]
julianwiedmann added a commit to julianwiedmann/cilium that referenced this pull request May 8, 2024
To let EGW traffic exit the gateway through the correct interface,
we've introduced FIB lookup-driven redirects in the to-netdev path
(cilium#26215). This is needed for cases
where the traffic first hits one interface via the default route, but then
needs to bounce to some other interface that matches the actual egressIP.
In this approach we masquerade the packet on its first pass through
to-netdev, set the SNAT_DONE mark, and then redirect to the actual egress
interface. Due to the SNAT_DONE mark we then skip the SNAT logic in the
second pass through to-netdev.

cilium#29379 then improved the situation for
any EGW traffic that enters the gateway from the overlay network (==
anything that's not sent by a pod on the gateway). We now redirect in
from-overlay, straight to the actual egress interface and masquerade the
packet there.

Now also harmonize the approach for local pods, and defer the masquerade
until the packet hits the actual egress interface. This simplifies the
overall picture, and also allows us to raise TO_NETWORK datapath trace
events that are enriched with the packet's original source IP.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
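The first approach described above (masquerade on the first pass through to-netdev, set SNAT_DONE, redirect, then skip SNAT on the second pass) can be modeled roughly as follows. This is a hypothetical user-space sketch, not Cilium's code: `MARK_SNAT_DONE`, `struct pkt`, `to_netdev_old()` and the `trace_saddr` field are illustrative names, and the mark bit is not Cilium's real mark encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MARK_SNAT_DONE 0x1u   /* hypothetical bit, for illustration only */

struct pkt {
	uint32_t saddr;       /* IPv4 source, host byte order */
	uint32_t mark;        /* stand-in for the packet mark */
	int ifindex;          /* interface whose to-netdev hook sees the packet */
	uint32_t trace_saddr; /* source address visible when the packet leaves */
};

/* Masquerade to the egress IP and record that SNAT was done. */
static void snat_and_mark(struct pkt *p, uint32_t egress_ip)
{
	p->saddr = egress_ip;
	p->mark |= MARK_SNAT_DONE;
}

/* One pass through a modeled to-netdev hook. Returns the ifindex to
 * redirect to, or 0 when the packet may leave through p->ifindex. */
static int to_netdev_old(struct pkt *p, uint32_t egress_ip, int egress_ifindex)
{
	assert(p != NULL);
	if (!(p->mark & MARK_SNAT_DONE))
		snat_and_mark(p, egress_ip);   /* first pass: masquerade early */
	if (p->ifindex != egress_ifindex)
		return egress_ifindex;         /* bounce to the right interface */
	p->trace_saddr = p->saddr;             /* original source already gone */
	return 0;
}
```

Calling `to_netdev_old()` twice mirrors the two passes: the first call rewrites the source and returns the redirect target, and the second call finds the mark set, skips SNAT, and lets the packet leave. By that point the egress IP is the only source address left in the packet.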
julianwiedmann added a commit to julianwiedmann/cilium that referenced this pull request May 14, 2024
To let EGW traffic exit the gateway through the correct interface,
we've introduced FIB lookup-driven redirects in the to-netdev path
(cilium#26215). This is needed for cases
where the traffic first hits one interface via the default route, but then
needs to bounce to some other interface that matches the actual egressIP.
In this approach we masquerade the packet on its first pass through
to-netdev, set the SNAT_DONE mark, and then redirect to the actual egress
interface. Due to the SNAT_DONE mark we then skip the SNAT logic in the
second pass through to-netdev.

cilium#29379 then improved the situation for
any EGW traffic that enters the gateway from the overlay network (==
anything that's not sent by a pod on the gateway). We now redirect in
from-overlay, straight to the actual egress interface and masquerade the
packet there.

Now also harmonize the approach for local pods, and defer the masquerade
until the packet hits the actual egress interface. This simplifies the
overall picture. But it also allows us to raise TO_NETWORK datapath trace
events that are enriched with the packet's original source IP - this event
is raised on the *second* pass through to-netdev, so we need the SNAT to
happen at the same time.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
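The harmonized approach, where masquerading is deferred to the pass on the actual egress interface, can be sketched the same way. Again this is a hypothetical model, not Cilium's code: `struct trace_event` merely stands in for a TO_NETWORK trace record, and `to_netdev_new()` is an illustrative name. Because SNAT and the trace happen in the same (second) pass, the trace can still report the pre-SNAT source address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pkt {
	uint32_t saddr;   /* IPv4 source, host byte order */
	int ifindex;      /* interface whose to-netdev hook sees the packet */
};

struct trace_event {      /* stand-in for a TO_NETWORK-style trace record */
	uint32_t orig_saddr;
	uint32_t saddr;
	int ifindex;
};

/* One pass through a modeled to-netdev hook. Returns the ifindex to
 * redirect to, or 0 once the packet was masqueraded and may leave. */
static int to_netdev_new(struct pkt *p, uint32_t egress_ip,
			 int egress_ifindex, struct trace_event *ev)
{
	uint32_t orig;

	assert(p != NULL && ev != NULL);
	if (p->ifindex != egress_ifindex)
		return egress_ifindex;    /* first pass: redirect, no SNAT yet */

	orig = p->saddr;                  /* still the pod's address here */
	p->saddr = egress_ip;             /* masquerade on the egress device */
	ev->orig_saddr = orig;            /* trace enriched with original source */
	ev->saddr = p->saddr;
	ev->ifindex = p->ifindex;
	return 0;
}
```

A packet that is already on the egress interface, such as EGW traffic redirected there straight from from-overlay, is masqueraded and traced in a single call.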
julianwiedmann added a commit to julianwiedmann/cilium that referenced this pull request May 14, 2024
To let EGW traffic exit the gateway through the correct interface,
we've introduced FIB lookup-driven redirects in the to-netdev path
(cilium#26215). This is needed for cases
where the traffic first hits one interface via the default route, but then
needs to bounce to some other interface that matches the actual egressIP.
In this approach we masquerade the packet on its first pass through
to-netdev, set the SNAT_DONE mark, and then redirect to the actual egress
interface. Due to the SNAT_DONE mark we then skip the SNAT logic in the
second pass through to-netdev.

cilium#29379 then improved the situation for
any EGW traffic that enters the gateway from the overlay network (==
anything that's not sent by a pod on the gateway). We now redirect in
from-overlay, straight to the actual egress interface and masquerade the
packet there.

Now also harmonize the approach for local pods, and defer the masquerade
until the packet hits the actual egress interface. This simplifies the
overall picture. But it also allows us to raise TO_NETWORK datapath trace
events that are enriched with the packet's original source IP - this event
is raised on the *second* pass through to-netdev, so we need the SNAT to
happen at the same time.

Also add a comment to clarify the check to skip HostFW for SNATed traffic.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request May 23, 2024
joamaki pushed three commits that referenced this pull request May 30, 2024
[ upstream commit cf6b203 ]

Labels
feature/egress-gateway: Impacts the egress IP gateway feature.
kind/enhancement: This would improve or streamline existing functionality.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/minor: This PR changes functionality that users may find relevant to operating Cilium.
sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Development

Linked issue: Investigate bpf_fib_lookup + bpf_redirect to replace egress gateway IP rules/routes
6 participants