daemon: Add option --bpf-lb-external-clusterip #15650
test-me-please |
71535e7
to
15263b0
Compare
test-me-please |
Documentation/cmdref/cilium-agent.md
Outdated
@@ -74,6 +74,7 @@ cilium-agent [flags]
       --enable-bpf-clock-probe Enable BPF clock source probing for more efficient tick retrieval
       --enable-bpf-masquerade Masquerade packets from endpoints leaving the host with BPF instead of iptables
       --enable-bpf-tproxy Enable BPF-based proxy redirection, if support available
+      --enable-cluster-ip-external-access Enable external access to ClusterIP services
What other documentation related changes should be made for this change?
It'll be good to document this flag in the kube-proxy replacement getting started guide.
Thanks. Added documentation.
 static __always_inline bool __lb_svc_is_routable(__u8 flags)
 {
-	return (flags & svc_is_routable_mask()) > SVC_FLAG_ROUTABLE;
+	return (flags & SVC_FLAG_ROUTABLE) != 0;
Is there anything that could break with this change?
Thinking out loud about the upgrade path. When cilium-agent starts, services are updated from kube-apiserver before the datapath gets regenerated. So, for some time, the old datapath code will be running with the new service flags. I think this should be fine, as nothing should change for previously routable services.
During a downgrade, we will have the old service flags (= pre your changes) with the new datapath (= with your changes) for a while. This will allow ClusterIP access from outside. But I guess this is tolerable.
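The two mixed-version windows described above can be checked with a small sketch. It is not the Cilium source: the bit positions and the set of bits in the mask are assumptions chosen for illustration (the real definitions live in bpf/lib/common.h and may differ), and the `old_`/`new_` prefixes and the three flag-encoding macros are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: illustrative bit layout, not copied from bpf/lib/common.h. */
#define SVC_FLAG_EXTERNAL_IP  (1u << 0)
#define SVC_FLAG_NODEPORT     (1u << 1)
#define SVC_FLAG_LOADBALANCER (1u << 5)
#define SVC_FLAG_ROUTABLE     (1u << 6)

/* ASSUMPTION: the mask covers ROUTABLE plus every service-type bit above. */
static inline uint8_t svc_is_routable_mask(void)
{
	return (uint8_t)(SVC_FLAG_ROUTABLE | SVC_FLAG_EXTERNAL_IP |
			 SVC_FLAG_NODEPORT | SVC_FLAG_LOADBALANCER);
}

/* Datapath before the PR: ROUTABLE alone is not enough, a type bit is needed. */
static inline bool old_lb_svc_is_routable(uint8_t flags)
{
	return (flags & svc_is_routable_mask()) > SVC_FLAG_ROUTABLE;
}

/* Datapath after the PR: only the ROUTABLE bit is inspected. */
static inline bool new_lb_svc_is_routable(uint8_t flags)
{
	return (flags & SVC_FLAG_ROUTABLE) != 0;
}

/* Hypothetical flag bytes as written by the agent before and after the PR. */
#define CLUSTERIP_OLD_AGENT SVC_FLAG_ROUTABLE /* ROUTABLE set, no type bit */
#define CLUSTERIP_NEW_AGENT 0u                /* option disabled: ROUTABLE unset */
#define NODEPORT_ANY_AGENT  (SVC_FLAG_ROUTABLE | SVC_FLAG_NODEPORT)
```

Under these assumptions, the upgrade window (old datapath, new flags) behaves as before: `old_lb_svc_is_routable(CLUSTERIP_NEW_AGENT)` is false and `old_lb_svc_is_routable(NODEPORT_ANY_AGENT)` is true. The downgrade window (new datapath, old flags) is the case that leaks: `new_lb_svc_is_routable(CLUSTERIP_OLD_AGENT)` is true until the older agent regenerates the datapath.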
static __always_inline bool __lb_svc_is_routable(__u8 flags)
{
	return (flags & svc_is_routable_mask()) > SVC_FLAG_ROUTABLE;
Shouldn't this be guarded by the `EnableClusterIPExternalAccess` config?
Not sure what you mean by guarded. The flag is disabled by default, in which case `SVC_FLAG_ROUTABLE` is unset for ClusterIP services. Earlier, `SVC_FLAG_ROUTABLE` was set for ClusterIP services, but the higher bits were unset, and hence this function returned false for ClusterIP services. Now we only need to check for `SVC_FLAG_ROUTABLE`.
Speaking of which, I can't find where, in your PR, `SVC_FLAG_ROUTABLE` would be set or unset depending on the value of `config.ExternalClusterIP`. Shouldn't there be a change to the definition of `SVC_FLAG_ROUTABLE` in bpf/lib/common.h?
The change to this logic is in the agent, in `updateMasterService`: https://github.com/cilium/cilium/pull/15650/files#diff-8eff0d99dd1ceb7d15ce632811672e7b17a17fb5e984fafa7875d6ad2433b3d8R519. The flag was already being set earlier; in this PR I'm turning it off for ClusterIP services unless the new external access flag is set.
OK, thank you. On my first read, I somehow understood that you would be changing the value of `SVC_FLAG_ROUTABLE` depending on the value of `ExternalClusterIP`, but I understand this is not the case. You're just changing the `flags`.
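The distinction settled in this thread (the `SVC_FLAG_ROUTABLE` constant stays fixed, only the per-service flag byte changes) can be sketched as follows. The real logic is agent-side Go code in `updateMasterService`. This C version is only a language-neutral illustration, and the enum, the function name `svc_flags_for`, the bit values, and the rule that every non-ClusterIP type is always routable are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: illustrative bit layout, not copied from bpf/lib/common.h. */
#define SVC_FLAG_EXTERNAL_IP  (1u << 0)
#define SVC_FLAG_NODEPORT     (1u << 1)
#define SVC_FLAG_LOADBALANCER (1u << 5)
#define SVC_FLAG_ROUTABLE     (1u << 6) /* constant; never depends on config */

/* Hypothetical service types for this sketch. */
enum svc_type {
	SVC_CLUSTER_IP,
	SVC_NODE_PORT,
	SVC_LOAD_BALANCER,
	SVC_EXTERNAL_IP,
};

/*
 * Compute the flag byte stored with a service entry.
 * external_clusterip mirrors the --bpf-lb-external-clusterip agent option.
 */
static inline uint8_t svc_flags_for(enum svc_type type, bool external_clusterip)
{
	uint8_t flags = 0;

	switch (type) {
	case SVC_CLUSTER_IP:
		/* After the PR: routable only when the option is enabled. */
		if (external_clusterip)
			flags |= SVC_FLAG_ROUTABLE;
		break;
	case SVC_NODE_PORT:
		flags |= SVC_FLAG_NODEPORT | SVC_FLAG_ROUTABLE;
		break;
	case SVC_LOAD_BALANCER:
		flags |= SVC_FLAG_LOADBALANCER | SVC_FLAG_ROUTABLE;
		break;
	case SVC_EXTERNAL_IP:
		flags |= SVC_FLAG_EXTERNAL_IP | SVC_FLAG_ROUTABLE;
		break;
	}
	return flags;
}
```

With the option at its default (false), `svc_flags_for(SVC_CLUSTER_IP, false)` has the routable bit clear, so the simplified datapath check `(flags & SVC_FLAG_ROUTABLE) != 0` rejects external traffic to ClusterIP without needing a compile-time guard.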
15263b0
to
39b6b7a
Compare
test-me-please |
Nice! Some comments.
pkg/option/config.go
Outdated
@@ -1963,6 +1971,8 @@ var (

 	k8sEnableLeasesFallbackDiscovery: defaults.K8sEnableLeasesFallbackDiscovery,
 	APIRateLimit: make(map[string]string),
+
+	EnableClusterIPExternalAccess: defaults.EnableClusterIPExternalAccess,
I think you don't need this.
Yep, it isn't strictly necessary (as it'll default to false if unspecified). I added it since there were a bunch of other boolean options that also default to false and are specified here. I'm fine either way.
Outside access to ClusterIP services
************************************

By default Cilium does not route packets coming from outside the cluster to a backend exposed
We use the `backend` term only internally in the Cilium codebase. The k8s term is `service endpoint`.
Good catch. I'll change it. There are quite a few other mentions of backend in the same document in a similar context (e.g. in the DSR section). I wonder if we should do a pass over the document and make sure the naming is consistent?
Wonder if we should do pass over the document and make sure the naming is consistent?
👍
Adding @borkmann to the reviewers list, as he introduced the routable concept.
39b6b7a
to
725ce3d
Compare
By default Cilium does not route packets coming from outside the cluster to a service endpoint
exposed with a ``ClusterIP`` service. Routing to ``ClusterIP`` services can be enabled
by setting ``config.enableClusterIPExternalAccess=true``.
Could we try to elaborate more on the rationale here on why we don't expose it by default? Back then we didn't expose it due to the assumption that the range would typically be non-routable, and given that, as a security measure, we also do not expect such traffic on the phys iface. (Plus there's NodePort which would accomplish similar things just in different IP/port range (if not backed by e.g. LoadBalancer). Perhaps there are additional pointers from K8s doc.)
I added the link to k8s documentation about this (which we already had below so I just moved it here).
(Overall, code-wise this looks good to me, just a few minor nits.)
fc1f551
to
2437df0
Compare
62e13c2
to
8922607
Compare
test-net-next |
Documentation/cmdref/cilium-agent.md
Outdated
@@ -37,6 +37,7 @@ cilium-agent [flags]
       --bpf-lb-dev-ip-addr-inherit string Device name which IP addr is inherited by devices running LB BPF program (--devices)
       --bpf-lb-dsr-dispatch string BPF load balancing DSR dispatch method ("opt", "ipip") (default "opt")
       --bpf-lb-dsr-l4-xlate string BPF load balancing DSR L4 DNAT method for IPIP ("frontend", "backend") (default "frontend")
+      --bpf-lb-external-clusterip Enable external access to ClusterIP services
Small doc nit, but no need to rerun CI: could we append `(default false)` here?
Add the option --bpf-lb-external-clusterip to enable routing to ClusterIP services from outside the cluster.

The loadbalancer routing logic is modified to only look at the SVC_FLAG_ROUTABLE flag instead of expecting an additional higher bit in addition to it.

The SVC_FLAG_ROUTABLE is only set for ClusterIP services when the flag is set.

Fixes: cilium#14581
Signed-off-by: Jussi Maki <jussi@isovalent.com>
8922607
to
215945d
Compare
Passing |
test-runtime |
test-1.20-4.19 |
test-gke |
test-1.21-4.9 |
This PR is marked for 1.10.0, but I don't see this option in the 1.10.1 Cilium image. Are there any plans to backport this PR?
Add the option --bpf-lb-external-clusterip to enable
routing to ClusterIP services from outside the cluster.
The loadbalancer routing logic is modified to only look
at the SVC_FLAG_ROUTABLE flag instead of expecting an
additional higher bit in addition to it.
The SVC_FLAG_ROUTABLE is only set for ClusterIP services
when the --bpf-lb-external-clusterip agent flag is set.
Fixes: #14581