daemon: Add option --bpf-lb-external-clusterip #15650

Merged

1 change: 1 addition & 0 deletions Documentation/cmdref/cilium-agent.md

10 changes: 7 additions & 3 deletions Documentation/gettingstarted/kubeproxy-free.rst
@@ -1183,6 +1183,13 @@ working, take a look at `this KEP
free mode, make sure that default Kubernetes services like ``kube-dns`` and ``kubernetes``
have the required label value.

+External Access To ClusterIP Services
+*************************************
+
+As per `k8s Service <https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types>`__,
+Cilium's eBPF kube-proxy replacement by default disallows access to a ClusterIP service from outside the cluster.
+This can be allowed by setting ``bpf.lbExternalClusterIP=true``.
+
Limitations
###########

@@ -1213,9 +1220,6 @@ Limitations
release introduces ``EndpointSliceMirroring`` controller that mirrors custom ``Endpoints``
resources to corresponding ``EndpointSlices`` and thus allowing backing ``Endpoints``
to work. For a more detailed discussion see :gh-issue:`12438`.
-* As per `k8s Service <https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types>`__,
-Cilium's eBPF kube-proxy replacement disallow access of a ClusterIP service
-from outside a cluster.

Further Readings
################
22 changes: 1 addition & 21 deletions bpf/lib/lb.h
@@ -325,29 +325,9 @@ bool lb6_svc_is_affinity(const struct lb6_service *svc)
return svc->flags & SVC_FLAG_AFFINITY;
}

-static __always_inline
-__u8 svc_is_routable_mask(void)
-{
-__u8 mask = SVC_FLAG_ROUTABLE;
-
-#ifdef ENABLE_LOADBALANCER
-mask |= SVC_FLAG_LOADBALANCER;
-#endif
-#ifdef ENABLE_NODEPORT
-mask |= SVC_FLAG_NODEPORT;
-#endif
-#ifdef ENABLE_EXTERNAL_IP
-mask |= SVC_FLAG_EXTERNAL_IP;
-#endif
-#ifdef ENABLE_HOSTPORT
-mask |= SVC_FLAG_HOSTPORT;
-#endif
-return mask;
-}
-
static __always_inline bool __lb_svc_is_routable(__u8 flags)
{
-return (flags & svc_is_routable_mask()) > SVC_FLAG_ROUTABLE;

Member:
Shouldn't this be guarded by EnableClusterIPExternalAccess config?

Contributor Author (joamaki, Apr 14, 2021):
Not sure what you mean by guarded. The flag is disabled by default, in which case SVC_FLAG_ROUTABLE is unset for ClusterIP services. Earlier, SVC_FLAG_ROUTABLE was set for ClusterIP services, but higher bits were unset and hence this function returned false for ClusterIP services. Now we only need to check for SVC_FLAG_ROUTABLE.

Member:
Speaking of which, I can't find where, in your PR, SVC_FLAG_ROUTABLE would be set or unset depending on the value for config.ExternalClusterIP. Shouldn't there be a change to the definition of SVC_FLAG_ROUTABLE in bpf/lib/common.h?

Contributor Author:
The change to this logic is in the agent in updateMasterService: https://github.com/cilium/cilium/pull/15650/files#diff-8eff0d99dd1ceb7d15ce632811672e7b17a17fb5e984fafa7875d6ad2433b3d8R519. It was already set earlier and in this PR I'm turning it off for ClusterIP services unless the new external access flag is set.

Member:
OK, thank you. On my first read, I somehow understood that you would be changing the value of SVC_FLAG_ROUTABLE depending on the value of ExternalClusterIP, but I understand this is not the case; you're just changing the flags.

+return (flags & SVC_FLAG_ROUTABLE) != 0;

Contributor Author:
Is there anything that could break with this change?

Member:
Thinking out loud about the upgrade path. When cilium-agent starts, services are updated from kube-apiserver before the datapath gets regenerated. This means that, for some time, the old datapath code will run with the new service flags. I think this should be fine, as nothing should change for previously routable services.

During a downgrade, we will have the old service flags (i.e. pre your changes) with the new datapath (i.e. with your changes) for a while. This will allow ClusterIP access from outside, but I guess this is tolerable.

}

static __always_inline
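
To make the before/after semantics of `__lb_svc_is_routable()` concrete, here is a minimal Go port of both checks. The bit values are assumptions for illustration only (the real ones are defined in `bpf/lib/common.h`); the old `>` comparison behaves as described in the review only if `SVC_FLAG_ROUTABLE` is the highest bit in the mask, which the assumed values satisfy. All `ENABLE_*` features are treated as compiled in, and the flag combinations cover the mixed-version windows discussed above.

```go
package main

import "fmt"

// ASSUMED bit values for illustration only; see bpf/lib/common.h for the
// real ones. The old check needs svcFlagRoutable to be the highest bit in
// the mask, which these values satisfy.
const (
	svcFlagExternalIP   uint8 = 1 << 0
	svcFlagNodePort     uint8 = 1 << 1
	svcFlagHostPort     uint8 = 1 << 3
	svcFlagLoadBalancer uint8 = 1 << 5
	svcFlagRoutable     uint8 = 1 << 6
)

// routableMask mirrors the removed svc_is_routable_mask() with every
// ENABLE_* feature compiled in.
func routableMask() uint8 {
	return svcFlagRoutable | svcFlagLoadBalancer | svcFlagNodePort |
		svcFlagExternalIP | svcFlagHostPort
}

// oldIsRoutable mirrors the removed check: SVC_FLAG_ROUTABLE alone (a
// ClusterIP entry) is not enough, another bit in the mask must also be set.
func oldIsRoutable(flags uint8) bool {
	return flags&routableMask() > svcFlagRoutable
}

// newIsRoutable mirrors the added check: the agent alone decides, through
// SVC_FLAG_ROUTABLE, whether an entry may be reached from outside.
func newIsRoutable(flags uint8) bool {
	return flags&svcFlagRoutable != 0
}

func main() {
	cases := []struct {
		name  string
		flags uint8
	}{
		{"ClusterIP as written by the old agent", svcFlagRoutable},
		{"ClusterIP, new agent, option off", 0},
		{"ClusterIP, new agent, option on", svcFlagRoutable},
		{"NodePort", svcFlagRoutable | svcFlagNodePort},
		{"NodePort surrogate", svcFlagNodePort},
	}
	for _, c := range cases {
		fmt.Printf("%-40s old=%-5t new=%t\n", c.name, oldIsRoutable(c.flags), newIsRoutable(c.flags))
	}
}
```

Under these assumptions, a ClusterIP entry carrying only `SVC_FLAG_ROUTABLE` (what the pre-change agent wrote) is rejected by the old check but accepted by the new one, which is the downgrade window the review calls tolerable. Entries written by the new agent with the option off carry no `SVC_FLAG_ROUTABLE` and are rejected by both checks, so the upgrade window changes nothing for them.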
3 changes: 3 additions & 0 deletions daemon/cmd/daemon_main.go
@@ -985,6 +985,9 @@ func init() {
flags.String(option.BGPConfigPath, "/var/lib/cilium/bgp/config.yaml", "Path to file containing the BGP configuration")
option.BindEnv(option.BGPConfigPath)

flags.Bool(option.ExternalClusterIPName, false, "Enable external access to ClusterIP services (default false)")
option.BindEnv(option.ExternalClusterIPName)

viper.BindPFlags(flags)
}

1 change: 1 addition & 0 deletions install/kubernetes/cilium/README.md
@@ -63,6 +63,7 @@ contributors across the globe, there is almost always someone available to help.
| bgp.announce.loadbalancerIP | bool | `false` | Enable allocation and announcement of service LoadBalancer IPs |
| bgp.enabled | bool | `false` | Enable BGP support inside Cilium; embeds a new ConfigMap for BGP inside cilium-agent and cilium-operator |
| bpf.clockProbe | bool | `false` | |
| bpf.lbExternalClusterIP | bool | `false` | Allow cluster external access to ClusterIP services. |
| bpf.lbMapMax | int | `65536` | Configure the maximum number of entries in the TCP connection tracking table. ctTcpMax: '524288' -- Configure the maximum number of entries for the non-TCP connection tracking table. ctAnyMax: '262144' -- Configure the maximum number of service entries in the load balancer maps. |
| bpf.monitorAggregation | string | `"medium"` | Configure auto-sizing for all BPF maps based on available memory. ref: https://docs.cilium.io/en/v1.9/concepts/ebpf/maps/#ebpf-maps -- Configure the level of aggregation for monitor notifications. Valid options are none, low, medium, maximum |
| bpf.monitorFlags | string | `"all"` | Configure which TCP flags trigger notifications when seen for the first time in a connection. |
4 changes: 4 additions & 0 deletions install/kubernetes/cilium/templates/cilium-configmap.yaml
@@ -289,6 +289,10 @@ data:
{{- if hasKey .Values.bpf "lbBypassFIBLookup" }}
bpf-lb-bypass-fib-lookup: {{ .Values.bpf.lbBypassFIBLookup | quote }}
{{- end }}
{{- if hasKey .Values.bpf "lbExternalClusterIP" }}
bpf-lb-external-clusterip: {{ .Values.bpf.lbExternalClusterIP | quote }}
{{- end }}

# Pre-allocation of map entries allows per-packet latency to be reduced, at
# the expense of up-front memory allocation for the entries in the maps. The
# default value below will minimize memory usage in the default installation;
3 changes: 3 additions & 0 deletions install/kubernetes/cilium/values.yaml
@@ -274,6 +274,9 @@ bpf:
# first time in a connection.
monitorFlags: "all"

# -- Allow cluster external access to ClusterIP services.
lbExternalClusterIP: false

# -- Enable native IP masquerade support in eBPF
#masquerade: true

4 changes: 4 additions & 0 deletions pkg/defaults/defaults.go
@@ -433,4 +433,8 @@ const (

// WireguardSubnetV6 is a default wireguard tunnel subnet
WireguardSubnetV6 = "fdc9:281f:04d7:9ee9::1/64"

// ExternalClusterIP enables cluster external access to ClusterIP services.
// Defaults to false to retain prior behaviour of not routing external packets to ClusterIPs.
ExternalClusterIP = false
)
6 changes: 5 additions & 1 deletion pkg/maps/lbmap/lbmap.go
@@ -504,6 +504,10 @@ func updateMasterService(fe ServiceKey, nbackends int, revNATID int, svcType loa
svcLocal bool, sessionAffinity bool, sessionAffinityTimeoutSec uint32,
checkSourceRange bool) error {

+// isRoutable denotes whether this service can be accessed from outside the cluster.
+isRoutable := !fe.IsSurrogate() &&
+(svcType != loadbalancer.SVCTypeClusterIP || option.Config.ExternalClusterIP)
+
fe.SetBackendSlot(0)
zeroValue := fe.NewValue().(ServiceValue)
zeroValue.SetCount(nbackends)
@@ -512,7 +516,7 @@ func updateMasterService(fe ServiceKey, nbackends int, revNATID int, svcType loa
SvcType: svcType,
SvcLocal: svcLocal,
SessionAffinity: sessionAffinity,
-IsRoutable: !fe.IsSurrogate(),
+IsRoutable: isRoutable,
CheckSourceRange: checkSourceRange,
})
zeroValue.SetFlags(flag.UInt16())
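
The new `isRoutable` expression reduces to a small truth table. A minimal Go sketch of it follows; `SVCType` and its constants are hypothetical stand-ins for `loadbalancer.SVCType`, and the surrogate and option inputs are passed as plain booleans instead of coming from `fe.IsSurrogate()` and `option.Config.ExternalClusterIP`.

```go
package main

import "fmt"

// SVCType is a hypothetical stand-in for loadbalancer.SVCType; the string
// values are assumptions for this sketch.
type SVCType string

const (
	SVCTypeClusterIP    SVCType = "ClusterIP"
	SVCTypeNodePort     SVCType = "NodePort"
	SVCTypeLoadBalancer SVCType = "LoadBalancer"
)

// isRoutable reproduces the expression added to updateMasterService:
// surrogate entries are never routable, and ClusterIP entries are routable
// only when external ClusterIP access is enabled.
func isRoutable(surrogate bool, svcType SVCType, externalClusterIP bool) bool {
	return !surrogate && (svcType != SVCTypeClusterIP || externalClusterIP)
}

func main() {
	types := []SVCType{SVCTypeClusterIP, SVCTypeNodePort, SVCTypeLoadBalancer}
	for _, ext := range []bool{false, true} {
		for _, t := range types {
			for _, surrogate := range []bool{false, true} {
				fmt.Printf("external=%-5t type=%-12s surrogate=%-5t routable=%t\n",
					ext, t, surrogate, isRoutable(surrogate, t, ext))
			}
		}
	}
}
```

Only the ClusterIP rows change when the option is toggled; surrogate entries stay non-routable either way, and every other service type is routable exactly as before, when `IsRoutable` was `!fe.IsSurrogate()`.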
11 changes: 11 additions & 0 deletions pkg/option/config.go
@@ -952,6 +952,10 @@ const (
// BGPConfigPath is the file path to the BGP configuration. It is
// compatible with MetalLB's configuration.
BGPConfigPath = "bgp-config-path"

// ExternalClusterIPName is the name of the option to enable
// cluster external access to ClusterIP services.
ExternalClusterIPName = "bpf-lb-external-clusterip"
)

// Default string arguments
@@ -1941,6 +1945,10 @@ type DaemonConfig struct {
// BGPConfigPath is the file path to the BGP configuration. It is
// compatible with MetalLB's configuration.
BGPConfigPath string

// ExternalClusterIP enables routing to ClusterIP services from outside
// the cluster. This mirrors the behaviour of kube-proxy.
ExternalClusterIP bool
}

var (
@@ -1985,6 +1993,8 @@ var (

k8sEnableLeasesFallbackDiscovery: defaults.K8sEnableLeasesFallbackDiscovery,
APIRateLimit: make(map[string]string),

ExternalClusterIP: defaults.ExternalClusterIP,
}
)

@@ -2492,6 +2502,7 @@ func (c *DaemonConfig) Populate() {
c.EnableCustomCalls = viper.GetBool(EnableCustomCallsName)
c.BGPAnnounceLBIP = viper.GetBool(BGPAnnounceLBIP)
c.BGPConfigPath = viper.GetString(BGPConfigPath)
c.ExternalClusterIP = viper.GetBool(ExternalClusterIPName)

err = c.populateMasqueradingSettings()
if err != nil {
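
The option follows the usual path from flag to default to `DaemonConfig`: it is registered in `init()` with a `false` default, then copied into `c.ExternalClusterIP` in `Populate()`. The sketch below mimics that path with the standard library's `flag` package in place of pflag/viper, so ConfigMap and environment-variable bindings are not modelled; `daemonConfig` and `parseConfig` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// externalClusterIPName has the same value as option.ExternalClusterIPName.
const externalClusterIPName = "bpf-lb-external-clusterip"

// daemonConfig is a hypothetical, trimmed-down stand-in for option.DaemonConfig.
type daemonConfig struct {
	ExternalClusterIP bool
}

// parseConfig registers the flag with a false default (as defaults.ExternalClusterIP
// does) and copies the parsed value into the config, the job Populate() does via viper.
func parseConfig(args []string) (*daemonConfig, error) {
	fs := flag.NewFlagSet("cilium-agent", flag.ContinueOnError)
	ext := fs.Bool(externalClusterIPName, false,
		"Enable external access to ClusterIP services (default false)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return &daemonConfig{ExternalClusterIP: *ext}, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("ExternalClusterIP:", cfg.ExternalClusterIP)
}
```

Run with `--bpf-lb-external-clusterip=true`, it prints `ExternalClusterIP: true`; without arguments it reports `false`, matching `defaults.ExternalClusterIP`.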
47 changes: 47 additions & 0 deletions test/k8sT/Services.go
@@ -579,6 +579,53 @@ var _ = Describe("K8sServicesTest", func() {
testCurlFromPods(echoPodLabel, url, 5, 0)
}
})

curlClusterIPFromExternalHost := func() *helpers.CmdRes {
clusterIP, _, err := kubectl.GetServiceHostPort(helpers.DefaultNamespace, serviceName)
ExpectWithOffset(1, err).Should(BeNil(), "Cannot get service %s", serviceName)
ExpectWithOffset(1, govalidator.IsIP(clusterIP)).Should(BeTrue(), "ClusterIP is not an IP")
httpSVCURL := fmt.Sprintf("http://%s/", net.JoinHostPort(clusterIP, "80"))

By("testing external connectivity via cluster IP %s", clusterIP)

status := kubectl.ExecInHostNetNS(context.TODO(), k8s1NodeName, helpers.CurlFail(httpSVCURL))
ExpectWithOffset(1, status).Should(helpers.CMDSuccess(), "cannot curl to service IP from host: %s", status.CombineOutput())

return kubectl.ExecInHostNetNS(context.TODO(), outsideNodeName, helpers.CurlFail(httpSVCURL))
}

SkipItIf(func() bool { return helpers.DoesNotExistNodeWithoutCilium() },
"ClusterIP cannot be accessed externally when access is disabled",
func() {
Expect(curlClusterIPFromExternalHost()).ShouldNot(helpers.CMDSuccess(),
"External host %s unexpectedly connected to ClusterIP when lbExternalClusterIP was unset", outsideNodeName)
})

SkipContextIf(func() bool { return helpers.DoesNotExistNodeWithoutCilium() }, "With ClusterIP external access", func() {
var (
svcIP string
)
BeforeAll(func() {
DeployCiliumOptionsAndDNS(kubectl, ciliumFilename, map[string]string{
"bpf.lbExternalClusterIP": "true",
})
clusterIP, _, err := kubectl.GetServiceHostPort(helpers.DefaultNamespace, serviceName)
svcIP = clusterIP
Expect(err).Should(BeNil(), "Cannot get service %s", serviceName)
res := kubectl.AddIPRoute(outsideNodeName, svcIP, k8s1IP, false)
Expect(res).Should(helpers.CMDSuccess(), "Error adding IP route for %s via %s", svcIP, k8s1IP)
})

AfterAll(func() {
res := kubectl.DelIPRoute(outsideNodeName, svcIP, k8s1IP)
Expect(res).Should(helpers.CMDSuccess(), "Error removing IP route for %s via %s", svcIP, k8s1IP)
DeployCiliumAndDNS(kubectl, ciliumFilename)
})

It("ClusterIP can be accessed when external access is enabled", func() {
Expect(curlClusterIPFromExternalHost()).Should(helpers.CMDSuccess(), "Could not curl ClusterIP %s from external host", svcIP)
})
})
})

SkipContextIf(func() bool { return !helpers.RunsOnNetNextOr419Kernel() }, "Checks local redirect policy", func() {
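
`curlClusterIPFromExternalHost` validates the ClusterIP and builds the URL with `net.JoinHostPort`, which brackets IPv6 addresses so an IPv6 ClusterIP also yields a valid URL. A small Go sketch of that step, with `net.ParseIP` substituted for `govalidator.IsIP` and a hypothetical `clusterIPURL` helper; the example addresses are made up.

```go
package main

import (
	"fmt"
	"net"
)

// clusterIPURL builds the URL the test curls. net.ParseIP stands in for
// govalidator.IsIP, and net.JoinHostPort adds the square brackets an IPv6
// literal needs inside a URL.
func clusterIPURL(clusterIP, port string) (string, error) {
	if net.ParseIP(clusterIP) == nil {
		return "", fmt.Errorf("ClusterIP %q is not an IP", clusterIP)
	}
	return fmt.Sprintf("http://%s/", net.JoinHostPort(clusterIP, port)), nil
}

func main() {
	for _, ip := range []string{"10.96.0.10", "fd00::10", "not-an-ip"} {
		url, err := clusterIPURL(ip, "80")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(url)
	}
}
```

For `fd00::10` the helper returns `http://[fd00::10]:80/`, whereas naive string concatenation would produce `http://fd00::10:80/`, which is not a valid RFC 3986 URL because IPv6 literals must be bracketed.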