install/kubernetes: use bidirectional mounts to mount bpf fs
Bidirectional mounts have been available in Kubernetes since 1.4 [1].
This allows the Cilium container to mount the bpf fs automatically
and propagate the mount into the host.

This improves Cilium's UX by removing the requirement of
mounting the BPF fs on the host.

[1] https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation

Signed-off-by: André Martins <andre@cilium.io>
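The heart of the change is in the agent DaemonSet template: the `bpf-maps` volume mount switches from `HostToContainer` to `Bidirectional` propagation, so a bpffs mount performed inside the container propagates back into the host mount namespace. A minimal sketch of the resulting pod spec fragment follows; the `volumes` section and `hostPath` details are assumptions for illustration, not part of this diff:

```yaml
# Sketch of the cilium-agent DaemonSet fragment after this change.
# mountPropagation: Bidirectional lets a bpffs mount performed inside
# the container propagate back into the host mount namespace, so the
# host no longer needs /sys/fs/bpf mounted beforehand.
spec:
  containers:
  - name: cilium-agent
    securityContext:
      privileged: true          # Bidirectional propagation requires a privileged container
    volumeMounts:
    - name: bpf-maps
      mountPath: /sys/fs/bpf
      mountPropagation: Bidirectional
  volumes:
  - name: bpf-maps              # hostPath details here are an assumption
    hostPath:
      path: /sys/fs/bpf
      type: DirectoryOrCreate
```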
aanm committed Aug 7, 2021
1 parent 28eb44e commit f7a3f59
Showing 18 changed files with 31 additions and 119 deletions.
8 changes: 0 additions & 8 deletions Documentation/gettingstarted/k3s.rst
@@ -48,14 +48,6 @@ Should you encounter any issues during the installation, please refer to the
Please consult the Kubernetes :ref:`k8s_requirements` for information on how
you need to configure your Kubernetes cluster to operate with Cilium.

Mount the eBPF Filesystem
=========================
On each node, run the following to mount the eBPF Filesystem:

.. code-block:: shell-session

   sudo mount bpffs -t bpf /sys/fs/bpf

Install Cilium
==============

8 changes: 0 additions & 8 deletions Documentation/gettingstarted/k8s-install-helm.rst
@@ -217,14 +217,6 @@ Install Cilium

.. include:: requirements-k3s.rst

**Mount the eBPF Filesystem:**

On each node, run the following to mount the eBPF Filesystem:

.. code-block:: shell-session

   sudo mount bpffs -t bpf /sys/fs/bpf

**Install Cilium:**

.. parsed-literal::
@@ -50,15 +50,6 @@ cluster:

.. image:: images/rancher_add_nodes.png

Before adding nodes to the cluster, ensure the BPF filesystem is mounted.
You can persist the configuration using the following commands:

.. code-block:: shell-session

   sudo mount bpffs -t bpf /sys/fs/bpf
   sudo bash -c "echo 'none /sys/fs/bpf bpf rw,relatime 0 0' >> /etc/fstab"

Next, add nodes to the cluster using the Rancher-provided Docker commands. Be
sure to add the appropriate number of nodes required for your cluster. After
a few minutes, you will see that the Nodes overview will show an error message
23 changes: 10 additions & 13 deletions Documentation/operations/system_requirements.rst
@@ -347,20 +347,17 @@ Mounted eBPF filesystem
   # mount | grep /sys/fs/bpf
   $ # if present should output, e.g. "none on /sys/fs/bpf type bpf"...

This step is **required for production** environments but optional for testing
and development. It allows the ``cilium-agent`` to pin eBPF resources to a
persistent filesystem and make them persistent across restarts of the agent.
If the eBPF filesystem is not mounted in the host filesystem, Cilium will
automatically mount the filesystem but it will be unmounted and re-mounted when
the Cilium pod is restarted. This in turn will cause eBPF resources to be
re-created which will cause network connectivity to be disrupted while Cilium
is not running. Mounting the eBPF filesystem in the host mount namespace will
ensure that the agent can be restarted without affecting connectivity of any
pods.

In order to mount the eBPF filesystem, the following command must be run in the
host mount namespace. The command must only be run once during the boot process
of the machine.
automatically mount the filesystem.

Mounting this BPF filesystem allows the ``cilium-agent`` to persist eBPF
resources across restarts of the agent so that the datapath can continue to
operate while the agent is subsequently restarted or upgraded.

Optionally, it is also possible to mount the eBPF filesystem before Cilium is
deployed in the cluster. In that case, the following command must be run in the host mount
namespace. The command must only be run once during the boot process of the
machine.

.. code-block:: shell-session
2 changes: 1 addition & 1 deletion daemon/cmd/daemon_main.go
@@ -1188,7 +1188,7 @@ func initEnv(cmd *cobra.Command) {
// the path to an already mounted filesystem instead. This is
useful if the daemon is being run inside a namespace and the
// BPF filesystem is mapped into the slave namespace.
bpf.CheckOrMountFS(option.Config.BPFRoot, k8s.IsEnabled())
bpf.CheckOrMountFS(option.Config.BPFRoot)
cgroups.CheckOrMountCgrpFS(option.Config.CGroupRoot)

option.Config.Opts.SetBool(option.Debug, debugDatapath)
4 changes: 0 additions & 4 deletions images/cilium/init-container.sh
@@ -13,7 +13,3 @@ if [ "${CILIUM_ALL_STATE}" = "true" ] \
|| [ "${CLEAN_CILIUM_STATE}" = "true" ]; then
cilium cleanup -f --all-state
fi

if [ "${CILIUM_WAIT_BPF_MOUNT}" = "true" ]; then
until mount | grep "/sys/fs/bpf type bpf"; do echo "BPF filesystem is not mounted yet"; sleep 1; done
fi;
1 change: 0 additions & 1 deletion install/kubernetes/cilium/README.md
@@ -70,7 +70,6 @@ contributors across the globe, there is almost always someone available to help.
| bpf.monitorInterval | string | `"5s"` | Configure the typical time between monitor notifications for active connections. |
| bpf.policyMapMax | int | `16384` | Configure the maximum number of entries in endpoint policy map (per endpoint). |
| bpf.preallocateMaps | bool | `false` | Enables pre-allocation of eBPF map values. This increases memory usage but can reduce latency. |
| bpf.waitForMount | bool | `false` | Force the cilium-agent DaemonSet to wait in an initContainer until the eBPF filesystem has been mounted. |
| certgen | object | `{"image":{"pullPolicy":"Always","repository":"quay.io/cilium/certgen","tag":"v0.1.4"},"podLabels":{},"ttlSecondsAfterFinished":1800}` | Configure certificate generation for Hubble integration. If hubble.tls.auto.method=cronJob, these values are used for the Kubernetes CronJob which will be scheduled regularly to (re)generate any certificates not provided manually. |
| certgen.podLabels | object | `{}` | Labels to be added to hubble-certgen pods |
| certgen.ttlSecondsAfterFinished | int | `1800` | Seconds after which the completed job pod will be deleted |
3 changes: 0 additions & 3 deletions install/kubernetes/cilium/files/nodeinit/prestop.bash
@@ -17,9 +17,6 @@ else
while PATH="${PATH}:/home/kubernetes/bin" crictl ps | grep -v "node-init" | grep -q "POD_cilium"; do sleep 1; done
fi

systemctl disable sys-fs-bpf.mount || true
systemctl stop sys-fs-bpf.mount || true

if ip link show cilium_host; then
echo "Deleting cilium_host interface..."
ip link del cilium_host
38 changes: 0 additions & 38 deletions install/kubernetes/cilium/files/nodeinit/startup.bash
@@ -4,44 +4,6 @@ set -o errexit
set -o pipefail
set -o nounset

mount | grep "/sys/fs/bpf type bpf" || {
# Mount the filesystem until next reboot
echo "Mounting BPF filesystem..."
mount bpffs /sys/fs/bpf -t bpf

# Configure systemd to mount after next boot
echo "Installing BPF filesystem mount"
cat >/tmp/sys-fs-bpf.mount <<EOF
[Unit]
Description=Mount BPF filesystem (Cilium)
Documentation=http://docs.cilium.io/
DefaultDependencies=no
Before=local-fs.target umount.target
After=swap.target
[Mount]
What=bpffs
Where=/sys/fs/bpf
Type=bpf
Options=rw,nosuid,nodev,noexec,relatime,mode=700
[Install]
WantedBy=multi-user.target
EOF

if [ -d "/etc/systemd/system/" ]; then
mv /tmp/sys-fs-bpf.mount /etc/systemd/system/
echo "Installed sys-fs-bpf.mount to /etc/systemd/system/"
elif [ -d "/lib/systemd/system/" ]; then
mv /tmp/sys-fs-bpf.mount /lib/systemd/system/
echo "Installed sys-fs-bpf.mount to /lib/systemd/system/"
fi

# Ensure that filesystem gets mounted on next reboot
systemctl enable sys-fs-bpf.mount
systemctl start sys-fs-bpf.mount
}

echo "Link information:"
ip link

@@ -251,6 +251,7 @@ spec:
{{- if not (eq .Values.containerRuntime.integration "crio") }}
- name: bpf-maps
mountPath: /sys/fs/bpf
mountPropagation: Bidirectional
{{- end }}
{{- if not (contains "/run/cilium/cgroupv2" .Values.cgroup.hostRoot) }}
# Check for duplicate mounts before mounting
@@ -412,12 +413,7 @@ spec:
name: cilium-config
key: clean-cilium-bpf-state
optional: true
- name: CILIUM_WAIT_BPF_MOUNT
valueFrom:
configMapKeyRef:
name: cilium-config
key: wait-bpf-mount
optional: true
{{- if .Values.k8sServiceHost }}
- name: KUBERNETES_SERVICE_HOST
value: {{ .Values.k8sServiceHost | quote }}
@@ -439,8 +435,6 @@ spec:
{{- if not (eq .Values.containerRuntime.integration "crio") }}
- name: bpf-maps
mountPath: /sys/fs/bpf
{{- /* Required for wait-bpf-mount to work */}}
mountPropagation: HostToContainer
{{- end }}
# Required to mount cgroup filesystem from the host to cilium agent pod
- name: cilium-cgroup
3 changes: 0 additions & 3 deletions install/kubernetes/cilium/templates/cilium-configmap.yaml
@@ -386,9 +386,6 @@ data:
enable-l7-proxy: {{ .Values.l7Proxy | quote }}
{{- end }}

# wait-bpf-mount makes init container wait until bpf filesystem is mounted
wait-bpf-mount: "{{ .Values.bpf.waitForMount }}"

{{- if ne .Values.cni.chainingMode "none" }}
# Enable chaining with another CNI plugin
#
4 changes: 0 additions & 4 deletions install/kubernetes/cilium/values.yaml
@@ -230,10 +230,6 @@ bpf:
# -- Enable BPF clock source probing for more efficient tick retrieval.
clockProbe: false

# -- Force the cilium-agent DaemonSet to wait in an initContainer until the
# eBPF filesystem has been mounted.
waitForMount: false

# -- Enables pre-allocation of eBPF map values. This increases
# memory usage but can reduce latency.
preallocateMaps: false
20 changes: 10 additions & 10 deletions pkg/bpf/bpffs_linux.go
@@ -158,7 +158,7 @@ func hasMultipleMounts() (bool, error) {

// checkOrMountCustomLocation tries to check or mount the BPF filesystem in the
// given path.
func checkOrMountCustomLocation(bpfRoot string, printWarning bool) error {
func checkOrMountCustomLocation(bpfRoot string) error {
setMapRoot(bpfRoot)

// Check whether the custom location has a BPFFS mount.
@@ -170,7 +170,7 @@ func checkOrMountCustomLocation(bpfRoot string, printWarning bool) error {
// If the custom location has no mount, let's mount BPFFS there.
if !mounted {
setMapRoot(bpfRoot)
if err := mountFS(printWarning); err != nil {
if err := mountFS(true); err != nil {
return err
}

@@ -200,7 +200,7 @@ func checkOrMountCustomLocation(bpfRoot string, printWarning bool) error {
// probably Cilium is running inside container which has mounted /sys/fs/bpf
// from host, but host doesn't have proper BPFFS mount, so that mount is just
// the empty directory. In that case, mount BPFFS under /run/cilium/bpffs.
func checkOrMountDefaultLocations(printWarning bool) error {
func checkOrMountDefaultLocations() error {
// Check whether /sys/fs/bpf has a BPFFS mount.
mounted, bpffsInstance, err := mountinfo.IsMountFS(mountinfo.FilesystemTypeBPFFS, mapRoot)
if err != nil {
@@ -210,7 +210,7 @@ func checkOrMountDefaultLocations(printWarning bool) error {
// If /sys/fs/bpf is not mounted at all, we should mount
// BPFFS there.
if !mounted {
if err := mountFS(printWarning); err != nil {
if err := mountFS(false); err != nil {
return err
}

@@ -240,7 +240,7 @@ func checkOrMountDefaultLocations(printWarning bool) error {
return err
}
if !cMounted {
if err := mountFS(printWarning); err != nil {
if err := mountFS(false); err != nil {
return err
}
} else if !cBpffsInstance {
@@ -253,13 +253,13 @@ func checkOrMountDefaultLocations(printWarning bool) error {
return nil
}

func checkOrMountFS(bpfRoot string, printWarning bool) error {
func checkOrMountFS(bpfRoot string) error {
if bpfRoot == "" {
if err := checkOrMountDefaultLocations(printWarning); err != nil {
if err := checkOrMountDefaultLocations(); err != nil {
return err
}
} else {
if err := checkOrMountCustomLocation(bpfRoot, printWarning); err != nil {
if err := checkOrMountCustomLocation(bpfRoot); err != nil {
return err
}
}
@@ -281,9 +281,9 @@ func checkOrMountFS(bpfRoot string, printWarning bool) error {
//
// A warning is printed if bpffs has not previously been mounted at a
// custom bpfRoot location.
func CheckOrMountFS(bpfRoot string, printWarning bool) {
func CheckOrMountFS(bpfRoot string) {
mountOnce.Do(func() {
if err := checkOrMountFS(bpfRoot, printWarning); err != nil {
if err := checkOrMountFS(bpfRoot); err != nil {
log.WithError(err).Fatal("Unable to mount BPF filesystem")
}
})
2 changes: 1 addition & 1 deletion pkg/bpf/map_linux_test.go
@@ -60,7 +60,7 @@ var (
)

func runTests(m *testing.M) (int, error) {
CheckOrMountFS("", false)
CheckOrMountFS("")
if err := ConfigureResourceLimits(); err != nil {
return 1, fmt.Errorf("Failed to configure rlimit")
}
2 changes: 1 addition & 1 deletion pkg/maps/ctmap/ctmap_privileged_test.go
@@ -33,7 +33,7 @@ func Test(t *testing.T) {
}

func (k *CTMapTestSuite) SetUpSuite(c *C) {
bpf.CheckOrMountFS("", false)
bpf.CheckOrMountFS("")
err := bpf.ConfigureResourceLimits()
c.Assert(err, IsNil)
}
8 changes: 4 additions & 4 deletions pkg/maps/eppolicymap/eppolicymap_test.go
@@ -42,13 +42,13 @@ func (e *EPPolicyMapTestSuite) TearDownTest(c *C) {
}

func (e *EPPolicyMapTestSuite) TestCreateEPPolicy(c *C) {
bpf.CheckOrMountFS("", false)
bpf.CheckOrMountFS("")
CreateEPPolicyMap()
}

func (e *EPPolicyMapTestSuite) TestWriteEndpoint(c *C) {
option.Config.SockopsEnable = true
bpf.CheckOrMountFS("", false)
bpf.CheckOrMountFS("")
keys := make([]*lxcmap.EndpointKey, 1)
many := make([]*lxcmap.EndpointKey, 256)
fd, err := bpf.CreateMap(bpf.MapTypeHash,
@@ -75,7 +75,7 @@
// in an invalid fd if it's disabled.
func (e *EPPolicyMapTestSuite) TestWriteEndpointFails(c *C) {
option.Config.SockopsEnable = true
bpf.CheckOrMountFS("", false)
bpf.CheckOrMountFS("")
keys := make([]*lxcmap.EndpointKey, 1)
_, err := bpf.CreateMap(bpf.MapTypeHash,
uint32(unsafe.Sizeof(policymap.PolicyKey{})),
@@ -91,7 +91,7 @@

func (e *EPPolicyMapTestSuite) TestWriteEndpointDisabled(c *C) {
option.Config.SockopsEnable = false
bpf.CheckOrMountFS("", false)
bpf.CheckOrMountFS("")
keys := make([]*lxcmap.EndpointKey, 1)
fd, err := bpf.CreateMap(bpf.MapTypeHash,
uint32(unsafe.Sizeof(policymap.PolicyKey{})),
2 changes: 1 addition & 1 deletion pkg/maps/policymap/policymap_privileged_test.go
@@ -37,7 +37,7 @@ var (
)

func runTests(m *testing.M) (int, error) {
bpf.CheckOrMountFS("", false)
bpf.CheckOrMountFS("")
if err := bpf.ConfigureResourceLimits(); err != nil {
return 1, fmt.Errorf("Failed to configure rlimit")
}
5 changes: 2 additions & 3 deletions tools/maptool/main.go
@@ -1,5 +1,5 @@
// SPDX-License-Identifier: Apache-2.0
// Copyright 2019 Authors of Cilium
// Copyright 2019-2021 Authors of Cilium

package main

@@ -9,7 +9,6 @@ import (
"strings"

"github.com/cilium/cilium/pkg/bpf"
"github.com/cilium/cilium/pkg/defaults"
"github.com/cilium/cilium/pkg/maps/eppolicymap"
"github.com/cilium/cilium/pkg/maps/sockmap"
)
@@ -48,7 +47,7 @@ func main() {
if err := bpf.ConfigureResourceLimits(); err != nil {
fmt.Fprintf(os.Stdout, "Failed to configure resource limits: %s\n", err)
}
bpf.CheckOrMountFS(defaults.DefaultMapRoot, true)
bpf.CheckOrMountFS("")

bpfObjPath := os.Args[2]
if err := createMap(bpfObjPath); err != nil {
