
datapath: preserve tail call maps in between replacing interface program pairs #17744

Merged: 2 commits merged into cilium:master from the tb/prog-array-unpin branch on Feb 11, 2022

Conversation

@ti-mo (Contributor) commented Oct 29, 2021:

To prevent unpinning tail call maps after replacing only one half of an interface's programs (e.g. from-netdev and to-netdev), allow the caller to defer finalizer operations until a whole batch of interfaces has been replaced.

This avoids short windows of packet drops due to missed tail calls.

Fixes #16561

Also fixed a bug where we specified the wrong bpffs path to the migration code. This causes migrations to not be executed in the currently-released 1.11, meaning that resizing maps causes endpoint regeneration failures.

Release note: Preserve tail call maps during resize to prevent drops during agent upgrade
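
The mechanism can be sketched roughly as follows. This is a minimal illustration of the deferral pattern only, assuming a heavily simplified replaceDatapath signature; the names and arguments here are illustrative and not Cilium's actual API.

```go
// Sketch only: replaceDatapath is reduced to two arguments and the real
// attach/migration logic is elided; the point is the finalizer flow.
package loadersketch

import "context"

// replaceDatapath attaches the new program for one direction and returns a
// finalizer that unpins the old (renamed) tail call maps once it is safe.
func replaceDatapath(ctx context.Context, ifName, direction string) (func(), error) {
	// ... run the bpffs migration and attach the new program ...
	finalize := func() {
		// ... remove the backup pins of the previous maps ...
	}
	return finalize, nil
}

// reloadInterface replaces both halves of an interface's programs before
// running any finalizer, so tail calls keep resolving throughout the swap.
func reloadInterface(ctx context.Context, ifName string) error {
	var finalizers []func()
	for _, dir := range []string{"ingress", "egress"} {
		finalize, err := replaceDatapath(ctx, ifName, dir)
		if err != nil {
			return err // nothing has been unpinned yet; the old maps stay usable
		}
		finalizers = append(finalizers, finalize)
	}
	// Only after every replacement succeeded are the old map pins dropped.
	for _, f := range finalizers {
		f()
	}
	return nil
}
```

This mirrors the diffs further down in the review, where replaceDatapath now returns a finalize callback instead of unpinning the old maps immediately.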

@ti-mo requested review from a team and @kkourt on October 29, 2021 12:54
@maintainer-s-little-helper added the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes) on Oct 29, 2021
@ti-mo added the release-note/bug label (This PR fixes an issue in a previous release of Cilium) on Oct 29, 2021
@maintainer-s-little-helper removed the dont-merge/needs-release-note-label label on Oct 29, 2021
@ti-mo requested a review from @brb on October 29, 2021 12:56
@pchaigno self-requested a review on October 29, 2021 13:16
@ti-mo (author) commented Oct 29, 2021:

/test

@kkourt (Contributor) left a comment:

LGTM.

I'm not 100% sure about not garbage collecting the files, but if this only happens when the map size is changed (which, I guess, does not happen that often) I cannot think of a strong argument against it.

That being said, could we do an easy version of GC where we keep only one backup directory? That is, after the backup is done, we look for older backups of the same map and delete them.
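
A rough sketch of that idea, assuming a hypothetical layout where backups of a map are pinned next to the original with a timestamp suffix; neither the naming scheme nor the helper below comes from Cilium's migration code.

```go
// Hypothetical layout: backups of <map> are pinned as <map>_<timestamp> next
// to the original pin. Keep only the newest one and delete the rest.
package bpfgc

import (
	"os"
	"path/filepath"
	"sort"
	"time"
)

func pruneOldBackups(bpffsDir, mapName string) error {
	matches, err := filepath.Glob(filepath.Join(bpffsDir, mapName+"_*"))
	if err != nil {
		return err
	}

	type backup struct {
		path string
		mod  time.Time
	}
	var backups []backup
	for _, m := range matches {
		fi, err := os.Stat(m)
		if err != nil {
			continue // pin may have been removed concurrently
		}
		backups = append(backups, backup{m, fi.ModTime()})
	}
	if len(backups) <= 1 {
		return nil // at most one backup, nothing to prune
	}

	// Newest first; everything after index 0 is an older backup.
	sort.Slice(backups, func(i, j int) bool {
		return backups[i].mod.After(backups[j].mod)
	})
	for _, b := range backups[1:] {
		if err := os.Remove(b.path); err != nil {
			return err
		}
	}
	return nil
}
```

Run right after a new backup is created, this keeps exactly one fallback per map, which is the lightweight GC suggested above.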

@pchaigno (Member) left a comment:

LGTM. Thanks!

Did you already test this by adding a random tail call somewhere and executing K8sUpdates?

> I'm not 100% sure about not garbage collecting the files, but if this only happens when the map size is changed (which, I guess, does not happen that often) I cannot think of a strong argument against it.

If it becomes an issue, we could maybe rely on the file creation timestamps in bpffs to determine which backups have been made a very long time ago and are safe to delete. But I really doubt Kubernetes nodes have a lifespan long enough for this to become an issue.

> That being said, could we do an easy version of GC where we keep only one backup directory? That is, after the backup is done, we look for older backups of the same map and delete them.

I don't think we can even assume datapath logic n-1 is fully loaded when we start loading datapath logic n. So there could still be bpf_lxc programs using tail call maps from the n-2 version... or even before :-/

@pchaigno added the sig/loader label (Impacts the loading of BPF programs into the kernel) on Nov 1, 2021
@ti-mo (author) commented Nov 2, 2021:

> Did you already test this by adding a random tail call somewhere and executing K8sUpdates?

I've tested this by creating a prog array of a different size manually and running the migration commands. Not sure how to work with K8sUpdates.

> But I really doubt Kubernetes nodes have a lifespan long enough for this to become an issue.

We can't really assume all k8s nodes are short-lived, as baremetal deployments are a thing as well. However, we only need to resize these maps once in a blue moon and they're rather small. If we do resize them, we should take care to grow them sufficiently (e.g. x1.5) so that adding future tail calls does not require a map migration.

@ti-mo added the ready-to-merge label (This PR has passed all tests and received consensus from code owners to merge) on Nov 2, 2021
@ti-mo (author) commented Nov 2, 2021:

Marked as ready; only known, unrelated flakes are hit. This should suffice for 1.11, and follow-ups can happen in case any issues become apparent with this approach.

@jrajahalme mentioned this pull request on Nov 3, 2021
@jrajahalme (Member) commented:
Is this superseding #13032?

@ti-mo (author) commented Nov 3, 2021:

> Is this superseding #13032?

Yep, I think it does!

@joestringer removed the ready-to-merge label on Nov 3, 2021
@joestringer (Member) commented:

I think we need to come to an agreement on the path to tackle #17744 (comment) before we can consider this as 'ready-to-merge'. I'll be happy to be proven wrong about the concern but let's follow where that discussion goes before rushing this into the tree.

@pchaigno (Member) commented Jan 31, 2022:

The Jenkins test failures are caused by #15474 (66967dd recently extended the test). The other required tests are passing and team review requests are covered, so I think this is ready to merge.

@pchaigno added the ready-to-merge label on Jan 31, 2022
@pchaigno removed their assignment on Jan 31, 2022
@joestringer removed the ready-to-merge label on Jan 31, 2022
pkg/datapath/loader/netlink.go (outdated, resolved)
```diff
@@ -70,7 +70,7 @@ func replaceDatapath(ctx context.Context, ifName, objPath, progSec, progDirectio

 	// Temporarily rename bpffs pins of maps whose definitions have changed in
 	// a new version of a datapath ELF.
-	if err := bpf.StartBPFFSMigration(bpf.GetMapRoot(), objPath); err != nil {
+	if err := bpf.StartBPFFSMigration(bpf.MapPrefixPath(), objPath); err != nil {
```
Review thread on this hunk:

I think it's rather critical given it can deadlock datapath regeneration (or even prevent agent start?) when maps get changed.

Can you describe this in a bit more detail and also include it in the commit message? It's not immediately obvious to me which conditions would cause this kind of situation, or even that the first patch is resolving such a serious bug.

On second thought, I think this is good to go. I don't see this as a particularly invasive or risky change, it just defers cleanup to be slightly later.

➕ Assuming we resolve the other feedback, this should be fairly low risk.
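
For context on the hunk above, here is a rough illustration of the two paths, assuming the default bpffs mount point; the constants and helper below are stand-ins and not Cilium's actual pkg/bpf code.

```go
package bpffspaths

import "path/filepath"

const (
	mapRoot   = "/sys/fs/bpf" // bpffs mount point, roughly what GetMapRoot() yields
	mapPrefix = "tc/globals"  // subdirectory where the datapath maps are pinned
)

// Scanning mapRoot alone finds no pinned cilium_* maps, so the migration was a
// no-op; the corrected call scans the directory that actually holds the pins.
func mapPrefixPath() string {
	return filepath.Join(mapRoot, mapPrefix) // "/sys/fs/bpf/tc/globals"
}
```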


```diff
 	if ep.RequireEgressProg() {
-		if err := replaceDatapath(ctx, ep.InterfaceName(), objPath, symbolToEndpoint, dirEgress, false, ""); err != nil {
+		finalize, err := replaceDatapath(ctx, ep.InterfaceName(), objPath, symbolToEndpoint, dirEgress, false, "")
```
@joestringer (Member) commented Jan 31, 2022:

If I follow right, the approach here is to ensure that both the ingress+egress programs for a given endpoint are completely reloaded before unpinning the old references. That makes some sense to me.

Similarly for the reloadHostDatapath case above, the programs for all interfaces are loaded first before unpinning the old map references. That also makes sense. 👍

I recall that some of the discussion previously was around what happens when we reload base programs, and how that could impact upgrade. Looking through, it seems like we've managed to sidestep that question for now while resolving the immediate concerns around upgrade. Is that right? The approach here seems like a good incremental improvement and allows us to defer dealing with the init.sh conversion and this other potential datapath disruption question until the go migration work matures.

(no changes necessary, this is more just me checking if I am following correctly)

@ti-mo (author) replied:

Yup, I think we're on the same page here. Could you specify what you mean by 'base programs'?

@joestringer (Member) replied Feb 10, 2022:

Sure thing. Base programs are the "global" programs that apply once for the node, like bpf_host, bpf_overlay, etc. Structurally in the code, when Cilium wants to update these programs, it calls into the datapath Reinitialize() function. It's typically pretty rare to reinitialize the datapath like this during normal Cilium operations.

This is in contrast to the per-endpoint bpf_lxc programs, which are cloned per endpoint and dynamically configured based on the creation of workloads on the node. These endpoint BPF programs are installed via the endpoint-specific code in pkg/endpoint/bpf.go. The actual program reload for endpoints goes via one of the following (a rough model is sketched after this list):

  • CompileAndLoad(): unusual; typically only triggered by the cilium endpoint regenerate CLI,
  • CompileOrLoad(): the typical case on startup, or
  • ReloadDatapath(): used when updating maps for individual endpoints; does not trigger recompilation of ELFs and does not reload the global programs.
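
As a rough, hypothetical model of those three paths (signatures simplified; this is not Cilium's real loader interface):

```go
package loadermodel

import "context"

// Endpoint is a stand-in for Cilium's endpoint type in this sketch.
type Endpoint interface{}

type Loader interface {
	// CompileAndLoad recompiles the endpoint's ELF and then reloads it.
	CompileAndLoad(ctx context.Context, ep Endpoint) error
	// CompileOrLoad compiles only if no usable object exists, then loads it.
	CompileOrLoad(ctx context.Context, ep Endpoint) error
	// ReloadDatapath reloads the existing object without recompiling and
	// without touching the node-wide ("base") programs.
	ReloadDatapath(ctx context.Context, ep Endpoint) error
}
```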

@ti-mo (author) replied:

@joestringer Thanks, I thought these could only regenerate on agent startup; TIL about cilium endpoint regenerate.

pkg/bpf/bpffs_migrate.go (outdated, resolved)

First commit message:

This patch fixes bpffs migrations meant to occur at runtime. The tc/globals/
part of the path was missing, and the bpffs root was being scanned instead,
causing migrations not to run. This would lead to endpoints continuously
failing to regenerate at runtime when map definitions in the ELF differ from
their pinned counterparts, e.g. during an agent upgrade.

For example, when resizing CALLS_MAP and trying to replace a program, the old
map will be loaded from bpffs, while the program expects to see a map of a
different capacity:

```
~ tc filter replace dev cilium-probe ingress bpf da obj bpf/bpf_lxc.o sec from-container
libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/test_cilium_calls_65535': parameter mismatch
libbpf: map 'test_cilium_calls_65535': error reusing pinned map
libbpf: map 'test_cilium_calls_65535': failed to create: Invalid argument(-22)
libbpf: failed to load object 'bpf/bpf_lxc.o'
Unable to load program
```

The CLI equivalent executed as part of init.sh does work as expected, as the
full path including tc/globals/ is specified there.

Signed-off-by: Timo Beckers <timo@isovalent.com>

Second commit message:

To prevent unpinning tail call maps after replacing only one half of an
interface's programs, e.g. from-netdev and to-netdev, allow the caller to
defer finalizer operations until a batch of interfaces have been replaced.

This avoids short windows of packet drops due to missed tail calls.

Fixes cilium#16561

Signed-off-by: Timo Beckers <timo@isovalent.com>
@ti-mo (author) commented Feb 11, 2022:

@joestringer Addressed your feedback, please take another look.

I've specified the following in the first commit:

> This would lead to endpoints continuously failing to regenerate at runtime when map definitions in the ELF differ from their pinned counterparts, e.g. during an agent upgrade.

> For example, when resizing CALLS_MAP and trying to replace a program, the old map will be loaded from bpffs, while the program expects to see a map of a different capacity:

```
~ tc filter replace dev cilium-probe ingress bpf da obj bpf/bpf_lxc.o sec from-container
libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/test_cilium_calls_65535': parameter mismatch
libbpf: map 'test_cilium_calls_65535': error reusing pinned map
libbpf: map 'test_cilium_calls_65535': failed to create: Invalid argument(-22)
libbpf: failed to load object 'bpf/bpf_lxc.o'
Unable to load program
```

@ti-mo (author) commented Feb 11, 2022:

/test

@joestringer (Member) left a comment:

LGTM, thanks again for the fix.

```diff
 	}

-	return nil
+	finalize := func() {
+		l := log.WithField("device", ifName).WithField("objPath", objPath)
```
Reviewer comment (Member):

FYI there's also log.WithFields(logrus.Fields{ ... }) to specify multiple fields at once with one call. This is a more common style elsewhere in the codebase, but also this doesn't make any meaningful difference so don't worry about changing this one.
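
For reference, a minimal sketch of that multi-field form; the package-level logger here is an assumption made for self-containment, not the loader's actual logging setup.

```go
package loaderlog

import "github.com/sirupsen/logrus"

var log = logrus.New()

// finalizeLogger builds the same two-field logger with a single WithFields
// call instead of chaining WithField twice.
func finalizeLogger(ifName, objPath string) *logrus.Entry {
	return log.WithFields(logrus.Fields{
		"device":  ifName,
		"objPath": objPath,
	})
}
```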

@pchaigno (Member) commented:
All tests are passing and reviews are in. Marking ready to merge.

@pchaigno added the ready-to-merge label on Feb 11, 2022
@joestringer merged commit 9fb1668 into cilium:master on Feb 11, 2022
@ti-mo deleted the tb/prog-array-unpin branch on February 11, 2022 21:17
@maintainer-s-little-helper moved this from "Needs backport from master" to "Backport pending to v1.10" in 1.11.2 on Feb 14, 2022
@maintainer-s-little-helper removed the ready-to-merge label on Feb 14, 2022
@jibi added the backport-done/1.11 label and removed backport-pending/1.11 on Feb 16, 2022
@maintainer-s-little-helper moved this from "Backport pending to v1.10" to "Backport done to v1.11" in 1.11.2 on Feb 16, 2022
Closes: bpf: cilium-map-migrate might unpin in-use tailcall map (#16561)