Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loader: fix issue where errors cancelled compile cause error logs. #30988

Conversation

tommyp1ckles
Copy link
Contributor

@tommyp1ckles tommyp1ckles commented Feb 26, 2024

On Linux/Unix based implementations, exec/cmd.Run will return either context.ContextCancelled or the error "signal: killed" depending on whether the cancellation occured while the process was running.

There's several places we check on is.Errors(err, context.Cancelled) on whether to emit high level logs about failed program compilations. Because already running cmd.Run() doesn't return an error that satisfies this, this will result in spurious error logs about failed compilation (i.e. "signal: killed")

This meant that in cases where a compilation is legitimately cancelled, we would still log an error such as

msg="BPF template object creation failed" ... error="...: compile bpf_lxc.o: signal: killed"

This can occur occasionally in CI, which enforces no error to pass, causing failures.

example:

	ctx, c := context.WithTimeout(context.Background(), time.Second)
	go func() {
		time.Sleep(time.Second)
		c()
	}()
	cmd := exec.CommandContext(ctx, "sleep", "2")
	fmt.Println(cmd.Run())

	ctx, c = context.WithTimeout(context.Background(), time.Second)
	c()
	cmd = exec.CommandContext(ctx, "sleep", "2")
	fmt.Println(cmd.Run())

To fix this, this will join in the ctx.Err() if it is:

  • context.Cancelled
  • The process has not exited itself.
  • The process appeared to be SIGKILL'ed.

Addresses: #30991

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 26, 2024
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles force-pushed the pr/tp/improve-loader-compile-cancel-logging branch from dcaf406 to 1f99f9e Compare February 26, 2024 22:39
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles force-pushed the pr/tp/improve-loader-compile-cancel-logging branch from 1f99f9e to c81a954 Compare February 26, 2024 23:29
On Linux/Unix based implementations, exec/cmd.Run will return either context.ContextCancelled or the error "signal: killed" depending on whether the cancellation occurred while the process was running.

There's several places we check on ```is.Errors(err, context.Cancelled)``` on whether to emit high level logs about failed program compilations.  Because already running cmd.Run() doesn't return an error that satisfies this, this will result in spurious error logs about failed compilation (i.e. "signal: killed")

This meant that in cases where a compilation is legitimately cancelled, we would still log an error such as

msg="BPF template object creation failed" ... error="...: compile bpf_lxc.o: signal: killed"

This can occur occasionally in CI, which enforces no error to pass, causing failures.

example:
```
	ctx, c := context.WithTimeout(context.Background(), time.Second)
	go func() {
		time.Sleep(time.Second)
		c()
	}()
	cmd := exec.CommandContext(ctx, "sleep", "2")
	fmt.Println(cmd.Run())

	ctx, c = context.WithTimeout(context.Background(), time.Second)
	c()
	cmd = exec.CommandContext(ctx, "sleep", "2")
	fmt.Println(cmd.Run())
```

To fix this, this will join in the ctx.Err() if it is:
* context.Cancelled
* The process has not exited itself.
* The process appeared to be SIGKILL'ed.

Addresses: cilium#30991

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
@tommyp1ckles tommyp1ckles force-pushed the pr/tp/improve-loader-compile-cancel-logging branch from c81a954 to 1e0f141 Compare February 26, 2024 23:30
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles marked this pull request as ready for review February 26, 2024 23:31
@tommyp1ckles tommyp1ckles requested a review from a team as a code owner February 26, 2024 23:31
@tommyp1ckles
Copy link
Contributor Author

/test

@tommyp1ckles tommyp1ckles added sig/loader Impacts the loading of BPF programs into the kernel. area/CI Continuous Integration testing issue or flake release-note/ci This PR makes changes to the CI. labels Feb 27, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Feb 27, 2024
@julianwiedmann julianwiedmann added this pull request to the merge queue Mar 4, 2024
@julianwiedmann
Copy link
Member

👋 if this occurs in CI, should we also backport to older releases for reducing flakes?

Merged via the queue into cilium:main with commit 70b405f Mar 4, 2024
62 checks passed
@tommyp1ckles
Copy link
Contributor Author

@julianwiedmann I didn't notice it happening in other branches, I think the uptick in these failing CI was also related to other changes in 0.16.0. However, this does seem like a minor bug either way so I think we can justify backporting this.

@tommyp1ckles tommyp1ckles added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 8, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.15.2 Mar 8, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.8 Mar 8, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.13 Mar 8, 2024
@jibi jibi mentioned this pull request Mar 11, 2024
5 tasks
@jibi jibi added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Mar 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Needs backport from main in 1.13.13 Mar 13, 2024
@jrajahalme jrajahalme added this to Needs backport from main in 1.15.3 Mar 13, 2024
@jrajahalme jrajahalme removed this from Needs backport from main in 1.15.2 Mar 13, 2024
@thorn3r thorn3r removed this from Needs backport from main in 1.13.13 Mar 13, 2024
@thorn3r thorn3r added this to Needs backport from main in 1.13.13 Mar 13, 2024
@thorn3r thorn3r added this to Needs backport from main in 1.13.14 Mar 13, 2024
@thorn3r thorn3r removed this from Needs backport from main in 1.13.13 Mar 13, 2024
@jibi jibi added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Mar 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.8 Mar 13, 2024
@jibi jibi added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 13, 2024
@thorn3r thorn3r added this to Backport pending to v1.14 in 1.14.9 Mar 13, 2024
@thorn3r thorn3r removed this from Backport pending to v1.14 in 1.14.8 Mar 13, 2024
@github-actions github-actions bot added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Mar 16, 2024
@jrajahalme jrajahalme moved this from Needs backport from main to Backport done to v1.15 in 1.15.3 Mar 26, 2024
@thorn3r thorn3r moved this from Needs backport from main to Backport done to v1.13 in 1.13.14 Mar 26, 2024
@jrajahalme jrajahalme moved this from Backport pending to v1.14 to Backport done to v1.14 in 1.14.9 Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. release-note/ci This PR makes changes to the CI. sig/loader Impacts the loading of BPF programs into the kernel.
Projects
No open projects
1.13.14
Backport done to v1.13
1.14.9
Backport done to v1.14
1.15.3
Backport done to v1.15
Development

Successfully merging this pull request may close these issues.

None yet

4 participants