Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maglev: Delete map if previous M's do not match #14345

Merged
merged 3 commits into from
Dec 11, 2020

Conversation

brb
Copy link
Member

@brb brb commented Dec 10, 2020

It was reported by a user, that when the M param
(--bpf-lb-maglev-table-size) changes, then the verifier is failing to
load the program on the 5.4 kernel with the following error:

invalid access to map value, value_size=32762 off=65520 size=2 R0
max value is outside of the array range processed 507 insns (limit
1000000) max_states_per_insn 1 total_states 34 peak_states 34
mark_read 17

Fix this by recreating the map when we detect that the M has changed.

Fix: #14251
Reported-by: Han Zhou hzhou8@ebay.com

@brb brb added kind/bug This is a bug in the Cilium logic. pending-review labels Dec 10, 2020
@brb brb requested review from borkmann and a team December 10, 2020 06:37
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 10, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Dec 10, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.2 Dec 10, 2020
@brb brb changed the title maglev: Delete map if previous M do not match maglev: Delete map if previous M's do not match Dec 10, 2020
@brb brb added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Dec 10, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 10, 2020
@brb
Copy link
Member Author

brb commented Dec 10, 2020

test-me-please

@brb brb force-pushed the pr/brb/maglev-table-resize branch from f646a2c to afcb86f Compare December 10, 2020 10:10
@brb
Copy link
Member Author

brb commented Dec 10, 2020

test-me-please

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected that check to happen in cilium-map-migrate since that's where we currently check for property mismatches. But given we want to favor an all-Golang codebase in the long term, it probably makes sense to implement this logic in Go 👍

LGTM!

pkg/maps/lbmap/maglev.go Outdated Show resolved Hide resolved
Copy link
Member

@borkmann borkmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

pkg/maps/lbmap/maglev.go Outdated Show resolved Hide resolved
@brb
Copy link
Member Author

brb commented Dec 10, 2020

I would have expected that check to happen in cilium-map-migrate since that's where we currently check for property mismatches.

@pchaigno It's being checked by https://github.com/cilium/cilium/blob/master/pkg/bpf/map_linux.go#L1129. However, as we are planning to migrate from pkg/bpf to cilium/ebpf and the check is required only for the maglev map, I decided not to move the check-and-delete there.

@brb
Copy link
Member Author

brb commented Dec 10, 2020

Damn, we run the privileged tests on 4.9, and the map-in-map is not supported on that kernel.

@brb brb force-pushed the pr/brb/maglev-table-resize branch from afcb86f to b56d670 Compare December 10, 2020 15:38
@brb brb requested a review from a team December 10, 2020 17:08
@brb brb requested a review from a team as a code owner December 10, 2020 17:08
It was reported by a user, that when the M param
(--bpf-lb-maglev-table-size) changes, then the verifier is failing to
load the program on the 5.4 kernel with the following error:

    invalid access to map value, value_size=32762 off=65520 size=2 R0
    max value is outside of the array range processed 507 insns (limit
    1000000) max_states_per_insn 1 total_states 34 peak_states 34
    mark_read 17

Fix this by recreating the map when we detect that the M has changed.

Reported-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb force-pushed the pr/brb/maglev-table-resize branch from 22d4385 to 8c7fade Compare December 10, 2020 17:13
@brb
Copy link
Member Author

brb commented Dec 10, 2020

Damn, we run the privileged tests on 4.9, and the map-in-map is not supported on that kernel.

Added the skip. @borkmann @pchaigno PTAL (last two commits)

@brb
Copy link
Member Author

brb commented Dec 10, 2020

test-me-please

@brb brb force-pushed the pr/brb/maglev-table-resize branch from 8c7fade to d9afcd3 Compare December 10, 2020 17:23
Move the GetKernelVersion() from pkg/datapath/linux to pkg/version so
that it can be included by pkg/maps/lbmap which is included by
pkg/datapath/linux. Otherwise, we have a circular dependency.

Also, make the function to return an error instead of calling log.Fatal,
as it's caller's responsibility to decide what to do with an error.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Unfortunately, we run the privileged unit tests on the 4.9 kernel which
does not have the map-in-map support. Therefore, add a skip based on the
kernel version so that the CI does not fail.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb
Copy link
Member Author

brb commented Dec 10, 2020

test-me-please

@borkmann
Copy link
Member

retest-net-next

@brb
Copy link
Member Author

brb commented Dec 11, 2020

net-next hit #12511.

@brb
Copy link
Member Author

brb commented Dec 11, 2020

retest-net-next

@borkmann borkmann merged commit 8031a30 into master Dec 11, 2020
@borkmann borkmann deleted the pr/brb/maglev-table-resize branch December 11, 2020 09:48
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.2 Dec 16, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.2 Jan 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.9.2
Backport done to v1.9
Development

Successfully merging this pull request may close these issues.

verifier error on agent restart with different table size
4 participants