Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Maglev unit test failing on newer kernels with: update failed: invalid argument #16844

Closed
sugangli opened this issue Jul 9, 2021 · 0 comments · Fixed by #16850
Closed
Assignees
Labels
area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me!

Comments

@sugangli
Copy link
Contributor

sugangli commented Jul 9, 2021

CI failure

I observed the failure in gke presubmit unit test while merging the oss cilium with gke internal one:

START: maglev_test.go:81: MaglevSuite.TestInitMaps
maglev_test.go:112:
    c.Assert(err, IsNil)
... value *fmt.wrapError = &fmt.wrapError{msg:"update failed: invalid argument", err:0x16} ("update failed: invalid argument")
FAIL: maglev_test.go:81: MaglevSuite.TestInitMaps
OOPS: 0 passed, 1 FAILED
--- FAIL: Test (0.11s)
FAIL
coverage: 4.2% of statements in ./...
FAIL	github.com/cilium/cilium/pkg/maps/lbmap	0.140s
FAIL
make: *** [Makefile:144: tests-privileged] Error 1

I was able to reproduce this on my local setup:

$ sudo SKIP_VET=true TESTPKGS=pkg/maps/lbmap strace -f -e signal=none -ebpf -o trace make tests-privileged 
make  -C bpf cilium-map-migrate
make[1]: Entering directory '/usr/local/google/home/sugangli/go/src/github.com/cilium/cilium/bpf'
make[1]: 'cilium-map-migrate' is up to date.
make[1]: Leaving directory '/usr/local/google/home/sugangli/go/src/github.com/cilium/cilium/bpf'
make init-coverage
make[1]: Entering directory '/usr/local/google/home/sugangli/go/src/github.com/cilium/cilium'
echo "mode: count" > coverage-all-tmp.out
echo "mode: count" > coverage.out
make[1]: Leaving directory '/usr/local/google/home/sugangli/go/src/github.com/cilium/cilium'
for pkg in github.com/cilium/cilium/pkg/maps/lbmap; do \
	PATH=/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/google/home/sugangli/go/src/github.com/cilium/cilium/bpf go test -mod=vendor -tags=osusergo,privileged_tests -ldflags "-X github.com/cilium/cilium/pkg/kvstore.consulDummyAddress=https://consul:8443 -X github.com/cilium/cilium/pkg/kvstore.etcdDummyAddress=http://etcd:4002 -X github.com/cilium/cilium/pkg/testutils.CiliumRootDir=/usr/local/google/home/sugangli/go/src/github.com/cilium/cilium -X github.com/cilium/cilium/pkg/datapath.DatapathSHA256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" $pkg -test.v -timeout 360s -check.vv -covermode=count -coverprofile=coverage.out -coverpkg github.com/cilium/cilium/pkg/maps/lbmap \
	|| exit 1; \
	tail -n +2 coverage.out >> coverage-all-tmp.out; \
done
=== RUN   Test
START: maglev_test.go:43: MaglevSuite.SetUpSuite
PASS: maglev_test.go:43: MaglevSuite.SetUpSuite	0.001s

START: maglev_test.go:81: MaglevSuite.TestInitMaps
maglev_test.go:112:
    c.Assert(err, IsNil)
... value *fmt.wrapError = &fmt.wrapError{msg:"update failed: invalid argument", err:0x16} ("update failed: invalid argument")

FAIL: maglev_test.go:81: MaglevSuite.TestInitMaps

OOPS: 0 passed, 1 FAILED
--- FAIL: Test (1.79s)
FAIL
coverage: 9.9% of statements in github.com/cilium/cilium/pkg/maps/lbmap
FAIL	github.com/cilium/cilium/pkg/maps/lbmap	1.900s
FAIL
make: *** [Makefile:144: tests-privileged] Error 1

@pchaigno was able to reproduce this failure on his setup, and found out the current CI skipped this test since the test VM had an old kernel(4.9).

The kernel I am using is :5.10.40-1rodete1-amd64.
I am also attaching the trace file here.
trace3.txt

@sugangli sugangli added area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! labels Jul 9, 2021
@pchaigno pchaigno changed the title CI: CI skips maglev unit test due to the test VM has an old kernel CI: Maglev unit test failing on newer kernels with: update failed: invalid argument Jul 9, 2021
@pchaigno pchaigno changed the title CI: Maglev unit test failing on newer kernels with: update failed: invalid argument CI: Maglev unit test failing on newer kernels with: update failed: invalid argument Jul 9, 2021
@jibi jibi self-assigned this Jul 9, 2021
jibi added a commit that referenced this issue Jul 12, 2021
When a maglev BPF map is initialized, before creating it we check if it
already exists, and if its inner map size matches the desired maglev
table size ("M" parameter), so that we can delete and recreate it in
case of a mismatch.

The lbmap.MaglevOuterMapTableSize function is reponsible for reporting
to the caller if the map already exists and its inner map size.

Currently, if the map exists but its empty, MaglevOuterMapTableSize will
incorrectly return false (i.e. "map does not exist"), preventing
lbmap.deleteMapIfMNotMatch from deleting it in case of a mismatch with
the M parameter.

This commit fixes this logic.

Fixes: #16844
Fixes: 879f9eb
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
jibi added a commit that referenced this issue Jul 13, 2021
When a maglev BPF map is initialized, before creating it we check if it
already exists, and if its inner map size matches the desired maglev
table size ("M" parameter), so that we can delete and recreate it in
case of a mismatch.

The lbmap.MaglevOuterMapTableSize function is reponsible for reporting
to the caller if the map already exists and its inner map size.

Currently, if the map exists but its empty, MaglevOuterMapTableSize will
incorrectly return false (i.e. "map does not exist"), preventing
lbmap.deleteMapIfMNotMatch from deleting it in case of a mismatch with
the M parameter.

This commit fixes this logic.

Fixes: #16844
Fixes: 879f9eb
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
aanm pushed a commit that referenced this issue Jul 19, 2021
When a maglev BPF map is initialized, before creating it we check if it
already exists, and if its inner map size matches the desired maglev
table size ("M" parameter), so that we can delete and recreate it in
case of a mismatch.

The lbmap.MaglevOuterMapTableSize function is reponsible for reporting
to the caller if the map already exists and its inner map size.

Currently, if the map exists but its empty, MaglevOuterMapTableSize will
incorrectly return false (i.e. "map does not exist"), preventing
lbmap.deleteMapIfMNotMatch from deleting it in case of a mismatch with
the M parameter.

This commit fixes this logic.

Fixes: #16844
Fixes: 879f9eb
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
krishgobinath pushed a commit to krishgobinath/cilium that referenced this issue Oct 20, 2021
When a maglev BPF map is initialized, before creating it we check if it
already exists, and if its inner map size matches the desired maglev
table size ("M" parameter), so that we can delete and recreate it in
case of a mismatch.

The lbmap.MaglevOuterMapTableSize function is reponsible for reporting
to the caller if the map already exists and its inner map size.

Currently, if the map exists but its empty, MaglevOuterMapTableSize will
incorrectly return false (i.e. "map does not exist"), preventing
lbmap.deleteMapIfMNotMatch from deleting it in case of a mismatch with
the M parameter.

This commit fixes this logic.

Fixes: cilium#16844
Fixes: 879f9eb
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants