Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify NAT46x64,recorder tests #32068

Merged
merged 23 commits into from
Apr 25, 2024
Merged

Conversation

joestringer
Copy link
Member

@joestringer joestringer commented Apr 18, 2024

Requires #32070 to test.

Review commit-by-commit to reason through the changes over the course of the PR.

Previously, these tests would emit very verbose output which was hard to
interpret when it failed, for instance a test may fail and this is the output:

++[01:02:57] docker exec -t lb-node ip -o -4 a s eth0
++[01:02:57] awk '{print $4}'
++[01:02:57] head -n1
++[01:02:57] cut -d/ -f1
+[01:02:57] LB_NODE_IP=172.12.42.2
+[01:02:57] ip r a 10.0.0.8/32 via 172.12.42.2
++[01:02:57] seq 1 10
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
++[01:02:57] seq 1 10
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null 10.0.0.8:80
+[01:02:57] echo 'Failed 1'
Failed 1
+[01:02:57] exit -1
Error: Process completed with exit code 255.

If you understand what the test is trying to do and where in the test that
these commands were run, then you might be able to debug it. But for most
mortals, this output was not very easy to understand.

With this PR, the tests are reworked to only print verbose output when the
environment variable V=1 is set, and by default it prints a much terser
set of output, such as:

[./test/nat46x64/test.sh:228]     Installing Cilium with Mode:TC  Algorithm:Maglev        Recorder:Disabled
cilium-lb
9c536a08a4bdb8751cdc2337503c5864e1fd1984619635b466cf4352d99d71b2
cilium: daemon unreachable
cilium: daemon unreachable
OK
[./test/nat46x64/test.sh:229]     Testing service [fd00:cafe::1]:80 -> 172.12.42.3:80 via TC + Maglev
[./test/nat46x64/test.sh:233]     Installing Cilium with Mode:TC  Algorithm:Random        Recorder:Disabled
cilium-lb
894298c1821b2b536f6744c78aa886bc8f3ab7f4a291b82d9ccb6677d253f26b
cilium: daemon unreachable
cilium: daemon unreachable
OK
[./test/nat46x64/test.sh:234]     Testing service [fd00:cafe::1]:80 -> 172.12.42.3:80 via TC + Random
[./test/nat46x64/test.sh:238]     Configuring service 10.0.0.8:80 -> 172.12.42.3:80
Deprecation warning: --id parameter will change from int to string in v1.14
Creating new service with id '2'
Added service with 1 backends
[./test/nat46x64/test.sh:244]     Installing route: 'ip -4 r a 10.0.0.8/32 via 172.12.42.2'
[./test/nat46x64/test.sh:246]     Checking connectivity via [fd00:cafe::1]:80 -> 172.12.42.3:80
[./test/nat46x64/test.sh:249]     Checking connectivity via 10.0.0.8:80 -> 172.12.42.3:80
[./test/nat46x64/test.sh:252]     Installing Cilium with Mode:TC  Algorithm:Maglev        Recorder:Disabled
cilium-lb
89f2ad155a0a5ae0922ba741baa8d6186d4f87ee4a3e019d5845a1847b6f5f1d
cilium: daemon unreachable
cilium: daemon unreachable
OK
[./test/nat46x64/test.sh:253]     Testing service [fd00:cafe::1]:80 -> 172.12.42.3:80 via TC + Maglev
Deprecation warning: --id parameter will change from int to string in v1.14
Service 1 deleted successfully
Deprecation warning: --id parameter will change from int to string in v1.14
Service 2 deleted successfully

Or, on failure:

$ sudo ./test/nat46x64/test.sh
[./test/nat46x64/test.sh:197]     Installing Cilium with Mode:TC  Algorithm:Maglev        Recorder:Disabled
cilium-lb
ccb3d8f50c6d15a016035eb81c12e62efa04ef79cad62a00a1c1b9bd529ca58c
cilium: daemon unreachable
OK
[./test/nat46x64/test.sh:199]     Configuring service 10.0.0.4:80 -> [2001:db8:1::3]:80
Deprecation warning: --id parameter will change from int to string in v1.14
Creating new service with id '1'
Added service with 1 backends
[./test/nat46x64/test.sh:209]     Testing service 10.0.0.4:80 -> [2001:db8:1::3]:80 via TC + Maglev
[./test/nat46x64/test.sh:210]     Installing route: 'ip -4 r a 10.0.0.4/32 via 172.12.42.2'
[./test/nat46x64/test.sh:211]     Failed connection from localhost to 10.0.0.4:80 (attempt 5/10)

The annotated lines in the square brackets also provide a test file and line number where the failure occurred to make it easier to track down the failure and attempt to reproduce.

@joestringer joestringer added area/CI Continuous Integration testing issue or flake sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/loadbalancing Impacts load-balancing and Kubernetes service implementations labels Apr 18, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Apr 18, 2024
@joestringer joestringer added the release-note/ci This PR makes changes to the CI. label Apr 18, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 18, 2024
@joestringer joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from 18c737f to cba5202 Compare April 18, 2024 21:24
@joestringer
Copy link
Member Author

/ci-l4lb

@joestringer joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from efc564f to 39c0a25 Compare April 18, 2024 21:41
@joestringer joestringer changed the base branch from main to pr/joe/ariane-nat46x64 April 18, 2024 21:41
@joestringer
Copy link
Member Author

/ci-l4lb

@joestringer joestringer added the dont-merge/blocked Another PR must be merged before this one. label Apr 18, 2024
@joestringer
Copy link
Member Author

/test

Base automatically changed from pr/joe/ariane-nat46x64 to main April 22, 2024 04:56
@joestringer joestringer marked this pull request as ready for review April 22, 2024 16:04
@joestringer joestringer requested review from a team as code owners April 22, 2024 16:04
@joestringer joestringer removed the dont-merge/blocked Another PR must be merged before this one. label Apr 22, 2024
@joestringer joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from 39c0a25 to e828af6 Compare April 22, 2024 21:34
@joestringer joestringer removed request for a team and aanm April 22, 2024 21:34
@joestringer
Copy link
Member Author

/test

Avoid leaving state left behind unless explicitly requested.

Signed-off-by: Joe Stringer <joe@cilium.io>
Apply all of the common configurations with a variable from the top of
the file, to simplify tracking down the different configurations
throughout the test.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
This mirrors the exact same pattern in the IPv6->IPv6 case, see commit
129bc1e ("test,l4lb: Change expected error in L4LB test") for more
details.

Signed-off-by: Joe Stringer <joe@cilium.io>
For some reason, the installations of Cilium in this test seem to just
randomly decide whether they're going to install with nat46x64 gateway
enabled or not. I can't tell if this is intentional, so this commit
changes all installs in this file to enable the feature.

Signed-off-by: Joe Stringer <joe@cilium.io>
Make the test output more clear by echoing to the terminal a message
whenever reconfiguring Cilium.

Signed-off-by: Joe Stringer <joe@cilium.io>
These configurations were already being run elsewhere in the test.

Signed-off-by: Joe Stringer <joe@cilium.io>
The initial test performs NAT46, Service 6to6, then does some validation
around the scenarios. The second commit does the inverse, NAT64 and
4to4 with the same basic checks afterwards.

This commit reworks the initial set of tests so that the code can be
shared for both tests in a subsequent commit.

Signed-off-by: Joe Stringer <joe@cilium.io>
No functional changes. This will be reused in the next commit by the
NAT64 test.

Signed-off-by: Joe Stringer <joe@cilium.io>
These tests were running practically the same logic, just duplicated.
Now that the previous commit abstracted away the differences between the
frontends and backends in the tests into common variables, we can
convert the core of the test into a function and call it with different
variables to run the different tests.

Signed-off-by: Joe Stringer <joe@cilium.io>
Centralize the connectivity checks into a common function and update the
output message to fail more clearly with a line number and other details
about how the connection is being generated.

Signed-off-by: Joe Stringer <joe@cilium.io>
Improve the tracing and failure output in the test.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
If cilium-agent doesn't come up ready, print the logs to the terminal
and fail out. This should make a common class of possible failure
conditions much easier to debug.

Signed-off-by: Joe Stringer <joe@cilium.io>
Move the declaration of the arrays of filters above the test code to
make it easier to read through what the test is doing. While we're at
it, silence unnecessary test output unless in verbose mode (env V=1).

Signed-off-by: Joe Stringer <joe@cilium.io>
Add a couple of checks to ensure that all of the requested filters end
up in the BPF maps.

Signed-off-by: Joe Stringer <joe@cilium.io>
BASH_LINENO is an array, and only the first index is useful to
understand the line number, so only format that part.

Signed-off-by: Joe Stringer <joe@cilium.io>
Apparently the CI relies on this script only cleaning up after itself
upon success, so switch the default behaviour back to this. On a local
TTY, the developer will be asked whether to clean up the environment if
there's a failure. Otherwise it'll only be cleaned up on success.

Signed-off-by: Joe Stringer <joe@cilium.io>
Move the tests for the recorder lists into a dedicated function.

Suggested-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from e828af6 to 584a1b6 Compare April 25, 2024 03:06
@joestringer
Copy link
Member Author

/test

@joestringer joestringer added this pull request to the merge queue Apr 25, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 25, 2024
Merged via the queue into main with commit 6e9c1c7 Apr 25, 2024
124 of 125 checks passed
@joestringer joestringer deleted the pr/joe/improve-nat46x64-debugging branch April 25, 2024 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake area/loadbalancing Impacts load-balancing and Kubernetes service implementations ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants