Simplify NAT46x64,recorder tests #32068

joestringer · 2024-04-18T21:19:55Z

Requires #32070 to test.

Review commit-by-commit to reason through the changes over the course of the PR.

Previously, these tests would emit very verbose output which was hard to
interpret when it failed, for instance a test may fail and this is the output:

++[01:02:57] docker exec -t lb-node ip -o -4 a s eth0
++[01:02:57] awk '{print $4}'
++[01:02:57] head -n1
++[01:02:57] cut -d/ -f1
+[01:02:57] LB_NODE_IP=172.12.42.2
+[01:02:57] ip r a 10.0.0.8/32 via 172.12.42.2
++[01:02:57] seq 1 10
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null '[fd00:cafe::1]:80'
++[01:02:57] seq 1 10
+[01:02:57] for i in $(seq 1 10)
+[01:02:57] curl -s -o /dev/null 10.0.0.8:80
+[01:02:57] echo 'Failed 1'
Failed 1
+[01:02:57] exit -1
Error: Process completed with exit code 255.

If you understand what the test is trying to do and where in the test that
these commands were run, then you might be able to debug it. But for most
mortals, this output was not very easy to understand.

With this PR, the tests are reworked to only print verbose output when the
environment variable V=1 is set, and by default it prints a much terser
set of output, such as:

[./test/nat46x64/test.sh:228]     Installing Cilium with Mode:TC  Algorithm:Maglev        Recorder:Disabled
cilium-lb
9c536a08a4bdb8751cdc2337503c5864e1fd1984619635b466cf4352d99d71b2
cilium: daemon unreachable
cilium: daemon unreachable
OK
[./test/nat46x64/test.sh:229]     Testing service [fd00:cafe::1]:80 -> 172.12.42.3:80 via TC + Maglev
[./test/nat46x64/test.sh:233]     Installing Cilium with Mode:TC  Algorithm:Random        Recorder:Disabled
cilium-lb
894298c1821b2b536f6744c78aa886bc8f3ab7f4a291b82d9ccb6677d253f26b
cilium: daemon unreachable
cilium: daemon unreachable
OK
[./test/nat46x64/test.sh:234]     Testing service [fd00:cafe::1]:80 -> 172.12.42.3:80 via TC + Random
[./test/nat46x64/test.sh:238]     Configuring service 10.0.0.8:80 -> 172.12.42.3:80
Deprecation warning: --id parameter will change from int to string in v1.14
Creating new service with id '2'
Added service with 1 backends
[./test/nat46x64/test.sh:244]     Installing route: 'ip -4 r a 10.0.0.8/32 via 172.12.42.2'
[./test/nat46x64/test.sh:246]     Checking connectivity via [fd00:cafe::1]:80 -> 172.12.42.3:80
[./test/nat46x64/test.sh:249]     Checking connectivity via 10.0.0.8:80 -> 172.12.42.3:80
[./test/nat46x64/test.sh:252]     Installing Cilium with Mode:TC  Algorithm:Maglev        Recorder:Disabled
cilium-lb
89f2ad155a0a5ae0922ba741baa8d6186d4f87ee4a3e019d5845a1847b6f5f1d
cilium: daemon unreachable
cilium: daemon unreachable
OK
[./test/nat46x64/test.sh:253]     Testing service [fd00:cafe::1]:80 -> 172.12.42.3:80 via TC + Maglev
Deprecation warning: --id parameter will change from int to string in v1.14
Service 1 deleted successfully
Deprecation warning: --id parameter will change from int to string in v1.14
Service 2 deleted successfully

Or, on failure:

$ sudo ./test/nat46x64/test.sh
[./test/nat46x64/test.sh:197]     Installing Cilium with Mode:TC  Algorithm:Maglev        Recorder:Disabled
cilium-lb
ccb3d8f50c6d15a016035eb81c12e62efa04ef79cad62a00a1c1b9bd529ca58c
cilium: daemon unreachable
OK
[./test/nat46x64/test.sh:199]     Configuring service 10.0.0.4:80 -> [2001:db8:1::3]:80
Deprecation warning: --id parameter will change from int to string in v1.14
Creating new service with id '1'
Added service with 1 backends
[./test/nat46x64/test.sh:209]     Testing service 10.0.0.4:80 -> [2001:db8:1::3]:80 via TC + Maglev
[./test/nat46x64/test.sh:210]     Installing route: 'ip -4 r a 10.0.0.4/32 via 172.12.42.2'
[./test/nat46x64/test.sh:211]     Failed connection from localhost to 10.0.0.4:80 (attempt 5/10)

The annotated lines in the square brackets also provide a test file and line number where the failure occurred to make it easier to track down the failure and attempt to reproduce.

joestringer · 2024-04-18T21:34:26Z

/ci-l4lb

joestringer · 2024-04-18T21:42:39Z

/ci-l4lb

joestringer · 2024-04-19T15:53:54Z

/test

joestringer · 2024-04-22T21:34:45Z

/test

test/nat46x64/test.sh

Avoid leaving state left behind unless explicitly requested. Signed-off-by: Joe Stringer <joe@cilium.io>

Apply all of the common configurations with a variable from the top of the file, to simplify tracking down the different configurations throughout the test. Signed-off-by: Joe Stringer <joe@cilium.io>

Signed-off-by: Joe Stringer <joe@cilium.io>

This mirrors the exact same pattern in the IPv6->IPv6 case, see commit 129bc1e ("test,l4lb: Change expected error in L4LB test") for more details. Signed-off-by: Joe Stringer <joe@cilium.io>

For some reason, the installations of Cilium in this test seem to just randomly decide whether they're going to install with nat46x64 gateway enabled or not. I can't tell if this is intentional, so this commit changes all installs in this file to enable the feature. Signed-off-by: Joe Stringer <joe@cilium.io>

Make the test output more clear by echoing to the terminal a message whenever reconfiguring Cilium. Signed-off-by: Joe Stringer <joe@cilium.io>

These configurations were already being run elsewhere in the test. Signed-off-by: Joe Stringer <joe@cilium.io>

The initial test performs NAT46, Service 6to6, then does some validation around the scenarios. The second commit does the inverse, NAT64 and 4to4 with the same basic checks afterwards. This commit reworks the initial set of tests so that the code can be shared for both tests in a subsequent commit. Signed-off-by: Joe Stringer <joe@cilium.io>

No functional changes. This will be reused in the next commit by the NAT64 test. Signed-off-by: Joe Stringer <joe@cilium.io>

These tests were running practically the same logic, just duplicated. Now that the previous commit abstracted away the differences between the frontends and backends in the tests into common variables, we can convert the core of the test into a function and call it with different variables to run the different tests. Signed-off-by: Joe Stringer <joe@cilium.io>

Centralize the connectivity checks into a common function and update the output message to fail more clearly with a line number and other details about how the connection is being generated. Signed-off-by: Joe Stringer <joe@cilium.io>

Improve the tracing and failure output in the test. Signed-off-by: Joe Stringer <joe@cilium.io>

Signed-off-by: Joe Stringer <joe@cilium.io>

If cilium-agent doesn't come up ready, print the logs to the terminal and fail out. This should make a common class of possible failure conditions much easier to debug. Signed-off-by: Joe Stringer <joe@cilium.io>

Move the declaration of the arrays of filters above the test code to make it easier to read through what the test is doing. While we're at it, silence unnecessary test output unless in verbose mode (env V=1). Signed-off-by: Joe Stringer <joe@cilium.io>

Add a couple of checks to ensure that all of the requested filters end up in the BPF maps. Signed-off-by: Joe Stringer <joe@cilium.io>

BASH_LINENO is an array, and only the first index is useful to understand the line number, so only format that part. Signed-off-by: Joe Stringer <joe@cilium.io>

Apparently the CI relies on this script only cleaning up after itself upon success, so switch the default behaviour back to this. On a local TTY, the developer will be asked whether to clean up the environment if there's a failure. Otherwise it'll only be cleaned up on success. Signed-off-by: Joe Stringer <joe@cilium.io>

Move the tests for the recorder lists into a dedicated function. Suggested-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io>

joestringer · 2024-04-25T03:06:57Z

/test

joestringer added area/CI Continuous Integration testing issue or flake sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/loadbalancing Impacts load-balancing and Kubernetes service implementations labels Apr 18, 2024

maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Apr 18, 2024

joestringer added the release-note/ci This PR makes changes to the CI. label Apr 18, 2024

maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 18, 2024

joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from 18c737f to cba5202 Compare April 18, 2024 21:24

joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from efc564f to 39c0a25 Compare April 18, 2024 21:41

joestringer changed the base branch from main to pr/joe/ariane-nat46x64 April 18, 2024 21:41

joestringer added the dont-merge/blocked Another PR must be merged before this one. label Apr 18, 2024

joestringer mentioned this pull request Apr 18, 2024

daemon: Run conntrack GC after Endpoint Restore #32012

Merged

Base automatically changed from pr/joe/ariane-nat46x64 to main April 22, 2024 04:56

joestringer marked this pull request as ready for review April 22, 2024 16:04

joestringer requested review from a team as code owners April 22, 2024 16:04

joestringer requested review from aspsk, tklauser and aanm April 22, 2024 16:04

joestringer removed the dont-merge/blocked Another PR must be merged before this one. label Apr 22, 2024

joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from 39c0a25 to e828af6 Compare April 22, 2024 21:34

joestringer removed request for a team and aanm April 22, 2024 21:34

joestringer commented Apr 22, 2024

View reviewed changes

test/nat46x64/test.sh Show resolved Hide resolved

joestringer added 20 commits April 24, 2024 13:07

nat46x64: Clean up environment by default

ec5b0ea

Avoid leaving state left behind unless explicitly requested. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Refactor common configurations

7da09f5

Apply all of the common configurations with a variable from the top of the file, to simplify tracking down the different configurations throughout the test. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Refactor waiting for service to be ready

2292ade

Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Add IPv4 service readiness check

74bbbf0

This mirrors the exact same pattern in the IPv6->IPv6 case, see commit 129bc1e ("test,l4lb: Change expected error in L4LB test") for more details. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Centralize configuration, improve output

91eb374

Make the test output more clear by echoing to the terminal a message whenever reconfiguring Cilium. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Remove redundant compile tests

7a3bd1c

These configurations were already being run elsewhere in the test. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Move NAT46 test into function

9813f89

No functional changes. This will be reused in the next commit by the NAT64 test. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Improve connectivity check fail reports

158706d

Centralize the connectivity checks into a common function and update the output message to fail more clearly with a line number and other details about how the connection is being generated. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Add debugging steps and better tracing

300a858

Improve the tracing and failure output in the test. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Refactor local node route configuration

096bf10

Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Quieten test output by default

4a5a373

Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Print cilium-agent logs on install fail

d15dfe2

If cilium-agent doesn't come up ready, print the logs to the terminal and fail out. This should make a common class of possible failure conditions much easier to debug. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Test recorder filter installation

961c453

Add a couple of checks to ensure that all of the requested filters end up in the BPF maps. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Only print line numbers

20a351d

BASH_LINENO is an array, and only the first index is useful to understand the line number, so only format that part. Signed-off-by: Joe Stringer <joe@cilium.io>

nat46x64: Refactor recorder list checking

584a1b6

Move the tests for the recorder lists into a dedicated function. Suggested-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io>

joestringer force-pushed the pr/joe/improve-nat46x64-debugging branch from e828af6 to 584a1b6 Compare April 25, 2024 03:06

joestringer requested a review from aspsk April 25, 2024 03:15

joestringer enabled auto-merge April 25, 2024 03:15

aspsk approved these changes Apr 25, 2024

View reviewed changes

joestringer added this pull request to the merge queue Apr 25, 2024

maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 25, 2024

Merged via the queue into main with commit 6e9c1c7 Apr 25, 2024
124 of 125 checks passed

joestringer deleted the pr/joe/improve-nat46x64-debugging branch April 25, 2024 07:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify NAT46x64,recorder tests #32068

Simplify NAT46x64,recorder tests #32068

joestringer commented Apr 18, 2024 •

edited

Loading

joestringer commented Apr 18, 2024

joestringer commented Apr 18, 2024

joestringer commented Apr 19, 2024

joestringer commented Apr 22, 2024

joestringer commented Apr 25, 2024

Simplify NAT46x64,recorder tests #32068

Simplify NAT46x64,recorder tests #32068

Conversation

joestringer commented Apr 18, 2024 • edited Loading

joestringer commented Apr 18, 2024

joestringer commented Apr 18, 2024

joestringer commented Apr 19, 2024

joestringer commented Apr 22, 2024

joestringer commented Apr 25, 2024

joestringer commented Apr 18, 2024 •

edited

Loading