-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: K8sPolicyTest Multi-node policy test validates ingress CIDR-dependent L4 (FromCIDR + ToPorts) connectivity is restored after importing ingress policy #12596
Comments
Thanks for filing. Yeah it looks like one potential cause could either be what #12559 fixed, or that |
Also observed again in #13025 (comment) |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Our fromCIDR+toPorts tests have been flaking for a while. In particular, the check on the number of policy verdicts found in the output of cilium monitor sometimes fails. Those fails happen because we read cilium monitor's output too soon, before it has a chance to drain the ring buffer (see #12596 for a detailed analysis of the logs). This commit fixes it by repeatedly checking the output against our regular expression until it succeeds or times out, similarly to what is already done in K8sDatapathConfig MonitorAggregation. Fixes: #12596 Fixes: 94eb491 ("test: Add policy test for ingress FromCIDR+ToPorts") Fixes: 74be0b2 ("test: fromCIDR+toPorts host policy") Signed-off-by: Paul Chaignon <paul@cilium.io>
#13840 added additional Summary of the flakeOur test establishes Initial ideaTo be exact, the test fails because Finding the missing connections
# Type 4 is for TRACE_TO_LXC.
# The time filter is the time the connections started according to the test logs.
$ cat hubble.json | jq 'select(.IP.source=="192.168.36.13") | select(.IP.destination=="10.0.1.61")' \
jq 'select(.l4.TCP.flags.SYN==true) | select(.event_type.type==4)' \
jq 'select(.time>="2020-12-04T09:49").l4.TCP.source_port' > hubble-srcports.txt
$ grep -Po "(?<=192.168.36.13:)\d+(?= -> 10.0.1.61:80)" cilium-monitor.txt > cilium-monitor-srcports.txt
$ diff hubble-srcports.txt cilium-monitor-srcports.txt
44,45d43
< 55568
< 55572 We just caught our two missing connections (:tada:) and they should appear in Root causeSo it looks like instead of starting the connections too quickly after starting We need to wait a bit before reading the connections from SolutionSurely, this isn't the only test relying on cilium monitor in this way? How are other tests implementing this? Well, although there are several tests calling cilium/test/k8sT/DatapathConfiguration.go Lines 175 to 179 in 44c122d
That's the fix #14286 implements. FAQWhy doesn't it affect the previous test, which checks connections are denied with the same Are there partial connections at the end of the monitor logs? On the 4 flakes I checked, that never happens. cilium monitor always starts with a TCP SYN for the first connection and ends with a TCP FIN+ACK followed by a TCP ACK for the last connection. I'm guessing there is some factor of luck here + the fact packets for a same connection are likely slightly grouped together. |
Our fromCIDR+toPorts tests have been flaking for a while. In particular, the check on the number of policy verdicts found in the output of cilium monitor sometimes fails. Those fails happen because we read cilium monitor's output too soon, before it has a chance to drain the ring buffer (see #12596 for a detailed analysis of the logs). This commit fixes it by repeatedly checking the output against our regular expression until it succeeds or times out, similarly to what is already done in K8sDatapathConfig MonitorAggregation. Fixes: #12596 Fixes: 94eb491 ("test: Add policy test for ingress FromCIDR+ToPorts") Fixes: 74be0b2 ("test: fromCIDR+toPorts host policy") Signed-off-by: Paul Chaignon <paul@cilium.io>
Our fromCIDR+toPorts tests have been flaking for a while. In particular, the check on the number of policy verdicts found in the output of cilium monitor sometimes fails. Those fails happen because we read cilium monitor's output too soon, before it has a chance to drain the ring buffer (see #12596 for a detailed analysis of the logs). This commit fixes it by repeatedly checking the output against our regular expression until it succeeds or times out, similarly to what is already done in K8sDatapathConfig MonitorAggregation. Fixes: #12596 Fixes: 94eb491 ("test: Add policy test for ingress FromCIDR+ToPorts") Fixes: 74be0b2 ("test: fromCIDR+toPorts host policy") Signed-off-by: Paul Chaignon <paul@cilium.io>
Our fromCIDR+toPorts tests have been flaking for a while. In particular, the check on the number of policy verdicts found in the output of cilium monitor sometimes fails. Those fails happen because we read cilium monitor's output too soon, before it has a chance to drain the ring buffer (see #12596 for a detailed analysis of the logs). This commit fixes it by repeatedly checking the output against our regular expression until it succeeds or times out, similarly to what is already done in K8sDatapathConfig MonitorAggregation. Fixes: #12596 Fixes: 94eb491 ("test: Add policy test for ingress FromCIDR+ToPorts") Fixes: 74be0b2 ("test: fromCIDR+toPorts host policy") Signed-off-by: Paul Chaignon <paul@cilium.io>
[ upstream commit 7ba7edc ] Our fromCIDR+toPorts tests have been flaking for a while. In particular, the check on the number of policy verdicts found in the output of cilium monitor sometimes fails. Those fails happen because we read cilium monitor's output too soon, before it has a chance to drain the ring buffer (see #12596 for a detailed analysis of the logs). This commit fixes it by repeatedly checking the output against our regular expression until it succeeds or times out, similarly to what is already done in K8sDatapathConfig MonitorAggregation. Fixes: #12596 Fixes: 94eb491 ("test: Add policy test for ingress FromCIDR+ToPorts") Fixes: 74be0b2 ("test: fromCIDR+toPorts host policy") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 7ba7edc ] Our fromCIDR+toPorts tests have been flaking for a while. In particular, the check on the number of policy verdicts found in the output of cilium monitor sometimes fails. Those fails happen because we read cilium monitor's output too soon, before it has a chance to drain the ring buffer (see #12596 for a detailed analysis of the logs). This commit fixes it by repeatedly checking the output against our regular expression until it succeeds or times out, similarly to what is already done in K8sDatapathConfig MonitorAggregation. Fixes: #12596 Fixes: 94eb491 ("test: Add policy test for ingress FromCIDR+ToPorts") Fixes: 74be0b2 ("test: fromCIDR+toPorts host policy") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 7ba7edc ] Our fromCIDR+toPorts tests have been flaking for a while. In particular, the check on the number of policy verdicts found in the output of cilium monitor sometimes fails. Those fails happen because we read cilium monitor's output too soon, before it has a chance to drain the ring buffer (see #12596 for a detailed analysis of the logs). This commit fixes it by repeatedly checking the output against our regular expression until it succeeds or times out, similarly to what is already done in K8sDatapathConfig MonitorAggregation. Fixes: #12596 Fixes: 94eb491 ("test: Add policy test for ingress FromCIDR+ToPorts") Fixes: 74be0b2 ("test: fromCIDR+toPorts host policy") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 7ba7edc ] Our fromCIDR+toPorts tests have been flaking for a while. In particular, the check on the number of policy verdicts found in the output of cilium monitor sometimes fails. Those fails happen because we read cilium monitor's output too soon, before it has a chance to drain the ring buffer (see #12596 for a detailed analysis of the logs). This commit fixes it by repeatedly checking the output against our regular expression until it succeeds or times out, similarly to what is already done in K8sDatapathConfig MonitorAggregation. Fixes: #12596 Fixes: 94eb491 ("test: Add policy test for ingress FromCIDR+ToPorts") Fixes: 74be0b2 ("test: fromCIDR+toPorts host policy") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 7ba7edc ] Our fromCIDR+toPorts tests have been flaking for a while. In particular, the check on the number of policy verdicts found in the output of cilium monitor sometimes fails. Those fails happen because we read cilium monitor's output too soon, before it has a chance to drain the ring buffer (see #12596 for a detailed analysis of the logs). This commit fixes it by repeatedly checking the output against our regular expression until it succeeds or times out, similarly to what is already done in K8sDatapathConfig MonitorAggregation. Fixes: #12596 Fixes: 94eb491 ("test: Add policy test for ingress FromCIDR+ToPorts") Fixes: 74be0b2 ("test: fromCIDR+toPorts host policy") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
https://jenkins.cilium.io/job/Cilium-PR-K8s-oldest-net-next/1273/testReport/Suite-k8s-1/11/K8sPolicyTest_Multi_node_policy_test_validates_ingress_CIDR_dependent_L4__FromCIDR___ToPorts__connectivity_is_restored_after_importing_ingress_policy/
test_results_Cilium-PR-K8s-oldest-net-next_1273_BDD-Test-PR.zip
The flake happened 6 times in PRs since the test was added last week. (Test-specific artifacts are always missing when this flake happens.) The two numbers compared below are never off by more than 3. Maybe just an issue with
cilium monitor
in the test? It seems to only happen in net-next.I'm opening this issue to track the flake until we know more, but it's possible this is another instance of #12555, which was fixed by #12559.
Stacktrace
Standard Output
Standard Error
Show standard error
/cc @christarazi
The text was updated successfully, but these errors were encountered: