New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.13 Backports 2023-08-09 #27393
v1.13 Backports 2023-08-09 #27393
Conversation
/test-backport-1.13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for addressing the conflicts on my second commit! 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
[ upstream commit b952c55 ] This commit increases the timeout of the job `Installation and Connectivity Test` within the worklfow `External Workload` from 30m to 45m. The current timeout is too low: `Installation and Connectivity Test (1.26, europe-north1-b, 5) The job running on runner GitHub Actions 52 has exceeded the maximum execution time of 30 minutes.` https://github.com/cilium/cilium/actions/runs/5604532568 Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit eb80fb1 ] In 7a9447d we reworked a few workflows to now be triggered by Ariane, however we missed some changes to make sure that they checkout actions and code from the correct contexts. In particular: - Gateway API / Ingress / Integration tests were checking out environment variables from the default branch instead of the appropriate context ref (all in all not a big deal and still safe, but could be annoying to troubleshoot later down the road). - Runtime tests were checking out environment variables from the PR branch instead of the appropriate context ref (this was a potential security issue), and then incorrectly pulling the default branch for executing tests instead of the appropriate PR branch context (so we were not testing what we expected). Fixes: 7a9447d Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit b471f4f ] No idea why but the steps were not aligned properly here. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 96f3fd7 ] [ backporter's note: Conflicted with f51daf2. Fixed it by pull in "Derive Cilium installation config" step.] Workflows running on PRs and based on `pull_request_target` and `workflow_dispatch` are executed in a privileged context (e.g. access to repository secrets), hence we take extra care not to execute anything coming from the PR directly in the context of the workflow steps, but instead always in a sandboxed or controlled environment (e.g. a managed Kubernetes cluster or LVH VMs). This commit standardizes and adds some context around which checkouts are trusted and which are not, and where to be start being careful with what the workflow steps are doing. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 7dc64d9 ] When building Docker images for documentation with V=0, the Makefile for documentation does not print the name of the generated images. One particular example of calling the Makefile with V=0 is for Cilium's runtime tests in CI, with .travis/build.sh calling: V=0 make -j 2 --quiet This calls the default target for Cilium's main Makefile, which relies (among others) on the "postcheck" target, which in turns calls the "check" target from Documentation/Makefile, passing the V=0 flag. The absence of the "docs-builder" string from the logs make it harder to figure out whether the workflow successfully built the image. This commit simply adds the image tag (passed to Docker with "--tag", but not actually containing the version tag) to the invocation of $(ECHO_DOCKER), so that Documentation/Makefile can print it when we request a non-verbose build with V=0. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 3e52822 ] This commit modifies calls to `send_trace_notify` in IPSec contexts where connection tracking information is not available to drop events when monitor aggregation is enabled. Benchmarks were performed on a two node, e2-custom-4-8192, cluster, using pod to pod netperf stream, RR and CRR tests. Cilium v1.14.0 was installed using unstripped images with IPSec, VXLAN tunnel mode, and Hubble enabled. Tests were performed in the order of stream, RR and CRR for each configuration. Between each test, the netserver was reset and each node's conntrack tables were cleared. Pod to pod netperf stream test on v1.14.0, one client per core: ```csv Throughput,Throughput Units,Elapsed Time (sec) 464.06,10^6bits/s,180.04 461.25,10^6bits/s,180.02 441.73,10^6bits/s,180.00 482.26,10^6bits/s,180.04 ``` Pod to pod netperf stream test on this commit, one client per core: ```csv Throughput,Throughput Units,Elapsed Time (sec) 698.48,10^6bits/s,180.03 643.81,10^6bits/s,180.05 780.22,10^6bits/s,180.05 552.11,10^6bits/s,180.03 ``` Taking the average of each client's throughput, this change leads to an increase in throughput by +45%. Pod to pod netperf RR test on v1.14.0, one client per core: ```csv 50th Percentile Latency Microseconds,90th Percentile Latency Microseconds,99th Percentile Latency Microseconds,Round Trip Latency usec/tran,Request Size Bytes,Response Size Bytes,Elapsed Time (sec) 302,486,1350,361.258,1,1,180.00 312,501,1304,372.272,1,1,180.00 320,530,1338,383.703,1,1,180.00 302,459,1286,355.595,1,1,180.00 ``` Pod to pod netperf RR test on this commit, one client per core: ```csv 50th Percentile Latency Microseconds,90th Percentile Latency Microseconds,99th Percentile Latency Microseconds,Round Trip Latency usec/tran,Request Size Bytes,Response Size Bytes,Elapsed Time (sec) 199,256,370,213.484,1,1,180.00 198,254,366,212.594,1,1,180.00 202,261,371,216.930,1,1,180.00 199,257,366,213.430,1,1,180.00 ``` Taking the worst 99th percentile latency from each test, this change leads to a reduction in p99 latency by 76.5%. Pod to pod netperf CRR test on v1.14.0, one client per core: ```csv 50th Percentile Latency Microseconds,90th Percentile Latency Microseconds,99th Percentile Latency Microseconds,Round Trip Latency usec/tran,Request Size Bytes,Response Size Bytes,Elapsed Time (sec) 935,3910,6920,3337.723,1,1,180.00 914,3308,6432,2502.398,1,1,180.00 933,3797,6843,3199.433,1,1,180.00 927,3761,6812,2855.196,1,1,180.00 ``` Pod to pod netperf CRR test on this commit, one client per core: ```csv 50th Percentile Latency Microseconds,90th Percentile Latency Microseconds,99th Percentile Latency Microseconds,Round Trip Latency usec/tran,Request Size Bytes,Response Size Bytes,Elapsed Time (sec) 683,4680,5803,3155.128,1,1,180.00 678,4627,5782,2958.921,1,1,180.00 683,4658,5830,3095.390,1,1,180.00 680,4646,5787,3092.147,1,1,180.00 ``` Taking the worst 99th percentile latency from each test, this change leads to a reduction in p99 latency by 15.8%. Fixes: #26648 Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 1a92459 ] The typical pattern to determine the L4 offset of an IPv6 packet is tuple->nexthdr = ip6->nexthdr; hdrlen = ipv6_hdrlen(ctx, &tuple->nexthdr); if (hdrlen < 0) return hdrlen; l4_off = ETH_HLEN + hdrlen; ipv6_hdrlen() then walks the extension headers and adds up their length, until it reaches the TCP/UDP/... header. For bpf_lxc, 38b4a61 ("bpf: Split CT lookup to its own tail call") split off the CT lookup into a separate tail-call. Subsequent code retrieves the lookup result and a fully populated CT tuple from the CT_TAIL_CALL_BUFFER6. For the IPv6 paths this means that tuple->nexthdr already contains the L4 proto. When ipv6_hdrlen() is called a second time with this value, it believes that the packet has no extension headers and returns a wrong L4 offset. Go for the minimal fix and re-initialize tuple->nexthdr in the affected paths. Fixes: 38b4a61 ("bpf: Split CT lookup to its own tail call") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit b2c5deb ] We link to Cilium Slack at multiple locations in Cilium's documentation. The way we do it is far from consistent: - We usually provide a direct link to the Slack instance - ... Which, over time, resulted in at least 13 occurrences of the same URL https://cilium.herokuapp.com - Sometimes, we create an indirect link to a "_slack" target, pointing to the gettinghelp.rst page - Some other times, we use other indirect links pointing to the glossary. - At some locations, we also mention that users can get invites to the Slack instance. As far as I know, these invites are no longer necessary. Let's try to clean this up. - Create a target link in the RST epilogue in conf.py, so we can reference it from any location. - Call this link everywhere we need, using `Cilium Slack`_ in a consistent fashion. - Format most channel names as inline literals. - Improve formatting (not the content) of some paragraphs that already get updates for Slack references: rewrap, remove trailing spaces, etc. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 4461616 ] cilium/cilium-cli#1854 is released. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 4440b3e ] As of August 08 2023, K8s version 1.22 is no longer available for GKE clusters. Therefore, this commit removes the version from the CI GKE matrix configuration. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
adf1bb9
to
66ed85f
Compare
Rebased on the latest v1.13 to resolve conflict. |
/test-backport-1.13 |
Paul's change is a document only. He's out now and CI is passing. I think it's safe to go forward. |
@cilium/github-sec @cilium/ci-structure Could someone take a look and approve if it seems ok? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nbusseneau has already approved, interesting that it requires reviews from github-sec, and structure
@brlbil Thanks! Yeah, I was also wondering that 🤔 And somehow GitHub says
Louis approved as a tophat already. Mysterious... |
@YutaroHayakawa @brlbil That's because they reviewed before the pull request was marked ready to review. Such reviews before it's ready are ignored. |
DROP_NO_NODE_ID
for IPsec #27184 (@pchaigno)Once this PR is merged, you can update the PR labels via:
or with