Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.13 Backports 2023-08-09 #27393

Merged
merged 11 commits into from Aug 16, 2023
Merged

v1.13 Backports 2023-08-09 #27393

merged 11 commits into from Aug 16, 2023

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Aug 9, 2023

@joamaki joamaki added kind/backports This PR provides functionality previously merged into master. backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. labels Aug 9, 2023
@joamaki
Copy link
Contributor Author

joamaki commented Aug 9, 2023

/test-backport-1.13

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for addressing the conflicts on my second commit! 🙏

@nebril nebril marked this pull request as ready for review August 10, 2023 05:43
@nebril nebril requested review from a team as code owners August 10, 2023 05:43
Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

@learnitall learnitall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

mhofstetter and others added 9 commits August 15, 2023 17:10
[ upstream commit b952c55 ]

This commit increases the timeout of the job
`Installation and Connectivity Test` within the worklfow `External
Workload` from 30m to 45m.

The current timeout is too low:

`Installation and Connectivity Test (1.26, europe-north1-b, 5)
The job running on runner GitHub Actions 52 has exceeded the maximum execution time of 30 minutes.`

https://github.com/cilium/cilium/actions/runs/5604532568

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit eb80fb1 ]

In 7a9447d we reworked a few workflows
to now be triggered by Ariane, however we missed some changes to make
sure that they checkout actions and code from the correct contexts.

In particular:

- Gateway API / Ingress / Integration tests were checking out
  environment variables from the default branch instead of the
  appropriate context ref (all in all not a big deal and still safe, but
  could be annoying to troubleshoot later down the road).
- Runtime tests were checking out environment variables from the PR
  branch instead of the appropriate context ref (this was a potential
  security issue), and then incorrectly pulling the default branch for
  executing tests instead of the appropriate PR branch context (so we
  were not testing what we expected).

Fixes: 7a9447d

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit b471f4f ]

No idea why but the steps were not aligned properly here.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 96f3fd7 ]

[ backporter's note: Conflicted with
f51daf2. Fixed it by pull in "Derive
Cilium installation config" step.]

Workflows running on PRs and based on `pull_request_target` and
`workflow_dispatch` are executed in a privileged context (e.g. access to
repository secrets), hence we take extra care not to execute anything
coming from the PR directly in the context of the workflow steps, but
instead always in a sandboxed or controlled environment (e.g. a managed
Kubernetes cluster or LVH VMs).

This commit standardizes and adds some context around which checkouts
are trusted and which are not, and where to be start being careful with
what the workflow steps are doing.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit b3171b9 ]

This commit documents the new drop reason introduced for IPsec in
6109a38 ("bpf/ipsec: Stop relying on ipcache node_id field").

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 7dc64d9 ]

When building Docker images for documentation with V=0, the Makefile for
documentation does not print the name of the generated images.

One particular example of calling the Makefile with V=0 is for Cilium's
runtime tests in CI, with .travis/build.sh calling:

    V=0 make -j 2 --quiet

This calls the default target for Cilium's main Makefile, which relies
(among others) on the "postcheck" target, which in turns calls the
"check" target from Documentation/Makefile, passing the V=0 flag. The
absence of the "docs-builder" string from the logs make it harder to
figure out whether the workflow successfully built the image.

This commit simply adds the image tag (passed to Docker with "--tag",
but not actually containing the version tag) to the invocation of
$(ECHO_DOCKER), so that Documentation/Makefile can print it when we
request a non-verbose build with V=0.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 3e52822 ]

This commit modifies calls to `send_trace_notify` in IPSec contexts
where connection tracking information is not available to drop events
when monitor aggregation is enabled.

Benchmarks were performed on a two node, e2-custom-4-8192,
cluster, using pod to pod netperf stream, RR and CRR tests. Cilium
v1.14.0 was installed using unstripped images with IPSec, VXLAN
tunnel mode, and Hubble enabled. Tests were performed in the order
of stream, RR and CRR for each configuration. Between each test,
the netserver was reset and each node's conntrack tables were
cleared.

Pod to pod netperf stream test on v1.14.0, one client per core:

```csv
Throughput,Throughput Units,Elapsed Time (sec)
464.06,10^6bits/s,180.04
461.25,10^6bits/s,180.02
441.73,10^6bits/s,180.00
482.26,10^6bits/s,180.04
```

Pod to pod netperf stream test on this commit, one client per core:

```csv
Throughput,Throughput Units,Elapsed Time (sec)
698.48,10^6bits/s,180.03
643.81,10^6bits/s,180.05
780.22,10^6bits/s,180.05
552.11,10^6bits/s,180.03
```

Taking the average of each client's throughput, this change leads to
an increase in throughput by +45%.

Pod to pod netperf RR test on v1.14.0, one client per core:

```csv
50th Percentile Latency Microseconds,90th Percentile Latency Microseconds,99th Percentile Latency Microseconds,Round Trip Latency usec/tran,Request Size Bytes,Response Size Bytes,Elapsed Time (sec)
302,486,1350,361.258,1,1,180.00
312,501,1304,372.272,1,1,180.00
320,530,1338,383.703,1,1,180.00
302,459,1286,355.595,1,1,180.00
```

Pod to pod netperf RR test on this commit, one client per core:

```csv
50th Percentile Latency Microseconds,90th Percentile Latency Microseconds,99th Percentile Latency Microseconds,Round Trip Latency usec/tran,Request Size Bytes,Response Size Bytes,Elapsed Time (sec)
199,256,370,213.484,1,1,180.00
198,254,366,212.594,1,1,180.00
202,261,371,216.930,1,1,180.00
199,257,366,213.430,1,1,180.00
```

Taking the worst 99th percentile latency from each test, this change
leads to a reduction in p99 latency by 76.5%.

Pod to pod netperf CRR test on v1.14.0, one client per core:

```csv
50th Percentile Latency Microseconds,90th Percentile Latency Microseconds,99th Percentile Latency Microseconds,Round Trip Latency usec/tran,Request Size Bytes,Response Size Bytes,Elapsed Time (sec)
935,3910,6920,3337.723,1,1,180.00
914,3308,6432,2502.398,1,1,180.00
933,3797,6843,3199.433,1,1,180.00
927,3761,6812,2855.196,1,1,180.00
```

Pod to pod netperf CRR test on this commit, one client per core:

```csv
50th Percentile Latency Microseconds,90th Percentile Latency Microseconds,99th Percentile Latency Microseconds,Round Trip Latency usec/tran,Request Size Bytes,Response Size Bytes,Elapsed Time (sec)
683,4680,5803,3155.128,1,1,180.00
678,4627,5782,2958.921,1,1,180.00
683,4658,5830,3095.390,1,1,180.00
680,4646,5787,3092.147,1,1,180.00
```

Taking the worst 99th percentile latency from each test, this change
leads to a reduction in p99 latency by 15.8%.

Fixes: #26648

Signed-off-by: Ryan Drew <ryan.drew@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 1a92459 ]

The typical pattern to determine the L4 offset of an IPv6 packet is

	tuple->nexthdr = ip6->nexthdr;
	hdrlen = ipv6_hdrlen(ctx, &tuple->nexthdr);
	if (hdrlen < 0)
		return hdrlen;
	l4_off = ETH_HLEN + hdrlen;

ipv6_hdrlen() then walks the extension headers and adds up their length,
until it reaches the TCP/UDP/... header.

For bpf_lxc, 38b4a61 ("bpf: Split CT lookup to its own tail call")
split off the CT lookup into a separate tail-call. Subsequent code
retrieves the lookup result and a fully populated CT tuple from the
CT_TAIL_CALL_BUFFER6.

For the IPv6 paths this means that tuple->nexthdr already contains the
L4 proto. When ipv6_hdrlen() is called a second time with this value, it
believes that the packet has no extension headers and returns a wrong
L4 offset.

Go for the minimal fix and re-initialize tuple->nexthdr in the affected
paths.

Fixes: 38b4a61 ("bpf: Split CT lookup to its own tail call")
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit b2c5deb ]

We link to Cilium Slack at multiple locations in Cilium's documentation.
The way we do it is far from consistent:

- We usually provide a direct link to the Slack instance
- ... Which, over time, resulted in at least 13 occurrences of the same
  URL https://cilium.herokuapp.com
- Sometimes, we create an indirect link to a "_slack" target, pointing
  to the gettinghelp.rst page
- Some other times, we use other indirect links pointing to the
  glossary.
- At some locations, we also mention that users can get invites to the
  Slack instance. As far as I know, these invites are no longer
  necessary.

Let's try to clean this up.

- Create a target link in the RST epilogue in conf.py, so we can
  reference it from any location.
- Call this link everywhere we need, using `Cilium Slack`_ in a
  consistent fashion.
- Format most channel names as inline literals.
- Improve formatting (not the content) of some paragraphs that already
  get updates for Slack references: rewrap, remove trailing spaces, etc.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
brb and others added 2 commits August 15, 2023 17:15
[ upstream commit 4461616 ]

cilium/cilium-cli#1854 is released.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 4440b3e ]

As of August 08 2023, K8s version 1.22 is no longer available for GKE
clusters.

Therefore, this commit removes the version from the CI GKE matrix
configuration.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@YutaroHayakawa
Copy link
Member

YutaroHayakawa commented Aug 15, 2023

Rebased on the latest v1.13 to resolve conflict.

@YutaroHayakawa
Copy link
Member

/test-backport-1.13

@YutaroHayakawa YutaroHayakawa removed the request for review from pchaigno August 16, 2023 07:18
@YutaroHayakawa
Copy link
Member

Paul's change is a document only. He's out now and CI is passing. I think it's safe to go forward.

@YutaroHayakawa
Copy link
Member

@cilium/github-sec @cilium/ci-structure Could someone take a look and approve if it seems ok?

Copy link
Contributor

@brlbil brlbil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nbusseneau has already approved, interesting that it requires reviews from github-sec, and structure

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Aug 16, 2023
@YutaroHayakawa
Copy link
Member

YutaroHayakawa commented Aug 16, 2023

@brlbil Thanks! Yeah, I was also wondering that 🤔

And somehow GitHub says

Waiting on code owner review from cilium/tophat

Louis approved as a tophat already. Mysterious...

@aanm aanm merged commit 0c5797c into v1.13 Aug 16, 2023
120 checks passed
@aanm aanm deleted the pr/v1.13-backport-2023-08-09 branch August 16, 2023 07:49
@pchaigno
Copy link
Member

@YutaroHayakawa @brlbil That's because they reviewed before the pull request was marked ready to review. Such reviews before it's ready are ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet