kernel: update to latest version for series 5.10, 5.15, 6.1 #3708

Merged

markusboehme

Description of changes:

Update the kernels to the latest Amazon Linux (AL) kernels available in the repositories. Adjust the configuration for the 5.10 kernel to stay closer to its state before the update. Add a fix for #3691 to the 5.15 series, since its update picks up the buggy commit. The fix is a downstream backport, since none is publicly available yet, and it is to be upstreamed. Compared to #3699, the backport could be simplified a bit because the existing backport of the buggy commit already brings enough helpers.

Testing done:

Validate basic functionality through sonobuoy quick test:

> kubectl get nodes -o wide
NAME                                              STATUS   ROLES    AGE     VERSION                INTERNAL-IP      EXTERNAL-IP     OS-IMAGE                                KERNEL-VERSION   CONTAINER-RUNTIME
ip-192-168-71-140.eu-central-1.compute.internal   Ready    <none>   9m47s   v1.23.17-eks-ea94ec3   192.168.71.140   18.185.22.137   Bottlerocket OS 1.18.0 (aws-k8s-1.23)   5.10.205         containerd://1.6.26+bottlerocket
ip-192-168-77-109.eu-central-1.compute.internal   Ready    <none>   101s    v1.28.4-eks-d91a302    192.168.77.109   3.126.118.175   Bottlerocket OS 1.18.0 (aws-k8s-1.28)   6.1.66           containerd://1.6.26+bottlerocket
ip-192-168-87-213.eu-central-1.compute.internal   Ready    <none>   5m27s   v1.26.11-eks-b93ee12   192.168.87.213   3.71.78.190     Bottlerocket OS 1.18.0 (aws-k8s-1.26)   5.15.145         containerd://1.6.26+bottlerocket

> sonobuoy run --mode=quick --wait
[...]
17:32:57    systemd-logs   ip-192-168-71-140.eu-central-1.compute.internal   complete   passed                                        
17:32:57    systemd-logs   ip-192-168-77-109.eu-central-1.compute.internal   complete   passed                                        
17:32:57    systemd-logs   ip-192-168-87-213.eu-central-1.compute.internal   complete   passed                                        
17:32:57             e2e                                            global   complete   passed   Passed:  1, Failed:  0, Remaining:  0
17:32:57 Sonobuoy plugins have completed. Preparing results for download.
17:33:17 Sonobuoy has completed. Use `sonobuoy retrieve` to get results.

Changes to the configs as reported by tools/diff-kernel-config:

config-aarch64-aws-k8s-1.23-diff:        15 removed,   0 added,   4 changed
config-aarch64-aws-k8s-1.26-diff:         0 removed,   0 added,   0 changed
config-aarch64-aws-k8s-1.28-diff:         0 removed,   0 added,   0 changed
config-x86_64-aws-k8s-1.23-diff:          0 removed,   0 added,   0 changed
config-x86_64-aws-k8s-1.26-diff:          0 removed,   0 added,   0 changed
config-x86_64-aws-k8s-1.28-diff:          0 removed,   0 added,   0 changed
config-x86_64-metal-k8s-1.26-diff:        0 removed,   0 added,   0 changed
config-x86_64-metal-k8s-1.28-diff:        0 removed,   0 added,   0 changed
config-x86_64-vmware-k8s-1.26-diff:       0 removed,   0 added,   0 changed
config-x86_64-vmware-k8s-1.28-diff:       0 removed,   0 added,   0 changed

The full diff-report can be found on Gist.
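With many variants in the report, it can help to list only the configs that actually changed. The following is a minimal sketch, assuming the summary lines have exactly the shape shown above (`<name>: N removed, N added, N changed`); `changed_configs` is a hypothetical helper and not part of tools/diff-kernel-config.

```shell
# Hypothetical helper: read summary lines on stdin and print only the
# config names that report at least one removed, added, or changed option.
# Assumes lines shaped like: "<name>: N removed, N added, N changed".
changed_configs() {
    awk -F: '
        {
            # $1 is the config name; $2 holds the three counters.
            n = split($2, parts, ",")
            total = 0
            for (i = 1; i <= n; i++) {
                # awk converts the leading numeric prefix of each part.
                total += parts[i] + 0
            }
            if (total > 0) print $1
        }
    '
}
```

Piping the summary above into `changed_configs` would leave only the aarch64 aws-k8s-1.23 entry, which is the one discussed next.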

The changed configuration in the 5.10 kernel on aarch64 boils down to two groups:

  • DRM_QXL: Removed the driver for a virtualized GPU used in remote desktop scenarios. It's no longer supported by Amazon Linux. Nothing of value is lost for Bottlerocket's scenario.
  • MLX4 and friends: Dropping the drivers for 4th gen Mellanox NICs. Those are only relevant in bare metal scenarios. Amazon Linux enabled those drivers for x86_64 as well (they were already enabled for aarch64) to reach parity. However, for Bottlerocket the reverse makes more sense: We don't support aarch64 on bare metal (and not with the 5.10 kernel), and ship drivers for the 5th gen Mellanox NICs instead.

I tested the backport for #3691 via aws-k8s-1.27 on x86_64. I verified that probes can be placed on symbols defined in loadable modules (via unqualified and qualified names, using nf_nat_packet and nf_nat:nf_nat_packet, respectively), and that ambiguous probe definitions are still rejected (using kzalloc):

bash-5.1# uname -r
5.15.145
bash-5.1# cd /sys/kernel/tracing/              
bash-5.1# > kprobe_events
bash-5.1# echo 'p nf_nat_packet' >>kprobe_events
bash-5.1# echo 'p kzalloc' >>kprobe_events      
bash: echo: write error: Address not available
bash-5.1# grep -wc kzalloc /proc/kallsyms 
8
bash-5.1# cat kprobe_events  
p:kprobes/p_nf_nat_packet_0 nf_nat_packet
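The `grep -wc` check above is a quick way to see that a name is ambiguous. A slightly stricter variant compares the symbol column exactly and also reports the defining module. This is a minimal sketch, assuming the usual `/proc/kallsyms` layout of `<address> <type> <name> [module]`; `count_symbol` and `symbol_modules` are hypothetical helper names, and the exact matching rules the kernel applies may differ.

```shell
# Hypothetical helper: count exact-name matches for a symbol in a
# kallsyms-style listing. The optional second argument is the file to
# read and defaults to /proc/kallsyms.
count_symbol() {
    awk -v sym="$1" '$3 == sym { n++ } END { print n + 0 }' "${2:-/proc/kallsyms}"
}

# Hypothetical helper: print the module name for each match that carries
# a "[module]" column, with the brackets stripped.
symbol_modules() {
    awk -v sym="$1" '$3 == sym && NF >= 4 { gsub(/\[|\]/, "", $4); print $4 }' "${2:-/proc/kallsyms}"
}
```

A count of 1 suggests an unqualified probe on that name is unambiguous, while a larger count matches the case where the write to kprobe_events is rejected. The module printed by `symbol_modules` is the prefix to use in a qualified `module:symbol` probe definition.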

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Rebase to Amazon Linux upstream version 5.10.205-195.804.amzn2.

Signed-off-by: Markus Boehme <markubo@amazon.com>

Rebase to Amazon Linux upstream version 5.15.145-95.156.amzn2.

Signed-off-by: Markus Boehme <markubo@amazon.com>

Rebase to Amazon Linux upstream version 6.1.66-93.164.amzn2023.

Signed-off-by: Markus Boehme <markubo@amazon.com>

The recent update to the 5.10 kernel picked up a bunch of Mellanox
network drivers from the Amazon Linux kernel. We don't need those, so
deactivate them again.

Signed-off-by: Markus Boehme <markubo@amazon.com>

Commit b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols")
in the upstream kernel introduced a regression where kprobes cannot be
created on functions residing in loadable modules if the probe location
is identified by an unqualified function name.

The faulty commit was backported to the 5.15 series, but a backport of the fix
is not yet available. Carry the fix here for release preparation and
seek resolution upstream as soon as possible. This may mean upstreaming,
or reverting the faulty commit entirely (sentiment for the faulty commit
has soured, and it has been yanked already from patch queues for older
stable series).

Signed-off-by: Markus Boehme <markubo@amazon.com>

markusboehme

Cancelled the workflows while I'm updating the look-aside cache.

jpculp merged commit a92f49e into bottlerocket-os:develop on Jan 11, 2024
52 checks passed