Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aya: Fix program loading on kernels with a patch > 255 #791

Merged
merged 1 commit into from
Sep 27, 2023

Conversation

nrxus
Copy link
Contributor

@nrxus nrxus commented Sep 22, 2023

This fixes an issue that made aya not able to load eBPF programs on kernels with a patch number greater than 255.

I noticed this when trying to load some eBPF programs on an AWS instance running a 4.14.304-226.531.amzn2.x86_64 kernel. The kernel code that it tried to load was for a 4.15 kernel instead which caused a verifier error.

I considered doing this directly in the KernelVersion::current function but per the comments in: torvalds/linux@9b82f13 it looks like the clamp should only be applied for the context of creating the LINUX_VERSION_CODE so I limited the fix to only the calculating of the code for bpf prog loading.

@netlify
Copy link

netlify bot commented Sep 22, 2023

Deploy Preview for aya-rs-docs ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 0a6a267
🔍 Latest deploy log https://app.netlify.com/sites/aya-rs-docs/deploys/651462d4578e220008f0ff0e
😎 Deploy Preview https://deploy-preview-791--aya-rs-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@mergify mergify bot added the aya This is about aya (userspace) label Sep 22, 2023
@alessandrod
Copy link
Collaborator

Thanks! Could you also paste the verifier error? @marysaka has been using kernels with patch > 255 so I'm wondering how we missed this failure and where it shows up

@marysaka
Copy link
Member

I'm pretty sure I tested kernel with patch > 255 apart from Debian ones as I was testing all kernel of every major distro around (cc @Arignir could you confirm that?)

@nrxus
Copy link
Contributor Author

nrxus commented Sep 23, 2023

Could you have been using kernels that are above 5.0? Starting on 5.0 the checks for kern_code were skipped: torvalds/linux@6c4fc209fcf9

The verifier doesn't have any output on the error, it just returns an EINVAL. The way we realize it was due to an incorrect kern_code was by straceing the syscall and saw that it was emitting a code for a 4.15 kernel

@tamird
Copy link
Member

tamird commented Sep 23, 2023

I added clamping logic in 6e570f0 -- maybe that was overly specific to 4.19.x and can be relaxed rather than adding disjoint logic?

@tamird
Copy link
Member

tamird commented Sep 23, 2023

The commit you linked first appeared in 5.x kernels - how does libbpf deal with this case on that aws 4.14 kernel?

@nrxus
Copy link
Contributor Author

nrxus commented Sep 23, 2023

I added clamping logic in 6e570f0 -- maybe that was overly specific to 4.19.x and can be relaxed rather than adding disjoint logic?

I saw that clamping code but I wasn't sure if that was meant for checking the kernel version in major/minor/patch form or if it was meant specifically to clamp the code during prog load. For this PR I specifically only want to target the kern code during prog load.

The commit you linked first appeared in 5.x kernels - how does libbpf deal with this case on that aws 4.14 kernel?

The helper in libbpf clamp the patch to 256 regardless of the major/minor version:
https://github.com/torvalds/linux/blob/d90b0276af8f25a0b8ae081a30d1b2a61263393b/tools/lib/bpf/bpf_helpers.h#L74

@tamird
Copy link
Member

tamird commented Sep 23, 2023

Even that macro appeared fairly late torvalds/linux@9ae2c26

@nrxus
Copy link
Contributor Author

nrxus commented Sep 23, 2023

It looks like libbpf when doing prog load relies on the kernel provided KERNEL_VERSION_CODE macro if I understand this thread correctly: https://lore.kernel.org/lkml/2023041729-luckiness-active-3ddb@gregkh/T/. Since aya couldn't use that then I think it should clamp to 255 which is what newer kernel versions do when calculating the kernel version code

@alessandrod
Copy link
Collaborator

My thoughts so far: I like the idea of clamping only when generating the kernel code, instead of always clamping in KernelVersion::current() - API wise a caller might want to get the actual patch level for other puposes not necessarily for generating a kernel code.

Once we split code generation, we can clamp regardless of 4.x 5.x X.x?

@nrxus
Copy link
Contributor Author

nrxus commented Sep 24, 2023

Does this mean you'd like me to move the pre-existing clamping code into the crate public method I created?

@alessandrod
Copy link
Collaborator

Does this mean you'd like me to move the pre-existing clamping code into the crate public method I created?

Yes, I don't think clamping slightly differently in two difference places is a good idea

@nrxus nrxus force-pushed the fix-kernel-code-on-submode-gt-255 branch from c01bc92 to ead9987 Compare September 25, 2023 06:52
Copy link
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice archaeology digging up all the commits.

aya/src/util.rs Outdated Show resolved Hide resolved
aya/src/util.rs Show resolved Hide resolved
aya/src/util.rs Outdated Show resolved Hide resolved
@nrxus nrxus force-pushed the fix-kernel-code-on-submode-gt-255 branch from ead9987 to 0819562 Compare September 25, 2023 16:02
@nrxus nrxus requested a review from tamird September 25, 2023 16:04
@nrxus nrxus force-pushed the fix-kernel-code-on-submode-gt-255 branch from 0819562 to 63fba8e Compare September 25, 2023 17:02
@nrxus
Copy link
Contributor Author

nrxus commented Sep 25, 2023

The integration tests are failing on something unrelated, the image that it is trying to use is no longer available in the Debian ftp. I am not really sure what image would be more stable so you all don't have to keep fixing it every few weeks though

@tamird
Copy link
Member

tamird commented Sep 25, 2023

I'm pretty confused about that - I took a look earlier and the most recent 6.4.0-x image is

linux-image-6.4.0-0.deb12.2-cloud-amd64-unsigned_6.4.4-3~bpo12+1_amd64.deb	2023-08-16 06:27	26M

I don't understand why it would have been reverted back to August 16.

@tamird tamird force-pushed the fix-kernel-code-on-submode-gt-255 branch from 63fba8e to 53cc4c4 Compare September 27, 2023 15:14
@mergify mergify bot added the test A PR that improves test cases or CI label Sep 27, 2023
@nrxus nrxus force-pushed the fix-kernel-code-on-submode-gt-255 branch from 7217132 to 53cc4c4 Compare September 27, 2023 16:42
@tamird tamird force-pushed the fix-kernel-code-on-submode-gt-255 branch from 53cc4c4 to 0a6a267 Compare September 27, 2023 17:13
@tamird tamird merged commit 6786383 into aya-rs:main Sep 27, 2023
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aya This is about aya (userspace) test A PR that improves test cases or CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants