Issue with Qualcomm kernel modules and BTI since at least LLVM 17 #2022
Comments
@samitolvanen does this issue ring a bell? I am not sure if it is just a straight CFI failure or if there is something else here. Based on the source, this is not kCFI, so maybe regular CFI regressed in clang? |
This looks like a CFI failure to me; nothing suggests it's related to BTI. It's possible that previous Clang versions inlined the function being called, or converted the indirect call into a direct call, and newer versions no longer do that, thus tripping CFI. I realize the error message isn't very informative, but can you identify which function is getting called here? |
I got a decoded stacktrace here.
Unsetting |
Looks like
I think you'd want to decode this into arm64 assembly instead. However, since the faulting instruction is in the CFI error handler in the module, this probably won't be of much help anyway. |
Thanks, this was rather helpful. With some logs I was able to track the issue down. It is this function call that triggers the CFI error:
The signature of the calling function is
    static void ipa_pkt_status_parse_v5_0(
        const void *unparsed_status, struct ipahal_pkt_status *status)
and the signature of the function it calls here is
    static void __ipa_parse_gen_pkt_v5_0(struct ipahal_pkt_status *status,
        const void *unparsed_status)
with the arguments provided being the same as those received in the function parameters:
    void (*__parse_gen_pkt)(struct ipahal_pkt_status *status,
        const void *unparsed_status);
    [snip]
    ipahal_pkt_status_objs[ipahal_ctx->hw_type].\
        __parse_gen_pkt(status, unparsed_status);
To me, this looks like the types are perfectly matched. Any idea what is wrong here? Does CFI not like the void pointers? |
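For readers following along, the call pattern described above can be reduced to a small standalone sketch. Everything here is a hypothetical stand-in (`struct pkt_status`, `struct pkt_status_ops`, `ops_table`, `parse_len`, and the two-byte length field are not taken from the Qualcomm sources); it only shows the shape of an indirect call through a per-hardware table of function pointers, which is the kind of call site where an indirect-call CFI check would be placed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ipahal_pkt_status; the real layout differs. */
struct pkt_status {
    unsigned int pkt_len;
};

/* Hypothetical stand-in for the per-hardware ops table entry. */
struct pkt_status_ops {
    void (*parse_gen_pkt)(struct pkt_status *status,
                          const void *unparsed_status);
};

/* Callee whose type matches the pointer member exactly. */
static void parse_gen_pkt_v5_0(struct pkt_status *status,
                               const void *unparsed_status)
{
    const unsigned char *raw = unparsed_status;

    /* Assumed format for illustration: little-endian 16-bit length. */
    status->pkt_len = (unsigned int)raw[0] | ((unsigned int)raw[1] << 8);
}

static const struct pkt_status_ops ops_table[] = {
    { .parse_gen_pkt = parse_gen_pkt_v5_0 },
};

/* Looks up the callee by hardware type and calls it through the pointer. */
unsigned int parse_len(size_t hw_type, const void *unparsed_status)
{
    struct pkt_status status = { 0 };

    assert(hw_type < sizeof(ops_table) / sizeof(ops_table[0]));
    ops_table[hw_type].parse_gen_pkt(&status, unparsed_status);
    return status.pkt_len;
}
```

Calling `parse_len(0, buf)` on a two-byte buffer goes through the table entry rather than naming `parse_gen_pkt_v5_0` directly. Whether the compiler keeps this as an indirect call depends on optimization, so the sketch does not by itself reproduce the failure.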
We just discovered that there is more than just BTI that triggers this. LTO is another way to cause this:
|
OK, since the function types do match, it sounds like you're running into some kind of a Clang CFI bug here. I have seen this before when the function declaration didn't match the definition, so you might want to double check that there are no subtle differences there. Full LTO and/or dropping BTI probably causes the compiler to optimize this differently. If the call becomes a direct call during optimization or the called function is completely inlined, the CFI check is dropped and you won't see the failure. |
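One way to double check for the subtle declaration/definition differences mentioned above is a compile-time assertion. The following is a minimal sketch, assuming a GCC- or Clang-compatible compiler (it relies on the `__builtin_types_compatible_p` and `__typeof__` extensions); `struct pkt_status`, `parse_gen_pkt_fn`, `parse_ok`, `parse_swapped`, and `FN_MATCHES` are hypothetical names for illustration. It is a sanity aid only and is not how Clang CFI itself decides whether a call is valid.

```c
#include <assert.h>

/* Hypothetical stand-in for struct ipahal_pkt_status. */
struct pkt_status {
    unsigned int pkt_len;
};

/* The pointer type used at the indirect call site. */
typedef void (*parse_gen_pkt_fn)(struct pkt_status *status,
                                 const void *unparsed_status);

/* Matches parse_gen_pkt_fn exactly. */
void parse_ok(struct pkt_status *status, const void *unparsed_status)
{
    (void)unparsed_status;
    status->pkt_len = 0;
}

/* Same parameters in swapped order: a different function type. */
void parse_swapped(const void *unparsed_status, struct pkt_status *status)
{
    (void)unparsed_status;
    status->pkt_len = 0;
}

/* 1 if a pointer to fn has a type compatible with ptr_type, else 0. */
#define FN_MATCHES(fn, ptr_type) \
    __builtin_types_compatible_p(__typeof__(&(fn)), ptr_type)

/* Fails the build if the definition drifts away from the pointer type. */
static_assert(FN_MATCHES(parse_ok, parse_gen_pkt_fn),
              "parse_ok must match parse_gen_pkt_fn");
static_assert(!FN_MATCHES(parse_swapped, parse_gen_pkt_fn),
              "parse_swapped is expected to differ");

/* Runtime-visible wrappers around the same compile-time checks. */
int check_parse_ok(void)      { return FN_MATCHES(parse_ok, parse_gen_pkt_fn); }
int check_parse_swapped(void) { return FN_MATCHES(parse_swapped, parse_gen_pkt_fn); }
```

Placing such an assertion next to the table initializer would turn a mismatched prototype into a build error instead of a runtime CFI trap. If the assertion passes and the trap still fires, that is consistent with the compiler-side explanation given above.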
This same issue is still present. Is there any fix from upstream? |
I think someone needs to bisect LLVM to find the change responsible, if an LLVM upgrade alone triggers this (has anyone tried a newer version of Android clang to see if it still reproduces there?). I'm happy to help walk someone through doing that with tc-build, but it will take more computing resources because it means building a toolchain from scratch and then building a kernel with that toolchain. So far, I don't think there is enough information in this thread to see why this is happening, and it is a lower-priority issue for me personally because it appears to occur only in out-of-tree Android code and upstream Linux no longer uses the same CFI implementation. |
Hi, I'm not 100% sure if this is the correct place for the specific issue we're running into. My knowledge of BTI is very limited, so I can't rule out that the kernel module code itself is causing this.
Issue description
We're building the Google/Qualcomm provided android12-5.10 kernel for Xiaomi Qualcomm SM8450 devices (our kernel sources can be found here). The Google GKI defconfig enables BTI implicitly via some security-related configs and builds + runs fine if compiled with clang-r450784e from the Google prebuilts, which is based on LLVM 14.
Ever since we started building Android 14, which defaults to clang-r498229b (based on LLVM 17), devices are failing to boot to system unless BTI is disabled, which isn't an option for us because it makes it impossible to boot with a Google certified GKI boot image.
This is due to a kernel panic caused by a BTI violation:
The relevant code for this is Qualcomm's IPA implementation, here's a link to the specific method in the Qualcomm sources and one to it in our module sources. I don't see any obvious issues here, and I believe this is likely to be a false positive by LLVM?
Just to be sure, I've also compiled the kernel with clang-r522817, which is based on LLVM 18, and the issue persists there.
Again, if this is the wrong place to report this, I'd appreciate it if you could direct me to the right place. Thanks!