-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support #[target_feature(enable = ...)]
on #[naked]
functions
#137720
base: master
Are you sure you want to change the base?
support #[target_feature(enable = ...)]
on #[naked]
functions
#137720
Conversation
// FIXME share code with `create_object_file` | ||
fn parse_architecture( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this there some standard way of getting the architecture that I'm missing? I'd like the rest of the code to operate on an enum rather than doing a bunch of manual string matching.
89d44c1
to
4626bcc
Compare
// no action is needed, all instructions are accepted regardless of target feature | ||
} | ||
|
||
Architecture::Aarch64 | Architecture::Aarch64_Ilp32 | Architecture::Arm => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for Arm (AArch32), some target features like v8
, neon
, don't seem to work with .arch_extension
: https://godbolt.org/z/GTK3vqfrh
As for v*
features, they work with .arch armv*
: https://godbolt.org/z/7oj9njWG6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docs list only three features for .arch_extension
on aarch32: crc, fp16 and ras. I could see how the architectures are special, but even dotprod
does not seem to work?!
also for the arch: because there is no push/pop, we can't get back to the original arch if we set it, right? I did figure out that you can .fpu neon
for neon, but like arch
, there is no way to reset the fpu once you set it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some further work categorizing the features that rust/llvm supports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, good news, I think there is a way forward here.
global_asm!
being stateful (as mentioned here #137720 (comment)) cannot be relied on by users because of this rule in the reference
https://doc.rust-lang.org/reference/inline-assembly.html#r-asm.rules.not-successive
Therefore, we just need to make sure that at the end of a naked function, the settings are as configured via the global target features. It's less elegant that the push/pop mechanism , and for arm that will still be a bit of a mess, but it seems workable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, after looking at it a bit, I still think it's possible, but hard. I have some partial code here https://gist.github.com/folkertdev/fe99874c466e598d0fb2dadf13b91b6f but I think this really needs someone more familiar with the arm target features to do a good job on.
(or, like, can we get LLVM to add a push/pop mechanism for us?)
So, in combination with target features being unstable on arm anyway, I would prefer to skip that logic for now, keep this PR straightforward, and get target features working on targets with stable asm support first.
I think this is as far as I'm going to get on my own (well with a bunch of help from taiki too, thanks!). Questions I don't have a good answer to:
r? @Amanieu |
// only enable/disable a feature if it is not already globally enabled. | ||
// so that we don't infuence subsequent asm blocks | ||
if !tcx.sess.unstable_target_features.contains(&feature.name) { | ||
writeln!(begin, ".arch_extension {}", feature.name).unwrap(); | ||
|
||
writeln!(end, ".arch_extension no{}", feature.name).unwrap(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't actually appear to work in some cases because due to a bug in LLVM (#113221), even if the feature is enabled on the Rust side, it is not passed to the assembler...
However, given that the bug appears conditionally, I think it is also wrong to always set both .arch_extension {feature}
and .arch_extension no{feature}
.
A bit odd, but setting .arch_extension {feature}
always and .arch_extension no{feature}
conditionally would work in both cases.
In that approach, .arch_extension {feature}
will affect subsequent assemblies if the feature is enabled, but what it enables is a globally enabled feature on the Rust side, so I guess there is probably no adverse effect. EDIT: see #137720 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there is probably no adverse effec
Nah, this doesn't met stateful directive rule...
And I think the s390x implementation of this PR that attempts to restore state based on features enabled on the Rust side has the same problem...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. We now have a separate feature flag for target features on naked functions (so that naked functions themselves can be stabilized), #138568. At this point I'm sceptical that we can make the approach of this PR work in general.
So, that means that either
- we use this approach on platforms where it is guaranteed to work
- we just have the user handle it (and can help out with some good documentation for that platforms that we've looked at)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I think it is safer to only support the x86/x86_64 which obviously does not require anything and architectures which support push/pop logic such as riscv, and make errors otherwise. Once the LLVM bug is fixed, we can also support aarch64 and s390x (and probably arm).
- we just have the user handle it (and can help out with some good documentation for that platforms that we've looked at)
Considering the LLVM bug, I guess it is also difficult to use it properly on the user side. (as for aarch64 and s390x)
fixes #136280
Instructions that are part of a target feature require a special directive on some targets. This PR adds those for the most common targets.
This is very WIP, but I'm hoping to collect some feedback on what is (not) supported and how to report that to users.
r? @ghost
cc @taiki-e @Amanieu