Inline Assembly. #444

Open
lachlansneff opened this issue Aug 6, 2018 · 38 comments

@lachlansneff (Contributor) commented Aug 6, 2018

Since Cranelift is soon to be a backend for Rust, it will need to support inline assembly. There is no good way to solve this right now, since Rust currently uses LLVM's inline asm syntax. I'm making this issue so we can think about this in the long term.

@71 (Contributor) commented Sep 8, 2018

I just had an idea that's very simple but also very "raw"; I suppose it would be pretty good for the time being.
What if we simply allowed inline bytes, with the added ability to interpolate Cranelift values?

For example:

01 C0      ; add eax, eax
0F AF $v1  ; imul eax, v1
C3         ; ret

Edit: Obviously there is the problem of Cranelift values being encoded either as registers or stack variables for instance, but maybe some annotations could be added for a value to be in a register, like $(v1:reg) or $(ebb2:rel8).
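
A minimal Rust sketch of such a template, assuming (hypothetically) that `$v1` in `0F AF $v1` stands for a register-direct ModRM byte whose r/m field takes the register chosen for `v1`, in the spirit of a `$(v1:reg)` annotation. `Piece` and `fill` are illustrative names, not an existing Cranelift API, and the sketch rejects r8 through r15 because those would also need a REX prefix.

```rust
/// One element of a hypothetical inline byte template.
#[derive(Clone, Copy)]
enum Piece {
    /// A literal byte, copied verbatim.
    Byte(u8),
    /// A register-direct ModRM byte (mod = 0b11): `reg_field` is fixed by the
    /// opcode, and the r/m field is the register assigned to value `value`.
    ModRmRm { reg_field: u8, value: usize },
}

/// Fill the template's holes. `regs[i]` is the x86 register number assigned to
/// value `i` (0 = eax, 1 = ecx, 2 = edx, 3 = ebx, ...).
fn fill(template: &[Piece], regs: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(template.len());
    for piece in template {
        match *piece {
            Piece::Byte(b) => out.push(b),
            Piece::ModRmRm { reg_field, value } => {
                let rm = *regs
                    .get(value)
                    .ok_or_else(|| format!("no register assigned to value {}", value))?;
                // r8..r15 would need a REX prefix, which this sketch does not emit.
                if rm > 7 || reg_field > 7 {
                    return Err(format!("register {} needs a REX prefix", rm));
                }
                out.push(0b1100_0000 | (reg_field << 3) | rm);
            }
        }
    }
    Ok(out)
}
```

With `v1` assigned to ecx (register 1), the template from the example fills in as `01 C0 0F AF C1 C3`, i.e. `add eax, eax; imul eax, ecx; ret`.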

@lachlansneff (Contributor Author) commented Sep 8, 2018

I'd be all for this for the time being actually, with a small modification.

Essentially, treat a block of code as an EBB: say what inputs it gets and where they should be stored (registers, stack offsets, etc.). The code block itself would be completely opaque, so Cranelift would not need a disassembler.

For example:

asm(eax: v1, ebx: v2) -> eax:
  01 c0          ; add eax,eax
  0f af c3       ; imul eax,ebx

or

asm(v1 -> eax, v2 -> ebx) -> eax:
  01 c0          ; add eax,eax
  0f af c3       ; imul eax,ebx
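
The `v1 -> eax` form could be represented as plain data, roughly as below. This is a minimal Rust sketch; `OpaqueAsm`, `Reg` and `validate` are hypothetical names. Note that the checks never decode the bytes, which is the point of the proposal.

```rust
use std::collections::HashSet;

/// Registers the sketch knows about.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Reg {
    Eax,
    Ecx,
    Edx,
    Ebx,
}

/// `asm(v1 -> eax, v2 -> ebx) -> eax: <bytes>` as plain data.
/// Values are identified by their IR number (`v1` is 1).
struct OpaqueAsm {
    /// Each input value and the fixed register it must be placed in.
    inputs: Vec<(u32, Reg)>,
    /// Registers holding the results when the bytes finish executing.
    outputs: Vec<Reg>,
    /// Machine code, never decoded by the compiler.
    bytes: Vec<u8>,
}

impl OpaqueAsm {
    /// Cheap sanity checks that never look inside `bytes`: no register may be
    /// demanded by two inputs or listed twice as an output, and the block
    /// must not be empty.
    fn validate(&self) -> Result<(), String> {
        let mut used = HashSet::new();
        for &(value, reg) in &self.inputs {
            if !used.insert(reg) {
                return Err(format!("v{} needs {:?}, which another input already uses", value, reg));
            }
        }
        let mut outs = HashSet::new();
        for &reg in &self.outputs {
            if !outs.insert(reg) {
                return Err(format!("{:?} is listed as an output twice", reg));
            }
        }
        if self.bytes.is_empty() {
            return Err("empty asm block".to_string());
        }
        Ok(())
    }
}
```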

@lachlansneff (Contributor Author) commented Sep 8, 2018

I'd like to see what @sunfishcode has to say on this.

@sunfishcode (Member) commented Sep 9, 2018

Interesting idea. My first question is, how do you envision using this?

Looking at the idea itself, one factor to take into consideration is that machine encodings are really complex. A register typically isn't encoded as a whole byte; its field is usually smaller than a byte, its position within the containing bytes depends on which operand of the instruction it is, and sometimes it requires other changes elsewhere in the instruction. Some examples:

$ cat t.s
movq (%rcx), %rcx # The encoding of %rcx depends on where it appears in the instruction!
movq (%rsi), %rcx # These use different REX prefix bits than the instructions below!
movq (%r11), %rcx
movq (%r12), %rcx # This needs an extra byte!
movq (%r13), %rcx # This needs an extra byte in a different way!
movq (%r14), %rcx
$ cc -c t.s
$ objdump -d t.o
...
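
The objdump output is elided above. To illustrate the quirks the comments point at, here is a minimal Rust sketch that encodes just this one instruction form (`movq (%base), %dst`, i.e. `MOV r64, r/m64`, opcode `8B /r`) following the x86-64 encoding rules. It is not a general encoder; register numbers follow the hardware numbering (rax = 0 ... r15 = 15), and `encode_mov_load` is an illustrative name.

```rust
/// Hardware register numbers for the registers used in t.s.
const RCX: u8 = 1;
const RSI: u8 = 6;
const R11: u8 = 11;
const R12: u8 = 12;
const R13: u8 = 13;
const R14: u8 = 14;

/// Encode `movq (%base), %dst` (Intel syntax: `mov dst, [base]`).
fn encode_mov_load(dst: u8, base: u8) -> Vec<u8> {
    assert!(dst < 16 && base < 16, "only the 16 GPRs are supported");
    let reg = dst & 0b111; // low 3 bits of the destination go in ModRM.reg
    let rm = base & 0b111; // low 3 bits of the base go in ModRM.r/m
    // REX = 0100WRXB: W selects 64-bit operands, R extends ModRM.reg and
    // B extends ModRM.r/m, so each register's high bit lands in a different place.
    let rex = 0x48 | ((dst >> 3) << 2) | (base >> 3);
    let mut out = vec![rex, 0x8B];
    match rm {
        // r/m = 100 means "a SIB byte follows", so rsp/r12 need an extra SIB
        // byte (0x24: no index, base = r/m).
        0b100 => out.extend_from_slice(&[(reg << 3) | 0b100, 0x24]),
        // In 64-bit mode, mod = 00 with r/m = 101 means RIP-relative, so
        // rbp/r13 must use mod = 01 with an explicit zero 8-bit displacement.
        0b101 => out.extend_from_slice(&[0b0100_0000 | (reg << 3) | 0b101, 0x00]),
        _ => out.push((reg << 3) | rm),
    }
    out
}
```

For the six lines of `t.s` this yields `48 8B 09`, `48 8B 0E`, `49 8B 0B`, `49 8B 0C 24`, `49 8B 4D 00` and `49 8B 0E`: the same `%rcx` lands in different fields depending on its role, r11 through r14 set REX.B, r12 needs a SIB byte, and r13 needs a displacement byte.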

I think we could still make this system work, if we defined a sufficiently elaborate manifest that could describe all the edits that one would need to make once one knows what registers and stack slots everything will be in. That'd be fairly elaborate, but it might still be simpler than fully parsing instructions from text.

@lachlansneff (Contributor Author) commented Sep 9, 2018

I'd use it for inline stuff in Nebulet, like talking to I/O ports.

@sunfishcode (Member) commented Sep 11, 2018

A few more thoughts here:

I missed this above: in @lachlansneff's variant above, the IR specifies all the registers, so it wouldn't actually require Cranelift to fill in register values. It'd literally be a block of bytes that simply requires certain values in certain fixed registers at input and output. And maybe a list of clobbers. That seems like it wouldn't be too hard to implement. That said, it might actually be too simple, in the sense that it'd work for very simple things, but would be difficult to evolve into something that does more.
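
Even with every register fixed, the compiler still has some bookkeeping to do around such a block. A minimal Rust sketch, assuming registers are named by strings and that the block overwrites exactly its declared outputs and clobbers; `must_save` is a hypothetical helper, not a Cranelift function.

```rust
use std::collections::HashSet;

/// Which registers holding values that stay live across an opaque asm block
/// must the compiler spill (or move elsewhere) before the block runs?
fn must_save<'a>(live_across: &[&'a str], outputs: &[&'a str], clobbers: &[&'a str]) -> Vec<&'a str> {
    // Everything the block may overwrite: its declared outputs plus clobbers.
    let written: HashSet<&str> = outputs.iter().chain(clobbers.iter()).copied().collect();
    // A live value sitting in an overwritten register would be destroyed.
    live_across.iter().copied().filter(|r| written.contains(*r)).collect()
}
```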

If you're willing to specify all the register inputs and outputs yourself, it wouldn't be much different from putting the code you want in a .s file and calling it, with a specialized calling convention if you want. Support for custom calling conventions is something we do have other uses for, so we wouldn't mind improving that.

Or, for in/out or similar instructions on x86, would it be enough if we exposed whatever instructions you need as instructions in Cranelift, so that you could use them, similar to compiler intrinsics? This would be easy to implement, and much easier for anyone writing an OS to use.
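
For the port-I/O case, exposing intrinsics mostly comes down to a handful of fixed encodings. A minimal Rust sketch of an encoder for hypothetical `in`/`out` intrinsics; `PortIo`, `Width` and `encode` are illustrative names. The opcodes are the standard x86 ones (`E4`/`E5`, `E6`/`E7`, `EC`/`ED`, `EE`/`EF`, with a `66` prefix for 16-bit operands), and the fixed AL/AX/EAX and DX operands would become fixed-register constraints for the register allocator.

```rust
/// Operand size of the data register (AL, AX or EAX).
#[derive(Clone, Copy)]
enum Width {
    B8,
    B16,
    B32,
}

/// Hypothetical port-I/O intrinsics.
enum PortIo {
    /// `in al/ax/eax, dx`: port number in DX.
    InDx(Width),
    /// `out dx, al/ax/eax`
    OutDx(Width),
    /// `in al/ax/eax, imm8`: port number encoded in the instruction.
    InImm(Width, u8),
    /// `out imm8, al/ax/eax`
    OutImm(Width, u8),
}

fn encode(op: &PortIo) -> Vec<u8> {
    // Each instruction has a byte form (`byte_op`) and a word/dword form
    // (`byte_op + 1`); the 16-bit form adds the 0x66 operand-size prefix.
    fn sized(w: Width, byte_op: u8, out: &mut Vec<u8>) {
        match w {
            Width::B8 => out.push(byte_op),
            Width::B16 => out.extend_from_slice(&[0x66, byte_op + 1]),
            Width::B32 => out.push(byte_op + 1),
        }
    }
    let mut out = Vec::new();
    match *op {
        PortIo::InDx(w) => sized(w, 0xEC, &mut out),
        PortIo::OutDx(w) => sized(w, 0xEE, &mut out),
        PortIo::InImm(w, port) => {
            sized(w, 0xE4, &mut out);
            out.push(port);
        }
        PortIo::OutImm(w, port) => {
            sized(w, 0xE6, &mut out);
            out.push(port);
        }
    }
    out
}
```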

@nbp (Collaborator) commented Sep 11, 2018

For your information, there is a rustc issue about stabilizing inline assembly. In particular, I suggested on that thread an idea similar to what @6a and @sunfishcode suggested, i.e. a set of bytes with constraints.

This idea got some push back on the basis that some people want to do inline assembly with a register allocation handled by the compiler, but I honestly do not see how this can be done without standardizing some form of assembly code.

rust-lang/rust#29722

@sunfishcode (Member) commented Sep 12, 2018

In case anyone wants to see where the idea above might go, here's a slightly more complete sketch. Here's a hypothetical IR structure for this form of inline asm:

struct InlineAsm {
    /// the raw bytes to start with
    data: Vec<u8>,
    /// descriptions of all explicit input and output values
    constraints: Vec<Constraint>,
    /// extra registers, "memory", or other machine state which is clobbered
    clobbers: Vec<String>,
    /// patches which can change or add bytes
    patches: Vec<Patch>,
}

struct Constraint {
    // TODO: in/out/inout, tied, earlyclobber, hints, alternatives, register/memory/immediate classes, etc.
}

struct Patch {
    offset: u64, // byte offset in `data` *before* any patches are applied
    contents: PatchDetails,
}

enum PatchDetails {
    X86RexPrefix {
        w: bool, // REX.W
        inputs: Vec<u64>, // indices in `constraints` for register operands which determine if a REX prefix is needed and if so what bits it should have.
    },
    X86ModRMRegField {
        input: u64, // index in `constraints` for the register to encode
    },
    X86ModRMRMField {
        input: u64, // similar
    },
    P2Align {
        // Insert fill bytes to align the following code to a 1<<p2align boundary.
        p2align: u8,
        // The byte value to insert.
        fill: u8,
    },
    // TODO: lots more stuff here
}

I think this would deliver most of what LLVM/GCC-style inline asm do, assuming we filled it out. And it would eliminate the need for the backend to parse assembly text. That said, it's still fairly complex. And of course, no existing assemblers are built to work this way, including LLVM's assembler, so it would be a bunch of work to implement even in LLVM.
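
To show what applying such patches could involve, here is a deliberately simplified Rust sketch. It is an assumption-laden variant of the structure above, not a proposed API: constraint indices are `usize`, the REX patch names the operand for REX.R and the one for REX.B separately instead of taking an `inputs` list, `regs[i]` is the register number (0..=15) chosen for constraint `i`, and the asm blob is assumed to start at an aligned address. As in the sketch above, patch offsets refer to the original `data`, because REX and alignment patches insert bytes.

```rust
/// Simplified, hypothetical patch kinds (see the assumptions above).
#[derive(Clone, Copy)]
enum Fix {
    /// Insert a REX prefix before `data[offset]` if W is set or any named
    /// operand is r8..r15. `reg` feeds REX.R, `rm` feeds REX.B. (Byte registers
    /// spl/bpl/sil/dil also need a bare REX; this sketch ignores them.)
    Rex { w: bool, reg: Option<usize>, rm: Option<usize> },
    /// OR the low 3 bits of `regs[input]` into bits 5..3 of `data[offset]`.
    ModRmReg { input: usize },
    /// OR the low 3 bits of `regs[input]` into bits 2..0 of `data[offset]`.
    ModRmRm { input: usize },
    /// Insert `fill` bytes before `data[offset]` until the output is aligned.
    P2Align { p2align: u8, fill: u8 },
}

fn apply(data: &[u8], patches: &[(usize, Fix)], regs: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    // `offset == data.len()` is included so a patch can apply at the very end.
    for offset in 0..=data.len() {
        let mut byte = data.get(offset).copied();
        for (_, fix) in patches.iter().filter(|(at, _)| *at == offset) {
            match *fix {
                Fix::Rex { w, reg, rm } => {
                    // High bit (bit 3) of the chosen register, or 0 if absent.
                    let high = |i: Option<usize>| i.map_or(0, |i| regs[i] >> 3);
                    let rex = 0x40 | ((w as u8) << 3) | (high(reg) << 2) | high(rm);
                    if rex != 0x40 {
                        out.push(rex);
                    }
                }
                Fix::ModRmReg { input } => byte = byte.map(|b| b | ((regs[input] & 7) << 3)),
                Fix::ModRmRm { input } => byte = byte.map(|b| b | (regs[input] & 7)),
                Fix::P2Align { p2align, fill } => {
                    let align = 1usize << p2align;
                    while out.len() % align != 0 {
                        out.push(fill);
                    }
                }
            }
        }
        if let Some(b) = byte {
            out.push(b);
        }
    }
    out
}
```

For `add r/m32, r32` (template bytes `01 C0`, i.e. opcode `01 /r` with a register-direct ModRM), the same template becomes `01 C0` (`add eax, eax`) or `41 01 C8` (`add r8d, ecx`) depending on the registers chosen.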

@lygstate commented Oct 31, 2018

Any ideas? Inline assembly is essential for building an OS or a bare-metal environment.

@sunfishcode (Member) commented Oct 31, 2018

I'm not aware of any easy answers. Inline asm is a huge, complex, and nebulous set of features.

From a broader perspective, there's a form of XY problem here. If you ask for "inline asm", it may be years before we can deliver it. We want to help, but you're asking for a lot, and we don't know of any way to do it faster while still achieving our other goals. If you instead ask for access to certain machine instructions, or control registers, or ways to guide the register allocator around key pieces of code, or other specific things, plausible solutions can sometimes take just a few days to implement. Without exaggeration, more specific features can often be a thousand times easier to implement, and typically end up more robust, with better-defined interfaces and better-understood implementations.

We may still implement inline asm eventually. People interested in seeing it happen are encouraged to get involved and help out!

@skyne98 commented Nov 15, 2018

Hey, @sunfishcode, as far as I understand, we can't just allow using common register names like eax, ebx, and so on, right? That would require compiling the code with one of the existing assemblers.

@nbp (Collaborator) commented Nov 15, 2018

@skyne98, you can, and you should, because some instructions require specific registers. However, many uses of inline assembly are meant to provide an assembly template that a register allocator can work with. For example, a SIMD library built on inline assembly does not want to hard-code a specific register allocation, because multiple consecutive uses of that inline assembly would cause so much register congestion that SIMD performance would suffer.

@skyne98 commented Nov 15, 2018

So what I mean is that the template you provide does not map directly to a binary instruction; it depends on other things, so you cannot just parse it, convert it to binary, and put it into the "executable", right?

Otherwise, what is the complexity @sunfishcode was talking about? SimpleJIT is already "jitting" out some binary code.

@sunfishcode (Member) commented Nov 15, 2018

That's right. In general, the template cannot be parsed until the substitutions are made. And it may contain more than just instructions; it may use arbitrary assembler directives like .align or .section, or other things which have effects other than just contributing bytes to the output. And they may make use of arbitrary symbolic expressions that need to be evaluated and/or patched in later. They may rely on numerous historical assembler parsing quirks. They may use complex and interdependent operand constraints, and they may require the compiler's register allocator to find optimal solutions to those constraints. They may have ambiguous constraints like "memory", which ostensibly clobbers all memory but which compilers in practice do not treat as literally all memory. And so on.
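
A small Rust sketch of just the first step, substitution: a GCC-style template such as `movl %1, %0` only becomes a parseable instruction once the compiler has chosen the operands. `substitute` is a hypothetical helper that handles only numbered operands and `%%`, not operand modifiers, named operands, or any of the directive issues above.

```rust
/// Replace `%N` with `operands[N]` and `%%` with a literal `%`.
fn substitute(template: &str, operands: &[&str]) -> Result<String, String> {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '%' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            // `%%` is an escaped literal percent sign.
            Some('%') => {
                chars.next();
                out.push('%');
            }
            // `%N`: read all consecutive digits as the operand index.
            Some(d) if d.is_ascii_digit() => {
                let mut index = 0usize;
                while let Some(digit) = chars.peek().and_then(|d| d.to_digit(10)) {
                    index = index * 10 + digit as usize;
                    chars.next();
                }
                let op = operands
                    .get(index)
                    .ok_or_else(|| format!("template refers to missing operand %{}", index))?;
                out.push_str(op);
            }
            _ => return Err("unsupported '%' sequence in template".to_string()),
        }
    }
    Ok(out)
}
```

Only after this text pass does the result look like an ordinary instruction, which is why the template cannot be parsed up front.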

@skyne98 commented Nov 16, 2018

Also, @sunfishcode, what do you think about this project? To me it seems like it could be at least partially helpful, though I have to admit I haven't read the source yet.

Also, I don't think we need all the features you mentioned above. What I was trying to ask before is: if doing substitutions is possible and not too hard, then supporting only basic instructions like stack pushes, pops, and 'mov's (just the 20 most-used instructions in kernels plus a few important ones; we could even analyze existing code to find them) would help A LOT and would at least make writing bare-metal code possible.

What do you think?

@skyne98 commented Nov 16, 2018

Dynasm-rs seems to address exactly our problem: it does JIT compilation, but for assembly template strings. It sounds very promising to me, especially considering that the author says the project is in alpha!

It is currently in alpha, meaning that while everything should work, a lot of features need to be tested

It can also do things like this (just for reference):

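// Assumes a current dynasm-rs, where `dynasm!` is a procedural macro exported
// by `dynasmrt`; the original 2018 example used the since-removed compiler-plugin
// interface (`#![feature(plugin)]`, `#![plugin(dynasm)]`).
use dynasmrt::{dynasm, DynasmApi, DynasmLabelApi};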

use std::{io, slice, mem};
use std::io::Write;

fn main() {
    let mut ops = dynasmrt::x64::Assembler::new().unwrap();
    let string = "Hello World!";

    dynasm!(ops
        ; ->hello:
        ; .bytes string.as_bytes()
    );

    let hello = ops.offset();
    dynasm!(ops
        ; lea rcx, [->hello]
        ; xor edx, edx
        ; mov dl, BYTE string.len() as _
        ; mov rax, QWORD print as _
        ; sub rsp, BYTE 0x28
        ; call rax
        ; add rsp, BYTE 0x28
        ; ret
    );

    let buf = ops.finalize().unwrap();

    let hello_fn: extern "win64" fn() -> bool = unsafe {
        mem::transmute(buf.ptr(hello))
    };

    assert!(
        hello_fn()
    );
}

pub extern "win64" fn print(buffer: *const u8, length: u64) -> bool {
    io::stdout().write_all(unsafe {
        slice::from_raw_parts(buffer, length as usize)
    }).is_ok()
}

@sunfishcode (Member) commented Nov 16, 2018

Dynasm is a cool library and we're prototyping with it in Lightbeam. It does its parsing at compile time rather than runtime, so in its current form we can't just drop it into Cranelift to parse strings being passed in (Cranelift's runtime is the user's compile time). But if what you need to do can be done with dynasm, then you can certainly use it yourself directly.

If someone added dynamic string parsing to dynasm, that could be interesting. That said, while some users don't need lots of features, others do, so it likely wouldn't suffice for the long term.

Also, the hard part of the operand constraint problem is computing a set of registers and stack slots that satisfy all the constraints. Actually pushing or moving data into place is something we already have to do to support other features :-).

Analyzing inline asm usage in kernels may help here. Part of the problem with inline asm is that it's a big sprawling set of features, but if we could identify restricted feature sets that would work in practice for at least some users, that might give us more options.
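
To illustrate the constraint-solving part, here is a minimal Rust sketch of register assignment by backtracking search: each operand has a set of allowed registers and no two operands may share one. It ignores tied operands, early clobbers, memory alternatives, and the values already living in registers, all of which real inline asm constraints involve; `solve` is a hypothetical helper. Plain backtracking is exponential in the worst case, which hints at why this part is hard once constraints are numerous and interdependent.

```rust
/// Pick one register per operand from its allowed set, all distinct.
fn solve(allowed: &[Vec<&'static str>]) -> Option<Vec<&'static str>> {
    fn go(i: usize, allowed: &[Vec<&'static str>], chosen: &mut Vec<&'static str>) -> bool {
        if i == allowed.len() {
            return true; // every operand has a register
        }
        for &reg in &allowed[i] {
            if !chosen.contains(&reg) {
                chosen.push(reg);
                if go(i + 1, allowed, chosen) {
                    return true;
                }
                chosen.pop(); // undo and try the next candidate
            }
        }
        false
    }
    let mut chosen = Vec::new();
    if go(0, allowed, &mut chosen) {
        Some(chosen)
    } else {
        None
    }
}
```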

@skyne98 commented Nov 18, 2018

If we can help in any way, @sunfishcode, please tell us :)

@sunfishcode (Member) commented Nov 18, 2018

I included some ideas for projects people could start on in my posts above. I'm happy to answer specific questions, or to mentor people on projects.

@skyne98 commented Nov 19, 2018

I would have liked to start such a project, but I don't think I am experienced enough in this area. I am up for experimentation, though, with decent mentorship 😄 My general idea was to try to write a simple kernel that was "jitting" itself, so I guess it could have been a good playground for testing such a project.

@bjorn3 (Contributor) commented Nov 19, 2018

How about:

struct InlineAssembly {
    inputs: Vec<Value>,
    outputs: Vec<Value>,
    constraints: isa::RecipeConstraints,
    emit: Box<dyn Fn(Vec<isa::registers::RegUnit>) -> Vec<u8>>,
}

This way cranelift only has to know the register constraints.
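
A self-contained Rust sketch of the callback idea, with stand-ins for Cranelift's types: `RegUnit` here is just a `u8` register number, and per-operand allowed-register lists replace `RecipeConstraints`. The "allocator" simply takes the first allowed register; the point is that the backend never parses or encodes the asm itself, it only hands the chosen registers to `emit`.

```rust
/// Stand-in for Cranelift's register-unit type: an x86 register number.
type RegUnit = u8;

struct InlineAssembly {
    /// For each operand, the registers it may be assigned.
    allowed: Vec<Vec<RegUnit>>,
    /// Called after register allocation with one register per operand.
    emit: Box<dyn Fn(&[RegUnit]) -> Vec<u8>>,
}

impl InlineAssembly {
    /// Trivial allocation for the sketch: first allowed register per operand
    /// (panics on an empty set). A real backend would solve the constraints
    /// jointly and around the surrounding code.
    fn lower(&self) -> Vec<u8> {
        let regs: Vec<RegUnit> = self.allowed.iter().map(|set| set[0]).collect();
        (self.emit)(regs.as_slice())
    }
}
```

For example, an `add dst, src` whose callback emits `01 /r` (registers 0..=7 only) lowers to `01 D8`, i.e. `add eax, ebx`, when the operands are restricted to eax and ebx.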

@sunfishcode (Member) commented Nov 19, 2018

@skyne98 If you're thinking about JITing, then it sounds like you're more in dynasm's space. Which is a cool space, so go ahead and have fun!

@bjorn3 Yes, if we generalized RecipeConstraints to not require static constraint sets, something along those lines might suffice for a subset of inline assembly use cases.

@skyne98 commented Nov 19, 2018

@sunfishcode, more precisely, my plan was to implement a small language frontend and then use Cranelift as a backend. I did not intend to make a JIT compiler from the ground up; Cranelift just looked to me like a much better alternative to LLVM for such a job.

@sunfishcode (Member) commented Nov 20, 2018

@skyne98 Implementing a small language frontend using Cranelift as a backend sounds like a great project! Check out the simplejit-demo for an example of how to get started, and please ask questions if anything is unclear.

This GitHub issue is about inline asm, and most languages can be implemented without the use of inline asm. If you encounter something that seems to require inline asm, please ask about it, as we may be able to find alternatives.

@skyne98 commented Nov 21, 2018

@sunfishcode, the "problem" is actually that eventually I want to try to write a kernel in it, and then it will need to have an ability to have inline assembly. That's really the reason I started writing here.

Also, I would really love to try using Cranelift as a backend, and it would be even more awesome if you could do a code review of the project in the future 😄

@sunfishcode (Member) commented Nov 25, 2018

What features does your kernel need? Do you need access to specific instructions? Access to control registers? If you can name the features you need, I can help you design proper features for them, that will be safer to use, and more robust, than inline asm.

@skyne98 commented Nov 25, 2018

For example, even setting up virtual memory and swapping out the tables on context switches requires some inline assembly. How can you build around that? By writing an interface to those operations in Rust and then just calling them as Rust functions?

Or do you mean implementing a couple of custom features in Cranelift itself?

@sunfishcode (Member) commented Nov 25, 2018

What instructions would you use to set up virtual memory and swap out the tables on context switches? If you can show me a sequence of instructions that you need to emit, I can help you design a way to emit those instructions using Cranelift without using inline asm.
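
As an illustration of the kind of answer this question is after: on x86-64, switching address spaces typically comes down to writing the physical address of the new top-level page table into CR3 (`mov cr3, reg`, opcode `0F 22 /r`). A minimal Rust sketch of an encoder that a hypothetical `write_cr3` intrinsic could use; the name is illustrative, not an existing Cranelift instruction.

```rust
/// Encode `mov cr3, src` for a 64-bit GPR number `src` (rax = 0 ... r15 = 15).
/// In 64-bit mode the operand is always 64 bits wide, so no REX.W is needed;
/// REX.B is only required to reach r8..r15.
fn encode_write_cr3(src: u8) -> Vec<u8> {
    assert!(src < 16, "only the 16 GPRs are supported");
    let mut out = Vec::new();
    if src >= 8 {
        out.push(0x41); // REX.B extends ModRM.r/m
    }
    // ModRM: mod = 11 (register operand), reg = 3 (selects CR3), r/m = source GPR.
    out.extend_from_slice(&[0x0F, 0x22, 0b1100_0000 | (3 << 3) | (src & 7)]);
    out
}
```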

@vitiral commented Sep 15, 2019

Is there native assembly for which Cranelift cannot emit comparable IR?

Or the real question: could we be thinking of this backwards? What if the Rust compiler took a user's "inline assembly" and emitted Cranelift IR (when Cranelift is the backend)? It would not be ideal, but it would avoid having to standardize something on Cranelift's side and would fix the issue on Rust's side.

Edit: also, if someone is writing inline assembly, aren't they supposed to know their backend? Why shouldn't they be required to rewrite the assembly blob in Cranelift IR?

@bjorn3 (Contributor) commented Sep 15, 2019

What if the rust compiler took a user's "inline assembly" and emitted cranelift IR (when cranelift was the backend).

First, Cranelift is allowed to insert arbitrary spill, fill, and regmove instructions into the generated CLIF IR whenever the register allocator thinks it is necessary. Cranelift is also allowed to perform arbitrary optimizations. The reason to write inline asm may be to prevent all of those things, because your asm is faster, or is the only correct version. If you want to set up a stack in inline asm, you don't want CLIF to insert spills before you are done. If you want to save and restore all registers in an OS, you don't want CLIF to insert writes to registers that are not yet saved or already restored.

Second, a common reason to write inline asm is that you want to use a certain instruction. If you translate inline asm to CLIF IR, you would have to implement all of the thousands of instructions that exist on the target architecture. This is the main reason CLIF doesn't support inline asm. It would take at least months to implement them all for just one architecture.

@sunfishcode (Member) commented Sep 15, 2019

Yes, adding all the instruction encodings is a non-trivial project, though it is doable. It's also the easy part.

One might assume assembly files are easy to parse. After all, it's just "mnemonic [operand]*". However, assembly files have an elaborate syntax. And that's not to mention the expression syntax with its own surprising operator precedence rules. Symbols should be easy because they're just names, but naming turns out to be its own little special world.

Back to parsing instructions: it should be easy, but even just parsing the mnemonics presents interesting subtleties. Parsing the operands also involves subtle concerns. And then there are the bugs you have to emulate. And then, because that was so much fun, some architectures have multiple syntaxes.

Beyond parsing, let's talk about all the directives. Besides having multiple macro expanders to implement, many directives do subtle things with sections and fragments, exposing a lot of what would otherwise be implementation details, so your compiler backends basically have to be architected according to how C compilers have traditionally been architected in order to support all the interactions between compiler code and assembly code.

But we're getting to the end of the as manual, so the end is in sight, right?

Now we switch to the gcc manual. Extended asm comes with an elaborate operand description language and constraint system. Don't underestimate this; this is a complicated system that needs to interact in deep ways with a compiler's register allocator, which is already one of the most complicated parts of any compiler. And it has to be a good solver, because users can (and in practice do) use constraints to ask for as many registers as the machine has, forcing the constraint solver to find optimal solutions if it wants to be able to compile the code at all.

And while many of the machine-specific constraints sound simple, they (a) are all things the register allocator has to understand in detail, (b) can result in fixups and relocations that the whole compiler backend has to be able to represent, and (c) have complex interactions with other constraints in ways that aren't always documented.

Also, don't miss the section on goto labels, because this means that an inline asm is effectively an arbitrary branch instruction too, so it can also create arbitrary n-way control-flow constructs which the rest of your compiler backend now has to be able to understand, and which the register allocator has to be able to spill and reload around, to satisfy the already complex constraint systems, because it all wasn't complex enough already.

And with all the compiler implementation details inline asm exposes, there are no clear rules for what parts of the compiler's behavior are stable, and which (presumably) are undefined behavior to rely on. Can you do .popsection within an inline asm and get to a predictable place? Are the special labels the compiler emits to mark the end of the function and other things ok to use? Can you .section switch into the constant-pool section or other compiler-generated sections? Can you assume that a function's entry point is the lexically first instruction in a function? Can you put data which are not instructions in the middle of a .text section if you have a branch around it? Can the compiler inline functions containing inline asm sections? How about unroll loops containing inline asm sections? What happens if you use a "tied" operand constraint to tie together operands of different widths? Can you assume that an "m" constraint will print the memory operand in a syntactic form which permits you to add modifiers to it from the template string? Can inline asm in one function refer to inline asm in another function in any way?

If we work to add inline asm to Cranelift, with our current resources, it will take us multiple years, and delay other features. This is not an exaggeration, because I'm familiar with the effort it took the LLVM project to adequately implement GCC-style inline asm to support common code, with far more resources, and to this day it's still not uncommon to find things that don't work. And, doing so will make it harder for us to maintain and evolve Cranelift beyond that, because it would lock the backend architecture into certain ways of doing things, because so much is exposed.

Furthermore, this is not a project where people can easily contribute small steps to help get to the eventual goal. There is major design work to be done, deep within the most complex parts of the backend.

@bjorn3 (Contributor) commented Sep 15, 2019

@sunfishcode I didn't realize that inline asm was that complex. I assumed it would just be a matter of implementing all the instructions and that the rest would be easy.

@vitiral commented Sep 15, 2019

@sunfishcode fantastic post!

My question is this though: is there any way to de-scope? I get that some might consider an "ideal" implementation to be one where cranelift understands the assembly it is compiling, but I take the opposite approach. IMO cranelift (and most compilers for that matter) should understand nothing about the assembly they are compiling. It should be completely opaque.

If someone wants to write assembly, the interop work should be on them. They should essentially have to do this:

function %call_asm(i32, i32) -> f32 {
    ss0 = explicit_slot 8             ; Stack slot for draining rr0
    ss1 = explicit_slot 8             ; Stack slot for draining rr1
    ; declare explicit registers given (architecture, size, name)
    ; these kind of intrinsics are the only ones added
    rr0 = explicit_register "intel" 8 "eax"
    rr1 = explicit_register "intel" 8 "ebx"

ebb1(v0: i32, v1: i32):
    ; temporarily store whatever state we are clobbering
    stack_store ss0, rr0
    stack_store ss1, rr1
    register_store rr0, v0  ; mov eax $v0
    register_store rr1, v1  ; mov ebx $v1

    ; call arbitrary assembly. There is no way to insert any references here
    call_asm [
        "arbitrary asm line 1",
        "arbitrary asm line 2",
        ...
    ]

    ; restore clobbered state
    register_store rr0, ss0
    register_store rr1, ss1

    jump ebbN
     ...
}

It is then the assembly writer's job to make sure that their assembly doesn't break things for whatever platform it is being compiled against. Cranelift should not be the one to compile the call_asm -- it should call out to whatever backend the user wants (llvm, gcc, doesn't matter to cranelift).

Cranelift would have to only provide platform-specific intrinsics for storing/restoring state (i.e. explicit_register). Also, clearly branching within the assembly to outside the assembly has serious implications... so should probably just not be supported. Some of these things might be able to be communicated via metadata outside of the asm blob -- but IMO cranelift should not try to ever, ever parse the asm itself.

Will this lead to slower code (missed optimizations) than if cranelift knew how to compile the asm? Definitely. But as you point out, that is a huge can of worms -- and not only that, it seems to me it is a liability to the maintainability of the project in general.

@sunfishcode (Member) commented Sep 16, 2019

If you put the burden of marshalling values into specific registers on the user, and you don't care too much about micro-optimizations, the feature doesn't seem much better than out-of-line asm (.s files), though it'd still be a lot of work.

More broadly, hypothetical use cases for inline asm are awkward because they tend to artificially constrain the design space, and make it difficult to determine appropriate priorities.

Consequently, I'd like to request anyone wishing to discuss inline asm further to please include in your post:

(a) a description of a concrete use case using Cranelift, (b) as complete as possible a description of what specific instructions, instruction sequences, or machine state needs to be accessed, and (c) an explanation for why intrinsics or out-of-line asm might not be sufficient for the use case.

We can then discuss it starting from that point. This only pertains to inline asm, due to the extraordinary nature of this feature. Thanks!

@vitiral commented Sep 16, 2019

the feature doesn't seem much better than out-of-line asm (.s files)

Does Cranelift already support this? I'm really curious whether there can be a reasonably simple and well-defined solution that solves every issue except performance (albeit with potentially more work needed from the programmer).

@sunfishcode (Member) commented Sep 16, 2019

You can use GNU as or llvm-mc to assemble .s files into .o files that you can link into your program.

@comex commented Nov 1, 2019

Sorry to pop in. We're having a debate on the Rust forums about whether Rust should ever stabilize inline assembly (currently an unstable feature which needs to be redesigned), and part of the discussion has to do with the difficulty of Cranelift supporting this functionality eventually.

This isn't a short term issue. For now, the discussion is only about whether Rust should support inline assembly; there's no specification of how the redesigned version should work, let alone an implementation or stabilization. Also, Cranelift support in rustc isn't upstream. Even if inline assembly were stabilized and Cranelift support upstreamed, rustc could still probably use LLVM to compile just the functions containing inline assembly.

@sunfishcode raises a lot of valid concerns in this thread, and it's clear they've thought about this deeply. Still, I think it is possible to mitigate many of those concerns, especially if the scope is limited to what's necessary for a future Rust inline assembly feature (as opposed to, say, compatibility with existing C codebases). The following is only a very broad sketch, but hopefully it can start a discussion:

First, regarding parsing assembly mnemonics and directives: An alternative would be to add a compilation mode to Cranelift that generates assembly and passes it through an external assembler, instead of generating machine code directly. This way, inline assembly could just be spliced in, with no need for Cranelift to parse anything. Of course, this would still require adding a bunch of new functionality, both to emit the assembly and to run the assembler. And it would not be suitable for JIT use cases, but Rust doesn't currently need a JIT.

Regarding constraint systems: A Rust inline assembly feature would probably have a drastically simpler constraint system, since compatibility with existing codebases isn't needed and GCC's constraints are really overcomplicated.

Regarding asm goto: Rust probably wouldn't support an equivalent of this at first anyway, but it might eventually. Is it really that different from a br_table?
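
Structurally, the comparison can be sketched in Rust as two hypothetical terminators that both name several successor blocks. The difference the earlier discussion points at is not the shape of the CFG: with `br_table` the compiler sees the index that selects the edge, while an asm-goto's branch decision is opaque, and the register allocator still has to satisfy the asm's constraints and spill and reload around it on every edge. `Terminator` and `successors` are illustrative names, with blocks identified by plain numbers.

```rust
/// Hypothetical block terminators, compared from the CFG's point of view.
enum Terminator {
    /// Jump to `targets[index]`, or to `default` if out of range. The index
    /// is an ordinary value the compiler can see and reason about.
    BrTable { index: u32, targets: Vec<u32>, default: u32 },
    /// Opaque bytes that may fall through or jump to any listed label.
    AsmGoto { bytes: Vec<u8>, fallthrough: u32, labels: Vec<u32> },
}

impl Terminator {
    /// Successor blocks: structurally, both are n-way branches.
    fn successors(&self) -> Vec<u32> {
        match self {
            Terminator::BrTable { targets, default, .. } => {
                let mut s = targets.clone();
                s.push(*default);
                s
            }
            Terminator::AsmGoto { fallthrough, labels, .. } => {
                let mut s = vec![*fallthrough];
                s.extend(labels.iter().copied());
                s
            }
        }
    }
}
```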

As for the paragraph about implementation details... I've tried going through each of the questions and answering them:

Can you do .popsection within an inline asm and get to a predictable place?

No. Neither GCC nor Clang uses .pushsection to enter its sections, so on existing implementations this would just result in an error. But it is valid and useful to have matched pushsection/popsection pairs within an assembly string.

Are the special labels the compiler emits to mark the end of the function and other things ok to use?

No. Only GCC emits these (not Clang) and their names are not predictable, so it would be hard to use them anyway.

Can you .section switch into the constant-pool section or other compiler-generated sections?

If you mean with a pushsection/popsection pair: Why not? For the Rust use case where you're generating an object file, you already need to anticipate other objects being linked in which contain data in those sections, and thus need to generate relocations against symbols rather than sections. No harm in letting asm blocks do the same.

If you mean using .section and just leaving the assembler in the other section at the end of the asm block: No. With no way to force the compiler to generate symbols in a specific order, you'd be putting some unknown number of other symbols into the other section. Even if you tried to change things back in a different asm block in the same function, the compiler is not required to output basic blocks in any particular order, and there's also the possibility of duplication (see below).
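The section rules from the answers above can be expressed as a small lint over an asm template. This is a hypothetical sketch (`check_sections` and `SectionError` are illustrative names). It allows `.section`, `.text`, `.data`, and `.bss` only inside a `.pushsection`/`.popsection` pair, relying on the GNU assembler's documented behavior that `.popsection` replaces the current section with the top of the section stack. For simplicity it looks only at the first token of each line, so `;`-separated statements, leading labels, and `.previous` are not handled.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SectionError {
    /// A `.popsection` with no matching `.pushsection` in this template.
    UnmatchedPopsection { line: usize },
    /// A section switch that would leave the compiler's code in the wrong section.
    BareSectionSwitch { line: usize },
    /// The template ends while still inside a `.pushsection`.
    UnclosedPushsection,
}

pub fn check_sections(template: &str) -> Result<(), SectionError> {
    let mut depth: usize = 0;
    for (i, raw) in template.lines().enumerate() {
        let line_no = i + 1;
        // Treat the first whitespace-separated token as the directive.
        let directive = match raw.split_whitespace().next() {
            Some(d) => d,
            None => continue,
        };
        match directive {
            ".pushsection" => depth += 1,
            ".popsection" => {
                if depth == 0 {
                    return Err(SectionError::UnmatchedPopsection { line: line_no });
                }
                depth -= 1;
            }
            // Switching sections is only safe while a popsection will restore things.
            ".section" | ".text" | ".data" | ".bss" if depth == 0 => {
                return Err(SectionError::BareSectionSwitch { line: line_no });
            }
            _ => {}
        }
    }
    if depth == 0 {
        Ok(())
    } else {
        Err(SectionError::UnclosedPushsection)
    }
}
```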

Can you assume that a function's entry point is the lexically first instruction in a function?

No: without a way to find the end of the function, such an assumption would be useless. But you can assume that the function's entry point is the address you get if you write the function's name, modulo things like the Thumb bit on ARM.
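The Thumb caveat is small enough to show directly. On 32-bit ARM, the address taken from a Thumb function's name has bit 0 set (the interworking bit used by `BX`/`BLX`), and the first instruction actually lives at that address with the bit cleared; ARM-state functions already have bit 0 clear. `Target` and `entry_instruction_address` are hypothetical names used only for this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    X86_64,
    Aarch64,
    Arm32,
}

/// Maps the address of a function's name to the address of its first instruction.
pub fn entry_instruction_address(fn_ptr: usize, target: Target) -> usize {
    match target {
        // Clear the Thumb interworking bit; harmless for ARM-state code,
        // whose function addresses already have bit 0 clear.
        Target::Arm32 => fn_ptr & !1,
        // No such tagging on these targets.
        Target::X86_64 | Target::Aarch64 => fn_ptr,
    }
}
```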

Can you put data which are not instructions in the middle of a .text section if you have a branch around it?

Yes, unless the architecture/OS uses execute-only pages or has special requirements on text sections. Why not?

Can the compiler inline functions containing inline asm sections?

Yes; GCC and Clang do.

How about unroll loops containing inline asm sections?

Yes; ditto.
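Duplication by inlining or unrolling is why label names in an asm template matter: a named label such as `again:` would be defined twice in the output, while GNU assembler numeric local labels (`1:`, referenced as `1b`/`1f`) may be redefined freely. Rust's current `asm!` documentation gives the same advice, and rustc has a `named_asm_labels` lint for it. The function below is only a hypothetical, simplified version of such a check that looks for `name:` at the start of each line.

```rust
/// Returns label definitions that are not numeric local labels and would
/// therefore clash if the asm block were duplicated.
pub fn named_labels(template: &str) -> Vec<String> {
    let mut found = Vec::new();
    for line in template.lines() {
        let line = line.trim();
        // Simplification: only `label:` at the start of a line is considered.
        if let Some(colon) = line.find(':') {
            let label = &line[..colon];
            let is_symbol = !label.is_empty()
                && label
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '.' || c == '$');
            let is_numeric = !label.is_empty() && label.chars().all(|c| c.is_ascii_digit());
            if is_symbol && !is_numeric {
                found.push(label.to_string());
            }
        }
    }
    found
}
```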

What happens if you use a "tied" operand constraint to tie together operands of different widths?

I tested it on x86: Clang errors out, GCC produces useless output. In Rust, I'd prefer to not support tying at all; you can just use a temporary variable instead, and it'll be more clear what's going on. If it does become supported, it would probably be an error to tie operands of different widths.
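If tying were supported, the width rule suggested above could be as simple as the check below. This is a hypothetical operand model (`Operand`, `TieError`, and `check_tie` are illustrative names), not how Rust's `asm!` is specified.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Operand {
    /// Width of the value bound to this operand, in bits.
    pub width_bits: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TieError {
    WidthMismatch { input_bits: u32, output_bits: u32 },
}

/// Accepts a tied input/output pair only when both widths agree.
pub fn check_tie(input: Operand, output: Operand) -> Result<(), TieError> {
    if input.width_bits == output.width_bits {
        Ok(())
    } else {
        Err(TieError::WidthMismatch {
            input_bits: input.width_bits,
            output_bits: output.width_bits,
        })
    }
}
```

A caller holding a `u16` that needs a 64-bit tied operand would write `let tmp = x as u64;` and tie `tmp`, which makes the zero-extension visible in the source instead of leaving it to compiler-specific behavior.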

Can you assume that an "m" constraint will print the memory operand in a syntactic form which permits you to add modifiers to it from the template string?

No (what modifiers?). Though there's probably no need for Rust to support an "m" constraint at all; on some architectures (ARM) it's ambiguous and mostly useless. If a memory constraint is supported at all, it would probably be best to make it architecture-dependent and more precisely specify what it expands to.
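One way to make a memory operand precisely specified is to have it always expand to a single base-register form whose spelling is fixed per architecture and syntax. The sketch is hypothetical (`AsmSyntax` and `expand_mem_operand` are illustrative names); the spellings themselves are the standard ones: `[rdi]` in x86 Intel syntax, `(%rdi)` in x86 AT&T syntax, and `[x0]` on AArch64.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AsmSyntax {
    X86Intel,
    X86Att,
    Aarch64,
}

/// Expands a memory operand to a plain base-register addressing form,
/// so the template author knows exactly which text will be substituted.
pub fn expand_mem_operand(syntax: AsmSyntax, base_reg: &str) -> String {
    match syntax {
        AsmSyntax::X86Intel => format!("[{base_reg}]"),
        AsmSyntax::X86Att => format!("(%{base_reg})"),
        AsmSyntax::Aarch64 => format!("[{base_reg}]"),
    }
}
```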

Can inline asm in one function refer to inline asm in another function in any way?

No, because that would break if the latter were duplicated, and there is no way to disable duplication.

@sunfishcode commented Nov 1, 2019

> rustc could still probably use LLVM to compile just the functions containing inline assembly.

This is what I expect we'd have to do for the foreseeable future.

> a compilation mode to Cranelift that generates assembly and passes it through an external assembler
> [...]
> And it would not be suitable for JIT use cases, but Rust doesn't currently need a JIT.

It's a good point. This would be a shorter path (though still not an easy one) to supporting at least the "frontend" side of inline asm, at least for the way Rust is typically used today. It's unclear whether this would be worth building, though: we already know of people talking about JIT use cases, so if we find people willing to take on a project of this scale, we might prefer that they build an assembler library anyway.

> Regarding asm goto: Rust probably wouldn't support an equivalent of this at first anyway, but it might eventually. Is it really that different from a br_table?

br_table doesn't come with an operand constraint language giving end users the ability to create brain teasers for the register allocator to solve and then copy/spill/fill/remat around.

I'm not saying it's impossible. But the discussion in the linked thread often doesn't acknowledge that complexity doesn't always scale linearly when you combine features, generalize features, embed features in the middle of the most compile-time-sensitive and compile-quality-sensitive part of the backend (the part approximating an NP-complete problem), or even, say, do all of the above at the same time ;-).

> As for the paragraph about implementation details... I've tried going through each of the questions and answering them:

I expect your answers are correct. But also, yes, these were just some questions I thought of off the top of my head.

Inline asm is a massive expansion of the user-facing surface area of a compiler. And, much of it is not immediately visible, because it's not Rust syntax, and it's not even just assembly syntax, but it's also "how does assembly code written by the user interact with assembly code produced by the compiler", with a large list of directives at its disposal that can be involved in interactions. A lot of this area isn't documented or even really designed. We can usually figure out what to do in any given situation. But it's harder to design a backend in a way that we can be reasonably sure will work for the long term, and not set us up for years of figuring out situation after situation.

People often ask, "Can't you just support a subset of inline asm?" But everyone seems to need a different subset. And there are many intuitive subsets which turn out to be insufficient for what people actually need. And even if one subset does emerge, it may grow over time: people in the linked thread talk about "C Parity", which could put pressure on any reasonable subset. And subsets can still be a lot of work, and they come with the risk that if the subset grows, this work may need to be redone in a more general way later.

9 participants