Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cranelift: Sign extend immediates in instructions that embed them. #4602

Merged
merged 4 commits into from
Aug 15, 2022

Conversation

afonso360
Copy link
Contributor

@afonso360 afonso360 commented Aug 3, 2022

👋 Hey,

This PR alters the behaviour of *_imm instructions to sign extend their immediate argument when the control type is i128.

This comes from a fuzzer issue where the interpreter was sign extending the immediates but the legalizations were not.

Fixes: #4568
Fixes: #4641

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Aug 3, 2022
@akirilov-arm
Copy link
Contributor

As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.

@afonso360
Copy link
Contributor Author

afonso360 commented Aug 4, 2022

As I stated in the linked issue, IMHO the CLIF instruction definition should be updated.

You are right, I didn't read your comment properly (sorry!), ill mark this as draft until we make that decision.

@afonso360 afonso360 marked this pull request as draft August 4, 2022 14:53
@afonso360
Copy link
Contributor Author

@akirilov-arm I've updated the documentation, is this all you meant, or did I miss something else?

Also, cc @cfallin since I can't request reviews.

@afonso360 afonso360 marked this pull request as ready for review August 9, 2022 19:27
ir::Opcode::UremImm => replace.urem(arg, imm),
// comparisons
ir::Opcode::IfcmpImm => replace.ifcmp(arg, imm),
_ => unimplemented!(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's a behavioral change if we panic here. The outer match just skips the instruction if it doesn't have an expansion for it. Maybe that's why the previous implementation duplicated so much code? I think @cfallin probably needs to weigh in on how this case should be handled.

Gotta say, though, I really like how much shorter your version is. I hope we can preserve that clarity even if this bit has to change.

It might help to introduce a function that's something like this (but I haven't tested this code, let alone compiled it):

fn imm_const(pos: &mut FuncCursor, arg: Value, imm: Imm64) -> Value {
    let ty = pos.func.dfg.value_type(arg);
    let imm = pos.ins().iconst(ty, imm);
    if ty == I128 {
        let imm = pos.ins().iconst(I64, imm);
        pos.ins().sextend(I128, imm)
    } else {
        pos.ins().iconst(ty.lane_type(), imm)
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we don't want to hit an unimplemented!() here; I think we just want an empty match-arm body (_ => {}) instead. That way we still construct the imm but we just don't do anything with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't an empty match-arm body there cause simple_legalize to loop, trying to legalize the same instruction repeatedly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The toplevel loop is a while let Some(inst) = pos.next_inst(), so it's stepping through insts with no action taken in the loop body. Though actually the more precise thing to do is to replicate the fallback _ => { ... } below, which sets prev_pos first then continues, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The statement after the match is pos.set_position(prev_pos); and the intent appears to be to re-examine the result of every legalization. So it looks to me like termination of this function relies on every match arm replacing the instruction that it originally matched on. But yes, copying the prev_pos assignment from the fallback case should work, I think.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

ir::Opcode::UremImm => replace.urem(arg, imm),
// comparisons
ir::Opcode::IfcmpImm => replace.ifcmp(arg, imm),
_ => unimplemented!(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we don't want to hit an unimplemented!() here; I think we just want an empty match-arm body (_ => {}) instead. That way we still construct the imm but we just don't do anything with it.

@github-actions github-actions bot added the cranelift:meta Everything related to the meta-language. label Aug 9, 2022
Copy link
Contributor

@akirilov-arm akirilov-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afonso360 The changes to cranelift/codegen/meta/src/shared/instructions.rs are exactly what I had in mind.

cranelift/codegen/meta/src/shared/instructions.rs Outdated Show resolved Hide resolved
cranelift/codegen/meta/src/shared/instructions.rs Outdated Show resolved Hide resolved
cranelift/codegen/meta/src/shared/instructions.rs Outdated Show resolved Hide resolved
@afonso360
Copy link
Contributor Author

I also think zero extending the unsigned operations is probably the better choice.

However, what if we instead altered the type of the instruction to accept the full Imm128?

I don't think this would grow the size of InstructionData since we already have Shuffle which also stores a Imm128. However Shuffle stores the immediate in the DataFlowGraph, we could do the same, and in this legalization pass emit the appropriate iconst / iconcat / sextend depending on what the constant is.

Would storing the Imm in the DFG for these operations cause performance issues?

@cfallin
Copy link
Member

cfallin commented Aug 10, 2022

Actually that would be the ideal design: from first principles one would expect "with immediate" variants of 128-bit ops to carry 128-bit (16-byte) immediates.

However we can't carry that inline in the instruction, as you're hinting, because InstructionData is 16 bytes total (and we want to keep it that way).

I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).

I had another thought, though, that may be more "honest" about the costs: what if we limited (i) op_imm instructions and (ii) the iconst instruction to work for up to 64 bits only? Producers that want to emit full 128-bit values already produce two iconsts and iconcat them. In some sense, trying to allow iconst.i128 -1 or the op_imm folded variant is lulling producers into a false sense of what's supported. As soon as one writes iconst.i128 -0x8000_0000_0000_0001 one finds out that there are only half as many bits in the immediate field as the type otherwise implies.

Thoughts on that?

@afonso360
Copy link
Contributor Author

I suspect that indirecting for all constants would have some nontrivial overhead, though it might be a bit of work to measure (a lot of refactoring).

I was proposing that only for *_imm variants. The way I see it, these instructions are already just sort of "helpers" in the sense that they just construct a pattern that we recognize (i.e. iadd + iconst), they could just emit more advanced patterns to support i128 (i.e. iadd+iconst+sext or iadd+iconst+iconcat all in 64bits).

Having more operations only work up to i64 to me feels like i128 is a second class citizen, which is sort of true for the current backends, but I don't really like it at the lang level.

I don't know, I'm a bit undecided on this, I'd like to hear what other people think.

@cfallin
Copy link
Member

cfallin commented Aug 10, 2022

Generally I agree at an IR-design level: it would be nice to have orthogonality here, and allow all types up through i128 in all places where integers are allowed.

This compromise feels similar to "small immediates" in RISC ISAs though: we have a relatively small struct size for instructions, and its size is critical for performance, and its entire width would be taken by just the i128 immediate (hence it would need to grow). So in that sense I don't think we can have an inline Imm128.

I would be curious to see the overhead of a "every immediate is indirect" approach, if you're up for prototyping that. If it's actually negligible, then I think it's a reasonable change to make.

@afonso360
Copy link
Contributor Author

I would be curious to see the overhead of a "every immediate is indirect" approach, if you're up for prototyping that. If it's actually negligible, then I think it's a reasonable change to make.

Me too, and I think we could mitigate some of the cost (if it is non-negligible) by having two const InstructionData variants, one inline up to 64bits and one in the DFG for larger values, and choose between them when the user emits a iconst (can we do this?).

Although right now I don't have a lot of time to try this out.

I'm planning on addressing the feedback above and getting this merged, is that okay, or would you prefer restricting the types that we accept?

@afonso360 afonso360 requested a review from cfallin August 15, 2022 17:20
@cfallin cfallin enabled auto-merge (squash) August 15, 2022 17:34
@cfallin cfallin merged commit e577a76 into bytecodealliance:main Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator
Projects
None yet
4 participants