cranelift: Remove `iadd_cin` instruction #5166

afonso360 · 2022-11-01T11:30:33Z

👋 Hey,

This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the iadd instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to two iadd's + some masking for the carry input?

Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either carry & 1 or carry != 0. (This also affects the iadd_carry instruction).

We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about iadd_cin I think.

Some previous discussion about this mentions that cg-clif may need iadd_cin to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementing iadd_carry which I agree probably needs to remain.

@bjorn3 Do you know of any operation that cg-clif would need to do, that needs iadd_cin but couldn't efficiently be replaced with iadd+iadd?

For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.

For the carry & 1 it does emit the and instruction + two add's.

And for the carry != 0 it does do something special on aarch64 and x86, but we can also pattern match iadd+icmp not zero and emit that same code.

The text was updated successfully, but these errors were encountered:

bjorn3 · 2022-11-01T11:50:24Z

What would be the purpose of removing iadd_cin while keeping iadd_carry? It is useful for terminating a bigint addition.

afonso360 · 2022-11-01T12:00:29Z

I'm proposing removing iadd_cin because that one feels a bit redundant to me since it could be replaced with two add's as mentioned above.

However iadd_carry does have the carry bit output, which makes it quite different.

It outputs the carry if either of the two add's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for our iadd_cin.

iadd_cout is also not a substitute of iadd_carry since that one computes the overflow over a single add.

bjorn3 · 2022-11-01T12:09:25Z

iadd_cin is iadd_carry except that it doesn't materialize the carry flag. In other words iadd_cin can be implemented using a single adc instruction on x86 while iadd_carry must be implemented as adc+setc if not fused with the instruction using the output carry flag.

afonso360 · 2022-11-01T12:19:04Z

In other words iadd_cin can be implemented using a single adc instruction on x86

Right, but in our case we have no guarantee that the previous input has set the carry flag unless we pattern match some previous instruction.

So we need to pattern mach the carry input of iadd_cin in order to lower it to an adc. And if we are doing that, we can also do that on iadd and use adc there.

afonso360 · 2022-11-01T12:37:40Z

What I'm proposing is replacing the following:

function %add1(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd_cin v1, v1, v2 
    return v3
}

With this:

function %add2(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd v1, v2
    v4 = iadd v1, v3
    return v4
}

~~In the v3 = iadd v1, v2 line, we know that v2 is a carry output, so we can pattern match (iadd _ (iadd_cout _ _)) and avoid materializing the carry flag by doing the adc instruction there.~~

Edit: In the v4 = iadd v1, v3 line, we can pattern match (iadd _ (iadd _ (iadd_cout _ _))) and avoid materializing the carry flag by doing the adc instruction there.

We actually can't do this exact pattern matching yet, since ISLE can't match two output instructions, but the same goes for iadd_cin.

scottmcm · 2023-11-27T20:07:57Z

Maybe the value is more clear here for wider integers?

It seems like

    v4, v5 = iadd_cout v0, v1
    v6 = iadd_cin v1, v2, v5

would, in general, need to be

    v4, v5 = iadd_cout v0, v1
    v7 = uextend v5
    v8 = iadd v1, v2
    v6 = iadd v8, v7

in order to not have iadd_cin. Though that's certainly still pattern-matchable, just increasing the number of permutations that would need to be matched.

(but it's also possible that I completely don't understand cranelift's type expectations)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cranelift: Remove `iadd_cin` instruction #5166

cranelift: Remove `iadd_cin` instruction #5166

afonso360 commented Nov 1, 2022 •

edited

Loading

bjorn3 commented Nov 1, 2022

afonso360 commented Nov 1, 2022 •

edited

Loading

bjorn3 commented Nov 1, 2022

afonso360 commented Nov 1, 2022

afonso360 commented Nov 1, 2022 •

edited

Loading

scottmcm commented Nov 27, 2023

cranelift: Remove iadd_cin instruction #5166

cranelift: Remove iadd_cin instruction #5166

Comments

afonso360 commented Nov 1, 2022 • edited Loading

bjorn3 commented Nov 1, 2022

afonso360 commented Nov 1, 2022 • edited Loading

bjorn3 commented Nov 1, 2022

afonso360 commented Nov 1, 2022

afonso360 commented Nov 1, 2022 • edited Loading

scottmcm commented Nov 27, 2023

cranelift: Remove `iadd_cin` instruction #5166

cranelift: Remove `iadd_cin` instruction #5166

afonso360 commented Nov 1, 2022 •

edited

Loading

afonso360 commented Nov 1, 2022 •

edited

Loading

afonso360 commented Nov 1, 2022 •

edited

Loading