Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cranelift: Remove iadd_cin instruction #5166

Open
afonso360 opened this issue Nov 1, 2022 · 6 comments
Open

cranelift: Remove iadd_cin instruction #5166

afonso360 opened this issue Nov 1, 2022 · 6 comments

Comments

@afonso360
Copy link
Contributor

afonso360 commented Nov 1, 2022

👋 Hey,

This operation is essentially two add's chained together, I think that any optimizations that we make based on this operation can also be made on the iadd instruction, so there doesn't seem to be any benefit of keeping this around other than frontend convenience. And if that is important, maybe we should just move this to a frontend helper that expands to two iadd's + some masking for the carry input?

Additionally since the removal of booleans, the input for the carry bit seems a bit under-specified. We need some form of masking on the carry input, either carry & 1 or carry != 0. (This also affects the iadd_carry instruction).

We discussed carry operations last week in #5123 and also in the cranelift weekly meeting (notes) but nothing concrete about iadd_cin I think.

Some previous discussion about this mentions that cg-clif may need iadd_cin to "efficiently implement certain llvm intrinsics used by core::arch intrinsics that are stabilized" and points to the current implementation but that seems to be implementing iadd_carry which I agree probably needs to remain.

@bjorn3 Do you know of any operation that cg-clif would need to do, that needs iadd_cin but couldn't efficiently be replaced with iadd+iadd?


For reference here's the llvm output for this instruction, and as far as I can tell, it doesn't seem to be doing anything particularly special for any of the arches that we support.

For the carry & 1 it does emit the and instruction + two add's.

And for the carry != 0 it does do something special on aarch64 and x86, but we can also pattern match iadd+icmp not zero and emit that same code.

@bjorn3
Copy link
Contributor

bjorn3 commented Nov 1, 2022

What would be the purpose of removing iadd_cin while keeping iadd_carry? It is useful for terminating a bigint addition.

@afonso360
Copy link
Contributor Author

afonso360 commented Nov 1, 2022

I'm proposing removing iadd_cin because that one feels a bit redundant to me since it could be replaced with two add's as mentioned above.

However iadd_carry does have the carry bit output, which makes it quite different.

It outputs the carry if either of the two add's overflows, and as you mentioned it is very useful. We also have specialized instructions in hardware to represent that operation. Which is not the case for our iadd_cin.

iadd_cout is also not a substitute of iadd_carry since that one computes the overflow over a single add.

@bjorn3
Copy link
Contributor

bjorn3 commented Nov 1, 2022

iadd_cin is iadd_carry except that it doesn't materialize the carry flag. In other words iadd_cin can be implemented using a single adc instruction on x86 while iadd_carry must be implemented as adc+setc if not fused with the instruction using the output carry flag.

@afonso360
Copy link
Contributor Author

In other words iadd_cin can be implemented using a single adc instruction on x86

Right, but in our case we have no guarantee that the previous input has set the carry flag unless we pattern match some previous instruction.

So we need to pattern mach the carry input of iadd_cin in order to lower it to an adc. And if we are doing that, we can also do that on iadd and use adc there.

@afonso360
Copy link
Contributor Author

afonso360 commented Nov 1, 2022

What I'm proposing is replacing the following:

function %add1(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd_cin v1, v1, v2 
    return v3
}

With this:

function %add2(i8) -> i8 {
block0(v0: i8):
    v1, v2 = iadd_cout v0, v0
    v3 = iadd v1, v2
    v4 = iadd v1, v3
    return v4
}

In the v3 = iadd v1, v2 line, we know that v2 is a carry output, so we can pattern match (iadd _ (iadd_cout _ _)) and avoid materializing the carry flag by doing the adc instruction there.

Edit: In the v4 = iadd v1, v3 line, we can pattern match (iadd _ (iadd _ (iadd_cout _ _))) and avoid materializing the carry flag by doing the adc instruction there.


We actually can't do this exact pattern matching yet, since ISLE can't match two output instructions, but the same goes for iadd_cin.

@scottmcm
Copy link
Contributor

Maybe the value is more clear here for wider integers?

It seems like

    v4, v5 = iadd_cout v0, v1
    v6 = iadd_cin v1, v2, v5

would, in general, need to be

    v4, v5 = iadd_cout v0, v1
    v7 = uextend v5
    v8 = iadd v1, v2
    v6 = iadd v8, v7

in order to not have iadd_cin. Though that's certainly still pattern-matchable, just increasing the number of permutations that would need to be matched.

(but it's also possible that I completely don't understand cranelift's type expectations)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants