-
Notifications
You must be signed in to change notification settings - Fork 43
[Intrinsics] Should wasm_{i8x16,i16x8}_extract_lane return {int8_t, int16_t}? #257
Comments
Seems reasonable. Will this break existing users of intrinsics with this code:
Will be a compile error or emit warnings after this change? |
Generally speaking it is best to use 32/64 bit arithmetic on x86/x64 and ARM whenever possible even if the underlying type doesn't need that much space. This allows the CPU to avoid tracking the unused portion of registers which can add extra costs. Basically, 8 bit and 16 bit arithmetic instructions are slower than their 32/64 bit counterparts because they have to perform extra bookkeeping. Narrower types can of course be used but they are best left for memory storage, not arithmetic. When loading, load directly to a wider type and similarly truncate when storing. I agree the API is perhaps ugly but at least it'll help a little bit performance wise. |
@nfrechette I would not expect this change to make difference in generated code (and if it did, that would be a good opportunity to improve our codegen). Also, hardware performance considerations should not matter because this is a change to the pre-WebAssembly part of the stack. This should really just be a quality of life change for source-level programmers. @ngzhian, right, there shouldn't be any warnings. |
As one datapoint, x86 has instructions which produce less than 16-bit value stored in a 32-bit register. The corresponding intrinsics return |
AFAIK it's the only one. The ARM and AltiVec extract functions ( |
There is no way for this instruction to return anything other than 32-bit or 64-bit values (since those are the only scalar types wasm supports). 'Make' intrinsic takes different arguments because of the "signature" of underlying instructions is different. If I remember correctly those intrinsics are a bit more complicated and can result in different instructions depending on arguments. Intrinsics for machine instructions (virtual in our case) represent the target, and they typically have data types close to what is used in those instructions. I am not sure, off the top of my head, whether LLVM would do anything differently if those would to return shorter types. |
Let's assume for the sake of discussion that the codegen will be identical whether we use int8_t and int16_t or we use int32_t, and I'll make sure to verify that that is indeed the case before implementing the change. Given that this is a quality of life improvements for users and that neither WebAssembly nor native codegen will be affected, does anyone think this change is not worth making? Another way of asking this question is to ask if anyone thinks that exactly representing the types of the underlying instructions is important enough to forgo this suggested quality of life improvement. If you would support making the change to int8_t and int16_t under these assumptions, give me a 🚀 and otherwise give me an 👀. |
WAsm SIMD intrinsics are supposed to be compiler-independent, and thus I'd prefer to keep them as close as possible to assembly instructions. Even though today only Clang supports WAsm SIMD, and it may be possible to cause it to generate the same code with both |
There should be no codegen changes here as that needs to adhere to the regular Wasm semantics i.e. only 32-bit/64-bit return types. This should be evaluated purely from a user perspective.
|
No one else strongly wanted this proposed change, so we can go ahead without it. |
Currently both of these intrinsics return int32_t to match the underlying WebAssembly instructions as closely as possible, but @nemequ makes good points in emscripten-core/emscripten#11419 in favor of having them return int8_t and int16_t instead. Would anyone be opposed to this change?
The text was updated successfully, but these errors were encountered: