[X86] Delaying widening results in an unnecessary `vpmovsxwd` copy #144266

This C code:
```c
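#include <immintrin.h>
#include <stdint.h>

__m256i iter(int16_t* src1p) {
    __m256i ten = _mm256_set1_epi32(10);
    // Load 8 x i16 and sign-extend to 8 x i32.
    __m256i wload = _mm256_cvtepi16_epi32(_mm_loadu_si128((void*)src1p));
    // 32-bit compare: all-ones (-1) in lanes where wload > 10.
    __m256i mask = _mm256_cmpgt_epi32(wload, ten);
    return _mm256_add_epi32(wload, mask);
}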
```
compiled with `-O3 -march=haswell`, results in:
```asm
iter:
        vmovdqu xmm0, xmmword ptr [rdi]
        vpmovsxwd       ymm1, xmm0
        vpcmpgtw        xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        vpmovsxwd       ymm0, xmm0
        vpaddd  ymm0, ymm0, ymm1
        ret
```
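The compare is performed on the 16-bit lanes (`vpcmpgtw`) and its mask is widened afterwards, which costs a second `vpmovsxwd`. It could instead be: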
```asm
iter:
        vpmovsxwd       ymm0, xmmword ptr [rdi]
        vpbroadcastd    ymm1, dword ptr [rip + .LCPI0_0]
        vpcmpgtd        ymm1, ymm0, ymm1
        vpaddd  ymm0, ymm1, ymm0
        ret
```
This avoids the second `vpmovsxwd` and lets the remaining one take the load as an inline memory operand.

https://godbolt.org/z/Ezrf9YbYn
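
For reference, here is a rough scalar sketch of why the two sequences should be interchangeable per lane. It is only a model, not the intrinsics themselves: the function names (`lane_narrow_compare`, `lane_wide_compare`, `lanes_agree_for_all_inputs`) are made up for illustration, and it assumes the constant is 10 as in the example above.

```c
#include <stdint.h>

/* Models one lane of the current codegen: compare in 16 bits, then
   sign-extend the 16-bit mask to 32 bits (vpcmpgtw + vpmovsxwd). */
int32_t lane_narrow_compare(int16_t x) {
    int16_t mask16 = (x > 10) ? (int16_t)-1 : (int16_t)0;
    return (int32_t)x + (int32_t)mask16;
}

/* Models one lane of the suggested codegen: sign-extend first, then
   compare in 32 bits (vpmovsxwd + vpcmpgtd). */
int32_t lane_wide_compare(int16_t x) {
    int32_t wide = (int32_t)x;
    int32_t mask32 = (wide > 10) ? -1 : 0;
    return wide + mask32;
}

/* Returns 1 if both models agree for every possible 16-bit input. */
int lanes_agree_for_all_inputs(void) {
    for (int32_t v = INT16_MIN; v <= INT16_MAX; ++v) {
        if (lane_narrow_compare((int16_t)v) != lane_wide_compare((int16_t)v))
            return 0;
    }
    return 1;
}
```

Calling `lanes_agree_for_all_inputs()` checks all 65536 input values, so the only difference between the two forms in this model is where the widening happens, not the result.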

