Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigation into the mingw-w64 AVX AND AVX2 misalignment bug. #1209

Open
silvioprog opened this issue Mar 27, 2018 · 27 comments
Open

Investigation into the mingw-w64 AVX AND AVX2 misalignment bug. #1209

silvioprog opened this issue Mar 27, 2018 · 27 comments

Comments

@silvioprog
Copy link

Hello.

Please upgrade the following setups from msys2 home page:

msys2-i686-20161025.exe
msys2-x86_64-20161025.exe

This will prevent beginners from thinking that the project was left in 2016.

Thank you!

@RoyiAvital
Copy link

Is the MSYS2 project still maintained?

@mingwandroid
Copy link
Member

Yes, it's a rolling distribution with new package builds coming out all of the time.

@RoyiAvital
Copy link

Yet the MSYS2 project seems to be frozen since the end of 2016.

For instance, when will be the AVX and AVX2 code on Windows 64 Bit issue be resolved?

@Alexpux
Copy link
Member

Alexpux commented Apr 8, 2018

@RoyiAvital you can get more recent setup here https://sourceforge.net/projects/msys2/files/Base/

@mingwandroid
Copy link
Member

For instance, when will be the AVX and AVX2 code on Windows 64 Bit issue be resolved?

What issue are you talking about? Please provide a link.

You can compile software that uses AVX and AVX2 quite happily using our compilers. Our prebuilt binaries cannot in general use AVX and AVX2 because our end-user's machines may not support those CPU features and that would result in crashes due to illegal instructions. This is exactly the same thing that every software distribution must contend with and we are no worse.

Having said that some of the better written software out there (such as OpenBLAS) that we provide pre-built packages for implement runtime CPU detection and dispatch and this software will take good advantage of AVX and AVX2 when the machine supports it.

In many general purpose C and C++ library cases AVX and AVX2 do not provide much very speed-up anyway. These features work best when dealing with things like heavy matrix computation and also with hand-crafted assembly language (or at least compiler intrinsics).

@RoyiAvital
Copy link

RoyiAvital commented Apr 8, 2018

@mingwandroid ,
The problem is Windows 64 Bit requires 32 Byte Alignment for AVX & AVX2.
I'm really not an expert on this but since GCC (As utilized by MinGW64 and MSYS2) aligns with 16 Byte it seems the code isn't compatible with Windows 64.

I'm talking about code of the user. not libraries supplied with MinGW64 or MSYS2.

References:

  1. minigw-w64 is incapable of 32 byte stack alignment, easy work around or switch compilers?.
  2. How to align stack at 32 byte boundary in GCC?.
  3. Sleef - Can't produce AVX & AVX2 Code in Windows 64 Bit due to ABI Issues in GCC / MinGW64.
  4. MinGW64 Discussion Board - [Mingw-w64-public] AVX support is broken in 64-bit mode! Will there ever be a fix?.
  5. Wrapper for __m256 Producing Segmentation Fault with Constructor - Windows 64 + MinGW + AVX Issues.

What I'm asking is 2 things:

  1. Is this bug relevant even in MinGW64 based on GCC 7.x?
  2. If it does relevant, anyone working on fixing it?

@mingwandroid
Copy link
Member

For instance, when will be the AVX and AVX2 code on Windows 64 Bit issue be resolved?

@RoyiAvital,

This is clearly a GCC issue and not an MSYS2 issue, so you should ask GCC or mingw-w64 about that. We patch a few things in GCC but do not tend to fix things that are this low-level.

Is it really so difficult to determine the correct place to report issues to?

@mingwandroid
Copy link
Member

What I'm asking is 2 things:
Is this bug relevant even in MinGW64 based on GCC 7.x?

And you are asking questions that you should try to determine the answers to yourself, reproduction cases would be useful for someone with time to look into this.

@RoyiAvital
Copy link

RoyiAvital commented Apr 8, 2018

Here is the simplest code to reproduce it:

https://stackoverflow.com/questions/30926241

I'm sorry to post here.
My question regarding the MSYS2 is simple and relevant - Does MSYS2 allows generating AVX & AVX2 code which is compatible with Windows 64?

@mingwandroid
Copy link
Member

mingwandroid commented Apr 8, 2018

Here is the simplest code to reproduce it:

Thank you. But this example does not work out of the box according to the comments. You should modify it and paste it here instead of expecting others to do this work for you.

My question regarding the MSYS2 is simple and relevant - Does MSYS2 allows generating AVX & AVX2 code which is compatible with Windows 64?

As I already explained, I do not think anyone from MSYS2 will have the time to look into this and we do not tend to fix such low level bugs in mingw-w64/GCC which is where this bug is located.

.. and that's OK because it's a mingw-w64/GCC bug.

@RoyiAvital
Copy link

RoyiAvital commented Apr 8, 2018

This seems to be a code which creates the issue:

#include <immintrin.h>

void foo(__m256 x) {}

int main()
{
    __m256 r = _mm256_set1_ps(0.0f);
    foo(r);
}

Compiling with -mavx.
I think it is simple.

@mstorsjo
Copy link
Contributor

mstorsjo commented Apr 8, 2018

FWIW, all of this is very much unrelated to the original issue, while this thread now is hijacked for a completely different matter.

As mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412, clang and MSVC don't seem to have the same issue. An adjusted version of the example code, avxalign.c:

#include <immintrin.h>

void foo(__m256 x);

void func(void)
{
    __m256 r = _mm256_set1_ps(0.0f);
    foo(r);
}

If compiled with GCC:

$ x86_64-w64-mingw32-gcc -S -O2 -mavx avxalign.c -o -
func:   
        subq    $72, %rsp
        .seh_stackalloc 72
        .seh_endprologue
        vxorps  %xmm0, %xmm0, %xmm0
        leaq    32(%rsp), %rcx
        vmovaps %ymm0, 32(%rsp)
        vzeroupper
        call    foo
        nop
        addq    $72, %rsp
        ret

This doesn't align the pointer where the argument is stored, and writes into it with vmovaps.

With clang:

$ clang -target x86_64-w64-mingw32 -S -O2 -mavx avxalign.c -o -
func:                                   # @func
.seh_proc func
        pushq   %rbp
        .seh_pushreg 5
        subq    $80, %rsp
        .seh_stackalloc 80
        leaq    80(%rsp), %rbp
        .seh_setframe 5, 80
        .seh_endprologue
        andq    $-32, %rsp
        vxorps  %xmm0, %xmm0, %xmm0
        vmovaps %ymm0, 32(%rsp)
        leaq    32(%rsp), %rcx
        vzeroupper
        callq   foo
        nop
        movq    %rbp, %rsp
        popq    %rbp
        retq

This overallocates stack space in order to be able to align it, and then writes into it with an aligned write.

If not using SEH, by adding -fdwarf-exceptions, it produces different code that also does the alignment:

$ clang -target x86_64-w64-mingw32 -S -O2 -mavx avxalign.c -o - -fdwarf-exceptions
func:                                   # @func
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register %rbp
        andq    $-32, %rsp
        subq    $96, %rsp
        vxorps  %xmm0, %xmm0, %xmm0
        vmovaps %ymm0, 32(%rsp)
        leaq    32(%rsp), %rcx
        vzeroupper
        callq   foo
        movq    %rbp, %rsp
        popq    %rbp
        retq

This is what MSVC produces:

$ cl -c -O2 avxalign.c
$ x86_64-w64-mingw32-objdump -d avxalign.obj
0000000000000000 <func>:
   0:   40 55                   rex push %rbp
   2:   48 83 ec 60             sub    $0x60,%rsp
   6:   48 8d 6c 24 40          lea    0x40(%rsp),%rbp
   b:   48 83 e5 e0             and    $0xffffffffffffffe0,%rbp
   f:   c5 fc 10 05 00 00 00    vmovups 0x0(%rip),%ymm0        # 17 <func+0x17>
  16:   00 
  17:   c5 fc 11 45 00          vmovups %ymm0,0x0(%rbp)
  1c:   48 8d 4d 00             lea    0x0(%rbp),%rcx
  20:   c5 f8 77                vzeroupper 
  23:   e8 00 00 00 00          callq  28 <func+0x28>
  28:   48 83 c4 60             add    $0x60,%rsp
  2c:   5d                      pop    %rbp
  2d:   c3                      retq   

This both aligns the pointer, and uses unaligned stores to write it onto the stack

However, this only seems to be an issue when passing such variables by value. Local variables seem to be properly aligned even with GCC:

$ cat avxalign2.c 
#include <immintrin.h>

void foo(__m256 *x);

void func(void)
{
    __m256 r = _mm256_set1_ps(0.0f);
    foo(&r);
}
$ x86_64-w64-mingw32-gcc -S -O2 -mavx avxalign2.c -o -
func:
        pushq   %rbp
        .seh_pushreg    %rbp
        movq    %rsp, %rbp
        .seh_setframe   %rbp, 0
        subq    $32, %rsp
        .seh_stackalloc 32
        .seh_endprologue
        vxorps  %xmm0, %xmm0, %xmm0
        subq    $64, %rsp
        leaq    63(%rsp), %rcx
        andq    $-32, %rcx
        vmovaps %ymm0, (%rcx)
        vzeroupper
        call    foo
        nop
        movq    %rbp, %rsp
        popq    %rbp
        ret

Here the local variable is properly aligned.

@RoyiAvital
Copy link

@mstorsjo ,
Great analysis!
It means that as long variables are passed by reference the code generated will work.

The question is, how can it be fixed?
Where the right place to post your analysis so people will fix it?

Thank You.

@mstorsjo
Copy link
Contributor

mstorsjo commented Apr 8, 2018

The GCC bug report that I linked is the relevant one.

@RoyiAvital
Copy link

Do you have account there to link to your analysis (Which states only variables transferred by value are an issue)?

It seems one must have a special granted account to post there which I don't have.
It seems a gut from Folding@Home is even asking for this.

Thank You.

@mingwandroid mingwandroid changed the title [FEATURE REQUEST] Upgrade setups distributed at www.msys2.org Investigation into the mingw-w64 AVX AND AVX2 misalignment bug. Apr 8, 2018
@mingwandroid
Copy link
Member

mingwandroid commented Apr 8, 2018

I've retitled the bug since it went completely tangential immediately (2nd and 4th comment).

@mstorsjo
Copy link
Contributor

mstorsjo commented Apr 8, 2018

Do you have account there to link to your analysis (Which states only variables transferred by value are an issue)?

I'm not sure if I have a GCC bugzilla account - anyone more affected by the issue than me can take it forward, I just gave the issue a brief look from the clang perspective.

@RoyiAvital
Copy link

@mstorsjo ,

Could you tell how did you compile the above (Clang and GCC)?
Version and distribution of MinGW64 / MSYS2?
What version of GCC / CLang?
Is there a MinGW64 target for Clang (Something like -target x86_64-w64-mingw64?

Thank You.

@mati865
Copy link
Collaborator

mati865 commented Apr 9, 2018

@RoyiAvital it you look at the beginning of each snippet you can see used command.

@RoyiAvital
Copy link

@mati865 ,
I saw those.
They show the command used for compilation, not the environment (Compiler Version, MSYS / CYGWIN / MinGW / MinGW64 version, etc...).

On a side note (for my own knowledge) I wanted to ask what other option Clang has for it <sys> options.
Above @mstorsjo used -target x86_64-w64-mingw32 I wonder if -target x86_64-w64-mingw64 makes any difference (If exists)?
What other option are there beside mingw32 For 64 Bit Windows (The <sys> in -target x86_64-w64-<sys>).

@mstorsjo
Copy link
Contributor

mstorsjo commented Apr 9, 2018

The environment used in the examples does not matter as it doesn't include anything or link anything - the command line contains everything needed. I tested with a recent clang svn version (the latest from about a day ago), but I'm pretty sure at least the last couple releases should behave the same.

There's no -mingw64 target, but -msvc and a maybe -cygwin.

@RoyiAvital
Copy link

OK.
Though I still think the GCC version could be important, no?

Just to add information, I think this comment is important - https://stackoverflow.com/questions/30928265/mingw64-is-incapable-of-32-byte-stack-alignment-required-for-avx-on-windows-x64?noredirect=1#comment86499640_30928265.

@Artoria2e5
Copy link

Sorry for digging up this old thread, but hey I bumped into it!

@RoyiAvital: Though I still think the GCC version could be important, no?

It's no longer important now. My own tests plus 54412 seems to suggest that it happens on all SEH-enabled versions of GCC, which is everything after something like 4.8.9.

If it does relevant, anyone working on fixing it?

The GCC people are (understandably) looking to do a proper fix to make the compiler actually understand how alignments work, but they seem a bit stuck.

@Artoria2e5
Copy link

Artoria2e5 commented Aug 23, 2021

I have hacked together a very ugly patch that basically makes every place that generates aligned load/stores use unaligned instructions instead. I remember seeing someone say on Nehalem and newer the peanlty is very small or something.

0001-Force-use-unaligned-insns-as-49001-workaround.patch.txt

I will try to send it to makepkg to build and test it if possible, but in case I burn out before then you all know what to try. Heck, I still owe Alex & the cygwin people a cmdline parser...


A more graceful way to do the workaround may be changing as to add a flag for doing this sort of replacement. But come on, a hack is a hack.

@dimula73
Copy link

dimula73 commented Aug 26, 2021

Hi all!

I have a feeling that GCC 11.2.0 has this issue fixed. At least I cannot reproduce the crash in a trivial test. I have managed to compile packages. If you would like to test them, please fetch here: https://yadi.sk/d/rL4Lo6HFkAojPA

I will upload the sources of the script a bit later today (I had to change the patchset and have a problem with updating checksums)

UPD:

Here is a commit that makes it possible to compile GCC 11.2.0 for MinGW64:
dimula73/MINGW-packages@7065447

I'm not sure I know how MSYS handles multiple compilers at the same time, so I'm a bit of scared to propose a PR for that :)

@Artoria2e5
Copy link

I don't think it fixes the issue -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=28103 from bug 54412 still segfaults.

PS: Hmm, any idea why the MSYS2 GNU tar is not recognizing the .zst file extension even when zstd is installed? I really thought they would've, you know, updated for that. Well anyways pacman -U works.

@dimula73
Copy link

I don't think it fixes the issue -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=28103 from bug 54412 still segfaults.

Hm... then it is just a coincidence that my local test passes :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants