This repository has been archived by the owner on Feb 14, 2023. It is now read-only.

Build failures on OS X #28

Closed
zmwangx opened this issue Jul 18, 2016 · 31 comments

@zmwangx
Contributor

zmwangx commented Jul 18, 2016

This is a continuation of #11. I'm opening a new issue because

  1. The old failures due to SSE 4.1 appear to have been fixed in 1.1, while a new one surfaced;
  2. The old thread was slightly polluted by pointless arguments.

Again, the failures occur only on Homebrew's CI server, not locally. Builds on OS X 10.9 and 10.10 now pass, but there is still a problem on 10.11, log here:

/usr/local/Library/Homebrew/shims/super/clang++    -I/tmp/lepton-20160718-65941-texea9/lepton-1.2 -I/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/util -I/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model -I/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/encoder -I/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/decoder  -std=c++11 -fno-exceptions -fno-rtti -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk -mmacosx-version-min=10.11   -msse4.2   -DDEFAULT_ALLOW_PROGRESSIVE -DHIGH_MEMORY -o CMakeFiles/lepton.dir/src/lepton/jpgcoder.cc.o -c /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/jpgcoder.cc
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/jpgcoder.cc:64:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/vp8_decoder.hh:4:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/lepton_codec.hh:4:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/model.hh:10:
/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/numeric.hh:295:32: error: call to '_mm_mullo_epi32' is ambiguous
    __m128i t = _mm_srli_epi32(_mm_mullo_epi32(m, abs_num), log_max_numerator);
                               ^~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/include/smmintrin.h:130:1: note: candidate function
_mm_mullo_epi32 (__m128i __V1, __m128i __V2)
^
/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/../util/mm_mullo_epi32.hh:38:1: note: candidate function
_mm_mullo_epi32(const __m128i &a, const __m128i &b)
^
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/jpgcoder.cc:64:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/vp8_decoder.hh:4:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/lepton_codec.hh:4:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/model.hh:10:
/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/numeric.hh:304:32: error: call to '_mm_mullo_epi32' is ambiguous
    __m128i t = _mm_srli_epi32(_mm_mullo_epi32(m, num), log_max_numerator);
                               ^~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/include/smmintrin.h:130:1: note: candidate function
_mm_mullo_epi32 (__m128i __V1, __m128i __V2)
^
/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/../util/mm_mullo_epi32.hh:38:1: note: candidate function
_mm_mullo_epi32(const __m128i &a, const __m128i &b)
^
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/jpgcoder.cc:64:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/vp8_decoder.hh:4:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/lepton_codec.hh:4:
/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/model.hh:903:27: error: call to '_mm_mullo_epi32' is ambiguous
        __m128i deq_low = _mm_mullo_epi32(coeffs_x_low, icos_low);
                          ^~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/include/smmintrin.h:130:1: note: candidate function
_mm_mullo_epi32 (__m128i __V1, __m128i __V2)
^
/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/../util/mm_mullo_epi32.hh:38:1: note: candidate function
_mm_mullo_epi32(const __m128i &a, const __m128i &b)
^
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/jpgcoder.cc:64:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/vp8_decoder.hh:4:
In file included from /tmp/lepton-20160718-65941-texea9/lepton-1.2/src/lepton/lepton_codec.hh:4:
/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/model.hh:904:28: error: call to '_mm_mullo_epi32' is ambiguous
        __m128i deq_high = _mm_mullo_epi32(coeffs_x_high, icos_high);
                           ^~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/include/smmintrin.h:130:1: note: candidate function
_mm_mullo_epi32 (__m128i __V1, __m128i __V2)
^
/tmp/lepton-20160718-65941-texea9/lepton-1.2/src/vp8/model/../util/mm_mullo_epi32.hh:38:1: note: candidate function
_mm_mullo_epi32(const __m128i &a, const __m128i &b)
^

Build environment:

CPU: quad-core 64-bit ivybridge
OS X: 10.11.5-x86_64
Xcode: 7.3.1
CLT: 7.3.1.0.1.1461711523
Clang: 7.3 build 703

EDIT: The log above is for v1.2 (08c52d9).
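
For readers wondering why the two candidates in the log collide: an overload that takes its arguments by value and one that takes them by const reference are equally good matches for the same argument, so the compiler refuses to pick one. Below is a minimal sketch of that mechanism. `Vec4` and the `mullo` declarations are hypothetical stand-ins for `__m128i` and the two `_mm_mullo_epi32` candidates, not code from lepton.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for __m128i so the sketch builds on any target.
struct Vec4 { int lane[4]; };

namespace header_only {
Vec4 mullo(Vec4 a, Vec4 b);  // by value, shaped like the smmintrin.h candidate
}

namespace header_plus_fallback {
Vec4 mullo(Vec4 a, Vec4 b);                // by value, as above
Vec4 mullo(const Vec4 &a, const Vec4 &b);  // by const reference, shaped like the fallback
}

// Detects whether header_only::mullo(v, v) selects a single best overload.
template <typename T, typename = void>
struct header_only_resolves : std::false_type {};
template <typename T>
struct header_only_resolves<
    T, decltype(void(header_only::mullo(std::declval<T &>(), std::declval<T &>())))>
    : std::true_type {};

// Same probe against the namespace that also declares the fallback.
template <typename T, typename = void>
struct with_fallback_resolves : std::false_type {};
template <typename T>
struct with_fallback_resolves<
    T, decltype(void(header_plus_fallback::mullo(std::declval<T &>(), std::declval<T &>())))>
    : std::true_type {};

static_assert(header_only_resolves<Vec4>::value,
              "one candidate: the call resolves");
static_assert(!with_fallback_resolves<Vec4>::value,
              "two equally ranked candidates: the call is ambiguous");
```

The probes only report whether the call would compile, which keeps the sketch itself buildable. Writing the call directly against the second namespace should produce the same kind of "call is ambiguous" diagnostic as in the log.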

@danielrh
Contributor

Fascinating!
Could you try a quick test: find and replace every instance of _mm_mullo_epi32 in the repo with something else, like vec_multiply,

then get rid of all the #ifdefs around
https://github.com/dropbox/lepton/blob/master/src/vp8/util/mm_mullo_epi32.hh

If that fixes it, then we at least have an idea of a path forward, even if it makes optimized architectures slower.

@danielrh
Contributor

My guess is actually that the #ifdef guards around
https://github.com/dropbox/lepton/blob/master/src/vp8/util/mm_mullo_epi32.hh
are too generous...and that function is being allowed into a build that doesn't need the (new) fallback code

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

Sorry, since I'm not familiar with this sort of low level code, let me ask one stupid question: what header(s) do I need to include for vec_multiply?

@danielrh
Contributor

The problem now seems to be that the fallback function conflicts with the system _mm_mullo_epi32.
So if you just rename every call, the code will use the fallback function no matter what and the naming conflict disappears.

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

Ah I see, sorry for being dumb... I applied this patch and let's wait for the build server to catch up.

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

@danielrh The results are in. Indeed, if _mm_mullo_epi32 is renamed to vec_multiply and the latter is used unconditionally, then everything passes.
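
To make the renamed fallback concrete, here is a rough sketch of how a 32-bit lane-wise multiply can be built from SSE2 instructions alone. It only borrows the name `vec_multiply` from this thread; the body is a commonly used SSE2 emulation and is an assumption, not a copy of lepton's mm_mullo_epi32.hh. The `lane` helper and the self-check are hypothetical additions for illustration, and the sketch assumes an x86 target with SSE2.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 only; smmintrin.h is not needed

// Multiplies four 32-bit lanes and keeps the low 32 bits of each product,
// which is the result SSE4.1's _mm_mullo_epi32 would give.
inline __m128i vec_multiply(const __m128i &a, const __m128i &b) {
    // _mm_mul_epu32 multiplies lanes 0 and 2 into two 64-bit products.
    __m128i even = _mm_mul_epu32(a, b);
    // Shift lanes 1 and 3 down into positions 0 and 2, then do the same.
    __m128i odd = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));
    // Move the low 32 bits of each product into lanes 0 and 1.
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd = _mm_shuffle_epi32(odd, _MM_SHUFFLE(0, 0, 2, 0));
    // Interleave to restore the original lane order 0, 1, 2, 3.
    return _mm_unpacklo_epi32(even, odd);
}

// Hypothetical helper: reads lane i (0..3) of v as a signed integer.
inline int32_t lane(__m128i v, int i) {
    int32_t out[4];
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), v);
    return out[i];
}

// Small self-check covering a negative operand.
inline void vec_multiply_selfcheck() {
    __m128i r = vec_multiply(_mm_setr_epi32(1, 2, 3, 4),
                             _mm_setr_epi32(5, -6, 7, 8));
    assert(lane(r, 0) == 5 && lane(r, 1) == -12);
    assert(lane(r, 2) == 21 && lane(r, 3) == 32);
}
```

Because the function has its own name, it cannot clash with whatever the system headers declare. The cost is that the SSE4.1 instruction is never used, which is the speed trade-off mentioned in this thread.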

@danielrh
Contributor

Fascinating. Now it would be too bad to trade off speed for this but it might be good enough for a starting version

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

👍

My guess is actually that the #ifdef guards around
https://github.com/dropbox/lepton/blob/master/src/vp8/util/mm_mullo_epi32.hh
are too generous...and that function is being allowed into a build that doesn't need the (new) fallback code

That's true. Clang 7.3.0's smmintrin.h is here in case you're interested.

With a cursory glance I don't see how the #ifdef can be fixed, but what about manually checking for the system _mm_mullo_epi32 in configure.ac and CMakeLists.txt?

@danielrh
Contributor

That could work... It's a lot of annoyance to maintain that kind of check in cmake but I think it may not happen there

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

It's a lot of annoyance to maintain that kind of check in cmake

Yeah I know...

but I think it may not happen there

By "may not happen there" you mean?

@danielrh
Contributor

cmake doesn't use -march=native so maybe it does a better job running on the build servers

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

I was using CMake on the build servers all along

$ cmake . -DCMAKE_C_FLAGS_RELEASE=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE=-DNDEBUG -DCMAKE_INSTALL_PREFIX=/usr/local/Cellar/lepton/1.2 -DCMAKE_BUILD_TYPE=Release -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_VERBOSE_MAKEFILE=ON -Wno-dev

so the errors do happen with CMake.

@danielrh
Contributor

Hmm, I tried making you a branch that does things "right" on OS X...
That branch is called osx_hack. If it works we can merge it to master... it should use the good math functions.
I found on this list that OS X is not guaranteed to provide the needed macros (though 10.10 does):
https://software.intel.com/en-us/node/514528

@danielrh
Contributor

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

I haven't applied the latest commit, but 7e1a155 alone results in something weird on 10.10:

/usr/local/Library/Homebrew/shims/super/clang++    -I/tmp/lepton-20160718-51992-ij3ypr/lepton-1.2 -I/tmp/lepton-20160718-51992-ij3ypr/lepton-1.2/src/vp8/util -I/tmp/lepton-20160718-51992-ij3ypr/lepton-1.2/src/vp8/model -I/tmp/lepton-20160718-51992-ij3ypr/lepton-1.2/src/vp8/encoder -I/tmp/lepton-20160718-51992-ij3ypr/lepton-1.2/src/vp8/decoder  -std=c++11 -fno-exceptions -fno-rtti -DNDEBUG   -march=core-avx2 -D__SSE4_1__=1 -D__SSE4_2__=1 -D__AVX2__=1 -D__AVX__=1   -DDEFAULT_ALLOW_PROGRESSIVE -DHIGH_MEMORY -o CMakeFiles/lepton-avx.dir/src/lepton/recoder.cc.o -c /tmp/lepton-20160718-51992-ij3ypr/lepton-1.2/src/lepton/recoder.cc
fatal error: error in backend: Do not know how to split this operator's operand!

http://bot.brew.sh/job/Homebrew%20Core%20Pull%20Requests/5020/version=mavericks/testReport/junit/brew-test-bot/mavericks/install_lepton/
http://bot.brew.sh/job/Homebrew%20Core%20Pull%20Requests/5020/version=yosemite/testReport/junit/brew-test-bot/yosemite/install_lepton/

And something else strikes back in 10.9:
http://bot.brew.sh/job/Homebrew%20Core%20Pull%20Requests/5020/version=mavericks/testReport/junit/brew-test-bot/mavericks/install_lepton/

@danielrh
Contributor

The plot thickens

try osx_hack2 if the other commit fails?

@danielrh
Contributor

also: I can't seem to access the logs--it appears to want some sort of access that I can't grant

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

it appears to want some sort of access that I can't grant

Ah yes, IIRC bot.brew.sh needs to read your organization membership to determine if you're a Homebrew maintainer, and enable more functionality if you are.

If that's not okay for you, I'll upload the logs to gists.

@danielrh
Contributor

that would be ideal: thank you!

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

With both commits, on 10.9 and 10.10 I get the command line parsing error, and on 10.11 I get

/usr/local/Library/Homebrew/shims/super/clang++    -I/tmp/lepton-20160718-13240-ozse7s/lepton-1.2 -I/tmp/lepton-20160718-13240-ozse7s/lepton-1.2/src/vp8/util -I/tmp/lepton-20160718-13240-ozse7s/lepton-1.2/src/vp8/model -I/tmp/lepton-20160718-13240-ozse7s/lepton-1.2/src/vp8/encoder -I/tmp/lepton-20160718-13240-ozse7s/lepton-1.2/src/vp8/decoder  -std=c++11 -fno-exceptions -fno-rtti -DNDEBUG -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk -mmacosx-version-min=10.11   -march=core-avx2 -D__SSE4_1__=1 -D__SSE4_2__=1 -D__AVX2__=1 -D__AVX__=1   -DDEFAULT_ALLOW_PROGRESSIVE -DHIGH_MEMORY -o CMakeFiles/lepton-avx.dir/src/lepton/recoder.cc.o -c /tmp/lepton-20160718-13240-ozse7s/lepton-1.2/src/lepton/recoder.cc
/tmp/lepton-20160718-13240-ozse7s/lepton-1.2/src/lepton/recoder.cc:81:23: error: always_inline function '_mm256_load_si256' requires target feature 'sse4.2', but would be inlined into function 'find_aligned_end_64' that is compiled without support for 'sse4.2'
        __m256i row = _mm256_load_si256((const __m256i*)(const char*)(block + iter));
                      ^
/tmp/lepton-20160718-13240-ozse7s/lepton-1.2/src/lepton/recoder.cc:82:27: error: always_inline function '_mm256_cmpeq_epi16' requires target feature 'avx2', but would be inlined into function 'find_aligned_end_64' that is compiled without support for 'avx2'
        __m256i row_cmp = _mm256_cmpeq_epi16(row, _mm256_setzero_si256());
                          ^
/tmp/lepton-20160718-13240-ozse7s/lepton-1.2/src/lepton/recoder.cc:82:51: error: always_inline function '_mm256_setzero_si256' requires target feature 'sse4.2', but would be inlined into function 'find_aligned_end_64' that is compiled without support for 'sse4.2'
        __m256i row_cmp = _mm256_cmpeq_epi16(row, _mm256_setzero_si256());
                                                  ^
/tmp/lepton-20160718-13240-ozse7s/lepton-1.2/src/lepton/recoder.cc:83:16: error: always_inline function '_mm256_movemask_epi8' requires target feature 'avx2', but would be inlined into function 'find_aligned_end_64' that is compiled without support for 'avx2'
        mask = _mm256_movemask_epi8(row_cmp);
               ^

Logs:

10.11: https://gist.github.com/anonymous/c8584f6b220f6f93324bc5ae8180ebb7
10.10: https://gist.github.com/anonymous/a4a3f8f2a53cd946bbf918b21ed908bd
10.9: https://gist.github.com/anonymous/f78f0648b6c981d49c7136c7154f7548
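
The 10.11 errors suggest that the compiler was not actually targeting AVX2 for that function, even though `-D__AVX2__=1` defined the macro by hand. A hand-defined macro does not turn the instructions on. One hedged way to avoid that mismatch is to key the fast path only on macros the compiler sets itself and to keep a scalar path for everything else. In the sketch below, `zero_mask_16` is a hypothetical helper that mirrors the three intrinsics named in the log; it is not lepton's `find_aligned_end_64`.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX2__)   // set by the compiler when AVX2 is really enabled
#include <immintrin.h>
#endif

// Hypothetical helper: for 16 consecutive int16_t values, returns a 32-bit
// mask with two bits set (one per byte) for every value that equals zero.
inline uint32_t zero_mask_16(const int16_t *block) {
#if defined(__AVX2__)
    // Unaligned load, so the caller does not need 32-byte alignment.
    __m256i row = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(block));
    __m256i is_zero = _mm256_cmpeq_epi16(row, _mm256_setzero_si256());
    return static_cast<uint32_t>(_mm256_movemask_epi8(is_zero));
#else
    // Scalar path producing the same bit layout as the AVX2 path.
    uint32_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        if (block[i] == 0) mask |= 3u << (2 * i);
    }
    return mask;
#endif
}

// Small self-check: only element 1 is non-zero, so bits 2 and 3 stay clear.
inline void zero_mask_16_selfcheck() {
    int16_t block[16] = {0};
    block[1] = 7;
    assert(zero_mask_16(block) == 0xFFFFFFF3u);
}
```

With this shape, a build where the AVX2 flags do not reach the compiler falls back to the scalar loop instead of failing to compile.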

@danielrh
Contributor

can you try osx_hack2 ?

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

Just noticed the branch, pushed, waiting for server.

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

With osx_hack2, build passes on 10.11, but the "SSE 4.1 instruction set not enabled" issue is back on 10.10 and 10.9.

Logs:

10.10: https://gist.github.com/f21889ed9bfaf607ef4863c85c72068e
10.9: https://gist.github.com/72b8453697bbd77f8ec3b0670f5f0a22
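
The "SSE 4.1 instruction set not enabled" message most likely comes from the compiler's own smmintrin.h, which older Clang releases refuse to include unless SSE4.1 is switched on. A hedged sketch of one way to tolerate both situations is to guard the include itself and to expose the multiply under a project-specific name. `mul_lanes_32` and `lane_of` are hypothetical names, the sketch assumes an x86 target with SSE2, and it is not what the osx_hack2 branch does.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>   // SSE2, part of the x86-64 baseline
#if defined(__SSE4_1__)
#include <smmintrin.h>   // only included when the compiler targets SSE4.1
#endif

// Lane-wise 32-bit multiply under a name the system headers do not use.
inline __m128i mul_lanes_32(__m128i a, __m128i b) {
#if defined(__SSE4_1__)
    return _mm_mullo_epi32(a, b);
#else
    // Portable path: spill to memory and multiply as unsigned values,
    // where wraparound on overflow is well defined.
    uint32_t x[4], y[4];
    _mm_storeu_si128(reinterpret_cast<__m128i *>(x), a);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(y), b);
    for (int i = 0; i < 4; ++i) x[i] *= y[i];
    return _mm_loadu_si128(reinterpret_cast<const __m128i *>(x));
#endif
}

// Hypothetical helper: reads lane i (0..3) of v as a signed integer.
inline int32_t lane_of(__m128i v, int i) {
    int32_t out[4];
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), v);
    return out[i];
}

// Small self-check, including a product that overflows 32 bits.
inline void mul_lanes_32_selfcheck() {
    __m128i r = mul_lanes_32(_mm_setr_epi32(3, -4, 100000, 7),
                             _mm_setr_epi32(2, 5, 100000, -1));
    assert(lane_of(r, 0) == 6 && lane_of(r, 1) == -20);
    assert(lane_of(r, 2) == 1410065408 && lane_of(r, 3) == -7);
}
```

Both paths return the low 32 bits of each product, so callers see the same values whichever branch the preprocessor selects.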

@danielrh
Contributor

ok the differences between the working thing and the failing thing are absolutely minimal.

I guess there's only one line it could be...pushed a new osx_hack2...can you try one last time--maybe this is the magic bullet. Not sure why these centralized build systems are always so buggy

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

I'm delighted to report that all three builds passed (with 927635b)!

@danielrh
Contributor

what a marathon!

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

Thanks for all the work over here!

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

Is it possible to make a release for this so that lepton could be more readily packaged on OS X? Or maybe you'll wait for some more substantial changes?

@danielrh
Contributor

https://github.com/dropbox/lepton/releases Does this work well enough for you? It's sort of a partial release, since there are no changes from a Windows perspective.

@danielrh
Contributor

1.2.1 that is

@zmwangx
Contributor Author

zmwangx commented Jul 18, 2016

That's good enough, thank you.
