Estimator unit test fails on M1 Mac #4287

Open
camelto2 opened this issue Oct 19, 2022 · 15 comments

@camelto2
Contributor

Describe the bug
The develop branch seems to have a bug, at least on my Mac; I haven't been able to reproduce it anywhere else. It looks like a bus error in unit_test_estimator.

(Note also: to compile at all on the M1 right now with Homebrew g++, you have to switch away from Command Line Tools (CLT) v14, which introduced a linking problem. The current workaround is to install an earlier CLT or the 14.1 beta CLT; either can be downloaded from Apple Developer.)

To Reproduce
git checkout develop
build_dir=build_gcc
mkdir -p $build_dir
cd $build_dir
CC=gcc-12
CXX=g++-12
cmake -D QMC_MPI=0 \
  -D CMAKE_C_COMPILER=$CC \
  -D CMAKE_CXX_COMPILER=$CXX \
  -D QMC_COMPLEX=1 \
  ..
make -j 8
ctest -R unit_estimators

Expected behavior
The test should pass.

System:

  • M1 Mac running macOS Monterey 12.6

Additional context
I talked through this with Ye at the All-hands meeting, and the cause is still unclear to us.
When I recompile with -g, lldb gives the backtrace below.

The issue seems to be in InputSection::setFromValue. It fails when the "count" value is passed in as RealType(15.): std::any_cast<int> throws std::bad_any_cast (frames #5 to #7), and the crash happens while that exception is being unwound (frames #0 to #4).

I'm not sure why it fails here but passes elsewhere. @ye-luo told me to ping @PDoakORNL to see if he had any ideas.

 * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x102f8e105)

  * frame #0: 0x0000000195f17e0c libc++abi.dylib`__cxxabiv1::__class_type_info::process_static_type_above_dst(__cxxabiv1::__dynamic_cast_info*, void const*, void const*, int) const + 4
    frame #1: 0x0000000102e0ba04 libstdc++.6.dylib`get_adjusted_ptr(std::type_info const*, std::type_info const*, void**) + 100
    frame #2: 0x0000000102e0c2ac libstdc++.6.dylib`__gxx_personality_v0 + 1260
    frame #3: 0x00000001a0a8f380 libunwind.dylib`_Unwind_RaiseException + 576
    frame #4: 0x0000000102e0ca84 libstdc++.6.dylib`__cxa_throw + 84
    frame #5: 0x0000000100064ee8 test_estimators`std::__throw_bad_any_cast() at any:64:24
    frame #6: 0x0000000100072090 test_estimators`int std::any_cast<int>(__any=0x000060000170c468) at any:471:27
    frame #7: 0x0000000100161238 test_estimators`void qmcplusplus::InputSection::setFromValue<std::any>(this=0x000000016fdfbca8, name=0x000060000170c448, value=0x000060000170c468) at InputSection.cpp:149:39
    frame #8: 0x0000000100160634 test_estimators`qmcplusplus::InputSection::init(this=0x000000016fdfbca8, init_values=0x000000016fdfd6f0) at InputSection.cpp:73:17
    frame #9: 0x000000010006df8c test_estimators`::____C_A_T_C_H____T_E_S_T____9() at test_InputSection.cpp:320:5
    frame #10: 0x00000001000b2b00 test_estimators`Catch::TestInvokerAsFunction::invoke(this=0x0000600000004200) const at catch.hpp:14321:25
    frame #11: 0x00000001000b1d70 test_estimators`Catch::TestCase::invoke(this=0x00000001038126d8) const at catch.hpp:14160:21
    frame #12: 0x00000001000abb88 test_estimators`Catch::RunContext::invokeActiveTestCase(this=0x000000016fdfde58) at catch.hpp:13020:33
    frame #13: 0x00000001000ab9a4 test_estimators`Catch::RunContext::runCurrentTest(this=0x000000016fdfde58, redirectedCout=0x000000016fdfdb08, redirectedCerr=0x000000016fdfdae8) at catch.hpp:12993:37
    frame #14: 0x00000001000aa97c test_estimators`Catch::RunContext::runTest(this=0x000000016fdfde58, testCase=0x00000001038126d8) at catch.hpp:12754:27
    frame #15: 0x00000001000ad278 test_estimators`TestGroup::execute(this=0x000000016fdfde48) const at catch.hpp:13347:52
    frame #16: 0x00000001000aec78 test_estimators`Catch::Session::runInternal(this=0x000000016fdfe1b8) at catch.hpp:13553:46
    frame #17: 0x00000001000ae9b0 test_estimators`Catch::Session::run(this=0x000000016fdfe1b8) at catch.hpp:13509:35
    frame #18: 0x00000001000d4aa4 test_estimators`int Catch::Session::run<char>(this=0x000000016fdfe1b8, argc=1, argv=0x000000016fdfe848) at catch.hpp:13231:33
    frame #19: 0x00000001000c5e30 test_estimators`main(argc=1, argv=0x000000016fdfe848) at catch_main.cpp:64:27
    frame #20: 0x000000010243d08c dyld`start + 520
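
The backtrace shows the exception being thrown by std::any_cast<int> (frame #6) and the bad access occurring during unwinding, with frames from both libstdc++.6.dylib and libc++abi.dylib on the stack. As a minimal sketch, independent of QMCPACK, the throwing step in isolation looks roughly like the following. The function name int_cast_throws is hypothetical and only illustrates the mechanism; it is not the code in InputSection.cpp.

```cpp
#include <any>

// Hypothetical helper (not QMCPACK code): returns true when reading `value`
// as an int throws std::bad_any_cast and the exception is caught normally.
bool int_cast_throws(const std::any& value)
{
  try
  {
    // std::any_cast<int> throws unless the held type is exactly int.
    (void)std::any_cast<int>(value);
    return false;
  }
  catch (const std::bad_any_cast&)
  {
    return true;
  }
}
```

On a working toolchain, int_cast_throws(std::any(15.0)) returns true and int_cast_throws(std::any(15)) returns false. If the same call crashes with a bus error when built with the affected compiler, that would point at exception unwinding rather than at the estimator code, though this has not been verified here.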

@prckent
Contributor

prckent commented Oct 23, 2022

Do the tests pass with a non-complex build?

@camelto2
Contributor Author

Do the tests pass with a non-complex build?

No, it fails regardless of real/complex

@camelto2
Contributor Author

Also, I tried to compile with address sanitizer support, but it seems that the Homebrew GNU compilers don't ship the sanitizer libraries for the M1, whereas for Intel Macs the libraries are there.

@prckent
Contributor

prckent commented Nov 1, 2022

With the release of Command Line Tools 14.1, I was able to independently reproduce this.

@ye-luo
Contributor

ye-luo commented Feb 4, 2023

This issue persists with gcc-12 on my Mac. I believe it is a problem with GCC on macOS.

@prckent
Contributor

prckent commented Feb 4, 2023

Does macports or brew installed clang have any issues? It would be good to have a recommendable route and to update the build recipe in the manual.

@ye-luo
Contributor

ye-luo commented Feb 4, 2023

Does macports or brew installed clang have any issues? It would be good to have a recommendable route and to update the build recipe in the manual.

I only tried brew. The problem with clang was that I could not find a working C++ standard library recent enough for QMCPACK's needs.

@PDoakORNL
Contributor

So to bring me into the loop here, is this still just an M1 phenomenon?

@prckent
Contributor

prckent commented Feb 6, 2023

The ARM-based Orange PI reporting at https://cdash.qmcpack.org/CDash/viewTest.php?onlyfailed&buildid=396302 shows only numerical "differences"/failures in a deterministic optimizer test. No x86 builds are failing. => only issues on M1 so far.

Has anyone already tried the Spack route instead of MacPorts or Homebrew?

@prckent
Contributor

prckent commented Feb 6, 2023

Reproduced with macports gcc 12.2.0

@jptowns
Contributor

jptowns commented Oct 27, 2023

I just hit this problem using gcc13 installed via Homebrew on an M1 laptop running macOS Monterey 12.7. Clang still seems to be problematic for the reasons Ye mentioned earlier.

@prckent
Contributor

prckent commented Nov 3, 2023

Tried reinvestigating this just now on an M1 with Sonoma 14.1. With #4815 I was finally able to build with AppleClang (!) and this test passed. Builds with gcc13 from MacPorts result in a failing estimator unit test (only), but I notice they were also configured with OpenBLAS, while the AppleClang build picked up the preferred Accelerate framework. There might be other potentially significant differences. fftw-3, hdf5, and boost were from MacPorts. Note that the mpich and openmpi ports have issues, so we aren't yet at a clean and easy build solution on Apple where everything "just works" as expected.

@prckent
Contributor

prckent commented Nov 8, 2023

Not an Accelerate/OpenBLAS issue. Builds differing only by appleclang/gcc-13 fail only for the gcc-13 case.
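
When comparing builds that differ only in compiler, it can help to confirm which compiler and C++ standard library a given binary was actually built against, especially since the backtrace earlier in the thread has frames from both libstdc++.6.dylib and libc++abi.dylib. The following is a small sketch using predefined macros; the function names compiler_id and stdlib_id are hypothetical, and the snippet is not part of QMCPACK.

```cpp
#include <string>

// Hypothetical helper: reports the compiler family and its version string,
// based on macros the compilers predefine.
std::string compiler_id()
{
#if defined(__apple_build_version__)
  return std::string("AppleClang ") + __VERSION__;
#elif defined(__clang__)
  return std::string("Clang ") + __VERSION__;
#elif defined(__GNUC__)
  return std::string("GCC ") + __VERSION__;
#else
  return "unknown";
#endif
}

// Hypothetical helper: reports which C++ standard library headers are in use.
// These macros are defined once any standard header (here <string>) is included.
std::string stdlib_id()
{
#if defined(_LIBCPP_VERSION)
  return "libc++";
#elif defined(__GLIBCXX__)
  return "libstdc++";
#else
  return "unknown";
#endif
}
```

Printing both values from a small executable built with the same CMake toolchain settings gives a quick check that the gcc-13 and AppleClang builds really differ in the way intended.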

@prckent
Contributor

prckent commented Nov 8, 2023

Playing around with this, I found that the bus error results from either of the CHECK_THROWS_AS tests in the InputSection::init TEST_CASE in test_InputSection.cpp. With both commented out, all the tests pass with gcc13.2 from MacPorts.

TEST_CASE("InputSection::init", "[estimators]")
{
  SECTION("bad type handling")
  {
    TestInputSection ti;
    //CHECK_THROWS_AS(ti.init({{"full", bool(false)}, {"count", int(15)}, {"width", int(10)}}), UniformCommunicateError);
  }
 ...
  SECTION("invalid type assignment")
  {
    TestInputSection ti;
    //CHECK_THROWS_AS(ti.init({{"full", bool(false)}, {"count", Real(15.)}}), UniformCommunicateError);
  }
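
Both disabled checks exercise the same path: a failed cast throws, and the test asserts on the type of the exception that reaches it. A standalone sketch of that pattern, which could serve as a reduced test case for the compiler outside of Catch and QMCPACK, might look like the following. All names here (InputTypeError, read_int_option, throws_as) are hypothetical, and the translation of std::bad_any_cast into a project-specific error is an assumption about the general shape of the code, not a copy of InputSection.cpp.

```cpp
#include <any>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for a project-specific error type.
struct InputTypeError : public std::runtime_error
{
  using std::runtime_error::runtime_error;
};

// Hypothetical helper: reads an integer option and translates a type
// mismatch into InputTypeError.
int read_int_option(const std::string& name, const std::any& value)
{
  try
  {
    return std::any_cast<int>(value);
  }
  catch (const std::bad_cast&) // std::bad_any_cast derives from std::bad_cast
  {
    throw InputTypeError("wrong type for option '" + name + "'");
  }
}

// Rough equivalent of what a CHECK_THROWS_AS-style assertion verifies:
// true only if `call` throws something catchable as Expected.
template<typename Expected, typename Callable>
bool throws_as(Callable&& call)
{
  try
  {
    call();
  }
  catch (const Expected&)
  {
    return true;
  }
  catch (...)
  {
    return false;
  }
  return false;
}
```

For example, throws_as<InputTypeError>([] { read_int_option("count", std::any(15.0)); }) is expected to return true on a working toolchain. Catching by base class, as in the catch (const std::bad_cast&) clause, goes through the same kind of run-time type matching that appears at the top of the reported backtrace.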

@PDoakORNL
Contributor

PDoakORNL commented Nov 10, 2023

As far as I can tell, gcc 13 does not officially support Apple M1 at all. Looking at the Homebrew formula, it's pulling in an unmerged branch from a well-known but unofficial repo. I don't see any reason why we should support it or even look into it any further. Use a compiler where support for M1 has actually been merged.

I would suggest we only officially support mainline LLVM on macOS.
