
On Debian Bullseye: Incompatibility with Octave #15504

Closed
git-fabus opened this issue Jun 27, 2023 · 10 comments

@git-fabus

Minimal example

We have observed rather strange behavior of deal.II in combination with Octave's dependencies.
On a freshly installed Debian Bullseye system, we built deal.II according to your installation instructions [1], installing only the minimum requirements (build-essential, cmake, liblapack-dev) via apt. At this point make test ran without problems, but after installing Octave (including its dependencies) via apt, the tests failed with the following error:

Running quicktests...
Test project /home/user/testing/build/tests/quick_tests
    Start 6: umfpack.debug
    Start 1: step.debug
    Start 2: step.release
    Start 4: tbb.debug
    Start 3: affinity.debug
    Start 5: lapack.debug
1/6 Test #3: affinity.debug ...................   Passed    5.62 sec
2/6 Test #5: lapack.debug .....................   Passed    5.61 sec
3/6 Test #4: tbb.debug ........................   Passed    5.63 sec
4/6 Test #2: step.release .....................   Passed    5.72 sec
5/6 Test #6: umfpack.debug ....................***Failed    5.84 sec
gmake[7]: *** [tests/quick_tests/CMakeFiles/umfpack.debug.run.dir/build.make:76: tests/quick_tests/CMakeFiles/umfpack.debug.run] Error 1
gmake[6]: *** [CMakeFiles/Makefile2:9574: tests/quick_tests/CMakeFiles/umfpack.debug.run.dir/all] Error 2
gmake[5]: *** [CMakeFiles/Makefile2:9581: tests/quick_tests/CMakeFiles/umfpack.debug.run.dir/rule] Error 2
gmake[4]: *** [Makefile:4231: umfpack.debug.run] Error 2
Test umfpack.debug: RUN
===============================   OUTPUT BEGIN  ===============================
[  0%] Built target reset-umfpack.debug-OK
[  0%] Built target expand_instantiations_exe
[  0%] Built target obj_arborx_inst
[  2%] Built target obj_arborx_debug
[ 11%] Built target obj_boost_serialization_debug
[ 11%] Built target obj_boost_system_debug
[ 13%] Built target obj_tbb_debug
[ 18%] Built target obj_umfpack_ZL_SOLVE_debug
[ 18%] Built target obj_umfpack_ZL_ASSEMBLE_debug
[ 18%] Built target obj_umfpack_ZL_STORE_debug
[ 18%] Built target obj_umfpack_DL_TRIPLET_MAP_X_debug
[ 18%] Built target obj_umfpack_DL_STORE_debug
[ 20%] Built target obj_umfpack_DL_TRIPLET_MAP_NOX_debug
[ 22%] Built target obj_umfpack_GENERIC_debug
[ 22%] Built target obj_umfpack_DL_TSOLVE_debug
[ 22%] Built target obj_umfpack_DL_TRIPLET_NOMAP_X_debug
[ 22%] Built target obj_umfpack_ZL_TRIPLET_NOMAP_NOX_debug
[ 25%] Built target obj_umfpack_L_UMFPACK_debug
[ 34%] Built target obj_umfpack_ZL_TSOLVE_debug
[ 34%] Built target obj_umfpack_ZL_TRIPLET_MAP_NOX_debug
[ 52%] Built target obj_umfpack_L_UMF_debug
[ 52%] Built target obj_umfpack_DL_ASSEMBLE_debug
[ 52%] Built target obj_umfpack_DL_TRIPLET_NOMAP_NOX_debug
[ 52%] Built target obj_umfpack_DL_SOLVE_debug
[ 59%] Built target obj_umfpack_Z_UMF_debug
[ 59%] Built target obj_umfpack_ZL_TRIPLET_MAP_X_debug
[ 59%] Built target obj_umfpack_ZL_TRIPLET_NOMAP_X_debug
[ 59%] Built target obj_amd_global_debug
[ 61%] Built target obj_amd_long_debug
[ 63%] Built target obj_amd_int_debug
[ 65%] Built target obj_muparser_debug
[ 70%] Built target obj_numerics_inst
[ 79%] Built target obj_numerics_debug
[ 86%] Built target obj_fe_inst
[ 93%] Built target obj_fe_debug
[100%] Built target obj_dofs_inst
[102%] Built target obj_dofs_debug
[106%] Built target obj_lac_inst
[111%] Built target obj_lac_debug
[118%] Built target obj_base_inst
[127%] Built target obj_base_debug
[138%] Built target obj_cgal_inst
[138%] Built target obj_cgal_debug
[138%] Built target obj_gmsh_inst
[138%] Built target obj_gmsh_debug
[140%] Built target obj_grid_inst
[145%] Built target obj_grid_debug
[147%] Built target obj_hp_inst
[147%] Built target obj_hp_debug
[150%] Built target obj_multigrid_inst
[152%] Built target obj_multigrid_debug
[154%] Built target obj_distributed_inst
[156%] Built target obj_distributed_debug
[159%] Built target obj_algorithms_inst
[161%] Built target obj_algorithms_debug
[161%] Built target obj_matrix_free_inst
[163%] Built target obj_matrix_free_debug
[168%] Built target obj_meshworker_inst
[168%] Built target obj_meshworker_debug
[168%] Built target obj_opencascade_inst
[168%] Built target obj_opencascade_debug
[168%] Built target obj_particle_inst
[170%] Built target obj_particle_debug
[172%] Built target obj_differentiation_ad_inst
[172%] Built target obj_differentiation_ad_debug
[172%] Built target obj_differentiation_sd_inst
[175%] Built target obj_differentiation_sd_debug
[175%] Built target obj_physics_inst
[175%] Built target obj_physics_debug
[177%] Built target obj_physics_elasticity_inst
[177%] Built target obj_physics_elasticity_debug
[177%] Built target obj_non_matching_inst
[177%] Built target obj_non_matching_debug
[179%] Built target obj_sundials_inst
[179%] Built target obj_sundials_debug
[181%] Built target deal_II.g
[184%] Built target umfpack.debug
umfpack.debug: RUN failed. Output:
terminate called after throwing an instance of 'dealii::StandardExceptions::ExcInternalError'
  what():  
--------------------------------------------------------
An error occurred in line <105> of file </home/user/testing/dealii-9.4.1/tests/quick_tests/umfpack.cc> in function
    void test(bool) [with int dim = 2]
The violated condition was: 
    x.l2_norm() / solution.l2_norm() < 1e-8
Additional information: 
    This exception -- which is used in many places in the library --
    usually indicates that some condition which the author of the code
    thought must be satisfied at a certain point in an algorithm, is not
    fulfilled. An example would be that the first part of an algorithm
    sorts elements of an array in ascending order, and a second part of
    the algorithm later encounters an element that is not larger than the
    previous one.
    
    There is usually not very much you can do if you encounter such an
    exception since it indicates an error in deal.II, not in your own
    program. Try to come up with the smallest possible program that still
    demonstrates the error and contact the deal.II mailing lists with it
    to obtain help.
--------------------------------------------------------

Aborted


umfpack.debug: ******    RUN failed    *******

===============================    OUTPUT END   ===============================
Expected stage PASSED - aborting
CMake Error at /home/user/testing/dealii-9.4.1/cmake/scripts/run_test.cmake:144 (MESSAGE):
  *** abort



6/6 Test #1: step.debug .......................   Passed    9.71 sec

83% tests passed, 1 tests failed out of 6

Total Test time (real) =   9.71 sec

The following tests FAILED:
	  6 - umfpack.debug (Failed) 
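The violated condition, x.l2_norm() / solution.l2_norm() < 1e-8, is a relative-accuracy check on the result of the linear solve. The following is a hedged, standalone sketch of the same kind of check using plain std::vector; it is not the code from umfpack.cc. Here the difference between a computed and a reference vector is formed explicitly, whereas in the test x presumably already holds that difference (an assumption not verified against the test source). The helper names l2_norm and relative_error_below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean (l2) norm of a vector, similar in spirit to Vector<double>::l2_norm().
double l2_norm(const std::vector<double> &v)
{
  double sum = 0.0;
  for (double value : v)
    sum += value * value;
  return std::sqrt(sum);
}

// Returns true if `computed` agrees with `reference` to within the relative
// tolerance `tol`, i.e. ||computed - reference||_2 / ||reference||_2 < tol.
// Assumption: `reference` is nonzero; otherwise the ratio is undefined.
bool relative_error_below(const std::vector<double> &computed,
                          const std::vector<double> &reference,
                          double tol = 1e-8)
{
  assert(computed.size() == reference.size());
  std::vector<double> diff(computed.size());
  for (std::size_t i = 0; i < computed.size(); ++i)
    diff[i] = computed[i] - reference[i];
  return l2_norm(diff) / l2_norm(reference) < tol;
}
```

With the default tolerance of 1e-8, a vector that differs from the reference only around the twelfth digit passes, while one that is off by a few percent fails the check.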

Our current system configuration is:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) W-2133 CPU @ 3.60GHz

Original problem

We tried to reproduce the problem on several other computers (and systems) running Debian Bullseye on different architectures, but the tests did not fail there: the problem could not be replicated on any system with a different architecture.
We are also running several PowerEdge R940 servers. One of them still runs Ubuntu Bionic, and there all tests pass even with Octave installed. On another PowerEdge R940 running Debian Bullseye, the test fails again.

The problem originally occurred on those machines. To track it down, we did some research on a machine with the architecture listed above, so that we could provide you with a minimal failing example.
We are also not sure whether other package combinations are affected.
Thanks in advance

@drwells
Member

drwells commented Jun 27, 2023

I'm not sure how this could happen - this strongly implies that UMFPACK computed a wrong answer, but we've been using this package (via Ubuntu) for a long time.

What I find odd is that the output contains

[181%] Built target deal_II.g
[184%] Built target umfpack.debug

This (numbers > 100%) usually implies that something has gone wrong in CMake (for example, CMake being run concurrently). Can you verify that a clean build of deal.II still reproduces this issue?

@tamiko
Member

tamiko commented Jun 27, 2023

@git-fabus I am trying to reproduce this.

@tamiko tamiko changed the title Incopalibity with Octave On Debian Bullseye: Incompatibility with Octave Jun 27, 2023
@tamiko
Member

tamiko commented Jun 27, 2023

@git-fabus Would you mind uploading the file detailed.log of the failing configuration, as well as the output of ldd lib/libdeal_II.g.so?

@git-fabus
Author

@drwells Yes, we migrated from Ubuntu to Debian and afterwards noticed that we were getting wrong results.

We also believe that UMFPACK or its dependencies simply compute wrong results: when we use the deal.II installation to compute actual numerical examples, we obtain results that do not make sense (algorithms that should converge do not, etc.).

We built the deal.II installation for the minimal example from scratch, as described. Could you please elaborate on what you mean by a clean build?
Concerning the deal.II installation, we also noticed the following. In the setting of the original problem described above (and we are perfectly aware that you should not normally do this), we can netmount the files of the exact same installation on two identical PowerEdge R940 machines, one running Debian Bullseye and one running Ubuntu Bionic, and observe the behavior described above: the test fails under Debian but passes under Ubuntu. (We compiled everything on a Debian system.)

@tamiko
ldd lib/libdeal_II.g.so results in:

linux-vdso.so.1 (0x00007ffdb87cf000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc1d58f9000)
    liblapack.so.3 => /lib/x86_64-linux-gnu/liblapack.so.3 (0x00007fc1d525c000)
    libblas.so.3 => /lib/x86_64-linux-gnu/libblas.so.3 (0x00007fc1d51fc000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc1d51f6000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc1d5029000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc1d4ee3000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc1d4ec9000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc1d4cf5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fc1f0157000)
    libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007fc1d2a25000)
    libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007fc1d276f000)
    libquadmath.so.0 => /lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fc1d2726000)

The detailed.log is attached: detailed.log

@drwells
Member

drwells commented Jun 28, 2023

Thanks. It's interesting that this is with the bundled copy of UMFPACK (which itself has no dependencies) which depends on LAPACK.

By clean build, I mean that we should ensure that the build completes normally (in a fresh build directory) with make -jN and make test (make test should not be run in parallel): I suspect that's related to this failure.

edit: I was wrong about UMFPACK

@git-fabus
Author

If we understand the documentation correctly, make test runs serially if no parameters are given. We did build in a fresh build directory, but not with only one core: instead of make -jN we ran make --jobs=N (with N=10 or 12), which should be equivalent.
We also did not notice any problems or unusual messages such as errors. Where could we find potential error messages from the build process? Are they written to a log?
In addition to this test, we also had an environment with an external UMFPACK (and its dependencies), and the same error from the original problem statement occurred there as well.
From our perspective, the problem is caused by a specific combination of Debian and the processor architecture.

@tamiko
Member

tamiko commented Jun 28, 2023

@git-fabus I tried to reproduce this issue yesterday, starting from a debootstrapped Bullseye (using bullseye, bullseye-updates and bullseye-backports as package sources). I was not able to trigger it.

But here is a thought: with liblapack-dev you are installing the default lapack provider:

# ls -la /usr/lib/x86_64-linux-gnu/libblas*  
lrwxrwxrwx 1 root root 44 Jun 28 09:41 /usr/lib/x86_64-linux-gnu/libblas.a -> /etc/alternatives/libblas.a-x86_64-linux-gnu
lrwxrwxrwx 1 root root 45 Jun 28 09:41 /usr/lib/x86_64-linux-gnu/libblas.so -> /etc/alternatives/libblas.so-x86_64-linux-gnu
lrwxrwxrwx 1 root root 47 Jun 28 09:41 /usr/lib/x86_64-linux-gnu/libblas.so.3 -> /etc/alternatives/libblas.so.3-x86_64-linux-gnu
# ls -la /usr/lib/x86_64-linux-gnu/liblapack*
lrwxrwxrwx 1 root root       46 Jun 28 09:41 /usr/lib/x86_64-linux-gnu/liblapack.a -> /etc/alternatives/liblapack.a-x86_64-linux-gnu
lrwxrwxrwx 1 root root       47 Jun 28 09:41 /usr/lib/x86_64-linux-gnu/liblapack.so -> /etc/alternatives/liblapack.so-x86_64-linux-gnu
lrwxrwxrwx 1 root root       49 Jun 28 09:41 /usr/lib/x86_64-linux-gnu/liblapack.so.3 -> /etc/alternatives/liblapack.so.3-x86_64-linux-gnu
-rw-r--r-- 1 root root 12742148 Aug  1  2020 /usr/lib/x86_64-linux-gnu/liblapack_pic.a
# ls -la /etc/alternatives/libblas.*              
lrwxrwxrwx 1 root root 40 Jun 28 09:41 /etc/alternatives/libblas.a-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/blas/libblas.a
lrwxrwxrwx 1 root root 41 Jun 28 09:41 /etc/alternatives/libblas.so-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/blas/libblas.so
lrwxrwxrwx 1 root root 43 Jun 28 09:41 /etc/alternatives/libblas.so.3-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/blas/libblas.so.3
# ls -la /etc/alternatives/liblapack.*
lrwxrwxrwx 1 root root 44 Jun 28 09:41 /etc/alternatives/liblapack.a-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/lapack/liblapack.a
lrwxrwxrwx 1 root root 45 Jun 28 09:41 /etc/alternatives/liblapack.so-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/lapack/liblapack.so
lrwxrwxrwx 1 root root 47 Jun 28 09:41 /etc/alternatives/liblapack.so.3-x86_64-linux-gnu -> /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3

Would you mind checking quickly whether you still have these default providers for blas and lapack selected on the faulty machine (because this might have changed after you installed octave, etc.)?
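Checking the selected provider amounts to following the symlink chain shown in the ls output (libblas.so.3 -> /etc/alternatives/... -> real file) to its end. A hedged C++17 sketch of that check follows; the function names are hypothetical, and the directory-name rules (blas/ and lapack/ for the reference implementation, openblas-pthread/ for OpenBLAS) are assumptions based only on the Debian Bullseye layout shown in this thread.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Follows the whole symlink chain, e.g.
//   /usr/lib/x86_64-linux-gnu/libblas.so.3
//     -> /etc/alternatives/libblas.so.3-x86_64-linux-gnu
//     -> /usr/lib/x86_64-linux-gnu/blas/libblas.so.3
// and returns the final target. fs::canonical throws fs::filesystem_error
// if any element of the chain does not exist.
fs::path resolve_provider(const fs::path &link)
{
  return fs::canonical(link);
}

// Labels a resolved path by directory name: anything containing "openblas"
// is OpenBLAS; .../blas/ or .../lapack/ is the reference implementation.
// These substrings are assumptions from the Bullseye layout above; any
// other provider (e.g. ATLAS or MKL) is reported as "other".
std::string classify_provider(const fs::path &resolved)
{
  const std::string p = resolved.generic_string();
  if (p.find("openblas") != std::string::npos)
    return "openblas";
  if (p.find("/blas/") != std::string::npos ||
      p.find("/lapack/") != std::string::npos)
    return "reference";
  return "other";
}
```

On the affected machine one would pass /usr/lib/x86_64-linux-gnu/libblas.so.3 and /usr/lib/x86_64-linux-gnu/liblapack.so.3 to resolve_provider. Without writing any code, update-alternatives --display libblas.so.3-x86_64-linux-gnu reports the same selection.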

@tamiko
Member

tamiko commented Jun 28, 2023

@git-fabus Actually - I was a bit blind... your ldd output shows:

    libblas.so.3 => /lib/x86_64-linux-gnu/libblas.so.3 (0x00007fc1d51fc000)
    libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007fc1d2a25000)

I wouldn't necessarily trust libopenblas... it has shown weird behavior in the past.
I recommend deselecting OpenBLAS (via the alternatives mechanism).
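The observation above can be checked mechanically. If both libblas.so.3 and libopenblas.so.0 appear in the ldd output, the libblas.so.3 picked by the loader is most likely the OpenBLAS build (as far as we understand, Debian's openblas-pthread libblas.so.3 pulls in libopenblas.so.0). The following is a hedged C++ sketch that scans saved ldd output for BLAS/LAPACK-like sonames. The function names and the list of name prefixes are illustrative assumptions, not a complete catalogue of providers.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the sonames in `ldd` output (one "name => path (address)" entry
// per line) that look like BLAS/LAPACK providers. The prefix list is an
// illustrative assumption, not exhaustive.
std::vector<std::string> blas_like_libraries(const std::string &ldd_output)
{
  static const char *prefixes[] = {"libblas", "liblapack", "libopenblas",
                                   "libmkl",  "libatlas",  "libblis"};
  std::vector<std::string> found;
  std::istringstream in(ldd_output);
  std::string line;
  while (std::getline(in, line))
    {
      std::istringstream fields(line);
      std::string soname;
      fields >> soname; // first token, e.g. "libopenblas.so.0"
      for (const char *prefix : prefixes)
        if (soname.rfind(prefix, 0) == 0) // soname starts with prefix
          {
            found.push_back(soname);
            break;
          }
    }
  return found;
}

// True if an OpenBLAS library is among the listed dependencies.
bool loads_openblas(const std::string &ldd_output)
{
  for (const std::string &name : blas_like_libraries(ldd_output))
    if (name.rfind("libopenblas", 0) == 0)
      return true;
  return false;
}
```

For example, save the output with ldd lib/libdeal_II.g.so > ldd.txt, read the file into a string, and pass it to loads_openblas; for the ldd output posted earlier in this thread it returns true.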

@tamiko tamiko removed the Bug label Jun 28, 2023
@git-fabus
Author

git-fabus commented Jun 29, 2023

Actually, these paths should be fine. We noticed (after some further tests) that during the installation of Octave the path /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 was changed to
/usr/lib/x86_64-linux-gnu/blas/libblas.so.3 (with update-alternatives --config libblas.so.3-x86_64-linux-gnu), which resulted in the failed test, but only on this specific system, not on other systems with a different processor.
Also, the error does not occur on Ubuntu, so maybe the problem lies somewhere else, deeper inside? It seems that OpenBLAS does not work as intended with this specific combination of architecture and Debian.
Is there any way to work around this directly in deal.II, without root privileges? There are probably some people who tested deal.II, found that everything worked, then installed Octave and never noticed this, because ordinary programs won't report this error.

Edit: I think this is more of a problem that you should be aware of, but there is nothing left to be done here.

@tamiko
Member

tamiko commented Jul 1, 2023

@git-fabus Thanks for letting us know!
