build fails on 32-bit architectures: compute_nonlocal_dual_graph: max_num_vertices_per_facet=-1 #1735
Comments
A sample backtrace looks like
Further debugging shows the error is at dolfinx/cpp/dolfinx/mesh/graphbuild.cpp line 138 (commit 02f35af).
In a two-process run on i386, the unmatched_facets loop here is skipped on one process and entered on the other, but the values are not what they should be, so of course it's crashing at
dolfinx/cpp/dolfinx/mesh/graphbuild.cpp line 97 (commit 02f35af).
With the explicit minus sign there, was buffer_global_min expected to have a negative value? Evidently on i386 running 2 processes, it has buffer_global_min[0]=1.
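To make the minus-sign question concrete, here is a minimal sketch of the negate-and-MIN-reduce idiom that the buffer_global_min code appears to use; this is an assumption about the pattern, not the actual dolfinx source. It also shows when a value of +1 can come out of such a reduction:

```cpp
// Minimal sketch (assumed pattern, not the dolfinx source) of computing a
// global maximum with a single MPI_MIN reduction by negating the local value.
#include <mpi.h>
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Hypothetical per-facet vertex counts; rank 1 has no unmatched facets.
  std::vector<std::int64_t> vertices_per_facet;
  if (rank == 0)
    vertices_per_facet = {3, 3, 4};

  // Local maximum, with -1 as the "no facets" sentinel.
  std::int64_t local_max = -1;
  for (std::int64_t n : vertices_per_facet)
    local_max = std::max(local_max, n);

  // Negate and reduce with MIN: -buffer_global_min recovers the global max.
  const std::int64_t send = -local_max;
  std::int64_t buffer_global_min = 0;
  MPI_Allreduce(&send, &buffer_global_min, 1, MPI_INT64_T, MPI_MIN,
                MPI_COMM_WORLD);

  // Healthy run: buffer_global_min = -4, so the max is 4 even though rank 1
  // was empty. buffer_global_min = +1 (as observed on i386) can only appear
  // if *every* rank contributed the -1 sentinel, i.e. the facet data upstream
  // was already wrong, and the recovered maximum then becomes -1.
  const std::int64_t max_num_vertices_per_facet = -buffer_global_min;
  if (rank == 0)
    std::cout << "max_num_vertices_per_facet = " << max_num_vertices_per_facet
              << "\n";

  MPI_Finalize();
  return 0;
}
```

If the reduction works as sketched, buffer_global_min[0]=1 implies every rank reported the empty sentinel, unless the reduction (or the memory behind it) was already corrupted, which would also fit the vader segfault reported further down.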
There are other Python test failures on 32-bit machines; I'm not certain if it's the same underlying problem. In C++ only demo_poisson_mpi is failing, while in Python demo_helmholtz_2d.py, static-condensation-elasticity.py and demo_poisson.py all fail. See for example https://ci.debian.net/data/autopkgtest/testing/i386/f/fenics-dolfinx/16183257/log.gz. Python unit tests give other errors:
and
The latter problem can be tested by hand (running the command manually).
while on amd64 it gets a specific dtype,
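For context on how a dtype can differ between i386 and amd64: NumPy's default integer on Linux follows the C long, which is 32-bit on i386 and 64-bit on amd64, so this is the usual culprit for an architecture-dependent dtype. A tiny check of the platform widths (purely illustrative, not part of the dolfinx tests):

```cpp
// Illustrative only: print the platform integer widths that differ between
// i386 and amd64. On i386 all three are 4 bytes; on amd64 they are 8.
#include <cstddef>
#include <cstdio>

int main()
{
  std::printf("sizeof(long)        = %zu\n", sizeof(long));
  std::printf("sizeof(std::size_t) = %zu\n", sizeof(std::size_t));
  std::printf("sizeof(void*)       = %zu\n", sizeof(void*));
  return 0;
}
```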
@drew-parsons is this still happening for 0.7.0?
I'm waiting for Debian to process dolfinx 0.7.0; I'll be able to say after that.
I guess the problem has cleared now.
There was an armel error in gjk similar to the one reported in #1104, but the tests mentioned here seem to be passing.
Great, thanks. We can close this one too then!
dolfinx 0.3.0 is failing to build on 32-bit architectures (i386, armhf, armel), see https://buildd.debian.org/status/package.php?p=fenics-dolfinx&suite=experimental
e.g. i386
https://buildd.debian.org/status/fetch.php?pkg=fenics-dolfinx&arch=i386&ver=1%3A0.3.0-3&stamp=1633022115&raw=0
There is a segfault, apparently triggered in OpenMPI's mca_btl_vader.so (backtrace reported at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995599).
The point of handover from dolfinx to MPI before the segfault is at dolfinx/cpp/dolfinx/mesh/graphbuild.cpp lines 143-144 (commit afbd8bd).
Noting that the segfault is happening on 32-bit arches, and that the dolfinx code uses int64_t to index the MPI buffers, could this be the origin of the segfault? Or would it more likely be some other bug in the OpenMPI implementation (in vader)?
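As a rough illustration of the class of problem being asked about (this is not the dolfinx code): on a 32-bit architecture std::size_t is 32 bits and MPI counts/displacements are plain int, so int64_t sizes and offsets get silently narrowed when used for indexing or handed to MPI, and a large or corrupted value turns into an out-of-range access rather than an obvious error.

```cpp
// Not dolfinx code: a sketch of how int64_t indices narrow on 32-bit arches.
#include <cstdint>
#include <cstdio>
#include <vector>

int main()
{
  // 4 bytes on i386, 8 on amd64.
  std::printf("sizeof(std::size_t) = %zu\n", sizeof(std::size_t));

  std::vector<double> buffer(8, 0.0);

  // An int64_t offset wider than 32 bits: converting it for indexing or for
  // an MPI count keeps only the low bits on i386, so the access lands in the
  // wrong place (or an MPI call gets a bogus count) instead of failing loudly.
  const std::int64_t offset = (INT64_C(1) << 32) + 3;
  const std::size_t narrowed = static_cast<std::size_t>(offset);
  std::printf("offset = %lld, narrowed = %zu\n",
              static_cast<long long>(offset), narrowed);

  // Guarding the conversion makes the failure explicit instead of a segfault.
  if (offset < 0 || static_cast<std::uint64_t>(offset) >= buffer.size())
    std::printf("offset out of range for this buffer\n");
  else
    buffer[static_cast<std::size_t>(offset)] = 1.0;

  return 0;
}
```

Whether something of this kind, or a genuine bug in vader's shared-memory transport, is what the Debian backtrace shows would need the actual values at graphbuild.cpp lines 143-144.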