List of currently failing tests #13703

Closed
peterrum opened this issue May 9, 2022 · 23 comments

@peterrum
Member

peterrum commented May 9, 2022

(Matthias) Failing tests with RC1:

| Host | Configuration | Commit | Build errors | Build warnings | Failing tests | Passing tests |
|---|---|---|---|---|---|---|
| tester | Clang-14.0.5-dealii-9.4-unity_build | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-12.1.1-dealii-9.4-all_components | 5303c8 | 0 | 6 | 0 | 6 |
| tester | Clang-14.0.5-dealii-9.4-no_mpi | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-12.1.1-dealii-9.4-cpp20 | 5303c8 | 0 | 7 | 0 | 0 |
| tester | Clang-14.0.5-dealii-9.4-cpp17 | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-14.0.5-dealii-9.4-cpp14 | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-14.0.5-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 22 | 13656 |
| tester | Clang-13.0.1-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 21 | 13657 |
| tester | Clang-12.0.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-11.1.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-10.0.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-9.0.1-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-8.0.1-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-7.1.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-6.0.1-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-5.0.2-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-12.1.1-dealii-9.4-autodetection | 5303c8 | 0 | 6 | 4 | 13698 |
| tester | GNU-11.3.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 6 | 13684 |
| tester | GNU-10.3.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-9.3.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-8.4.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-7.4.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-6.5.0-dealii-9.4-autodetection | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-11.2.0-dealii-9.4-ubuntu-lts-22 | 5303c8 | 0 | 0 | 12 | 13459 |
| tester | GNU-9.4.0-dealii-9.4-ubuntu-lts-20 | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-7.5.0-dealii-9.4-ubuntu-lts-18 | 5303c8 | 0 | 0 | 0 | 0 |
| tester | GNU-10.2.1-dealii-9.4-debian-11 | 5303c8 | 0 | 300 | 12 | 13252 |
| tester | GNU-8.3.0-dealii-9.4-debian-10 | 5303c8 | 0 | 0 | 0 | 0 |
| tester | Clang-14.0.5-dealii-9.4-avx2-O3 | 5303c8 | 0 | 0 | 34 | 13574 |
| tester | Clang-14.0.5-dealii-9.4-avx2-Ofast | 5303c8 | 0 | 0 | 73 | 13535 |
| tester | GNU-12.1.1-dealii-9.4-avx2-O3 | 5303c8 | 0 | 300 | 9 | 13623 |
| tester | GNU-12.1.0-dealii-9.4-64bit_indices | 5303c8 | 0 | 2 | 28 | 12969 |
| tester | GNU-10.3.0-dealii-9.4-petsc_complex | 5303c8 | 0 | 0 | 0 | 13158 |
| tester | GNU-12.1.1-dealii-9.4-tets | 5303c8 | 0 | 6 | 28 | 12985 |
| e5d1d9e6850b | GNU-9.3.0-dealii-9.4 | 5303c8 | 0 | 0 | 8 | 11988 |
peterrum added this to the Release 9.4 milestone on May 9, 2022
@marcfehling
Member

marcfehling commented May 9, 2022

I will have a look at the sharedtria/refine_and_coarsen test (provided I can reproduce the problem).

EDIT: My guess is that we should pick a partitioner here, rather than using the auto setting.
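
For illustration, a rough sketch of what picking an explicit partitioner could look like (the mesh and the choice of partition_zorder below are my assumptions, not necessarily what the failing test should end up using):

#include <deal.II/base/mpi.h>
#include <deal.II/distributed/shared_tria.h>
#include <deal.II/grid/grid_generator.h>

int main(int argc, char **argv)
{
  using namespace dealii;
  Utilities::MPI::MPI_InitFinalize mpi(argc, argv, 1);

  constexpr int dim = 2;

  // Request z-order partitioning explicitly instead of partition_auto, so
  // that cell ownership does not depend on which partitioning backends
  // (METIS/Zoltan) this deal.II build happens to provide.
  parallel::shared::Triangulation<dim> tria(
    MPI_COMM_WORLD,
    Triangulation<dim>::none,
    /*allow_artificial_cells=*/true,
    parallel::shared::Triangulation<dim>::partition_zorder);

  GridGenerator::hyper_cube(tria);
  tria.refine_global(3);
}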

@kronbichler
Member

@drwells how should we handle the nodal_renumbering cases? They fail on my machine because my compiler generates a different error message, namely

DEAL:0::void dealii::DoFRenumbering::compute_support_point_wise(std::vector<types::global_dof_index> &, const DoFHandler<dim, spacedim> &) [dim = 2, spacedim = 2]

rather than the one you added:

DEAL:0::void dealii::DoFRenumbering::compute_support_point_wise(std::vector<unsigned int>&, const dealii::DoFHandler<dim, spacedim>&) [with int dim = 2; int spacedim = 2]

Since the exact error message is not critical, and it seems to be a lot of manual work to make this robust across all the different compilers and compiler versions, I suggest simply removing the lines

for (std::size_t i = 3; i < 6; ++i)
  deallog << lines[i] << std::endl;

and

for (std::size_t i = 3; i < 6; ++i)
  deallog << lines[i] << std::endl;

(and the part extracting the lines).

@marcfehling
Member

@peterrum sharedtria/refine_and_coarsen_01 works on my machine. Before changing the partitioner as a shot in the dark, could you provide a verbose error message here? Thanks.

drwells self-assigned this on May 10, 2022
@drwells
Member

drwells commented May 10, 2022

Yes, I think the nodal renumbering test fails because the assertions try to be too clever. I'll fix it.

@peterrum
Member Author

> sharedtria/refine_and_coarsen_01 works on my machine. Before changing the partitioner as a shot in the dark, could you provide a verbose error message here? Thanks.

For some reason, CDash does not show the error messages. What happens? Which assert is triggered?

@marcfehling
Member

marcfehling commented May 10, 2022

In addition, the following tests fail on our machine:

  • distributed_grids/3d_coarse_grid_02.debug
  • matrix_free/ecl_03.mpirun=5.debug
  • matrix_free/ecl_03.mpirun=5.release
  • matrix_free/ecl_04.mpirun=5.debug
  • matrix_free/ecl_04.mpirun=5.release
  • mpi/create_mpi_datatype_01.mpirun=2.debug
  • mpi/create_mpi_datatype_01.mpirun=2.release
  • optimization/bfgs_05.debug
  • optimization/bfgs_05.release
  • optimization/bfgs_05b.debug
  • optimization/bfgs_05b.release
  • petsc/reinit_preconditioner_01.mpirun=3.debug
  • petsc/reinit_preconditioner_01.mpirun=3.release
  • simplex/step-55.mpirun=2.release

EDIT: The create_mpi_datatype and ecl tests fail for me because of an outdated version of OpenMPI, see #13703 (comment)

@tjhei
Member

tjhei commented May 10, 2022

> • mpi/create_mpi_datatype_01

I assume this is because your MPI version is very, very old. Not sure what to do about it.

@tjhei
Member

tjhei commented May 10, 2022

Also see #13638

@peterrum
Member Author

@marcfehling

> matrix_free/ecl_03.mpirun=5.debug
> matrix_free/ecl_03.mpirun=5.release
> matrix_free/ecl_04.mpirun=5.debug
> matrix_free/ecl_04.mpirun=5.release

What is the output here?

@tjhei
Member

tjhei commented May 15, 2022

also see the failing tests in #13458

@marcfehling
Member

> matrix_free/ecl_03.mpirun=5.debug
> matrix_free/ecl_03.mpirun=5.release
> matrix_free/ecl_04.mpirun=5.debug
> matrix_free/ecl_04.mpirun=5.release
>
> What is the output here?

It looks like these tests are flagged Unstable on CDash https://cdash.dealii.43-1.org/test/8538014

I will check the output on our machines in a moment and post it here.

@marcfehling
Member

marcfehling commented May 16, 2022

> • mpi/create_mpi_datatype_01
>
> I assume this is because your MPI version is very, very old. Not sure what to do about it.

> matrix_free/ecl_03.mpirun=5.debug
> matrix_free/ecl_03.mpirun=5.release
> matrix_free/ecl_04.mpirun=5.debug
> matrix_free/ecl_04.mpirun=5.release
>
> What is the output here?

We have OpenMPI 1.10.7 on our machines. I believe this is the reason why these tests fail. Do we have an elegant way to disable tests if the MPI library is too old? We could add an #ifdef in the main function, but maybe you have a better idea :)

You can find the ctest output of these tests here:

@tjhei
Member

tjhei commented May 16, 2022

> Do we have an elegant way to disable tests if the MPI library is too old?

That would hide the fact that your system is broken, so I am not sure this is a good idea. One option would be to check "MPI VERSION <= 3.0" (see https://github.com/tjhei/BigMPICompat#test-results).
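
For illustration, a rough sketch of such a version check at the top of a test's main() (the exact threshold and the message are my guesses, and whether skipping is the right policy is exactly the question above):

#include <mpi.h>

#include <cstdio>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  // MPI_VERSION and MPI_SUBVERSION are standard macros provided by <mpi.h>
  // that state which version of the MPI standard the library implements.
#if !defined(MPI_VERSION) || (MPI_VERSION < 3) || \
  (MPI_VERSION == 3 && MPI_SUBVERSION < 1)
  std::printf("MPI implementation too old, skipping test body.\n");
#else
  // ... the actual test body relying on newer MPI features goes here ...
#endif

  MPI_Finalize();
  return 0;
}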

@kronbichler
Member

I also see failures of the matrix_free/ecl_01 test, see e.g. https://cdash.dealii.43-1.org/test/10060936 and https://cdash.dealii.43-1.org/test/10048075 (the latter is a build with SSE2 vectorization only, which might give a hint). @peterrum can you take a look?

@peterrum
Member Author

peterrum commented May 22, 2022

@kronbichler I am able to reproduce the error locally when compiling with -msse2 (and also with VectorizedArray<double,2>). I'll try to fix this, but not sure when.
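
For anyone who wants to reproduce this without switching compiler flags: a rough sketch of forcing two SIMD lanes through the third MatrixFree template argument (the mesh, element, and update flags below are illustrative and not the actual ecl_01 setup):

#include <deal.II/base/mpi.h>
#include <deal.II/base/quadrature_lib.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/fe/fe_q.h>
#include <deal.II/fe/mapping_q.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/grid/tria.h>
#include <deal.II/lac/affine_constraints.h>
#include <deal.II/matrix_free/matrix_free.h>

int main(int argc, char **argv)
{
  using namespace dealii;
  Utilities::MPI::MPI_InitFinalize mpi(argc, argv, 1);

  constexpr int dim = 2;

  // Request two lanes explicitly, independent of the -m... flags used to
  // build the library and the tests.
  using VectorizedArrayType = VectorizedArray<double, 2>;

  Triangulation<dim> tria;
  GridGenerator::hyper_ball(tria);
  tria.refine_global(2);

  FE_Q<dim>       fe(1);
  DoFHandler<dim> dof_handler(tria);
  dof_handler.distribute_dofs(fe);

  AffineConstraints<double> constraints;
  constraints.close();

  MatrixFree<dim, double, VectorizedArrayType>::AdditionalData data;
  data.hold_all_faces_to_owned_cells = true;
  data.mapping_update_flags_faces_by_cells =
    update_gradients | update_JxW_values;

  MatrixFree<dim, double, VectorizedArrayType> matrix_free;
  matrix_free.reinit(
    MappingQ<dim>(1), dof_handler, constraints, QGauss<1>(2), data);
}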

@peterrum
Member Author

peterrum commented May 22, 2022

@kronbichler I am pretty sure that the problem is in the following lines:

if (cell_type[cell] <= affine)

DerivativeForm<1, dim, dim> inv_jac =
  fe_val_neigh.jacobian(q).covariant_form();
for (unsigned int d = 0; d < dim; ++d)
  for (unsigned int e = 0; e < dim; ++e)
    {
      const unsigned int ee = ExtractFaceHelper::
        reorder_face_derivative_indices<dim>(
          fe_val_neigh.get_face_number(), e);
      face_data_by_cells[my_q]
        .jacobians[1][offset][d][e][v] =
        inv_jac[d][ee];
    }

The reduced mapping of the neighbors is set up based on the current cell. In the test, we have a hyperball (for some reason I disabled the manifolds in the test), so that the cell in the center is affine while all the other cells are general. When I enable the manifolds, so that the inner cells are not affine either, the results are identical independent of the vectorization.

Do you have an idea how to address this? I would say that we need MappingInfo::neighbor_cell_type.

@kronbichler
Member

I had a look at the failing test multigrid-global-coarsening/interpolate_01.mpirun=1.release, which only fails in release mode and only started failing recently, at some point during the last week. I see the following with valgrind:

==1000711== Invalid read of size 4
==1000711==    at 0x10FF2F10: dealii::internal::FineDoFHandlerView<2>::reinit(dealii::IndexSet const&, dealii::IndexSet const&, dealii::IndexSet const&, bool) (in /home/kronbichler/deal/mpi_build/lib/libdeal_II.so.9.4.0-pre)
==1000711==    by 0x10FFDAD8: dealii::internal::PermutationFineDoFHandlerView<2>::PermutationFineDoFHandlerView(dealii::DoFHandler<2, 2> const&, dealii::DoFHandler<2, 2> const&, unsigned int, unsigned int) (in /home/kronbichler/deal/mpi_build/lib/libdeal_II.so.9.4.0-pre)
==1000711==    by 0x10F901D9: void dealii::internal::MGTwoLevelTransferImplementation::reinit_polynomial_transfer<2, double>(dealii::DoFHandler<2, 2> const&, dealii::DoFHandler<2, 2> const&, dealii::AffineConstraints<double> const&, dealii::AffineConstraints<double> const&, unsigned int, unsigned int, dealii::MGTwoLevelTransfer<2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> >&) (in /home/kronbichler/deal/mpi_build/lib/libdeal_II.so.9.4.0-pre)
==1000711==    by 0x10F92F2D: dealii::MGTwoLevelTransfer<2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> >::reinit(dealii::DoFHandler<2, 2> const&, dealii::DoFHandler<2, 2> const&, dealii::AffineConstraints<double> const&, dealii::AffineConstraints<double> const&, unsigned int, unsigned int) (in /home/kronbichler/deal/mpi_build/lib/libdeal_II.so.9.4.0-pre)
==1000711==    by 0x125F50: void test<2, double>(unsigned int, unsigned int, bool) (in /home/kronbichler/deal/mpi_build/tests/multigrid-global-coarsening/interpolate_01.release/interpolate_01.release)
==1000711==    by 0x124F5A: main (in /home/kronbichler/deal/mpi_build/tests/multigrid-global-coarsening/interpolate_01.release/interpolate_01.release)
==1000711==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

While I can't see line numbers (and I can't run valgrind on the debug version right now; it seems our library is too big for valgrind), I believe the most likely cause is some recent work related to the consensus algorithms or, as a second possibility, my work on the number cache or dof indices inside the triangulation. I can try to investigate in more detail later, but I just wanted to hear if anyone sees it immediately.

@marcfehling
Member

marcfehling commented May 27, 2022

For the petsc/preconditioner_tvmult_01 test, an exception is raised with the latest version of PETSc, 3.17.1. It also occurs if only a single process is used (so no MPI debugging is required). With gdb I get the following info...

Program received signal SIGFPE, Arithmetic exception.
0x00007fffbc5d29f5 in hypre_BoomerAMGSolveT (amg_vdata=0x9fd920, A=0x90bc50, f=0xa027e0, u=0x909ca0) at par_amg_solveT.c:220
220	      conv_factor = resid_nrm / old_resid;
Missing separate debuginfos, use: debuginfo-install blas-3.4.2-8.el7.x86_64 glibc-2.17-326.el7_9.x86_64 hwloc-libs-1.11.8-4.el7.x86_64 infinipath-psm-3.3-26_g604758e_open.2.el7.x86_64 lapack-3.4.2-8.el7.x86_64 libfabric-1.7.2-1.el7.x86_64 libgfortran-4.8.5-44.el7.x86_64 libibumad-22.4-6.el7_9.x86_64 libibverbs-22.4-6.el7_9.x86_64 libnl3-3.2.28-4.el7.x86_64 libpsm2-11.2.78-1.el7.x86_64 librdmacm-22.4-6.el7_9.x86_64 libtool-ltdl-2.4.2-22.el7_3.x86_64 libuuid-2.23.2-65.el7_9.1.x86_64 numactl-libs-2.0.12-5.el7.x86_64 openmpi-1.10.7-5.el7.x86_64 opensm-libs-3.3.21-4.el7_9.x86_64 ucx-1.5.2-1.el7.x86_64 zlib-1.2.7-20.el7_9.x86_64

...and the following backtrace

#0  0x00007fffbc5d29f5 in hypre_BoomerAMGSolveT (amg_vdata=0x9fd920, A=0x90bc50, f=0xa027e0, u=0x909ca0) at par_amg_solveT.c:220
#1  0x00007fffbc58ffe8 in HYPRE_BoomerAMGSolveT (solver=0x9fd920, A=0x90bc50, b=0xa027e0, x=0x909ca0) at HYPRE_parcsr_amg.c:83
#2  0x00007fffbe857116 in PCApplyTranspose_HYPRE_BoomerAMG (pc=0xa02f50, b=0x8e9820, x=0x8ecc00) at petsc-3.17.1/src/ksp/pc/impls/hypre/hypre.c:631
#3  0x00007fffbe93afb2 in PCApplyTranspose (pc=0xa02f50, x=0x8e9820, y=0x8ecc00) at petsc-3.17.1/src/ksp/pc/interface/precon.c:611
#4  0x00007fffeb9d39ec in dealii::PETScWrappers::PreconditionBase::Tvmult (this=0x7fffffffd650, dst=..., src=...) at dealii/source/lac/petsc_precondition.cc:80
#5  0x0000000000432dc7 in test<dealii::PETScWrappers::PreconditionBoomerAMG> () at dealii/tests/petsc/preconditioner_tvmult_01.cc:85
#6  0x0000000000423d7e in main (argc=1, argv=0x7fffffffdfb8) at dealii/tests/petsc/preconditioner_tvmult_01.cc:102

This looks like a bug in hypre. We should come up with a minimal testcase and report it to them.
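
As a starting point for such a minimal testcase, a rough sketch at the deal.II level (matrix, sizes, and settings below are made up; the authoritative version is tests/petsc/preconditioner_tvmult_01.cc, and whether this trimmed-down variant still reaches the division in hypre_BoomerAMGSolveT would have to be checked):

#include <deal.II/base/mpi.h>
#include <deal.II/lac/petsc_precondition.h>
#include <deal.II/lac/petsc_sparse_matrix.h>
#include <deal.II/lac/petsc_vector.h>

int main(int argc, char **argv)
{
  using namespace dealii;
  Utilities::MPI::MPI_InitFinalize mpi(argc, argv, 1);

  const unsigned int n = 32;

  // A 1d Laplace matrix, just so that BoomerAMG has something to set up.
  PETScWrappers::SparseMatrix A(n, n, 3);
  for (unsigned int i = 0; i < n; ++i)
    {
      A.set(i, i, 2.0);
      if (i > 0)
        A.set(i, i - 1, -1.0);
      if (i < n - 1)
        A.set(i, i + 1, -1.0);
    }
  A.compress(VectorOperation::insert);

  PETScWrappers::PreconditionBoomerAMG preconditioner(A);

  PETScWrappers::MPI::Vector src(MPI_COMM_SELF, n, n);
  PETScWrappers::MPI::Vector dst(MPI_COMM_SELF, n, n);
  src = 1.0;

  // The transposed application is what ends up in PCApplyTranspose and
  // HYPRE_BoomerAMGSolveT, where the SIGFPE above is reported.
  preconditioner.Tvmult(dst, src);
}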

@tjhei
Member

tjhei commented Jun 9, 2022

Suggestions on how I should fix #13638?

@tjhei
Member

tjhei commented Jun 17, 2022

On my jenkins we are down to these failing tests:

	1355 - distributed_grids/3d_coarse_grid_02.debug (Failed)
	5975 - simplex/poisson_01.mpirun=1.debug (Failed)
	6003 - simplex/poisson_01.mpirun=4.debug (Failed)
	6319 - lac/distributed_vector_sm.mpirun=4.debug (Failed)
lac/distributed_vector_sm:

/jenkins/workspace/dealii_PR-13458/tests/lac/distributed_vector_sm.cc: In lambda function:
/jenkins/workspace/dealii_PR-13458/tests/lac/distributed_vector_sm.cc:76:7: internal compiler error: Segmentation fault
       vector.local_element(i) = 10 * my_rank + i;
       ^
Please submit a full bug report,

simplex/poisson_01:

/jenkins/workspace/dealii_PR-13458/tests/simplex/poisson_01.cc: In function ‘void test(const dealii::Triangulation<dim, spacedim>&, const dealii::FiniteElement<dimension_, space_dimension_>&, const dealii::Quadrature<dim>&, const dealii::hp::QCollection<(dim - 1)>&, const dealii::Mapping<dim, spacedim>&, double, bool)’:
/jenkins/workspace/dealii_PR-13458/tests/simplex/poisson_01.cc:262:28: error: the value of ‘dealii::ReferenceCells::Triangle’ is not usable in a constant expression
       case ReferenceCells::Triangle:
                            ^
In file included from /jenkins/workspace/dealii_PR-13458/include/deal.II/grid/tria_description.h:24:0,
                 from /jenkins/workspace/dealii_PR-13458/include/deal.II/grid/tria.h:29,
                 from /jenkins/workspace/dealii_PR-13458/include/deal.II/distributed/repartitioning_policy_tools.h:19,
                 from /jenkins/workspace/dealii_PR-13458/include/deal.II/distributed/fully_distributed_tria.h:22,
                 from /jenkins/workspace/dealii_PR-13458/tests/simplex/poisson_01.cc:24:
/jenkins/workspace/dealii_PR-13458/include/deal.II/grid/reference_cell.h:792:41: note: ‘dealii::ReferenceCells::Triangle’ was not declared ‘constexpr’
   DEAL_II_CONSTEXPR const ReferenceCell Triangle =

see https://jenkins.tjhei.info/blue/organizations/jenkins/dealii/detail/PR-13458/10/pipeline/
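
Regarding the ReferenceCells::Triangle error: the object is only constexpr on compilers where DEAL_II_CONSTEXPR expands to constexpr, so a case label on it is not a constant expression for this compiler. A possible workaround (purely illustrative sketch; the helper and the values it returns are made up) is to compare instead of switching:

#include <deal.II/grid/reference_cell.h>

// Instead of
//   switch (...) { case ReferenceCells::Triangle: ... }
// use if/else comparisons, which do not require the ReferenceCell objects
// to be constant expressions.
unsigned int n_corners(const dealii::ReferenceCell &reference_cell)
{
  using namespace dealii;

  if (reference_cell == ReferenceCells::Triangle)
    return 3;
  else if (reference_cell == ReferenceCells::Quadrilateral)
    return 4;
  else
    return reference_cell.n_vertices();
}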
