Compilation fails for Intel 18 #13821

Closed
marcfehling opened this issue May 25, 2022 · 15 comments

@marcfehling
Member

marcfehling commented May 25, 2022

With the Intel 18.0.2 suite (compiler, MPI, MKL), we cannot compile the current master branch. Compilation succeeds with Intel 19.1.1, though.

The following error recurs:

In file included from /dealii/include/deal.II/grid/tria_description.h(27),
                 from /dealii/include/deal.II/grid/tria.h(29),
                 from /dealii/include/deal.II/distributed/tria_base.h(28),
                 from /dealii/include/deal.II/dofs/dof_handler.h(29),
                 from /dealii/include/deal.II/dofs/dof_accessor.h(22),
                 from /dealii/source/numerics/data_out.cc(18):
/dealii/include/deal.II/lac/la_parallel_vector.h(1938): internal error: assertion failed at: "shared/cfe/edgcpfe/class_decl.c", line 3591

                                    has_locally_owned_domain_indices<MatrixType>,
                                    ^

compilation aborted for /dealii/source/numerics/data_out.cc (code 4)
make[2]: *** [source/numerics/CMakeFiles/obj_numerics_release.dir/data_out.cc.o] Error 4
make[1]: *** [source/numerics/CMakeFiles/obj_numerics_release.dir/all] Error 2
make: *** [all] Error 2

The line is part of a std::enable_if construct:

template <
  typename MatrixType,
  typename std::enable_if<has_get_mpi_communicator<MatrixType> &&
                            has_locally_owned_domain_indices<MatrixType>,
                          MatrixType>::type * = nullptr>

Since this is an internal error of the compiler itself, it seems to be a compiler bug. Any idea how to work around it?
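For readers who want to reproduce the construct in isolation, here is a minimal sketch of the pattern: member-detection traits exposed as `constexpr bool` variable templates and combined inside `std::enable_if`. The trait names mirror those in the error message, but everything else (the `make_void` helper, the `detect_*` structs, `ParallelMatrix`, `SerialMatrix`, `select_overload`) is a hypothetical stand-in and not deal.II's actual implementation.

```cpp
#include <type_traits>
#include <utility>

namespace sketch_pattern
{
  // C++14 stand-in for std::void_t (hypothetical helper).
  template <typename...>
  struct make_void
  {
    using type = void;
  };
  template <typename... Ts>
  using void_t = typename make_void<Ts...>::type;

  // Detect a const member function get_mpi_communicator().
  template <typename T, typename = void>
  struct detect_communicator : std::false_type
  {};
  template <typename T>
  struct detect_communicator<
    T,
    void_t<decltype(std::declval<const T &>().get_mpi_communicator())>>
    : std::true_type
  {};

  // Detect a const member function locally_owned_domain_indices().
  template <typename T, typename = void>
  struct detect_domain_indices : std::false_type
  {};
  template <typename T>
  struct detect_domain_indices<
    T,
    void_t<decltype(std::declval<const T &>().locally_owned_domain_indices())>>
    : std::true_type
  {};

  // Variable templates in the style used in the failing line.
  template <typename T>
  constexpr bool has_get_mpi_communicator = detect_communicator<T>::value;
  template <typename T>
  constexpr bool has_locally_owned_domain_indices =
    detect_domain_indices<T>::value;

  // Hypothetical matrix types used only for this illustration.
  struct ParallelMatrix
  {
    int get_mpi_communicator() const { return 0; }
    int locally_owned_domain_indices() const { return 0; }
  };
  struct SerialMatrix
  {};

  // Chosen when both member functions exist.
  template <
    typename MatrixType,
    typename std::enable_if<has_get_mpi_communicator<MatrixType> &&
                              has_locally_owned_domain_indices<MatrixType>,
                            MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 1;
  }

  // Chosen otherwise.
  template <
    typename MatrixType,
    typename std::enable_if<!(has_get_mpi_communicator<MatrixType> &&
                              has_locally_owned_domain_indices<MatrixType>),
                            MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 2;
  }
} // namespace sketch_pattern
```

With a conforming C++14 compiler, `select_overload(ParallelMatrix{})` picks the first overload and `select_overload(SerialMatrix{})` the second.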

@marcfehling marcfehling added this to the Release 9.4 milestone May 25, 2022
@marcfehling marcfehling changed the title Compilation fails for Intel18 Compilation fails for Intel 18 May 25, 2022
@marcfehling
Member Author

Maybe reverting parts of #13291 and #13320 will allow compilation again.

@bangerth
Member

You could try and see whether it makes a difference if you create a local static constexpr variable above that function that holds the result of has_get_mpi_communicator<MatrixType> && has_locally_owned_domain_indices<MatrixType> , and then use that variable in the enable_if.
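One possible reading of this suggestion, as a rough sketch: the conjunction is hoisted into a single named `constexpr` variable template, and only that name appears inside `std::enable_if`. The trait values are hard-coded here for brevity, and `has_communicator_and_indices`, `ParallelMatrix`, `SerialMatrix`, and `select_overload` are hypothetical names chosen for illustration.

```cpp
#include <type_traits>

namespace sketch_hoisted
{
  // Hypothetical matrix types used only for this illustration.
  struct ParallelMatrix
  {};
  struct SerialMatrix
  {};

  // Trait values are hard-coded via explicit specialization; a real trait
  // would detect the member functions instead.
  template <typename T>
  constexpr bool has_get_mpi_communicator = false;
  template <>
  constexpr bool has_get_mpi_communicator<ParallelMatrix> = true;

  template <typename T>
  constexpr bool has_locally_owned_domain_indices = false;
  template <>
  constexpr bool has_locally_owned_domain_indices<ParallelMatrix> = true;

  // The conjunction gets its own name, declared above the function ...
  template <typename MatrixType>
  constexpr bool has_communicator_and_indices =
    has_get_mpi_communicator<MatrixType> &&
    has_locally_owned_domain_indices<MatrixType>;

  // ... so that enable_if no longer contains the && expression.
  template <typename MatrixType,
            typename std::enable_if<has_communicator_and_indices<MatrixType>,
                                    MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 1;
  }

  template <typename MatrixType,
            typename std::enable_if<!has_communicator_and_indices<MatrixType>,
                                    MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 2;
  }
} // namespace sketch_hoisted
```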

@marcfehling
Member Author

> You could try and see whether it makes a difference if you create a local static constexpr variable above that function that holds the result of has_get_mpi_communicator<MatrixType> && has_locally_owned_domain_indices<MatrixType> , and then use that variable in the enable_if.

That does not work, unfortunately. I swapped positions of this and the following function, as the latter doesn't use the && operator. Still the same problem.

In file included from /dealii/include/deal.II/grid/tria_description.h(27),
                 from /dealii/include/deal.II/grid/tria.h(29),
                 from /dealii/include/deal.II/distributed/tria_base.h(28),
                 from /dealii/include/deal.II/dofs/dof_handler.h(29),
                 from /dealii/include/deal.II/dofs/dof_accessor.h(22),
                 from /dealii/source/numerics/data_out.cc(18):
/dealii/include/deal.II/lac/la_parallel_vector.h(1940): internal error: assertion failed at: "shared/cfe/edgcpfe/class_decl.c", line 3591

                  typename std::enable_if<has_initialize_dof_vector<MatrixType>,
                                          ^

compilation aborted for /dealii/source/numerics/data_out.cc (code 4)

I will try to revert the old PRs partially and then add some preprocessor conditional with respect to the Intel compiler version. This way we still have the new, elegant way, and the old, compatible implementation. Plus we can easily drop the legacy code when we decide to no longer support Intel 18 (or rather when the XSDK group decides it is time).
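A minimal sketch of what such a version-dependent conditional could look like. It relies on the `__INTEL_COMPILER` macro that the classic Intel compilers predefine (1800 for version 18.0, 1900 for version 19.0). The macro `SKETCH_INTEL_PRE_19`, the types, and both code paths are hypothetical, and the "legacy" branch is only an assumption about what a compatible formulation might look like, not the code from the reverted PRs.

```cpp
#include <type_traits>

// Classic Intel compilers (icc/icpc) predefine __INTEL_COMPILER.
#if defined(__INTEL_COMPILER) && (__INTEL_COMPILER < 1900)
#  define SKETCH_INTEL_PRE_19 1
#else
#  define SKETCH_INTEL_PRE_19 0
#endif

namespace sketch_version
{
  // Hypothetical matrix types used only for this illustration.
  struct ParallelMatrix
  {};
  struct SerialMatrix
  {};

  // Struct-based trait with a hard-coded answer, standing in for a real
  // member-function detector.
  template <typename T>
  struct detect_communicator : std::false_type
  {};
  template <>
  struct detect_communicator<ParallelMatrix> : std::true_type
  {};

#if SKETCH_INTEL_PRE_19
  // Assumed legacy path: use the struct's ::value directly.
  template <typename MatrixType,
            typename std::enable_if<detect_communicator<MatrixType>::value,
                                    MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 1;
  }

  template <typename MatrixType,
            typename std::enable_if<!detect_communicator<MatrixType>::value,
                                    MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 2;
  }
#else
  // Newer path: go through a constexpr variable template.
  template <typename T>
  constexpr bool has_get_mpi_communicator = detect_communicator<T>::value;

  template <typename MatrixType,
            typename std::enable_if<has_get_mpi_communicator<MatrixType>,
                                    MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 1;
  }

  template <typename MatrixType,
            typename std::enable_if<!has_get_mpi_communicator<MatrixType>,
                                    MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 2;
  }
#endif
} // namespace sketch_version
```

Both branches select the same overloads, so the legacy branch can be deleted later without touching any callers.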

@bangerth
Member

I played with this a bit earlier and pretty much instantly got into hot water when trying to replace std::enable_if by std::enable_if_t. I think it is because in essence we are doing this:

// function overload 1
template <typename MatrixType,
          typename std::enable_if<cond1>::type * = nullptr>
void
foo(MatrixType &                                            mat,
    LinearAlgebra::distributed::Vector<Number, MatrixType> &vec,
    bool /*omit_zeroing_entries*/);

// function overload 2
template <typename MatrixType,
          typename std::enable_if<cond2, MatrixType>::type * = nullptr>
void
foo(MatrixType &                                mat,
    LinearAlgebra::distributed::Vector<Number> &vec,
    bool /*omit_zeroing_entries*/);

These functions have the same signature, and if there are template arguments for which cond1 and cond2 are both true, the compiler complains that we end up with a redeclaration of a function, which is not allowed.

Could you try something like

// function overload 1
template <typename MatrixType>
void
foo(MatrixType &                                mat,
    LinearAlgebra::distributed::Vector<Number> &vec,
    bool /*omit_zeroing_entries*/,
    typename std::enable_if<cond1>::type * = nullptr);

instead?
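A compilable sketch of this variant, with the `std::enable_if` moved from the template parameter list into a defaulted trailing function parameter. The condition `cond` (with hard-coded values), the matrix types, and the simplified signature of `foo` are hypothetical placeholders for `cond1`, `cond2`, and the real arguments.

```cpp
#include <type_traits>

namespace sketch_param
{
  // Hypothetical matrix types used only for this illustration.
  struct ParallelMatrix
  {};
  struct SerialMatrix
  {};

  // Placeholder condition with hard-coded values.
  template <typename T>
  constexpr bool cond = false;
  template <>
  constexpr bool cond<ParallelMatrix> = true;

  // Overload 1: enabled when cond<MatrixType> is true.
  template <typename MatrixType>
  constexpr int
  foo(const MatrixType &,
      bool /*omit_zeroing_entries*/,
      typename std::enable_if<cond<MatrixType>>::type * = nullptr)
  {
    return 1;
  }

  // Overload 2: enabled when cond<MatrixType> is false, so the two
  // overloads are never viable at the same time.
  template <typename MatrixType>
  constexpr int
  foo(const MatrixType &,
      bool /*omit_zeroing_entries*/,
      typename std::enable_if<!cond<MatrixType>>::type * = nullptr)
  {
    return 2;
  }
} // namespace sketch_param
```

Callers pass only the first two arguments; the defaulted pointer parameter exists solely to trigger SFINAE.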

@marcfehling
Member Author

marcfehling commented May 27, 2022

> These functions have the same signature, and if there are template arguments so that both cond1 and cond2 are both true, then the compiler complained that we end up with a redeclaration of a function, which is not allowed.

I still get the same error if I get rid of the second overload entirely. Moving the enable_if statement into a function parameter also didn't work.

However, getting rid of the static constexpr variables and writing for example

std::enable_if<is_supported_operation<get_mpi_communicator_t, MatrixType>

instead of

std::enable_if<has_get_mpi_communicator<MatrixType>

worked!

Do you prefer getting rid of the static constexpr variables entirely, or shall I write a conditional as described in #13821 (comment)?

I will apply the changes and see if it's going to work for all cases.
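To illustrate the difference between the two spellings, here is a sketch using a generic detection idiom. The names `is_supported_operation`, `get_mpi_communicator_t`, and `has_get_mpi_communicator` are taken from the snippets above, but their definitions below are assumptions written for this example rather than deal.II's code, and `ParallelMatrix`, `SerialMatrix`, and `select_overload` are hypothetical.

```cpp
#include <type_traits>
#include <utility>

namespace sketch_inline
{
  // C++14 stand-in for std::void_t (hypothetical helper).
  template <typename...>
  struct make_void
  {
    using type = void;
  };
  template <typename... Ts>
  using void_t = typename make_void<Ts...>::type;

  // Generic detector: true if Op<Args...> is a well-formed type.
  template <typename Void, template <typename...> class Op, typename... Args>
  struct detector : std::false_type
  {};
  template <template <typename...> class Op, typename... Args>
  struct detector<void_t<Op<Args...>>, Op, Args...> : std::true_type
  {};

  template <template <typename...> class Op, typename... Args>
  constexpr bool is_supported_operation = detector<void, Op, Args...>::value;

  // The operation to probe for: calling get_mpi_communicator() on a const T.
  template <typename T>
  using get_mpi_communicator_t =
    decltype(std::declval<const T &>().get_mpi_communicator());

  // Convenience alias of the kind that was removed from the enable_if.
  template <typename T>
  constexpr bool has_get_mpi_communicator =
    is_supported_operation<get_mpi_communicator_t, T>;

  // Hypothetical matrix types used only for this illustration.
  struct ParallelMatrix
  {
    int get_mpi_communicator() const { return 0; }
  };
  struct SerialMatrix
  {};

  // The enable_if names is_supported_operation directly instead of going
  // through the has_get_mpi_communicator convenience variable.
  template <typename MatrixType,
            typename std::enable_if<
              is_supported_operation<get_mpi_communicator_t, MatrixType>,
              MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 1;
  }

  template <typename MatrixType,
            typename std::enable_if<
              !is_supported_operation<get_mpi_communicator_t, MatrixType>,
              MatrixType>::type * = nullptr>
  constexpr int select_overload(const MatrixType &)
  {
    return 2;
  }
} // namespace sketch_inline
```

On a conforming compiler both spellings yield the same value, so the choice between them only matters as a workaround.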

@marcfehling
Member Author

When compiling the examples, I am now facing linker problems:

../lib/libdeal_II.so.9.4.0-pre: error: undefined reference to 'void dealii::Utilities::MPI::internal::all_reduce<bool>(int const&, dealii::ArrayView<bool const, dealii::MemorySpace::Host> const&, int const&, dealii::ArrayView<bool, dealii::MemorySpace::Host> const&)'

@tjhei
Member

tjhei commented May 27, 2022

> I am now facing linker problems:

This looks like #13794, which should have been fixed already.

@marcfehling
Member Author

marcfehling commented May 27, 2022

> This looks like #13794, which should have been fixed already.

Thank you for the hint! #13794 is already part of my feature branch, but it didn't fix the problem. Interestingly, linking succeeds with Intel 19 - it's just Intel 18 that causes trouble.

I will double-check that my configuration is correct.

@marcfehling
Member Author

Actually, Intel 19 bails out with a different error message when building in release mode:

          ": internal error: ** The compiler has encountered an unexpected problem.
** Segmentation violation signal raised. **
Access violation or stack overflow. Please contact Intel Support for assistance.

icpc: error #10014: problem during multi-file optimization compilation (code 4)

@marcfehling
Member Author

I take everything back. My feature branch did not include #13794.

@bangerth
Member

What's left here? Did #13838 and #13841 fix this?

@marcfehling
Member Author

I encountered one more (and hopefully the last) linker error with both Intel 18 and Intel 19.

 /opt/apps/gcc/6.3.0/bin/ld.gold: error: ../lib/libdeal_II.g.so.9.4.0-pre: bad symbol name offset 947623309 at 0

But I have no idea how to figure out the cause for this one.

@tjhei
Member

tjhei commented May 31, 2022

Can you try without the gold linker? Also try deleting all intermediate files and building again. I think I had something like this before, where one of the artifacts was garbage left over from an earlier unsuccessful build.

@marcfehling
Member Author

Thanks for the hint @tjhei.

It indeed works when disabling gold by adding -D DEAL_II_COMPILER_HAS_FUSE_LD_GOLD=OFF to the cmake configuration, for both Intel 18 and Intel 19.

I guess we can close the issue now.


For completeness, these are the errors I get with gold enabled:

  • Intel 18:
/opt/apps/gcc/6.3.0/bin/ld.gold: error: ../lib/libdeal_II.so.9.4.0-pre: bad symbol name offset 1196708948 at 0
  • Intel 19:
          ": internal error: ** The compiler has encountered an unexpected problem.
** Segmentation violation signal raised. **
Access violation or stack overflow. Please contact Intel Support for assistance.

icpc: error #10014: problem during multi-file optimization compilation (code 4)
