Failure when building Trilinos matrices with more than 2B entries #13428

Closed
bangerth opened this issue Feb 21, 2022 · 12 comments · Fixed by #13445 or #13452

@bangerth
Member

bangerth commented Feb 21, 2022

@marcfehling ran into a problem where he has a matrix with 15,110,817 rows and columns, and 2,280,732,000 entries (just barely more than the 2,147,483,647 that form the upper end of what a signed int can represent). This then leads to a segfault as follows:

Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1554af260000)
 4  /expanse/lustre/projects/ucd150/mfehling/bin/aocc/trilinos-13.0.1/lib/libepetra.so.13(Epetra_CrsGraph::OptimizeStorage()+0x4e7) [0x155543905307]
 5  /home/mfehling/project/bin/aocc/dealii-10.0.0-pre/lib/libdeal_II.so.10.0.0-pre(dealii::TrilinosWrappers::SparsityPattern::compress()+0x110) [0x155550e8ec30]
 6  /home/mfehling/project/bin/aocc/dealii-10.0.0-pre/lib/libdeal_II.so.10.0.0-pre(dealii::BlockSparsityPatternBase<dealii::TrilinosWrappers::SparsityPattern>::compress()+0x6d) [0x155550cfb05d]
 7  /expanse/lustre/projects/ucd150/mfehling/build-parallel-matrixfree-hp/stokes(Problem::Stokes<3, Trilinos, 3>::setup_system()+0xe98) [0x43a9f8]
 8  /expanse/lustre/projects/ucd150/mfehling/build-parallel-matrixfree-hp/stokes(Problem::Stokes<3, Trilinos, 3>::run()+0x45f) [0x4390cf]
 9  /expanse/lustre/projects/ucd150/mfehling/build-parallel-matrixfree-hp/stokes(main+0x2a1) [0x33cbd1]
10  /lib64/libc.so.6(__libc_start_main+0xf3) [0x15553f7f5873]
11  /expanse/lustre/projects/ucd150/mfehling/build-parallel-matrixfree-hp/stokes(_start+0x2e) [0x33c86e]

He reports that this happens both when compiling in 32-bit and in 64-bit mode.

We should take a look at whether we call one of Trilinos's 32-bit functions by accident where there is a 64-bit function we should be calling instead.
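For reference, here is a minimal standalone sketch (my own illustration using the numbers from the report above, not deal.II or Trilinos code) of what happens when an entry count of this size ends up in a 32-bit signed int:

#include <climits>
#include <cstdint>
#include <iostream>

int main()
{
  // Numbers from the report above: ~15.1M rows with ~151 entries per row.
  const std::int64_t n_entries = 2280732000LL;

  std::cout << "INT_MAX:       " << INT_MAX << '\n';   // 2147483647
  std::cout << "n_entries:     " << n_entries << '\n'; // 2280732000

  // Storing the count in a 32-bit signed int wraps around to a negative
  // value -- exactly the kind of bogus size that can lead to a segfault
  // further down the line.
  const int as_int32 = static_cast<int>(n_entries);
  std::cout << "stored in int: " << as_int32 << '\n';  // -2014235296
}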

@marcfehling
Member

I triggered the segfault with a single MPI process.

The program gets past the setup_system part with more than four processes. With two processes I get an unknown exception, which may or may not be related to this segfault. I encountered this issue while generating strong scaling results.

@bangerth
Member Author

bangerth commented Feb 21, 2022 via email

@bangerth
Member Author

The following test fails; it turns out that Trilinos throws an integer exception with error code -99:

#include <deal.II/base/mpi.h>
#include <deal.II/lac/trilinos_sparsity_pattern.h>

#include "../tests.h" // deal.II test-suite harness: provides initlog(), deallog, testing_max_num_threads()

void
test()
{
  TrilinosWrappers::SparsityPattern sp;
  const unsigned int                N = 10000000; // ten million
  const unsigned int                entries_per_row =
    220; // so that we just get over the 2.1B limit for MAX_INT

  sp.reinit(N, N, entries_per_row);
  deallog << "SP::is_compressed(): " << sp.is_compressed() << std::endl;

  for (unsigned int i = 0; i < N; ++i)
    for (unsigned int j = 0; j < entries_per_row; ++j)
      sp.add(i, j);

  deallog << "SP::is_compressed(): " << sp.is_compressed() << std::endl;

  try
    {
      sp.compress();
    }

  catch (const int i)
    {
      deallog << "sp.compress() ended with an integer exception with error code "
              << i
              << std::endl;
      return;
    }

  deallog << "SP::is_compressed(): " << sp.is_compressed() << std::endl;
  deallog << "Number of entries: " << sp.n_nonzero_elements() << std::endl;
  deallog << "Number of rows: " << sp.n_rows() << std::endl;
  deallog << "Number of columns: " << sp.n_cols() << std::endl;
  deallog << "Local size: " << sp.local_size() << std::endl;
  deallog << "Max row length: " << sp.max_entries_per_row() << std::endl;
  deallog << "SP::row_length(0): " << sp.row_length(0) << std::endl;
  deallog << "Bandwidth: " << sp.bandwidth() << std::endl;
  deallog << "SP::empty(): " << sp.empty() << std::endl;

  deallog << "OK" << std::endl;
}



int
main(int argc, char **argv)
{
  initlog();

  Utilities::MPI::MPI_InitFinalize mpi_initialization(
    argc, argv, testing_max_num_threads());

  try
    {
      test();
    }
  catch (std::exception &exc)
    {
      deallog << std::endl
              << std::endl
              << "----------------------------------------------------"
              << std::endl;
      deallog << "Exception on processing: " << std::endl
              << exc.what() << std::endl
              << "Aborting!" << std::endl
              << "----------------------------------------------------"
              << std::endl;

      return 1;
    }
  catch (...)
    {
      deallog << std::endl
              << std::endl
              << "----------------------------------------------------"
              << std::endl;
      deallog << "Unknown exception!" << std::endl
              << "Aborting!" << std::endl
              << "----------------------------------------------------"
              << std::endl;
      return 1;
    };
}

The test requires about 9 GB of memory and runs for around 4 minutes on my main machine.

@jpthiele
Contributor

I might be completely mistaken, but isn't the local ordinal type of Epetra hard-coded to int?
That would explain why it is fine on 4 processes, as all local indices then fit into an int, with the maximum somewhere around 0.52B local entries per process
instead of the 2.1B local entries on a single process.

@Rombur
Member

Rombur commented Feb 23, 2022

I might be completely mistaken, but isn't the local ordinal type of Epetra hard-coded to int?

Yes, that's correct. You cannot have more than 2B entries per MPI rank.
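A rough back-of-the-envelope check with the numbers from this report, assuming entries are split roughly evenly across ranks (which would be consistent with four or more processes getting past setup_system()):

#include <climits>
#include <cstdint>
#include <iostream>

int main()
{
  const std::int64_t n_entries = 2280732000LL; // total entries in the report
  const int          ranks[]   = {1, 2, 4, 8};

  for (const int n_ranks : ranks)
    {
      const std::int64_t per_rank = n_entries / n_ranks;
      std::cout << n_ranks << " rank(s): ~" << per_rank << " local entries -> "
                << (per_rank <= INT_MAX ? "fits in" : "overflows")
                << " a signed 32-bit int\n";
    }
}

With an uneven distribution the per-rank count can of course be higher, which might explain the unknown exception seen with two processes.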

@bangerth
Member Author

Then that's something we should check on our side :-)

@jpthiele
Contributor

There is already a ToDo in the corresponding case that might be related.

@jpthiele
Contributor

I'm not sure which point would be best for adding the check/assert, but apart from that it might be a quick addition.
My idea for an AssertThrow ExcMessage would be something like
"The TrilinosWrappers use Epetra internally, which uses signed int for local indices.
Therefore, a run with more than about 2.1B nonzero matrix entries per MPI process is not possible.
If possible, use more MPI processes."
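A minimal sketch of what such a guard could look like (the helper function and its argument are hypothetical; in deal.II itself the check would live somewhere in the Trilinos wrappers, and the actual patch may differ):

#include <deal.II/base/exceptions.h>

#include <cstdint>
#include <limits>

// Hypothetical helper; AssertThrow and ExcMessage are deal.II's standard
// error-reporting facilities.
void check_local_entry_count(const std::uint64_t n_local_nonzero)
{
  AssertThrow(n_local_nonzero <=
                static_cast<std::uint64_t>(std::numeric_limits<int>::max()),
              dealii::ExcMessage(
                "The TrilinosWrappers use Epetra internally, and Epetra uses "
                "a signed int for local indices. You can therefore not have "
                "more than about 2.1 billion nonzero matrix entries per MPI "
                "process. If possible, use more MPI processes."));
}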

@bangerth
Member Author

As pointed out in #13445, the real application doesn't actually go through the paths checked there. Rather, when we call make_sparsity_pattern() on Trilinos sparsity patterns, we treat the sparsity pattern as "dynamic", and add entries via this function:

  template <typename ForwardIterator>
  inline void
  SparsityPattern::add_entries(const size_type row,
                               ForwardIterator begin,
                               ForwardIterator end,
                               const bool /*indices_are_sorted*/)
  {
    if (begin == end)
      return;

    // verify that the size of the data type Trilinos expects matches that of
    // the type the iterator points to. we allow for some slippage between
    // signed and unsigned and only compare that they are both either 32 or
    // 64 bit. to write this test properly, note that we cannot compare the
    // size of '*begin' because 'begin' may be an iterator and '*begin' may be
    // an accessor class. consequently, we need to somehow get an actual value
    // from it, which we can do by evaluating an expression such as
    // multiplying the value by 2
    Assert(sizeof(TrilinosWrappers::types::int_type) == sizeof((*begin) * 2),
           ExcNotImplemented());

    TrilinosWrappers::types::int_type *col_index_ptr =
      reinterpret_cast<TrilinosWrappers::types::int_type *>(
        const_cast<typename std::decay<decltype(*begin)>::type *>(&*begin));
    // Check at least for the first index that the conversion actually works
    AssertDimension(*col_index_ptr, *begin);
    TrilinosWrappers::types::int_type trilinos_row_index = row;
    const int                         n_cols = static_cast<int>(end - begin);

    int ierr;
    if (row_is_stored_locally(row))
      ierr =
        graph->InsertGlobalIndices(trilinos_row_index, n_cols, col_index_ptr);
    else if (nonlocal_graph.get() != nullptr)
      {
        // this is the case when we have explicitly set the off-processor rows
        // and want to create a separate matrix object for them (to retain
        // thread-safety)
        Assert(nonlocal_graph->RowMap().LID(
                 static_cast<TrilinosWrappers::types::int_type>(row)) != -1,
               ExcMessage("Attempted to write into off-processor matrix row "
                          "that has not been specified as being writable upon "
                          "initialization"));
        ierr = nonlocal_graph->InsertGlobalIndices(trilinos_row_index,
                                                   n_cols,
                                                   col_index_ptr);
      }
    else
      ierr = graph->InsertGlobalIndices(1,
                                        &trilinos_row_index,
                                        n_cols,
                                        col_index_ptr);

    AssertThrow(ierr >= 0, ExcTrilinosError(ierr));
  }

The problem is that I can't easily check here whether we overflow the allowed number of entries in the sparsity pattern, because we may be adding entries that are already in the sparsity pattern -- not every entry so added is a new one. What's worse, the error actually only happens at a later time, namely in the call to Epetra_CrsGraph::OptimizeStorage(), which I take to mean that the calls above copy new entries into some sort of cache that is only compressed when calling Epetra_CrsGraph::OptimizeStorage(). So I don't see a good place to check whether we're overflowing int.

What I should probably do is wrap the call of Epetra_CrsGraph::OptimizeStorage() into a try-catch block and ensure that we get a reasonable error message if there is an exception. Does that sound reasonable?
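For the record, a minimal sketch of that idea (the actual patch may well look different), where graph is the Epetra graph object used in the snippet above:

  // inside TrilinosWrappers::SparsityPattern::compress():
  try
    {
      const int ierr = graph->OptimizeStorage();
      AssertThrow(ierr == 0, ExcTrilinosError(ierr));
    }
  catch (const int error_code)
    {
      AssertThrow(false,
                  ExcMessage(
                    "Epetra_CrsGraph::OptimizeStorage() threw an integer "
                    "exception with error code " +
                    std::to_string(error_code) +
                    ". This typically indicates that the sparsity pattern "
                    "has more nonzero entries on this MPI process than a "
                    "signed 32-bit int can represent."));
    }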

@jpthiele
Contributor

jpthiele commented Feb 25, 2022

From what I read in the Epetra_CrsGraph docs and code, your assumption about the buffer is correct.
I think there is either an error or a relatively easy fix in FillComplete():
when computing the local index via int LID = Epetra_BlockMap::LID(gid), the code only checks for the error code -1.
It could instead check (LID < 0) at about the same cost and return an error at that point.
Edit: I opened a Trilinos issue to ask whether there is any reason not to check this instead.
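A small self-contained illustration of that suggestion; epetra_like_lid() below is a stand-in I made up, not the actual Epetra_BlockMap::LID() implementation:

#include <iostream>

// Stand-in for Epetra_BlockMap::LID(): it returns an int, with -1 as the
// documented "not found" value, but once an internal 32-bit counter has
// overflowed, other negative values can show up as well.
int epetra_like_lid(const long long gid)
{
  return static_cast<int>(gid); // 2280732000 wraps to a negative value
}

int check_row(const long long gid)
{
  const int LID = epetra_like_lid(gid);

  // current check:   if (LID == -1) return -1;
  // suggested check, about the same cost, also catches overflow-induced
  // negative values:
  if (LID < 0)
    return -1;

  return LID;
}

int main()
{
  std::cout << check_row(2280732000LL) << '\n'; // prints -1 with the new check
}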

@bangerth
Member Author

I've got some more explanations in #13452. Fundamentally, there are two Trilinos issues:

  • The addition of nonzero entries inside Epetra_CrsGraph overflows CrsGraphData_->NumMyNonzeros_, at which point it becomes a negative number.
  • The Epetra_CrsGraph::OptimizeStorage() function should return error codes as regular return values, but instead chooses to throw an exception. That's just poor design.

@marcfehling
Member

I agree with @bangerth that we should merge both patches #13445 and #13452.

Are there other opinions on the topic?
