
Remove parallel::TriangulationBase::compute_n_locally_owned_active_cells_per_processor #9945

Conversation

peterrum
Member

In 5eb3561, we replaced the method n_locally_owned_active_cells_per_processor() by compute_n_locally_owned_active_cells_per_processor(), since we did not want to store the information in the NumberCache anymore (it is too expensive for large simulations). I think it would be an option to remove the new function, since the same effect can be achieved in user code with the following one-liner:

Utilities::MPI::all_gather(tr.get_communicator(), tr.n_locally_owned_active_cells());

I am not sure about backwards compatibility, but I guess the change in 5eb3561 was not backwards compatible either.
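
As a minimal sketch of the proposed replacement in user code (assuming a parallel::distributed::Triangulation<dim> object named tr; the variable names are only for illustration):

// deprecated:
//   const std::vector<unsigned int> n_cells_per_proc =
//     tr.compute_n_locally_owned_active_cells_per_processor();

// suggested one-liner instead:
const std::vector<unsigned int> n_cells_per_proc =
  Utilities::MPI::all_gather(tr.get_communicator(),
                             tr.n_locally_owned_active_cells());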

@kronbichler
Member

This touches on the bigger question of the same function in the DoFHandler:

std::vector<types::global_dof_index>
compute_n_locally_owned_dofs_per_processor() const;

This was introduced in #8298, so we have not released that part either, and we could replace it as well. The only nuisance is the other function
std::vector<IndexSet>
compute_locally_owned_dofs_per_processor() const;

where we get an IndexSet, and I am not sure whether all_gather works for that type. In case it does, we should remove this lengthy implementation
std::vector<IndexSet>
NumberCache::get_locally_owned_dofs_per_processor(
  const MPI_Comm mpi_communicator) const
{
  AssertDimension(locally_owned_dofs.size(), n_global_dofs);
  const unsigned int n_procs =
    Utilities::MPI::job_supports_mpi() ?
      Utilities::MPI::n_mpi_processes(mpi_communicator) :
      1;
  if (n_global_dofs == 0)
    return std::vector<IndexSet>();
  else if (locally_owned_dofs_per_processor.empty() == false)
    {
      AssertDimension(locally_owned_dofs_per_processor.size(), n_procs);
      return locally_owned_dofs_per_processor;
    }
  else
    {
      std::vector<IndexSet> locally_owned_dofs_per_processor(
        n_procs, locally_owned_dofs);

#ifdef DEAL_II_WITH_MPI
      if (n_procs > 1)
        {
          // this step is substantially more complicated because indices
          // might be distributed arbitrarily among the processors. Here we
          // have to serialize the IndexSet objects and ship them across the
          // network.
          std::vector<char> my_data;
          {
#  ifdef DEAL_II_WITH_ZLIB
            boost::iostreams::filtering_ostream out;
            out.push(boost::iostreams::gzip_compressor(
              boost::iostreams::gzip_params(
                boost::iostreams::gzip::best_speed)));
            out.push(boost::iostreams::back_inserter(my_data));

            boost::archive::binary_oarchive archive(out);
            archive << locally_owned_dofs;
            out.flush();
#  else
            std::ostringstream              out;
            boost::archive::binary_oarchive archive(out);
            archive << locally_owned_dofs;
            const std::string &s = out.str();
            my_data.reserve(s.size());
            my_data.assign(s.begin(), s.end());
#  endif
          }

          // determine maximum size of IndexSet
          const unsigned int max_size =
            Utilities::MPI::max(my_data.size(), mpi_communicator);

          // as the MPI_Allgather call will be reading max_size elements,
          // and as this may be past the end of my_data, we need to increase
          // the size of the local buffer. This is filled with zeros.
          my_data.resize(max_size);

          std::vector<char> buffer(max_size * n_procs);
          const int         ierr = MPI_Allgather(my_data.data(),
                                                 max_size,
                                                 MPI_BYTE,
                                                 buffer.data(),
                                                 max_size,
                                                 MPI_BYTE,
                                                 mpi_communicator);
          AssertThrowMPI(ierr);

          for (unsigned int i = 0; i < n_procs; ++i)
            if (i == Utilities::MPI::this_mpi_process(mpi_communicator))
              locally_owned_dofs_per_processor[i] = locally_owned_dofs;
            else
              {
                // copy the data previously received into a stringstream
                // object and then read the IndexSet from it
                std::string decompressed_buffer;

                // first decompress the buffer
                {
#  ifdef DEAL_II_WITH_ZLIB
                  boost::iostreams::filtering_ostream decompressing_stream;
                  decompressing_stream.push(
                    boost::iostreams::gzip_decompressor());
                  decompressing_stream.push(
                    boost::iostreams::back_inserter(decompressed_buffer));
                  decompressing_stream.write(&buffer[i * max_size], max_size);
#  else
                  decompressed_buffer.assign(&buffer[i * max_size], max_size);
#  endif
                }

                // then restore the object from the buffer
                std::istringstream              in(decompressed_buffer);
                boost::archive::binary_iarchive archive(in);

                archive >> locally_owned_dofs_per_processor[i];
              }
        }
#endif
      return locally_owned_dofs_per_processor;
    }
}

and just kick out the functions altogether.
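
If all_gather indeed handles IndexSet (as checked below), a rough sketch of how the body above could collapse, keeping the same NumberCache members, might look like this:

std::vector<IndexSet>
NumberCache::get_locally_owned_dofs_per_processor(
  const MPI_Comm mpi_communicator) const
{
  AssertDimension(locally_owned_dofs.size(), n_global_dofs);
  if (n_global_dofs == 0)
    return std::vector<IndexSet>();

  // Boost serialization inside Utilities::MPI::all_gather() packs and
  // unpacks the variable-size IndexSet objects for us.
  return Utilities::MPI::all_gather(mpi_communicator, locally_owned_dofs);
}

This is only a sketch; the early return for a pre-filled locally_owned_dofs_per_processor cache is dropped on purpose, since the point of the change is to stop storing that vector.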

@peterrum
Member Author

@kronbichler After a quick look at the implementation of Utilities::MPI::all_gather(), I am quite sure it also works for IndexSet: it goes through Boost serialization and can handle variable-size quantities.

See:

template <typename T>
std::vector<T>
all_gather(const MPI_Comm &comm, const T &object)
{
#  ifndef DEAL_II_WITH_MPI
  (void)comm;
  std::vector<T> v(1, object);
  return v;
#  else
  const auto n_procs = dealii::Utilities::MPI::n_mpi_processes(comm);

  std::vector<char> buffer = Utilities::pack(object);

  int n_local_data = buffer.size();

  // Vector to store the size of loc_data_array for every process
  std::vector<int> size_all_data(n_procs, 0);

  // Exchanging the size of each buffer
  MPI_Allgather(
    &n_local_data, 1, MPI_INT, size_all_data.data(), 1, MPI_INT, comm);

  // Now computing the displacement, relative to recvbuf,
  // at which to store the incoming buffer
  std::vector<int> rdispls(n_procs);
  rdispls[0] = 0;
  for (unsigned int i = 1; i < n_procs; ++i)
    rdispls[i] = rdispls[i - 1] + size_all_data[i - 1];

  // Step 3: exchange the buffer:
  std::vector<char> received_unrolled_buffer(rdispls.back() +
                                             size_all_data.back());

  MPI_Allgatherv(buffer.data(),
                 n_local_data,
                 MPI_CHAR,
                 received_unrolled_buffer.data(),
                 size_all_data.data(),
                 rdispls.data(),
                 MPI_CHAR,
                 comm);

  std::vector<T> received_objects(n_procs);
  for (unsigned int i = 0; i < n_procs; ++i)
    {
      std::vector<char> local_buffer(received_unrolled_buffer.begin() +
                                       rdispls[i],
                                     received_unrolled_buffer.begin() +
                                       rdispls[i] + size_all_data[i]);
      received_objects[i] = Utilities::unpack<T>(local_buffer);
    }

  return received_objects;
#  endif
}
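
For reference, a short usage sketch gathering an IndexSet with this function (dof_handler and mpi_communicator are placeholder names):

// each process contributes its locally owned IndexSet; every process
// receives a vector with one IndexSet per MPI rank
const IndexSet              my_dofs = dof_handler.locally_owned_dofs();
const std::vector<IndexSet> owned_dofs_per_proc =
  Utilities::MPI::all_gather(mpi_communicator, my_dofs);
AssertDimension(owned_dofs_per_proc.size(),
                Utilities::MPI::n_mpi_processes(mpi_communicator));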

@peterrum
Member Author

peterrum commented Apr 25, 2020

Should I include compute_n_locally_owned_dofs_per_processor() and compute_locally_owned_dofs_per_processor() in this PR as well?

@kronbichler
Member

kronbichler commented Apr 25, 2020

They are all of the same kind, so I would say we either remove all or we remove none. I would vote for removing them all.

@peterrum
Member Author

@kronbichler Do you mean something like this: c41b635? Currently 57 tests fail (they use one of the compute_ functions). Before I tackle these and modify the hp::DoFHandler in the same way, I would like to know whether this is what you had in mind.


@kronbichler (Member) left a comment


This looks almost exactly like what I had in mind, except that I would purge the functionality from NumberCache and simply let the DoFHandler compute that info. The function is deprecated, and I think it would help the later clean-up if no other class/struct were involved any more.

Comment on lines 1447 to 1467
      if (number_cache.n_locally_owned_dofs_per_processor.empty() &&
          number_cache.n_global_dofs > 0)
        {
          MPI_Comm comm;

          const parallel::TriangulationBase<dim, spacedim> *tr =
            (dynamic_cast<const parallel::TriangulationBase<dim, spacedim> *>(
              &this->get_triangulation()));
          if (tr != nullptr)
            comm = tr->get_communicator();
          else
            comm = MPI_COMM_SELF;

          const_cast<dealii::internal::DoFHandlerImplementation::NumberCache &>(
            number_cache)
            .n_locally_owned_dofs_per_processor =
              compute_n_locally_owned_dofs_per_processor();
              number_cache.get_n_locally_owned_dofs_per_processor(comm);
        }
      return number_cache.n_locally_owned_dofs_per_processor;

For this implementation, can't we bypass this code and simply return Utilities::MPI::all_gather(...)?
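
A sketch of what such a bypass might look like (the communicator logic mirrors the snippet above; this ignores for the moment that the deprecated function returns a reference to a cached vector, which is exactly the complication discussed further down):

// hypothetical bypass: gather the per-process counts on the fly instead of
// caching them in the NumberCache
const parallel::TriangulationBase<dim, spacedim> *tr =
  dynamic_cast<const parallel::TriangulationBase<dim, spacedim> *>(
    &this->get_triangulation());
const MPI_Comm comm =
  (tr != nullptr) ? tr->get_communicator() : MPI_COMM_SELF;

return Utilities::MPI::all_gather(comm, this->n_locally_owned_dofs());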

Comment on lines +459 to +460
          Utilities::MPI::all_gather(mpi_communicator,
                                     dof_handler.locally_owned_dofs()),

This is unrelated to this patch, but I believe we should replace this argument by simply dof_handler.locally_owned_dofs() - I must have forgotten those in #9710.

          const_cast<dealii::internal::DoFHandlerImplementation::NumberCache &>(
            number_cache)
            .locally_owned_dofs_per_processor =
              compute_locally_owned_dofs_per_processor();
              number_cache.get_locally_owned_dofs_per_processor(comm);

Same here.

          const_cast<dealii::internal::DoFHandlerImplementation::NumberCache &>(
            mg_number_cache[level])
            .locally_owned_dofs_per_processor =
              compute_locally_owned_mg_dofs_per_processor(level);
              mg_number_cache[level].get_locally_owned_dofs_per_processor(comm);
        }
      return mg_number_cache[level].locally_owned_dofs_per_processor;

and here

#endif
      return locally_owned_dofs_per_processor;
      return Utilities::MPI::all_gather(mpi_communicator,
                                        locally_owned_dofs);

Here you use this function, but I think we can remove it altogether; this class is mostly internal anyway, I think.

@peterrum
Member Author

@kronbichler Thanks for the fast feedback.

For this implementation, can't we bypass this code and simply return Utilities::MPI::all_gather(...)?

Remember why these functions caused us headaches half a year ago: 1) the memory consumption scales with the number of processes, and 2) the function does not return a vector but a reference to a vector. The latter is the reason why we fill the vector only when needed.

I would remove the vector once the deprecated functions are deleted, i.e., in two weeks?

@kronbichler
Member

Sure, the vector can only be removed once we have eliminated the deprecated function. The NumberCache could already be cleaned up in terms of the unnecessary function, but on second thought I agree that it probably makes no big difference, since the vector lives in NumberCache.

@peterrum added this to the Release 9.2 milestone on Apr 27, 2020
@masterleinad
Member

/home/runner/work/dealii/dealii/source/dofs/number_cache.cc:90:26: error: unused variable ‘n_procs’ [-Werror=unused-variable]
       const unsigned int n_procs =
                          ^~~~~~~

@peterrum force-pushed the compute_n_locally_owned_active_cells_per_processor_remove branch from c41b635 to 14c3053 on April 27, 2020 at 22:28
@peterrum force-pushed the compute_n_locally_owned_active_cells_per_processor_remove branch from 14c3053 to 78bdbb1 on April 28, 2020 at 05:44