Add parallel support to TpetraWrappers::Vector. #16290

kinnewig · 2023-11-24T13:57:27Z

When I tried to use it in parallel, I discovered several bugs in the TpetraWrappers::Vector class. I bundled all fixes into this one pull request.

There are some changes I would like to highlight so that we can discuss them:

`operator=`

I have several issues with the original implementation:

The existing implementation of the operator= does not work in parallel (see issue Regression tester regressed [baseline] #16286).
It does not copy the complete content of the vector, which is not the behavior I would expect. (Here, I should highlight that the original implementation behaves similarly to the Epetra implementation. But still, this differs from what I would expect when I call the operator=.)
There is a separation by cases, which isn't necessary. In one case, it calls the Tpetra::import() function without checking that the conditions are met to call that function. Moreover, import() requires communication between the ranks. So, I don't know if this is faster.

So, I propose making a deep copy of the right-hand side.

`compress()`/`decompress()`

So far, the compress() function has not been implemented. But for parallel usage, we need a way to communicate changes from one rank to another rank. But in contrast to the Epetra::Vector, the Tpetra::Vector can not write into off-processor entries and then communicate that later. The way to overcome that problem, I came up with is to introduce two maps:

owned_elements: containing a non-overlapping map of all elements of the vector. So, each rank can only access the values that belong to that rank.
relevant_elements: an overlapping map of all elements so each rank can access all elements that belong to that rank, as well as all necessary ghost elements.

We call a vector compressed if and only if the underlying map of the Tpetra::Vector is non-overlapping.
When we want to access a ghost element (that only lives in the relevant_elements map), we must first call decompress() (in case the vector is compressed).
And once we are done, we have to call compress() to communicate the changes between all ranks.

decompress() uses import() to distribute the internal TpetraVector from the owned_elements map to the relevant_elements map.

compress() uses export() to distribute the internal Tpetra::Vector from the relevant_elements map to the owned_elements map.

`print()`

I reworked the print() function to match the output from the Epetra-based TrilinosWrappers::MPI::Vector. This should make reusing tests from the Epetra-based vector for the Tpetra-based vector easier.
Since I made some changes to the print() function, I had to adapt the expected output from some already existing tests for the Tpetra-based vector.

Edit:
In the original draft of this pull request, I included an additional template parameter for Kokkos::Nodes.
I removed that template parameter from this pull request and will create later an individual pull request to add that additional template parameter.

bangerth

I've left a few comments, but it's a holiday long weekend here so I'm not going to work too hard today :-)

I think that the operator= thing might require more thought. But if you want to break the changes that add a Node argument to the class out into a separate patch, that is pretty uncontroversial and could be merged separately.

include/deal.II/lac/trilinos_tpetra_vector.h

bangerth · 2023-12-21T22:33:09Z

Let me know when you want me to take another look at this patch!

kinnewig · 2023-12-22T10:53:10Z

Sorry for the delay; I was very busy the last few weeks, and on top of that, I was on vacation in between.
But I just continued working on this pull request yesterday.

First, Thank you, @bangerth, for the feedback, especially on the copy assignment operator.

Based on your feedback, I took another look into that. I restored the original behavior of the copy assignment operator, but now it also works in parallel. But I added a new assert to address at least my main concern.

This makes the decompress() function redundant; therefore, I removed it.

Moreover, I adopted a few assert messages from the (Epetra-based) Trilinos Vector class.

So, I would be happy if someone could take another look into the reworked changes.

include/deal.II/lac/trilinos_tpetra_vector.h

bangerth · 2024-01-02T22:35:45Z

include/deal.II/lac/trilinos_tpetra_vector.h

      /**
       * IndexSet of the elements of the last imported vector.
       */
-      ::dealii::IndexSet source_stored_elements;
+      dealii::IndexSet source_stored_elements;


This is pre-existing, but I'm not fond of this variable name -- perhaps that can be addressed in a separate PR if you were willing?

Conceptually, objects live in the present. Perhaps they have been initialized, but they typically do not retain how they have been initialized. If they need to cache certain information, as I think you are doing here, maybe a better name would be to use last_ as a prefix instead of source_. But I may also be missing what it is you are doing here.

I will open a separate PR for this later.

include/deal.II/lac/trilinos_tpetra_vector.templates.h

bangerth · 2024-01-02T23:12:58Z

include/deal.II/lac/trilinos_tpetra_vector.templates.h

+    Vector<Number>::compress(const VectorOperation::values operation,
+                             const IndexSet               &locally_owned)
+    {
+      if (!compressed)


I made calling compress() a crime on vectors with ghost elements in #16378 Perhaps you can add the same assertion here, and then you don't need the if (!compressed).

It seems like we have a clash of notation here:

Originally, I called a vector in this class compressed if and only if the vector can only read and write its locally owned elements. Only a non-compressed vector can read and write elements from other ranks.

As discussed above, I changed compressed to has_ghost_elements (where has_ghost_elements has the opposite meaning). And now it only makes sense to call compress() on vectors where has_ghost_elements == true...

No, it's the other way around. Vectors with has_ghost_values == true are read-only. It does not make sense to call compress() on them.

In a nutshell: It does not hold true that:

compressed = (!has_ghost_elements)

I compared the vector class against the Epetra-based vector class. Both compressed and has_ghost_elements are used there. They have similar meanings, but it still does not work to use compressed = (!has_ghost_elements). (If you are interested, I can explain which conflict occurs).

Edit:
Writing it down helped me to gain a better understanding of this problem.

I now understand when I have to say if a vector has ghost elements or not for everything to work.
I just had to adapt two points in trilinos_tpetra_vector.templates.h.

Sorry for the confusion.

We looked into this together and found that the capabilities of the Epetra FEVector are no longer there.

So what we can no longer provide is a reinit(local_set,MPI_Comm,omit_zeroing_entries) as we need the nonlocal_vector to handle all incoming data as SumIntoGlobalValues will only work on locally owned global indices.

What we can do instead is actually use the vector_writable flag to construct a locally owned vector
as well as a nonlocal vector that stores all off-processor entries and communicates them during compress.

Sebastian is working on implementing this.
The question would now be what to do with the three argument reinit above.
We see three options

Just don't implement it (could make porting code harder to understand)

Implement it and throw an assertion (leads to completely undefined behaviour in release mode)

Implement it and throw an exception

I'm tending towards the last option. In the two 'implement' cases I would suggest the following text.

Due to differences between Tpetra and Epetra this capability is no longer supported. To construct a locally owned vector use reinit(locally_owned,locally_relevant,mpi_communicator,/*writable_vector*/ true)

@kronbichler, thank you for the detailed feedback. And the FEVector history is a very interesting side mark!

I just pushed my latest changes, where I added the nonlocal_vector. The TpetraWrappers::Vector<Number> current state works quite similarly to the old Epetra-based vector class.

But at this point, it would be pretty easy to make the new Tpetra-based vector class more similar to the LinearAlgebra::distributed::Vector and use vector_is_ghosted in the same way as it is used in LA::d::Vector. This is mostly a design decision, therefore one of the principal developers has to say how we want to continue at this point.

I am obviously a bit biased and would feel comfortable with the LA::d::Vector setup, but I do also think other options could work. I think we should have a common understanding here, so additional opinions would be appreciated.

This is a long-standing disagreement between @kronbichler and me that we've had a friendly exchange about for probably the better part of a decade :-)

My perspective is that it is useful to have a system in place that minimizes mistakes. Creating a vector so that it knows whether it can or cannot be written into is one such way, and it enables us to have assertions that check these semantics; it also avoids users to have to remember when they need to update ghost values. @kronbichler's perspective -- equally with merit -- is that we can come up with a system that does not require users to keep around two kinds of vectors, and that is perhaps more performant.

I recognize that it is unfair to you to have to choose between the positions of two long-time developers of the library. My personal preference would be if we could get the semantics of the Tpetra wrappers as close as possible to those of the Epetra wrappers, principally because we would like to replace the latter by the former with the minimal amount of changes necessary -- ideally, our users will never notice that the underlying technology has changed. I recognize that @kronbichler has good reasons to argue for switching to his preferred approach as well.

All right,

I just pushed one last change, removing the now-unused locally_(owned|relevant)_map.
Now everything is ready from my side to be merged.

I am also fine with having a minimal interface change between Epetra and Tpetra, and have the vector take either state (ghosted or non-ghosted) at the point it is constructed/initialized. I think the underlying mechanism with nonlocal_vector and manual import/export is uncontroversial, so the main mechanism is in place should we ever want to re-work the vector interfaces in a direction where a deep copy of the data would not be necessary to go from the read-only/ghosted state to the writable/non-ghosted state. I think there is consensus between @bangerth and me that starting to write into a ghosted vector leads to all kinds of problems because data can and will get inconsistent, and so one should help the user avoid this situation. The other things are not parts where I feel strongly.

include/deal.II/lac/trilinos_tpetra_vector.templates.h

bangerth · 2024-01-02T23:18:52Z

include/deal.II/lac/trilinos_tpetra_vector.templates.h

+          VectorType dummy = VectorType(locally_owned_map.getConst());
+
+          Teuchos::RCP<const ExportType> exporter =
+            Tpetra::createExport(locally_relevant_map.getConst(),
+                                 locally_owned_map.getConst());
+          dummy.doExport(*vector, *exporter, tpetra_operation);
+
+          *vector    = dummy;


This is quite an expensive operation for compress(). Why is this necessary to make things work, given that that other vector types do not need to create such a temporary vector?

It is not necessary to use a dummy vector here.

I learned this after I wrote that part of the class. I removed that dummy vector.

Again, thanks for pointing this out!

I should have tested this before I answered; it seems like we need that dummy vector here.

It is not necessary in the Epetra-based vector class since a second vector, i.e., the nonlocal_vector, is already defined.

bangerth · 2024-01-09T22:03:36Z

Like for the other patch, once you're happy with it, squash into a self-contained set of commits and let me know!

bangerth · 2024-01-10T20:07:41Z

FYI, the 'indent' CI failure is this one:

/home/runner/work/dealii/dealii/include/deal.II/lac/trilinos_tpetra_vector.h:700: warning: argument 'operator' of command @param is not found in the argument list of LinearAlgebra::TpetraWrappers::Vector< Number >::compress(const VectorOperation::values operation)
/home/runner/work/dealii/dealii/include/deal.II/lac/trilinos_tpetra_vector.h:700: warning: The following parameter of LinearAlgebra::TpetraWrappers::Vector::compress(const VectorOperation::values operation) is not documented:
parameter 'operation'

kronbichler

I did not check all details, but looking through the code is a pleasure. I'm looking forward to have all functionality moved to Tpetra.

kinnewig · 2024-01-18T08:29:29Z

Thank you for all the nice feedback!

From my side, the pull request can now be merged.

kronbichler · 2024-01-18T09:27:24Z

All comments were addressed. Let me merge this now since we do not seem to get new comments; we can always make follow-up PRs with additional cleanups if we observe them. Thank you @kinnewig for the work!

bangerth reviewed Nov 24, 2023

View reviewed changes

bangerth mentioned this pull request Dec 21, 2023

Prohibit calling compress() on vectors with ghost elements. #16378

Merged

bangerth reviewed Jan 2, 2024

View reviewed changes

bangerth added the ready to test label Jan 10, 2024

Add parallel support to TpetraWrappers::Vector.

f7d9a35

kinnewig force-pushed the tpetra_vector_parallel branch 2 times, most recently from 76dfcea to 93825f2 Compare January 12, 2024 14:08

add nonlocal_vector

729542d

kinnewig force-pushed the tpetra_vector_parallel branch from 93825f2 to 729542d Compare January 14, 2024 16:25

kronbichler approved these changes Jan 15, 2024

View reviewed changes

kronbichler added Linear Algebra External libraries labels Jan 15, 2024

kronbichler merged commit 08160c0 into dealii:master Jan 18, 2024
15 checks passed

tamiko mentioned this pull request Jan 18, 2024

Regression tester regressed 9c9a7d #16491

Closed

kinnewig mentioned this pull request Jan 26, 2024

Simplify locally_owned_elements in TpetraWrappers::Vector. #16547

Merged

kinnewig deleted the tpetra_vector_parallel branch February 18, 2024 16:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add parallel support to TpetraWrappers::Vector. #16290

Add parallel support to TpetraWrappers::Vector. #16290

kinnewig commented Nov 24, 2023 •

edited

bangerth left a comment

bangerth commented Dec 21, 2023

kinnewig commented Dec 22, 2023

bangerth Jan 2, 2024

kinnewig Jan 4, 2024 •

edited

bangerth Jan 2, 2024

kinnewig Jan 10, 2024 •

edited

bangerth Jan 10, 2024

kinnewig Jan 11, 2024 •

edited

jpthiele Jan 11, 2024

kinnewig Jan 12, 2024

kronbichler Jan 12, 2024

bangerth Jan 14, 2024

kinnewig Jan 14, 2024

kronbichler Jan 15, 2024

bangerth Jan 2, 2024

kinnewig Jan 10, 2024

kinnewig Jan 11, 2024

bangerth commented Jan 9, 2024

bangerth commented Jan 10, 2024

kronbichler left a comment

kinnewig commented Jan 18, 2024

kronbichler commented Jan 18, 2024

Add parallel support to TpetraWrappers::Vector. #16290

Add parallel support to TpetraWrappers::Vector. #16290

Conversation

kinnewig commented Nov 24, 2023 • edited

operator=

compress()/decompress()

print()

bangerth left a comment

Choose a reason for hiding this comment

bangerth commented Dec 21, 2023

kinnewig commented Dec 22, 2023

Choose a reason for hiding this comment

kinnewig Jan 4, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kinnewig Jan 10, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kinnewig Jan 11, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bangerth commented Jan 9, 2024

bangerth commented Jan 10, 2024

kronbichler left a comment

Choose a reason for hiding this comment

kinnewig commented Jan 18, 2024

kronbichler commented Jan 18, 2024

kinnewig commented Nov 24, 2023 •

edited

`operator=`

`compress()`/`decompress()`

`print()`

kinnewig Jan 4, 2024 •

edited

kinnewig Jan 10, 2024 •

edited

kinnewig Jan 11, 2024 •

edited