Use MPI groups to parallelize data assimilation #252
Conversation
source/ensemble_management.hh
Outdated
* normal distribution of average @p mean and standard deviation @p stddev.
* The first @p n_rejected_draws are rejected.
It took me a minute to realize the rationale for throwing away a predetermined number of draws. The idea is to get consistent fills of the vectors regardless of the number of global ranks, right? That's niche enough that a comment would help. Originally I was trying to sort out if throwing away initial draws gets higher quality pseudorandom numbers from the PRNG.
That's right, the idea is that it shouldn't matter if you use one MPI rank or ten, the ensemble members should use the same random numbers. If we don't reject the first few draws, all the processors will use the same "random" numbers since they have the same seed.
I'll add a comment.
Oh right -- even worse than I was thinking. Thanks for adding the comment.
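The scheme discussed above can be sketched as follows. This is a hypothetical standalone illustration (function name, seed, and signature are assumptions, not the actual `ensemble_management.hh` code): every rank seeds an identical engine and rejects the draws that belong to members owned by lower ranks, so member i receives the same numbers no matter how the members are partitioned across ranks.

```cpp
#include <random>
#include <vector>

// Hypothetical sketch: fill this rank's ensemble members with normal draws.
// Every rank uses the same seed; the draws belonging to members owned by
// lower ranks are generated and rejected, so a given member sees identical
// numbers whether the run uses one MPI rank or ten.
std::vector<std::vector<double>>
fill_local_members(unsigned const first_local_member,
                   unsigned const n_local_members,
                   unsigned const draws_per_member,
                   double const mean, double const stddev)
{
  std::mt19937 engine(12345); // identical seed on every rank (assumed value)
  std::normal_distribution<double> dist(mean, stddev);

  // Reject the draws already "used up" by members owned by lower ranks.
  unsigned const n_rejected_draws = first_local_member * draws_per_member;
  for (unsigned i = 0; i < n_rejected_draws; ++i)
    dist(engine);

  std::vector<std::vector<double>> members(n_local_members);
  for (auto &member : members)
  {
    member.resize(draws_per_member);
    for (auto &value : member)
      value = dist(engine);
  }
  return members;
}
```

Running this with one "rank" owning members 0 through 3 produces, member by member, the same vectors as two "ranks" owning members 0 to 1 and 2 to 3 respectively.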
// We need to gather the augmented_state_ensemble from the other processors.
// BlockVector is a complex structure with its own communicator and so we
// cannot simply use dealii's gather to perform the communication. Instead, we
// extract the locally owned data and gather it to processor zero. We do this
// in a two-step process. First, we move the data to local rank zero. Second,
// we move the data to global rank zero. The first step allows us to simply
// move complete vectors to global rank zero. Otherwise, the vector data would
// be divided into multiple chunks when gathered on global rank zero and we
// would have to reconstruct the vectors. Finally, we can build a new
// BlockVector using the local communicator.
Naively this seems like something that could be in its own function (either still as a member of DataAssimilator or not). What is your reasoning for putting it directly in this function? That we won't be keeping it forever since eventually we'd like to do distributed DA? That this is the only place we expect to use/need this functionality? Something else?
I'm not suggesting that it has to be moved, I just want to better understand the design.
My thinking for putting everything in the function was that during the gather step, I will need some data to know which processor owns which elements, and I can reuse this data during the scatter phase. I wrote this code a while back so I don't know if that's still true. I am fine with moving this code into two functions if you think it improves the code.
If you can move that code out pretty cleanly and the recalculation isn't going to be really expensive, then I think moving it out would improve the code. If one of those isn't true, a comment explaining it would be ok.
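A serial sketch of the two-step gather described in the quoted comment may help. The function name and the chunk layout are hypothetical, and the real MPI_Gatherv calls on the local and global communicators are replaced by plain copies so the data flow is visible:

```cpp
#include <vector>

// Serial model of the two-step gather (illustration only, no actual MPI).
// chunks[c][r] is the locally owned part of the vector held by rank r of
// local communicator c.
// Step 1: within each local communicator, concatenate the chunks onto the
// local rank zero, reconstructing one complete vector per communicator.
// Step 2: move the complete vectors from each local rank zero to global
// rank zero. Because step 1 already rebuilt full vectors, step 2 never has
// to stitch together interleaved chunks.
std::vector<double>
two_step_gather(std::vector<std::vector<std::vector<double>>> const &chunks)
{
  std::vector<double> on_global_rank_zero;
  for (auto const &local_comm : chunks)
  {
    // Step 1: gather to local rank zero -> one complete vector.
    std::vector<double> on_local_rank_zero;
    for (auto const &chunk : local_comm)
      on_local_rank_zero.insert(on_local_rank_zero.end(),
                                chunk.begin(), chunk.end());
    // Step 2: gather the complete vectors to global rank zero.
    on_global_rank_zero.insert(on_global_rank_zero.end(),
                               on_local_rank_zero.begin(),
                               on_local_rank_zero.end());
  }
  return on_global_rank_zero;
}
```

The point of the intermediate step is visible in the structure: without it, global rank zero would receive the chunks out of vector order and would have to reassemble them itself.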
With this PR, we can finally run data assimilation in parallel. We can use up to as many processors as there are ensemble members. This is done by splitting the global MPI communicator (we use MPI_COMM_WORLD) into multiple local communicators and then using these local communicators where needed. A lot of the PR is just about choosing the right communicator. The data assimilation itself is still done in serial on rank 0. This means that all the vectors need to be moved to rank 0 and back out of it after they have been updated. There are two difficulties with this operation:
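The communicator split mentioned in the PR description could look like the following sketch. The color computation is an assumption (a block distribution of ranks over local communicators, not necessarily what this PR implements); the color it returns would be passed to `MPI_Comm_split(MPI_COMM_WORLD, color, global_rank, &local_comm)`.

```cpp
// Hypothetical helper: compute the MPI_Comm_split color that groups the
// n_global_ranks ranks of MPI_COMM_WORLD into n_local_comms local
// communicators of (nearly) equal size. Consecutive global ranks share a
// local communicator (block distribution; an assumption for illustration).
unsigned split_color(unsigned const global_rank,
                     unsigned const n_global_ranks,
                     unsigned const n_local_comms)
{
  return global_rank * n_local_comms / n_global_ranks;
}
```

For example, with four global ranks and two local communicators, ranks 0 and 1 would get color 0 and ranks 2 and 3 would get color 1.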