
Use MPI groups to parallelize data assimilation #252

Merged 3 commits into adamantine-sim:master from mpi_groups on Jan 24, 2024

Conversation

@Rombur (Member) commented Dec 22, 2023

With this PR, we can finally run data assimilation in parallel, using up to as many processors as there are ensemble members. This is done by splitting the global MPI communicator (we use MPI_COMM_WORLD) into multiple local communicators and then using these local communicators where needed. A lot of the PR is just about choosing the right communicator. The data assimilation itself is still done in serial on rank 0, which means that all the vectors need to be moved to rank 0 and back out after they have been updated. There are two difficulties with this operation:

  1. there is no facility in deal.II to move an entire vector to a single processor, so we have to ship the data and rebuild the vector by hand
  2. the algorithm is more complicated than it should be because I wrote it to support more processors than there are ensemble members, but that case is not working right now. I think we should keep the extra code because it is still a step in the right direction, and we can come back to the issue later.
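
A minimal sketch of the communicator splitting described above, using plain MPI; the one-rank-per-group mapping and everything else in this snippet are illustrative assumptions, not the code from this PR:

#include <mpi.h>

int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);

  int global_rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &global_rank);

  // Each rank gets its own color, so each local communicator contains a
  // single rank that is responsible for one or more ensemble members. The
  // grouping used in the PR may differ.
  int const color = global_rank;

  MPI_Comm local_comm;
  MPI_Comm_split(MPI_COMM_WORLD, color, global_rank, &local_comm);

  // The ensemble simulations run on local_comm, while MPI_COMM_WORLD is kept
  // for the (still serial, rank-0) data assimilation update.

  MPI_Comm_free(&local_comm);
  MPI_Finalize();
  return 0;
}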

Comment on lines 19 to 20
* normal distribution of average @p mean and standard deviation @stddev. The
* first @p n_rejected_draws are rejected.
Collaborator

It took me a minute to realize the rationale for throwing away a predetermined number of draws. The idea is to get consistent fills of the vectors regardless of the number of global ranks, right? That's niche enough that a comment would help. Originally I was trying to sort out if throwing away initial draws gets higher quality pseudorandom numbers from the PRNG.

Member Author

That's right, the idea is that it shouldn't matter whether you use one MPI rank or ten: the ensemble members should use the same random numbers. If we don't reject the first few draws, all the processors will use the same "random" numbers since they have the same seed.

I'll add a comment.
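
A minimal sketch (not the PR's code) of the idea: every rank seeds the same generator, but each ensemble member discards a different, member-dependent number of initial draws, so the values it receives do not depend on how the members are distributed over MPI ranks. The function name and the rejection rule are hypothetical:

#include <random>
#include <vector>

std::vector<double> draw_member_values(unsigned int member_id,
                                       unsigned int n_values,
                                       double mean, double stddev)
{
  std::mt19937 generator(0); // same seed on every rank
  std::normal_distribution<double> distribution(mean, stddev);

  // Hypothetical rule: member k rejects the first k * n_values draws, so it
  // gets the same values whether the members live on one rank or on many.
  unsigned int const n_rejected_draws = member_id * n_values;
  for (unsigned int i = 0; i < n_rejected_draws; ++i)
    distribution(generator);

  std::vector<double> values(n_values);
  for (auto &value : values)
    value = distribution(generator);
  return values;
}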

Collaborator

Oh right -- even worse than I was thinking. Thanks for adding the comment.

Comment on lines +95 to +104
// We need to gather the augmented_state_ensemble from the other processors.
// BlockVector is a complex structure with its own communicator, so we cannot
// simply use deal.II's gather to perform the communication. Instead, we
// extract the locally owned data and gather it on processor zero. We do this
// in a two-step process. First, we move the data to local rank zero. Second,
// we move the data to global rank zero. The first step allows us to simply
// move complete vectors to global rank zero. Otherwise, the vector data would
// be divided into multiple chunks when gathered on global rank zero and we
// would have to reconstruct the vector. Finally, we can build a new
// BlockVector using the local communicator.
Collaborator

Naively, this seems like something that could be in its own function (either still as a member of DataAssimilator or not). What is your reasoning for putting it directly in this function? That we won't be keeping it forever, since eventually we'd like to do distributed DA? That this is the only place we expect to use/need this functionality? Something else?

I'm not suggesting that it has to be moved; I just want to better understand the design.

Member Author

My thinking for putting everything in the function was that during the gather step I need some data to know which processor owns which elements, and I can reuse this data during the scatter phase. I wrote this code a while back, so I don't know if that's still true. I am fine with moving this code into two functions if you think it improves the code.

Collaborator

If you can move that code out pretty cleanly and the recalculation isn't going to be really expensive, then I think moving it out would improve the code. If one of those isn't true, a comment explaining it would be ok.
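
For reference, a minimal sketch of the two-step gather discussed in this thread, written against plain MPI with a hypothetical helper name; the actual implementation in DataAssimilator works on deal.II BlockVectors and may differ in its details:

#include <mpi.h>
#include <vector>

std::vector<double>
gather_on_global_root(std::vector<double> const &locally_owned,
                      MPI_Comm local_comm, MPI_Comm global_comm)
{
  // Step 1: gather the locally owned chunks on local rank 0 so that each
  // local communicator ends up holding one complete, contiguous vector.
  int local_rank = 0, local_size = 0;
  MPI_Comm_rank(local_comm, &local_rank);
  MPI_Comm_size(local_comm, &local_size);

  int const n_local = static_cast<int>(locally_owned.size());
  std::vector<int> counts(local_size), displs(local_size, 0);
  MPI_Gather(&n_local, 1, MPI_INT, counts.data(), 1, MPI_INT, 0, local_comm);
  for (int i = 1; i < local_size; ++i)
    displs[i] = displs[i - 1] + counts[i - 1];

  std::vector<double> local_vector(
      local_rank == 0 ? displs.back() + counts.back() : 0);
  MPI_Gatherv(locally_owned.data(), n_local, MPI_DOUBLE, local_vector.data(),
              counts.data(), displs.data(), MPI_DOUBLE, 0, local_comm);

  // Step 2: build a communicator that contains only the local roots and
  // gather their complete vectors on global rank 0. No chunk reconstruction
  // is needed there because every contribution is already a full vector.
  int global_rank = 0;
  MPI_Comm_rank(global_comm, &global_rank);
  MPI_Comm roots_comm;
  MPI_Comm_split(global_comm, local_rank == 0 ? 0 : MPI_UNDEFINED, global_rank,
                 &roots_comm);

  std::vector<double> result;
  if (local_rank == 0)
  {
    int roots_rank = 0, roots_size = 0;
    MPI_Comm_rank(roots_comm, &roots_rank);
    MPI_Comm_size(roots_comm, &roots_size);

    int const n_vector = static_cast<int>(local_vector.size());
    std::vector<int> root_counts(roots_size), root_displs(roots_size, 0);
    MPI_Gather(&n_vector, 1, MPI_INT, root_counts.data(), 1, MPI_INT, 0,
               roots_comm);
    for (int i = 1; i < roots_size; ++i)
      root_displs[i] = root_displs[i - 1] + root_counts[i - 1];

    if (roots_rank == 0)
      result.resize(root_displs.back() + root_counts.back());
    MPI_Gatherv(local_vector.data(), n_vector, MPI_DOUBLE, result.data(),
                root_counts.data(), root_displs.data(), MPI_DOUBLE, 0,
                roots_comm);
    MPI_Comm_free(&roots_comm);
  }
  return result;
}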

@stvdwtt merged commit 0f68aba into adamantine-sim:master on Jan 24, 2024
1 check passed
@Rombur deleted the mpi_groups branch on January 24, 2024 at 21:49