
bug fix: ensure that all processes have the same npert and pert_select #642

Merged: 5 commits into develop from feature/wjiang/bcast_pert_select on Apr 20, 2023

Conversation

@weiyuan-jiang (Contributor) commented Apr 13, 2023

This PR fixes a bug in perturbation simulations that occurred when the root processor was not assigned any tiles in the domain decomposition, which can happen for very small domains (even when the number of tiles exceeds the number of processors). When root did not have tiles, its local fft_npert=0 was imposed on all processors, and consequently no perturbations were applied at all.

The fix consists of an MPI_AllReduce --> MPI_MAX of the pert_select vector.

With the bug fix, there is no need to have pert_std values > 0 for grid cells that contain land tiles. That is, regions with pert_std=0 should be fine.
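The fix described above can be sketched as follows. The actual code is Fortran with MPI; this minimal Python sketch (hypothetical values and a hypothetical helper `allreduce_max`) only illustrates how reducing the per-rank selection vectors with MAX yields an identical pert_select on every rank, even when root owns no tiles:

```python
# Minimal sketch (hypothetical names/values) of the MPI_AllReduce --> MPI_MAX
# fix. Each rank computes a local 0/1 selection vector from the tiles in its
# subdomain; reducing with MAX makes the result identical on all ranks.

def allreduce_max(local_vectors):
    """Simulate MPI_AllReduce with op=MPI_MAX over one int vector per rank."""
    return [max(vals) for vals in zip(*local_vectors)]

# Rank 0 (root) was assigned no tiles: its local selection vector is all zeros.
rank0 = [0, 0, 0, 0]
# Other ranks see nonzero pert_std for some perturbation species.
rank1 = [1, 0, 1, 0]
rank2 = [0, 1, 1, 0]

pert_select = allreduce_max([rank0, rank1, rank2])
# Every rank now agrees: species 0, 1, 2 are perturbed; species 3 is not.
```

Before the fix, root's all-zero local result was imposed on all processors, so no perturbations were applied; with the MAX reduction, any rank that sees a nonzero pert_std switches the species on for all ranks.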


Original comment:
@gmao-qliu Could you try this branch and see if it fixes your issue? I will add some more checks on heterogeneous std_dev and make sure it is greater than 0 on land.

@github-actions

This PR is being prevented from merging because you have not added one of our required labels: 0 diff, 0 diff trivial, Non 0-diff, 0 diff structural, 0-diff trivial, Not 0-diff, 0-diff, automatic, 0-diff uncoupled. Please add one so that the PR can be merged.

@gmao-qliu (Contributor)

> @gmao-qliu Could you try this branch and see if it fixes your issue? I will add some more checks on heterogeneous std_dev and make sure it is greater than 0 on land.

Yes. The fix works in my tests.


@weiyuan-jiang weiyuan-jiang marked this pull request as ready for review April 19, 2023 16:39
@weiyuan-jiang weiyuan-jiang requested a review from a team as a code owner April 19, 2023 16:39
@weiyuan-jiang (Contributor, Author)

@gmao-rreichle, I have added the check. I think it should be ready.

@gmao-rreichle gmao-rreichle changed the title all processes have the same npert and pert_select bug fix: ensure that all processes have the same npert and pert_select Apr 19, 2023
@gmao-rreichle (Contributor)

> @gmao-rreichle, I have added the check. I think it should be ready.

Thanks, @weiyuan-jiang. When I edited the intro to the PR so we have some description of what's done here, I started wondering if we actually need to enforce pert_std>0 for all grid cells that contain land tiles. Specifically, I now think that the MPI_AllReduce --> MPI_MAX operation may be just fine in dealing with a processor having exact zeros for pert_std in its subdomain when others have non-zero pert_std values.
Say for some k we have one processor with pert_std(k,:,:)=0 but another processor with maxval(pert_std(k,:,:))>0, then after the MPI_AllReduce --> MPI_MAX operation we should end up with pert_select(k)=1 as desired.
In other words, I don't think we actually need to force the user to avoid exact zero values where there is land.
Put differently, the MPI_AllReduce --> MPI_MAX operation is a really nifty way of implementing what would otherwise be a fairly complicated communication+calculation.
Does this make sense to you?
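The reasoning above can be checked with a toy calculation (hypothetical values, Python used only as a sketch): for a species k, pert_select(k) ends up 1 as long as any one rank sees a positive pert_std somewhere in its subdomain, because the MPI_AllReduce --> MPI_MAX collapses to an elementwise max:

```python
# Sketch of the zero-std case discussed above (hypothetical values): one
# rank's subdomain has pert_std(k,:,:) identically zero, another rank's
# does not. pert_select(k) is 1 iff any rank sees a positive std.
std_rank_a = [[0.0, 0.0], [0.0, 0.0]]  # this rank: exact zeros for species k
std_rank_b = [[0.0, 0.2], [0.0, 0.0]]  # another rank: some nonzero std

# Each rank's local contribution, mirroring maxval(pert_std(k,:,:)) > 0:
local_a = 1 if max(max(row) for row in std_rank_a) > 0 else 0
local_b = 1 if max(max(row) for row in std_rank_b) > 0 else 0

# The MPI_AllReduce --> MPI_MAX over ranks reduces to a plain max here:
pert_select_k = max(local_a, local_b)  # 1: the species stays selected
```

So a processor contributing exact zeros for its subdomain does not suppress the perturbation, which is why enforcing pert_std>0 over all land cells is unnecessary.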

@weiyuan-jiang (Contributor, Author)

> @gmao-rreichle, I have added the check. I think it should be ready.
>
> Thanks, @weiyuan-jiang. When I edited the intro to the PR so we have some description of what's done here, I started wondering if we actually need to enforce pert_std>0 for all grid cells that contain land tiles. Specifically, I now think that the MPI_AllReduce --> MPI_MAX operation may be just fine in dealing with a processor having exact zeros for pert_std in its subdomain when others have non-zero pert_std values. Say for some k we have one processor with pert_std(k,:,:)=0 but another processor with maxval(pert_std(k,:,:))>0, then after the MPI_AllReduce --> MPI_MAX operation we should end up with pert_select(k)=1 as desired. In other words, I don't think we actually need to force the user to avoid exact zero values where there is land. Put differently, the MPI_AllReduce --> MPI_MAX operation is a really nifty way of implementing what would otherwise be a fairly complicated communication+calculation. Does this make sense to you?

Yes, that makes sense to me. I can roll back the commit.

@biljanaorescanin (Contributor)

All our tests passed.

@gmao-rreichle gmao-rreichle merged commit 9f972ab into develop Apr 20, 2023
@gmao-rreichle gmao-rreichle deleted the feature/wjiang/bcast_pert_select branch April 20, 2023 19:19