
Enable OpenMP in particle push and coordinate transformation routines. #241

Merged: 8 commits into ECP-WarpX:development on Sep 22, 2022

Conversation

@atmyers (Member) commented Sep 9, 2022

Close #195

To Do

  • turn tiling on
  • set a reasonable default tile size when OMP is selected (see the sketch after this list)
  • rebase and also OpenMP-accelerate new kernels from Space Charge Solver #162
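A minimal sketch of what the first two items could look like, assuming AMReX's particle-tiling switches (do_tiling and tile_size are static members of amrex::ParticleContainerBase; the tile size below is an illustrative placeholder, not the value this PR settles on):

```cpp
#include <AMReX_Particles.H>

// Sketch only: turn particle tiling on and pick a default tile size,
// but only when the build enables OpenMP.
void set_omp_particle_defaults ()
{
#ifdef AMREX_USE_OMP
    amrex::ParticleContainerBase::do_tiling = true;
    // Placeholder tile size; a real default would be tuned per machine.
    amrex::ParticleContainerBase::tile_size = amrex::IntVect(AMREX_D_DECL(8,8,8));
#endif
}
```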

@atmyers atmyers requested a review from ax3l September 9, 2022 20:24
@ax3l ax3l self-assigned this Sep 9, 2022
@ax3l ax3l added the backend: openmp Specific to OpenMP execution (CPUs) label Sep 9, 2022
@ax3l (Member) commented Sep 9, 2022

Thanks! Do we need to change some defaults, like the dynamic scheduling default we use in WarpX?
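For context, the standard way to change the default loop schedule at run time is the OpenMP runtime API; a minimal sketch of the generic mechanism, not necessarily WarpX's exact code (omp_set_schedule affects loops declared with schedule(runtime)):

```cpp
#include <omp.h>

// Sketch: make loops that use `schedule(runtime)` run with a dynamic
// schedule and chunk size 1, equivalent to setting OMP_SCHEDULE="dynamic,1".
void set_dynamic_schedule_default ()
{
#ifdef _OPENMP
    omp_set_schedule(omp_sched_dynamic, 1);
#endif
}
```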

@atmyers (Member, Author) commented Sep 15, 2022

Yes, at minimum we should turn tiling on and set a reasonable default tile size when OMP is selected. I will update.

@ax3l (Member) commented Sep 20, 2022

@atmyers thanks! We also merged #162 now. Can you please rebase and OpenMP-enable those routines as well? :)

@atmyers (Member, Author) commented Sep 21, 2022

Here are the current performance results on the expanding beam test (dynamic vs. static OpenMP scheduling at 1, 2, and 4 threads; TinyProfiler times in seconds):

dynamic_1.out:TinyProfiler total time across processes [min...avg...max]: 1.35 ... 1.35 ... 1.35
dynamic_2.out:TinyProfiler total time across processes [min...avg...max]: 0.8144 ... 0.8144 ... 0.8144
dynamic_4.out:TinyProfiler total time across processes [min...avg...max]: 0.5541 ... 0.5541 ... 0.5541
static_1.out:TinyProfiler total time across processes [min...avg...max]: 1.353 ... 1.353 ... 1.353
static_2.out:TinyProfiler total time across processes [min...avg...max]: 0.8157 ... 0.8157 ... 0.8157
static_4.out:TinyProfiler total time across processes [min...avg...max]: 0.5614 ... 0.5614 ... 0.5614

@ax3l (Member) commented Sep 22, 2022

Thank you! Can you perform a little test on a single CPU package comparing MPI w/ 1 OMP thread vs. all OMP threads?

@ax3l (Member) commented Sep 22, 2022

Can you please also update MFIter loops to be OpenMP accelerated?

@atmyers (Member, Author) commented Sep 22, 2022

Here is a comparison between pure MPI and pure OMP on the expanding beam test, with diagnostics disabled (times in seconds):

MPI:
1 rank,  1 thread: 1.357
2 ranks, 1 thread: 0.9569
4 ranks, 1 thread: 0.6122

OMP:
1 rank, 1 thread:  1.353
1 rank, 2 threads: 0.8035
1 rank, 4 threads: 0.6009


@ax3l ax3l merged commit 54741c5 into ECP-WarpX:development Sep 22, 2022

@@ -40,6 +40,9 @@ namespace impactx::spacecharge
space_charge_field.at(lev).at("y").setVal(0.);
space_charge_field.at(lev).at("z").setVal(0.);

#ifdef AMREX_USE_OMP
#pragma omp parallel if (amrex::Gpu::notInLaunchRegion())
#endif
for (amrex::MFIter mfi(phi.at(lev)); mfi.isValid(); ++mfi) {
@ax3l (Member) commented Sep 27, 2022 (review comment on the hunk above):

@WeiqunZhang commented:
we can also do tiling on CPU for MFIter loops (not yet done here).

Let's investigate if this helps us here and potentially make it a user option.

Example:
https://github.com/ECP-WarpX/WarpX/blob/3fe406c9701f61e07b23f7123cf0a7bad492c6dc/Source/Diagnostics/ComputeDiagFunctors/BackTransformFunctor.cpp#L110
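A minimal sketch of the tiled CPU variant being suggested, assuming a MultiFab named phi (amrex::TilingIfNotGPU() is an AMReX helper that enables tiling only outside a GPU launch region):

```cpp
#ifdef AMREX_USE_OMP
#pragma omp parallel if (amrex::Gpu::notInLaunchRegion())
#endif
for (amrex::MFIter mfi(phi, amrex::TilingIfNotGPU()); mfi.isValid(); ++mfi) {
    // tilebox() hands each thread a tile of a box rather than a whole box,
    // which is what distinguishes this from the non-tiled loop in the diff.
    const amrex::Box& bx = mfi.tilebox();
    auto const& arr = phi.array(mfi);
    // ... apply the kernel to arr over bx ...
}
```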
