Add option for Cartesian velocity grid with VelocityCoincidenceThinning
#4886
Conversation
// get the minimum and maximum velocities to determine the velocity space
// grid boundaries
using PType = typename WarpXParticleContainer::SuperParticleType;
velocityBinCalculator.ux_min = ReduceMin( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::ux); }
);
velocityBinCalculator.uy_min = ReduceMin( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::uy); }
);
velocityBinCalculator.uz_min = ReduceMin( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::uz); }
);
velocityBinCalculator.ux_max = ReduceMax( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::ux); }
);
velocityBinCalculator.uy_max = ReduceMax( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::uy); }
);
@WeiqunZhang this is the section in which I would like to perform a reduction over the particle tile rather than the whole particle container.
Something like
const int np = pti.numParticles();
GpuArray<ParticleReal, AMREX_SPACEDIM*2> result;
{
    using ReduceOpsT = TypeMultiplier<ReduceOps,
                                      ReduceOpMin[AMREX_SPACEDIM],
                                      ReduceOpMax[AMREX_SPACEDIM]>;
    using ReduceDataT = TypeMultiplier<ReduceData, ParticleReal[AMREX_SPACEDIM*2]>;
    ReduceOpsT reduce_op;
    ReduceDataT reduce_data(reduce_op);
    using ReduceTuple = typename ReduceDataT::Type;
    reduce_op.eval(np, reduce_data, [=] AMREX_GPU_DEVICE(int i) -> ReduceTuple {
        ....
        return {AMREX_D_DECL(ux,uy,uz),
                AMREX_D_DECL(ux,uy,uz)};
    });
    auto hv = reduce_data.value(reduce_op);
    result = tupleToArray(hv);
}
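For intuition, the fused per-direction min/max reduction above can be sketched in plain Python (a hedged, sequential stand-in for the GPU reduction; the sample momenta are made up, and the list `result` plays the role of the 6-component `ReduceTuple`):

```python
# Made-up particle momenta; each tuple stands in for (ux, uy, uz).
particles = [(0.3, -1.2, 0.7), (-0.5, 0.4, 2.1), (1.8, -0.1, -0.9)]

mins = [float('inf')] * 3
maxs = [float('-inf')] * 3
for ux, uy, uz in particles:            # stands in for the per-particle kernel
    for d, u in enumerate((ux, uy, uz)):
        mins[d] = min(mins[d], u)       # per-direction minimum
        maxs[d] = max(maxs[d], u)       # per-direction maximum

result = mins + maxs  # analogous to the 6-component reduce tuple
```

The point of fusing is that one pass over the tile yields all six values, instead of six separate `ReduceMin`/`ReduceMax` passes.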
Cool! Thanks so much for your help!
We probably should add a function to amrex to make this easier for the users. Something like

auto result = amrex::ParReduce(TypeList<ReduceOpMin[AMREX_SPACEDIM],ReduceOpMax[AMREX_SPACEDIM]>{},
                               TypeList<ParticleReal[AMREX_SPACEDIM*2]>{},
                               np, [=] AMREX_GPU_DEVICE (int i) -> TypeMultiplier<GpuTuple, ParticleReal[AMREX_SPACEDIM*2]> { return ...; });
Such a simplified function would be helpful. For example, it could improve the code in https://github.com/ECP-WarpX/WarpX/blob/development/Source/Diagnostics/ReducedDiags/ParticleExtrema.cpp. Note, however, that in both that case and my present case we need the min and max for each direction rather than the overall min and max (and I actually don't need the max of uz). I modified the code you suggested accordingly.
Thanks again for your help!
Python/pywarpx/picmi.py (Outdated)
self.resampling_algorithm_delta_ur = kw.pop('warpx_resampling_algorithm_delta_ur', None)
self.resampling_algorithm_n_theta = kw.pop('warpx_resampling_algorithm_n_theta', None)
self.resampling_algorithm_n_phi = kw.pop('warpx_resampling_algorithm_n_phi', None)
self.resampling_algorithm_delta_ux = kw.pop('warpx_resampling_algorithm_delta_ux', None)
In this case, all three numbers will be required. Perhaps have one input parameter that takes the three values, something like self.resampling_algorithm_delta_us.
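A hedged sketch of how a single array-valued parameter could replace the three per-direction kwargs (the name `warpx_resampling_algorithm_delta_u` and the length check are illustrative assumptions, not the final picmi API):

```python
def pop_delta_u(kw):
    # Pop one 3-vector kwarg instead of three scalar kwargs; since all
    # three components are required together, a single array keeps the
    # closely coupled values in one place.
    delta_u = kw.pop('warpx_resampling_algorithm_delta_u', None)
    if delta_u is not None and len(delta_u) != 3:
        raise ValueError('warpx_resampling_algorithm_delta_u needs 3 values')
    return delta_u
```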
for (int i = cell_start; i < cell_stop; ++i)
{
    int ii = -1, jj = -1, kk = -1;
    if (velocity_grid_type == VelocityGridType::Spherical) {
Could this check be put outside the loop so that it isn't done for every particle? Could what is being done in PR #4888 be done here?
Sounds good! I'll update accordingly. I was not aware of that technique for reducing register pressure. Thanks for pointing it out!
Digging a little deeper, I'm not sure whether the CompileTimeOptions will help in this case since the velocity binning is done inside a GPU loop already. But I have moved things around some so that now we only check the grid type once per tile rather than for every particle.
else if (velocity_grid_type == VelocityGridType::Cartesian) {
    ii = static_cast<int>((ux[indices[i]] - ux_min) / dux);
    jj = static_cast<int>((uy[indices[i]] - uy_min) / duy);
    kk = static_cast<int>((uz[indices[i]] - uz_min) / duz);
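The Cartesian binning above can be sketched in plain Python (a hedged illustration: the flattening with `nx`, `ny` bin counts is my assumption about how the single cell index is formed, not code from the PR):

```python
def cartesian_bin(u, u_min, du, nx, ny):
    # Per-direction bin: shift by the minimum, divide by the bin size.
    ii = int((u[0] - u_min[0]) / du[0])
    jj = int((u[1] - u_min[1]) / du[1])
    kk = int((u[2] - u_min[2]) / du[2])
    # Flatten (ii, jj, kk) into a single non-negative index.
    return ii + jj * nx + kk * nx * ny
```

Because `u_min` is subtracted first, all indices are non-negative and the flattened index is unique, which is the property discussed below.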
I was thinking of a way to avoid having to calculate the min and max. How does this look?
ii = static_cast<int>(std::fmod(ux[indices[i]] / dux, 1290.));

And have n1 and n2 be 1290. This will keep the indices spread out as long as the velocities don't fill out the space 1290*du. Note that 1290 is approximately the cube root of the max integer, the type of bin_array. Yes, I know this is a bit hacky, but it should work in most cases.
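The trick can be sketched in Python (`math.fmod` mirrors `std::fmod`, including keeping the sign of the first argument; the helper name is illustrative):

```python
import math

N = 1290  # ~ cube root of the max int (the type of bin_array)

def mod_bin(ux, dux):
    # Bin index without knowing ux_min: wrap ux/dux into (-N, N).
    # Indices stay distinct as long as the velocities span < N*dux.
    return int(math.fmod(ux / dux, N))
```

The last assertion in the test below shows the failure mode being discussed: a velocity offset by exactly `N*dux` wraps onto the same index.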
That's a good point about being able to avoid the need to calculate max / min values by simply using the full range of integer values. The single cell index will be less straightforward to calculate when allowing negative ii, jj and kk values. But this can be circumvented by using

auto ux_min = 645 * dux;
ii = static_cast<int>((ux[indices[i]] - ux_min) / dux);
The question for me is whether avoiding the calculation of max / min values is worth sacrificing the algorithmic flexibility to have:
- velocity grids not centered at 0 m/s
- different numbers of cells in different directions - for example, a species with a low temperature in x but a large temperature in z would naturally allow a small number of cells in vx (< 1290) and a large number of cells in vz (> 1290) if the same du values are used

Seeing as the min / max calculation is very likely not the limiting step in this scheme (it is GPU parallelized), I am inclined to favor the algorithmic flexibility over the minor performance improvement. Do you disagree?
In my suggestion, negative values of indices should be valid, as long as they don't overlap with positive values from the neighboring cell, which would be the case if the particles don't fill the 1290*du space. The centering would in some sense be arbitrary (because of the modulus).
In the second case, if the same du was used, then the thinning in x would be poor since all of the particles would fall into a small number of bins, so particles with widely varying ux would be combined. Also, if there's a case which needs more than 1290 bins, there are already a large number of particles per cell and there probably should be thinning happening earlier.
Though, if as you say the calculation of the min and max takes little time, then there is no need to avoid it. It is just a fun exercise to think about how it could be done.
I agree it is fun to think about alternative and more clever ways of doing the combinatorics, so I would like to understand your strategy.
I tried to illustrate the problem I'm having with the negative indices in the following table (for a 2d grid to make the visualization easier):

The dashed lines represent the ux=0, uy=0 axes, the blue row gives the ii value for each cell and the orange row the jj values. The integer in each cell is calculated from idx = ii + jj * nx where in this reduced case nx = 3. As you can see, I get many cells with the same index, which is why I said the index value will be less straightforward to calculate. The top left and bottom right quadrants are fine but the others have duplicated cell indices. The all-positive-indices approach essentially shifts the origin so that only the top left quadrant is used. Am I missing something? Or do you have a different way of finding idx from ii and jj?
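The collision being discussed can be checked with a quick sketch (the small signed ii, jj window and the two nx values are illustrative):

```python
nx_small, nx_large = 3, 1290
span = range(-2, 3)  # signed ii, jj in a small window around zero

idx_small = {(ii, jj): ii + jj * nx_small for ii in span for jj in span}
idx_large = {(ii, jj): ii + jj * nx_large for ii in span for jj in span}

# With nx = 3, (2, 0) and (-1, 1) flatten to the same index ...
collides = idx_small[(2, 0)] == idx_small[(-1, 1)]
# ... but with nx = 1290 all 25 cells get distinct indices, because
# |ii| never gets close to nx.
distinct = len(set(idx_large.values())) == len(idx_large)
```

This is the crux of both viewpoints: the duplication is real for small nx, and disappears when nx is much larger than the occupied index range.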
I do agree that there can be overlap when using negative indices. But I'm suggesting using a large value for nx, with 1290 being a good value. There would only be overlaps if there are particles with, for example, velocities near 0 and near -1290*dux, which is unlikely. This is what I mean in my comment "negative values of indices should be valid, as long as they don't overlap with positive values from the neighboring cell, which would be the case if the particles don't fill the 1290*du space."
}
else if (velocity_grid_type_str == "cartesian") {
    m_velocity_grid_type = VelocityGridType::Cartesian;
    utils::parser::getWithParser(
I was also thinking that these three could be combined into one input parameter, reducing the verbosity of the input file.
I also considered that, but then I looked at our convention with similar parameters in other cases and it seems that we generally use an array for PICMI input but then specify each direction separately in the native input.
Examples include:
- GaussianBunchDistribution takes the rms_bunch_size array as input but passes parameters to the native input (and the same with centroid_position) as
species.add_new_group_attr(source_name, 'x_rms', self.rms_bunch_size[0])
species.add_new_group_attr(source_name, 'y_rms', self.rms_bunch_size[1])
species.add_new_group_attr(source_name, 'z_rms', self.rms_bunch_size[2])
- DensityDistributionBase takes lower_bound and upper_bound arrays but passes
species.add_new_group_attr(source_name, 'xmin', self.lower_bound[0])
species.add_new_group_attr(source_name, 'xmax', self.upper_bound[0])
species.add_new_group_attr(source_name, 'ymin', self.lower_bound[1])
species.add_new_group_attr(source_name, 'ymax', self.upper_bound[1])
species.add_new_group_attr(source_name, 'zmin', self.lower_bound[2])
species.add_new_group_attr(source_name, 'zmax', self.upper_bound[2])
- Boundary potentials are also specified as separate input parameters for each boundary - lo/hi_x/y/z.
Of course there are counter examples like the specification of the domain lower and upper bounds and boundary condition types. So maybe we should decide on a single convention and start enforcing it for new code additions. Your preference is to pass arrays rather than individual direction parameters?
I looked around in the code and noticed this pattern. Where there are multiple parameters like this, if they are all required, then they are input as a vector. Otherwise they are input separately. This makes sense since in the first case, putting them in a vector is compact and keeps closely coupled quantities together. In the latter, if they were input as a vector, a default or unset flag would be needed to specify the unset parameters in the vector.
I do agree that it would be good to have a standard here in how to specify related parameters like this.
That makes sense, yes. Seeing as all three parameters are required in this case, I will switch to using an array as input.
Going forward I will also keep an eye out for this general rule.
Looks good. I'll approve as is, but a nice addition would be adding a CI test. This should be easy since you already have the resample_velocity_coincidence_thinning test.

Thanks @dpgrote. I agree adding a test would be good to also cover the specific functionality for the Cartesian velocity grid. I'll add one in a follow-up PR.
Add option for Cartesian velocity grid with `VelocityCoincidenceThinning` (ECP-WarpX#4886)

* initial implementation of the grid based merging routine
* add compiler directives for different dimensions
* avoid implicit capture of `this` in lambda
* code clean-up
* use `PIdx::x` for r-coordinate while 4667 is still being reviewed
* fix clang-tidy errors
* another clang-tidy fix
* improve doc-strings
* use iterative heap sort since "SYCL kernel cannot call a recursive function"
* fix clang-tidy error
* fix sorting bug; add documentation and CI test
* add dummy CI benchmark file to get proper values
* update benchmark values based on Azure results
* rename `GridBasedMerging` -> `VelocityCoincidenceThinning`
* use `Algorithms::KineticEnergy`
* reorganize merging loop
* update CI benchmark values
* Relativistic correction in product particles' velocity calculation
* update benchmark values after changing energy calculation
* handle edge case with zero cluster momentum
* call redistribute after particle resampling to remove invalid particles
* use unsigned ints for indexing
* Revert "use unsigned ints for indexing" (this reverts commit abe027f)
* call `Redistribute` before merging
* code clean-up
* also check for `std::isnan` in edge case handling
* add reference for grid based merging
* check that cluster has total weight > 0 before merging
* add defense against numerical error leading to nan
* make resampling message more verbose
* use `deleteInvalidParticles` instead of `Redistribute`
* remove default values for merging parameters
* remove doc-string default specifications in picmi.py
* apply suggestions from code review
* update benchmark values; avoid possible nans
* add assert to prevent merging of photons
* add `BackwardCompatibility` check to `LevelingThinning`
* implement option for Cartesian velocity mesh
* use `enum` for velocity grid types
* fix Windows `uint` issue and clang-tidy error
* use `Reduce` functions for GPU compatibility
* merge kernels finding min and max velocities
* take array of du values in picmi
* avoid checking velocity grid type for every particle
* fix issue in picmi
* use array input for velocity bin sizes
Follow-up to #4820. Adding an option to use a Cartesian velocity grid.
Testing with the same problem as before, the merging seems to work as expected (figure omitted: velocity distribution shown in blue before merging and orange afterwards).
About 50% of particles were merged in this test with total energy change = -8.836e-13% and total weight change = 6.7259e-14%.