
Add option for Cartesian velocity grid with VelocityCoincidenceThinning #4886

Merged

Conversation

roelof-groenewald (Member):

Follow-up to #4820. Adding an option to use a Cartesian velocity grid.

Testing with the same problem as before, the merging seems to work as expected:
[image: particle velocity distribution before and after merging]
where blue is before merging and orange afterwards.
About 50% of particles were merged in this test with total energy change = -8.836e-13% and total weight change = 6.7259e-14%.

roelof-groenewald and others added 30 commits April 3, 2024 19:23
Comment on lines 124 to 146
// get the minimum and maximum velocities to determine the velocity space
// grid boundaries
using PType = typename WarpXParticleContainer::SuperParticleType;
velocityBinCalculator.ux_min = ReduceMin( *pc,
[=] AMREX_GPU_HOST_DEVICE (const PType& p)
{ return p.rdata(PIdx::ux); }
);
velocityBinCalculator.uy_min = ReduceMin( *pc,
[=] AMREX_GPU_HOST_DEVICE (const PType& p)
{ return p.rdata(PIdx::uy); }
);
velocityBinCalculator.uz_min = ReduceMin( *pc,
[=] AMREX_GPU_HOST_DEVICE (const PType& p)
{ return p.rdata(PIdx::uz); }
);
velocityBinCalculator.ux_max = ReduceMax( *pc,
[=] AMREX_GPU_HOST_DEVICE (const PType& p)
{ return p.rdata(PIdx::ux); }
);
velocityBinCalculator.uy_max = ReduceMax( *pc,
[=] AMREX_GPU_HOST_DEVICE (const PType& p)
{ return p.rdata(PIdx::uy); }
);
roelof-groenewald (Member Author):
@WeiqunZhang this is the section in which I would like to perform a reduction over the particle tile rather than the whole particle container.

Member:

Something like

const int np = pti.numParticles();
GpuArray<ParticleReal, AMREX_SPACEDIM*2> result;
{
    using ReduceOpsT = TypeMultiplier<ReduceOps,
                                      ReduceOpMin[AMREX_SPACEDIM],
                                      ReduceOpMax[AMREX_SPACEDIM]>;
    using ReduceDataT = TypeMultiplier<ReduceData, ParticleReal[AMREX_SPACEDIM*2]>;
    ReduceOpsT reduce_op;
    ReduceDataT reduce_data(reduce_op);
    using ReduceTuple = typename ReduceDataT::Type;
    reduce_op.eval(np, reduce_data, [=] AMREX_GPU_DEVICE(int i) -> ReduceTuple {
        ....
        return {AMREX_D_DECL(ux,uy,uz),
                AMREX_D_DECL(ux,uy,uz)};
    });
    auto hv = reduce_data.value(reduce_op);
    result = tupleToArray(hv);
}

roelof-groenewald (Member Author):

Cool! Thanks so much for your help!

Member:

We should probably add a function to amrex to make this easier for users. Something like

auto result = amrex::ParReduce(TypeList<ReduceOpMin[AMREX_SPACEDIM],ReduceOpMax[AMREX_SPACEDIM]>{},
                               TypeList<ParticleReal[AMREX_SPACEDIM*2]>{},
                               np, [=] AMREX_GPU_DEVICE (int i) -> TypeMultiplier<GpuTuple, ParticleReal[AMREX_SPACEDIM*2]> { return ...; });

roelof-groenewald (Member Author):

Such a simplified function would be helpful. For example, it could improve the code in https://github.com/ECP-WarpX/WarpX/blob/development/Source/Diagnostics/ReducedDiags/ParticleExtrema.cpp. Note, however, that in both that case and my present case we need the min and max for each direction rather than the overall min and max (and I actually don't need the max of uz). I modified the code you suggested accordingly.
Thanks again for your help!
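A rough sketch of what that adapted per-direction reduction could look like (not the literal code in this PR; ptile and np are assumed to be the current particle tile and its particle count, with three min slots for ux/uy/uz and two max slots for ux/uy since the max of uz is not needed):

auto& soa = ptile.GetStructOfArrays();
const amrex::ParticleReal* ux = soa.GetRealData(PIdx::ux).data();
const amrex::ParticleReal* uy = soa.GetRealData(PIdx::uy).data();
const amrex::ParticleReal* uz = soa.GetRealData(PIdx::uz).data();

using ReduceOpsT  = amrex::TypeMultiplier<amrex::ReduceOps,
                                          amrex::ReduceOpMin[3], amrex::ReduceOpMax[2]>;
using ReduceDataT = amrex::TypeMultiplier<amrex::ReduceData, amrex::ParticleReal[5]>;
ReduceOpsT reduce_op;
ReduceDataT reduce_data(reduce_op);
using ReduceTuple = typename ReduceDataT::Type;

reduce_op.eval(np, reduce_data, [=] AMREX_GPU_DEVICE (int i) -> ReduceTuple {
    // the first three entries feed the min operations, the last two feed the max operations
    return {ux[i], uy[i], uz[i], ux[i], uy[i]};
});

auto hv = reduce_data.value(reduce_op);
const auto ux_min = amrex::get<0>(hv);
const auto uy_min = amrex::get<1>(hv);
const auto uz_min = amrex::get<2>(hv);
const auto ux_max = amrex::get<3>(hv);
const auto uy_max = amrex::get<4>(hv);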

@roelof-groenewald roelof-groenewald changed the title [WIP] Resampling: Add option for Cartesian velocity grid with VelocityCoincidenceThinning Add option for Cartesian velocity grid with VelocityCoincidenceThinning Apr 25, 2024
self.resampling_algorithm_delta_ur = kw.pop('warpx_resampling_algorithm_delta_ur', None)
self.resampling_algorithm_n_theta = kw.pop('warpx_resampling_algorithm_n_theta', None)
self.resampling_algorithm_n_phi = kw.pop('warpx_resampling_algorithm_n_phi', None)
self.resampling_algorithm_delta_ux = kw.pop('warpx_resampling_algorithm_delta_ux', None)
Member:

In this case, all three numbers will be required. Perhaps have one input parameter that takes the three values, something like self.resampling_algorithm_delta_us.

for (int i = cell_start; i < cell_stop; ++i)
{
int ii = -1, jj = -1, kk = -1;
if (velocity_grid_type == VelocityGridType::Spherical) {
Member:

Could this check be put outside the loop so that it isn't done for every particle? Could what is being done in PR #4888 be done here?

roelof-groenewald (Member Author):

Sounds good! I'll update accordingly. I was not aware of that technique for reducing register pressure. Thanks for pointing it out!

roelof-groenewald (Member Author):

Digging a little deeper, I'm not sure whether the CompileTimeOptions will help in this case, since the velocity binning is done inside a GPU loop already. But I have moved things around a bit so that the grid type is now checked only once per tile rather than for every particle.
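(Roughly, that restructuring looks like the sketch below - not the literal code in this PR: names are taken from the quoted snippets, the flattened-index step is left as a comment, and in the actual implementation these loops live inside the GPU kernel.)

if (velocity_grid_type == VelocityGridType::Cartesian) {
    // the grid-type branch is now taken once here, not once per particle
    for (int i = cell_start; i < cell_stop; ++i) {
        const int ii = static_cast<int>((ux[indices[i]] - ux_min) / dux);
        const int jj = static_cast<int>((uy[indices[i]] - uy_min) / duy);
        const int kk = static_cast<int>((uz[indices[i]] - uz_min) / duz);
        // ... combine (ii, jj, kk) into the flattened velocity-bin index as before
    }
}
else if (velocity_grid_type == VelocityGridType::Spherical) {
    for (int i = cell_start; i < cell_stop; ++i) {
        // ... analogous binning on the spherical (ur, theta, phi) grid
    }
}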

else if (velocity_grid_type == VelocityGridType::Cartesian) {
ii = static_cast<int>((ux[indices[i]] - ux_min) / dux);
jj = static_cast<int>((uy[indices[i]] - uy_min) / duy);
kk = static_cast<int>((uz[indices[i]] - uz_min) / duz);
dpgrote (Member), Apr 26, 2024:

I was thinking of a way to avoid having to calculate the min and max. How does this look?

ii = static_cast<int>(std::fmod(ux[indices[i]] / dux, 1290.));

And have n1 and n2 be 1290. This will keep the indices spread out as long as the velocities don't fill out the space 1290*du. Note that 1290 is approximately the cube root of the maximum value of an int, the type of bin_array. Yes, I know this is a bit hacky, but it should work in most cases.

roelof-groenewald (Member Author):

That's a good point about being able to avoid the need to calculate max / min values by simply using the full range of integer values. The single cell index will be less straightforward to calculate when allowing negative ii, jj and kk values. But this can be circumvented by using

auto ux_min = -645 * dux;  // fixed lower bound so ux in [-645*dux, 645*dux) maps to ii in [0, 1290)
ii = static_cast<int>((ux[indices[i]] - ux_min) / dux);

The question for me is whether avoiding the calculation of max / min values is worth sacrificing the algorithmic flexibility to have:

  • velocity grids not centered at 0 m/s
  • a different number of cells in different directions - for example, a species with a low temperature in x but a high temperature in z would naturally get a small number of cells in vx (< 1290) and a large number of cells in vz (> 1290) if the same du values are used

Seeing as the min / max calculation is very likely not the limiting step in this scheme (it is GPU parallelized), I am inclined to favor the algorithmic flexibility over the minor performance improvement. Do you disagree?

dpgrote (Member):

In my suggestion, negative values of the indices should be valid, as long as they don't overlap with positive values from the neighboring cell, which will be the case as long as the particles don't fill the 1290*du space. The centering would in some sense be arbitrary (because of the modulus).
In the second case, if the same du were used, then the thinning in x would be poor, since all of the particles would fall into a small number of bins, so particles with widely varying ux would be combined. Also, if there's a case which needs more than 1290 bins, there are already a large number of particles per cell and thinning should probably already be happening earlier.
Though, if as you say the calculation of the min and max takes little time, then there is no need to avoid it. It is just a fun exercise to think about how it could be done.

roelof-groenewald (Member Author):

I agree it is fun to think about alternative and more clever ways of doing the combinatorics, so I would like to understand your strategy.
I tried to illustrate the problem I'm having with the negative indices in the following table (for a 2D grid to make the visualization easier):
[image: 2D velocity grid with the flattened cell index written in each cell, showing duplicated indices when ii and jj can be negative]
The dashed lines represent the ux=0, uy=0 axes, the blue row gives the ii value for each cell and the orange row the jj values. The integer in each cell is calculated from idx = ii + jj * nx, where in this reduced case nx = 3. As you can see, I get many cells with the same index, which is why I said the index value will be less straightforward to calculate. The top-left and bottom-right quadrants are fine, but the others have duplicated cell indices. The all-positive-indices approach essentially shifts the origin so that only the top-left quadrant is used. Am I missing something? Or do you have a different way of finding idx from ii and jj?
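(A tiny worked example of the collision described above, using only the idx = ii + jj * nx arithmetic with nx = 3; nothing here is WarpX-specific.)

#include <cstdio>

int main () {
    const int nx = 3;
    const int idx_a =  2 + 0 * nx;   // (ii, jj) = ( 2, 0)
    const int idx_b = -1 + 1 * nx;   // (ii, jj) = (-1, 1)
    // both evaluate to 2, so two different velocity cells share the same flattened index
    std::printf("idx_a = %d, idx_b = %d\n", idx_a, idx_b);
    return 0;
}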

dpgrote (Member):

I do agree that there can be overlap when using negative indices. But I'm suggesting using a large value for nx, with 1290 being a good value. There would only be overlaps if there are particles with, for example, velocities near 0 and near -1290*dux, which is unlikely. This is what I meant in my comment: "negative values of the indices should be valid, as long as they don't overlap with positive values from the neighboring cell, which will be the case as long as the particles don't fill the 1290*du space."

}
else if (velocity_grid_type_str == "cartesian") {
m_velocity_grid_type = VelocityGridType::Cartesian;
utils::parser::getWithParser(
Member:

I was also thinking that these three could be combined into one input parameter, reducing the verbosity of the input file.

roelof-groenewald (Member Author):

I also considered that, but then I looked at our convention with similar parameters in other cases, and it seems that we generally use an array for the PICMI input but then specify each direction separately in the native input.
Examples include:

  • GaussianBunchDistribution, which takes an rms_bunch_size array as input but passes the parameters to the native input (and the same for centroid_position) as
        species.add_new_group_attr(source_name, 'x_rms', self.rms_bunch_size[0])
        species.add_new_group_attr(source_name, 'y_rms', self.rms_bunch_size[1])
        species.add_new_group_attr(source_name, 'z_rms', self.rms_bunch_size[2])
  • DensityDistributionBase takes lower_bound and upper_bound arrays but passes
        species.add_new_group_attr(source_name, 'xmin', self.lower_bound[0])
        species.add_new_group_attr(source_name, 'xmax', self.upper_bound[0])
        species.add_new_group_attr(source_name, 'ymin', self.lower_bound[1])
        species.add_new_group_attr(source_name, 'ymax', self.upper_bound[1])
        species.add_new_group_attr(source_name, 'zmin', self.lower_bound[2])
        species.add_new_group_attr(source_name, 'zmax', self.upper_bound[2])
  • Boundary potentials are also specified as separate input parameters for each boundary - lo/hi_x/y/z.

Of course there are counter examples like the specification of the domain lower and upper bounds and boundary condition types. So maybe we should decide on a single convention and start enforcing it for new code additions. Your preference is to pass arrays rather than individual direction parameters?

Member:

I looked around in the code and noticed this pattern: where there are multiple parameters like this, if they are all required, then they are input as a vector; otherwise they are input separately. This makes sense since, in the first case, putting them in a vector is compact and keeps closely coupled quantities together, while in the latter case, if they were input as a vector, a default or unset flag would be needed for the unset parameters in the vector.

I do agree that it would be good to have a standard here in how to specify related parameters like this.

roelof-groenewald (Member Author):

That makes sense, yes. Seeing as all three parameters are required in this case, I will switch to using an array as input.
Going forward I will also keep an eye out for this general rule.
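(For reference, the native-input side of that change might look roughly like the sketch below; the parameter name resampling_algorithm_delta_u, the pp_species_name ParmParse object, and the m_delta_u* member names are illustrative, and it assumes an array-valued getArrWithParser counterpart to the getWithParser call in the quoted snippet.)

std::vector<amrex::Real> delta_u;
utils::parser::getArrWithParser(
    pp_species_name, "resampling_algorithm_delta_u", delta_u);
WARPX_ALWAYS_ASSERT_WITH_MESSAGE(
    delta_u.size() == 3u,
    "resampling_algorithm_delta_u must have exactly three components (dux, duy, duz)");
// store the per-direction velocity bin sizes
m_delta_ux = delta_u[0];
m_delta_uy = delta_u[1];
m_delta_uz = delta_u[2];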

dpgrote (Member) left a comment:

Looks good. I'll approve as is, but a nice addition would be a CI test. This should be easy since you already have the resample_velocity_coincidence_thinning test.

roelof-groenewald (Member Author):

Thanks @dpgrote. I agree that adding a test to also cover the Cartesian velocity grid functionality would be good. I'll add one in a follow-up PR.

@roelof-groenewald roelof-groenewald merged commit 8a9bedc into ECP-WarpX:development May 2, 2024
45 checks passed
@roelof-groenewald roelof-groenewald deleted the cart_vel_merging branch May 2, 2024 21:43
Haavaan pushed a commit to Haavaan/WarpX that referenced this pull request Jun 5, 2024
Add option for Cartesian velocity grid with `VelocityCoincidenceThinning` (ECP-WarpX#4886)

* initial implementation of the grid based merging routine

* add compiler directives for different dimensions

* avoid implicit capture of `this` in lambda

* code clean-up

* use `PIdx::x` for r-coordinate while 4667 is still being reviewed

* fix clang-tidy errors

* another clang-tidy fix

* improve doc-strings

* use iterative heap sort since "SYCL kernel cannot call a recursive function"

* fix clang-tidy error

* fix sorting bug; add documentation and CI test

* add dummy CI benchmark file to get proper values

* update benchmark values based on Azure results

* rename `GridBasedMerging` -> `VelocityCoincidenceThinning`

* use `Algorithms::KineticEnergy`

* reorganize merging loop

* update CI benchmark values

* Relativistic correction in product particles' velocity calculation

* update benchmark values after changing energy calculation

* handle edge case with zero cluster momentum

* call redistribute after particle resampling to remove invalid particles

* use unsigned ints for indexing

* Revert "use unsigned ints for indexing"

This reverts commit abe027f.

* call `Redistribute` before merging

* code clean-up

* also check for `std::isnan` in edge case handling

* add reference for grid based merging

* check that cluster has total weight > 0 before merging

* add defense against numerical error leading to nan

* make resampling message more verbose

* use `deleteInvalidParticles` instead of `Redistribute`

* remove default values for merging parameters

* remove doc-string default specifications in picmi.py

* apply suggestions from code review

* update benchmark values; avoid possible nans

* add assert to prevent merging of photons

* add `BackwardCompatibility` check to `LevelingThinning`

* implement option for Cartesian velocity mesh

* use `enum` for velocity grid types

* fix Windows `uint` issue and clang-tidy error

* use `Reduce` functions for GPU compatibility

* merge kernels finding min and max velocities

* take array of du values in picmi

* avoid checking velocity grid type for every particle

* fix issue in picmi

* use array input for velocity bin sizes
Labels: enhancement (New feature or request)
3 participants