Add option for Cartesian velocity grid with VelocityCoincidenceThinning
#4886
Conversation
// get the minimum and maximum velocities to determine the velocity space
// grid boundaries
using PType = typename WarpXParticleContainer::SuperParticleType;
velocityBinCalculator.ux_min = ReduceMin( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::ux); }
);
velocityBinCalculator.uy_min = ReduceMin( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::uy); }
);
velocityBinCalculator.uz_min = ReduceMin( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::uz); }
);
velocityBinCalculator.ux_max = ReduceMax( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::ux); }
);
velocityBinCalculator.uy_max = ReduceMax( *pc,
    [=] AMREX_GPU_HOST_DEVICE (const PType& p)
    { return p.rdata(PIdx::uy); }
);
@WeiqunZhang this is the section in which I would like to perform a reduction over the particle tile rather than the whole particle container.
Something like
const int np = pti.numParticles();
GpuArray<ParticleReal, AMREX_SPACEDIM*2> result;
{
    using ReduceOpsT = TypeMultiplier<ReduceOps,
                                      ReduceOpMin[AMREX_SPACEDIM],
                                      ReduceOpMax[AMREX_SPACEDIM]>;
    using ReduceDataT = TypeMultiplier<ReduceData, ParticleReal[AMREX_SPACEDIM*2]>;
    ReduceOpsT reduce_op;
    ReduceDataT reduce_data(reduce_op);
    using ReduceTuple = typename ReduceDataT::Type;
    reduce_op.eval(np, reduce_data, [=] AMREX_GPU_DEVICE(int i) -> ReduceTuple {
        ....
        return {AMREX_D_DECL(ux,uy,uz),
                AMREX_D_DECL(ux,uy,uz)};
    });
    auto hv = reduce_data.value(reduce_op);
    result = tupleToArray(hv);
}
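For intuition, the fused per-direction min/max reduction above can be sketched in plain Python (a hedged, sequential stand-in for the GPU reduction; the sample momenta are made up, and the list `result` plays the role of the 6-component `ReduceTuple`):

```python
# Made-up particle momenta; each tuple stands in for (ux, uy, uz).
particles = [(0.3, -1.2, 0.7), (-0.5, 0.4, 2.1), (1.8, -0.1, -0.9)]

mins = [float('inf')] * 3
maxs = [float('-inf')] * 3
for ux, uy, uz in particles:            # stands in for the per-particle kernel
    for d, u in enumerate((ux, uy, uz)):
        mins[d] = min(mins[d], u)       # per-direction minimum
        maxs[d] = max(maxs[d], u)       # per-direction maximum

result = mins + maxs  # analogous to the 6-component reduce tuple
```

The point of fusing is that one pass over the tile yields all six values, instead of six separate `ReduceMin`/`ReduceMax` passes.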
Cool! Thanks so much for your help!
We probably should add a function to amrex to make this easier for the users. Something like

auto result = amrex::ParReduce(TypeList<ReduceOpMin[AMREX_SPACEDIM],ReduceOpMax[AMREX_SPACEDIM]>{},
                               TypeList<ParticleReal[AMREX_SPACEDIM*2]>{},
                               np, [=] AMREX_GPU_DEVICE (int i) -> TypeMultiplier<GpuTuple, ParticleReal[AMREX_SPACEDIM*2]> { return ...; });
Such a simplified function would be helpful. For example, it could improve the code in https://github.com/ECP-WarpX/WarpX/blob/development/Source/Diagnostics/ReducedDiags/ParticleExtrema.cpp. Note, however, that in both that case and my present case we need the min and max for each direction rather than the overall min and max (and I actually don't need the max of uz). I modified the code you suggested accordingly.
Thanks again for your help!
Python/pywarpx/picmi.py (Outdated)
self.resampling_algorithm_delta_ur = kw.pop('warpx_resampling_algorithm_delta_ur', None)
self.resampling_algorithm_n_theta = kw.pop('warpx_resampling_algorithm_n_theta', None)
self.resampling_algorithm_n_phi = kw.pop('warpx_resampling_algorithm_n_phi', None)
self.resampling_algorithm_delta_ux = kw.pop('warpx_resampling_algorithm_delta_ux', None)
In this case, all three numbers will be required. Perhaps have one input parameter that takes the three values, something like self.resampling_algorithm_delta_us.
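A hedged sketch of how a single array-valued parameter could replace the three per-direction kwargs (the name `warpx_resampling_algorithm_delta_u` and the length check are illustrative assumptions, not the final picmi API):

```python
def pop_delta_u(kw):
    # Pop one 3-vector kwarg instead of three scalar kwargs; since all
    # three components are required together, a single array keeps the
    # closely coupled values in one place.
    delta_u = kw.pop('warpx_resampling_algorithm_delta_u', None)
    if delta_u is not None and len(delta_u) != 3:
        raise ValueError('warpx_resampling_algorithm_delta_u needs 3 values')
    return delta_u
```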
for (int i = cell_start; i < cell_stop; ++i)
{
    int ii = -1, jj = -1, kk = -1;
    if (velocity_grid_type == VelocityGridType::Spherical) {
Could this check be put outside the loop so that it isn't done for every particle? Could what is being done in PR #4888 be done here?
Sounds good! I'll update accordingly. I was not aware of that technique for reducing register pressure. Thanks for pointing it out!
Digging a little deeper, I'm not sure whether the CompileTimeOptions will help in this case since the velocity binning is done inside a GPU loop already. But I have moved things around some so that now we only check the grid type once per tile rather than for every particle.
else if (velocity_grid_type == VelocityGridType::Cartesian) {
    ii = static_cast<int>((ux[indices[i]] - ux_min) / dux);
    jj = static_cast<int>((uy[indices[i]] - uy_min) / duy);
    kk = static_cast<int>((uz[indices[i]] - uz_min) / duz);
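The Cartesian binning above can be sketched in plain Python (a hedged illustration: the flattening with `nx`, `ny` bin counts is my assumption about how the single cell index is formed, not code from the PR):

```python
def cartesian_bin(u, u_min, du, nx, ny):
    # Per-direction bin: shift by the minimum, divide by the bin size.
    ii = int((u[0] - u_min[0]) / du[0])
    jj = int((u[1] - u_min[1]) / du[1])
    kk = int((u[2] - u_min[2]) / du[2])
    # Flatten (ii, jj, kk) into a single non-negative index.
    return ii + jj * nx + kk * nx * ny
```

Because `u_min` is subtracted first, all indices are non-negative and the flattened index is unique, which is the property discussed below.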
I was thinking of a way to avoid having to calculate the min and max. How does this look?
ii = static_cast<int>(std::fmod(ux[indices[i]] / dux, 1290.));

And have n1 and n2 be 1290. This will keep the indices spread out as long as the velocities don't fill out the space 1290*du. Note that 1290 is approximately the cube root of the max integer, the type of bin_array. Yes, I know this is a bit hacky, but it should work in most cases.
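The trick can be sketched in Python (`math.fmod` mirrors `std::fmod`, including keeping the sign of the first argument; the helper name is illustrative):

```python
import math

N = 1290  # ~ cube root of the max int (the type of bin_array)

def mod_bin(ux, dux):
    # Bin index without knowing ux_min: wrap ux/dux into (-N, N).
    # Indices stay distinct as long as the velocities span < N*dux.
    return int(math.fmod(ux / dux, N))
```

The last assertion in the test below shows the failure mode being discussed: a velocity offset by exactly `N*dux` wraps onto the same index.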
That's a good point about being able to avoid the need to calculate max / min values by simply using the full range of integer values. The single cell index will be less straightforward to calculate when allowing negative ii, jj and kk values. But this can be circumvented by using

auto ux_min = 645 * dux;
ii = static_cast<int>((ux[indices[i]] - ux_min) / dux);
The question for me is whether avoiding the calculation of max / min values is worth sacrificing the algorithmic flexibility to have:
- velocity grids not centered at 0 m/s
- different numbers of cells in different directions - for example, a species with a low temperature in x but a large temperature in z would naturally allow a small number of cells in vx (< 1290) and a large number of cells in vz (> 1290) if the same du values are used

Seeing as the min / max calculation is very likely not the limiting step in this scheme (it is GPU parallelized), I am inclined to favor the algorithmic flexibility over the minor performance improvement. Do you disagree?
In my suggestion, negative values of indices should be valid, as long as they don't overlap with positive values from the neighboring cell, which would be the case if the particles don't fill the 1290*du space. The centering would in some sense be arbitrary (because of the modulus).
In the second case, if the same du was used, then the thinning in x would be poor since all of the particles would fall into a small number of bins, so particles with widely varying ux would be combined. Also, if there's a case which needs more than 1290 bins, there are already a large number of particles per cell and there probably should be thinning happening earlier.
Though, if as you say the calculation of the min and max takes little time, then there is no need to avoid it. It is just a fun exercise to think about how it could be done.
I agree it is fun to think about alternative and more clever ways of doing the combinatorics, so I would like to understand your strategy.
I tried to illustrate the problem I'm having with the negative indices in the following table (for a 2d grid to make the visualization easier):

The dashed lines represent the ux=0, uy=0 axes, the blue row gives the ii value for each cell and the orange row the jj values. The integer in each cell is calculated from idx = ii + jj * nx where in this reduced case nx = 3. As you can see, I get many cells with the same index, which is why I said the index value will be less straightforward to calculate. The top left and bottom right quadrants are fine but the others have duplicated cell indices. The all-positive-indices approach essentially shifts the origin so that only the top left quadrant is used. Am I missing something? Or do you have a different way of finding idx from ii and jj?
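The collision being discussed can be checked with a quick sketch (the small signed ii, jj window and the two nx values are illustrative):

```python
nx_small, nx_large = 3, 1290
span = range(-2, 3)  # signed ii, jj in a small window around zero

idx_small = {(ii, jj): ii + jj * nx_small for ii in span for jj in span}
idx_large = {(ii, jj): ii + jj * nx_large for ii in span for jj in span}

# With nx = 3, (2, 0) and (-1, 1) flatten to the same index ...
collides = idx_small[(2, 0)] == idx_small[(-1, 1)]
# ... but with nx = 1290 all 25 cells get distinct indices, because
# |ii| never gets close to nx.
distinct = len(set(idx_large.values())) == len(idx_large)
```

This is the crux of both viewpoints: the duplication is real for small nx, and disappears when nx is much larger than the occupied index range.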
I do agree that there can be overlap when using negative indices. But I'm suggesting using a large value for nx, with 1290 being a good value. There would only be overlaps if there are particles with, for example, velocities near 0 and near -1290*dux, which is unlikely. This is what I mean in my comment "negative values of indices should be valid, as long as they don't overlap with positive values from the neighboring cell, which would be the case if the particles don't fill the 1290*du space."
}
else if (velocity_grid_type_str == "cartesian") {
    m_velocity_grid_type = VelocityGridType::Cartesian;
    utils::parser::getWithParser(
I was also thinking that these three could be combined into one input parameter, reducing the verbosity of the input file.
I also considered that, but then I looked at our convention with similar parameters in other cases and it seems that we generally use an array for PICMI input but then specify each direction separately in the native input.
Examples include:
- GaussianBunchDistribution takes the rms_bunch_size array as input but passes parameters to the native input (and the same with centroid_position) as
species.add_new_group_attr(source_name, 'x_rms', self.rms_bunch_size[0])
species.add_new_group_attr(source_name, 'y_rms', self.rms_bunch_size[1])
species.add_new_group_attr(source_name, 'z_rms', self.rms_bunch_size[2])
- DensityDistributionBase takes lower_bound and upper_bound arrays but passes
species.add_new_group_attr(source_name, 'xmin', self.lower_bound[0])
species.add_new_group_attr(source_name, 'xmax', self.upper_bound[0])
species.add_new_group_attr(source_name, 'ymin', self.lower_bound[1])
species.add_new_group_attr(source_name, 'ymax', self.upper_bound[1])
species.add_new_group_attr(source_name, 'zmin', self.lower_bound[2])
species.add_new_group_attr(source_name, 'zmax', self.upper_bound[2])
- Boundary potentials are also specified as separate input parameters for each boundary - lo/hi_x/y/z.
Of course there are counter examples like the specification of the domain lower and upper bounds and boundary condition types. So maybe we should decide on a single convention and start enforcing it for new code additions. Your preference is to pass arrays rather than individual direction parameters?
I looked around in the code and noticed this pattern. Where there are multiple parameters like this, if they are all required, then they are input as a vector. Otherwise they are input separately. This makes sense since in the first case, putting them in a vector is compact and keeps closely coupled quantities together. In the latter, if they were input as a vector, a default or unset flag would be needed to specify the unset parameters in the vector.
I do agree that it would be good to have a standard here in how to specify related parameters like this.
That makes sense, yes. Seeing as all three parameters are required in this case, I will switch to using an array as input.
Going forward I will also keep an eye out for this general rule.
Looks good. I'll approve as is, but a nice addition would be adding a CI test. This should be easy since you already have the resample_velocity_coincidence_thinning test.

Thanks @dpgrote. I agree adding a test would be good to also cover the specific functionality for the Cartesian velocity grid. I'll add one in a follow-up PR.
Add option for Cartesian velocity grid with `VelocityCoincidenceThinning` (ECP-WarpX#4886)

* initial implementation of the grid based merging routine
* add compiler directives for different dimensions
* avoid implicit capture of `this` in lambda
* code clean-up
* use `PIdx::x` for r-coordinate while 4667 is still being reviewed
* fix clang-tidy errors
* another clang-tidy fix
* improve doc-strings
* use iterative heap sort since "SYCL kernel cannot call a recursive function"
* fix clang-tidy error
* fix sorting bug; add documentation and CI test
* add dummy CI benchmark file to get proper values
* update benchmark values based on Azure results
* rename `GridBasedMerging` -> `VelocityCoincidenceThinning`
* use `Algorithms::KineticEnergy`
* reorganize merging loop
* update CI benchmark values
* Relativistic correction in product particles' velocity calculation
* update benchmark values after changing energy calculation
* handle edge case with zero cluster momentum
* call redistribute after particle resampling to remove invalid particles
* use unsigned ints for indexing
* Revert "use unsigned ints for indexing" (this reverts commit abe027f)
* call `Redistribute` before merging
* code clean-up
* also check for `std::isnan` in edge case handling
* add reference for grid based merging
* check that cluster has total weight > 0 before merging
* add defense against numerical error leading to nan
* make resampling message more verbose
* use `deleteInvalidParticles` instead of `Redistribute`
* remove default values for merging parameters
* remove doc-string default specifications in picmi.py
* apply suggestions from code review
* update benchmark values; avoid possible nans
* add assert to prevent merging of photons
* add `BackwardCompatibility` check to `LevelingThinning`
* implement option for Cartesian velocity mesh
* use `enum` for velocity grid types
* fix Windows `uint` issue and clang-tidy error
* use `Reduce` functions for GPU compatibility
* merge kernels finding min and max velocities
* take array of du values in picmi
* avoid checking velocity grid type for every particle
* fix issue in picmi
* use array input for velocity bin sizes
Follow-up to #4820. Adding an option to use a Cartesian velocity grid.
Testing with the same problem as before, the merging seems to work as expected (figure omitted: velocity distribution shown in blue before merging and orange afterwards).
About 50% of particles were merged in this test with total energy change = -8.836e-13% and total weight change = 6.7259e-14%.