Make use of faster RNG implementation for ionization #1542

n01r · 2016-08-02T13:26:58Z

Important refactoring

Greatly overdue, the random number generator for the ADK model (and all future rate models for ionization) has been exchanged.
Before we used an RNG that was creating new states during each time step of the simulation which were unique for each step, particle species, cell and MPI rank.
Now the state is created only once and then saved as well as changed for each cell after each use. This implies the uniqueness of the seed for the aforementioned conditions and reduces workload at the same time.

This now allows for faster testing of all the ionization models and more importantly: highly increased simulation speed for more sophisticated ionization models than BSI.

Big kudos to @Flamefire ! 🎉

This pull request depends on the changes in #1541

n01r · 2016-08-02T13:42:47Z

Very (!) naive runtime testing of the branch:

Ions activated, Ionization with ADKLinPol, 1 node, 4 NVIDIA K20 GPUs
mpiexec -n 4 ./picongpu -d 2 2 1 -g 128 128 64 -s 500 --e_png.period 32 --e_png.axis yz --e_png.slicePoint 0.5 --e_png.folder pngElectronsYZ

before

initialization time: 58sec 521msec = 58 sec
  0 % =        0 | time elapsed:                    0msec | avg time per step:   0msec
  5 % =       25 | time elapsed:             6sec 187msec | avg time per step: 247msec
 10 % =       50 | time elapsed:            12sec 371msec | avg time per step: 247msec
 15 % =       75 | time elapsed:            18sec 557msec | avg time per step: 247msec
 20 % =      100 | time elapsed:            24sec 742msec | avg time per step: 247msec
 25 % =      125 | time elapsed:            30sec 928msec | avg time per step: 247msec
 30 % =      150 | time elapsed:            37sec 111msec | avg time per step: 247msec
 35 % =      175 | time elapsed:            43sec 340msec | avg time per step: 249msec
 40 % =      200 | time elapsed:            49sec 798msec | avg time per step: 258msec
 45 % =      225 | time elapsed:            56sec 501msec | avg time per step: 268msec
 50 % =      250 | time elapsed:       1min  3sec 376msec | avg time per step: 274msec
 55 % =      275 | time elapsed:       1min 10sec 374msec | avg time per step: 279msec
 60 % =      300 | time elapsed:       1min 17sec 515msec | avg time per step: 285msec
 65 % =      325 | time elapsed:       1min 24sec 797msec | avg time per step: 291msec
 70 % =      350 | time elapsed:       1min 32sec 265msec | avg time per step: 298msec
 75 % =      375 | time elapsed:       1min 39sec 761msec | avg time per step: 299msec
 80 % =      400 | time elapsed:       1min 47sec 160msec | avg time per step: 295msec
 85 % =      425 | time elapsed:       1min 54sec 450msec | avg time per step: 291msec
 90 % =      450 | time elapsed:       2min  1sec 644msec | avg time per step: 287msec
 95 % =      475 | time elapsed:       2min  8sec 771msec | avg time per step: 285msec
100 % =      500 | time elapsed:       2min 15sec 834msec | avg time per step: 282msec
calculation  simulation time:  2min 15sec 834msec = 135 sec
full simulation time:  3min 14sec 759msec = 194 sec

after

initialization time: 58sec 208msec = 58 sec
  0 % =        0 | time elapsed:                    0msec | avg time per step:   0msec
  5 % =       25 | time elapsed:                  441msec | avg time per step:  17msec
 10 % =       50 | time elapsed:                  881msec | avg time per step:  17msec
 15 % =       75 | time elapsed:             1sec 319msec | avg time per step:  17msec
 20 % =      100 | time elapsed:             1sec 756msec | avg time per step:  17msec
 25 % =      125 | time elapsed:             2sec 192msec | avg time per step:  17msec
 30 % =      150 | time elapsed:             2sec 629msec | avg time per step:  17msec
 35 % =      175 | time elapsed:             3sec 113msec | avg time per step:  19msec
 40 % =      200 | time elapsed:             3sec 840msec | avg time per step:  29msec
 45 % =      225 | time elapsed:             4sec 816msec | avg time per step:  39msec
 50 % =      250 | time elapsed:             5sec 959msec | avg time per step:  45msec
 55 % =      275 | time elapsed:             7sec 230msec | avg time per step:  50msec
 60 % =      300 | time elapsed:             8sec 646msec | avg time per step:  56msec
 65 % =      325 | time elapsed:            10sec 194msec | avg time per step:  61msec
 70 % =      350 | time elapsed:            11sec 928msec | avg time per step:  69msec
 75 % =      375 | time elapsed:            13sec 696msec | avg time per step:  70msec
 80 % =      400 | time elapsed:            15sec 364msec | avg time per step:  66msec
 85 % =      425 | time elapsed:            16sec 919msec | avg time per step:  62msec
 90 % =      450 | time elapsed:            18sec 371msec | avg time per step:  58msec
 95 % =      475 | time elapsed:            19sec 755msec | avg time per step:  55msec
100 % =      500 | time elapsed:            21sec  81msec | avg time per step:  53msec
calculation  simulation time: 21sec  81msec = 21 sec
full simulation time:  1min 19sec 679msec = 79 sec

speedup: 6.43x (ratio of the calculation simulation times)

bussmann · 2016-08-02T14:27:23Z

Nice! Is this the speedup you expected?

n01r · 2016-08-02T14:47:16Z

Yes, very much so! It will change with different domain sizes and with the number of GPUs used, of course but I expected a speedup of about 5-10x. It's finally feasible to run simulations with ADK and whatever else I'll implement from now on.

PrometheusPi · 2016-08-03T10:50:47Z

I added a WIP because it still depends on #1541.

n01r · 2016-08-09T13:36:54Z

Waiting for #1547

PrometheusPi · 2016-08-09T14:01:36Z

#1547 has been merged

PrometheusPi · 2016-08-10T07:10:05Z

@n01r please:

rebase against the current dev
safeguard your RNG init with a Boolean OR in MySimulation.hpp

n01r · 2016-08-11T13:57:48Z

src/picongpu/include/traits/UsesRNG.hpp

+ * ::type (boost::mpl::bool_<>)
+ */
+template<typename T_Object>
+struct UsesRNG


This trait looks fairly generic to me. Should it rather be in PMacc?

Note: It is more common to have a value result defined as a member value rather than a subtype. This allows e.g. deriving from boost::true_type which is much more compact.
In general: Type results as type value results as value and a bool certainly is a value

But isn't boost::mpl::bool_<C> a type? Just so that boost::mpl::bool_<C>::value == C.

This is correct. But with this trait you want a value as its return value, not a type. Or is there any benefit in returning type<true> instead of true?
Examples: boost::make_unsigned<...> obviously returns a type, but boost::is_unsigned<...> also obviously returns a bool value(!).

Is this true as well for the HasFlag trait, then? #1548

I stand corrected: I forgot the ...::type in the above example which turned it into unevaluated expressions not triggering the error.
As per offline discussion with @ax3l the solution is to inherit from bmpl::bool_<true/false> like in the following, corrected example:

#include <boost/mpl/vector.hpp> #include <boost/mpl/copy_if.hpp> #include <boost/mpl/if.hpp> #include <boost/type_traits.hpp> namespace bmpl = boost::mpl; template<class T> struct Foo: public bmpl::bool_<true>{}; template<> struct Foo<int>: public bmpl::bool_<false>{}; typedef bmpl::vector<int, unsigned> MyVec; typedef typename bmpl::copy_if<MyVec, boost::is_unsigned<bmpl::_1> >::type Result; typedef typename bmpl::copy_if<MyVec, Foo<bmpl::_1> >::type Result2; typedef typename bmpl::copy_if<MyVec, bmpl::and_<Foo<bmpl::_1>, boost::is_unsigned<bmpl::_1> > >::type Result3; typedef typename bmpl::if_<bmpl::and_<Foo<int>, Foo<unsigned> >, int, bool>::type Result4; int main(){ if(Foo<int>::value) return 1; else return 0; }

yep that is wonderful, removes the useRNG run-time only function need and makes the world a pretty place.

Note that boost integral_constants are in fact nullary metafunctions. See their definition:

[...]struct integral_constant : public mpl::integral_c<T, val>[...]

This is why the above use of boost::is_unsigned works. This is defined here and also in C++11

So as a matter of fact you can (and should) inherit from boost::true_type/boost::false_type which was my original statement :)

For more complex checks one can use the same approach as integral constants:

template<class T> class MyTrait{ typedef MyTrait<T> type; // For metafunction capability [...] // Some checks and finally: BOOST_STATIC_CONSTEXPR bool value = <value-based-on-checks>; }

Note that deriving from true_type/false_type also allows if(is_unsigned<int>()) due to conversion operators defined, which is not done in the more complex case.

also, correcting myself, they are mpl compatible: boost note.

nevertheless, @Flamefire your code snipped is outdated, recent versions of boost type traits (see link) are not inheriting from mpl::integral_c any more: (code, compatibility note) - but still are compatible nullary meta-functions.

@n01r use:

#include <boost/type_traits/integral_constant.hpp> // default: template<typename T_Object> struct UsesRNG : public boost::false_type { };

and specialize

#include <boost/type_traits/integral_constant.hpp> namespace traits { template<typename T_IonizationAlgorithm, typename T_DestSpecies, typename T_SrcSpecies> struct UsesRNG<particles::ionization::ADK_Impl<T_IonizationAlgorithm, T_DestSpecies, T_SrcSpecies> > : public boost::true_type { }; } // namespace traits

@n01r pls implement to finalise PR

n01r · 2016-08-11T14:03:25Z

@PrometheusPi there you go 😉

PrometheusPi · 2016-08-11T14:17:59Z

@n01r thank you

psychocoderHPC · 2016-08-11T14:27:28Z

src/libPMacc/include/particles/traits/FilterByRNG.hpp

+ * \TODO rather pass a sequence of objects, like ionization modules
+ */
+template<typename T_MPLSeq>
+struct FilterByRNG


The name must reflect that only species with ionizer can be handled

@n01r rename: FilterIonsByRNG

add to description: @tparam T_MPLSeq sequence of particle species (ions) with ionization enabled

actually dependency logic is messed up in this place already.
RNG might be required by a species independent of ionization, too.

offline result: @n01r will fix that in a follow-up PR (not important for release)

ax3l · 2016-08-16T08:27:14Z

transferred review powers by @PrometheusPi - thank you for the great work guys!

@Flamefire

Exchange the RNG (Random Number Generator) for a faster version in ADK_Impl.hpp (and all future ionization models). Instead of creating new states for the RNG during each time step the state is now saved. It has the dimensions of the grid and is mapped to the cells. Whenever a new random number is created the state changes in that cell. Therefore it is also unique between time steps and particle species but the RNG does not do any superfluous work. Kudos to @Flamefire Status: compiles [x], runs [x]

`ionizationMethods.hpp`: removed the old implementation that created unique seeds and RNG states per time step, species, cell and MPI rank Status: compiles [x], runs [x]

RNGFactory is only created when it is needed.

psychocoderHPC · 2016-08-29T11:16:36Z

src/picongpu/include/particles/ionization/byField/ADK/ADK_Impl.hpp

-            /* \todo fix: cannot PMACC_ALIGN() because it seems to be too large */
+            /* random number generator */
+            typedef PMacc::random::RNGProvider<simDim, PMacc::random::methods::XorMin> RNGFactory;
+            typedef PMacc::random::distributions::Uniform<float> Distribution;


This must be a float_X!

n01r mentioned this pull request Aug 3, 2016

Add Keldysh ionization model #1543

Merged

PrometheusPi added the refactoring code change to improve performance or to unify a concept but does not change public API label Aug 3, 2016

PrometheusPi added this to the 0.2.0: Open Beta milestone Aug 3, 2016

PrometheusPi added the wait-4-author/WIP label Aug 3, 2016

n01r force-pushed the topic-useFasterRNG branch from d925dfb to 362f469 Compare August 4, 2016 15:44

PrometheusPi self-assigned this Aug 10, 2016

n01r force-pushed the topic-useFasterRNG branch from 362f469 to 7edd7a3 Compare August 11, 2016 13:54

n01r reviewed Aug 11, 2016
View reviewed changes

PrometheusPi removed the wait-4-author/WIP label Aug 11, 2016

psychocoderHPC reviewed Aug 11, 2016
View reviewed changes

ax3l assigned ax3l and unassigned PrometheusPi Aug 16, 2016

n01r added 2 commits August 18, 2016 00:46

Remove old RNG implementation for ionization

e73de97

`ionizationMethods.hpp`: removed the old implementation that created unique seeds and RNG states per time step, species, cell and MPI rank Status: compiles [x], runs [x]

n01r force-pushed the topic-useFasterRNG branch from 7edd7a3 to cc4047d Compare August 17, 2016 23:11

Modified UsesRNG trait, encapsulate RNGFactory

44ecfed

RNGFactory is only created when it is needed.

n01r force-pushed the topic-useFasterRNG branch from cc4047d to 44ecfed Compare August 17, 2016 23:15

ax3l added the component: core in PIConGPU (core application) label Aug 18, 2016

ax3l merged commit 8715467 into ComputationalRadiationPhysics:dev Aug 18, 2016

n01r deleted the topic-useFasterRNG branch August 19, 2016 09:01

psychocoderHPC reviewed Aug 29, 2016
View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make use of faster RNG implementation for ionization #1542

Make use of faster RNG implementation for ionization #1542

n01r commented Aug 2, 2016

n01r commented Aug 2, 2016

bussmann commented Aug 2, 2016

n01r commented Aug 2, 2016

PrometheusPi commented Aug 3, 2016 •

edited

n01r commented Aug 9, 2016

PrometheusPi commented Aug 9, 2016

PrometheusPi commented Aug 10, 2016 •

edited

n01r Aug 11, 2016

Flamefire Aug 11, 2016 •

edited

n01r Aug 11, 2016

Flamefire Aug 11, 2016

n01r Aug 12, 2016

Flamefire Aug 16, 2016

ax3l Aug 16, 2016

Flamefire Aug 16, 2016 •

edited by ax3l

ax3l Aug 17, 2016 •

edited

ax3l Aug 17, 2016 •

edited

n01r commented Aug 11, 2016

PrometheusPi commented Aug 11, 2016

psychocoderHPC Aug 11, 2016

ax3l Aug 16, 2016

ax3l Aug 16, 2016

ax3l Aug 17, 2016 •

edited

ax3l commented Aug 16, 2016

psychocoderHPC Aug 29, 2016 •

edited

Make use of faster RNG implementation for ionization #1542

Make use of faster RNG implementation for ionization #1542

Conversation

n01r commented Aug 2, 2016

n01r commented Aug 2, 2016

bussmann commented Aug 2, 2016

n01r commented Aug 2, 2016

PrometheusPi commented Aug 3, 2016 • edited

n01r commented Aug 9, 2016

PrometheusPi commented Aug 9, 2016

PrometheusPi commented Aug 10, 2016 • edited

Choose a reason for hiding this comment

Flamefire Aug 11, 2016 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Flamefire Aug 16, 2016 • edited by ax3l

Choose a reason for hiding this comment

ax3l Aug 17, 2016 • edited

Choose a reason for hiding this comment

ax3l Aug 17, 2016 • edited

Choose a reason for hiding this comment

n01r commented Aug 11, 2016

PrometheusPi commented Aug 11, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ax3l Aug 17, 2016 • edited

Choose a reason for hiding this comment

ax3l commented Aug 16, 2016

psychocoderHPC Aug 29, 2016 • edited

Choose a reason for hiding this comment

PrometheusPi commented Aug 3, 2016 •

edited

PrometheusPi commented Aug 10, 2016 •

edited

Flamefire Aug 11, 2016 •

edited

Flamefire Aug 16, 2016 •

edited by ax3l

ax3l Aug 17, 2016 •

edited

ax3l Aug 17, 2016 •

edited

ax3l Aug 17, 2016 •

edited

psychocoderHPC Aug 29, 2016 •

edited