Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make use of faster RNG implementation for ionization #1542

Merged
merged 3 commits into from Aug 18, 2016

Conversation

n01r
Copy link
Member

@n01r n01r commented Aug 2, 2016

Important refactoring

Greatly overdue, the random number generator for the ADK model (and all future rate models for ionization) has been exchanged.
Before we used an RNG that was creating new states during each time step of the simulation which were unique for each step, particle species, cell and MPI rank.
Now the state is created only once and then saved as well as changed for each cell after each use. This implies the uniqueness of the seed for the aforementioned conditions and reduces workload at the same time.

This now allows for faster testing of all the ionization models and more importantly: highly increased simulation speed for more sophisticated ionization models than BSI.

Big kudos to @Flamefire ! 🎉

This pull request depends on the changes in #1541

@n01r
Copy link
Member Author

n01r commented Aug 2, 2016

Very (!) naive runtime testing of the branch:

Ions activated, Ionization with ADKLinPol, 1 node, 4 NVIDIA K20 GPUs
mpiexec -n 4 ./picongpu -d 2 2 1 -g 128 128 64 -s 500 --e_png.period 32 --e_png.axis yz --e_png.slicePoint 0.5 --e_png.folder pngElectronsYZ

before

initialization time: 58sec 521msec = 58 sec
  0 % =        0 | time elapsed:                    0msec | avg time per step:   0msec
  5 % =       25 | time elapsed:             6sec 187msec | avg time per step: 247msec
 10 % =       50 | time elapsed:            12sec 371msec | avg time per step: 247msec
 15 % =       75 | time elapsed:            18sec 557msec | avg time per step: 247msec
 20 % =      100 | time elapsed:            24sec 742msec | avg time per step: 247msec
 25 % =      125 | time elapsed:            30sec 928msec | avg time per step: 247msec
 30 % =      150 | time elapsed:            37sec 111msec | avg time per step: 247msec
 35 % =      175 | time elapsed:            43sec 340msec | avg time per step: 249msec
 40 % =      200 | time elapsed:            49sec 798msec | avg time per step: 258msec
 45 % =      225 | time elapsed:            56sec 501msec | avg time per step: 268msec
 50 % =      250 | time elapsed:       1min  3sec 376msec | avg time per step: 274msec
 55 % =      275 | time elapsed:       1min 10sec 374msec | avg time per step: 279msec
 60 % =      300 | time elapsed:       1min 17sec 515msec | avg time per step: 285msec
 65 % =      325 | time elapsed:       1min 24sec 797msec | avg time per step: 291msec
 70 % =      350 | time elapsed:       1min 32sec 265msec | avg time per step: 298msec
 75 % =      375 | time elapsed:       1min 39sec 761msec | avg time per step: 299msec
 80 % =      400 | time elapsed:       1min 47sec 160msec | avg time per step: 295msec
 85 % =      425 | time elapsed:       1min 54sec 450msec | avg time per step: 291msec
 90 % =      450 | time elapsed:       2min  1sec 644msec | avg time per step: 287msec
 95 % =      475 | time elapsed:       2min  8sec 771msec | avg time per step: 285msec
100 % =      500 | time elapsed:       2min 15sec 834msec | avg time per step: 282msec
calculation  simulation time:  2min 15sec 834msec = 135 sec
full simulation time:  3min 14sec 759msec = 194 sec

after

initialization time: 58sec 208msec = 58 sec
  0 % =        0 | time elapsed:                    0msec | avg time per step:   0msec
  5 % =       25 | time elapsed:                  441msec | avg time per step:  17msec
 10 % =       50 | time elapsed:                  881msec | avg time per step:  17msec
 15 % =       75 | time elapsed:             1sec 319msec | avg time per step:  17msec
 20 % =      100 | time elapsed:             1sec 756msec | avg time per step:  17msec
 25 % =      125 | time elapsed:             2sec 192msec | avg time per step:  17msec
 30 % =      150 | time elapsed:             2sec 629msec | avg time per step:  17msec
 35 % =      175 | time elapsed:             3sec 113msec | avg time per step:  19msec
 40 % =      200 | time elapsed:             3sec 840msec | avg time per step:  29msec
 45 % =      225 | time elapsed:             4sec 816msec | avg time per step:  39msec
 50 % =      250 | time elapsed:             5sec 959msec | avg time per step:  45msec
 55 % =      275 | time elapsed:             7sec 230msec | avg time per step:  50msec
 60 % =      300 | time elapsed:             8sec 646msec | avg time per step:  56msec
 65 % =      325 | time elapsed:            10sec 194msec | avg time per step:  61msec
 70 % =      350 | time elapsed:            11sec 928msec | avg time per step:  69msec
 75 % =      375 | time elapsed:            13sec 696msec | avg time per step:  70msec
 80 % =      400 | time elapsed:            15sec 364msec | avg time per step:  66msec
 85 % =      425 | time elapsed:            16sec 919msec | avg time per step:  62msec
 90 % =      450 | time elapsed:            18sec 371msec | avg time per step:  58msec
 95 % =      475 | time elapsed:            19sec 755msec | avg time per step:  55msec
100 % =      500 | time elapsed:            21sec  81msec | avg time per step:  53msec
calculation  simulation time: 21sec  81msec = 21 sec
full simulation time:  1min 19sec 679msec = 79 sec

speedup: 6.43x (ratio of the calculation simulation times)

@bussmann
Copy link
Member

bussmann commented Aug 2, 2016

Nice! Is this the speedup you expected?

@n01r
Copy link
Member Author

n01r commented Aug 2, 2016

Yes, very much so! It will change with different domain sizes and with the number of GPUs used, of course but I expected a speedup of about 5-10x. It's finally feasible to run simulations with ADK and whatever else I'll implement from now on.

@PrometheusPi PrometheusPi added the refactoring code change to improve performance or to unify a concept but does not change public API label Aug 3, 2016
@PrometheusPi PrometheusPi added this to the 0.2.0: Open Beta milestone Aug 3, 2016
@PrometheusPi
Copy link
Member

PrometheusPi commented Aug 3, 2016

I added a WIP because it still depends on #1541.

@n01r
Copy link
Member Author

n01r commented Aug 9, 2016

Waiting for #1547

@PrometheusPi
Copy link
Member

#1547 has been merged

@PrometheusPi PrometheusPi self-assigned this Aug 10, 2016
@PrometheusPi
Copy link
Member

PrometheusPi commented Aug 10, 2016

@n01r please:

  • rebase against the current dev
  • safeguard your RNG init with a Boolean OR in MySimulation.hpp

* ::type (boost::mpl::bool_<>)
*/
template<typename T_Object>
struct UsesRNG
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trait looks fairly generic to me. Should it rather be in PMacc?

Copy link
Contributor

@Flamefire Flamefire Aug 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: It is more common to have a value result defined as a member value rather than a subtype. This allows e.g. deriving from boost::true_type which is much more compact.
In general: Type results as type value results as value and a bool certainly is a value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't boost::mpl::bool_<C> a type? Just so that boost::mpl::bool_<C>::value == C.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct. But with this trait you want a value as its return value, not a type. Or is there any benefit in returning type<true> instead of true?
Examples: boost::make_unsigned<...> obviously returns a type, but boost::is_unsigned<...> also obviously returns a bool value(!).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true as well for the HasFlag trait, then? #1548

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stand corrected: I forgot the ...::type in the above example which turned it into unevaluated expressions not triggering the error.
As per offline discussion with @ax3l the solution is to inherit from bmpl::bool_<true/false> like in the following, corrected example:

#include <boost/mpl/vector.hpp>
#include <boost/mpl/copy_if.hpp>
#include <boost/mpl/if.hpp>
#include <boost/type_traits.hpp>

namespace bmpl = boost::mpl;

template<class T>
struct Foo: public bmpl::bool_<true>{};

template<>
struct Foo<int>: public bmpl::bool_<false>{};

typedef bmpl::vector<int, unsigned> MyVec;

typedef typename bmpl::copy_if<MyVec, boost::is_unsigned<bmpl::_1> >::type Result;
typedef typename bmpl::copy_if<MyVec, Foo<bmpl::_1> >::type Result2;
typedef typename bmpl::copy_if<MyVec, bmpl::and_<Foo<bmpl::_1>, boost::is_unsigned<bmpl::_1> > >::type Result3;
typedef typename bmpl::if_<bmpl::and_<Foo<int>, Foo<unsigned> >, int, bool>::type Result4;

int main(){
    if(Foo<int>::value) return 1;
    else return 0;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that is wonderful, removes the useRNG run-time only function need and makes the world a pretty place.

Copy link
Contributor

@Flamefire Flamefire Aug 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that boost integral_constants are in fact nullary metafunctions. See their definition:

[...]struct integral_constant : public mpl::integral_c<T, val>[...]

This is why the above use of boost::is_unsigned works. This is defined here and also in C++11

So as a matter of fact you can (and should) inherit from boost::true_type/boost::false_type which was my original statement :)

For more complex checks one can use the same approach as integral constants:

template<class T> class MyTrait{
    typedef MyTrait<T> type; // For metafunction capability
    [...] // Some checks and finally:
    BOOST_STATIC_CONSTEXPR bool value = <value-based-on-checks>;
}

Note that deriving from true_type/false_type also allows if(is_unsigned<int>()) due to conversion operators defined, which is not done in the more complex case.

Copy link
Member

@ax3l ax3l Aug 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, correcting myself, they are mpl compatible: boost note.

nevertheless, @Flamefire your code snipped is outdated, recent versions of boost type traits (see link) are not inheriting from mpl::integral_c any more: (code, compatibility note) - but still are compatible nullary meta-functions.

@n01r use:

#include <boost/type_traits/integral_constant.hpp>

// default:
template<typename T_Object>
struct UsesRNG : public boost::false_type
{
};

and specialize

#include <boost/type_traits/integral_constant.hpp>

namespace traits
{

    template<typename T_IonizationAlgorithm, typename T_DestSpecies, typename T_SrcSpecies>
    struct UsesRNG<particles::ionization::ADK_Impl<T_IonizationAlgorithm, T_DestSpecies, T_SrcSpecies> > :
        public boost::true_type
    {
    };
} // namespace traits

Copy link
Member

@ax3l ax3l Aug 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n01r pls implement to finalise PR

@n01r
Copy link
Member Author

n01r commented Aug 11, 2016

@PrometheusPi there you go 😉

@PrometheusPi
Copy link
Member

@n01r thank you

* \TODO rather pass a sequence of objects, like ionization modules
*/
template<typename T_MPLSeq>
struct FilterByRNG
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name must reflect that only species with ionizer can be handled

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n01r rename: FilterIonsByRNG

add to description: @tparam T_MPLSeq sequence of particle species (ions) with ionization enabled

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually dependency logic is messed up in this place already.
RNG might be required by a species independent of ionization, too.

Copy link
Member

@ax3l ax3l Aug 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offline result: @n01r will fix that in a follow-up PR (not important for release)

@ax3l
Copy link
Member

ax3l commented Aug 16, 2016

transferred review powers by @PrometheusPi - thank you for the great work guys!

@ax3l ax3l assigned ax3l and unassigned PrometheusPi Aug 16, 2016
Exchange the RNG (Random Number Generator) for a faster version in
ADK_Impl.hpp (and all future ionization models). Instead of creating new
states for the RNG during each time step the state is now saved. It has the
dimensions of the grid and is mapped to the cells. Whenever a new random number
is created the state changes in that cell. Therefore it is also unique between
time steps and particle species but the RNG does not do any superfluous work.

Kudos to @Flamefire

Status: compiles [x], runs [x]
`ionizationMethods.hpp`:
    removed the old implementation that created unique seeds and RNG states
    per time step, species, cell and MPI rank

Status: compiles [x], runs [x]
RNGFactory is only created when it is needed.
@ax3l ax3l added the component: core in PIConGPU (core application) label Aug 18, 2016
@ax3l ax3l merged commit 8715467 into ComputationalRadiationPhysics:dev Aug 18, 2016
@n01r n01r deleted the topic-useFasterRNG branch August 19, 2016 09:01
/* \todo fix: cannot PMACC_ALIGN() because it seems to be too large */
/* random number generator */
typedef PMacc::random::RNGProvider<simDim, PMacc::random::methods::XorMin> RNGFactory;
typedef PMacc::random::distributions::Uniform<float> Distribution;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be a float_X!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: core in PIConGPU (core application) refactoring code change to improve performance or to unify a concept but does not change public API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants