Openmp #315
Conversation
…apticPathway etc.
…uch that we can use a function (openmp_pragma) that will handle the insertion of pragma statements in the code. The number of threads can now be given as a Brian preference. By default (0), nothing is done and the code does not rely on OpenMP at all; if a positive number is given, that number of threads is used
… master before launching a pull request
…d into templates now
…port system, needs some tests to check that openmp support is fine, and I'll push it
Conflicts:
	brian2/devices/cpp_standalone/brianlib/network.cpp
	brian2/devices/cpp_standalone/brianlib/network.h
	brian2/devices/cpp_standalone/device.py
	brian2/devices/cpp_standalone/templates/main.cpp
	brian2/devices/cpp_standalone/templates/ratemonitor.cpp
…ging, and change the default value of the openmp preference to 0. Everything seems to run smoothly; the speedup depends on the simulation, so more intensive benchmarks should be performed
…d some more pragma statements in the main.cpp files, when objects are created from the device.py file. Now I'll start to do some benchmarks to check when the optimizations are useful
…tion. Benchmarks are on their way, but the results seem to vary a lot depending on computer architectures, networks, ... Quite some overhead due to OpenMP thread creation could maybe be reduced by merging the threshold and stateupdate loops; need to check
…nd thanks to the ordered flag. This seems to lead to a large speed improvement, because the threshold operation is one of the major bottlenecks according to some profiling.
… Not sure this has any effect on performance, but at least it makes it clearer which calls executed by a single thread need to be synchronized and which do not. Note that updates are always synchronized because of the barrier in the network.cpp file
… in the code. Also added the static-ordered trick in the peek() function of the SynapticPathway, to share the load safely among threads. Even if not fully parallel, this still seems to be a bit faster than one thread running everything. Did some profiling: performance always improves with OpenMP, as long as the defaultclock is not too small. Sadly, for code using a small dt (<0.1ms), the overhead due to loops and synchronization slows everything down compared to running without OpenMP
…recorded and displayed, thanks to code insertion in the C++ templates. To do so, I added one #include in the main.cpp template. Note that these benchmarks show that for some deterministic examples, operations are not always performed in the same order on a run-to-run basis, so results can be bistable
To make the point that the OpenMP implementation speeds up the code, here are 4 benchmarks that can be found in dev/benchmarks/openmp/test_openmp_**.py. These are 4 different networks covering various objects, launched without OpenMP and then with 1, 2, 4, 6 threads. Simulation results are superimposed, and the bottom right of the plots shows the exact simulation time (extracted thanks to the insert_code_device() function). As you can see, OpenMP with threads > 1 always speeds up the simulation :-) Results are also always the same, and for simulations 3/4, discrepancies are due to the fact that the network has bistable behavior, as shown to Marcel, because of the order of pre/post operations.
…even for arbitrary orderings (same dt, when and order attributes).
Nice! I cherry-picked the commit that makes the order of objects deterministic (by falling back to the name) -- I think we all agree that even if ordering is arbitrary, it should be deterministic. But in the long run, for this specific case, I think pre codes should be executed before post codes (IIRC, this was the case in Brian1)
Oh, and I think from my side, the only thing missing in this branch is the documentation for the OpenMP features -- @thesamovar: if you are happy with the changes, feel free to merge it.
Great stuff! I haven't looked at the code yet but do we have a deterministic test that shows that the output is exactly the same? I think we need this before we can merge. I'll take a look at this in the next couple of days.
As said, I have 4 test cases of real and pretty complex networks showing …

Best, Pierre
What I mean is that we need tests in the test suite that show identical behaviour. Afaict there is nothing in the test suite at the moment?

I've had a look at the code now and run the examples, and I have to say I'm really impressed. On my machine I am actually getting some superlinear speedups. I am getting numbers that are a little slower for 0/1 threads, but faster for 6 threads. I'm on a quad core with hyperthreading (8 virtual cores), specifically an i7 3612QM @ 2.1 GHz.

I wonder, if you have time, could you perhaps write some notes on the parallelisation strategies and why you chose them/how they work? I'm happy for this to be merged once we have some docs and tests in the suite.

On August 28, 2014 3:45:25 PM GMT+01:00, Pierre Yger notifications@github.com wrote:
Good to hear! I would be more than happy to write some documentation about the …

Best, Pierre
We could perhaps have something so that if a template has the string …
I discussed a similar thing with Pierre previously, I think it is a good idea to have templates explicitly mention that they know about OpenMP (maybe …
…m a developer point of view. Note that the part on the OPENMP_AWARE flag is not written, because this still has to be done. Marcel said that Dan may have the time to implement that
…-threading. A quite decently complex network covers almost all the templates, and simulation results are carefully compared without OpenMP and with OpenMP using 1 or 2 threads.
Conflicts:
	brian2/devices/cpp_standalone/device.py
	brian2/devices/cpp_standalone/templates/spikemonitor.cpp
	brian2/tests/test_cpp_standalone.py
…all numerical differences
… Mark all the current standalone templates as compatible.
…at it does not raise an error when a target is not available
I merged this branch with the … The tests pass, except for the Python 3 timeout because of Cython (see #326)
Great, in that case I'm happy to merge this after we merge Cython. I think an error is probably better than a warning in this case, since they're explicitly trying to do something that doesn't work.
Conflicts: brian2/tests/__init__.py
I don't really mind, but since the number of threads is a preference, I could imagine a situation where you have this set to, say, 4 in your global preference file, and then it's annoying to set it to 0 manually whenever you run something that does not support multithreading. On the other hand, we don't have anything that does not support multithreading in standalone, anyway...
Yeah, it's pretty marginal one way or the other, indeed. OK, I'm going to have another look at this now. From your point of view, is it good to merge now?
Yes, it is.
I modified the OpenMP test to check that the standalone results are the same as the runtime results (as well as checking that standalone with threads was the same as without). The tests all pass on Windows and I'm happy with the code, so as soon as Travis reports back that the tests pass I'll merge this. I'll write some documentation for this and Cython in the docs branch (I think a new 'ways to run Brian' page might be a good idea).
OK, it passes, merging now!
Great! I'll have a look tomorrow. I wrote some doc for the openmp branch …

Best, Pierre
Hey @yger, I'm looking through your documentation for the openmp branch now - great stuff! I'm going to work on it for a little while in the docs_improvements branch. So maybe some time this weekend or next week you could take a look and add anything that isn't finished? |
The OpenMP branch is now up and running, and in the default case it does not depend on OpenMP or insert any #pragma in the code. The number of threads used in the simulation has to be set via brian_prefs.codegen.cpp_standalone.openmp_threads. By default this number is 0, but if it is positive, OpenMP is used in the templates and the appropriate number of threads is recruited. Note that the benchmark results are very encouraging, and the speedups are pretty robust for the codes located in dev/benchmark/openmp.

All the cpp_standalone templates are now OpenMP compatible. Basically, all object creation is done without parallelism, so no changes were needed there, but all objects inserted within the run() loop need to be OpenMP-compatible. Major changes, done with Marcel, are also that network.cpp and synapses.h (renamed to synapses_classes.cpp) are now templates and no longer in brianlib. This was needed to have a fully template-based solution.