
Bad performance on CUBAFixedConnectivity benchmark #68

Closed
denisalevi opened this issue Aug 27, 2018 · 9 comments
@denisalevi
Member

denisalevi commented Aug 27, 2018

In our CUBA benchmark with fixed connectivity (a constant number of synapses per neuron), brian2GeNN performs surprisingly badly, see the plot below. In other benchmarks, brian2CUDA and brian2GeNN performance is comparable, but not in this one. This behaviour is not new, either: I have similar plots for this benchmark from April this year (using older brian2, GeNN and brian2GeNN versions). Does anyone have an idea why this is the case?

[Figure: absolute runtimes for the CUBAFixedConnectivityNoMonitor speed test]

You can reproduce this behaviour by running the script below (with either dev = 'genn' or dev = 'cpp'), which runs the benchmark for N = 1e6 neurons and prints the device._last_run_time value (so no synapse creation or device memory initialisation is included). The figure above also plots the _last_run_time values. You need to incorporate the changes from PR #65 in order to get the _last_run_time in brian2GeNN.

CUBAFixedConnectivity.py

import time
from brian2 import *

dev = 'genn'
#dev = 'cpp'

if dev == 'genn':
    import brian2genn
    set_device('genn', directory='CUBAFixedConnectivity_GeNN')
    prefs.devices.genn.benchmarking = True
elif dev == 'cpp':
    set_device('cpp_standalone', directory='CUBAFixedConnectivity_CPP')

taum = 20*ms
taue = 5*ms
taui = 10*ms
Vt = -50*mV
Vr = -60*mV
El = -49*mV

eqs = '''
dv/dt  = (ge+gi-(v-El))/taum : volt (unless refractory)
dge/dt = -ge/taue : volt (unless refractory)
dgi/dt = -gi/taui : volt (unless refractory)
'''
N = int(1e6)
Ne = int(0.8 * N)

P = NeuronGroup(N, eqs, threshold='v>Vt', reset='v = Vr', refractory=5*ms,
                method='exact')
P.v = 'Vr + rand() * (Vt - Vr)'
P.ge = 0*mV
P.gi = 0*mV

we = (60*0.27/10)*mV  # excitatory synaptic weight (voltage)
wi = (-20*4.5/10)*mV  # inhibitory synaptic weight
Ce = Synapses(P, P, on_pre='ge += we')
Ci = Synapses(P, P, on_pre='gi += wi')
Ce.connect('i<Ne', p=80. / N)
Ci.connect('i>=Ne', p=80. / N)

start = time.time()
run(1 * second, report='text')
print("Run took {:.2f} s".format(time.time() - start))
print("_last_run_time: {:.2f}".format(device._last_run_time))

I just ran these on our GeForce GTX TITAN Black (Kepler architecture) with brian2GeNN commit 8c6da48b3ae (benchmarking branch), brian-team/brian2@6c50e3a22d (master), genn-team/genn@3b794457b81 (3.0.0). I get for

dev == 'cpp':

_last_run_time: 119.47

dev == 'genn':

_last_run_time: 483.55

Could someone reproduce this? And is this something you would have expected? To me, this benchmark looks like a standard example of presynaptic spikes triggering postsynaptic effects.

@tnowotny
Contributor

I believe this is a genuine result where GeNN isn't doing very well due to our default parallelisation strategy for synapses. In brief, by default (and as used by brian2genn), synaptic event processing is parallelised across post-synaptic neurons, and each thread loops across the entries of the queue of incoming spikes. If there are many incoming spikes but few incoming synapses per neuron, this is extremely inefficient.
This benchmark is constructed so that there are almost no synapses per neuron (in the large simulations, 80 incoming synapses for spikes from a million neurons is a tiny fraction). Hence, the simulation is very slow compared to e.g. cpp_standalone, for which pretty much only the overall number n_spikes*n_synapses matters.
I might add that 80 synapses per neuron isn't very realistic for a brain simulation.
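The scaling argument above can be sketched with a quick back-of-the-envelope calculation. This is purely illustrative (not a GeNN measurement), and the spike count per timestep is a made-up number:

```python
N = 1_000_000        # neurons in the benchmark
syn_per_neuron = 80  # incoming synapses per neuron (fixed connectivity)

def postsynaptic_loop_iterations(n_spikes):
    # Postsynaptic strategy: one thread per postsynaptic neuron, and
    # each thread scans the entire incoming-spike queue, even though
    # only a tiny fraction of those spikes target one of its ~80
    # incoming synapses.
    return N * n_spikes

def total_spike_deliveries(n_spikes):
    # The work that actually matters for cpp_standalone:
    # each spike is delivered once per target synapse.
    return n_spikes * syn_per_neuron

n_spikes = 10_000  # hypothetical number of spikes in one timestep
ratio = postsynaptic_loop_iterations(n_spikes) / total_spike_deliveries(n_spikes)
print(ratio)  # -> 12500.0, i.e. N / syn_per_neuron queue scans per useful delivery
```

The ratio N / syn_per_neuron is independent of the spike count, which matches the observation that the inefficiency grows with network size while the per-neuron fan-in stays fixed at 80.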

@denisalevi
Member Author

@tnowotny Thanks for the clarification. When you say "default parallelisation strategy", is there a GeNN preference that might give better results in such a constructed scenario?

@tnowotny
Contributor

You can try
SynapseGroup::setSpanType(SynapseGroup::SpanType::PRESYNAPTIC)
for any synapse group, and the parallelisation should then follow a pre-synaptic strategy. I am not sure whether this is available for all connectivity types (@neworderofjamie?) or how well tested it is. We have not exposed this in brian2genn currently.

@tnowotny
Contributor

Hi again - I have now run this test, and indeed with the PRESYNAPTIC span type the COBAHH example with 512000 neurons runs 11x faster on my Tesla K40c than with the standard POSTSYNAPTIC parallelisation. This explains the bad performance ... hope this helps.

@denisalevi
Member Author

@tnowotny Cool. Will include that / try it out in our benchmarks when it's exposed to brian2genn.

@tnowotny
Contributor

It has been exposed in brian2genn as the preference
prefs.devices.genn.synapse_span_type
with possible values 'PRESYNAPTIC' or 'POSTSYNAPTIC' (default: 'POSTSYNAPTIC').
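As a sketch, the preference would be set before the run, alongside the other device setup in the benchmark script. This assumes a Brian2GeNN version in which the preference is available (untested here):

```python
import brian2genn  # registers the 'genn' device and its preferences
from brian2 import set_device, prefs

set_device('genn', directory='CUBAFixedConnectivity_GeNN')
# Switch GeNN's synaptic parallelisation from the default postsynaptic
# strategy to the presynaptic one discussed above:
prefs.devices.genn.synapse_span_type = 'PRESYNAPTIC'
```

The preference applies to all synapse groups in the simulation; there is no per-group control through this preference.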

@tnowotny
Contributor

PS: It might be in the 'blocksize_test' branch ... I forgot whether we already merged it into master.

@mstimberg
Member

Sorry, I forgot to reply to this... The change was in the expose_blocksize_prefs branch, which we had already merged into master, but the changes for the synapse span type were added after the merge. I have now merged the branch again (we'll delete it soon to avoid future confusion), so everything that is needed is in Brian2GeNN master.

@tnowotny
Contributor

As we haven't touched this in a long while, I think we can close the issue.
