
Bad performance on CUBAFixedConnectivity benchmark #68

Closed
denisalevi opened this issue Aug 27, 2018 · 9 comments
@denisalevi
Member

denisalevi commented Aug 27, 2018

In our CUBA benchmark with fixed connectivity (a constant number of synapses per neuron), brian2GeNN performs surprisingly badly, see the plot below. In other benchmarks, brian2CUDA and brian2GeNN performance is comparable, but not in this one. This behaviour is not new, either: I have similar plots for this benchmark from April this year (using older brian2, GeNN and brian2GeNN versions). Does anyone have an idea why this is the case?

[Figure: absolute runtimes for the CUBAFixedConnectivityNoMonitor speed test]

You can reproduce this behaviour by running the script below (with either dev = 'genn' or dev = 'cpp'), which runs the benchmark for N = 1e6 neurons and prints the device._last_run_time value (so no synapse creation or device memory initialisation is included). The figure above also plots the _last_run_time values. You need to incorporate the changes from PR #65 in order to get the _last_run_time in brian2GeNN.

CUBAFixedConnectivity.py

import time
from brian2 import *

dev = 'genn'
#dev = 'cpp'

if dev == 'genn':
    import brian2genn
    set_device('genn', directory='CUBAFixedConnectivity_GeNN')
    prefs.devices.genn.benchmarking = True
elif dev == 'cpp':
    set_device('cpp_standalone', directory='CUBAFixedConnectivity_CPP')

taum = 20*ms
taue = 5*ms
taui = 10*ms
Vt = -50*mV
Vr = -60*mV
El = -49*mV

eqs = '''
dv/dt  = (ge+gi-(v-El))/taum : volt (unless refractory)
dge/dt = -ge/taue : volt (unless refractory)
dgi/dt = -gi/taui : volt (unless refractory)
'''
N = int(1e6)
Ne = int(0.8 * N)

P = NeuronGroup(N, eqs, threshold='v>Vt', reset='v = Vr', refractory=5*ms,
                method='exact')
P.v = 'Vr + rand() * (Vt - Vr)'
P.ge = 0*mV
P.gi = 0*mV

we = (60*0.27/10)*mV  # excitatory synaptic weight (voltage)
wi = (-20*4.5/10)*mV  # inhibitory synaptic weight
Ce = Synapses(P, P, on_pre='ge += we')
Ci = Synapses(P, P, on_pre='gi += wi')
Ce.connect('i<Ne', p=80. / N)
Ci.connect('i>=Ne', p=80. / N)

start = time.time()
run(1 * second, report='text')
print("Run took {:.2f} s".format(time.time() - start))
print("_last_run_time: {:.2f}".format(device._last_run_time))

I just ran these on our GeForce GTX TITAN Black (Kepler architecture) with brian2GeNN commit 8c6da48b3ae (benchmarking branch), brian-team/brian2@6c50e3a22d (master), genn-team/genn@3b794457b81 (3.0.0). I get for

dev == 'cpp':

_last_run_time: 119.47

dev == 'genn':

_last_run_time: 483.55

Could someone reproduce this? And is this something you would have expected? To me, this benchmark looks like a standard example of presynaptic spikes triggering postsynaptic effects.

@tnowotny
Contributor

I believe this is a genuine result where GeNN isn't doing very well due to our default parallelisation strategy for synapses. In brief, by default (and as used by brian2genn), synaptic event processing is parallelised across post-synaptic neurons, and each thread loops across the entries of the queue of incoming spikes. If there are many incoming spikes but few incoming synapses per neuron, this is extremely inefficient.
This benchmark is constructed so that there are almost no synapses per neuron (in the large simulations, 80 incoming synapses for spikes from a million neurons is a tiny fraction). Hence, the simulation is very slow compared to e.g. cpp_standalone, for which pretty much only the overall number n_spikes*n_synapses matters.
I might add that 80 synapses per neuron isn't very realistic for a brain simulation.
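The scaling argument above can be sketched with a quick back-of-the-envelope calculation. This is purely illustrative (not a GeNN measurement), and the spike count per timestep is a made-up number:

```python
N = 1_000_000        # neurons in the benchmark
syn_per_neuron = 80  # incoming synapses per neuron (fixed connectivity)

def postsynaptic_loop_iterations(n_spikes):
    # Postsynaptic strategy: one thread per postsynaptic neuron, and
    # each thread scans the entire incoming-spike queue, even though
    # only a tiny fraction of those spikes target one of its ~80
    # incoming synapses.
    return N * n_spikes

def total_spike_deliveries(n_spikes):
    # The work that actually matters for cpp_standalone:
    # each spike is delivered once per target synapse.
    return n_spikes * syn_per_neuron

n_spikes = 10_000  # hypothetical number of spikes in one timestep
ratio = postsynaptic_loop_iterations(n_spikes) / total_spike_deliveries(n_spikes)
print(ratio)  # -> 12500.0, i.e. N / syn_per_neuron queue scans per useful delivery
```

The ratio N / syn_per_neuron is independent of the spike count, which matches the observation that the inefficiency grows with network size while the per-neuron fan-in stays fixed at 80.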

@denisalevi
Member Author

@tnowotny Thanks for the clarification. When you say "default parallelisation strategy", is there a GeNN preference that might give better results in such a constructed scenario?

@tnowotny
Contributor

You can try
SynapseGroup::setSpanType(SynapseGroup::SpanType::PRESYNAPTIC)
for any synapse group, and the parallelisation should then follow a pre-synaptic strategy. I am not sure whether this is available for all connectivity types (@neworderofjamie?) or how well tested it is. We have not exposed this in brian2genn currently.

@tnowotny
Contributor

Hi again - I have now run this test, and indeed with the PRESYNAPTIC span type the COBAHH example with 512000 neurons runs 11x faster on my Tesla K40c than with the standard POSTSYNAPTIC parallelisation. This explains the bad performance ... hope this helps.

@denisalevi
Member Author

@tnowotny Cool. Will include that / try it out in our benchmarks when it's exposed to brian2genn.

@tnowotny
Contributor

It has been exposed in brian2genn as the preference
prefs.devices.genn.synapse_span_type
with possible values 'PRESYNAPTIC' or 'POSTSYNAPTIC' (default: 'POSTSYNAPTIC').
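As a sketch, the preference would be set before the run, alongside the other device setup in the benchmark script. This assumes a Brian2GeNN version in which the preference is available (untested here):

```python
import brian2genn  # registers the 'genn' device and its preferences
from brian2 import set_device, prefs

set_device('genn', directory='CUBAFixedConnectivity_GeNN')
# Switch GeNN's synaptic parallelisation from the default postsynaptic
# strategy to the presynaptic one discussed above:
prefs.devices.genn.synapse_span_type = 'PRESYNAPTIC'
```

The preference applies to all synapse groups in the simulation; there is no per-group control through this preference.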

@tnowotny
Contributor

PS: It might be in the 'blocksize_test' branch ... I forgot whether we already merged it into master.

@mstimberg
Member

Sorry, I forgot to reply to this... The change was in the expose_blocksize_prefs branch, which we had already merged into master, but the changes for the synapse span type were added after the merge. I have now merged the branch again (we'll delete it soon to avoid future confusion), so everything that is needed is in Brian2GeNN master.

@tnowotny
Contributor

As we haven't touched this in a long while, I think we can close the issue.
