This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Can I make instantiating a L5 Pyramidal Cell faster? #642

Closed
xanderladd opened this issue Sep 26, 2021 · 7 comments

@xanderladd

xanderladd commented Sep 26, 2021

Hi CoreNeuron team,

Description

Thanks for developing and sharing this software! I am benchmarking different neuron simulators on the NERSC Cori HPC platform, and so far my bottleneck is instantiating cells rather than simulating them (which is where I would have expected the bottleneck to be).

Example

I have distilled the problem to this code example:

import time
from neuron import coreneuron
from neuron import h
from neuron.units import ms, mV
coreneuron.gpu = True
coreneuron.enable = True
h.load_file("hoc_files/runModel.hoc")
h.cvode.cache_efficient(1)

start = time.time()
h.cADpyr232_L5_TTPC1_0fb1ca4724()
end = time.time()
print(f'time to create a single mode: {end-start}')

start = time.time()
[h.cADpyr232_L5_TTPC1_0fb1ca4724() for _ in range(20)]
end = time.time()
print(f'time to create 20 models: {end-start}')

exit()

Which produces:

time to create a single mode: 0.7422175407409668
time to create 20 models: 17.264891147613525

Clocking in at about 1 second to construct a model is fine, but when I want to instantiate, say, 1000 of them, I'm hosed.
I believe there is a more efficient way to do this, but I am not sure how... that is my question!
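
For anyone reproducing the measurement, here is a minimal sketch of a timing helper. It assumes construction cost scales linearly with the number of cells, which the two timings above suggest (about 0.86 s per cell across the batch of 20). The names `time_instantiation` and `extrapolate` are hypothetical helpers, not part of NEURON or CoreNEURON.

```python
import time


def time_instantiation(factory, n):
    """Call factory() n times; return (total_seconds, seconds_per_instance)."""
    start = time.perf_counter()
    # Keep references so the objects are not freed while the loop is timed.
    instances = [factory() for _ in range(n)]
    total = time.perf_counter() - start
    return total, total / n


def extrapolate(total_seconds, n_measured, n_target):
    """Linear estimate of the time needed to build n_target instances."""
    return total_seconds / n_measured * n_target
```

Passing `h.cADpyr232_L5_TTPC1_0fb1ca4724` as `factory` would reproduce the measurement above. Under the linear assumption, the 20-cell timing extrapolates to roughly 863 s for 1000 cells.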

Hoc files

These hoc files are adapted from the BBP project, and I specifically want to use the specification from hoc_files/biophysics.hoc.
I uploaded a folder with all the hoc files I am using to run this to Dropbox here.

Prerequisites

I am using the installation instructions specified here: neuronsimulator/nrn#806,
so the PGI compiler with OpenMPI on GPU.

  • Cray Python 3.7.3
  • OpenMPI 4.0.3
  • PGI 20.11
  • cmake/3.21.3
  • hpcsdk/20.11

Thank you all in advance for your suggestions and support

@xanderladd
Author

On second thought, maybe I should have opened this at https://github.com/neuronsimulator/nrn/ ... if that's the case, let me know and I can close this and move it there.

@nrnhines
Collaborator

I see on my machine that the morphology method in hoc_files/morphology.hoc takes 1.23s to import the morphology and 0.43s to instantiate it. (Note m.morphology(this) is called from proc load_morphology() in hoc_files/template.hoc).
So you can get a 3- or 4-fold speedup just by doing the import portion once. Beyond that, it would be necessary to create a full template and instantiate that multiple times.

@nrnhines
Collaborator

It may be the case, but I'm not sure, that a cell.clone() method that returns a clone of the cell would also have better performance.

@xanderladd
Author

Thanks for your feedback. I removed the load_file(morphology.hoc) imports from template.hoc and I am seeing that the time to instantiate a cell is ~0.4 seconds. I did not change proc load_morphology() though, but I don't think that was intended.

For cell.clone() ... when you have a chance would you mind linking the relevant documentation for this? h.cADpyr232_L5_TTPC1_0fb1ca4724() returns a hoc_object that does not have a clone method (or at least one I can find in the docs or from calling dir(h.cADpyr232_L5_TTPC1_0fb1ca4724())). Or are you suggesting that I write my own clone method? I could follow the example from nrn/share/lib/hoc/celbild/celtopol.hoc.

Thank you in advance for your help.

@nrnhines
Collaborator

nrnhines commented Oct 4, 2021

I was imagining reading the morphology data once as in the following

hines@hines-T7500:~/Downloads/l5$ cat hoc_files/morphology.hoc
load_file("import3d.hoc")

objref morph_0fb1ca4724

proc morphology_0fb1ca4724() { localobj nl, nil
    if (morph_0fb1ca4724 == nil) {
        nl = new Import3d_Neurolucida3()
        nl.quiet = 1
        nl.input("morphology/dend-C060114A2_axon-C060114A5.asc")
        morph_0fb1ca4724 = new Import3d_GUI(nl, 0)
    }
    morph_0fb1ca4724.instantiate($o1)
}

and then adjusting a few lines to call the above procedure instead of instantiating it as an object. E.g. in template.hoc

diff --git a/hoc_files/template.hoc b/hoc_files/template.hoc
index f74750c..9bf82fb 100755
--- a/hoc_files/template.hoc
+++ b/hoc_files/template.hoc
@@ -47,6 +47,8 @@ begintemplate cADpyr232_L5_TTPC1_0fb1ca4724
   strdef tstr
   objref rngList, rng
 
+  external morphology_0fb1ca4724
+
 /** Constructor 
 
     Arguments: 
@@ -123,9 +125,8 @@ proc biophys() {localobj bp
 }
 
 /** Load the morphology */                                                                             
-proc load_morphology() {localobj m
-    m = new morphology_0fb1ca4724()
-    m.morphology(this)
+proc load_morphology() {local z  localobj m
+    morphology_0fb1ca4724(this)
 }
 

With this change I get the following (note that the time to create a single model is misleading, as morphology_0fb1ca4724() is also called earlier in the code):

time to create a single mode: 0.45157670974731445
time to create 20 models: 9.198066473007202

instead of

time to create a single mode: 1.7176282405853271
time to create 20 models: 34.48686218261719

On second thought, I don't think it is worth the effort to write a clone function with the interpreter, as there would be little or no speedup relative to morph_0fb1ca4724.instantiate($o1)
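
The same parse-once, instantiate-many idea can be sketched from Python. This is a sketch under assumptions and was not run in this thread: `get_importer` and `neuron_loader` are hypothetical helper names, while `Import3d_Neurolucida3`, `Import3d_GUI`, and `instantiate` are the import3d.hoc classes and method used in the hoc code above. The cache is keyed by file path, so each distinct morphology is parsed once.

```python
_IMPORTER_CACHE = {}


def neuron_loader(path):
    """Parse a Neurolucida .asc file with NEURON's import3d tools.

    Assumes NEURON is installed; imported lazily so the caching logic
    below can be exercised without it.
    """
    from neuron import h

    h.load_file("import3d.hoc")
    nl = h.Import3d_Neurolucida3()
    nl.quiet = 1
    nl.input(path)
    return h.Import3d_GUI(nl, 0)


def get_importer(path, loader=neuron_loader, cache=None):
    """Return the importer for path, running the expensive loader only once."""
    if cache is None:
        cache = _IMPORTER_CACHE
    if path not in cache:
        cache[path] = loader(path)  # slow step: read and parse the file
    return cache[path]  # fast path: reuse the parsed morphology
```

With a cell object `cell` (hypothetical), `get_importer("morphology/dend-C060114A2_axon-C060114A5.asc").instantiate(cell)` would then play the role of `morphology_0fb1ca4724(this)`.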

@alexsavulescu
Contributor

I see a similar speedup with the solution above:

time to create a single mode: 0.47017455101013184
time to create 20 models: 9.628225564956665

vs

time to create a single mode: 0.12910890579223633
time to create 20 models: 2.582068920135498

Of course, this strategy works when the cells have the exact same morphology (we can think of it as "cloning").

@xanderladd
Author

Yes, this worked for me too:

time to create a single mode: 0.14014601707458496
time to create 20 models: 2.8425421714782715

Thank you very much @nrnhines for demonstrating the necessary changes to make to the hoc files. Thanks @alexsavulescu for testing. I am still learning the ropes for hoc. I will resolve/close this issue as this speedup gives a considerable boost and enables my use case.
