BinQ and spike compression fixes #701

nrnhines · 2021-11-29T23:00:46Z

See neuronsimulator/nrn#1548
Superseded by neuronsimulator/nrn#1556

CI_BRANCHES:NEURON_BRANCH=hines/binq-fix-rebase,

Merge use special executable instead of nrniv for GPU enabled run neuronsimulator/tqperf#6
Merge Bug fix: Parameter updated in VERBATIM block shouldn't be const nmodl#791 and update submodule

coreneuron/apps/main1.cpp

nrnhines · 2021-11-29T23:28:48Z

coreneuron/network/netcvode.cpp

        d.inter_thread_events_.clear();
        d.tqe_->nshift_ = -1;
-        d.tqe_->shift_bin(nrn_threads->_t);
+        d.tqe_->shift_bin(nrn_threads->_t - 0.5 * nrn_threads->_dt);


There is a similar statement in the following function that I did not change. A specific test would have to be written to see whether file mode has an issue that demonstrates the need for an equivalent change.

The corresponding change was made in NetCvode::init_events. In retrospect, it is likely that BinQ is not supported for --restore since there is no mention of shift_bin in the nrn_checkpoint files. (Analogous to shift_bin in nrn/src/nrniv/bbsavestate.cpp)
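One way to see what the half-step offset changes is a toy fixed-width bin calculation. The sketch below is not CoreNEURON's BinQ: `bin_index` and the sample times are hypothetical, and it assumes only that an event's bin is found by dividing its distance from the bin origin by `dt`. Under that assumption, two events that straddle a step boundary by round-off land in different bins when the origin is `t`, and in the same bin when the origin is `t - 0.5 * dt`.

```python
import math


def bin_index(t_event, origin, dt):
    """Index of the fixed-width bin (width dt) that holds t_event, counted from origin."""
    return math.floor((t_event - origin) / dt)


dt = 0.025
t = 0.0

# Two event times that differ from the step boundary t + dt only by round-off.
just_before = t + dt - 1e-9
just_after = t + dt + 1e-9

# Origin at t: the boundary sits exactly on a bin edge, so the two events split.
at_t = (bin_index(just_before, t, dt), bin_index(just_after, t, dt))

# Origin at t - dt/2: the boundary sits in the middle of a bin, so both events agree.
centered = (
    bin_index(just_before, t - 0.5 * dt, dt),
    bin_index(just_after, t - 0.5 * dt, dt),
)

print(at_t, centered)
```

Whether this round-off effect is the actual motivation for the change is not stated in this thread; the sketch only shows the arithmetic consequence of moving the origin.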

nrnhines · 2021-11-29T23:29:57Z

coreneuron/network/netpar.cpp

            ps->send(spikein[i].spiketime, net_cvode_instance, nt);
        }
    }
+    nrn_multithread_job(interthread_enqueue);


I'm not sure this change is needed. But it can't hurt.
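To illustrate why draining the inter-thread buffer after spike exchange matters, here is a toy model of a per-thread queue. It is a sketch under stated assumptions, not CoreNEURON code: the `ThreadQueue` class and its methods are hypothetical, and it assumes that `send` only appends to a buffer and that delivery looks only at the main queue.

```python
import heapq


class ThreadQueue:
    """Toy per-thread event queue with a separate inter-thread buffer (hypothetical)."""

    def __init__(self):
        self.heap = []          # events visible to delivery, ordered by time
        self.inter_thread = []  # events handed over by spike exchange or other threads

    def send(self, t, target):
        # Incoming events are only buffered here, not yet visible to delivery.
        self.inter_thread.append((t, target))

    def interthread_enqueue(self):
        # Move every buffered event into the main queue.
        for event in self.inter_thread:
            heapq.heappush(self.heap, event)
        self.inter_thread.clear()

    def deliver_until(self, t_stop):
        # Pop and return all queued events with time <= t_stop.
        delivered = []
        while self.heap and self.heap[0][0] <= t_stop:
            delivered.append(heapq.heappop(self.heap))
        return delivered


q = ThreadQueue()
q.send(1.0, "cellA")               # spike arrives during exchange
missed = q.deliver_until(2.0)      # buffer not drained yet: nothing is delivered
q.interthread_enqueue()            # drain after the exchange
delivered = q.deliver_until(2.0)   # now the event is seen
print(missed, delivered)
```

In this toy model, skipping the drain step delays the event until some later enqueue, which is the kind of untimely delivery the extra call is meant to rule out.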

bbpbuildbot · 2021-11-29T23:43:48Z

Logfiles from GitLab pipeline #27163 (:white_check_mark:) have been uploaded here!

bbpbuildbot · 2021-12-01T02:28:32Z

Logfiles from GitLab pipeline #27361 (:no_entry:) have been uploaded here!

bbpbuildbot · 2021-12-01T13:56:03Z

Logfiles from GitLab pipeline #27467 (:no_entry:) have been uploaded here!

Release random123 instance when multisend setup no longer needs it. Psolve restores a few more arg default values.

bbpbuildbot · 2021-12-03T19:22:08Z

Logfiles from GitLab pipeline #27994 (:white_check_mark:) have been uploaded here!

bbpbuildbot · 2021-12-14T23:29:34Z

Logfiles from GitLab pipeline #29270 (:no_entry:) have been uploaded here!

olupton

First pass through this changeset on its own looks good, I'll move on to the NEURON side.

coreneuron/apps/main1.cpp

coreneuron/utils/randoms/nrnran123.cu

alexsavulescu

LGTM. Maybe merge Olli's change from #720 before.

alexsavulescu · 2021-12-17T11:11:37Z

Please retest

nrnhines · 2021-12-17T11:24:27Z

> LGTM. Maybe merge Olli's change from #720 before.

Yes. But is this best done after #720 is merged to master?

Eliminating the warning when Random123 global index does not change, increases the chance of hiding a bug.

codecov-commenter · 2021-12-17T11:39:31Z

Codecov Report

Merging #701 (e6e3cb0) into master (423ae6c) will increase coverage by 0.08%.
The diff coverage is 60.86%.

❗ Current head e6e3cb0 differs from pull request most recent head ddf048c. Consider uploading reports for the commit ddf048c to get more accurate results

@@            Coverage Diff             @@
##           master     #701      +/-   ##
==========================================
+ Coverage   56.14%   56.22%   +0.08%     
==========================================
  Files         107      107              
  Lines        8947     8966      +19     
==========================================
+ Hits         5023     5041      +18     
- Misses       3924     3925       +1

| Impacted Files | Coverage Δ | |
|---|---|---|
| coreneuron/apps/main1.cpp | `48.24% <ø> (ø)` | |
| coreneuron/io/core2nrn_data_return.cpp | `1.60% <0.00%> (-0.01%)` | ⬇️ |
| coreneuron/io/nrn2core_data_init.cpp | `1.09% <0.00%> (-0.01%)` | ⬇️ |
| coreneuron/mpi/lib/mpispike.cpp | `64.23% <0.00%> (-0.48%)` | ⬇️ |
| coreneuron/network/netpar.cpp | `43.54% <33.33%> (-0.23%)` | ⬇️ |
| coreneuron/io/nrn_setup.cpp | `85.73% <66.66%> (+0.05%)` | ⬆️ |
| coreneuron/mpi/lib/nrnmpi.cpp | `55.55% <66.66%> (+0.48%)` | ⬆️ |
| coreneuron/network/multisend_setup.cpp | `75.24% <100.00%> (+0.24%)` | ⬆️ |
| coreneuron/network/netcvode.cpp | `74.36% <100.00%> (+0.24%)` | ⬆️ |
| ... and 3 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

alexsavulescu · 2021-12-17T12:06:23Z

> Yes. But is this best done after #720 is merged to master?

Indeed

bbpbuildbot · 2021-12-17T12:32:22Z

Logfiles from GitLab pipeline #29746 (:no_entry:) have been uploaded here!

alexsavulescu · 2021-12-20T19:15:54Z

Was thinking about something along the lines of: neuronsimulator/nrn@e686c1d

Later edit: I have removed the commit above and will instead rely on the increased number of processors from #723.

bbpbuildbot · 2021-12-21T09:48:45Z

Logfiles from GitLab pipeline #30040 (:no_entry:) have been uploaded here!

bbpbuildbot · 2021-12-21T14:54:16Z

Logfiles from GitLab pipeline #30108 (:no_entry:) have been uploaded here!

alexsavulescu · 2021-12-21T19:10:31Z

Now we get

CoreNEURON single and multiple threads
4 /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30108/J112635/_/spack-build/spack-stage-neuron-develop-vkogqddlipn6wfzq2o4mkbt36t7uknzy/spack-build-vkogqdd/bin/nrniv: Could not find CoreNEURON library
4  near line 0
4  ^
        4 ParallelContext[0].psolve(50)
MPT ERROR: Rank 4(g:4) is aborting with error code -1.
	Process ID: 51011, Host: ldir01u05.bbp.epfl.ch, Program: /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30108/J112635/_/spack-build/spack-stage-neuron-develop-vkogqddlipn6wfzq2o4mkbt36t7uknzy/spack-build-vkogqdd/bin/nrniv
	MPT Version: HPE HMPT 2.25  10/22/21 03:18:39

MPT: --------stack traceback-------
13 /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30108/J112635/_/spack-build/spack-stage-neuron-develop-vkogqddlipn6wfzq2o4mkbt36t7uknzy/spack-build-vkogqdd/bin/nrniv: Could not find CoreNEURON library
13  near line 0
13  ^
        13 ParallelContext[0].psolve(50)
12 /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30108/J112635/_/spack-build/spack-stage-neuron-develop-vkogqddlipn6wfzq2o4mkbt36t7uknzy/spack-build-vkogqdd/bin/nrniv: Could not find CoreNEURON library

pramodk · 2021-12-23T05:28:41Z

> /nrniv: Could not find CoreNEURON library

@alexsavulescu: For a GPU build, CoreNEURON is a static library, so we have to use `special` to launch the simulation. `nrniv` cannot be used to run a CoreNEURON simulation.

* If a PARAMETER variable of RANGE type is updated in a VERBATIM block, its write_count could be 0 even though it is updated in the VERBATIM block.
  - Such a variable cannot be declared as a `const` instance variable.
  - One such example is https://github.com/nrnhines/tqperf/blob/master/mod/invlfire.mod
* The codegen helper visitor now keeps track of all symbols that are used in all verbatim blocks, and the codegen visitor checks this list before deciding whether a variable can be const.
  - Note that a variable could be read-only in a verbatim block, but currently we don't have robust C code analysis capability and it's not worth it (yet).
* This issue was encountered in BlueBrain/CoreNeuron#701
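The rule described above can be sketched as a small predicate. This is an illustration only, not NMODL's code: `can_be_const`, `verbatim_symbols`, and the variable names `tau`, `burst_len`, and `v_th` are hypothetical, and the sketch assumes that a per-variable write count and a set of symbols seen in VERBATIM blocks are already available.

```python
def can_be_const(name, write_count, verbatim_symbols):
    """Return True only if `name` is never written in NMODL code and never
    appears in a VERBATIM block (where writes cannot be analysed)."""
    written_in_nmodl = write_count.get(name, 0) > 0
    used_in_verbatim = name in verbatim_symbols
    return not written_in_nmodl and not used_in_verbatim


# Hypothetical analysis results for three variables.
write_count = {"tau": 0, "burst_len": 0, "v_th": 2}
verbatim_symbols = {"burst_len"}  # appears in a VERBATIM block, so it may be written there

decisions = {
    name: can_be_const(name, write_count, verbatim_symbols) for name in write_count
}
print(decisions)
```

The check is deliberately conservative: any appearance in a VERBATIM block disqualifies a variable, even a read-only one, which matches the trade-off noted in the description.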

bbpbuildbot · 2021-12-23T10:24:46Z

Logfiles from GitLab pipeline #30371 (:no_entry:) have been uploaded here!

bbpbuildbot · 2021-12-23T13:05:21Z

* If a PARAMETER variable of RANGE type is updated in a VERBATIM block, its write_count could be 0 even though it is updated in the VERBATIM block.
  - Such a variable cannot be declared as a `const` instance variable.
  - One such example is https://github.com/nrnhines/tqperf/blob/master/mod/invlfire.mod
* The codegen helper visitor now keeps track of all symbols that are used in all verbatim blocks, and the codegen visitor checks this list before deciding whether a variable can be const.
  - Note that a variable could be read-only in a verbatim block, but currently we don't have robust C code analysis capability and it's not worth it (yet).
* This issue was encountered in BlueBrain/CoreNeuron#701
* Test update: keep the AST node, do not return a codegen_c_visitor instance with a local AST

bbpbuildbot · 2021-12-23T17:55:18Z

Logfiles from GitLab pipeline #30438 (:no_entry:) have been uploaded here!

pramodk · 2021-12-23T18:03:40Z

/gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30438/J115285/_/spack-build/spack-stage-neuron-develop-fwu47zj5ksgprg63usqqvmnfvuvjnlmc/spack-build-fwu47zj/share/nrn/nrnmain.cpp:
ptxas warning : Conflicting options --device-debug and --generate-line-info specified, ignoring --generate-line-info option
ptxas warning : Conflicting options --device-debug and --generate-line-info specified, ignoring --generate-line-info option
Successfully created x86_64/special
++ find . -type f -name special -print -quit
+ special_exe=./x86_64/special
+ mpiexec -n 16 ./x86_64/special -mpi -python test1.py
MPT ERROR: Not enough slots from job scheduler for requested ranks
	(HPE HMPT 2.25  10/22/21 03:19:55)

I thought this was fixed. ~~Will take a look soon...~~

@alexsavulescu: was this addressed, i.e. has the number of cores already been increased? (I am not too familiar with the setup...)

nrnhines · 2021-12-23T18:35:44Z

My intention for a substantive GPU test is to allow use of a POINT_PROCESS version of the ARTIFICIAL_CELL IntervalFireSHA, assuming I can arrange for a GPU version of #include <openssl/sha.h> (though we could drop back to the less strict test using IntervalFire, which only counts input and output events without checking event times). Anyway, that will provide good performance and validation of the spike buffering algorithms on the GPU.

bbpbuildbot · 2021-12-23T18:56:34Z

pramodk · 2021-12-25T10:15:29Z

GPFS finally reasonably working:

* Extending the tqperf repository (test1.py) to the cases of binq and spike compression exposed a half dozen or so bugs in the categories of BinQ initialization, incomplete BinQ queue transfer, and failure to enqueue the interthread event buffers in a timely manner.
* The tqperf test now does a SHA1 hash comparison of all spike input and output times of all the artificial cells. This test is run from the latest http://github.com/nrnhines/tqperf.git. See instructions in https://github.com/neuronsimulator/tqperf/blob/master/README.md
* This PR depends on the BlueBrain/CoreNeuron#701
* Interthread enqueuing must occur after spike exchange. This is needed for binq + compressed spike exchange + threads.
* nrn binq must be initialized before core2nrn queue transfer.
* nrn2core queue transfer must also iterate over BinQ.
* Refactoring makes nrn2core queue transfer more understandable.
* neuron.coreneuron: if --multisend then also --ms-subintervals and --ms-phases
* tqperf is added as a ci test
* Updated to most recent coreneuron master as submodule
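The hash comparison mentioned above can be illustrated with Python's standard `hashlib`. This is a minimal sketch, not the tqperf implementation: `spike_digest` and the sample spike times are hypothetical, and it assumes spike times are hashed as little-endian 64-bit floats in delivery order.

```python
import hashlib
import struct


def spike_digest(spike_times):
    """SHA1 hex digest over a sequence of spike times, hashed in the given order."""
    h = hashlib.sha1()
    for t in spike_times:
        # Pack each time as a little-endian IEEE 754 double so that the
        # digest depends on the exact bit pattern of every value.
        h.update(struct.pack("<d", t))
    return h.hexdigest()


reference = [1.0, 2.5, 2.5, 7.125]   # hypothetical spike times from a reference run
same = list(reference)               # an identical run
shifted = [1.0, 2.5, 2.525, 7.125]   # one spike delivered a step late

print(spike_digest(reference) == spike_digest(same))
print(spike_digest(reference) == spike_digest(shifted))
```

Comparing digests is stricter than counting events: a single spike that arrives at a different time changes the hash even when the totals match.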

* Bug fixes with regard to option --spkcompress <nspike>
* After compressed spike exchange, do interthread_enqueue.
* nrn binq must be initialized before core2nrn queue transfer.
* nrncore binq must be initialized before nrn2core queue transfer
* interthread buffer must be enqueued at beginning of psolve.
* Initialization of binq consistent everywhere.
* Avoid Random123 globalindex warning if the index has not changed.
  Release random123 instance when multisend setup no longer needs it.
  Psolve restores a few more arg default values.
* Revert nrnran123.cu
  Eliminating the warning when Random123 global index does not change, increases the chance of hiding a bug.
* update nmodl submodule
* ntasks=16 for gpu tests as well

Co-authored-by: Alexandru Săvulescu <alexandru.savulescu@epfl.ch>
Co-authored-by: Olli Lupton <oliver.lupton@epfl.ch>
Co-authored-by: Pramod Kumbhar <pramod.s.kumbhar@gmail.com>

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@318e25a

nrnhines added 7 commits November 24, 2021 09:15

Bug fixes with regard to option --spkcompress <nspike>

cef08b9

Temporary change to allow binqueue and multisend to be turned off.

570092d

After compressed spike exchange, do interthread_enqueue.

be8efb8

nrn binq must be initialized before core2nrn queue transfer.

5619fff

nrncore binq must be initialized before nrn2core queue transfer

7d1c6b6

interthread buffer must be enqueued at beginning of psolve.

d51f75d

clang-format

f4c35ec

nrnhines mentioned this pull request Nov 29, 2021

Hines/binq fix neuronsimulator/nrn#1548

Closed

nrnhines commented Nov 29, 2021

coreneuron/apps/main1.cpp (outdated, resolved)

nrnhines commented Nov 29, 2021

fix some memory leaks

fb58730

Initialization of binq consistent everywhere.

a9f60c5

Avoid Random123 globalindex warning if the index has not changed.

f1604ed

Release random123 instance when multisend setup no longer needs it. Psolve restores a few more arg default values.

pramodk requested a review from olupton December 14, 2021 14:53

Merge branch 'master' into hines/spike-compress

330f8aa

nrnhines mentioned this pull request Dec 16, 2021

Integration of tqperf model for CI and CoreNEURON testing neuronsimulator/nrn#1556

Merged

olupton reviewed Dec 17, 2021

coreneuron/apps/main1.cpp (outdated, resolved)

coreneuron/utils/randoms/nrnran123.cu (outdated, resolved)

alexsavulescu approved these changes Dec 17, 2021

Revert nrnran123.cu

a2531f3

Eliminating the warning when Random123 global index does not change, increases the chance of hiding a bug.

alexsavulescu closed this Dec 21, 2021

alexsavulescu reopened this Dec 21, 2021

Merge branch 'master' into hines/spike-compress

5af8d25

olupton approved these changes Dec 21, 2021

pramodk mentioned this pull request Dec 23, 2021

Bug fix: Parameter updated in VERBATIM block shouldn't be const BlueBrain/nmodl#791

Merged

pramodk closed this Dec 23, 2021

pramodk reopened this Dec 23, 2021

Merge remote-tracking branch 'origin/master' into hines/spike-compress

ddf048c

pramodk closed this Dec 23, 2021

pramodk reopened this Dec 23, 2021

update nmodl submodule

2dd175a

ntasks=16 for gpu tests as well

363943e

pramodk merged commit 318e25a into master Dec 25, 2021

pramodk deleted the hines/spike-compress branch December 25, 2021 10:17
