Skip to content
This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Conversation

@nrnhines
Copy link
Collaborator

@nrnhines nrnhines commented Nov 29, 2021

d.inter_thread_events_.clear();
d.tqe_->nshift_ = -1;
d.tqe_->shift_bin(nrn_threads->_t);
d.tqe_->shift_bin(nrn_threads->_t - 0.5 * nrn_threads->_dt);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a similar statement in the following function I did not change. A specific test would have to be written to see if there could be an issue with file mode demonstrating the need for an equivalent change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The corresponding change was made in NetCvode::init_events. In retrospect, it is likely that BinQ is not supported for --restore since there is no mention of shift_bin in the nrn_checkpoint files. (Analogous to shift_bin in nrn/src/nrniv/bbsavestate.cpp)

ps->send(spikein[i].spiketime, net_cvode_instance, nt);
}
}
nrn_multithread_job(interthread_enqueue);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this change is needed. But it can't hurt.

Release random123 instance when multisend setup no longer needs it.
Psolve restores a few more arg default values.
@pramodk pramodk requested a review from olupton December 14, 2021 14:53
Copy link
Contributor

@olupton olupton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass through this changeset on its own looks good, I'll move on to the NEURON side.

Copy link
Contributor

@alexsavulescu alexsavulescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Maybe merge Olli's change from #720 before.

@alexsavulescu
Copy link
Contributor

Please retest

@nrnhines
Copy link
Collaborator Author

LGTM. Maybe merge Olli's change from #720 before.

Yes. But is this best done after #720 is merged to master?

Eliminating the warning when Random123 global index does not change, increases the chance of hiding a bug.
@codecov-commenter
Copy link

codecov-commenter commented Dec 17, 2021

Codecov Report

Merging #701 (e6e3cb0) into master (423ae6c) will increase coverage by 0.08%.
The diff coverage is 60.86%.

❗ Current head e6e3cb0 differs from pull request most recent head ddf048c. Consider uploading reports for the commit ddf048c to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #701      +/-   ##
==========================================
+ Coverage   56.14%   56.22%   +0.08%     
==========================================
  Files         107      107              
  Lines        8947     8966      +19     
==========================================
+ Hits         5023     5041      +18     
- Misses       3924     3925       +1     
Impacted Files Coverage Δ
coreneuron/apps/main1.cpp 48.24% <ø> (ø)
coreneuron/io/core2nrn_data_return.cpp 1.60% <0.00%> (-0.01%) ⬇️
coreneuron/io/nrn2core_data_init.cpp 1.09% <0.00%> (-0.01%) ⬇️
coreneuron/mpi/lib/mpispike.cpp 64.23% <0.00%> (-0.48%) ⬇️
coreneuron/network/netpar.cpp 43.54% <33.33%> (-0.23%) ⬇️
coreneuron/io/nrn_setup.cpp 85.73% <66.66%> (+0.05%) ⬆️
coreneuron/mpi/lib/nrnmpi.cpp 55.55% <66.66%> (+0.48%) ⬆️
coreneuron/network/multisend_setup.cpp 75.24% <100.00%> (+0.24%) ⬆️
coreneuron/network/netcvode.cpp 74.36% <100.00%> (+0.24%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 423ae6c...ddf048c. Read the comment docs.

@alexsavulescu
Copy link
Contributor

Yes. But is this best done after #720 is merged to master?

Indeed

@alexsavulescu
Copy link
Contributor

alexsavulescu commented Dec 20, 2021

Was thinking about something along the lines of: neuronsimulator/nrn@e686c1d

LE: I have removed the commit above and will instead rely on the increase of processors via #723

@alexsavulescu
Copy link
Contributor

Now we get

CoreNEURON single and multiple threads
4 /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30108/J112635/_/spack-build/spack-stage-neuron-develop-vkogqddlipn6wfzq2o4mkbt36t7uknzy/spack-build-vkogqdd/bin/nrniv: Could not find CoreNEURON library
4  near line 0
4  ^
        4 ParallelContext[0].psolve(50)
MPT ERROR: Rank 4(g:4) is aborting with error code -1.
	Process ID: 51011, Host: ldir01u05.bbp.epfl.ch, Program: /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30108/J112635/_/spack-build/spack-stage-neuron-develop-vkogqddlipn6wfzq2o4mkbt36t7uknzy/spack-build-vkogqdd/bin/nrniv
	MPT Version: HPE HMPT 2.25  10/22/21 03:18:39

MPT: --------stack traceback-------
13 /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30108/J112635/_/spack-build/spack-stage-neuron-develop-vkogqddlipn6wfzq2o4mkbt36t7uknzy/spack-build-vkogqdd/bin/nrniv: Could not find CoreNEURON library
13  near line 0
13  ^
        13 ParallelContext[0].psolve(50)
12 /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30108/J112635/_/spack-build/spack-stage-neuron-develop-vkogqddlipn6wfzq2o4mkbt36t7uknzy/spack-build-vkogqdd/bin/nrniv: Could not find CoreNEURON library

@pramodk
Copy link
Collaborator

pramodk commented Dec 23, 2021

/nrniv: Could not find CoreNEURON library

@alexsavulescu : In case of GPU build, coreneuron is a static library and hence we have to use special to launch the simulation. nrniv can not be used to run coreneuron simulation.

pramodk added a commit to BlueBrain/nmodl that referenced this pull request Dec 23, 2021
 * if PARAMETER variable of RANGE type is updated in VERBATIM
   block then it's write_count could be 0 even though it's
   updated in VERBATIM block
   - such variable can not be declared as `const` instance variable
   - one such example is https://github.com/nrnhines/tqperf/blob/master/mod/invlfire.mod
 * Codegen helper visitor now keep track of all symbols that
   are used in all verbatim blocks and the codegen visitor
   check this list before deciding if variable can be const or not.
   - note that variable could be read-only in verbatim block but
     currently we don't have robut C code analysis capability
     and it's not worth (yet)
 * this issue was encountered in BlueBrain/CoreNeuron#701
@pramodk pramodk closed this Dec 23, 2021
@pramodk pramodk reopened this Dec 23, 2021
@pramodk pramodk closed this Dec 23, 2021
@pramodk pramodk reopened this Dec 23, 2021
pramodk added a commit to BlueBrain/nmodl that referenced this pull request Dec 23, 2021
* if PARAMETER variable of RANGE type is updated in VERBATIM
   block then it's write_count could be 0 even though it's
   updated in VERBATIM block
   - such variable can not be declared as `const` instance variable
   - one such example is https://github.com/nrnhines/tqperf/blob/master/mod/invlfire.mod
 * Codegen helper visitor now keep track of all symbols that
   are used in all verbatim blocks and the codegen visitor
   check this list before deciding if variable can be const or not.
   - note that variable could be read-only in verbatim block but
     currently we don't have robut C code analysis capability
     and it's not worth (yet)
 * this issue was encountered in BlueBrain/CoreNeuron#701
 * test update: keep ast node, do not return codegen_c_visitor instance with local AST
@pramodk
Copy link
Collaborator

pramodk commented Dec 23, 2021

/gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P30438/J115285/_/spack-build/spack-stage-neuron-develop-fwu47zj5ksgprg63usqqvmnfvuvjnlmc/spack-build-fwu47zj/share/nrn/nrnmain.cpp:
622ptxas warning : Conflicting options --device-debug and --generate-line-info specified, ignoring --generate-line-info option
623ptxas warning : Conflicting options --device-debug and --generate-line-info specified, ignoring --generate-line-info option
624Successfully created x86_64/special
625++ find . -type f -name special -print -quit
626+ special_exe=./x86_64/special
627+ mpiexec -n 16 ./x86_64/special -mpi -python test1.py
628MPT ERROR: Not enough slots from job scheduler for requested ranks
629	(HPE HMPT 2.25  10/22/21 03:19:55)

I thought this was fixed. Will take a look soon...

@alexsavulescu : was this addressed? i.e. number of cores already increased? (I am not too familiar with the setup...)

@nrnhines
Copy link
Collaborator Author

My intention for a substantive gpu test is to allow use of a POINT_PROCESS version of ARTIFICIAL_CELL IntervalFireSHA, assuming I can arrange for a gpu version of #include <openssl/sha.h> (though we could drop back to the less strict test using IntervalFire which only counts input and output events without checking event times.). Anyway, that will provide good performance and validation of the spike buffering algorithms on the GPU.

@pramodk
Copy link
Collaborator

pramodk commented Dec 25, 2021

GPFS finally reasonably working:
image

@pramodk pramodk merged commit 318e25a into master Dec 25, 2021
@pramodk pramodk deleted the hines/spike-compress branch December 25, 2021 10:17
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Dec 25, 2021
* Extending the tqperf repository (test1.py) to the cases of binq and spike compression exposed
   a half dozen or so bugs in the categories of BinQ initialization, incomplete  BinQ queue transfer,
   and failure to enqueue the interthread event buffers in a timely manner.
* The tqperf test  now does a SHA1 hash comparison of all spike input and output times of all the artificial cells.
   This test is run from the latest http://github.com/nrnhines/tqperf.git.
   See instructions in https://github.com/neuronsimulator/tqperf/blob/master/README.md
* This PR depends on the BlueBrain/CoreNeuron#701
* Interthread enqueuing must occur after spike exchange. This is needed for binq + compressed spike exchange + threads.
* nrn binq must be initialized before core2nrn queue transfer.
* nrn2core queue transfer must also iterate over BinQ.
* Refactoring makes nrn2core queue transfer more understandable.
* neuron.coreneuron: if --multisend then also --ms-subintervals and --ms-phases
* tqperf is added as a ci test
* Updated to most recent coreneuron master as submodule
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
* Bug fixes with regard to option --spkcompress <nspike>
* After compressed spike exchange, do interthread_enqueue.
* nrn binq must be initialized before core2nrn queue transfer.
* nrncore binq must be initialized before nrn2core queue transfer
* interthread buffer must be enqueued at beginning of psolve.
* Initialization of binq consistent everywhere.
* Avoid Random123 globalindex warning if the index has not changed.
    Release random123 instance when multisend setup no longer needs it.
    Psolve restores a few more arg default values.
* Revert  nrnran123.cu
    Eliminating the warning when Random123 global index does not change, increases the chance of hiding a bug.
* update nmodl submodule
* ntasks=16 for gpu tests as well

Co-authored-by: Alexandru Săvulescu <alexandru.savulescu@epfl.ch>
Co-authored-by: Olli Lupton <oliver.lupton@epfl.ch>
Co-authored-by: Pramod Kumbhar <pramod.s.kumbhar@gmail.com>

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@318e25a
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants