This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Conversation

@olupton
Contributor

@olupton olupton commented Oct 29, 2021

Description
The MUT* macros were only used in one place, and I believe the 0 in MUTCONSTRUCT(0) meant that the mutex was never actually used.

This is partly motivated by having been led on a wild goose chase by these macros when debugging another issue...

Use certain branches for the SimulationStack CI

CI_BRANCHES:NEURON_BRANCH=master,

@olupton olupton requested a review from alkino October 29, 2021 13:21
@pramodk
Collaborator

pramodk commented Oct 29, 2021

The MUT* macros were only used in one place, and I believe the 0 in MUTCONSTRUCT(0) meant that the mutex was never actually used.

If this is true in the current code then need to look at recent refactoring PRs. I am (pretty) sure that queues are not thread-safe and hence need mutex locks. And CI may not be revealing the issue because most/all tests are today in pure MPI mode (without use of heavy thread usage, long pending #292)

@olupton
Contributor Author

olupton commented Oct 29, 2021

OK... there is a non-macro mutex that lives alongside the queue instance here:

    TQueue<QTYPE>* tqe_;
    std::vector<InterThreadEvent> inter_thread_events_;
    OMP_Mutex mut;

and it is used in a couple of places:

    void NetCvodeThreadData::interthread_send(double td, DiscreteEvent* db, NrnThread* /* nt */) {
        std::lock_guard<OMP_Mutex> lock(mut);
        inter_thread_events_.emplace_back(InterThreadEvent{db, td});
    }

    void NetCvodeThreadData::enqueue(NetCvode* nc, NrnThread* nt) {
        std::lock_guard<OMP_Mutex> lock(mut);
        for (const auto& ite: inter_thread_events_) {
            nc->bin_event(ite.t_, ite.de_, nt);
        }
        inter_thread_events_.clear();
    }

It may well be the case that there are supposed to be more locks, but I don't think this PR actually removes any.
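The two methods quoted above follow a buffer-under-mutex pattern: other threads append events to a locked buffer, and the owning thread later drains the buffer into its own (unlocked) queue. A minimal self-contained sketch of that pattern, with illustrative names (Event, ThreadData, plain std::mutex standing in for OMP_Mutex, a std::vector standing in for the TQueue), not CoreNEURON's actual types:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

struct Event { double t; int id; };

struct ThreadData {
    std::mutex mut;                 // stands in for OMP_Mutex
    std::vector<Event> buffer;      // stands in for inter_thread_events_
    std::vector<Event> queue;       // stands in for the per-thread event queue

    // Called by *other* threads, hence the lock.
    void interthread_send(Event e) {
        std::lock_guard<std::mutex> lock(mut);
        buffer.push_back(e);
    }

    // Called only by the owning thread; it still takes the lock while
    // draining because producers may append concurrently.
    void enqueue() {
        std::lock_guard<std::mutex> lock(mut);
        for (const auto& e : buffer) queue.push_back(e);
        buffer.clear();
    }
};
```

The key point is that only the small staging buffer is protected by the mutex; the queue itself is touched by exactly one thread and needs no lock.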

@alexsavulescu
Contributor

It's not straightforward to see which mutexes are enabled or not. In NEURON I debugged this with some prints: neuronsimulator/nrntest#13 (comment)

@pramodk
Collaborator

pramodk commented Oct 29, 2021

The origin of / changes to tqueue are a bit older (than when I started), but I looked into the history a bit: TQueue in coreneuron/network/tqueue.ipp seems to be derived from src/nrncvode/spt2queue.cpp in NEURON. That file doesn't have any code related to mutex locks, but consider the MUTCONSTRUCT(0) code below in CoreNEURON:

I'm not sure why CoreNEURON has just MUTCONSTRUCT(0) but still kept all the mutex-related locks/unlocks (via macros) in place.

I don't think I have understood the details with certainty yet; further clarity is needed in order to be confident about this change. We can resume on this next week.

@ohm314
Contributor

ohm314 commented Nov 1, 2021

Should we not redo those queues completely anyway, using stdlib queues and a proper C++ implementation? Also, probably a stupid question, but would running some tests with ThreadSanitizer help here?

@pramodk pramodk closed this Nov 8, 2021
@pramodk pramodk reopened this Nov 8, 2021
The MUT* macros were only used in one place, and the 0 in
MUTCONSTRUCT(0) meant that the mutex was never actually used.
@olupton olupton force-pushed the olupton/drop-mut-macros branch from 61f3714 to b8e4766 Compare November 9, 2021 10:39
@alkino alkino self-requested a review November 9, 2021 11:04
@alkino alkino merged commit ff93a93 into master Nov 9, 2021
@alkino alkino deleted the olupton/drop-mut-macros branch November 9, 2021 12:33
@pramodk
Collaborator

pramodk commented Nov 9, 2021

TQueue in coreneuron/network/tqueue.ipp seems to be derived from src/nrncvode/spt2queue.cpp in neuron. It doesn't have code related to mutex locks

I was wrong with the above comment - the TQueue implementation is derived from nrncvode/sptbinq.cpp which has locks.

The issue / inconsistency in CoreNEURON was that the mutex was never constructed (0 as an argument to MUTCONSTRUCT):

template <container C>
TQueue<C>::TQueue() {
    MUTCONSTRUCT(0)
    nshift_ = 0;

whereas NEURON's constructor has a default argument mkmut = 0:

class TQueue {
public:
    TQueue(TQItemPool*, int mkmut = 0);
....

TQueue::TQueue(TQItemPool* tp, int mkmut) {
	MUTCONSTRUCT(mkmut)
	tpool_ = tp;

In the neuron source tree I don't see any TQueue instantiation with mkmut=1:

kumbhar@bbpv1:~/.../nrn/src$ grep -r "new TQueue" *
nrncvode/netcvode.cpp:	tqe_ = new TQueue(tpool_, 0);
nrncvode/netcvode.cpp:			d.tq_ = new TQueue(d.tpool_);
nrncvode/netcvode.cpp:		p.lcv_[i].neosim_self_events_ = new TQueue();
nrncvode/netcvode.cpp:		d.tqe_ = new TQueue(p[i].tpool_);
nrncvode/tqueue.cpp:	TQueue* q = new TQueue(0);

Does this mean we can remove all MUT* macros in TQueue? I would say YES.
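The comparison above makes sense if the MUT* macros are conditional on the constructor argument: when mkmut is 0 no mutex is ever created, and every MUTLOCK/MUTUNLOCK becomes a no-op. A hedged sketch of how such a macro family could behave (this is an illustration of the idea, not the actual nrnmutdec.h definitions):

```cpp
#include <cassert>
#include <mutex>

// Illustrative MUT*-style macros: the mutex is only allocated when the
// constructor argument is nonzero, so lock/unlock guard on the pointer
// and silently do nothing after MUTCONSTRUCT(0).
#define MUTCONSTRUCT(mkmut) if (mkmut) { mut_ = new std::mutex; }
#define MUTLOCK             if (mut_) { mut_->lock(); }
#define MUTUNLOCK           if (mut_) { mut_->unlock(); }

struct Queue {
    std::mutex* mut_ = nullptr;
    int n_ = 0;
    explicit Queue(int mkmut) { MUTCONSTRUCT(mkmut) }
    ~Queue() { delete mut_; }
    void insert() { MUTLOCK n_++; MUTUNLOCK }
    bool has_mutex() const { return mut_ != nullptr; }
};
```

Under this reading, keeping the macros while always passing 0 leaves dead lock/unlock sites in the code, which is exactly what made the behavior hard to see when debugging.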

There was one more place where I saw mkmut=1, and that is TQItemPool:

$ grep -r TQItemPool *
....
nrncvode/netcvode.h:	TQItemPool* tpool_;
nrncvode/netcvode.cpp:	tpool_ = new TQItemPool(1000, 1);


nrncvode/tqueue.h:declarePool(TQItemPool, TQItem)

# and in nrncvode/pool.h

#define declarePool(Pool,T) \
class Pool { \
public: \
    Pool(long count, int mkmut = 0); \

but this doesn't interact with TQueue.

With the above summary, I had a quick call with @nrnhines to verify the origin of the mutex macros and whether their removal is safe. He pointed out that the newer iteration of the original per-queue mutex is the inter-thread-event buffers: instead of copying events into a TQueue under a mutex, there is a buffer populated by the thread that owns the PreSyn (see interthread_send()), and this is where the mutex is required/used. The thread that owns the queue then copies all those buffered events into its own queue (and hence no mutex is required there). In PreSyn::send() we can see how a thread decides whether to put an event into its own queue or into the inter-thread-event buffer.

This also explains why coreneuron always had MUTCONSTRUCT(0).
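The dispatch decision described above can be sketched minimally: if the sending thread owns the target queue it enqueues directly with no lock, otherwise it buffers the event on the owner's inter-thread list under that owner's mutex. Names here (Target, send, owner_tid) are illustrative, not CoreNEURON's actual API:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

struct Target {
    int owner_tid;
    std::mutex mut;
    std::vector<double> queue;         // touched only by the owner: no lock
    std::vector<double> inter_thread;  // cross-thread staging buffer: locked

    void send(int sender_tid, double t) {
        if (sender_tid == owner_tid) {
            queue.push_back(t);        // same thread: enqueue directly
        } else {
            std::lock_guard<std::mutex> lock(mut);
            inter_thread.push_back(t); // other thread: buffer for the owner
        }
    }
};
```

Because the per-queue mutex was superseded by this scheme, always constructing TQueue with MUTCONSTRUCT(0) is consistent rather than a bug.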

In summary, the merge of this PR is safe.

Should we not anyway redo those queues completely using stdlib queues and a proper C++ implementation?

I believe so, and this is what Tim already attempted using std::priority_queue. That implementation does not support moving events, which the net_move() NMODL construct requires. (I vaguely remember some performance/efficiency-related discussion about this compared with the default SplayTree implementation.)
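The limitation is that std::priority_queue exposes no way to re-time an element that is already in the heap. One common workaround (purely illustrative; not what CoreNEURON or Tim's attempt did) is lazy deletion: reinsert the event with a bumped version counter and skip stale entries on pop:

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct Ev { double t; int id; int version; };
struct Later {
    // Invert the comparison so the earliest time is on top (min-heap).
    bool operator()(const Ev& a, const Ev& b) const { return a.t > b.t; }
};

struct MovableQueue {
    std::priority_queue<Ev, std::vector<Ev>, Later> pq;
    std::vector<int> current_version;  // latest version per event id

    void insert(int id, double t) {
        if (id >= (int)current_version.size()) current_version.resize(id + 1, 0);
        pq.push({t, id, ++current_version[id]});
    }
    // "Move" = bump the version and reinsert; the old entry becomes stale.
    void move(int id, double t) { insert(id, t); }

    bool pop(Ev& out) {
        while (!pq.empty()) {
            Ev e = pq.top();
            pq.pop();
            if (e.version == current_version[e.id]) { out = e; return true; }
            // stale entry left behind by a move: skip it
        }
        return false;
    }
};
```

The cost is that moved events leave dead entries in the heap until popped, which may be part of why the performance comparison against the splay tree was contentious.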

Also, probably a stupid question, but would running some tests with ThreadSanitizer help here?

It would, but such a test is missing. I am creating a ticket in NEURON to adapt the tqperf model to use threads so that it can be used for such validations.

pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
The MUT* macros were only used in one place, and the 0 in
MUTCONSTRUCT(0) meant that the mutex was never actually used.

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@ff93a93
