comm: set parsec_tls_execution_stream in comm thread #422
bosilca merged 1 commit into ICLDisco:master
Feel free to bikeshed.
PR ICLDisco#413 fixed issues around recursive tasks by retrieving the current execution stream via `parsec_my_execution_stream`. However, under some circumstances, it appears that the `on_complete` callback can be called from the communication thread. The comm thread never sets the TLS variable needed for `parsec_my_execution_stream`, so you end up with a NULL execution stream and everything breaks.

Ensure that `parsec_tls_execution_stream` is set by the communication thread as well. This introduces a new internal API to do so, `parsec_set_my_execution_stream`, since `parsec_tls_execution_stream` is a static variable in parsec.c.

Signed-off-by: Omri Mor <omrimor2@illinois.edu>
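A minimal sketch of what the accessor pattern amounts to, assuming C11 `_Thread_local` in place of PaRSEC's own TLS macros and a one-field stand-in for `parsec_execution_stream_t`. The three `parsec_*` names come from the description above, but their exact signatures are assumed here, and `comm_thread_startup` is a hypothetical hook, not a PaRSEC function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for parsec_execution_stream_t. */
typedef struct parsec_execution_stream_s {
    int th_id; /* thread index, for illustration only */
} parsec_execution_stream_t;

/* Stand-in for the TLS variable that is `static` in parsec.c: code in other
 * files (such as the comm engine) cannot name it directly. */
static _Thread_local parsec_execution_stream_t *parsec_tls_execution_stream = NULL;

/* Getter used by recursive tasks since #413 (signature assumed). */
parsec_execution_stream_t *parsec_my_execution_stream(void)
{
    return parsec_tls_execution_stream;
}

/* Setter added by this PR so other files can bind a stream to the
 * calling thread (signature assumed). */
void parsec_set_my_execution_stream(parsec_execution_stream_t *es)
{
    parsec_tls_execution_stream = es;
}

/* Hypothetical comm-thread startup hook: bind the thread's stream before
 * it can ever run an on_complete callback. */
void comm_thread_startup(parsec_execution_stream_t *comm_es)
{
    parsec_set_my_execution_stream(comm_es);
}
```

In PaRSEC the binding would happen on the communication thread itself; the sketch calls everything from one thread, which is enough to show that the getter returns NULL until the setter has run on that thread.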
I feel a little uneasy about this... I believe there are places where we rely on …
|
In general, all PaRSEC threads should have their TLS set. I still don't understand why you are linking this with the profiling issue, because I don't understand how the recursive tasks (as they exist today in PaRSEC) can be completed by the communication thread. Can you provide a backtrace of when this happens?
|
|
@bosilca I have a coredump, but it no longer matches the binary. However, from the disassembly still on my console screen from when I was looking at the coredump, I've traced this to the following call chain:

```c
static inline void remote_dep_dec_flying_messages(parsec_taskpool_t *tp) {
    parsec_taskpool_update_runtime_nbtask(tp, -1);
}

int parsec_taskpool_update_runtime_nbtask(parsec_taskpool_t *tp, int32_t nb_tasks) {
    int remaining = tp->update_nb_runtime_task(tp, nb_tasks); /* points to parsec_add_fetch_runtime_task */
    return parsec_check_complete_cb(tp, tp->context, remaining);
}

int32_t parsec_add_fetch_runtime_task(parsec_taskpool_t *tp, int32_t nb_tasks) {
    return parsec_atomic_fetch_add_int32(&tp->nb_pending_actions, nb_tasks) + nb_tasks;
}

int parsec_check_complete_cb(parsec_taskpool_t *tp, parsec_context_t *context, int remaining) {
    if (0 == remaining) {
        /* Runs on whichever thread brought the count to zero. */
        if (NULL != tp->on_complete) {
            tp->on_complete(tp, tp->on_complete_data); /* points to parsec_recursivecall_callback */
        }
        parsec_atomic_fetch_dec_int32(&context->active_taskpools);
        return 1;
    }
    return 0;
}
```

I'm fairly certain that the (inlined) call to …

The reason LCI seemed to trigger this more than MPI (which I don't think it did in my couple of tests) is probably differences in timing: maybe LCI was busy processing communications at the time, and so decremented the counter later, after the actual tasks were complete. As a side note, …
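To make the call chain above concrete, here is a compilable, simplified model of it using C11 atomics instead of `parsec_atomic_fetch_add_int32`. All type and function names are hypothetical simplifications, not PaRSEC's. What it illustrates is that `on_complete` runs on whichever thread performs the final decrement and sees that thread's TLS, which is NULL if that thread never set it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct es_s { int th_id; } es_t;   /* hypothetical execution stream */

static _Thread_local es_t *tls_es = NULL;  /* per-thread "current stream" */

typedef struct taskpool_s taskpool_t;
typedef void (*complete_cb_t)(taskpool_t *tp, void *data);

struct taskpool_s {
    atomic_int nb_pending_actions;  /* runtime actions still outstanding */
    complete_cb_t on_complete;      /* fired once, by whoever hits zero */
    void *on_complete_data;
};

static atomic_int active_taskpools = 1;    /* simplified context counter */

/* Mirrors parsec_add_fetch_runtime_task: returns the updated count. */
static int add_fetch_runtime_task(taskpool_t *tp, int nb)
{
    return atomic_fetch_add(&tp->nb_pending_actions, nb) + nb;
}

/* Mirrors parsec_check_complete_cb: the callback runs on the CALLING thread. */
static int check_complete_cb(taskpool_t *tp, int remaining)
{
    if (0 == remaining) {
        if (NULL != tp->on_complete)
            tp->on_complete(tp, tp->on_complete_data);
        atomic_fetch_sub(&active_taskpools, 1);
        return 1;
    }
    return 0;
}

/* Mirrors parsec_taskpool_update_runtime_nbtask. */
static int update_runtime_nbtask(taskpool_t *tp, int nb)
{
    return check_complete_cb(tp, add_fetch_runtime_task(tp, nb));
}

/* Like a callback that needs the current stream; it records what it saw. */
static void record_stream_cb(taskpool_t *tp, void *data)
{
    (void)tp;
    *(es_t **)data = tls_es;
}
```

The first decrement below models a worker; the second models the comm thread, which never set its TLS, so the callback observes NULL.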
|
My only criticism of this PR is that the other threads should also call … Today, the comm thread has an …
|
@omor1 I still don't understand how a communication can trigger the on_complete callback of a totally local taskpool. Yes, communications can complete tasks (when all their data transfers for remote processes are completed), so the execution path you are highlighting makes sense for distributed taskpools. However, the recursive taskpools are entirely local: they do not create communications and therefore should have no externally visible effects (and should not be completed by a communication). We need to dig a little deeper into this to understand what is really going on.
Can do.
I think I sort of explained this above:
Essentially, when the recursive taskpool is created, PaRSEC seemingly has no way to know that all the data it will need is local and that it has no remote dependents, so it necessarily treats it like a distributed taskpool. The heuristic …
OK, this is weird but plausible. The runtime does not know the taskpool is local, so it needs to register it with the communication engine. It does so by increasing the number of flying messages (a way to refcount the communication engine) and by adding a command to the comm engine queue. If the local tasks all complete before the comm engine has time to handle the registration of the new taskpool (via `remote_dep_mpi_new_taskpool`), then indeed, when the communication engine decrements the in-flight messages at the end of `remote_dep_mpi_new_taskpool`, it will trigger the taskpool completion event. Complicated, and definitely a behavior outside the runtime design. So you're right: we need to set the execution stream for the communication thread.
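The race described above can be modeled deterministically. In this hypothetical sketch (not PaRSEC code), the pool's counter holds the local tasks plus one reference for the comm engine, `comm_progress` plays the role of `remote_dep_mpi_new_taskpool`, and completion is recorded by whichever actor drops the count to zero. Registration is simplified to initializing the counter rather than incrementing it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical labels for which thread performed an operation. */
enum actor { ACTOR_NONE, ACTOR_WORKER, ACTOR_COMM };

typedef struct {
    atomic_int pending;       /* local tasks + one reference held by the comm engine */
    int registration_queued;  /* stands in for the command on the comm queue */
    enum actor completed_by;  /* which actor's decrement reached zero */
} pool_t;

/* Whoever brings the count to zero completes the pool (stands in for
 * on_complete running on the caller's thread). */
static void pool_release(pool_t *tp, enum actor who)
{
    if (atomic_fetch_sub(&tp->pending, 1) - 1 == 0)
        tp->completed_by = who;
}

/* Creation: count the local tasks plus one "flying message" for the
 * comm engine, then queue the registration command. */
static void pool_register(pool_t *tp, int nb_local_tasks)
{
    atomic_init(&tp->pending, nb_local_tasks + 1);
    tp->registration_queued = 1;
    tp->completed_by = ACTOR_NONE;
}

/* Comm thread: handle the queued registration, then drop the comm
 * engine's reference. */
static void comm_progress(pool_t *tp)
{
    if (tp->registration_queued) {
        tp->registration_queued = 0;
        pool_release(tp, ACTOR_COMM);
    }
}
```

Which actor completes the pool depends only on ordering: when all local work finishes before the comm thread handles the registration, completion (and therefore `on_complete`) lands on the comm thread.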
This is already done; all threads are setting their … I really don't like the fact that we need to create a special accessor just for this case, but I see that the TLS variables are created statically (which forces us to have an accessor in parsec.c). I don't think they need to be static, though; instead, we could let the creator decide whether they are static or extern. With this approach we can leave all other TLS variables as they are today (i.e., static) and make `parsec_tls_execution_stream` visible across the entire code base. This way we will not need an additional accessor, and the communication thread should be able to use …
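A minimal sketch of the "let the creator choose the storage class" idea, assuming C11 `_Thread_local`. `SKETCH_TLS_DECLARE` and the variable names are hypothetical and do not reflect PaRSEC's actual TLS macros, whose expansion may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: the defining file chooses the storage class, so a
 * TLS variable can stay file-local (`static`) or get external linkage
 * (empty argument) and be declared `extern` in a shared header. */
#define SKETCH_TLS_DECLARE(storage, var) storage _Thread_local void *var = NULL

/* What a shared internal header would declare for the comm engine. */
extern _Thread_local void *sketch_tls_execution_stream;

/* In the defining file: one variable exported, another kept static. */
SKETCH_TLS_DECLARE(, sketch_tls_execution_stream);
SKETCH_TLS_DECLARE(static, sketch_tls_scheduler_state);

/* Any file including the header can bind its stream directly, with no
 * dedicated setter living in the defining file. */
static void comm_thread_bind(void *es)
{
    sketch_tls_execution_stream = es;
}

/* The static variable remains reachable only from this file. */
static void *scheduler_state(void)
{
    return sketch_tls_scheduler_state;
}
```

Passing the storage class explicitly keeps the choice visible at each declaration site, so existing variables can keep `static` while only the execution stream is exported.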
Right. Somehow with LCI we're triggering this more frequently. Maybe since we're faster at sending/receiving messages, the comm thread spends more time continuing to do that (it only stops processing communications if there's an iteration where nothing progressed), so things like processing the new taskpool get delayed.
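The scheduling effect described here can be sketched with a toy loop, assuming (as stated above) that queued commands are handled only after an iteration in which the network made no progress. `comm_state_t`, `network_progress`, and `comm_loop_once` are hypothetical and do not reproduce LCI's or PaRSEC's actual progress engine.

```c
#include <assert.h>

/* Hypothetical toy model of a comm thread, not LCI's or PaRSEC's loop. */
typedef struct {
    int network_events;             /* completions waiting to be progressed */
    int queued_commands;            /* e.g. a new-taskpool registration */
    int iterations_before_commands; /* how long the commands had to wait */
} comm_state_t;

/* Progress one network event; returns 1 if anything moved, 0 if idle. */
static int network_progress(comm_state_t *s)
{
    if (s->network_events > 0) {
        s->network_events--;
        return 1;
    }
    return 0;
}

/* Keep progressing the network while it makes progress; only an idle
 * iteration lets the thread fall through to its command queue. */
static void comm_loop_once(comm_state_t *s)
{
    int iters = 0;
    while (network_progress(s))
        iters++;
    s->iterations_before_commands = iters;
    s->queued_commands = 0; /* commands are drained only now */
}
```

Under a steady stream of network events, the registration command waits behind every one of them, which widens the window in which the local tasks can all finish first.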
Generally agree with all this. I guess we'd need a new macro, tentatively named …
|
Why not having …
I was a bit concerned about breaking the API: if someone later adds a new TLS var and assumes it's …
|
Don't worry, we'll make sure it does not happen.
|
I never made the changes re: …
|
It is coming in #437.