
Deadlock .. somewhere? (probably serialization) #1189

Closed
Finomnis opened this issue Jul 10, 2014 · 1 comment

@Finomnis
Contributor

I found a problem where an hpx server action called from a client never actually gets executed, and its future never gets triggered.

I tried really hard to reduce it to a minimal problem, and this is as small as I could get it:

https://www.dropbox.com/s/6mkwkfrbnsnx9c7/deadlock_example.zip?dl=1

The way this program works (a rough sketch of the worker loop follows this list):

  • It creates 8 worker threads per locality.
  • Every worker requests work (stripped down to a bool(void) call in this example) and hands the work back in after calculating it (a void(void) call in this example).
  • In the first iteration the worker 'initializes' the component.
  • A total of 1000 packets get generated; after that the workers are shut down.
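For context, this is roughly what each worker does, assuming plain HPX actions; `request_work_action`, `hand_in_work_action`, and the loop structure are placeholder names of mine, not the identifiers from the attached example:

```cpp
// Rough sketch of one worker, not the code from the zip above.
// request_work / hand_in_work stand in for the bool(void) and void(void)
// server calls mentioned in the description.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>

bool request_work() { return true; }   // bool(void): "is there more work?"
void hand_in_work() {}                 // void(void): deliver the finished packet

HPX_PLAIN_ACTION(request_work, request_work_action);
HPX_PLAIN_ACTION(hand_in_work, hand_in_work_action);

void worker(hpx::id_type server)
{
    bool first_iteration = true;
    for (;;)
    {
        // ask the server for work and block until the answer arrives
        bool more_work = hpx::async<request_work_action>(server).get();
        if (!more_work)
            break;                      // all 1000 packets done: shut down

        if (first_iteration)
        {
            // 'initialize' the component on the first pass
            first_iteration = false;
        }

        // ... calculate the packet ...

        // hand the finished packet back to the server
        hpx::async<hand_in_work_action>(server).get();
    }
}

int main()
{
    // in the real program: 8 workers per locality, across 4 localities
    worker(hpx::find_here());
    return 0;
}
```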

I have the following counters set up (corresponding to the five fields in the output below):

  1. number of packets requested
  2.–4. different points at the initialize if-clause
  5. number of packets requested, counted a second time at the end of the iteration

At the end of every worker iteration I print these counters to hpx::cout.

In my example I ran on 4 nodes (MPI parcelport), i.e. a total of 32 workers.
That should give me the following output:

... cut away ...
(995/32/32/32/995)
(996/32/32/32/996)
(997/32/32/32/997)
(998/32/32/32/998)
(999/32/32/32/999)
(1000/32/32/32/1000)

or in a different order, as it is executed in parallel.
But it should contain (1000/32/32/32/1000) somewhere, then synchronize and finish the program.

What I get, though, is:

... cut away ...
(998/32/32/17/983)
(999/32/32/17/984)
(1000/32/32/17/985)
<deadlock>

or in a different order,
which indicates that multiple workers (in this case 15; the number varies from 0 to 20) got stuck between counters 3 and 4.
To be precise, at:

worker.cpp(line 101): kernel.set_arg(buf).get();

This only happens in distributed execution. (So far I have only tried it on 4+ nodes; it does not happen reproducibly on two nodes.)

I do not know what causes this lock.
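To illustrate what that line does (the real component lives in the attached example; `kernel_server`, `set_arg`, and the buffer type below are placeholder names of mine), the pattern is an action invocation on a remote component whose future is then waited on:

```cpp
// Sketch of the blocking pattern at worker.cpp:101, with made-up names.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>

#include <utility>
#include <vector>

// server side: a trivial component exposing set_arg as an action
struct kernel_server : hpx::components::component_base<kernel_server>
{
    void set_arg(std::vector<double> buf) { arg_ = std::move(buf); }
    HPX_DEFINE_COMPONENT_ACTION(kernel_server, set_arg, set_arg_action);

    std::vector<double> arg_;
};

typedef hpx::components::component<kernel_server> kernel_server_type;
HPX_REGISTER_COMPONENT(kernel_server_type, kernel_server);
HPX_REGISTER_ACTION(kernel_server::set_arg_action);

// client side: set_arg returns a future; .get() is where the workers hang
struct kernel_client : hpx::components::client_base<kernel_client, kernel_server>
{
    typedef hpx::components::client_base<kernel_client, kernel_server> base_type;

    kernel_client(hpx::future<hpx::id_type>&& id) : base_type(std::move(id)) {}

    hpx::future<void> set_arg(std::vector<double> buf)
    {
        return hpx::async<kernel_server::set_arg_action>(
            this->get_id(), std::move(buf));
    }
};

int main()
{
    kernel_client kernel(hpx::new_<kernel_server>(hpx::find_here()));
    std::vector<double> buf(1024, 0.0);

    // when the component lives on another locality, 'buf' has to be
    // serialized into a parcel; the reported hang is inside this .get()
    kernel.set_arg(buf).get();
    return 0;
}
```

If the component sits on the same locality, no serialization of the argument is needed, which would be consistent with the hang only showing up in distributed runs.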

@hkaiser hkaiser added this to the 0.9.9 milestone Jul 10, 2014
@hkaiser hkaiser self-assigned this Jul 10, 2014
@hkaiser hkaiser mentioned this issue Jul 12, 2014
@hkaiser
Member

hkaiser commented Jul 13, 2014

This has been fixed by c3f50f1

@hkaiser hkaiser closed this as completed Jul 13, 2014