DPL: optimize to use ipc:// on the same node #2517
Conversation
|
@knopers8 @sawenzel @matthiasrichter @shahor02 this should now optimise local DPL workflows to use (managed) shared memory. @knopers8 could you try to repeat your QC benchmark? |
|
I will run them. Just to make sure: will this also work for messages created with standard |
|
It should be completely transparent. |
|
@ktf I see that some different workflows fail in each build, so the issue is most likely with shm in a container. What are the settings for those currently? |
|
Indeed. I use the defaults, which I think allow for 64 MB of shared memory. |
* Optimize to use ipc:// on the same node
* Allow specifying resources on the command line. Currently works only for a single host.
* Foundations to be able to run the same workflow in a multinode distributed manner.
|
It actually seems to work with a multinode setup as well!! To try it: |
Checking if using it is actually causing the tests to randomly fail.
|
@aalkin ok, I think I understand the issue. There is a unique id "--session" which needs to be passed to FairMQ to allow for separate shared memory pools. We can simply put back uniqueWorkflowId as an argument of |
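As a sketch of what that would look like from the command line (the workflow names below are hypothetical, and exactly how DPL plumbs the option through to FairMQ is the part still under discussion), giving each concurrent workflow a distinct session id keeps their shared memory pools separate:

```shell
# Hypothetical workflow binaries; the point is the distinct --session ids,
# which make FairMQ use separate shared memory segments per workflow
# instead of having both workflows collide in the same pool.
o2-qc-benchmark-workflow --session bench-a &
o2-qc-benchmark-workflow --session bench-b &
wait
```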
|
The GPU error seems unrelated. I will merge this to get the improved resource scheduling and address the actual usage of |
|
ok, but could we check what is going on with the GPU CI? It just keeps failing on random recipes; it doesn't make much sense this way. |
|
I am looking into it. |
Some of the results for larger payloads and numbers of producers are missing, because something crashes inside FairMQ/shmem/boost. It helps a bit if I reduce the buffer size from the standard 1000 to something smaller, but with high enough rates the problem appears anyway. I guess it is connected with https://alice.its.cern.ch/jira/browse/O2-879 |
|
By default FairMQ has a 2 GB shared memory segment. I suppose that is what is crashing the 1 GB test. You should be able to change it with |
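For reference, a sketch of how the segment size could be raised, assuming the FairMQ shmem transport's `--shm-segment-size` option (size in bytes) is what was meant here; the workflow name is hypothetical:

```shell
# Raise the FairMQ shared memory segment from the 2 GB default to 4 GB.
# --shm-segment-size takes a size in bytes; the binary name is illustrative.
o2-qc-benchmark-workflow --shm-segment-size 4000000000
```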
|
There seems to be a clear throughput maximum around 4-16 MB payload sizes, which is well below the 2 GB limit. It would be interesting to look at a heatmap-like throughput plot with the number of producers and the payload size on the axes. |
I've been using that for these tests, it helped only to some extent.
Yes, that might be interesting to see, I will let it run later. |
|
I have tested it on the previous force-pushed version (be1a810). Would the new one change anything with regard to these crashes? |
* Optimise to use ipc:// on the same node. This will allow us to get rid of the FreePortFinder.
* Allow specifying resources on the command line. For the moment devices are allocated to resources in a naive way (each device gets roughly 1/N of the total resources). Note that there is no actual QoS happening; this just subdivides devices evenly across the available resources.
* Foundations to be able to run the same workflow in a multinode distributed manner. It actually works (tested with a two-node setup), however there are still issues, e.g. not all workflows will quit correctly in such a setup (`QuitRequest::All` cannot be propagated to other hosts, somewhat by design).