
Distribute vegasflow on clusters with dask #55

Merged · 7 commits into master from dasktribute on Sep 17, 2020

Conversation

scarlehoff (Member)

This seems to work on a single computer. I'll try it on galileo as soon as I am able to.

As far as I understand, the point of the matter is to have one run_event per distributed system, with the dask client connecting to the appropriate one.

The way it works is by sending a job per chunk of data, while the master node / central server collects all the data and gives you the results.

At first I thought "this is so simple we should use it instead of joblib", but then I realised it not only complicates picklability and device selection, but also that the distributed package from pip was not working on dom... (the one from Arch is), so for now I prefer to keep it as a completely separate option.
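A rough sketch of the chunk-per-job pattern described above, using plain dask.distributed (nothing vegasflow-specific; `run_chunk` is just a stand-in for the per-chunk event generation):

```python
from dask.distributed import Client, LocalCluster

# Stand-in for the per-chunk work; in vegasflow this would be the event
# generation / integrand evaluation for one chunk of events.
def run_chunk(seed, n_events):
    import numpy as np
    rng = np.random.default_rng(seed)
    return rng.random(n_events).mean()  # toy partial result for this chunk

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=2)   # any dask cluster would do here
    client = Client(cluster)              # the client connects to that cluster
    n_chunks, events_per_chunk = 4, 10_000
    # one job per chunk of data...
    futures = [client.submit(run_chunk, i, events_per_chunk) for i in range(n_chunks)]
    # ...while the central node collects all the partial results
    partials = client.gather(futures)
    print(sum(partials) / n_chunks)
```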

@scarrazza (Contributor)

Great, looks good. I will give it a try on other IT infrastructures.

@scarlehoff (Member, Author)

On indaco I've had the same problems as with dom, so this is definitely not going in the main package. The pickling is also very tricky and only seems to work with tf > 2.2.

In any case, the way this needs to be done is by passing a dask cluster object to, for instance, the compile call. I'll add an example and then have the docs point to the list of systems supported by dask.

The advantage is that, by doing so, we are compatible with every queue system that dask is compatible with.

The latest commit is working on indaco. I have to say I'm very happy with dask: other than the expected pitfalls when passing objects around through sockets, everything works as advertised.
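A minimal sketch of the intended usage, assuming a VegasFlow-like interface (constructor and method names follow the usual vegasflow examples); the `cluster` keyword on `compile` is hypothetical and only illustrates the idea of handing the dask cluster object over:

```python
from dask.distributed import LocalCluster
from vegasflow import VegasFlow  # assuming the top-level import

# Toy integrand over the unit hypercube
def integrand(xarr, **kwargs):
    return xarr[:, 0] ** 2 + xarr[:, 1]

# Any dask cluster object: LocalCluster here, SLURMCluster/PBSCluster on a real queue
cluster = LocalCluster(n_workers=2)

vegas = VegasFlow(3, int(1e6))  # 3 dimensions, 1e6 events per iteration
# Hypothetical keyword: the cluster object is passed through the compile call
vegas.compile(integrand, cluster=cluster)
result = vegas.run_integration(5)  # 5 iterations
```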

Review thread on src/vegasflow/vflow.py (outdated, resolved)
@scarlehoff (Member, Author)

This is ready for review. If you have access to a non-slurm workload manager, it would be helpful to have a second example. If not, I think this one is enough.
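For anyone wanting to adapt the example to another workload manager, the cluster construction is the only part that should change. A sketch of the dask-jobqueue side for slurm, with placeholder queue and resource values (not the actual example file shipped in this PR):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Placeholder resources; adjust to the actual slurm partition and node sizes
cluster = SLURMCluster(queue="main", cores=8, memory="16GB", walltime="01:00:00")
cluster.scale(4)            # request 4 workers from the queue
client = Client(cluster)    # the rest of the code is identical for any cluster
```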

@scarrazza (Contributor)

Very good. I have tried the local cluster (the dask monitor panel seems to work fine) and the PBSCluster; both cases work fine. I'm just wondering whether we have multi-GPU nodes in some cluster (maybe marco?); if not, we can try to rent and configure slurm on some cloud machines.
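For completeness, a minimal sketch of the two setups mentioned above, with placeholder queue and resource values; the monitor panel is the dashboard served by the dask scheduler:

```python
from dask.distributed import Client, LocalCluster
from dask_jobqueue import PBSCluster

# Option 1: local cluster
cluster = LocalCluster(n_workers=4, threads_per_worker=1)
# Option 2 (on a PBS system): placeholder queue and resources
# cluster = PBSCluster(queue="workq", cores=8, memory="16GB", walltime="01:00:00")
# cluster.scale(2)

client = Client(cluster)
print(client.dashboard_link)  # URL of the monitoring dashboard (monitor panel)
```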

@scarlehoff (Member, Author)

Even in that case you would want to send two jobs to that node. I haven't even tried to make dask + multiGPU work at the same time because it seems redundant to me (and because it scares me tbh).

@scarlehoff added this to the v1.2 milestone on Sep 14, 2020
@scarlehoff mentioned this pull request on Sep 14, 2020
@scarlehoff (Member, Author)

If you are happy with this, I'll merge.

@scarrazza (Contributor)

Fine by me, and the instructions are clear.

@scarlehoff merged commit b1616b7 into master on Sep 17, 2020
@scarlehoff deleted the dasktribute branch on September 17, 2020 at 11:56