# Introduction to dask-mpi

Before proceeding with this notebook, we suggest reading ["Introduction to dask"](Dask.ipynb).

## Initialization
### Interactive jobs

When dask is used interactively (e.g. in a notebook, as here), dask-mpi needs to run in the background as a server, started with a command such as
```bash
mpirun -n $((N+1)) dask-mpi --no-nanny --scheduler-file scheduler.json --nthreads 1
```
where `N+1` is the total number of processes: one scheduler and N workers.

Then in the notebook we connect to the server by doing
```python
from dask.distributed import Client
client = Client(scheduler_file="scheduler.json")
```
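
Since the server starts in the background, the scheduler file may not exist yet when the notebook tries to connect. The following is a minimal sketch of a helper that waits for the file with an explicit timeout, which can make a failed launch easier to spot. The helper name `wait_for_scheduler_file` and its parameters are hypothetical and not part of dask or dask-mpi.

```python
import os
import time


def wait_for_scheduler_file(path, timeout=30.0, poll_interval=0.1):
    """Return True once `path` exists and is non-empty, False on timeout.

    Hypothetical helper, not part of dask or dask-mpi.
    """
    deadline = time.monotonic() + timeout
    while True:
        # A non-empty file suggests that the scheduler has written its address.
        if os.path.exists(path) and os.path.getsize(path) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A possible use is to call `wait_for_scheduler_file("scheduler.json")` before creating the `Client`, and to inspect the server logs if it returns `False`.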

### Batch jobs

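When dask is used in a script, the script needs to be executed in parallel with a command such as
```bash
mpirun -n $((N+2)) python script.py
```
where, according to the dask-mpi documentation, rank 0 runs the scheduler, rank 1 runs the script itself as the client, and the remaining N ranks run the workers. The first lines of script.py should be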
```python
from dask_mpi import initialize
initialize(nthreads=1, nanny=False)

from dask.distributed import Client
client = Client()
```
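
The two launch modes assign roles to the MPI ranks differently, which affects how many workers a given `mpirun -n` value yields. The sketch below encodes the default layout as described in the dask-mpi documentation (scheduler on rank 0 and, in a script using `initialize()`, client code on rank 1). The function names are hypothetical, and non-default options are not covered.

```python
def role_of_rank(rank, batch):
    """Return the role an MPI rank is expected to play with default options.

    batch=False: server started with `mpirun ... dask-mpi`.
    batch=True:  script calling `dask_mpi.initialize()`.
    Hypothetical helper for illustration only.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    if rank == 0:
        return "scheduler"
    if batch and rank == 1:
        # In a script, rank 1 continues past initialize() and runs the client code.
        return "client"
    return "worker"


def count_workers(n_processes, batch):
    """Count how many of `n_processes` ranks end up as workers."""
    return sum(role_of_rank(r, batch) == "worker" for r in range(n_processes))


# Nine processes give 8 workers for the server, but only 7 in a script.
server_workers = count_workers(9, batch=False)
script_workers = count_workers(9, batch=True)
```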

For more details about dask-mpi refer to its [documentation](https://mpi.dask.org/en/latest/index.html).

## Example
In the following we start the server and connect to it.

In [56]:
import os
import sh
import tempfile

# Since dask-mpi produces several files, we create a temporary directory
# and move into it, so that the relative paths below resolve inside it
tmppath = tempfile.mkdtemp()
os.chdir(tmppath)

# Here we set the number of workers
workers = 8
threads_per_worker = 1

# The command runs in the background (_bg=True); stdout and stderr are
# stored in tmppath+"/log.out" and tmppath+"/log.err"
server = sh.mpirun("-n", workers+1, "dask-mpi", "--no-nanny", "--nthreads", threads_per_worker,
                   "--scheduler-file", "scheduler.json", _bg=True, _out="log.out", _err="log.err")


In [58]:
from dask.distributed import Client
client = Client(scheduler_file=tmppath+"/scheduler.json")
client

Client: Scheduler: tcp://10.20.110.9:46756, Dashboard: http://10.20.110.9:8787/status
Cluster: Workers: 8, Cores: 8, Memory: 33.67 GB
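
The background launch used above for the server relies on the `sh` package. Where `sh` is not available, the same pattern (start a process without blocking and redirect its output to log files) can be sketched with the standard-library `subprocess` module. This is only an illustration: the helper `launch_in_background` is hypothetical, and a stand-in Python command replaces the `mpirun` command line so that the sketch runs without MPI.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def launch_in_background(args, workdir):
    """Start `args` without blocking; stdout/stderr go to log files in workdir."""
    workdir = Path(workdir)
    # The child process receives its own copies of these handles, so the
    # parent can close them as soon as Popen returns.
    with open(workdir / "log.out", "w") as out, open(workdir / "log.err", "w") as err:
        return subprocess.Popen([str(a) for a in args], cwd=workdir, stdout=out, stderr=err)


demo_dir = tempfile.mkdtemp()
# Stand-in command (an assumption for this sketch); for the real server,
# the argument list would be the mpirun command line shown earlier.
demo = launch_in_background([sys.executable, "-c", "print('server started')"], demo_dir)
demo.wait()
```

For a long-running process such as the dask-mpi server, `demo.terminate()` would stop it once the work is done.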


## Workers

Information about the workers can be obtained using
```python
client.scheduler_info()["workers"]
```
which returns a dictionary whose keys are the worker addresses and whose values hold the most recent information reported by each worker.


In [59]:
workers = list(client.scheduler_info()["workers"].keys())
workers

['tcp://10.20.110.9:33646',
 'tcp://10.20.110.9:34231',
 'tcp://10.20.110.9:34565',
 'tcp://10.20.110.9:37555',
 'tcp://10.20.110.9:38215',
 'tcp://10.20.110.9:38225',
 'tcp://10.20.110.9:40114',
 'tcp://10.20.110.9:44958']

In [60]:
# For example, the information known about the first worker is
client.scheduler_info()["workers"][workers[0]]

{'type': 'Worker',
 'id': 3,
 'host': '10.20.110.9',
 'resources': {},
 'local_directory': '/tmp/tmp51a9u_o8/worker-lv7zrwwr',
 'name': 3,
 'nthreads': 1,
 'memory_limit': 4208159232,
 'last_seen': 1579600151.471659,
 'services': {},
 'metrics': {'cpu': 2.0,
  'memory': 37462016,
  'time': 1579600151.4707541,
  'read_bytes': 11126.181017576566,
  'write_bytes': 11126.181017576566,
  'num_fds': 30,
  'executing': 0,
  'in_memory': 0,
  'ready': 0,
  'in_flight': 0,
  'bandwidth': {'total': 100000000, 'workers': {}, 'types': {}}},
 'nanny': None}
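
The per-worker entries can be aggregated into a cluster-level summary. The sketch below does this on a hand-written stand-in for `client.scheduler_info()["workers"]`, so that it runs without a cluster. It uses only the `host`, `nthreads` and `memory_limit` fields shown above, and it assumes that all eight workers have the same `memory_limit` as the one displayed. The function name `summarize_workers` is hypothetical.

```python
def summarize_workers(workers_info):
    """Aggregate thread count and memory limit over a workers dictionary."""
    total_threads = sum(w["nthreads"] for w in workers_info.values())
    total_memory = sum(w["memory_limit"] for w in workers_info.values())
    return {
        "workers": len(workers_info),
        "threads": total_threads,
        "memory_GB": round(total_memory / 1e9, 2),
        "hosts": sorted({w["host"] for w in workers_info.values()}),
    }


# Stand-in data: same addresses as above, with the memory limit of the
# first worker assumed for all of them.
sample = {
    f"tcp://10.20.110.9:{port}": {
        "host": "10.20.110.9",
        "nthreads": 1,
        "memory_limit": 4208159232,
    }
    for port in (33646, 34231, 34565, 37555, 38215, 38225, 40114, 44958)
}
summary = summarize_workers(sample)
```

On a live cluster the same function could be applied directly to `client.scheduler_info()["workers"]`.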

## Distributed operations
We can initialize a group of workers for performing a task using the function
```python
client.scatter(data, workers=None, broadcast=False)
```
where `data` is a list whose elements are distributed among the workers in a round-robin fashion. The optional `workers` argument restricts the distribution to a subset of the available workers, while `broadcast=True` would instead copy every element to all of the workers.

The elements of the list should contain the information that each worker needs to proceed.

Here is a dummy example.


In [99]:
dummy = range(len(workers))
group = client.scatter(dummy)
group

[<Future: status: finished, type: int, key: int-5c8a950061aa331153f4a172bbcbfd1b>,
 <Future: status: finished, type: int, key: int-c0a8a20f903a4915b94db8de3ea63195>,
 <Future: status: finished, type: int, key: int-58e78e1b34eb49a68c65b54815d1b158>,
 <Future: status: finished, type: int, key: int-d3395e15f605bc35ab1bac6341a285e2>,
 <Future: status: finished, type: int, key: int-5cd9541ea58b401f115b751e79eabbff>,
 <Future: status: finished, type: int, key: int-ce9a05dd6ec76c6a6d171b0c055f3127>,
 <Future: status: finished, type: int, key: int-7ec5d3339274cee5cb507a4e4d28e791>,
 <Future: status: finished, type: int, key: int-06e5a71c9839bd98760be56f629b24cc>]

In [100]:
[g.result() for g in group]

[0, 1, 2, 3, 4, 5, 6, 7]
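
The round-robin distribution performed by `scatter` can be pictured with a few lines of plain Python. This is only an illustration of the idea: the worker labels `w0`, `w1`, `w2` and the helper `assign_round_robin` are hypothetical, and the order in which dask actually visits the workers is an implementation detail.

```python
from itertools import cycle


def assign_round_robin(items, workers):
    """Map each worker to the items it would receive in a round-robin pass."""
    assignment = {w: [] for w in workers}
    # cycle() restarts from the first worker once the last one has been served.
    for item, worker in zip(items, cycle(workers)):
        assignment[worker].append(item)
    return assignment


# Five items over three workers: the first two workers receive two items each.
assignment = assign_round_robin(range(5), ["w0", "w1", "w2"])
```

With as many items as workers, as in the dummy example above, each worker receives exactly one element.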

To check that the elements are actually distributed, we get the MPI rank of the process that holds each of them.

In [102]:
def get_rank(*args,comm=None):
    if comm is None:
        from mpi4py.MPI import COMM_WORLD as comm
    return comm.rank

ranks = client.map(get_rank, group)
ranks

[<Future: status: pending, key: get_rank-365664a3a51879f4972dfa0f25d10d4d>,
 <Future: status: pending, key: get_rank-8cf27643ff4fe6d33fab32af28ca3028>,
 <Future: status: pending, key: get_rank-cfedff0fe2e07c540be32d6a9f432aa5>,
 <Future: status: pending, key: get_rank-ebefe6f7e3178184c9c891ef1d5944b1>,
 <Future: status: pending, key: get_rank-6097bbfd21e515f4843d28ea4fe3a954>,
 <Future: status: pending, key: get_rank-323e5de00ac65950eacef2420da10580>,
 <Future: status: pending, key: get_rank-e83d33807ee8a903f9e068d55d6acab3>,
 <Future: status: pending, key: get_rank-e657977a6fd8541160567ecc93590a39>]

In [103]:
ranks = [rank.result() for rank in ranks]
ranks

[5, 8, 2, 7, 4, 1, 6, 3]

We note that `rank = 0` is not in the list, because the scheduler, not a worker, is running on it.

Thus any MPI operation needs to be run on a communicator involving only the workers and not the scheduler.

In [104]:
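def create_comm(*args, ranks=None, comm=None):
    # ranks: world ranks of the workers that will form the new communicator
    assert ranks
    if comm is None:
        from mpi4py.MPI import COMM_WORLD as comm
    # Collective over the listed ranks only, so the scheduler (rank 0) is left out
    return comm.Create_group(comm.group.Incl(ranks))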

comms = client.map(create_comm, group, workers=workers, ranks=ranks, actor=True)
comms
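
In MPI, a group built with `Incl(ranks)` renumbers its members by their position in `ranks`: the process listed first becomes rank 0 of the new group, the second becomes rank 1, and so on. `Create_group` must then be called by every process of that group, which is why the task is mapped over all the workers. The sketch below mimics the renumbering in plain Python for a small made-up rank list; the helper `incl_rank_map` is hypothetical and does not call mpi4py.

```python
def incl_rank_map(world_ranks):
    """Map world rank -> rank inside the group built by Incl(world_ranks)."""
    if len(set(world_ranks)) != len(world_ranks):
        raise ValueError("ranks must be distinct")
    # Position in the list becomes the rank in the new group.
    return {world: new for new, world in enumerate(world_ranks)}


# Illustrative values only: three workers on world ranks 3, 1 and 2,
# with the scheduler on world rank 0 left out of the group.
rank_map = incl_rank_map([3, 1, 2])
```

Here world rank 3 becomes rank 0 of the new communicator, and world rank 0 has no entry because it is not part of the group.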

In [107]:
comms = [comm.result() for comm in comms]
comms

[<Actor: Intracomm, key=create_comm-0bf3fd85-3aba-4187-9073-90bbd20c13b5-0>,
 <Actor: Intracomm, key=create_comm-0bf3fd85-3aba-4187-9073-90bbd20c13b5-1>,
 <Actor: Intracomm, key=create_comm-0bf3fd85-3aba-4187-9073-90bbd20c13b5-2>,
 <Actor: Intracomm, key=create_comm-0bf3fd85-3aba-4187-9073-90bbd20c13b5-3>,
 <Actor: Intracomm, key=create_comm-0bf3fd85-3aba-4187-9073-90bbd20c13b5-4>,
 <Actor: Intracomm, key=create_comm-0bf3fd85-3aba-4187-9073-90bbd20c13b5-5>,
 <Actor: Intracomm, key=create_comm-0bf3fd85-3aba-4187-9073-90bbd20c13b5-6>,
 <Actor: Intracomm, key=create_comm-0bf3fd85-3aba-4187-9073-90bbd20c13b5-7>]

In [109]:
[comm.rank for comm in comms]

[0, 1, 3, 2, 4, 5, 6, 7]

In [120]:
reductions = [comm.allreduce(1) for comm in comms]
[r.result() for r in reductions]

[8, 8, 8, 8, 8, 8, 8, 8]