# Ray Crash Course - Ray Multiprocessing

This lesson explores how to swap two popular multiprocessing libraries for Ray's drop-in equivalents so your application can break the one-machine boundary:

* [`multiprocessing.Pool`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) for general management of process pools.
* [`joblib`](https://joblib.readthedocs.io/en/latest/), the library that [scikit-learn](https://scikit-learn.org/stable/) uses for parallel execution, which Ray can scale to a cluster.

We also examine how Ray can work with Python's [`asyncio`](https://docs.python.org/3/library/asyncio.html).

> **Tip:** For more about Ray, see [ray.io](https://ray.io) or the [Ray documentation](https://docs.ray.io/en/latest/).

```python
import ray, time, sys, os
import numpy as np
```

```python
!../tools/start-ray.sh
```

```
Ray already running or successfully started
```


```python
ray.init(address='auto', ignore_reinit_error=True)
```

```
{'node_ip_address': '192.168.1.149',
 'raylet_ip_address': '192.168.1.149',
 'redis_address': '192.168.1.149:45926',
 'object_store_address': '/tmp/ray/session_2020-05-21_07-14-58_032308_38947/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-05-21_07-14-58_032308_38947/sockets/raylet',
 'webui_url': 'localhost:8265',
 'session_dir': '/tmp/ray/session_2020-05-21_07-14-58_032308_38947'}
```

## Drop-in Replacements for Popular Single-node, Multiprocessing Libraries

The Python community has three popular libraries for working around the limits of Python's _global interpreter lock_ (GIL) to enable better multiprocessing and concurrency. Ray offers drop-in replacements for two of them, [`multiprocessing.Pool`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) and [`joblib`](https://joblib.readthedocs.io/en/latest/), and integrates with the third, Python's [`asyncio`](https://docs.python.org/3/library/asyncio.html).

This section explores the `multiprocessing.Pool` and `joblib` replacements.

| Library | Library Docs | Ray Docs | Description |
| :------ | :----------- | :------- | :---------- |
| `multiprocessing.Pool` | [docs](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool) | [Ray](https://docs.ray.io/en/latest/multiprocessing.html) | Create a pool of processes for running work. The Ray replacement allows scaling to a cluster. |
| `joblib` | [docs](https://joblib.readthedocs.io/en/latest/) | [Ray](https://docs.ray.io/en/latest/joblib.html) | Ray supports running distributed [scikit-learn](https://scikit-learn.org/stable/) programs by implementing a Ray backend for `joblib` using Ray Actors instead of local processes. This makes it easy to scale existing applications that use scikit-learn from a single node to a cluster. |


### Multiprocessing.Pool

If your application already uses `multiprocessing.Pool`, then scaling beyond a single node just requires changing your import statement from this:

```python
from multiprocessing.pool import Pool
```

To this:

```python
from ray.util.multiprocessing.pool import Pool
```

You also have to call `ray.init(...)`, as this lesson does above, so that the pool runs on your Ray cluster.
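To make the import swap concrete, here is a minimal sketch, not a definitive implementation. The names `square`, `parallel_squares`, and `ray_pool_class` are hypothetical helpers introduced only for this illustration. The sketch assumes that Ray's `Pool` accepts the same `processes` argument and offers the same `map`, `close`, and `join` methods as the standard-library class, which is what makes it a drop-in replacement.

```python
from multiprocessing.pool import ThreadPool  # same API, handy for a quick local check


def square(x):
    """Toy task that stands in for real work (hypothetical example function)."""
    return x * x


def parallel_squares(pool_cls, values, processes=2):
    """Map square() over values with any class that follows the multiprocessing.Pool API."""
    pool = pool_cls(processes=processes)
    try:
        return pool.map(square, values)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the workers to finish


def ray_pool_class():
    """Return Ray's Pool if Ray is installed, else the standard-library Pool."""
    try:
        from ray.util.multiprocessing.pool import Pool
    except ImportError:
        from multiprocessing.pool import Pool
    return Pool


# Quick local check with threads; no Ray cluster needed for this line.
print(parallel_squares(ThreadPool, range(5)))  # [0, 1, 4, 9, 16]

# With Ray started and ray.init(...) called, the same function runs on the cluster:
# parallel_squares(ray_pool_class(), range(5))
```

Because the calling code only depends on the shared `Pool` interface, the choice of pool class is the single line that changes when you move from one machine to a cluster.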

For `asyncio`, see the [Python docs](https://docs.python.org/3/library/asyncio.html) and the [Ray docs](https://docs.ray.io/en/latest/async_api.html).
