# A Guided Tour of Ray Core: Multiprocessing Pool

© 2019-2022, Anyscale. All Rights Reserved

📖 [Back to Table of Contents](./ex_00_tutorial_overview.ipynb)<br>
➡ [Next notebook](./ex_06_ray_api_calls.ipynb) <br>
⬅️ [Previous notebook](./ex_04_remote_classes_revisited.ipynb) <br>

### Learning objectives
In this tutorial, you will learn about:

 * Ray's distributed replacement for Python's standard `multiprocessing.Pool` library
 * Scaling CPU bound tasks using different strategies:
   * A serial approach for a CPU bound task
   * A multi-threaded approach for a CPU bound task
   * A multiprocess approach for a CPU bound task
   * A Ray distributed multiprocess approach for a CPU bound task

[*Distributed multiprocessing.Pool*](https://docs.ray.io/en/latest/multiprocessing.html) makes it easy to scale existing Python applications that use [`multiprocessing.Pool`](https://docs.python.org/3/library/multiprocessing.html) by leveraging *Ray Actors*. Ray runs distributed Python programs written against the **multiprocessing.Pool** API on a pool of Ray Actors, each running on a [worker node](https://docs.ray.io/en/latest/ray-core/actors.html#faq-actors-workers-and-resources), instead of on local processes. An application that uses the regular `multiprocessing.Pool` can therefore scale from a single node to a cluster.
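
As a minimal sketch of what "drop-in" means here: code written against the `Pool.map` interface does not care which pool it receives. The example below uses the standard library's `multiprocessing.pool.ThreadPool` as a stand-in so that it runs without a cluster; `square` and `sum_of_squares` are hypothetical helper names, not part of Ray. Passing a `ray.util.multiprocessing.Pool` instead should work the same way, assuming Ray is installed and initialized.

```python
from multiprocessing.pool import ThreadPool  # stdlib stand-in that exposes the Pool API


def square(x):
    return x * x


def sum_of_squares(pool, values):
    # Relies only on pool.map, so any Pool-compatible object can be passed in
    return sum(pool.map(square, values))


with ThreadPool(4) as pool:
    total = sum_of_squares(pool, range(10))

print(total)  # 285
```

Keeping the pool as a parameter, rather than constructing it inside the function, is what makes the later switch to a distributed pool a one-line change.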

<img src="images/dist_multi_pool.png" width="80%" height="55%">

First, let's have a go...

In [2]:
import numpy as np
from pprint import pprint
import time
import logging
import ray
from ray.util.multiprocessing import Pool

## Multiprocessing Pool example

The following is a simple Python function with a slight delay added (to make it behave like a more complex calculation)...

In [5]:
# This could be some complicated and compute intensive task
def func(x):
    time.sleep(1.5)
    return x ** 2

In [6]:
# Return 1 if n is prime, else 0, using trial division up to sqrt(n)
# Note: n < 2 is not special-cased, so 0 and 1 also return 1
def is_prime(n):
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            return 0
    return 1
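
For reference, a serial baseline for the prime count can be sketched as follows. This is an illustration only: `is_prime_strict` and `count_primes_serial` are hypothetical names, and the variant returns 0 for `n < 2` because 0 and 1 are not prime. The limit of 100,000 is an arbitrary choice to keep the run short.

```python
import time


def is_prime_strict(n):
    # 0 and 1 are not prime, so reject them before trial division
    if n < 2:
        return 0
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            return 0
    return 1


def count_primes_serial(limit):
    # Check every integer below limit, one after another, in a single process
    return sum(is_prime_strict(n) for n in range(limit))


start = time.perf_counter()
count = count_primes_serial(100_000)
elapsed = time.perf_counter() - start
print(f"{count} primes below 100000 in {elapsed:.3f}s")
```

Timing the same function with a pool gives a like-for-like comparison against this single-process number.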

Let's start Ray

In [3]:
if ray.is_initialized():
    ray.shutdown()
ray.init(logging_level=logging.ERROR)

Python version: 3.8.13
Ray version: 3.0.0.dev0
Dashboard: http://127.0.0.1:8269


Now we'll create a *Pool* of Actors and distribute its tasks across a cluster (or across the available cores on a laptop). Ray's drop-in replacement for [multiprocessing pool](https://docs.ray.io/en/latest/multiprocessing.html) uses Ray Actors to distribute the tasks.

In [10]:
# Create a Pool of Actors
pool = Pool()

for result in pool.map(func, range(12)):
    time.sleep(2)      # pause so you can check the dashboard; you should see one Actor per CPU core
    print(result)

0
1
4
9
16
25
36
49
64
81
100
121


In [11]:
pool.terminate()

**NOTE**: The distributed version trades a higher initial overhead for the ability to scale out horizontally across a cluster. The benefit is more pronounced for computationally expensive work, because the startup cost is amortized over the longer-running, compute-intensive operations.
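
The trade-off in the note above can be illustrated with a toy cost model. This is a sketch under stated assumptions, not a measurement: it assumes a fixed startup overhead (a hypothetical 2 seconds), four workers, and ideal parallel speedup. The 12 items and 1.5 seconds per item mirror the `func` example; the 0.01-second case is made up for contrast.

```python
import math


def serial_time(n_items, secs_per_item):
    # One process handles every item in turn
    return n_items * secs_per_item


def pooled_time(n_items, secs_per_item, workers, startup_overhead):
    # Items are processed in rounds of `workers` at a time, after a fixed startup cost
    rounds = math.ceil(n_items / workers)
    return startup_overhead + rounds * secs_per_item


OVERHEAD = 2.0  # assumed startup cost in seconds (hypothetical)
WORKERS = 4     # assumed pool size (hypothetical)

for secs_per_item in (0.01, 1.5):
    s = serial_time(12, secs_per_item)
    p = pooled_time(12, secs_per_item, WORKERS, OVERHEAD)
    print(f"{secs_per_item}s per item: serial={s:.2f}s pooled={p:.2f}s")
```

Under these assumptions the pool loses on the cheap items (0.12 s serial against 2.03 s pooled) and wins on the expensive ones (18.0 s serial against 6.5 s pooled).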

In [13]:
num = 2000000 
lst = list(range(num))
results = []
pool = Pool(5) # create 5 Actors; with no argument, Pool() defaults to the number of CPU cores available

In [14]:
%%time
for result in pool.map(is_prime, lst):
    results.append(result)
print(f"Total number of primes in {num} are {sum(results)}")


Total number of primes in 2000000 are 148935
CPU times: user 3.88 s, sys: 143 ms, total: 4.02 s
Wall time: 4.37 s


In [15]:
pool.terminate()

Let's define a compute-intensive function that does some matrix
computation. Think of it as a compute-intensive task
performing a massive tensor transformation or computation.

**NOTE**: This will be your exercise.

In [16]:
def task(n):
    # Simulate a long intensive task
    #TODO
    
    # do some matrix computation 
    # and return results
    return
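
One possible way to fill in `task()` is sketched below. It is only an illustration, not the intended solution: the matrix size, the seeded random generator, and the choice to return the trace of the product are all assumptions made to keep the example small and repeatable.

```python
import numpy as np


def task(n):
    # Seed the generator with n so the same input always yields the same result
    rng = np.random.default_rng(n)
    size = 200  # assumed matrix dimension; increase it to make the task heavier
    a = rng.random((size, size))
    b = rng.random((size, size))
    # Dense matrix multiplication is the compute-intensive step
    product = a @ b
    # Return a single float so results are cheap to send back from workers
    return float(np.trace(product))


print(task(3))
```

Returning a scalar rather than the full matrix keeps the data transferred between workers and the caller small.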

Define a Ray remote task that launches `task()` across a pool of Actors on the cluster. It creates
a pool of Ray Actors, each scheduled on a cluster worker. On a single node or localhost, that is one Actor per CPU.

In [17]:
# A long-running task doing the work, collecting data, updating the database
# It creates a pool of num_pool Actors
@ray.remote
def launch_long_running_tasks(num_pool):
    pool = Pool(num_pool) # num_pool of Actors
    results = []
    # Map func over 5 inputs: range(1, 51, 10) yields 1, 11, 21, 31, 41
    # TODO: replace func with task() here for the exercise
    for result in pool.map(func, range(1, 51, 10)):
        results.append(result)
        
    # Done so terminate pool
    pool.terminate()
    
    return results

### Create an Actor that acts as a supervisor and launches these remote tasks

In [18]:
@ray.remote
class LaunchDistributedTasks:
    def __init__(self, limit=5):
        self._limit = limit

    def launch(self):
        # launch the remote task
        return launch_long_running_tasks.remote(self._limit)

### Launch our supervisor

In [19]:
launcher = LaunchDistributedTasks.remote()
print("Launched remote jobs")

Launched remote jobs


In [20]:
values = ray.get(ray.get(launcher.launch.remote()))
print(f" list of results :{values}")
print(f" Total results: {len(values)}")

 list of results :[1, 121, 441, 961, 1681]
 Total results: 5
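
The nested `ray.get(ray.get(...))` is needed because `launcher.launch.remote()` returns a reference whose value is itself a reference (the one returned by `launch_long_running_tasks.remote(...)`). A rough standard-library analogy, not Ray code, is a future whose result is another future. In the sketch below, `square`, `long_running_task`, and `launch` are hypothetical stand-ins for the notebook's functions.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def square(x):
    return x ** 2


def long_running_task():
    # Stand-in for the pooled work: square 1, 11, 21, 31, 41
    return [square(x) for x in range(1, 51, 10)]


def launch():
    # Returns a future rather than a value, like a method that returns an object reference
    return executor.submit(long_running_task)


outer = executor.submit(launch)  # a future whose result is another future
inner = outer.result()           # first "get" unwraps the outer future
values = inner.result()          # second "get" yields the actual list
print(values)  # [1, 121, 441, 961, 1681]

executor.shutdown()
```

Each call to `.result()` plays the role of one `ray.get`, which is why two are needed to reach the list.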


In [21]:
ray.shutdown()

### Exercises

1. Can you convert `task()` into a compute-intensive function?
2. Use `task()` in `pool.map(task,....)`
3. (Optional) Explore the CPU bound tasks using different strategies in this [notebook](extra/mp_all_nb.ipynb) in the `extra` directory

### Homework

1. Write a Python `multiprocessing.Pool` version of `task()`, with a large dataset, and compare the timings with 
the Ray distributed `multiprocessing.Pool`. 
2. Do you see a difference in timings?

### Next step

Let's take a tour of the [Ray APIs](ex_06_ray_api_calls.ipynb).

📖 [Back to Table of Contents](./ex_00_tutorial_overview.ipynb)<br>
➡ [Next notebook](./ex_06_ray_api_calls.ipynb) <br>
⬅️ [Previous notebook](./ex_04_remote_classes_revisited.ipynb) <br>