# A Guided Tour of Ray Core: Multiprocessing Pool

© 2019-2022, Anyscale. All Rights Reserved

[*Distributed multiprocessing.Pool*](https://docs.ray.io/en/latest/multiprocessing.html) makes it easy to scale existing Python applications that use [`multiprocessing.Pool`](https://docs.python.org/3/library/multiprocessing.html) from a single node to a cluster. Ray implements the `multiprocessing.Pool` API with *Ray Actors*, each running on a [worker node](https://docs.ray.io/en/latest/ray-core/actors.html#faq-actors-workers-and-resources), instead of local processes.

<img src="images/dist_multi_pool.png" width="80%" height="55%">



First, let's import the libraries we need; Ray itself is started a few cells below.

In [14]:
import numpy as np
from pprint import pprint
import time
import logging
import ray
from ray.util.multiprocessing import Pool

## Multiprocessing Pool example

The following is a simple Python function with a slight delay added (to make it behave like a more complex calculation)...

In [15]:
# This could be some complicated and compute intensive task
def func(x):
    time.sleep(1.5)
    return x ** 2

In [16]:
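def is_prime(n):
    # Trial division: try every candidate divisor from 2 up to sqrt(n)
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            return 0  # found a divisor, so n is not prime
    return 1  # no divisor found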
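Note that `is_prime` above reports 0 and 1 as prime, because the loop body never runs for `n < 2`. A minimal sketch of a stricter variant follows; the name `is_prime_strict` is hypothetical and not part of the notebook, and the outputs recorded in this notebook come from the original `is_prime`.

```python
def is_prime_strict(n):
    # Hypothetical variant of is_prime: reject 0, 1, and negatives up front
    if n < 2:
        return 0
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            return 0
    return 1

# Count the primes below 100
print(sum(is_prime_strict(n) for n in range(100)))  # 25
```

Swapping this variant into the prime-counting cell would lower the reported count by two.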

Next, start Ray so that we can use its drop-in replacement for [`multiprocessing.Pool`](https://docs.ray.io/en/latest/multiprocessing.html):

In [17]:
if ray.is_initialized():
    ray.shutdown()
context = ray.init(logging_level=logging.ERROR)
pprint(context)

RayContext(dashboard_url='127.0.0.1:8265', python_version='3.8.12', ray_version='1.12.0', ray_commit='f18fc31c7562990955556899090f8e8656b48d2d', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-05-20_09-37-51_403506_16545/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-05-20_09-37-51_403506_16545/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2022-05-20_09-37-51_403506_16545', 'metrics_export_port': 63960, 'gcs_address': '127.0.0.1:63855', 'address': '127.0.0.1:63855', 'node_id': 'cb62157f82b601209b5790d701b56a3067dd1e298c216d6bf20bf7f9'})


In [18]:
print(f"Dashboard url: http://{context.address_info['webui_url']}")

Dashboard url: http://127.0.0.1:8265


Now we'll create a *Pool* and distribute its tasks across a cluster (or across the available cores on a laptop):

In [26]:
%%time

pool = Pool()

for result in pool.map(func, range(10)):
    print(result)

0
1
4
9
16
25
36
49
64
81
CPU times: user 48.8 ms, sys: 28.3 ms, total: 77.2 ms
Wall time: 3.43 s


In [27]:
pool.terminate()

The distributed version trades some added overhead for the ability to scale out horizontally across a cluster. The benefits are more pronounced when each task is computationally expensive.
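One way to see that overhead is to time the same `map` call serially and through a pool. The sketch below is an illustration under stated assumptions, not part of the notebook: `timed_map` and `slow_square` are hypothetical helpers, and the standard library's thread-backed `multiprocessing.pool.ThreadPool` stands in for the pool so that the example runs without a cluster. Because `ray.util.multiprocessing.Pool` is used in this notebook with the same `Pool(n)`, `map`, and `terminate` calls, it should be possible to pass it as `pool_cls` once Ray has been started.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(x):
    # Hypothetical stand-in for func(), with a shorter delay
    time.sleep(0.05)
    return x ** 2


def timed_map(pool_cls, fn, items, workers):
    # Time pool creation plus the map call, so start-up overhead is included
    start = time.perf_counter()
    pool = pool_cls(workers)
    try:
        results = list(pool.map(fn, items))
    finally:
        pool.terminate()
    return results, time.perf_counter() - start


# Serial baseline
start = time.perf_counter()
serial_results = [slow_square(x) for x in range(10)]
serial_elapsed = time.perf_counter() - start

# Same work through a pool of 5 workers
pool_results, pool_elapsed = timed_map(ThreadPool, slow_square, range(10), workers=5)

print(f"serial: {serial_elapsed:.2f}s, pool: {pool_elapsed:.2f}s")
```

Timing pool creation together with the map call is a deliberate choice here: start-up cost is part of the overhead being discussed.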

In [29]:
num = 2000000
lst = list(range(num))
results = []
pool = Pool()

In [30]:
%%time
for result in pool.map(is_prime, lst):
    results.append(result)
print(f"Total number of primes in {num} are {sum(results)}")


Total number of primes in 2000000 are 148935
CPU times: user 7.41 s, sys: 147 ms, total: 7.56 s
Wall time: 7.55 s


In [31]:
pool.terminate()

Let's define a compute-intensive function that does some matrix
computation. Think of it as a stand-in for a compute-intensive task
that performs a massive tensor transformation or computation.

**NOTE**: This will be your exercise.

In [32]:
def task(n):
    # Simulate a long intensive task
    #TODO
    
    # do some matrix computation 
    # and return results
    return

Define a Ray remote task that creates a pool of `Ray Actors`, each scheduled on a cluster worker, and maps a function
across it. The code below maps `func()`; once you have implemented `task()`, you can swap it in.

In [33]:
@ray.remote
def launch_long_running_tasks(num_pool):
    # Doing the work, collecting data, updating the database
    # Create a pool of num_pool Ray Actor workers
    pool = Pool(num_pool)
    results = []
    # Map func over 5 inputs: 1, 11, 21, 31, 41
    for result in pool.map(func, range(1, 50, 10)):
        results.append(result)
        
    # Done so terminate pool
    pool.terminate()
    
    return results

### Create an Actor-like supervisor that launches all these remote tasks

In [34]:
@ray.remote
class LaunchDistributedTasks:
    def __init__(self, limit=5):
        self._limit = limit

    def launch(self):
        # launch the remote task
        return launch_long_running_tasks.remote(self._limit)

### Launch our supervisor

In [35]:
hdl = LaunchDistributedTasks.remote(5)
print("Launched remote jobs")

Launched remote jobs


In [36]:
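# launch() returns the task's ObjectRef, so the actor call yields a reference
# to a reference: the inner ray.get unwraps the first, the outer the second
values = ray.get(ray.get(hdl.launch.remote()))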
print(f" list of results :{values}")
print(f" Total results: {len(values)}")

 list of results :[1, 121, 441, 961, 1681]
 Total results: 5
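The five results follow directly from the inputs: `range(1, 50, 10)` yields 1, 11, 21, 31, and 41, and `func` squares each one. A quick local check, with the 1.5-second sleep left out and no Ray involved, reproduces the list:

```python
# Inputs passed to pool.map in launch_long_running_tasks
inputs = list(range(1, 50, 10))

# func(x) returns x ** 2 (the sleep is omitted here)
squares = [x ** 2 for x in inputs]

print(inputs)   # [1, 11, 21, 31, 41]
print(squares)  # [1, 121, 441, 961, 1681]
```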


In [37]:
ray.shutdown()

### Exercises

1. Can you turn `task()` into a compute-intensive function?
2. Use `task()` in `pool.map(task, ...)`.

### Homework

1. Write a version of `task()` that uses Python's `multiprocessing.Pool` and compare its timings with
those of Ray's distributed `multiprocessing.Pool`.
2. Do you see a difference in timings?