<img src="images/ray_basic_patterns.png" height="25%" width="50%">

In [1]:
import os
import time
import logging

import numpy as np
from numpy import loadtxt
import ray

## 1. Tasks Parallel Pattern

Ray converts decorated functions into stateless tasks, scheduled
anywhere onto a ray worker in the cluster by simply adding the `@ray.remote` decorator

All these functions are converted into Ray stateless tasks that will be executed on some worker process in a Ray cluster.
Where they will be placed, you don't have to worry about its details. All that is taken care for you. Nor do you have to
reason about it— all that burder is Ray's job.

You simply take your existing Python functions and covert them into *Ray Tasks*: as simple as that!

### Example 1: Adding two np arrays

Define a function as a Ray task to read an array

In [2]:
@ray.remote
def read_array(fn: str) -> np.array:
    arr = loadtxt(fn, comments="#", delimiter=",", unpack=False)
    return arr.astype('int')

Define a function as a Ray task to add two np arrays return the sum

In [3]:
@ray.remote
def add_array(arr1: np.array, arr2: np.array) -> np.array:
    return np.add(arr1, arr1)

Define a function as a Ray task to the contents of an np array

In [4]:
@ray.remote
def sum_array(arr1: np.array) -> int:
    return np.sum(arr1)

Now let's execute them our tasks. But first, we have to initialize Ray. For now
we will run Ray locally on our laptop.

Ray executes immediately and returns an object reference as a futures. This enables Ray to parallelize tasks and execute them
asynchronously.

But first let's initialize ray.

In [5]:
ray.init(
    ignore_reinit_error=True,
    logging_level=logging.ERROR,
)

{'node_ip_address': '127.0.0.1',
 'raylet_ip_address': '127.0.0.1',
 'redis_address': '127.0.0.1:64516',
 'object_store_address': '/tmp/ray/session_2021-12-28_16-51-45_848187_74720/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2021-12-28_16-51-45_848187_74720/sockets/raylet',
 'webui_url': '127.0.0.1:8266',
 'session_dir': '/tmp/ray/session_2021-12-28_16-51-45_848187_74720',
 'metrics_export_port': 65417,
 'node_id': 'd1b5664c1245494c14c99a8b3f673ea6ea793085346fa07dd2a2b92c'}

#### Read both arrays. 

In [6]:
obj_ref_arr1 = read_array.remote("data/file_1.txt")
print(f"array 1: {obj_ref_arr1}")

array 1: ObjectRef(a67dc375e60ddd1affffffffffffffffffffffff0100000001000000)


In [7]:
obj_ref_arr2 = read_array.remote("data/file_2.txt")
print(f"array 2: {obj_ref_arr2}")

array 2: ObjectRef(63964fa4841d4a2effffffffffffffffffffffff0100000001000000)


Let's add our two arrays by calling the remote method. *Note*: We are sending RayObject references as arguments.
Those will be resolved inline and fetched from owner or the note that owns them from its global object store. 

Ray scheduler is aware of where these object references reside or who owns them, so it will schedule this remote
task on node on the worker process for data locality.

In [8]:
result_obj_ref = add_array.remote(obj_ref_arr1, obj_ref_arr2)

Fetch the result: this will block if not finished

In [9]:
result = ray.get(result_obj_ref)
print(f"Result: add arr1 + arr2: \n {result}")

Result: add arr1 + arr2: 
 [[  0  96 144 150 108 178 168 136  18  76]
 [  6  80 146 116  20  70 192  12 130  66]
 [110 134  24 194 104 146  14 152  78 100]
 [118  68  40  80 184 110  22  78 186  76]
 [178 178  74 104  96 172  98   6  38 100]
 [168  74 136  22  40  72  92 122 104 154]
 [140 180 112 110  98 152 188  56  64  46]
 [ 10  88 184  30 106 126 174 150 122  50]
 [102 116  58  60 186 188 104 144 160  54]
 [  2  56 164  70 178  72  20 168 170 130]]


In [10]:
# Add the array elements and get the sum
sum_1 = ray.get(sum_array.remote(obj_ref_arr1))
sum_2 = ray.get(sum_array.remote(obj_ref_arr2))

In [11]:
print(f'Sum of arr1: {sum_1}')
print(f'Sum of arr2: {sum_2}')

Sum of arr1: 5173
Sum of arr2: 7719


### Example 2: Generating Fibonnaci series

Let's define two functions: one runs locally or serially, the other runs on a Ray cluster (local or remote). This example is borrowed and refactored from our blog: [Writing your First Distributed Python Application with Ray](https://www.anyscale.com/blog/writing-your-first-distributed-python-application-with-ray). 

Another similar blog of interest is how to compute the value of **pi**: [How to scale Python multiprocessing to a cluster with one line of code](https://medium.com/distributed-computing-with-ray/how-to-scale-python-multiprocessing-to-a-cluster-with-one-line-of-code-d19f242f60ff).

In [12]:
# Local execution 
def generate_fibonacci(sequence_size):
    fibonacci = []
    for i in range(0, sequence_size):
        if i < 2:
            fibonacci.append(i)
            continue
        fibonacci.append(fibonacci[i-1]+fibonacci[i-2])
    return len(fibonacci)

In [13]:
# Remote Task with just a wrapper
@ray.remote
def generate_fibonacci_distributed(sequence_size):
    return generate_fibonacci(sequence_size)

In [14]:
# Get the number of cores 
os.cpu_count()

12

In [15]:
# Normal Python in a single process 
def run_local(sequence_size):
    results = [generate_fibonacci(sequence_size) for _ in range(os.cpu_count())]
    return results

In [16]:
%%time
run_local(100000)

CPU times: user 2.94 s, sys: 1.95 s, total: 4.89 s
Wall time: 4.9 s


[100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000]

In [17]:
# Distributed on a Ray cluster
def run_remote(sequence_size):
    results = ray.get([generate_fibonacci_distributed.remote(sequence_size) for _ in range(os.cpu_count())])
    return results

In [18]:
%%time
run_remote(100000)

CPU times: user 58.6 ms, sys: 28.3 ms, total: 86.9 ms
Wall time: 1.95 s


[100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000]

In [19]:
# Normally will want to shutdown
ray.shutdown()

---

### References

1. [Modern Parallel and Distributed Python: A Quick Tutorial on Ray](https://towardsdatascience.com/modern-parallel-and-distributed-python-a-quick-tutorial-on-ray-99f8d70369b8) by Robert Nishihara, co-creator of Ray and co-founder Anyscale