# A Guided Tour of Ray Core: Remote Tasks

© 2019-2022, Anyscale. All Rights Reserved

### Learning objectives
In this this tutorial, we revist Ray task and learn about:
 * Remote Task Parallel Pattern
 * Stateless remote functions as distributed tasks
 * Serial vs Parallel execution 
 * Understand the concept of a Ray task 
 * Easy API to convert an existing Python function into a Ray remote task
 
<img src="images/py_2_ray.png" height="35%" width="60%">

In [7]:
import os
import time
import logging

import numpy as np
from numpy import loadtxt
import ray

## 1. Tasks Parallel Pattern

Ray converts decorated functions with `@ray.remote` into stateless tasks, scheduled anywhere on a Ray worker in the cluster. 

Where they will be executed (and by whom), you don't have to worry about its details. All that is taken care for you. Nor do 
you have to reason about it — all that burden is Ray's job. You simply take your existing Python functions and covert them into 
distributed stateless *Ray Tasks*: **as simple as that!**

### Example 1: Serial vs Parallelism

Let's look at simple tasks running serially and then in parallel. For illustration, we'll use a simple task, but this could be a compute-intensive task as part of your workload.


There are a few key differences between the original function and the decorated one:

**Invocation**: The regular version is called with `regular_function()`, whereas the remote version is called with `remote_function.remote()`. Keep this pattern in mind for all Ray remote execution methods.

**Return values**: `regular_function` executes synchronously and returns the result of the function as the value `1`, whereas `remote_function` immediately returns an `ObjectID` (a future) and then executes the task in the background on a remote worker process. The result of the future can be obtained by calling `ray.get` on the `ObjectID`. This is a blocking function.

In [8]:
# A regular Python function.
def regular_function():
    time.sleep(2)
    return 1

In [9]:
# A Ray remote function.
@ray.remote
def remote_function():
    time.sleep(2)
    return 1

Let's launch a Ray cluster on our local machine.# Let's invoke the regular function
regular_function()

In [10]:
if ray.is_initialized:
    ray.shutdown()
ray.init(logging_level=logging.ERROR)

0,1
Python version:,3.8.13
Ray version:,3.0.0.dev0
Dashboard:,http://127.0.0.1:8265


In [154]:
# Let's invoke the regular function
assert regular_function() == 1

In [155]:
# Let's invoke the remote regular function.
remote_function.remote()

ObjectRef(b6e3749adc4f7e7cffffffffffffffffffffffff0100000001000000)

In [156]:
assert ray.get(remote_function.remote()) == 1

**No Parallelism**: Invocations of `regular_function` in a comprehension loop happens `serially`:

In [157]:
# These are executed one at a time, back-to-back, in a list comprehension
results = [regular_function() for _ in range(10)]
assert sum(results) == 10

**Parallelism**: Invocations of `remote_function` in a loop happen `asynchronously` and in parallel:

In [158]:
# Executing these functions, in comprehension list, happens at the same time in the background, 
# and we get the results using ray.get.
results = [remote_function.remote() for _ in range(10)]
assert sum(ray.get(results)) == 10

### Example 2: Adding two np arrays

<img src="images/task_api_add_array.png" width="60%" height="40%">

Define a function as a Ray task to read an array

In [159]:
@ray.remote
def read_array(fn: str) -> np.array:
    arr = loadtxt(fn, comments="#", delimiter=",", unpack=False)
    return arr.astype('int')

Define a function as a Ray task to add two np arrays return the sum

In [160]:
@ray.remote
def add_array(arr1: np.array, arr2: np.array) -> np.array:
    return np.add(arr1, arr1)

Define a function as a Ray task to sum the contents of an np array

In [161]:
@ray.remote
def sum_array(arr1: np.array) -> int:
    return np.sum(arr1)

Now let's execute our tasks. For now we will run Ray locally on our laptop or on a single node, with potential access to utilize all the available cores when necessary.

Ray executes immediately and returns an object reference `ObjectRef` as a future. This enables Ray to parallelize tasks and execute them asynchronously.

### Read both arrays. 

Use the `func_name.remote(args)` extention to invoke a remote Ray Task

In [162]:
obj_ref_arr1 = read_array.remote(os.path.abspath("data/file_1.txt"))
print(f"array 1: {obj_ref_arr1}")

array 1: ObjectRef(9fcb74a9409f5448ffffffffffffffffffffffff0100000001000000)


In [163]:
obj_ref_arr2 = read_array.remote(os.path.abspath("data/file_2.txt"))
print(f"array 2: {obj_ref_arr2}")

array 2: ObjectRef(02c8b33094824b7fffffffffffffffffffffffff0100000001000000)


### Add both arrays

Let's add our two arrays by calling the remote method. *Note*: We are sending Ray `ObjectRef` references as arguments. Those arguments will be resolved inline and fetched from owner's object store. That is, the cluster node that creates the `ObjecRef` owns the meta data associated and stores it in its object store. 

Ray scheduler is aware of where these object references reside or who owns them, so it will schedule this remote task on node on the worker process for data locality.

In [164]:
result_obj_ref = add_array.remote(obj_ref_arr1, obj_ref_arr2)
result_obj_ref

ObjectRef(0295e1e397f9e892ffffffffffffffffffffffff0100000001000000)

### Fetch the result 

This will block if not finished

In [165]:
result = ray.get(result_obj_ref)
print(f"Result: add arr1 + arr2: \n {result}")

Result: add arr1 + arr2: 
 [[  0  96 144 150 108 178 168 136  18  76]
 [  6  80 146 116  20  70 192  12 130  66]
 [110 134  24 194 104 146  14 152  78 100]
 [118  68  40  80 184 110  22  78 186  76]
 [178 178  74 104  96 172  98   6  38 100]
 [168  74 136  22  40  72  92 122 104 154]
 [140 180 112 110  98 152 188  56  64  46]
 [ 10  88 184  30 106 126 174 150 122  50]
 [102 116  58  60 186 188 104 144 160  54]
 [  2  56 164  70 178  72  20 168 170 130]]


Add the array elements within an `np.array` and get the sum. 
**Note** that we are sending `ObjRefs` as arguments to the function. Ray will resolve or fetch the value of these arrays. 

In [166]:
sum_1 = ray.get(sum_array.remote(obj_ref_arr1))
sum_2 = ray.get(sum_array.remote(obj_ref_arr2))

In [167]:
print(f'Sum of arr1: {sum_1}')
print(f'Sum of arr2: {sum_2}')

Sum of arr1: 5173
Sum of arr2: 7719


### Example 3: Generating Fibonnaci series

Let's define two functions: one runs locally or serially, the other runs on a Ray cluster (local or remote). This example is borrowed and refactored from our 
blog: [Writing your First Distributed Python Application with Ray](https://www.anyscale.com/blog/writing-your-first-distributed-python-application-with-ray). 
(This is an excellent tutorial to get started with the concept of why and when to use Ray tasks and Ray Actors. Highly recommended read!)

Another similar blog of interest is how to compute the value of **pi**: [How to scale Python multiprocessing to a cluster with one line of code](https://medium.com/distributed-computing-with-ray/how-to-scale-python-multiprocessing-to-a-cluster-with-one-line-of-code-d19f242f60ff).

In [168]:
# Function for local execution 
def generate_fibonacci(sequence_size):
    fibonacci = []
    for i in range(0, sequence_size):
        if i < 2:
            fibonacci.append(i)
            continue
        fibonacci.append(fibonacci[i-1]+fibonacci[i-2])
    return len(fibonacci)

In [169]:
# Function for remote Ray task with just a wrapper
@ray.remote
def generate_fibonacci_distributed(sequence_size):
    return generate_fibonacci(sequence_size)

In [170]:
# Get the number of cores 
os.cpu_count()

10

In [171]:
# Normal Python in a single process 
def run_local(sequence_size):
    results = [generate_fibonacci(sequence_size) for _ in range(os.cpu_count())]
    return results

In [172]:
%%time
run_local(100000)

CPU times: user 1.51 s, sys: 141 ms, total: 1.65 s
Wall time: 1.65 s


[100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000]

In [173]:
# Distributed on a Ray cluster
def run_remote(sequence_size):
    results = ray.get([generate_fibonacci_distributed.remote(sequence_size) for _ in range(os.cpu_count())])
    return results

In [174]:
%%time
run_remote(100000)

CPU times: user 4.49 ms, sys: 3.59 ms, total: 8.08 ms
Wall time: 365 ms


[100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000]

### Example 4: Computing prime numbers
Let's compute some prime numbers between a range  2 --> N.

**NOT**: When running the in distributed fashion, examine the Ray Dashboard in the cluster. You will notice all cores
being used for parallelism.

In [175]:
def is_prime(n):
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            return 0
    return 1

In [176]:
%%time
NUM = 2000000
primes = [is_prime(i) for i in range(NUM)]
print(f"Total number of primes in {NUM} are {sum(primes)}")

Total number of primes in 2000000 are 148935
CPU times: user 6.21 s, sys: 39.4 ms, total: 6.25 s
Wall time: 6.24 s


In [177]:
@ray.remote
def run_is_prime_distributed(n):
    return is_prime(n)

In [178]:
%%time
results = ray.get([run_is_prime_distributed.remote(NUM)])
print(f"Total number of primes in {NUM} are {results}")

Total number of primes in 2000000 are [0]
CPU times: user 1.4 ms, sys: 1.22 ms, total: 2.61 ms
Wall time: 2.5 ms


### Example 5:  Use case of `tasks` for map-reduce

The `map-reduce` pattern is a good use case for writing distributed applications with Ray core APIs. For _map_, this example uses Ray tasks to execute a 
given function multiple times in parallel (on a separate process on a node).  

We then use `ray.get`, as part of the `reduce` process, to fetch the results of each of these functions.

<img src="https://docs.ray.io/en/latest/_images/map-reduce.svg">

### Single-threaded map 

In [179]:
items = list(range(100))
map_func = lambda i : i * 2
output = [map_func(i) for i in items]
print(output)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198]


### Ray parallel map
Use the `@ray.remote` decorator to convert this `map`function into a Ray task. It takes an object and func argument and invokes the function to process the object.
Simple and elegant!

In [4]:
@ray.remote
def map(obj, f):
    return f(obj)

In [6]:
items = list(range(100))
map_func = lambda i : i * 2

# map.remote() will return an objRef to the value computed. We fetch
# that value using ray.get
output = ray.get([map.remote(i, map_func) for i in items])
print(output)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198]


### Single-threaded reduce
For reduce, let's imagine that we want to sum up the numbers computed from
our map function.

In [7]:
items = list(range(100))
map_func = lambda i : i * 2
output = sum([map_func(i) for i in items])
output

9900

### Ray parallel map and reduce

In [15]:
@ray.remote
def map(obj, f):
    return f(obj)

# our reduce operation expecting multipe arguments
# it sums up all arguments using np.sum(
@ray.remote
def sum_results(*elements):
    return np.sum(elements)

Let's do our Ray parallel map. Note that comprehension list is collection of `ObjRefs`, each element returned by `map.remote(i, map_func)`

In [16]:
items = list(range(100))
map_func = lambda i : i * 2
remote_elements = [map.remote(i, map_func) for i in items]
remote_elements[:2]

[ObjectRef(f49efea119c2d7e2ffffffffffffffffffffffff0100000001000000),
 ObjectRef(6401157137d63fd8ffffffffffffffffffffffff0100000001000000)]

#### Simple reduce
The `sum_results.remote()` as a reduce step returns the `ObjectRef` to results
of all the values in the elements.

In [18]:
# simple reduce
remote_final_sum = sum_results.remote(*remote_elements)
# fetch the reduce sumed result
result = ray.get(remote_final_sum)
print(result)

9900


### Tree reduce
Simply break into intermediate results, followed by the final reduce. 
In short break into five groups of 20 object references, and then final
reduce.

In [31]:
# tree reduce using comprehension list. 
# split in five groups of 20 ObjecRefs for intermediate reduce,
# followed by final reduce.
intermediate_results = [sum_results.remote(
    *remote_elements[i * 20: (i + 1) * 20]) for i in range(5)]

# get the reduce results of these groups
remote_final_sum = sum_results.remote(*intermediate_results)
result = ray.get(remote_final_sum)
print(result)

9900


### Example 6 (optional): Task Dependencies: Aggregating Values Efficiently

Task dependencies can be used in much more sophisticated ways. For example, suppose we wish to aggregate 8 values together. This example uses naive integer addition, but in many applications, aggregating large vectors across multiple machines can be a bottleneck. In this case, changing a single line of code can change the aggregation’s running time from linear to logarithmic in the number of values being aggregated.

<img src="images/task_dependencies_graphs.png" height="50%" width="70%">

In [None]:
# define a task to add two intergers
@ray.remote
def add(x, y):
    time.sleep(1)
    return x + y

#### Add values the slow approach

In [None]:
values = [i for i in range(1, 9)]
values

In [None]:
%%time

while len(values) > 1:
    values = [add.remote(values[0], values[1])] + values[2:]
    print(values)
result = ray.get(values[0])
print(result)

#### Add values the faster approach

In [None]:
%%time

while len(values) > 1:
    values = values[2:] + [add.remote(values[0], values[1])]
result = ray.get(values[0])
print(result)

In [None]:
# Normally will want to shutdown
ray.shutdown()

---

### Exercises

1. Increase the fibonacci with 200K, 300K
2. Add a compute intensive function; pick some function from your repo.
3. Run how to [compute PI](extra/highly_parallel.ipynb). **Note**: You can tweak with the `FULL_SAMPLE_COUNT`, to adjust the accuracy to the value of `math.pi`. `100 billion samples` may take too long. Play with this number.

### Homework
1. Try writing a map-reduce app for word or character count in a list. Try first with simple ase of a few lines, then extend it to a large file.
2. Try using local [bubble sort](https://www.geeksforgeeks.org/python-program-for-bubble-sort/) and remote bubble sort
3. Do you see the difference for small and large numbers?
4. Read this [blog](https://www.anyscale.com/blog/parallelizing-python-code) and try some examples.

### Next Step

Let's move on to the distributed [remote objects lesson](ex_02_remote_objs.ipynb).

### References

1. [Modern Parallel and Distributed Python: A Quick Tutorial on Ray](https://towardsdatascience.com/modern-parallel-and-distributed-python-a-quick-tutorial-on-ray-99f8d70369b8) by Robert Nishihara, co-creator of Ray and co-founder Anyscale
2. [Ray Core Introduction](https://www.anyscale.com/events/2022/02/03/introduction-to-ray-core-and-its-ecosystem) by Jules S. Damji