# A Guided Tour of Ray Core: Remote Tasks

© 2019-2022, Anyscale. All Rights Reserved

<img src="images/ray_basic_patterns.png" height="35%" width="60%">

In [1]:
import os
import time
import logging
from pprint import pprint

import numpy as np
from numpy import loadtxt
import ray

## 1. Tasks Parallel Pattern

Ray converts decorated functions into stateless tasks, scheduled anywhere on a ray worker in the cluster by simply adding the `@ray.remote` decorator. 
All these functions are converted into Ray stateless tasks that will be executed on some worker process on a Ray cluster.

Where they will be executed (and by whom), you don't have to worry about its details. All that is taken care for you. Nor do 
you have to reason about it — all that burden is Ray's job. You simply take your existing Python functions and 
covert them into distributed stateless *Ray Tasks*: as simple as that!

### Example 1: Serial vs Parallelism

Let's look at simple tasks running serially and then in parallel. For illustration, we'll use a simple task, but this could be a compute-intensive task as part of your workload.


In [2]:
# A regular Python function.
def regular_function():
    return 1

In [3]:
# A Ray remote function.
@ray.remote
def remote_function():
    return 1

There are a few key differences between the original function and the decorated one:

**Invocation**: The regular version is called with `regular_function()`, whereas the remote version is called with `remote_function.remote()`. Keep this pattern in mind for all Ray remote execution methods.

**Return values**: `regular_function` executes synchronously and returns the result of the function as thevalue `1'`, whereas `remote_function` immediately returns an `ObjectID` (a future) and then executes the task in the background on a separate worker process. The result of the future can be obtained by calling `ray.get` on the `ObjectID`. This is a blocking function.

In [4]:
# Let's invoke the regular function
regular_function()

1

Let's launch a Ray cluster on our local machine. This will run a headnode.

In [5]:
if ray.is_initialized:
    ray.shutdown()
context = ray.init(logging_level=logging.ERROR)
pprint(context)

RayContext(dashboard_url='127.0.0.1:8265', python_version='3.8.12', ray_version='2.0.0.dev0', ray_commit='{{RAY_COMMIT_SHA}}', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-04-06_10-05-21_955763_75805/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-04-06_10-05-21_955763_75805/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2022-04-06_10-05-21_955763_75805', 'metrics_export_port': 65003, 'gcs_address': '127.0.0.1:57125', 'address': '127.0.0.1:57125', 'node_id': 'f4b3291fec268074a1d6ab685f37db40e463148ca2c81f4e112522cb'})


In [6]:
print(f"Dashboard url: http://{context.address_info['webui_url']}")

Dashboard url: http://127.0.0.1:8265


In [7]:
# Let's invoke the remote regular function.
remote_function.remote()

ObjectRef(c8ef45ccd0112571ffffffffffffffffffffffff0100000001000000)

In [8]:
ray.get(remote_function.remote())

1

**No Parallelism**: Invocations of `regular_function` in a loop happen `serially`:

In [9]:
# These are executed one at a time, back-to-back, in a list comprehension
results = [regular_function() for _ in range(10)]
assert sum(results)

**Parallelism**: Invocations of `remote_function` in a loop happen `asynchronously` and in parallel:

In [10]:
# Executing these functions, in comprehension list, happens at the same time in the background, 
# and we get the results using ray.get.
results = [remote_function.remote() for _ in range(10)]
assert sum(ray.get(results)) == 10

### Example 2: Adding two np arrays

<img src="images/task_api_add_array.png" width="60%" height="40%">

Define a function as a Ray task to read an array

In [11]:
@ray.remote
def read_array(fn: str) -> np.array:
    arr = loadtxt(fn, comments="#", delimiter=",", unpack=False)
    return arr.astype('int')

Define a function as a Ray task to add two np arrays return the sum

In [12]:
@ray.remote
def add_array(arr1: np.array, arr2: np.array) -> np.array:
    return np.add(arr1, arr1)

Define a function as a Ray task to sum the contents of an np array

In [13]:
@ray.remote
def sum_array(arr1: np.array) -> int:
    return np.sum(arr1)

Now let's execute our tasks. For now we will run Ray locally on our laptop, with a single node, and use all the cores available.

Ray executes immediately and returns an object reference `ObjectRef` as a future. This enables Ray to parallelize tasks and execute them asynchronously.

### Read both arrays. 

Use the `func_name.remote(args)` extention to invoke a remote Ray Task

In [14]:
obj_ref_arr1 = read_array.remote("data/file_1.txt")
print(f"array 1: {obj_ref_arr1}")

array 1: ObjectRef(85748392bcd969ccffffffffffffffffffffffff0100000001000000)


In [15]:
obj_ref_arr2 = read_array.remote("data/file_2.txt")
print(f"array 2: {obj_ref_arr2}")

array 2: ObjectRef(d695f922effe6d99ffffffffffffffffffffffff0100000001000000)


### Add both arrays

Let's add our two arrays by calling the remote method. *Note*: We are sending Ray `ObjectRef` references as arguments.Those will be resolved inline and fetched from owner. Note that node that creates the ref owns them and stores it in its object store. 

Ray scheduler is aware of where these object references reside or who owns them, so it will schedule this remote task on node on the worker process for data locality.

In [16]:
result_obj_ref = add_array.remote(obj_ref_arr1, obj_ref_arr2)
result_obj_ref

ObjectRef(2751d69548dba956ffffffffffffffffffffffff0100000001000000)

### Fetch the result 

This will block if not finished

In [17]:
result = ray.get(result_obj_ref)
print(f"Result: add arr1 + arr2: \n {result}")

Result: add arr1 + arr2: 
 [[  0  96 144 150 108 178 168 136  18  76]
 [  6  80 146 116  20  70 192  12 130  66]
 [110 134  24 194 104 146  14 152  78 100]
 [118  68  40  80 184 110  22  78 186  76]
 [178 178  74 104  96 172  98   6  38 100]
 [168  74 136  22  40  72  92 122 104 154]
 [140 180 112 110  98 152 188  56  64  46]
 [ 10  88 184  30 106 126 174 150 122  50]
 [102 116  58  60 186 188 104 144 160  54]
 [  2  56 164  70 178  72  20 168 170 130]]


Add the array elements and get the sum
Note we are sending objRefs as arguments to the function
Ray will resolve or fetch the value of these arrays. 

In [18]:
sum_1 = ray.get(sum_array.remote(obj_ref_arr1))
sum_2 = ray.get(sum_array.remote(obj_ref_arr2))

In [19]:
print(f'Sum of arr1: {sum_1}')
print(f'Sum of arr2: {sum_2}')

Sum of arr1: 5173
Sum of arr2: 7719


### Example 3: Generating Fibonnaci series

Let's define two functions: one runs locally or serially, the other runs on a Ray cluster (local or remote). This example is borrowed and refactored from our blog: [Writing your First Distributed Python Application with Ray](https://www.anyscale.com/blog/writing-your-first-distributed-python-application-with-ray). (This is an excellent tutorial to get started with the concept of why and when to use Ray tasks and Ray Actors. Highly recommended read!)

Another similar blog of interest is how to compute the value of **pi**: [How to scale Python multiprocessing to a cluster with one line of code](https://medium.com/distributed-computing-with-ray/how-to-scale-python-multiprocessing-to-a-cluster-with-one-line-of-code-d19f242f60ff).

In [20]:
# Local execution 
def generate_fibonacci(sequence_size):
    fibonacci = []
    for i in range(0, sequence_size):
        if i < 2:
            fibonacci.append(i)
            continue
        fibonacci.append(fibonacci[i-1]+fibonacci[i-2])
    return len(fibonacci)

In [21]:
# Remote Task with just a wrapper
@ray.remote
def generate_fibonacci_distributed(sequence_size):
    return generate_fibonacci(sequence_size)

In [22]:
# Get the number of cores 
os.cpu_count()

12

In [23]:
# Normal Python in a single process 
def run_local(sequence_size):
    results = [generate_fibonacci(sequence_size) for _ in range(os.cpu_count())]
    return results

In [24]:
%%time
run_local(100000)

CPU times: user 2.63 s, sys: 1.51 s, total: 4.14 s
Wall time: 4.12 s


[100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000]

In [25]:
# Distributed on a Ray cluster
def run_remote(sequence_size):
    results = ray.get([generate_fibonacci_distributed.remote(sequence_size) for _ in range(os.cpu_count())])
    return results

In [26]:
%%time
run_remote(100000)

CPU times: user 16.8 ms, sys: 11.5 ms, total: 28.3 ms
Wall time: 972 ms


[100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000]

### Concept of Task Dependencies

Consider the following basic remote function that returns the argument passed to it. If we pass in some normal Python objects, the results 
returned by `ray.get` should be the same objects.

In [27]:
@ray.remote
def f(x):
    return x

x1_id = f.remote(1)
ray.get(x1_id)

1

In [28]:
x2_id = f.remote([1, 2, 3])
ray.get(x2_id)

[1, 2, 3]

However, `object IDs` can also be passed into remote functions. When the function is executed, Ray will automatically substitute the underlying Python object that the object ID refers to. In a sense, it's the same as calling `ray.get` on each argument that's passed in as an argument.

In [29]:
y1_id = f.remote(x1_id)
ray.get(y1_id)

1

In [30]:
y2_id = f.remote(x2_id)
ray.get(y2_id)

[1, 2, 3]

**NOTE**: When implementing a remote function, the function should expect a regular Python object regardless of whether the caller passes in a regular Python object or an object ID.

**These task dependencies affect scheduling**. In the example above, the task that creates `y1_id` depends on the task that creates `x1_id`. This means that:

 * The second task will not be executed until the first task has finished executing.
 * If the two tasks are scheduled on different machines, the output of the first task (the value corresponding to x1_id) will be copied over the network to the machine where the second task is scheduled.

### Example 4: Task Dependencies: Aggregating Values Efficiently

Task dependencies can be used in much more sophisticated ways. For example, suppose we wish to aggregate 8 values together. This example uses naive integer addition, but in many applications, aggregating large vectors across multiple machines can be a bottleneck. In this case, changing a single line of code can change the aggregation’s running time from linear to logarithmic in the number of values being aggregated.

<img src="images/task_dependencies_graphs.png" height="50%" width="70%">

In [35]:
# define a task to add two intergers
@ray.remote
def add(x, y):
    time.sleep(1)
    return x + y

#### Add values the slow approach

In [36]:
values = [i for i in range(1, 9)]
values

[1, 2, 3, 4, 5, 6, 7, 8]

In [37]:
%%time

while len(values) > 1:
    values = [add.remote(values[0], values[1])] + values[2:]
    print(values)
result = ray.get(values[0])
print(result)

[ObjectRef(89af82725933373effffffffffffffffffffffff0100000001000000), 3, 4, 5, 6, 7, 8]
[ObjectRef(5168ff79929289e3ffffffffffffffffffffffff0100000001000000), 4, 5, 6, 7, 8]
[ObjectRef(3e43f22e6ab31cdcffffffffffffffffffffffff0100000001000000), 5, 6, 7, 8]
[ObjectRef(594c3bb38e594811ffffffffffffffffffffffff0100000001000000), 6, 7, 8]
[ObjectRef(64ac0404a8f0916fffffffffffffffffffffffff0100000001000000), 7, 8]
[ObjectRef(cf9aed5eec5a308bffffffffffffffffffffffff0100000001000000), 8]
[ObjectRef(4f4ef6205ce35f90ffffffffffffffffffffffff0100000001000000)]
36
CPU times: user 50.7 ms, sys: 43.5 ms, total: 94.2 ms
Wall time: 7.04 s


#### Add values the faster approach

In [40]:
%%time

while len(values) > 1:
    values = values[2:] + [add.remote(values[0], values[1])]
result = ray.get(values[0])
print(result)

36
CPU times: user 536 µs, sys: 282 µs, total: 818 µs
Wall time: 689 µs


In [41]:
@ray.remote
def f():
    return 1

@ray.remote
def g():
    # Call f() 4 times and return the resulting object IDs.
    results = []
    for _ in range(4):
      results.append(f.remote())
    return results

@ray.remote
def h():
    # Call f() 4 times, block until those 4 tasks finish,
    # retrieve the results, and return the values.
    results = []
    for _ in range(4):
      results.append(f.remote())
    return ray.get(results)

Then calling `g` and `h` produces the following behavior.

In [42]:
ray.get(g.remote())

[ObjectRef(3747422454d6e3c3ffffffffffffffffffffffff0100000001000000),
 ObjectRef(f89232946911ffa3ffffffffffffffffffffffff0100000001000000),
 ObjectRef(0f3b51d123bcf084ffffffffffffffffffffffff0100000001000000),
 ObjectRef(b1c178fa7653e451ffffffffffffffffffffffff0100000001000000)]

In [43]:
ray.get(h.remote())

[1, 1, 1, 1]

In [44]:
# Normally will want to shutdown
ray.shutdown()

---

### Exercises

1. Increase the fibonacci with 200K, 300K
2. Add a compute intensive function; pick some function from your repo

### Homework
1. Try adding local [bubble sort](https://www.geeksforgeeks.org/python-program-for-bubble-sort/) and remote bubble sort
2. Do you see the difference for small and large numbers?
3. Read this [blog](https://www.anyscale.com/blog/parallelizing-python-code) and try some examples.

### References

1. [Modern Parallel and Distributed Python: A Quick Tutorial on Ray](https://towardsdatascience.com/modern-parallel-and-distributed-python-a-quick-tutorial-on-ray-99f8d70369b8) by Robert Nishihara, co-creator of Ray and co-founder Anyscale
2. [Ray Core Introduction](https://www.anyscale.com/events/2022/02/03/introduction-to-ray-core-and-its-ecosystem) by Jules S. Damji