# A Guided Tour of Ray Core: Remote Tasks

© 2019-2022, Anyscale. All Rights Reserved

📖 [Back to Table of Contents](./ex_00_tutorial_overview.ipynb)<br>
➡ [Next notebook](./ex_02_remote_objs.ipynb) <br>

### Introduction

Ray enables arbitrary functions to be executed asynchronously on separate Python workers. These asynchronous Ray functions are called “tasks.” You can specify task's resource requirements in terms of CPUs, GPUs, and custom resources. These resource requests are used by the cluster scheduler to distribute tasks across the cluster for parallelized execution.  

<img src="images/py_2_ray.png" height="55%" width="50%">

### Learning objectives
In this this tutorial, we learn about:
 * Remote Task Parallel Pattern
 * Stateless remote functions as distributed tasks
 * Serial vs Parallel execution 
 * Understand the concept of a Ray task 
 * Easy API to convert an existing Python function into a Ray remote task
 * Walk through a map-reduce and distribute batch inference use cases

In [2]:
import os
import time
import logging

import numpy as np
import pandas as pd
import pyarrow.parquet as pq
from numpy import loadtxt
import ray

## 1. Tasks Parallel Pattern

Ray converts decorated functions with `@ray.remote` into stateless tasks, scheduled anywhere on a Ray node's worker in the cluster. 

Where they will be executed (and by whom), you don't have to worry about its details. All that is taken care for you. Nor do 
you have to reason about it — all that burden is Ray's job. You simply take your existing Python functions and covert them into 
distributed stateless *Ray Tasks*: **as simple as that!**

### Example 1: Serial vs Parallelism

Let's look at simple tasks running serially and then in parallel. For illustration, we'll use a simple task, but this could be a compute-intensive task as part of your workload.


There are a few key differences between the original function and the decorated one:

**Invocation**: The regular version is called with `regular_function()`, whereas the remote version is called with `remote_function.remote()`. Keep this pattern in mind for all Ray remote execution methods.

**Mode of execution and return values**: `regular_function` executes synchronously and returns the result of the function as the value `1` (in our case), whereas `remote_function` immediately returns an `ObjectID` (a future) and then executes the task in the background on a remote worker process. The result of the future is obtained by calling `ray.get` on the `ObjectID`. This is a blocking function.

In [21]:
# A regular Python function.
def regular_function():
    time.sleep(1)
    return 1

In [22]:
# A Ray remote function.
@ray.remote
def remote_function():
    time.sleep(1)
    return 1

Let's launch a Ray cluster on our local machine.

In [3]:
if ray.is_initialized:
    ray.shutdown()
ray.init(logging_level=logging.ERROR)

0,1
Python version:,3.8.13
Ray version:,2.0.0
Dashboard:,http://127.0.0.1:8265


In [24]:
# Let's invoke the regular function
assert regular_function() == 1

In [25]:
# Let's invoke the remote function.
obj_ref = remote_function.remote()
obj_ref

ObjectRef(85748392bcd969ccffffffffffffffffffffffff0100000001000000)

In [26]:
assert ray.get(obj_ref) == 1

#### Serial execution in Python with no parallelism
Invocations of `regular_function` in a comprehension loop happens `serially`:

In [27]:
# These are executed one at a time, back-to-back, in a list comprehension
results = [regular_function() for _ in range(10)]
assert sum(results) == 10

#### Parallel execution in Python with Ray

Invocations of `remote_function` in a loop happen `asynchronously` and in parallel:

In [28]:
# Executing these functions, in comprehension list, happens at the same time in the background, 
# and we get the results using ray.get.

results = [remote_function.remote() for _ in range(10)]
assert sum(ray.get(results)) == 10

### Example 2: Adding two np arrays

<img src="images/task_api_add_array.png" width="50%" height="25%">

Define a function as a Ray task to read an array

In [29]:
@ray.remote
def read_array(fn: str) -> np.array:
    arr = loadtxt(fn, comments="#", delimiter=",", unpack=False)
    return arr.astype('int')

Define a function as a Ray task to add two np arrays return the sum

In [30]:
@ray.remote
def add_array(arr1: np.array, arr2: np.array) -> np.array:
    return np.add(arr1, arr1)

Define a function as a Ray task to sum the contents of an np array

In [31]:
@ray.remote
def sum_array(arr1: np.array) -> int:
    return np.sum(arr1)

Now let's execute our tasks. For now we will run Ray locally on our laptop or on a single node, with potential access to utilize all the available cores when necessary.

Ray executes immediately and returns an object reference `ObjectRef` as a future. This enables Ray to parallelize tasks and execute them asynchronously.

### Read both arrays. 

Use the `func_name.remote(args)` extention to invoke a remote Ray Task

In [32]:
obj_ref_arr1 = read_array.remote(os.path.abspath("data/file_1.txt"))
print(f"array 1: {obj_ref_arr1}")

array 1: ObjectRef(465c0fb8d6cb3cdcffffffffffffffffffffffff0100000001000000)


In [33]:
obj_ref_arr2 = read_array.remote(os.path.abspath("data/file_2.txt"))
print(f"array 2: {obj_ref_arr2}")

array 2: ObjectRef(3d3e27c54ed1f5cfffffffffffffffffffffffff0100000001000000)


### Add both arrays

Let's add our two arrays by calling the remote method. *Note*: We are sending Ray `ObjectRef` references as arguments. Those arguments will be resolved inline and fetched from owner's object store. That is, the cluster node that creates the `ObjectRef` owns the meta data associated and stores it in its object store. 

Ray scheduler is aware of where these object references reside or who owns them, so it will schedule this remote task on node on the worker process for data locality.

In [34]:
result_obj_ref = add_array.remote(obj_ref_arr1, obj_ref_arr2)
result_obj_ref

ObjectRef(cae5e964086715a4ffffffffffffffffffffffff0100000001000000)

### Fetch the result 

This will task if not finished will block during `.get(object_ref`)

In [35]:
result = ray.get(result_obj_ref)
print(f"Result: add arr1 + arr2: \n {result}")

Result: add arr1 + arr2: 
 [[  0  96 144 150 108 178 168 136  18  76]
 [  6  80 146 116  20  70 192  12 130  66]
 [110 134  24 194 104 146  14 152  78 100]
 [118  68  40  80 184 110  22  78 186  76]
 [178 178  74 104  96 172  98   6  38 100]
 [168  74 136  22  40  72  92 122 104 154]
 [140 180 112 110  98 152 188  56  64  46]
 [ 10  88 184  30 106 126 174 150 122  50]
 [102 116  58  60 186 188 104 144 160  54]
 [  2  56 164  70 178  72  20 168 170 130]]


Add the array elements within an `np.array` and get the sum. 
**Note** that we are sending `ObjRefs` as arguments to the function. Ray will resolve or fetch the value of these arrays. 

In [36]:
sum_1 = ray.get(sum_array.remote(obj_ref_arr1))
sum_2 = ray.get(sum_array.remote(obj_ref_arr2))

In [37]:
print(f'Sum of arr1: {sum_1}')
print(f'Sum of arr2: {sum_2}')

Sum of arr1: 5173
Sum of arr2: 7719


### Any questions?

### Example 3: Generating Fibonnaci series

Let's define two functions: one runs locally or serially, the other runs on a Ray cluster (local or remote). This example is borrowed and refactored from our 
blog: [Writing your First Distributed Python Application with Ray](https://www.anyscale.com/blog/writing-your-first-distributed-python-application-with-ray). 
(This is an excellent tutorial to get started with the concept of why and when to use Ray tasks and Ray Actors. Highly recommended read!)

Another similar blog of interest is how to compute the value of **pi**: [How to scale Python multiprocessing to a cluster with one line of code](https://medium.com/distributed-computing-with-ray/how-to-scale-python-multiprocessing-to-a-cluster-with-one-line-of-code-d19f242f60ff).

In [38]:
# Function for local execution 
def generate_fibonacci(sequence_size):
    fibonacci = []
    for i in range(0, sequence_size):
        if i < 2:
            fibonacci.append(i)
            continue
        fibonacci.append(fibonacci[i-1]+fibonacci[i-2])
    return len(fibonacci)

In [39]:
# Function for remote Ray task with just a wrapper
@ray.remote
def generate_fibonacci_distributed(sequence_size):
    return generate_fibonacci(sequence_size)

In [40]:
# Get the number of cores 
os.cpu_count()

10

In [41]:
# Normal Python in a single process 
def run_local(sequence_size):
    results = [generate_fibonacci(sequence_size) for _ in range(os.cpu_count())]
    return results

In [42]:
%%time
run_local(100000)

CPU times: user 1.54 s, sys: 550 ms, total: 2.09 s
Wall time: 2.08 s


[100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000]

In [43]:
# Distributed on a Ray cluster
def run_remote(sequence_size):
    results = ray.get([generate_fibonacci_distributed.remote(sequence_size) for _ in range(os.cpu_count())])
    return results

In [44]:
%%time
run_remote(100000)

CPU times: user 7.28 ms, sys: 13.3 ms, total: 20.6 ms
Wall time: 368 ms


[100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000,
 100000]

### Any questions?

### Example 4:  Use case of `tasks` for map-reduce

The `map-reduce` pattern is a good use case for writing distributed applications with Ray core APIs. For _map_, this example uses Ray tasks to execute a 
given function multiple times in parallel (on a separate process on a node).  

We then use `ray.get`, as part of the `reduce` process, to fetch the results of each of these functions.

<img src="https://docs.ray.io/en/latest/_images/map-reduce.svg">

### Single-threaded map 

In [4]:
items = list(range(100))
map_func = lambda i : i * 2
output = [map_func(i) for i in items]
print(output)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198]


### Ray parallel map
Use the `@ray.remote` decorator to convert this `map`function into a Ray task. It takes an object and func argument and invokes the function to process the object.
Simple and elegant!

In [5]:
@ray.remote
def map(obj, f):
    return f(obj)

In [6]:
items = list(range(100))
map_func = lambda i : i * 2

# map.remote() will return an objRef to the computed value. We fetch
# that value using ray.get
output = ray.get([map.remote(i, map_func) for i in items])
print(output)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198]


### Single-threaded reduce
For reduce, let's imagine that we want to sum up the numbers computed from
our map function.

In [7]:
items = list(range(100))
map_func = lambda i : i * 2
output = sum([map_func(i) for i in items])
output

9900

### Ray parallel map and reduce

In [8]:
@ray.remote
def map(obj, f):
    return f(obj)

# Our reduce operation is expecting multipe arguments.
# It sums up all arguments using np.sum(elements)
@ray.remote
def sum_results(*elements):
    return np.sum(elements)

Let's do our Ray parallel map. Note that comprehension list is a collection of `ObjRefs`, each element returned by `map.remote(i, map_func)`

In [9]:
items = list(range(100))
map_func = lambda i : i * 2
remote_elements = [map.remote(i, map_func) for i in items]
remote_elements[:2]

[ObjectRef(298e3e66d66deed9ffffffffffffffffffffffff0100000001000000),
 ObjectRef(664a780010703836ffffffffffffffffffffffff0100000001000000)]

#### Simple reduce
The `sum_results.remote()` as a reduce step returns the `ObjectRef` to results
of all the values in the elements.

In [10]:
# simple reduce
remote_final_sum = sum_results.remote(*remote_elements)
# fetch the reduce sumed result
result = ray.get(remote_final_sum)
print(result)

9900


### Tree reduce
Simply break into intermediate results, followed by the final reduce. 
In short break into five groups of 20 object references, and then final
reduce.

In [11]:
# Tree reduce using comprehension list. 
# Split in five groups of 20 ObjecRefs for intermediate reduce,
# followed by final reduce, for all 100 elements
intermediate_results = [sum_results.remote(
    *remote_elements[i * 20: (i + 1) * 20]) for i in range(5)]

# Get the reduce results of these groups
remote_final_sum = sum_results.remote(*intermediate_results)
result = ray.get(remote_final_sum)
print(result)

9900


### Example 5:  How to use Tasks for distributed batch inference 

Batch inference is a common distributed application workload in machine learning. It's a process of using a trained model to generate predictions for a collection of observations. 
Primarily, it has the following elements:

**Input dataset**: This is a large collection of observations to generate predictions for. The data is usually stored in an external storage system like S3, HDFS or database, across
many files.
**ML model**: This is a trained ML model that is usually also stored in an external storage system or in a model store.
**Predictions**: These are the outputs when applying the ML model on observations. Normally, predictions are usually written back to the storage system.

For purpose of this tutorial, we make the following provisions:
 * create a dummy model that returns some fake prediction
 * use real-world NYC taxi data to provide large data set for batch inference
 * return the predictions instead of writing it back to the disk

As an example of scaling pattern called Different Data Same Function (DDSF), also known as Distributed Data Parallel (DDP) paradigm, our function in this digaram is the 
pretrained **model** and the data is split and disributed as **shards**.

<img src="images/batch-inference.png" width="25%" height="25%">


Define a Python closure to load our pretrained model. This model is just a fake model that predicts whether a 
tip is warranted continent on the number of fares (2 or more) on collective rides.

**Note**: This prediction is fake. The real model will invoke model's `model.predict(input_data)`. Yet
it suffices for this example.

In [12]:
def load_trained_model():
    # A fake model that predicts whether tips were given based on number of passengers in the taxi cab.
    def model(batch: pd.DataFrame) -> pd.DataFrame:
        # Some model payload so Ray copies the model in the shared plasma store to tasks scheduled across nodes.
        model.payload = np.arange(100, 100_000_000, dtype=float)
        model.cls = "regression"
        
        # give a tip if 2 or more passengers
        predict = batch["passenger_count"] >= 2 
        return pd.DataFrame({"score": predict})
    
    return model    

Let's define a Ray task that will handle each shard of the NYC taxt data

In [13]:
@ray.remote
def make_model_batch_predictions(model, shard_path):
    print(f"Batch inference for shard file: {shard_path}")
    df = pq.read_table(shard_path).to_pandas()
    result = model(df)

    # Return our prediction data frame
    return result

Get the 12 files consisting of NYC data per month

In [14]:
# 12 files, one for each remote task.
input_files = [
    f"s3://anonymous@air-example-data/ursa-labs-taxi-data/downsampled_2009_full_year_data.parquet"
    f"/fe41422b01c04169af2a65a83b753e0f_{i:06d}.parquet" for i in range(12)
]

`ray.put()` the model just once to local object store, and then pass the reference to the remote tasks.
This is Ray core API for putting objects into the Ray Plasma store. We discuss these APIs and Plasma store
in the next tutorial. 

In [15]:
# Get the model 
model = load_trained_model()

# Put the model object into the shared object store.
model_ref = ray.put(model)
model_ref

ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000001000000)

In [16]:
# List for holding all object references from the model's predictions
result_refs = []

# Launch all prediction tasks. For each file create a Ray remote task
# to do a batch inference
for file in input_files:
    
    # Launch a prediction task by passing model reference and shard file to it.
    # NOTE: it would be highly inefficient if you are passing the model itself
    # like  make_model_prediction.remote(model, file), which in order to pass the model
    # to remote node will ray.put(model) for each task, potentially overwhelming
    # the local object store and causing out-of-memory or out-of-disk error.
    result_refs.append(make_model_batch_predictions.remote(model_ref, file))

[2m[36m(make_model_batch_predictions pid=82481)[0m Batch inference for shard file: s3://anonymous@air-example-data/ursa-labs-taxi-data/downsampled_2009_full_year_data.parquet/fe41422b01c04169af2a65a83b753e0f_000002.parquet
[2m[36m(make_model_batch_predictions pid=82484)[0m Batch inference for shard file: s3://anonymous@air-example-data/ursa-labs-taxi-data/downsampled_2009_full_year_data.parquet/fe41422b01c04169af2a65a83b753e0f_000009.parquet
[2m[36m(make_model_batch_predictions pid=82486)[0m Batch inference for shard file: s3://anonymous@air-example-data/ursa-labs-taxi-data/downsampled_2009_full_year_data.parquet/fe41422b01c04169af2a65a83b753e0f_000001.parquet
[2m[36m(make_model_batch_predictions pid=82485)[0m Batch inference for shard file: s3://anonymous@air-example-data/ursa-labs-taxi-data/downsampled_2009_full_year_data.parquet/fe41422b01c04169af2a65a83b753e0f_000007.parquet
[2m[36m(make_model_batch_predictions pid=82479)[0m Batch inference for shard file: s3://anony

Fetch the results

In [17]:
results = ray.get(result_refs)

In [18]:
 # Let's check predictions and output size.
for r in results:
    print(f"Predictions dataframe size: {len(r)} | Total score for tips: {r['score'].sum()}")

Predictions dataframe size: 141062 | Total score for tips: 46360
Predictions dataframe size: 133932 | Total score for tips: 42175
Predictions dataframe size: 144014 | Total score for tips: 45175
Predictions dataframe size: 143087 | Total score for tips: 45510
Predictions dataframe size: 148108 | Total score for tips: 47713
Predictions dataframe size: 141981 | Total score for tips: 45188
Predictions dataframe size: 136394 | Total score for tips: 43234
Predictions dataframe size: 136999 | Total score for tips: 45142
Predictions dataframe size: 139985 | Total score for tips: 44138
Predictions dataframe size: 156198 | Total score for tips: 49909
Predictions dataframe size: 142893 | Total score for tips: 46112
Predictions dataframe size: 145976 | Total score for tips: 48036


In [19]:
ray.shutdown()

---

### Exercises

1. Increase the fibonacci with 200K, 300K
2. Add a compute intensive function; pick some function from your repo and convert to a remote task.
3. (Optional) Run how to [compute PI](extra/highly_parallel.ipynb). **Note**: You can tweak with the `FULL_SAMPLE_COUNT`, to adjust the accuracy to the value of `math.pi`. `100 billion samples` may take too long. Play with this number.

### Homework
1. Try writing a map-reduce app for word or character count in a list. Try first with simple case of a few lines, then extend it to a large file.
2. Try using local [bubble sort](https://www.geeksforgeeks.org/python-program-for-bubble-sort/) and remote bubble sort
3. Do you see the difference for small and large numbers?
4. Read this [blog](https://www.anyscale.com/blog/parallelizing-python-code) and try some examples.
5. Take an existing regression model, save it as in a model format, use this scaling technique to do batch inference at scale and in parallel

### Next Step

Let's move on to the distributed [remote objects lesson](ex_02_remote_objs.ipynb).

### References

1. [Modern Parallel and Distributed Python: A Quick Tutorial on Ray](https://towardsdatascience.com/modern-parallel-and-distributed-python-a-quick-tutorial-on-ray-99f8d70369b8) by Robert Nishihara, co-creator of Ray and co-founder Anyscale
2. [Ray Core Introduction](https://www.anyscale.com/events/2022/02/03/introduction-to-ray-core-and-its-ecosystem) by Jules S. Damji

📖 [Back to Table of Contents](./ex_00_tutorial_overview.ipynb)<br>
➡ [Next notebook](./ex_02_remote_objs.ipynb) <br>