# A Guided Tour of Ray Core: Remote Objects

© 2019-2022, Anyscale. All Rights Reserved

📖 [Back to Table of Contents](./ex_00_tutorial_overview.ipynb)<br>
➡ [Next notebook](./ex_03_remote_classes.ipynb) <br>
⬅️ [Previous notebook](./ex_01_remote_funcs.ipynb) <br>

### Overview


In Ray, tasks and actors create and compute on objects. We refer to these objects as remote objects because they can be stored anywhere in a Ray cluster, and we use object refs to refer to them. Remote objects are cached in Ray’s distributed shared-memory object store, and there is one object store per node in the cluster. In the cluster setting, a remote object can live on one or many nodes, independent of who holds the object ref(s).

[*Remote Objects*](https://docs.ray.io/en/latest/walkthrough.html#objects-in-ray)
reside in a distributed [*shared-memory object store*](https://en.wikipedia.org/wiki/Shared_memory).

Objects are immutable and can be accessed from anywhere on the cluster, as they are stored in the cluster shared memory. An object ref is essentially a pointer or a unique ID that can be used to refer to a remote object without seeing its value. If you’re familiar with futures in Python, Java or Scala, Ray object refs are conceptually similar.
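The analogy with futures can be sketched with Python's standard-library `concurrent.futures`. This is not Ray code: the `square` function and the pool size are made up for illustration, and the comments only point at the rough Ray counterpart of each step.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """Toy computation standing in for real work."""
    return x * x


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns a Future immediately, much like f.remote(...) returns an object ref
    future = pool.submit(square, 7)
    # result() blocks until the value is ready, much like ray.get(obj_ref)
    result = future.result()

print(result)
```

The key difference is that a Ray object ref can be handed to other tasks and actors across the cluster, while a `Future` from `concurrent.futures` lives only in the process that created it.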


In general, small objects are stored in their owner’s **in-process store** (**<=100KB**), while large objects are stored in the **distributed object store**. This decision is meant to reduce the memory footprint and resolution time for each object. Note that in the latter case, a placeholder object is stored in the in-process store to indicate the object has been promoted to shared memory.
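To get a rough feel for which side of the threshold an object falls on, you can estimate its serialized size. The sketch below uses the standard-library `pickle` module as a stand-in; Ray uses its own serialization, so the numbers are only approximate. The `INLINE_THRESHOLD_BYTES` constant (taken from the 100KB figure above) and both helper functions are hypothetical names used for illustration only.

```python
import pickle

# Assumption: the ~100KB cutoff quoted in the text, expressed in bytes.
INLINE_THRESHOLD_BYTES = 100 * 1024


def approx_serialized_size(obj) -> int:
    """Approximate the serialized size of obj using pickle (not Ray's serializer)."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def likely_store(obj) -> str:
    """Guess where an object of this size would be kept, per the rule of thumb above."""
    if approx_serialized_size(obj) <= INLINE_THRESHOLD_BYTES:
        return "in-process store"
    return "distributed object store"


small = [23, 42, 93]
large = list(range(1_000_000))

print("small:", approx_serialized_size(small), "bytes ->", likely_store(small))
print("large:", approx_serialized_size(large), "bytes ->", likely_store(large))
```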

When shared memory runs out of space, objects are spilled to disk. The main point, however, is that
shared memory allows _zero-copy_ access for processes on the same worker node.

<img src="images/shared_memory_plasma_store.png"  height="40%" width="65%">
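The idea behind zero-copy access can be illustrated with the standard-library `multiprocessing.shared_memory` module. This is only an analogy and not how Ray's object store is implemented: two handles attach to the same named block, and a write through one is visible through the other without the data being copied. Ray objects are immutable, so the write here exists purely to show that both handles see the same memory.

```python
from multiprocessing import shared_memory

# Create a small named block of shared memory and write four bytes into it.
writer = shared_memory.SharedMemory(create=True, size=16)
try:
    writer.buf[:4] = bytes([1, 2, 3, 4])

    # Attach a second handle by name, as another process on the same node could.
    reader = shared_memory.SharedMemory(name=writer.name)
    try:
        # Copy out a snapshot only so it can be printed after cleanup.
        seen_before = bytes(reader.buf[:4])
        # A write through one handle is visible through the other: same memory.
        writer.buf[0] = 99
        seen_after = reader.buf[0]
    finally:
        reader.close()
finally:
    writer.close()
    writer.unlink()  # release the block

print(list(seen_before), seen_after)
```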

### Learning objectives

In this tutorial, you will learn about:
 * Ray futures as one of the patterns
 * Ray's distributed Plasma object store
 * How objects are stored in and fetched from the distributed shared object store
     * Examples using `ray.get` and `ray.put`
 * How to use tasks and the object store to train and tune a regression model: sequential vs. parallel
     


### 2. Object references as futures pattern

First, let's start Ray…

In [1]:
import logging
import numpy as np
import ray

In [2]:
if ray.is_initialized():  # call the function; the bare name is always truthy
    ray.shutdown()
ray.init(logging_level=logging.ERROR)

| | |
|:--|:--|
|Python version:|3.8.13|
|Ray version:|2.0.1|
|Dashboard:|http://127.0.0.1:8265|


### Remote Objects example

To start, we'll create some Python objects and put them in shared memory.

In [3]:
num_list = [23, 42, 93]

# returns an ObjectRef
obj_ref = ray.put(num_list)
obj_ref

ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000001000000)

Then retrieve the value of this object reference. 

Small objects are resolved by copying them directly from the _owner’s_ **in-process store**. For example, if the owner calls `ray.get`, the system looks up and deserializes the value from the local **in-process store**. Objects larger than 100KB are stored in the distributed object store.

In [4]:
val = ray.get(obj_ref)
val

[23, 42, 93]

You can gather the values of multiple object references with a single `ray.get` call on a list built by a list comprehension:
 1. Each value is put in the object store and its `ObjectRef` is immediately returned
 2. The comprehension constructs a list of `ObjectRef`s, one for each element in the loop
 3. A final `ray.get(list_obj_refs)` is invoked to fetch the list of values

In [5]:
results = ray.get([ray.put(i) for i in range(10)])
results

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

### Passing Objects by Reference

Ray object references can be freely passed around a Ray application. This means that they can be passed as arguments to tasks, actor methods, and even stored in other objects. Objects are tracked via distributed reference counting, and their data is automatically freed once all references to the object are deleted.

In [6]:
# Define a Task
@ray.remote
def echo(x):
    print(f"current value of argument x: {x}")

In [7]:
# Define some variables
x = list(range(10))
obj_ref_x = ray.put(x)
y = 25

### Pass-by-value

Send the object to a task as a top-level argument.
The object will be *de-referenced* automatically, so the task only sees its value.

In [8]:
# send y as value argument
echo.remote(y)

ObjectRef(c8ef45ccd0112571ffffffffffffffffffffffff0100000001000000)

[2m[36m(echo pid=96385)[0m current value of argument x: 25


In [9]:
# send an object reference
# note that Ray dereferences it, so echo sees the value
echo.remote(obj_ref_x)

ObjectRef(16310a0f0a45af5cffffffffffffffffffffffff0100000001000000)

[2m[36m(echo pid=96385)[0m current value of argument x: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


### Pass-by-reference

When a parameter is passed inside a Python list or as any other data structure,
the *object ref is preserved*, meaning it's not *de-referenced*. The object data is not transferred to the worker when it is passed by reference, until `ray.get()` is called on the reference.

You can pass by reference in two ways:
 1. inside a dictionary: `.remote({"obj": obj_ref_x})`
 2. inside a list of object refs: `.remote([obj_ref_x])`

In [10]:
x = list(range(20))
obj_ref_x = ray.put(x)
# echo receives the ObjectRef itself; it is not automatically de-referenced
echo.remote({"obj": obj_ref_x})

ObjectRef(c2668a65bda616c1ffffffffffffffffffffffff0100000001000000)

[2m[36m(echo pid=96385)[0m current value of argument x: {'obj': ObjectRef(00ffffffffffffffffffffffffffffffffffffff010000000d000000)}


In [11]:
echo.remote([obj_ref_x])

ObjectRef(32d950ec0ccf9d2affffffffffffffffffffffff0100000001000000)

[2m[36m(echo pid=96385)[0m current value of argument x: [ObjectRef(00ffffffffffffffffffffffffffffffffffffff010000000d000000)]


### Any questions?

## What about long running tasks?

Sometimes a task runs longer than expected because of some problem, for example because it is blocked on accessing a variable in the object store. How do you avoid waiting on it indefinitely? Use a timeout!

Now let's set a timeout to return early from an attempted access of a remote object that is blocking for too long...

In [12]:
import time

@ray.remote
def long_running_function():
    time.sleep(10)
    return 42

You can control how long you want to wait for the task to finish.

In [13]:
%%time

from ray.exceptions import GetTimeoutError

obj_ref = long_running_function.remote()

try:
    ray.get(obj_ref, timeout=6)
except GetTimeoutError:
    print("`get` timed out")

`get` timed out
CPU times: user 32.2 ms, sys: 19.6 ms, total: 51.8 ms
Wall time: 6.02 s


### Example: Hands-on code example - scaling regression 
Use Ray tasks and the object store

In this example, you will run illustrative code that gives you a better "feel" for Ray. Specifically, you will use Ray Core tasks and the object store to scale a bare-bones version of a common ML task: regression on structured data.

#### Data

The dataset is [California Housing](https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset), as available via scikit-learn.

|<img src="images/California_dataset.png" width="70%" loading="lazy">|
|:--|
|`n_samples = 20640`, target is numeric and corresponds to the median house value in units of $100,000.|

#### Model and task

You will train and score [random forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) models using the [mean squared error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html) metric.

To find the best-performing model, you will train many models with varying values of the `n_estimators` hyperparameter. This raises the question of sequential vs. parallel model training. You will first implement the sequential approach, then improve it by distributing training with Ray Core, which trains the same set of models in much less wall time.

### Sequential implementation

The vanilla implementation trains the models sequentially, one after another, as depicted in the diagram below.

|<img src="images/sequential_timeline.png" width="70%" loading="lazy">|
|:--|
|Timeline of sequential tasks, one after the other.|

#### Preliminaries

In [14]:
# imports
import time
from operator import itemgetter

import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

#### Prepare dataset

In [15]:
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [16]:
X.head(n=5)

| |MedInc|HouseAge|AveRooms|AveBedrms|Population|AveOccup|Latitude|Longitude|
|:--|:--|:--|:--|:--|:--|:--|:--|:--|
|0|8.3252|41.0|6.984127|1.02381|322.0|2.555556|37.88|-122.23|
|1|8.3014|21.0|6.238137|0.97188|2401.0|2.109842|37.86|-122.22|
|2|7.2574|52.0|8.288136|1.073446|496.0|2.80226|37.85|-122.24|
|3|5.6431|52.0|5.817352|1.073059|558.0|2.547945|37.85|-122.25|
|4|3.8462|52.0|6.281853|1.081081|565.0|2.181467|37.85|-122.25|


#### Set number of models to train

In [17]:
NUM_MODELS = 10
NUM_ESTIMATORS_START = 100

#### Implement function to train and score model

This function takes the data, instantiates a `RandomForestRegressor` model, trains it, and scores it on the test set.

The function returns a tuple:
```
(n_estimators, mse_score)
```

For example:

```
(100, 0.2596393978947323)
```

In [18]:
def train_and_score_model(
    X_train: pd.DataFrame,
    X_test: pd.DataFrame,
    y_train: pd.Series,
    y_test: pd.Series,
    n_estimators: int,
):
    start_time = time.time()  # measure wall time for single model training

    model = RandomForestRegressor(n_estimators=n_estimators, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    score = mean_squared_error(y_test, y_pred)

    time_delta = time.time() - start_time
    print(f"n_estimators={n_estimators}, mse={score:.4f}, took: {time_delta:.2f} seconds")

    return n_estimators, score

#### Implement function that runs **sequential** model training
This function trains `n_models` models sequentially, with an increasing `n_estimators` value (it increases by 5 for each model, so 100, 105, 110, 115, ...).

The function returns a list of tuples:
```
[(n_estimators, mse_score), (n_estimators, mse_score), ...]
```

For example:

```
[(100, 0.2596393978947323), (105, 0.26009608813335056), (110, 0.25980497900048843), (115, 0.2595780807428954), (120, 0.2600438515882204)]
```
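The sequence of `n_estimators` values described above can be checked on its own before any model is trained. This small sketch reuses the `NUM_MODELS` and `NUM_ESTIMATORS_START` constants from the notebook; `estimator_grid` is a hypothetical name used only here.

```python
NUM_MODELS = 10
NUM_ESTIMATORS_START = 100

# One n_estimators value per model, increasing in steps of 5.
estimator_grid = [NUM_ESTIMATORS_START + 5 * j for j in range(NUM_MODELS)]
print(estimator_grid)
```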

In [19]:
def run_sequential(n_models: int):
    # Use comprehension to run n_models in a sequential manner
    return [
        train_and_score_model(
            X_train=X_train,
            X_test=X_test,
            y_train=y_train,
            y_test=y_test,
            n_estimators=NUM_ESTIMATORS_START + 5 * j,
        )
        for j in range(n_models)
    ]

#### Run sequential model training 

In [20]:
%%time

mse_scores = run_sequential(n_models=NUM_MODELS)

n_estimators=100, mse=0.2554, took: 6.48 seconds
n_estimators=105, mse=0.2558, took: 6.77 seconds
n_estimators=110, mse=0.2556, took: 7.09 seconds
n_estimators=115, mse=0.2556, took: 7.41 seconds
n_estimators=120, mse=0.2551, took: 7.74 seconds
n_estimators=125, mse=0.2551, took: 8.06 seconds
n_estimators=130, mse=0.2548, took: 8.38 seconds
n_estimators=135, mse=0.2550, took: 8.72 seconds
n_estimators=140, mse=0.2550, took: 9.06 seconds
n_estimators=145, mse=0.2549, took: 9.36 seconds
CPU times: user 1min 18s, sys: 437 ms, total: 1min 19s
Wall time: 1min 19s


Note: wall time on the M1 MacBook Pro: ~80 seconds

#### Analyse results

In [21]:
best = min(mse_scores, key=itemgetter(1))
print(f"Best model: mse={best[1]:.4f}, n_estimators={best[0]}")

Best model: mse=0.2548, n_estimators=130


Training completed, but it is slow because the models are trained one after another.

### Parallel implementation

Now, you will use Ray to train these models in parallel, utilizing all available resources. The diagram below gives a visual intuition for this setup.

|<img src="images/distributed_timeline.png" width="70%" loading="lazy">|
|:--|
|Sample timeline with ten tasks running in parallel across 4 worker nodes, with minor overhead from the scheduler.|
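To build intuition for such a timeline, you can estimate the ideal wall time when tasks are handed, in order, to whichever worker frees up first. The sketch below is an idealized model, not Ray's scheduler: it ignores scheduling overhead and CPU contention, the helper `greedy_makespan` is a hypothetical name, and the durations are the per-model times from the sequential run above.

```python
import heapq


def greedy_makespan(durations, n_workers: int) -> float:
    """Assign each task to the earliest-free worker and return the total wall time."""
    finish_times = [0.0] * n_workers  # when each worker becomes free
    heapq.heapify(finish_times)
    for duration in durations:
        earliest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest_free + duration)
    return max(finish_times)


# Per-model training times (seconds) reported by the sequential run.
durations = [6.48, 6.77, 7.09, 7.41, 7.74, 8.06, 8.38, 8.72, 9.06, 9.36]

sequential_time = sum(durations)
parallel_time = greedy_makespan(durations, n_workers=4)

print(f"sequential: {sequential_time:.2f}s")
print(f"4 workers (idealized): {parallel_time:.2f}s")
```

With more workers than tasks, the idealized wall time collapses to the duration of the slowest single task.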

#### Put data in the object store

This allows all training tasks to fetch the data objects from the shared-memory object store.

Your data is now available to all remote tasks and actors in the cluster. You use an object ref (object ID) to reference the object when needed.

An example object ref looks like this:

ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000002000000)

|<img src="images/x_y_training_data.png" width="70%" loading="lazy">|
|:--|

Training and test data inserted into the shared memory of the nodes 

The following blog post and presentations, which showcase how to reduce latency and load times using
the object store, are worth reading:

 * [How to Load PyTorch Models 340 Times Faster with Ray](https://medium.com/ibm-data-ai/how-to-load-pytorch-models-340-times-faster-with-ray-8be751a6944c)
 * [Zero-copy model loading with Ray and PyTorch](https://www.anyscale.com/ray-summit-2022/agenda/sessions/172)
 * [Data transfer speed comparison in a distributed ML application: Ray Plasma store vs. S3](https://www.anyscale.com/events/2022/09/29/ray-meetup-community-talks)

In [22]:
X_train_ref = ray.put(X_train)
X_test_ref = ray.put(X_test)
y_train_ref = ray.put(y_train)
y_test_ref = ray.put(y_test)

### Implement function as a Ray task to train and score model

You simply decorate the function with `ray.remote`. Note that the function is similar to the one above.
This Ray task will run in a worker process on a Ray cluster node.

* The function body is the same as in the sequential example, except for the `random_state` seed (201 here, 42 there)
* The added `@ray.remote` decorator specifies that this function will be executed as a remote task in a different process (remotely)

In [23]:
@ray.remote
def train_and_score_model(
    X_train: pd.DataFrame,
    X_test: pd.DataFrame,
    y_train: pd.Series,
    y_test: pd.Series,
    n_estimators: int,
):
    start_time = time.time()  # measure wall time for single model training

    model = RandomForestRegressor(n_estimators=n_estimators, random_state=201)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    score = mean_squared_error(y_test, y_pred)

    time_delta = time.time() - start_time
    print(f"n_estimators={n_estimators}, mse={score:.4f}, took: {time_delta:.2f} seconds")

    return n_estimators, score

#### Implement function that runs **parallel** model training

You modify the `run_sequential()` function to achieve parallel execution.

**Remote Tasks**

* Calling a function with the `.remote` suffix (as in the code above) returns an `ObjectRef` associated with the computation to be done.
* When you run a remote function (Ray task), it immediately returns an `ObjectRef` (object reference). It is a *promise* of future work (similar to Python futures), meaning that the task is delegated to a worker, and the `ObjectRef` is returned while the task executes in the background. This is an asynchronous operation.
* To access the expected output, you call `ray.get()` (as in the code above) on the `ObjectRef` or on a list of `ObjectRef`s. It is a synchronous operation (blocking call). In other words: use `ray.get()` on the returned list of `ObjectRef`s to get the remote objects from the object store.

Operation:

```
ray.get([ObjectRef, ObjectRef, ObjectRef, ...])
```

returns list of `(n_estimators, score)` tuples.

In [24]:
def run_parallel(n_models: int):
    return ray.get(
        [
            train_and_score_model.remote(
                X_train=X_train_ref,
                X_test=X_test_ref,
                y_train=y_train_ref,
                y_test=y_test_ref,
                n_estimators=NUM_ESTIMATORS_START + 5 * j,
            )
            for j in range(n_models)
        ]
    )

#### Run parallel model training 

In [25]:
%%time

mse_scores = run_parallel(n_models=NUM_MODELS)

[2m[36m(train_and_score_model pid=96385)[0m n_estimators=100, mse=0.2548, took: 7.57 seconds
[2m[36m(train_and_score_model pid=96380)[0m n_estimators=105, mse=0.2549, took: 7.88 seconds
[2m[36m(train_and_score_model pid=96382)[0m n_estimators=110, mse=0.2545, took: 8.19 seconds
[2m[36m(train_and_score_model pid=96386)[0m n_estimators=115, mse=0.2542, took: 8.57 seconds
[2m[36m(train_and_score_model pid=96389)[0m n_estimators=120, mse=0.2540, took: 8.95 seconds
[2m[36m(train_and_score_model pid=96383)[0m n_estimators=125, mse=0.2538, took: 9.17 seconds
[2m[36m(train_and_score_model pid=96388)[0m n_estimators=130, mse=0.2538, took: 9.71 seconds
[2m[36m(train_and_score_model pid=96384)[0m n_estimators=135, mse=0.2536, took: 9.92 seconds
[2m[36m(train_and_score_model pid=96387)[0m n_estimators=140, mse=0.2534, took: 10.18 seconds
CPU times: user 48.3 ms, sys: 31.8 ms, total: 80.1 ms
Wall time: 11.2 s
[2m[36m(train_and_score_model pid=96381)[0m n_estimators=145

Notice the **roughly 7x performance gain** (79 s / 11.2 s ≈ 7):

* Parallel: 11s.
* Sequential: 1min 19s (~80s).


*(experiment on the M1 MacBook Pro)*
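The speedup follows directly from the two wall times reported above. A minimal sketch of the arithmetic, using only those two measured values (the variable names are illustrative):

```python
sequential_wall_s = 79.0  # "Wall time: 1min 19s" from the sequential run
parallel_wall_s = 11.2    # "Wall time: 11.2 s" from the parallel run

speedup = sequential_wall_s / parallel_wall_s
print(f"speedup: {speedup:.1f}x")
```

Each individual model took slightly longer to train in the parallel run than in the sequential one, since the tasks compete for the same CPU cores, so the gain is below the number of tasks.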

#### Analyse results

In [26]:
best = min(mse_scores, key=itemgetter(1))
print(f"Best model: mse={best[1]:.4f}, n_estimators={best[0]}")

Best model: mse=0.2532, n_estimators=145


In [27]:
ray.shutdown()

### Exercises

1. Create a list of object references containing integers returned by `ray.put(x)`
   * Use a comprehension to construct this list
   * Write a Ray task, `my_function.remote(list_of_object_refs)`, that returns the sum of the list
   * Use `ray.get(...)` to print the sum
2. Create large lists and Python dictionaries, and put them in the object store
   * Write a Ray task to process them
3. Change the number of estimators and models, and run both the sequential and the parallel versions

### Homework

1. Read references to get advanced deep dives and more about Ray objects
2. [Serialization](https://docs.ray.io/en/latest/ray-core/objects/serialization.html)
3. [Memory Management](https://docs.ray.io/en/latest/ray-core/objects/memory-management.html)
4. [Object Spilling](https://docs.ray.io/en/latest/ray-core/objects/object-spilling.html)
5. [Fault Tolerance](https://docs.ray.io/en/latest/ray-core/objects/fault-tolerance.html)

### Next Step

We covered how to use Ray `tasks`, `ray.get()`, and `ray.put()`, and how the distributed remote object store works. Let's move on to the [Ray Actors lesson](ex_03_remote_classes.ipynb).

## References

 * [Ray Architecture Reference](https://docs.google.com/document/d/1lAy0Owi-vPz2jEqBSaHNQcy2IBSDEHyXNOQZlGuj93c/preview#)
 * [Ray Internals: A peek at ray.get](https://www.youtube.com/watch?v=a1kNnQu6vGw)
 * [Ray Internals: Object management with Ownership Model](https://www.youtube.com/watch?v=1oSBxTayfJc)
 * [Deep Dive into Ray scheduling Policies](https://www.youtube.com/watch?v=EJUYKXWGzfI)
 * [Redis in Ray: Past and future](https://www.anyscale.com/blog/redis-in-ray-past-and-future)
 * [StackOverFlow: How Ray Shares Data](https://stackoverflow.com/questions/58082023/how-exactly-does-ray-share-data-to-workers/71500979#71500979)
 
