# A Guided Tour of Ray Core: Remote Objects

© 2019-2022, Anyscale. All Rights Reserved

📖 [Back to Table of Contents](./ex_00_tutorial_overview.ipynb)<br>
➡ [Next notebook](./ex_03_remote_classes.ipynb) <br>
⬅️ [Previous notebook](./ex_01_remote_funcs.ipynb) <br>

### Overview


In Ray, tasks and actors create and compute on objects. We refer to these objects as remote objects because they can be stored anywhere in a Ray cluster, and we use object refs to refer to them. Remote objects are cached in Ray’s distributed shared-memory object store, and there is one object store per node in the cluster. In the cluster setting, a remote object can live on one or many nodes, independent of who holds the object ref(s).

[*Remote Objects*](https://docs.ray.io/en/latest/walkthrough.html#objects-in-ray)
reside in a distributed [*shared-memory object store*](https://en.wikipedia.org/wiki/Shared_memory).

Objects are immutable and can be accessed from anywhere on the cluster, as they are stored in the cluster shared memory. An object ref is essentially a pointer or a unique ID that can be used to refer to a remote object without seeing its value. If you’re familiar with futures in Python, Java or Scala, Ray object refs are conceptually similar.
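To make the futures analogy concrete, here is a minimal sketch that uses only Python's standard-library `concurrent.futures`, not Ray. The function `slow_square` is a hypothetical stand-in for remote work; the point is only that a future, like an object ref, is a handle you receive immediately and resolve later.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_square(n):
    """Hypothetical stand-in for remote work: sleep briefly, then square n."""
    time.sleep(0.1)
    return n * n


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns a Future right away, much as f.remote() returns an ObjectRef.
    future = pool.submit(slow_square, 7)
    # result() blocks until the value is ready, much as ray.get(obj_ref) does.
    value = future.result()

print(value)
```

The analogy is conceptual only: a Ray object ref can also be passed to other tasks and resolved on another node, which a local `Future` cannot do.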


In general, small objects (**<=100KB**) are stored in their owner’s **in-process store**, while large objects are stored in the **distributed object store**. This design reduces the memory footprint and resolution time for each object. Note that in the latter case, a placeholder object is stored in the in-process store to indicate that the object has been promoted to shared memory.
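The size rule above can be sketched as a rough estimate in plain Python. This is not how Ray decides internally: the helper `likely_store` is hypothetical, the limit is assumed to be 100 * 1024 bytes, and the standard-library `pickle` size is only an approximation of what Ray's own serializer would produce.

```python
import pickle

# Assumption: "100KB" is read as 100 * 1024 bytes; the real limit is a Ray
# configuration detail and may differ between versions.
INLINE_LIMIT_BYTES = 100 * 1024


def likely_store(obj):
    """Hypothetical helper: guess which store an object of this size would use."""
    size = len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    if size <= INLINE_LIMIT_BYTES:
        return "in-process store"
    return "distributed object store"


small = [23, 42, 93]        # a few bytes once pickled
large = bytes(1_000_000)    # about 1 MB once pickled

print(likely_store(small))
print(likely_store(large))
```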

If shared memory runs out of space, objects are spilled to disk. The main point, though, is that
shared memory allows _zero-copy_ access for processes on the same worker node.

<img src="images/shared_memory_plasma_store.png"  height="40%" width="65%">

### Learning objectives

In this tutorial, you will learn about:
 * Ray object refs as a futures pattern
 * Ray's distributed Plasma object store
 * How objects are stored in and fetched from the distributed shared object store
     * Examples that use `ray.get` and `ray.put`


### Object references as futures pattern

First, let's start Ray…

In [4]:
import logging
import numpy as np
import ray

In [5]:
if ray.is_initialized():
    ray.shutdown()
ray.init(logging_level=logging.ERROR)

Python version: 3.8.13
Ray version: 2.0.0rc1
Dashboard: http://127.0.0.1:8266


### Remote Objects example

To start, we'll create some Python objects and put them in shared memory.

In [6]:
num_list = [23, 42, 93]

# ray.put returns an ObjectRef
obj_ref = ray.put(num_list)
obj_ref

ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000002000000)

Then retrieve the value of this object reference. 

Small objects are resolved by copying them directly from the _owner’s_ **in-process store**. For example, if the owner calls `ray.get`, the system looks up and deserializes the value from the local **in-process store**. Objects larger than 100KB are stored in the distributed object store.

In [7]:
val = ray.get(obj_ref)
val

[23, 42, 93]

You can gather the values of multiple object references in parallel using a list comprehension:
 1. Each value is put in the object store and its `ObjectRef` is returned immediately
 2. The comprehension builds a list of `ObjectRef`s, one for each element in the loop
 3. A final `ray.get(list_obj_refs)` is invoked to fetch the list of values

In [8]:
results = ray.get([ray.put(i) for i in range(10)])
results

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

### Passing Objects by Reference

Ray object references can be freely passed around a Ray application. This means that they can be passed as arguments to tasks, actor methods, and even stored in other objects. Objects are tracked via distributed reference counting, and their data is automatically freed once all references to the object are deleted.

In [13]:
# Define a Task
@ray.remote
def echo(x):
    print(f"current value of argument x: {x}")

In [14]:
# Define some variables
x = list(range(10))
obj_ref_x = ray.put(x)
y = 25

### Pass-by-value

Send the object to a task as a top-level argument.
The object will be *de-referenced* automatically, so the task only sees its value.

In [15]:
# send y as value argument
echo.remote(y)

ObjectRef(c2668a65bda616c1ffffffffffffffffffffffff0100000001000000)

(echo pid=63965) current value of argument x: 25


In [16]:
# Send an object reference as a top-level argument;
# note that Ray dereferences it before echo runs
echo.remote(obj_ref_x)

ObjectRef(32d950ec0ccf9d2affffffffffffffffffffffff0100000001000000)

(echo pid=63965) current value of argument x: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


### Pass-by-reference

When a parameter is passed inside a Python list or any other data structure,
the *object ref is preserved*, meaning it is not *de-referenced*. When an object is passed by reference, its data is not transferred to the worker until `ray.get()` is called on the reference.

You can pass by reference in two ways:
 1. inside a dictionary: `.remote({"obj": obj_ref_x})`
 2. inside a list of object refs: `.remote([obj_ref_x])`

In [17]:
x = list(range(20))
obj_ref_x = ray.put(x)
# Ray will not automatically de-reference a ref nested in a dictionary
echo.remote({"obj": obj_ref_x})

ObjectRef(e0dc174c83599034ffffffffffffffffffffffff0100000001000000)

(echo pid=63965) current value of argument x: {'obj': ObjectRef(00ffffffffffffffffffffffffffffffffffffff010000000f000000)}


In [18]:
echo.remote([obj_ref_x])

ObjectRef(f4402ec78d3a2607ffffffffffffffffffffffff0100000001000000)

(echo pid=63965) current value of argument x: [ObjectRef(00ffffffffffffffffffffffffffffffffffffff010000000f000000)]


### Any questions?

## What about long running tasks?

Sometimes a task runs far longer than expected, perhaps because it is blocked on accessing a variable in the object store. How do you stop waiting for it? Use a timeout on `ray.get`! Note that the timeout only ends the wait on the caller's side; the task itself keeps running.

Now let's set a timeout to return early from an attempted access of a remote object that is blocking for too long...

In [19]:
import time

@ray.remote
def long_running_function():
    time.sleep(10)
    return 42

You can control how long you want to wait for the task to finish:

In [20]:
%%time

from ray.exceptions import GetTimeoutError

obj_ref = long_running_function.remote()

try:
    ray.get(obj_ref, timeout=6)
except GetTimeoutError:
    print("`get` timed out")

`get` timed out
CPU times: user 29.9 ms, sys: 16.3 ms, total: 46.2 ms
Wall time: 6.04 s


In [22]:
ray.shutdown()

### Exercises

1. Create a list of object references containing integers returned by `ray.put(x)`.
   * Use a comprehension to construct this list.
   * Write a Ray task, `my_function.remote(list_of_object_refs)`, that returns the sum of the list.
   * Use `ray.get(...)` to print the sum.
2. Create large lists and Python dictionaries and put them in the object store.
   * Write a Ray task to process them.

### Homework

1. Read the references for advanced deep dives and more about Ray objects
2. [Serialization](https://docs.ray.io/en/latest/ray-core/objects/serialization.html)
3. [Memory Management](https://docs.ray.io/en/latest/ray-core/objects/memory-management.html)
4. [Object Spilling](https://docs.ray.io/en/latest/ray-core/objects/object-spilling.html)
5. [Fault Tolerance](https://docs.ray.io/en/latest/ray-core/objects/fault-tolerance.html)

### Next Step

We covered how to use Ray tasks, `ray.get()`, and `ray.put()`, and how the distributed remote object store works. Let's move on to the [Ray Actors lesson](ex_03_remote_classes.ipynb).

## References

 * [Ray Architecture Reference](https://docs.google.com/document/d/1lAy0Owi-vPz2jEqBSaHNQcy2IBSDEHyXNOQZlGuj93c/preview#)
 * [Ray Internals: A peek at ray.get](https://www.youtube.com/watch?v=a1kNnQu6vGw)
 * [Ray Internals: Object management with Ownership Model](https://www.youtube.com/watch?v=1oSBxTayfJc)
 * [Deep Dive into Ray scheduling Policies](https://www.youtube.com/watch?v=EJUYKXWGzfI)
 * [Redis in Ray: Past and future](https://www.anyscale.com/blog/redis-in-ray-past-and-future)
 * [StackOverFlow: How Ray Shares Data](https://stackoverflow.com/questions/58082023/how-exactly-does-ray-share-data-to-workers/71500979#71500979)
 
