# A Guided Tour of Ray Core: Remote Stateful Classes

© 2019-2022, Anyscale. All Rights Reserved

[*Remote Classes*](https://docs.ray.io/en/latest/walkthrough.html#remote-classes-actors)
involve using a `@ray.remote` decorator on a class. 

This implements the [*actor*](https://patterns.eecs.berkeley.edu/?page_id=258) pattern, which has two key properties: actors are *stateful*, and they communicate through *message-passing semantics*.

Actors are extremely powerful. They allow you to take a Python class and instantiate it as a stateful microservice that can be queried from other actors and tasks and even other Python applications.

When you instantiate a remote actor, Ray creates a dedicated worker process on a worker node. That process becomes the actor process and runs every method called on the actor. Other Ray tasks and actors can invoke its methods on that process, mutating its internal state. Actors can also be terminated manually if needed. The example code below shows all of these cases.

<img src="images/ray_worker_actor_1.png" height="40%" width="70%">
<img src="images/ray_worker_actor_2.png" height="40%" width="70%">
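To make "stateful" and "message-passing semantics" concrete, here is a small single-process analogy built only on the Python standard library. It is not how Ray implements actors, and `MailboxCounter`, `send`, and the message names are hypothetical. One thread owns the state and handles one message at a time, so concurrent callers never touch the counter directly.

```python
import queue
import threading


class MailboxCounter:
    """Toy actor: a single thread owns the state and processes one message at a time."""

    def __init__(self):
        self._count = 0                      # state, only touched by the actor thread
        self._mailbox = queue.Queue()        # incoming messages
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            method, reply = self._mailbox.get()
            if method == "increment":
                self._count += 1
                reply.put(None)
            elif method == "get":
                reply.put(self._count)
            elif method == "stop":
                reply.put(None)
                break

    def send(self, method):
        """Enqueue a message and return a reply queue (a stand-in for a future)."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((method, reply))
        return reply


def caller(actor, n):
    # Each caller sends n increment messages and waits for each reply.
    for _ in range(n):
        actor.send("increment").get()


counter = MailboxCounter()
callers = [threading.Thread(target=caller, args=(counter, 100)) for _ in range(4)]
for t in callers:
    t.start()
for t in callers:
    t.join()

total = counter.send("get").get()
counter.send("stop").get()
print(total)
```

Because every update goes through the mailbox, no lock around `_count` is needed. This is the same reason a Ray actor can safely mutate its own state while many tasks call it.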

---

First, let's start Ray…

In [1]:
import logging
import time
from pprint import pprint
import ray
import random
from random import randint
import numpy as np

In [2]:
if ray.is_initialized():
    ray.shutdown()
context = ray.init(logging_level=logging.ERROR)
pprint(context)

RayContext(dashboard_url='127.0.0.1:8265', python_version='3.8.12', ray_version='2.0.0.dev0', ray_commit='{{RAY_COMMIT_SHA}}', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-04-06_16-46-48_981971_12182/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-04-06_16-46-48_981971_12182/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2022-04-06_16-46-48_981971_12182', 'metrics_export_port': 61446, 'gcs_address': '127.0.0.1:61726', 'address': '127.0.0.1:61726', 'node_id': '7bdf7ffb4858d3bf84e0985fb519c3cc3fd0a36f2c46544c4f2823e6'})


In [4]:
print(f"Dashboard url: http://{context.address_info['webui_url']}")

Dashboard url: http://127.0.0.1:8265


## 3. Remote Class as a Stateful Actor Pattern

To start, we'll define a class and use the decorator: `@ray.remote`

Let's take a Python class and convert it into a remote actor that serves as a parameter server. This is a common example in machine learning: a central parameter server applies gradient updates sent by worker processes that each compute their own gradients.

<img src="https://terrytangyuan.github.io/img/inblog/mpi-operator-1.png" width="60%" height="30%">

In [5]:
@ray.remote
class ParameterSever:
    def __init__(self):
        # Initialize the parameters to zero
        self.params = np.zeros(10)

    def get_params(self):
        # Return the current parameters
        return self.params

    def update_params(self, grad):
        # Apply a gradient step: subtract the gradient from the parameters
        self.params -= grad

Define a worker as a remote function (a Ray task) that runs in a worker process. This could be a machine learning objective function that computes gradients and sends them to the parameter server.

In [6]:
@ray.remote
def worker(ps):
    # Iterate over a fixed number of steps
    for i in range(25):
        time.sleep(1.5)  # this could be your loss function computing gradients
        grad = np.ones(10)
        # send the gradient to the parameter server, which updates its parameters
        ps.update_params.remote(grad)

Start the parameter server actor. It will be scheduled as a process on a Ray worker node. Call `ActorClass.remote(...)` to create an actor instance of that type.

In [7]:
param_server = ParameterSever.remote()
param_server

Actor(ParameterSever, 938a24fad57fac3ca9bbd5ce01000000)

Let's get the initial values of the parameter server

In [8]:
print(f"Initial params: {ray.get(param_server.get_params.remote())}")

Initial params: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


### Create Workers Computing Gradients
Let's create three separate workers as our machine learning tasks that compute gradients.
These will be scheduled as tasks on a Ray cluster.

You can use a list comprehension. Quite Pythonic!

If we need more workers to scale, we can always increase the count.

**Note**: We are passing the `param_server` actor handle as an argument to the remote
worker task. Ray serializes the handle so that the task can call the actor's methods.

In [9]:
[worker.remote(param_server) for _ in range(3)]

[ObjectRef(c2668a65bda616c1ffffffffffffffffffffffff0100000001000000),
 ObjectRef(32d950ec0ccf9d2affffffffffffffffffffffff0100000001000000),
 ObjectRef(e0dc174c83599034ffffffffffffffffffffffff0100000001000000)]

Now, let's loop and query the parameter server
while the workers run independently and update the parameters.

In [10]:
for _i in range(20):
    print(f"Updated params: {ray.get(param_server.get_params.remote())}")
    time.sleep(1)

Updated params: [-27. -27. -27. -27. -27. -27. -27. -27. -27. -27.]
Updated params: [-30. -30. -30. -30. -30. -30. -30. -30. -30. -30.]
Updated params: [-33. -33. -33. -33. -33. -33. -33. -33. -33. -33.]
Updated params: [-33. -33. -33. -33. -33. -33. -33. -33. -33. -33.]
Updated params: [-36. -36. -36. -36. -36. -36. -36. -36. -36. -36.]
Updated params: [-39. -39. -39. -39. -39. -39. -39. -39. -39. -39.]
Updated params: [-39. -39. -39. -39. -39. -39. -39. -39. -39. -39.]
Updated params: [-42. -42. -42. -42. -42. -42. -42. -42. -42. -42.]
Updated params: [-45. -45. -45. -45. -45. -45. -45. -45. -45. -45.]
Updated params: [-45. -45. -45. -45. -45. -45. -45. -45. -45. -45.]
Updated params: [-48. -48. -48. -48. -48. -48. -48. -48. -48. -48.]
Updated params: [-51. -51. -51. -51. -51. -51. -51. -51. -51. -51.]
Updated params: [-51. -51. -51. -51. -51. -51. -51. -51. -51. -51.]
Updated params: [-54. -54. -54. -54. -54. -54. -54. -54. -54. -54.]
Updated params: [-57. -57. -57. -57. -57. -57. -
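
The printed values drop in steps of 3 because each of the three workers subtracts a vector of ones on every iteration. As a quick sanity check of the arithmetic, the sketch below replays the same updates sequentially in plain Python, without Ray or NumPy. `expected_params` is a hypothetical helper name. It computes the value every parameter should reach once all workers have finished their 25 iterations.

```python
def expected_params(num_workers, iterations, size=10, grad_value=1.0):
    """Replay every update_params call sequentially on a plain list."""
    params = [0.0] * size
    for _ in range(num_workers * iterations):
        # Same rule as ParameterSever.update_params: params -= grad
        params = [p - grad_value for p in params]
    return params


final = expected_params(num_workers=3, iterations=25)
print(final[0])  # -75.0
```

The order in which the workers' updates arrive does not matter here, because subtraction of the same gradient commutes. Only the total number of updates (3 x 25 = 75) determines the final value.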

## Tree of Actors Pattern

This is a common pattern in Ray libraries such as [Ray Tune](https://docs.ray.io/en/latest/tune/index.html), [Ray Train](https://docs.ray.io/en/latest/train/train.html), and [RLlib](https://docs.ray.io/en/latest/rllib/index.html), where it is used to train models in parallel or to conduct distributed hyperparameter optimization (HPO).

In the tree-of-actors pattern, a collection of worker actors is managed by a supervisor actor. For example, you may want to train multiple models at the same time while still being able to checkpoint or inspect their state.

<img src="https://docs.ray.io/en/latest/_images/tree-of-actors.svg" width="50%" height="30%">

Let's implement a simple example to illustrate this pattern.

In [19]:
STATES = ["RUNNING", "PENDING", "DONE"]

class Model:

    def __init__(self, m:str):
        self._model = m

    def train(self):
        # do some training work here
        time.sleep(1)

# Factory function to return an instance of a model type
def model_factory(m: str):
    return Model(m)

### Create a Worker Actor

In [20]:
@ray.remote
class Worker(object):
    def __init__(self, m: str):
        # type of a model: lr, cl, or nn
        self._model = m

    def state(self) -> str:
        # Simulate a status report by picking a state at random
        return random.choice(STATES)

    # Do the work for this model
    def work(self) -> None:
        model_factory(self._model).train()

### Create Supervisor Actor 

In [21]:
@ray.remote
class Supervisor:
    def __init__(self):
        # Create three Actor Workers, each by its unique model type
        self.workers = [Worker.remote(name) for name in ["lr", "cl", "nn"]]
                        
    def work(self):
        # do the work 
        [w.work.remote() for w in self.workers]
        
    def terminate(self):
        [ray.kill(w) for w in self.workers]
        
    def state(self):
        return ray.get([w.state.remote() for w in self.workers])

Create an actor instance of the supervisor and launch its workers.

In [22]:
sup = Supervisor.remote()

# Launch remote actors as workers
sup.work.remote()

ObjectRef(ae46b8beecd25f3a8eb7de347541580cfd50348c0100000001000000)

### Look at the Ray Dashboard

You should see actors running as processes on the worker nodes:
 * Parameter Server
 * Supervisor
 * Workers

Also, click on the `Logical View` to view more metrics and data on individual Ray actors.

In [23]:
# check their status
while True:
    # Fetch the states of all its workers
    states = ray.get(sup.state.remote())
    print(states)
    # check if all are DONE
    result = all('DONE' == e for e in states)
    if result:
        # Note: Actor processes will be terminated automatically when the initial actor handle goes out of scope in Python. 
        # If we create an actor with actor_handle = ActorClass.remote(), then when actor_handle goes out of scope and is destructed, 
        # the actor process will be terminated. Note that this only applies to the original actor handle created for the actor 
        # and not to subsequent actor handles created by passing the actor handle to other tasks.
        
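        # kill all of the supervisor's workers manually, only for illustration and demo;
        # ray.get() waits for terminate() to finish before the supervisor itself is killed
        ray.get(sup.terminate.remote())

        # kill the supervisor manually, only for illustration and demo
        ray.kill(sup)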
        break

['RUNNING', 'DONE', 'PENDING']
['DONE', 'DONE', 'PENDING']
['PENDING', 'RUNNING', 'RUNNING']
['PENDING', 'PENDING', 'RUNNING']
['PENDING', 'PENDING', 'DONE']
['RUNNING', 'PENDING', 'PENDING']
['RUNNING', 'RUNNING', 'DONE']
['DONE', 'RUNNING', 'RUNNING']
['PENDING', 'DONE', 'DONE']
['RUNNING', 'RUNNING', 'DONE']
['RUNNING', 'PENDING', 'DONE']
['DONE', 'PENDING', 'PENDING']
['PENDING', 'RUNNING', 'RUNNING']
['PENDING', 'PENDING', 'PENDING']
['DONE', 'DONE', 'RUNNING']
['PENDING', 'RUNNING', 'RUNNING']
['PENDING', 'DONE', 'DONE']
['RUNNING', 'RUNNING', 'RUNNING']
['PENDING', 'RUNNING', 'PENDING']
['DONE', 'RUNNING', 'RUNNING']
['DONE', 'DONE', 'PENDING']
['PENDING', 'DONE', 'PENDING']
['RUNNING', 'PENDING', 'RUNNING']
['DONE', 'PENDING', 'RUNNING']
['DONE', 'DONE', 'PENDING']
['PENDING', 'RUNNING', 'PENDING']
['PENDING', 'PENDING', 'PENDING']
['RUNNING', 'RUNNING', 'RUNNING']
['PENDING', 'DONE', 'DONE']
['RUNNING', 'PENDING', 'PENDING']
['DONE', 'DONE', 'PENDING']
['RUNNING', 'DONE', 'RUN
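
The output is long because `Worker.state()` picks a state uniformly at random. The chance that all three workers report `DONE` in the same poll is (1/3)^3 = 1/27, so the loop needs about 27 polls on average. The following plain-Python sketch simulates that polling loop without Ray. `polls_until_all_done` is a hypothetical helper, and the seed is arbitrary.

```python
import random

STATES = ["RUNNING", "PENDING", "DONE"]


def polls_until_all_done(num_workers, rng):
    """Count how many polls it takes until every simulated worker reports DONE."""
    polls = 0
    while True:
        polls += 1
        states = [rng.choice(STATES) for _ in range(num_workers)]
        if all(s == "DONE" for s in states):
            return polls


rng = random.Random(0)  # seeded only to make the run repeatable
trials = [polls_until_all_done(3, rng) for _ in range(2000)]
average = sum(trials) / len(trials)
print(f"average polls: {average:.1f} (theory: {len(STATES) ** 3})")
```

In a real supervisor the workers would report their actual progress, so the loop would end as soon as training finished rather than by chance.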

### Passing Actor handles to Ray Tasks

You can pass actor handles to remote Ray tasks, which can then change the actor's
state. The `MessageActor` below stores or clears messages, depending on which method
is invoked.

In [26]:
@ray.remote
class MessageActor(object):
    def __init__(self):
        # Keep the state of the messages
        self.messages = []
    
    def add_message(self, message):
        self.messages.append(message)
    
    # return the stored messages and reset the list
    def get_and_clear_messages(self):
        messages = self.messages
        self.messages = []
        return messages
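
The get-and-clear logic can be checked locally before it is turned into an actor, since the `@ray.remote` decorator changes how the class is instantiated and called, not what its methods do. Below is a sketch of the same logic as a plain class. `LocalMessageBuffer` is a hypothetical name, and no Ray is involved.

```python
class LocalMessageBuffer:
    """Plain-Python version of the MessageActor logic, for local testing."""

    def __init__(self):
        self.messages = []

    def add_message(self, message):
        self.messages.append(message)

    def get_and_clear_messages(self):
        # Hand back the current list and start a fresh one in a single step
        messages, self.messages = self.messages, []
        return messages


buffer = LocalMessageBuffer()
buffer.add_message("a")
buffer.add_message("b")
first = buffer.get_and_clear_messages()
second = buffer.get_and_clear_messages()
print(first, second)  # ['a', 'b'] []
```

Swapping in a new list, rather than clearing the old one in place, means the caller receives a list that later `add_message` calls will not modify.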

Define a remote function that loops and pushes messages to the actor. It receives the `MessageActor` handle as an argument.

In [27]:
@ray.remote
def worker(message_actor, j):
    for i in range(10):
        time.sleep(1)
        message_actor.add_message.remote(
            f"Message {i} from worker {j}.")


Create a message actor.

In [28]:
message_actor = MessageActor.remote()

Start 3 tasks that push messages to the actor.

In [29]:
[worker.remote(message_actor, j) for j in range(3)]

[ObjectRef(9e7872a82e7456d9ffffffffffffffffffffffff0100000001000000),
 ObjectRef(cd25e647a728676bffffffffffffffffffffffff0100000001000000),
 ObjectRef(57f023b5f2c83c93ffffffffffffffffffffffff0100000001000000)]

Periodically get the messages and print them.

In [30]:
for _ in range(10):
    new_messages = ray.get(message_actor.get_and_clear_messages.remote())
    print("New messages\n:", new_messages)
    time.sleep(1)

New messages
: ['Message 0 from worker 0.', 'Message 0 from worker 1.', 'Message 0 from worker 2.']
New messages
: ['Message 1 from worker 0.', 'Message 1 from worker 1.', 'Message 1 from worker 2.']
New messages
: ['Message 2 from worker 0.', 'Message 2 from worker 1.', 'Message 2 from worker 2.']
New messages
: ['Message 3 from worker 0.', 'Message 3 from worker 1.', 'Message 3 from worker 2.']
New messages
: ['Message 4 from worker 1.', 'Message 4 from worker 2.', 'Message 4 from worker 0.']
New messages
: ['Message 5 from worker 0.', 'Message 5 from worker 1.', 'Message 5 from worker 2.']
New messages
: ['Message 6 from worker 0.', 'Message 6 from worker 1.', 'Message 6 from worker 2.']
New messages
: ['Message 7 from worker 0.', 'Message 7 from worker 1.', 'Message 7 from worker 2.']
New messages
: ['Message 8 from worker 0.', 'Message 8 from worker 1.', 'Message 8 from worker 2.']
New messages
: ['Message 9 from worker 0.', 'Message 9 from worker 1.', 'Message 9 from worker 2.']


Finally, shut down Ray.

In [31]:
ray.shutdown()

### Exercises

1. Add a remote class, such as a logging actor, that keeps state by logging info (perhaps only in memory).
2. Implement methods that alter the state.
3. Instantiate it and call its methods.

### Solution hints

This solution is just a structural hint. There are a few missing bits:
 * Instantiation of `LoggingActor`
 * Use of `ray.get()` to fetch the values from the object store

In [None]:
from collections import defaultdict
@ray.remote
class LoggingActor(object):
    def __init__(self):
        self.logs = defaultdict(list)
    
    def log(self, index, message):
        self.logs[index].append(message)
    
    def get_logs(self):
        return dict(self.logs)
    
@ray.remote
def run_experiment(experiment_index, logging_actor):
    for i in range(60):
        time.sleep(1)
        # Push a logging message to the actor.
        logging_actor.log.remote(experiment_index, f"On iteration {i}")

In [None]:
# logging_actor = # TODO Instantiate Actor here
experiment_ids = []
for i in range(3):
    experiment_ids.append(run_experiment.remote(i, logging_actor))

In [None]:
logs = logging_actor.get_logs.remote()
# TODO use ray.get() to fetch the logs

### Homework
 * Read the references below

---
## References

 * [Writing your First Distributed Python Application with Ray](https://www.anyscale.com/blog/writing-your-first-distributed-python-application-with-ray)
 * [Using and Programming with Actors](https://docs.ray.io/en/latest/actors.html)
 * [Advanced Patterns and Anti-Patterns in Ray](https://docs.ray.io/en/latest/ray-design-patterns/index.html)