# An Introduction to Ray: A quick start with Ray concepts and parallel primitives 
![](images/ray_header_logo.png)

## What is Ray?
Ray is an open source, universal framework for writing scalable and distributed applications. Originally conceived at [Berkeley RISELab](https://rise.cs.berkeley.edu/) in 2017 by the founders of [Anyscale](https://www.anyscale.com/), Ray now has a vibrant community and a growing ecosystem of machine learning libraries that leverage Ray's simple primitives for distributed computing and programming patterns. Think of Ray as the layered architecture shown below. At the center is Ray Core, providing the distributed capabilities, fault tolerance, autoscaling, and APIs in Python, Java, and C++ (currently experimental). On top of Ray Core sit native libraries and several third-party libraries for running machine learning workloads.

 * **Tune**: Scalable Hyperparameter Tuning

 * **RLlib**: Scalable Reinforcement Learning

 * **RaySGD**: Distributed Training Wrappers

 * **Ray Serve**: Scalable and Programmable Serving

 * **Datasets**: Distributed Arrow on Ray (preview)
 
Ray can run locally on a single host or on any of the cloud providers.

![](images/ray_ecosystem.png)

In this quick-start tutorial, we focus primarily on Ray Core, the Python framework for writing simple distributed applications, starting on your local host (using a single core). 
With a couple of Python decorators, you can convert your program into a distributed application that takes advantage of all the cores on your machine. 
Likewise, with equally simple changes to Ray's [cluster configuration](https://docs.ray.io/en/master/configure.html), you can just as easily run your application on your cloud provider of choice.

## Why Use Ray?

While numerous examples showcase Python's parallel-processing modules such as `multiprocessing`, that module is largely limited to a single machine and falls short of the demands of today's machine learning (ML) workloads, deep learning, and large-scale model training. Ray was created to address these demands. It can run the same code on more than one machine, lets applications leverage quick access to distributed shared memory across a cluster, implements the actor model (as popularized by Akka) to perform actions on or manipulate shared objects, and allows an ML engineer to develop distributed applications without systems expertise, using simple and intuitive Ray primitives. Ray, along with its native libraries and a growing ecosystem of third-party integrations, addresses these requirements at scale and with fault tolerance. 

Before we dive into how to employ these primitives, let's cover some Ray concepts about parallel programming.

## Ray's Concept of Parallel Programming: Remote Functions

A single unit of execution in Ray is a _task_. A _task_ can be a Python function that runs on a single core, on multiple cores of a single host, or on multiple cores across a remote cluster. What selects the mode (local or remote) is a simple Python decorator, `@ray.remote`, applied to your Python function. Ray executes _tasks_ asynchronously: calling a remote function places the task on Ray's execution 
queue and immediately returns a future, similar in spirit to a [Python future](https://docs.python.org/3/library/asyncio-future.html). At any later point, when the task is finished, you can access this future's value using Ray's `ray.get` primitive. We will see that in a minute.
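
To make the idea of a future concrete before turning to Ray, here is a minimal sketch using Python's standard-library `concurrent.futures`. It is an analogy only, not Ray's API: `submit()` plays the role of `func.remote()` and `result()` the role of `ray.get()`. The function `square` is a hypothetical stand-in for real work.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    # Hypothetical stand-in for a more expensive computation
    return x * x


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately with a Future; the work runs in the background
    future = pool.submit(square, 7)
    # result() blocks until the value is available
    value = future.result()

print(value)
```

The key point is the separation between launching work and collecting its result, which is the same shape Ray tasks follow.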

To demonstrate Ray's merits and introduce the concept of parallelism, we will first run a compute-bound task that computes the square root of the sum of the squares over a range of numbers, and then run the 
same function as a Ray task on each core.

#### Import the necessary libraries

In [69]:
import os
import time
import multiprocessing
import logging
import math
import ray

To compare Ray tasks with non-Ray tasks, let's define some functions. We'll use a simple function to illustrate how easy it is to use Ray's primitives to parallelize.
In reality, these functions could be far more complex. For example, in Ray Tune, such functions could be parallelized to do hyperparameter tuning for machine learning. For this
short quick start, we want to demonstrate the simplicity with which you can run Ray in local mode on your host and, with a single line of code, convert a function to run in parallel.

In [70]:
# Given a range of numbers, return the square root of the sum of their squares.
# For large ranges this is compute intensive and benefits from parallelization.
def none_ray_task(a: int, b: int) -> float:
    return math.sqrt(sum(x * x for x in range(a, b)))
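
As a sanity check on this computation, the sum of squares has a well-known closed form, so the brute-force result can be verified without looping. The sketch below is an addition to the tutorial: the names `brute_force`, `closed_form`, and `squares_below` are hypothetical, and it assumes `0 <= a <= b`.

```python
import math


def brute_force(a: int, b: int) -> float:
    # Same computation as the tutorial's function: sqrt of the sum of x*x for x in [a, b)
    return math.sqrt(sum(x * x for x in range(a, b)))


def closed_form(a: int, b: int) -> float:
    def squares_below(n: int) -> int:
        # 0^2 + 1^2 + ... + (n-1)^2 = (n-1) * n * (2n-1) / 6, always an integer
        return (n - 1) * n * (2 * n - 1) // 6

    # Sum over [a, b) is the sum below b minus the sum below a
    return math.sqrt(squares_below(b) - squares_below(a))


small_brute = brute_force(1, 1000)
small_closed = closed_form(1, 1000)
large_closed = closed_form(1, 10_000_000)
print(small_brute, small_closed)
print(f"{large_closed:.2f}")
```

The closed form returns instantly even for the large range, which makes it a handy reference value when comparing the local and Ray runs.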

Create a similar function as a Ray task by simply adding the `@ray.remote` decorator.

In [71]:
@ray.remote
def ray_task(a: int, b: int) -> float:
    # get the process pid on which this task is scheduled
    print("pid={}; ppid={}".format(os.getpid(), os.getppid()))
    return math.sqrt(sum(x * x for x in range(a, b)))

### Get the number of cores on this machine

In [72]:
n_cores = multiprocessing.cpu_count()
print("Number of cores: {}".format(n_cores))

Number of cores: 12


### Execute as the local task on a single core

In [161]:
start = time.time()
for _ in range(n_cores):
    print("Local return valued: {:.2f}".format(none_ray_task(1, 10000000)))
print("Time elapsed for non Ray task: {:.2f}".format(time.time() - start))

Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Local return valued: 18257417214.20
Time elapsed for non Ray task: 10.31


### Initialize the local instance of Ray with appropriate configurations. 

These include logging options and the number of cores to use. In local mode, `ray.init(...)` also launches a dashboard, accessible at http://127.0.0.1:8265.

In [75]:
ray.init(ignore_reinit_error=True, logging_level=logging.ERROR,
         num_cpus=n_cores)

### Execute as Remote Ray task on multiple cores

To execute as a Ray task, we use the following Python syntax:
 * Call `func.remote(args...)`. This immediately returns an object reference (a *future*) and creates a task that will be executed on a worker process.
 * Call `ray.get` on the returned object reference to retrieve the computed value. Computed values are stored in Ray's in-memory object store. 
 
 Note that we do not have to poll to check whether the Ray task has finished: `ray.get` blocks until the result is available. Be aware that the loop below
 calls `ray.get` on each task before submitting the next one, so the tasks run one after another; to run them concurrently, submit all the tasks first and then call `ray.get` on the list of references. Either way, executing Python functions as Ray tasks is easy and intuitive while being quite
 Pythonic in invocation, behavior, and composability or chaining of method calls.

In [66]:
start = time.time()
for _ in range(n_cores):
    print("Remote Ray task returned value: {:.2f}".format(ray.get(ray_task.remote(1, 10000000))))
print("Time elapsed for Ray task: {:.2f}".format(time.time() - start))

(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray task returned value: 18257417214.20
(pid=71676) pid=71676; ppid=71673
Remote Ray ta

## Ray's Concept of Parallel Programming: Futures

As noted above, Ray executes tasks asynchronously. That is, a remote call returns a future immediately, while the actual task is scheduled to execute on a worker.
You can later use this future reference to fetch the value computed by the finished remote task. All the scheduling of the futures is handled by Ray's scheduler, 
and the computed value is stored in the object store as shown in the diagram.

 * **Head Node**: The main node in a Ray cluster, which talks to the workers. On a local host or laptop, that single machine acts as the head node.
 * **Worker**: The worker process that executes your Ray tasks.
 * **Raylet**: Comprises the scheduler and the in-memory store for values computed by the tasks.
 * **Global Control Store**: As the name suggests, there is only a single instance, shared among all the worker nodes.
 
 While this tutorial does not go into detail about the communication or serialization of objects across the cluster, or what happens under the hood, it is useful to get a high-level view of 
 Ray's architectural components. Future tutorials will go in depth into how each of these components interacts during the execution of a complicated Ray task in a Ray cluster.
 
![](images/ray_cluster_arch.png)

Let's illustrate asynchronous execution of Ray tasks and how futures can be accessed:
 * Define a Ray task.
 * Invoke the Ray task. This returns immediately with a future reference.
 * Print the type and value of the reference.
 * Fetch the value of the task using `ray.get(obj_ref)`.
 * Create a list of Python objects, store it in the object store, and then use a Ray task to fetch it and compute a result.

In [90]:
@ray.remote
def func():
    return 42

Notice the Python object type: it is an `ObjectRef`, defined in the `ray._raylet` module.

In [91]:
obj_ref = func.remote()
print(obj_ref, type(obj_ref))

ObjectRef(9d78bd90898368caffffffffffffffffffffffff0100000001000000) <class 'ray._raylet.ObjectRef'>


Fetch the value from the future reference

In [92]:
print(f"Value of the computed and finished future: {ray.get(obj_ref)}")

Value of the computed and finished future: 42


Create a list and use Ray's `ray.put` method to store it in the object store; `ray.put` returns an `ObjectRef`. You don't have to worry about where the object is stored; Ray manages its references.

In [96]:
num_list = [ 23, 42, 93 ]
num_ref = ray.put(num_list)
print(num_ref, type(num_ref))

ObjectRef(ffffffffffffffffffffffffffffffffffffffff0100000003000000) <class 'ray._raylet.ObjectRef'>


In [102]:
# Now get the value of the original list stored
ray.get(num_ref)

[23, 42, 93]

In [105]:
@ray.remote
def add_list(num_list):
    return sum(num_list)

In [111]:
# We pass an object reference; Ray resolves it to the list before the task runs, and the call returns a future
calc_ref = add_list.remote(num_ref)
# Now we can get the value of the computed future
ray.get(calc_ref)

158

## Ray's Concept of Parallel Programming: Remote Stateful Objects (Actors)

In the examples above, the tasks are stateless: each computed value is placed in the in-memory object store and can later be evicted or removed. What if you want a remote object to keep state across calls?
Ray provides a simple API to create a remote stateful object. This introduces the concept of a remote class. Remote classes in Ray are actors. If you are familiar with [Akka Actors](https://doc.akka.io/docs/akka/current/typed/actors.html), the idea is similar. The actor model provides a higher level of abstraction for writing concurrent and distributed systems. 
It relieves the developer from having to deal with explicit locking and thread management, making it easier to write correct concurrent and parallel systems. In Ray, an actor is essentially a stateful worker (or a service). When a new actor is instantiated, a new worker is created, and methods of the actor are scheduled on that specific worker and can access and mutate the state of that worker.

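The actor's state lives in the worker process that hosts the actor, while the Global Control Store (GCS) tracks cluster-level metadata such as which actors exist and where they run. In other words, if other workers need access to this object, they can simply reach it through a reference (an actor handle).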
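
To see why an actor removes the need for explicit locking, consider this toy, single-process sketch built only on the standard library. It is not how Ray implements actors; it assumes a hypothetical `CounterActor` whose state is touched by exactly one thread, which drains a mailbox of requests in order.

```python
import queue
import threading


class CounterActor:
    """Toy actor: one thread owns the state and handles messages one at a time."""

    def __init__(self) -> None:
        self._count = 0  # only the actor thread reads or writes this
        self._mailbox: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            reply = self._mailbox.get()
            if reply is None:  # sentinel: stop the actor
                break
            self._count += 1
            reply.put(self._count)  # hand the result back to the caller

    def increment(self) -> queue.Queue:
        # Returns immediately with a one-slot queue that acts like a future
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._mailbox.put(reply)
        return reply

    def stop(self) -> None:
        self._mailbox.put(None)
        self._thread.join()


actor = CounterActor()
replies = [actor.increment() for _ in range(3)]
values = [reply.get() for reply in replies]
actor.stop()
print(values)
```

Because every mutation happens on the actor's own thread, callers never need a lock; they only send messages and wait for replies.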

How the communication and serialization of these object references across a Ray cluster is implemented is beyond the scope of this tutorial. Future advanced tutorials will expound on these nuances
and under-the-hood details. For now, we'll keep it at a conceptual level and show how you can use Ray's simple Pythonic APIs to create a stateful remote object, manipulate it, and share it.

We are going to show a typical actor workflow:

 * Define a remote class with the `@ray.remote` decorator.
 * Instantiate an actor and inspect its type.
 * Invoke the actor's remote method `score` via Ray's `remote` primitive. This returns an object reference.
 * Inspect its type.
 * Finally, fetch the value via Ray's `ray.get(ref)` method.

In [162]:
from random import randint

@ray.remote
class GoalsScored:
    def __init__(self) -> None:
        self._goals = 0

    def score(self) -> int:
        # Add a random number of goals to the running total kept by the actor
        self._goals += randint(1, 5)
        return self._goals

Now let's use the class `GoalsScored` to create an actor; `GoalsScored.remote()` returns an actor handle.

In [170]:
goals = GoalsScored.remote()
print(goals, type(goals))

Actor(GoalsScored,e86554b48b060ca201a84e7101000000) <class 'ray.actor.ActorHandle'>


Execute the actor's remote instance method `score`, which returns an object reference.

In [167]:
total_goals_ref = goals.score.remote()
print(total_goals_ref, type(total_goals_ref))

ObjectRef(02f9b42dcf9096dba0eaca7fee0f5728794e7e600100000001000000) <class 'ray._raylet.ObjectRef'>


In [168]:
print(f"Total goals scored: {ray.get(total_goals_ref)}")

Total goals scored: 1


## Ray's Dashboard

When you install Ray locally, as we did in this tutorial, and run tasks on it, Ray launches a web-based dashboard, accessible at http://127.0.0.1:8265. This gives you real-time insight into all
the resources Ray consumes: CPUs, memory, logs, actors, etc.

![](images/ray_dashboard.png)

## Shutdown Ray

For a graceful shutdown, simply use the `ray.shutdown()` API to terminate the Ray runtime.

In [171]:
ray.shutdown()

## Summary

We covered three high-level concepts at the heart of Ray Core: tasks, futures, and remote stateful objects (actors). Using a simple Python Ray decorator, we converted a 
local Python function into a remote Ray task, and let Ray, under the hood, schedule and execute it on worker processes. 

You got a peek at Ray's simple primitives for developing distributed Python and saw how Pythonic they are to use. 
Although the examples were contrived and simple, they demonstrated Ray's ease of use. 

And through the dashboard launched while running Ray on a local host, you got a peek into Ray's real-time resource and compute usage.

Stay tuned for advanced tutorials on Ray programming patterns as well as introductory and advanced tutorials on Ray's native libraries.

## Resources and References

* [Ray Documentation](https://docs.ray.io/en/master/index.html#)
* [Futures and Promises](http://dist-prog-book.com/chapter/2/futures.html)
* [Ray Distributed Library Patterns](https://www.anyscale.com/blog/ray-distributed-library-patterns)
* [Anyscale Academy](https://github.com/anyscale/academy)
* [Ray Tutorial](https://github.com/ray-project/tutorial)
