# Overview of a Ray Task

## What is a Ray task?

A Ray task is a **distributed** function in the Ray framework. **Distributed** means that Ray executes the function in a different process, and possibly on a different machine, from the one where you invoke it.

## When Should You Use Ray Tasks?

You should consider using a Ray task in these situations:

- Your code is written in Python, Java, or C++ and works correctly.
- Your code runs slowly because it runs sequentially.
- Your code is complex:
    - It involves scaling functions that have different resource requirements.
    - It involves dynamically composing functions together, i.e. functions depend on each other and pass data between them.
- You would like to speed up your code's execution by running it asynchronously.
- You would like to scale your code to run on multiple machines with ease.
- You don't want to perform a major code rewrite to enable this speedup.

Let's look at an example to make this more concrete.

### Example

We have a Python function conveniently named `expensive_computation`, which executes a computation that requires significant resources and time.

`expensive_computation` makes the following function calls:
1. It calls `perform_naive_matrix_multiplication` to multiply two `size` x `size` matrices of ones using a naive algorithm.
2. It calls `compute_matrix_sum` to return the sum of the elements in the resulting matrix.

In [47]:
from itertools import product

Matrix = list[list[int]]

def perform_naive_matrix_multiplication(size: int) -> Matrix:
    matrix1 = matrix2 = [[1 for _ in range(size)] for _ in range(size)]

    result = [[0 for _ in range(size)] for _ in range(size)]
    for i, j, k in product(range(size), range(size), range(size)):
        result[i][j] += matrix1[i][k] * matrix2[k][j]

    return result

def compute_matrix_sum(matrix: Matrix) -> int:
    return sum(sum(row) for row in matrix)

def expensive_computation(size: int) -> int:
    result = perform_naive_matrix_multiplication(size)
    return compute_matrix_sum(result)

We need to run `expensive_computation` `n_runs` times. We can do this sequentially, but it will take a long time.

In [48]:
n_runs = 10
size = 300
results = [expensive_computation(size) for _ in range(n_runs)]
expected_result = size**3
assert sum(results) == n_runs * expected_result

Below is our code execution visualized:

<img src="https://assets-training.s3.us-west-2.amazonaws.com/ray-core/task-actor-lifecycle/v2/overview/sequential_execution_python_func.svg" width="600px">

#### Desired execution

Instead, we would like to execute our Python function in parallel and distribute it over as many machines as possible:

<img src="https://assets-training.s3.us-west-2.amazonaws.com/ray-core/task-actor-lifecycle/v2/overview/desired_execution_python_func.svg" width="600px">

In the next section, we will learn how to achieve this desired state using Ray tasks!

## How do you define a Ray task?

To define a Ray task in Python, use the `@ray.remote` decorator. Decorating a Python function with `@ray.remote` converts it into a Ray task.

### Example
Let's revisit our example and convert our `perform_naive_matrix_multiplication` function into a Ray task.

In [50]:
import ray

@ray.remote(num_cpus=2)  # convert the Python function into a Ray task that reserves 2 CPUs
def perform_naive_matrix_multiplication(size: int) -> Matrix:
    matrix1 = matrix2 = [[1 for _ in range(size)] for _ in range(size)]

    result = [[0 for _ in range(size)] for _ in range(size)]
    for i, j, k in product(range(size), range(size), range(size)):
        result[i][j] += matrix1[i][k] * matrix2[k][j]

    return result

In this example, we also specify the task's resource requirements inside the `ray.remote` decorator. Here are the arguments you can pass to the decorator to specify your task's resource requirements:
- `num_cpus`: The quantity of CPU resources to reserve for this task. By default, tasks use 1 CPU resource.
- `num_gpus`: The quantity of GPU resources to reserve for this task.
- `resources`: The quantity of various custom resources to reserve for this task (e.g. Google TPUs or AWS accelerators).
- `memory`: The heap memory request in bytes for this task.

If we had a "proper" matrix multiplication implementation and wanted to speed it up on a GPU, we could reserve one with `num_gpus`. For now, our example stays simple for educational purposes.

## How do you submit a Ray task?

To submit a Ray task, you can use the `.remote` method instead of calling the task directly. The `.remote` method accepts the same arguments as the original function.

### Example

Let's submit our `perform_naive_matrix_multiplication` task. The `.remote` method is a non-blocking method that immediately returns an object reference. An object reference is a placeholder reference for the value that will be returned by the Ray task (think of it as a future).

In [51]:
size = 100
object_ref = perform_naive_matrix_multiplication.remote(size=size)
object_ref

ObjectRef(b5f40f7c7d38fc79ffffffffffffffffffffffff0100000001000000)

This returns something like `ObjectRef(359ec6ce30d3ca2dffffffffffffffffffffffff0100000001000000)`; the exact ID differs from run to run.
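To make the future analogy concrete without Ray, here is a small sketch using Python's standard-library `concurrent.futures`. It is only an analogy: `slow_square` is a made-up function, and a `Future` lives inside one process, whereas an `ObjectRef` can refer to a value stored elsewhere in the cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x: int) -> int:
    time.sleep(0.2)  # stand-in for an expensive computation
    return x * x


with ThreadPoolExecutor(max_workers=1) as pool:
    # submit() returns a placeholder immediately, much like .remote()
    future = pool.submit(slow_square, 7)
    # result() blocks until the value is ready, much like ray.get()
    future_value = future.result()

print(future_value)
```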

## How do you fetch the Ray task results?

Use `ray.get(object_ref)` to block until the object's value is ready and then return it.

### Example

We use `ray.get` to fetch the resulting object value from our `object_ref`.

In [52]:
object_value = ray.get(object_ref)
assert len(object_value[0]) == size

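## How do you compose Ray tasks together?

You can submit tasks from within another task, and you can pass the object reference returned by one task directly as an argument to another task. Ray resolves the reference to its value before the receiving task runs.

### Example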

Let's revisit our initial intention of scaling `expensive_computation`. In this case we want our `expensive_computation` task to:
- call the `perform_naive_matrix_multiplication` task
- pass the output of that task to the `compute_matrix_sum` task
- return the resulting object of the `compute_matrix_sum` task

In [53]:
# the matrix multiplication could run on a GPU, but the sum should be computed on a CPU
@ray.remote(num_cpus=1)
def compute_matrix_sum(matrix: Matrix) -> int:
    return sum(sum(row) for row in matrix)

# default num_cpus = 1 so it does not need to be explicitly specified
@ray.remote
def expensive_computation(size: int) -> int:
    result_obj_ref = perform_naive_matrix_multiplication.remote(size)
    sum_obj_ref = compute_matrix_sum.remote(result_obj_ref)
    return ray.get(sum_obj_ref)

Now we submit our `expensive_computation` task `n_runs` times and fetch the results by calling `ray.get(object_refs)`.

Note that when `ray.get` is called on a list of object references, it blocks until the last object is available.

In [56]:
n_runs = 10
size = 300
object_refs = [expensive_computation.remote(size) for _ in range(n_runs)]
results = ray.get(object_refs)
expected_result = size**3
assert sum(results) == n_runs * expected_result

Here is our execution visualized using Ray tasks. Do you spot the resemblance with our desired execution diagram?

<img src="https://assets-training.s3.us-west-2.amazonaws.com/ray-core/task-actor-lifecycle/v2/overview/overview_distributed_ray_task.svg" width="600px">

# Understanding Ray Task Execution

Let's explore in more detail how the `expensive_computation` task is executed.

Here it is laid out in steps:

1. `expensive_computation.remote(...)` is called with its inputs.
2. An `expensive_computation` task is submitted for scheduling on the Ray cluster.
3. Ray autoscales the cluster, if necessary, to meet the task's resource requirements.
4. Ray schedules the task to run on a worker.
5. The worker process executes the task.

The task outputs a resulting object, which can be fetched using `ray.get`.

The above steps are visualized in this diagram:

<img src="https://assets-training.s3.us-west-2.amazonaws.com/ray-core/task-actor-lifecycle/v2/overview/overview_detailed_execution_ray_task.svg" width="800px">