# Ray Basics

This notebook presents some Ray basics.

In [1]:
import ray

info = ray.init()

2024-10-01 04:45:07,661	INFO worker.py:1786 -- Started a local Ray instance.


Check the resources available in the cluster:

In [2]:
ray.cluster_resources()

{'CPU': 12.0,
 'memory': 4112488859.0,
 'node:127.0.0.1': 1.0,
 'object_store_memory': 2056244428.0,
 'node:__internal_head__': 1.0}
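The `memory` and `object_store_memory` entries are easier to read in GiB. The sketch below converts the figures from the output above, assuming they are reported in bytes (as in recent Ray releases). The `to_gib` helper is hypothetical and not part of Ray.

```python
# Values copied from the ray.cluster_resources() output above.
# Assumption: both figures are byte counts.
resources = {
    "memory": 4112488859.0,
    "object_store_memory": 2056244428.0,
}


def to_gib(num_bytes):
    """Convert a byte count to gibibytes (hypothetical helper)."""
    return num_bytes / 2**30


for name, value in resources.items():
    print(f"{name}: {to_gib(value):.2f} GiB")
```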

## Use case: db item retrieval

This example retrieves items from a database, with a simulated delay that grows with the item index.

In [3]:
import time 

database = [ 
    "Learning", 
    "Ray", 
    "Flexible", 
    "Distributed", 
    "Python", 
    "for", 
    "Machine", 
    "Learning" 
] 

def retrieve(item): 
    time.sleep(item / 10.) 
    return item, database[item]

In [4]:
def print_runtime(input_data, start_time): 
    print(f'Runtime: {time.time() - start_time:.2f} seconds, data:') 
    print(*input_data, sep="\n")

In [5]:
start = time.time() 
data = [retrieve(item) for item in range(8)] 
print_runtime(data, start)

Runtime: 2.83 seconds, data:
(0, 'Learning')
(1, 'Ray')
(2, 'Flexible')
(3, 'Distributed')
(4, 'Python')
(5, 'for')
(6, 'Machine')
(7, 'Learning')


## `get`

This section introduces a way to schedule tasks asynchronously and in parallel.

In [6]:
@ray.remote 
def retrieve_task(item): 
    return retrieve(item)

Calling `.remote()` schedules a task and immediately returns an object reference. `ray.get` then fetches the results, but it blocks until all of the referenced tasks have finished.

In [7]:
start = time.time() 
object_references = [ retrieve_task.remote(item) for item in range(8) ] 
data = ray.get(object_references)
print_runtime(data, start)

Runtime: 0.73 seconds, data:
(0, 'Learning')
(1, 'Ray')
(2, 'Flexible')
(3, 'Distributed')
(4, 'Python')
(5, 'for')
(6, 'Machine')
(7, 'Learning')
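The two runtimes follow from the sleep durations. As a rough check, assuming all eight tasks run at once (the cluster above reports 12 CPUs) and ignoring scheduling overhead:

```python
# Each retrieve(item) call sleeps item / 10 seconds, for item in 0..7.
delays = [item / 10.0 for item in range(8)]

# Sequential execution pays for every sleep in turn.
sequential_estimate = sum(delays)

# Fully parallel execution is bounded by the slowest task.
parallel_estimate = max(delays)

print(f"sequential: about {sequential_estimate:.1f} s")
print(f"parallel:   about {parallel_estimate:.1f} s")
```

The measured 2.83 s and 0.73 s sit slightly above these estimates, which is consistent with a small overhead per call.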


Best practice: share global data with tasks through Ray's **object store** (`ray.put`) rather than relying on global variables.

In [8]:
db_object_ref = ray.put(database)

In [9]:
@ray.remote 
def retrieve_task(item, db): 
    time.sleep(item / 10.) 
    return item, db[item]

## `wait`

If you need to process results as soon as they become available, instead of waiting for every task to finish, use `ray.wait`.

In [10]:
start = time.time() 
object_references = [ retrieve_task.remote(item, db_object_ref) for item in range(8) ] 
all_data = [] 

while len(object_references) > 0:
    # Wait for up to two finished tasks at a time (at most 7 seconds per call)
    finished, object_references = ray.wait( object_references, num_returns=2, timeout=7.0 )
    data = ray.get(finished) 
    print_runtime(data, start) 
    all_data.extend(data)

Runtime: 0.11 seconds, data:
(0, 'Learning')
(1, 'Ray')
Runtime: 0.31 seconds, data:
(2, 'Flexible')
(3, 'Distributed')
Runtime: 0.51 seconds, data:
(4, 'Python')
(5, 'for')
Runtime: 0.71 seconds, data:
(6, 'Machine')
(7, 'Learning')


## Follow-up tasks

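A task can take the result of another task as input. Pass the object reference to the follow-up task, and Ray runs it once the upstream result is ready.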

In [11]:
@ray.remote 
def follow_up_task(retrieve_result):
    original_item, _ = retrieve_result 
    follow_up_result = retrieve(original_item + 1) 
    return retrieve_result, follow_up_result

In [12]:
# Warning! Executing this task twice will produce the error:

# Retrieve the even-indexed items
retrieve_refs = [retrieve_task.remote(item, db_object_ref) for item in [0, 2, 4, 6]]
# Each follow-up task retrieves the next (odd-indexed) item
follow_up_refs = [follow_up_task.remote(ref) for ref in retrieve_refs]
for data in ray.get(follow_up_refs):
    print(data)

((0, 'Learning'), (1, 'Ray'))
((2, 'Flexible'), (3, 'Distributed'))
((4, 'Python'), (5, 'for'))
((6, 'Machine'), (7, 'Learning'))


## Actors

Actors are used for stateful computations, such as storing information about executed tasks:

In [13]:
@ray.remote
class DataTracker: 
    def __init__(self): 
        self._counts = 0 
        
    def increment(self): 
        self._counts += 1 
    
    def counts(self): 
        return self._counts

In [14]:
@ray.remote 
def retrieve_tracker_task(item, tracker, db): 
    time.sleep(item / 10.) 
    tracker.increment.remote() 
    return item, db[item]

In [15]:
tracker = DataTracker.remote() 
object_references = [ retrieve_tracker_task.remote(item, tracker, db_object_ref) for item in range(8) ] 
data = ray.get(object_references) 
print(data) 
print(ray.get(tracker.counts.remote()))

[(0, 'Learning'), (1, 'Ray'), (2, 'Flexible'), (3, 'Distributed'), (4, 'Python'), (5, 'for'), (6, 'Machine'), (7, 'Learning')]
8


In [16]:
ray.shutdown()

## In Summary


| API | Description |
|-|-|
|`ray.init()` | Initializes your Ray Cluster. Pass in an address to connect to an existing cluster. |
|`@ray.remote` | Turns functions into tasks and classes into actors. |
|`ray.put()` | Puts values into Ray’s object store. |
|`ray.get()` | Gets values from the object store. Returns the values you’ve put there or that were computed by a task or actor.|
|`.remote()` | Runs actor methods or tasks on your Ray Cluster and is used to instantiate actors. |
|`ray.wait()` | Returns two lists of object references, one with finished tasks we’re waiting for and one with unfinished tasks.|
