# Introduction to Ray
Ray is an open-source unified framework for scaling AI and Python applications, such as machine learning workloads. It provides the compute layer for parallel processing, so you don't need to be a distributed systems expert.

## 1 | Parallelizing Tasks with Ray
Ray provides simple Python primitives for building and running distributed applications: functions can easily be turned into Ray tasks, and classes into Ray actors.

### Simplified Explanation
Ray makes it easy to run functions and classes across multiple CPU cores or machines at the same time. It handles the complex parts of managing tasks and resources, so you can focus on building scalable applications with simple commands.
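In Ray, an actor is a class decorated with `@ray.remote`: `Counter.remote()` starts it as a stateful worker, and each `counter.increment.remote()` call returns a future whose value `ray.get()` retrieves. As a Ray-free illustration of that idea, here is a minimal sketch using only the standard library. `LocalActor` is a hypothetical helper, not a Ray API: it owns one object and runs its method calls one at a time on a single worker thread, so the state is never modified by two calls at once.

```python
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """A plain stateful class; with Ray it would be decorated with @ray.remote."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


class LocalActor:
    """Hypothetical helper (not part of Ray): owns one object and runs its
    method calls on a single worker thread, in submission order."""

    def __init__(self, obj):
        self._obj = obj
        # max_workers=1 serializes the calls, so only one touches the state at a time.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def call(self, method_name, *args):
        # Returns a Future immediately, similar in spirit to counter.increment.remote().
        method = getattr(self._obj, method_name)
        return self._executor.submit(method, *args)

    def shutdown(self):
        self._executor.shutdown(wait=True)


actor = LocalActor(Counter())
futures = [actor.call("increment") for _ in range(3)]
# Waiting on each Future plays the role of ray.get() on the actor's results.
results = [fut.result() for fut in futures]
actor.shutdown()
print(results)  # [1, 2, 3]
```

The sketch mirrors the key property of an actor: calls from one caller are queued and executed in order against the same state. What it does not capture is that a Ray actor lives in its own worker process, possibly on another machine.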

In [1]:
# importing ray
import ray

In [2]:
# initializing ray
ray.init()

2025-01-06 09:12:14,047	INFO worker.py:1821 -- Started a local Ray instance.


Python version: 3.10.16
Ray version: 2.40.0


[33m(raylet)[0m [2025-01-06 09:12:22,983 E 24038 275896] (raylet) file_system_monitor.cc:116: /tmp/ray/session_2025-01-06_09-12-07_653172_23910 is over 95% full, available space: 9.83531 GB; capacity: 278.466 GB. Object creation will fail if spilling is required.
[33m(raylet)[0m [2025-01-06 09:12:32,994 E 24038 275896] (raylet) file_system_monitor.cc:116: /tmp/ray/session_2025-01-06_09-12-07_653172_23910 is over 95% full, available space: 9.83337 GB; capacity: 278.466 GB. Object creation will fail if spilling is required.
[33m(raylet)[0m [2025-01-06 09:12:43,042 E 24038 275896] (raylet) file_system_monitor.cc:116: /tmp/ray/session_2025-01-06_09-12-07_653172_23910 is over 95% full, available space: 9.83166 GB; capacity: 278.466 GB. Object creation will fail if spilling is required.
[33m(raylet)[0m [2025-01-06 09:12:53,128 E 24038 275896] (raylet) file_system_monitor.cc:116: /tmp/ray/session_2025-01-06_09-12-07_653172_23910 is over 95% full, available space: 9.82136 GB; capacity:

In [None]:
# @ray.remote is a decorator that turns f into a Ray remote task; calling f.remote() schedules it to run asynchronously, possibly in parallel with other tasks
@ray.remote
def f(x):
    return x * x

In [None]:
# f.remote(i) returns immediately with an ObjectRef (a future), a placeholder for that task's result
futures = [f.remote(i) for i in range(4)]

# ray.get() blocks until every task has finished and returns the results in the same order
print(ray.get(futures))
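`f.remote(i)` hands back a Ray `ObjectRef` right away, and `ray.get()` waits for the values. The standard library's `concurrent.futures` has the same submit-then-wait shape, which may help if futures are new. This sketch is an analogy only: it runs on local threads, and for pure-Python CPU-bound work like squaring numbers, CPython threads do not execute truly in parallel, which is one reason to reach for Ray instead.

```python
from concurrent.futures import ThreadPoolExecutor


def f(x):
    return x * x


with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns a Future right away, like f.remote(i) returns an ObjectRef
    futures = [executor.submit(f, i) for i in range(4)]
    # result() blocks until each value is ready, like ray.get(futures)
    results = [fut.result() for fut in futures]

print(results)  # [0, 1, 4, 9]
```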

In [None]:
# stopping ray
ray.shutdown()

## 2 | Estimating Pi
Sometimes we just want to do something simple in parallel. Ray is useful for simple, repetitive tasks that need to be run many times. For example, suppose we need to process 100,000 time series, each with the same algorithm.

Instead of processing them one by one, Ray can run the tasks in parallel, so multiple time series are processed at the same time, which reduces the overall running time.
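A minimal sketch of that pattern (one function applied to many independent inputs), using only the standard library. `process_series` and the synthetic data are hypothetical stand-ins, since the text does not specify the algorithm, and a thread pool is substituted for Ray. With Ray, the equivalent would be to decorate `process_series` with `@ray.remote` and call `ray.get([process_series.remote(s) for s in all_series])`.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def process_series(series):
    """Hypothetical per-series algorithm: return (mean, population stdev)."""
    return statistics.mean(series), statistics.pstdev(series)


# Hypothetical input: 1,000 short synthetic series (the text's example has 100,000).
rng = random.Random(0)
all_series = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(1000)]

# Serial baseline: one series at a time.
serial_results = [process_series(s) for s in all_series]

# The same algorithm applied to every series concurrently.
with ThreadPoolExecutor(max_workers=8) as executor:
    parallel_results = list(executor.map(process_series, all_series))

print(len(parallel_results))  # 1000
```

`executor.map` returns results in input order, so the parallel and serial outputs can be compared element by element.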

### PI Example
We take the simple example of estimating Pi. The algorithm generates random x and y in [0, 1); if x^2 + y^2 <= 1, the point lies inside the quarter circle and we count it as "in". The fraction of points counted as "in" estimates π/4.
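Why the fraction estimates π/4: `random.random()` draws from [0, 1), so (x, y) is uniform on the unit square, and the part of that square where x² + y² ≤ 1 is a quarter of the unit circle. The probability of landing inside is the ratio of the two areas:

$$
P\left(x^2 + y^2 \le 1\right) = \frac{\pi \cdot 1^2 / 4}{1 \cdot 1} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{\text{in\_count}}{\text{sample\_count}}
$$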

In [1]:
import ray
import random
import time
import math
from fractions import Fraction

In [2]:
# Let's start Ray
ray.init()

2024-11-20 12:55:12,350	INFO worker.py:1810 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265


Python version: 3.9.20
Ray version: 2.39.0
Dashboard: http://127.0.0.1:8265


In [3]:
@ray.remote
def pi4_sample(sample_count):
    """pi4_sample runs sample_count experiments and returns the
    fraction of points that landed inside the quarter circle (an estimate of pi/4).
    """
    in_count = 0
    for i in range(sample_count):
        x = random.random()
        y = random.random()
        if x*x + y*y <= 1:
            in_count += 1
    return Fraction(in_count, sample_count)

In [4]:
SAMPLE_COUNT = 100000000
start = time.time() 
future = pi4_sample.remote(sample_count = SAMPLE_COUNT)
pi4 = ray.get(future)
end = time.time()
dur = end - start
print(f'Running {SAMPLE_COUNT} tests took {dur} seconds')

Running 100000000 tests took 27.970555305480957 seconds
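The cell above keeps the result in `pi4`, a `Fraction` that estimates π/4, but never turns it into π; in the notebook that is just `4 * float(pi4)`. To show it end to end without Ray, here is a minimal sketch with a serial copy of the sampler. `pi4_sample_local` and its `seed` parameter are hypothetical additions for reproducibility, and the sample count is reduced so it finishes quickly.

```python
import math
import random
from fractions import Fraction


def pi4_sample_local(sample_count, seed=None):
    """Serial, seeded copy of pi4_sample (no Ray).

    Returns the fraction of points inside the quarter circle, an estimate of pi/4.
    """
    rng = random.Random(seed)
    in_count = 0
    for _ in range(sample_count):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1:
            in_count += 1
    return Fraction(in_count, sample_count)


pi4 = pi4_sample_local(100_000, seed=42)
pi_estimate = 4 * float(pi4)  # undo the quarter-circle factor
error = abs(pi_estimate - math.pi)
print(f"pi ~ {pi_estimate:.4f} (error {error:.4f})")
```

With 100,000 samples the estimate is typically within about 0.01 of π. The statistical error shrinks roughly as 1/√n, so the notebook's 100,000,000 samples should be considerably more accurate.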


In [5]:
FULL_SAMPLE_COUNT = 1000000000
start = time.time() 
BATCHES = FULL_SAMPLE_COUNT // SAMPLE_COUNT
print(f'Doing {BATCHES} batches')
results = []
for _ in range(BATCHES):
    results.append(pi4_sample.remote(sample_count = SAMPLE_COUNT))
output = ray.get(results)
end = time.time()
dur = end - start
dur

Doing 10 batches


69.99713730812073
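The batched cell collects ten `Fraction` values in `output` but stops at the timing. Because every batch used the same `SAMPLE_COUNT`, the batches can be combined by averaging; in the notebook that would be `4 * float(sum(output) / len(output))`. The runnable sketch below mimics this without Ray; `pi4_sample_local` and `BATCH_SIZE` are hypothetical, and the batches are much smaller than the notebook's so it runs quickly.

```python
import math
import random
from fractions import Fraction


def pi4_sample_local(sample_count, rng):
    """Ray-free stand-in for pi4_sample that draws from a shared seeded RNG."""
    in_count = 0
    for _ in range(sample_count):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1:
            in_count += 1
    return Fraction(in_count, sample_count)


BATCH_SIZE = 20_000  # far smaller than SAMPLE_COUNT, for speed
BATCHES = 10
rng = random.Random(7)
output = [pi4_sample_local(BATCH_SIZE, rng) for _ in range(BATCHES)]

# Every batch used the same sample count, so the mean of the per-batch
# fractions equals the pooled fraction over all BATCHES * BATCH_SIZE samples.
pooled_pi4 = sum(output) / len(output)
pi_estimate = 4 * float(pooled_pi4)
print(f"pi ~ {pi_estimate:.4f} from {BATCHES * BATCH_SIZE} samples")
```

If batch sizes differed, a plain average would be biased toward small batches; the pooled estimate would then be the total `in_count` divided by the total number of samples.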

In [None]:
ray.shutdown()

In [None]:
# Plain-Python version of pi4_sample (no @ray.remote) to time a serial baseline
def pi4_sample(sample_count):
    """pi4_sample runs sample_count experiments and returns the
    fraction of points that landed inside the quarter circle (an estimate of pi/4).
    """
    in_count = 0
    for i in range(sample_count):
        x = random.random()
        y = random.random()
        if x*x + y*y <= 1:
            in_count += 1
    return Fraction(in_count, sample_count)

In [None]:
start = time.time() 
pi4_sample(100000000)
end = time.time()
dur = end - start
dur
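The serial cell above has no recorded output, so this rough comparison treats the single Ray task timing (27.97 s for 100,000,000 samples) as a proxy for the serial cost per batch. It assumes the work scales linearly with the sample count; the result is an estimate, not a measurement.

```python
# Timings copied from the recorded outputs in this notebook (seconds).
single_task_time = 27.970555305480957  # one task, SAMPLE_COUNT = 100,000,000
batched_time = 69.99713730812073       # 10 tasks, FULL_SAMPLE_COUNT = 1,000,000,000
batches = 10

# Assumption: running the ten batches one after another would take about
# ten times the single-task time.
estimated_serial_time = batches * single_task_time
speedup = estimated_serial_time / batched_time
print(f"estimated speedup: {speedup:.2f}x")
```

The result is roughly 4x rather than 10x, which is consistent with fewer than ten tasks running at once, for example because of the number of CPU cores Ray detected; the notebook does not record the core count.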

In [None]:
pi4_sample(100)