In [None]:
#| hide
!pip install -Uqq nixtla fugue[ray]

In [None]:
#| hide 
from nixtla.utils import in_colab

In [None]:
#| hide 
IN_COLAB = in_colab()

In [None]:
#| hide
if not IN_COLAB:
    from nixtla.utils import colab_badge
    from dotenv import load_dotenv

# Ray 

> Run TimeGPT in a distributed manner on top of Ray

[Ray](https://www.ray.io/) is an open-source unified compute framework for scaling Python workloads. In this guide, we explain how to use `TimeGPT` on top of Ray.

**Outline:** 

1. [Installation](#installation)

2. [Load Data](#load-data)

3. [Initialize Ray](#initialize-ray) 

4. [Use TimeGPT on Ray](#use-timegpt-on-ray)

5. [Shutdown Ray](#shutdown-ray)

In [None]:
#| echo: false
if not IN_COLAB:
    load_dotenv()
    colab_badge('docs/tutorials/19_computing_at_scale_ray_distributed')

[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/18_computing_at_scale_ray_distributed.ipynb)

## 1. Installation 

Install Ray through [Fugue](https://fugue-tutorials.readthedocs.io/). Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Ray. 

::: {.callout-note}
You can install `fugue` with `pip`:
    
```shell
pip install fugue[ray]
```
:::

If executing on a distributed `Ray` cluster, ensure that the `nixtla` library is installed across all the workers.
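As a quick sanity check before running anything, the sketch below reports which of the required packages cannot be imported in the current Python environment. The helper name `missing_packages` is hypothetical and is not part of `nixtla`, `fugue`, or `ray`. It only inspects the process it runs in, so on a real cluster you would need to run the same check on every worker (for example, inside a Ray task), which is not shown here.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be imported here.

    Hypothetical helper for illustration; it checks only the current
    Python environment, not remote workers.
    """
    return [name for name in names if find_spec(name) is None]


# An empty list means all three packages are importable in this environment.
print(missing_packages(["nixtla", "fugue", "ray"]))
```

If the list is not empty, install the missing packages on that machine before continuing.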

## 2. Load Data 

You can load your data as a `pandas` DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets. 

In [None]:
import pandas as pd 

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv') 
df.head()

|   | unique_id | ds | y |
|---|---|---|---|
| 0 | BE | 2016-10-22 00:00:00 | 70.0 |
| 1 | BE | 2016-10-22 01:00:00 | 37.1 |
| 2 | BE | 2016-10-22 02:00:00 | 37.1 |
| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE | 2016-10-22 04:00:00 | 37.1 |


## 3. Initialize Ray

Initialize `Ray` and convert the pandas DataFrame to a `Ray` Dataset with `ray.data.from_pandas`.

In [None]:
import ray
from ray.cluster_utils import Cluster

In [None]:
ray_cluster = Cluster(
    initialize_head=True,
    head_node_args={"num_cpus": 2}
)
ray.init(address=ray_cluster.address, ignore_reinit_error=True)

2024-05-10 11:09:19,076	INFO worker.py:1564 -- Connecting to existing Ray cluster at address: 127.0.0.1:63694...
2024-05-10 11:09:19,092	INFO worker.py:1740 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265


|   |   |
|---|---|
| Python version: | 3.10.14 |
| Ray version: | 2.20.0 |
| Dashboard: | http://127.0.0.1:8265 |


In [None]:
ray_df = ray.data.from_pandas(df)
ray_df 

MaterializedDataset(
   num_blocks=1,
   num_rows=8400,
   schema={unique_id: object, ds: object, y: float64}
)

## 4. Use TimeGPT on Ray

Using `TimeGPT` on top of `Ray` is almost identical to the non-distributed case. The only difference is that you pass a `Ray` Dataset instead of a pandas DataFrame.

First, instantiate the `NixtlaClient` class. 

In [None]:
from nixtla import NixtlaClient

In [None]:
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)
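The comment in the cell above notes that the key falls back to the `NIXTLA_API_KEY` environment variable. The sketch below mirrors that lookup order so that a missing key fails early with a clear message. `resolve_api_key` is a hypothetical helper written for this guide, not part of the `nixtla` API, and the actual client may resolve the key differently.

```python
import os


def resolve_api_key(explicit_key=None, env_var="NIXTLA_API_KEY"):
    """Return the explicit key if given, otherwise the environment value.

    Hypothetical helper for illustration; raises if neither source
    provides a non-empty key.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise ValueError(
            f"No API key found: pass one explicitly or set {env_var}."
        )
    return key


# Example usage (assumes the key is available from one of the two sources):
# nixtla_client = NixtlaClient(api_key=resolve_api_key())
```

Keeping the key in an environment variable avoids hard-coding it in a notebook that may be shared.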

> 👍 Use an Azure AI endpoint
>
> To use an Azure AI endpoint, set the `base_url` argument:
>
> `nixtla_client = NixtlaClient(base_url="your azure ai endpoint", api_key="your api_key")`

In [None]:
#| hide 
if not IN_COLAB:
    nixtla_client = NixtlaClient()

Then use any method from the `NixtlaClient` class such as [`forecast`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-forecast) or [`cross_validation`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-cross-validation).

In [None]:
%%capture
fcst_df = nixtla_client.forecast(ray_df, h=12)

> 📘 Available models in Azure AI
>
> If you are using an Azure AI endpoint, please be sure to set `model="azureai"`:
>
> `nixtla_client.forecast(..., model="azureai")`
> 
> For the public API, we support two models: `timegpt-1` and `timegpt-1-long-horizon`. 
> 
> By default, `timegpt-1` is used. Please see [this tutorial](https://docs.nixtla.io/docs/tutorials-long_horizon_forecasting) on how and when to use `timegpt-1-long-horizon`.

To inspect the result, use the `to_pandas` method to convert the `Ray` output to a `pandas` DataFrame.

In [None]:
fcst_df.to_pandas().tail()

|   | unique_id | ds | TimeGPT |
|---|---|---|---|
| 55 | NP | 2018-12-24 07:00:00 | 55.387066 |
| 56 | NP | 2018-12-24 08:00:00 | 56.115517 |
| 57 | NP | 2018-12-24 09:00:00 | 56.090714 |
| 58 | NP | 2018-12-24 10:00:00 | 55.813717 |
| 59 | NP | 2018-12-24 11:00:00 | 55.528519 |


In [None]:
%%capture
cv_df = nixtla_client.cross_validation(ray_df, h=12, freq='H', n_windows=5, step_size=2)

In [None]:
cv_df.to_pandas().tail()

|   | unique_id | ds | cutoff | TimeGPT |
|---|---|---|---|---|
| 295 | NP | 2018-12-23 19:00:00 | 2018-12-23 11:00:00 | 53.632019 |
| 296 | NP | 2018-12-23 20:00:00 | 2018-12-23 11:00:00 | 52.512775 |
| 297 | NP | 2018-12-23 21:00:00 | 2018-12-23 11:00:00 | 51.894035 |
| 298 | NP | 2018-12-23 22:00:00 | 2018-12-23 11:00:00 | 51.06572 |
| 299 | NP | 2018-12-23 23:00:00 | 2018-12-23 11:00:00 | 50.32592 |
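To read the `cutoff` column, it can help to work out where each validation window ends. The sketch below assumes a rolling-origin layout in which the last window's cutoff sits `h` steps before the final observation and each earlier window moves back by `step_size` steps. This layout is an assumption, not something confirmed by this guide, and `cv_cutoffs` is a hypothetical helper. The last observed hour is inferred from the outputs above.

```python
from datetime import datetime, timedelta


def cv_cutoffs(last_timestamp, h, n_windows, step_size, freq=timedelta(hours=1)):
    """List window cutoffs, oldest first, under an assumed rolling-origin layout."""
    return [
        last_timestamp - (h + step_size * i) * freq
        for i in reversed(range(n_windows))
    ]


# Last observed hour of the NP series, inferred from the outputs above.
last_observed = datetime(2018, 12, 23, 23)
for cutoff in cv_cutoffs(last_observed, h=12, n_windows=5, step_size=2):
    print(cutoff)
```

Under this assumption the most recent cutoff is 2018-12-23 11:00:00, which agrees with the `cutoff` value in the rows shown above.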


You can also use exogenous variables with `TimeGPT` on top of `Ray`. To do this, please refer to the [Exogenous Variables](https://docs.nixtla.io/docs/tutorials-exogenous_variables) tutorial. Just keep in mind that you need to pass a `Ray` Dataset instead of a pandas DataFrame.

## 5. Shutdown Ray

When you are done, shut down the `Ray` session.

In [None]:
ray.shutdown()