# How-to on Ray: Cross Validation
> Run TimeGPT in a distributed manner on top of Ray.

`TimeGPT` works on top of Spark, Dask, and Ray through Fugue. `TimeGPT` reads the input DataFrame and uses the corresponding engine. For example, if the input is a Ray dataset, `TimeGPT` will use the existing Ray session to run the forecast.


In [None]:
#| hide
from nixtlats.utils import colab_badge

In [None]:
#| echo: false
colab_badge('docs/how-to-guides/1_distributed_cv_spark')

## Installation

[Ray](https://www.ray.io/) is an open source unified compute framework to scale Python workloads. As long as Ray is installed and configured, `TimeGPT` will be able to use it. If executing on a distributed Ray cluster, make sure the `nixtlats` library is installed across all the workers.

In addition to Ray, you'll also need to have [Fugue](https://fugue-tutorials.readthedocs.io/) installed. Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of Spark, Dask and Ray. You can install Fugue for Ray using pip. 

In [None]:
%%capture
pip install "fugue[ray]"

## Executing on Ray

First, instantiate a `TimeGPT` class. To do this, you'll need a token provided by Nixtla. If you don't have one already, please request yours [here](https://www.nixtla.io/).

There are different ways of setting the token. Here we'll use it as an environment variable. You can learn more about this [here](https://docs.nixtla.io/docs/faqs#setting-up-your-authentication-token-for-nixtla-sdk). 
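
As a quick sanity check before instantiating the client, the sketch below verifies that a token is visible to the current Python process. The helper name `token_is_set` is hypothetical and not part of `nixtlats`; `TIMEGPT_TOKEN` is the variable that the `TimeGPT()` call in this guide falls back to.

```python
import os


def token_is_set(var_name: str = "TIMEGPT_TOKEN") -> bool:
    """Return True if the environment variable holds a non-empty value.

    Hypothetical helper for illustration; it does not validate the token
    against the Nixtla API, it only checks that something is set.
    """
    return bool(os.environ.get(var_name, "").strip())


if not token_is_set():
    print("TIMEGPT_TOKEN is not set; export it or add it to your .env file.")
```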

In [None]:
#| hide
import os

import pandas as pd
from dotenv import load_dotenv

load_dotenv()

In [None]:
from nixtlats import TimeGPT

timegpt = TimeGPT() # defaults to os.environ.get("TIMEGPT_TOKEN")

Start Ray as the engine.

In [None]:
import logging

import ray

ray.init(logging_level=logging.ERROR)  # only log events at ERROR level or above

### Cross validation

Time series cross validation is a method to check how well a model would have performed in the past. It uses a moving window over historical data to make predictions for the next period. After each prediction, the window moves ahead and the process keeps going until it covers all the data. `TimeGPT` allows you to perform cross validation on top of Ray.
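
To make the moving-window idea concrete, here is a minimal sketch in plain Python that computes where each validation window would fall for a single series. It follows the common rolling-origin convention and assumes the last window ends at the final observation; whether `TimeGPT` places its windows in exactly this way is an assumption, and the function `cv_windows` is hypothetical. The parameter names mirror `h`, `n_windows`, and `step_size` from the `cross_validation` call in this guide.

```python
def cv_windows(n_obs: int, h: int, n_windows: int, step_size: int) -> list[tuple[int, int]]:
    """Return (cutoff, test_end) index pairs, oldest window first.

    For each pair, rows [0, cutoff) are the history and rows
    [cutoff, test_end) are the h points to be predicted.
    Assumption: the most recent window ends at the last observation.
    """
    first_cutoff = n_obs - (n_windows - 1) * step_size - h
    if first_cutoff <= 0:
        raise ValueError("series is too short for the requested windows")
    windows = []
    for i in range(n_windows):
        cutoff = first_cutoff + i * step_size  # the window moves ahead by step_size
        windows.append((cutoff, cutoff + h))
    return windows


# Example: 100 hourly observations with the settings used in this guide.
print(cv_windows(n_obs=100, h=12, n_windows=5, step_size=2))
# [(80, 92), (82, 94), (84, 96), (86, 98), (88, 100)]
```

With `step_size` smaller than `h`, consecutive test windows overlap, so the same timestamp can be predicted from several different cutoffs.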

After starting Ray, load a pandas DataFrame and then convert it to a Ray dataset. 

In [None]:
ray_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')
ray_df.head()

In [None]:
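# Disable the streaming executor before building the dataset
ctx = ray.data.context.DatasetContext.get_current()
ctx.use_streaming_executor = False
# Convert the pandas DataFrame loaded above into a Ray dataset with 4 partitions
ray_df = ray.data.from_pandas(ray_df).repartition(4)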

Now call `TimeGPT`'s cross validation method with the Ray dataset. 

In [None]:
fcst_df = timegpt.cross_validation(ray_df, h=12, freq='H', n_windows=5, step_size=2)

In [None]:
fcst_df.to_pandas().head()

### Cross validation with exogenous variables

Exogenous variables or external factors are crucial in time series forecasting as they provide additional information that might influence the prediction. These variables could include holiday markers, marketing spending, weather data, or any other external data that correlate with the time series data you are forecasting.

For example, if you're forecasting ice cream sales, temperature data could serve as a useful exogenous variable. On hotter days, ice cream sales may increase.

To incorporate exogenous variables in TimeGPT, you'll need to pair each point in your time series data with the corresponding external data.
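
As a small illustration of that pairing, independent of Ray, the sketch below joins a target series with an external temperature series on the timestamp. All data values are made up, and the column names (`unique_id`, `ds`, `y`, `temperature`) are assumptions chosen to follow the long format commonly used by Nixtla libraries; check the dataset you load for the actual names.

```python
import pandas as pd

# Hypothetical target series in long format: one row per (series, timestamp).
sales = pd.DataFrame({
    "unique_id": ["store_1"] * 3,
    "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
    "y": [120.0, 135.0, 160.0],
})

# Hypothetical exogenous data keyed by the same timestamps.
weather = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
    "temperature": [24.5, 27.0, 31.2],
})

# Pair every target point with its external value; validate guards
# against duplicated timestamps in the exogenous table.
paired = sales.merge(weather, on="ds", how="left", validate="many_to_one")

# Every target row should have found a matching exogenous value.
missing = int(paired["temperature"].isna().sum())
print(paired)
print("rows without exogenous data:", missing)
```

A left join keeps every target row, so any gap in the external data shows up as a missing value rather than a silently dropped observation.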

Let's see an example. As before, load the data with pandas and then convert it to a Ray dataset.

In [None]:
ray_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
ray_df.head()

In [None]:
ctx = ray.data.context.DatasetContext.get_current()
ctx.use_streaming_executor = False
ray_df = ray.data.from_pandas(ray_df).repartition(4)

Let's call the `cross_validation` method, adding this information:

In [None]:
timegpt_cv_ex_vars_df = timegpt.cross_validation(
    df=ray_df,
    h=48, 
    freq='H',
    level=[80, 90],
    n_windows=5,
)

In [None]:
timegpt_cv_ex_vars_df.to_pandas().head()

Don't forget to stop Ray once you're done. 

In [None]:
ray.shutdown()