# How-To on Dask: Forecasting
> Run TimeGPT in a distributed fashion on top of Dask.

`TimeGPT` works on top of Spark, Dask, and Ray through Fugue. `TimeGPT` will read the input DataFrame and use the corresponding engine. For example, if the input is a Dask DataFrame, TimeGPT will use the existing Dask session to run the forecast.


In [None]:
#| hide
from nixtlats.utils import colab_badge

In [None]:
#| echo: false
colab_badge('docs/how-to-guides/0_distributed_fcst_dask')

## Installation

[Dask](https://www.dask.org/get-started) is an open source parallel computing library for Python. As long as Dask is installed and configured, `TimeGPT` will be able to use it. If executing on a distributed Dask cluster, make sure the `nixtlats` library is installed across all the workers.
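One way to confirm this on a running cluster is to ask every worker which packages it can import. The sketch below is an illustration under stated assumptions, not part of `nixtlats`: `missing_packages` and `check_workers` are hypothetical helper names, and `client` is assumed to be an already connected `dask.distributed.Client`, whose `run` method executes a function on every worker.

```python
import importlib.util


def missing_packages(names=("nixtlats", "fugue")):
    """Return the packages from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_workers(client, names=("nixtlats", "fugue")):
    """Run the check on every worker of a connected dask.distributed.Client.

    `client.run` returns one result per worker, keyed by worker address;
    only the workers that are missing something are kept.
    """
    report = client.run(missing_packages, names)
    return {worker: missing for worker, missing in report.items() if missing}
```

An empty dictionary from `check_workers(client)` would suggest that every worker can import both packages.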

In addition to Dask, you'll also need to have [Fugue](https://fugue-tutorials.readthedocs.io/) installed. Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of Spark, Dask and Ray. You can install Fugue for Dask using pip. 

In [None]:
%%capture 
pip install "fugue[dask]"

## Executing on Dask

First, instantiate the `TimeGPT` class. To do this, you'll need a token provided by Nixtla. If you don't have one already, you can request yours [here](https://www.nixtla.io/).

There are different ways of setting the token. Here we'll set it as an environment variable. You can learn more about this [here](https://docs.nixtla.io/docs/faqs#setting-up-your-authentication-token-for-nixtla-sdk).
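As a minimal sketch of the environment-variable approach, the snippet below reads the token from `TIMEGPT_TOKEN` (the variable that `TimeGPT()` defaults to, according to the comment in the instantiation cell) and fails early with a readable message when it is unset. `require_token` is a hypothetical helper, not part of `nixtlats`.

```python
import os


def require_token(var_name="TIMEGPT_TOKEN"):
    """Return the token stored in `var_name`, or raise if it is unset or empty."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell or put it in a .env file"
        )
    return token
```

Calling `require_token()` before instantiating `TimeGPT` makes a missing token show up as a local error instead of a failure partway through a distributed run.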

In [None]:
#| hide
import os
import pandas as pd
from dotenv import load_dotenv
load_dotenv()

In [None]:
from nixtlats import TimeGPT

timegpt = TimeGPT() # defaults to os.environ.get("TIMEGPT_TOKEN")

### Forecast

Next, load a Dask DataFrame. 

In [None]:
import dask.dataframe as dd

dask_df = dd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')
dask_df

Now call the `forecast` method of `TimeGPT`. Here we request a horizon of 12 steps (`h=12`) at hourly frequency (`freq='H'`).

In [None]:
fcst_df = timegpt.forecast(dask_df, h=12, freq='H', id_col='unique_id')
fcst_df.head()

### Forecast with exogenous variables

Exogenous variables or external factors are crucial in time series forecasting as they provide additional information that might influence the prediction. These variables could include holiday markers, marketing spending, weather data, or any other external data that correlate with the time series data you are forecasting.

For example, if you're forecasting ice cream sales, temperature data could serve as a useful exogenous variable. On hotter days, ice cream sales may increase.

To incorporate exogenous variables in TimeGPT, you'll need to pair each point in your time series data with the corresponding external data.
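The pairing amounts to a join on the series identifier and the timestamp. The sketch below shows the idea on plain Python dictionaries so that it runs without a cluster. `attach_exogenous` is a hypothetical helper, and the `ds`, `y`, and `temperature` column names and the toy values are assumptions made for illustration (only `unique_id` appears in this guide). On Dask DataFrames the same result would typically come from a `merge` on those key columns.

```python
def attach_exogenous(series_rows, exog_rows, keys=("unique_id", "ds")):
    """Pair every observation with the exogenous values sharing its key."""
    lookup = {tuple(row[k] for k in keys): row for row in exog_rows}
    paired = []
    for row in series_rows:
        key = tuple(row[k] for k in keys)
        if key not in lookup:
            # An unpaired point would leave a gap in the exogenous columns.
            raise KeyError(f"no exogenous values for {key}")
        extras = {k: v for k, v in lookup[key].items() if k not in keys}
        paired.append({**row, **extras})
    return paired


# Toy values, invented for illustration only.
sales = [{"unique_id": "shop_1", "ds": "2024-07-01 00:00:00", "y": 120.0}]
weather = [{"unique_id": "shop_1", "ds": "2024-07-01 00:00:00", "temperature": 31.5}]
paired = attach_exogenous(sales, weather)
```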

Let's see an example. Notice that you need to load the data as a Dask DataFrame. 

In [None]:
dask_df = dd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
dask_df

To produce forecasts, we also have to supply the future values of the exogenous variables over the forecast horizon. Let's read that dataset. Here we want to predict 24 steps ahead, so each `unique_id` must have 24 future observations.
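A quick sanity check on the future values is to count the rows per series and compare the count with the horizon. The helper below is a hypothetical sketch (`horizon_problems` is not part of `nixtlats`) that works on any iterable of series ids, for instance the `unique_id` column after it has been brought into local memory.

```python
from collections import Counter


def horizon_problems(future_ids, h, expected_ids=()):
    """Map each series id to its future row count when that count is not `h`.

    Ids listed in `expected_ids` but absent from `future_ids` are reported
    with a count of 0.
    """
    counts = Counter(future_ids)
    problems = {uid: n for uid, n in counts.items() if n != h}
    for uid in expected_ids:
        if uid not in counts:
            problems[uid] = 0
    return problems
```

Assuming the column fits in memory, `horizon_problems(future_ex_vars_dask["unique_id"].compute(), h=24)` should return an empty dictionary when every series has exactly 24 future rows.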

In [None]:
future_ex_vars_dask = dd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')
future_ex_vars_dask

Let's call the `forecast` method, passing the future values through `X_df` and requesting 80% and 90% prediction intervals through `level`:

In [None]:
timegpt_fcst_ex_vars_df = timegpt.forecast(df=dask_df, X_df=future_ex_vars_dask, h=24, freq="H", level=[80, 90])
timegpt_fcst_ex_vars_df.head()