# How to Forecast on Dask
> Run TimeGPT in a distributed fashion on top of Dask.

`TimeGPT` works on top of Spark, Dask, and Ray through Fugue. `TimeGPT` will read the input DataFrame and use the corresponding engine. For example, if the input is a Dask DataFrame, TimeGPT will use the existing Dask session to run the forecast.


[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/how-to-guides/0_distributed_fcst_dask.ipynb)

## Installation

[Dask](https://www.dask.org/get-started) is an open source parallel computing library for Python. As long as Dask is installed and configured, `TimeGPT` will be able to use it. If executing on a distributed Dask cluster, make sure the `nixtlats` library is installed across all the workers.
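One way to confirm that every worker can import the library is to run a small check on each worker with `Client.run` from `dask.distributed`. The following is a minimal sketch rather than part of the original guide: the helpers `package_available`, `workers_missing_package`, and `check_cluster` are hypothetical names, and the scheduler address is whatever your own cluster exposes.

```python
import importlib.util


def package_available(name: str = "nixtlats") -> bool:
    """Return True if `name` can be imported in the current process."""
    return importlib.util.find_spec(name) is not None


def workers_missing_package(status: dict) -> list:
    """Given {worker_address: bool}, list the workers where the package is missing."""
    return sorted(addr for addr, ok in status.items() if not ok)


def check_cluster(scheduler_address: str) -> list:
    """Run the import check on every worker of an existing Dask cluster."""
    from dask.distributed import Client  # requires the `distributed` package

    with Client(scheduler_address) as client:
        # Client.run executes the function once on each worker and
        # returns a dict keyed by worker address.
        status = client.run(package_available)
    return workers_missing_package(status)
```

An empty list from `check_cluster(...)` would suggest that all workers can import `nixtlats`; any address in the list points to a worker that still needs the package installed.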

In addition to Dask, you'll also need to have [Fugue](https://fugue-tutorials.readthedocs.io/) installed. Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of Spark, Dask and Ray. You can install Fugue for Dask using pip. 

In [None]:
%%capture 
pip install "fugue[dask]"

## Executing on Dask

First, instantiate the `NixtlaClient` class. To do this, you will need an API key provided by Nixtla. If you don't have one already, please request yours [here](https://docs.nixtla.io/).

There are different ways to set your API key. Here, we will set it up as an environment variable. Please refer to this [tutorial](https://docs.nixtla.io/docs/setting_up_your_authentication_api_key) to learn more.
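As a minimal sketch of the environment-variable approach, the key can be read from the environment with a clear error when it is missing. The helper `read_api_key` is hypothetical, and the variable name `NIXTLA_API_KEY` is an assumption here; check the tutorial linked above for the name your installed version reads.

```python
import os


def read_api_key(var_name: str = "NIXTLA_API_KEY") -> str:
    """Fetch the API key from the environment, failing early if it is absent."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


# For illustration only: set a placeholder value in this process.
# In practice, export the variable in your shell or load it from a .env file.
os.environ.setdefault("NIXTLA_API_KEY", "my-placeholder-key")
api_key = read_api_key()
```

The returned value can also be passed explicitly, for example `NixtlaClient(api_key=api_key)`.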

In [None]:
#| hide
from dotenv import load_dotenv

load_dotenv()

True

In [None]:
from nixtlats import NixtlaClient

nixtla_client = NixtlaClient()  # defaults to os.environ.get("NIXTLA_API_KEY")

### Forecast

Next, load the data using pandas and convert it to a Dask DataFrame. 

In [None]:
import pandas as pd 

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')
df.head()

Unnamed: 0,unique_id,ds,y
0,BE,2016-12-01 00:00:00,72.0
1,BE,2016-12-01 01:00:00,65.8
2,BE,2016-12-01 02:00:00,59.99
3,BE,2016-12-01 03:00:00,50.69
4,BE,2016-12-01 04:00:00,52.58
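
The preview shows the long format that `forecast` expects by default: a series identifier (`unique_id`), a timestamp (`ds`), and the target (`y`). A quick sanity check before converting to Dask can catch schema problems early. This is only a sketch; `check_input_frame` is a hypothetical helper, and the sample rows are copied from the preview above.

```python
import pandas as pd

REQUIRED_COLUMNS = ["unique_id", "ds", "y"]


def check_input_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Validate the default column layout and parse timestamps."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    out = df.copy()
    out["ds"] = pd.to_datetime(out["ds"])
    # A series should not contain the same timestamp twice.
    if out.duplicated(subset=["unique_id", "ds"]).any():
        raise ValueError("Duplicate timestamps found within a series")
    return out


sample = pd.DataFrame(
    {
        "unique_id": ["BE", "BE", "BE"],
        "ds": ["2016-12-01 00:00:00", "2016-12-01 01:00:00", "2016-12-01 02:00:00"],
        "y": [72.0, 65.8, 59.99],
    }
)
checked = check_input_frame(sample)
print(checked.dtypes)
```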


In [None]:
import dask.dataframe as dd

dask_df = dd.from_pandas(df, npartitions=2)


Now call the `forecast` method of `NixtlaClient`.

In [None]:
fcst_df = nixtla_client.forecast(dask_df, h=12, freq='H')
fcst_df.head()

INFO:nixtlats.nixtla_client:Validating inputs...
INFO:nixtlats.nixtla_client:Preprocessing dataframes...
INFO:nixtlats.nixtla_client:Calling Forecast Endpoint...


Unnamed: 0,unique_id,ds,TimeGPT
0,FR,2016-12-31 00:00:00,62.130219
1,FR,2016-12-31 01:00:00,56.890831
2,FR,2016-12-31 02:00:00,52.231552
3,FR,2016-12-31 03:00:00,48.888664
4,FR,2016-12-31 04:00:00,46.498367
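
With `h=12` and an hourly frequency, the forecast covers the 12 hours that follow the last observed timestamp of each series. The output above starts at 2016-12-31 00:00:00, which implies that the last observation is at 2016-12-30 23:00:00. The sketch below reproduces the forecast timestamps under that assumption; it uses plain pandas and does not call the API.

```python
import pandas as pd

h = 12
last_observed = pd.Timestamp("2016-12-30 23:00:00")  # assumed from the output above

# One timestamp per forecast step, each one hour after the previous.
future_ds = pd.DatetimeIndex(
    [last_observed + pd.Timedelta(hours=step) for step in range(1, h + 1)]
)
print(future_ds[0], future_ds[-1])
```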


### Forecast with exogenous variables

Exogenous variables or external factors are crucial in time series forecasting as they provide additional information that might influence the prediction. These variables could include holiday markers, marketing spending, weather data, or any other external data that correlate with the time series data you are forecasting.

For example, if you're forecasting ice cream sales, temperature data could serve as a useful exogenous variable. On hotter days, ice cream sales may increase.

To incorporate exogenous variables in TimeGPT, you'll need to pair each point in your time series data with the corresponding external data.

Let's see an example. First we will load the data using `pandas` and convert it to a Dask DataFrame.

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df.head()

Unnamed: 0,unique_id,ds,y,Exogenous1,Exogenous2,day_0,day_1,day_2,day_3,day_4,day_5,day_6
0,BE,2016-12-01 00:00:00,72.0,61507.0,71066.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
1,BE,2016-12-01 01:00:00,65.8,59528.0,67311.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2,BE,2016-12-01 02:00:00,59.99,58812.0,67470.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
3,BE,2016-12-01 03:00:00,50.69,57676.0,64529.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
4,BE,2016-12-01 04:00:00,52.58,56804.0,62773.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0


In [None]:
dask_df = dd.from_pandas(df, npartitions=2)

To produce forecasts, we also need the future values of the exogenous variables over the forecast horizon. Let's read this dataset. Here we want to predict 24 steps ahead, so each `unique_id` must have 24 future observations.

In [None]:
future_ex_vars = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')
future_ex_vars.head()

Unnamed: 0,unique_id,ds,Exogenous1,Exogenous2,day_0,day_1,day_2,day_3,day_4,day_5,day_6
0,BE,2016-12-31 00:00:00,64108.0,70318.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,BE,2016-12-31 01:00:00,62492.0,67898.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
2,BE,2016-12-31 02:00:00,61571.0,68379.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
3,BE,2016-12-31 03:00:00,60381.0,64972.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,BE,2016-12-31 04:00:00,60298.0,62900.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [None]:
future_ex_vars_dask = dd.from_pandas(future_ex_vars, npartitions=2)
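
Because the horizon is 24 steps, the future frame should contain exactly 24 rows per `unique_id` and the same exogenous columns as the historical data. A small pandas check can verify this before the frames are handed to Dask. This is a sketch with a hypothetical helper, `check_future_frame`, and toy values (a horizon of 2 and one exogenous column); it is not part of the `nixtlats` API.

```python
import pandas as pd


def check_future_frame(hist: pd.DataFrame, future: pd.DataFrame, h: int) -> None:
    """Raise ValueError if `future` does not match `hist` for a horizon of `h`."""
    base = {"unique_id", "ds", "y"}
    exog_hist = [c for c in hist.columns if c not in base]
    exog_future = [c for c in future.columns if c not in base]
    if set(exog_hist) != set(exog_future):
        raise ValueError(f"Exogenous columns differ: {exog_hist} vs {exog_future}")
    # Every series present in the future frame must have exactly h rows.
    counts = future.groupby("unique_id").size()
    bad = counts[counts != h]
    if not bad.empty:
        raise ValueError(f"Series without exactly {h} future rows: {bad.to_dict()}")
    # Every historical series must appear in the future frame.
    missing_ids = set(hist["unique_id"]) - set(future["unique_id"])
    if missing_ids:
        raise ValueError(f"Series with no future rows: {sorted(missing_ids)}")


# Toy data for illustration only.
hist = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B"],
    "ds": pd.to_datetime(["2016-12-01 00:00", "2016-12-01 01:00"] * 2),
    "y": [1.0, 2.0, 3.0, 4.0],
    "Exogenous1": [10.0, 11.0, 12.0, 13.0],
})
future = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B"],
    "ds": pd.to_datetime(["2016-12-01 02:00", "2016-12-01 03:00"] * 2),
    "Exogenous1": [14.0, 15.0, 16.0, 17.0],
})
check_future_frame(hist, future, h=2)  # passes silently
```

For the datasets in this guide, the equivalent call would be `check_future_frame(df, future_ex_vars, h=24)`.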

Let's call the `forecast` method, adding this information:

In [None]:
fcst_ex_vars_df = nixtla_client.forecast(df=dask_df, X_df=future_ex_vars_dask, h=24, freq="H", level=[80, 90])
fcst_ex_vars_df.head()

INFO:nixtlats.nixtla_client:Validating inputs...
INFO:nixtlats.nixtla_client:Preprocessing dataframes...
INFO:nixtlats.nixtla_client:Using the following exogenous variables: Exogenous1, Exogenous2, day_0, day_1, day_2, day_3, day_4, day_5, day_6
INFO:nixtlats.nixtla_client:Calling Forecast Endpoint...


Unnamed: 0,unique_id,ds,TimeGPT,TimeGPT-lo-90,TimeGPT-lo-80,TimeGPT-hi-80,TimeGPT-hi-90
0,FR,2016-12-31 00:00:00,59.391552,54.471115,56.130394,62.652709,64.311988
1,FR,2016-12-31 01:00:00,60.184393,56.167005,56.778589,63.590196,64.201781
2,FR,2016-12-31 02:00:00,58.129127,53.554694,55.235126,61.023128,62.703559
3,FR,2016-12-31 03:00:00,53.825965,46.310026,50.664494,56.987436,61.341905
4,FR,2016-12-31 04:00:00,47.694177,38.219029,42.945387,52.442967,57.169325
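
The `level=[80, 90]` argument adds a lower and an upper bound for each level, named `TimeGPT-lo-<level>` and `TimeGPT-hi-<level>` in the output above. The short sketch below, using the first two rows printed above, shows how the column names relate to the levels and checks that the intervals are nested. The helper `interval_columns` is hypothetical and only mirrors the naming seen in this output.

```python
import pandas as pd


def interval_columns(levels, model="TimeGPT"):
    """Column names ordered from the widest lower bound to the widest upper bound."""
    lows = [f"{model}-lo-{lv}" for lv in sorted(levels, reverse=True)]
    highs = [f"{model}-hi-{lv}" for lv in sorted(levels)]
    return lows + [model] + highs


# First two forecast rows, copied from the output above.
rows = pd.DataFrame({
    "TimeGPT": [59.391552, 60.184393],
    "TimeGPT-lo-90": [54.471115, 56.167005],
    "TimeGPT-lo-80": [56.130394, 56.778589],
    "TimeGPT-hi-80": [62.652709, 63.590196],
    "TimeGPT-hi-90": [64.311988, 64.201781],
})

ordered = rows[interval_columns([80, 90])]
# If the intervals are nested, each row is non-decreasing from left to right.
nested = (ordered.diff(axis=1).iloc[:, 1:] >= 0).all(axis=1)
print(nested.tolist())  # [True, True]
```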
