# Computing at Scale with Dask

> Run TimeGPT in a distributed fashion on top of Dask

[Dask](https://www.dask.org/get-started) is an open-source parallel computing library for Python. This guide explains how to use `TimeGPT` on top of Dask.

**Outline:** 
1. [Installation](#installation)
2. [Load Your Data](#load-your-data)
3. [Import Dask](#import-dask) 
4. [Use TimeGPT on Dask](#use-timegpt-on-dask)

[![](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/how-to-guides/1_computing_at_scale_with_dask.ipynb)

## Installation 

Install Dask through [Fugue](https://fugue-tutorials.readthedocs.io/). Fugue provides an easy-to-use interface for distributed computing that lets users execute Python code on top of several distributed computing frameworks, including Dask. 

In [None]:
%%capture
%pip install "fugue[dask]"

If executing on a distributed `Dask` cluster, ensure that the `nixtla` library is installed across all the workers.

## Load Your Data

You can load your data as a `pandas` DataFrame. In this tutorial, we will use a dataset that contains hourly electricity prices from different markets. 

In [None]:
import pandas as pd 

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv') 
df.head()

|   | unique_id | ds | y |
|---|---|---|---|
| 0 | BE | 2016-12-01 00:00:00 | 72.0 |
| 1 | BE | 2016-12-01 01:00:00 | 65.8 |
| 2 | BE | 2016-12-01 02:00:00 | 59.99 |
| 3 | BE | 2016-12-01 03:00:00 | 50.69 |
| 4 | BE | 2016-12-01 04:00:00 | 52.58 |
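
Before distributing the data, it can help to confirm that the frame is in the long format shown above, with the columns `unique_id`, `ds`, and `y`. The following is a minimal sketch of such a check, not part of the `nixtla` library: the helper name `check_long_format` is hypothetical, and a small inline frame stands in for the downloaded CSV. Parsing `ds` here only verifies that the timestamps are valid.

```python
import pandas as pd

# Column names used by the dataset in this tutorial.
REQUIRED_COLUMNS = ("unique_id", "ds", "y")


def check_long_format(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: validate the columns and parse `ds` as datetimes."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    out = df.copy()
    # Raises if any value cannot be interpreted as a timestamp.
    out["ds"] = pd.to_datetime(out["ds"])
    # Each (series, timestamp) pair should appear only once.
    if out.duplicated(subset=["unique_id", "ds"]).any():
        raise ValueError("duplicate timestamps found within a series")
    return out


# Stand-in for the first rows of the electricity dataset.
sample = pd.DataFrame({
    "unique_id": ["BE", "BE", "BE"],
    "ds": ["2016-12-01 00:00:00", "2016-12-01 01:00:00", "2016-12-01 02:00:00"],
    "y": [72.0, 65.8, 59.99],
})
checked = check_long_format(sample)
print(checked.dtypes)
```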


## Import Dask

Import Dask and convert the pandas DataFrame to a Dask DataFrame. 

In [None]:
import dask.dataframe as dd

dask_df = dd.from_pandas(df, npartitions=2)
dask_df 

| npartitions=2 | unique_id | ds | y |
|---|---|---|---|
| 0 | string | string | float64 |
| 1800 | ... | ... | ... |
| 3599 | ... | ... | ... |


## Use TimeGPT on Dask 

Using `TimeGPT` on top of `Dask` is almost identical to the non-distributed case. The only difference is that you need to use a `Dask` DataFrame, which we already defined in the previous step. 

First, instantiate the `NixtlaClient` class. 

In [None]:
from nixtla import NixtlaClient

In [None]:
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)
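
As the comment in the cell above notes, the key defaults to the `NIXTLA_API_KEY` environment variable. If you prefer to fail early with a clear message when the key is missing, a small wrapper like the following could be used. This is a sketch only: `resolve_api_key` is a hypothetical helper, not part of the `nixtla` library.

```python
import os


def resolve_api_key(explicit_key=None):
    """Hypothetical helper: prefer an explicit key, else fall back to NIXTLA_API_KEY."""
    key = explicit_key or os.environ.get("NIXTLA_API_KEY")
    if not key:
        raise RuntimeError(
            "Set the NIXTLA_API_KEY environment variable or pass a key explicitly."
        )
    return key


# Intended usage (requires the `nixtla` package and a valid key):
# nixtla_client = NixtlaClient(api_key=resolve_api_key())
```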

Then use any method from the `NixtlaClient` class such as [`forecast`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-forecast) or [`cross_validation`](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-cross-validation).

In [None]:
fcst_df = nixtla_client.forecast(dask_df, h=12)
fcst_df.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...


|   | unique_id | ds | TimeGPT |
|---|---|---|---|
| 0 | FR | 2016-12-31 00:00:00 | 62.130219 |
| 1 | FR | 2016-12-31 01:00:00 | 56.890831 |
| 2 | FR | 2016-12-31 02:00:00 | 52.231552 |
| 3 | FR | 2016-12-31 03:00:00 | 48.888664 |
| 4 | FR | 2016-12-31 04:00:00 | 46.498367 |


In [None]:
cv_df = nixtla_client.cross_validation(dask_df, h=12, n_windows=5, step_size=2)
cv_df.head()

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: H
INFO:nixtla.nixtla_client:Ca

|   | unique_id | ds | cutoff | TimeGPT |
|---|---|---|---|---|
| 0 | FR | 2016-12-30 04:00:00 | 2016-12-30 03:00:00 | 44.893738 |
| 1 | FR | 2016-12-30 05:00:00 | 2016-12-30 03:00:00 | 46.05793 |
| 2 | FR | 2016-12-30 06:00:00 | 2016-12-30 03:00:00 | 48.790077 |
| 3 | FR | 2016-12-30 07:00:00 | 2016-12-30 03:00:00 | 54.397026 |
| 4 | FR | 2016-12-30 08:00:00 | 2016-12-30 03:00:00 | 57.592995 |
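
The first `cutoff` in the output above (2016-12-30 03:00:00) is consistent with a rolling-origin layout in which the last window ends at the final observation and each earlier window is shifted back by `step_size` periods. The sketch below reproduces that arithmetic under two assumptions: the last observed timestamp for `FR` is 2016-12-30 23:00:00 (one hour before the first forecast timestamp shown earlier), and the windows are laid out as just described. `rolling_cutoffs` is a hypothetical helper, not a `nixtla` function.

```python
import pandas as pd


def rolling_cutoffs(last_ts, h, n_windows, step_size, step=pd.Timedelta(hours=1)):
    """Hypothetical helper: cutoff timestamps for rolling-origin cross-validation."""
    last_ts = pd.Timestamp(last_ts)
    return [
        # The final window's cutoff is `h` periods before the last observation;
        # each earlier window is moved back by `step_size` periods.
        last_ts - (h + (n_windows - 1 - i) * step_size) * step
        for i in range(n_windows)
    ]


cutoffs = rolling_cutoffs("2016-12-30 23:00:00", h=12, n_windows=5, step_size=2)
for cutoff in cutoffs:
    print(cutoff)
```

Under these assumptions the five cutoffs fall two hours apart, from 03:00 to 11:00 on 2016-12-30, and each window forecasts the 12 hours that follow its cutoff.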


You can also use exogenous variables with `TimeGPT` on top of `Dask`. To do this, refer to the [Exogenous Variables](https://docs.nixtla.io/docs/exogenous_variables) tutorial, and pass a `Dask` DataFrame wherever that tutorial uses a pandas DataFrame.