# TimeGPT

> Unlock the power of accurate predictions and navigate uncertainty with confidence. With TimeGPT, you can access state-of-the-art models to make data-driven decisions without being held back by resource limitations. Whether you're a bank forecasting market trends or a startup predicting product demand, TimeGPT democratizes access to cutting-edge predictive insights, eliminating the need for a dedicated team of machine learning engineers.

## Introduction

Nixtla's TimeGPT is a generative pre-trained model for forecasting time series data. It takes historical time series data as input, optionally along with parameters such as the forecast horizon, and generates forecasts from it. TimeGPT can be used for a range of tasks, including demand forecasting, anomaly detection, and financial forecasting.
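
As a rough illustration of what "historical data plus a forecast horizon" can look like, the sketch below builds a long-format frame using the default column names that appear later in this notebook (`unique_id`, `ds`, `y`). The series names, values, and the `future_timestamps` helper are hypothetical and are not part of the library. The helper only computes the timestamps a forecast with horizon `h` would cover.

```python
import pandas as pd

h = 7       # forecast horizon: number of future steps to predict
freq = "D"  # daily data (assumed for this example)

# Long format: one row per (series, timestamp) pair.
df = pd.DataFrame({
    "unique_id": ["store_1"] * 5 + ["store_2"] * 5,
    "ds": list(pd.date_range("2023-01-01", periods=5, freq=freq)) * 2,
    "y": [10.0, 12.0, 11.0, 13.0, 14.0, 3.0, 4.0, 4.0, 5.0, 6.0],
})

def future_timestamps(group: pd.DataFrame, h: int, freq: str) -> pd.DatetimeIndex:
    """Hypothetical helper: the h timestamps that follow the last observation."""
    last = group["ds"].max()
    # Generate h + 1 points starting at the last observation, then drop it.
    return pd.date_range(last, periods=h + 1, freq=freq)[1:]

future = {uid: future_timestamps(g, h, freq) for uid, g in df.groupby("unique_id")}
```

Each series gets its own set of future timestamps, which is why the identifier column matters when several series share one frame.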

The TimeGPT model "reads" time series data much like the way humans read a sentence – from left to right. It looks at a chunk of past data, which we can think of as "tokens", and predicts what comes next. This prediction is based on patterns the model identifies in past data, much like how a human would predict the end of a sentence based on the beginning.
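
The left-to-right idea can be sketched with a toy example. This is not how TimeGPT is implemented: the "model" below is simply the mean of the most recent window, used as a stand-in to show how a context window slides over the series and how each prediction is fed back in as input. The function names are made up for this illustration.

```python
import numpy as np

def make_windows(values, context_len):
    """Split a 1-D series into (context window, next value) pairs, left to right."""
    values = np.asarray(values, dtype=float)
    n_windows = len(values) - context_len
    contexts = np.stack([values[i:i + context_len] for i in range(n_windows)])
    targets = values[context_len:]
    return contexts, targets

def naive_autoregressive_forecast(values, context_len, h):
    """Predict h steps ahead, appending each prediction to the history.

    The predictor is the mean of the last `context_len` values, a placeholder
    for a real model.
    """
    history = [float(v) for v in values]
    preds = []
    for _ in range(h):
        context = history[-context_len:]
        next_value = float(np.mean(context))
        preds.append(next_value)
        history.append(next_value)
    return preds

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
contexts, targets = make_windows(series, context_len=3)
forecast = naive_autoregressive_forecast(series, context_len=3, h=2)
```

Here `contexts` holds the windows `[1, 2, 3]`, `[2, 3, 4]`, `[3, 4, 5]`, and `targets` holds the value that follows each one.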

The TimeGPT API provides an interface to this model, letting users predict future values from past data. Beyond forecasting, the API also supports related time series tasks such as what-if scenarios and anomaly detection.

![figure](./img/timegpt-arch.png)

In [None]:
#| default_exp distributed.timegpt

In [None]:
#| hide 
%load_ext autoreload
%autoreload 2

In [None]:
#| export
from typing import Any, Dict, List, Optional, Union

import numpy as np
import pandas as pd
from fugue import transform, DataFrame, FugueWorkflow, ExecutionEngine
from fugue.collections.yielded import Yielded
from fugue.constants import FUGUE_CONF_WORKFLOW_EXCEPTION_INJECT
from triad import Schema

from nixtlats.timegpt import _TimeGPT

In [None]:
#| hide
import os

In [None]:
#| exporti
def _cotransform(
    df1: Any,
    df2: Any,
    using: Any,
    schema: Any = None,
    params: Any = None,
    partition: Any = None,
    engine: Any = None,
    engine_conf: Any = None,
    force_output_fugue_dataframe: bool = False,
    as_local: bool = False,
) -> Any:
    dag = FugueWorkflow(compile_conf={FUGUE_CONF_WORKFLOW_EXCEPTION_INJECT: 0})
    src = dag.create_data(df1).zip(dag.create_data(df2), partition=partition)
    tdf = src.transform(
        using=using,
        schema=schema,
        params=params,
        pre_partition=partition,
    )
    tdf.yield_dataframe_as("result", as_local=as_local)
    dag.run(engine, conf=engine_conf)
    result = dag.yields["result"].result  # type:ignore
    if force_output_fugue_dataframe or isinstance(df1, (DataFrame, Yielded)):
        return result
    return result.as_pandas() if result.is_local else result.native  # type:ignore
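
To make the zip-then-transform step in `_cotransform` more concrete, here is a local, pandas-only sketch of the same idea: pair up the partitions of two frames that share a key, then apply a function to each pair. It assumes inner-join behavior (keys missing from either frame are skipped) and does not use fugue. `cotransform_local`, `last_value_plus_exog`, and the `promo` column are hypothetical names for this illustration.

```python
import pandas as pd

def cotransform_local(df1, df2, func, by):
    """Apply func(group1, group2) to every pair of groups that share a key."""
    groups2 = {key: g for key, g in df2.groupby(by)}
    results = []
    for key, g1 in df1.groupby(by):
        if key in groups2:  # skip keys that are absent from df2
            results.append(func(g1, groups2[key]))
    return pd.concat(results, ignore_index=True)

def last_value_plus_exog(hist, fut):
    """Toy per-series function: last observed y plus a future exogenous value."""
    out = fut[["unique_id", "ds"]].copy()
    out["pred"] = hist["y"].iloc[-1] + fut["promo"].to_numpy()
    return out

hist = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"],
    "ds": [1, 2, 1, 2],
    "y": [1.0, 2.0, 10.0, 20.0],
})
fut = pd.DataFrame({"unique_id": ["a", "b"], "ds": [3, 3], "promo": [0.5, 1.0]})

result = cotransform_local(hist, fut, last_value_plus_exog, by="unique_id")
```

In the distributed version, each pair of partitions can be processed on a different worker, which is what makes the pattern useful for many independent series.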

In [None]:
#| exporti
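class _DistributedTimeGPT:

    def __init__(
            self,
            engine: Any = None,
            conf: Any = None,
            **transform_kwargs: Any,
        ):
        # Execution engine, its configuration, and any extra keyword
        # arguments forwarded to the fugue transform calls below.
        self._engine = engine
        self._conf = conf
        self._transform_kwargs = dict(transform_kwargs)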

    def forecast(
            self,
            token: str,
            df: pd.DataFrame,
            h: int,
            freq: Optional[str] = None,    
            id_col: str = 'unique_id',
            time_col: str = 'ds',
            target_col: str = 'y',
            X_df: Optional[pd.DataFrame] = None,
            level: Optional[List[Union[int, float]]] = None,
            finetune_steps: int = 0,
            clean_ex_first: bool = True,
            validate_token: bool = False,
            add_history: bool = False,
            date_features: Union[bool, List[str]] = False,
            date_features_to_one_hot: Union[bool, List[str]] = True,
        ) -> Any:
        kwargs = dict(
            h=h,
            freq=freq,
            id_col=id_col,
            time_col=time_col,
            target_col=target_col,
            level=level,
            finetune_steps=finetune_steps,
            clean_ex_first=clean_ex_first,
            validate_token=validate_token,
            add_history=add_history,
            date_features=date_features,
            date_features_to_one_hot=date_features_to_one_hot,
        )
        # Output schema: the input columns minus the target, plus the forecast columns.
        schema = "*-y+" + str(self._get_output_schema(level))
        if X_df is None:
            return transform(
                df,
                self._forecast,
                params=dict(
                    token=token,
                    kwargs=kwargs,
                ),
                schema=schema,
                partition={"by": id_col},
                engine=self._engine,
                engine_conf=self._conf,
                **self._transform_kwargs,
            )
        else:
            schema = "unique_id:str,ds:str," + str(
                self._get_output_schema(level)
            )
            return _cotransform(
                df,
                X_df,
                self._forecast_X,
                params=dict(
                    token=token,
                    kwargs=kwargs,
                ),
                schema=schema,
                partition={"by": "unique_id"},
                engine=self._engine,
                engine_conf=self._conf,
                **self._transform_kwargs,
            )

    def _forecast(
            self, 
            df: pd.DataFrame, 
            token: str,
            kwargs,
        ) -> pd.DataFrame:
        timegpt = _TimeGPT(token=token)
        return timegpt.forecast(df=df, **kwargs)

    # schema: unique_id:str, ds:str, *
    def _forecast_X(
            self, 
            df: pd.DataFrame, 
            X_df: pd.DataFrame, 
            token: str, 
            kwargs,
        ) -> pd.DataFrame:
        timegpt = _TimeGPT(token=token)
        return timegpt.forecast(df=df, X_df=X_df, **kwargs)

    def _get_output_schema(self, level=None) -> Schema:
        cols: List[Any] = []
        if level is None:
            level = []
        cols.append(('TimeGPT', np.float32))
        # extend (not append) so each (name, type) pair becomes its own column
        cols.extend((f'TimeGPT-lo-{lv}', np.float32) for lv in reversed(level))
        cols.extend((f'TimeGPT-hi-{lv}', np.float32) for lv in level)
        return Schema(cols)
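
The column naming used by `_get_output_schema` can be checked in isolation. The sketch below reproduces only the names, leaving out the types and the `Schema` object. `output_columns` is a hypothetical helper written for this illustration.

```python
from typing import List, Optional, Union

def output_columns(level: Optional[List[Union[int, float]]] = None) -> List[str]:
    """Forecast column names: point forecast, lower bounds, then upper bounds."""
    level = level or []
    cols = ["TimeGPT"]
    # Lower bounds are listed from the widest interval to the narrowest.
    cols.extend(f"TimeGPT-lo-{lv}" for lv in reversed(level))
    # Upper bounds are listed in the order the levels were given.
    cols.extend(f"TimeGPT-hi-{lv}" for lv in level)
    return cols

print(output_columns([80, 90]))
# ['TimeGPT', 'TimeGPT-lo-90', 'TimeGPT-lo-80', 'TimeGPT-hi-80', 'TimeGPT-hi-90']
```

With no `level`, only the point forecast column `TimeGPT` is produced.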

In [None]:
from statsforecast.utils import generate_series

n_series = 4
horizon = 7

series = generate_series(n_series)



In [None]:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Make unique_id a column
series = series.reset_index()
series['unique_id'] = series['unique_id'].astype(str)

# Convert to Spark
sdf = spark.createDataFrame(series)

