# Intermittent Demand Forecasting with TimeGPT 

This tutorial is based on `skforecast`'s tutorial on [Intermittent Demand Forecasting with Skforecast](https://cienciadedatos.net/documentos/py48-intermittent-demand-forecasting.html).

The data used in this example represents the number of clients who visited a store during its opening hours from Monday to Friday, between 7:00 and 20:00 (inclusive).  

This is a work in progress because the results with TimeGPT are worse than expected.

I ran several experiments:

- Using the data as is 

- Using the data with exogenous variables (weekdays and open hours)

Each experiment is evaluated with the mean absolute error (MAE), first over the full test set and then over open hours only.
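
To make the two evaluation scopes concrete, here is a minimal sketch on a made-up week of hourly data (not the store data). The forecast values and the names `toy`, `y_hat`, `mae_all` and `mae_open` are illustrative assumptions. Closed hours are trivially zero, so they pull the all-hours MAE down, which is why the two numbers are reported separately.

```python
import numpy as np
import pandas as pd

# Toy hourly series for one week starting on a Monday; values are made up.
ds = pd.date_range("2012-09-03", periods=24 * 7, freq="h")
toy = pd.DataFrame({"ds": ds})

# Same open-hours rule as in this notebook: Monday to Friday, 7:00 to 20:00 inclusive.
is_open = (toy["ds"].dt.dayofweek < 5) & toy["ds"].dt.hour.between(7, 20)

toy["y"] = np.where(is_open, 100.0, 0.0)     # actual visitors (illustrative)
toy["y_hat"] = np.where(is_open, 80.0, 5.0)  # forecast (illustrative)

abs_err = (toy["y"] - toy["y_hat"]).abs()
mae_all = abs_err.mean()            # averaged over all 168 hours
mae_open = abs_err[is_open].mean()  # averaged over the 70 open hours only

print(mae_all, mae_open)
```

In this toy case the open-hours MAE is higher than the all-hours MAE, the same direction as the results reported later in this notebook.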

## Load data

In [32]:
import pandas as pd 

df = pd.read_csv('https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/master/data/intermittent_demand.csv', sep=',')
df['date_time'] = pd.to_datetime(df['date_time'])
df = df[['date_time', 'users']]
df = df.sort_values(by='date_time')
df.head()

Unnamed: 0,date_time,users
0,2011-01-01 00:00:00,0.0
1,2011-01-01 01:00:00,0.0
2,2011-01-01 02:00:00,0.0
3,2011-01-01 03:00:00,0.0
4,2011-01-01 04:00:00,0.0


In [33]:
df.rename(columns={'date_time':'ds', 'users':'y'}, inplace=True)

In [34]:
len(df['ds'].unique())/24 # number of days in the dataset

731.0
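
Dividing the number of unique timestamps by 24 only gives the number of days if the hourly grid is complete. A minimal sketch of how that could be checked, shown on toy timestamps with one hour deliberately removed (the names `toy_ds`, `expected` and `missing` are illustrative):

```python
import pandas as pd

# Toy timestamps: two full days of hourly data with one hour removed.
full = pd.date_range("2011-01-01", periods=48, freq="h")
toy_ds = pd.Series(full.delete(5))

# Build the complete hourly grid between the first and last timestamp
# and compare it against what is actually present.
expected = pd.date_range(toy_ds.min(), toy_ds.max(), freq="h")
missing = expected.difference(pd.DatetimeIndex(toy_ds))
n_duplicates = int(toy_ds.duplicated().sum())

print(len(missing), n_duplicates, len(expected) / 24)
```

If `missing` is empty and there are no duplicates, the day count from the cell above can be taken at face value.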

## Data visualization

In [35]:
from nixtla import NixtlaClient

In [36]:
nixtla_client = NixtlaClient()

In [37]:
nixtla_client.plot(df, engine='plotly')

## Forecast with TimeGPT

In [38]:
train_df = df[:-24*122]
test_df = df[-24*122:]

In [39]:
nixtla_client.plot(test_df, engine='plotly')

In [40]:
fcst = nixtla_client.forecast(
    df=train_df,
    h = 24*122,
    freq = 'H',
    model='timegpt-1-long-horizon'
    )

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...


In [41]:
fcst['ds'] = pd.to_datetime(fcst['ds'])

In [42]:
nixtla_client.plot(test_df, fcst, engine='plotly')

In [43]:
from utilsforecast.evaluation import evaluate 
from utilsforecast.losses import mae 

In [60]:
result = test_df.merge(fcst, on=['ds'], how='left')
result['unique_id'] = 'StoreA'
result.head()

Unnamed: 0,ds,y,TimeGPT,unique_id
0,2012-09-01 00:00:00,0.0,-1.27617,StoreA
1,2012-09-01 01:00:00,0.0,-6.373388,StoreA
2,2012-09-01 02:00:00,0.0,6.259474,StoreA
3,2012-09-01 03:00:00,0.0,-32.217434,StoreA
4,2012-09-01 04:00:00,0.0,-31.021919,StoreA


In [61]:
evaluate(result, metrics=[mae])

Unnamed: 0,unique_id,metric,TimeGPT
0,StoreA,mae,68.815402


In [62]:
# Results for open hours only (Monday to Friday, 7:00 to 20:00 inclusive)
result['weekday'] = result['ds'].dt.dayofweek
result['open'] = ((result['weekday'] < 5) & result['ds'].dt.hour.between(7, 20)).astype(int)
result_open = result[result['open'] == 1]  # 'open' already implies a weekday
evaluate(result_open[['unique_id', 'ds', 'y', 'TimeGPT']], metrics=[mae])

Unnamed: 0,unique_id,metric,TimeGPT
0,StoreA,mae,110.553706


## Add exogenous variables 

In [72]:
df_ex_vars = df.copy()

# Add day of week (0 = Monday, 6 = Sunday)
df_ex_vars['weekday'] = df_ex_vars['ds'].dt.dayofweek

# Add open-hours flag (Monday to Friday, 7:00 to 20:00 inclusive)
df_ex_vars['open'] = ((df_ex_vars['weekday'] < 5) & df_ex_vars['ds'].dt.hour.between(7, 20)).astype(int)

#df_ex_vars['month'] = df_ex_vars['ds'].dt.month # add month as a feature

df_ex_vars.head()

Unnamed: 0,ds,y,weekday,open,month
0,2011-01-01 00:00:00,0.0,5,0,1
1,2011-01-01 01:00:00,0.0,5,0,1
2,2011-01-01 02:00:00,0.0,5,0,1
3,2011-01-01 03:00:00,0.0,5,0,1
4,2011-01-01 04:00:00,0.0,5,0,1


In [73]:
train_ex_vars = df_ex_vars[:-24*122]
test_ex_vars = df_ex_vars[-24*122:]

In [83]:
# Create future values for the external variables
future_ex_vars = test_ex_vars[['ds', 'weekday', 'open']]
#future_ex_vars = test_ex_vars[['ds', 'weekday', 'open', 'month']] # use this when month is added as a feature
future_ex_vars.head()

Unnamed: 0,ds,weekday,open
14616,2012-09-01 00:00:00,5,0
14617,2012-09-01 01:00:00,5,0
14618,2012-09-01 02:00:00,5,0
14619,2012-09-01 03:00:00,5,0
14620,2012-09-01 04:00:00,5,0


In [84]:
fcst_ex_vars = nixtla_client.forecast(
    df=train_ex_vars,
    X_df=future_ex_vars,
    h = 24*122,
    freq = 'H',
    model='timegpt-1-long-horizon', 
    finetune_steps=100
    )

INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Using the following exogenous variables: weekday, open
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...


In [85]:
fcst_ex_vars['ds'] = pd.to_datetime(fcst_ex_vars['ds']) 

In [86]:
nixtla_client.plot(test_ex_vars, fcst_ex_vars, engine='plotly')

In [87]:
res_ex_vars = test_ex_vars.merge(fcst_ex_vars, on=['ds'], how='left')
res_ex_vars['unique_id'] = 'StoreA'

In [88]:
res_open = res_ex_vars[res_ex_vars['open'] == 1] # only evaluate open hours ('open' already implies a weekday)
#res_open = res_ex_vars # evaluate all days and hours 
res_open.head()

Unnamed: 0,ds,y,weekday,open,month,TimeGPT,unique_id
55,2012-09-03 07:00:00,43.0,0,1,9,460.21727,StoreA
56,2012-09-03 08:00:00,127.0,0,1,9,630.732849,StoreA
57,2012-09-03 09:00:00,229.0,0,1,9,377.450821,StoreA
58,2012-09-03 10:00:00,359.0,0,1,9,234.769424,StoreA
59,2012-09-03 11:00:00,459.0,0,1,9,249.103943,StoreA


In [90]:
evaluate(res_open[['unique_id', 'ds', 'y', 'TimeGPT']], metrics=[mae])

Unnamed: 0,unique_id,metric,TimeGPT
0,StoreA,mae,122.485637


### Results 

**Only evaluating open hours** 

- MAE: 110.55 - no exogenous variables 

- MAE: 125.76 - exogenous variables, no finetuning 

- MAE: 125.13 - exogenous variables, 50 finetuning steps 

- MAE: 122.48 - exogenous variables, 100 finetuning steps 

- MAE: 124.21 - exogenous variables, 100 finetuning steps, MAE as finetune loss

- MAE: 153.98 - exogenous variables, 100 finetuning steps, month as feature

**Evaluating all hours**

- MAE: 68.81 - no exogenous variables

- MAE: 64.53 - exogenous variables, 100 finetuning steps

**Important considerations**

- I don't think this is an example of intermittent data, at least not in the conventional sense: the zeros come from the store being closed, not from sporadic demand.

- This is closer to a problem of irregular timestamps, exogenous variables, and long-horizon forecasting.

- The open hours used here don't match those in the skforecast tutorial.
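
One way to probe the first point is the average demand interval (ADI): the number of periods divided by the number of periods with non-zero demand. A commonly cited rule of thumb treats an ADI above roughly 1.32 as intermittent. The sketch below uses a made-up week in which every open hour has visitors, not the store data, and `average_demand_interval` is a hypothetical helper written for this illustration:

```python
import numpy as np
import pandas as pd

def average_demand_interval(y):
    """Hypothetical helper: periods divided by periods with non-zero demand."""
    y = np.asarray(y, dtype=float)
    nonzero = np.count_nonzero(y)
    return np.inf if nonzero == 0 else len(y) / nonzero

# Toy week starting on a Monday: demand is zero only when the store is closed.
ds = pd.Series(pd.date_range("2012-09-03", periods=24 * 7, freq="h"))
is_open = (ds.dt.dayofweek < 5) & ds.dt.hour.between(7, 20)
y = np.where(is_open, 100.0, 0.0)  # illustrative values

adi_all = average_demand_interval(y)            # all hours
adi_open = average_demand_interval(y[is_open])  # open hours only

print(adi_all, adi_open)
```

In this toy case the series looks intermittent over all hours but not once closed hours are excluded. Running the same check on the real `y` column, restricted to open hours, would show whether that also holds for the store data.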