## TBATS time series forecasting

TBATS is an acronym for key features of the model:

T: Trigonometric seasonality

B: Box-Cox transformation (a power transformation that makes a skewed, non-normal dependent variable more closely resemble a normal distribution and helps stabilise its variance)

A: ARMA errors

T: Trend

S: Seasonal components
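
To make the Box-Cox step concrete, here is a minimal Python sketch of the standard one-parameter transform and its inverse. The function names `box_cox` and `inv_box_cox` are illustrative and are not part of the TBATS package; the model selects the parameter `lam` itself when the transformation is enabled.

```python
import math

def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a strictly positive value."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)            # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam

def inv_box_cox(z, lam):
    """Inverse transform, mapping a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

values = [1.0, 10.0, 100.0, 1000.0]
# lam = 0 is the log transform, which compresses large values
transformed = [box_cox(v, 0.0) for v in values]
# applying the transform and then its inverse should recover the input
restored = [inv_box_cox(box_cox(v, 0.5), 0.5) for v in values]
```

Forecasts are produced on the transformed scale and mapped back with the inverse, so a poorly chosen parameter can inflate errors on the original scale.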

Load in the ml library and TBATS

In [1]:
\l ml/ml.q
\l ml/init.q
tbats:.p.import[`tbats]`:TBATS

Load in the data

In [None]:
show bikes:update date:"d"$timestamp from ("PFFFFFFBFF";enlist ",")0:`:data/london_merged.csv

Fill in the missing datetimes in the data

In [None]:
show data:([]timestamp:{x<max bikes`timestamp}{x+01:00:00.000}\min bikes`timestamp)lj `timestamp xkey bikes
data:.ml.filltab[data;();`timestamp;::]
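
The same idea can be sketched outside q. The Python below builds a complete hourly grid between the first and last timestamps and carries the last observed value forward into any gaps. It assumes forward filling is the desired behaviour; the helper name `fill_hourly` and the sample values are made up for illustration.

```python
from datetime import datetime, timedelta

def fill_hourly(rows):
    """rows: list of (datetime, value) pairs sorted by time.
    Returns one pair per hour, forward-filling hours with no observation."""
    observed = dict(rows)
    start, end = rows[0][0], rows[-1][0]
    filled, last, t = [], None, start
    while t <= end:
        last = observed.get(t, last)   # keep the previous value when the hour is missing
        filled.append((t, last))
        t += timedelta(hours=1)
    return filled

# Illustrative data: the 02:00 and 03:00 observations are missing
rows = [
    (datetime(2015, 1, 4, 0), 182.0),
    (datetime(2015, 1, 4, 1), 138.0),
    (datetime(2015, 1, 4, 4), 47.0),
]
filled = fill_hourly(rows)
```

A regular grid matters here because the model sees only the sequence of values, so a missing hour would silently shift every later observation relative to the seasonal cycles.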

### Initialise the dataset

The model takes only the series to be forecast as input. The seasonal period lengths (e.g. weekly or yearly cycles) are supplied through a hyperparameter called seasonal_periods.

Hyper-parameters for this model include:

Parameter        | Explanation
-----------------|------------------------
use_box_cox      | Boolean indicating whether to apply the Box-Cox transformation
box_cox_bounds   | Bounds on the Box-Cox parameter, used when use_box_cox is true
use_trend        | Boolean indicating whether to include a trend component
use_damped_trend | Boolean indicating whether to damp the trend component
seasonal_periods | Lengths of the seasonal cycles, measured in observations per cycle, e.g. 7 and 365.25 for weekly and yearly patterns in daily data
use_arma_errors  | Boolean indicating whether to model the residuals with ARMA errors
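
Non-integer periods such as 365.25 are possible because the seasonality is trigonometric. The Python sketch below is a simplified, deterministic illustration of that idea: each seasonal pattern is a sum of cosine and sine pairs. In the actual model these terms evolve over time, and the helper names and coefficients here are hypothetical.

```python
import math

def fourier_terms(t, period, harmonics):
    """Return the (cos, sin) pair for each harmonic j = 1..harmonics at time t."""
    terms = []
    for j in range(1, harmonics + 1):
        angle = 2.0 * math.pi * j * t / period
        terms.append((math.cos(angle), math.sin(angle)))
    return terms

def seasonal_value(t, period, coeffs):
    """Seasonal effect at time t, where coeffs is a list of (a_j, b_j) weights."""
    pairs = fourier_terms(t, period, len(coeffs))
    return sum(a * c + b * s for (a, b), (c, s) in zip(coeffs, pairs))

coeffs = [(1.0, 0.5), (0.2, -0.1)]                # illustrative weights, two harmonics
weekly = [seasonal_value(t, 7, coeffs) for t in range(14)]
yearly_now = seasonal_value(100, 365.25, coeffs)
yearly_later = seasonal_value(100 + 1461, 365.25, coeffs)   # 1461 = 4 * 365.25
```

Because the period appears only inside the angle, it does not need to be a whole number of observations, and a handful of harmonics can stand in for a long cycle.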

In [None]:
model:tbats[`seasonal_periods pykw (24,7, 365.25)]
mdlfit:model[`:fit][-100_data[`cnt]]

Now use the model to forecast the next 100 hourly values, which were held out from training, and calculate the root mean squared log error

In [None]:
preds:mdlfit[`:forecast][`steps pykw 100]`

-1!"The root mean squared log error is ",string .ml.rmsle[preds;-100#data[`cnt]]
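
For reference, the metric can be sketched in a few lines of Python. This assumes the usual log(1 + x) definition of root mean squared log error; the function below is an illustration and is not the toolkit's own implementation.

```python
import math

def rmsle(pred, actual):
    """Root mean squared log error, using log(1 + x) so that zero counts are allowed."""
    if len(pred) != len(actual):
        raise ValueError("pred and actual must have the same length")
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(pred, actual)]
    return math.sqrt(sum(squared) / len(squared))

perfect = rmsle([10.0, 200.0, 3000.0], [10.0, 200.0, 3000.0])
# Each prediction is off by the same ratio, so each contributes about the same error
same_ratio = rmsle([19.0, 199.0], [9.0, 99.0])
```

Because the error is measured on a log scale, it penalises relative rather than absolute differences, which suits count data whose level changes a lot between quiet and busy hours.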

## Amazon Data

In [2]:
show 5#amzndaydata:.ml.dropconstant lower[cols d]xcol d:("DFFFFJJ";(),",")0:`:../mlnotebooks/data/amzn_day.us.txt
-1"\nShape of data: ",a:" x "sv string .ml.shape amzndaydata;

date       open high low  close volume  
----------------------------------------
1997.05.16 1.97 1.98 1.71 1.73  14700000
1997.05.19 1.76 1.77 1.62 1.71  6106800 
1997.05.20 1.73 1.75 1.64 1.64  5467200 
1997.05.21 1.64 1.65 1.38 1.43  18853200
1997.05.22 1.44 1.45 1.31 1.4   11776800

Shape of data: 5170 x 6


In [3]:
show amzndaydata:`ds`y xcol amzndaydata

ds         y    high low  close volume  
----------------------------------------
1997.05.16 1.97 1.98 1.71 1.73  14700000
1997.05.19 1.76 1.77 1.62 1.71  6106800 
1997.05.20 1.73 1.75 1.64 1.64  5467200 
1997.05.21 1.64 1.65 1.38 1.43  18853200
1997.05.22 1.44 1.45 1.31 1.4   11776800
1997.05.23 1.41 1.52 1.33 1.5   15937200
1997.05.27 1.51 1.65 1.46 1.58  8697600 
1997.05.28 1.62 1.64 1.53 1.53  4574400 
1997.05.29 1.54 1.54 1.48 1.51  3472800 
1997.05.30 1.5  1.51 1.48 1.5   2594400 
1997.06.02 1.51 1.53 1.5  1.51  591600  
1997.06.03 1.53 1.53 1.48 1.48  1183200 
1997.06.04 1.48 1.49 1.4  1.42  3080400 
1997.06.05 1.42 1.54 1.38 1.54  5672400 
1997.06.06 1.52 1.71 1.51 1.66  7807200 
1997.06.09 1.66 1.71 1.66 1.69  2352000 
1997.06.10 1.71 1.71 1.53 1.58  5458800 
1997.06.11 1.59 1.6  1.53 1.54  1188000 
1997.06.12 1.58 1.65 1.55 1.6   1632000 
1997.06.13 1.62 1.62 1.58 1.58  693600  
..


In [4]:
// Find the timestamp which splits the data 80/20
split:min[amzndaydata`ds]+.8*max[amzndaydata`ds]-min[amzndaydata`ds]

trainamz:select from amzndaydata where ds<=split
testamz:select from amzndaydata where ds>split

-1!"There are ",string[count trainamz]," datapoints in the training set"
-1!"There are ",string[count testamz]," datapoints in the testing set"

"There are 4134 datapoints in the training set"


"There are 1036 datapoints in the testing set"
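
The split above is by position in time rather than by row count: the cutoff sits 80% of the way between the first and last dates. A minimal Python sketch of the same logic follows, with an illustrative helper name and made-up sample rows.

```python
from datetime import date, timedelta

def split_by_time(rows, train_fraction=0.8):
    """rows: list of (date, value) pairs.
    Rows on or before the cutoff date go to train, later rows to test."""
    times = [t for t, _ in rows]
    start, end = min(times), max(times)
    # date arithmetic drops the fractional part of a day in the cutoff
    cutoff = start + (end - start) * train_fraction
    train = [r for r in rows if r[0] <= cutoff]
    test = [r for r in rows if r[0] > cutoff]
    return train, test

# Ten consecutive illustrative days
rows = [(date(2020, 1, 1) + timedelta(days=i), float(i)) for i in range(10)]
train, test = split_by_time(rows)
```

Splitting on time keeps every test observation strictly after the training data, which is what a forecast evaluation needs. When dates are unevenly spaced the row counts will not be exactly 80/20.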


In [5]:
model:tbats[`seasonal_periods pykw (24,7, 365.25)]
mdlfit:model[`:fit][trainamz`y]


In [7]:
preds:mdlfit[`:forecast][`steps pykw count testamz]`

-1!"The Mean squared error is ",string .ml.mse[preds;testamz`y]

"The Mean squared error is 120640.4"


## Daily Temp

In [8]:
show temp:`ds`y xcol ("PF";enlist",")0:`:data/dailytemp.csv

ds                            y   
----------------------------------
1981.01.01D00:00:00.000000000 20.7
1981.01.02D00:00:00.000000000 17.9
1981.01.03D00:00:00.000000000 18.8
1981.01.04D00:00:00.000000000 14.6
1981.01.05D00:00:00.000000000 15.8
1981.01.06D00:00:00.000000000 15.8
1981.01.07D00:00:00.000000000 15.8
1981.01.08D00:00:00.000000000 17.4
1981.01.09D00:00:00.000000000 21.8
1981.01.10D00:00:00.000000000 20  
1981.01.11D00:00:00.000000000 16.2
1981.01.12D00:00:00.000000000 13.3
1981.01.13D00:00:00.000000000 16.7
1981.01.14D00:00:00.000000000 21.5
1981.01.15D00:00:00.000000000 25  
1981.01.16D00:00:00.000000000 20.7
1981.01.17D00:00:00.000000000 20.6
1981.01.18D00:00:00.000000000 24.8
1981.01.19D00:00:00.000000000 17.7
1981.01.20D00:00:00.000000000 15.5
..


In [9]:
// Find the timestamp which splits the data 80/20
split:min[temp`ds]+.8*max[temp`ds]-min[temp`ds]

traintemp:select from temp where ds<=split
testtemp:select from temp where ds>split

-1!"There are ",string[count traintemp]," datapoints in the training set"
-1!"There are ",string[count testtemp]," datapoints in the testing set"

"There are 2920 datapoints in the training set"


"There are 730 datapoints in the testing set"


In [10]:
model:tbats[`seasonal_periods pykw (24,7, 365.25)]
mdlfit:model[`:fit][traintemp`y]

In [11]:
preds:mdlfit[`:forecast][`steps pykw count testtemp]`

-1!"The root mean squared log error is ",string .ml.rmsle[preds;testtemp`y]

"The root mean squared log error is 0.2659451"
