# GluonTS

Gluon Time Series (GluonTS) is an open source deep learning API developed by Amazon and Microsoft which allows devolopers to easily build deep-learning based machine learning models using Apache MXNet.



In [1]:
evalpred:.p.import[`gluonts.evaluation.backtest]`:make_evaluation_predictions
trainer:.p.import[`gluonts.trainer]`:Trainer
deepar:.p.import[`gluonts.model.deepar]
pd:.p.import[`pandas]
pylst:.p.import[`builtins][`:list]

\l ml/ml.q
\l ml/init.q

INFO:root:Using CPU


No shared object files for cutils.q, only q implementations available


## Daily Temp

Load in the required dataset.

The below dataset contains the mimimum daily temperatures reached in Melbourne during the period from January 1981 - Decemeber 1990.

Any null values are deleted from the dataset.

In [1]:
show temp:`date`temp xcol ("DF";enlist",")0:`:../data/dailytemp.csv
temp:delete from temp where temp =0N

[0;31m../data/dailytemp.csv. OS reports: No such file or directory[0m: [0;31m../data/dailytemp.csv. OS reports: No such file or directory[0m

The dataset is then split into training and testing datasets (80/20). The dataset is split by date to ensure that no time elakage occurs

In [15]:
// Find the timestamp which splits the data 80/20
split:min[temp`date]+.8*max[temp`date]-min[temp`date]

traintemp:select from temp where date<=split
testtemp:select from temp where date>split

-1!"There is ",string[count traintemp]," datapoints in he training set"
-1!"There is ",string[count testtemp]," datapoints in he testing set"

"There is 2917 datapoints in he training set"


"There is 730 datapoints in he testing set"


In order for data to be passed through a GLuonTS model, the data must be in a specific format. This requires the data to have the following columns:

Column          | Explanation
----------------|-------------------------------
target          | a list of target values
start           | The start of the time series in a pandas timestamp format

### Preprocessing the Training data

In [4]:
start:pd[`:Timestamp]["j"$("d"$first traintemp`date)-1970.01.01;`freq pykw "1H";`unit pykw "D"]`
tab:([]target:enlist traintemp`temp;start:enlist start)

This table must then be converted to a list of dictionaries in python

In [5]:
gltab:.ml.tab2df tab
gltab:gltab[`:to_dict][`orient pykw `records]

GluonTS uses the model `ListDataset` to convert these dictionaries into a format that can be passed to the model

In [6]:
lstdata:.p.import[`gluonts.dataset.common]`:ListDataset

datalst:lstdata[gltab;`freq pykw "1D"]

### Configuring an estimator

GlounTS comes with a number of prebuilt models with most models returning a probability distribution. These estimators include but are not limited to:

- Simple Feedforward 
- Deep Auto Regressors
- Reoccuccrent Neural Networks

These models include required fields 

 `freq`: The  frequency of the timeseries in the dataset
 
 `prediction_length` : The smount of datapoints the model is to predict i.e the length of the test data
 
 An optioned field `trainer` is also available, which allows you to alter the parameters of the configured model, such as number of epochs, training rate etc

In [7]:
model:deepar[`:DeepAREstimator][`freq pykw "1D";`prediction_length pykw count testtemp;
    `trainer pykw trainer[`epochs pykw 1]]

INFO:root:Using CPU


### Training the model

After constructing an estimator with the appropriate hyperparameters, the estimator is then fitted on the training data, which creates the `Predictor` in GluonTS parlance.

In [8]:
\ts predict:model[`:train][`training_data pykw datalst]

INFO:root:Start model training


learning rate from ``lr_scheduler`` has been overwritten by ``learning_rate`` in optimizer.


INFO:root:Epoch[0] Learning rate is 0.001
  0%|          | 0/50 [00:00<?, ?it/s]INFO:root:Number of parameters in DeepARTrainingNetwork: 25884
100%|██████████| 50/50 [01:31<00:00,  1.82s/it, avg_epoch_loss=2.87]
INFO:root:Epoch[0] Elapsed time 91.134 seconds
INFO:root:Epoch[0] Evaluation metric 'epoch_loss'=2.867600
INFO:root:Loading parameters from best epoch (0)
INFO:root:Final loss: 2.867599720954895 (occurred at epoch 0)
INFO:root:End model training


92477 4196144


### Forecasting

The testing data must also be preprocessed using the same steps outlined for the training data

In [9]:
tst:([]target:enlist temp`cnt;start: start)
tst:.ml.tab2df tst
tsttab:tst[`:to_dict][`orient pykw `records]
datalsttst:lstdata[tsttab;`freq pykw "1D"]

This preprcoessed data can then be passed to the predictor, utilising the predict attribute of the model. This must then be converted to a list. 

This returns a glounts object which contains attributes such as the frequency of the timeseries, along with the forecasted predictions

In [13]:
preds:pylst[predict[`:predict][datalsttst]]`
print preds

[gluonts.model.forecast.SampleForecast(freq="1D", info=None, item_id=None, samples=numpy.array([[-1.2084250267108132e-10, 3.12442398342494e-11, -2.101748462768338e-11, -2.496576383126392e-11, 3.6094478100823224e-11, 2.871905491197424e-10, 1.0578996012533537e-10, 1.2786748049276042e-10, 6.9389580886758395e-12, -8.163602432942341e-11, 5.5544138732877e-11, 1.0178761306045558e-10, 9.379572812884263e-11, -1.6927305390712633e-10, -3.576122037385332e-11, 1.0612476175619889e-10, 3.934761370893991e-12, 1.0069680506097356e-10, 1.0702431302300752e-10, -2.3611609195617334e-11, -7.368403109886401e-11, -3.144179014369364e-11, -5.246143450210461e-11, -4.321891802772093e-11, -2.977843666096547e-11, -4.1664480025793704e-12, -1.3422340496005436e-11, 1.6856026296974136e-10, 7.243356955899394e-12, -1.8454038644422077e-11, -9.529278408026975e-12, 5.52797911612668e-12, -3.4697442835573966e-11, -7.40935438003909e-12, -1.2243002445178064e-10, 4.9006996377665146e-11, 1.0007693979074972e-10, 2.230176980588805e-

[0;31mcall: bad argument type for built-in operation[0m: [0;31mcall: bad argument type for built-in operation[0m

In order to extract the predicted forecasts, the `samples` attribute is extracted.

Multiple sample paths are created giving various "realisations" of the future predictions, the average of all these paths are used as the forecasted predictions

In [18]:
show forecast_paths:(.p.wrap[preds[0]])[`:samples]`

-1!"There are ",string[count[forecast_paths]]," sample paths"
-1!"Each path has ",string[count[forecast_paths[0]]]," forecasted predictions"

forecasts:avg forecast_paths

-1.208425e-10 3.124424e-11  -2.101748e-11 -2.496576e-11 3.609448e-11  2.87190..
-5.48702e-12  1.143946e-10  2.187591e-11  7.773288e-11  4.637389e-12  -9.4559..
-2.53114e-12  -2.864163e-11 1.734085e-11  8.52451e-11   -3.177545e-11 -4.6533..
3.046141e-11  -1.662797e-11 6.734677e-11  -1.801338e-11 6.138102e-11  -1.1921..
3.172456e-10  8.103108e-11  -1.276152e-10 -4.905574e-11 -4.813729e-11 -2.9404..
-1.401817e-10 2.813933e-11  1.060771e-11  1.788246e-11  -1.488811e-10 -3.2833..
1.023125e-11  -5.183424e-12 -1.220506e-10 -5.125347e-11 4.963952e-11  2.59423..
1.062037e-10  -1.279667e-10 2.002275e-11  6.812319e-12  -1.402176e-11 8.05180..
-5.063393e-12 -3.777783e-11 9.253807e-11  -9.292339e-12 -6.454165e-11 -2.5349..
-4.827754e-11 7.033584e-11  4.025185e-11  -3.062474e-11 -3.661736e-11 1.75472..
-9.330168e-11 -9.166069e-11 1.23243e-10   6.098089e-12  4.556686e-11  4.30965..
-7.97318e-11  6.632414e-11  -3.375933e-11 3.649008e-12  5.403348e-11  2.55401..
7.495097e-11  1.622116e-10  2.16302e-10 

"There are 100 sample paths"


"Each path has 730 forecasted predictions"


In [22]:
-1!"The root mean squared loss error is " ,string .ml.rmsle[testtemp`temp;forecasts]

"The root mean squared loss error is 2.488439"


### Bikes

In [11]:
show data:1000#update date:"d"$timestamp from ("PFFFFFFBFF";enlist ",")0:`:data/london_merged.csv

timestamp                     cnt  t1  t2   hum  wind_speed weather_code is_h..
-----------------------------------------------------------------------------..
2015.01.04D00:00:00.000000000 182  3   2    93   6          3            0   ..
2015.01.04D01:00:00.000000000 138  3   2.5  93   5          1            0   ..
2015.01.04D02:00:00.000000000 134  2.5 2.5  96.5 0          1            0   ..
2015.01.04D03:00:00.000000000 72   2   2    100  0          1            0   ..
2015.01.04D04:00:00.000000000 47   2   0    93   6.5        1            0   ..
2015.01.04D05:00:00.000000000 46   2   2    93   4          1            0   ..
2015.01.04D06:00:00.000000000 51   1   -1   100  7          4            0   ..
2015.01.04D07:00:00.000000000 75   1   -1   100  7          4            0   ..
2015.01.04D08:00:00.000000000 131  1.5 -1   96.5 8          4            0   ..
2015.01.04D09:00:00.000000000 301  2   -0.5 100  9          3            0   ..
2015.01.04D10:00:00.000000000 528  3   -

In [13]:
// Find the timestamp which splits the data 80/20
split:min[data`timestamp]+.8*max[data`timestamp]-min[data`timestamp]

train:select from data where timestamp<=split
test:select from data where timestamp>split

-1!"There is ",string[count train]," datapoints in he training set"
-1!"There is ",string[count test]," datapoints in he testing set"

"There is 800 datapoints in he training set"


"There is 200 datapoints in he testing set"


In [90]:
start:pd[`:Timestamp]["j"$("d"$first data`timestamp)-1970.01.01;`freq pykw "1H";`unit pykw "D"]`
tab:.ml.tab2df ([]target:enlist train`cnt;start:enlist start)
newt:tab[`:to_dict][`orient pykw `records]

In [93]:
datalst:lstdata[newt;`freq pykw "1H"]

In [19]:
model:deepar[`:DeepAREstimator][`freq pykw "1H";`prediction_length pykw count test;`trainer pykw trainer[`epochs pykw 10]]

INFO:root:Using CPU


In [28]:
\ts predict:model[`:train][`training_data pykw datalst]

INFO:root:Start model training


learning rate from ``lr_scheduler`` has been overwritten by ``learning_rate`` in optimizer.


INFO:root:Epoch[0] Learning rate is 0.001
  0%|          | 0/50 [00:00<?, ?it/s]INFO:root:Number of parameters in DeepARTrainingNetwork: 27644
100%|██████████| 50/50 [00:17<00:00,  2.87it/s, avg_epoch_loss=7.61]
INFO:root:Epoch[0] Elapsed time 17.402 seconds
INFO:root:Epoch[0] Evaluation metric 'epoch_loss'=7.610264
INFO:root:Epoch[1] Learning rate is 0.001
100%|██████████| 50/50 [00:15<00:00,  3.23it/s, avg_epoch_loss=6.68]
INFO:root:Epoch[1] Elapsed time 15.486 seconds
INFO:root:Epoch[1] Evaluation metric 'epoch_loss'=6.678459
INFO:root:Epoch[2] Learning rate is 0.001
100%|██████████| 50/50 [00:16<00:00,  3.07it/s, avg_epoch_loss=6.34]
INFO:root:Epoch[2] Elapsed time 16.272 seconds
INFO:root:Epoch[2] Evaluation metric 'epoch_loss'=6.340207
INFO:root:Epoch[3] Learning rate is 0.001
100%|██████████| 50/50 [00:16<00:00,  3.12it/s, avg_epoch_loss=6.19]
INFO:root:Epoch[3] Elapsed time 16.010 seconds
INFO:root:Epoch[3] Evaluation metric 'epoch_loss'=6.186275
INFO:root:Epoch[4] Learning rat

165046 4196144


In [29]:
start:pd[`:Timestamp]["j"$("d"$first data`timestamp)-1970.01.01;`freq pykw "1H";`unit pykw "D"]`
tst:([]target:enlist data`cnt;start: start)
tst:.ml.tab2df tst
tsttab:tst[`:to_dict][`orient pykw `records]
datalsttst:lstdata[tsttab;`freq pykw "1H"]

In [89]:
enlist flip tab

target                                                                       ..
-----------------------------------------------------------------------------..
20.7 17.9 18.8 14.6 15.8 15.8 15.8 17.4 21.8 20 16.2 13.3 16.7 21.5 25 20.7 2..


In [56]:
preds:.p.import[`builtins][`:list][predict[`:predict][datalsttst]]

In [52]:
n:cc`
n2:n[0]
show n3:.p.wrap[n2][`:samples]`

1512.687 1247.039 864.1685 709.62   652.8145 459.6572 423.0172 344.2044 209.4..
1491.692 1311.47  808.2672 686.1235 565.0466 444.9091 369.0193 317.3401 229.9..
1495.944 1318.77  891.6616 694.3107 573.1056 448.6213 366.9089 285.6498 216.4..
1626.161 1409.544 896.2545 723.6349 609.601  464.1082 396.0421 315.3875 200.4..
1534.711 1528.774 812.0159 598.6399 513.17   471.3798 415.2264 322.7203 231.0..
1571.215 1352.338 853.0047 660.5516 552.9806 533.4406 436.4039 289.4491 252.0..
1583.13  1437.619 875.9233 703.0258 545.894  408.5998 320.732  272.9536 192.7..
1425.958 1244.505 887.532  673.4818 548.6979 417.5376 369.9981 318.7295 230.6..
1586.581 1317.967 990.7119 771.2639 585.2097 463.734  433.5912 338.2545 225.0..
1708.496 1567.836 994.4071 723.576  609.9359 463.9536 427.3461 335.781  233.4..
1523.728 1297.093 863.3671 724.3539 565.9529 464.8717 455.4143 342.7518 248.7..
1687.237 1511.437 1065.367 823.0226 615.8867 473.0256 389.1056 342.9338 236.4..
1575.044 1413.005 893.7167 718.7091 582.

In [61]:
.ml.rmse[test`cnt;avg n3]


1229.625


In [32]:
print cc

[gluonts.model.forecast.SampleForecast(freq="1H", info=None, item_id=None, samples=numpy.array([[1512.6865234375, 1247.0391845703125, 864.16845703125, 709.6199951171875, 652.814453125, 459.6572265625, 423.0172119140625, 344.20440673828125, 209.49053955078125, 167.9594268798828, 105.46268463134766, 65.2970962524414, 102.81623077392578, 298.7578125, 1124.278076171875, 1836.6868896484375, 1276.448486328125, 1351.95458984375, 1665.1024169921875, 1780.2076416015625, 1647.9990234375, 1474.4178466796875, 1383.0623779296875, 1484.816162109375, 1552.6092529296875, 1065.1068115234375, 684.9546508789062, 533.436279296875, 450.2315368652344, 337.2895812988281, 253.92576599121094, 167.02618408203125, 61.137969970703125, 28.64101791381836, 21.350017547607422, 31.719757080078125, 181.72283935546875, 858.591796875, 2410.464599609375, 3836.441650390625, 1505.147216796875, 787.569580078125, 988.357177734375, 1244.3590087890625, 1322.6839599609375, 1146.778076171875, 1089.5157470703125, 1498.137451171875