### Probabilistic forecasting using [GluonTS](https://ts.gluon.ai/index.html): Bitcoin example
[Probabilistic forecasting](https://en.wikipedia.org/wiki/Probabilistic_forecasting), rather than providing a single point prediction, provides a probability distribution as the outcome.
To do this we shall be using the [GluonTS - Probabilistic Time Series Modeling](https://ts.gluon.ai/index.html) package. This notebook is heavily based on the [Quick Start Tutorial](https://ts.gluon.ai/tutorials/forecasting/quick_start_tutorial.html) and the [Extended Forecasting Tutorial](https://ts.gluon.ai/tutorials/forecasting/extended_tutorial.html) that are both provided with the package.
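
To make the distinction concrete, here is a minimal sketch that uses only NumPy rather than GluonTS. The array `sample_paths` is a hypothetical stand-in for a model's output (simulated random walks, not real forecasts). A point forecast collapses the paths to one number per step, whereas a probabilistic forecast keeps, for example, a 90% interval per step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a model's output: 1000 sampled future paths,
# each 5 steps long, generated as random walks starting at 100.
sample_paths = 100 + rng.normal(0, 1, size=(1000, 5)).cumsum(axis=1)

# Point forecast: a single number per future step.
point_forecast = sample_paths.mean(axis=0)

# Probabilistic forecast: here summarised by a 90% interval per future step.
lower, upper = np.quantile(sample_paths, [0.05, 0.95], axis=0)

print("point forecast:", np.round(point_forecast, 2))
print("90% interval width:", np.round(upper - lower, 2))
```

In this toy setting the interval widens with the horizon, which is the kind of information a single point prediction cannot convey.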

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
#plt.style.use('fivethirtyeight')
plt.rcParams.update({'font.size': 12})
#plt.rcParams["figure.figsize"] = (12, 4)
from datetime import date

Install GluonTS

In [None]:
!pip install -q gluonts

For our data we shall be using the dataset provided by the Kaggle [G-Research Crypto Forecasting](https://www.kaggle.com/c/g-research-crypto-forecasting) competition. We shall extract the data corresponding to the Bitcoin cryptocurrency. From this Bitcoin data we shall use only one data point per day, arbitrarily chosen to be the **High** value at 9:00 a.m.

In [None]:
train = pd.read_csv("../input/g-research-crypto-forecasting/train.csv")
# extract the data corresponding to Bitcoin (Asset_ID = 1)
bitcoin = train.query("Asset_ID == 1").reset_index(drop = True)
bitcoin['timestamp'] = pd.to_datetime(bitcoin['timestamp'], unit='s')
bitcoin = bitcoin.set_index('timestamp')
# extract the "High" value at 9:00 a.m. daily
indexer_9am = bitcoin.index.indexer_at_time('9:00:00')
values_at_9am = bitcoin.iloc[indexer_9am]
high_values_at_9am = values_at_9am[["High"]]
# take a look
high_values_at_9am
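
The selection step can be checked on a small synthetic frame. The sketch below assumes minute-level data with no gaps (the values and dates are made up for illustration) and shows that `indexer_at_time` returns the integer positions of the rows whose time of day is exactly 09:00.

```python
import numpy as np
import pandas as pd

# Synthetic minute-level data covering three days (values are illustrative).
idx = pd.date_range("2021-01-01 00:00", periods=3 * 24 * 60, freq="min")
df = pd.DataFrame({"High": np.arange(len(idx), dtype=float)}, index=idx)

# Integer positions of every row whose time of day is exactly 09:00.
positions = df.index.indexer_at_time("9:00:00")
daily = df.iloc[positions][["High"]]

print(daily)
```

Note that if the 9:00 row is missing for some day, that day is simply absent from the result, so the daily series may contain gaps.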

We shall forecast three months' (90 days') worth of data, based on the 180 days prior to the start of the forecasting period.

In [None]:
freq = "1D"             # the frequency of our data, here daily
context_length    = 180 # train on this number of days
prediction_length =  90 # predict this many days, these are removed from the end of the training data
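
As a rough sketch of how these two lengths carve up a series at evaluation time, the following uses a made-up array in place of the Bitcoin values and assumes the context window sits directly before the forecast start. During training, many such window pairs are typically sampled from along the series.

```python
import numpy as np

context_length = 180
prediction_length = 90

# Illustrative daily series; the notebook itself uses Bitcoin "High" values.
series = np.arange(1000, dtype=float)

# The last `prediction_length` points are held out as the forecast target...
target_window = series[-prediction_length:]
# ...and the `context_length` points just before them are the model input.
context_window = series[-(context_length + prediction_length):-prediction_length]

print(len(context_window), len(target_window))
```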

Convert our dataframe into a GluonTS dataset

In [None]:
from gluonts.dataset.common import ListDataset

data_list = [{"start": "2018-01-01 09:00:00", "target": high_values_at_9am[c].values} for c in high_values_at_9am.columns]
train_ds  = ListDataset(data_iter=data_list,freq=freq)
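
Each dataset entry carries only a start timestamp and an array of values, so the remaining timestamps are implied by `freq` and the series is assumed to have no gaps. The sketch below builds such entries from a small made-up frame (a stand-in for `high_values_at_9am`) and, as a variation on the cell above, takes the start from the index instead of hard-coding it.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for `high_values_at_9am`: one column, daily index.
idx = pd.date_range("2018-01-01 09:00:00", periods=5, freq="D")
frame = pd.DataFrame({"High": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# One dictionary per column: "start" is the first timestamp,
# "target" is the array of observed values.
entries = [
    {"start": frame.index[0], "target": frame[c].to_numpy()}
    for c in frame.columns
]

print(entries[0]["start"], entries[0]["target"])
```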

### Training
Here we use GluonTS's pre-built [feedforward neural network estimator](https://ts.gluon.ai/api/gluonts/gluonts.model.simple_feedforward.html) `SimpleFeedForwardEstimator` in conjunction with a [trainer](https://ts.gluon.ai/api/gluonts/gluonts.mx.trainer.html). This feedforward network can be [substituted for a recurrent neural network (RNN)](https://ts.gluon.ai/tutorials/forecasting/extended_tutorial.html#From-feedforward-to-RNN), such as the [DeepAREstimator](https://ts.gluon.ai/api/gluonts/gluonts.model.deepar.html).

In [None]:
from gluonts.model.simple_feedforward  import SimpleFeedForwardEstimator
# to use the DeepAREstimator 
# from gluonts.model.deepar import DeepAREstimator
from gluonts.mx.distribution.student_t import StudentTOutput
from gluonts.mx import Trainer

estimator = SimpleFeedForwardEstimator(num_hidden_dimensions=[50],
                                       freq=freq,
                                       context_length=context_length,
                                       prediction_length=prediction_length,
                                       distr_output=StudentTOutput(),
                                       trainer=Trainer(epochs=50,
                                                       learning_rate=1e-3,
                                                       num_batches_per_epoch=100,
                                                       patience=10))

predictor = estimator.train(train_ds)
print("Done")
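
With `distr_output=StudentTOutput()` the network predicts the parameters of a Student-t distribution for each future step, and, as far as we understand, such models are fitted by minimising the negative log-likelihood of the observed values. The function below is a hand-written sketch of that loss for a single observation. It is not GluonTS code, and the name `student_t_nll` is hypothetical.

```python
import math

def student_t_nll(y, mu, sigma, nu):
    """Negative log-density of y under a Student-t distribution with
    nu degrees of freedom, location mu and scale sigma."""
    z = (y - mu) / sigma
    log_pdf = (
        math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi) - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )
    return -log_pdf

# An observation far from the predicted location is penalised more heavily.
print(student_t_nll(0.0, 0.0, 1.0, 3.0), student_t_nll(5.0, 0.0, 1.0, 3.0))
```

The heavy tails of the Student-t make this loss less sensitive to occasional extreme moves than a Gaussian likelihood would be, which is a plausible reason for choosing it for price data.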

### Prediction
We shall now draw sample paths (here 2000) from our model; together they form the probabilistic forecast.

In [None]:
from gluonts.evaluation import make_evaluation_predictions

forecast_it, ts_it = make_evaluation_predictions(
    dataset=train_ds,  # dataset
    predictor=predictor,  # predictor
    num_samples=2000,  # number of sample paths we want for evaluation
)

forecasts = list(forecast_it)
tss = list(ts_it)

### Visualization
Finally, we plot our probabilistic forecast (here with the 50% and 90% prediction intervals shown) along with the ground truth values.

In [None]:
def plot_prob_forecasts(ts_entry, forecast_entry):
    plot_length = context_length + prediction_length
    prediction_intervals = (50.0, 90.0)
    legend = ["ground truth", "median prediction"] + [f"{k}% prediction interval" for k in prediction_intervals][::-1]

    fig, ax = plt.subplots(1, 1, figsize=(18, 7))
    ts_entry[-plot_length:].plot(ax=ax)  # plot the time series
    forecast_entry.plot(prediction_intervals=prediction_intervals, color='cadetblue')
    plt.axvline(forecast_entry.start_date, color='g', lw=1) # end of train dataset
    plt.grid(which="major")
    plt.legend(legend, loc="upper left")
    plt.show();
    
plot_prob_forecasts(tss[0], forecasts[0])
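
A central prediction interval can be read off the sample paths as a pair of quantiles: a 50% interval spans the 25th to 75th percentiles, and a 90% interval the 5th to 95th. The sketch below assumes the samples are held in an array of shape `(num_samples, horizon)`. The helper `interval_bounds` is hypothetical and the samples are simulated, not taken from the model above.

```python
import numpy as np

def interval_bounds(samples, coverage):
    """Central prediction interval per time step from an array of sample paths.

    samples:  array of shape (num_samples, horizon)
    coverage: interval coverage as a percentage, e.g. 90.0
    """
    tail = (100.0 - coverage) / 2.0
    lower = np.percentile(samples, tail, axis=0)
    upper = np.percentile(samples, 100.0 - tail, axis=0)
    return lower, upper

rng = np.random.default_rng(1)
samples = rng.standard_t(df=3, size=(2000, 90))  # illustrative heavy-tailed draws

lo50, hi50 = interval_bounds(samples, 50.0)
lo90, hi90 = interval_bounds(samples, 90.0)

# Fraction of the samples that fall inside the 90% band.
inside90 = ((samples >= lo90) & (samples <= hi90)).mean()
print(f"fraction of samples inside the 90% band: {inside90:.3f}")
```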

We can extract the mean and median values of our forecast as follows:

In [None]:
# first entry of the forecast list
forecast_entry = forecasts[0]
print(f"Mean of the future window:\n {forecast_entry.mean}")
print(f"0.5-quantile (median) of the future window:\n {forecast_entry.quantile(0.5)}")

It is worth mentioning that, if desired, [point forecasts](https://ts.gluon.ai/tutorials/forecasting/extended_tutorial.html#Point-forecasts-with-a-simple-feedforward-network) can also be made using GluonTS.

### What about the minutes?
In the above example we have taken one sample per day; however, we have much more fine-grained data at our disposal. Would we do better if we looked at the data on the scale of minutes? It was [Benoit Mandelbrot](https://en.wikipedia.org/wiki/Benoit_Mandelbrot) who first suggested that stock data may be [fractal](https://en.wikipedia.org/wiki/Fractal) in nature, that is, the data look similar no matter what the scale. To quantify this somewhat, here we shall calculate the [Higuchi fractal dimension](https://en.wikipedia.org/wiki/Higuchi_dimension) for the daily sample above, and for a similar sample size on the minute scale. To do this we shall use the [HFDA](https://github.com/hiroki-kojima/HFDA) package written by Hiroki Kojima.

In [None]:
!pip install -q hfda
import hfda

In [None]:
k_max = 15
D = hfda.measure(np.concatenate(high_values_at_9am.values), k_max)
print("Higuchi fractal dimension = %.2f" % D)
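
For readers curious about what is being measured, below is a hand-rolled NumPy sketch that follows Higuchi's 1988 definition: for each lag `k`, the average normalised curve length `L(k)` is computed over all starting offsets, and the dimension is the slope of `log L(k)` against `log(1/k)`. The function `higuchi_fd` is our own illustration, and the `hfda` package may differ in its details.

```python
import numpy as np

def higuchi_fd(x, k_max):
    """Estimate the Higuchi fractal dimension of a 1-D series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(1, k_max + 1)
    lengths = []
    for k in ks:
        per_offset = []
        for m in range(k):                    # starting offsets (0-based)
            sub = x[m::k]                     # x[m], x[m+k], x[m+2k], ...
            n_steps = len(sub) - 1
            if n_steps < 1:
                continue
            norm = (n - 1) / (n_steps * k)    # Higuchi's normalisation factor
            per_offset.append(np.abs(np.diff(sub)).sum() * norm / k)
        lengths.append(np.mean(per_offset))
    # Slope of log L(k) versus log(1/k).
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(lengths), 1)
    return slope

rng = np.random.default_rng(0)
line = np.arange(1000.0)                  # smooth curve
noise = rng.normal(size=2000)             # white noise
walk = rng.normal(size=5000).cumsum()     # random walk

for name, series in [("line", line), ("noise", noise), ("walk", walk)]:
    print(name, round(higuchi_fd(series, 15), 2))
```

As reference points, a smooth curve has dimension close to 1, white noise close to 2, and an ordinary random walk close to 1.5.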

Now we shall create the minutes dataset

In [None]:
bitcoin_minutes = bitcoin[["High"]].tail(1358)

D = hfda.measure(np.concatenate(bitcoin_minutes.values), k_max)
print("Higuchi fractal dimension = %.2f" %D)

Indeed the values seem similar, with only a slightly lower value for the minute data. Let us repeat the probabilistic forecast, now for the minute data.

In [None]:
freq = "1min"           # the frequency of our data, now in minutes
context_length    = 180 # train on this number of minutes
prediction_length =  90 # predict this many minutes, these are removed from the end of the training data

data_list = [{"start": "2021-09-20 01:23:00", "target": bitcoin_minutes[c].values} for c in bitcoin_minutes.columns]
train_ds  = ListDataset(data_iter=data_list,freq=freq)

estimator = SimpleFeedForwardEstimator(num_hidden_dimensions=[50],
                                       freq=freq,
                                       context_length=context_length,
                                       prediction_length=prediction_length,
                                       distr_output=StudentTOutput(),
                                       trainer=Trainer(epochs=50,
                                                       learning_rate=1e-3,
                                                       num_batches_per_epoch=100,
                                                       patience=10))

predictor = estimator.train(train_ds)

In [None]:
forecast_it, ts_it = make_evaluation_predictions(
    dataset=train_ds,  # dataset
    predictor=predictor,  # predictor
    num_samples=2000,  # number of sample paths we want for evaluation
)

forecasts = list(forecast_it)
tss = list(ts_it)

plot_prob_forecasts(tss[0], forecasts[0])

Indeed it does seem that forecasting on the minute scale is just as challenging as forecasting on the daily scale.
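
The comparison above is visual. One way it could be made more quantitative is with the quantile (pinball) loss, which scores a predicted quantile against the observed values. The sketch below is only an illustration: `pinball_loss` is a hypothetical helper, and comparing the two time scales fairly would additionally require normalising by the size of the target, which is not done in this notebook.

```python
import numpy as np

def pinball_loss(y_true, y_quantile, q):
    """Mean quantile (pinball) loss of a predicted q-quantile against observations."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_quantile, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

# For q = 0.9, under-predicting costs nine times as much as over-predicting.
print(pinball_loss([2.0], [1.0], 0.9))  # observation above the predicted quantile
print(pinball_loss([0.0], [1.0], 0.9))  # observation below the predicted quantile
```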

### Related packages
* [NGBoost: Natural Gradient Boosting for Probabilistic Prediction](https://stanfordmlgroup.github.io/projects/ngboost/)

### Related Kaggle notebooks
* ["M5 - Sales Uncertainty Prediction"](https://www.kaggle.com/allunia/m5-sales-uncertainty-prediction) written by [Laura Fink](https://www.kaggle.com/allunia)
* ["M5 Forecasting Competition GluonTS Template"](https://www.kaggle.com/steverab/m5-forecasting-competition-gluonts-template) written by [Stephan Rabanser](https://www.kaggle.com/steverab)

### Related reading
* [Rob J. Hyndman and George Athanasopoulos "*Forecasting: Principles and Practice*", (3rd Edition)](https://otexts.com/fpp3/)
* [Fotios Petropoulos *et al.* "*Forecasting: Theory and practice*", arXiv:2012.03854 (2020)](https://arxiv.org/pdf/2012.03854.pdf)
* [DeepAR Forecasting Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html)
* [Benoit B. Mandelbrot "*The Variation of Certain Speculative Prices*", The Journal of Business **Vol. 36** pp. 394-419 (1963)](https://www.jstor.org/stable/2350970)
* [Benoit B. Mandelbrot "*The Variation of Some Other Speculative Prices*", The Journal of Business **Vol. 40** pp. 393-413 (1967)](https://www.jstor.org/stable/2351623)
* [Benoit B. Mandelbrot, Adlai J. Fisher and Laurent E. Calvet "*A Multifractal Model of Asset Returns*", Cowles Foundation Discussion Paper No. 1164 (1997)](https://ssrn.com/abstract=78588)
* [T. Higuchi "*Approach to an irregular time series on the basis of the fractal theory*", Physica D: Nonlinear Phenomena **Vol. 31** pp. 277-283 (1988)](https://doi.org/10.1016/0167-2789(88)90081-4)