## Tree-Based Ensemble Models for Price Returns Forecasting

### Approach
1. Set Up
2. Identifying Suitable Lags for Price, Volume & Uncertainty Indices
3. Modelling with Grid Search & Forecast Evaluation
   1. Random Forest (h = 1 to 12)
      1. Model A (With Price Returns, Price & Volume)
      2. Model B (With A + Lucey Original Price) 
      3. Model C (With A + Lucey Reddit Price)
      4. Model D (With A + LDA Reddit Price)
      5. Model E (With A + Top2Vec Reddit Price)
      6. Model F (With A + VCRIX)
      7. Policy-Based Models?

TBD:
1. Add in More Horizon Values [Running]
2. Compute historical forecasts with best models => Direction of Forecast

### Set Up

In [66]:
# NB config
%load_ext autoreload
%autoreload 2

# Load Libraries
import os

os.chdir("../../")
from typing import Any, List, Dict, Union
import pandas as pd
import numpy as np
from pathlib import Path
from darts import TimeSeries
from darts.metrics import (
    rmse,
)
from tqdm import tqdm
from darts import concatenate
from darts.models.forecasting.random_forest import (
    RandomForest,
)
from sklearn.model_selection import ParameterGrid
import warnings

warnings.filterwarnings("ignore")


### Data Preparation

In [2]:
# Data Dir
data_dir = Path("forecasting/data/modelling")

# BTC-USD data
btc_usd_fp = data_dir / "btc_usd_weekly.csv"
btc_usd_df = pd.read_csv(btc_usd_fp)

# UCRY Indices data
ucry_fp = data_dir / "ucry_indices_weekly.csv"
ucry_df = pd.read_csv(ucry_fp)

#### Create ***h***-weeks Log Price Returns Time Series

In [3]:
# h = 1 (Weekly Price Returns)
btc_usd_df["Price Returns (h=1)"] = np.log1p(btc_usd_df[["Price"]].pct_change(1))

# h = 4 (4 Week Price Returns)
btc_usd_df["Price Returns (h=4)"] = np.log1p(btc_usd_df[["Price"]].pct_change(4))

# h = 12 (12 Week Price Returns)
btc_usd_df["Price Returns (h=12)"] = np.log1p(btc_usd_df[["Price"]].pct_change(12))

# Create TimeSeries
# h = 1 (Weekly Price Returns)
btc_usd1_ts = TimeSeries.from_dataframe(
    btc_usd_df[["Date", "Price Returns (h=1)"]].dropna(), time_col="Date"
)

# h = 4 (4 Week Price Returns)
btc_usd4_ts = TimeSeries.from_dataframe(
    btc_usd_df[["Date", "Price Returns (h=4)"]].dropna(), time_col="Date"
)

# h = 12 (12 Week Price Returns)
btc_usd12_ts = TimeSeries.from_dataframe(
    btc_usd_df[["Date", "Price Returns (h=12)"]].dropna(), time_col="Date"
)

In [4]:
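def gen_log_price_returns(
    series: pd.DataFrame, h: int, date_col: str = "Date", var_col: str = "Price"
) -> TimeSeries:
    """Build the h-period log-return TimeSeries, log(P_t / P_{t-h}).

    Note: adds the column "Price Returns (h={h})" to `series` in place.
    """
    new_col_name = f"Price Returns (h={h})"
    # log1p(pct_change(h)) == log(P_t / P_{t-h})
    series[new_col_name] = np.log1p(series[var_col].pct_change(h))
    # Drop the first h rows, which have no h-period return
    new_ts = TimeSeries.from_dataframe(
        series[[date_col, new_col_name]].dropna(), time_col=date_col
    )
    return new_ts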
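
The `log1p(pct_change(h))` construction used above is the h-period log return, i.e. the difference of log prices h periods apart, which also equals the sum of h consecutive one-period log returns. A minimal check of that identity on made-up prices (the `prices` values are illustrative assumptions, not BTC data):

```python
import numpy as np
import pandas as pd

# Toy weekly prices (made-up values, not BTC data).
prices = pd.Series([100.0, 110.0, 99.0, 120.0, 150.0], name="Price")
h = 2

# Same construction as gen_log_price_returns: log(1 + simple h-period return).
log_ret_h = np.log1p(prices.pct_change(h))

# Equivalent definition: difference of log prices h periods apart.
log_diff_h = np.log(prices).diff(h)

# An h-period log return is the rolling sum of h one-period log returns.
rolling_sum = np.log(prices).diff(1).rolling(h).sum()

assert np.allclose(log_ret_h.dropna(), log_diff_h.dropna())
assert np.allclose(log_ret_h.dropna(), rolling_sum.dropna())

# The first h rows have no h-period return, hence the dropna() in the helper.
n_missing = int(log_ret_h.isna().sum())
```

One consequence worth keeping in mind: for h > 1, consecutive weekly observations of the h-week return share h - 1 of their one-week components, so the series is autocorrelated by construction.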

#### Create Price and Volume Time Series

In [5]:
# Price
price_ts = TimeSeries.from_dataframe(btc_usd_df[["Date", "Price"]], time_col="Date")

# Volume
vol_ts = TimeSeries.from_dataframe(btc_usd_df[["Date", "Volume"]], time_col="Date")

#### Create UCRY Indices Time Series

In [6]:
# Create TimeSeries
sel_cols = ["Date", "Index Value"]
time_col = "Date"

# Lucey Price
lucey_price = TimeSeries.from_dataframe(
    ucry_df[ucry_df.Index == "Lucey-Original-Price"].reset_index()[sel_cols],
    time_col=time_col,
)

# Lucey Reddit Price
lucey_reddit_price = TimeSeries.from_dataframe(
    ucry_df[ucry_df.Index == "Lucey-Reddit-Price"].reset_index()[sel_cols],
    time_col=time_col,
)

# LDA Price
lda_price = TimeSeries.from_dataframe(
    ucry_df[ucry_df.Index == "LDA-Reddit-Price"].reset_index()[sel_cols],
    time_col=time_col,
)

# Top2Vec Price
t2v_price = TimeSeries.from_dataframe(
    ucry_df[ucry_df.Index == "Top2Vec-Reddit-Price"].reset_index()[sel_cols],
    time_col=time_col,
)

# VCRIX
vcrix = TimeSeries.from_dataframe(
    ucry_df[ucry_df.Index == "VCRIX"].reset_index()[sel_cols], time_col=time_col
)

#### Train Test Split Date

In [7]:
# Split into Train and Test
split_date = pd.Timestamp("20190527")

### Identify Suitable Lags for UCRY Index Time Series
* STATUS: For now, variables observed at time **t** are used to predict the target at time **t + h**; lag selection is still to be done
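
One way to read the status note is as a pairing of feature rows at time t with the target value at time t + h. The sketch below shows only that alignment on toy arrays; `make_supervised` is a hypothetical helper introduced for illustration and is not how Darts builds its lagged features internally.

```python
import numpy as np


def make_supervised(features, target, h):
    """Pair feature rows at time t with the target value at time t + h."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if h < 1 or h >= len(target):
        raise ValueError("h must satisfy 1 <= h < len(target)")
    X = features[:-h]  # rows for t = 0 .. n - h - 1
    y = target[h:]     # target values at t + h
    return X, y


# Toy data: 6 time steps, 2 features per step (assumed values).
features = np.arange(12).reshape(6, 2)
target = np.arange(6) * 10.0
X, y = make_supervised(features, target, h=2)
```

The last h feature rows have no matching target and are dropped, so larger horizons leave fewer training pairs.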

### Random Forest Forecasting Model & Evaluation

#### Define Params Grid for Grid Search

In [8]:
# Params Grid
rf_params_grid = {
    "n_estimators": [50, 100, 300],
    "max_depth": [2, 5, 10],
    "criterion": ["squared_error"],
    # Note: "auto" (all features for regressors) was removed in scikit-learn 1.3; use 1.0 there
    "max_features": [1 / 3, "auto"],
    "n_jobs": [-1],
    "random_state": [42],
    "oob_score": [True],
}

rf_params_list = list(ParameterGrid(rf_params_grid))
len(rf_params_list)

18
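
The count of 18 is the size of the Cartesian product of the value lists: 3 × 3 × 1 × 2 × 1 × 1 × 1. A standard-library sketch that enumerates the same combinations (it mirrors what `ParameterGrid` produces, but it is not scikit-learn's code):

```python
from itertools import product
from math import prod

grid = {
    "n_estimators": [50, 100, 300],
    "max_depth": [2, 5, 10],
    "criterion": ["squared_error"],
    "max_features": [1 / 3, "auto"],
    "n_jobs": [-1],
    "random_state": [42],
    "oob_score": [True],
}

# Enumerate every combination as a dict, with keys in sorted order.
keys = sorted(grid)
combos = [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

# The number of combinations is the product of the list lengths.
expected = prod(len(v) for v in grid.values())
assert len(combos) == expected
```

Only `n_estimators`, `max_depth` and `max_features` actually vary. Each of the 12 horizons runs all 18 combinations, so one model variant costs 216 backtests.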

#### Define Forecast Horizons

In [9]:
horizons = list(range(1, 13))

#### Random Forest GridSearch Helper

In [10]:
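# Darts built-in grid search, kept for reference only (disabled: very slow and possibly buggy; revisit later).
# Uncommenting it also requires `from darts.metrics import mse` and `from pprint import pprint`.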
# rfA_1_best_model, rfA_1_best_params = RandomForest(
#    lags=1,
#     lags_past_covariates=1
# ).gridsearch(
#     parameters=rf_params_grid,
#     series=btc_usd1_ts,
#     past_covariates=rfA_1_past_covs,
#     forecast_horizon=1,
#     stride=1,
#     start=split_date,
#     metric=mse,
#     reduction=np.mean,
#     verbose=True,
#     n_jobs=-1
# )
#
# pprint(rfA_1_best_params)

In [11]:
# Homemade RF Grid Search


def gridsearch_RF(
    series: TimeSeries,
    past_covariates: TimeSeries,
    forecast_horizon: int,
    lags: int,
    lags_past_covariates: int,
    verbose: bool = False,
    params_list: List[Any] = rf_params_list,
    error_metric: Any = rmse,
    split_date: pd.Timestamp = split_date,
):

    # np.float was removed in NumPy 1.24; the built-in float is equivalent here
    min_error = float("inf")
    best_params = None
    for params in tqdm(params_list, leave=True):
        model = RandomForest(
            lags=lags, lags_past_covariates=lags_past_covariates, **params
        )
        error = model.backtest(
            series=series,
            past_covariates=past_covariates,
            forecast_horizon=forecast_horizon,
            stride=1,
            start=split_date,
            metric=error_metric,
            reduction=np.mean,
            verbose=verbose,
        )
        if error < min_error:
            min_error = error
            best_params = params

    print("Average RMSE over all historical forecasts: %.2f" % min_error)
    print("Best Params: %s" % str(best_params))

    return best_params, min_error
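
`gridsearch_RF` scores each parameter set by backtesting from `split_date` onward with `stride=1` and averaging the per-forecast errors with `np.mean`. The general idea of such a rolling-origin evaluation can be sketched without Darts as below. The naive last-value forecaster and the toy series are assumptions made for illustration, and this is not Darts' implementation of `backtest`.

```python
import math


def rolling_origin_rmse(values, start, horizon, forecaster):
    """Expanding-window evaluation: forecast `horizon` steps from each origin
    at or after `start`, compute the RMSE of each forecast, and average them."""
    errors = []
    for origin in range(start, len(values) - horizon + 1):
        history = values[:origin]                 # data available at the origin
        forecast = forecaster(history, horizon)   # list of `horizon` predictions
        actual = values[origin:origin + horizon]
        mse = sum((f - a) ** 2 for f, a in zip(forecast, actual)) / horizon
        errors.append(math.sqrt(mse))
    return sum(errors) / len(errors)


def naive_last_value(history, horizon):
    """Hypothetical stand-in model: repeat the last observed value."""
    return [history[-1]] * horizon


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = rolling_origin_rmse(series, start=3, horizon=2, forecaster=naive_last_value)
```

Because the model is refit at every origin, the cost grows with the number of test-period points, which is consistent with the multi-minute runtimes per horizon in the logs further down.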

In [12]:
# Multi Horizon Forecasting Helper


def multi_forecast_GS_RF(
    target_df: pd.DataFrame,
    past_covariates: List[TimeSeries],
    forecast_horizons: List[int] = horizons,
    lags: int = 1,
    past_covariates_lags: int = 1,
    verbose: bool = False,
    error_metric: Any = rmse,
    params_list: List[Any] = rf_params_list,
    split_date: pd.Timestamp = split_date,
) -> Dict[int, Dict[str, Any]]:

    best_results = {}

    for h in tqdm(forecast_horizons):
        print(f"Recursively Forecasting horizon={h} with Random Forest + Grid Search")

        # Generate Appropriate TS Data
        target_cov = gen_log_price_returns(target_df, h)
        past_cov_list = [target_cov]
        past_cov_list.extend(
            x.slice_intersect(target_cov) for x in past_covariates
        )
        past_cov_list_tidy = concatenate(past_cov_list, axis=1)
        # Run Grid Search RF
        best_params, min_error = gridsearch_RF(
            target_cov,
            past_cov_list_tidy,
            h,
            lags,
            past_covariates_lags,
            verbose,
            params_list,
            error_metric,
            split_date,
        )

        best_results[h] = {"best_params": best_params, "min_error": min_error}

    return best_results
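
Each call returns a dictionary of the form `{h: {"best_params": ..., "min_error": ...}}`. The printed RMSE is rounded to two decimals, so model variants can look tied in the logs; comparing the stored `min_error` values directly avoids that. A small sketch of such a comparison, where `best_model_per_horizon` is a hypothetical helper and the numbers are placeholders rather than results from this notebook:

```python
def best_model_per_horizon(results_by_model):
    """Return {h: (model_name, min_error)} with the lowest error at each horizon."""
    best = {}
    for model_name, results in results_by_model.items():
        for h, entry in results.items():
            err = entry["min_error"]
            if h not in best or err < best[h][1]:
                best[h] = (model_name, err)
    return best


# Placeholder numbers for illustration only; not results from this notebook.
toy_results = {
    "A": {
        1: {"best_params": {}, "min_error": 0.30},
        2: {"best_params": {}, "min_error": 0.50},
    },
    "B": {
        1: {"best_params": {}, "min_error": 0.25},
        2: {"best_params": {}, "min_error": 0.55},
    },
}
winners = best_model_per_horizon(toy_results)
```

With the real outputs, the input would be something like `{"A": RF_A_results, "B": RF_B_results, ...}`.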

#### Model A (Price Returns, Price & Volume) for h = 1 to 12

In [13]:
# Model A Multi Horizon
RF_A_results = multi_forecast_GS_RF(
    btc_usd_df,
    [price_ts, vol_ts],
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest + Grid Search


100%|██████████| 18/18 [05:22<00:00, 17.90s/it]
  8%|▊         | 1/12 [05:22<59:04, 322.26s/it]

Average RMSE over all historical forecasts: 0.08
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=2 with Random Forest + Grid Search


100%|██████████| 18/18 [05:50<00:00, 19.48s/it]
 17%|█▋        | 2/12 [11:12<56:29, 339.00s/it]

Average RMSE over all historical forecasts: 0.11
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=3 with Random Forest + Grid Search


100%|██████████| 18/18 [06:23<00:00, 21.29s/it]
 25%|██▌       | 3/12 [17:36<53:52, 359.19s/it]

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=4 with Random Forest + Grid Search


100%|██████████| 18/18 [07:36<00:00, 25.37s/it]
 33%|███▎      | 4/12 [25:12<53:01, 397.66s/it]

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=5 with Random Forest + Grid Search


100%|██████████| 18/18 [07:53<00:00, 26.29s/it]
 42%|████▏     | 5/12 [33:06<49:34, 424.89s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=6 with Random Forest + Grid Search


100%|██████████| 18/18 [07:50<00:00, 26.17s/it]
 50%|█████     | 6/12 [40:56<44:03, 440.57s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=7 with Random Forest + Grid Search


100%|██████████| 18/18 [07:58<00:00, 26.58s/it]
 58%|█████▊    | 7/12 [48:55<37:44, 452.96s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=8 with Random Forest + Grid Search


100%|██████████| 18/18 [08:12<00:00, 27.34s/it]
 67%|██████▋   | 8/12 [57:07<31:01, 465.46s/it]

Average RMSE over all historical forecasts: 0.16
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=9 with Random Forest + Grid Search


100%|██████████| 18/18 [08:23<00:00, 27.99s/it]
 75%|███████▌  | 9/12 [1:05:31<23:52, 477.48s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=10 with Random Forest + Grid Search


100%|██████████| 18/18 [08:36<00:00, 28.69s/it]
 83%|████████▎ | 10/12 [1:14:08<16:18, 489.50s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=11 with Random Forest + Grid Search


100%|██████████| 18/18 [08:47<00:00, 29.33s/it]
 92%|█████████▏| 11/12 [1:22:55<08:21, 501.26s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=12 with Random Forest + Grid Search


100%|██████████| 18/18 [09:00<00:00, 30.01s/it]
100%|██████████| 12/12 [1:31:56<00:00, 459.68s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}

#### Model B (Price Returns, Price & Volume + Lucey Price Index)

In [14]:
# Model B Multi Horizon
RF_B_results = multi_forecast_GS_RF(
    btc_usd_df,
    [price_ts, vol_ts, lucey_price],
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest + Grid Search


100%|██████████| 18/18 [06:30<00:00, 21.69s/it]
  8%|▊         | 1/12 [06:30<1:11:34, 390.42s/it]

Average RMSE over all historical forecasts: 0.08
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=2 with Random Forest + Grid Search


100%|██████████| 18/18 [06:49<00:00, 22.73s/it]
 17%|█▋        | 2/12 [13:19<1:06:54, 401.45s/it]

Average RMSE over all historical forecasts: 0.11
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=3 with Random Forest + Grid Search


100%|██████████| 18/18 [07:04<00:00, 23.59s/it]
 25%|██▌       | 3/12 [20:24<1:01:48, 412.04s/it]

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=4 with Random Forest + Grid Search


100%|██████████| 18/18 [07:18<00:00, 24.34s/it]
 33%|███▎      | 4/12 [27:42<56:18, 422.34s/it]  

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=5 with Random Forest + Grid Search


100%|██████████| 18/18 [07:31<00:00, 25.10s/it]
 42%|████▏     | 5/12 [35:14<50:30, 432.97s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=6 with Random Forest + Grid Search


100%|██████████| 18/18 [07:45<00:00, 25.84s/it]
 50%|█████     | 6/12 [42:59<44:23, 443.89s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=7 with Random Forest + Grid Search


100%|██████████| 18/18 [07:58<00:00, 26.60s/it]
 58%|█████▊    | 7/12 [50:58<37:56, 455.31s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=8 with Random Forest + Grid Search


100%|██████████| 18/18 [08:11<00:00, 27.28s/it]
 67%|██████▋   | 8/12 [59:09<31:06, 466.68s/it]

Average RMSE over all historical forecasts: 0.16
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=9 with Random Forest + Grid Search


100%|██████████| 18/18 [08:23<00:00, 27.98s/it]
 75%|███████▌  | 9/12 [1:07:32<23:54, 478.25s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=10 with Random Forest + Grid Search


100%|██████████| 18/18 [08:36<00:00, 28.67s/it]
 83%|████████▎ | 10/12 [1:16:08<16:19, 489.93s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=11 with Random Forest + Grid Search


100%|██████████| 18/18 [08:47<00:00, 29.32s/it]
 92%|█████████▏| 11/12 [1:24:56<08:21, 501.51s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=12 with Random Forest + Grid Search


100%|██████████| 18/18 [08:59<00:00, 30.00s/it]
100%|██████████| 12/12 [1:33:56<00:00, 469.72s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}

#### Model C (Price Returns, Price & Volume + Lucey Reddit Price Index)

In [15]:
# Model C Multi Horizon
RF_C_results = multi_forecast_GS_RF(
    btc_usd_df,
    [price_ts, vol_ts, lucey_reddit_price],
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest + Grid Search


100%|██████████| 18/18 [06:30<00:00, 21.72s/it]
  8%|▊         | 1/12 [06:30<1:11:40, 390.92s/it]

Average RMSE over all historical forecasts: 0.08
Best Params: {'criterion': 'squared_error', 'max_depth': 5, 'max_features': 0.3333333333333333, 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=2 with Random Forest + Grid Search


100%|██████████| 18/18 [06:48<00:00, 22.70s/it]
 17%|█▋        | 2/12 [13:19<1:06:53, 401.38s/it]

Average RMSE over all historical forecasts: 0.11
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=3 with Random Forest + Grid Search


100%|██████████| 18/18 [07:04<00:00, 23.57s/it]
 25%|██▌       | 3/12 [20:23<1:01:46, 411.82s/it]

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=4 with Random Forest + Grid Search


100%|██████████| 18/18 [07:18<00:00, 24.35s/it]
 33%|███▎      | 4/12 [27:42<56:18, 422.31s/it]  

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=5 with Random Forest + Grid Search


100%|██████████| 18/18 [07:32<00:00, 25.11s/it]
 42%|████▏     | 5/12 [35:14<50:31, 433.03s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=6 with Random Forest + Grid Search


100%|██████████| 18/18 [07:44<00:00, 25.83s/it]
 50%|█████     | 6/12 [42:59<44:23, 443.89s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=7 with Random Forest + Grid Search


100%|██████████| 18/18 [07:59<00:00, 26.62s/it]
 58%|█████▊    | 7/12 [50:58<37:57, 455.42s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=8 with Random Forest + Grid Search


100%|██████████| 18/18 [08:11<00:00, 27.29s/it]
 67%|██████▋   | 8/12 [59:09<31:07, 466.81s/it]

Average RMSE over all historical forecasts: 0.16
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=9 with Random Forest + Grid Search


100%|██████████| 18/18 [08:24<00:00, 28.01s/it]
 75%|███████▌  | 9/12 [1:07:33<23:55, 478.50s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=10 with Random Forest + Grid Search


100%|██████████| 18/18 [08:36<00:00, 28.70s/it]
 83%|████████▎ | 10/12 [1:16:10<16:20, 490.28s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=11 with Random Forest + Grid Search


100%|██████████| 18/18 [08:48<00:00, 29.35s/it]
 92%|█████████▏| 11/12 [1:24:58<08:21, 501.94s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=12 with Random Forest + Grid Search


100%|██████████| 18/18 [09:00<00:00, 30.01s/it]
100%|██████████| 12/12 [1:33:59<00:00, 469.92s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}

#### Model D (Price Returns, Price & Volume + LDA Reddit Price Index)

In [16]:
# Model D Multi Horizon
RF_D_results = multi_forecast_GS_RF(
    btc_usd_df,
    [price_ts, vol_ts, lda_price],
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest + Grid Search


100%|██████████| 18/18 [06:30<00:00, 21.70s/it]
  8%|▊         | 1/12 [06:30<1:11:37, 390.68s/it]

Average RMSE over all historical forecasts: 0.08
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=2 with Random Forest + Grid Search


100%|██████████| 18/18 [06:48<00:00, 22.70s/it]
 17%|█▋        | 2/12 [13:19<1:06:52, 401.21s/it]

Average RMSE over all historical forecasts: 0.11
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=3 with Random Forest + Grid Search


100%|██████████| 18/18 [07:04<00:00, 23.57s/it]
 25%|██▌       | 3/12 [20:23<1:01:45, 411.73s/it]

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=4 with Random Forest + Grid Search


100%|██████████| 18/18 [07:18<00:00, 24.37s/it]
 33%|███▎      | 4/12 [27:42<56:18, 422.37s/it]  

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=5 with Random Forest + Grid Search


100%|██████████| 18/18 [07:31<00:00, 25.09s/it]
 42%|████▏     | 5/12 [35:13<50:30, 432.94s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=6 with Random Forest + Grid Search


100%|██████████| 18/18 [07:44<00:00, 25.80s/it]
 50%|█████     | 6/12 [42:58<44:21, 443.66s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=7 with Random Forest + Grid Search


100%|██████████| 18/18 [07:58<00:00, 26.60s/it]
 58%|█████▊    | 7/12 [50:57<37:55, 455.15s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=8 with Random Forest + Grid Search


100%|██████████| 18/18 [08:10<00:00, 27.26s/it]
 67%|██████▋   | 8/12 [59:07<31:05, 466.46s/it]

Average RMSE over all historical forecasts: 0.16
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=9 with Random Forest + Grid Search


100%|██████████| 18/18 [08:23<00:00, 27.99s/it]
 75%|███████▌  | 9/12 [1:07:31<23:54, 478.13s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=10 with Random Forest + Grid Search


100%|██████████| 18/18 [08:36<00:00, 28.69s/it]
 83%|████████▎ | 10/12 [1:16:08<16:19, 489.96s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=11 with Random Forest + Grid Search


100%|██████████| 18/18 [08:47<00:00, 29.32s/it]
 92%|█████████▏| 11/12 [1:24:55<08:21, 501.51s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=12 with Random Forest + Grid Search


100%|██████████| 18/18 [08:59<00:00, 30.00s/it]
100%|██████████| 12/12 [1:33:55<00:00, 469.65s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}

#### Model E (Price Returns, Price & Volume + Top2Vec Reddit Price Index)

In [21]:
# Model E Multi Horizon
RF_E_results = multi_forecast_GS_RF(
    btc_usd_df,
    [price_ts, vol_ts, t2v_price],
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest + Grid Search


100%|██████████| 18/18 [06:31<00:00, 21.73s/it]
  8%|▊         | 1/12 [06:31<1:11:43, 391.25s/it]

Average RMSE over all historical forecasts: 0.08
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=2 with Random Forest + Grid Search


100%|██████████| 18/18 [06:49<00:00, 22.75s/it]
 17%|█▋        | 2/12 [13:20<1:06:59, 401.98s/it]

Average RMSE over all historical forecasts: 0.11
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=3 with Random Forest + Grid Search


100%|██████████| 18/18 [07:05<00:00, 23.62s/it]
 25%|██▌       | 3/12 [20:25<1:01:53, 412.60s/it]

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=4 with Random Forest + Grid Search


100%|██████████| 18/18 [07:19<00:00, 24.41s/it]
 33%|███▎      | 4/12 [27:45<56:25, 423.15s/it]  

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=5 with Random Forest + Grid Search


100%|██████████| 18/18 [07:33<00:00, 25.17s/it]
 42%|████▏     | 5/12 [35:18<50:37, 433.92s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=6 with Random Forest + Grid Search


100%|██████████| 18/18 [07:46<00:00, 25.90s/it]
 50%|█████     | 6/12 [43:04<44:29, 444.90s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=7 with Random Forest + Grid Search


100%|██████████| 18/18 [07:59<00:00, 26.66s/it]
 58%|█████▊    | 7/12 [51:04<38:01, 456.36s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=8 with Random Forest + Grid Search


100%|██████████| 18/18 [08:12<00:00, 27.37s/it]
 67%|██████▋   | 8/12 [59:17<31:11, 467.93s/it]

Average RMSE over all historical forecasts: 0.16
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=9 with Random Forest + Grid Search


100%|██████████| 18/18 [08:25<00:00, 28.09s/it]
 75%|███████▌  | 9/12 [1:07:42<23:59, 479.71s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=10 with Random Forest + Grid Search


100%|██████████| 18/18 [08:38<00:00, 28.79s/it]
 83%|████████▎ | 10/12 [1:16:21<16:23, 491.60s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=11 with Random Forest + Grid Search


100%|██████████| 18/18 [08:49<00:00, 29.41s/it]
 92%|█████████▏| 11/12 [1:25:10<08:23, 503.17s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=12 with Random Forest + Grid Search


100%|██████████| 18/18 [09:01<00:00, 30.10s/it]
100%|██████████| 12/12 [1:34:12<00:00, 471.02s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 5, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}

#### Model F (Price Returns, Price & Volume + VCRIX) [TBD: Special case due to covariate concatenation]

In [19]:
def multi_forecast_GS_RF_VCRIX(
    target_df: pd.DataFrame,
    forecast_horizons: List[int] = horizons,
    lags: int = 1,
    past_covariates_lags: int = 1,
    verbose: bool = False,
    error_metric: Any = rmse,
    params_list: List[Any] = rf_params_list,
    split_date: pd.Timestamp = split_date,
) -> Dict[int, Dict[str, Any]]:

    best_results = {}

    for h in tqdm(forecast_horizons):
        print(f"Recursively Forecasting horizon={h} with Random Forest + Grid Search")

        # Generate Appropriate TS Data
        target_cov = gen_log_price_returns(target_df, h)
        past_cov_list = concatenate(
            [
                target_cov.slice_intersect(vcrix),
                price_ts.slice_intersect(vcrix),
                vol_ts.slice_intersect(vcrix),
                vcrix.slice_intersect(target_cov.slice_intersect(vcrix)),
            ],
            axis=1,
        )
        # Run Grid Search RF
        best_params, min_error = gridsearch_RF(
            target_cov,
            past_cov_list,
            h,
            lags,
            past_covariates_lags,
            verbose,
            params_list,
            error_metric,
            split_date,
        )

        best_results[h] = {"best_params": best_params, "min_error": min_error}

    return best_results
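
The inner progress bars of these grid searches report 18 iterations per horizon, which suggests that `rf_params_list` holds 18 candidate parameter sets. That list is defined earlier in the notebook, so the sketch below only illustrates how such a list can be expanded from a grid, in the spirit of `sklearn.model_selection.ParameterGrid`. The helper `expand_grid` is hypothetical. The `n_estimators` values and `max_depth` of 2 and 5 appear in the logged best params; `max_depth=10` and the exact shape of the grid are invented for illustration.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one dict per combination of values, iterating keys in sorted order."""
    keys = sorted(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


# Hypothetical grid, NOT the notebook's actual rf_params_list.
example_grid = {
    "max_depth": [2, 5, 10],
    "max_features": [1.0, 1 / 3],
    "n_estimators": [50, 100, 300],
}

candidates = expand_grid(example_grid)
print(len(candidates))  # 3 * 2 * 3 combinations
print(candidates[0])
```

To my knowledge, newer scikit-learn releases no longer accept `max_features='auto'`; for regression forests it was equivalent to using all features (`1.0`), which is why the sketch uses `1.0` instead.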

In [20]:
RF_F_results = multi_forecast_GS_RF_VCRIX(btc_usd_df)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest + Grid Search


100%|██████████| 18/18 [05:18<00:00, 17.72s/it]
  8%|▊         | 1/12 [05:18<58:28, 318.94s/it]

Average RMSE over all historical forecasts: 0.08
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 100, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=2 with Random Forest + Grid Search


100%|██████████| 18/18 [05:33<00:00, 18.51s/it]
 17%|█▋        | 2/12 [10:52<54:33, 327.34s/it]

Average RMSE over all historical forecasts: 0.11
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 0.3333333333333333, 'n_estimators': 50, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=3 with Random Forest + Grid Search


100%|██████████| 18/18 [05:47<00:00, 19.33s/it]
 25%|██▌       | 3/12 [16:40<50:30, 336.74s/it]

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=4 with Random Forest + Grid Search


100%|██████████| 18/18 [06:05<00:00, 20.28s/it]
 33%|███▎      | 4/12 [22:45<46:23, 347.95s/it]

Average RMSE over all historical forecasts: 0.14
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=5 with Random Forest + Grid Search


100%|██████████| 18/18 [07:25<00:00, 24.73s/it]
 42%|████▏     | 5/12 [30:10<44:40, 383.00s/it]

Average RMSE over all historical forecasts: 0.15
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=6 with Random Forest + Grid Search


100%|██████████| 18/18 [07:42<00:00, 25.68s/it]
 50%|█████     | 6/12 [37:52<40:59, 409.97s/it]

Average RMSE over all historical forecasts: 0.16
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=7 with Random Forest + Grid Search


100%|██████████| 18/18 [07:56<00:00, 26.47s/it]
 58%|█████▊    | 7/12 [45:49<35:58, 431.72s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=8 with Random Forest + Grid Search


100%|██████████| 18/18 [08:09<00:00, 27.18s/it]
 67%|██████▋   | 8/12 [53:58<30:00, 450.05s/it]

Average RMSE over all historical forecasts: 0.16
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=9 with Random Forest + Grid Search


100%|██████████| 18/18 [08:21<00:00, 27.89s/it]
 75%|███████▌  | 9/12 [1:02:20<23:18, 466.29s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=10 with Random Forest + Grid Search


100%|██████████| 18/18 [08:34<00:00, 28.58s/it]
 83%|████████▎ | 10/12 [1:10:55<16:02, 481.18s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=11 with Random Forest + Grid Search


100%|██████████| 18/18 [08:46<00:00, 29.23s/it]
 92%|█████████▏| 11/12 [1:19:41<08:14, 494.96s/it]

Average RMSE over all historical forecasts: 0.18
Best Params: {'criterion': 'squared_error', 'max_depth': 5, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
Recursively Forecasting horizon=12 with Random Forest + Grid Search


100%|██████████| 18/18 [08:58<00:00, 29.92s/it]
100%|██████████| 12/12 [1:28:39<00:00, 443.32s/it]

Average RMSE over all historical forecasts: 0.17
Best Params: {'criterion': 'squared_error', 'max_depth': 2, 'max_features': 'auto', 'n_estimators': 300, 'n_jobs': -1, 'oob_score': True, 'random_state': 42}
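
If the kernel state is lost, the per-horizon errors can still be recovered from the printed log. The sketch below is one possible way to do that with regular expressions; `parse_rmse_log` is a hypothetical helper, and the sample text copies the first two horizons of the Model F output above. Note that the log only shows RMSE rounded to two decimals, so the values in `RF_F_results` remain the more precise source.

```python
import re
from typing import Dict, Optional

HORIZON_RE = re.compile(r"horizon=(\d+)")
RMSE_RE = re.compile(r"Average RMSE over all historical forecasts: ([0-9.]+)")


def parse_rmse_log(log_text: str) -> Dict[int, float]:
    """Pair each 'horizon=h' line with the next 'Average RMSE' line that follows it."""
    results: Dict[int, float] = {}
    current_h: Optional[int] = None
    for line in log_text.splitlines():
        h_match = HORIZON_RE.search(line)
        if h_match:
            current_h = int(h_match.group(1))
            continue
        rmse_match = RMSE_RE.search(line)
        if rmse_match and current_h is not None:
            results[current_h] = float(rmse_match.group(1))
    return results


sample_log = """Recursively Forecasting horizon=1 with Random Forest + Grid Search
100%|##########| 18/18 [05:18<00:00, 17.72s/it]
Average RMSE over all historical forecasts: 0.08
Recursively Forecasting horizon=2 with Random Forest + Grid Search
100%|##########| 18/18 [05:33<00:00, 18.51s/it]
Average RMSE over all historical forecasts: 0.11
"""

rmse_by_horizon = parse_rmse_log(sample_log)
print(rmse_by_horizon)
```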





### Consolidate and Save Best Model Results [Add to end of each GridSearch later]


In [50]:
save_data_dir = Path(
    "/Users/christopherliew/Desktop/Y4S1/HT/crypto_uncertainty_index/forecasting/data/forecasts/random_forest"
)

In [51]:
def format_results(res_dict: Dict[int, Any], model_name: str) -> pd.DataFrame:
    df = pd.DataFrame(res_dict).T.reset_index().rename(columns={"index": "horizon"})
    df["model"] = model_name
    return df

In [52]:
RF_A_results_df = format_results(RF_A_results, "A")
RF_B_results_df = format_results(RF_B_results, "B")
RF_C_results_df = format_results(RF_C_results, "C")
RF_D_results_df = format_results(RF_D_results, "D")
RF_E_results_df = format_results(RF_E_results, "E")
RF_F_results_df = format_results(RF_F_results, "F")

In [53]:
# Save as csv
models = ["A", "B", "C", "D", "E", "F"]

res = [
    RF_A_results_df,
    RF_B_results_df,
    RF_C_results_df,
    RF_D_results_df,
    RF_E_results_df,
    RF_F_results_df,
]

model_res_collection = zip(models, res)

# Use a distinct loop variable so the `res` list is not overwritten
for name, res_df in model_res_collection:
    path = save_data_dir / f"model_{name}_results.csv"
    res_df.to_csv(path, index=False)

In [61]:
RF_A_results_df.query("horizon == 1").best_params.iloc[0]

{'criterion': 'squared_error',
 'max_depth': 2,
 'max_features': 0.3333333333333333,
 'n_estimators': 100,
 'n_jobs': -1,
 'oob_score': True,
 'random_state': 42}
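
One caveat with the CSV files written above: a dict stored in a CSV cell is saved as its string representation, so `best_params` comes back as text when the file is reloaded. A minimal sketch of the round trip, using only the standard library and the Model F, h=1 parameters and rounded RMSE from the log, is shown below. The in-memory buffer stands in for the real file.

```python
import ast
import csv
import io

best_params = {
    "criterion": "squared_error",
    "max_depth": 2,
    "max_features": 0.3333333333333333,
    "n_estimators": 100,
    "n_jobs": -1,
    "oob_score": True,
    "random_state": 42,
}

# Write one results row; the dict is stored as its string repr.
buffer = io.StringIO()
fieldnames = ["horizon", "best_params", "min_error", "model"]
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(
    {"horizon": 1, "best_params": str(best_params), "min_error": 0.08, "model": "F"}
)

# Read it back: every field is now a string.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
raw = rows[0]["best_params"]

# ast.literal_eval safely parses the repr back into a dict.
restored = ast.literal_eval(raw)
print(type(raw).__name__, type(restored).__name__)
```

With pandas, the same recovery can be done after `pd.read_csv` via `df["best_params"].apply(ast.literal_eval)`.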

### Historical Forecasts with Best Model

In [88]:
hist_forecast_save_dir = Path(
    "/Users/christopherliew/Desktop/Y4S1/HT/crypto_uncertainty_index/forecasting/data/forecasts/random_forest"
)

In [89]:
# Historical forecasts for a single horizon, using the best params from the grid search


def hist_forecast_RF(
    series: TimeSeries,
    past_covariates: TimeSeries,
    forecast_horizon: int,
    best_params: Dict[str, Any],
    lags: int,
    lags_past_covariates: int,
    verbose: bool = False,
    split_date: pd.Timestamp = split_date,
) -> TimeSeries:

    model = RandomForest(
        lags=lags, lags_past_covariates=lags_past_covariates, **best_params
    )
    hist_forecast = model.historical_forecasts(
        series=series,
        past_covariates=past_covariates,
        forecast_horizon=forecast_horizon,
        stride=1,
        start=split_date,
        verbose=verbose,
    )
    return hist_forecast

In [93]:
def multi_hist_forecast_RF(
    target_df: pd.DataFrame,
    past_covariates: List[TimeSeries],
    best_params_dict: Dict[int, Any],
    save_data_dir: Union[str, Path],
    model_name: str,
    lags: int = 1,
    past_covariates_lags: int = 1,
    verbose: bool = True,
    split_date: pd.Timestamp = split_date,
) -> None:

    for h in tqdm(best_params_dict.keys()):
        print(f"Recursively Forecasting horizon={h} with Random Forest")

        # Generate Appropriate TS Data
        target_cov = gen_log_price_returns(target_df, h)
        past_cov_list = [target_cov]
        # Cut each covariate to the time span it shares with the target
        past_cov_list.extend(
            cov.slice_intersect(target_cov) for cov in past_covariates
        )
        past_cov_list_tidy = concatenate(past_cov_list, axis=1)
        # Run Historical Forecast
        hist_forecast = hist_forecast_RF(
            target_cov,
            past_cov_list_tidy,
            h,
            best_params_dict[h],
            lags,
            past_covariates_lags,
            verbose,
            split_date,
        )
        print("Saving forecasts ...")
        forecast_df = hist_forecast.with_columns_renamed(["0"], ["price_return"])
        forecast_df.to_csv(Path(save_data_dir) / f"rf_model_{model_name}_h{h}.csv")
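
The covariate handling in this function boils down to two steps: cut every series to the time span it shares with the target, then stack the aligned series column-wise. The sketch below mimics that idea without darts. The `Series` alias, `common_span` and `align_and_stack` are hypothetical stand-ins, not darts API, and the toy data is invented.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical stand-in for a TimeSeries: time stamp -> value.
Series = Dict[date, float]


def common_span(series_list: List[Series]) -> Tuple[date, date]:
    """Latest start and earliest end across all series."""
    start = max(min(s) for s in series_list)
    end = min(max(s) for s in series_list)
    if start > end:
        raise ValueError("series do not overlap in time")
    return start, end


def align_and_stack(series_list: List[Series]) -> List[Tuple[date, List[float]]]:
    """Cut every series to the common span and stack the values column-wise.

    Assumes all series share the same (weekly) time stamps inside the span.
    """
    start, end = common_span(series_list)
    stamps = sorted(t for t in series_list[0] if start <= t <= end)
    return [(t, [s[t] for s in series_list]) for t in stamps]


weeks = [date(2021, 1, 3) + timedelta(weeks=i) for i in range(6)]
price = {t: 100.0 + i for i, t in enumerate(weeks)}  # weeks 0..5
returns = {t: 0.01 * i for i, t in enumerate(weeks) if i >= 1}  # starts a week later
uncertainty = {t: 50.0 - i for i, t in enumerate(weeks) if 2 <= i <= 4}  # shortest

stacked = align_and_stack([returns, price, uncertainty])
print(len(stacked), stacked[0][0], stacked[-1][0])
```

The shortest series determines the usable span, which is why adding a covariate with limited history can shrink the data available to the model.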

#### Model A

In [94]:
RF_A_best_params_dict = RF_A_results_df.set_index("horizon")["best_params"].to_dict()

In [95]:
multi_hist_forecast_RF(
    btc_usd_df, [price_ts, vol_ts], RF_A_best_params_dict, hist_forecast_save_dir, "A"
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest


  0%|          | 0/136 [00:00<?, ?it/s]

  8%|▊         | 1/12 [00:11<02:01, 11.03s/it]

Saving forecasts ...
Recursively Forecasting horizon=2 with Random Forest


  0%|          | 0/135 [00:00<?, ?it/s]

 17%|█▋        | 2/12 [00:22<01:55, 11.58s/it]

Saving forecasts ...
Recursively Forecasting horizon=3 with Random Forest


  0%|          | 0/134 [00:00<?, ?it/s]

 25%|██▌       | 3/12 [00:35<01:48, 12.07s/it]

Saving forecasts ...
Recursively Forecasting horizon=4 with Random Forest


  0%|          | 0/133 [00:00<?, ?it/s]

 33%|███▎      | 4/12 [00:43<01:22, 10.29s/it]

Saving forecasts ...
Recursively Forecasting horizon=5 with Random Forest


  0%|          | 0/132 [00:00<?, ?it/s]

 42%|████▏     | 5/12 [00:57<01:21, 11.65s/it]

Saving forecasts ...
Recursively Forecasting horizon=6 with Random Forest


  0%|          | 0/131 [00:00<?, ?it/s]

 50%|█████     | 6/12 [01:11<01:14, 12.49s/it]

Saving forecasts ...
Recursively Forecasting horizon=7 with Random Forest


  0%|          | 0/130 [00:00<?, ?it/s]

 58%|█████▊    | 7/12 [01:26<01:06, 13.29s/it]

Saving forecasts ...
Recursively Forecasting horizon=8 with Random Forest


  0%|          | 0/129 [00:00<?, ?it/s]

 67%|██████▋   | 8/12 [01:41<00:55, 13.98s/it]

Saving forecasts ...
Recursively Forecasting horizon=9 with Random Forest


  0%|          | 0/128 [00:00<?, ?it/s]

 75%|███████▌  | 9/12 [01:57<00:43, 14.53s/it]

Saving forecasts ...
Recursively Forecasting horizon=10 with Random Forest


  0%|          | 0/127 [00:00<?, ?it/s]

 83%|████████▎ | 10/12 [02:13<00:30, 15.09s/it]

Saving forecasts ...
Recursively Forecasting horizon=11 with Random Forest


  0%|          | 0/126 [00:00<?, ?it/s]

 92%|█████████▏| 11/12 [02:23<00:13, 13.46s/it]

Saving forecasts ...
Recursively Forecasting horizon=12 with Random Forest


  0%|          | 0/125 [00:00<?, ?it/s]

100%|██████████| 12/12 [02:33<00:00, 12.83s/it]

Saving forecasts ...
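
The inner progress bars above count 136 forecasts for h=1, 135 for h=2, and so on down to 125 for h=12. This is consistent with an expanding-window backtest over a fixed evaluation window, where each extra step of horizon removes one usable forecast origin. The sketch below illustrates that arithmetic with a naive last-value forecaster. Both functions are hypothetical and make no claim about how darts implements `historical_forecasts` internally.

```python
from typing import List, Sequence, Tuple


def count_forecasts(n_eval: int, horizon: int) -> int:
    """Origins whose horizon-step-ahead target stays inside a window of n_eval points."""
    return max(n_eval - horizon + 1, 0)


def expanding_window_backtest(
    series: Sequence[float], first_origin: int, horizon: int, stride: int = 1
) -> List[Tuple[int, float]]:
    """Naive last-value forecasts on an expanding training window.

    Returns (target_index, forecast) pairs, where the target lies `horizon`
    steps after the last training observation.
    """
    forecasts = []
    for origin in range(first_origin, len(series) - horizon + 1, stride):
        history = series[:origin]  # all observations before the origin
        target_idx = origin + horizon - 1
        forecasts.append((target_idx, history[-1]))
    return forecasts


# Counts matching the progress bars, assuming a 136-point evaluation window.
print([count_forecasts(136, h) for h in (1, 2, 12)])

# Toy series: 20 points, evaluation starts at index 8, horizon of 3.
toy = [float(i) for i in range(20)]
toy_forecasts = expanding_window_backtest(toy, first_origin=8, horizon=3)
print(len(toy_forecasts), toy_forecasts[0], toy_forecasts[-1])
```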





#### Model B

In [96]:
RF_B_best_params_dict = RF_B_results_df.set_index("horizon")["best_params"].to_dict()

In [97]:
multi_hist_forecast_RF(
    btc_usd_df,
    [price_ts, vol_ts, lucey_price],
    RF_B_best_params_dict,
    hist_forecast_save_dir,
    "B",
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest


  0%|          | 0/136 [00:00<?, ?it/s]

  8%|▊         | 1/12 [00:30<05:34, 30.37s/it]

Saving forecasts ...
Recursively Forecasting horizon=2 with Random Forest


  0%|          | 0/135 [00:00<?, ?it/s]

 17%|█▋        | 2/12 [00:36<02:43, 16.38s/it]

Saving forecasts ...
Recursively Forecasting horizon=3 with Random Forest


  0%|          | 0/134 [00:00<?, ?it/s]

 25%|██▌       | 3/12 [01:10<03:37, 24.16s/it]

Saving forecasts ...
Recursively Forecasting horizon=4 with Random Forest


  0%|          | 0/133 [00:00<?, ?it/s]

 33%|███▎      | 4/12 [01:44<03:46, 28.26s/it]

Saving forecasts ...
Recursively Forecasting horizon=5 with Random Forest


  0%|          | 0/132 [00:00<?, ?it/s]

 42%|████▏     | 5/12 [02:20<03:36, 30.96s/it]

Saving forecasts ...
Recursively Forecasting horizon=6 with Random Forest


  0%|          | 0/131 [00:00<?, ?it/s]

 50%|█████     | 6/12 [02:57<03:18, 33.08s/it]

Saving forecasts ...
Recursively Forecasting horizon=7 with Random Forest


  0%|          | 0/130 [00:00<?, ?it/s]

 58%|█████▊    | 7/12 [03:36<02:54, 34.81s/it]

Saving forecasts ...
Recursively Forecasting horizon=8 with Random Forest


  0%|          | 0/129 [00:00<?, ?it/s]

 67%|██████▋   | 8/12 [04:16<02:25, 36.39s/it]

Saving forecasts ...
Recursively Forecasting horizon=9 with Random Forest


  0%|          | 0/128 [00:00<?, ?it/s]

 75%|███████▌  | 9/12 [04:57<01:53, 37.86s/it]

Saving forecasts ...
Recursively Forecasting horizon=10 with Random Forest


  0%|          | 0/127 [00:00<?, ?it/s]

 83%|████████▎ | 10/12 [05:39<01:18, 39.16s/it]

Saving forecasts ...
Recursively Forecasting horizon=11 with Random Forest


  0%|          | 0/126 [00:00<?, ?it/s]

 92%|█████████▏| 11/12 [05:48<00:30, 30.16s/it]

Saving forecasts ...
Recursively Forecasting horizon=12 with Random Forest


  0%|          | 0/125 [00:00<?, ?it/s]

100%|██████████| 12/12 [06:33<00:00, 32.80s/it]

Saving forecasts ...





#### Model C

In [98]:
RF_C_best_params_dict = RF_C_results_df.set_index("horizon")["best_params"].to_dict()

In [99]:
multi_hist_forecast_RF(
    btc_usd_df,
    [price_ts, vol_ts, lucey_reddit_price],
    RF_C_best_params_dict,
    hist_forecast_save_dir,
    "C",
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest


  0%|          | 0/136 [00:00<?, ?it/s]

  8%|▊         | 1/12 [00:11<02:08, 11.66s/it]

Saving forecasts ...
Recursively Forecasting horizon=2 with Random Forest


  0%|          | 0/135 [00:00<?, ?it/s]

 17%|█▋        | 2/12 [00:18<01:25,  8.58s/it]

Saving forecasts ...
Recursively Forecasting horizon=3 with Random Forest


  0%|          | 0/134 [00:00<?, ?it/s]

 25%|██▌       | 3/12 [00:51<02:58, 19.83s/it]

Saving forecasts ...
Recursively Forecasting horizon=4 with Random Forest


  0%|          | 0/133 [00:00<?, ?it/s]

 33%|███▎      | 4/12 [01:25<03:25, 25.67s/it]

Saving forecasts ...
Recursively Forecasting horizon=5 with Random Forest


  0%|          | 0/132 [00:00<?, ?it/s]

 42%|████▏     | 5/12 [02:01<03:25, 29.40s/it]

Saving forecasts ...
Recursively Forecasting horizon=6 with Random Forest


  0%|          | 0/131 [00:00<?, ?it/s]

 50%|█████     | 6/12 [02:38<03:11, 31.99s/it]

Saving forecasts ...
Recursively Forecasting horizon=7 with Random Forest


  0%|          | 0/130 [00:00<?, ?it/s]

 58%|█████▊    | 7/12 [03:17<02:50, 34.13s/it]

Saving forecasts ...
Recursively Forecasting horizon=8 with Random Forest


  0%|          | 0/129 [00:00<?, ?it/s]

 67%|██████▋   | 8/12 [03:57<02:23, 35.93s/it]

Saving forecasts ...
Recursively Forecasting horizon=9 with Random Forest


  0%|          | 0/128 [00:00<?, ?it/s]

 75%|███████▌  | 9/12 [04:38<01:52, 37.49s/it]

Saving forecasts ...
Recursively Forecasting horizon=10 with Random Forest


  0%|          | 0/127 [00:00<?, ?it/s]

 83%|████████▎ | 10/12 [05:20<01:17, 38.94s/it]

Saving forecasts ...
Recursively Forecasting horizon=11 with Random Forest


  0%|          | 0/126 [00:00<?, ?it/s]

 92%|█████████▏| 11/12 [05:29<00:29, 29.96s/it]

Saving forecasts ...
Recursively Forecasting horizon=12 with Random Forest


  0%|          | 0/125 [00:00<?, ?it/s]

100%|██████████| 12/12 [06:14<00:00, 31.20s/it]

Saving forecasts ...





#### Model D

In [100]:
RF_D_best_params_dict = RF_D_results_df.set_index("horizon")["best_params"].to_dict()

In [101]:
multi_hist_forecast_RF(
    btc_usd_df,
    [price_ts, vol_ts, lda_price],
    RF_D_best_params_dict,
    hist_forecast_save_dir,
    "D",
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest


  0%|          | 0/136 [00:00<?, ?it/s]

  8%|▊         | 1/12 [00:06<01:08,  6.24s/it]

Saving forecasts ...
Recursively Forecasting horizon=2 with Random Forest


  0%|          | 0/135 [00:00<?, ?it/s]

 17%|█▋        | 2/12 [00:12<01:03,  6.36s/it]

Saving forecasts ...
Recursively Forecasting horizon=3 with Random Forest


  0%|          | 0/134 [00:00<?, ?it/s]

 25%|██▌       | 3/12 [00:45<02:47, 18.65s/it]

Saving forecasts ...
Recursively Forecasting horizon=4 with Random Forest


  0%|          | 0/133 [00:00<?, ?it/s]

 33%|███▎      | 4/12 [01:21<03:21, 25.16s/it]

Saving forecasts ...
Recursively Forecasting horizon=5 with Random Forest


  0%|          | 0/132 [00:00<?, ?it/s]

 42%|████▏     | 5/12 [01:57<03:24, 29.17s/it]

Saving forecasts ...
Recursively Forecasting horizon=6 with Random Forest


  0%|          | 0/131 [00:00<?, ?it/s]

 50%|█████     | 6/12 [02:34<03:11, 31.87s/it]

Saving forecasts ...
Recursively Forecasting horizon=7 with Random Forest


  0%|          | 0/130 [00:00<?, ?it/s]

 58%|█████▊    | 7/12 [03:13<02:50, 34.05s/it]

Saving forecasts ...
Recursively Forecasting horizon=8 with Random Forest


  0%|          | 0/129 [00:00<?, ?it/s]

 67%|██████▋   | 8/12 [03:52<02:23, 35.87s/it]

Saving forecasts ...
Recursively Forecasting horizon=9 with Random Forest


  0%|          | 0/128 [00:00<?, ?it/s]

 75%|███████▌  | 9/12 [04:33<01:52, 37.51s/it]

Saving forecasts ...
Recursively Forecasting horizon=10 with Random Forest


  0%|          | 0/127 [00:00<?, ?it/s]

 83%|████████▎ | 10/12 [05:16<01:18, 39.02s/it]

Saving forecasts ...
Recursively Forecasting horizon=11 with Random Forest


  0%|          | 0/126 [00:00<?, ?it/s]

 92%|█████████▏| 11/12 [05:25<00:29, 29.99s/it]

Saving forecasts ...
Recursively Forecasting horizon=12 with Random Forest


  0%|          | 0/125 [00:00<?, ?it/s]

100%|██████████| 12/12 [06:10<00:00, 30.88s/it]

Saving forecasts ...





#### Model E

In [102]:
RF_E_best_params_dict = RF_E_results_df.set_index("horizon")["best_params"].to_dict()

In [103]:
multi_hist_forecast_RF(
    btc_usd_df,
    [price_ts, vol_ts, t2v_price],
    RF_E_best_params_dict,
    hist_forecast_save_dir,
    "E",
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest


  0%|          | 0/136 [00:00<?, ?it/s]

  8%|▊         | 1/12 [00:06<01:06,  6.09s/it]

Saving forecasts ...
Recursively Forecasting horizon=2 with Random Forest


  0%|          | 0/135 [00:00<?, ?it/s]

 17%|█▋        | 2/12 [00:12<01:02,  6.29s/it]

Saving forecasts ...
Recursively Forecasting horizon=3 with Random Forest


  0%|          | 0/134 [00:00<?, ?it/s]

 25%|██▌       | 3/12 [00:45<02:47, 18.60s/it]

Saving forecasts ...
Recursively Forecasting horizon=4 with Random Forest


  0%|          | 0/133 [00:00<?, ?it/s]

 33%|███▎      | 4/12 [01:20<03:18, 24.84s/it]

Saving forecasts ...
Recursively Forecasting horizon=5 with Random Forest


  0%|          | 0/132 [00:00<?, ?it/s]

 42%|████▏     | 5/12 [01:56<03:22, 28.91s/it]

Saving forecasts ...
Recursively Forecasting horizon=6 with Random Forest


  0%|          | 0/131 [00:00<?, ?it/s]

 50%|█████     | 6/12 [02:33<03:10, 31.72s/it]

Saving forecasts ...
Recursively Forecasting horizon=7 with Random Forest


  0%|          | 0/130 [00:00<?, ?it/s]

 58%|█████▊    | 7/12 [03:12<02:50, 34.17s/it]

Saving forecasts ...
Recursively Forecasting horizon=8 with Random Forest


  0%|          | 0/129 [00:00<?, ?it/s]

 67%|██████▋   | 8/12 [03:52<02:24, 36.07s/it]

Saving forecasts ...
Recursively Forecasting horizon=9 with Random Forest


  0%|          | 0/128 [00:00<?, ?it/s]

 75%|███████▌  | 9/12 [04:34<01:53, 37.71s/it]

Saving forecasts ...
Recursively Forecasting horizon=10 with Random Forest


  0%|          | 0/127 [00:00<?, ?it/s]

 83%|████████▎ | 10/12 [05:16<01:18, 39.26s/it]

Saving forecasts ...
Recursively Forecasting horizon=11 with Random Forest


  0%|          | 0/126 [00:00<?, ?it/s]

 92%|█████████▏| 11/12 [05:26<00:30, 30.12s/it]

Saving forecasts ...
Recursively Forecasting horizon=12 with Random Forest


  0%|          | 0/125 [00:00<?, ?it/s]

100%|██████████| 12/12 [06:13<00:00, 31.16s/it]

Saving forecasts ...





#### Model F

In [110]:
def multi_hist_forecast_RF_VCRIX(
    target_df: pd.DataFrame,
    best_params_dict: Dict[int, Any],
    save_data_dir: Union[str, Path],
    model_name: str,
    lags: int = 1,
    past_covariates_lags: int = 1,
    verbose: bool = True,
    split_date: pd.Timestamp = split_date,
) -> None:

    for h in tqdm(best_params_dict.keys()):
        print(f"Recursively Forecasting horizon={h} with Random Forest")

        # Generate Appropriate TS Data
        target_cov = gen_log_price_returns(target_df, h)
        past_cov_list = concatenate(
            [
                target_cov.slice_intersect(vcrix),
                price_ts.slice_intersect(vcrix),
                vol_ts.slice_intersect(vcrix),
                vcrix.slice_intersect(target_cov.slice_intersect(vcrix)),
            ],
            axis=1,
        )

        # Run Historical Forecast
        hist_forecast = hist_forecast_RF(
            target_cov,
            past_cov_list,
            h,
            best_params_dict[h],
            lags,
            past_covariates_lags,
            verbose,
            split_date,
        )
        print("Saving forecasts ...")
        forecast_df = hist_forecast.with_columns_renamed(["0"], ["price_return"])
        forecast_df.to_csv(Path(save_data_dir) / f"rf_model_{model_name}_h{h}.csv")

In [111]:
RF_F_best_params_dict = RF_F_results_df.set_index("horizon")["best_params"].to_dict()

In [112]:
multi_hist_forecast_RF_VCRIX(
    btc_usd_df, RF_F_best_params_dict, hist_forecast_save_dir, "F"
)

  0%|          | 0/12 [00:00<?, ?it/s]

Recursively Forecasting horizon=1 with Random Forest


  0%|          | 0/136 [00:00<?, ?it/s]

  8%|▊         | 1/12 [00:10<01:57, 10.69s/it]

Saving forecasts ...
Recursively Forecasting horizon=2 with Random Forest


  0%|          | 0/135 [00:00<?, ?it/s]

 17%|█▋        | 2/12 [00:16<01:20,  8.10s/it]

Saving forecasts ...
Recursively Forecasting horizon=3 with Random Forest


  0%|          | 0/134 [00:00<?, ?it/s]

 25%|██▌       | 3/12 [00:50<02:56, 19.64s/it]

Saving forecasts ...
Recursively Forecasting horizon=4 with Random Forest


  0%|          | 0/133 [00:00<?, ?it/s]

 33%|███▎      | 4/12 [01:24<03:24, 25.55s/it]

Saving forecasts ...
Recursively Forecasting horizon=5 with Random Forest


  0%|          | 0/132 [00:00<?, ?it/s]

 42%|████▏     | 5/12 [02:01<03:25, 29.42s/it]

Saving forecasts ...
Recursively Forecasting horizon=6 with Random Forest


  0%|          | 0/131 [00:00<?, ?it/s]

 50%|█████     | 6/12 [02:38<03:12, 32.09s/it]

Saving forecasts ...
Recursively Forecasting horizon=7 with Random Forest


  0%|          | 0/130 [00:00<?, ?it/s]

 58%|█████▊    | 7/12 [03:17<02:51, 34.28s/it]

Saving forecasts ...
Recursively Forecasting horizon=8 with Random Forest


  0%|          | 0/129 [00:00<?, ?it/s]

 67%|██████▋   | 8/12 [03:58<02:25, 36.44s/it]

Saving forecasts ...
Recursively Forecasting horizon=9 with Random Forest


  0%|          | 0/128 [00:00<?, ?it/s]

 75%|███████▌  | 9/12 [04:41<01:55, 38.43s/it]

Saving forecasts ...
Recursively Forecasting horizon=10 with Random Forest


  0%|          | 0/127 [00:00<?, ?it/s]

 83%|████████▎ | 10/12 [05:24<01:19, 39.98s/it]

Saving forecasts ...
Recursively Forecasting horizon=11 with Random Forest


  0%|          | 0/126 [00:00<?, ?it/s]

 92%|█████████▏| 11/12 [06:12<00:42, 42.25s/it]

Saving forecasts ...
Recursively Forecasting horizon=12 with Random Forest


  0%|          | 0/125 [00:00<?, ?it/s]

100%|██████████| 12/12 [06:58<00:00, 34.86s/it]

Saving forecasts ...



