## Welcome to Time Series Forecasting in Finance!

By the end of this notebook, you will understand why data processing matters and how basic mathematical models can be successful in financial markets.

### Data Processing

We will start by exploring data processing steps such as:

- Data cleaning
- Tick to Bar Conversion

These stages are essential to ensure the data you work with is accurate and relevant for your model.

### Primary Model Building

Next, we will dive into model building, focusing on simple yet effective methods like:

- Trade Event Detection
- Triple Barrier Method
- Rolling averages
- Hyperparameter Optimization

These simple methods will provide valuable insights into financial time series data.

Let us get started and build a solid foundation in data processing and basic model building for quantitative finance. **Happy coding!**

In [None]:
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
import plotly.subplots as sp
import plotly.graph_objs as go

import optuna


import sys
sys.path.append('../utils/')
from preprocess import tick_to_dollar_bar, clean_and_filter_data
from primary_model import generate_cusum_events, triple_barrier_method, generate_trading_signals_ma

### Loading and Cleaning HBAR (Hedera Hashgraph) Dataset

In this section, we will perform the following tasks:
1. Load the minute resolution dataset of HBAR (Hedera Hashgraph) provided in the repository
2. Clean the data and perform any other tasks necessary before the data transformation steps

First, we load the data and check that it is properly formatted.

In [None]:
ohlcv_data = pd.read_csv("../datasets/hbar_data.csv", parse_dates=['datetime'])
ohlcv_data.head()

The columns are `datetime`, `open`, `high`, `low`, `close`, and `volume`, which is the OHLCV layout we are looking for.

Next, we visualize the price action to decide how the data should be processed.

In [None]:
import plotly.graph_objs as go

# Resample the data to daily frequency
ohlcv_data_daily = ohlcv_data.resample('D', on='datetime').agg({'close': 'mean', 'volume': 'sum'}).reset_index()

# Create a plot with two y-axes
fig = go.Figure()

# Plot the daily mean close price on the left y-axis
fig.add_trace(go.Scatter(x=ohlcv_data_daily['datetime'],
                         y=ohlcv_data_daily['close'],
                         name='Price',
                         yaxis='y1'))

# Add volume bars to the plot
fig.add_trace(go.Bar(x=ohlcv_data_daily['datetime'],
                     y=ohlcv_data_daily['volume'],
                     name='Volume',
                     yaxis='y2',
                     marker=dict(color='rgba(0, 0, 0, 0.2)')))

# Update layout to create two separate y-axes with a logarithmic scale for the volume axis
fig.update_layout(
    yaxis=dict(title='Price (USD)', side='left', showgrid=False),
    yaxis2=dict(title='Volume', side='right', overlaying='y', showgrid=False, type='log'),
    xaxis=dict(title='Date'),
    legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1)
)

# Display the plot
fig.show()

It looks like January 2021 to June 2022 was an abnormally volatile period for the market. Since the market has settled down considerably, data from that period does not reflect today's market, so it is best to remove it.
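
The repository's `clean_and_filter_data` helper is not shown in this notebook. As a rough illustration only, the sketch below assumes the helper drops every row whose timestamp falls inside the given date range. The function name `drop_date_range` and the synthetic prices are hypothetical.

```python
import pandas as pd

def drop_date_range(df, start, end, time_col="datetime"):
    """Return the rows of `df` whose timestamp lies outside [start, end)."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    in_range = (df[time_col] >= start) & (df[time_col] < end)
    return df.loc[~in_range].reset_index(drop=True)

# Synthetic example: six month-start timestamps with made-up prices
demo = pd.DataFrame({
    "datetime": pd.date_range("2020-11-01", periods=6, freq="MS"),
    "close": [0.03, 0.04, 0.10, 0.12, 0.09, 0.05],
})
kept = drop_date_range(demo, "2021-01-01", "2021-03-01")
print(kept)  # the January and February 2021 rows are gone
```

The half-open interval `[start, end)` is a design choice of this sketch; the actual helper may treat the end date differently.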

In [None]:
ohlcv_data_cleaned = clean_and_filter_data(ohlcv_data, ['2021-01-01', '2022-06-01'])
ohlcv_data_cleaned_daily = ohlcv_data_cleaned.resample('D', on='datetime').agg({'close': 'mean', 'volume': 'sum'}).reset_index()



# Create subplots
fig = sp.make_subplots(rows=1, cols=2, subplot_titles=("Original Data", "Cleaned Data"))

# Plot the 'close' column in the original data using Plotly
fig.add_trace(go.Scatter(x=ohlcv_data_daily['datetime'], y=ohlcv_data_daily['close'], name='Original Data'), row=1, col=1)

# Plot the 'close' column in the cleaned data using Plotly
fig.add_trace(go.Scatter(x=ohlcv_data_cleaned_daily['datetime'], y=ohlcv_data_cleaned_daily['close'], name='Cleaned Data'), row=1, col=2)

# Update layout
fig.update_layout(showlegend=False, xaxis_title='Date', xaxis2_title='Date', yaxis_title='Price (USD)')

# Display the plot
fig.show()

The cleaned data displays fairly consistent price action, providing a solid starting point for data transformations.

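**Tick Data vs Bar Data:** Raw tick or fixed-interval data gives every observation equal weight regardless of how much was traded, which can make it noisy and unreliable. In contrast, *bar data* sampled by market activity, such as traded volume or dollar value, takes trade volume into account and tends to be more reliable for modeling.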

**Types of Bar Data:** There are various types of bar data, including:

- *Volume bars*
- *Dollar bars*
- *Run bars*
- *Imbalance bars*

For simplicity, we will focus on **dollar bars** in this analysis.
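
To make the idea concrete, here is a minimal sketch of dollar-bar construction. It is not the repository's `tick_to_dollar_bar`: the function name `dollar_bars_sketch`, the `dollars_per_bar` parameter, and the approximation of traded value as `close * volume` are all assumptions made for illustration.

```python
import pandas as pd

def dollar_bars_sketch(df, dollars_per_bar):
    """Group consecutive rows into bars of roughly equal traded dollar value."""
    # Approximate the dollar value traded in each row
    dollar_value = df["close"] * df["volume"]
    # Bar id = number of full multiples of `dollars_per_bar` traded so far,
    # counting the current row; rows sharing an id form one bar
    bar_id = (dollar_value.cumsum() // dollars_per_bar).astype(int)
    bars = df.groupby(bar_id).agg(
        datetime=("datetime", "last"),
        open=("open", "first"),
        high=("high", "max"),
        low=("low", "min"),
        close=("close", "last"),
        volume=("volume", "sum"),
    )
    return bars.reset_index(drop=True)

# Synthetic minute data with a constant price of 1.0 and rising volume
demo = pd.DataFrame({
    "datetime": pd.date_range("2023-01-01", periods=6, freq="min"),
    "open": 1.0, "high": 1.0, "low": 1.0, "close": 1.0,
    "volume": [10, 20, 30, 40, 50, 60],
})
bars = dollar_bars_sketch(demo, dollars_per_bar=50)
print(bars[["datetime", "close", "volume"]])
```

A `bars_per_day` argument like the one used in the next cell could plausibly be turned into a threshold by dividing the average daily dollar value by the desired number of bars, but that is a guess about the helper's behavior.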

In [None]:
dollar_bars = tick_to_dollar_bar(ohlcv_data_cleaned, bars_per_day=50)
dollar_bars

In [None]:
# Create subplots
fig = sp.make_subplots(rows=2, cols=2, subplot_titles=("Cleaned Data (Price)", "Dollar Bars (Price)", "Cleaned Data (Volume)", "Dollar Bars (Volume)"))

# Plot the 'close' column in the cleaned data using Plotly
fig.add_trace(go.Scatter(x=ohlcv_data_cleaned['datetime'], y=ohlcv_data_cleaned['close'], name='Cleaned Data (Price)'), row=1, col=1)

# Plot the 'close' column in the dollar bars using Plotly
fig.add_trace(go.Scatter(x=dollar_bars['datetime'], y=dollar_bars['close'], name='Dollar Bars (Price)'), row=1, col=2)

# Plot the 'volume' column in the cleaned data using Plotly
fig.add_trace(go.Bar(x=ohlcv_data_cleaned['datetime'], y=ohlcv_data_cleaned['volume'], name='Cleaned Data (Volume)'), row=2, col=1)

# Plot the 'volume' column in the dollar bars using Plotly
fig.add_trace(go.Bar(x=dollar_bars['datetime'], y=dollar_bars['volume'], name='Dollar Bars (Volume)'), row=2, col=2)

# Update layout
fig.update_layout(showlegend=False, xaxis_title='Date', xaxis2_title='Date', xaxis3_title='Date', xaxis4_title='Date',
                  yaxis_title='Price (USD)', yaxis2_title='Price (USD)', yaxis3_title='Volume', yaxis4_title='Volume')

# Display the plot
fig.show()


**Primary Model Construction:** We will now build our primary trading model.

**Generating Trading Signals with Moving Averages (MA):** The `generate_trading_signals_ma` method generates trading signals based on the relationship between two moving averages:

- *Fast MA*
- *Slow MA*

**Buy and Sell Signals:**

- If the *fast MA* is above the *slow MA*, the `side` column is set to **1**, indicating a *buy signal*.
- If the *fast MA* is below the *slow MA*, the `side` column is set to **-1**, indicating a *sell signal*.

By analyzing the relationship between fast and slow moving averages, we approximate whether or not we should buy or sell at each timepoint.
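
The crossover rule above can be sketched in a few lines. This is an illustration under assumptions, not the code of `generate_trading_signals_ma`: the function name `ma_crossover_side`, the use of simple (unweighted) rolling means, and the choice of `0` for bars without enough history are hypothetical.

```python
import numpy as np
import pandas as pd

def ma_crossover_side(close, fast_window, slow_window):
    """Return 1 where the fast MA is above the slow MA, -1 where below, else 0."""
    fast_ma = close.rolling(fast_window).mean()
    slow_ma = close.rolling(slow_window).mean()
    # The sign of the difference gives the side; bars where either MA is
    # still undefined (NaN) are assigned 0, meaning no signal
    return np.sign(fast_ma - slow_ma).fillna(0).astype(int)

close = pd.Series([1, 2, 3, 4, 5, 4, 3, 2, 1], dtype=float)
side = ma_crossover_side(close, fast_window=2, slow_window=4)
print(side.tolist())  # [0, 0, 0, 1, 1, 1, -1, -1, -1]
```

The first three bars carry no signal because the slow window of 4 has not filled yet; the side flips to -1 once the falling prices pull the fast average below the slow one.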

In [None]:
slow_window = 50
fast_window = 10
dollar_bars = generate_trading_signals_ma(dollar_bars, slow_window, fast_window)

# Detect candidate trade events with the CUSUM filter
# (same threshold as the optimization cell below)
trading_events = generate_cusum_events(dollar_bars, threshold=0.0632)

pt = 0.04       # the gain at which to take profit
sl = 0.06       # the loss at which to cut losses
min_ret = 0.01  # the minimum return to be considered for triple barrier labeling
num_days = 1    # the maximum time (in days) for a trade to be live
triple_barrier_labels = triple_barrier_method(dollar_bars, trading_events, pt, sl, min_ret, num_days)


total_ret = triple_barrier_labels['return'].sum()
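
The parameters `pt`, `sl`, and `num_days` in the cell above correspond to the three barriers of the triple barrier method: a profit-taking level, a stop-loss level, and a time limit. The sketch below labels a single trade to show the idea. It is not the repository's `triple_barrier_method`; the function name `triple_barrier_label`, the bar-count time limit `max_holding`, and the returned tuple are assumptions.

```python
def triple_barrier_label(prices, entry_idx, side, pt, sl, max_holding):
    """Walk forward from the entry bar until one of three barriers is hit.

    Returns (exit_index, signed_return, reason).
    """
    entry = prices[entry_idx]
    last_idx = min(entry_idx + max_holding, len(prices) - 1)
    for i in range(entry_idx + 1, last_idx + 1):
        # Return from the trade's point of view: positive means profit
        ret = side * (prices[i] / entry - 1.0)
        if ret >= pt:
            return i, ret, "take_profit"   # upper barrier
        if ret <= -sl:
            return i, ret, "stop_loss"     # lower barrier
    # Neither price barrier was touched: exit at the time barrier
    ret = side * (prices[last_idx] / entry - 1.0)
    return last_idx, ret, "timeout"

prices = [1.00, 1.01, 1.03, 1.05, 1.02]
long_trade = triple_barrier_label(prices, 0, side=1, pt=0.04, sl=0.06, max_holding=4)
short_trade = triple_barrier_label(prices, 0, side=-1, pt=0.04, sl=0.06, max_holding=4)
print(long_trade)   # exits at index 3 via take_profit
print(short_trade)  # exits at index 4 via timeout
```

With the same price path, the long trade hits its profit target at the fourth bar, while the short trade never loses 6% and is closed by the time limit at a small loss.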


We don't want to bet on every bar, though; that is too risky. It is better to look for **events** worth betting on.
Events can be generated with many mathematical models.
In this tutorial we use the CUSUM filter, which López de Prado recommends in *Advances in Financial Machine Learning*.
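
As a rough guide to what a CUSUM filter does, the sketch below implements the symmetric version on log returns: it accumulates upward and downward moves separately and emits an event whenever either running sum exceeds the threshold, then resets that sum. The function name `cusum_events_sketch` and the use of log returns are assumptions; `generate_cusum_events` in the repository may differ.

```python
import numpy as np
import pandas as pd

def cusum_events_sketch(close, threshold):
    """Return the index labels at which a symmetric CUSUM filter fires."""
    log_ret = np.log(close).diff().dropna()
    events = []
    s_pos, s_neg = 0.0, 0.0
    for t, r in log_ret.items():
        # Running sums of upward and downward drift, each floored/capped at 0
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        if s_neg < -threshold:
            s_neg = 0.0          # reset after a downward event
            events.append(t)
        elif s_pos > threshold:
            s_pos = 0.0          # reset after an upward event
            events.append(t)
    return events

close = pd.Series([100, 101, 103, 106, 106, 100, 99, 105], dtype=float)
events = cusum_events_sketch(close, threshold=0.05)
print(events)  # [3, 5, 7]
```

In this toy series the filter fires after the cumulative 6% rise, after the sharp drop back to 100, and after the final rebound, while the small moves in between are ignored.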

In [None]:
from joblib import parallel_backend
import numpy as np


def backtest(triple_barrier_labels, initial_money=1000, bet_percentage=0.16):
    usd = initial_money * 0.5
    hbar = initial_money * 0.5
    total_money = initial_money
    active_bets = {}

    for index, row in triple_barrier_labels.iterrows():
        current_time = row['datetime']

        # Close bets that have reached their t1
        bets_to_close = [key for key, value in active_bets.items() if value['t1'] <= current_time]
        for key in bets_to_close:
            bet = active_bets.pop(key)
            bet_amount = bet['amount']
            pnl = bet_amount * bet['return']
            usd += pnl
            hbar -= pnl / row['final_price']
            total_money = usd + hbar * row['final_price']

        if len(active_bets) == 0:
            bet_amount = total_money * bet_percentage

            if row['side'] == 1:  # Long signal
                usd -= bet_amount
                hbar += bet_amount / row['initial_price']
            elif row['side'] == -1:  # Short signal
                usd += bet_amount
                hbar -= bet_amount / row['initial_price']

            active_bets[row['datetime']] = {'amount': bet_amount, 'return': row['return'], 't1': row['t1']}

    return total_money

def objective(trial):
    # Choose hyperparameters from trial object
    slow_window = trial.suggest_int("slow_window", 10, 200)
    fast_window = trial.suggest_int("fast_window", 5, 50)
    pt = trial.suggest_float("pt", 0.04, 0.1)
    sl = trial.suggest_float("sl", 0.03, 0.2)
    min_ret = 0.01
    num_days = trial.suggest_int("num_days", 1, 2)


    ohlcv_data = pd.read_csv("../datasets/hbar_data.csv", parse_dates=['datetime'])
    ohlcv_data_cleaned = clean_and_filter_data(ohlcv_data, ['2021-01-01', '2022-06-01'])
    dollar_bars = tick_to_dollar_bar(ohlcv_data_cleaned, bars_per_day=50)
    
    dollar_bars = generate_trading_signals_ma(dollar_bars, slow_window, fast_window)

    trading_events = generate_cusum_events(dollar_bars, threshold=0.0632)
    triple_barrier_labels = triple_barrier_method(dollar_bars, trading_events, pt, sl, min_ret, num_days)
    # Score = final balance over the full history plus half the final balance
    # over the most recent 15 weeks, so that recent performance counts extra
    recent_cutoff = triple_barrier_labels['datetime'].iloc[-1] - pd.DateOffset(weeks=15)
    recent_labels = triple_barrier_labels[triple_barrier_labels['datetime'] > recent_cutoff]
    score = backtest(triple_barrier_labels) + 0.5 * backtest(recent_labels)

    return score

# Create an Optuna study and optimize the hyperparameters
study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(n_startup_trials=130)
)
n_jobs = 8  # Set the number of parallel jobs
with parallel_backend("threading", n_jobs=n_jobs):
    study.optimize(objective, n_trials=200, n_jobs=n_jobs)

# Print the best hyperparameters
print("\nBest trial:")
trial = study.best_trial
print("Score: {}".format(trial.value))
print("Params: ")
for key, value in trial.params.items():
    print("{}: {}".format(key, value))

In [None]:

# Get the best parameters from the study
best_params = study.best_params
print(best_params)

# Extract the best parameters
slow_window = best_params["slow_window"]
fast_window = best_params["fast_window"]
pt = best_params["pt"]
sl = best_params["sl"]
num_days = best_params["num_days"]


ohlcv_data = pd.read_csv("../datasets/hbar_data.csv", parse_dates=['datetime'])
ohlcv_data_cleaned = clean_and_filter_data(ohlcv_data, ['2021-01-01', '2022-06-01'])
dollar_bars = tick_to_dollar_bar(ohlcv_data_cleaned, bars_per_day=50)

dollar_bars = generate_trading_signals_ma(dollar_bars, slow_window, fast_window)

# Note: this cell uses threshold=0.0667, whereas the objective above used 0.0632;
# use the same value in both places if the result should match the optimized setup
trading_events = generate_cusum_events(dollar_bars, threshold=0.0667)
triple_barrier_labels = triple_barrier_method(dollar_bars, trading_events, pt, sl, 0.01, num_days)

# Calculate total return with the best parameters
total_ret = triple_barrier_labels['return'].sum()
print("Total return with the best parameters: ", total_ret)

In [None]:


offset = 8  # look-back window in weeks

# Keep only the dollar bars from the last `offset` weeks
last_week = dollar_bars['datetime'].iloc[-1] - pd.DateOffset(weeks=offset)
dollar_bars_last_week = dollar_bars[dollar_bars['datetime'] > last_week]

# Sum the returns of the labeled trades over the same look-back window
tblw_cutoff = triple_barrier_labels['datetime'].iloc[-1] - pd.DateOffset(weeks=offset)
tblw = triple_barrier_labels[triple_barrier_labels['datetime'] > tblw_cutoff]
print(tblw['return'].sum())


In [None]:
import pandas as pd
import plotly.subplots as sp
import plotly.graph_objs as go


# Plot the dollar bars for the last week
dollar_bars_last_week_scatter = go.Scatter(x=dollar_bars_last_week['datetime'], y=dollar_bars_last_week['close'], name='Dollar Bars', mode='lines', line=dict(width=0.5))
triple_barrier_labels_last_week = triple_barrier_labels[triple_barrier_labels['datetime'] > last_week]


scatter_markers = []
for _, row in triple_barrier_labels_last_week.iterrows():
    start_time = row['datetime']
    end_time = row['t1']
    start_price = dollar_bars[dollar_bars['datetime'] == start_time]['close'].values[0]
    end_price = dollar_bars[dollar_bars['datetime'] == end_time]['close'].values[0]

    if row['return'] > 0:
        color = 'green'
    else:
        color = 'red'

    if row['side'] == 1:  # Long trade
        start_marker = 'triangle-up'
    else:  # Short trade
        start_marker = 'triangle-down'

    scatter_markers.append(go.Scatter(x=[start_time, end_time], y=[start_price, end_price], mode='markers', marker=dict(size=10, symbol=[start_marker, 'x'], color=color)))

# Create a figure for the dollar bars with triple barrier events over the look-back window
fig = go.Figure(data=[dollar_bars_last_week_scatter] + scatter_markers)
fig.update_layout(height=600, width=1000, title_text=f"Dollar Bars with Triple Barrier Events (Last {offset} Weeks)", xaxis_title="Date", yaxis_title="Close Price", showlegend=False)

# Show the plots
fig.show()

In [None]:
triple_barrier_labels['return'].gt(0).mean()