# Example 5: Nasdaq momentum strategy using XGBoost

## Pre-requisites
### 1. If you have not opened the notebook in Colab, select the button below
<a href="https://githubtocolab.com/SIGTechnologies/sigtech-python/blob/master/examples/framework/5_NASDAQ_Momentum_Using_XGBoost.ipynb">
    <img src="https://sigtech.com/wp-content/uploads/2023/08/grey_google_colab.svg"></a>

### 2. Enter your API key
After pasting in your API key, you need to run the cell. In Colab, hover your cursor over an individual code cell and click play to run it.
>**Tip**!\
>After pasting in your API key, you can press `CTRL-F9` (Windows) or `⌘-F9` (Mac) to run the entire notebook at once.

In [None]:
# Install our Python SDK
%pip install sigtech 

# Import OS and our Python SDK
import sigtech.api as sig
import os

# Define your API key as a string. Remember to delete it before sharing your notebook with others. Replace 
# <YOUR_API_KEY> with the API key you have generated. e.g. os.environ['SIGTECH_API_KEY'] = 'sig_A1B2C3D4E5f6g7h8i9'
os.environ['SIGTECH_API_KEY'] = '<YOUR_API_KEY>'
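
Hard-coding the key makes it easy to forget to delete it before sharing. As an optional alternative, here is a minimal sketch that prompts for the key with Python's built-in `getpass`, which does not echo what you type. It only prompts if `SIGTECH_API_KEY` is not already set. The helper name `ensure_api_key` is hypothetical and is not part of the SigTech SDK.

```python
import os
from getpass import getpass


def ensure_api_key(var_name='SIGTECH_API_KEY', prompt=getpass):
    """Hypothetical helper: set `var_name` in the environment, prompting only if it is missing."""
    if not os.environ.get(var_name):
        # getpass does not echo the key, so it never appears in the notebook source or output.
        os.environ[var_name] = prompt('Enter your SigTech API key: ')
    return os.environ[var_name]
```

To use it, call `ensure_api_key()` instead of the hard-coded assignment, before `sig.init()`.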

### 3. Set up your Colab environment

In [None]:
# Install XGBoost and scikit-learn
%pip install xgboost
%pip install scikit-learn

# Import any additional Python libraries you require.
import datetime as dtm
import sklearn
import pandas as pd
import numpy as np
import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Set default plotting parameters
plt.rcParams['figure.figsize'] = [16, 8]

### 4. Create a session
After installing our Python SDK, defining your API key, importing any additional Python libraries or functions you require, and setting any default parameters, initialize your session.

In [None]:
sig.init()

## Introduction to Nasdaq momentum strategies
This strategy aims to capitalize on momentum trends in the Nasdaq (`NQ`) index futures contract. The strategy employs a combination of traditional technical indicators and machine learning to identify potential entry and exit points for trades based on momentum signals in the historical price data.

## Our strategy
- We will create a dataframe of technical indicators that track the performance of Nasdaq futures.
- The dataframe will be used to train an XGBoost model and obtain predictions of the dates on which to take a long or short position in NQ futures.
- We will backtest the performance of a signal strategy that trades based on these predictions and compare its performance to a benchmark rolling futures strategy.

## 1. Create a dataframe of multiple technical indicators on the performance of Nasdaq futures

The technical indicators used are:
- 4-week and 12-week simple moving averages (MAs), which identify trends in the Nasdaq index futures price. Crossovers between the shorter and longer MAs can signal potential shifts in momentum and help determine entry and exit points for trades.
- A 14-day Relative Strength Index (RSI), which measures the speed and magnitude of recent price changes to help determine whether an asset is overbought or oversold.
- Bollinger Bands, consisting of a 20-week moving average with upper and lower bands placed two standard deviations above and below it. These bands help identify periods of high or low volatility, which can indicate potential price breakouts or reversals.
- MACD (Moving Average Convergence Divergence). The MACD line is the difference between the 12-period and 26-period exponential moving averages (`exp12` and `exp26`) of the closing price; the feature stored in the DataFrame is the MACD histogram, i.e. the MACD line minus its 9-period EMA signal line. This helps identify shifts between short-term and long-term trends in Nasdaq index futures.
- Volatility, measured as the standard deviation of closing prices over a rolling 20-day window.
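
To make the RSI definition concrete, the sketch below computes it on a five-point toy series using the same simple rolling-mean formulation as this notebook's indicator code. A 4-period window is used instead of 14, purely so the result can be checked by hand: over the last four price changes (+1, +1, -1, +1) the average gain is 0.75 and the average loss is 0.25, so RS = 3 and RSI = 100 - 100 / (1 + 3) = 75. The function name `simple_rsi` is only for illustration.

```python
import pandas as pd


def simple_rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI using simple rolling means of gains and losses (not Wilder's smoothing)."""
    delta = close.diff()
    gains = delta.clip(lower=0)      # upward moves, zero otherwise
    losses = -delta.clip(upper=0)    # size of downward moves, zero otherwise
    rs = gains.rolling(window).mean() / losses.rolling(window).mean()
    return 100 - 100 / (1 + rs)


toy_close = pd.Series([1.0, 2.0, 3.0, 2.0, 3.0])
print(simple_rsi(toy_close, window=4))
```

The first four values are NaN because the rolling window is not yet full (the first price change itself is undefined).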

In [None]:
# Create a rolling futures strategy on Nasdaq (NQ) index futures and plot its history
nq = sig.RollingFutureStrategy(contract_code='NQ', contract_sector='INDEX')
nq.history().plot()

In [None]:
# Create a pandas DataFrame ('df') to store the historical closing prices of the NQ index:
df = pd.DataFrame({'Close':nq.history()})

In [None]:
# The price history is daily (business days), so convert week-based windows into trading days.
DAYS_PER_WEEK = 5

# Calculate weekly (5-trading-day) returns and add them as a new column ('weekly_returns'):
df['weekly_returns'] = df['Close'].pct_change(periods=DAYS_PER_WEEK)

# Calculate 4-week and 12-week simple moving averages (20 and 60 trading days):
df['4_week_ma'] = df['Close'].rolling(window=4 * DAYS_PER_WEEK).mean()
df['12_week_ma'] = df['Close'].rolling(window=12 * DAYS_PER_WEEK).mean()

# Calculate the 14-day Relative Strength Index (RSI) and add it as a new column ('rsi').
# This uses simple rolling means of gains and losses rather than Wilder's smoothing.
delta = df['Close'].diff()
up = delta.clip(lower=0)      # positive price changes, zero otherwise
down = -delta.clip(upper=0)   # magnitude of negative price changes, zero otherwise

average_gain = up.rolling(window=14).mean()
average_loss = down.rolling(window=14).mean()

rs = average_gain / average_loss
df['rsi'] = 100 - (100 / (1 + rs))

# Calculate the MACD histogram and add it as a new column ('macd'):
# MACD line = 12-period EMA - 26-period EMA; signal line = 9-period EMA of the MACD line.
exp12 = df['Close'].ewm(span=12, adjust=False).mean()
exp26 = df['Close'].ewm(span=26, adjust=False).mean()
macd_line = exp12 - exp26
signal_line = macd_line.ewm(span=9, adjust=False).mean()

df['macd'] = macd_line - signal_line

# Calculate 20-week (100-trading-day) Bollinger Bands, two standard deviations either side of the MA:
df['20_week_ma'] = df['Close'].rolling(window=20 * DAYS_PER_WEEK).mean()
df['20_week_std'] = df['Close'].rolling(window=20 * DAYS_PER_WEEK).std()
df['bollinger_upper'] = df['20_week_ma'] + 2 * df['20_week_std']
df['bollinger_lower'] = df['20_week_ma'] - 2 * df['20_week_std']

# Calculate 20-day volatility (rolling standard deviation of closing prices) as a new column ('volatility'):
df['volatility'] = df['Close'].rolling(window=20).std()
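
The `macd` column above holds the MACD histogram rather than the MACD line itself. The sketch below separates the three pieces on synthetic prices so their relationship is easy to inspect. On a steadily rising series the faster 12-period EMA stays closer to the latest price than the 26-period EMA, so the MACD line is positive; on a flat series all three components are zero. The function name `macd_components` is only for illustration.

```python
import numpy as np
import pandas as pd


def macd_components(close: pd.Series, fast=12, slow=26, signal=9) -> pd.DataFrame:
    """Return the MACD line, its signal line and the histogram (line minus signal)."""
    fast_ema = close.ewm(span=fast, adjust=False).mean()
    slow_ema = close.ewm(span=slow, adjust=False).mean()
    line = fast_ema - slow_ema
    signal_line = line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({'line': line, 'signal': signal_line, 'histogram': line - signal_line})


rising = macd_components(pd.Series(np.arange(60, dtype=float)))
flat = macd_components(pd.Series(np.full(60, 100.0)))
print(rising.tail())
```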

In [None]:
# Clean the DataFrame by dropping any rows with NaN values:
df = df.dropna()

df.tail()

## 2. Use this dataframe in an XGBoost model

The following code prepares the data, trains an XGBoost model, and generates binary predictions for future price movements in the Nasdaq index futures contract. The `target` column supplies the labels for training: for each date it records whether the price rose over the following week (5 trading days), so the model can learn which patterns in the indicator values tend to precede rising or falling prices.
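
The labelling step can be sketched in isolation on a toy series. The helper `make_direction_target` below is hypothetical. It mirrors the notebook's `np.where(df['Close'].shift(-5) > df['Close'], 1, -1)` logic with a configurable horizon and drops the final rows, which have no future price to compare against. A 2-day horizon keeps the example easy to check by hand. Note that an unchanged price is labelled -1.

```python
import numpy as np
import pandas as pd


def make_direction_target(close: pd.Series, horizon: int = 5) -> pd.Series:
    """Label each date 1 if the close `horizon` rows later is higher, else -1."""
    future_close = close.shift(-horizon)
    target = pd.Series(np.where(future_close > close, 1, -1), index=close.index)
    # The last `horizon` rows have no future close (NaN compares as False), so drop them.
    return target.iloc[:-horizon]


toy_close = pd.Series([10.0, 11.0, 9.0, 12.0, 8.0])
print(make_direction_target(toy_close, horizon=2).tolist())
```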

The following XGBoost parameters are used:
- `max_depth`: the maximum depth of each decision tree in the ensemble. It controls the complexity of the individual trees; lower values help prevent overfitting.
- `eta` (learning rate): the factor by which each new tree's contribution is shrunk before it is added to the model. A smaller `eta` makes each boosting step more conservative, so more boosting rounds are needed, but the model often generalizes better.
- `min_child_weight`: the minimum sum of instance weights (hessian) required in a child node. If a split would create a child below this threshold, the split is not made, which stops the tree from creating leaves supported by too little data.
- `gamma`: the minimum loss reduction required to make a further partition on a leaf node. This prevents the model from making splits that do not meaningfully reduce the loss function.
- `subsample`: the fraction of training rows randomly sampled to grow each tree. Training each tree on a subset adds randomness, which can reduce variance and overfitting.
- `objective`: the loss function minimized during training, which determines the model's task (regression, classification, etc.). Here `reg:squarederror` treats the -1/1 labels as regression targets, and the continuous predictions are then thresholded at zero.
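
The interplay between `eta` and the number of boosting rounds can be illustrated with a deliberately simplified booster. The sketch below is *not* XGBoost: each "tree" is a single constant (the mean residual) fitted under squared-error loss, starting from a prediction of zero. Because every round adds only `eta` times the remaining residual, after `n` rounds the prediction covers a fraction 1 - (1 - eta)^n of the distance to the target mean. With `eta = 0.1` and 20 rounds, the settings used in this notebook, that is about 88%.

```python
import numpy as np


def constant_booster(y, eta, num_rounds):
    """Toy gradient boosting with squared-error loss where each weak learner is a constant."""
    prediction = 0.0
    for _ in range(num_rounds):
        residual_fit = np.mean(y - prediction)  # best constant fit to the current residuals
        prediction += eta * residual_fit        # shrink the step by the learning rate
    return prediction


y = np.array([1.0, -1.0, 1.0, 1.0])  # mean is 0.5
for eta in (0.1, 0.3, 1.0):
    print(eta, constant_booster(y, eta, num_rounds=20))
```

In real XGBoost each round fits a tree to the gradients rather than a single constant, and the starting prediction (`base_score`) is not necessarily zero, but the same shrinkage effect applies.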


In [None]:
# Create the target variable ('target') in the DataFrame ('df'). It contains binary labels:
# 1 if the NQ close 5 trading days (one week) later is higher than today's close,
# and -1 otherwise (a decrease or no change).
df['target'] = np.where(df['Close'].shift(-5) > df['Close'], 1, -1)

# Remove the last 5 rows: they have no close one week ahead, so their labels are not meaningful.
df = df[:-5]

# Split the DataFrame ('df') into two parts: 'features' and 'target'.
# 'features' contains all columns except the 'target' and 'Close' columns.
# 'target' contains only the 'target' column.
features = df.drop(columns=['target', 'Close'])
target = df['target']

# Divide the data into training and testing sets with train_test_split. shuffle=False keeps the
# rows in date order, so the first ~90% of dates are used for training ('X_train' and 'y_train')
# and the most recent ~10% for testing ('X_test' and 'y_test'). random_state has no effect without shuffling.
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.1, random_state=42, shuffle=False)

# XGBoost's native training API (xgb.train) takes its data as 'DMatrix' objects, an optimized internal
# data structure. Create 'DMatrix' objects for the training and testing sets ('dtrain' and 'dtest').
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Set the various parameters that control the behavior of the XGBoost model. 
param = {
    'max_depth': 8,
    'eta': 0.1,
    'min_child_weight': 1,
    'gamma': 0.1,
    'subsample': 0.8,
    'objective': 'reg:squarederror'
}

# Train the model with 'xgb.train' for 20 boosting rounds, i.e. 20 trees added one after another.
num_round = 20
model = xgb.train(param, dtrain, num_round)

# Use the model to make predictions on the test set ('dtest', built from 'X_test').
preds = model.predict(dtest)

# Convert the model's continuous predictions into binary predictions by thresholding at zero: -1 or 1.
binary_preds = np.where(preds < 0, -1, 1)

# Create a DataFrame ('pred_df') to store the binary predictions, indexed by date.
pred_df = pd.DataFrame(binary_preds, columns=['prediction'], index=X_test.index)
pred_df.tail(10)
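
The cell above produces predictions but does not score them. A minimal way to check them, assuming you only care about direction, is to compare `binary_preds` with `y_test` and set the result against a naive "always long" baseline, i.e. the share of up weeks in the test set. The helper `directional_accuracy` below is a hypothetical sketch, shown on toy arrays.

```python
import numpy as np


def directional_accuracy(y_true, y_pred):
    """Share of dates where the predicted direction (-1/1) matches the realized one."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


toy_true = np.array([1, -1, 1, 1, -1])
toy_pred = np.array([1, -1, 1, -1, -1])
model_hit_rate = directional_accuracy(toy_true, toy_pred)
always_long_rate = directional_accuracy(toy_true, np.ones_like(toy_true))
print(model_hit_rate, always_long_rate)
```

In the notebook, this would be `directional_accuracy(y_test, binary_preds)` compared with `directional_accuracy(y_test, np.ones(len(y_test)))`.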


In [None]:
# Create a plot to see a visual representation of the output of the XGBoost model
pred_df.plot()
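
The training cell uses `train_test_split(..., shuffle=False)`, which takes the last 10% of rows as the test set. Because each label looks 5 trading days ahead, the labels of the last 5 training rows depend on closing prices inside the test window, a small form of leakage. A minimal sketch of a chronological split that leaves a gap of `gap` rows before the test window is shown below; the function name `chronological_split` is hypothetical, and the test-set size is intended to match `train_test_split`'s rounding for a float `test_size`.

```python
import math
import pandas as pd


def chronological_split(features, target, test_size=0.1, gap=5):
    """Split time-ordered data into train/test sets, dropping `gap` rows before the test window."""
    n = len(features)
    n_test = math.ceil(test_size * n)  # the last ceil(test_size * n) rows form the test set
    train_end = n - n_test - gap       # training stops `gap` rows before the test window
    if train_end <= 0:
        raise ValueError('Not enough rows for the requested test_size and gap')
    test_start = n - n_test
    return (
        features.iloc[:train_end],
        features.iloc[test_start:],
        target.iloc[:train_end],
        target.iloc[test_start:],
    )


toy_features = pd.DataFrame({'x': range(100)})
toy_target = pd.Series(range(100))
X_tr, X_te, y_tr, y_te = chronological_split(toy_features, toy_target, test_size=0.1, gap=5)
print(len(X_tr), len(X_te))
```

Setting `gap` equal to the label horizon (5 here) ensures no training label uses a price from the test period.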

## 3. Backtest and compare trading using the XGBoost model versus a normal rolling futures strategy
In this section, we will compare the performance of a `SignalStrategy` that trades NQ futures based on the signals from our XGBoost model (stored in `pred_df`) against a benchmark rolling futures strategy that trades NQ futures.

First, `pred_df` needs to be converted into a DataFrame that can be used as the `signal_input` of our SDK's `SignalStrategy` class.

In [None]:
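# Build the signal DataFrame: one column named after the instrument to trade (nq.name), indexed by date.
# The 'squeeze()' method converts the single-column 'pred_df' to a one-dimensional Series.
signal_df = pd.DataFrame({nq.name: pred_df.squeeze()})
signal_df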

Next, we create the `SignalStrategy`. It starts on the first date in `signal_df` (the start of the test period) and rebalances every business day.

In [None]:
s = sig.SignalStrategy(
    currency='USD',
    signal_input=signal_df,
    start_date=signal_df.first_valid_index().date(),
    rebalance_frequency='1BD',
)
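
As a rough, independent sanity check on the `SignalStrategy` result, the signal can also be applied to the benchmark's daily returns in plain pandas. This is a simplified, frictionless approximation (no transaction costs, sizing or execution logic) and is not how `SignalStrategy` computes its history. The helper name `naive_signal_backtest` is hypothetical. The signal is lagged by one day so that each day's return is earned by the position decided the day before.

```python
import numpy as np
import pandas as pd


def naive_signal_backtest(prices: pd.Series, signal: pd.Series) -> pd.Series:
    """Frictionless long/short backtest: yesterday's signal earns today's return."""
    returns = prices.pct_change()
    # Align the signal to the price dates, carry it forward, then lag it one day to avoid look-ahead.
    position = signal.reindex(prices.index).ffill().shift(1)
    strategy_returns = (position * returns).fillna(0.0)  # flat before the first signal
    return (1 + strategy_returns).cumprod()


dates = pd.bdate_range('2024-01-01', periods=3)
toy_prices = pd.Series([100.0, 110.0, 99.0], index=dates)
toy_signal = pd.Series([1, -1, -1], index=dates)
print(naive_signal_backtest(toy_prices, toy_signal))
```

In the notebook, a comparable curve could be produced with `naive_signal_backtest(nq.history(), signal_df[nq.name])`, which uses the daily returns of `nq.history()` as a proxy for the return of holding NQ futures.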

Finally, we compare the performance of our XGBoost model based strategy against a normal rolling futures strategy.

In [None]:
nq_benchmark = sig.RollingFutureStrategy(
    start_date=signal_df.first_valid_index().date(),
    contract_code='NQ',
    contract_sector='INDEX',
)

nq_benchmark.history().plot(label="Rolling Futures Strategy")
s.history().plot(label="XGBoost Signals Strategy")
plt.legend()
plt.show()
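
The plot gives a visual comparison; a few summary numbers make it easier to quantify. The sketch below computes total return, annualized volatility, a simple Sharpe ratio and maximum drawdown from a level series. It assumes 252 trading days per year and a zero risk-free rate, and the function name `summary_stats` is hypothetical.

```python
import numpy as np
import pandas as pd


def summary_stats(levels: pd.Series, periods_per_year: int = 252) -> pd.Series:
    """Basic performance statistics for a strategy level (equity curve) series."""
    levels = levels.dropna()
    returns = levels.pct_change().dropna()
    ann_vol = returns.std() * np.sqrt(periods_per_year)
    # Simple Sharpe ratio: annualized mean daily return over annualized volatility, zero risk-free rate.
    sharpe = returns.mean() * periods_per_year / ann_vol
    drawdown = levels / levels.cummax() - 1  # percentage below the running peak
    return pd.Series({
        'total_return': levels.iloc[-1] / levels.iloc[0] - 1,
        'annualized_volatility': ann_vol,
        'sharpe_ratio': sharpe,
        'max_drawdown': drawdown.min(),
    })


toy_levels = pd.Series([100.0, 110.0, 99.0, 120.0])
print(summary_stats(toy_levels))
```

For example, `pd.DataFrame({'Rolling Futures Strategy': summary_stats(nq_benchmark.history()), 'XGBoost Signals Strategy': summary_stats(s.history())})` would put both strategies side by side.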