# Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook will be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or python modules that might be required to run this notebook.

## Classification Tree

A classification tree automatically selects important predictors and suggests trading rules. In this notebook, we will learn to create a classification tree model. The predictor variables are technical indicators like the Average Directional Index (ADX), Relative Strength Index (RSI), and Simple Moving Average (SMA). The target variable is one-day future return. The classification tree we create will help us build trading rules for when the future return is expected to be positive or negative.

Steps to create a classification tree:

 1. Import the data from a CSV file
 2. Create the predictor variables and the target variable
 3. Split the data into train and test datasets
 4. Fit a decision tree model on the train data
 5. Visualize the decision tree model
 6. Make predictions and evaluate the performance


## Import the data

We will read the raw price data of Tesla Inc. stock from a CSV file. The predictor and target variables are created from this raw data.

In [None]:
import pandas as pd
import numpy as np

# Import warnings
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('../data_modules/tesla_dt_2018_2023.csv')
df.rename(columns = {'Open':'OPEN','High':'HIGH','Low':'LOW','Close':'CLOSE','Volume':'VOLUME'}, inplace = True)
df.tail()

## Define predictor variables and a target variable

We use the TA-Lib library to compute a set of technical indicators, from which the model will pick the best predictors. The predictors used are the Average Directional Index (ADX), Relative Strength Index (RSI), and Simple Moving Average (SMA).


In [None]:
# Import talib
import talib as ta

# Create the predictors
df['ADX'] = ta.ADX(df['HIGH'].values, df['LOW'].values,
                   df['CLOSE'].values, timeperiod=14)
df['RSI'] = ta.RSI(df['CLOSE'].values, timeperiod=14)
df['SMA'] = ta.SMA(df['CLOSE'].values, timeperiod=20)

The target variable is the 1-day future return.

shift(periods=n) shifts the values by n period(s). A negative n moves the values up to earlier rows, so shift(-1) places the next day's return on the current row. A positive n moves the values down to later rows.

We will classify the returns into two labels: 1 for positive returns and 0 otherwise.

In [None]:
# Create target variable
df['Return'] = df['CLOSE'].pct_change(1).shift(-1)
df['target'] = np.where(df.Return > 0, 1, 0)
df.tail()
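
To make the alignment concrete, the following minimal sketch applies the same steps to a handful of made-up closing prices. The prices, the name toy_prices and the extra 'Daily return' column are assumptions for illustration only and are not part of the Tesla data.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for five consecutive days
toy_prices = pd.DataFrame({'CLOSE': [100.0, 102.0, 101.0, 101.0, 104.0]})

# Return realised on each day (today's close versus yesterday's close)
toy_prices['Daily return'] = toy_prices['CLOSE'].pct_change(1)

# shift(-1) moves every value one row up, so each row now holds
# the return that will be realised on the next day
toy_prices['Return'] = toy_prices['Daily return'].shift(-1)

# 1 for a positive next-day return, 0 otherwise
toy_prices['target'] = np.where(toy_prices['Return'] > 0, 1, 0)

print(toy_prices)
```

The last row has no next-day return, so its Return is NaN, yet np.where still labels it 0. This is why rows with NaN values have to be dropped before the model is fitted. A next-day return of exactly zero (the third row here) is also labelled 0.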

We drop the NaN values and store the predictor variables in X and the target variable in y.

In [None]:
df = df.dropna()
predictors_list = ['ADX', 'RSI', 'SMA']
X = df[predictors_list]
print(X.tail())

y = df.target
y.tail()

## Split the data into train and test datasets

Before we build a decision tree model, we need to split the dataset into train and test data. The model uses the train data to learn the properties of the data, and we use the test data to estimate the accuracy of the model's predictions.

In [None]:
split_percentage = 0.8
split = int(split_percentage*len(X))
# Train data set
X_train = X[:split]
y_train = y[:split]
# Test data set
X_test = X[split:]
y_test = y[split:]

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

We have the training data in X_train and y_train for creating the classification tree model, and X_test and y_test to verify the model on unseen data.
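
The split above is done by position rather than by random sampling, so every test observation comes after every train observation in time. A minimal sketch of the same idea on a plain list is shown below. The helper name chronological_split and the sample values are hypothetical and are used only for illustration.

```python
def chronological_split(data, train_fraction=0.8):
    """Split an ordered sequence into an earlier train part and a later test part."""
    split = int(train_fraction * len(data))
    return data[:split], data[split:]


# Hypothetical observations, ordered from oldest to newest
observations = list(range(10))
train_part, test_part = chronological_split(observations, train_fraction=0.8)

print(train_part)  # the earliest 80% of the observations
print(test_part)   # the most recent 20% of the observations
```

Because the order is preserved, the model is never trained on data from a later date than the data it is tested on.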

## Create classification tree model

We use DecisionTreeClassifier from sklearn.tree to create the classification tree model. We set criterion to 'gini', max_depth to 3 and min_samples_leaf to 5, but you are free to experiment with other values and see what works best on the train dataset.

In [None]:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, min_samples_leaf=5)
clf

We call the fit() method to train the classifier on the train dataset.

In [None]:
clf = clf.fit(X_train, y_train)

In [None]:
# Uncomment below line to see details of DecisionTreeClassifier
# help(DecisionTreeClassifier)

## Visualize the model

We now visualize the classification tree created in the step above using graphviz and sklearn's tree package.

In [None]:
from sklearn import tree
import graphviz
dot_data = tree.export_graphviz(
    clf, out_file=None, filled=True, feature_names=predictors_list)
graphviz.Source(dot_data)
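
If graphviz is not available, scikit-learn's export_text can print the same splits as plain text. The sketch below fits a small tree on synthetic data, which is an assumption made so that the example runs without the CSV file. The names demo_clf, X_demo and y_demo are hypothetical. In this notebook you would pass clf and predictors_list instead.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (an assumption, not the Tesla dataset)
rng = np.random.default_rng(0)
feature_names = ['ADX', 'RSI', 'SMA']
X_demo = rng.uniform(low=[10, 20, 100], high=[50, 80, 300], size=(200, 3))

# The synthetic label is 1 whenever the synthetic RSI is above 50
y_demo = (X_demo[:, 1] > 50).astype(int)

demo_clf = DecisionTreeClassifier(
    criterion='gini', max_depth=3, min_samples_leaf=5, random_state=0)
demo_clf.fit(X_demo, y_demo)

# Text rendering of the fitted tree, one line per split or leaf
rules = export_text(demo_clf, feature_names=feature_names)
print(rules)
```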

As can be seen in the tree, every node contains some information: 
    
 1. ADX, RSI or SMA: the predictor variable used to split the dataset at that node
 2. gini: the value of the Gini impurity
 3. samples: the number of data points available at that node
 4. value: the number of data points at that node belonging to class 0 and class 1. For example, value = [0, 6] indicates that 0 data points belong to class 0 and 6 data points belong to class 1.
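
The gini figure at each node can be reproduced from the value counts. The Gini impurity is 1 minus the sum of the squared class proportions, so a node holding only one class has an impurity of 0. The helper gini_impurity below is a hypothetical illustration and is not part of scikit-learn.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


print(gini_impurity([0, 6]))  # pure node: all samples in class 1
print(gini_impurity([5, 5]))  # evenly mixed node: the maximum for two classes
print(gini_impurity([2, 6]))  # mostly class 1
```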

A decision path that follows the rules below:

1. RSI <= 55.868
2. SMA <= 14.515
3. SMA > 13.956

leads to a pure node with 6 data points belonging to class 1, which can be used to define a long rule for a trading strategy.

If a new data point meets these conditions, in the test data or in live trading, the model will predict a positive next-day return for the stock.
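
Such a path can be written as an explicit rule. The sketch below hard-codes the thresholds quoted above, which belong to that particular fitted tree and will differ if the data or parameters change. The function name is_long_signal is hypothetical. scikit-learn sends a sample to the left branch when the feature is less than or equal to the threshold, so the opposite branch is a strict greater-than comparison.

```python
def is_long_signal(rsi, sma):
    """Return True when a data point falls in the class 1 leaf described above."""
    return rsi <= 55.868 and 13.956 < sma <= 14.515


print(is_long_signal(rsi=50.0, sma=14.2))  # all three conditions hold
print(is_long_signal(rsi=60.0, sma=14.2))  # RSI is too high
print(is_long_signal(rsi=50.0, sma=13.5))  # SMA is below the lower bound
```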

## Make Predictions

Once we have trained the decision tree classifier, we make predictions on the test data using the predict method of the DecisionTreeClassifier class.

In [None]:
y_pred = clf.predict(X_test)

## Evaluate the Model Performance

Scikit-learn provides a performance report for classification problems. The report prints measures such as precision, recall, F1-score and support for each class. Precision is the fraction of the predictions of a class that are correct, and recall is the fraction of the actual members of a class that the model identifies. The F1-score is the harmonic mean of precision and recall. The support is the number of test samples in each class, and it is used as the weight when computing the weighted averages of precision, recall and F1-score.

For a two-class problem with roughly balanced classes, random guessing yields about 0.5, so anything above 0.5 is usually considered a good number. We got an average recall of 0.55, which is good for this model.

In [None]:
from sklearn.metrics import classification_report
report = classification_report(y_test, y_pred)
print(report)
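
To make the figures in the report concrete, the sketch below computes precision, recall, F1-score and support for one class by hand. The function precision_recall_f1 and the two label lists are hypothetical and are not the notebook's actual predictions.

```python
def precision_recall_f1(y_true, y_pred, label=1):
    """Compute precision, recall, F1-score and support for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support


# Hypothetical true labels and predictions
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 1, 1, 0, 0, 1, 1, 0]

for label in (0, 1):
    print(label, precision_recall_f1(y_true_demo, y_pred_demo, label=label))
```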

In [None]:
# Store a copy of the test dataset in a new dataframe df_split
# (copying avoids modifying a slice of the original dataframe)
df_split = df[split:].copy()

# Store the decision tree's predicted output to signal column of df_split dataframe
df_split["signal"] = y_pred

# Calculate the strategy returns by multiplying the percentage change in the "CLOSE" column
# with the lagged values of the "signal" column and assigns it to a new column "Strategy returns"
df_split["Strategy returns"] = df_split["CLOSE"].pct_change() * \
    df_split["signal"].shift(1)

# Import the required matplotlib library for plotting and sets the style of the plots
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')

# Plot the cumulative product of the strategy returns + 1 over time as a line plot
# with the label 'Decision Tree' for the legend and set the size of the figure to (10, 6)
(df_split['Strategy returns'] +
 1).cumprod().plot(label='Decision Tree', figsize=(10, 6))

# Set the title of the plot to 'Out of Sample Performance' with a font size of 16
plt.title('Out of Sample Performance', fontsize=16)

# Set the labels for the x and y axes as 'Date' and 'Cumulative Returns' respectively
# with font sizes of 14
plt.xlabel('Date', fontsize=14)
plt.ylabel('Cumulative Returns', fontsize=14)

# Display the legend on the plot
plt.legend()

# Show the plot on the output screen
plt.show()
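
The lag on the signal matters: the signal computed at the close of one day earns the return realised on the following day. The sketch below traces this on made-up prices and signals. The values, the name toy_bt and the helper columns 'Daily return', 'Position' and 'Cumulative' are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical closing prices and model signals for four consecutive days
toy_bt = pd.DataFrame({
    'CLOSE': [100.0, 110.0, 99.0, 108.9],
    'signal': [1, 0, 1, 0],
})

# Return realised on each day
toy_bt['Daily return'] = toy_bt['CLOSE'].pct_change()

# The position held on a day is the signal generated on the previous day
toy_bt['Position'] = toy_bt['signal'].shift(1)

# The strategy earns the daily return only on days when it holds a position
toy_bt['Strategy returns'] = toy_bt['Daily return'] * toy_bt['Position']

# Growth of one unit of capital invested in the strategy
toy_bt['Cumulative'] = (toy_bt['Strategy returns'] + 1).cumprod()

print(toy_bt)
```

In this toy series the strategy is invested on the second and fourth days, which both gain about 10%, and stays out on the third day, which loses about 10%.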

## Summary 

In this notebook, we saw how a decision tree model can be coded in Python to predict the direction of the next day's stock return. In the next section, we will learn to build a trading strategy using a regression tree model.