# Machine Learning Trading Bot

In this Challenge, you’ll assume the role of a financial advisor at one of the top five financial advisory firms in the world. Your firm constantly competes with the other major firms to manage and automatically trade assets in a highly dynamic environment. In recent years, your firm has heavily profited by using computer algorithms that can buy and sell faster than human traders.

The speed of these transactions gave your firm a competitive advantage early on. But people still need to program these systems explicitly, which limits their ability to adapt to new data. You’re thus planning to improve the existing algorithmic trading systems and maintain the firm’s competitive advantage in the market. To do so, you’ll enhance the existing trading signals with machine learning algorithms that can adapt to new data.

In [598]:
# Imports
import pandas as pd
import numpy as np
from pathlib import Path
import hvplot.pandas
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.preprocessing import StandardScaler
from pandas.tseries.offsets import DateOffset
from sklearn.metrics import classification_report
import plotly.express as px

---

## Establish a Baseline Performance

In this section, you’ll run the provided starter code to establish a baseline performance for the trading algorithm. To do so, complete the following steps.

Open the Jupyter notebook. Restart the kernel, run the provided cells that correspond with the first three steps, and then proceed to step four. 


### Step 1: Import the OHLCV dataset into a Pandas DataFrame.

In [599]:
# Import the OHLCV dataset into a pandas DataFrame
# (infer_datetime_format is omitted; it is deprecated as of pandas 2.0)
ohlcv_df = pd.read_csv(
    Path("./Resources/heem_ohlcv.csv"),
    index_col='date',
    parse_dates=True
)

# Review the DataFrame
ohlcv_df.head()

date,open,high,low,close,volume
2015-01-21 09:30:00,23.83,23.83,23.83,23.83,100
2015-01-21 11:00:00,23.98,23.98,23.98,23.98,100
2015-01-22 15:00:00,24.42,24.42,24.42,24.42,100
2015-01-22 15:15:00,24.42,24.44,24.42,24.44,200
2015-01-22 15:30:00,24.46,24.46,24.46,24.46,200


In [600]:
# Filter the date index and close columns
signals_df = ohlcv_df[['close']].copy()

# Use the pct_change function to generate returns from close prices
signals_df['Actual Returns'] = signals_df['close'].pct_change()

# Drop all NaN values from the DataFrame
signals_df.dropna(inplace=True)

# Review the DataFrame
display(signals_df.head())
display(signals_df.tail())

date,close,Actual Returns
2015-01-21 11:00:00,23.98,0.006295
2015-01-22 15:00:00,24.42,0.018349
2015-01-22 15:15:00,24.44,0.000819
2015-01-22 15:30:00,24.46,0.000818
2015-01-26 12:30:00,24.33,-0.005315


date,close,Actual Returns
2021-01-22 09:30:00,33.27,-0.006866
2021-01-22 11:30:00,33.35,0.002405
2021-01-22 13:45:00,33.42,0.002099
2021-01-22 14:30:00,33.47,0.001496
2021-01-22 15:45:00,33.44,-0.000896
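
To make the return calculation concrete, here is a minimal sketch on a made-up price series (the `prices` values are illustrative and not taken from the HEEM data). `pct_change` computes the current price divided by the previous price, minus 1, so the first row has no prior price and comes out as `NaN`. That is why the notebook drops NaN values right after computing the returns.

```python
import pandas as pd

# Hypothetical close prices, for illustration only
prices = pd.Series([100.0, 102.0, 99.96])

# pct_change computes (current / previous) - 1 for each row
returns = prices.pct_change()

# The same calculation written out by hand
manual = prices / prices.shift(1) - 1

print(returns)
```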


### Step 2: Generate trading signals using short- and long-window SMA values.

In [622]:
# Set the short window and long window
short_window = 4
long_window = 100

# Generate the fast and slow simple moving averages (4 and 100 periods, respectively;
# the data is intraday, so a period is one bar rather than one day)
signals_df['SMA_Fast'] = signals_df['close'].rolling(window=short_window).mean()
signals_df['SMA_Slow'] = signals_df['close'].rolling(window=long_window).mean()

signals_df = signals_df.dropna()

# Review the DataFrame
display(signals_df.head())
display(signals_df.tail())

date,close,Actual Returns,SMA_Fast,SMA_Slow,Signal,Strategy Returns
2015-06-11 15:15:00,24.35,0.002058,24.3575,25.4558,1.0,0.002058
2015-06-12 10:00:00,24.38,0.001232,24.355,25.4504,1.0,0.001232
2015-06-12 10:15:00,24.33,-0.002051,24.34,25.4445,1.0,-0.002051
2015-06-12 11:00:00,24.25,-0.003288,24.3275,25.4376,1.0,-0.003288
2015-06-12 11:45:00,24.36,0.004536,24.33,25.4317,1.0,0.004536


date,close,Actual Returns,SMA_Fast,SMA_Slow,Signal,Strategy Returns
2021-01-22 09:30:00,33.27,-0.006866,33.2025,30.40215,0.0,-0.0
2021-01-22 11:30:00,33.35,0.002405,33.2725,30.44445,0.0,0.0
2021-01-22 13:45:00,33.42,0.002099,33.385,30.48745,0.0,0.0
2021-01-22 14:30:00,33.47,0.001496,33.3775,30.53085,0.0,0.0
2021-01-22 15:45:00,33.44,-0.000896,33.42,30.57495,0.0,-0.0


In [623]:
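# Initialize the new Signal column
signals_df['Signal'] = 0.0

# When the fast SMA is below the slow SMA, set the signal to 1.0 (long); otherwise 0.0.
# The first `short_window` rows keep their initial value of 0.0.
# Positional .iloc assignment avoids chained indexing, which can fail to write through.
signals_df.iloc[short_window:, signals_df.columns.get_loc('Signal')] = np.where(
    signals_df['SMA_Fast'].iloc[short_window:] < signals_df['SMA_Slow'].iloc[short_window:], 1.0, 0.0
)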

# Review the DataFrame
display(signals_df.head())
display(signals_df.tail())

date,close,Actual Returns,SMA_Fast,SMA_Slow,Signal,Strategy Returns
2015-06-11 15:15:00,24.35,0.002058,24.3575,25.4558,0.0,0.002058
2015-06-12 10:00:00,24.38,0.001232,24.355,25.4504,0.0,0.001232
2015-06-12 10:15:00,24.33,-0.002051,24.34,25.4445,0.0,-0.002051
2015-06-12 11:00:00,24.25,-0.003288,24.3275,25.4376,0.0,-0.003288
2015-06-12 11:45:00,24.36,0.004536,24.33,25.4317,1.0,0.004536


date,close,Actual Returns,SMA_Fast,SMA_Slow,Signal,Strategy Returns
2021-01-22 09:30:00,33.27,-0.006866,33.2025,30.40215,0.0,-0.0
2021-01-22 11:30:00,33.35,0.002405,33.2725,30.44445,0.0,0.0
2021-01-22 13:45:00,33.42,0.002099,33.385,30.48745,0.0,0.0
2021-01-22 14:30:00,33.47,0.001496,33.3775,30.53085,0.0,0.0
2021-01-22 15:45:00,33.44,-0.000896,33.42,30.57495,0.0,-0.0


In [624]:
signals_df['Signal'].value_counts()

0.0    2453
1.0    1671
Name: Signal, dtype: int64
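
The signal rule can be checked by hand on a small example. The sketch below uses made-up prices and much shorter windows (2 and 4) than the notebook's 4 and 100, purely so the toy series produces values quickly. It applies the same comparison as the notebook: the signal is 1.0 when the fast SMA is below the slow SMA.

```python
import numpy as np
import pandas as pd

# Hypothetical close prices, for illustration only
close = pd.Series([10.0, 11.0, 12.0, 11.0, 9.0, 8.0, 9.0, 12.0])

# Short windows so the toy series produces values quickly
fast = close.rolling(window=2).mean()
slow = close.rolling(window=4).mean()

# Rows without a full slow window are NaN and get dropped
toy = pd.DataFrame({"close": close, "SMA_Fast": fast, "SMA_Slow": slow}).dropna()

# Same rule as the notebook: 1.0 when the fast SMA is below the slow SMA
toy["Signal"] = np.where(toy["SMA_Fast"] < toy["SMA_Slow"], 1.0, 0.0)

print(toy)
```

Note that this rule goes long when the short-term average sits below the long-term one, which is the opposite of a conventional momentum crossover.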

In [625]:
# Calculate the strategy returns and add them to the signals_df DataFrame
signals_df['Strategy Returns'] = (
    signals_df['Actual Returns'] * signals_df['Signal'].shift()
)

# Review the DataFrame
display(signals_df.head())
display(signals_df.tail())

date,close,Actual Returns,SMA_Fast,SMA_Slow,Signal,Strategy Returns
2015-06-11 15:15:00,24.35,0.002058,24.3575,25.4558,0.0,
2015-06-12 10:00:00,24.38,0.001232,24.355,25.4504,0.0,0.0
2015-06-12 10:15:00,24.33,-0.002051,24.34,25.4445,0.0,-0.0
2015-06-12 11:00:00,24.25,-0.003288,24.3275,25.4376,0.0,-0.0
2015-06-12 11:45:00,24.36,0.004536,24.33,25.4317,1.0,0.0


date,close,Actual Returns,SMA_Fast,SMA_Slow,Signal,Strategy Returns
2021-01-22 09:30:00,33.27,-0.006866,33.2025,30.40215,0.0,-0.0
2021-01-22 11:30:00,33.35,0.002405,33.2725,30.44445,0.0,0.0
2021-01-22 13:45:00,33.42,0.002099,33.385,30.48745,0.0,0.0
2021-01-22 14:30:00,33.47,0.001496,33.3775,30.53085,0.0,0.0
2021-01-22 15:45:00,33.44,-0.000896,33.42,30.57495,0.0,-0.0
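
The `shift()` in the strategy-returns calculation is what keeps the backtest free of look-ahead: the return earned during a bar is multiplied by the signal from the previous bar. A minimal sketch with made-up numbers (the values below are illustrative only):

```python
import pandas as pd

# Hypothetical returns and signals, for illustration only
toy = pd.DataFrame({
    "Actual Returns": [0.01, -0.02, 0.03, 0.01],
    "Signal": [1.0, 0.0, 1.0, 1.0],
})

# shift() moves each signal down one row, so the position held during
# bar t is the one decided at bar t-1
toy["Strategy Returns"] = toy["Actual Returns"] * toy["Signal"].shift()

print(toy)
```

The first row has no earlier signal, so its strategy return is `NaN`, matching the empty first value in the output above.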


In [626]:
# Plot Strategy Returns to examine performance
strategy_1_returns_df = pd.DataFrame({
    'Buy & Hold': ((1 + signals_df['Actual Returns']).cumprod() - 1) * 100,
    'Algorithmic': ((1 + signals_df['Strategy Returns']).cumprod() - 1) * 100
}, index=signals_df.index).dropna()

fig = px.line(
    strategy_1_returns_df, 
    title='HEEM Returns Comparison',
    labels=dict(value='Cumulative Returns (%)', date='Date', variable='Trading Method'),
    width=1000, 
)
fig.show()
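
The plotted series are compounded returns. As a small worked example of the `(1 + r).cumprod() - 1` expression, using made-up per-period returns:

```python
import pandas as pd

# Hypothetical per-period returns, for illustration only
r = pd.Series([0.10, -0.05, 0.02])

# Compound the returns: (1 + r1) * (1 + r2) * ... - 1, expressed in percent
cumulative_pct = ((1 + r).cumprod() - 1) * 100

print(cumulative_pct.round(2))
```

A 10% gain followed by a 5% loss leaves 4.5%, not 5%, because each return applies to the balance left by the previous one.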

### Step 3: Split the data into training and testing datasets.

In [627]:
# Assign a copy of the sma_fast and sma_slow columns to a features DataFrame called X
X = signals_df[['SMA_Fast', 'SMA_Slow']].shift().dropna()

# Review the DataFrame
X.head()

date,SMA_Fast,SMA_Slow
2015-06-12 10:00:00,24.3575,25.4558
2015-06-12 10:15:00,24.355,25.4504
2015-06-12 11:00:00,24.34,25.4445
2015-06-12 11:45:00,24.3275,25.4376
2015-06-15 10:15:00,24.33,25.4317


In [628]:
# Create the target set by selecting the Signal column and assigning it to y
y = signals_df['Signal']

# Review the value counts
y.value_counts()

0.0    2453
1.0    1671
Name: Signal, dtype: int64

In [629]:
# Select the start of the training period
training_begin = X.index.min()

# Display the training begin date
print(training_begin)

2015-06-12 10:00:00


In [630]:
# Select the ending period for the training data with an offset of 5 months
training_end = X.index.min() + DateOffset(months=5)

# Display the training end date
print(training_end)

2015-11-12 10:00:00


In [631]:
# Generate the X_train and y_train DataFrames
X_train = X.loc[training_begin:training_end]
y_train = y.loc[training_begin:training_end]

# Review the X_train DataFrame
X_train.head()

date,SMA_Fast,SMA_Slow
2015-06-12 10:00:00,24.3575,25.4558
2015-06-12 10:15:00,24.355,25.4504
2015-06-12 11:00:00,24.34,25.4445
2015-06-12 11:45:00,24.3275,25.4376
2015-06-15 10:15:00,24.33,25.4317


In [632]:
# Generate the X_test and y_test DataFrames
X_test = X.loc[training_end+DateOffset(hours=1):]
y_test = y.loc[training_end+DateOffset(hours=1):]

# Review the training targets (a cell displays only its last expression)
y_train

date
2015-06-12 10:00:00    0.0
2015-06-12 10:15:00    0.0
2015-06-12 11:00:00    0.0
2015-06-12 11:45:00    1.0
2015-06-15 10:15:00    1.0
                      ... 
2015-11-05 15:30:00    0.0
2015-11-09 12:00:00    0.0
2015-11-09 12:15:00    0.0
2015-11-09 12:30:00    1.0
2015-11-11 15:15:00    1.0
Name: Signal, Length: 267, dtype: float64
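
The split above is chronological rather than random, and it relies on label-based slicing of a `DatetimeIndex`, which includes both endpoints. The following sketch, on a made-up daily index, illustrates why the test slice starts one step after the training end (the notebook uses a one-hour offset for the same purpose):

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Hypothetical daily index, for illustration only
idx = pd.date_range("2020-01-01", periods=10, freq="D")
data = pd.Series(range(10), index=idx)

begin = data.index.min()
end = begin + DateOffset(days=4)

# Label-based slices include both endpoints
train = data.loc[begin:end]

# Start the test set strictly after `end` so no row lands in both sets
test = data.loc[end + DateOffset(days=1):]

print(len(train), len(test))
```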

In [633]:
# Scale the features DataFrames

# Create a StandardScaler instance
scaler = StandardScaler()

# Apply the scaler model to fit the X-train data
X_scaler = scaler.fit(X_train)

# Transform the X_train and X_test DataFrames using the X_scaler
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
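
The scaler is fitted on the training data only and then reused for the test data, so no information from the test period leaks into the scaling statistics. A minimal sketch with made-up arrays (the `_toy` names are hypothetical):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature arrays, for illustration only
X_train_toy = np.array([[1.0], [2.0], [3.0]])
X_test_toy = np.array([[4.0]])

# Fit on the training rows only, so test data never influences the statistics
scaler_toy = StandardScaler().fit(X_train_toy)

train_scaled = scaler_toy.transform(X_train_toy)
test_scaled = scaler_toy.transform(X_test_toy)

print(scaler_toy.mean_, test_scaled)
```

The test value is scaled with the training mean of 2.0, so it lands well outside the range of the scaled training data rather than being recentered to zero.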

### Step 4: Use the `SVC` classifier model from SKLearn's support vector machine (SVM) learning method to fit the training data and make predictions based on the testing data. Review the predictions.

In [634]:
# From SVM, instantiate SVC classifier model instance
svm_model = svm.SVC()
 
# Fit the model to the data using the training data
svm_model = svm_model.fit(X_train_scaled, y_train)
 
# Use the testing data to make the model predictions
svm_pred = svm_model.predict(X_test_scaled)

# Review the model's predicted values
svm_pred


array([0., 0., 1., ..., 1., 1., 1.])

### Step 5: Review the classification report associated with the `SVC` model predictions. 

In [635]:
# Use a classification report to evaluate the model using the predictions and testing data
svm_testing_report = classification_report(y_test, svm_pred)

# Print the classification report
print(svm_testing_report)


              precision    recall  f1-score   support

         0.0       0.91      0.23      0.36      2362
         1.0       0.44      0.97      0.61      1494

    accuracy                           0.51      3856
   macro avg       0.68      0.60      0.48      3856
weighted avg       0.73      0.51      0.46      3856
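
In the report above, class 0.0 has high precision but low recall: the model rarely predicts 0.0, yet it is usually right when it does. The sketch below reproduces that pattern on a tiny made-up example to show how the two numbers are derived (the labels and predictions are illustrative only):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels and predictions, for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1])
y_hat = np.array([0, 1, 1, 1, 1, 1])

# Precision for class 0: of the rows predicted 0, how many are truly 0
precision_0 = precision_score(y_true, y_hat, pos_label=0)

# Recall for class 0: of the rows that are truly 0, how many were predicted 0
recall_0 = recall_score(y_true, y_hat, pos_label=0)

print(precision_0, recall_0)
```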



### Step 6: Create a predictions DataFrame that contains columns for “Predicted” values, “Actual Returns”, and “Strategy Returns”.

In [636]:
# Create a new, empty predictions DataFrame that shares the test set's index
svm_predictions_df = pd.DataFrame(index=X_test.index)

# Add the SVM model predictions to the DataFrame
svm_predictions_df['Predicted Signal'] = svm_pred

# Add the actual returns to the DataFrame
svm_predictions_df['Actual Returns'] = signals_df['Actual Returns']

# Add the strategy returns to the DataFrame
svm_predictions_df['Strategy Returns'] = signals_df['Strategy Returns']

# Review the DataFrame
display(svm_predictions_df.head())
display(svm_predictions_df.tail())

date,Predicted Signal,Actual Returns,Strategy Returns
2015-11-12 13:45:00,0.0,0.001835,0.001835
2015-11-12 15:00:00,0.0,-0.004121,-0.004121
2015-11-12 15:45:00,1.0,-0.004138,-0.004138
2015-11-17 10:45:00,1.0,0.007387,0.007387
2015-11-17 11:15:00,1.0,-0.001833,-0.001833


date,Predicted Signal,Actual Returns,Strategy Returns
2021-01-22 09:30:00,1.0,-0.006866,-0.0
2021-01-22 11:30:00,1.0,0.002405,0.0
2021-01-22 13:45:00,1.0,0.002099,0.0
2021-01-22 14:30:00,1.0,0.001496,0.0
2021-01-22 15:45:00,1.0,-0.000896,-0.0


### Step 7: Create a cumulative return plot that shows the actual returns vs. the strategy returns. Save a PNG image of this plot. This will serve as a baseline against which to compare the effects of tuning the trading algorithm.

In [637]:
# Plot the actual returns versus the strategy returns
predicted_returns = svm_predictions_df['Actual Returns'] * svm_predictions_df['Predicted Signal']

returns_df_2 = pd.DataFrame({
    'Buy & Hold': ((1 + svm_predictions_df['Actual Returns']).cumprod() - 1) * 100,
    'Strategic': ((1 + svm_predictions_df['Strategy Returns']).cumprod() - 1) * 100,
    'SVM': ((1 + predicted_returns).cumprod() - 1) * 100
}, index=svm_predictions_df.index).dropna()

fig_2 = px.line(
    returns_df_2, 
    title='HEEM Predictive Returns Comparison',
    labels=dict(value='Cumulative Returns (%)', date='Date', variable='Trading Method'),
    width=1000, 
)
fig_2.show()


---

## Tune the Baseline Trading Algorithm

In this section, you’ll tune, or adjust, the model’s input features to find the parameters that result in the best trading outcomes. You’ll choose the best by comparing the cumulative products of the strategy returns.

### Step 1: Tune the training algorithm by adjusting the size of the training dataset. 

To do so, slice your data into different periods. Rerun the notebook with the updated parameters, and record the results in your `README.md` file. 

Answer the following question: What impact resulted from increasing or decreasing the training window?
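
One way to compare training windows without rerunning the whole notebook by hand is to wrap the split, scale, fit, and backtest steps in a function. The following is a rough sketch under stated assumptions: `final_return_for_window` is a hypothetical helper, and the price series is synthetic random-walk data rather than the HEEM dataset, so the printed numbers say nothing about the real strategy.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import DateOffset
from sklearn import svm
from sklearn.preprocessing import StandardScaler


def final_return_for_window(X, y, actual_returns, months):
    """Train an SVC on the first `months` of data and return the final
    cumulative strategy return (in percent) over the remaining rows."""
    begin = X.index.min()
    end = begin + DateOffset(months=months)
    X_train, y_train = X.loc[begin:end], y.loc[begin:end]
    X_test = X.loc[end + DateOffset(hours=1):]

    # Fit the scaler and the model on the training period only
    scaler = StandardScaler().fit(X_train)
    model = svm.SVC().fit(scaler.transform(X_train), y_train)
    pred = model.predict(scaler.transform(X_test))

    strategy = actual_returns.loc[X_test.index] * pred
    return ((1 + strategy).cumprod().iloc[-1] - 1) * 100


# Synthetic hourly prices, for illustration only (not the HEEM dataset)
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=3000, freq=pd.Timedelta(hours=1))
close = pd.Series(100 + rng.normal(0, 0.2, len(idx)).cumsum(), index=idx)

df = pd.DataFrame({"close": close, "Actual Returns": close.pct_change()})
df["SMA_Fast"] = close.rolling(4).mean()
df["SMA_Slow"] = close.rolling(100).mean()
df = df.dropna()
df["Signal"] = np.where(df["SMA_Fast"] < df["SMA_Slow"], 1.0, 0.0)

# Lag the features by one bar, as in the notebook, and align the target
X = df[["SMA_Fast", "SMA_Slow"]].shift().dropna()
y = df["Signal"].loc[X.index]

# Hypothetical training windows to compare, in months
results = {m: final_return_for_window(X, y, df["Actual Returns"], m) for m in (1, 2, 3)}
print(results)
```

Keep in mind that a longer training window also shortens the test period, so the final returns are measured over different spans and are not directly comparable without accounting for that.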

### Step 2: Tune the trading algorithm by adjusting the SMA input features. 

Adjust one or both of the windows for the algorithm. Rerun the notebook with the updated parameters, and record the results in your `README.md` file. 

Answer the following question: What impact resulted from increasing or decreasing either or both of the SMA windows?
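
When trying different SMA windows, it helps to rebuild the features with one function so each pair is processed identically. This is a sketch only: `build_sma_features` is a hypothetical helper, the window pairs are arbitrary examples, and the prices are synthetic.

```python
import numpy as np
import pandas as pd


def build_sma_features(close, short_window, long_window):
    """Return a DataFrame with both SMA columns and the notebook's signal rule."""
    out = pd.DataFrame({"close": close})
    out["SMA_Fast"] = close.rolling(window=short_window).mean()
    out["SMA_Slow"] = close.rolling(window=long_window).mean()
    out = out.dropna()
    out["Signal"] = np.where(out["SMA_Fast"] < out["SMA_Slow"], 1.0, 0.0)
    return out


# Synthetic prices, for illustration only
rng = np.random.default_rng(1)
close = pd.Series(50 + rng.normal(0, 0.1, 500).cumsum())

# Hypothetical window pairs to compare
for short_window, long_window in [(4, 100), (10, 50), (20, 200)]:
    features = build_sma_features(close, short_window, long_window)
    print(short_window, long_window, len(features), features["Signal"].mean())
```

A longer slow window consumes more rows before the first complete average is available, so the usable dataset shrinks as `long_window` grows. That side effect is worth noting when comparing results across window sizes.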

### Step 3: Choose the set of parameters that best improved the trading algorithm returns. 

Save a PNG image of the cumulative product of the actual returns vs. the strategy returns, and document your conclusion in your `README.md` file.

---

## Evaluate a New Machine Learning Classifier

In this section, you’ll keep the original parameters that the starter code provided, but apply them to a second machine learning model and evaluate its performance.

### Step 1:  Import a new classifier, such as `AdaBoost`, `DecisionTreeClassifier`, or `LogisticRegression`. (For the full list of classifiers, refer to the [Supervised learning page](https://scikit-learn.org/stable/supervised_learning.html) in the scikit-learn documentation.)

In [638]:
# Import a new classifier from SKLearn
from sklearn.neural_network import MLPClassifier

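# Instantiate the model
# Note: batch_size applies only to the stochastic solvers ('sgd', 'adam');
# with solver='lbfgs' the classifier does not use minibatches
mlp_clf = MLPClassifier(
    solver='lbfgs',
    alpha=1e-5, 
    hidden_layer_sizes=(5, 2), 
    random_state=1,
    max_iter=240,
    batch_size=40
)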
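
Because scikit-learn classifiers share the same `fit`, `predict`, and `score` methods, the classifiers named in this step can be swapped in with very little code. The sketch below is an illustration on synthetic data, not the notebook's features; the variable names and the train/evaluation split are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data, for illustration only
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] < X_toy[:, 1]).astype(float)

# Keep the row order: the first 150 rows train, the last 50 evaluate
X_fit, X_eval = X_toy[:150], X_toy[150:]
y_fit, y_eval = y_toy[:150], y_toy[150:]

candidates = {
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "DecisionTree": DecisionTreeClassifier(random_state=1),
    "LogisticRegression": LogisticRegression(),
}

# Every classifier is fitted and scored through the same interface
scores = {
    name: clf.fit(X_fit, y_fit).score(X_eval, y_eval)
    for name, clf in candidates.items()
}
print(scores)
```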


### Step 2: Using the original training data as the baseline model, fit another model with the new classifier.

In [639]:
# Fit the model using the training data
model = mlp_clf.fit(X_train_scaled, y_train)

# Use the testing dataset to generate the predictions for the new model
mlp_pred = mlp_clf.predict(X_test_scaled)

# Review the model's predicted values
mlp_pred[:50]


array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

### Step 3: Backtest the new model to evaluate its performance. 

Save a PNG image of the cumulative product of the actual returns vs. the strategy returns for this updated trading algorithm, and write your conclusions in your `README.md` file. 

Answer the following questions: 
Did this new model perform better or worse than the provided baseline model? 
Did this new model perform better or worse than your tuned trading algorithm?

In [640]:
# Use a classification report to evaluate the model using the predictions and testing data
mlp_clf_report = classification_report(y_test, mlp_pred)

# Print the classification report
print(mlp_clf_report)


              precision    recall  f1-score   support

         0.0       0.99      0.28      0.44      2362
         1.0       0.47      1.00      0.64      1494

    accuracy                           0.56      3856
   macro avg       0.73      0.64      0.54      3856
weighted avg       0.79      0.56      0.52      3856



In [641]:
# Create a predictions DataFrame
mlp_predictions_df = pd.DataFrame(index=X_test.index)

# Add the MLP model predictions to the DataFrame
mlp_predictions_df['Predicted Signal'] = mlp_pred

# Add the actual returns to the DataFrame
mlp_predictions_df['Actual Returns'] = signals_df['Actual Returns']

# Calculate returns from the MLP classifier trading
mlp_returns = mlp_predictions_df['Actual Returns'] * mlp_predictions_df['Predicted Signal']

# Build a dataframe of cumulative returns from the MLP classifier
mlp_returns_df = pd.DataFrame({
    'MLP': ((1 + mlp_returns).cumprod() - 1) * 100
})

# Join all cumulative returns
all_returns_df = pd.concat([returns_df_2, mlp_returns_df], join='inner', axis='columns')

# Review the DataFrame
all_returns_df.head()


date,Buy & Hold,Strategic,SVM,MLP
2015-11-12 13:45:00,0.183486,0.183486,0.0,0.183486
2015-11-12 15:00:00,-0.229358,-0.229358,0.0,-0.229358
2015-11-12 15:45:00,-0.642202,-0.642202,-0.413793,-0.642202
2015-11-17 10:45:00,0.091743,0.091743,0.321839,0.091743
2015-11-17 11:15:00,-0.091743,-0.091743,0.137931,-0.091743


In [642]:
# Plot the actual returns versus the strategy returns
fig_3 = px.line(
    all_returns_df, 
    title='HEEM MLP Classifier Returns Comparison',
    labels=dict(value='Cumulative Returns (%)', date='Date', variable='Trading Method'),
    width=1000, 
)
fig_3.show()