# Machine Learning Trading Bot

In this Challenge, you’ll assume the role of a financial advisor at one of the top five financial advisory firms in the world. Your firm constantly competes with the other major firms to manage and automatically trade assets in a highly dynamic environment. In recent years, your firm has heavily profited by using computer algorithms that can buy and sell faster than human traders.

The speed of these transactions gave your firm a competitive advantage early on. But people still need to program these systems explicitly, which limits their ability to adapt to new data. You’re therefore planning to improve the existing algorithmic trading systems to maintain the firm’s competitive advantage in the market. To do so, you’ll enhance the existing trading signals with machine learning algorithms that can adapt to new data.

## Instructions:

Use the starter code file to complete the steps that the instructions outline. The steps for this Challenge are divided into the following sections:

* Establish a Baseline Performance

* Tune the Baseline Trading Algorithm

* Evaluate a New Machine Learning Classifier

* Create an Evaluation Report

#### Establish a Baseline Performance

In this section, you’ll run the provided starter code to establish a baseline performance for the trading algorithm. To do so, complete the following steps.

Open the Jupyter notebook. Restart the kernel, run the provided cells that correspond with the first three steps, and then proceed to step four. 

1. Import the OHLCV dataset into a Pandas DataFrame.

2. Generate trading signals using short- and long-window SMA values. 

3. Split the data into training and testing datasets.

4. Use the `SVC` classifier model from SKLearn's support vector machine (SVM) learning method to fit the training data and make predictions based on the testing data. Review the predictions.

5. Review the classification report associated with the `SVC` model predictions. 

6. Create a predictions DataFrame that contains columns for “Predicted” values, “Actual Returns”, and “Strategy Returns”.

7. Create a cumulative return plot that shows the actual returns vs. the strategy returns. Save a PNG image of this plot. This will serve as a baseline against which to compare the effects of tuning the trading algorithm.

8. Write your conclusions about the performance of the baseline trading algorithm in the `README.md` file that’s associated with your GitHub repository. Support your findings by using the PNG image that you saved in the previous step.

#### Tune the Baseline Trading Algorithm

In this section, you’ll tune, or adjust, the model’s input features to find the parameters that result in the best trading outcomes. (You’ll choose the best by comparing the cumulative products of the strategy returns.) To do so, complete the following steps:

1. Tune the trading algorithm by adjusting the size of the training dataset. To do so, slice your data into different periods. Rerun the notebook with the updated parameters, and record the results in your `README.md` file. Answer the following question: What impact resulted from increasing or decreasing the training window?

> **Hint:** To adjust the size of the training dataset, you can use a different `DateOffset` value (for example, six months). Be aware that changing the size of the training dataset also affects the size of the testing dataset.

2. Tune the trading algorithm by adjusting the SMA input features. Adjust one or both of the windows for the algorithm. Rerun the notebook with the updated parameters, and record the results in your `README.md` file. Answer the following question: What impact resulted from increasing or decreasing either or both of the SMA windows?

3. Choose the set of parameters that best improved the trading algorithm returns. Save a PNG image of the cumulative product of the actual returns vs. the strategy returns, and document your conclusion in your `README.md` file.

#### Evaluate a New Machine Learning Classifier

In this section, you’ll keep the original parameters that the starter code provided, but you’ll apply them to a second machine learning model and evaluate its performance. To do so, complete the following steps:

1. Import a new classifier, such as `AdaBoostClassifier`, `DecisionTreeClassifier`, or `LogisticRegression`. (For the full list of classifiers, refer to the [Supervised learning page](https://scikit-learn.org/stable/supervised_learning.html) in the scikit-learn documentation.)

2. Using the same training data as the baseline model, fit another model with the new classifier.

3. Backtest the new model to evaluate its performance. Save a PNG image of the cumulative product of the actual returns vs. the strategy returns for this updated trading algorithm, and write your conclusions in your `README.md` file. Answer the following questions: Did this new model perform better or worse than the provided baseline model? Did this new model perform better or worse than your tuned trading algorithm?

#### Create an Evaluation Report

In the previous sections, you updated your `README.md` file with your conclusions. To complete this section, add a summary evaluation report at the end of the `README.md` file. In this report, express your final conclusions and analysis. Support your findings by using the PNG images that you created.


In [4]:
# Imports
import pandas as pd
import numpy as np
from pathlib import Path
import hvplot.pandas
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.preprocessing import StandardScaler
from pandas.tseries.offsets import DateOffset
from sklearn.metrics import classification_report
from binance import Client
from finta import TA



---

## Establish a Baseline Performance

In this section, you’ll run the provided starter code to establish a baseline performance for the trading algorithm. To do so, complete the following steps.

Open the Jupyter notebook. Restart the kernel, run the provided cells that correspond with the first three steps, and then proceed to step four. 


### Step 1: Import the OHLCV dataset into a Pandas DataFrame.

In [5]:
# Instantiate Binance client
client = Client()
# Set the quote currency (USDT, a US dollar stablecoin) to pair with each coin
fiat = 'USDT'

In [6]:
# Create a function to download kline candlestick data from Binance
def get_historical_data(currency):
    klines = client.get_historical_klines(
        currency + fiat,
        Client.KLINE_INTERVAL_1DAY,
        "5 year ago UTC"
    )
    # Each kline row: [Open time, Open, High, Low, Close, Volume, Close time, Quote asset volume, Number of trades, Taker buy base asset volume, Taker buy quote asset volume, Ignore]; only the first six fields are kept below
    cols_ohlcv = ('open', 'high', 'low', 'close', 'volume')
    df = pd.DataFrame((x[:6] for x in klines), columns=['timestamp', *cols_ohlcv])
    df[[*cols_ohlcv]] = df[[*cols_ohlcv]].astype(float)
    df['date'] = pd.to_datetime(df['timestamp'], unit='ms')
    df.set_index('date', inplace=True)
    df.drop(columns='timestamp', inplace=True)

    return df

ohlcv_df = get_historical_data('BTC')
ohlcv_df.head()

| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2017-08-17 | 4261.48 | 4485.39 | 4200.74 | 4285.08 | 795.150377 |
| 2017-08-18 | 4285.08 | 4371.52 | 3938.77 | 4108.37 | 1199.888264 |
| 2017-08-19 | 4108.37 | 4184.69 | 3850.0 | 4139.98 | 381.309763 |
| 2017-08-20 | 4120.98 | 4211.08 | 4032.62 | 4086.29 | 467.083022 |
| 2017-08-21 | 4069.13 | 4119.62 | 3911.79 | 4016.0 | 691.74306 |
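The parsing inside `get_historical_data` can be checked offline, without calling Binance. The sketch below pushes two hand-written kline rows through the same steps. It assumes that python-binance returns each kline as a list whose first six fields are the open time in milliseconds followed by open, high, low, close, and volume as strings; the OHLCV values are copied from the output above, and the trailing fields are placeholders because only `x[:6]` is used.

```python
import pandas as pd

# Two kline rows in the assumed layout; fields after volume are placeholders
klines = [
    [1502928000000, "4261.48", "4485.39", "4200.74", "4285.08", "795.150377", 0, "0", 0, "0", "0", "0"],
    [1503014400000, "4285.08", "4371.52", "3938.77", "4108.37", "1199.888264", 0, "0", 0, "0", "0", "0"],
]

cols_ohlcv = ["open", "high", "low", "close", "volume"]
df = pd.DataFrame([row[:6] for row in klines], columns=["timestamp", *cols_ohlcv])

# Prices arrive as strings, so convert them before doing any arithmetic
df[cols_ohlcv] = df[cols_ohlcv].astype(float)

# Open time is milliseconds since the Unix epoch (UTC)
df["date"] = pd.to_datetime(df["timestamp"], unit="ms")
df = df.set_index("date").drop(columns="timestamp")

print(df)
```

The converted frame should line up with the first two rows of the table above.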


In [7]:
# Filter the date index and close columns
signals_df = ohlcv_df.loc[:, ["close"]]



# Use the pct_change function to generate returns from close prices
signals_df["Actual Returns"] = signals_df["close"].pct_change()

# Drop all NaN values from the DataFrame
# signals_df = signals_df.dropna()

# Review the DataFrame
display(signals_df.head())
display(signals_df.tail())

| date | close | Actual Returns |
|---|---|---|
| 2017-08-17 | 4285.08 | NaN |
| 2017-08-18 | 4108.37 | -0.041238 |
| 2017-08-19 | 4139.98 | 0.007694 |
| 2017-08-20 | 4086.29 | -0.012969 |
| 2017-08-21 | 4016.0 | -0.017201 |

| date | close | Actual Returns |
|---|---|---|
| 2022-02-27 | 37699.07 | -0.036242 |
| 2022-02-28 | 43160.0 | 0.144856 |
| 2022-03-01 | 44421.2 | 0.029222 |
| 2022-03-02 | 43892.98 | -0.011891 |
| 2022-03-03 | 43399.1 | -0.011252 |
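`pct_change` computes each day's simple return as today's close divided by the previous close, minus one, which is why the first row is NaN. The short pandas check below reproduces the first two returns shown above from the matching close prices.

```python
import numpy as np
import pandas as pd

# Close prices for the first three days, copied from the output above
close = pd.Series(
    [4285.08, 4108.37, 4139.98],
    index=pd.to_datetime(["2017-08-17", "2017-08-18", "2017-08-19"]),
)

returns = close.pct_change()

# Equivalent explicit formula: close_t / close_(t-1) - 1
manual = close / close.shift() - 1

print(returns.round(6))
```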


### Step 2: Generate trading signals using short- and long-window SMA values.

In [8]:
# Set the short window and long window
short_window = 4
long_window = 100

# Generate the fast and slow simple moving averages (4 and 100 days, respectively)
sma_df = pd.DataFrame(
    [
        ohlcv_df['close'].pct_change(),
        TA.SMA(ohlcv_df, short_window),
        TA.SMA(ohlcv_df, long_window),
    ]
).T

# Rename the first column (the pct_change of close) to 'Actual Returns'
sma_df.columns = ['Actual Returns'] + list(sma_df.columns[1:])

# Review the DataFrame
display(sma_df.head())
display(sma_df.tail())

| date | Actual Returns | 4 period SMA | 100 period SMA |
|---|---|---|---|
| 2017-08-17 | NaN | NaN | NaN |
| 2017-08-18 | -0.041238 | NaN | NaN |
| 2017-08-19 | 0.007694 | NaN | NaN |
| 2017-08-20 | -0.012969 | 4154.93 | NaN |
| 2017-08-21 | -0.017201 | 4087.66 | NaN |

| date | Actual Returns | 4 period SMA | 100 period SMA |
|---|---|---|---|
| 2022-02-27 | -0.036242 | 38590.5425 | 45276.1201 |
| 2022-02-28 | 0.144856 | 39798.74 | 45110.645 |
| 2022-03-01 | 0.029222 | 41099.2475 | 44968.6368 |
| 2022-03-02 | -0.011891 | 42293.3125 | 44845.0948 |
| 2022-03-03 | -0.011252 | 43718.32 | 44703.6731 |
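As far as we understand finta, `TA.SMA` is a plain rolling mean of the close column. Assuming that, the pandas-only check below reproduces the first two 4-period values shown above (4154.93 and 4087.66). It also shows why an SMA column stays NaN until its window is full, which is why the 100-period column is empty for the first 99 rows.

```python
import pandas as pd

# First five closes from the OHLCV output above
close = pd.Series(
    [4285.08, 4108.37, 4139.98, 4086.29, 4016.0],
    index=pd.date_range("2017-08-17", periods=5, freq="D"),
)

short_window = 4

# Mean of the last `short_window` closes; the first short_window - 1 rows are NaN
sma_short = close.rolling(window=short_window).mean()
print(sma_short)
```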


In [10]:
# Add more technical indicators
bbands_df = TA.BBANDS(ohlcv_df)

# Flag closes outside the Bollinger Bands: -1 above the upper band, 1 below the lower band, 0 otherwise
bbands_df['close_vs_BB'] = np.select(
    [
        bbands_df['BB_UPPER'] < ohlcv_df['close'],
        bbands_df['BB_LOWER'] > ohlcv_df['close'],
    ],
    [-1, 1],
    default=0
)

# 5- and 12-period exponential moving averages
ema_df = pd.DataFrame(
    [
        TA.EMA(ohlcv_df, 5),
        TA.EMA(ohlcv_df, 12),
    ]
).T

# 1 when the 12-period EMA is above the 5-period EMA, -1 otherwise
ema_df['EMA_DIFFERENCE'] = np.where(ema_df.iloc[:, 1] > ema_df.iloc[:, 0], 1, -1)

# Combine the close price and all indicators into one signals DataFrame
signals_df = pd.concat(
    [
        ohlcv_df['close'],
        ema_df,
        bbands_df,
        sma_df,
        TA.RSI(ohlcv_df, 14),
        TA.DMI(ohlcv_df),
        TA.VWAP(ohlcv_df),
        TA.PIVOT_FIB(ohlcv_df),
    ],
    axis='columns',
)

# Review a slice of rows after the indicators have warmed up
start_row = 800
n_rows = 10
signals_df.iloc[start_row:start_row + n_rows, :]




| date | close | 5 period EMA | 12 period EMA | EMA_DIFFERENCE | BB_UPPER | BB_MIDDLE | BB_LOWER | close_vs_BB | Actual Returns | 4 period SMA | ... | VWAP. | pivot | s1 | s2 | s3 | s4 | r1 | r2 | r3 | r4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-10-26 | 9230.0 | 8434.287116 | 8235.266471 | -1 | 8986.91931 | 8199.0175 | 7411.11569 | -1 | 0.066433 | 8191.0125 | ... | 7658.217063 | 8271.673333 | 7722.357333 | 7382.989333 | 6833.673333 | 6284.357333 | 8820.989333 | 9160.357333 | 9709.673333 | 10258.989333 |
| 2019-10-27 | 9529.93 | 8799.501411 | 8434.445475 | -1 | 9253.322581 | 8266.0095 | 7278.696419 | -1 | 0.032495 | 8706.84 | ... | 7664.69979 | 9356.793333 | 8631.138493 | 8182.828173 | 7457.173333 | 6731.518493 | 10082.448173 | 10530.758493 | 11256.413333 | 11982.068173 |
| 2019-10-28 | 9205.14 | 8934.714274 | 8553.013864 | -1 | 9388.894969 | 8317.847 | 7246.799031 | 0 | -0.034081 | 9155.0225 | ... | 7670.067731 | 9466.416667 | 9191.132187 | 9021.061147 | 8745.776667 | 8470.492187 | 9741.701147 | 9911.772187 | 10187.056667 | 10462.341147 |
| 2019-10-29 | 9407.62 | 9092.349516 | 8684.491731 | -1 | 9533.730918 | 8360.191 | 7186.651082 | 0 | 0.021996 | 9343.1725 | ... | 7674.146704 | 9422.413333 | 9138.931133 | 8963.795533 | 8680.313333 | 8396.831133 | 9705.895533 | 9881.031133 | 10164.513333 | 10447.995533 |
| 2019-10-30 | 9154.72 | 9113.139677 | 8756.834542 | -1 | 9613.997879 | 8390.0255 | 7166.053121 | 0 | -0.026882 | 9324.3525 | ... | 7677.318869 | 9343.206667 | 9160.610667 | 9047.802667 | 8865.206667 | 8682.610667 | 9525.802667 | 9638.610667 | 9821.206667 | 10003.802667 |
| 2019-10-31 | 9140.85 | 9122.376451 | 8815.913843 | -1 | 9701.010353 | 8434.143 | 7167.275647 | 0 | -0.001515 | 9227.0825 | ... | 7680.355171 | 9188.523333 | 9032.350273 | 8935.866393 | 8779.693333 | 8623.520273 | 9344.696393 | 9441.180273 | 9597.353333 | 9753.526393 |
| 2019-11-01 | 9231.61 | 9158.787634 | 8879.867098 | -1 | 9794.461998 | 8480.719 | 7166.976002 | 0 | 0.009929 | 9233.7 | ... | 7682.825336 | 9152.95 | 8965.006 | 8848.894 | 8660.95 | 8473.006 | 9340.894 | 9457.006 | 9644.95 | 9832.894 |
| 2019-11-02 | 9289.52 | 9202.36509 | 8942.890621 | -1 | 9889.345705 | 8531.4445 | 7173.543295 | 0 | 0.006273 | 9204.175 | ... | 7684.572057 | 9180.203333 | 9085.085333 | 9026.321333 | 8931.203333 | 8836.085333 | 9275.321333 | 9334.085333 | 9429.203333 | 9524.321333 |
| 2019-11-03 | 9194.71 | 9199.813393 | 8981.632064 | -1 | 9960.095557 | 8573.77 | 7187.444443 | 0 | -0.010206 | 9214.1725 | ... | 7686.173812 | 9283.156667 | 9211.520207 | 9167.263127 | 9095.626667 | 9023.990207 | 9354.793127 | 9399.050207 | 9470.686667 | 9542.323127 |
| 2019-11-04 | 9393.35 | 9264.325595 | 9044.973285 | -1 | 10053.610445 | 8635.473 | 7217.335555 | 0 | 0.021604 | 9277.2975 | ... | 7689.031818 | 9207.806667 | 9094.570407 | 9024.612927 | 8911.376667 | 8798.140407 | 9321.042927 | 9391.000407 | 9504.236667 | 9617.472927 |


In [None]:
# Initialize the new Signal column
signals_df['Signal'] = 0.0

# When Actual Returns are greater than or equal to 0, generate a signal to go long (buy)
signals_df.loc[(signals_df['Actual Returns'] >= 0), 'Signal'] = 1

# When Actual Returns are less than 0, generate a signal to go short (sell)
signals_df.loc[(signals_df['Actual Returns'] < 0), 'Signal'] = -1

# Review the DataFrame
display(signals_df.head())
display(signals_df.tail())

In [None]:
signals_df['Signal'].value_counts()

In [None]:
# Calculate the strategy returns and add them to the signals_df DataFrame
signals_df['Strategy Returns'] = signals_df['Actual Returns'] * signals_df['Signal'].shift()

# Review the DataFrame
display(signals_df.head())
display(signals_df.tail())

In [None]:
# Plot Strategy Returns to examine performance
(1 + signals_df['Strategy Returns']).cumprod().plot()
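To make the signal and strategy-return arithmetic concrete, the sketch below traces the same rules on four made-up daily returns. Because the signal for day *t* is derived from the return on day *t*, multiplying each return by the previous day's signal amounts to betting that yesterday's direction continues. The cumulative product of `1 + Strategy Returns` is the growth of one unit invested, which is what the plot above shows.

```python
import numpy as np
import pandas as pd

# Made-up daily returns, used only to trace the calculation
df = pd.DataFrame(
    {"Actual Returns": [np.nan, 0.10, -0.05, 0.02]},
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)

# Same rule as above: 1 for a non-negative return, -1 for a negative one, 0 where the return is NaN
df["Signal"] = 0.0
df.loc[df["Actual Returns"] >= 0, "Signal"] = 1
df.loc[df["Actual Returns"] < 0, "Signal"] = -1

# Yesterday's signal sets today's position, so the signal is shifted down one row
df["Strategy Returns"] = df["Actual Returns"] * df["Signal"].shift()

# Growth of 1 unit invested: running product of (1 + daily strategy return)
df["Cumulative"] = (1 + df["Strategy Returns"]).cumprod()
print(df)
```

Here the strategy gains nothing on day two (the prior signal was 0) and then loses on both remaining days, ending at about 0.931.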

### Step 3: Split the data into training and testing datasets.

In [None]:
# Assign a copy of the short- and long-window SMA columns (finta names them "<window> period SMA") to a features DataFrame called X
X = signals_df[[f'{short_window} period SMA', f'{long_window} period SMA']].shift().dropna()

# Review the DataFrame
X.head()
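The `.shift().dropna()` step matters because it lags the features. The sketch below uses made-up closes and column names that follow finta's `"<window> period SMA"` convention. It shows that the feature row on date *t* holds the SMAs known at the close of *t-1*, and that the first `long_window` rows are dropped: `long_window - 1` rows are lost to the rolling window and one more to the shift.

```python
import pandas as pd

# Made-up closes on consecutive days
close = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    index=pd.date_range("2021-01-01", periods=6, freq="D"),
)
short_window, long_window = 2, 3

sma = pd.DataFrame({
    f"{short_window} period SMA": close.rolling(short_window).mean(),
    f"{long_window} period SMA": close.rolling(long_window).mean(),
})

# Lag by one day so each row only uses information from the previous day, then drop warm-up rows
X = sma.shift().dropna()
print(X)
```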

In [None]:
# Create the target set by selecting the Signal column and assigning it to y
y = signals_df['Signal']

# Review the value counts
y.value_counts()

In [None]:
# Select the start of the training period
training_begin = X.index.min()

# Display the training begin date
print(training_begin)

In [None]:
# Select the ending period for the training data with an offset of 3 months
training_end = X.index.min() + DateOffset(months=3)

# Display the training end date
print(training_end)

In [None]:
# Generate the X_train and y_train DataFrames
X_train = X.loc[training_begin:training_end]
y_train = y.loc[training_begin:training_end]

# Review the X_train DataFrame
X_train.head()

In [None]:
# Generate the X_test and y_test DataFrames
X_test = X.loc[training_end+DateOffset(hours=1):]
y_test = y.loc[training_end+DateOffset(hours=1):]

# Review the X_test DataFrame
X_test.head()

In [None]:
# Scale the features DataFrames

# Create a StandardScaler instance
scaler = StandardScaler()

# Apply the scaler model to fit the X-train data
X_scaler = scaler.fit(X_train)

# Transform the X_train and X_test DataFrames using the X_scaler
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
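Fitting the scaler on `X_train` only keeps information from the testing period out of the model. Assuming `StandardScaler`'s default settings, each column is centered on its training mean and divided by its training population standard deviation (`ddof=0`). The NumPy sketch below makes that explicit on made-up numbers.

```python
import numpy as np

# Made-up feature matrices: rows are days, columns are the two SMA features
X_train = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]])
X_test = np.array([[4.0, 16.0]])

# Statistics come from the training rows only (population std, ddof=0)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
print(X_test_scaled)
```

After scaling, the training columns have a mean of 0 and a standard deviation of 1, but test rows can land well outside that range (about 2.45 here) when prices trend away from the training period.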

### Step 4: Use the `SVC` classifier model from SKLearn's support vector machine (SVM) learning method to fit the training data and make predictions based on the testing data. Review the predictions.

In [None]:
# From SVM, instantiate SVC classifier model instance
svm_model = svm.SVC()

# Fit the model to the data using the training data
svm_model = svm_model.fit(X_train_scaled, y_train)

# Use the testing data to make the model predictions
svm_pred = svm_model.predict(X_test_scaled)

# Review the model's predicted values
svm_pred[:10]


### Step 5: Review the classification report associated with the `SVC` model predictions. 

In [None]:
# Use a classification report to evaluate the model using the predictions and testing data
svm_testing_report = classification_report(y_test, svm_pred)

# Print the classification report
print(svm_testing_report)
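`classification_report` lists precision, recall, and F1 for each class (here `-1` and `1`). The NumPy sketch below computes the first two by hand on made-up actual and predicted signals, which can help when reading the report for the `SVC` predictions.

```python
import numpy as np

# Made-up actual and predicted signals for eight test days
y_true = np.array([1, 1, -1, 1, -1, -1, 1, 1])
y_pred = np.array([1, -1, -1, 1, 1, -1, 1, 1])

results = {}
for label in (-1, 1):
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    precision = tp / (tp + fp)  # share of days predicted `label` that really were `label`
    recall = tp / (tp + fn)     # share of actual `label` days that the model caught
    results[label] = (precision, recall)
    print(f"{label:>2}: precision={precision:.2f} recall={recall:.2f}")
```

For the `1` (long) class, precision is the fraction of the model's long calls that landed on up days.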


### Step 6: Create a predictions DataFrame that contains columns for “Predicted” values, “Actual Returns”, and “Strategy Returns”.

In [None]:
# Create a predictions DataFrame indexed by the testing dates
predictions_df = pd.DataFrame(index=X_test.index)

# Add the SVM model predictions to the DataFrame
predictions_df['Predicted'] = svm_pred

# Add the actual returns to the DataFrame
predictions_df['Actual Returns'] = signals_df['Actual Returns']

# Add the strategy returns to the DataFrame
# (the features are already lagged one day, so each prediction applies to that day's actual return)
predictions_df['Strategy Returns'] = predictions_df['Actual Returns'] * predictions_df['Predicted']

# Review the DataFrame
display(predictions_df.head())
display(predictions_df.tail())

### Step 7: Create a cumulative return plot that shows the actual returns vs. the strategy returns. Save a PNG image of this plot. This will serve as a baseline against which to compare the effects of tuning the trading algorithm.

In [None]:
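# Plot the actual returns versus the strategy returns, then save a PNG (the file name is illustrative)
baseline_plot = (1 + predictions_df[['Actual Returns', 'Strategy Returns']]).cumprod().plot()
baseline_plot.get_figure().savefig('svm_baseline_returns.png')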


---

## Tune the Baseline Trading Algorithm


In this section, you’ll tune, or adjust, the model’s input features to find the parameters that result in the best trading outcomes. You’ll choose the best by comparing the cumulative products of the strategy returns.

### Step 1: Tune the trading algorithm by adjusting the size of the training dataset.

To do so, slice your data into different periods. Rerun the notebook with the updated parameters, and record the results in your `README.md` file. 

Answer the following question: What impact resulted from increasing or decreasing the training window?
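With a fixed dataset, every extra month of training pushes the start of the test period later, so the test set shrinks. The sketch below uses a made-up daily index covering 2021 and the same `DateOffset` slicing as the split cells above to show how the training and testing row counts trade off as the window grows.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# A made-up daily feature index covering calendar year 2021
X = pd.DataFrame({"feature": range(365)}, index=pd.date_range("2021-01-01", periods=365, freq="D"))

training_begin = X.index.min()
sizes = {}
for months in (3, 6, 12):
    training_end = training_begin + DateOffset(months=months)
    X_train = X.loc[training_begin:training_end]
    # Start the test set strictly after training_end, as in the split cells above
    X_test = X.loc[training_end + DateOffset(hours=1):]
    sizes[months] = (len(X_train), len(X_test))

print(sizes)
```

On this index, a 12-month window consumes every row and leaves nothing to backtest. On the real BTC data, keep in mind that the first `long_window` rows are already dropped before `training_begin`.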

### Step 2: Tune the trading algorithm by adjusting the SMA input features. 

Adjust one or both of the windows for the algorithm. Rerun the notebook with the updated parameters, and record the results in your `README.md` file. 

Answer the following question: What impact resulted from increasing or decreasing either or both of the SMA windows?
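One way to rerun the feature step for several window pairs is to wrap it in a function. The helper below, `make_sma_features`, is hypothetical (it isn't part of the starter code). It mirrors the notebook's lagged SMA features and uses finta-style column names, and it's run here on a made-up random-walk price series. The row counts show a side effect of the long window: each pair loses its first `long_window` rows before any training can start.

```python
import numpy as np
import pandas as pd

def make_sma_features(close: pd.Series, short_window: int, long_window: int) -> pd.DataFrame:
    """Hypothetical helper: lagged short/long SMA features, named like finta's columns."""
    features = pd.DataFrame({
        f"{short_window} period SMA": close.rolling(short_window).mean(),
        f"{long_window} period SMA": close.rolling(long_window).mean(),
    })
    # Lag by one day and drop the warm-up rows, as in the X cell above
    return features.shift().dropna()

# Made-up random-walk closes for 300 days
rng = np.random.default_rng(0)
close = pd.Series(
    100 + rng.normal(0, 1, 300).cumsum(),
    index=pd.date_range("2021-01-01", periods=300, freq="D"),
)

shapes = {}
for short_window, long_window in [(4, 100), (10, 50), (4, 200)]:
    X = make_sma_features(close, short_window, long_window)
    shapes[(short_window, long_window)] = X.shape
    print(short_window, long_window, X.shape, X.index.min().date())
```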

### Step 3: Choose the set of parameters that best improved the trading algorithm returns. 

Save a PNG image of the cumulative product of the actual returns vs. the strategy returns, and document your conclusion in your `README.md` file.
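To pick the best parameters, you can compare the final value of each strategy's cumulative product, `(1 + returns).prod()`. The sketch below does this for three made-up return series under hypothetical labels. These aren't results from this notebook.

```python
import pandas as pd

# Made-up strategy returns for three hypothetical parameter sets (not results from this notebook)
strategy_returns = {
    "3m train, SMA 4/100": pd.Series([0.01, -0.02, 0.015]),
    "6m train, SMA 4/100": pd.Series([0.02, 0.01, -0.005]),
    "3m train, SMA 10/50": pd.Series([-0.01, 0.0, 0.01]),
}

# Final value of the cumulative product of (1 + return) for each parameter set
final_growth = {name: (1 + r).prod() for name, r in strategy_returns.items()}
best = max(final_growth, key=final_growth.get)
print(best, round(final_growth[best], 4))
```

Changing the training window also changes the testing period, so final values from different windows cover different dates. Comparing each strategy against the actual returns over the same dates gives a fairer picture than comparing the final numbers alone.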

---

## Evaluate a New Machine Learning Classifier

In this section, you’ll keep the original parameters that the starter code provided, but you’ll apply them to a second machine learning model and evaluate its performance.

### Step 1: Import a new classifier, such as `AdaBoostClassifier`, `DecisionTreeClassifier`, or `LogisticRegression`. (For the full list of classifiers, refer to the [Supervised learning page](https://scikit-learn.org/stable/supervised_learning.html) in the scikit-learn documentation.)

In [None]:
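# Import a new classifier from SKLearn
# (LogisticRegression is one example; AdaBoostClassifier or DecisionTreeClassifier can be swapped in the same way)
from sklearn.linear_model import LogisticRegression

# Initiate the model instance
model = LogisticRegression()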
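Because scikit-learn classifiers share the same `fit`/`predict` interface, swapping the model in this step mostly comes down to changing the constructor. The sketch below assumes scikit-learn is installed and uses synthetic data (`X_train_demo`, `y_train_demo`, and `X_test_demo` are made-up stand-ins for the notebook's scaled SMA features and signals). It fits the baseline `SVC` alongside the three suggested alternatives.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Made-up features and -1/1 labels standing in for X_train_scaled, y_train, and X_test_scaled
rng = np.random.default_rng(1)
X_train_demo = rng.normal(size=(200, 2))
y_train_demo = np.where(X_train_demo[:, 0] + 0.5 * X_train_demo[:, 1] > 0, 1, -1)
X_test_demo = rng.normal(size=(50, 2))

candidates = {
    "SVC": SVC(),
    "LogisticRegression": LogisticRegression(),
    "AdaBoostClassifier": AdaBoostClassifier(random_state=1),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=1),
}

predictions = {}
for name, clf in candidates.items():
    # Every scikit-learn classifier exposes the same fit/predict interface
    clf.fit(X_train_demo, y_train_demo)
    predictions[name] = clf.predict(X_test_demo)
    print(name, np.unique(predictions[name]))
```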


### Step 2: Using the same training data as the baseline model, fit another model with the new classifier.

In [None]:
# Fit the model using the training data
model = model.fit(X_train_scaled, y_train)

# Use the testing dataset to generate the predictions for the new model
pred = model.predict(X_test_scaled)

# Review the model's predicted values
pred[:10]


### Step 3: Backtest the new model to evaluate its performance. 

Save a PNG image of the cumulative product of the actual returns vs. the strategy returns for this updated trading algorithm, and write your conclusions in your `README.md` file. 

Answer the following questions:

* Did this new model perform better or worse than the provided baseline model?

* Did this new model perform better or worse than your tuned trading algorithm?

In [None]:
# Use a classification report to evaluate the model using the predictions and testing data
new_testing_report = classification_report(y_test, pred)

# Print the classification report
print(new_testing_report)


In [None]:
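# Create a predictions DataFrame indexed by the testing dates
new_predictions_df = pd.DataFrame(index=X_test.index)

# Add the new model's predictions to the DataFrame
new_predictions_df['Predicted'] = pred

# Add the actual returns to the DataFrame
new_predictions_df['Actual Returns'] = signals_df['Actual Returns']

# Add the strategy returns to the DataFrame
new_predictions_df['Strategy Returns'] = new_predictions_df['Actual Returns'] * new_predictions_df['Predicted']

# Review the DataFrame
display(new_predictions_df.head())
display(new_predictions_df.tail())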


In [None]:
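# Plot the actual returns versus the strategy returns, then save a PNG (the file name is illustrative)
new_model_plot = (1 + new_predictions_df[['Actual Returns', 'Strategy Returns']]).cumprod().plot()
new_model_plot.get_figure().savefig('new_model_returns.png')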