# High Frequency Trading Algorithm

You have been tasked by the investment firm Renaissance High Frequency Trading (RHFT) to develop an automated trading strategy utilizing a combination of machine learning algorithms and high frequency algorithms. RHFT wants this new algorithm to be based on stock market data of the 30 stocks in the Dow Jones at the minute level and to conduct buys and sells every minute based on 1 min, 5 min, and 10 min Momentum. The CIO asked you to choose the Machine Learning Algorithm best suited for this task and wants you to execute the trades via Alpaca's API.

## Part 1: Prepare the data for training and testing

### Initial Set-Up

In [1]:
# Initial Imports
import os
from pathlib import Path
import alpaca_trade_api as tradeapi
import pandas as pd
import numpy as np
from datetime import datetime
import time
import pytz
from dotenv import load_dotenv
from sklearn.model_selection import train_test_split
from collections import Counter
from sklearn.metrics import balanced_accuracy_score
from imblearn.over_sampling import RandomOverSampler
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier
import joblib


In [2]:
# Load .env environment variables
load_dotenv()

True

In [3]:
# Set Alpaca API key and secret
API_KEY = os.getenv("ALPACA_API_KEY")
API_SECRET = os.getenv("ALPACA_SECRET_KEY")
ALPACA_API_BASE_URL = "https://paper-api.alpaca.markets"

In [4]:
print(f"Alpaca Key type: {type(API_KEY)}")
print(f"Alpaca Secret Key type: {type(API_SECRET)}")

Alpaca Key type: <class 'str'>
Alpaca Secret Key type: <class 'str'>


In [5]:
# Create the Alpaca API object, specifying use of the paper trading account:
api = tradeapi.REST(API_KEY, API_SECRET, ALPACA_API_BASE_URL, api_version='v2')

### Data Generation



#### 1. Create a ticker list, beginning and end dates, and timeframe interval.


In [6]:
# Define a list of tickers
ticker_list = ["FB", "AMZN", "AAPL", "NFLX", "GOOGL", "MSFT", "TSLA"]

# declare begin and end date strings
beg_date = '2022-05-27'
end_date = '2022-05-27'
# we convert begin and end date to formats that the ALPACA API requires
start =  pd.Timestamp(f'{beg_date} 09:30:00-0400', tz='America/New_York').replace(hour=9, minute=30, second=0).astimezone('GMT').isoformat()[:-6]+'Z'
end   =  pd.Timestamp(f'{end_date} 16:00:00-0400', tz='America/New_York').replace(hour=16, minute=0, second=0).astimezone('GMT').isoformat()[:-6]+'Z'

# We set the time frequency at which we want to pull prices
timeframe = '1Min'

#### 2. Ping the Alpaca API for the data and store it in a DataFrame called `prices` by using the `get_bars` function combined with the `df` property from the Alpaca Trade SDK.

In [7]:
# Pull all prices from ticker_list using the ALPACA API
prices = api.get_bars(ticker_list, timeframe=timeframe, start=start, end=end).df

In [8]:
prices.head()

timestamp,open,high,low,close,volume,trade_count,vwap,symbol
2022-05-27 13:30:00+00:00,145.39,145.74,145.26,145.54,1718145,16997,145.40178,AAPL
2022-05-27 13:31:00+00:00,145.56,146.13,145.56,146.0,656681,5929,145.872124,AAPL
2022-05-27 13:32:00+00:00,146.0,146.23,145.93,146.18,411564,3584,146.068032,AAPL
2022-05-27 13:33:00+00:00,146.18,146.205,146.02,146.045,374322,3481,146.121218,AAPL
2022-05-27 13:34:00+00:00,146.04,146.11,145.751,145.79,505251,3876,145.933648,AAPL


#### 3. Store only the close prices from the `prices` DataFrame in a new DataFrame called `df_closing_prices`, then view the head and tail to confirm the following:
* First price for each stock on the open at 9:30 Eastern Time.
* Last price for the day on the close at 3:59 pm Eastern Time.

In [9]:
# Create an empty DataFrame for closing prices
df_closing_prices = pd.DataFrame()

# Fetch the closing prices for each one of the tickers and store them in a column in df_closing_prices named after that ticker
for ticker in ticker_list:
    df_closing_prices[ticker] = prices.loc[prices['symbol'] == ticker]['close']

In [10]:
# Preview first five rows
df_closing_prices.head(5)

timestamp,FB,AMZN,AAPL,NFLX,GOOGL,MSFT,TSLA
2022-05-27 13:30:00+00:00,190.34,2271.97,145.54,193.9,2188.01,267.7928,728.355
2022-05-27 13:31:00+00:00,190.67,2281.46,146.0,193.61,2197.575,268.19,734.52
2022-05-27 13:32:00+00:00,191.7087,2280.75,146.18,194.38,2193.68,268.61,735.77
2022-05-27 13:33:00+00:00,191.245,2271.01,146.045,193.5747,2194.43,269.11,732.45
2022-05-27 13:34:00+00:00,190.61,2264.365,145.79,192.8,2192.12,268.91,730.59


In [11]:
# Preview last five rows
df_closing_prices.tail(5)

timestamp,FB,AMZN,AAPL,NFLX,GOOGL,MSFT,TSLA
2022-05-27 19:56:00+00:00,195.1,2299.99,149.195,195.12,2240.86,272.98,757.7334
2022-05-27 19:57:00+00:00,195.12,2299.46,149.415,195.0101,2242.81,272.97,757.77
2022-05-27 19:58:00+00:00,195.02,2299.88,149.51,194.95,2242.98,273.03,758.28
2022-05-27 19:59:00+00:00,195.13,2303.38,149.65,195.19,2246.36,273.23,759.66
2022-05-27 20:00:00+00:00,195.13,2301.99,149.64,195.19,2246.33,273.22,759.5


In [12]:
# Number of rows
df_closing_prices.shape

(391, 7)

In [13]:
# Test for null values
df_closing_prices.isnull().sum()

FB        0
AMZN      1
AAPL      0
NFLX      0
GOOGL    10
MSFT      0
TSLA      0
dtype: int64

#### 4. The null-value check above shows several `NaN` values.
* Alpaca has no data for minutes in which no trades occurred, so those minutes show up as `NaN`.
* These gaps must be filled: we use pandas' `ffill()` function to "forward fill" them, replacing each missing price with the previous value (since the price has not changed).


In [14]:
# Use Pandas' forward fill function to fill missing values (be sure to set inplace=True)
df_closing_prices.ffill(inplace=True)

In [15]:
# Test for null values
df_closing_prices.isnull().sum()

FB       0
AMZN     0
AAPL     0
NFLX     0
GOOGL    0
MSFT     0
TSLA     0
dtype: int64
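
To make the forward-fill behaviour concrete, here is a minimal sketch on made-up data (the tickers `AAA` and `BBB` and their prices are hypothetical, not taken from the notebook). It also shows one limitation: a gap at the very start of a column has no earlier value to copy, so it stays `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute closes; NaN marks minutes with no trades
closes = pd.DataFrame(
    {"AAA": [10.0, np.nan, np.nan, 10.5], "BBB": [np.nan, 20.0, 20.1, np.nan]},
    index=pd.date_range("2022-05-27 13:30", periods=4, freq="min", tz="UTC"),
)

# Replace each missing price with the most recent earlier price in the same column
filled = closes.ffill()

# A gap in the first row cannot be forward filled, so re-check for nulls afterwards
remaining_nulls = filled.isnull().sum()
print(filled)
print(remaining_nulls)
```

If the first bar of the day were missing for a ticker, a follow-up such as dropping that row would still be needed before computing returns.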

### Computing Returns

#### 1. Compute the percentage change values for 1 minute as follows:
* Create a variable called `forecast` to hold the forecast, in this case `1` for 1 minute.
* Use the `pct_change` function, passing in the `forecast`, on the `df_closing_prices` DataFrame, storing the newly generated DataFrame in a variable called `returns`.
* Convert the `returns` DataFrame to show forward returns by passing `-(forecast)` into the `shift` function.

In [16]:
# Define a variable to set prediction period
forecast = 1

# Compute the pct_change over the forecast horizon (1 min)
returns = df_closing_prices.pct_change(forecast)

# Shift the returns to convert them to forward returns
returns = returns.shift(-forecast)

# Preview the DataFrame
returns.head(5)

timestamp,FB,AMZN,AAPL,NFLX,GOOGL,MSFT,TSLA
2022-05-27 13:30:00+00:00,0.001734,0.004177,0.003161,-0.001496,0.004372,0.001483,0.008464
2022-05-27 13:31:00+00:00,0.005448,-0.000311,0.001233,0.003977,-0.001772,0.001566,0.001702
2022-05-27 13:32:00+00:00,-0.002419,-0.004271,-0.000924,-0.004143,0.000342,0.001861,-0.004512
2022-05-27 13:33:00+00:00,-0.00332,-0.002926,-0.001746,-0.004002,-0.001053,-0.000743,-0.002539
2022-05-27 13:34:00+00:00,0.001967,-0.00047,0.001193,0.005265,0.001884,0.001153,0.001985


In [17]:
returns.unstack().head()

    timestamp                
FB  2022-05-27 13:30:00+00:00    0.001734
    2022-05-27 13:31:00+00:00    0.005448
    2022-05-27 13:32:00+00:00   -0.002419
    2022-05-27 13:33:00+00:00   -0.003320
    2022-05-27 13:34:00+00:00    0.001967
dtype: float64
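
The difference between an ordinary (backward-looking) return and the forward return used as the prediction target is easy to lose track of, so here is a small sketch on a hypothetical price series. The prices are invented for illustration; only `pct_change` and `shift` come from the notebook.

```python
import pandas as pd

forecast = 1
# Hypothetical closing prices for a single ticker, one row per minute
closes = pd.Series([100.0, 101.0, 99.99, 102.0], name="AAA")

# Return realised over the minute that just ended (first row has no predecessor)
backward = closes.pct_change(forecast)

# Same numbers moved up one row: each row now holds the return of the NEXT minute
forward = backward.shift(-forecast)

print(pd.DataFrame({"close": closes, "backward": backward, "forward": forward}))
```

The last row of `forward` is `NaN` because the next minute's price is not known yet, which is why the final observation per ticker is dropped later on.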

#### 2. Convert the DataFrame into long form for merging later using `unstack` and `reset_index`.

In [18]:
# Use unstack() to bring the data into long format and save the output as a DataFrame
returns = returns.unstack().reset_index()

# Rename the column to make it easier to identify:
name = f'F_{forecast}_m_returns'
returns.rename(columns = {0: name}, inplace=True)


In [19]:
# Preview the first five rows
returns.head(5)


,level_0,timestamp,F_1_m_returns
0,FB,2022-05-27 13:30:00+00:00,0.001734
1,FB,2022-05-27 13:31:00+00:00,0.005448
2,FB,2022-05-27 13:32:00+00:00,-0.002419
3,FB,2022-05-27 13:33:00+00:00,-0.00332
4,FB,2022-05-27 13:34:00+00:00,0.001967


In [20]:
# Preview the last five rows
returns.tail(5)

,level_0,timestamp,F_1_m_returns
2732,TSLA,2022-05-27 19:56:00+00:00,4.8e-05
2733,TSLA,2022-05-27 19:57:00+00:00,0.000673
2734,TSLA,2022-05-27 19:58:00+00:00,0.00182
2735,TSLA,2022-05-27 19:59:00+00:00,-0.000211
2736,TSLA,2022-05-27 20:00:00+00:00,


#### 3. Compute the 1, 5, 10 minute momentums that will be used to predict the forward returns, then merge them with the forward returns as follows:
* Create the list of momentums: `list_of_momentums = [1,5,10]`.
* Write a for-loop over `list_of_momentums`, passing each value to `pct_change` on the `df_closing_prices` DataFrame.
* In each iteration, the temporary DataFrame `returns_temp` needs to be prepped with `unstack` and `reset_index`, then merged as a new column into the `returns` DataFrame from the prior step.
* Complete this step by dropping the null values from `returns` and creating a multi-index based on ticker and timestamp.

In [21]:
# Create list of momentums that we want to predict
list_of_momentums = [1,5,10]
for i in list_of_momentums:   
    # Compute percentage change for each one of the momentums in the momentum list
    pct_chg = df_closing_prices.pct_change(i)
    
    # Unstack the returns and save the output as a DataFrame called returns_temp
    returns_temp = pd.DataFrame(pct_chg.unstack(level=0))
    
    # Rename the column to make it easier to identify:
    name = f'{i}_m_returns'
    returns_temp.rename(columns={0: name}, inplace = True)
    
    # Reset the index so we can merge based on index
    returns_temp.reset_index(inplace=True)
    
    # Merge returns_temp  with the original returns 
    returns = pd.merge(returns,returns_temp,left_on=['level_0', 'timestamp'],right_on=['level_0', 'timestamp'], how='left', suffixes=('_original', 'right'))

In [22]:
returns.head(11)

,level_0,timestamp,F_1_m_returns,1_m_returns,5_m_returns,10_m_returns
0,FB,2022-05-27 13:30:00+00:00,0.001734,,,
1,FB,2022-05-27 13:31:00+00:00,0.005448,0.001734,,
2,FB,2022-05-27 13:32:00+00:00,-0.002419,0.005448,,
3,FB,2022-05-27 13:33:00+00:00,-0.00332,-0.002419,,
4,FB,2022-05-27 13:34:00+00:00,0.001967,-0.00332,,
5,FB,2022-05-27 13:35:00+00:00,0.002331,0.001967,0.003389,
6,FB,2022-05-27 13:36:00+00:00,0.004544,0.002331,0.003986,
7,FB,2022-05-27 13:37:00+00:00,-0.000988,0.004544,0.003084,
8,FB,2022-05-27 13:38:00+00:00,-0.000312,-0.000988,0.004523,
9,FB,2022-05-27 13:39:00+00:00,0.002094,-0.000312,0.007555,


In [23]:
# Use dropna() to get rid of those missing observations.
returns.dropna(inplace=True)

# Create a multi-index based on level_0 (the ticker) and timestamp
returns.set_index(['level_0', 'timestamp'], inplace=True)
returns.head()

level_0,timestamp,F_1_m_returns,1_m_returns,5_m_returns,10_m_returns
FB,2022-05-27 13:40:00+00:00,-0.002245,0.002094,0.007682,0.011096
FB,2022-05-27 13:41:00+00:00,-0.003177,-0.002245,0.003082,0.00708
FB,2022-05-27 13:42:00+00:00,-0.000522,-0.003177,-0.004628,-0.001558
FB,2022-05-27 13:43:00+00:00,-0.002776,-0.000522,-0.004164,0.00034
FB,2022-05-27 13:44:00+00:00,-0.000833,-0.002776,-0.006619,0.000886
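
As an alternative to merging inside the loop, the three momentum columns can be built in one step. The sketch below is only one possible formulation, run on hypothetical prices for two made-up tickers; it relies on `unstack()` producing a Series indexed by (ticker, timestamp) so that `pd.concat` can align the columns.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame of closes: one column per ticker, one row per minute
idx = pd.date_range("2022-05-27 13:30", periods=12, freq="min", tz="UTC", name="timestamp")
closes = pd.DataFrame(
    {"AAA": np.linspace(100.0, 111.0, 12), "BBB": np.linspace(50.0, 61.0, 12)},
    index=idx,
)

list_of_momentums = [1, 5, 10]

# unstack() turns each wide frame into a Series indexed by (ticker, timestamp);
# concat with axis=1 lines the three Series up on that shared index.
features = pd.concat(
    {f"{n}_m_returns": closes.pct_change(n).unstack() for n in list_of_momentums},
    axis=1,
).dropna()

print(features)
```

With 12 minutes of data only the last two rows per ticker survive `dropna()`, because the 10-minute momentum needs ten earlier prices.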


## Part 2: Train and Compare Multiple Machine Learning Algorithms

In this section, you'll train each of the requested algorithms and compare their performance. Use the same parameters and training steps for every model so that the comparison is fair.

### Preprocessing Data

#### 1. Generate your feature data (`X`) and target data (`y`):
* Create a dataframe `X` that contains all the columns from the returns dataframe that will be used to predict `F_1_m_returns`.
* Create a variable called `y` that is equal to 1 if `F_1_m_returns` is larger than 0, and 0 otherwise. This will be our target variable.

In [24]:
# Create a dataframe `X` that contains all the columns from the returns dataframe that will be used to predict `F_1_m_returns`.
# Create a variable, called `y`, that is equal 1 if `F_1_m_returns` is larger than 0. This will be our target variable.

# Create a separate dataframe for features and define the target variable as a binary target
X = returns.iloc[:,1:4]

# Create the binary target: 1 if the forward return is positive, 0 otherwise
y = (returns["F_1_m_returns"] > 0).astype(int).tolist()

X.head()

level_0,timestamp,1_m_returns,5_m_returns,10_m_returns
FB,2022-05-27 13:40:00+00:00,0.002094,0.007682,0.011096
FB,2022-05-27 13:41:00+00:00,-0.002245,0.003082,0.00708
FB,2022-05-27 13:42:00+00:00,-0.003177,-0.004628,-0.001558
FB,2022-05-27 13:43:00+00:00,-0.000522,-0.004164,0.00034
FB,2022-05-27 13:44:00+00:00,-0.002776,-0.006619,0.000886


##### Note:
> Notice that we don't shuffle when splitting the dataset into training and testing sets.

> We want to keep the original ordering of the data so that we don't end up using future observations to predict past ones.

> That critical mistake is known as look-ahead bias.

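#### 2. Use the `train_test_split` function to split the dataset into a training and testing dataset, with the first 75% used for training (the function's default split, which the code below relies on; it yields the 1,995 training rows and 665 test rows seen in the outputs).
* Set the shuffle parameter to `False`, so that the first 75% of rows are used for training, to prevent look-ahead bias.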
* Make sure you have these 4 variables: `X_train`, `X_test`, `y_train`, `y_test`. 

In [25]:
# Split the dataset without shuffling
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, shuffle=False)
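
One caveat worth checking: `returns` is ordered by ticker first and timestamp second, so an unshuffled split keeps the first tickers for training and the last ones for testing rather than dividing the day at a point in time. A minimal sketch of a split on a timestamp cutoff follows, assuming rows are plain `(ticker, minute, value)` tuples; `split_by_time` and the sample data are hypothetical and not part of the notebook.

```python
def split_by_time(rows, train_fraction=0.75):
    """Split rows so every training timestamp precedes every test timestamp.

    rows: iterable of tuples whose second element is a sortable timestamp.
    """
    rows = list(rows)
    timestamps = sorted({row[1] for row in rows})
    # First timestamp that belongs to the test period
    cutoff = timestamps[int(len(timestamps) * train_fraction)]
    train = [row for row in rows if row[1] < cutoff]
    test = [row for row in rows if row[1] >= cutoff]
    return train, test


# Hypothetical long-format data: two tickers, eight minutes each, ticker-major order
demo_rows = [(ticker, minute, 0.0) for ticker in ("AAA", "BBB") for minute in range(8)]
train_rows, test_rows = split_by_time(demo_rows)

print(len(train_rows), len(test_rows))
print(max(r[1] for r in train_rows), min(r[1] for r in test_rows))
```

Both tickers appear in each part, and no training row is later than any test row, which is the property the look-ahead note is after.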

#### 3. Use the `Counter` function to test the distribution of the data. 
* In this run, the result `Counter({0: 958, 1: 1037})` shows that the two classes are mildly unbalanced.

In [26]:
# Use Counter to count the number 1s and 0 in y_train
Counter(y_train)

Counter({0: 958, 1: 1037})

#### 4. Balance the dataset with `RandomOverSampler` from the imbalanced-learn library, setting `random_state=1`.

In [27]:
# Use RandomOverSampler to resample the dataset using random_state=1
ros = RandomOverSampler(random_state=1)
X_resampled, y_resampled = ros.fit_resample(X_train, y_train)

#### 5. Test the distribution once again with `Counter`. In this run, the new result of `Counter({0: 1037, 1: 1037})` shows the data is now balanced.

In [28]:
# Use Counter again to verify imbalance removed
Counter(y_resampled)

Counter({0: 1037, 1: 1037})
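
For intuition about what random oversampling does, the sketch below re-implements the idea in plain Python: minority-class rows are duplicated at random until every class is as large as the biggest one. This is an illustration under that assumption only, not imbalanced-learn's actual implementation; `random_oversample` and the sample data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Duplicate randomly chosen minority rows until all classes have equal counts."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Rows of this class that are available for duplication
        pool = [features for features, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Hypothetical feature rows and labels: three 1s, two 0s
X_demo = [[0.1], [0.2], [0.3], [0.4], [0.5]]
y_demo = [1, 1, 1, 0, 0]
X_bal, y_bal = random_oversample(X_demo, y_demo)

print(Counter(y_bal))
```

Only the training set is resampled in the notebook; the test set keeps its natural class balance so that the evaluation reflects real conditions.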

### Machine Learning

#### 1. The first cells in this section provide an example of how to fit and train your model using the `LogisticRegression` model from sklearn:
* Import the selected model.
* Instantiate the model object.
* Fit the model to the resampled data, `X_resampled` and `y_resampled`.
* Generate predictions from the model using `X_test`.
* Print the classification report.

In [29]:
# Create a LogisticRegression model and train it on the X_resampled data we created before
log_model = LogisticRegression()
log_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = log_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

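# Calculate a Sharpe-style ratio (mean / standard deviation) of the predicted labels;
# note that it is computed on the 0/1 predictions, not on realized returns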
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.4984    0.5080    0.5032       311
           1     0.5603    0.5508    0.5556       354

    accuracy                         0.5308       665
   macro avg     0.5294    0.5294    0.5294       665
weighted avg     0.5314    0.5308    0.5311       665

Balanced Accuracy Score: 0.529443021418061
Sharpe Ratio: 1.0477556003702655
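
The ratio printed above is taken over the 0/1 predictions themselves. A Sharpe ratio is normally computed on returns, so a hedged sketch of that variant follows: the strategy earns the forward return in minutes where the model predicts 1 and nothing otherwise. The function name and sample numbers are hypothetical, and the result is per-minute, unannualized, and ignores the risk-free rate and trading costs.

```python
from statistics import mean, pstdev


def per_period_sharpe(predictions, forward_returns):
    """Mean over standard deviation of the returns earned when prediction == 1."""
    strategy_returns = [p * r for p, r in zip(predictions, forward_returns)]
    spread = pstdev(strategy_returns)  # population std, matching NumPy's default
    if spread == 0:
        return float("nan")
    return mean(strategy_returns) / spread


# Hypothetical predictions and the forward returns that followed them
preds = [1, 0, 1, 1]
fwd = [0.01, -0.02, 0.03, -0.01]
ratio = per_period_sharpe(preds, fwd)

print(f"Per-minute Sharpe-style ratio: {ratio:.4f}")
```

In the notebook, the forward returns aligned with `X_test` would play the role of `fwd`.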


#### 2. Use the same approach as above to train and test the following ML Algorithms:
* [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
* [GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)
* [AdaBoostClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html)
* [XGBClassifier](https://xgboost.readthedocs.io/en/latest/python/python_api.html)

#### RandomForestClassifier

In [30]:
# Create a RandomForestClassifier model and train it on the X_resampled data we created before
rfc_model = RandomForestClassifier()
rfc_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = rfc_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

# Calculate a Sharpe-style ratio (mean / standard deviation) of the predicted labels
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.4719    0.4855    0.4786       311
           1     0.5362    0.5226    0.5293       354

    accuracy                         0.5053       665
   macro avg     0.5041    0.5041    0.5040       665
weighted avg     0.5061    0.5053    0.5056       665

Balanced Accuracy Score: 0.5040647083401457
Sharpe Ratio: 1.0383279828647594


#### GradientBoostingClassifier

In [31]:
# Create a GradientBoostingClassifier model and train it on the X_resampled data we created before
gbc_model = GradientBoostingClassifier()
gbc_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = gbc_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

# Calculate a Sharpe-style ratio (mean / standard deviation) of the predicted labels
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.4697    0.5241    0.4954       311
           1     0.5346    0.4802    0.5060       354

    accuracy                         0.5008       665
   macro avg     0.5022    0.5022    0.5007       665
weighted avg     0.5043    0.5008    0.5010       665

Balanced Accuracy Score: 0.5021708721637873
Sharpe Ratio: 0.9573016833623034


#### AdaBoostClassifier

In [32]:
# Create an AdaBoostClassifier model and train it on the X_resampled data we created before
abc_model = AdaBoostClassifier()
abc_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = abc_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

# Calculate a Sharpe-style ratio (mean / standard deviation) of the predicted labels
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.4777    0.5852    0.5260       311
           1     0.5458    0.4379    0.4859       354

    accuracy                         0.5068       665
   macro avg     0.5117    0.5115    0.5060       665
weighted avg     0.5139    0.5068    0.5047       665

Balanced Accuracy Score: 0.5115310552800334
Sharpe Ratio: 0.8633694598183222


#### XGBClassifier

In [33]:
# Create an XGBClassifier model and train it on the X_resampled data we created before
xgbc_model = XGBClassifier()
xgbc_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = xgbc_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

# Calculate a Sharpe-style ratio (mean / standard deviation) of the predicted labels
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.4909    0.5177    0.5039       311
           1     0.5549    0.5282    0.5412       354

    accuracy                         0.5233       665
   macro avg     0.5229    0.5230    0.5226       665
weighted avg     0.5249    0.5233    0.5238       665

Balanced Accuracy Score: 0.5229667375152143
Sharpe Ratio: 1.0136266691392073


### Evaluate the performance of each model


#### 1. Using the classification report for each model, choose the model with the highest precision for use in your algo-trading program.

* Which model has the highest balanced accuracy?
    * **0.529443021418061 - Logistic Regression**
    * 0.5040647083401457 - RandomForestClassifier
    * 0.5021708721637873 - GradientBoostingClassifier
    * 0.5115310552800334 - AdaBoostClassifier
    * 0.5229667375152143 - XGBClassifier

    **Logistic Regression model has the highest accuracy**

* Which model has the highest performance over time (weighted-average F1 score)?
    * **0.5311 - Logistic Regression**
    * 0.5056 - RandomForestClassifier
    * 0.5010 - GradientBoostingClassifier
    * 0.5047 - AdaBoostClassifier
    * 0.5238 - XGBClassifier

    **Logistic Regression model has the highest performance over time**

* Which model has the highest Sharpe ratio?
    * **1.0477556003702655 - Logistic Regression**
    * 1.0383279828647594 - RandomForestClassifier
    * 0.9573016833623034 - GradientBoostingClassifier
    * 0.8633694598183222 - AdaBoostClassifier
    * 1.0136266691392073 - XGBClassifier

    **Logistic Regression model has the highest sharpe ratio**

#### 2. Save the selected model with the `joblib` library to avoid retraining every time you wish to use it.

In [34]:
# Use the library to save the model that you want to use for trading
joblib.dump(log_model, 'log_model.pkl')

['log_model.pkl']

## Part 3: Implement the strongest model using the Alpaca API

### Develop the Algorithm


#### 1. Use the provided code to ping the Alpaca API and create the DataFrame needed to feed data into the model.
   * This code will also store the correct feature data in `X` for later use.

In [35]:
# Create the list of tickers
ticker_list = ['FB','AMZN','AAPL','NFLX', 'GOOGL', 'MSFT', 'TSLA']

# Define Dates
beg_date = '2022-05-27'
end_date = '2022-05-27'

# Convert the dates to the format the Alpaca API requires
start =  pd.Timestamp(f'{beg_date} 09:30:00-0400', tz='America/New_York').replace(hour=9, minute=30, second=0).astimezone('GMT').isoformat()[:-6]+'Z'
end   =  pd.Timestamp(f'{end_date} 16:00:00-0400', tz='America/New_York').replace(hour=15, minute=0, second=0).astimezone('GMT').isoformat()[:-6]+'Z'
timeframe='1Min'

# Pull the prices for the chosen window from the Alpaca API
prices = api.get_bars(ticker_list, timeframe=timeframe, start=start, end=end).df
prices.ffill(inplace=True)

prices.head()

timestamp,open,high,low,close,volume,trade_count,vwap,symbol
2022-05-27 13:30:00+00:00,145.39,145.74,145.26,145.54,1718145,16997,145.40178,AAPL
2022-05-27 13:31:00+00:00,145.56,146.13,145.56,146.0,656681,5929,145.872124,AAPL
2022-05-27 13:32:00+00:00,146.0,146.23,145.93,146.18,411564,3584,146.068032,AAPL
2022-05-27 13:33:00+00:00,146.18,146.205,146.02,146.045,374322,3481,146.121218,AAPL
2022-05-27 13:34:00+00:00,146.04,146.11,145.751,145.79,505251,3876,145.933648,AAPL


In [36]:
# Create an empty DataFrame for closing prices
df_closing_prices = pd.DataFrame()

# Fetch the closing prices for each symbol in the prices DataFrame. 
for ticker in ticker_list:
    df_closing_prices[ticker] = prices.loc[prices['symbol'] == ticker]['close'].iloc[-11:]

df_closing_prices.head(20)

timestamp,FB,AMZN,AAPL,NFLX,GOOGL,MSFT,TSLA
2022-05-27 18:50:00+00:00,193.98,2273.6178,148.695,193.5,2238.36,270.77,754.9
2022-05-27 18:51:00+00:00,194.18,2275.54,148.69,193.56,,270.845,755.7538
2022-05-27 18:52:00+00:00,194.23,2276.97,148.6999,193.65,2239.29,270.91,756.1
2022-05-27 18:53:00+00:00,194.1603,2276.855,148.655,193.545,2238.19,270.7876,755.21
2022-05-27 18:54:00+00:00,194.09,2275.9443,148.62,193.45,2238.01,270.77,755.3838
2022-05-27 18:55:00+00:00,194.22,2278.0,148.68,193.51,2239.0,270.92,755.1408
2022-05-27 18:56:00+00:00,194.2694,2279.0,148.7217,193.71,,270.945,755.31
2022-05-27 18:57:00+00:00,194.21,2282.3515,148.77,193.7763,2240.7111,270.98,755.44
2022-05-27 18:58:00+00:00,194.19,2279.81,148.795,193.87,2240.0,271.0445,755.5744
2022-05-27 18:59:00+00:00,194.25,2279.29,148.765,193.89,2240.0,271.0934,755.515


In [37]:
# Create list of momentums
list_of_momentums = [1,5,10]

for i in list_of_momentums:  
    # Compute percentage change for each one of the momentums in the momentum list
    returns_temp = df_closing_prices.pct_change(i)
    # Unstack the returns 
    returns_temp = pd.DataFrame(returns_temp.unstack())
    name = f'{i}_m_returns'
    returns_temp.rename(columns={0: name}, inplace = True)
    # Reset the index so we can merge based on index
    returns_temp.reset_index(inplace = True)
    # Merge newly computed returns with previously created returns
    if i ==1:
        returns = returns_temp
    else:
        # The bars returned by get_bars are indexed by 'timestamp', so merge on that name
        returns = pd.merge(returns,returns_temp,left_on=['level_0', 'timestamp'],right_on=['level_0', 'timestamp'], how='left', suffixes=('_original', 'right'))

# Drop nulls and set index
returns.dropna(axis=0, how='any', inplace=True)
returns.set_index(['level_0', 'timestamp'], inplace=True)

# Generate feature data and preview first 10 rows.
X = returns
X.head(10)

#### 2. Using `joblib`, load the chosen model.

In [None]:
# Load the previously trained and saved model using joblib
# YOUR CODE HERE

#### 3. Use the model file to make predictions:
* Use `predict` on `X` and save this as `y_pred`.
* Convert `y_pred` to a DataFrame, setting the index to the index of `X`.
* Rename the column 0 to 'buy', be sure to set `inplace =True`.

In [None]:
# Use the model file to predict on X
# YOUR CODE HERE

# Convert y_pred to a dataframe, set the index to the index of X
# YOUR CODE HERE

# Rename the column 0 to 'buy', be sure to set inplace =True
# YOUR CODE HERE

#### 4. Filter the stocks where 'buy' is equal to 1, saving the filter as `y_pred`.

In [None]:
# Filter the stocks where 'buy' is equal to 1
# YOUR CODE HERE
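
Steps 2 to 4 can be sketched as one small helper. This is a possible shape for the solution rather than the required one: `select_buys` is a hypothetical name, and the stand-in classifier below only exists so the example runs without the saved model file. In the notebook the model would come from `joblib.load('log_model.pkl')`.

```python
import pandas as pd


def select_buys(model, X):
    """Predict on X and return only the rows labelled 1, in a one-column 'buy' frame."""
    y_pred = pd.DataFrame(model.predict(X), index=X.index)
    y_pred.rename(columns={0: "buy"}, inplace=True)
    return y_pred[y_pred["buy"] == 1]


class PositiveMomentumModel:
    """Hypothetical stand-in for the fitted classifier: predicts 1 if 1_m_returns > 0."""

    def predict(self, X):
        return (X["1_m_returns"] > 0).astype(int).to_numpy()


# Hypothetical feature rows indexed the same way as X in the notebook
index = pd.MultiIndex.from_tuples(
    [("AAA", "18:59"), ("BBB", "18:59"), ("CCC", "18:59")],
    names=["level_0", "timestamp"],
)
X_demo = pd.DataFrame(
    {
        "1_m_returns": [0.0010, -0.0020, 0.0005],
        "5_m_returns": [0.0030, -0.0010, 0.0002],
        "10_m_returns": [0.0040, 0.0010, -0.0003],
    },
    index=index,
)

y_pred = select_buys(PositiveMomentumModel(), X_demo)
print(y_pred)
```

Keeping the index of `X` on the prediction frame is what lets the next step read the tickers back with `y_pred.index.get_level_values(0)`.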

#### 5. Using the `y_pred` filter, create a dictionary called `buy_dict` and assign 'n' to each Ticker (key value) as a placeholder.

In [None]:
# Create dictionary from y_pred and assign a 'n' to each of them for now as a placeholder.
buy_dict = dict.fromkeys(y_pred.index.get_level_values(0), 'n')
buy_dict

#### 6. Obtain the total available equity in your account from the Alpaca API and store in a variable called `total_capital`. You will split the capital equally between all selected stocks per the CIO's request.

In [None]:
# Pull the total available equity in our account from the  Alpaca API
# YOUR CODE HERE

In [None]:
# Compute capital per stock, divide equity in account by number of stocks
# Use Alpaca API to pull the equity in the account
if len(buy_dict) > 0:
    capital_per_stock = float(total_capital)/ len(buy_dict)
else:
    capital_per_stock = 0
print(f'Capital per stock: {capital_per_stock}')

#### 7. Use a for-loop to iterate through `buy_dict` to determine the number of shares you need to buy for each ticker.

In [None]:
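# Use for loop to iterate through dictionary of buys 
# Determine the number of shares we need to buy for each ticker
for ticker in buy_dict:
    try:
        # Latest close for this ticker, taken from the wide closing-price frame
        last_close = float(df_closing_prices[ticker].iloc[-1])
        buy_dict[ticker] = int(capital_per_stock // last_close)
    except (KeyError, IndexError, ValueError, ZeroDivisionError):
        # Leave the 'n' placeholder when no usable price is available
        pass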

print(buy_dict)
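
The equal-split sizing in steps 6 and 7 can be checked in isolation with plain numbers. The sketch below assumes the account equity arrives as a string (hence the `float(...)` conversion, as in the cell above) and uses the unrounded last price; `shares_per_ticker` and the prices are hypothetical.

```python
import math


def shares_per_ticker(total_capital, last_prices):
    """Split capital equally across tickers and return whole-share quantities."""
    if not last_prices:
        return {}
    # Every selected ticker gets the same slice, even if its price turns out unusable
    capital_per_stock = float(total_capital) / len(last_prices)
    quantities = {}
    for ticker, price in last_prices.items():
        if price is None or math.isnan(price) or price <= 0:
            continue  # skip tickers without a usable price
        quantities[ticker] = int(capital_per_stock // price)
    return quantities


# Hypothetical equity and last closes; CCC has no usable price
demo = shares_per_ticker("100000", {"AAA": 149.64, "BBB": 2301.99, "CCC": float("nan")})
print(demo)
```

Flooring to whole shares leaves some cash unspent; that is usually preferable to rounding up and exceeding the available capital.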

#### 8. Cancel all previous orders in the Alpaca API (so you don't buy more than intended) and sell all currently held stocks to close all positions.

In [None]:
# Cancel all previous orders in the Alpaca API
# YOUR CODE HERE

# Sell all currently held stocks to close all positions
# YOUR CODE HERE

#### 9. Iterate through `buy_dict` and send a buy order for each ticker with their corresponding number of shares.

In [None]:
# Iterate through `buy_dict` and send a buy order for each ticker with a corresponding number of shares:
# YOUR CODE HERE
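
Steps 8 and 9 could look roughly like the helper below. It is a sketch under assumptions: it expects an object exposing `cancel_all_orders()`, `close_all_positions()`, and `submit_order(...)` (the notebook's `api` object is the intended candidate), it sends plain market orders, and `rebalance` and `RecordingAPI` are hypothetical names. The recording stand-in lets the logic run without contacting Alpaca.

```python
def rebalance(api, buy_dict):
    """Cancel open orders, close all positions, then buy the requested quantities."""
    api.cancel_all_orders()
    api.close_all_positions()
    submitted = []
    for ticker, qty in buy_dict.items():
        if not isinstance(qty, int) or qty <= 0:
            continue  # skip 'n' placeholders and zero-share entries
        api.submit_order(
            symbol=ticker, qty=qty, side="buy", type="market", time_in_force="day"
        )
        submitted.append(ticker)
    return submitted


class RecordingAPI:
    """Hypothetical stand-in that records calls instead of reaching Alpaca."""

    def __init__(self):
        self.calls = []

    def cancel_all_orders(self):
        self.calls.append(("cancel_all_orders",))

    def close_all_positions(self):
        self.calls.append(("close_all_positions",))

    def submit_order(self, **kwargs):
        self.calls.append(("submit_order", kwargs["symbol"], kwargs["qty"]))


fake = RecordingAPI()
sent = rebalance(fake, {"AAA": 222, "BBB": "n", "CCC": 0})
print(sent)
print(fake.calls)
```

Closing positions is itself done through orders that may take a moment to fill, so a live version may need to wait or check positions before buying.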

### Automate the algorithm

#### 1. Make a function called `trade()` that incorporates all of the steps above.

In [None]:
# Add all of the steps conducted above into the function trade
def trade():

    ticker_list = ['FB','AMZN','AAPL','NFLX', 'GOOGL', 'MSFT', 'TSLA']
    # Notice that we remove the start and end variables since we want the latest prices.
    timeframe='1Min'
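    # Keep only the last 11 rows (enough for the 10-minute momentum) every time we pull new data.
    # NOTE: this call uses the `get_barset` interface, whereas Part 1 uses `get_bars`;
    # the two return differently shaped frames, so adapt this block to the SDK version you have installed.
    prices = api.get_barset(ticker_list, "minute").df.iloc[-11:]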
    prices.ffill(inplace=True)   

    # Create an empty DataFrame for closing prices
    df_closing_prices = pd.DataFrame()

    # Fetch the closing prices of our tickers
    df_closing_prices["FB"] = prices["FB"]["close"]
    df_closing_prices["AMZN"] = prices["AMZN"]["close"]
    df_closing_prices["AAPL"] = prices["AAPL"]["close"]
    df_closing_prices["NFLX"] = prices["NFLX"]["close"]
    df_closing_prices["GOOGL"] = prices["GOOGL"]["close"]
    df_closing_prices['MSFT'] = prices['MSFT']["close"]
    df_closing_prices['TSLA'] = prices['TSLA']["close"]
    print(df_closing_prices.head())
    
    # Loop through momentums to build new DataFrame
    list_of_momentums = [1,5,10]
    for i in list_of_momentums:   
        returns_temp = df_closing_prices.pct_change(i)
        returns_temp = pd.DataFrame(returns_temp.unstack())
        name = f'{i}_m_returns'
        returns_temp.rename(columns={0: name}, inplace = True)
        returns_temp.reset_index(inplace = True)
        if i ==1:
            returns = returns_temp
        else:
            returns = pd.merge(returns,returns_temp,left_on=['level_0', 'time'],right_on=['level_0', 'time'], how='left', suffixes=('_original', 'right'))

    # Drop nulls and set index            
    returns.dropna(axis=0, how='any', inplace=True)
    returns.set_index(['level_0', 'time'], inplace=True)

    # Preprocess data for model
    # YOUR CODE HERE

    # Create the `buy_dict` object
    # YOUR CODE HERE
    
    # Split capital between stocks and determine buy or sell
    # YOUR CODE HERE

    
    # Cancel pending orders and close positions
    # YOUR CODE HERE
   
    
    # Submit orders
    # YOUR CODE HERE


#### 2. Import Python's schedule module.

In [None]:
# Import Python's schedule module 
# YOUR CODE HERE

#### 3. Use the "schedule" module to automate the algorithm:
* Clear the schedule with `.clear()`.
* Define a schedule to run the trade function every minute at 5 seconds past the minute mark (e.g. `10:31:05`).
* Use the Alpaca API to check whether the market is open.
* Use the `run_pending()` function of the `schedule` module to execute the schedule you defined while the market is open.

In [None]:
# Clear the schedule
# YOUR CODE HERE

# Define a schedule to run the trade function every minute at 5 seconds past the minute mark (e.g. 10:31:05)
# YOUR CODE HERE

# Use the Alpaca API to check whether the market is open
# YOUR CODE HERE

# Use run_pending() function inside schedule to execute the schedule you defined as long as the market is open
# YOUR CODE HERE
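
One way to structure the loop in step 3 is to pass the scheduler and the API object into a small function, which keeps the logic testable without the market being open. The sketch assumes the API object offers `get_clock()` returning something with an `is_open` attribute and that the scheduler offers `run_pending()`; `run_while_market_open` and the two fake classes are hypothetical.

```python
import time


def run_while_market_open(api, scheduler, sleep=time.sleep, poll_seconds=1):
    """Run pending scheduled jobs until the clock reports the market is closed."""
    ticks = 0
    while api.get_clock().is_open:
        scheduler.run_pending()
        sleep(poll_seconds)
        ticks += 1
    return ticks


# Intended use in the notebook (requires the third-party `schedule` package):
#   import schedule
#   schedule.clear()
#   schedule.every().minute.at(":05").do(trade)
#   run_while_market_open(api, schedule)


class FakeClockAPI:
    """Hypothetical stand-in: reports the market open for a fixed number of polls."""

    def __init__(self, open_polls):
        self.open_polls = open_polls

    def get_clock(self):
        self.open_polls -= 1
        return type("Clock", (), {"is_open": self.open_polls >= 0})()


class FakeScheduler:
    """Hypothetical stand-in that counts how often run_pending() is called."""

    def __init__(self):
        self.runs = 0

    def run_pending(self):
        self.runs += 1


fake_scheduler = FakeScheduler()
ticks = run_while_market_open(FakeClockAPI(3), fake_scheduler, sleep=lambda seconds: None)
print(ticks, fake_scheduler.runs)
```

Polling the clock once per second is simple but costs one API request per poll; a longer `poll_seconds` or a cached clock result may be preferable if request limits are a concern.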