# High Frequency Trading Algorithm

You have been tasked by the investment firm Renaissance High Frequency Trading (RHFT) to develop an automated trading strategy utilizing a combination of machine learning algorithms and high frequency algorithms. RHFT wants this new algorithm to be based on stock market data of the 30 stocks in the Dow Jones at the minute level and to conduct buys and sells every minute based on 1 min, 5 min, and 10 min Momentum. The CIO asked you to choose the Machine Learning Algorithm best suited for this task and wants you to execute the trades via Alpaca's API.

## Part 1: Prepare the data for training and testing

### Initial Set-Up

In [1]:
import os
from pathlib import Path
import alpaca_trade_api as tradeapi
import pandas as pd
import numpy as np
import datetime
import time
from dotenv import load_dotenv


In [2]:
# Load .env environment variables
load_dotenv()

True

In [3]:
# Set Alpaca API key and secret
alpaca_api_key = os.getenv("ALPACA_API_KEY")
alpaca_secret_key = os.getenv("ALPACA_SECRET_KEY")


In [4]:
# Create the Alpaca API object, specifying use of the paper trading account:
api = tradeapi.REST(
    alpaca_api_key,
    alpaca_secret_key,
    base_url = 'https://paper-api.alpaca.markets',
    api_version = "v2"
)

In [5]:
# Obtain and check account information
account = api.get_account()
print(account)

Account({   'account_blocked': False,
    'account_number': 'PA3TN3SZNFYP',
    'accrued_fees': '0',
    'buying_power': '75038.78',
    'cash': '100000',
    'created_at': '2022-02-28T21:44:51.926672Z',
    'crypto_status': 'ACTIVE',
    'currency': 'USD',
    'daytrade_count': 0,
    'daytrading_buying_power': '0',
    'equity': '100000',
    'id': '51d1e927-eb04-4ac3-9d0f-a815d22a43ec',
    'initial_margin': '62480.61',
    'last_equity': '100000',
    'last_maintenance_margin': '0',
    'long_market_value': '0',
    'maintenance_margin': '0',
    'multiplier': '2',
    'non_marginable_buying_power': '0',
    'pattern_day_trader': False,
    'pending_transfer_in': '0',
    'portfolio_value': '100000',
    'regt_buying_power': '75038.78',
    'short_market_value': '0',
    'shorting_enabled': True,
    'sma': '100000',
    'status': 'ACTIVE',
    'trade_suspended_by_user': False,
    'trading_blocked': False,
    'transfers_blocked': False})


### Data Generation



#### 1. Create a ticker list, beginning and end dates, and timeframe interval.


In [6]:
# Define a list of tickers
ticker_list = ['AMZN','AAPL','GOOGL']
# declare begin and end date strings
beg_date = '2021-01-05'
end_date = '2021-01-05'
# we convert begin and end date to formats that the ALPACA API requires
start =  pd.Timestamp(f'{beg_date} 09:30:00-0400', tz='America/New_York').replace(hour=9, minute=30, second=0).astimezone('GMT').isoformat()[:-6]+'Z'
end   =  pd.Timestamp(f'{end_date} 16:00:00-0400', tz='America/New_York').replace(hour=16, minute=0, second=0).astimezone('GMT').isoformat()[:-6]+'Z'
# We set the time frequency at which we want to pull prices
timeframe='1Min'
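
As a sanity check on the conversion above, the sketch below builds the same kind of UTC string with `tz_convert` rather than string slicing. The helper name `to_utc_iso` is made up for this illustration, and it assumes the goal is simply to express a New York wall-clock time in UTC.

```python
import pandas as pd

def to_utc_iso(date_str, hour, minute):
    """Return a New York wall-clock moment as a UTC ISO string ending in 'Z'."""
    # Localize the wall-clock time to New York, then convert to UTC
    local = pd.Timestamp(f"{date_str} {hour:02d}:{minute:02d}:00", tz="America/New_York")
    return local.tz_convert("UTC").strftime("%Y-%m-%dT%H:%M:%SZ")

start_example = to_utc_iso("2021-01-05", 9, 30)
end_example = to_utc_iso("2021-01-05", 16, 0)
print(start_example, end_example)
```

In January New York is five hours behind UTC, so the 9:30 open maps to 14:30 UTC.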


#### 2. Ping the Alpaca API for the data and store it in a DataFrame called `prices` by using the `get_barset` function combined with the `df` method from the Alpaca Trade SDK.

In [7]:
# Pull prices from the ALPACA API
prices = api.get_barset(ticker_list, timeframe,limit=1000, start=start, end=end).df

#### 3. Store only the close prices from the `prices` DataFrame in a new DataFrame called `df_closing_prices`, then view the head and tail to confirm the following:
* First price for each stock on the open at 9:30 Eastern Time.
* Last price for the day on the close at 3:59 pm Eastern Time.

In [8]:
# Create a DataFrame for the closing prices of each ticker, storing each in a column of df_closing_prices named after that ticker
df_closing_prices = pd.DataFrame({
    "GOOGL": prices["GOOGL"].close,
    "AAPL": prices["AAPL"].close,
    "AMZN": prices["AMZN"].close,
    }, index=prices.index
)


In [9]:
# Preview first five rows
df_closing_prices.head(5)

time,GOOGL,AAPL,AMZN
2021-01-05 09:30:00-05:00,1724.17,129.485,3172.98
2021-01-05 09:31:00-05:00,1724.05,130.06,3177.81
2021-01-05 09:32:00-05:00,1721.61,130.02,3175.47
2021-01-05 09:33:00-05:00,,130.12,3179.36
2021-01-05 09:34:00-05:00,1720.3,130.51,3184.015


In [10]:
# Preview last five rows
df_closing_prices.tail(5)

time,GOOGL,AAPL,AMZN
2021-01-05 15:56:00-05:00,1738.15,130.85,3219.84
2021-01-05 15:57:00-05:00,1738.99,131.01,3222.7
2021-01-05 15:58:00-05:00,1738.84,130.99,3221.18
2021-01-05 15:59:00-05:00,1740.57,130.965,3219.67
2021-01-05 16:00:00-05:00,,131.14,


#### 4. When viewing the head and tail, you'll notice several `NaN` values.
* Alpaca reports minutes in which no trades occurred as missing (`NaN`).
* These values must be filled: we use pandas' `ffill()` function to "forward fill" those prices with the previous values (since the price has not changed).
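
A minimal sketch of forward filling, using the GOOGL closes from the preview above (the 09:33 bar is the missing one). The variable names are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

# GOOGL closes from the preview above; 09:33 had no trades
closes = pd.Series(
    [1724.17, 1724.05, 1721.61, np.nan, 1720.30],
    index=pd.date_range("2021-01-05 09:30", periods=5, freq="min", tz="America/New_York"),
    name="GOOGL",
)

# Each NaN takes the most recent non-missing value before it
filled = closes.ffill()
print(filled)
```

Note that a missing value in the very first row would stay missing, because there is no earlier price to carry forward.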


In [11]:
# Use Pandas' forward fill function to fill missing values (be sure to set inplace=True)
df_closing_prices.ffill(inplace=True)
df_closing_prices.head()

time,GOOGL,AAPL,AMZN
2021-01-05 09:30:00-05:00,1724.17,129.485,3172.98
2021-01-05 09:31:00-05:00,1724.05,130.06,3177.81
2021-01-05 09:32:00-05:00,1721.61,130.02,3175.47
2021-01-05 09:33:00-05:00,1721.61,130.12,3179.36
2021-01-05 09:34:00-05:00,1720.3,130.51,3184.015


### Computing Returns

#### 1. Compute the percentage change values for 1 minute as follows:
* Create a variable called `forecast` to hold the forecast, in this case `1` for 1 minute.
* Use the `pct_change` function, passing in the `forecast`, on the `df_closing_prices` DataFrame, storing the newly generated DataFrame in a variable called `returns`.
* Convert the `returns` DataFrame to show forward returns by passing `-(forecast)` into the `shift` function.
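
The two steps can be sketched on the first three AAPL closes from the preview above. `pct_change` gives the return realized over the previous minute; shifting by `-forecast` moves each value to the row where that minute starts. The variable names `backward` and `forward` are illustrative.

```python
import pandas as pd

forecast = 1
aapl = pd.Series([129.485, 130.06, 130.02], name="AAPL")

# Return realized over the previous `forecast` minutes
backward = aapl.pct_change(periods=forecast)

# Same numbers, moved up so each row holds the return of the NEXT minute
forward = backward.shift(-forecast)
print(forward.round(6).tolist())
```

The last row has no following bar, so its forward return is `NaN`.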

In [12]:
# Define a variable to set prediction period
forecast = 1

# Compute the pct_change for 1 min 
returns = df_closing_prices.pct_change(periods=forecast)

# Shift the returns to convert them to forward returns
returns = returns.shift(-(forecast))

# Preview the DataFrame
returns.head()

time,GOOGL,AAPL,AMZN
2021-01-05 09:30:00-05:00,-7e-05,0.004441,0.001522
2021-01-05 09:31:00-05:00,-0.001415,-0.000308,-0.000736
2021-01-05 09:32:00-05:00,0.0,0.000769,0.001225
2021-01-05 09:33:00-05:00,-0.000761,0.002997,0.001464
2021-01-05 09:34:00-05:00,0.003061,0.000651,0.002074


##### Note: 
> You can verify these returns are computed correctly by analyzing the first observation for AAPL:
> * 9:30 am for 0.004441.
 
> How is that number computed? 
 
> * The price of AAPL at 9:30 is 129.485
> * The price of AAPL at 9:31 is 130.06

> Which gives you:

> * (130.06 - 129.485) / 129.485 = 0.004441
 

#### 2. Convert the DataFrame into long form for merging later using `unstack` and `reset_index`.

In [13]:
# Use unstack() to bring the data into long format and save the output as a DataFrame
returns = pd.DataFrame(returns.unstack(level=0))

# Rename the column to make it easier to identify:
column_name = f'F_{forecast}_m_returns'
returns.rename(columns={0: column_name}, inplace = True)

# Reset the index of the dataframe for merging later (be sure to set inplace=True)
returns.reset_index(inplace=True)

In [14]:
# Preview the first five rows
returns.head()

,level_0,time,F_1_m_returns
0,GOOGL,2021-01-05 09:30:00-05:00,-7e-05
1,GOOGL,2021-01-05 09:31:00-05:00,-0.001415
2,GOOGL,2021-01-05 09:32:00-05:00,0.0
3,GOOGL,2021-01-05 09:33:00-05:00,-0.000761
4,GOOGL,2021-01-05 09:34:00-05:00,0.003061


In [15]:
# Preview the last five rows
returns.tail()

,level_0,time,F_1_m_returns
1168,AMZN,2021-01-05 15:56:00-05:00,0.000888
1169,AMZN,2021-01-05 15:57:00-05:00,-0.000472
1170,AMZN,2021-01-05 15:58:00-05:00,-0.000469
1171,AMZN,2021-01-05 15:59:00-05:00,0.0
1172,AMZN,2021-01-05 16:00:00-05:00,
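
A small sketch of the wide-to-long conversion on a two-ticker, two-row frame. When the index has a single level, `DataFrame.unstack()` returns a Series keyed by (column name, index value); the unnamed column level becomes `level_0` after `reset_index`. The frame `wide` is a toy example built from values in the previews above.

```python
import pandas as pd

wide = pd.DataFrame(
    {"GOOGL": [-0.00007, -0.001415], "AAPL": [0.004441, -0.000308]},
    index=pd.Index(["09:30", "09:31"], name="time"),
)

# unstack() on a single-level index returns a Series keyed by (column, time)
long = wide.unstack().rename("F_1_m_returns").reset_index()
print(long)
```

All rows for one ticker come first, followed by all rows for the next, which is why the head shows only GOOGL and the tail only AMZN.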


#### 3. Compute the 1, 5, 10 minute momentums that will be used to predict the forward returns, then merge them with the forward returns as follows:
* Create the list of momentums: `list_of_momentums = [1,5,10]`.
* Write a for-loop to loop through the `list_of_momentums`, passing each one to `pct_change` on `df_closing_prices`.
* With each loop, the temporary DataFrame `returns_temp` will need to be prepped with `unstack` and `reset_index`, then merged as a new column into the original `returns` DataFrame from the prior step.
* Complete this step by dropping the null values from `returns` and creating a multi-index based on date and ticker.
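
The sketch below shows why dropping nulls removes the first ten minutes of each ticker: a window of `i` minutes leaves the first `i` rows undefined. The prices and the ticker name `XYZ` are hypothetical, and the features are assembled with a dict comprehension rather than the merge used in the notebook.

```python
import pandas as pd

# Hypothetical prices: 15 one-minute closes for a single ticker
prices = pd.Series([100.0 + 0.1 * k for k in range(15)], name="XYZ")

# One momentum column per look-back window
features = pd.DataFrame({f"{i}_m_returns": prices.pct_change(i) for i in [1, 5, 10]})

# A window of i minutes leaves the first i rows undefined
print(features.isna().sum().to_dict())
print(len(features.dropna()))
```

With 15 bars only 5 complete rows remain, since the 10-minute column needs ten earlier bars.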

In [16]:
# Create the list of momentum windows (in minutes) used as features
list_of_momentums = [1,5,10]
for i in list_of_momentums:   
    # Compute percentage change for each one of the momentums in the momentum list
    pct_chg = df_closing_prices.pct_change(i)
    
    # Unstack the returns and save the output as a DataFrame called returns_temp 
    returns_temp = pd.DataFrame(pct_chg.unstack(level=0))
    
    # Rename the column to make it easier to identify:
    column_name = f'{i}_m_returns'
    returns_temp.rename(columns={0: column_name}, inplace = True)
    
    # Reset the index so we can merge based on index
    returns_temp.reset_index(inplace=True)
    
    # Merge returns_temp  with the original returns 
    returns = pd.merge(returns,returns_temp,left_on=['level_0', 'time'],right_on=['level_0', 'time'], how='left', suffixes=('_original', 'right'))

In [17]:
returns.head()

,level_0,time,F_1_m_returns,1_m_returns,5_m_returns,10_m_returns
0,GOOGL,2021-01-05 09:30:00-05:00,-7e-05,,,
1,GOOGL,2021-01-05 09:31:00-05:00,-0.001415,-7e-05,,
2,GOOGL,2021-01-05 09:32:00-05:00,0.0,-0.001415,,
3,GOOGL,2021-01-05 09:33:00-05:00,-0.000761,0.0,,
4,GOOGL,2021-01-05 09:34:00-05:00,0.003061,-0.000761,,


In [18]:
# Use dropna() to get rid of those missing observations.
returns.dropna(inplace=True)

# Create a multi index based on level_0 and time
returns.set_index(['level_0','time'], inplace=True)
returns.head()

level_0,time,F_1_m_returns,1_m_returns,5_m_returns,10_m_returns
GOOGL,2021-01-05 09:40:00-05:00,-0.000196,0.00116,0.003037,0.003848
GOOGL,2021-01-05 09:41:00-05:00,-0.000101,-0.000196,0.001174,0.003721
GOOGL,2021-01-05 09:42:00-05:00,0.001659,-0.000101,0.001398,0.005042
GOOGL,2021-01-05 09:43:00-05:00,0.0,0.001659,0.002522,0.006709
GOOGL,2021-01-05 09:44:00-05:00,0.003139,0.0,0.002522,0.007475


## Part 2: Train and Compare Multiple Machine Learning Algorithms

 In this section, you'll train each of the requested algorithms and compare performance. Be sure to use the same parameters and training steps for each model. This is necessary to compare each model accurately.

### Preprocessing Data

#### 1. Generate your feature data (`X`) and target data (`y`):
* Create a dataframe `X` that contains all the columns from the returns dataframe that will be used to predict `F_1_m_returns`.
* Create a variable, called `y`, that is equal to 1 if `F_1_m_returns` is larger than 0 and 0 otherwise. This will be our target variable.
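
As a sketch, the same 0/1 target can be produced without an explicit loop by comparing the column to zero and casting to integers. The values are the first five `F_1_m_returns` from the preview above; `y_example` is an illustrative name.

```python
import pandas as pd

# Forward returns taken from the first rows of the `returns` preview above
f_1_m_returns = pd.Series([-0.000196, -0.000101, 0.001659, 0.0, 0.003139])

# True where the next-minute return is positive, then cast to 0/1
y_example = (f_1_m_returns > 0).astype(int)
print(y_example.tolist())
```

A return of exactly zero is labeled 0, the same as a negative return.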

In [22]:
from sklearn.model_selection import train_test_split
# Create a separate dataframe for features and define the target variable as a binary target
X = returns.iloc[:,1:4]

# Create the target variable
y = []

# Loop through the returns["F_1_m_returns"] data and append 0 or 1 to y based on returns
for row in returns["F_1_m_returns"]:
    if row > 0:
        y.append(1)

    elif row <= 0:
        y.append(0)

In [23]:
#X.head()
y[:10]

[0, 0, 1, 0, 1, 0, 0, 0, 0, 1]

##### Note:
> Notice that we don't shuffle when splitting the dataset into training and testing sets. 

> We want to keep the original ordering of the data, so we don't end up using observations from the future to predict past observations.

> This is a critical mistake known as look-ahead bias.

#### 2. Use the `train_test_split` function to split the dataset into a training and testing dataset, with the first 75% used for training (the default split)
* Set the shuffle parameter to False, so that the earlier rows are used for training, to prevent look-ahead bias.
* Make sure you have these 4 variables: `X_train`, `X_test`, `y_train`, `y_test`. 
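
What an unshuffled split amounts to can be sketched with plain slicing. The helper `chronological_split` is hypothetical and only mirrors the idea (last 25% held out, order preserved); it is not scikit-learn's implementation.

```python
import math

def chronological_split(rows, test_size=0.25):
    """Split an ordered sequence into (train, test) without shuffling."""
    n_test = math.ceil(len(rows) * test_size)
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]

# Stand-in for 1,140 ordered observations
minutes = list(range(1140))
train, test = chronological_split(minutes)
print(len(train), len(test), train[-1] < test[0])
```

One caveat worth keeping in mind: `returns` is ordered by ticker first and by time second, so an unshuffled split preserves time order within each ticker but not across tickers.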

In [24]:
# Split the dataset without shuffling
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1, shuffle=False)

#### 3. Use the `Counter` class to check the distribution of the data. 
* The result of `Counter({0: 552, 1: 303})` reveals the data is indeed unbalanced.

In [26]:
# Use Counter to count the number 1s and 0 in y_train
from collections import Counter
Counter(y_train)

Counter({0: 552, 1: 303})

#### 4. Balance the dataset with the `RandomOverSampler` class from the imbalanced-learn library, setting `random_state=1`.

In [28]:
# Use RandomOverSampler to resample the dataset using random_state=1
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=1)
X_resampled, y_resampled = ros.fit_resample(X_train, y_train)
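
The idea behind random oversampling can be sketched with the standard library: keep every row, then draw extra minority-class rows with replacement until the classes are the same size. The function `random_oversample` is a hypothetical stand-in written for this illustration, not the imbalanced-learn implementation.

```python
import random
from collections import Counter

def random_oversample(labels, seed=1):
    """Return row positions in which every class appears as often as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for pos, label in enumerate(labels):
        by_class.setdefault(label, []).append(pos)
    target = max(len(positions) for positions in by_class.values())
    picked = []
    for positions in by_class.values():
        picked.extend(positions)
        # Draw extra rows with replacement until the class reaches the target size
        picked.extend(rng.choices(positions, k=target - len(positions)))
    return picked

# Same class counts as the training labels: 552 zeros and 303 ones
labels = [0] * 552 + [1] * 303
resampled = [labels[pos] for pos in random_oversample(labels)]
print(Counter(resampled))
```

Only the training set is resampled; the test set keeps its original class balance so the evaluation reflects real conditions.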

#### 5. Check the distribution once again with `Counter`. The new result of `Counter({0: 552, 1: 552})` shows the data is now balanced.

In [29]:
# Use Counter again to verify imbalance removed
Counter(y_resampled)

Counter({0: 552, 1: 552})

### Machine Learning

#### 1. The first cells in this section provide an example of how to fit and train your model using the `LogisticRegression` model from sklearn:
* Import select model.
* Instantiate model object.
* Fit the model to the resampled data - `X_resampled` and `y_resampled`.
* Generate predictions from the model using `X_test`.
* Print the classification report.

In [33]:
#imports ML
from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier

In [34]:
# Create a LogisticRegression model and train it on the X_resampled data we created before
log_model = LogisticRegression()
log_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = log_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

# Calculate Sharpe Ratio
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.6397    0.4286    0.5133       203
           1     0.2215    0.4024    0.2857        82

    accuracy                         0.4211       285
   macro avg     0.4306    0.4155    0.3995       285
weighted avg     0.5194    0.4211    0.4478       285

Balanced Accuracy Score: 0.4155052264808362
Sharpe Ratio: 1.0467035087808378
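
Balanced accuracy is the mean of the per-class recall values, so it can be checked by hand against the report above. The confusion-matrix counts below are inferred from the recall, precision and support columns of that report; they are not printed by the notebook itself.

```python
# Counts inferred from the LogisticRegression report above
true_negatives, false_positives = 87, 116   # the 203 actual "0" rows
false_negatives, true_positives = 49, 33    # the 82 actual "1" rows

# Recall per class: share of each actual class that was predicted correctly
recall_0 = true_negatives / (true_negatives + false_positives)
recall_1 = true_positives / (true_positives + false_negatives)

balanced_accuracy = (recall_0 + recall_1) / 2
accuracy = (true_negatives + true_positives) / 285
print(round(recall_0, 4), round(recall_1, 4), round(balanced_accuracy, 4), round(accuracy, 4))
```

Because the test set holds far more 0s than 1s, plain accuracy and balanced accuracy can differ noticeably between models.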


#### 2. Use the same approach as above to train and test the following ML Algorithms:
* [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
* [GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)
* [AdaBoostClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html)
* [XGBClassifier](https://xgboost.readthedocs.io/en/latest/python/python_api.html)

#### RandomForestClassifier

In [35]:
# Create a RandomForestClassifier model and train it on the X_resampled data we created before
rfc_model = RandomForestClassifier()
rfc_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = rfc_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

# Calculate Sharpe Ratio
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.7100    0.6995    0.7047       203
           1     0.2824    0.2927    0.2874        82

    accuracy                         0.5825       285
   macro avg     0.4962    0.4961    0.4961       285
weighted avg     0.5870    0.5825    0.5847       285

Balanced Accuracy Score: 0.49609515799591497
Sharpe Ratio: 0.6519202405202648


#### GradientBoostingClassifier

In [36]:
# Create a GradientBoostingClassifier model and train it on the X_resampled data we created before
gbc_model = GradientBoostingClassifier()
gbc_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = gbc_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

# Calculate Sharpe Ratio
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.7415    0.7488    0.7451       203
           1     0.3625    0.3537    0.3580        82

    accuracy                         0.6351       285
   macro avg     0.5520    0.5512    0.5516       285
weighted avg     0.6324    0.6351    0.6337       285

Balanced Accuracy Score: 0.5512135047458848
Sharpe Ratio: 0.6246950475544242


#### AdaBoostClassifier

In [37]:
# Create a AdaBoostClassifier model and train it on the X_resampled data we created before
abc_model = AdaBoostClassifier()
abc_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = abc_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

# Calculate Sharpe Ratio
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.7360    0.7143    0.7250       203
           1     0.3409    0.3659    0.3529        82

    accuracy                         0.6140       285
   macro avg     0.5385    0.5401    0.5390       285
weighted avg     0.6224    0.6140    0.6180       285

Balanced Accuracy Score: 0.5400696864111498
Sharpe Ratio: 0.6683565722084382


#### XGBClassifier

In [38]:
# Create a XGBClassifier model and train it on the X_resampled data we created before
xgbc_model = XGBClassifier()
xgbc_model.fit(X_resampled, y_resampled)  

# Use the model you trained to predict using X_test
y_pred = xgbc_model.predict(X_test)   

# Print out a classification report to evaluate performance
print(classification_report(y_test, y_pred, digits=4))

# Print out a balanced accuracy score report to evaluate performance
print(f"Balanced Accuracy Score: {balanced_accuracy_score(y_test, y_pred)}")

# Calculate Sharpe Ratio
sharpe_ratio = y_pred.mean() / y_pred.std()
print(f"Sharpe Ratio: {sharpe_ratio}")

              precision    recall  f1-score   support

           0     0.7100    0.6995    0.7047       203
           1     0.2824    0.2927    0.2874        82

    accuracy                         0.5825       285
   macro avg     0.4962    0.4961    0.4961       285
weighted avg     0.5870    0.5825    0.5847       285

Balanced Accuracy Score: 0.49609515799591497
Sharpe Ratio: 0.6519202405202648


### Evaluate the performance of each model


#### 1. Using the classification report for each model, choose the model with the highest precision for use in your algo-trading program.

- 1a Which model produces the highest Accuracy?     
    - **0.5512135047458848 - GradientBoostingClassifier**
    
**Answer: the GradientBoostingClassifier model produces the highest balanced accuracy in the reports above (and also the highest plain accuracy, 0.6351).**     
    
- 1b Which model produces the highest performance over time?
    - **0.5512 - GradientBoostingClassifier**
    
**Answer: the GradientBoostingClassifier model produces the highest performance over time.** 

- 1c Which model produces the highest Sharpe Ratio?
    - **1.0467035087808378 - LogisticRegression**
    
**Answer: the LogisticRegression model produces the highest Sharpe Ratio.** 
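
The "Sharpe Ratio" printed by the cells above divides the mean of the predicted labels by their standard deviation. A Sharpe ratio is more commonly computed on returns, so the sketch below applies the same mean-over-standard-deviation idea to the per-minute returns a signal would have earned. The function `strategy_sharpe` and the sample numbers are hypothetical; the ratio is not annualized and assumes a zero risk-free rate.

```python
from statistics import mean, pstdev

def strategy_sharpe(forward_returns, predictions):
    """Mean over standard deviation of per-minute returns earned by following the signals."""
    # Earn the forward return when the model says buy (1), stay flat (0.0) otherwise
    earned = [r if p == 1 else 0.0 for r, p in zip(forward_returns, predictions)]
    spread = pstdev(earned)
    return mean(earned) / spread if spread > 0 else float("nan")

# Hypothetical forward returns and signals, for illustration only
example = strategy_sharpe([0.01, -0.02, 0.03, 0.00], [1, 0, 1, 0])
print(round(example, 4))
```

In the notebook this would mean pairing `y_pred` with the `F_1_m_returns` values of the test rows rather than using `y_pred` alone.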

#### 2. Save the selected model with the `joblib` library to avoid retraining every time you wish to use it.

In [40]:
# Use the library to save the model that you want to use for trading
import joblib
joblib.dump(log_model, 'log_model.pkl')

['log_model.pkl']

## Part 3: Implement the strongest model using the Alpaca API

### Develop the Algorithm


#### 1. Use the provided code to ping the Alpaca API and create the DataFrame needed to feed data into the model.
   * This code will also store the correct feature data in `X` for later use.

In [41]:
# Create the list of tickers
ticker_list = ['AMZN','AAPL','GOOGL']

# Define Dates
beg_date = '2021-01-06'
end_date = '2021-01-06'

# Convert the dates to the format the Alpaca API requires
start =  pd.Timestamp(f'{beg_date} 09:30:00-0400', tz='America/New_York').replace(hour=9, minute=30, second=0).astimezone('GMT').isoformat()[:-6]+'Z'
end   =  pd.Timestamp(f'{end_date} 16:00:00-0400', tz='America/New_York').replace(hour=15, minute=0, second=0).astimezone('GMT').isoformat()[:-6]+'Z'
timeframe='1Min'

# Use iloc to keep the last 11 bars (the current bar plus 10 minutes of history) every time we pull new data
prices = api.get_barset(ticker_list, "minute", start=start, end=end).df.iloc[-11:]
prices.ffill(inplace=True)   

# Create an empty DataFrame for closing prices
df_closing_prices = pd.DataFrame()

# Fetch the closing prices of our tickers
df_closing_prices["AMZN"] = prices["AMZN"]["close"]
df_closing_prices["AAPL"] = prices["AAPL"]["close"]
df_closing_prices["GOOGL"] = prices["GOOGL"]["close"]
print(df_closing_prices.head(20))

                               AMZN     AAPL    GOOGL
time                                                 
2021-01-06 14:50:00-05:00  3146.960  127.110  1721.82
2021-01-06 14:51:00-05:00  3146.910  127.430  1721.82
2021-01-06 14:52:00-05:00  3147.980  127.720  1723.67
2021-01-06 14:53:00-05:00  3148.570  127.510  1723.67
2021-01-06 14:54:00-05:00  3147.840  127.645  1720.84
2021-01-06 14:55:00-05:00  3150.330  127.920  1720.60
2021-01-06 14:56:00-05:00  3150.610  128.150  1721.10
2021-01-06 14:57:00-05:00  3151.745  127.980  1720.07
2021-01-06 14:58:00-05:00  3149.280  127.850  1720.07
2021-01-06 14:59:00-05:00  3150.840  127.930  1720.48
2021-01-06 15:00:00-05:00  3148.580  127.630  1720.48


In [42]:
# Create list of momentums
list_of_momentums = [1,5,10]

for i in list_of_momentums:  
    # Compute percentage change for each one of the momentums in the momentum list
    returns_temp = df_closing_prices.pct_change(i)
    # Unstack the returns 
    returns_temp = pd.DataFrame(returns_temp.unstack())
    name = f'{i}_m_returns'
    returns_temp.rename(columns={0: name}, inplace = True)
    # Reset the index so we can merge based on index
    returns_temp.reset_index(inplace = True)
    # Merge newly computed returns with previously created returns
    if i ==1:
        returns = returns_temp
    else:
        returns = pd.merge(returns,returns_temp,left_on=['level_0', 'time'],right_on=['level_0', 'time'], how='left', suffixes=('_original', 'right'))

# Drop nulls and set index
returns.dropna(axis=0, how='any', inplace=True)
returns.set_index(['level_0', 'time'], inplace=True)

# Generate feature data and preview first 10 rows.
X = returns
X.head(10)

level_0,time,1_m_returns,5_m_returns,10_m_returns
AMZN,2021-01-06 15:00:00-05:00,-0.000717,-0.000555,0.000515
AAPL,2021-01-06 15:00:00-05:00,-0.002345,-0.002267,0.004091
GOOGL,2021-01-06 15:00:00-05:00,0.0,-7e-05,-0.000778
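
The AMZN row above can be reproduced by hand from the eleven closes printed earlier, which also shows why eleven bars are kept: the 10-minute change needs the current bar plus the bar ten minutes before it. The variable names are illustrative.

```python
import pandas as pd

# AMZN closes from 14:50 to 15:00, copied from the printout above
amzn = pd.Series([3146.960, 3146.910, 3147.980, 3148.570, 3147.840, 3150.330,
                  3150.610, 3151.745, 3149.280, 3150.840, 3148.580])

# Momentum of the most recent bar for each look-back window
latest = {f"{i}_m_returns": amzn.pct_change(i).iloc[-1] for i in [1, 5, 10]}
print({name: round(value, 6) for name, value in latest.items()})
```

Only the last bar has all three momentums defined, so after dropping nulls a single row per ticker remains.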


#### 2. Using `joblib`, load the chosen model.

In [43]:
# Load the previously trained and saved model using joblib
model = joblib.load('log_model.pkl')

#### 3. Use the model file to make predictions:
* Use `predict` on `X` and save this as `y_pred`.
* Convert `y_pred` to a DataFrame, setting the index to the index of `X`.
* Rename the column 0 to 'buy', be sure to set `inplace =True`.

In [44]:
# Use the model file to predict on X
y_pred = model.predict(X)

# Convert y_pred to a dataframe, set the index to the index of X
y_df = pd.DataFrame(y_pred, index=X.index)

# Rename the column 0 to 'buy', be sure to set inplace =True
y_df.rename(columns={0: "buy"}, inplace = True)
y_df

level_0,time,buy
AMZN,2021-01-06 15:00:00-05:00,1
AAPL,2021-01-06 15:00:00-05:00,1
GOOGL,2021-01-06 15:00:00-05:00,1


#### 4. Filter the stocks where 'buy' is equal to 1, saving the filter as `y_pred`.

In [45]:
# Filter the stocks where 'buy' is equal to 1
y_pred = y_df.loc[y_df["buy"] == 1]
y_pred

level_0,time,buy
AMZN,2021-01-06 15:00:00-05:00,1
AAPL,2021-01-06 15:00:00-05:00,1
GOOGL,2021-01-06 15:00:00-05:00,1


#### 5. Using the `y_pred` filter, create a dictionary called `buy_dict` and assign 'n' to each Ticker (key value) as a placeholder.

In [46]:
# Create dictionary from y_pred and assign a 'n' to each of them for now as a placeholder.
buy_dict = dict.fromkeys(y_pred.index.get_level_values(0), 'n')
buy_dict

{'AMZN': 'n', 'AAPL': 'n', 'GOOGL': 'n'}

#### 6. Obtain the total available equity in your account from the Alpaca API and store in a variable called `total_capital`. You will split the capital equally between all selected stocks per the CIO's request.

In [47]:
# Pull the total available equity in our account from the  Alpaca API
account = api.get_account()
total_capital = float(account.equity)
print(f"Total available capital: {total_capital}")

Total available capital: 100000.0


In [48]:
# Compute capital per stock: divide the account equity (pulled from the Alpaca API above)
# by the number of stocks selected for purchase
if len(buy_dict) > 0:
    capital_per_stock = float(total_capital)/ len(buy_dict)
else:
    capital_per_stock = 0
print(f'Capital per stock: {capital_per_stock}')

Capital per stock: 33333.333333333336


#### 7. Use a for-loop to iterate through `buy_dict` to determine the number of shares you need to buy for each ticker.

In [49]:
# Use for loop to iterate through dictionary of buys 
# Determine the number of shares we need to buy for each ticker
for ticker in buy_dict:
    try:
        buy_dict[ticker] = int(capital_per_stock /int(prices[ticker].iloc[-1]['close']))
    except:
        pass

print(buy_dict)

{'AMZN': 10, 'AAPL': 262, 'GOOGL': 19}
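
A sketch of the sizing arithmetic, using the 15:00 closes shown earlier. The helper `shares_per_ticker` is hypothetical. Unlike the cell above, it divides by the untruncated price, which gives 261 rather than 262 shares for AAPL (262 shares at 127.63 would cost slightly more than the capital allotted to that stock).

```python
import math

def shares_per_ticker(total_capital, last_prices):
    """Split capital equally and return the whole shares affordable per ticker."""
    if not last_prices:
        return {}
    capital_per_stock = total_capital / len(last_prices)
    # Round down so the order never costs more than the allotted capital
    return {ticker: math.floor(capital_per_stock / price)
            for ticker, price in last_prices.items()}

# Last closes from the 15:00 bar shown earlier
sizes = shares_per_ticker(100000.0, {"AMZN": 3148.58, "AAPL": 127.63, "GOOGL": 1720.48})
print(sizes)
```

Market orders fill at the prevailing price, so the actual cost can still differ from this estimate.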


#### 8. Cancel all previous orders in the Alpaca API (so you don't buy more than intended) and sell all currently held stocks to close all positions.

In [50]:
# Cancel all previous orders in the Alpaca API
api.cancel_all_orders()

# Sell all currently held stocks to close all positions
api.close_all_positions()

[]

#### 9. Iterate through `buy_dict` and send a buy order for each ticker with their corresponding number of shares.

In [51]:
# Iterate through the buy_dict object and send a buy order for each ticker with a corresponding number of shares:
for stock, qty in buy_dict.items():    
    # Submit a market order to buy shares as described in buy_dict
    api.submit_order(
        symbol=stock,
        qty=qty,
        side='buy',
        type='market',
        time_in_force='gtc',
    )
    print(f'buying {stock} numShares {qty}')
    

buying AMZN numShares 10
buying AAPL numShares 262
buying GOOGL numShares 19


### Automate the algorithm

#### 1. Make a function called `trade()` that incorporates all of the steps above.

In [52]:
# Add all of the steps conducted above into the function trade
def trade():

    ticker_list = ['AMZN','AAPL','GOOGL']
    # Notice that we remove the start and end variables since we want the latest prices.
    timeframe='1Min'
    # Use iloc to keep the last 11 bars (the current bar plus 10 minutes of history) every time we pull new data
    prices = api.get_barset(ticker_list, "minute").df.iloc[-11:]
    prices.ffill(inplace=True)   

    # Create an empty DataFrame for closing prices
    df_closing_prices = pd.DataFrame()

    # Fetch the closing prices of our tickers
    df_closing_prices["AMZN"] = prices["AMZN"]["close"]
    df_closing_prices["AAPL"] = prices["AAPL"]["close"]
    df_closing_prices["GOOGL"] = prices["GOOGL"]["close"]
    print(df_closing_prices.head())
    
    # Loop through momentums to build new DataFrame
    list_of_momentums = [1,5,10]
    for i in list_of_momentums:   
        returns_temp = df_closing_prices.pct_change(i)
        returns_temp = pd.DataFrame(returns_temp.unstack())
        name = f'{i}_m_returns'
        returns_temp.rename(columns={0: name}, inplace = True)
        returns_temp.reset_index(inplace = True)
        if i ==1:
            returns = returns_temp
        else:
            returns = pd.merge(returns,returns_temp,left_on=['level_0', 'time'],right_on=['level_0', 'time'], how='left', suffixes=('_original', 'right'))

    # Drop nulls and set index            
    returns.dropna(axis=0, how='any', inplace=True)
    returns.set_index(['level_0', 'time'], inplace=True)

    # Use the freshly computed returns as the feature data, then load the model and predict
    X = returns
    model = joblib.load('log_model.pkl')
    y_pred = model.predict(X)
    y_df = pd.DataFrame(y_pred, index=X.index)
    y_df.rename(columns={0: "buy"}, inplace = True)
    y_pred = y_df.loc[y_df["buy"] == 1]
    
    # Create the `buy_dict` object
    buy_dict = dict.fromkeys(y_pred.index.get_level_values(0), 'n')
    
    # Split capital between stocks and determine buy or sell
    account = api.get_account()
    total_capital = float(account.equity)
    if len(buy_dict) > 0:
        capital_per_stock = float(total_capital)/ len(buy_dict)
    else:
        capital_per_stock = 0
    for ticker in buy_dict:
        try:
            buy_dict[ticker] = int(capital_per_stock /int(prices[ticker].iloc[-1]['close']))
        except:
            pass

    
    # Cancel pending orders and close positions
    api.cancel_all_orders()
    api.close_all_positions()
    
    # Submit orders
    for stock, qty in buy_dict.items():    
        # Submit a market order to buy shares as described in buy_dict
        api.submit_order(
            symbol=stock,
            qty=qty,
            side='buy',
            type='market',
            time_in_force='gtc',
        )
        print(f'buying {stock} numShares {qty}')


#### 2. Import Python's schedule module.

In [53]:
# Import Python's schedule module 
import schedule

#### 3. Use the "schedule" module to automate the algorithm:
* Clear the schedule with `.clear()`.
* Define a schedule to run the trade function every minute at 5 seconds past the minute mark (e.g. `10:31:05`).
* Use the Alpaca API to check whether the market is open.
* Use the `run_pending()` function inside schedule to execute the schedule you defined while the market is open.
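
To make "every minute at 5 seconds past the minute mark" concrete, the sketch below computes the next such moment for a given clock time. The helper `next_run_time` is hypothetical and only illustrates the timing rule; it is not how the `schedule` module is implemented.

```python
from datetime import datetime, timedelta

def next_run_time(now, second=5):
    """Return the next moment strictly after `now` whose seconds field equals `second`."""
    candidate = now.replace(second=second, microsecond=0)
    if candidate <= now:
        # This minute's mark has already passed, so use the next minute
        candidate += timedelta(minutes=1)
    return candidate

print(next_run_time(datetime(2022, 3, 3, 14, 52, 30)))
```

Running a few seconds after the minute mark gives the data provider time to publish the bar that just closed.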

In [54]:
# Clear the schedule
schedule.clear()

# Define a schedule to run the trade function every minute at 5 seconds past the minute mark (e.g. 10:31:05)
trade_schedule = schedule.every().minute.at(":05").do(trade)

# Use the Alpaca API to check whether the market is open
clock = api.get_clock()

# Use the run_pending() function inside schedule to execute the schedule you defined as long as the market is open
while clock.is_open:
    print(f'The market is open until {clock.next_close}, executing trade function')
    schedule.run_pending()
    time.sleep(1)
    # Refresh the clock so the loop ends once the market closes
    clock = api.get_clock()
else:
    print(f'Market closed, next open market day will be {clock.next_open}')


Market closed, next open market day will be 2022-03-03 09:30:00-05:00


In [55]:
# Get Scheduled Jobs
schedule.get_jobs()

[Every 1 minute at 00:00:05 do trade() (last run: [never], next run: 2022-03-03 14:53:05)]