### ELEMENTARY TRADING BOT

This notebook is a first pass at using three machine learning models to generate long/short signals for stocks that are trading in a range. We will look to employ the algorithm under the following conditions:
- Large-cap stocks. They tend to offer intraday ranges that "scalpers" take advantage of, and I'm willing to bet an algo can learn to scalp as well as those traders, or even better.
- Periods of low volatility. Low volatility is essentially what we mean when we say "the stock is trading in a range", i.e. its price action can be considered "range-bound". This typically happens in the middle of the day, when there is less volume in the markets. We can visualize average volume by time of day later on in this project.
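To make the "less volume in the middle of the day" idea concrete, here is a minimal sketch of averaging volume by clock time. The helper name `average_volume_by_time` and the synthetic two-day frame are assumptions for illustration only; the real input would be a 1-minute OHLCV DataFrame with a timezone-aware timestamp index and a `volume` column.

```python
import pandas as pd


def average_volume_by_time(bars: pd.DataFrame, tz: str = "America/New_York") -> pd.Series:
    """Mean volume for each clock time (HH:MM) across all days in `bars`."""
    # Convert the index to exchange-local time so 09:30 means the market open
    local = bars.tz_convert(tz)
    # Group rows that share the same time of day, regardless of date
    return local.groupby(local.index.time)["volume"].mean()


# Synthetic data (hypothetical values): the same three minutes on two days
idx = pd.date_range("2023-01-03 14:30", periods=3, freq="1min", tz="UTC").append(
    pd.date_range("2023-01-04 14:30", periods=3, freq="1min", tz="UTC")
)
demo = pd.DataFrame({"volume": [100, 200, 300, 300, 400, 500]}, index=idx)

avg_volume = average_volume_by_time(demo)
print(avg_volume)
```

Plotting the resulting Series (for example with `hvplot`) should show where the quiet, range-bound part of the day sits for a given stock.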

We will start with the above, keeping it simple.


Our next step is to choose the variables that we will feed the algorithm. Remember that, for now, these variables will be based on a stock's Open, High, Low, and Close (OHLC) and Volume.

### TO-DO
Now we determine the stocks for which we'd like to pull the OHLC data.
Let's go with all of the stocks in the NQ (Nasdaq-100) and the ES (S&P 500). This should give us a good picture of how the kind of stocks we want to deploy our algorithm on behave: ***mega-cap stocks that offer range-bound trading during periods of low volatility.***

### Getting a list of the stocks in the NQ and ES into a DataFrame

Let's start with the NQ.

In [1]:
# Import the required libraries
import numpy as np
import pandas as pd
import hvplot.pandas
from pathlib import Path
from finta import TA

In [2]:
# # Read csv, take a look
# nq_df = pd.read_csv(("../algotrader2/resources/nq_stocks.csv"))
# nq_df.head()

# COMMENTING OUT NOW, We will do this later

In [3]:
# # Do the same for the snp 500 symbols
# snp_df = pd.read_csv(("../algotrader2/resources/snp_stocks.csv"))
# snp_df.head()

# COMMENTING OUT NOW, We will do this later

In [4]:
# # Save the symbols from each df
# nq_symbols_df = nq_df["Symbol"]
# snp_symbols_df = snp_df["Symbol"]

# # Put dfs together
# snp_nq_symbols_df = pd.concat([nq_symbols_df, snp_symbols_df], axis=0)

# # Drop duplicates
# snp_nq_symbols_df.drop_duplicates(inplace=True)

# # View
# snp_nq_symbols_df

# COMMENTING OUT NOW, We will do this later

In [5]:
# # Convert symbols to a list so that we can feed it into our Alpaca API.
# # Alpaca is where we will access the historical OHLC data needed.

# nq_snp_symbols_list = snp_nq_symbols_df.tolist()

# len(nq_snp_symbols_list)

# COMMENTING OUT NOW, We will do this later

In [6]:
# # THE FOLLOWING IS WHEN WE CONTINUE WITH MANY STOCKS...
# # THE FOLLOWING NEEDS TO BE ITERATED

# # Rearrange df
# # Separate ticker data

# # WE'LL HAVE TO iterate the below for all stocks in both the SPY and NQ

# AAPL = aapl_msft_df[aapl_msft_df['symbol']=='AAPL'].drop('symbol', axis=1)
# MSFT = aapl_msft_df[aapl_msft_df['symbol']=='MSFT'].drop('symbol', axis=1)

# AAPL.sort_index(inplace=True)
# MSFT.sort_index(inplace=True)

# # Concatenate the ticker DataFrames
# am_df = pd.concat([AAPL, MSFT],axis=1, keys=['AAPL','MSFT'])


# am_df


# # FEATURE CREATION WITH TWO stocks... this is for LATER USE

# # Generate returns from close column with pct_change

# # WILL NEED TO ITERATE
# am_df[("AAPL","actual_returns")] = am_df[("AAPL","close")].pct_change()
# am_df[("MSFT","actual_returns")] = am_df[("MSFT","close")].pct_change()

# # Drop NaN
# am_df = am_df.dropna()

# # Simple Moving Averages
# am_df[("AAPL","AAPL_SMA5")] = TA.SMA(am_df["AAPL"],5)
# am_df[("AAPL","AAPL_SMA10")] = TA.SMA(am_df["AAPL"],10)

# am_df[("MSFT","MSFT_SMA5")] = TA.SMA(am_df["MSFT"],5)
# am_df[("MSFT","MSFT_SMA10")] = TA.SMA(am_df["MSFT"],10)

# # Exponential Moving Averages
# am_df[("AAPL","AAPL_EMA3")] = TA.EMA(am_df["AAPL"],3)

# am_df[("MSFT","MSFT_EMA3")] = TA.EMA(am_df["MSFT"],3)

# # NEEDS TO BE ITERATED!!


# # Create a df with all the indicators for both AAPL and MSFT
# aapl_indicator_df = am_df["AAPL"].drop(["actual_returns"], axis=1)
# msft_indicator_df = am_df["MSFT"].drop(["actual_returns"], axis=1)

# # Save the signal column as our 'y'
# y=am_df["AAPL"]["signal"]

# display(aapl_indicator_df.head())
# display(aapl_indicator_df.tail())
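The per-ticker steps in the commented-out cell above repeat the same lines for AAPL and MSFT. One way to iterate them over any number of symbols is sketched below. It assumes a long-format frame with a `symbol` column and a `close` column; the function name `add_features` is hypothetical, and plain pandas `rolling`/`ewm` calls stand in for finta's `TA.SMA` and `TA.EMA`, so values may differ slightly from finta's.

```python
import pandas as pd


def add_features(bars: pd.DataFrame) -> pd.DataFrame:
    """Compute returns and moving averages separately for each symbol."""
    frames = []
    for symbol, group in bars.groupby("symbol"):
        group = group.sort_index().copy()
        # Returns and averages are computed within one symbol only,
        # so the first AAPL bar never "sees" the last MSFT bar
        group["actual_returns"] = group["close"].pct_change()
        group["SMA5"] = group["close"].rolling(5).mean()
        group["SMA10"] = group["close"].rolling(10).mean()
        group["EMA3"] = group["close"].ewm(span=3, adjust=False).mean()
        frames.append(group)
    return pd.concat(frames)


# Synthetic data (hypothetical prices) for two symbols
idx = pd.date_range("2023-01-03 14:30", periods=6, freq="1min", tz="UTC")
demo = pd.concat([
    pd.DataFrame({"symbol": "AAPL", "close": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]}, index=idx),
    pd.DataFrame({"symbol": "MSFT", "close": [200.0, 202.0, 204.0, 206.0, 208.0, 210.0]}, index=idx),
])

features = add_features(demo)
print(features)
```

Keeping the data in long format means adding a ticker to the list requires no new code, only more rows.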

Okay. Now let's set up our Alpaca API to get the OHLC data.


### Alpaca API for Historical OHLC Data

In [7]:
# Imports for Alpaca API, .env files, requests
import os
import requests
from dotenv import load_dotenv
import alpaca_trade_api as tradeapi

%matplotlib inline

In [8]:
# Load .env environment variables
load_dotenv()

True

In [9]:
# Set Alpaca API key and secret
alpaca_api_key = os.getenv("ALPACA_API_KEY")
alpaca_secret_key = os.getenv("ALPACA_SECRET_KEY")

# Verify that Alpaca key and secret were correctly loaded
print(f"Alpaca Key type: {type(alpaca_api_key)}")
print(f"Alpaca Secret Key type: {type(alpaca_secret_key)}")

# Create the Alpaca API object
alpaca = tradeapi.REST(
    alpaca_api_key,
    alpaca_secret_key,
    api_version="v2")

Alpaca Key type: <class 'str'>
Alpaca Secret Key type: <class 'str'>


In [10]:
# Set start and end dates and specify isoformat
# We will start off with roughly one year of data: January 2023 through mid-January 2024.

start_date = pd.Timestamp("2023-01-01", tz="America/New_York").isoformat()
end_date = pd.Timestamp("2024-01-17", tz="America/New_York").isoformat()


In [11]:
# Set the tickers
# Test with one for now.
# Later, this is where we will feed in our list of nq/es stocks we created above
tickers = ["AAPL"]

In [12]:
# Set timeframe for Alpaca API
timeframe = "1Min"

In [13]:
# Get historical OHLC data for AAPL
aapl_df = alpaca.get_bars(
    tickers,
    timeframe,
    start = start_date,
    end = end_date
).df

In [14]:
# Display sample data
aapl_df

timestamp,close,high,low,trade_count,open,volume,vwap,symbol
2023-01-03 09:00:00+00:00,131.00,131.00,130.28,208,130.28,8174,130.854120,AAPL
2023-01-03 09:01:00+00:00,131.10,131.17,130.87,157,130.87,8820,130.955243,AAPL
2023-01-03 09:02:00+00:00,131.17,131.24,131.17,53,131.18,2112,131.208769,AAPL
2023-01-03 09:03:00+00:00,131.28,131.29,131.19,90,131.19,3888,131.220790,AAPL
2023-01-03 09:04:00+00:00,131.46,131.46,131.28,88,131.28,5984,131.327899,AAPL
...,...,...,...,...,...,...,...,...
2024-01-17 00:55:00+00:00,183.00,183.03,183.00,45,183.03,1224,183.014261,AAPL
2024-01-17 00:56:00+00:00,183.00,183.01,183.00,69,183.01,4641,183.001462,AAPL
2024-01-17 00:57:00+00:00,183.01,183.04,183.01,21,183.04,958,183.030350,AAPL
2024-01-17 00:58:00+00:00,183.05,183.05,183.04,21,183.04,671,183.043748,AAPL


For now, we will not use pre-market/after-hours action (note that the raw bars above still include it, e.g. the 09:00 UTC rows), but we will in the future.
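A minimal sketch of restricting bars to the regular session is shown below. It assumes a timezone-aware timestamp index like the one Alpaca returns, and regular hours of 09:30 to 16:00 New York time; the helper name `regular_hours_only` is hypothetical.

```python
import pandas as pd


def regular_hours_only(
    bars: pd.DataFrame,
    tz: str = "America/New_York",
    start: str = "09:30",
    end: str = "15:59",
) -> pd.DataFrame:
    """Keep only bars whose local start time falls inside the regular session."""
    # Work in exchange-local time so the cutoffs do not shift with daylight saving
    local = bars.tz_convert(tz)
    # The last 1-minute bar of the session starts at 15:59; both ends are inclusive
    return local.between_time(start, end)


# Synthetic bars (hypothetical prices): pre-market, 09:29, 09:30, 15:59, 16:00 ET
idx = pd.to_datetime(
    [
        "2023-01-03 09:00",
        "2023-01-03 14:29",
        "2023-01-03 14:30",
        "2023-01-03 20:59",
        "2023-01-03 21:00",
    ]
).tz_localize("UTC")
demo = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

rth = regular_hours_only(demo)
print(rth)
```

The returned frame keeps the New York index; convert back with `tz_convert("UTC")` if the rest of the pipeline expects UTC.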

Now we clean up the AAPL DataFrame we created. MSFT will be added and combined with it later.

In [15]:
# Separate ticker data
AAPL = aapl_df[aapl_df['symbol']=='AAPL'].drop('symbol', axis=1)

In [16]:
AAPL

timestamp,close,high,low,trade_count,open,volume,vwap
2023-01-03 09:00:00+00:00,131.00,131.00,130.28,208,130.28,8174,130.854120
2023-01-03 09:01:00+00:00,131.10,131.17,130.87,157,130.87,8820,130.955243
2023-01-03 09:02:00+00:00,131.17,131.24,131.17,53,131.18,2112,131.208769
2023-01-03 09:03:00+00:00,131.28,131.29,131.19,90,131.19,3888,131.220790
2023-01-03 09:04:00+00:00,131.46,131.46,131.28,88,131.28,5984,131.327899
...,...,...,...,...,...,...,...
2024-01-17 00:55:00+00:00,183.00,183.03,183.00,45,183.03,1224,183.014261
2024-01-17 00:56:00+00:00,183.00,183.01,183.00,69,183.01,4641,183.001462
2024-01-17 00:57:00+00:00,183.01,183.04,183.01,21,183.04,958,183.030350
2024-01-17 00:58:00+00:00,183.05,183.05,183.04,21,183.04,671,183.043748


In [17]:
# Sort the indexes

AAPL.sort_index(inplace=True)

In [18]:
# # Concatenate the ticker DataFrames
# AAPL = pd.concat([AAPL, MSFT],axis=1, keys=['AAPL','MSFT'])

# # Preview DataFrame
# AAPL

Nice. Now we have the OHLC data here in our workspace, and we can move on to the next step of the process: data cleaning.

### Data Cleaning

Drop NA.

In [19]:
# Drop NA
AAPL = AAPL.dropna()

AAPL

timestamp,close,high,low,trade_count,open,volume,vwap
2023-01-03 09:00:00+00:00,131.00,131.00,130.28,208,130.28,8174,130.854120
2023-01-03 09:01:00+00:00,131.10,131.17,130.87,157,130.87,8820,130.955243
2023-01-03 09:02:00+00:00,131.17,131.24,131.17,53,131.18,2112,131.208769
2023-01-03 09:03:00+00:00,131.28,131.29,131.19,90,131.19,3888,131.220790
2023-01-03 09:04:00+00:00,131.46,131.46,131.28,88,131.28,5984,131.327899
...,...,...,...,...,...,...,...
2024-01-17 00:55:00+00:00,183.00,183.03,183.00,45,183.03,1224,183.014261
2024-01-17 00:56:00+00:00,183.00,183.01,183.00,69,183.01,4641,183.001462
2024-01-17 00:57:00+00:00,183.01,183.04,183.01,21,183.04,958,183.030350
2024-01-17 00:58:00+00:00,183.05,183.05,183.04,21,183.04,671,183.043748


For 1-minute bars: dropping the NaN values just took out about 760 rows of data. Did it take out the whole row for a given timestamp? That is not good, since if we have the data for one stock at a given minute but not the other, we don't want to throw out that info for the one stock.

3166 to 2405 rows after dropping NaN.

We will come back to this. Remember, we don't want little things like this to hold us back from implementing the bot. Just make a note of these little issues and crush them during the next iteration.

UPDATE 1: With 3-minute bars, we start with 1200 rows, then have 1014 after dropping NaN.

It seems to get better when we zoom out. We will most likely just source better data, however.
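The row-loss question can be checked on a toy frame. The sketch below assumes a wide frame with one column group per ticker, like the `pd.concat(..., axis=1, keys=[...])` layout used earlier; the prices are made up. `DataFrame.dropna()` with default arguments drops a row if any column in it is NaN, so one missing MSFT bar also removes the AAPL bar for that minute. Dropping per symbol avoids that.

```python
import numpy as np
import pandas as pd

# Synthetic wide frame (hypothetical prices): MSFT has no bar at 14:31
idx = pd.date_range("2023-01-03 14:30", periods=3, freq="1min", tz="UTC")
wide = pd.concat(
    {
        "AAPL": pd.DataFrame({"close": [131.0, 131.1, 131.2]}, index=idx),
        "MSFT": pd.DataFrame({"close": [240.0, np.nan, 240.4]}, index=idx),
    },
    axis=1,
)

# Default dropna: the 14:31 row is removed for BOTH tickers
dropped_together = wide.dropna()

# Per-symbol dropna: each ticker only loses its own missing bars
cleaned = {symbol: wide[symbol].dropna() for symbol in wide.columns.levels[0]}

print(len(wide), len(dropped_together))              # rows before and after
print({symbol: len(df) for symbol, df in cleaned.items()})
```

With a single ticker, as in the AAPL-only frame here, the two approaches give the same result; the difference only matters once several tickers share one frame.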


In [20]:
# Now save this cleaned dataframe to a new .csv (keep the timestamp index as a column)
AAPL.to_csv('../algotrader2/Resources/aapl_OHLCV_1min_df.csv', index=True)


Our next step is to generate the features from the data set that we will train our model with. 

[Feature Creation->](/Users/montygash/Desktop/ETAlgo/tradingbot/nn_feature_creation.ipynb)

Now we have the indicators and what we are trying to predict (actual returns). 

We will set the indicators to our 'X' and the actual return as our 'y'.

In this case, we have two different stocks whose returns we are trying to predict.
- AAPL data will be used to predict AAPL actual returns.
- MSFT data will be used to predict MSFT actual returns.
- Is this how we want the algorithm to work, training itself only on one particular stock?
    - No. I want it to be trained on MULTIPLE stocks (MSFT, AAPL, and even TSLA, AMD, NVDA), so that I can then deploy it on any stock I choose.
    - This is a major issue that I need to resolve.
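One possible direction for the multi-stock question, sketched under assumptions: express every feature as a ratio rather than a price, so bars from differently priced stocks are comparable, then stack all symbols into one training table. The function names and the two features below are hypothetical placeholders, not the final feature set.

```python
import pandas as pd


def scale_free_features(bars: pd.DataFrame) -> pd.DataFrame:
    """Features expressed as ratios, so the price level of the stock drops out."""
    close = bars["close"]
    return pd.DataFrame(
        {
            "return_1": close.pct_change(),
            "close_over_sma5": close / close.rolling(5).mean() - 1.0,
        },
        index=bars.index,
    )


def pooled_training_set(bars_by_symbol: dict) -> pd.DataFrame:
    """Stack the per-symbol feature tables into one table for a single model."""
    frames = []
    for symbol, bars in bars_by_symbol.items():
        feats = scale_free_features(bars).dropna()
        feats.insert(0, "symbol", symbol)  # kept for bookkeeping, not as a model input
        frames.append(feats)
    return pd.concat(frames)


# Synthetic data (hypothetical prices): MSFT is exactly 2x AAPL
idx = pd.date_range("2023-01-03 14:30", periods=7, freq="1min", tz="UTC")
aapl_close = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]
bars_by_symbol = {
    "AAPL": pd.DataFrame({"close": aapl_close}, index=idx),
    "MSFT": pd.DataFrame({"close": [2 * c for c in aapl_close]}, index=idx),
}

pooled = pooled_training_set(bars_by_symbol)
print(pooled)
```

In this toy example the two stocks produce identical feature rows even though their prices differ by a factor of two, which is the property a model trained on many stocks and deployed on others would rely on.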


Let's just begin by implementing the neural network algorithm on only AAPL.

This raises another question: do we only need to shift the target back by one bar, since we are using 1-minute candles? Maybe we should shift it back by more than one 1-minute bar. We will revisit this later.


We are trying to predict y: whether we should buy or sell the stock at a given time.

At first I was confused and tried to use actual returns as y, but what we actually want is to generate signals, where the bot chooses whether to short or long the stock based on the predicted return over the next 1-minute candle. (We will probably revisit the 1-minute candle idea. I'm not sure that we want the bot to send out that many signals. Maybe we train it with different OHLC time frames.)

So the first step in making the bot generate signals is to define when the signal is 1 (buy) and when it is -1 (sell).

We want the bot to sell when the expected return is negative, and buy when it is positive.
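The buy/sell rule can be sketched as a labeling function for training data. This assumes the label is the sign of the realized return over the next `horizon` bars (`horizon` and `make_signals` are hypothetical names); flat moves, and the final bars where the future is unknown, get 0 here and would likely be dropped before training.

```python
import numpy as np
import pandas as pd


def make_signals(close: pd.Series, horizon: int = 1) -> pd.Series:
    """Label each bar 1 (buy), -1 (sell), or 0 (flat/unknown) from the forward return."""
    # Return from this bar's close to the close `horizon` bars later
    forward_return = close.shift(-horizon) / close - 1.0
    # Sign of the forward return; the last `horizon` bars have no future and become 0
    signal = np.sign(forward_return).fillna(0).astype(int)
    return signal.rename("signal")


# Synthetic closes (hypothetical prices)
close = pd.Series([100.0, 101.0, 100.0, 100.0, 102.0])
signals = make_signals(close)
print(signals.tolist())
```

Raising `horizon` is one way to test whether looking more than one 1-minute bar ahead produces fewer, steadier signals.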

deploy. 

Nice! Now I want to move only the necessary portion of this page into a new .ipynb. We will make it nice and clean, including only the data prep portion and the neural network model. We will keep our comments brief so that the overall idea (the 'gist' of all of the moving parts) is readable from something of a bird's-eye view. We will divide the code up strategically, so that we can dive into each moving part and tune it as needed. Our model should become more sophisticated and accurate over future iterations until our goal of consistent profitability is met.