### Data acquisition

First, let's decide which stocks we want to pull OHLC data for.
Let's go with every stock in the NQ and the ES. Together they should be a good proxy for how the kind of stocks we want to deploy our algorithm on behave: ***mega-cap stocks that offer range-bound trading during periods of low volatility.***

### Getting a list of the stocks in the NQ and ES into a DataFrame

Let's start with the NQ.

In [None]:
# Import the required libraries
import numpy as np
import pandas as pd
import hvplot.pandas
from pathlib import Path
from finta import TA

In [None]:
# # Read csv, take a look
# nq_df = pd.read_csv(("../algotrader2/resources/nq_stocks.csv"))
# nq_df.head()

# COMMENTING OUT NOW, We will do this later

In [None]:
# # Do the same for the snp 500 symbols
# snp_df = pd.read_csv(("../algotrader2/resources/snp_stocks.csv"))
# snp_df.head()

# COMMENTING OUT NOW, We will do this later

In [None]:
# # Save the symbols from each df
# nq_symbols_df = nq_df["Symbol"]
# snp_symbols_df = snp_df["Symbol"]

# # Put dfs together
# snp_nq_symbols_df = pd.concat([nq_symbols_df, snp_symbols_df], axis=0)

# # Drop duplicates
# snp_nq_symbols_df.drop_duplicates(inplace=True)

# # View
# snp_nq_symbols_df

# COMMENTING OUT NOW, We will do this later

In [None]:
# # Convert symbols to a list so that we can feed it into our Alpaca API.
# # Alpaca is where we will access the historical OHLC data needed.

# nq_snp_symbols_list = snp_nq_symbols_df.tolist()

# len(nq_snp_symbols_list)

# COMMENTING OUT NOW, We will do this later
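
The symbol-list steps in the commented-out cells above can be tried end to end without the CSV files. This is only a minimal sketch: it assumes each constituents file has a `Symbol` column (as the cells above do), and the two small frames are hypothetical stand-ins for `nq_stocks.csv` and `snp_stocks.csv`.

```python
import pandas as pd

# Hypothetical stand-ins for the two constituent files (assumption: a "Symbol" column).
nq_df = pd.DataFrame({"Symbol": ["AAPL", "MSFT", "NVDA"]})
snp_df = pd.DataFrame({"Symbol": ["AAPL", "MSFT", "JPM"]})

# Stack the two symbol columns, drop the overlap, and keep first-seen order.
symbols = (
    pd.concat([nq_df["Symbol"], snp_df["Symbol"]], ignore_index=True)
    .drop_duplicates()
    .tolist()
)

print(symbols)
```

Chaining `drop_duplicates()` instead of using `inplace=True` keeps the whole step a single expression that ends in the plain list we would feed to the API.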

In [None]:
# # THE FOLLOWING IS WHEN WE CONTINUE WITH MANY STOCKS...
# # THE FOLLOWING NEEDS TO BE ITERATED

# # Rearrange df
# # Separate ticker data

# # WE'LL HAVE TO iterate the below for all stocks in both the SPY and NQ

# AAPL = aapl_msft_df[aapl_msft_df['symbol']=='AAPL'].drop('symbol', axis=1)
# MSFT = aapl_msft_df[aapl_msft_df['symbol']=='MSFT'].drop('symbol', axis=1)

# AAPL.sort_index(inplace=True)
# MSFT.sort_index(inplace=True)

# # Concatenate the ticker DataFrames
# am_df = pd.concat([AAPL, MSFT],axis=1, keys=['AAPL','MSFT'])


# am_df


# # FEATURE CREATION WITH TWO stocks... this is for LATER USE

# # Generate returns from close column with pct_change

# # WILL NEED TO ITERATE
# am_df[("AAPL","actual_returns")] = am_df[("AAPL","close")].pct_change()
# am_df[("MSFT","actual_returns")] = am_df[("MSFT","close")].pct_change()

# # Drop NaN
# am_df = am_df.dropna()

# # Simple Moving Averages
# am_df[("AAPL","AAPL_SMA5")] = TA.SMA(am_df["AAPL"],5)
# am_df[("AAPL","AAPL_SMA10")] = TA.SMA(am_df["AAPL"],10)

# am_df[("MSFT","MSFT_SMA5")] = TA.SMA(am_df["MSFT"],5)
# am_df[("MSFT","MSFT_SMA10")] = TA.SMA(am_df["MSFT"],10)

# # Exponential Moving Averages
# am_df[("AAPL","AAPL_EMA3")] = TA.EMA(am_df["AAPL"],3)

# am_df[("MSFT","MSFT_EMA3")] = TA.EMA(am_df["MSFT"],3)

# # NEEDS TO BE ITERATED!!


# # Create a df with all the indicators for both AAPL and MSFT
# aapl_indicator_df = am_df["AAPL"].drop(["actual_returns"], axis=1)
# msft_indicator_df = am_df["MSFT"].drop(["actual_returns"], axis=1)

# # Save the signal column as our 'y'
# y=am_df["AAPL"]["signal"]

# display(aapl_indicator_df.head())
# display(aapl_indicator_df.tail())
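
The "needs to be iterated" notes above could be handled with one loop over the symbols instead of one hand-written line per ticker. Here is a rough sketch, assuming long-format bars with a `symbol` column like the ones Alpaca returns. The function name `split_and_add_features` and the toy data are hypothetical, and the moving averages use a plain pandas rolling mean in place of finta's `TA.SMA`.

```python
import numpy as np
import pandas as pd


def split_and_add_features(bars: pd.DataFrame, windows=(5, 10)) -> pd.DataFrame:
    """Split long-format bars by symbol, add returns and SMAs, and join side by side."""
    per_symbol = {}
    for symbol, frame in bars.groupby("symbol"):
        frame = frame.drop(columns="symbol").sort_index()
        frame["actual_returns"] = frame["close"].pct_change()
        for window in windows:
            # Plain pandas rolling mean, swapped in here for finta's TA.SMA.
            frame[f"SMA{window}"] = frame["close"].rolling(window).mean()
        per_symbol[symbol] = frame
    # The outer column level is the ticker, as in the AAPL/MSFT cell above.
    return pd.concat(per_symbol, axis=1)


# Hypothetical toy bars for two tickers (only a close column, for brevity).
idx = pd.date_range("2023-01-03 09:30", periods=12, freq="min", tz="America/New_York")
bars = pd.concat([
    pd.DataFrame({"symbol": "AAPL", "close": np.linspace(100, 111, 12)}, index=idx),
    pd.DataFrame({"symbol": "MSFT", "close": np.linspace(200, 211, 12)}, index=idx),
])

wide = split_and_add_features(bars)
print(wide.head(6))
```

Nothing in the loop names a ticker, so the same function should work unchanged once the full NQ/ES symbol list is fed in.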

Okay. Now let's set up our Alpaca API to get the OHLC data.


### Alpaca API for Historical OHLC Data

In [None]:
# Imports for Alpaca API, .env files, requests
import os
import requests
from dotenv import load_dotenv
import alpaca_trade_api as tradeapi

%matplotlib inline

In [None]:
# Load .env environment variables
load_dotenv()

In [None]:
# Set Alpaca API key and secret
alpaca_api_key = os.getenv("ALPACA_API_KEY")
alpaca_secret_key = os.getenv("ALPACA_SECRET_KEY")

# Verify that Alpaca key and secret were correctly loaded
print(f"Alpaca Key type: {type(alpaca_api_key)}")
print(f"Alpaca Secret Key type: {type(alpaca_secret_key)}")

# Create the Alpaca API object
alpaca = tradeapi.REST(
    alpaca_api_key,
    alpaca_secret_key,
    api_version="v2")
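
Printing the type works as a quick check: `os.getenv` returns a `str` when the variable is set and `None` when it is missing. A stricter option is to fail right away with a readable message. This is a minimal sketch; the helper `require_env` and the demo variable name are hypothetical and not part of the Alpaca SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check the .env file")
    return value


# Demo with a hypothetical variable name so that no real credentials are touched.
os.environ["DEMO_ALPACA_KEY"] = "placeholder"
print(type(require_env("DEMO_ALPACA_KEY")))
```

In the notebook this would be called as `require_env("ALPACA_API_KEY")` and `require_env("ALPACA_SECRET_KEY")` before building the REST object.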

In [None]:
# Set start and end dates and specify isoformat
# The window runs from the start of 2023 through mid-January 2024.

start_date = pd.Timestamp("2023-01-01", tz="America/New_York").isoformat()
end_date = pd.Timestamp("2024-01-17", tz="America/New_York").isoformat()


In [None]:
# Set the tickers
# Test with one for now.
# Later, this is where we will feed in our list of nq/es stocks we created above
tickers = ["AAPL"]

In [None]:
# Set timeframe for Alpaca API
timeframe = "1Min"

In [None]:
# Get historical OHLC data for AAPL
aapl_df = alpaca.get_bars(
    tickers,
    timeframe,
    start = start_date,
    end = end_date
).df

In [None]:
# Display sample data
aapl_df

FOR NOW, we will not include pre-market/after-hours action, but WE WILL in the future.
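
One way to leave out pre-market and after-hours bars is to filter on New York clock time. This is only a sketch under a few assumptions: the bars have a timezone-aware `DatetimeIndex`, regular hours are 09:30 to 16:00 Eastern, and early-close days and holidays are ignored. The helper name `regular_hours_only` and the toy data are hypothetical.

```python
import pandas as pd


def regular_hours_only(bars: pd.DataFrame) -> pd.DataFrame:
    """Keep bars stamped from 09:30 up to (not including) 16:00 New York time."""
    local = bars.tz_convert("America/New_York")
    # inclusive="left" keeps the 09:30 bar and drops a bar stamped exactly 16:00.
    return local.between_time("09:30", "16:00", inclusive="left")


# Hypothetical 30-minute bars from 09:00 to 16:30 New York time, stored in UTC.
idx = pd.date_range("2023-01-03 14:00", periods=16, freq="30min", tz="UTC")
bars = pd.DataFrame({"close": range(16)}, index=idx)

rth = regular_hours_only(bars)
print(rth.index[0], rth.index[-1], len(rth))
```

Converting to `America/New_York` first means the filter keeps working across daylight-saving changes, which a fixed UTC window would not.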

In [None]:
# Separate ticker data
AAPL = aapl_df[aapl_df['symbol']=='AAPL'].drop('symbol', axis=1)

In [None]:
AAPL

In [None]:
# Sort the indexes

AAPL.sort_index(inplace=True)

In [None]:
# # Concatenate the ticker DataFrames
# AAPL = pd.concat([AAPL, MSFT],axis=1, keys=['AAPL','MSFT'])

# # Preview DataFrame
# AAPL

Nice. Now we have the OHLC data here in our workspace, and we can move on to the next step of the process: data cleaning.

### Data Cleaning

Drop NA.

In [None]:
# Drop NA
AAPL = AAPL.dropna()

AAPL

For 1 minute: Dropping the NaN just took out about 760 rows of data. Did it take out the whole row for a given timestamp? That is not good, since if we have the data for one stock at a given minute but not the other, we don't want to throw out that info for the one stock. 

3166 to 2405 rows after dropping NaN.

We will come back to this. Remember, we don't want little things like this holding us back from implementing this. Just make a note of these little issues and crush them during the next iteration. 

UPDATE 1: With 3 minutes, we start with 1200 rows, then have 1014 after dropping NaN. 

Seems to get better when we zoom out... We will most probably just source better data, however. 
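
To answer the question above about what `dropna()` actually removes, it helps to look at where the NaN values sit before dropping anything. Here is a small sketch on a hypothetical two-ticker frame (made-up numbers, not our Alpaca data) showing that a blanket `dropna()` removes the whole row, and that cleaning each ticker separately avoids that.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-03 09:30", periods=4, freq="min")

# Hypothetical wide frame: MSFT has no value at the second minute.
wide = pd.DataFrame(
    {
        ("AAPL", "close"): [100.0, 101.0, 102.0, 103.0],
        ("MSFT", "close"): [200.0, np.nan, 202.0, 203.0],
    },
    index=idx,
)

# 1. See which columns actually contain the missing values.
print(wide.isna().sum())

# 2. A blanket dropna() removes the whole row, AAPL's good bar included.
print(len(wide.dropna()))

# 3. Cleaning each ticker on its own keeps every bar that ticker really has.
clean = {
    symbol: wide[symbol].dropna()
    for symbol in wide.columns.get_level_values(0).unique()
}
print({symbol: len(frame) for symbol, frame in clean.items()})
```

Running `isna().sum()` on the real 1-minute frame should show which columns are responsible for the dropped rows.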


In [None]:
# Now save this cleaned dataframe to a new .csv
AAPL.to_csv('../algotrader2/Resources/aapl_1min_df.csv', index=True)


Our next step is to generate the features from the data set that we will train our model with. 

[Feature Creation->](/Users/montygash/Desktop/ETAlgo/tradingbot/nn_feature_creation.ipynb)


- Is this how we want the algorithm to be? To only train itself on a particular stock? 
    - No... I want it to be trained on MULTIPLE stocks. So it is trained on both MSFT, AAPL, and even TSLA, AMD, NVDA... and then I deploy it on any stock that I want to deploy it onto. 
    - This is a major issue that I need to resolve. 

This raises another question: Do we only need to shift it back one? Since we are using 1 minute candles? Maybe we should be shifting it back more than one 1-minute bar?... We will revisit this later.
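
Both open questions can be explored with one small helper. This is a sketch, not a decision: it assumes the shift in question is the one that lines each bar up with a later bar's outcome, and it makes that distance a `horizon` parameter. The shift is done per symbol, so several stocks can sit in one training frame without one ticker's prices leaking into another's label. The name `add_forward_return` and the toy prices are hypothetical.

```python
import pandas as pd


def add_forward_return(bars: pd.DataFrame, horizon: int = 1) -> pd.DataFrame:
    """Label each bar with the close-to-close return `horizon` bars ahead, per symbol."""
    out = bars.sort_index(kind="mergesort").copy()
    # Shift within each symbol so the future close never comes from another ticker.
    future_close = out.groupby("symbol")["close"].shift(-horizon)
    out["forward_return"] = future_close / out["close"] - 1
    # The last `horizon` bars of each symbol have no future close yet.
    return out.dropna(subset=["forward_return"])


# Hypothetical long-format bars for two tickers.
idx = pd.date_range("2023-01-03 09:30", periods=3, freq="min")
bars = pd.concat([
    pd.DataFrame({"symbol": "AAPL", "close": [100.0, 110.0, 121.0]}, index=idx),
    pd.DataFrame({"symbol": "MSFT", "close": [200.0, 210.0, 231.0]}, index=idx),
])

one_bar = add_forward_return(bars, horizon=1)
two_bar = add_forward_return(bars, horizon=2)
print(one_bar)
print(two_bar)
```

Trying a few values of `horizon` on the real data is probably the quickest way to see whether one bar ahead is too noisy.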

In [None]:
# 3 MINUTE
# Set timeframe for Alpaca API
timeframe = "3Min"

# Get historical OHLC data for AAPL
aapl_3min_df = alpaca.get_bars(
    tickers,
    timeframe,
    start = start_date,
    end = end_date
).df

aapl_3min_df

# Separate ticker data
aapl_3min_formatted = aapl_3min_df[aapl_3min_df['symbol']=='AAPL'].drop('symbol', axis=1)

# Drop NA
aapl_3min_formatted = aapl_3min_formatted.dropna()

display(aapl_3min_formatted.head())

# Now save this cleaned dataframe to a new .csv
aapl_3min_formatted.to_csv('../algotrader2/Resources/aapl_3min_df.csv', index=True)


In [None]:
# 15 MINUTE
# Set timeframe for Alpaca API
timeframe = "15Min"

# Get historical OHLC data for AAPL
aapl_15min_df = alpaca.get_bars(
    tickers,
    timeframe,
    start = start_date,
    end = end_date
).df

aapl_15min_df

# Separate ticker data
aapl_15min_formatted = aapl_15min_df[aapl_15min_df['symbol']=='AAPL'].drop('symbol', axis=1)


# Drop NA
aapl_15min_formatted = aapl_15min_formatted.dropna()

display(aapl_15min_formatted.head())

# Now save this cleaned dataframe to a new .csv
aapl_15min_formatted.to_csv('../algotrader2/Resources/aapl_15min_df.csv', index=True)
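
The 1, 3, and 15 minute cells repeat the same fetch, separate, drop, save steps, so they could be folded into one loop. Below is a rough sketch. The helper `fetch_clean_save` is hypothetical, and the fetcher is passed in as a function so the example runs without an API key. `fake_get_bars` is a made-up stand-in that returns a DataFrame shaped like the `.df` of `alpaca.get_bars`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def fetch_clean_save(get_bars, symbol, timeframes, start, end, out_dir):
    """Fetch, clean, and save one symbol at several timeframes; return the frames."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    frames = {}
    for timeframe in timeframes:
        raw = get_bars([symbol], timeframe, start=start, end=end)
        bars = raw[raw["symbol"] == symbol].drop(columns="symbol")
        bars = bars.sort_index().dropna()
        # File names follow the existing pattern, e.g. aapl_3min_df.csv.
        bars.to_csv(out_dir / f"{symbol.lower()}_{timeframe.lower()}_df.csv", index=True)
        frames[timeframe] = bars
    return frames


def fake_get_bars(symbols, timeframe, start=None, end=None):
    """Hypothetical stand-in for the Alpaca call: three bars, one with a missing close."""
    idx = pd.date_range("2023-01-03 09:30", periods=3, freq="min", tz="America/New_York")
    return pd.DataFrame({"symbol": symbols[0], "close": [100.0, None, 102.0]}, index=idx)


out_dir = tempfile.mkdtemp()
frames = fetch_clean_save(fake_get_bars, "AAPL", ["3Min", "15Min"], None, None, out_dir)
print(sorted(path.name for path in Path(out_dir).iterdir()))
```

With the real client, the fetcher could be `lambda *args, **kwargs: alpaca.get_bars(*args, **kwargs).df`, which mirrors the calls in the cells above.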

Nice! Now we will divide the code up strategically, so that we can dive into each moving part and tune it as needed. Our model should become more and more sophisticated and accurate over future iterations until our goal of consistent profitability is met.