This cell loads the full price history for the 'AAPL' stock. To get a specific range of dates, change the
period from "max" to a specific start_date and end_date. You can also set the interval to minutes,
hours, days, or weeks.

Once the data is loaded into the stockData dictionary, it is dumped to the stockData.pkl pickle file.

In [None]:
import yfinance as yf
# The data is stored in this dictionary and can be accessed by stockData[symbol]
stockData = {}

# Just choosing a random baseline stock
symb = 'AAPL'

# Loading the stock data from Yahoo finance API
stock = yf.Ticker(symb)
data = stock.history(period="max")
# Keep the first five columns (Open, High, Low, Close, Volume) as plain lists
data = data.values.tolist()
data = [[i[j] for j in range(5)] for i in data]

# This is just making sure that the data was loaded properly
if len(data) > 10:
    stockData[symb] = data

# Saves the data in a pickle file
import pickle
with open('Data/HistoricalData/stockData.pkl', 'wb') as f:
    pickle.dump(stockData, f)
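
The description of the first cell mentions swapping period="max" for an explicit date range and picking an interval. Below is a minimal sketch of how those arguments might be assembled. `history_kwargs` and `load_history` are hypothetical helper names (not part of yfinance), and the dates are placeholders; `period`, `start`, `end`, and `interval` are keyword arguments of `Ticker.history`.

```python
def history_kwargs(start_date=None, end_date=None, interval="1d"):
    """Build keyword arguments for Ticker.history (hypothetical helper)."""
    kwargs = {"interval": interval}
    if start_date is None and end_date is None:
        # No range given: fall back to the full history, as in the cell above.
        kwargs["period"] = "max"
    else:
        if start_date is not None:
            kwargs["start"] = start_date
        if end_date is not None:
            kwargs["end"] = end_date
    return kwargs


def load_history(symbol, **kwargs):
    """Fetch price history. yfinance is imported lazily so the helper above runs offline."""
    import yfinance as yf
    return yf.Ticker(symbol).history(**kwargs)


full = history_kwargs()
ranged = history_kwargs("2020-01-01", "2020-12-31", interval="1wk")
print(full)
print(ranged)
# data = load_history("AAPL", **ranged)  # needs network access, so left commented out
```

Passing either a period or a start/end pair, but not both, keeps the request unambiguous.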

This builds on the previous code. It loads one-minute data for the last seven days, for every
stock listed on the NASDAQ stock exchange. This takes a minute.

In [None]:
import pandas as pd
import requests
import io
import yfinance as yf

# This loads all the stock symbols listed on the NASDAQ.
url="https://pkgstore.datahub.io/core/nasdaq-listings/nasdaq-listed_csv/data/7665719fb51081ba0bd834fde71ce822/nasdaq-listed_csv.csv"
s = requests.get(url).content
companies = pd.read_csv(io.StringIO(s.decode('utf-8')))
symbols = companies['Symbol'].tolist()

# This is the number of time steps (days, hours, or minutes, depending on the interval) of historical
# data that should be returned when the Yahoo Finance API is called. I just run the code above and
# take the count from the Apple stock. I do this because some stocks aren't loaded fully or are
# missing time steps. That would lead to misaligned data, so I just ignore those cases.
numTimeSteps = 2725
# This is the list of symbols that were successfully loaded from yfinance.
symbList = []
# Dictionary where the historical data is stored. It can be accessed with stockData[symbol]
stockData = {}
# Loop through all stocks listed on the NASDAQ and attempt to load them.
for symb in symbols:
    try:
        print(symb)
        # Loading the stock
        stock = yf.Ticker(symb)
        # Formatting the data
        data = stock.history(period="7d", interval='1m')
        dates = data.index.tolist()
        # Keep the first five columns (Open, High, Low, Close, Volume) as plain lists
        data = data.values.tolist()
        data = [[i[j] for j in range(5)] for i in data]
        # If the number of time steps is not correct, the stock is not added.
        if len(data) == numTimeSteps:
            stockData[symb] = data
            symbList.append(symb)
    except Exception:
        # Skip any symbol that fails to load
        continue
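
The length check in the loop is what keeps every stock aligned to the same time steps. A small offline sketch of that idea follows, with the expected length taken from a baseline symbol instead of being hard-coded. `expected_length` and `keep_aligned` are hypothetical helper names, and the rows are made-up placeholder values, not real prices.

```python
def expected_length(series_by_symbol, baseline="AAPL"):
    """Number of time steps in the baseline symbol's history."""
    return len(series_by_symbol[baseline])


def keep_aligned(series_by_symbol, n_steps):
    """Keep only the symbols whose history has exactly n_steps rows."""
    return {
        symb: rows
        for symb, rows in series_by_symbol.items()
        if len(rows) == n_steps
    }


# Placeholder rows in [Open, High, Low, Close, Volume] order
toy = {
    "AAPL": [[1.0, 2.0, 0.5, 1.5, 100]] * 4,
    "GOOD": [[3.0, 4.0, 2.5, 3.5, 200]] * 4,
    "SHORT": [[5.0, 6.0, 4.5, 5.5, 300]] * 3,  # missing one time step
}

n_steps = expected_length(toy)
aligned = keep_aligned(toy, n_steps)
print(n_steps, sorted(aligned))
```

Here "SHORT" is dropped because it has three rows where the baseline has four.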

Now use pickle to dump the data to files to be used later.

In [None]:
import pickle
with open('7wData.pkl', 'wb') as f:
    pickle.dump(stockData, f)
with open('7wLst.pkl', 'wb') as f:
    pickle.dump(symbList, f)
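
To check that the dumped files can be read back later, a round trip like the one below may help. This is only a sketch: it writes to a temporary directory rather than the working directory, and the dictionary contents are placeholder values.

```python
import os
import pickle
import tempfile

# Placeholder stand-ins for the stockData and symbList built above
stock_data = {"AAPL": [[1.0, 2.0, 0.5, 1.5, 100]]}
symb_list = ["AAPL"]

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "7wData.pkl")
    list_path = os.path.join(tmp, "7wLst.pkl")

    # Dump both objects; the with-blocks close the files automatically
    with open(data_path, "wb") as f:
        pickle.dump(stock_data, f)
    with open(list_path, "wb") as f:
        pickle.dump(symb_list, f)

    # Load them back to confirm nothing was lost
    with open(data_path, "rb") as f:
        restored_data = pickle.load(f)
    with open(list_path, "rb") as f:
        restored_list = pickle.load(f)

print(restored_data == stock_data, restored_list == symb_list)
```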

This is an example of using TA-Lib to get some technical indicator data. Specifically, this example gets
the RSI, an indicator that uses moving averages of recent gains and losses to determine whether a stock is
overbought or oversold. It also computes the MACD, an indicator of the stock's current momentum.
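
To make the RSI description concrete, here is a dependency-free sketch of one common formulation (Wilder's smoothing of average gains and losses). It is an illustration only, with a hypothetical function name `rsi`, and is not guaranteed to match TA-Lib's output digit for digit.

```python
def rsi(closes, period=14):
    """Relative Strength Index; entries are None during the warm-up window."""
    if len(closes) <= period:
        return [None] * len(closes)

    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed the averages with a simple mean over the first `period` changes
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    def to_rsi(gain, loss):
        # Assumed convention: with no losses in the window, RSI is 100
        if loss == 0:
            return 100.0
        return 100.0 - 100.0 / (1.0 + gain / loss)

    out = [None] * period + [to_rsi(avg_gain, avg_loss)]
    for gain, loss in zip(gains[period:], losses[period:]):
        # Wilder's smoothing: blend the previous average with the new value
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        out.append(to_rsi(avg_gain, avg_loss))
    return out


rising = rsi([float(p) for p in range(1, 21)])
falling = rsi([float(p) for p in range(20, 0, -1)])
choppy = rsi([1.0, 2.0, 1.0, 2.0, 1.0], period=2)
print(rising[-1], falling[-1], choppy[2])
```

A steadily rising series pins the RSI at its upper bound and a steadily falling one at its lower bound, which is why high readings are read as overbought and low readings as oversold.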

In [None]:
import pickle
import talib as ta
import numpy as np

## Code if I need to load any previous data
# f = open('20yrData.pkl','rb')
# stockData= pickle.load(f)
# f.close()
# f = open('20yrLst.pkl','rb')
# symbList= pickle.load(f)
# f.close()

# Stores the RSI for each time step for each symbol. rsiDict[symbol] will be a stock's historical RSI values.
rsiDict = {}
# Stores the close prices for each time step for each symbol. closeDict[symbol] will be a stock's
# historical close values.
closeDict = {}
# Stores the normalized MACD for each time step for each symbol. macDict[symbol] will be a stock's
# historical (MACD - signal) / close values.
macDict = {}
# This loops through each stock and finds the indicators.
for symb in symbList:
    # Getting the historical data.
    data = stockData[symb]
    # Extracting only the historical close price (column 3).
    close = np.asarray([i[3] for i in data])
    # Use TA-Lib to get the MACD.
    macdAll = ta.MACD(close)
    # Use TA-Lib to get the RSI. The first 33 datapoints are removed since they are NaN:
    # with default settings the MACD signal line needs 33 warm-up points (the RSI alone
    # needs only 14), so there must be > 33 datapoints to use.
    rsi = ta.RSI(close).tolist()[33:]
    close = close.tolist()
    # Shortening the MACD data to match the RSI, and separating it into the MACD line and
    # the signal line (two different parts of the indicator).
    macd = macdAll[0][33:]
    signal = macdAll[1][33:]
    # Shortening the close prices to match.
    close = close[33:]
    # Converting the MACD and signal to a more interpretable value: their gap relative to price.
    diff = (macd - signal) / close
    # Storing the historical data in a dictionary corresponding to the symbol.
    closeDict[symb] = close
    macDict[symb] = diff
    rsiDict[symb] = rsi
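
The offset of 33 used above is consistent with MACD's default periods (12, 26, 9): the slow average needs 25 prior points and the signal line needs 8 more. The sketch below shows that arithmetic and the normalization step on made-up numbers. `macd_warmup` and `normalized_gap` are hypothetical helper names, and the arrays are placeholders rather than real indicator output.

```python
def macd_warmup(slow=26, signal=9):
    """Leading points without a signal line: (slow - 1) + (signal - 1)."""
    return (slow - 1) + (signal - 1)


def normalized_gap(macd, signal_line, close, skip):
    """(MACD - signal) / close for every index from `skip` onward."""
    return [
        (m - s) / c
        for m, s, c in zip(macd[skip:], signal_line[skip:], close[skip:])
    ]


skip = macd_warmup()

# Placeholder series with a 2-point warm-up to keep the example short
nan = float("nan")
close = [100.0, 100.0, 50.0, 200.0]
macd = [nan, nan, 1.0, 3.0]
signal_line = [nan, nan, 0.5, 1.0]

gap = normalized_gap(macd, signal_line, close, skip=2)
print(skip, gap)
```

Dividing by the close price puts stocks with very different price levels on a comparable scale.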


