# Stock Prediction with Neural Networks
**Bennett Nelson - CS440 Final Project**

## Introduction
Although I have enjoyed learning about many topics from the realm of AI over the course of the semester, neural networks caught my attention most prominently. This is in large part due to the many useful applications for this technology that come to my mind. However, neural networks have been, admittedly, the most difficult concept for me to grasp out of the various algorithms that were covered throughout the semester. From my perspective, this project seems like a great opportunity for me to challenge myself and solidify my understanding of neural networks, which will hopefully allow me to use them to greater effect in the future.

In order to explore both the creation and capabilities of neural networks to a greater extent, I have decided to use them in a rather large-scale problem: Stock Price Prediction. I believe that this problem will serve as a detailed look at the strengths and limitations of regression neural networks as well as a useful comparison between the accuracies and efficiencies of various network complexities and optimization algorithms for those networks. Due to the scale of the stock prediction problem, there will be many datasets available given the number of stocks in the market, and there will be plenty of variability between these datasets to analyze. For example, differences in industry or even the time resolution at which data is collected in a set can be tested. Overall, the main goal in working with this problem is to explore key variables in neural networks such as optimization algorithms, hidden layer complexities, network sizes, and sizes of training datasets in order to predict something as complex as a stock price with as much accuracy as possible.

## Gathering Data
Before any neural networks can be compared, used, or even trained, an efficient method of gathering historical stock price and indicator data must be found. Originally, I had planned on using data from a popular investment research organization known as Morningstar. Their website includes charting tools which also allow for the exporting of a stock's closing prices over a given amount of time in CSV format. While this information would certainly be helpful, it is not enough for my purposes, and the export process is more tedious than I would have liked. Therefore, I began searching for another solution, which I found in a free-to-use stock API called Alpha Vantage.

Using Alpha Vantage provides very quick and easy access to a variety of statistics about any given stock. For example, a stock's open, high, low, and close prices, as well as trading volume, can be rapidly collected into a CSV file with the option of selecting the time interval between data points for that stock. Data can be sampled at intervals as fine as 1 minute or as coarse as 1 month. This will allow me to have great control over what is included in each dataset for a stock. In addition to data about a stock's price and trading volume, Alpha Vantage also provides access to popular technical indicators used by traders to predict the direction a stock will move.


### Technical Indicators
In the investment world, technical analysis is a method of trading which involves the careful studying of price and volume as well as a great variety of models, known as technical indicators, based on the behaviors of these statistics. Although there are seemingly endless indicators to choose from, I have decided to focus on 5 of the most widely-used as they are simple yet have the potential to be greatly effective. The datasets for the stocks tested will include the following indicators:

- **Relative Strength Index (RSI):** Using closing prices for a recent trading period, the RSI is used to determine past and present strength or weakness for a stock.

- **Moving Average Convergence Divergence (MACD):** In order to detect potential changes in a stock's strength or direction, the MACD indicator uses two exponential moving averages (a short-term average and a long-term average). Crossovers, divergences, and the overall positions of these two averages relative to each other can be used as signals for a stock's behavior.

- **Bollinger Bands:** A simple moving average is calculated from recent stock price data as well as two standard deviations above and below this average. The position of the stock's price within this range of standard deviations as well as the sizes of the standard deviations themselves can indicate overbought or oversold conditions.

- **Stochastic Oscillator:** To predict a stock's momentum, this indicator compares the stock's closing price to the range of its prices over a period of time. Its sensitivity can be manipulated by changing the length of this period of time. It is often used in many of the same ways as the RSI.

- **Commodity Channel Index (CCI):** The CCI can be used to detect patterns in a stock's behavior as well as to indicate overbought or oversold conditions. This is accomplished by finding the difference between the stock's typical price (the average of its high, low, and close) and a moving average of that price, and dividing this difference by 0.015 times the mean absolute deviation from that average.
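
To make two of these definitions concrete, here is a minimal sketch that computes Bollinger Bands and the CCI from short price lists using only the standard library. The function names, default periods, and the choice of the population standard deviation are my own assumptions for illustration; the datasets in this project take indicator values directly from Alpha Vantage rather than computing them locally.

```python
from statistics import mean, pstdev

def bollinger_bands(closes, period=5, num_std=2):
    """Return (lower, middle, upper) bands from the most recent `period` closes."""
    window = closes[-period:]
    middle = mean(window)                # simple moving average
    spread = num_std * pstdev(window)    # assumption: population standard deviation
    return middle - spread, middle, middle + spread

def cci(highs, lows, closes, period=10):
    """Commodity Channel Index for the most recent bar.

    Raises ZeroDivisionError if prices are completely flat over the window.
    """
    typical = [(h + l + c) / 3 for h, l, c in zip(highs, lows, closes)][-period:]
    sma = mean(typical)
    mean_dev = mean(abs(tp - sma) for tp in typical)
    return (typical[-1] - sma) / (0.015 * mean_dev)

# Illustrative prices only, not real market data
lower, middle, upper = bollinger_bands([2, 4, 4, 4, 5, 5, 7, 9], period=8)
latest_cci = cci([1, 2, 3], [1, 2, 3], [1, 2, 3], period=3)
```

A price near the upper band, or a CCI well above +100, is commonly read as a possible overbought condition.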


### Simplifying Data Collection
Because I will want to use my neural networks with a variety of different stocks over several different time periods, I will define some Python functions that make the API calls and compile the several files the API returns into one CSV data file. Then, I will be ready to start constructing my neural networks.

In [1]:
import urllib.request # Used for interacting with Alpha Vantage API and downloading files
import os # Checking if files exist and deleting unneeded files
from time import sleep # Must wait 1 minute after making 5 API calls
import csv # Simple reading, manipulating, and writing of CSV files

#### `createStockCSV()`
This function makes the gathering of historical data about a given stock very simple. It takes a fileName for the CSV file to be written, as well as several parameters needed for the API calls. These include the stock's ticker symbol (stockSymbol), the frequency with which data points should be taken (timeSeries), the number of data points ahead to match the current point with as needed by the neural networks (futureDataPoints), and a timeInterval which is used when the timeSeries is intraday (essentially a more specific timeSeries).

createStockCSV() downloads all necessary files in CSV format, these being price data and data for the 5 technical indicators, for the stock specified with the stockSymbol parameter. It then calls concatCSV() and createFutureColumn() to create one final CSV comprised of the data found in the separate files. Finally, the separate files are removed.

In [2]:
def createStockCSV(fileName, stockSymbol, timeSeries, futureDataPoints, timeInterval="1min"):
    if (not os.path.exists("./" + fileName)):
        stockSymbol = stockSymbol.upper()
        timeSeries = timeSeries.upper()
        timeInterval = timeInterval.lower()

        priceURL = "https://www.alphavantage.co/query?function=TIME_SERIES_" + timeSeries + "&symbol=" +\
        stockSymbol + "&apikey=DJ74AP0344E00KXN&datatype=csv&outputsize=full"
        if (timeSeries == "INTRADAY"):
            priceURL = priceURL + "&interval=" + timeInterval
        urllib.request.urlretrieve(priceURL, "./priceData.csv")

        rsiURL = "https://www.alphavantage.co/query?function=RSI&symbol=" + stockSymbol + "&interval=" +\
        (timeInterval if (timeSeries == "INTRADAY") else timeSeries.lower()) +\
        "&time_period=10&series_type=close&apikey=DJ74AP0344E00KXN&datatype=csv"
        urllib.request.urlretrieve(rsiURL, "./rsiData.csv")

        macdURL = "https://www.alphavantage.co/query?function=MACD&symbol=" + stockSymbol + "&interval=" +\
        (timeInterval if (timeSeries == "INTRADAY") else timeSeries.lower()) +\
        "&series_type=close&apikey=DJ74AP0344E00KXN&datatype=csv"
        urllib.request.urlretrieve(macdURL, "./macdData.csv")

        bandsURL = "https://www.alphavantage.co/query?function=BBANDS&symbol=" + stockSymbol + "&interval=" +\
        (timeInterval if (timeSeries == "INTRADAY") else timeSeries.lower()) +\
        "&time_period=5&series_type=close&apikey=DJ74AP0344E00KXN&datatype=csv"
        urllib.request.urlretrieve(bandsURL, "./bandsData.csv")

        sleep(60) # Must wait 1 minute as free version of API only allows 5 calls per minute

        stochURL = "https://www.alphavantage.co/query?function=STOCH&symbol=" + stockSymbol + "&interval=" +\
        (timeInterval if (timeSeries == "INTRADAY") else timeSeries.lower()) +\
        "&apikey=DJ74AP0344E00KXN&datatype=csv"
        urllib.request.urlretrieve(stochURL, "./stochData.csv")

        cciURL = "https://www.alphavantage.co/query?function=CCI&symbol=" + stockSymbol + "&interval=" +\
        (timeInterval if (timeSeries == "INTRADAY") else timeSeries.lower()) +\
        "&time_period=10&apikey=DJ74AP0344E00KXN&datatype=csv"
        urllib.request.urlretrieve(cciURL, "./cciData.csv")

        fileList = ["./priceData.csv", "./rsiData.csv", "./macdData.csv", "./bandsData.csv", "./stochData.csv", "./cciData.csv"]
        concatCSV(fileName, fileList)

        createFutureColumn(fileName, futureDataPoints)

        for file in fileList:
            os.remove(file)
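
The URLs above are all built with the same pattern of string concatenation, so one possible simplification is a small helper based on `urllib.parse.urlencode`. This is only a sketch: `buildQueryURL` and the `ALPHAVANTAGE_API_KEY` environment variable are hypothetical names that I am introducing here, and the query parameters are the same ones already used in the URLs above.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"

def buildQueryURL(function, stockSymbol, apiKey=None, **params):
    """Build an Alpha Vantage query URL that requests CSV output."""
    if apiKey is None:
        # Hypothetical environment variable, so the key stays out of the notebook
        apiKey = os.environ.get("ALPHAVANTAGE_API_KEY", "")
    query = {"function": function, "symbol": stockSymbol.upper(), **params,
             "apikey": apiKey, "datatype": "csv"}
    return BASE_URL + "?" + urlencode(query)

# Example: the RSI request from createStockCSV() for daily data
rsiURL = buildQueryURL("RSI", "sbux", apiKey="YOUR_KEY",
                       interval="daily", time_period=10, series_type="close")
```

Besides shortening the function, `urlencode` escapes any special characters in the parameter values.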

#### `concatCSV()`
As a helper function to createStockCSV(), this function simply takes a fileName for its output file as well as a list of CSV file names (fileList). It reads in each CSV file through use of the `csv` module, turning them into Python lists. Then, the timestamp column is removed from all CSV data lists except the first as it is unnecessary. The CSV data lists are concatenated together, one line from each file at a time. Once the resulting Python list is written to a file using the `csv` module, the result is a single CSV file made up of the columns of the separate CSV files.

In [3]:
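def concatCSV(finalName, fileList):
    # Read every CSV file into a list of rows
    csvList = []
    for file in fileList:
        with open(file, "r", newline="") as f:
            csvList.append(list(csv.reader(f)))

    # Drop the timestamp column from every file except the first
    for i in range(1, len(csvList)):
        csvList[i] = [row[1:] for row in csvList[i]]

    # Join the files column-wise, one row from each file at a time (rows are paired by position)
    finalCSV = csvList[0]
    for i in range(1, len(csvList)):
        finalCSV = [line1 + line2 for (line1, line2) in zip(finalCSV, csvList[i])]

    # newline="" keeps the csv module from writing blank lines between rows on Windows
    with open("./" + finalName, "w", newline="") as f:
        csv.writer(f).writerows(finalCSV)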
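
Note that concatCSV() pairs rows by position, and `zip` stops at the shortest file, so the result is only correct if every file starts at the same timestamp. As a cross-check, the sketch below joins the parsed tables on their timestamp column instead. `mergeByTimestamp` is a hypothetical helper that is not used elsewhere in this notebook, and the sample rows are made up.

```python
def mergeByTimestamp(tables):
    """Join parsed CSV tables on their first column, keeping only shared timestamps.

    Each table is a list of rows whose first row is the header and whose
    first column is the timestamp, as produced by list(csv.reader(f)).
    """
    header = list(tables[0][0])
    lookups = []
    for table in tables[1:]:
        header += table[0][1:]
        lookups.append({row[0]: row[1:] for row in table[1:]})

    merged = [header]
    for row in tables[0][1:]:
        stamp = row[0]
        # Keep the row only if every other table has this timestamp
        if all(stamp in lookup for lookup in lookups):
            merged.append(list(row) + [v for lookup in lookups for v in lookup[stamp]])
    return merged

# Made-up sample data: the indicator table is one row shorter than the price table
prices = [["timestamp", "close"], ["2020-01-03", "3"], ["2020-01-02", "2"], ["2020-01-01", "1"]]
rsi = [["time", "RSI"], ["2020-01-03", "70"], ["2020-01-02", "60"]]
merged = mergeByTimestamp([prices, rsi])
```

Here the oldest price row is dropped because the indicator table has no value for that date.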

#### `createFutureColumn()`
The final step of createStockCSV(), this function adds another column to the datafile which contains the stock's closing price from a set number of data points ahead of each point. This number is specified by the parameter dataPointsAhead. The purpose of adding this column is to give the neural networks target data to train from and predict. The length of time into the future for this price data can be changed for the same stock, and I will experiment with different values to determine how it affects the networks' prediction accuracy.

In [4]:
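def createFutureColumn(dataFile, dataPointsAhead):
    with open(dataFile, "r", newline="") as f:
        data = list(csv.reader(f))
    data[0].append("Future_Date")
    data[0].append("Future_Price")
    # Rows are ordered newest first, so the row dataPointsAhead positions earlier is the future point
    for i in range(len(data) - 1, dataPointsAhead, -1):
        data[i].append(data[i - dataPointsAhead][0]) # Future timestamp
        data[i].append(data[i - dataPointsAhead][4]) # Future closing price

    # Write only the rows that received future values; the newest rows have no future data yet
    with open("./" + dataFile, "w", newline="") as f:
        fileWriter = csv.writer(f)
        fileWriter.writerow(data[0])
        for i in range(1, len(data)):
            if (len(data[i]) == len(data[len(data) - 1])):
                fileWriter.writerow(data[i])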
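
To illustrate what this function does to the rows, here is a small in-memory sketch of the same logic applied to a made-up table. `addFutureColumns` and its `closeIndex` parameter are hypothetical names used only for this example; it assumes, as createFutureColumn() does, that the rows are ordered newest first.

```python
def addFutureColumns(rows, dataPointsAhead, closeIndex=4):
    """Return a new table (header first, newest data first) with future columns added."""
    labelled = [rows[0] + ["Future_Date", "Future_Price"]]
    # The newest dataPointsAhead rows have no future data yet, so they are skipped
    for i in range(dataPointsAhead + 1, len(rows)):
        future = rows[i - dataPointsAhead]   # earlier in the list means later in time
        labelled.append(rows[i] + [future[0], future[closeIndex]])
    return labelled

# Made-up table with only a timestamp and a close column
rows = [["timestamp", "close"],
        ["day4", "40"], ["day3", "30"], ["day2", "20"], ["day1", "10"]]
labelled = addFutureColumns(rows, dataPointsAhead=2, closeIndex=1)
```

With `dataPointsAhead=2`, the two newest rows are dropped and each remaining row is paired with the close from two data points later.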

### The First Dataset
Now that all aspects of the createStockCSV() function are complete, I can use it to create the dataset for the first stock I will test. I have decided to start with Starbucks (SBUX) as it is a well-known, stable company with a long history of price data. For the first test, I will set the time interval to daily, giving the neural networks a good number of samples to work with. I will also start by attempting to predict prices 30 data points ahead, which for daily data means 30 trading days (a little over one calendar month), in order to get a baseline for how well the networks perform with what they have been given. Running the line below will create a custom CSV file fitting these specifications.

In [5]:
createStockCSV("SBUX_daily_30_data.csv", "SBUX", "daily", 30)

## Introducing Neural Networks

In [6]:
import torch
import time
import numpy as np

def trainNetwork(X, T, learningRate, numHiddenUnits, numIterations, optimizerName):
    
    class StockNN(torch.nn.Module):
        def __init__(self, numInputs, numHiddenUnits, numOutputs):
            super(StockNN, self).__init__()
            self.hidden = torch.nn.Linear(numInputs, numHiddenUnits)
            self.tanh = torch.nn.Tanh()
            self.output = torch.nn.Linear(numHiddenUnits, numOutputs)
            
        # Must be named forward() so that calling stockNN(X) invokes it
        def forward(self, X):
            out = self.hidden(X)
            out = self.tanh(out)
            out = self.output(out)
            return out
        
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("Running on", device)
    
    # Move the data to the training device; targets need shape (numSamples, 1) to match the outputs
    Xtc = X.double().to(device)
    Ttc = T.double().reshape(-1, 1).to(device)
    
    # One input unit per feature column, a single hidden layer, and one output (the future price)
    stockNN = StockNN(Xtc.shape[1], numHiddenUnits, 1).to(device).double()
    
    available_optimizers = {"SGD": torch.optim.SGD, "ADAM": torch.optim.Adam }
    optimFunc = available_optimizers[optimizerName.upper()]
    optimizer = optimFunc(stockNN.parameters(), lr=learningRate)
    lossFunc = torch.nn.MSELoss()
    
    errors = []
    startTime = time.time()
    
    for iteration in range(numIterations):
        outputs = stockNN(Xtc)
        loss = lossFunc(outputs, Ttc)
        errors.append(torch.sqrt(loss).item()) # Store the RMSE as a plain float
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    print("Training took {} seconds.".format(time.time() - startTime))
    return stockNN, errors

In [19]:
# Fall back to the CPU when no NVIDIA driver is available; calling .cuda() unconditionally
# raises an AssertionError ("Found no NVIDIA driver on your system") on such machines
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Xsbux = torch.from_numpy(np.loadtxt("SBUX_daily_30_data.csv", delimiter=",", skiprows=1, usecols=range(1,16))).to(device)
Tsbux = torch.from_numpy(np.loadtxt("SBUX_daily_30_data.csv", delimiter=",", skiprows=1, usecols=17)).to(device)

In [None]:
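# The optimizer name is a required argument; "ADAM" is assumed here ("SGD" is the other option)
sbuxNN, errors_sbuxNN = trainNetwork(Xsbux, Tsbux, 0.01, 100, 1000, "ADAM")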
