DS 3000 Spring 2021

# Stock Price Predictor

# Motivation:

## Problem

Stock prices are volatile, and it is difficult to determine how a stock will perform in the future. Evaluating a company and estimating its future performance requires extensive research into the company and analysis of its historical data. Gathering and examining that data is difficult and time-consuming.

## Solution

Yahoo Finance has information on every publicly traded company, including records of each company's historical data that can be downloaded. It also monitors many different aspects of a company and its stock price, and it constantly updates and records this data. The goal of this project is to use visualization and technical analysis on historical data to determine a stock's value and risk, and then predict its future performance.

We will also use historical data on many different financial indicators, including the company's assets, earnings, liabilities, etc., to find associations with changes in the stock price. This data comes from the SEC, and we will download it as CSV files.

## Impact

If successful, this project would offer a way to process large amounts of data on a company and extract insights that would be impractical to produce by hand. It could assist in making trading decisions by taking into account a stock's risk, value, and predicted performance. However successful it is, though, it cannot account for the many external factors that affect a stock's price, and those factors become harder to anticipate the further into the future we look. If predictions like these were very accurate, they could change how trading occurs. If every trading decision were based on extensive data analysis and computer-generated predictions, the human element of trading would disappear: decisions would be purely logical, without the emotional component that trading has now.

# Dataset

## Details

## Part 1: Historical Stock Prices

### Dataset From Kaggle
The dataset was acquired from [Kaggle](https://www.kaggle.com/jacksoncrow/stock-market-dataset). It is sourced from Yahoo and has historical data for every company ever traded on NASDAQ, going back as far as when each company first went public. The data for each company includes:

- Date - specifies trading date
- Open - opening price
- High - maximum price during the day
- Low - minimum price during the day
- Close - close price adjusted for splits
- Adj Close - close price adjusted for both dividends and splits
- Volume - the number of shares that changed hands during a given day

This data only goes up to 01 April 2020. 

### Current Data, Also From Kaggle
To get more current data, after 01 April 2020, we can [download NASDAQ historical data](https://www.kaggle.com/jacksoncrow/download-nasdaq-historical-data). This gives us the latest historical data in the same format as the dataset above, just with more recent data for each company. For this project, I will get all of the current data from this link; the dataset above serves only as an example of the dataset's structure.




```python
import pandas as pd

# Load Apple's daily price history from the downloaded Kaggle dataset
df_aapl = pd.read_csv('NASDAQ/stocks/AAPL.csv')
df_aapl.head()
```

| Unnamed: 0 | Date | Open | High | Low | Close | Adj Close | Volume |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1980-12-12 | 0.513393 | 0.515625 | 0.513393 | 0.513393 | 0.406782 | 117258400 |
| 1 | 1980-12-15 | 0.488839 | 0.488839 | 0.486607 | 0.486607 | 0.385558 | 43971200 |
| 2 | 1980-12-16 | 0.453125 | 0.453125 | 0.450893 | 0.450893 | 0.35726 | 26432000 |
| 3 | 1980-12-17 | 0.462054 | 0.464286 | 0.462054 | 0.462054 | 0.366103 | 21610400 |
| 4 | 1980-12-18 | 0.475446 | 0.477679 | 0.475446 | 0.475446 | 0.376715 | 18362400 |
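Since risk is one of the quantities this project aims to estimate, one possible starting point is the daily return and its rolling volatility. The following is a minimal sketch, assuming the column layout listed above. The helper name `add_returns_and_volatility` and the 3-day window are illustrative choices rather than part of the project, and the sample rows are the AAPL rows shown above (without the index column), inlined so the sketch runs without the CSV file.

```python
import io

import pandas as pd

# The five AAPL rows shown above, inlined so this runs without the file.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400
1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200
1980-12-16,0.453125,0.453125,0.450893,0.450893,0.35726,26432000
1980-12-17,0.462054,0.464286,0.462054,0.462054,0.366103,21610400
1980-12-18,0.475446,0.477679,0.475446,0.475446,0.376715,18362400
"""


def add_returns_and_volatility(df, window=3):
    """Hypothetical helper: add daily return and rolling volatility columns."""
    out = df.sort_values("Date").reset_index(drop=True)
    # Daily return: relative change in the adjusted close from the previous row.
    out["Return"] = out["Adj Close"].pct_change()
    # Rolling standard deviation of returns as a simple risk measure.
    out["Volatility"] = out["Return"].rolling(window).std()
    return out


# parse_dates turns the Date column into real timestamps instead of strings.
df_sample = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Date"])
df_risk = add_returns_and_volatility(df_sample)
print(df_risk[["Date", "Adj Close", "Return", "Volatility"]])
```

With real data, a longer window (for example, 20 or 60 trading days) would probably be more informative; the short window here only suits the five sample rows.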


## Part 2: Historical Company Metrics
To get the many indicators of a company's success and progress, we can [use this data from usfundamentals.com](http://www.usfundamentals.com/download/) to download all relevant data for each company. The data comes from the SEC. It only goes back to 2010-2011, but it provides so many indicators that it should be very useful. These indicators include:


- Assets
- AssetsCurrent
- CashAndCashEquivalentsAtCarryingValue
- Liabilities
- LiabilitiesCurrent
- NetCashProvidedByUsedInFinancingActivities (yearly only)
- NetCashProvidedByUsedInInvestingActivities (yearly only)
- NetCashProvidedByUsedInOperatingActivities (yearly only)
- OperatingIncomeLoss
- PropertyPlantAndEquipmentNet
- Revenues
- LatestSnapshot
- CashAndCashEquivalentsAtCarryingValue
- ComprehensiveIncomeNetOfTax
- EarningsPerShareDiluted
- Goodwill

Each of these 16 categories has both yearly and quarterly datasets. Each CSV file has one row per company, identified by SEC ID, and one column per year or quarter. For example:


```python
# Load the quarterly Assets file: one row per company, one column per quarter
df_assets = pd.read_csv('Assets-quarterly.csv')
df_assets.head()
```

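| Unnamed: 0 | SEC ID | 2011Q1 | 2011Q2 | 2011Q3 | 2011Q4 | 2012Q1 | 2012Q2 | 2012Q3 | 2012Q4 | 2013Q1 | ... | 2015Q1 | 2015Q2 | 2015Q3 | 2015Q4 | 2016Q1 | 2016Q2 | 2016Q3 | 2016Q4 | 2017Q1 | 2017Q2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1003130 | NaN | 75578000.0 | 69126000.0 | 66433000.0 | 69707000.0 | 60158000.0 | 52906000.0 | 48968000.0 | 50060000.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1133116 | NaN | 35.0 | 220.0 | 75013.0 | 21749.0 | 47437.0 | 38114.0 | 38144.0 | 31711.0 | ... | 18409.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 912513 | 3826500000.0 | 3857000000.0 | 3630700000.0 | 3536700000.0 | 3715900000.0 | 3858400000.0 | 3955900000.0 | 3922100000.0 | 3961200000.0 | ... | 2722700000.0 | 2776400000.0 | 2808100000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 1101026 | NaN | NaN | 366051.0 | 463494.0 | 417429.0 | NaN | NaN | 234004.0 | 221235.0 | ... | 183030.0 | 119244.0 | 101796.0 | 72680.0 | 349598.0 | 90846.0 | 749910.0 | 737292.0 | 1238745.0 | NaN |
| 4 | 23197 | 998563000.0 | 937509000.0 | 853373000.0 | NaN | 765400000.0 | 719778000.0 | 705594000.0 | NaN | 684747000.0 | ... | 471487000.0 | 473795000.0 | 459042000.0 | NaN | 462170000.0 | 921196000.0 | 903452000.0 | NaN | 883508000.0 | 863758000.0 |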
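Because each metric file stores quarters as columns, it may be easier to combine the metrics with price data after reshaping them into a long format with one row per company and quarter. The following is a minimal sketch of that reshaping, assuming the layout shown above. The helper name `to_long` and the column names `Period` and `Assets` are illustrative, and the sample holds only the first four quarters of two of the companies above.

```python
import io

import pandas as pd

# First four quarters for two companies from the output above.
SAMPLE_CSV = """SEC ID,2011Q1,2011Q2,2011Q3,2011Q4
912513,3826500000.0,3857000000.0,3630700000.0,3536700000.0
23197,998563000.0,937509000.0,853373000.0,
"""


def to_long(df_wide, value_name):
    """Hypothetical helper: one row per (SEC ID, quarter) instead of one column per quarter."""
    long_df = df_wide.melt(id_vars="SEC ID", var_name="Period", value_name=value_name)
    # Drop quarters for which the company reported no value.
    long_df = long_df.dropna(subset=[value_name]).reset_index(drop=True)
    # Turn labels such as "2011Q1" into quarterly periods so they sort and compare correctly.
    long_df["Period"] = pd.PeriodIndex(long_df["Period"], freq="Q")
    return long_df.sort_values(["SEC ID", "Period"]).reset_index(drop=True)


df_wide = pd.read_csv(io.StringIO(SAMPLE_CSV))
df_long = to_long(df_wide, "Assets")
print(df_long)
```

The same helper could be applied to each of the metric files in turn, and the resulting long tables merged on `SEC ID` and `Period`.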


# Method:

This is a regression problem. Given information about past changes in a company's stock performance, along with information about its finances, we hope to estimate the company's future performance. We will try to find associations between the numbers in the company's finances, along with the changes in these numbers over time, and the changes in the stock's price.
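The price data is daily while the financial metrics are quarterly, so relating the two requires putting prices on a quarterly grid first. One way to do this, sketched here under stated assumptions, is to take the last adjusted close in each quarter and compute its quarter-over-quarter change. The helper name `quarterly_price_change` is hypothetical, and the prices are made-up numbers for illustration only, not real quotes.

```python
import pandas as pd

# Made-up daily prices, for illustration only (not real quotes).
prices = pd.DataFrame({
    "Date": pd.to_datetime([
        "2011-01-03", "2011-03-31",
        "2011-04-01", "2011-06-30",
        "2011-07-01", "2011-09-30",
    ]),
    "Adj Close": [10.0, 11.0, 11.5, 12.1, 12.0, 9.68],
})


def quarterly_price_change(df):
    """Hypothetical helper: relative change in the last adjusted close of each quarter."""
    ordered = df.sort_values("Date")
    # Map every trading date to the calendar quarter it falls in.
    period = ordered["Date"].dt.to_period("Q").rename("Period")
    # Last adjusted close observed within each quarter.
    last_close = ordered.groupby(period)["Adj Close"].last()
    return last_close.pct_change().rename("PriceChange")


price_change = quarterly_price_change(prices)
print(price_change)
```

The resulting series is indexed by quarter, so it can be joined to quarterly metrics that use the same quarter labels.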

We will use these associations, together with any current or new data, to predict the future of the company's stock price. If we get new earnings calls or other financial reports, we hope to be able to predict how they will change the stock price.
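To make the idea concrete, the simplest form such an association could take is an ordinary least-squares linear regression from changes in financial metrics to the change in price. The sketch below is only an assumption about one possible model, not the method this project commits to. The function names `fit_linear` and `predict` are hypothetical, and the numbers are synthetic, generated from an exact linear rule so the fit can be checked.

```python
import numpy as np

# Synthetic quarterly changes, for illustration only.
# Columns: relative change in Assets, relative change in Revenues.
X = np.array([
    [0.02, 0.05],
    [0.01, -0.03],
    [0.04, 0.08],
    [-0.02, -0.01],
    [0.03, 0.02],
])
# Price changes generated from a known linear rule (assumed, not estimated from real data).
y = 0.01 + X @ np.array([0.5, 1.5])


def fit_linear(X, y):
    """Ordinary least squares; returns [intercept, coefficient per feature]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X_new):
    """Predicted price change for new rows of metric changes."""
    X_new = np.atleast_2d(X_new)
    return coef[0] + X_new @ coef[1:]


coef = fit_linear(X, y)
# Predicted price change after a hypothetical new report.
new_report = predict(coef, [0.02, 0.01])
print(coef, new_report)
```

Real data will not follow an exact linear rule, so in practice the fit would need to be evaluated on quarters held out from training.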