<h1>Stock Analysis Using Volume and Price Changes</h1>
<h2>Matt Quinlan and Wes Brown</h2>
<h2>Introduction and Gathering the Data</h2>

<h3>Problem Identification</h3>
In the stock market world, there are two large groups of investors with different mindsets about how to choose which investments to put their money in.

The first group practices "fundamental" analysis, which focuses on the overall value of a stock. These investors generally plan to hold a stock for a long period of time, so that they can watch the price rise to reflect the value that they see. They like to follow the mantra of "buy low, sell high".

The second group practices "technical" analysis, which focuses much more on day-to-day trends. These investors are commonly known as day traders, since they may buy and sell the same stock on the same day. Day traders look for trends in the data related to a specific stock: is the stock price moving up in a pattern that they have seen before? These investors follow the mantra of "buy when it's going up, sell before it goes down".

For our project, we focus on those practicing technical analysis, the day traders. We would like to create a tool that helps those looking at the day-to-day trends of a stock decide which stock to buy when comparing it to other options.

<h3>Goal Determination</h3>
In the world of stock analysis, there is an almost overwhelming amount of data available to anyone. For our analysis, we focus on two key pieces of information: the trade volume and the stock price (and data derived from it).

Volume data is useful because it indicates how much attention a stock is currently receiving. When the volume is near its average, there may not be much going on for that company. When the volume spikes, a number of different things may be occurring, for good or bad reasons. For example, the company may have announced an acquisition, or a legal situation may have been brought to light.
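
One way to put a number on "near its average" versus "spiking" is to divide each day's volume by the average volume of the days before it. The sketch below is only an illustration: the helper name `add_relative_volume`, the `Relative Volume` column, and the 20-day default window are assumptions and are not part of this notebook.

```python
import pandas as pd


def add_relative_volume(data_set, window=20):
    """Return a copy with each day's volume divided by the mean of the preceding days.

    Hypothetical helper: the column name and the window length are assumptions.
    """
    result = data_set.copy()
    # shift(1) keeps the current day out of its own baseline
    baseline = result["Volume"].shift(1).rolling(window=window, min_periods=1).mean()
    result["Relative Volume"] = result["Volume"] / baseline
    return result


# Made-up volumes for illustration only
sample = pd.DataFrame({"Volume": [100, 100, 100, 100, 400]})
sample = add_relative_volume(sample, window=3)
print(sample)
```

A value close to 1 means the volume is near its recent average, while a value well above 1 marks a spike. The first row has no preceding days, so its value is left as NaN.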

Once we have information regarding the volume of the trades, we will then look at the stock price information. Is the stock price trending up or down? By how much is it trending up or down? Has it been trending up or down over the last few days? We will look at raw values as well as percentages for the prices.

As input, we would like the user of our tool to provide a series of different stocks that they are looking at. We will then gather data related to each, add a few calculated fields, create a model for each stock, predict the intraday change for the stock for that day, then return the stock with the highest predicted intraday change based on percentage.
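
The last step of that pipeline, returning the stock with the highest predicted change, can be sketched as follows. The modeling step is not covered in this notebook, so the predictions here are made-up numbers, and `pick_best_stock` is a hypothetical helper name.

```python
def pick_best_stock(predicted_pct_changes):
    """Return the (ticker, value) pair with the highest predicted intraday pct change.

    Hypothetical helper: expects a dict that maps ticker symbols to predicted values.
    """
    best_ticker = max(predicted_pct_changes, key=predicted_pct_changes.get)
    return best_ticker, predicted_pct_changes[best_ticker]


# Made-up predictions for illustration only; real values would come from the models
predictions = {"WFC": -0.004, "MSFT": 0.006, "INTC": 0.001}
best_ticker, best_value = pick_best_stock(predictions)
print(best_ticker, best_value)
```

Note that this notebook computes Intraday Change as Open minus Close, so under that convention a positive value corresponds to a price that fell during the session. Whether to rank by the highest or the lowest value depends on which sign convention the models are trained on.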

In the rest of this notebook, we will go through the process of collecting the data and adding the calculated fields.

<h3>Imports</h3>

In [22]:
import yfinance as yf
import pandas as pd
import numpy as np
import pickle

import plotly.offline as plyo
import cufflinks as cf
import matplotlib.pyplot as plt 

<h3>Grabbing Tickers</h3>

In [4]:
all_tickers = "WFC MSFT INTC AMZN PYPL"
selected_stocks = yf.Tickers(all_tickers)
tickers = all_tickers.split(" ")

<h3>Getting the History for Each Ticker</h3>

In [5]:
selected_history = {}

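for symbol in tickers:
    # Look each ticker up by symbol instead of by position, so the loop does not
    # depend on how yf.Tickers stores its Ticker objects internally
    selected_history[symbol] = yf.Ticker(symbol).history(period="1y")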

<h3>Viewing Data</h3>

In [6]:
selected_history[tickers[0]].info()
selected_history[tickers[0]].tail(10)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 253 entries, 2019-10-31 to 2020-10-30
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          253 non-null    float64
 1   High          253 non-null    float64
 2   Low           253 non-null    float64
 3   Close         253 non-null    float64
 4   Volume        253 non-null    int64  
 5   Dividends     253 non-null    float64
 6   Stock Splits  253 non-null    int64  
dtypes: float64(5), int64(2)
memory usage: 15.8 KB


Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2020-10-19,22.889999,22.91,22.51,22.540001,37423600,0.0,0
2020-10-20,22.73,23.129999,22.690001,22.809999,34751200,0.0,0
2020-10-21,22.82,22.950001,22.6,22.700001,29528300,0.0,0
2020-10-22,22.66,23.32,22.610001,23.25,32238300,0.0,0
2020-10-23,23.58,23.59,23.120001,23.280001,24558300,0.0,0
2020-10-26,22.969999,23.02,22.6,22.700001,34034100,0.0,0
2020-10-27,22.610001,22.610001,21.82,21.82,49636500,0.0,0
2020-10-28,21.23,21.48,20.799999,21.18,54835900,0.0,0
2020-10-29,21.049999,21.379999,20.76,21.139999,46184000,0.0,0
2020-10-30,21.1,21.459999,20.84,21.450001,34592400,0.0,0


<h3>Methods for Calculated Data Points</h3>

In [7]:
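def getIntradayChangeInfo(data_set):
    """Add the intraday change columns and their previous-day values to data_set in place."""
    # Computed as Open - Close, so a positive value means the price fell during the session
    data_set["Intraday Change"] = data_set["Open"] - data_set["Close"]
    # Expressed as a fraction of the open price (not multiplied by 100)
    data_set["Intraday Pct Change"] = data_set["Intraday Change"] / data_set["Open"]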
    
    previous_day_change = 0
    previous_day_pct_change = 0
    all_previous_day_change = []
    all_previous_day_pct_change = []
    
    for index, row in data_set.iterrows():
        all_previous_day_change.append(previous_day_change)
        all_previous_day_pct_change.append(previous_day_pct_change)
        previous_day_change = row["Intraday Change"]
        previous_day_pct_change = row["Intraday Pct Change"]
        
    data_set["Previous Day Change"] = all_previous_day_change
    data_set["Previous Day Pct Change"] = all_previous_day_pct_change
    
def getFiveDayAverageForIntraChange(data_set):
    previous_five_days = pd.Series(dtype="float64")
    previous_five_days_averages = []
    for index, row in data_set.iterrows():
        previous_five_days_averages.append(previous_five_days.mean())
        previous_five_days = updateFiveDays(row["Intraday Pct Change"], previous_five_days)
        
    data_set["Previous Five Day Average Pct Change"] = previous_five_days_averages
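    # The first row has no prior days, so its mean is NaN; store 0 instead.
    # Assigning the result back avoids calling fillna(inplace=True) on a column
    # selection, which newer pandas versions warn about.
    data_set["Previous Five Day Average Pct Change"] = data_set["Previous Five Day Average Pct Change"].fillna(0)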
        

def updateFiveDays(current_day, five_days):
    """Add current_day to the window, dropping the oldest value once five are held."""
    if five_days.size == 5:
        five_days["1"] = five_days["2"]
        five_days["2"] = five_days["3"]
        five_days["3"] = five_days["4"]
        five_days["4"] = five_days["5"]
        five_days["5"] = current_day
    else:
        five_days[str(five_days.size + 1)] = current_day
    return five_days
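
The two lagged fields built by the loops above can also be expressed with pandas' built-in `shift` and `rolling` methods. The sketch below is an alternative formulation rather than the code this notebook ran: `add_lagged_features` is a hypothetical name, it returns a copy instead of modifying the frame in place, and the sample values are made up.

```python
import pandas as pd


def add_lagged_features(data_set):
    """Return a copy with previous-day and previous-five-day-average columns added.

    Hypothetical helper: expects an "Intraday Pct Change" column to already exist.
    """
    result = data_set.copy()
    pct = result["Intraday Pct Change"]
    # Yesterday's value; the first row has no yesterday, so it gets 0
    result["Previous Day Pct Change"] = pct.shift(1, fill_value=0)
    # Mean of up to five prior days, excluding the current day; the first row gets 0
    result["Previous Five Day Average Pct Change"] = (
        pct.shift(1).rolling(window=5, min_periods=1).mean().fillna(0)
    )
    return result


# Made-up values for illustration only
sample = pd.DataFrame(
    {"Intraday Pct Change": [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]}
)
features = add_lagged_features(sample)
print(features)
```

Shifting by one row before taking the rolling mean keeps the current day's value out of its own average, which matches the order of operations in the loops: the average is recorded before the current row is added to the window.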

<h3>Adding Calculated Data Points</h3>

In [8]:
for key, history in selected_history.items():
    getIntradayChangeInfo(history)
    getFiveDayAverageForIntraChange(history)
    print("For {}".format(key))
    print(history.head(10))

For WFC
                 Open       High        Low      Close    Volume  Dividends  \
Date                                                                          
2019-10-31  49.523695  49.772412  48.930598  49.389771  18814700       0.00   
2019-11-01  49.868079  50.021136  49.657624  49.915909  16359100       0.00   
2019-11-04  50.298551  50.509003  49.973304  50.432476  17559900       0.00   
2019-11-05  50.451613  51.178635  50.451613  50.987312  25965300       0.00   
2019-11-06  50.977745  51.503880  50.862949  51.465614  22051100       0.00   
2019-11-07  51.938844  52.537621  51.938844  52.151314  23452500       0.51   
2019-11-08  51.987130  52.373433  51.726373  52.247883  14722100       0.00   
2019-11-11  51.948499  52.286515  51.890552  52.199596  10659400       0.00   
2019-11-12  51.909866  52.383093  51.784316  52.363777  15187600       0.00   
2019-11-13  51.967813  52.054732  51.388351  51.465614  16846800       0.00   

            Stock Splits  Intraday Change  

2019-11-13                              0.002869  
For PYPL
                  Open        High         Low       Close   Volume  \
Date                                                                  
2019-10-31  106.470001  106.500000  103.260002  104.099998  7200500   
2019-11-01  104.699997  105.300003  103.930000  104.980003  5489500   
2019-11-04  105.720001  105.760002  102.605003  102.809998  5818400   
2019-11-05  103.050003  103.250000  100.285004  100.989998  9236200   
2019-11-06  101.199997  101.339996  100.169998  100.629997  7331900   
2019-11-07  100.940002  101.720001  100.290001  100.470001  9088400   
2019-11-08  100.025002  101.709999   99.599998  101.419998  5451300   
2019-11-11  101.040001  103.040001  100.660004  102.669998  5391000   
2019-11-12  102.970001  103.089996  100.919998  102.029999  6715200   
2019-11-13  102.000000  103.599998  101.629997  102.120003  5246600   

            Dividends  Stock Splits  Intraday Change  Intraday Pct Change  \
Date      

<h3>Viewing the Data</h3>

In [24]:
def plotResults(data_set, plot_values):
    plyo.iplot(data_set[plot_values].iplot(asFigure=True))
    plt.show()
    
plot_values = ["Previous Day Pct Change", "Previous Five Day Average Pct Change", "Intraday Pct Change"]
plotResults(selected_history["INTC"], plot_values)

plot_values = ["Previous Day Pct Change", "Intraday Pct Change"]
plotResults(selected_history["INTC"], plot_values)

plot_values = ["Previous Five Day Average Pct Change", "Intraday Pct Change"]
plotResults(selected_history["INTC"], plot_values)

In the graphs, the relationship between the fields is easy to see: Previous Five Day Average Pct Change is a smoother version of Previous Day Pct Change. While the fields are similar in what they measure, they can tell us different things. Previous Day Pct Change only gives us information about the past day: did the stock spike up or down? Previous Five Day Average Pct Change gives context around that number. For example, if the last five days were mostly down, a spike up yesterday may just be the stock moving back towards its price at the start of the week.
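
A small worked example may make that difference concrete. The numbers below are made up, and for the purpose of illustration negative values are read as down days and positive values as up days.

```python
import pandas as pd

# Made-up intraday pct changes for five consecutive sessions, most recent last
recent_pct_changes = pd.Series([-0.010, -0.012, -0.008, -0.011, 0.020])

previous_day = recent_pct_changes.iloc[-1]
five_day_average = recent_pct_changes.mean()

print(f"Previous Day Pct Change: {previous_day:+.4f}")
print(f"Previous Five Day Average Pct Change: {five_day_average:+.4f}")
```

Here the previous-day value is positive while the five-day average is still negative, so the average shows that the latest move runs against the direction of the preceding sessions.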

<h3>Serializing the Data</h3>

In [19]:
with open("data/model_data.pkl", mode="wb") as fwb:
    pickle.dump(selected_history, fwb)
    
with open("data/ticker_data.pkl", mode="wb") as fwb:
    pickle.dump(all_tickers, fwb)
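
Reading the data back in a later notebook mirrors the write step. The sketch below is a self-contained round trip under stated assumptions: `save_history` and `load_history` are hypothetical helper names, the sample frame is made up, and a temporary directory stands in for the `data/` folder. It also creates the target directory first, since `open` fails if the directory does not exist.

```python
import os
import pickle
import tempfile

import pandas as pd


def save_history(history_by_ticker, path):
    """Pickle a dict of DataFrames, creating the parent directory if needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, mode="wb") as fwb:
        pickle.dump(history_by_ticker, fwb)


def load_history(path):
    """Load a dict of DataFrames that was written by save_history."""
    with open(path, mode="rb") as frb:
        return pickle.load(frb)


# Made-up data for illustration only
sample_history = {"WFC": pd.DataFrame({"Open": [21.0, 21.5], "Close": [21.4, 21.2]})}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data", "model_data.pkl")
    save_history(sample_history, path)
    restored = load_history(path)

print(restored["WFC"])
```

Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.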