<h1>Stock Volume Changes to Price Movement Analysis</h1>
<h3>Matt Quinlan and Wes Brown</h3>

<h2>Problem Identification</h2>
In the stock market, there are two large groups of investors with different approaches to choosing which investments to put their money in.

There are those who believe in "Fundamental" analysis, which focuses on the overall value of the stock. These investors generally look to hold a stock for a long period of time so that they can watch its price rise to reflect the value they see. They follow the mantra of "buy low, sell high".

Then there are those who practice "Technical" analysis, which focuses much more on day-to-day trends. Investors in this group are commonly known as day traders, since they may buy and sell the same stock on the same day. These day traders look for trends in the data for a specific stock: is the stock price moving up in a pattern that they've seen before? They follow the mantra of "buy when it's going up, sell when it's going down".

For our project, we would like to focus on those practicing technical analysis: the day traders. We would like to create a tool that is useful to those looking at the day-to-day trends of a stock and that helps them decide whether it is a good buy.

<h2>Goal Determination</h2>
In the world of stock analysis, there is an almost overwhelming amount of data available to anyone. For our analysis, we want to focus on two key pieces of information: the trade volume and the stock price.

We will use the volume data to determine whether there is an abnormally large amount of trading in a particular stock. We will compare the volume with the stock's own past volume and with the volume of the stock market as a whole.

Once we have information about the trade volume, we will look at the stock price. Is the stock price trending up or down? By how much? For how long has it been trending in that direction? How volatile is the stock's price?

Combining the information around the volume and stock price, we will determine which category that stock fits into: Strong Buy, Buy, Weak Buy, Hold, Weak Sell, Sell, and Strong Sell.
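The analysis does not yet say how the volume and price signals will be combined into a category. As a minimal sketch, assuming they are first reduced to a single numeric score per stock, `pd.cut` can bin that score into the seven categories. The score, the cut points, and the tickers `AAA` to `DDD` are hypothetical placeholders:

```python
import pandas as pd

# Hypothetical cut points on a hypothetical composite score; the analysis
# has not yet defined how volume and price signals are combined.
CATEGORY_BINS = [float("-inf"), -3, -2, -1, 1, 2, 3, float("inf")]
CATEGORY_LABELS = ["Strong Sell", "Sell", "Weak Sell", "Hold",
                   "Weak Buy", "Buy", "Strong Buy"]


def categorize(scores: pd.Series) -> pd.Series:
    """Bin each score into one of the seven categories (right edges inclusive)."""
    return pd.cut(scores, bins=CATEGORY_BINS, labels=CATEGORY_LABELS)


# Hypothetical tickers and scores
example = pd.Series({"AAA": -3.5, "BBB": 0.2, "CCC": 1.5, "DDD": 4.0})
print(categorize(example))
```

Symmetric bins around zero keep "Hold" as the neutral band; in practice the edges would need to be tuned to the scale of whatever score is chosen.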

In the sections below, we describe where we get the data and how we use it for each part of the analysis, and then build the model.

<h2>Core Analysis and Model Building</h2>
This is where the analysis itself takes place.

<h3>Reading in the Data</h3>

In [41]:
import pandas as pd
import os

filepath = os.path.join(os.getcwd(), 'data', 'stocks-screener-10-18-2020.csv')
stock_data = pd.read_csv(filepath)

# The last line of the file is known to contain missing values; dropna()
# removes it, along with any other row that has a missing value
stock_data.dropna(inplace=True)

#Set the index for the dataframe to be the stock symbol
stock_data.set_index("Symbol", inplace=True)

stock_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 524 entries, A to ZTS
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         524 non-null    object 
 1   Open         524 non-null    float64
 2   Last         524 non-null    float64
 3   Prev Open    524 non-null    float64
 4   Previous     524 non-null    float64
 5   Open 2D Ago  524 non-null    float64
 6   Last 2D Ago  524 non-null    float64
 7   1M MA        524 non-null    float64
 8   1M High      524 non-null    float64
 9   1M Low       524 non-null    float64
 10  Volume       524 non-null    float64
 11  Prev Vol     524 non-null    float64
 12  5D Avg Vol   524 non-null    float64
 13  1M Avg Vol   524 non-null    float64
dtypes: float64(13), object(1)
memory usage: 61.4+ KB


In [42]:
stock_data.head(10)

Symbol,Name,Open,Last,Prev Open,Previous,Open 2D Ago,Last 2D Ago,1M MA,1M High,1M Low,Volume,Prev Vol,5D Avg Vol,1M Avg Vol
A,Agilent Technologies,105.95,106.7,104.1,105.32,105.49,105.06,101.59,107.54,95.44,1039366.0,722900.0,917600.0,1029091.0
AAL,American Airlines Gp,12.31,12.46,12.22,12.23,12.35,12.36,12.61,14.08,11.22,32717299.0,33776102.0,41473102.0,56220273.0
AAP,Advance Auto Parts Inc,157.52,154.99,154.98,157.52,158.16,156.58,153.74,160.77,142.46,477153.0,697400.0,645960.0,723400.0
AAPL,Apple Inc,121.28,119.02,118.72,120.71,121.0,121.19,114.75,125.39,103.1,115393805.0,112559195.0,176314484.0,153805250.0
ABBV,Abbvie Inc,85.89,86.27,85.41,85.23,86.83,86.07,87.34,90.81,84.96,5362675.0,6193700.0,6517480.0,6981727.0
ABC,Amerisourcebergen Corp,99.6,99.52,97.89,99.35,97.04,98.44,96.64,100.61,92.0,1200554.0,1270500.0,945580.0,851527.0
ABMD,Abiomed Inc,283.58,286.48,275.09,283.57,280.0,280.84,271.14,289.31,255.4,262600.0,263900.0,240680.0,314805.0
ABT,Abbott Laboratories,107.7,109.67,106.65,107.32,108.61,107.75,106.81,111.57,100.34,4363766.0,3518900.0,4209960.0,4986073.0
ACN,Accenture Plc,229.03,230.05,225.59,228.77,228.88,229.43,226.99,239.35,210.42,1796451.0,1273400.0,1534360.0,2199391.0
ADBE,Adobe Systems Inc,504.0,502.82,499.26,501.15,514.34,506.31,488.91,519.6,452.52,2441300.0,2043100.0,2421040.0,2789996.0


<h3>Calculate Volume Data</h3>
In the following sections we will calculate values that we need in order to understand the changes in volume for each of the stocks in the data set.

<h4>Daily Volume Change</h4>
One of the values we look at is the change in volume from the previous day. We calculate this from the Prev Vol and Volume values, which represent the previous day's volume and the current day's volume, respectively.
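The same relative-change formula, (new - old) / old, is reused for every volume and price comparison in this notebook. A minimal sketch of it as a reusable function follows. The helper name `relative_change` and the choice to return NaN rather than infinity when the previous volume is zero are our own assumptions, and `XYZ` is a hypothetical ticker included only to exercise that case:

```python
import numpy as np
import pandas as pd


def relative_change(new: pd.Series, old: pd.Series) -> pd.Series:
    """Return (new - old) / old, with NaN instead of +/-inf where old is zero."""
    old = old.replace(0, np.nan)
    return (new - old) / old


# A and AAL come from the screener data; XYZ is a hypothetical zero-volume case
volume = pd.Series({"A": 1039366.0, "AAL": 32717299.0, "XYZ": 500.0})
prev_vol = pd.Series({"A": 722900.0, "AAL": 33776102.0, "XYZ": 0.0})
print(relative_change(volume, prev_vol).round(6))
```

On the notebook data this would be called as `relative_change(stock_data["Volume"], stock_data["Prev Vol"])`; the values for A and AAL agree with the Daily Volume Change column.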

In [43]:
stock_data["Daily Volume Change"] = (stock_data["Volume"] - stock_data["Prev Vol"]) / stock_data["Prev Vol"]
stock_data[["Volume", "Prev Vol","Daily Volume Change"]]

Symbol,Volume,Prev Vol,Daily Volume Change
A,1039366.0,722900.0,0.437773
AAL,32717299.0,33776102.0,-0.031348
AAP,477153.0,697400.0,-0.315812
AAPL,115393805.0,112559195.0,0.025183
ABBV,5362675.0,6193700.0,-0.134173
...,...,...,...
ZBH,770258.0,991000.0,-0.222747
ZBRA,218700.0,204200.0,0.071009
ZION,1831200.0,1852700.0,-0.011605
ZM,13745700.0,16520600.0,-0.167966


<h4>Entire Stock Market Volume Change</h4>
When looking at the volume data, we want to understand whether a volume change is specific to that stock or whether something is affecting the stock market overall. For example, when the Federal Reserve adjusts interest rates, trading activity across the market often spikes. To account for these market-wide changes, we compare each stock's volume change with the volume change of all stocks in the data set combined, which serves as a proxy for the overall market.

<b>We may remove this since it requires the entire stock market for the calculation.</b>
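Summing Volume and Prev Vol across all stocks, as this section does, makes the market-wide change effectively volume-weighted: heavily traded names dominate it. The toy example below (hypothetical tickers and volumes) contrasts that with an equal-weighted alternative, the mean of the per-stock changes. Which one is the better benchmark is a judgment call the analysis has not settled:

```python
import pandas as pd

# Toy data (hypothetical values): one heavily traded stock and two light ones
toy = pd.DataFrame(
    {"Volume": [150.0, 12.0, 9.0], "Prev Vol": [100.0, 12.0, 10.0]},
    index=["BIG", "MID", "SMALL"],
)

# Aggregate (volume-weighted) change, computed from summed volumes
total_change = (toy["Volume"].sum() - toy["Prev Vol"].sum()) / toy["Prev Vol"].sum()

# Equal-weighted alternative: the average of the per-stock changes
per_stock = (toy["Volume"] - toy["Prev Vol"]) / toy["Prev Vol"]
equal_weighted_change = per_stock.mean()

print(round(total_change, 4), round(equal_weighted_change, 4))
```

Here BIG's 50% jump pulls the aggregate to about 40%, while the equal-weighted average is only about 13%.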

In [44]:
overall_today_volume = stock_data["Volume"].sum()
overall_yesterday_volume = stock_data["Prev Vol"].sum()
overall_volume_change = (overall_today_volume - overall_yesterday_volume) / overall_yesterday_volume

stock_data["Adjusted Daily Vol Change"] = stock_data["Daily Volume Change"] - overall_volume_change

print("Overall Stock Market Change: {}".format(overall_volume_change))
stock_data[["Daily Volume Change","Adjusted Daily Vol Change"]]

Overall Stock Market Change: 0.0942213759371265


Symbol,Daily Volume Change,Adjusted Daily Vol Change
A,0.437773,0.343551
AAL,-0.031348,-0.125569
AAP,-0.315812,-0.410033
AAPL,0.025183,-0.069038
ABBV,-0.134173,-0.228394
...,...,...
ZBH,-0.222747,-0.316968
ZBRA,0.071009,-0.023213
ZION,-0.011605,-0.105826
ZM,-0.167966,-0.262187


<h4>Previous Volume vs 5-day Average</h4>
When looking at the volume data, we would also like to know whether the previous day's volume is an outlier. We do that with the next few analyses, starting with the previous day's volume compared to the average volume over the last five days.

In [45]:
stock_data["Prev Vol vs 5D Avg Vol"] = (stock_data["Prev Vol"] - stock_data["5D Avg Vol"]) / stock_data["5D Avg Vol"]
stock_data[["Prev Vol","5D Avg Vol", "Prev Vol vs 5D Avg Vol"]]

Symbol,Prev Vol,5D Avg Vol,Prev Vol vs 5D Avg Vol
A,722900.0,917600.0,-0.212184
AAL,33776102.0,41473102.0,-0.185590
AAP,697400.0,645960.0,0.079633
AAPL,112559195.0,176314484.0,-0.361600
ABBV,6193700.0,6517480.0,-0.049679
...,...,...,...
ZBH,991000.0,958980.0,0.033390
ZBRA,204200.0,254720.0,-0.198335
ZION,1852700.0,1427260.0,0.298082
ZM,16520600.0,10902939.0,0.515243


<h4>Daily Volume vs Monthly Average</h4>
Another factor we would like to take into account is the stock's average volume over the past month. Even when there is a large increase from yesterday's volume to today's, today's volume may not be the outlier; yesterday's volume may instead have been unusually low compared with the average. To account for that, we compare the daily volume with the monthly average.
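Agilent (A) illustrates this point with values from the screener data: its volume rose about 44% from the previous day, yet it was only about 1% above its monthly average, because the previous day's volume was about 30% below that average. A quick check in plain Python:

```python
# Agilent (A) values from the screener data
volume = 1039366.0       # today's volume
prev_vol = 722900.0      # previous day's volume
monthly_avg = 1029091.0  # 1M Avg Vol

day_over_day = (volume - prev_vol) / prev_vol
today_vs_month = (volume - monthly_avg) / monthly_avg
prev_vs_month = (prev_vol - monthly_avg) / monthly_avg

# prints: 43.8% 1.0% -29.8%
print(f"{day_over_day:.1%} {today_vs_month:.1%} {prev_vs_month:.1%}")
```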

In [46]:
stock_data["Monthly Vol vs Daily Vol"] = (stock_data["Volume"] - stock_data["1M Avg Vol"]) / stock_data["1M Avg Vol"]
stock_data[["Volume","1M Avg Vol", "Monthly Vol vs Daily Vol"]]

Symbol,Volume,1M Avg Vol,Monthly Vol vs Daily Vol
A,1039366.0,1029091.0,0.009985
AAL,32717299.0,56220273.0,-0.418052
AAP,477153.0,723400.0,-0.340402
AAPL,115393805.0,153805250.0,-0.249741
ABBV,5362675.0,6981727.0,-0.231898
...,...,...,...
ZBH,770258.0,965282.0,-0.202038
ZBRA,218700.0,317282.0,-0.310708
ZION,1831200.0,1565209.0,0.169940
ZM,13745700.0,12304923.0,0.117089
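One way to turn these ratios into an "abnormal volume" signal is a simple threshold on their absolute value. The sketch below uses a hypothetical cutoff of 0.4 (40% above or below the monthly average); the function name and the threshold are assumptions to tune, not values taken from the analysis. The ratios are copied from the Monthly Vol vs Daily Vol output:

```python
import pandas as pd


def flag_abnormal_volume(ratio: pd.Series, threshold: float = 0.4) -> pd.Series:
    """True where |ratio| exceeds `threshold`.

    `ratio` is a relative difference such as (Volume - 1M Avg Vol) / 1M Avg Vol,
    so threshold=0.4 flags volume more than 40% above or below the average.
    The default threshold is a hypothetical placeholder.
    """
    return ratio.abs() > threshold


# Values from the "Monthly Vol vs Daily Vol" column
ratios = pd.Series({"A": 0.009985, "AAL": -0.418052, "ZION": 0.169940, "ZM": 0.117089})
flags = flag_abnormal_volume(ratios)
print(flags)
```

Only AAL is flagged here, and its volume is unusually low rather than high; whether low volume should count as "abnormal" for a buy/sell signal is a design decision left open.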


<h4>Previous Volume vs Monthly Average</h4>
We also compare the previous day's volume with the stock's average volume over the past month, for the same reason as above.

In [47]:
stock_data["Monthly Vol vs Prev Vol"] = (stock_data["Prev Vol"] - stock_data["1M Avg Vol"]) / stock_data["1M Avg Vol"]
stock_data[["Prev Vol","1M Avg Vol", "Monthly Vol vs Prev Vol"]]

Symbol,Prev Vol,1M Avg Vol,Monthly Vol vs Prev Vol
A,722900.0,1029091.0,-0.297535
AAL,33776102.0,56220273.0,-0.399218
AAP,697400.0,723400.0,-0.035941
AAPL,112559195.0,153805250.0,-0.268171
ABBV,6193700.0,6981727.0,-0.112870
...,...,...,...
ZBH,991000.0,965282.0,0.026643
ZBRA,204200.0,317282.0,-0.356408
ZION,1852700.0,1565209.0,0.183676
ZM,16520600.0,12304923.0,0.342601
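Each of the four volume columns in this section follows the same pattern, so they could be generated from one table of column pairs. This is a hypothetical refactor (`RATIO_COLUMNS` and `add_volume_ratios` are illustrative names). It pairs Prev Vol with 5D Avg Vol, as the heading "Previous Volume vs 5-day Average" describes, and works on a copy so the input frame is left unchanged:

```python
import pandas as pd

# Hypothetical mapping: new column name -> (value column, baseline column)
RATIO_COLUMNS = {
    "Daily Volume Change": ("Volume", "Prev Vol"),
    "Prev Vol vs 5D Avg Vol": ("Prev Vol", "5D Avg Vol"),
    "Monthly Vol vs Daily Vol": ("Volume", "1M Avg Vol"),
    "Monthly Vol vs Prev Vol": ("Prev Vol", "1M Avg Vol"),
}


def add_volume_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with (value - baseline) / baseline for each pair."""
    out = df.copy()
    for name, (value_col, base_col) in RATIO_COLUMNS.items():
        out[name] = (out[value_col] - out[base_col]) / out[base_col]
    return out


# Agilent (A) row from the screener data
row = pd.DataFrame(
    {"Volume": [1039366.0], "Prev Vol": [722900.0],
     "5D Avg Vol": [917600.0], "1M Avg Vol": [1029091.0]},
    index=["A"],
)
print(add_volume_ratios(row).round(6).T)
```

Keeping the definitions in one dictionary makes it harder for a column's name and its formula to drift apart.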


<h3>Calculating Price Data</h3>
In the following sections we will calculate values that we need in order to understand the changes in price for each of the stocks in the data set.

<h4>Intraday Price Change</h4>
Another value we look at is the change in price from the open of the day to the close of the day. We calculate this from the Open and Last values, which represent the price at the beginning of the day and the price at the end of the day, respectively.
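As a sanity check of the formula, the sketch below recomputes the intraday change for A and AAPL from their Open and Last prices, once directly and once in the algebraically equivalent form Last / Open - 1:

```python
import pandas as pd

# Open and Last prices for A and AAPL from the screener data
prices = pd.DataFrame(
    {"Open": [105.95, 121.28], "Last": [106.70, 119.02]},
    index=["A", "AAPL"],
)

# Positive: closed above the open; negative: closed below it
intraday = (prices["Last"] - prices["Open"]) / prices["Open"]
# Equivalent form using pandas arithmetic methods
intraday_alt = prices["Last"].div(prices["Open"]).sub(1)

print(intraday.map("{:.2%}".format).to_dict())  # {'A': '0.71%', 'AAPL': '-1.86%'}
```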

In [48]:
stock_data["Intraday Price Change"] = (stock_data["Last"] - stock_data["Open"]) / stock_data["Open"]
stock_data[["Open", "Last","Intraday Price Change"]]

Symbol,Open,Last,Intraday Price Change
A,105.95,106.70,0.007079
AAL,12.31,12.46,0.012185
AAP,157.52,154.99,-0.016061
AAPL,121.28,119.02,-0.018635
ABBV,85.89,86.27,0.004424
...,...,...,...
ZBH,148.27,146.93,-0.009038
ZBRA,289.96,293.00,0.010484
ZION,31.10,31.14,0.001286
ZM,544.00,559.00,0.027574


<h4>Previous Day Price Change</h4>
Another price value that we would like to use in our analysis is the change in price over the previous day. Different scenarios in the stock market cause fluctuations from one day to the next; for example, a sharp rise in price on one day may be followed by a sharp price move the next day.
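A simple way to check whether a strong previous day tends to be followed by a move in the same direction (momentum) or in the opposite one (reversal) is to correlate the two columns and count matching signs. The sketch below uses only the nine rows displayed in the notebook output, which is far too few to draw conclusions; in practice it would be run on the full `stock_data` frame:

```python
import pandas as pd

# The nine rows displayed in the notebook output
changes = pd.DataFrame(
    {
        "Previous Price Change": [0.011720, 0.000818, 0.016389, 0.016762, -0.002107,
                                  0.029516, 0.024716, 0.044921, 0.053665],
        "Intraday Price Change": [0.007079, 0.012185, -0.016061, -0.018635, 0.004424,
                                  -0.009038, 0.010484, 0.001286, 0.027574],
    },
    index=["A", "AAL", "AAP", "AAPL", "ABBV", "ZBH", "ZBRA", "ZION", "ZM"],
)

# Pearson correlation: > 0 hints at momentum, < 0 at reversal
corr = changes["Previous Price Change"].corr(changes["Intraday Price Change"])

# Share of stocks whose move today had the same sign as yesterday's
same_direction = (changes["Previous Price Change"] * changes["Intraday Price Change"] > 0).mean()

print(round(corr, 3), round(same_direction, 3))
```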

In [49]:
stock_data["Previous Price Change"] = (stock_data["Previous"] - stock_data["Prev Open"]) / stock_data["Prev Open"]
stock_data[["Prev Open", "Previous","Previous Price Change"]]

Symbol,Prev Open,Previous,Previous Price Change
A,104.10,105.32,0.011720
AAL,12.22,12.23,0.000818
AAP,154.98,157.52,0.016389
AAPL,118.72,120.71,0.016762
ABBV,85.41,85.23,-0.002107
...,...,...,...
ZBH,143.31,147.54,0.029516
ZBRA,280.38,287.31,0.024716
ZION,29.83,31.17,0.044921
ZM,509.08,536.40,0.053665


In [27]:
stock_data[["Previous Price Change","Intraday Price Change"]]

Symbol,Previous Price Change,Intraday Price Change
A,0.011720,0.007079
AAL,0.000818,0.012185
AAP,0.016389,-0.016061
AAPL,0.016762,-0.018635
ABBV,-0.002107,0.004424
...,...,...
ZBH,0.029516,-0.009038
ZBRA,0.024716,0.010484
ZION,0.044921,0.001286
ZM,0.053665,0.027574


<h4>Day Before Yesterday Price Change</h4>
As above, we calculate the intraday price change for the day before yesterday.
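With three daily changes per stock, one hypothetical way to summarize the short-term trend is to count the up days and to compound the three moves. Compounding open-to-close changes ignores the overnight gaps between sessions, so it is not the same as the three-day change in closing price. The values are taken from the notebook's output tables for A, AAL and AAP; the names `up_days` and `compounded` are illustrative:

```python
import pandas as pd

# Daily open-to-close changes copied from the notebook output (A, AAL, AAP)
changes = pd.DataFrame(
    {
        "2 Days Ago Price Change": [-0.004076, 0.000810, -0.009990],
        "Previous Price Change": [0.011720, 0.000818, 0.016389],
        "Intraday Price Change": [0.007079, 0.012185, -0.016061],
    },
    index=["A", "AAL", "AAP"],
)

# Number of the last three sessions that closed above their open (0 to 3)
up_days = (changes > 0).sum(axis=1)

# Compounded open-to-close move over the three sessions (ignores overnight gaps)
compounded = (1 + changes).prod(axis=1) - 1

print(pd.DataFrame({"up_days": up_days, "compounded": compounded.round(4)}))
```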

In [50]:
stock_data["2 Days Ago Price Change"] = (stock_data["Last 2D Ago"] - stock_data["Open 2D Ago"]) / stock_data["Open 2D Ago"]
stock_data[["Open 2D Ago", "Last 2D Ago","2 Days Ago Price Change"]]

Symbol,Open 2D Ago,Last 2D Ago,2 Days Ago Price Change
A,105.49,105.06,-0.004076
AAL,12.35,12.36,0.000810
AAP,158.16,156.58,-0.009990
AAPL,121.00,121.19,0.001570
ABBV,86.83,86.07,-0.008753
...,...,...,...
ZBH,146.00,145.72,-0.001918
ZBRA,288.65,284.77,-0.013442
ZION,30.66,30.23,-0.014025
ZM,518.78,509.25,-0.018370


<h3>Building the Model</h3>
Now that we've collected the data and calculated a few values, we need to start working on the model.

In [None]:
import math
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score, mean_absolute_error, r2_score, mean_squared_error

<h4>Setting the baseline</h4>
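Our model uses the previous closing price (along with other features) to predict today's closing price, so we first need to confirm that it does better than simply using the previous closing price. As a baseline, we fit a linear regression with the previous close as its only feature.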
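An even simpler baseline, which matches "just using the previous stock price" literally, predicts today's close as yesterday's close with no fitting at all. A minimal sketch, using the A, AAL and AAP rows for illustration (the function name is hypothetical):

```python
import numpy as np
import pandas as pd


def naive_baseline_errors(previous: pd.Series, last: pd.Series):
    """MAE and RMSE of predicting today's close (Last) as yesterday's close (Previous)."""
    errors = last - previous
    mae = float(errors.abs().mean())
    rmse = float(np.sqrt((errors ** 2).mean()))
    return mae, rmse


# Previous and Last for A, AAL and AAP from the screener data
previous = pd.Series([105.32, 12.23, 157.52], index=["A", "AAL", "AAP"])
last = pd.Series([106.70, 12.46, 154.99], index=["A", "AAL", "AAP"])
mae, rmse = naive_baseline_errors(previous, last)
print(f"MAE = {mae:.3f} | RMSE = {rmse:.3f}")
```

Passing the Previous and Last values of the test rows gives errors directly comparable to the MAE and RMSE reported for the regression; a model that cannot beat this no-fit rule adds little.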

In [142]:
features = "Previous"
target = "Last"

In [143]:
X = stock_data[features]
y = stock_data[target]

In [144]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

In [145]:
lr = LinearRegression()
X_train = X_train.values.reshape(-1,1)
X_test = X_test.values.reshape(-1,1)
lr.fit(X_train, y_train)

LinearRegression()

In [146]:
train_score = lr.score(X_train, y_train)
test_score = lr.score(X_test, y_test)

print("Train Score: {}\nTest Score: {}".format(train_score, test_score))

preds = lr.predict(X_test)

score = explained_variance_score(y_test, preds)
mae = mean_absolute_error(y_test, preds)
rmse = math.sqrt(mean_squared_error(y_test, preds))
r2 = r2_score(y_test, preds)

print("score = {:.5f} | MAE = {:.3f} | RMSE = {:.3f} | R2 = {:.5f}"
          .format(score, mae, rmse, r2))

Train Score: 0.9998261795992877
Test Score: 0.999660700143169
score = 0.99967 | MAE = 2.319 | RMSE = 4.592 | R2 = 0.99966


<h4>Evaluating the Model</h4>

In [137]:
features = ["Prev Vol vs 5D Avg Vol"
            , "Monthly Vol vs Prev Vol"
            , "Previous Price Change"
            , "2 Days Ago Price Change"
            , "Previous"
            , "Prev Vol"]
target = "Last"

In [138]:
X = stock_data[features]
y = stock_data[target]

In [139]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

In [140]:
lr = LinearRegression()
lr.fit(X_train, y_train)

LinearRegression()

In [141]:
train_score = lr.score(X_train, y_train)
test_score = lr.score(X_test, y_test)

print("Train Score: {}\nTest Score: {}".format(train_score, test_score))

preds = lr.predict(X_test)

score = explained_variance_score(y_test, preds)
mae = mean_absolute_error(y_test, preds)
rmse = math.sqrt(mean_squared_error(y_test, preds))
r2 = r2_score(y_test, preds)

print("score = {:.5f} | MAE = {:.3f} | RMSE = {:.3f} | R2 = {:.5f}"
          .format(score, mae, rmse, r2))

Train Score: 0.9997184982978101
Test Score: 0.9998507640339046
score = 0.99985 | MAE = 2.010 | RMSE = 5.312 | R2 = 0.99985
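The baseline and the feature-rich model were each evaluated on a different random split (no `random_state` is set), so part of the difference in their MAE and RMSE may come from which stocks landed in the test set. A sketch that fits every feature set on one fixed split follows; the function name, the seed, and the synthetic demo data are assumptions for illustration:

```python
import math

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split


def compare_on_same_split(df, feature_sets, target="Last", seed=42):
    """Fit one LinearRegression per feature set on a single, fixed train/test split."""
    train, test = train_test_split(df, test_size=0.25, random_state=seed)
    results = {}
    for name, columns in feature_sets.items():
        model = LinearRegression().fit(train[columns], train[target])
        preds = model.predict(test[columns])
        results[name] = {
            "MAE": mean_absolute_error(test[target], preds),
            "RMSE": math.sqrt(mean_squared_error(test[target], preds)),
        }
    return pd.DataFrame(results).T


# Synthetic demo data (not the screener data): Last is roughly Previous plus noise
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Previous": rng.uniform(10, 500, 200), "Noise": rng.normal(size=200)})
demo["Last"] = demo["Previous"] * 1.01 + rng.normal(scale=1.0, size=200)

table = compare_on_same_split(demo, {"baseline": ["Previous"], "with noise": ["Previous", "Noise"]})
print(table)
```

On the notebook data this could be called as `compare_on_same_split(stock_data, {"baseline": ["Previous"], "model": features})`. Repeating it over several seeds would show how much the scores move from split to split.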


<h4>Getting Data from Yahoo Finance</h4>

In [157]:
# !pip install yfinance
import yfinance as yf

wfc = yf.Ticker("WFC")

wfc.history()

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2020-09-17,25.23,25.43,24.9,25.110001,51531300,0,0
2020-09-18,24.940001,25.4,24.9,25.129999,115153200,0,0
2020-09-21,24.450001,24.52,23.719999,24.040001,56188400,0,0
2020-09-22,23.98,24.360001,23.530001,23.65,39839500,0,0
2020-09-23,23.780001,24.129999,22.83,22.83,45697700,0,0
2020-09-24,22.959999,23.719999,22.559999,23.32,43329100,0,0
2020-09-25,23.120001,23.709999,23.01,23.639999,30229900,0,0
2020-09-28,23.99,24.27,23.76,23.82,41103500,0,0
2020-09-29,23.719999,23.719999,23.07,23.26,38416300,0,0
2020-09-30,23.360001,23.870001,23.25,23.51,43058500,0,0
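The screener CSV is a single snapshot, while yfinance returns a daily history. A hypothetical helper that derives the same kind of columns from that history follows. The window lengths (5 sessions for "5D", 21 for "1M") and the assumption that the averages include the latest session are ours, not the screener's documented definitions. yfinance may also return prices adjusted for splits and dividends (see its `auto_adjust` option), so values can differ slightly from the screener's:

```python
import pandas as pd


def screener_row(history: pd.DataFrame) -> dict:
    """Approximate one row of the screener CSV from daily history.

    Assumes `history` is sorted by date and has Open, Close and Volume columns,
    like the frame returned by yfinance's Ticker.history(). Window lengths are
    assumptions: 5 sessions for "5D" and 21 sessions for "1M".
    """
    if len(history) < 3:
        raise ValueError("need at least three sessions of history")
    latest, prev, two_ago = history.iloc[-1], history.iloc[-2], history.iloc[-3]
    return {
        "Open": latest["Open"],
        "Last": latest["Close"],
        "Prev Open": prev["Open"],
        "Previous": prev["Close"],
        "Open 2D Ago": two_ago["Open"],
        "Last 2D Ago": two_ago["Close"],
        "Volume": latest["Volume"],
        "Prev Vol": prev["Volume"],
        "5D Avg Vol": history["Volume"].tail(5).mean(),
        "1M Avg Vol": history["Volume"].tail(21).mean(),
    }


# Example usage (requires network access):
# import yfinance as yf
# row = screener_row(yf.Ticker("WFC").history(period="3mo"))
```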


<h2>Flask Application</h2>
We have not yet decided whether this section will be part of the final project.