<h1>Stock Volume Changes to Price Movement Analysis</h1>
<h3>Matt Quinlan and Wes Brown</h3>

<h2>Problem Identification</h2>
In the stock market, there are two large groups of investors with different mindsets about how to choose which investments to place their money in.

There are those who believe in "Fundamental" analysis, which focuses on the overall value of the stock. In general, these investors look to hold a stock for a long period of time so that they can watch its price rise to reflect the value they see. They like to follow the mantra of "buy low, sell high".

Then there are those who practice "Technical" analysis, which focuses much more on day-to-day trends. Investors here are commonly known as day traders, since they may buy and sell the same stock on the same day. These day traders look for trends in the data related to a specific stock: is the stock price moving up in a pattern that they've seen before? These investors follow the mantra of "buy when it's going up, sell when it's going down".

For our project, we would like to focus on those practicing technical analysis, the day traders. We would like to create a tool that is useful to those looking at the day-to-day trends of a stock and helps them decide whether it is a good buy.

<h2>Goal Determination</h2>
In the world of stock analysis, an almost overwhelming amount of data is available to anyone. For our analysis, we want to focus on two key pieces of information: the trade volume and the stock price.

We would like to use the volume data to determine whether there is an abnormally large amount of trading happening for a particular stock. We are going to look at the volume in comparison to its past volume and also in comparison to the volume of the stock market as a whole.

Once we have information regarding the volume of the trades, we will then look at the stock price information. Is the stock price trending up or down? By how much is it trending up or down? Has it been trending up or down? How volatile is the stock price for this stock?

Combining the information around the volume and stock price, we will determine which category that stock fits into: Strong Buy, Buy, Weak Buy, Hold, Weak Sell, Sell, and Strong Sell.
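
One way the seven categories could be assigned is by binning a single combined score. The sketch below assumes a hypothetical score in the range -1 to 1 and hypothetical cut points (`BIN_EDGES`); neither is defined by this analysis yet, so treat them purely as placeholders.

```python
import pandas as pd

# The seven ratings named above, ordered from most bearish to most bullish.
CATEGORIES = ["Strong Sell", "Sell", "Weak Sell", "Hold",
              "Weak Buy", "Buy", "Strong Buy"]

# Hypothetical cut points: illustrative assumptions, not values derived from the data.
BIN_EDGES = [-float("inf"), -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, float("inf")]


def categorize(scores: pd.Series) -> pd.Series:
    """Bin a combined volume/price score into the seven rating categories."""
    return pd.cut(scores, bins=BIN_EDGES, labels=CATEGORIES)


# Made-up scores for four placeholder symbols.
example = pd.Series([-0.8, -0.05, 0.2, 0.9], index=["AAA", "BBB", "CCC", "DDD"])
ratings = categorize(example)
print(ratings)
```

Keeping the cut points in one list makes it easy to tune them later, once the model produces a real score.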

In the section below, we will talk about where we get the data, how we use the data for the specific pieces of the analysis, and then build the model and analysis.

<h2>Core Analysis and Model Building</h2>
Where the actual analysis will occur.

<h3>Reading in the Data</h3>

In [21]:
import pandas as pd
import os

filepath = os.path.join(os.getcwd(), 'data', 'stocks-screener-10-18-2020.csv')
fileOne = pd.read_csv(filepath)
filepath = os.path.join(os.getcwd(), 'data', 'stocks-screener-10-20-2020.csv')
fileTwo = pd.read_csv(filepath)

stock_data = pd.concat([fileOne, fileTwo])
stock_data.reset_index(inplace=True)

#There is a known line at the end of the file that contains bad values
stock_data.dropna(inplace=True)

#Set the index for the dataframe to be the stock symbol
# stock_data.set_index("Symbol", inplace=True)

stock_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1049 entries, 0 to 1049
Data columns (total 16 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   index        1049 non-null   int64  
 1   Symbol       1049 non-null   object 
 2   Name         1049 non-null   object 
 3   Open         1049 non-null   float64
 4   Last         1049 non-null   float64
 5   Prev Open    1049 non-null   float64
 6   Previous     1049 non-null   float64
 7   Open 2D Ago  1049 non-null   float64
 8   Last 2D Ago  1049 non-null   float64
 9   1M MA        1049 non-null   float64
 10  1M High      1049 non-null   float64
 11  1M Low       1049 non-null   float64
 12  Volume       1049 non-null   float64
 13  Prev Vol     1049 non-null   float64
 14  5D Avg Vol   1049 non-null   float64
 15  1M Avg Vol   1049 non-null   float64
dtypes: float64(13), int64(1), object(2)
memory usage: 139.3+ KB


In [22]:
stock_data.head(10)

Unnamed: 0,index,Symbol,Name,Open,Last,Prev Open,Previous,Open 2D Ago,Last 2D Ago,1M MA,1M High,1M Low,Volume,Prev Vol,5D Avg Vol,1M Avg Vol
0,0,A,Agilent Technologies,105.95,106.7,104.1,105.32,105.49,105.06,101.59,107.54,95.44,1039366.0,722900.0,917600.0,1029091.0
1,1,AAL,American Airlines Gp,12.31,12.46,12.22,12.23,12.35,12.36,12.61,14.08,11.22,32717299.0,33776102.0,41473102.0,56220273.0
2,2,AAP,Advance Auto Parts Inc,157.52,154.99,154.98,157.52,158.16,156.58,153.74,160.77,142.46,477153.0,697400.0,645960.0,723400.0
3,3,AAPL,Apple Inc,121.28,119.02,118.72,120.71,121.0,121.19,114.75,125.39,103.1,115393805.0,112559195.0,176314484.0,153805250.0
4,4,ABBV,Abbvie Inc,85.89,86.27,85.41,85.23,86.83,86.07,87.34,90.81,84.96,5362675.0,6193700.0,6517480.0,6981727.0
5,5,ABC,Amerisourcebergen Corp,99.6,99.52,97.89,99.35,97.04,98.44,96.64,100.61,92.0,1200554.0,1270500.0,945580.0,851527.0
6,6,ABMD,Abiomed Inc,283.58,286.48,275.09,283.57,280.0,280.84,271.14,289.31,255.4,262600.0,263900.0,240680.0,314805.0
7,7,ABT,Abbott Laboratories,107.7,109.67,106.65,107.32,108.61,107.75,106.81,111.57,100.34,4363766.0,3518900.0,4209960.0,4986073.0
8,8,ACN,Accenture Plc,229.03,230.05,225.59,228.77,228.88,229.43,226.99,239.35,210.42,1796451.0,1273400.0,1534360.0,2199391.0
9,9,ADBE,Adobe Systems Inc,504.0,502.82,499.26,501.15,514.34,506.31,488.91,519.6,452.52,2441300.0,2043100.0,2421040.0,2789996.0


<h3>Calculate Volume Data</h3>
In the following sections we will calculate values that we need in order to understand the changes in volume for each of the stocks in the data set.

<h4>Daily Volume Change</h4>
One of the values we will look at is the change in volume from the previous day. We will calculate this using the Prev Vol and Volume values, which represent the previous day's volume and the current day's volume respectively.

In [23]:
stock_data["Daily Volume Change"] = (stock_data["Volume"] - stock_data["Prev Vol"]) / stock_data["Prev Vol"]
stock_data[["Volume", "Prev Vol","Daily Volume Change"]]

Unnamed: 0,Volume,Prev Vol,Daily Volume Change
0,1039366.0,722900.0,0.437773
1,32717299.0,33776102.0,-0.031348
2,477153.0,697400.0,-0.315812
3,115393805.0,112559195.0,0.025183
4,5362675.0,6193700.0,-0.134173
...,...,...,...
1045,589641.0,1011900.0,-0.417293
1046,211000.0,347500.0,-0.392806
1047,2618600.0,1515600.0,0.727765
1048,12046800.0,15193200.0,-0.207093


<h4>Entire Stock Market Volume Change</h4>
When looking at the volume data, we want to understand whether a volume change is due to that stock specifically or whether something is going on that impacts the stock market overall. As an example, when the Federal Reserve adjusts interest rates, there is usually a spike in activity within the stock market. We want to attempt to take these types of volume changes into account by comparing a specific stock's volume change to the volume change of the stock market overall.

<b>We may remove this since it requires the entire stock market for the calculation.</b>

In [24]:
overall_today_volume = stock_data["Volume"].sum()
overall_yesterday_volume = stock_data["Prev Vol"].sum()
overall_volume_change = (overall_today_volume - overall_yesterday_volume) / overall_yesterday_volume

stock_data["Adjusted Daily Vol Change"] = stock_data["Daily Volume Change"] - overall_volume_change

print("Overall Stock Market Change: {}".format(overall_volume_change))
stock_data[["Daily Volume Change","Adjusted Daily Vol Change"]]

Overall Stock Market Change: 0.027170378401193348


Unnamed: 0,Daily Volume Change,Adjusted Daily Vol Change
0,0.437773,0.410602
1,-0.031348,-0.058518
2,-0.315812,-0.342982
3,0.025183,-0.001987
4,-0.134173,-0.161343
...,...,...
1045,-0.417293,-0.444464
1046,-0.392806,-0.419976
1047,0.727765,0.700594
1048,-0.207093,-0.234263


<h4>Previous Volume vs 5-day Average</h4>
When looking at the volume data, we would like to understand whether the previous day's volume is an outlier. We will do that with the next few analyses that we perform, starting with the previous volume vs the average volume over the last five days.

In [25]:
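# Note: despite the column name, this uses the current day's "Volume" rather than "Prev Vol";
# the calculated values in the table below reflect that.
stock_data["Prev Vol vs 5D Avg Vol"] = (stock_data["Volume"] - stock_data["5D Avg Vol"]) / stock_data["5D Avg Vol"]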
stock_data[["Prev Vol","5D Avg Vol", "Prev Vol vs 5D Avg Vol"]]

Unnamed: 0,Prev Vol,5D Avg Vol,Prev Vol vs 5D Avg Vol
0,722900.0,917600.0,0.132701
1,33776102.0,41473102.0,-0.211120
2,697400.0,645960.0,-0.261327
3,112559195.0,176314484.0,-0.345523
4,6193700.0,6517480.0,-0.177186
...,...,...,...
1045,1011900.0,844240.0,-0.301572
1046,347500.0,265380.0,-0.204914
1047,1515600.0,1833380.0,0.428291
1048,15193200.0,13166780.0,-0.085061
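
The calculated column in the table above matches the current day's `Volume` rather than `Prev Vol` (for row 0, (1039366 - 917600) / 917600 is about 0.1327). If the comparison named in the heading is the one intended, a minimal sketch using `Prev Vol` might look like the following. It uses the first three rows shown above and tickers taken from the earlier `head(10)` output.

```python
import pandas as pd

# First three rows of the "Prev Vol" and "5D Avg Vol" columns shown above.
sample = pd.DataFrame(
    {"Prev Vol": [722900.0, 33776102.0, 697400.0],
     "5D Avg Vol": [917600.0, 41473102.0, 645960.0]},
    index=["A", "AAL", "AAP"],
)

# Relative difference between the previous day's volume and the 5-day average.
sample["Prev Vol vs 5D Avg Vol"] = (
    (sample["Prev Vol"] - sample["5D Avg Vol"]) / sample["5D Avg Vol"]
)
print(sample.round(6))
```

A negative value means the previous day traded below its five-day average; a positive value means it traded above.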


<h4>Daily Volume vs Monthly Average</h4>
Another factor we would like to take into account is the average volume for the stock over the past month. While there may be a large increase between yesterday's and today's volume, today's volume may not be the outlier, i.e. yesterday's volume may have been dramatically lower than the average. To account for that, we will look at the daily volume in comparison to the monthly average.

In [26]:
stock_data["Monthly Vol vs Daily Vol"] = (stock_data["Volume"] - stock_data["1M Avg Vol"]) / stock_data["1M Avg Vol"]
stock_data[["Volume","1M Avg Vol", "Monthly Vol vs Daily Vol"]]

Unnamed: 0,Volume,1M Avg Vol,Monthly Vol vs Daily Vol
0,1039366.0,1029091.0,0.009985
1,32717299.0,56220273.0,-0.418052
2,477153.0,723400.0,-0.340402
3,115393805.0,153805250.0,-0.249741
4,5362675.0,6981727.0,-0.231898
...,...,...,...
1045,589641.0,909050.0,-0.351366
1046,211000.0,305055.0,-0.308321
1047,2618600.0,1587405.0,0.649611
1048,12046800.0,12107827.0,-0.005040


<h4>Previous Volume vs Monthly Average</h4>
We would also like to compare the volume from the previous day to the average volume for the stock over the past month. We are conducting this analysis for the same reasons as the one stated above.

In [27]:
stock_data["Monthly Vol vs Prev Vol"] = (stock_data["Prev Vol"] - stock_data["1M Avg Vol"]) / stock_data["1M Avg Vol"]
stock_data[["Prev Vol","1M Avg Vol", "Monthly Vol vs Prev Vol"]]

Unnamed: 0,Prev Vol,1M Avg Vol,Monthly Vol vs Prev Vol
0,722900.0,1029091.0,-0.297535
1,33776102.0,56220273.0,-0.399218
2,697400.0,723400.0,-0.035941
3,112559195.0,153805250.0,-0.268171
4,6193700.0,6981727.0,-0.112870
...,...,...,...
1045,1011900.0,909050.0,0.113140
1046,347500.0,305055.0,0.139139
1047,1515600.0,1587405.0,-0.045234
1048,15193200.0,12107827.0,0.254825
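
All of the volume measures above share the same form, (value - baseline) / baseline. As a sketch, a small helper could remove the repeated expression; the name `relative_change` is hypothetical and not part of the notebook.

```python
import pandas as pd


def relative_change(value: pd.Series, baseline: pd.Series) -> pd.Series:
    """Return (value - baseline) / baseline, element-wise."""
    return (value - baseline) / baseline


# Row 0 (Agilent) from the tables above.
sample = pd.DataFrame(
    {"Volume": [1039366.0], "Prev Vol": [722900.0], "1M Avg Vol": [1029091.0]},
    index=["A"],
)

sample["Daily Volume Change"] = relative_change(sample["Volume"], sample["Prev Vol"])
sample["Monthly Vol vs Daily Vol"] = relative_change(sample["Volume"], sample["1M Avg Vol"])
sample["Monthly Vol vs Prev Vol"] = relative_change(sample["Prev Vol"], sample["1M Avg Vol"])
print(sample.round(6))
```

Passing the baseline explicitly also makes it harder to divide by a different column than the one being compared against.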


<h3>Calculating Price Data</h3>
In the following sections we will calculate values that we need in order to understand the changes in price for each of the stocks in the data set.

<h4>Intraday Price Change</h4>
One of the values we will look at is the change in price from the open of the day to the close of the day. We will calculate this using the Open price and Last price values, which represent the price at the beginning of the day and the price at the end of the day.

In [28]:
stock_data["Intraday Price Change"] = (stock_data["Last"] - stock_data["Open"]) / stock_data["Open"]
stock_data[["Open", "Last","Intraday Price Change"]]

Unnamed: 0,Open,Last,Intraday Price Change
0,105.95,106.70,0.007079
1,12.31,12.46,0.012185
2,157.52,154.99,-0.016061
3,121.28,119.02,-0.018635
4,85.89,86.27,0.004424
...,...,...,...
1045,143.37,141.90,-0.010253
1046,293.24,294.83,0.005422
1047,31.58,30.52,-0.033566
1048,572.33,537.02,-0.061695


<h4>Previous Day Price Change</h4>
Another of the price values that we would like to use in our analysis is the change in price on the previous day. There are different scenarios in the stock market that cause fluctuations between days; for example, a sharp rise in price on the previous day may lead to a sharp price change on the next day.

In [29]:
stock_data["Previous Price Change"] = (stock_data["Previous"] - stock_data["Prev Open"]) / stock_data["Prev Open"]
stock_data[["Prev Open", "Previous","Previous Price Change"]]

Unnamed: 0,Prev Open,Previous,Previous Price Change
0,104.10,105.32,0.011720
1,12.22,12.23,0.000818
2,154.98,157.52,0.016389
3,118.72,120.71,0.016762
4,85.41,85.23,-0.002107
...,...,...,...
1045,146.93,142.00,-0.033553
1046,292.20,291.57,-0.002156
1047,31.36,30.22,-0.036352
1048,572.50,568.34,-0.007266


In [30]:
stock_data[["Previous Price Change","Intraday Price Change"]]

Unnamed: 0,Previous Price Change,Intraday Price Change
0,0.011720,0.007079
1,0.000818,0.012185
2,0.016389,-0.016061
3,0.016762,-0.018635
4,-0.002107,0.004424
...,...,...
1045,-0.033553,-0.010253
1046,-0.002156,0.005422
1047,-0.036352,-0.033566
1048,-0.007266,-0.061695


<h4>Day Before Yesterday Price Change</h4>
Similar to the analysis done above, we would like to do the same for the day before yesterday.

In [31]:
stock_data["2 Days Ago Price Change"] = (stock_data["Last 2D Ago"] - stock_data["Open 2D Ago"]) / stock_data["Open 2D Ago"]
stock_data[["Open 2D Ago", "Last 2D Ago","2 Days Ago Price Change"]]

Unnamed: 0,Open 2D Ago,Last 2D Ago,2 Days Ago Price Change
0,105.49,105.06,-0.004076
1,12.35,12.36,0.000810
2,158.16,156.58,-0.009990
3,121.00,121.19,0.001570
4,86.83,86.07,-0.008753
...,...,...,...
1045,148.27,146.93,-0.009038
1046,289.96,293.00,0.010484
1047,31.10,31.14,0.001286
1048,544.00,559.00,0.027574
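
The three daily changes calculated above could be combined to address the question of whether a stock "has been trending up or down". The sketch below counts how many of the last three sessions closed above their open. The column name `Up Days (3D)` is a hypothetical addition, and the values are the first three rows of the tables above.

```python
import pandas as pd

# First three rows of the three price-change columns calculated above.
changes = pd.DataFrame(
    {"2 Days Ago Price Change": [-0.004076, 0.000810, -0.009990],
     "Previous Price Change": [0.011720, 0.000818, 0.016389],
     "Intraday Price Change": [0.007079, 0.012185, -0.016061]},
    index=["A", "AAL", "AAP"],
)

day_cols = list(changes.columns)

# Hypothetical feature: number of the last three sessions with a positive open-to-close change.
changes["Up Days (3D)"] = (changes[day_cols] > 0).sum(axis=1)
print(changes)
```

Note that the intraday column is also the prediction target later on, so a feature like this would need to exclude it if used as a model input.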


<h3>Building the Model</h3>
Now that we've collected the data and calculated a few values, we need to start working on the model.

<h1>NOTE TO SELF</h1>
Look to use the monthly high and low values in the analysis.
Look at the correlation for each of the values we have available.
Look at the MinMaxScaler to reduce some of the values
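
As a rough sketch of how these three notes might be explored, the block below derives a hypothetical `1M Range Position` feature from the monthly high and low, computes pairwise correlations, and rescales the columns with scikit-learn's `MinMaxScaler`. The five rows are taken from the `head(10)` output above; the new feature name is an assumption.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Five rows from the head(10) output above.
sample = pd.DataFrame(
    {"Last": [106.70, 12.46, 154.99, 119.02, 86.27],
     "1M High": [107.54, 14.08, 160.77, 125.39, 90.81],
     "1M Low": [95.44, 11.22, 142.46, 103.10, 84.96],
     "Volume": [1039366.0, 32717299.0, 477153.0, 115393805.0, 5362675.0]},
    index=["A", "AAL", "AAP", "AAPL", "ABBV"],
)

# Hypothetical feature: where the last price sits within the one-month range
# (0 = at the monthly low, 1 = at the monthly high).
sample["1M Range Position"] = (
    (sample["Last"] - sample["1M Low"]) / (sample["1M High"] - sample["1M Low"])
)

# Pairwise Pearson correlation between all numeric columns.
correlations = sample.corr()

# Rescale every column to the range [0, 1] so that large-magnitude columns
# such as Volume are on the same scale as the ratio-based features.
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(sample),
    columns=sample.columns,
    index=sample.index,
)

print(correlations.round(3))
print(scaled.round(3))
```

On the full data set, `stock_data.corr()` restricted to the numeric columns would give the same kind of matrix for every calculated value.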

In [28]:
import math
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score, mean_absolute_error, r2_score, mean_squared_error

<h4>Setting the baseline</h4>
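Any model that uses previous price data to predict the stock price should be validated against a simple baseline: it needs to perform better than using the previous price change alone. Here we fit that baseline, a linear regression with the previous day's price change as its only feature.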

In [62]:
features = "Previous Price Change"
target = "Intraday Price Change"
stock_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1049 entries, 0 to 1049
Data columns (total 24 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   index                      1049 non-null   int64  
 1   Symbol                     1049 non-null   object 
 2   Name                       1049 non-null   object 
 3   Open                       1049 non-null   float64
 4   Last                       1049 non-null   float64
 5   Prev Open                  1049 non-null   float64
 6   Previous                   1049 non-null   float64
 7   Open 2D Ago                1049 non-null   float64
 8   Last 2D Ago                1049 non-null   float64
 9   1M MA                      1049 non-null   float64
 10  1M High                    1049 non-null   float64
 11  1M Low                     1049 non-null   float64
 12  Volume                     1049 non-null   float64
 13  Prev Vol                   1049 non-null   float

In [63]:
X = stock_data[features]
y = stock_data[target]

In [64]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

In [65]:
lr = LinearRegression()
X_train = X_train.values.reshape(-1,1)
X_test = X_test.values.reshape(-1,1)
lr.fit(X_train, y_train)

LinearRegression()

In [66]:
train_score = lr.score(X_train, y_train)
test_score = lr.score(X_test, y_test)

print("Train Score: {}\nTest Score: {}".format(train_score, test_score))

preds = lr.predict(X_test)

score = explained_variance_score(y_test, preds)
mae = mean_absolute_error(y_test, preds)
rmse = math.sqrt(mean_squared_error(y_test, preds))
r2 = r2_score(y_test, preds)

print("score = {:.5f} | MAE = {:.3f} | RMSE = {:.3f} | R2 = {:.5f}"
          .format(score, mae, rmse, r2))

Train Score: 0.011298217654080345
Test Score: 0.018513971549490083
score = 0.01894 | MAE = 0.010 | RMSE = 0.013 | R2 = 0.01851
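
An even simpler reference point than the single-feature regression is a naive prediction with no fitting at all. The sketch below compares a fitted regression against two such baselines: repeating the previous day's change, and always predicting the training mean. It runs on synthetic stand-in data (an assumption made so the example is self-contained), so the printed numbers say nothing about the real data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (illustrative assumption): small daily changes with
# only a weak link between consecutive days.
rng = np.random.default_rng(0)
prev_change = rng.normal(0.0, 0.015, size=1000)
intraday_change = 0.1 * prev_change + rng.normal(0.0, 0.013, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    prev_change.reshape(-1, 1), intraday_change, test_size=0.25, random_state=0
)

# Fitted model: linear regression on the previous day's change.
lr = LinearRegression().fit(X_train, y_train)
model_mae = mean_absolute_error(y_test, lr.predict(X_test))

# Naive baseline 1: assume today's change repeats yesterday's change.
persistence_mae = mean_absolute_error(y_test, X_test.ravel())

# Naive baseline 2: always predict the mean change seen in training.
mean_mae = mean_absolute_error(y_test, np.full_like(y_test, y_train.mean()))

print("model = {:.4f} | persistence = {:.4f} | mean = {:.4f}"
      .format(model_mae, persistence_mae, mean_mae))
```

If a model cannot beat both naive errors on held-out data, its features are adding little beyond noise.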


<h4>Evaluating the Model</h4>

In [83]:
features = ["Prev Vol vs 5D Avg Vol"
            , "Monthly Vol vs Prev Vol"
            , "Previous Price Change"
            , "2 Days Ago Price Change"]
target = "Intraday Price Change"
stock_data.info()
stock_data.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1049 entries, 0 to 1049
Data columns (total 24 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   index                      1049 non-null   int64  
 1   Symbol                     1049 non-null   object 
 2   Name                       1049 non-null   object 
 3   Open                       1049 non-null   float64
 4   Last                       1049 non-null   float64
 5   Prev Open                  1049 non-null   float64
 6   Previous                   1049 non-null   float64
 7   Open 2D Ago                1049 non-null   float64
 8   Last 2D Ago                1049 non-null   float64
 9   1M MA                      1049 non-null   float64
 10  1M High                    1049 non-null   float64
 11  1M Low                     1049 non-null   float64
 12  Volume                     1049 non-null   float64
 13  Prev Vol                   1049 non-null   float

Unnamed: 0,index,Symbol,Name,Open,Last,Prev Open,Previous,Open 2D Ago,Last 2D Ago,1M MA,...,5D Avg Vol,1M Avg Vol,Daily Volume Change,Adjusted Daily Vol Change,Prev Vol vs 5D Avg Vol,Monthly Vol vs Daily Vol,Monthly Vol vs Prev Vol,Intraday Price Change,Previous Price Change,2 Days Ago Price Change
0,0,A,Agilent Technologies,105.95,106.7,104.1,105.32,105.49,105.06,101.59,...,917600.0,1029091.0,0.437773,0.410602,0.132701,0.009985,-0.297535,0.007079,0.01172,-0.004076
1,1,AAL,American Airlines Gp,12.31,12.46,12.22,12.23,12.35,12.36,12.61,...,41473102.0,56220273.0,-0.031348,-0.058518,-0.21112,-0.418052,-0.399218,0.012185,0.000818,0.00081
2,2,AAP,Advance Auto Parts Inc,157.52,154.99,154.98,157.52,158.16,156.58,153.74,...,645960.0,723400.0,-0.315812,-0.342982,-0.261327,-0.340402,-0.035941,-0.016061,0.016389,-0.00999
3,3,AAPL,Apple Inc,121.28,119.02,118.72,120.71,121.0,121.19,114.75,...,176314484.0,153805250.0,0.025183,-0.001987,-0.345523,-0.249741,-0.268171,-0.018635,0.016762,0.00157
4,4,ABBV,Abbvie Inc,85.89,86.27,85.41,85.23,86.83,86.07,87.34,...,6517480.0,6981727.0,-0.134173,-0.161343,-0.177186,-0.231898,-0.11287,0.004424,-0.002107,-0.008753


In [79]:
X = stock_data[features]
y = stock_data[target]

In [80]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

In [81]:
lr = LinearRegression()
lr.fit(X_train, y_train)

LinearRegression()

In [82]:
train_score = lr.score(X_train, y_train)
test_score = lr.score(X_test, y_test)

print("Train Score: {}\nTest Score: {}".format(train_score, test_score))

preds = lr.predict(X_test)

score = explained_variance_score(y_test, preds)
mae = mean_absolute_error(y_test, preds)
rmse = math.sqrt(mean_squared_error(y_test, preds))
r2 = r2_score(y_test, preds)

print("score = {:.5f} | MAE = {:.3f} | RMSE = {:.3f} | R2 = {:.5f}"
          .format(score, mae, rmse, r2))

Train Score: 0.02285030871969973
Test Score: 0.004456631187247484
score = 0.01086 | MAE = 0.009 | RMSE = 0.013 | R2 = 0.00446
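
With scores this close to zero, a single random split can move the train and test values noticeably from run to run. One option, sketched below on synthetic data (an assumption for illustration), is to fix the random seed and average R2 over several folds with scikit-learn's `cross_val_score`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (illustrative assumption): four features, mostly noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
y = 0.1 * X[:, 0] + rng.normal(size=1000)

# Shuffled 5-fold split with a fixed seed so the folds are reproducible.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print("R2 per fold:", np.round(scores, 4))
print("Mean R2: {:.4f} (std {:.4f})".format(scores.mean(), scores.std()))
```

The spread across folds gives a sense of how much of the difference between two models is down to the split itself.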


<h4>Getting Data from Yahoo Finance</h4>

In [102]:
# !pip install yfinance
import yfinance as yf
chosen_stocks = "WFC AMZN FB INTC"
split_stocks = chosen_stocks.split(" ")
stock_info = yf.Tickers(chosen_stocks)
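stock_histories = {}

# Download the daily price history for each chosen stock.
# Assumption: the original lookback window is not shown, so period="max" is illustrative.
for symbol in split_stocks:
    stock_histories[symbol] = yf.Ticker(symbol).history(period="max")

# The cells below use the Wells Fargo history.
wfc_history = stock_histories["WFC"]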


In [39]:
features = ["Open", "Low", "Volume", "Dividends", "Stock Splits"]
target = "Close"

X = wfc_history[features]
y = wfc_history[target]

In [40]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

In [41]:
lr = LinearRegression()
lr.fit(X_train, y_train)

LinearRegression()

In [42]:
train_score = lr.score(X_train, y_train)
test_score = lr.score(X_test, y_test)

print("Train Score: {}\nTest Score: {}".format(train_score, test_score))

preds = lr.predict(X_test)

score = explained_variance_score(y_test, preds)
mae = mean_absolute_error(y_test, preds)
rmse = math.sqrt(mean_squared_error(y_test, preds))
r2 = r2_score(y_test, preds)

print("score = {:.5f} | MAE = {:.3f} | RMSE = {:.3f} | R2 = {:.5f}"
          .format(score, mae, rmse, r2))

Train Score: 0.9998354234144025
Test Score: 0.9997608347484568
score = 0.99976 | MAE = 3.748 | RMSE = 10.114 | R2 = 0.99976


<h2>Flask Application</h2>
Not sure if this will be a final section or not.