Group Members
- Jonathan Conn
- Matt Boulden
- Arunam Gupta


In [4]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn import preprocessing

#### US Economy Baseline
The Dow Jones Industrial Average tracks 30 large, blue-chip US companies.  
The S&P 500 tracks 500 of the largest US companies and covers roughly 80% of US equity market capitalization.  
The Nasdaq Composite tracks ~3,000 stocks listed on the Nasdaq exchange, while the other two include companies listed on both the NYSE and the Nasdaq.  
Together, these market indexes are commonly used as a proxy for how the US economy is performing.

In [3]:
dow_df = pd.read_csv('data/dow_data.csv')
nas_df = pd.read_csv('data/nasdaq_data.csv')
sp_df = pd.read_csv('data/s&p_data.csv')

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2019-12-16,28191.669922,28337.490234,28191.669922,28235.890625,28235.890625,286770000
1,2019-12-17,28221.750000,28328.630859,28220.560547,28267.160156,28267.160156,286770000
2,2019-12-18,28291.439453,28323.250000,28239.279297,28239.279297,28239.279297,289890000
3,2019-12-19,28278.310547,28381.480469,28278.240234,28376.960938,28376.960938,262570000
4,2019-12-20,28608.640625,28608.640625,28445.599609,28455.089844,28455.089844,603780000
...,...,...,...,...,...,...,...
248,2020-12-09,30229.810547,30319.699219,29951.849609,30068.810547,30068.810547,380520000
249,2020-12-10,30032.550781,30063.869141,29876.820313,29999.259766,29999.259766,325550000
250,2020-12-11,29988.210938,30071.130859,29820.839844,30046.369141,30046.369141,393870000
251,2020-12-14,30123.910156,30325.789063,29849.150391,29861.550781,29861.550781,371980000
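
Because the `Date` column is converted with `pd.to_datetime` further down, one option is to parse it when the file is read. A minimal sketch, assuming the CSV columns match the output above; `load_index` is a hypothetical helper, and the inline sample (two rows copied from the Dow output) stands in for the real file:

```python
import io
import pandas as pd

# Two rows copied from the Dow output above; they stand in for data/dow_data.csv.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2019-12-16,28191.669922,28337.490234,28191.669922,28235.890625,28235.890625,286770000
2019-12-17,28221.750000,28328.630859,28220.560547,28267.160156,28267.160156,286770000
"""

def load_index(source):
    """Hypothetical helper: read one index file and parse Date as datetime64."""
    return pd.read_csv(source, parse_dates=['Date'])

sample_df = load_index(io.StringIO(SAMPLE_CSV))
print(sample_df.dtypes['Date'])
```

With the real files, the same call would take a path such as `'data/dow_data.csv'` in place of the `StringIO` object.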


#### Standardizing Our Data 
The three indexes trade at very different price levels.  
To compare them fairly, we standardize each index (z-score), combine the results, and visualize them.  
We call the combined dataframe our Master; it represents the US economy.
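
The normalization applied in the next cell is a z-score: each column has its mean subtracted and is divided by its standard deviation, so every column ends up with mean 0 and standard deviation 1 regardless of its price level. A minimal sketch on made-up prices; the `zscore` helper and the toy values are illustrative and not part of the notebook's data:

```python
import pandas as pd

def zscore(frame):
    """Column-wise z-score; pandas' std() uses the sample standard deviation (ddof=1)."""
    return (frame - frame.mean()) / frame.std()

# Made-up prices on very different scales (illustrative only).
toy = pd.DataFrame({
    'big_index': [28000.0, 28500.0, 29000.0],
    'small_index': [3100.0, 3150.0, 3200.0],
})

scaled = zscore(toy)
print(scaled)
```

Both toy columns come out as -1, 0, 1, which is why indexes quoted in the tens of thousands and in the low thousands can share one axis after scaling.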

In [35]:
# separating price data for each index
price_dow = dow_df[['Open', 'High', 'Low', 'Close']]
price_nas = nas_df[['Open', 'High', 'Low', 'Close']]
price_sp = sp_df[['Open', 'High', 'Low', 'Close']]

# normalizing each index
norm_dow = (price_dow - price_dow.mean()) / price_dow.std()
norm_nas = (price_nas - price_nas.mean()) / price_nas.std()
norm_sp = (price_sp - price_sp.mean()) / price_sp.std()


# adding back the Date column
norm_dow.insert(loc=0, column='Date', value=dow_df['Date'])
norm_nas.insert(loc=0, column='Date', value=nas_df['Date'])
norm_sp.insert(loc=0, column='Date', value=sp_df['Date'])

# creating master dataframe
master_df = pd.concat([norm_dow, norm_nas, norm_sp])

# converting dates to the correct datetime type and sorting
master_df['Date'] = pd.to_datetime(master_df.Date)
master_df = master_df.sort_values('Date')

master_df

Unnamed: 0,Date,Open,High,Low,Close
0,2019-12-16,0.564158,0.555093,0.633392,0.583475
0,2019-12-16,-0.889918,-0.937327,-0.818666,-0.881095
0,2019-12-16,-0.041580,-0.077621,0.046095,-0.015791
1,2019-12-17,0.576487,0.551317,0.644736,0.596283
1,2019-12-17,-0.862429,-0.938393,-0.808031,-0.874494
...,...,...,...,...,...
251,2020-12-14,1.356139,1.402627,1.284216,1.249368
251,2020-12-14,1.743872,1.771338,1.793942,1.740626
252,2020-12-15,1.812898,1.810133,1.817401,1.852717
252,2020-12-15,1.272188,1.367448,1.302200,1.387719
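
The concatenated frame still holds one row per index per date. One way to aggregate it into a single row per date is to average the normalized values across the indexes. This is an assumption about the intended aggregation, not something the notebook shows. A minimal sketch on made-up normalized rows laid out like `master_df` (`toy_master` and `daily_mean` are hypothetical names):

```python
import pandas as pd

# Made-up normalized values: three indexes on each of two trading days.
toy_master = pd.DataFrame({
    'Date': pd.to_datetime(['2019-12-16', '2019-12-16', '2019-12-16',
                            '2019-12-17', '2019-12-17', '2019-12-17']),
    'Open': [0.5, -1.0, -0.25, 0.75, -0.75, 0.0],
    'Close': [0.5, -1.0, 0.5, 1.0, -0.5, 0.25],
})

# Average across the indexes for each trading day.
daily_mean = toy_master.groupby('Date').mean()
print(daily_mean)
```

The result is indexed by `Date`, so a call such as `daily_mean['Close'].plot()` would draw it as a single time series.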
