# Predicting the Stock Market

In this project, I'll use historical price data for the S&P 500 Index to make predictions about its future prices. Predicting whether the index will go up or down will help me forecast how the stock market as a whole will perform. Since stocks tend to correlate with how well the economy is performing, it can also help me make economic forecasts.

In [5]:
import pandas as pd
from datetime import datetime
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

#Prepping Dataframe: parse dates and sort oldest to newest
df = pd.read_csv('/Users/miesner.jacob/Desktop/DataQuest/datasets/sphist.csv')
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values('Date', ascending=True)
df = df.reset_index(drop=True)

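#Calculating derived metrics
#shift(1) moves each rolling value down one row, so a day's features use only prior days' closes
#Note: window=365 counts rows (trading days), not calendar days, so the "year" window covers more than one calendar year
df['five_day_ma'] = df['Close'].rolling(window=5).mean().shift(1)
df['year_ma'] = df['Close'].rolling(window=365).mean().shift(1)
df['mean_ratio'] = df['five_day_ma'] / df['year_ma']

df['five_day_std'] = df['Close'].rolling(window=5).std().shift(1)
df['year_std'] = df['Close'].rolling(window=365).std().shift(1)
df['std_ratio'] = df['five_day_std'] / df['year_std']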

#getting rid of rows with insufficient data
df = df[df["Date"] > datetime(year=1951, month=1, day=2)]
df = df.dropna(axis = 0)

#Creating Train/Test Split (>= so a row dated exactly 2013-01-01 would not be silently dropped)
train = df[df["Date"] < datetime(year=2013, month=1, day=1)]
test = df[df["Date"] >= datetime(year=2013, month=1, day=1)]

#Fitting Model
#Drop the raw same-day columns (and Date) so only the lagged derived metrics are used as features
original_columns = ['Close','High','Low','Open','Volume','Adj Close','Date']
lr = LinearRegression()
lr.fit(train.drop(original_columns, axis=1), train['Close'])

#Making predictions and evaluating error metrics
predictions = lr.predict(test.drop(original_columns, axis=1))
print('MAE:', "${:,.2f}".format(mean_absolute_error(test['Close'], predictions)))

MAE: $16.13
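
The `.shift(1)` calls in the derived metrics matter: without them, each rolling value would include the same day's close that the model is trying to predict. Here is a minimal sketch of the difference on a toy series. The `toy` DataFrame, its column names, and the 3-row window are made up for illustration and are not part of the S&P 500 data.

```python
import pandas as pd

# Toy closing prices, purely illustrative (not S&P 500 data)
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})

# A 3-row rolling mean includes the current row's close
toy["ma_unshifted"] = toy["Close"].rolling(window=3).mean()

# Shifting down one row means each value is built only from earlier rows
toy["ma_shifted"] = toy["ma_unshifted"].shift(1)

print(toy)
```

In the shifted column, the value on row 3 is the mean of rows 0 to 2, so it could have been known before row 3's close was observed.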


This simple model predicted the closing price of the S&P 500 on the test set (2013 onward) with a mean absolute error of $16.13, using just a few derived technical analysis metrics. Even a plain linear regression on rolling means and standard deviations gets this close, which shows the power of machine learning. If we were to delve deeper into the inputs, model selection, and construction, then we would be able to come up with an even more accurate predictor!
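
To make the error metric concrete, the mean absolute error is the average of the absolute differences between the actual and predicted values. Below is a small sketch of that calculation in plain Python. The helper name `mae` and the numbers are hypothetical and chosen only for illustration; the notebook itself uses `mean_absolute_error` from scikit-learn.

```python
def mae(actual, predicted):
    """Mean absolute error: the average of |actual - predicted|."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up values for illustration only (not model output)
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [98.0, 103.0, 104.0, 105.0]

# Absolute errors are 2, 1, 3 and 0, so the average is 6 / 4
print("MAE: {:,.2f}".format(mae(actual, predicted)))
```

Because the errors are averaged in the same units as the target, an MAE is easiest to judge relative to the typical level of the values being predicted.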