
# A simplistic approach to time series modeling

---



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
 
plt.style.use('fivethirtyeight')

%config InlineBackend.figure_format = 'retina'
%matplotlib inline

### Problem Statement

Can we predict the stock price of Apple (AAPL) from historical data?
We have obtained one year of daily stock data from 2016 to 2017.

- `Close`: The price of AAPL when the stock market closes (at 4:00pm ET)
- `High`: The highest price of AAPL during that trading day
- `Low`: The lowest price of AAPL during that trading day
- `Open`: The price of AAPL when the stock market opens (at 9:30am ET)
- `Volume`: How many shares of AAPL were traded that day

#### Load the data

In [None]:
df = pd.read_csv('./datasets/aapl.csv', parse_dates=['Date'])
df.head()

In [None]:
df.info()

#### Sort the rows by `Date` in ascending order

In [None]:
df = df.sort_values(????)


### Make the `Date` column the index of the DataFrame.

Using a datetime index makes it easy to order, slice and plot the data by time. The result is a DataFrame indexed by date: literally a time series!

In [None]:
df.set_index(????,inplace=True,drop=True)

In [None]:
df.index.name = None
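
The sorting and indexing steps above can be sketched on a tiny stand-in table. The dates and prices below are made up, and the variable name `toy` is only illustrative; the point is that once the rows are sorted by date and the dates become the index, the frame can be sliced by date labels.

```python
import pandas as pd

# Hypothetical toy data standing in for aapl.csv (values are made up)
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2017-01-05", "2017-01-03", "2017-01-04"]),
    "Close": [116.6, 116.2, 116.0],
})

# Sort chronologically, then promote the dates to the index
toy = toy.sort_values("Date").set_index("Date")
toy.index.name = None

print(toy.index.is_monotonic_increasing)  # True
print(type(toy.index).__name__)           # DatetimeIndex
print(toy.loc["2017-01-04":])             # label-based slicing by date
```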

In [None]:
# Make a quick plot of the Open price (High, Low and Close can be plotted the same way)
df['Open'].plot(color='red', alpha=0.5, linewidth=2, label='Open')
plt.legend()
plt.show()

### Predicting price developments

Let's create a simple predictive model for time series.
- As the outcome variable we take today's Close price. 
- As predictors we use yesterday's Close price and today's Open price. 

#### Create the data frame

Use `.shift()` to create a column containing yesterday's closing prices.

In [None]:
df['Close_Day_Before'] = ???
X = df[['Close','Close_Day_Before','Open']].copy()
X.head()
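
As a small illustration of what `.shift()` does, here is a sketch on four made-up closing prices (not the AAPL data). Shifting by one period moves each value down one row, so every row sees the previous day's close and the first row has nothing to look back on.

```python
import pandas as pd

# Made-up closing prices for four consecutive business days
close = pd.Series(
    [10.0, 11.0, 12.5, 12.0],
    index=pd.date_range("2017-01-02", periods=4, freq="B"),
    name="Close",
)

# shift(1) aligns each day with the value from the previous row
lagged = pd.DataFrame({"Close": close, "Close_Day_Before": close.shift(1)})
print(lagged)
```

The first row of `Close_Day_Before` is `NaN`, which is why missing values need to be dropped before modeling.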

#### Drop missing values

#### Look at correlations between the variables
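
One possible way to handle the last two steps (dropping rows with missing values, then inspecting correlations) is sketched below on a synthetic random walk. The data, the name `toy`, and the assumption that the open price sits close to the previous close are all illustrative, not taken from the AAPL file.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic random-walk prices; not the AAPL data
close = pd.Series(100 + rng.normal(0, 1, 50).cumsum(), name="Close")
toy = pd.DataFrame({
    "Close": close,
    "Close_Day_Before": close.shift(1),
    # Assumed for illustration: the open is near the prior close
    "Open": close.shift(1) + rng.normal(0, 0.2, 50),
})

toy = toy.dropna()          # removes the first row, whose lag is undefined
print(toy.shape)            # (49, 3)
print(toy.corr().round(2))  # pairwise Pearson correlations
```

With price levels, all three columns tend to be very strongly correlated, which is worth keeping in mind when interpreting the regression scores later.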

#### Extract the outcome variable

In [None]:
y = X.pop('Close')

#### Create a train-test split

Make sure to split in date order, so that every test observation comes after every training observation (why?).

Do we still need `sklearn.model_selection.train_test_split`?

In [None]:
#from sklearn.model_selection import train_test_split
#X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,shuffle=False)

In [None]:
# Or just specify the number of days to use for training
n = 150
X_train, y_train = X[:n], y[:n]
X_test, y_test = X[n:], y[n:]
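
To illustrate the chronological split, the sketch below uses a made-up ten-day series (the names `X_toy`, `y_toy` and `n_train` are illustrative). It checks that slicing by position and calling `train_test_split` with `shuffle=False` give the same partition, and that no test day precedes a training day.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up daily series; only the ordering matters here
dates = pd.date_range("2016-01-01", periods=10, freq="D")
X_toy = pd.DataFrame({"feature": np.arange(10.0)}, index=dates)
y_toy = pd.Series(np.arange(10.0), index=dates)

n_train = 7
X_tr, X_te = X_toy.iloc[:n_train], X_toy.iloc[n_train:]

# shuffle=False keeps the rows in order, so this matches the slice
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_toy, y_toy, train_size=n_train, shuffle=False
)

print(X_tr.index.max() < X_te.index.min())     # True
print(X_tr.equals(X_tr2), X_te.equals(X_te2))  # True True
```

With the default `shuffle=True`, future days would leak into the training set, and the test score would no longer reflect forecasting performance.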

In [None]:
# Check the training and test sets

#### Fit a linear regression model and evaluate it on the train and test set.

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
model = ????
model.???(X_train, y_train)
print(model.score(X_train, y_train))
print(model.score(X_test, y_test))
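
A minimal sketch of the fit-and-score pattern follows, using a synthetic random walk instead of the AAPL data (all names here are illustrative). For regressors, `.score()` returns R². The sketch also computes R² for a naive forecast that simply repeats yesterday's close, which is a useful yardstick when judging how much the model really adds.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
# Synthetic random walk standing in for the Close column
close = pd.Series(100 + rng.normal(0, 1, 250).cumsum())
data = pd.DataFrame({"Close": close, "Close_Day_Before": close.shift(1)}).dropna()

features, target = data[["Close_Day_Before"]], data["Close"]
n_train = 150
model_toy = LinearRegression().fit(features.iloc[:n_train], target.iloc[:n_train])

r2_model = model_toy.score(features.iloc[n_train:], target.iloc[n_train:])
# Naive baseline: predict that today's close equals yesterday's close
r2_naive = r2_score(target.iloc[n_train:], features["Close_Day_Before"].iloc[n_train:])
print(f"model R^2: {r2_model:.3f}, naive R^2: {r2_naive:.3f}")
```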

#### Obtain the prediction for the train and test set and plot them together with the true values.

In [None]:
predictions_train = model.predict(X_train)
predictions_test = model.predict(X_test)

In [None]:
X['predictions'] = np.concatenate([predictions_train,predictions_test])

In [None]:
plt.figure(figsize=(12,6))
y.plot( lw=2, label='actual')
X['predictions'].plot(label='predicted', lw=2)

plt.legend()
# we split the training and test set at n
plt.vlines(X.index[n],90,120, color='g', lw=2)
plt.show()

In [None]:
# Take a closer look at the border between the training and test set
X['predictions'][n-20:n+20].plot()
y[n-20:n+20].plot()
plt.vlines(X.index[n],90,120, color='b')
plt.show()

In [None]:
X['predictions'][n:].plot()
y[n:].plot()
plt.vlines(X.index[n],90,120)
plt.show()

#### Fit a random forest model instead. Does that lead to an improvement?

In [None]:
from sklearn.ensemble import RandomForestRegressor

In [None]:
model = RandomForestRegressor(n_estimators=100)
# fit and score
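
When comparing the two models, it may help to remember that a random forest predicts averages of training targets, so its predictions cannot leave the range seen during training. The sketch below demonstrates this on a synthetic, steadily rising series (made-up data; `rf_toy` and `y_trend` are illustrative names), where the test period lies entirely above the training range.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic upward trend: the test period lies above the training range
x = np.arange(200, dtype=float).reshape(-1, 1)
y_trend = 100 + 0.1 * x.ravel()

n_train = 150
rf_toy = RandomForestRegressor(n_estimators=100, random_state=0)
rf_toy.fit(x[:n_train], y_trend[:n_train])
rf_pred = rf_toy.predict(x[n_train:])

# Predictions are capped by the largest training target (small float tolerance)
print(rf_pred.max() <= y_trend[:n_train].max() + 1e-9)  # True
print(rf_toy.score(x[n_train:], y_trend[n_train:]))     # R^2 on the test period
```

If the real test period contains prices outside the training range, this limitation is one thing to check for in the prediction plots.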


### Fit a linear regression model that also uses the close price from the day before yesterday as a predictor.

In [None]:
df['Close_shift_1'] = ???
df['Close_shift_2'] = ???
X = df[['Close','Close_shift_1','Close_shift_2','Open']].copy()
X.head()
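
A short sketch of building several lag columns at once, on made-up prices (the frame `prices` is illustrative). Each additional lag leaves one more row without a complete history, so the number of usable rows shrinks accordingly.

```python
import pandas as pd

prices = pd.DataFrame({"Close": [10.0, 11.0, 12.5, 12.0, 13.0]})  # made-up values

# One column per lag: shift(1) is yesterday, shift(2) the day before yesterday
for lag in (1, 2):
    prices[f"Close_shift_{lag}"] = prices["Close"].shift(lag)

print(prices)
print(prices.dropna().shape)  # (3, 3): two lags cost the first two rows
```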

In [None]:
# drop the nulls

In [None]:
# Get the target column


In [None]:
# create training set for first 150 days, test set for remainder


In [None]:
# Fit and score the Linear Regression model - are the results better?



### Fitting a classification model

What if we want to predict whether the price will go **up** or **down** the next day?

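Create a binary variable that indicates, for each pair of consecutive days, whether the closing price rose (1) or not (0). The first row has no previous day to compare with, so its value is 0 by construction.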

In [None]:
df['up'] = (df.Close.diff()>0)*1


In [None]:
df.head()

#### Predict rise or drop with yesterday's close price and today's open price.

In [None]:
X = df[['up','Close_shift_1','Open']].copy()
X.dropna(inplace=True)
y = X.pop('up')

#### Determine the baseline for the model

In [None]:
y.value_counts(normalize=True)
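
The baseline is the accuracy of always predicting the most frequent class, which is the largest value returned by `value_counts(normalize=True)`. A sketch on ten made-up labels (`y_toy` is an illustrative name):

```python
import pandas as pd

y_toy = pd.Series([1, 0, 1, 1, 0, 1, 1, 0, 1, 1], name="up")  # made-up labels

shares = y_toy.value_counts(normalize=True)
baseline_accuracy = shares.max()  # accuracy of always predicting the majority class
print(shares.idxmax(), baseline_accuracy)  # 1 0.7
```

A classifier is only useful here if its test accuracy beats this number.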

#### Create a train-test split

In [None]:
n = 150
X_train, y_train = X[:n], y[:n]
X_test, y_test = X[n:], y[n:]

#### Fit and evaluate a logistic regression model.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix,ConfusionMatrixDisplay

In [None]:
lr_model = ???
lr_model.fit(????)
print(f'Training Accuracy: {lr_model.score(X_train, y_train):.3f}')
print(f'Test Accuracy: {lr_model.score(X_test, y_test):.3f}')


In [None]:
# Plot confusion matrix

lr_cm = confusion_matrix(y_test,lr_model.predict(X_test))
disp = ConfusionMatrixDisplay(lr_cm, display_labels=['down = 0', 'up = 1'])
disp.plot()
plt.show()
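
To read the confusion matrix, recall that scikit-learn puts the true classes in rows and the predicted classes in columns, so for binary labels the flattened order is TN, FP, FN, TP. The labels below are made up for illustration:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels and predictions (0 = down, 1 = up)
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)        # 1 2 1 4
print((tn + tp) / cm.sum())  # 0.625 (accuracy)
```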

#### Fit and evaluate a random forest classifier.

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
rf_model = RandomForestClassifier(n_estimators=100)
# Fit and score rf_model, then plot its confusion matrix too


## Compare side by side

In [None]:
plt.figure(figsize=(24,12))

plt.suptitle("Confusion Matrices",fontsize=24)
plt.subplots_adjust(wspace = 0.4, hspace= 0.4)

plt.subplot(1,2,1) # 1x2 grid, position 1
plt.title("Logistic Regression Confusion Matrix")
sns.heatmap(lr_cm,annot=True,cmap="Blues",fmt="d",cbar=False, annot_kws={"size": 24})

# add the confusion matrix for random forest


## Evaluating with ROC Curve

Another way to evaluate classification models is the ROC (Receiver Operating Characteristic) curve, which plots the true positive rate against the false positive rate across all classification thresholds.

In [None]:
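from sklearn.metrics import roc_auc_score, RocCurveDisplay
# Use the predicted probability of class 1 ("up"), not hard 0/1 labels, so the curve covers every threshold
RocCurveDisplay.from_predictions(y_test, lr_model.predict_proba(X_test)[:, 1], plot_chance_level=True)
plt.show()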

### ROC AUC

The larger the area under this blue curve, the better separated our distributions are.
- Check out this gif ([source](https://twitter.com/DrHughHarvey/status/1104435699095404544)):

![](https://media.giphy.com/media/H1SZ5oRLIuZ1t1c4Di/giphy.gif)

We use the **area under the ROC curve** (abbreviated **ROC AUC** or **AUC ROC**) to quantify the gap between our distributions.

In [None]:
# Score with the predicted probability of class 1 rather than hard 0/1 labels
print(f'AUC Score for Logistic Regression : {roc_auc_score(y_test, lr_model.predict_proba(X_test)[:, 1]):.3f}')
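
ROC AUC measures how well continuous scores rank positives above negatives, so it is normally computed from predicted probabilities. The sketch below, on made-up scores, shows that thresholding the scores into hard 0/1 labels first discards ranking information and can change the result.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 1])
scores = np.array([0.2, 0.6, 0.55, 0.7, 0.9])  # made-up predicted probabilities of "up"
labels = (scores >= 0.5).astype(int)           # hard labels at a 0.5 cutoff

print(round(roc_auc_score(y_true, scores), 3))  # 0.833
print(round(roc_auc_score(y_true, labels), 3))  # 0.75
```

Five of the six positive-negative pairs are ranked correctly by the scores (5/6 ≈ 0.833), while the hard labels turn three of those pairs into ties.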


Show the ROC curve for the random forest classifier too.


### Interpreting ROC AUC
- If you have an ROC AUC of 0.5, your positive and negative populations perfectly overlap and your model ranks the classes no better than random guessing, which is as bad as it can get.
- If you have an ROC AUC of 1, your positive and negative populations are perfectly separated and your model is as good as it can get.
- The closer your ROC AUC is to 1, the better. (1 is the maximum score.)
- If you have an ROC AUC of below 0.5, your positive and negative distributions have flipped sides. By flipping your predicted values (i.e. flipping predicted 1s and 0s), your ROC AUC will now be above 0.5.
    - Example: You have an ROC AUC of 0.2. If you change your predicted 1s to 0s and your predicted 0s to 1s, your ROC AUC will now be 0.8!
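
The flipping argument can be checked numerically. In this sketch the scores are made up so that they mostly rank the classes backwards; reversing them (here via `1 - score`) turns an AUC of `a` into `1 - a`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
bad_scores = np.array([0.9, 0.6, 0.7, 0.1])  # made-up scores, mostly backwards

auc = roc_auc_score(y_true, bad_scores)
auc_flipped = roc_auc_score(y_true, 1 - bad_scores)
print(round(auc, 2), round(auc_flipped, 2))  # 0.25 0.75
```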

## Conclusion

What conclusions and recommendations can you make about the models that we have developed?