# 4.6.4 Quadratic Discriminant Analysis

Load modules and data

In [2]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, classification_report
%matplotlib inline
# 'seaborn-white' was renamed 'seaborn-v0_8-white' in matplotlib 3.6 and later removed
plt.style.use('seaborn-v0_8-white')

# Skip the first (row-index) column and keep the remaining nine columns
Smarket = pd.read_csv('Data/Smarket.csv', usecols=range(1, 10))

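Now we will perform QDA on the Smarket data. In Python, we can fit a QDA model with the QuadraticDiscriminantAnalysis class from scikit-learn's sklearn.discriminant_analysis module. The model is trained on the observations from before 2005, with Lag1 and Lag2 as predictors.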

In [4]:
train = Smarket.Year < 2005  # boolean mask: observations before 2005
x_train = Smarket.loc[train, ['Lag1', 'Lag2']]
y_train = Smarket.loc[train, 'Direction']

qda = QuadraticDiscriminantAnalysis()
qda.fit(x_train, y_train);

### Prior probabilities of groups:

In [6]:
print("Down: %f" % qda.priors_[0])
print("Up: %f" % qda.priors_[1])

Down: 0.491984
Up: 0.508016


The QDA output indicates prior probabilities of $\hat{\pi}_1 = 0.492$ and $\hat{\pi}_2 = 0.508$; in other words,
49.2% of the training observations correspond to days during which the
market went down.
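With the default `priors=None`, scikit-learn estimates these priors as the class proportions in the training labels. A quick sanity check of that behaviour is sketched below; the labels are synthetic stand-ins for `y_train`, made up for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Synthetic stand-in for the training data (illustrative values, not Smarket)
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(500, 2)), columns=['Lag1', 'Lag2'])
y = pd.Series(rng.choice(['Down', 'Up'], size=500, p=[0.45, 0.55]), name='Direction')

qda = QuadraticDiscriminantAnalysis().fit(X, y)

# classes_ is sorted ('Down', 'Up'), so sort the empirical proportions the same way
empirical = y.value_counts(normalize=True).sort_index()
print(pd.DataFrame({'priors_': qda.priors_, 'empirical': empirical.values},
                   index=qda.classes_))
```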

### Group means:

In [7]:
pd.DataFrame(qda.means_, index=qda.classes_, columns=['Lag1', 'Lag2'])

|      | Lag1      | Lag2      |
|------|-----------|-----------|
| Down | 0.04279   | 0.033894  |
| Up   | -0.039546 | -0.031325 |


The output contains the group means, but not the coefficients of linear
discriminants: the QDA classifier involves a quadratic, rather than a
linear, function of the predictors.
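For reference, QDA assigns an observation $x$ to the class $k$ that maximizes

$$\delta_k(x) = -\tfrac{1}{2}(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k) - \tfrac{1}{2}\log|\Sigma_k| + \log\pi_k,$$

where $\mu_k$, $\Sigma_k$ and $\pi_k$ are the mean, class-specific covariance matrix and prior of class $k$. The sketch below evaluates this by hand on synthetic two-class data, using the fitted `means_`, `covariance_` (stored only when `store_covariance=True`) and `priors_` attributes, and compares the result with `predict()`. The helper `qda_discriminants` is hypothetical, written only for this illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two synthetic classes with different covariance matrices (illustrative data only)
X_down = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X_up = rng.multivariate_normal([0.5, -0.5], [[2.0, -0.5], [-0.5, 0.5]], size=150)
X = np.vstack([X_down, X_up])
y = np.array(['Down'] * 200 + ['Up'] * 150)

qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)

def qda_discriminants(X, means, covariances, priors):
    """Return an (n_samples, n_classes) array of delta_k(x)."""
    scores = []
    for mu, sigma, pi in zip(means, covariances, priors):
        diff = X - mu
        # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu), one value per row
        maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(sigma), diff)
        _, logdet = np.linalg.slogdet(sigma)
        scores.append(-0.5 * maha - 0.5 * logdet + np.log(pi))
    return np.column_stack(scores)

delta = qda_discriminants(X, qda.means_, qda.covariance_, qda.priors_)
manual_pred = qda.classes_[delta.argmax(axis=1)]

# Posterior probabilities: softmax of the discriminants over the classes
post = np.exp(delta - delta.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)

print((manual_pred == qda.predict(X)).all())
```

Because $\Sigma_k$ differs between classes, the quadratic term in $x$ does not cancel when two discriminants are compared, which is why the decision boundary is quadratic and there are no linear discriminant coefficients to report.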

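The predict() method returns an array of QDA's predictions about the movement of the market on the test data (the 2005 observations). Cross-tabulating them against the actual movements gives the confusion matrix below, with predictions in the rows and actual directions in the columns: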

In [8]:
test = Smarket.Year >= 2005  # boolean mask: data from 2005
x_test = Smarket.loc[test, ['Lag1', 'Lag2']]
y_test = Smarket.loc[test, 'Direction']
predict = qda.predict(x_test)
# confusion_matrix puts actual classes in rows; transpose so rows are predictions
pd.DataFrame(confusion_matrix(y_test, predict).T, index=['Down', 'Up'], columns=['Down', 'Up'])

| Predicted \ Actual | Down | Up  |
|--------------------|------|-----|
| Down               | 30   | 20  |
| Up                 | 81   | 121 |
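scikit-learn's `confusion_matrix(y_true, y_pred)` puts the actual classes in the rows and the predicted classes in the columns, so the `.T` turns it into the predicted-by-actual layout shown in the table. A toy example with made-up labels, checking that the transpose matches `pd.crosstab` with predictions as rows:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up labels, purely to show the orientation of the matrix
y_true = ['Down', 'Down', 'Down', 'Up', 'Up']
y_pred = ['Down', 'Up', 'Up', 'Up', 'Up']

cm = confusion_matrix(y_true, y_pred, labels=['Down', 'Up'])  # rows = actual, columns = predicted
table = pd.crosstab(pd.Series(y_pred, name='Predicted'),
                    pd.Series(y_true, name='Actual'))         # rows = predicted, columns = actual
print(cm.T)
print(table)
```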


##### Accuracy (fraction of correct predictions)

In [10]:
(30+121.0)/(30+20+81+121)

0.5992063492063492
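The same number can be read directly off the confusion matrix: the correct predictions lie on the diagonal, so accuracy is the trace divided by the total count, and the test error rate is its complement. A minimal sketch using the counts from the table above:

```python
import numpy as np

# Confusion matrix from the table above: rows = predicted, columns = actual
cm = np.array([[30, 20],
               [81, 121]])
accuracy = np.trace(cm) / cm.sum()   # correct predictions are on the diagonal
error_rate = 1 - accuracy
print(round(accuracy, 4), round(error_rate, 4))
```

In the notebook itself, `(predict == y_test).mean()` gives the same value without typing the counts by hand.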

In [11]:
print(classification_report(y_test, predict, digits=3))

             precision    recall  f1-score   support

       Down      0.600     0.270     0.373       111
         Up      0.599     0.858     0.706       141

avg / total      0.599     0.599     0.559       252
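Each per-class entry of the report follows from the same matrix: precision divides the diagonal by the row totals (all days predicted as that class), recall divides it by the column totals (all days actually in that class, the "support"), and F1 is their harmonic mean. A sketch reproducing the per-class numbers from the counts above:

```python
import numpy as np

# Rows = predicted (Down, Up), columns = actual (Down, Up), as in the table above
cm = np.array([[30, 20],
               [81, 121]])
tp = np.diag(cm)
precision = tp / cm.sum(axis=1)   # correct / all predicted as that class
recall = tp / cm.sum(axis=0)      # correct / all actually in that class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=0)

for label, p, r, f, s in zip(['Down', 'Up'], precision, recall, f1, support):
    print(f"{label:>5} {p:.3f} {r:.3f} {f:.3f} {s}")
```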



Interestingly, the QDA predictions are accurate almost 60% of the time,
even though the 2005 data were not used to fit the model. This level of accuracy
is quite impressive for stock market data, which is notoriously
hard to model accurately. It suggests that the quadratic form assumed
by QDA may capture the true relationship more accurately than the linear
forms assumed by LDA and logistic regression. However, we recommend
evaluating this method's performance on a larger test set before betting
that this approach will consistently beat the market!
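One way to put the 60% in context: in the 2005 test set, 141 of the 252 days were Up days (the support column of the report above), so a naive rule that always predicts Up would already be right about 56% of the time. A small calculation using only counts from the output above:

```python
# Actual outcomes in the 2005 test set, from the "support" column above
n_down, n_up = 111, 141
n_test = n_down + n_up

baseline = max(n_down, n_up) / n_test   # always predict the majority class ("Up")
qda_accuracy = (30 + 121) / n_test      # diagonal of the QDA confusion matrix
print(f"baseline={baseline:.3f}  QDA={qda_accuracy:.3f}")
```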

# Bonus

Including Volume as a predictor

In [54]:
features = ['Lag1', 'Lag2', 'Volume']
train = Smarket.Year < 2005   # fit on the years before 2005
test = Smarket.Year >= 2005   # evaluate on 2005
x_train, y_train = Smarket.loc[train, features], Smarket.loc[train, 'Direction']
x_test, y_test = Smarket.loc[test, features], Smarket.loc[test, 'Direction']

qda = QuadraticDiscriminantAnalysis()
qda.fit(x_train, y_train);
predict = qda.predict(x_test)
print(classification_report(y_test, predict, digits=3))

             precision    recall  f1-score   support

       Down      0.433     0.757     0.551       111
         Up      0.534     0.220     0.312       141

avg / total      0.490     0.456     0.417       252



The model now predicts correctly only 45.6% of the time, down from 59.9% with Lag1 and Lag2 alone.
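The 45.6% is the recall in the `avg / total` row (labelled `weighted avg` in newer scikit-learn versions). The support-weighted average recall always equals the overall accuracy, because $\sum_k \frac{n_k}{N}\cdot\frac{TP_k}{n_k} = \frac{\sum_k TP_k}{N}$. A small check on random labels (synthetic, for illustration only):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Random labels: the identity holds for any predictions, good or bad
rng = np.random.default_rng(2)
y_true = rng.choice(['Down', 'Up'], size=252)
y_pred = rng.choice(['Down', 'Up'], size=252)

acc = accuracy_score(y_true, y_pred)
weighted_recall = recall_score(y_true, y_pred, average='weighted')
print(acc, weighted_recall)
```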

In [58]:
features = ['Lag1', 'Lag2']
train = Smarket.Year < 2004   # fit on the years before 2004 only
test = Smarket.Year >= 2005   # evaluate on the same 2005 data as before
x_train, y_train = Smarket.loc[train, features], Smarket.loc[train, 'Direction']
x_test, y_test = Smarket.loc[test, features], Smarket.loc[test, 'Direction']

qda = QuadraticDiscriminantAnalysis()
qda.fit(x_train, y_train);
predict = qda.predict(x_test)
print(classification_report(y_test, predict, digits=3))

             precision    recall  f1-score   support

       Down      0.443     0.910     0.596       111
         Up      0.583     0.099     0.170       141

avg / total      0.522     0.456     0.357       252



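With less training data (only the years before 2004, evaluated on the same 2005 test set), it becomes harder to predict whether the market will rise or fall: accuracy is again only 45.6%, and the model predicts Down on most test days (recall 0.910 for Down versus 0.099 for Up).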
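The three experiments differ only in the feature list and the training cutoff year. A hypothetical helper, `evaluate_qda`, that wraps the fit-and-score steps is sketched below; it assumes a DataFrame with `Year`, `Direction` and the feature columns, like Smarket. It is demonstrated on synthetic data, so the printed accuracies are meaningless; on Smarket, `evaluate_qda(Smarket, ['Lag1', 'Lag2', 'Volume'], 2005)` would correspond to the first bonus cell.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score

def evaluate_qda(df, features, train_before, test_year=2005):
    """Fit QDA on rows with Year < train_before; return accuracy on rows with Year == test_year."""
    train = df[df.Year < train_before]
    test = df[df.Year == test_year]
    qda = QuadraticDiscriminantAnalysis().fit(train[features], train['Direction'])
    return accuracy_score(test['Direction'], qda.predict(test[features]))

# Synthetic stand-in with the same column names as Smarket (illustrative values only)
rng = np.random.default_rng(3)
n = 1250
demo = pd.DataFrame({
    'Year': np.repeat([2001, 2002, 2003, 2004, 2005], n // 5),
    'Lag1': rng.normal(size=n),
    'Lag2': rng.normal(size=n),
    'Volume': rng.lognormal(size=n),
    'Direction': rng.choice(['Down', 'Up'], size=n),
})

for features, cutoff in [(['Lag1', 'Lag2'], 2005),
                         (['Lag1', 'Lag2', 'Volume'], 2005),
                         (['Lag1', 'Lag2'], 2004)]:
    print(features, cutoff, round(evaluate_qda(demo, features, cutoff), 3))
```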