# QUIZ 03

We will use the `Weekly` data set, which is part of the `ISLR2` package. The data are similar in nature to the `Smarket` data from this chapter's lab, except that they contain 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.

In [1]:
library(ISLR2)
library(MASS)
library(class)

head(Weekly)
attach(Weekly)


Attaching package: ‘MASS’


The following object is masked from ‘package:ISLR2’:

    Boston




|   | Year | Lag1 | Lag2 | Lag3 | Lag4 | Lag5 | Volume | Today | Direction |
|---|------|------|------|------|------|------|--------|-------|-----------|
|   | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` |
| 1 | 1990 | 0.816 | 1.572 | -3.936 | -0.229 | -3.484 | 0.154976 | -0.27 | Down |
| 2 | 1990 | -0.27 | 0.816 | 1.572 | -3.936 | -0.229 | 0.148574 | -2.576 | Down |
| 3 | 1990 | -2.576 | -0.27 | 0.816 | 1.572 | -3.936 | 0.1598375 | 3.514 | Up |
| 4 | 1990 | 3.514 | -2.576 | -0.27 | 0.816 | 1.572 | 0.16163 | 0.712 | Up |
| 5 | 1990 | 0.712 | 3.514 | -2.576 | -0.27 | 0.816 | 0.153728 | 1.178 | Up |
| 6 | 1990 | 1.178 | 0.712 | 3.514 | -2.576 | -0.27 | 0.154444 | -1.372 | Down |


(a) Use the full data set to perform a logistic regression with `Direction` as the response and the five lag variables (`Lag1` through `Lag5`) plus `Volume` as predictors.
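
For reference, the model requested in (a) can be written out as follows. This assumes R's default factor coding, in which the first level of `Direction` (`Down`) is treated as the failure class, so the fitted probability is that of `Up`. The symbols $\beta_0, \dots, \beta_6$ are notation introduced here, not names from the quiz.

```latex
\log\frac{p(X)}{1 - p(X)}
  = \beta_0 + \sum_{j=1}^{5} \beta_j \,\mathrm{Lag}_j + \beta_6 \,\mathrm{Volume},
\qquad
p(X) = \Pr(\mathrm{Direction} = \mathrm{Up} \mid X)
```

Under this form, a positive coefficient raises the log-odds of an `Up` week as the predictor increases, and a negative one lowers them.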

In [2]:
# Fit the logistic regression model
lr.fit <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = Weekly, family = binomial)

# Summarize the fit
summary(lr.fit)


Call:
glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + 
    Volume, family = binomial, data = Weekly)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6949  -1.2565   0.9913   1.0849   1.4579  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.26686    0.08593   3.106   0.0019 **
Lag1        -0.04127    0.02641  -1.563   0.1181   
Lag2         0.05844    0.02686   2.175   0.0296 * 
Lag3        -0.01606    0.02666  -0.602   0.5469   
Lag4        -0.02779    0.02646  -1.050   0.2937   
Lag5        -0.01447    0.02638  -0.549   0.5833   
Volume      -0.02274    0.03690  -0.616   0.5377   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1496.2  on 1088  degrees of freedom
Residual deviance: 1486.4  on 1082  degrees of freedom
AIC: 1500.4

Number of Fisher Scoring iterations: 4
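
In the summary above, `Lag2` (p = 0.0296) is the only predictor with a p-value below 0.05. Rather than reading this off the printout, the coefficient table can be queried directly. The sketch below does so on simulated stand-in data, not on `Weekly`; the names `x1`, `x2`, `coef_table`, and `significant` are hypothetical.

```r
# Simulated stand-in data (hypothetical; not the Weekly data set)
set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
# Only x1 truly affects the response in this simulation
y <- factor(ifelse(runif(n) < plogis(0.2 + 1.0 * x1), "Up", "Down"))

fit <- glm(y ~ x1 + x2, family = binomial)

# coef(summary(fit)) is a matrix with columns
# "Estimate", "Std. Error", "z value", "Pr(>|z|)"
coef_table <- coef(summary(fit))

# Keep predictors (dropping the intercept row) with p-value below 0.05
p_values <- coef_table[-1, "Pr(>|z|)"]
significant <- names(p_values)[p_values < 0.05]
print(significant)
```

The same two lines should apply to the notebook's model by replacing `fit` with `lr.fit`.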


In [3]:
#hidden test cases

(b) Find the overall fraction of correct predictions.

In [4]:
# Predict probabilities
probabilities <- predict(lr.fit, type = "response")

# Convert probabilities to predicted class labels
predicted_direction <- ifelse(probabilities > 0.5, "Up", "Down")

# Calculate accuracy
accuracy <- mean(predicted_direction == Weekly$Direction)
accuracy
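
The overall fraction correct hides which kind of mistake the model makes. A confusion matrix shows both error types, and its diagonal yields the same accuracy figure. Below is a minimal sketch on ten made-up labels; `truth`, `predicted`, and `confusion` are hypothetical names, and the values are not taken from `Weekly`.

```r
# Hypothetical labels for ten weeks (made up for illustration)
truth <- factor(c("Up", "Up", "Down", "Up", "Down",
                  "Down", "Up", "Up", "Down", "Up"),
                levels = c("Down", "Up"))
predicted <- factor(c("Up", "Up", "Up", "Up", "Down",
                      "Up", "Down", "Up", "Down", "Up"),
                    levels = c("Down", "Up"))

# Rows are predictions, columns are the true classes
confusion <- table(Predicted = predicted, Actual = truth)
print(confusion)

# Overall fraction correct: diagonal counts over the total
fraction_correct <- sum(diag(confusion)) / sum(confusion)

# Same number as the mean() approach used in the cell above
stopifnot(isTRUE(all.equal(fraction_correct, mean(predicted == truth))))
print(fraction_correct)
```

For the notebook's own objects, `table(predicted_direction, Weekly$Direction)` should give the corresponding matrix.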


In [5]:
#hidden test cases

(c) Now fit the logistic regression model using a training data period from 1990 to 2008, with `Lag2` as the only predictor. Compute the overall fraction of correct predictions for the held-out data (that is, the data from 2009 and 2010).

In [6]:
library(dplyr)
library(MASS)
library(class)


# Create training and test sets
train <- Weekly %>% filter(Year < 2009)
test <- Weekly %>% filter(Year >= 2009)

# Fit the logistic regression model on the training data
lr.fit_train <- glm(Direction ~ Lag2, data = train, family = binomial)

# Predict probabilities on the test data
probabilities_test <- predict(lr.fit_train, newdata = test, type = "response")

# Convert probabilities to predicted class labels
predicted_direction_test <- ifelse(probabilities_test > 0.5, "Up", "Down")

# Calculate accuracy on the test data
accuracy <- mean(predicted_direction_test == test$Direction)
accuracy



Attaching package: ‘dplyr’


The following object is masked from ‘package:MASS’:

    select


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
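
The startup messages above note that `dplyr` masks `stats::filter`. If that masking is unwanted, the same split can be done with base R logical indexing. The sketch below uses a hypothetical six-row data frame (`toy`) in place of `Weekly`; the values are made up.

```r
# Hypothetical miniature data frame standing in for Weekly
toy <- data.frame(
  Year = c(2007, 2008, 2008, 2009, 2010, 2010),
  Lag2 = c(0.5, -1.2, 0.3, 2.1, -0.4, 0.9),
  Direction = factor(c("Up", "Down", "Up", "Up", "Down", "Up"))
)

# Logical vector marking the training period (through 2008)
is_train <- toy$Year <= 2008

train_toy <- toy[is_train, ]
test_toy <- toy[!is_train, ]

# Every row lands in exactly one of the two sets,
# and no training year overlaps the test period
stopifnot(nrow(train_toy) + nrow(test_toy) == nrow(toy))
stopifnot(max(train_toy$Year) < min(test_toy$Year))
```

Keeping the logical vector around also makes it easy to pull out the held-out response later, for example `toy$Direction[!is_train]`.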




In [7]:
#hidden test case

(d) Repeat (c) using `LDA`

In [8]:
library(ISLR2)
library(dplyr)
library(MASS)

# Create training and test sets
train <- Weekly %>% filter(Year < 2009)
test <- Weekly %>% filter(Year >= 2009)

# Fit the LDA model on the training data
lda.fit <- lda(Direction ~ Lag2, data = train)

# Predict on the test data
lda.pred <- predict(lda.fit, newdata = test)
predicted_direction_lda <- lda.pred$class

# Calculate accuracy on the test data
accuracy <- mean(predicted_direction_lda == test$Direction)
print(paste("LDA Test accuracy (2009-2010):", accuracy))


[1] "LDA Test accuracy (2009-2010): 0.625"
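
The cell above uses only the `class` component of the LDA prediction. `predict()` on an `lda` fit also returns a `posterior` matrix, and with two classes the predicted label amounts to thresholding the `Up` posterior at 0.5. The sketch below checks this on simulated data (not `Weekly`); `sim`, `manual_class`, and `strict_class` are hypothetical names, and the 0.7 threshold is an arbitrary choice for illustration.

```r
library(MASS)

# Simulated one-predictor data (hypothetical; not the Weekly data)
set.seed(1)
n <- 200
direction <- factor(rep(c("Down", "Up"), each = n / 2))
lag2 <- c(rnorm(n / 2, mean = -0.5), rnorm(n / 2, mean = 0.5))
sim <- data.frame(Direction = direction, Lag2 = lag2)

fit <- lda(Direction ~ Lag2, data = sim)
pred <- predict(fit, newdata = sim)

# pred$posterior has one column per class level
print(head(pred$posterior))

# Thresholding the "Up" posterior at 0.5 reproduces pred$class
manual_class <- ifelse(pred$posterior[, "Up"] > 0.5, "Up", "Down")
stopifnot(all(manual_class == as.character(pred$class)))

# A stricter threshold can only label the same or fewer observations "Up"
strict_class <- ifelse(pred$posterior[, "Up"] > 0.7, "Up", "Down")
print(c(default = sum(manual_class == "Up"), strict = sum(strict_class == "Up")))
```

Working from the posterior is useful when the two kinds of error carry different costs and a threshold other than 0.5 is preferred.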


In [9]:
#hidden test case


(d) Repeat (c) using `QDA`

In [10]:
# Fit the QDA model on the training data
qda.fit <- qda(Direction ~ Lag2, data = train)

# Predict on the test data
qda.pred <- predict(qda.fit, newdata = test)
predicted_direction_qda <- qda.pred$class

# Calculate accuracy on the test data
accuracy <- mean(predicted_direction_qda == test$Direction)
print(paste("QDA Test accuracy (2009-2010):", accuracy))

[1] "QDA Test accuracy (2009-2010): 0.586538461538462"
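
An accuracy figure is easier to judge next to a baseline. A classifier that always predicts the most common class scores exactly the share of that class in the test labels, so a model whose accuracy equals that share may be predicting a single class. The notebook does not show whether that is the case for the QDA fit; running `table(predicted_direction_qda)` would settle it. A minimal sketch of the check on made-up labels (all names and values hypothetical):

```r
# Hypothetical test labels (made up for illustration)
truth <- factor(c("Up", "Down", "Up", "Up", "Down",
                  "Up", "Up", "Down", "Down", "Up"),
                levels = c("Down", "Up"))
# A degenerate classifier that answers "Up" for every week
predicted <- factor(rep("Up", length(truth)), levels = c("Down", "Up"))

# Accuracy of the classifier
accuracy <- mean(predicted == truth)

# Baseline: share of the most common class in the test labels
baseline <- max(prop.table(table(truth)))

# How often each class was predicted; a zero count flags a one-class predictor
print(table(predicted))
print(c(accuracy = accuracy, baseline = baseline))
```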


In [11]:
#hidden test case


(e) Repeat (c) using KNN with K = 1

In [12]:
library(class)

# Prepare the training and test data for KNN
train.X <- as.matrix(train$Lag2)
test.X <- as.matrix(test$Lag2)
train.Direction <- train$Direction

# Fit the KNN model with K=1
set.seed(1)  # For reproducibility
knn.pred <- knn(train.X, test.X, train.Direction, k = 1)

# Calculate accuracy on the test data
accuracy <- mean(knn.pred == test$Direction)
print(paste("KNN Test accuracy (2009-2010):", accuracy))

[1] "KNN Test accuracy (2009-2010): 0.5"
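
With K = 1, each training point is its own nearest neighbour, so training accuracy says nothing about how well the method generalises; only held-out accuracy is informative. The sketch below illustrates this on simulated data (not `Weekly`) and compares a few values of K. The data, the alternating split, and the choice of `k_values` are all assumptions made for illustration.

```r
library(class)

# Simulated one-predictor data (hypothetical; not the Weekly data)
set.seed(1)
n <- 200
labels <- factor(rep(c("Down", "Up"), each = n / 2))
x <- c(rnorm(n / 2, mean = -0.5), rnorm(n / 2, mean = 0.5))

# Alternate rows into training and test halves
train_idx <- seq(1, n, by = 2)
train_x <- matrix(x[train_idx], ncol = 1)
test_x <- matrix(x[-train_idx], ncol = 1)
train_y <- labels[train_idx]
test_y <- labels[-train_idx]

# With K = 1 each training point is its own nearest neighbour,
# so training accuracy is 1 whenever the predictor values are distinct
train_acc_k1 <- mean(knn(train_x, train_x, train_y, k = 1) == train_y)

# Test accuracy for a few values of K
k_values <- c(1, 5, 15)
test_acc <- sapply(k_values, function(k) {
  mean(knn(train_x, test_x, train_y, k = k) == test_y)
})
names(test_acc) <- paste0("K=", k_values)
print(train_acc_k1)
print(test_acc)
```

The `set.seed()` call matters for `knn()` because ties among neighbours are broken at random, which is also why the notebook cell sets a seed before predicting.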


In [13]:
#hidden test case