# 5. Selecting the Tuning Parameter 

## 5.1 [Ex] Validation Set 

- alpha = 0 : ridge regression 
- alpha = 1 : lasso regression 

```R
# Make dataset 
library(glmnet)
library(ISLR) 
names(Hitters) 
Hitters <- na.omit(Hitters) 

set.seed(123)
x <- model.matrix(Salary~., Hitters)[, -1] 
y <- Hitters$Salary

# Train-Test Split
train <- sample(1:nrow(x), nrow(x)/3) 
test <- (-train) 
y.test <- y[test]

# Hyperparameter tuning: fit ridge over a grid of lambda values 
grid <- 10^seq(10, -2, length=100) 
r1 <- glmnet(x[train, ], y[train], alpha=0, lambda=grid)
Err <- numeric(length(r1$lambda))

# Validation-set error (MSE on the held-out sample) for each lambda 
for (i in seq_along(r1$lambda)) { 
    # s must be the lambda value itself, not its index 
    r1.pred <- predict(r1, s=r1$lambda[i], newx=x[test, ])
    Err[i] <- mean((r1.pred - y.test)^2) 
} 
wh <- which.min(Err) 
lam.opt <- r1$lambda[wh] 

# Refit on the full data and extract the coefficients at the selected lambda 
r.full <- glmnet(x, y, alpha=0, lambda=grid) 
r.full$beta[, wh] 
predict(r.full, type="coefficients", s=lam.opt) 
```
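
The same validation-set search applies to the lasso by setting `alpha = 1`. The sketch below is only an illustration under stated assumptions: it uses simulated data rather than `Hitters`, and the names `x.sim`, `y.sim`, `tr`, `grid.sim`, `fit.sim`, `val.err`, and `lam.best` are hypothetical. It also replaces the loop with a single `predict()` call, which returns one column of predictions per lambda.

```R
library(glmnet)

set.seed(123)
# Simulated stand-in data (assumption: not the Hitters data used above)
n <- 150; p <- 8
x.sim <- matrix(rnorm(n * p), n, p)
y.sim <- drop(x.sim %*% c(3, -2, rep(0, p - 2)) + rnorm(n))

# Train on one third of the rows, validate on the rest
tr <- sample(seq_len(n), floor(n / 3))
grid.sim <- 10^seq(2, -3, length = 50)

# alpha = 1 gives the lasso
fit.sim <- glmnet(x.sim[tr, ], y.sim[tr], alpha = 1, lambda = grid.sim)

# One column of predictions per lambda value
pred <- predict(fit.sim, newx = x.sim[-tr, ], s = fit.sim$lambda)

# Validation MSE for every lambda at once
val.err <- colMeans((pred - y.sim[-tr])^2)
lam.best <- fit.sim$lambda[which.min(val.err)]
lam.best
```

Passing the lambda values themselves through `s` keeps the prediction columns aligned with `fit.sim$lambda`.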

## 5.2 [Ex] K-fold Cross Validation 

```R
set.seed(1234)
cv.r <- cv.glmnet(x, y, alpha=0, nfolds=10)
names(cv.r) 
# cvm : mean cross-validated error for each lambda -> CVE 
# cvsd : estimated standard error of cvm -> used for the one-standard-error rule 
# cvup : upper bound of CVE -> cvm + cvsd 
# cvlo : lower bound of CVE -> cvm - cvsd 
# lambda.min : the lambda that minimizes cvm 
# lambda.1se : the largest lambda whose cvm is within one standard error of the minimum 

cbind(cv.r$cvlo, cv.r$cvm, cv.r$cvup)
# Plot of CVE against log(lambda) with one-standard-error bars 
# left vertical line : log(lambda.min) 
# right vertical line (more heavily shrunk model) : log(lambda.1se) 
plot(cv.r) 

which(cv.r$lambda==cv.r$lambda.min)
which(cv.r$lambda==cv.r$lambda.1se)
# 100, 54 -> lambda.min < lambda.1se 

b.min <- predict(cv.r, type="coefficients", s=cv.r$lambda.min)
b.1se <- predict(cv.r, type="coefficients", s=cv.r$lambda.1se)

# Compare the squared l2-norms of the coefficients (intercept excluded) 
# sum(b.min != 0) - 1 would instead count the nonzero coefficients 
cbind(b.min, b.1se)
c(sum(b.min[-1]^2), sum(b.1se[-1]^2))
# sum(b.min[-1]^2) > sum(b.1se[-1]^2) 
```
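
To make the one-standard-error rule concrete, `lambda.1se` can be rebuilt by hand from `cvm` and `cvsd`. This is a minimal sketch on simulated data (an assumption; `x.sim`, `y.sim`, `cv.sim`, `i.min`, `thresh`, and `lam.1se.manual` are hypothetical names), not a replacement for what `cv.glmnet` already returns.

```R
library(glmnet)

set.seed(1234)
# Simulated data (assumption: used only so the block runs on its own)
n <- 120; p <- 10
x.sim <- matrix(rnorm(n * p), n, p)
y.sim <- drop(x.sim %*% c(2, -1, rep(0, p - 2)) + rnorm(n))

cv.sim <- cv.glmnet(x.sim, y.sim, alpha = 0, nfolds = 10)

# Position of the minimum CVE and the one-standard-error threshold
i.min <- which.min(cv.sim$cvm)
thresh <- cv.sim$cvm[i.min] + cv.sim$cvsd[i.min]

# Largest lambda whose CVE is still below the threshold
lam.1se.manual <- max(cv.sim$lambda[cv.sim$cvm <= thresh])

c(cv.sim$lambda.min, cv.sim$lambda[i.min])
c(cv.sim$lambda.1se, lam.1se.manual)
```

Because a larger lambda means more shrinkage, this rule picks the most heavily penalized model that is statistically indistinguishable from the best one.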

# 6. Practical Considerations 

## 6.1 The Bias-Variance tradeoff

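- With $\lambda$ increasing to the right along the axis, the left side (small $\lambda$) and the right side (large $\lambda$) compare as follows: 
- Overfitting vs underfitting 
- (Low bias + high variance) vs (high bias + low variance)
- (Larger l1-norm and l2-norm of the coefficients) vs (smaller l1-norm and l2-norm) 
- Small $\lambda$ vs large $\lambda$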
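
The link between $\lambda$ and the coefficient norms can be checked numerically. The sketch below assumes simulated data and a ridge fit; the names `x.sim`, `y.sim`, `grid.sim`, `ridge.sim`, `B`, and `norms` are hypothetical. It compares the l1- and l2-norms at the largest and smallest lambda in the grid.

```R
library(glmnet)

set.seed(1)
# Simulated data (assumption: for illustration only)
n <- 100; p <- 10
x.sim <- matrix(rnorm(n * p), n, p)
y.sim <- drop(x.sim %*% rnorm(p) + rnorm(n))

grid.sim <- 10^seq(3, -3, length = 50)
ridge.sim <- glmnet(x.sim, y.sim, alpha = 0, lambda = grid.sim)

# Coefficient matrix without the intercept: one column per lambda,
# ordered from the largest lambda to the smallest
B <- as.matrix(ridge.sim$beta)
norms <- data.frame(lambda = ridge.sim$lambda,
                    l1 = colSums(abs(B)),
                    l2 = sqrt(colSums(B^2)))

# First row: heaviest shrinkage; last row: close to least squares
norms[c(1, nrow(norms)), ]
```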

## 6.2 Comparison between Lasso and Ridge 

- If the number of nonzero coefficients is large, ridge tends to perform better. 
- If the number of nonzero coefficients is small, lasso tends to perform better. 
- In high-dimensional data where a sparse model is assumed, lasso tends to perform better. 
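
One reason for this difference is that the lasso sets some coefficients exactly to zero, while ridge only shrinks them. A minimal sketch, assuming a simulated sparse truth with 5 nonzero coefficients out of 50 (all names here, such as `beta.true`, `cv.lasso`, and `cv.ridge`, are hypothetical):

```R
library(glmnet)

set.seed(1)
# Simulated sparse model (assumption): only the first 5 of 50 predictors matter
n <- 100; p <- 50
x.sim <- matrix(rnorm(n * p), n, p)
beta.true <- c(rep(3, 5), rep(0, p - 5))
y.sim <- drop(x.sim %*% beta.true + rnorm(n))

cv.lasso <- cv.glmnet(x.sim, y.sim, alpha = 1)
cv.ridge <- cv.glmnet(x.sim, y.sim, alpha = 0)

# Coefficients at lambda.1se, intercept dropped
b.lasso <- as.matrix(coef(cv.lasso, s = "lambda.1se"))[-1, 1]
b.ridge <- as.matrix(coef(cv.ridge, s = "lambda.1se"))[-1, 1]

# Number of nonzero coefficients kept by each method
c(lasso = sum(b.lasso != 0), ridge = sum(b.ridge != 0))

# Smallest cross-validated error reached by each method
c(lasso = min(cv.lasso$cvm), ridge = min(cv.ridge$cvm))
```

Which method reaches the lower CVE depends on the simulated truth, so changing `beta.true` to a dense vector is a quick way to explore the first bullet.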

# 7. Regularization Methods 

- Regularization methods are based on a penalized likelihood : 
    - $Q_{\lambda}(\beta_0, \beta) = -l(\beta_0, \beta) + p_{\lambda}(\beta)$
    - $(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0, \beta} Q_{\lambda}(\beta_0, \beta)$ for a fixed $\lambda$.
- Penalized likelihood for a quantitative response 
    - Linear regression model : $y_i = \beta_0 + x_i^T \beta + \epsilon_i$ 
    - l1-norm penalty (lasso) : $p_{\lambda}(\beta) = \lambda \sum_j |\beta_j|$ 
    - l2-norm penalty (ridge) : $p_{\lambda}(\beta) = \lambda \sum_j \beta_j^2$ 
    - $Q_{\lambda}(\beta_0, \beta) = -l(\beta_0, \beta) + p_{\lambda}(\beta) = \frac{1}{2}\sum_{i=1}^{n}(y_i - \beta_0 - x_i^T \beta)^2 +  p_{\lambda}(\beta)$
- Penalized likelihood for a binary response 
    - 
    - CVE based on deviance : 
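    - CVE based on classification error : $CVE = \frac{1}{n}\sum_{k=1}^{K}\sum_{i \in C_k} I(y_i \neq \hat{y}_i^{[-k]})$, where $C_k$ is the index set of fold $k$ and $\hat{y}_i^{[-k]}$ is the prediction from the model fitted without fold $k$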
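
The two CVE criteria for a binary response can be computed by hand with a K-fold loop. The sketch below is an illustration under assumptions, not the notes' own method: the data are simulated, the deviance is taken to be the usual binomial deviance ($-2$ times the held-out log-likelihood), predictions are classified with a 0.5 cutoff, and all names (`x.sim`, `y.sim`, `foldid`, `lam.grid`, `cve.dev`, `cve.class`) are hypothetical.

```R
library(glmnet)

set.seed(1)
# Simulated binary data (assumption: for illustration only)
n <- 200; p <- 10
x.sim <- matrix(rnorm(n * p), n, p)
y.sim <- rbinom(n, 1, plogis(1.5 * x.sim[, 1] - x.sim[, 2]))

K <- 5
foldid <- sample(rep(1:K, length.out = n))   # random fold labels
lam.grid <- 10^seq(0, -3, length = 30)

dev.sum <- numeric(length(lam.grid))
err.sum <- numeric(length(lam.grid))
for (k in 1:K) {
    hold <- foldid == k
    # Fit the penalized logistic regression without fold k
    fit <- glmnet(x.sim[!hold, ], y.sim[!hold], family = "binomial",
                  alpha = 1, lambda = lam.grid)
    # Predicted probabilities for fold k, one column per lambda
    p.hat <- predict(fit, newx = x.sim[hold, ], s = lam.grid, type = "response")
    p.hat <- pmin(pmax(p.hat, 1e-10), 1 - 1e-10)   # guard against log(0)
    y.hold <- y.sim[hold]
    # Held-out binomial deviance and number of misclassified observations
    dev.sum <- dev.sum +
        colSums(-2 * (y.hold * log(p.hat) + (1 - y.hold) * log(1 - p.hat)))
    err.sum <- err.sum + colSums((p.hat > 0.5) != y.hold)
}
cve.dev <- dev.sum / n
cve.class <- err.sum / n

# Lambda selected by each criterion
c(deviance = lam.grid[which.min(cve.dev)],
  class = lam.grid[which.min(cve.class)])
```

In practice `cv.glmnet(..., family = "binomial", type.measure = "deviance")` or `type.measure = "class"` does the same job; the loop only makes the $[-k]$ notation explicit.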

## 7.1 [Ex] Heart Data (Binary Classification) 