# Last week

-   Linear models:

    -   Relationship between $y$ and $\mathbf{x}$ given by:
        $$
            y_i = \mu + \mathbf{x}_i'\bm{\beta} + \epsilon_i
        $$

    -   Estimated by minimizing **loss function** (OLS):
        $$
        L(\mu, \bm{\beta}) = 
            \underbrace{\sum_{i=1}^N \Bigl(
            y_i - \mu - \mathbf{x}_i'\bm{\beta}\Bigr)^2}_{\text{Sum of squared errors}}
        $$
-   Creating additional features using polynomials
-   Cross-validation to determine hyperparameters (polynomial degree)

***
# This week

1.  Linear regression models with **regularization**:

    -   Ridge regression
    -   Lasso
    -   Elastic net (not covered, but trivial combination of Ridge & Lasso)

2.  Models for **classification**:

    -   Logistic regression
    -   Random forest
    -   Decision trees (in lecture notes)
    -   Support Vector Machines (in lecture notes)

***
# Linear regression models

***
## Ridge regression

-   Loss function:
    $$
    L(\mu, \bm{\beta}) = 
        \underbrace{\sum_{i=1}^N \Bigl(
        y_i - \mu - \mathbf{x}_i'\bm{\beta}\Bigr)^2}_{\text{Sum of squared errors}}
        + 
        \underbrace{\alpha \sum_{k=1}^K\beta_k^2}_{\text{L2 penalty}}
    $$

-   Penalty term introduces **shrinkage** or **regularization**

    -   Large coefficients $\beta_k$ are penalized

-   Resulting model is biased, but has lower variance when making predictions
    on new data
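
The shrinkage effect can be illustrated with a small NumPy sketch. It assumes synthetic data (not the lecture sample) and uses the fact that, for demeaned data, the loss above is minimized by $\bm\beta = (\mathbf{X}'\mathbf{X} + \alpha \mathbf{I})^{-1}\mathbf{X}'\mathbf{y}$. The helper `ridge_coefs()` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption, not the lecture sample)
N, K = 50, 5
X = rng.normal(size=(N, K))
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=N)

# Demean so that the intercept mu drops out of the problem
Xc = X - X.mean(axis=0)
yc = y - y.mean()


def ridge_coefs(Xc, yc, alpha):
    """Minimizer of SSE + alpha * sum(beta**2) for demeaned data."""
    K = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(K), Xc.T @ yc)


# Coefficient norm for increasing regularization strength
alphas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_coefs(Xc, yc, a)) for a in alphas]
for a, nrm in zip(alphas, norms):
    print(f'alpha = {a:6.1f}: ||beta|| = {nrm:.3f}')
```

With $\alpha = 0$ the function returns the OLS coefficients; the norm of the coefficient vector decreases as $\alpha$ grows.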

***
### Example: Polynomial approximation

-   True relationship given by trigonometric function, measured
    with error $\epsilon_i$:
    $$
    \begin{aligned}
    y_i &= \cos\left( \frac{3}{2}\pi x_i \right) + \epsilon_i \\
        \epsilon_i &\stackrel{\text{iid}}{\sim} \mathcal{N}\left(0, 0.5^2\right)
    \end{aligned}
    $$

-   Want to approximate this function with polynomials

#### Step 1: Create sample

In [None]:
# Enable automatic reloading of external modules
%load_ext autoreload
%autoreload 2

In [None]:
from lecture12_regression import create_trig_sample

# Sample size
N = 200

# Standard deviation of error term
sigma = 0.5

# Create sample data for trigonometric relationship between x and y
x, y = create_trig_sample(N=N, sigma=sigma)

#### Step 2: Visualize sample and true function

In [None]:
from lecture12_regression import plot_trig_sample

plot_trig_sample(x, y)

#### Step 3: Estimate Ridge regression

1.  Assume function is **approximated** by polynomial of degree $K$:

    $$
    y_i \approx \mu + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_K x_i^K 
    $$

    -   Set $K=15$ for illustration

2.  Create pipeline ([`Pipeline`](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) or [`make_pipeline()`](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html)):

    1.  Feature transformation: [`PolynomialFeatures`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)
    2.  Feature standardization: [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
    3.  Estimation: [`Ridge`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)
        -   Use regularization strength $\alpha = 3$
       
3.  Estimate model
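
The three pipeline steps listed above could be sketched as follows. This is not the reference solution: the sample is a stand-in that assumes $x_i$ is uniform on $[0, 1]$ (the actual sample comes from `create_trig_sample()`), and the names `x_demo`, `y_demo`, `pipe_demo` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Stand-in sample (assumption): x uniform on [0, 1], y = cos(1.5 pi x) + noise
rng = np.random.default_rng(123)
x_demo = rng.uniform(0.0, 1.0, size=200)
y_demo = np.cos(1.5 * np.pi * x_demo) + rng.normal(scale=0.5, size=200)

# Polynomial features -> standardization -> Ridge with alpha = 3
pipe_demo = make_pipeline(
    PolynomialFeatures(degree=15, include_bias=False),
    StandardScaler(),
    Ridge(alpha=3.0),
)

# Features must be passed as a 2-dimensional array, hence x_demo[:, None]
pipe_demo.fit(x_demo[:, None], y_demo)

# Predict on a grid of x-values
x_grid = np.linspace(0.0, 1.0, 100)
y_hat = pipe_demo.predict(x_grid[:, None])
```

`include_bias=False` drops the constant column from the polynomial features because `Ridge` fits the intercept $\mu$ itself.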

In [None]:
# Max. polynomial degree
degree = 15

# TODO: Create pipeline with alpha = 3
# pipe_ridge = 

# TODO: Fit ridge regression 

#### Step 4: Estimate linear regression

-   Useful as benchmark model
-   Do we need feature standardization?
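
One way to explore the standardization question is to compare predictions with and without `StandardScaler`. OLS fitted values do not change when features are rescaled, as long as an intercept is included, so the two pipelines should agree up to numerical error. The sketch below uses a stand-in sample and a low polynomial degree (both assumptions) to keep the least-squares problem well conditioned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Stand-in sample (assumption): x uniform on [0, 1], y = cos(1.5 pi x) + noise
rng = np.random.default_rng(123)
x_demo = rng.uniform(0.0, 1.0, size=200)
y_demo = np.cos(1.5 * np.pi * x_demo) + rng.normal(scale=0.5, size=200)
X_demo = x_demo[:, None]

# Low degree (assumption) to avoid numerical issues
degree_demo = 3

# Linear regression on raw polynomial features
lr_raw = make_pipeline(
    PolynomialFeatures(degree=degree_demo, include_bias=False),
    LinearRegression(),
).fit(X_demo, y_demo)

# Linear regression on standardized polynomial features
lr_std = make_pipeline(
    PolynomialFeatures(degree=degree_demo, include_bias=False),
    StandardScaler(),
    LinearRegression(),
).fit(X_demo, y_demo)

# Largest absolute difference between the two sets of predictions
max_diff = np.amax(np.abs(lr_raw.predict(X_demo) - lr_std.predict(X_demo)))
print(f'Max. difference in predictions: {max_diff:.2e}')
```

This invariance does not carry over to Ridge or Lasso, where the penalty depends on the scale of each coefficient.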

In [None]:
# TODO: Create pipeline with linear regression
# pipe_lr = 

#### Step 5: Plot predicted values

In [None]:
import numpy as np

# Values at which to predict
xvalues = np.linspace(0.0, 1.0, 100)

# TODO: Compute predicted values from Ridge 
# y_pred_ridge = 

# TODO: Compute predicted values from linear regression
# y_pred_lr =

In [None]:
# Plot sample and true relationship
ax = plot_trig_sample(x, y)

# Linear regression prediction
ax.plot(xvalues, y_pred_lr, c='purple', alpha=0.7, label='Linear regression')

# Ridge prediction
ax.plot(xvalues, y_pred_ridge, c='darkorange', lw=2.0, label='Ridge')

ax.legend()

***
#### Intuition: coefficients vs. regularization strength

-   What happens to magnitude of estimated $\bm\beta$ as we vary regularization strength $\alpha$?
-   Fit many Ridge models for a grid of $\alpha$, plot coefficients
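
A possible shape of such a loop, sketched on a stand-in sample with a coarser grid of 20 values (both assumptions; all `*_demo` names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Stand-in sample (assumption): x uniform on [0, 1], y = cos(1.5 pi x) + noise
rng = np.random.default_rng(123)
x_demo = rng.uniform(0.0, 1.0, size=200)
y_demo = np.cos(1.5 * np.pi * x_demo) + rng.normal(scale=0.5, size=200)

# Standardized polynomial features, created once and reused for all alphas
X_demo = make_pipeline(
    PolynomialFeatures(degree=15, include_bias=False), StandardScaler()
).fit_transform(x_demo[:, None])

# Coarse grid of alphas, spaced uniformly in logs
alphas_demo = np.logspace(np.log10(5.0e-3), 3, 20)

# One row of coefficients per alpha
coefs_demo = np.empty((len(alphas_demo), X_demo.shape[1]))
for i, alpha in enumerate(alphas_demo):
    coefs_demo[i] = Ridge(alpha=alpha).fit(X_demo, y_demo).coef_

# Euclidean norm of the coefficient vector for each alpha
norms_demo = np.linalg.norm(coefs_demo, axis=1)
```

The coefficient norm at the largest $\alpha$ should be well below the norm at the smallest $\alpha$.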

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Create grid of alphas spaced uniformly in logs on [5e-3, 1000]
alphas = np.logspace(start=np.log10(5.0e-3), stop=3, num=100)

# Re-create pipeline w/o Ridge estimator, estimation step differs for each alpha
transform = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False), StandardScaler()
)

# Create polynomial features
X_trans = transform.fit_transform(x[:, None])

# Array to store coefficients for all alphas
coefs = np.empty((len(alphas), X_trans.shape[1]))

# TODO: loop over alphas, fit Ridge for each alpha

In [None]:
import matplotlib.pyplot as plt

# Plot coefficient arrays against penalty strength
plt.figure(figsize=(6,4))

plt.plot(alphas, coefs, lw=1.0)

plt.xscale('log', base=10)
plt.axhline(0.0, ls='--', lw=0.75, c='black')
plt.xlabel(r'Regularization strength $\alpha$ (log scale)')
plt.ylabel('Coefficient value')
plt.title('Ridge coefficients as function of regularization strength')
plt.legend([rf'$\beta_{{{i}}}$' for i in range(1, degree + 1)], ncols=5, loc='lower right')

***
### Tuning the regularization parameter via cross-validation

-   Regularization strength $\alpha$ can be cross-validated with
    [`RidgeCV`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html)
-   Uses MSE to find optimal $\alpha$
-   Use argument `store_cv_results=True` to store MSE for all candidate $\alpha$ (to plot validation curve)
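
To see what this cross-validation does, the validation curve can also be computed by hand. The sketch below is not how `RidgeCV` is implemented: it assumes a stand-in sample, a coarse grid of 20 values and 5-fold cross-validation via `cross_val_score()`, whereas `RidgeCV` by default uses an efficient form of leave-one-out cross-validation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Stand-in sample (assumption): x uniform on [0, 1], y = cos(1.5 pi x) + noise
rng = np.random.default_rng(123)
x_demo = rng.uniform(0.0, 1.0, size=200)
y_demo = np.cos(1.5 * np.pi * x_demo) + rng.normal(scale=0.5, size=200)

# Standardized polynomial features
X_demo = make_pipeline(
    PolynomialFeatures(degree=15, include_bias=False), StandardScaler()
).fit_transform(x_demo[:, None])

# Coarse grid of candidate alphas on [1e-5, 5]
alphas_demo = np.logspace(-5, np.log10(5.0), 20)

# Average MSE across 5 folds for each alpha (scorer returns negative MSE)
mse_mean_demo = np.array([
    -cross_val_score(
        Ridge(alpha=a), X_demo, y_demo, scoring='neg_mean_squared_error', cv=5
    ).mean()
    for a in alphas_demo
])

# Alpha with the lowest cross-validated MSE
imin_demo = np.argmin(mse_mean_demo)
alpha_best_demo = alphas_demo[imin_demo]
print(f'Best alpha: {alpha_best_demo:.3g}')
```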

#### Step 1: Run Ridge CV

-   `RidgeCV` does not support pipelines, transform features manually

In [None]:
from sklearn.linear_model import RidgeCV

# RidgeCV does not support pipelines, so we need to transform x before
# cross-validation.
transform = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False), StandardScaler()
)

# Create standardized polynomial features
X_trans = transform.fit_transform(x[:, None])

# Define grid of alphas on [1e-5, 5]
N_alphas = 100
alphas = np.logspace(start=np.log10(1.0e-5), stop=np.log10(5), num=N_alphas)

# TODO: fit RidgeCV with alphas
# rcv = 

# TODO: store and report best alpha

#### Step 2: Plot validation curve

In [None]:
# TODO: Compute average MSE for each alpha
# mse_mean = 

# TODO: Compute index of minimal MSE
# imin = 

In [None]:
import matplotlib.pyplot as plt

# Plot MSE against alphas, highlight minimum MSE
plt.plot(alphas, mse_mean)
plt.xlabel(r'Regularization strength $\alpha$ (log scale)')
plt.ylabel('Cross-validated MSE')
plt.scatter(alphas[imin], mse_mean[imin], s=15, c='black', zorder=100)
plt.axvline(alphas[imin], ls=':', lw=0.75, c='black')
plt.xscale('log')
plt.title('Validation curve for Ridge regression')

#### Step 3: Re-estimate model with optimal alpha (optional)

-   Not strictly needed, could directly use fitted `RidgeCV` object
-   But `RidgeCV` does not support pipelines...

In [None]:
# TODO: Create pipeline with optimal alpha
pipe_ridge = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False),
    StandardScaler(),
    # TODO: Add Ridge estimator
)

# TODO: Fit Ridge with optimal alpha

#### Step 4: Plot predictions from optimal Ridge

In [None]:
# Grid on which to evaluate predictions
xvalues = np.linspace(np.amin(x), np.amax(x), 100)

# TODO: Predicted values from Ridge regression
# y_pred = 

In [None]:
# Plot sample and true relationship
ax = plot_trig_sample(x, y)

# Plot predicted values from cross-validated Ridge regression
ax.plot(xvalues, y_pred, c='darkorange', lw=2.0, label=r'Ridge (optimal $\alpha$)')
ax.legend()

<div class="alert alert-info">
<h3> Your turn</h3>

Rerun the whole Ridge example with a smaller sample size of <i>N=50</i>. What happens to the optimal cross-validated penalty parameter <i>α</i>?
</div>

***
## Lasso

-   Same idea as Ridge regression, but different penalty term:
    $$
    L(\mu, \bm{\beta}) = 
        \frac{1}{2N} \underbrace{\sum_{i=1}^N \Bigl(
        y_i - \mu - \mathbf{x}_i'\bm{\beta}\Bigr)^2}_{\text{Sum of squared errors}}
        + 
        \underbrace{\alpha \sum_{k=1}^K |\beta_k|}_{\text{L1 penalty}}
    $$

-   L1 penalty leads to **sparse models** with **fewer** nonzero coefficients
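
Why the L1 penalty produces exact zeros is easiest to see in a special case. Assume (for illustration only) that the features are orthonormal in the sense $\mathbf{X}'\mathbf{X}/N = \mathbf{I}$. The Lasso loss above is then minimized coefficient by coefficient by *soft-thresholding* the OLS estimates $b_k$: $\beta_k = \text{sign}(b_k)\max(|b_k| - \alpha, 0)$. The OLS values below are made up, and `soft_threshold()` is a hypothetical helper.

```python
import numpy as np


def soft_threshold(b, alpha):
    """Shrink each element of b towards zero by alpha, truncating at zero."""
    return np.sign(b) * np.maximum(np.abs(b) - alpha, 0.0)


# Made-up OLS coefficients (assumption)
b_ols = np.array([1.5, -0.8, 0.05, -0.02, 0.3])
alpha = 0.1

# Lasso solution under the orthonormal-design assumption
b_lasso = soft_threshold(b_ols, alpha)

# Proportional shrinkage (as under an L2 penalty in the same setting)
b_shrunk = b_ols / (1.0 + alpha)

print(f'Non-zero coefficients (L1): {np.count_nonzero(b_lasso)}')
print(f'Non-zero coefficients (L2): {np.count_nonzero(b_shrunk)}')
```

Coefficients with $|b_k| \leq \alpha$ are set exactly to zero, while proportional shrinkage only scales them down.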

#### Step 1: Create sample

-   Recreate same sample as in Ridge example

In [None]:
from lecture12_regression import create_trig_sample

# Sample size
N = 200

# Standard deviation of error term
sigma = 0.5

# Create sample data for trigonometric relationship between x and y
x, y = create_trig_sample(N=N, sigma=sigma)

#### Step 2: Estimate Lasso

1.  Create pipeline ([`Pipeline`](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) or [`make_pipeline()`](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html)):

    1.  Feature transformation: [`PolynomialFeatures`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)
    2.  Feature standardization: [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
    3.  Estimation: [`Lasso`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html)
        -   Use regularization strength $\alpha = 0.0075$
        -   Might need to increase `max_iter` argument
       
2.  Estimate model


In [None]:
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline

# Polynomial degree
degree = 15

# TODO: Build pipeline of transformations and Lasso estimation.
pipe_lasso = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False),
    StandardScaler(),
    # TODO: Add Lasso estimator
)

# TODO: Fit Lasso

#### Step 3: Plot predicted values

In [None]:
# Grid on which to evaluate predictions
xvalues = np.linspace(np.amin(x), np.amax(x), 100)

# TODO: Predicted values from Lasso regression
# y_pred_lasso = 

In [None]:
ax = plot_trig_sample(x, y)

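# Linear regression prediction (recomputed on the current grid of x-values,
# assumes pipe_lr was fitted in the Ridge example)
y_pred_lr = pipe_lr.predict(xvalues[:, None])
ax.plot(xvalues, y_pred_lr, c='purple', alpha=0.7, label='Linear regression')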

# Lasso prediction
ax.plot(xvalues, y_pred_lasso, c='darkorange', lw=2.0, label='Lasso')

ax.legend()

***
#### Intuition: coefficients vs. regularization strength

-   Use [`lasso_path()`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.lasso_path.html)
    to compute coefficients for a grid of $\alpha$

In [None]:
from sklearn.linear_model import lasso_path

# Create grid of alphas spaced uniformly in logs
alphas = np.logspace(start=np.log10(1.0e-3), stop=np.log10(1.0), num=100)

# Re-create pipeline w/o Lasso estimator, estimation step differs for each alpha
transform = make_pipeline(
    PolynomialFeatures(
        degree=degree, 
        include_bias=False
    ),
    StandardScaler()
)

# Create polynomial features
X_trans = transform.fit_transform(x[:, None])

# Compute Lasso path
alphas, coefs, _ = lasso_path(X_trans, y, alphas=alphas, max_iter=100_000)


-   Plot number of nonzero coefficients on $\alpha$ grid

In [None]:
# Number of non-zero coefficients for each alpha. 
nonzero = np.sum(np.abs(coefs) > 1.0e-6, axis=0).astype(int)

# Plot number of non-zero coefficients against alpha
plt.plot(alphas, nonzero, lw=1.5, c='steelblue')
plt.xscale('log', base=10)
plt.yticks(np.arange(0, np.amax(nonzero) + 1))
plt.xlabel(r'Regularization strength $\alpha$ (log scale)')
plt.title('Number of non-zero coefficients')

-   Plot coefficient magnitudes on $\alpha$ grid

In [None]:
plt.figure(figsize=(6, 4))
plt.plot(alphas, coefs.T, lw=1.0)
plt.xscale('log', base=10)
plt.axhline(0.0, ls='--', lw=0.75, c='black')
plt.xlabel(r'Regularization strength $\alpha$ (log scale)')
plt.ylabel('Coefficient value')
plt.title('Lasso coefficients as function of regularization strength')
plt.legend([rf'$\beta_{{{i}}}$' for i in range(1, degree + 1)], ncols=5)

***
### Tuning the regularization parameter via cross-validation

-   Regularization strength $\alpha$ can be cross-validated with
    [`LassoCV`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html)
-   Uses MSE (or some other metric) to find optimal $\alpha$
-   Instead of $\alpha$ grid we can specify $\epsilon = \frac{\alpha_{min}}{\alpha_{max}}$ (default: $10^{-3}$)
    and the grid size 
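
A minimal sketch of how `LassoCV` might be used with its default grid. It assumes a stand-in sample, 5 folds and a polynomial degree of 5 (lower than in the lecture example, only to keep the sketch fast); all `*_demo` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Stand-in sample (assumption): x uniform on [0, 1], y = cos(1.5 pi x) + noise
rng = np.random.default_rng(123)
x_demo = rng.uniform(0.0, 1.0, size=200)
y_demo = np.cos(1.5 * np.pi * x_demo) + rng.normal(scale=0.5, size=200)

# Standardized polynomial features (degree 5 is an assumption)
X_demo = make_pipeline(
    PolynomialFeatures(degree=5, include_bias=False), StandardScaler()
).fit_transform(x_demo[:, None])

# Cross-validate alpha on the default grid using 5 folds
lcv_demo = LassoCV(cv=5, max_iter=10_000).fit(X_demo, y_demo)

# mse_path_ has one row per alpha and one column per fold
mse_mean_demo = lcv_demo.mse_path_.mean(axis=1)
imin_demo = np.argmin(mse_mean_demo)

# Number of non-zero coefficients at the selected alpha
n_nonzero_demo = np.sum(np.abs(lcv_demo.coef_) > 1.0e-6)

print(f'Best alpha: {lcv_demo.alpha_:.3g}')
print(f'Non-zero coefficients: {n_nonzero_demo}')
```

The grid actually used is stored in the attribute `alphas_`, and the selected value `alpha_` is the grid point with the lowest average MSE.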

#### Step 1: Run Lasso CV

-   `LassoCV` does not support pipelines, transform features manually

In [None]:
from sklearn.linear_model import LassoCV

# LassoCV does not support pipelines, so we need to transform x before
# cross-validation.
transform = make_pipeline(
    PolynomialFeatures(
        degree=degree, 
        include_bias=False
    ),
    StandardScaler()
)

# Create standardized polynomial features
X_trans = transform.fit_transform(x[:, None])

# TODO: Create and run Lasso cross-validation, use defaults for eps and n_alphas
# lcv = 

# TODO: Store and report best alpha
# alpha_best =

# TODO: Report number of non-zero coefficients

#### Step 2: Plot validation curve

In [None]:
# TODO: Compute average MSE for each alpha
# mse_mean = 

# TODO: Compute index of minimal MSE
# imin = 

In [None]:
import matplotlib.pyplot as plt

# Recover grid of alphas used for CV
alphas = lcv.alphas_

# Plot MSE against alphas, highlight minimum MSE
plt.plot(alphas, mse_mean)
plt.xlabel(r'Regularization strength $\alpha$ (log scale)')
plt.ylabel('Cross-validated MSE')
plt.scatter(alphas[imin], mse_mean[imin], s=15, c='black', zorder=100)
plt.axvline(alphas[imin], ls=':', lw=0.75, c='black')
plt.xscale('log')
plt.title('Validation curve for Lasso')

#### Step 3: Re-estimate model with optimal alpha (optional)

-   Not strictly needed, could directly use fitted `LassoCV` object
-   But `LassoCV` does not support pipelines...

In [None]:
# TODO: Create pipeline with Lasso using optimal alpha
pipe_lasso = make_pipeline(
    PolynomialFeatures(
        degree=degree, 
        include_bias=False
    ),
    StandardScaler(),
    # TODO: Add Lasso estimator with optimal alpha
)

# TODO: Fit Lasso with optimal alpha

#### Step 4: Plot predicted values from optimal model

In [None]:
# Grid on which to evaluate predictions
xvalues = np.linspace(np.amin(x), np.amax(x), 100)

# TODO: Predicted values from Lasso regression
# y_pred = 

In [None]:
# Plot sample and true relationship
ax = plot_trig_sample(x, y)

# Plot prediction from optimal Lasso model
ax.plot(xvalues, y_pred, c='darkorange', lw=2.0, label=r'Lasso (optimal $\alpha$)')

ax.legend()

***
# Models for classification

-   Predict categorical outcome (class label) instead of continuous outcome

## Logistic regression

-   Simplest setup: **binary** classifier with $y_i \in \{0, 1\}$
-   Probability of $y_i = 1$ is given by **sigmoid** function (logistic CDF):  
    $$
    p(\mathbf{x}_i) \equiv 
    \text{Prob}\bigl(y_i = 1 ~|~\mathbf{x}_i\bigr) 
        = \frac{1}{1 + \exp\left(-(\mu + \mathbf{x}_i'\bm\beta)\right)}
    $$
-   Sigmoid function maps any real $z = \mu + \mathbf{x}_i'\bm\beta$ into $(0, 1)$:


In [None]:
import numpy as np
from scipy.stats import logistic

zvalues = np.linspace(-6, 6, 50)

plt.plot(zvalues, logistic.cdf(zvalues), lw=2.0)
# Add horizontal and vertical lines
for yline in (0.0, 0.5, 1.0):
    plt.axhline(yline, ls='--', lw=0.75, c='black')
plt.axvline(0.0, ls='--', lw=0.75, c='black')
plt.xlabel('$z$')
plt.ylabel(r'$\sigma(z)$')
plt.yticks([0.0, 0.5, 1.0])
plt.title('Sigmoid function (logistic CDF)')

-   **Loss function:** derived from log likelihood (MLE) + penalty
    $$
    L(\mu, \bm\beta) = 
    - \underbrace{\frac{1}{N} \mathcal{L}(\mu,\bm\beta)}_{\text{scaled log likelihood}} 
    + \underbrace{\frac{r(\bm\beta)}{C}}_{\text{regularization}}
    $$

    -   Regularization term $r(\bm\beta)$: L1, L2, L1 & L2, None
    -   Regularization strength governed by $C$: large $C$ $\Rightarrow$ small penalty
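
The inverse role of $C$ can be checked numerically. The sketch below assumes synthetic data generated from a logistic model with made-up coefficients and uses the default L2 penalty of `LogisticRegression`; all names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary outcome from a logistic model (assumption)
N = 300
X_demo = rng.normal(size=(N, 2))
z = 0.5 + X_demo @ np.array([1.5, -1.0])
p = 1.0 / (1.0 + np.exp(-z))          # sigmoid of the linear index
y_demo = (rng.uniform(size=N) < p).astype(int)

# Fit the model for increasing C (i.e., decreasing penalty)
C_values = [0.01, 1.0, 100.0]
norms_demo = []
for C in C_values:
    lr_demo = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    norms_demo.append(np.linalg.norm(lr_demo.coef_))

for C, nrm in zip(C_values, norms_demo):
    print(f'C = {C:6.2f}: ||beta|| = {nrm:.3f}')
```

Small $C$ shrinks the coefficients towards zero; for large $C$ the estimates approach the unpenalized maximum likelihood estimates.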

***
### Example: Predicting binary class membership

-   Stylized example: $y_i$ is a function of two features $(x_{1i}, x_{2i})$:
    $$
    \begin{aligned}
    y_i &= 
    \begin{cases}
        1 & \text{if }~ f(x_{1i}, x_{2i}) + \epsilon_i \geq 0 \\
        0 & \text{else} 
    \end{cases} \\
    f(x_{1i}, x_{2i}) &= \sin(2\pi x_{1i}) \cos(\pi x_{2i}) \\
    \epsilon_i &\stackrel{\text{iid}}{\sim} \mathcal{N}\left(0, \sigma_{\epsilon}^2\right)
    \end{aligned}
    $$

#### Step 1: Create sample

In [None]:
from lecture12_classifiers import create_class_data

# Sample size
N = 100

# Standard deviation of noise
sigma_eps = 0.2

# Create demo data set for classification
X, y = create_class_data(N=N, sigma=sigma_eps)

In [None]:
from lecture12_classifiers import plot_classes

# Plot sample
plot_classes(X, y)

#### Step 2: Train-test split

-   Use stratification to preserve relative frequency of class labels in training and test sets
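
A small sketch of what stratification does, assuming made-up labels with 20% ones (the `*_demo` names are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 80 zeros and 20 ones (assumption)
y_demo = np.repeat([0, 1], [80, 20])
X_demo = np.arange(100, dtype=float).reshape(-1, 1)

# Stratified split: class shares are preserved in both subsamples
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=1234
)

print(f'Share of ones in training sample: {y_tr.mean():.3f}')
print(f'Share of ones in test sample: {y_te.mean():.3f}')
```

Without `stratify`, the class shares in the two subsamples would vary with the random split, which matters most in small or imbalanced samples.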

In [None]:
from sklearn.model_selection import train_test_split

# TODO: Split data into training and test sets
# X_train, X_test, y_train, y_test = 

#### Step 3: Estimate logistic regression model (no regularization)

-   Estimate simplest model with two features:
    $$
    z_i = \mu + \beta_1 x_{1i} + \beta_2 x_{2i}
    $$

-   Implemented in [`LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)
-   Relevant arguments:

    -   `penalty`: Type of regularization to use
    -   `C`: Regularization strength (ignored if `penalty=None`)
    -   `max_iter`: may need to increase this from default value
    -   `solver`: select solver (depends on `penalty`)
    -   `random_state`: needed for some solvers which use RNG

-   `LogisticRegression` implements four different types of regularization:

    | Penalty | $r(\bm\beta)$                          | `penalty` argument |
    |---------------------|----------------------------|---------------------|
    | L1   | $r(\bm\beta) = \|\bm\beta\|_1 = \sum_{k=1}^K \|\beta_k\| $ | `'l1'` |
    | L2   | $r(\bm\beta) = \frac{1}{2} \|\bm\beta\|_2^2 = \frac{1}{2} \sum_{k=1}^K \beta_k^2$ | `'l2'` |
    | L1 and L2 | $r(\bm\beta) = \rho \|\bm\beta\|_1 + \frac{1-\rho}{2} \|\bm\beta\|_2^2 $  | `'elasticnet'` |
    | None | | `None` |


In [None]:
# TODO: Create and estimate Logistic regression model
# lr = 

#### Step 4: Visually assess model predictions

-   Visually inspect decision boundary (possible for 2D case, not possible in general)

In [None]:
from lecture12_classifiers import plot_decision_boundary

# Create x-values used to evaluate decisions
xvalues = np.linspace(0, 1, 1000)

ax = plot_classes(X_train, y_train, X_test, y_test)
plot_decision_boundary(ax, xvalues, lr)
ax.set_title('Classification with logistic regression')

#### Step 5: Assess model accuracy

In [None]:
from lecture12_classifiers import plot_generic_confusion_matrix
plot_generic_confusion_matrix()

-   **Accuracy:** implemented in [`accuracy_score()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html)
    $$
    ACC = \frac{TP + TN}{FP + FN + TP + TN}
    $$
-   **Precision:** implemented in [`precision_score()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html)
    $$
    PRE = \frac{TP}{TP + FP}
    $$
-   **Recall:** implemented in [`recall_score()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html)
    $$
    REC = \frac{TP}{FN + TP}
    $$
-   **F1 score:** implemented in [`f1_score()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html)
    $$
    F1 = 2 \frac{PRE \cdot REC}{PRE + REC}
    $$
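
The four formulas can be verified on a tiny example. The labels and predictions below are made up for illustration; the sketch computes the metrics from the confusion matrix counts and compares them to the corresponding scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up true labels and predictions (assumption)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# Confusion matrix counts
TP = np.sum((y_true == 1) & (y_pred == 1))
TN = np.sum((y_true == 0) & (y_pred == 0))
FP = np.sum((y_true == 0) & (y_pred == 1))
FN = np.sum((y_true == 1) & (y_pred == 0))

# Metrics computed from the formulas
acc = (TP + TN) / (FP + FN + TP + TN)
pre = TP / (TP + FP)
rec = TP / (FN + TP)
f1 = 2 * pre * rec / (pre + rec)

print(f'Accuracy: {acc:.3f} (sklearn: {accuracy_score(y_true, y_pred):.3f})')
print(f'Precision: {pre:.3f} (sklearn: {precision_score(y_true, y_pred):.3f})')
print(f'Recall: {rec:.3f} (sklearn: {recall_score(y_true, y_pred):.3f})')
print(f'F1: {f1:.3f} (sklearn: {f1_score(y_true, y_pred):.3f})')
```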


In [None]:
# TODO: Predict y on test sample
# y_test_pred = 

# TODO: Compute accuracy
# TODO: Compute precision
# TODO: Compute recall
# TODO: Compute F1 score

#### Step 6: Plot the confusion matrix

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay

# Plot confusion matrix from predicted values
ConfusionMatrixDisplay.from_predictions(
    y_test,
    y_test_pred,
    colorbar=False,
    cmap='Blues',
    text_kw={'fontsize': 12, 'fontweight': 'bold'},
).ax_.set_title('Confusion matrix for linear index')

***
### Fitting a model with polynomials

-   Estimate model with polynomial interactions:
    $$
    z_i = \mu + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \beta_4 x_{1i}^2 + \beta_5 x_{2i}^2 + \dots
    $$

#### Step 1: Estimate logistic regression model

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline

# Maximum polynomial degree
degree = 5

# Create pipeline with polynomial features and logistic regression
pipe_lr = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False),
    StandardScaler(),
    # TODO: Add Logistic regression estimator
)

# TODO: Fit logistic regression with polynomial features

#### Step 2: Visually inspect decision boundaries

In [None]:
# Create x-values used to evaluate decisions
xvalues = np.linspace(0, 1, 1000)
ax = plot_classes(X_train, y_train, X_test, y_test)
plot_decision_boundary(ax, xvalues, pipe_lr)
ax.set_title('Classification with logistic regression (polynomials)')

#### Step 3: Compute accuracy metrics

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Predict y on test sample
y_test_pred = pipe_lr.predict(X_test)

acc_test = accuracy_score(y_test, y_test_pred)
pre_test = precision_score(y_test, y_test_pred)
rec_test = recall_score(y_test, y_test_pred)

print(f'Accuracy on test sample: {acc_test:.3f}')
print(f'Precision on test sample: {pre_test:.3f}')
print(f'Recall on test sample: {rec_test:.3f}')

#### Step 4: Plot confusion matrix

In [None]:
# Plot confusion matrix from predicted values
ConfusionMatrixDisplay.from_predictions(
    y_test,
    y_test_pred,
    colorbar=False,
    cmap='Blues',
    text_kw={'fontsize': 12, 'fontweight': 'bold'},
).ax_.set_title('Confusion matrix for polynomial features')

***
### Cross-validating the penalty term

-   Regularization strength $C$ can be cross-validated with
    [`LogisticRegressionCV`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html)
-   `LogisticRegressionCV` does not support pipelines
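
A rough sketch of the cross-validation step, not the reference solution. It assumes a stand-in sample that mimics the stylized example (features uniform on $[0, 1]^2$), a polynomial degree of 3, a grid of 10 candidate values and 5 folds; all `*_demo` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Stand-in sample (assumption): features uniform on [0, 1]^2
rng = np.random.default_rng(1234)
N = 200
X_demo = rng.uniform(0.0, 1.0, size=(N, 2))
f = np.sin(2.0 * np.pi * X_demo[:, 0]) * np.cos(np.pi * X_demo[:, 1])
y_demo = (f + rng.normal(scale=0.2, size=N) >= 0.0).astype(int)

# Standardized polynomial features (degree 3 is an assumption)
X_poly_demo = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False), StandardScaler()
).fit_transform(X_demo)

# Cross-validate C on a grid of 10 candidate values using 5 folds
lrcv_demo = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000)
lrcv_demo.fit(X_poly_demo, y_demo)

# Best C (np.ravel handles both scalar and array-valued C_)
C_best_demo = float(np.ravel(lrcv_demo.C_)[0])
print(f'Best C: {C_best_demo:.3g}')
```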

#### Step 1: Perform cross-validation

In [None]:
from sklearn.linear_model import LogisticRegressionCV

# Pipeline to create polynomial features and standardize them
transform = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False), StandardScaler()
)

# Fit and transform training data
# X_train_poly = transform.fit_transform(X_train)


# TODO: Create Logistic regression cross-validation
# lrcv =

# TODO: Run cross-validation

# TODO: Store and report best C

#### Step 2: Re-run model with optimal C (optional)

-   Not strictly needed, could directly use fitted `LogisticRegressionCV` object
-   But `LogisticRegressionCV` does not support pipelines...

In [None]:
lr_opt = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False),
    StandardScaler(),
    # TODO: Add Logistic regression estimator with optimal C
)

# TODO: Fit model
# lr_opt.fit(X_train, y_train)

#### Step 3: Visually inspect decision boundaries

In [None]:
# Create x-values used to evaluate decisions
xvalues = np.linspace(0, 1, 1000)
ax = plot_classes(X_train, y_train, X_test, y_test)
plot_decision_boundary(ax, xvalues, lr_opt)
ax.set_title('Classification with logistic regression (CV)')

#### Step 4: Compute accuracy metrics

In [None]:
# Predict y on test sample
y_test_pred = lr_opt.predict(X_test)

# Compute accuracy of cross-validated model on test data
acc_test = accuracy_score(y_test, y_test_pred)
pre_test = precision_score(y_test, y_test_pred)
rec_test = recall_score(y_test, y_test_pred)

print(f'Accuracy on test sample: {acc_test:.3f}')
print(f'Precision on test sample: {pre_test:.3f}')
print(f'Recall on test sample: {rec_test:.3f}')

***
## Random forest

-   Averages results from many decision trees
-   Leads to less overfitting
-   Fully nonlinear classifier, usually does not require polynomial interactions, dummy variable encoding, etc.

#### Step 1: Create estimation sample

-   Recreate estimation sample from earlier

In [None]:
from lecture12_classifiers import create_class_data
from sklearn.model_selection import train_test_split

# Sample size
N = 100

# Standard deviation of noise
sigma_eps = 0.2

# Create demo data set for classification
X, y = create_class_data(N=N, sigma=sigma_eps)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=1234
)

#### Step 2: Estimate Random forest

-   Implemented in [`RandomForestClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
-   Important hyperparameters:

    -   `n_estimators`: number of trees to grow
    -   `max_depth`: maximum depth of individual trees

-   Other important arguments:
    -   `random_state`: trees are grown on bootstrapped samples (involves RNG)
    -   `n_jobs`: number of parallel processes to use
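
How these arguments fit together might look as follows. This is a sketch on a stand-in sample that mimics the stylized example; the hyperparameter values (`n_estimators=100`, `max_depth=5`) are arbitrary choices, and all `*_demo` names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in sample (assumption): features uniform on [0, 1]^2
rng = np.random.default_rng(1234)
N = 200
X_demo = rng.uniform(0.0, 1.0, size=(N, 2))
f = np.sin(2.0 * np.pi * X_demo[:, 0]) * np.cos(np.pi * X_demo[:, 1])
y_demo = (f + rng.normal(scale=0.2, size=N) >= 0.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, test_size=0.3, random_state=1234
)

# Random forest with 100 trees of depth at most 5, fitted on raw features
forest_demo = RandomForestClassifier(
    n_estimators=100, max_depth=5, random_state=1234, n_jobs=1
)
forest_demo.fit(X_tr, y_tr)

# Accuracy on training and test samples
acc_train_demo = accuracy_score(y_tr, forest_demo.predict(X_tr))
acc_test_demo = accuracy_score(y_te, forest_demo.predict(X_te))
print(f'Accuracy on training sample: {acc_train_demo:.3f}')
print(f'Accuracy on test sample: {acc_test_demo:.3f}')
```

No polynomial features or standardization are used here. A large gap between training and test accuracy would point to overfitting, e.g., from trees that are too deep.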

In [None]:
# TODO: fit Random forest classifier
# forest = 

#### Step 3: Visually inspect decision boundaries

In [None]:
ax = plot_classes(X_train, y_train, X_test, y_test)
plot_decision_boundary(ax, xvalues, forest)
ax.set_title('Classification with random forest')

#### Step 4: Compute accuracy metrics

In [None]:
from sklearn.metrics import accuracy_score

# Predict y on training and test samples
y_train_pred_forest = forest.predict(X_train)
y_test_pred_forest = forest.predict(X_test)

# Compute accuracy on training and test samples
acc_train = accuracy_score(y_train, y_train_pred_forest)
acc_test = accuracy_score(y_test, y_test_pred_forest)

print(f'Accuracy on training sample: {acc_train:.3f}')
print(f'Accuracy on test sample: {acc_test:.3f}')

### Cross-validating Random forest hyperparameters

-   No dedicated cross-validation class for Random forest available
-   Use generic [`GridSearchCV`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)

#### Step 1: Run grid search

-   Candidate grid for each parameter is specified using `param_grid` argument
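
A minimal sketch of such a grid search, assuming a stand-in sample, small made-up grids for `max_depth` and `n_estimators`, and 5 folds (all `*_demo` names are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in sample (assumption): features uniform on [0, 1]^2
rng = np.random.default_rng(1234)
N = 200
X_demo = rng.uniform(0.0, 1.0, size=(N, 2))
f = np.sin(2.0 * np.pi * X_demo[:, 0]) * np.cos(np.pi * X_demo[:, 1])
y_demo = (f + rng.normal(scale=0.2, size=N) >= 0.0).astype(int)

# Keys are argument names of the estimator, values are candidate grids
param_grid_demo = {
    'max_depth': [1, 2, 4, 8],
    'n_estimators': [10, 50],
}

# Evaluate all 4 x 2 combinations with 5-fold cross-validation
forest_cv_demo = GridSearchCV(
    RandomForestClassifier(random_state=1234), param_grid_demo, cv=5
)
forest_cv_demo.fit(X_demo, y_demo)

print(f'Best accuracy: {forest_cv_demo.best_score_:.3f}')
print(f'Best parameters: {forest_cv_demo.best_params_}')
```

By default, `GridSearchCV` refits the estimator with the best parameters on the whole sample passed to `fit()`, so the fitted object can be used directly for prediction.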

In [None]:
from sklearn.model_selection import GridSearchCV

# TODO: Define grid for max_depth

# TODO: Define grid for n_estimators

# TODO: Create and run GridSearchCV
# forest_cv = 

In [None]:
# Report best hyperparameters
print(f'Best accuracy: {forest_cv.best_score_:.3f}')
print(f'Best parameters: {forest_cv.best_params_}')

#### Step 2: Visually inspect decision boundaries

In [None]:
ax = plot_classes(X_train, y_train, X_test, y_test)
plot_decision_boundary(ax, xvalues, forest_cv)
ax.set_title(f'Classification with random forest (max depth: {forest_cv.best_params_["max_depth"]})')

#### Step 3: Compute accuracy metrics

In [None]:
# Predict with the cross-validated model (refitted with best hyperparameters)
y_train_pred_forest = forest_cv.predict(X_train)
y_test_pred_forest = forest_cv.predict(X_test)

acc_train = accuracy_score(y_train, y_train_pred_forest)
acc_test = accuracy_score(y_test, y_test_pred_forest)

print(f'Accuracy on training sample: {acc_train:.3f}')
print(f'Accuracy on test sample: {acc_test:.3f}')