**Linear Model Selection and Regularization**

[ISLR Chapter 6](https://link.springer.com/chapter/10.1007/978-1-4614-7138-7_6)

This chapter discusses some ways in which the simple linear model can be improved, by **replacing** plain least squares fitting with some alternative fitting procedures.

Why might we want to use another fitting procedure instead of least squares? Alternative fitting procedures can yield better prediction accuracy and model interpretability.

- Prediction accuracy: If n >> p (number of samples >> number of features), then the least squares coefficient estimates tend to have low variance. As the ratio n:p shrinks, the variability increases, and once p > n there is no longer a *unique* least squares coefficient estimate: the variance is essentially infinite [(see here)](https://stats.stackexchange.com/questions/205987/how-variance-becomes-infinite#:~:text=p%2DNo.,cannot%20be%20used%20at%20all.). Constraining or shrinking the coefficients can substantially reduce the variance at the cost of a negligible increase in bias, which can lead to substantial improvements in accuracy.

- Model interpretability: Including *irrelevant* variables leads to unnecessary complexity in the model. Removing them (setting their coefficients to zero) gives a model that is easier to interpret.
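The prediction accuracy point can be checked with a small simulation. This is only a sketch under assumed settings: the data are simulated Gaussian noise around an all-ones coefficient vector, and the helper `ols_coef_variance`, the sample sizes, and the seeds are illustrative choices, not taken from the chapter. It estimates the variance of one least squares coefficient as n approaches p, then shows that for p > n the matrix X^T X is rank-deficient, so the least squares solution is not unique.

```python
import numpy as np


def ols_coef_variance(n, p, n_reps=300, seed=0):
    """Empirical variance of the first OLS coefficient over simulated datasets."""
    rng = np.random.default_rng(seed)
    beta_true = np.ones(p)  # assumed true coefficients (illustrative)
    estimates = []
    for _ in range(n_reps):
        X = rng.normal(size=(n, p))
        y = X @ beta_true + rng.normal(size=n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta_hat[0])
    return float(np.var(estimates))


p = 10
# Shrink n towards p and watch the variance of the estimate grow.
variances = {n: ols_coef_variance(n, p) for n in (200, 50, 12)}
for n, v in variances.items():
    print(f"n = {n:3d}, p = {p}: variance of first coefficient ~ {v:.4f}")

# With p > n, X^T X (a p x p matrix) has rank at most n, so it is singular
# and the normal equations have infinitely many solutions.
n_wide, p_wide = 20, 50
X_wide = np.random.default_rng(1).normal(size=(n_wide, p_wide))
rank_wide = int(np.linalg.matrix_rank(X_wide.T @ X_wide))
print(f"p = {p_wide}, n = {n_wide}: rank of X^T X = {rank_wide}")
```

The exact variance values depend on the seed, but the ordering (larger variance as n gets closer to p) is what the bullet above describes.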

**Name three alternatives to using ordinary least squares in a linear model.**
1. Subset selection
2. Shrinkage (regularization)
3. Dimension reduction
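
To make the three alternatives concrete, here is a minimal NumPy sketch of each on simulated data. Everything specific in it is an assumption for illustration: the data-generating model (only the first two of five features matter), the subset size `k = 2`, the ridge penalty `lam = 10.0`, and the number of principal components `m = 2`. In practice these would be chosen with cross-validation or a criterion such as Cp, AIC, or BIC rather than fixed by hand.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
# Assumed true model: only features 0 and 1 influence the response.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Center the data so the models can be fit without an intercept.
X = X - X.mean(axis=0)
y = y - y.mean()


def rss(X_sub, y):
    """Residual sum of squares of a least squares fit on the given columns."""
    beta, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    resid = y - X_sub @ beta
    return float(resid @ resid)


# 1. Subset selection: among all subsets of size k, keep the one with lowest RSS.
k = 2
best_subset = min(
    itertools.combinations(range(p), k),
    key=lambda idx: rss(X[:, list(idx)], y),
)

# 2. Shrinkage: ridge regression via its closed form (X^T X + lam * I)^{-1} X^T y.
lam = 10.0
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# 3. Dimension reduction: principal components regression on the first m components.
m = 2
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:m].T  # n x m matrix of component scores
theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
beta_pcr = Vt[:m].T @ theta  # map back to coefficients on the original features

print("best subset of size", k, ":", best_subset)
print("||beta_ols||   =", np.linalg.norm(beta_ols))
print("||beta_ridge|| =", np.linalg.norm(beta_ridge))
print("beta_pcr       =", beta_pcr)
```

The ridge coefficient vector has a smaller norm than the least squares one, which is the "shrinkage" the name refers to. The principal components regression step still uses all five features, just through two linear combinations of them.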


CSVs for sample data can be found here: https://github.com/JWarmenhoven/ISLR-python/tree/master/Notebooks/Data

```python
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Code formatting Jupyter black
%load_ext nb_black
```


[Writing math symbols in markdown](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Typesetting%20Equations.html)

[quick reference](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference)

# Subset selection


# Shrinkage methods

## Ridge regression

# --