# Statistical Learning
---

## What is Statistical Learning?

Statistical learning seeks to determine the association between input variables and output variables.

Input variables are also known as *predictors, independent variables, features,* or sometimes just *variables*.

Output variables are also known as *responses* or *dependent variables*.

Here, we will consider the **Advertising** dataset.

Generally, we suppose a relationship of the form $$Y = f(X) + \epsilon$$ where $Y$ is the response, $X = (X_{1}, X_{2}, \ldots, X_{p})$ is the set of $p$ predictors, $f$ is some fixed but unknown function of those predictors, and $\epsilon$ is a random error term that is independent of $X$ and has mean zero.
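To make the model $Y = f(X) + \epsilon$ concrete, the following is a minimal simulation sketch. The function `f`, the noise level `sigma`, and the range of the single predictor are all hypothetical choices made for illustration; they are not taken from the Advertising dataset.

```python
import random


def f(x):
    # Hypothetical "true" function, chosen only for illustration.
    # In practice f is unknown and must be estimated.
    return 2.0 + 3.0 * x


def simulate(n, sigma=1.0, seed=0):
    """Draw n observations from Y = f(X) + epsilon with one predictor."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # The error term is drawn independently of X and has mean zero.
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    ys = [f(x) + e for x, e in zip(xs, eps)]
    return xs, ys, eps


xs, ys, eps = simulate(10_000)
mean_eps = sum(eps) / len(eps)
print(f"mean of simulated errors: {mean_eps:.4f}")
```

With a large sample, the average of the simulated errors should be close to zero, which is the property the prediction setting below relies on.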

In essence, statistical learning refers to a set of approaches for estimating $f$.

### Why estimate $f$?

There are two main reasons to estimate $f$:
- *Prediction*
- *Inference*

### *Prediction*

In many situations, a set of inputs $X$ is readily available, but the output
$Y$ cannot be easily obtained. In this setting, since the error term averages
to zero, we can predict $Y$ using $$\hat{Y} = \hat{f}(X)$$
where $\hat{f}$ represents our estimate for $f$, and $\hat{Y}$ represents the resulting prediction for $Y$.

In this setting, $\hat{f}$ is generally treated as a *black box*, in the sense that we are not concerned with the exact form of $\hat{f}$, provided that it yields accurate predictions of $Y$.
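As a rough illustration of the prediction setting, the sketch below estimates $\hat{f}$ by simple least squares on simulated training data and then uses it to predict $Y$ for a new input. The data-generating function, the sample size, and the helper names `fit_line` and `f_hat` are assumptions made for this example; least squares is just one possible way to obtain $\hat{f}$.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Simulated training data from a hypothetical true model Y = 2 + 3X + epsilon.
rng = random.Random(1)
train_x = [rng.uniform(0.0, 10.0) for _ in range(500)]
train_y = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in train_x]

b0, b1 = fit_line(train_x, train_y)


def f_hat(x):
    # Used as a black box: only the quality of its predictions matters here.
    return b0 + b1 * x


new_x = 4.0           # an input whose output has not been observed
y_hat = f_hat(new_x)  # prediction: Y-hat = f-hat(X)
print(f"estimated intercept={b0:.3f}, slope={b1:.3f}, prediction={y_hat:.3f}")
```

The prediction will generally not equal the unobserved $Y$ exactly, both because `f_hat` is only an estimate and because of the error term.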

The accuracy of $\hat{Y}$ as a prediction of $Y$ depends on two quantities:
- *reducible error*
- *irreducible error*

In general,
$\hat{f}$ will not be a perfect estimate for $f$, and this inaccuracy will introduce
some error. This error is reducible because we can potentially improve the
accuracy of $\hat{f}$ by using the most appropriate statistical learning technique to
estimate $f$. However, even if it were possible to form a perfect estimate for
$f$, so that our estimated response took the form $\hat{Y} = f(X)$, our prediction
would still have some error in it! This is because $Y$ is also a function of
$\epsilon$, which, by definition, cannot be predicted using $X$. Therefore, variability
associated with $\epsilon$ also affects the accuracy of our predictions. This is known
as the irreducible error, because no matter how well we estimate $f$, we
cannot reduce the error introduced by $\epsilon$.

Consider a given estimate $\hat{f}$ and a set of predictors $X$, which yields the prediction $\hat{Y} = \hat{f}(X)$. Assume for a moment that both $\hat{f}$ and $X$ are fixed. Then, it is easy to show that
$$E(Y - \hat{Y})^{2} = E[f(X) + \epsilon - \hat{f}(X)]^{2} = [f(X) - \hat{f}(X)]^{2} + Var(\epsilon)$$
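where $E(Y - \hat{Y})^{2}$ represents the average, or expected value, of the squared
difference between the predicted and actual value of $Y$, and $Var(\epsilon)$ represents the variance associated with the error term $\epsilon$. The second equality holds because, on expanding the square, the cross term $2[f(X) - \hat{f}(X)]E(\epsilon)$ vanishes and $E(\epsilon^{2}) = Var(\epsilon)$, both because $\epsilon$ has mean zero. The term $[f(X) - \hat{f}(X)]^{2}$ is the reducible error, while $Var(\epsilon)$ is the irreducible error.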
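The decomposition can be checked numerically. The sketch below holds $X$ and $\hat{f}$ fixed, simulates many draws of $Y$, and compares the average squared prediction error with the sum of the reducible and irreducible parts. The functions `f` and `f_hat`, the point `x0`, and the noise level `sigma` are hypothetical values chosen for illustration.

```python
import random


def f(x):
    # Hypothetical true function (unknown in practice).
    return 2.0 + 3.0 * x


def f_hat(x):
    # Hypothetical, deliberately imperfect estimate of f.
    return 1.0 + 3.1 * x


x0 = 4.0      # fixed predictor value
sigma = 1.0   # standard deviation of the error term
n = 100_000   # number of simulated responses

rng = random.Random(42)
total = 0.0
for _ in range(n):
    y = f(x0) + rng.gauss(0.0, sigma)   # Y = f(X) + epsilon
    total += (y - f_hat(x0)) ** 2       # squared prediction error

mse = total / n                          # estimates E(Y - Y_hat)^2
reducible = (f(x0) - f_hat(x0)) ** 2     # [f(X) - f_hat(X)]^2
irreducible = sigma ** 2                 # Var(epsilon)

print(f"simulated expected squared error: {mse:.4f}")
print(f"reducible + irreducible:          {reducible + irreducible:.4f}")
```

Up to simulation noise, the two printed values should agree. Replacing `f_hat` with `f` drives the reducible term to zero, but the simulated error stays near `sigma ** 2`.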