# Introduction to Statistical Learning

## Estimation of $f$
- $X$: matrix of predictors
- $Y$: response variable
- $f$: unknown function that connects $X$ with $Y$
- $\epsilon$: error term

$$Y = f(X) + \epsilon$$

*Goal: Find the best $f$ for the data*

### Why estimate $f$?
- **Prediction**
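    - $\hat{Y} = \hat{f}(X)$
    - Reducible error: shrinks as $\hat{f}$ gets closer to $f$
    - Irreducible error: comes from $\epsilon$ (e.g., measurement error), so no choice of $\hat{f}$ can remove it
    - Ex: Predict the GDP of a town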
- **Inference** 
    - How $Y$ is related to each of the $X$s
    - Ex: How does investment in radio relate to investment in newspapers?
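
The split between reducible and irreducible error can be illustrated with a small simulation. This is a minimal sketch, not part of the original notes: the linear "true" function `f`, the noise level `sigma`, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical "true" function, assumed only for this illustration
    return 2.0 + 3.0 * x

n = 10_000
sigma = 1.0  # assumed standard deviation of the error term epsilon
x = rng.uniform(0, 1, size=n)
y = f(x) + rng.normal(0, sigma, size=n)

def mse(y_true, y_pred):
    # Average squared prediction error
    return np.mean((y_true - y_pred) ** 2)

# A deliberately poor estimate of f that ignores x: carries reducible error
poor_pred = np.full(n, y.mean())
# Predicting with the true f: only the irreducible error remains
best_pred = f(x)

mse_poor = mse(y, poor_pred)
mse_best = mse(y, best_pred)
print(f"poor estimate MSE: {mse_poor:.3f}")
print(f"true f MSE:        {mse_best:.3f} (close to sigma^2 = {sigma**2})")
```

Even with the true $f$ in hand, the error stays near the variance of $\epsilon$; improving $\hat{f}$ can only close the gap between the two numbers.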

### How do we estimate $f$?
- Use the data to teach the computer an estimate $\hat{f}$
- **Parametric**
    - Make an assumption about the *functional form* (e.g., linear)
    - *Train* the model using the data
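    - Easy to estimate: the problem reduces to estimating a few parameters
    - The assumed form may be far from the true $f$; making it more flexible to compensate may *overfit*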
- **Non-parametric**
    - Doesn't assume a functional form for $f$
    - Seeks an estimate $\hat{f}$ that gets as close as possible to the data points
    - Prone to *overfitting*
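
The two approaches can be sketched side by side. This is an illustrative example under stated assumptions: the sine-shaped data, the helper names `predict_linear` and `predict_knn`, and the choice of a k-nearest-neighbours average as the non-parametric method are not from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical nonlinear data, assumed only for this illustration
x = np.sort(rng.uniform(0, 10, size=200))
y = np.sin(x) + rng.normal(0, 0.3, size=200)

# Parametric: assume f is linear, so only two coefficients are estimated.
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(x, y, deg=1)

def predict_linear(x_new):
    return intercept + slope * np.asarray(x_new, dtype=float)

# Non-parametric: no functional form; average the k nearest training responses
def predict_knn(x_new, k=10):
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    dist = np.abs(x_new[:, None] - x[None, :])   # distance to every training point
    nearest = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest points
    return y[nearest].mean(axis=1)

mse_linear = np.mean((y - predict_linear(x)) ** 2)
mse_knn = np.mean((y - predict_knn(x)) ** 2)
print(f"training MSE, linear: {mse_linear:.3f}")
print(f"training MSE, kNN:    {mse_knn:.3f}")
```

The linear model is summarized by two numbers but cannot follow the curve; the neighbour average follows the data closely, and with a very small `k` it would start chasing the noise as well.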
    
---
**Overfitting**: the estimate does well on the training set but poorly when applied to other, real-world observations
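
One way to see overfitting is to fit polynomials of increasing degree to a small training set and compare training and test error. The sketch below is an assumption-laden illustration: the sine-shaped `true_f`, the noise level, the sample sizes, and the degrees tried are all made up for the example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def true_f(x):
    # Hypothetical true function, assumed only for this illustration
    return np.sin(2 * np.pi * x)

def simulate(n):
    x = rng.uniform(0, 1, size=n)
    return x, true_f(x) + rng.normal(0, 0.3, size=n)

x_train, y_train = simulate(20)     # small training set
x_test, y_test = simulate(1000)     # fresh observations from the same process

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

results = {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit of the given degree
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))
    print(f"degree {degree:2d}: train MSE = {results[degree][0]:.3f}, "
          f"test MSE = {results[degree][1]:.3f}")
```

Training error can only go down as the degree grows, because each polynomial contains the lower-degree ones as special cases. The test error typically stops improving and then worsens once the fit starts following the noise.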

### Tradeoffs
- Flexibility vs interpretability
- Inference &rarr; restrictive model
- Prediction &rarr; flexible model as it captures more nuanced relationships
- Ex: Tesla's self-driving cars
    - Predict: when to turn
    - Interpretability: why do people complain about self-driving Teslas?

![flexint.png](attachment:flexint.png)

### Approaches
- Supervised: for each observation $i$ we have a target $Y_{i}$
    - Semi-Supervised: we know a few $Y_{i}$ but we want to predict the $Y_{i}$s for the majority of the data
- Unsupervised: no target $Y_{i}$, only $X_{i}$s

## Model Accuracy
- *No free lunch in statistics*: no single method is best across all data sets
- Mean Squared Error (MSE)
    - $MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$
    - Average squared difference between the true and predicted values
    - Can be computed on the *training* data, but what we want to know is how the model performs on the *test* data
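
The MSE formula translates directly into code. This is a minimal sketch with a hypothetical helper name, `mean_squared_error`, and four made-up observations.

```python
import numpy as np

def mean_squared_error(y, y_hat):
    # Average of the squared differences between true and predicted values
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

# Made-up true and predicted values for illustration
y = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.0, 5.0, 4.0, 8.0]

# Squared errors are 1, 0, 4, 1, so the MSE is 6 / 4 = 1.5
result = mean_squared_error(y, y_hat)
print(result)
```

Calling the same function on held-out observations instead of the training data gives the test MSE.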