# Introduction to Statistical Learning - Chapter 6

- [6. Moving Beyond Linearity](#6.-Moving-Beyond-Linearity)
    * [6.1. Polynomial Regression](#6.1.-Polynomial-Regression)
    * [6.2. Step Functions](#6.2.-Step-Functions)
    * [6.3. Basis Functions](#6.3.-Basis-Functions)

# 6. Moving Beyond Linearity

- Polynomial regression
    * Extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power
        + Simple way to provide a non-linear fit to data
- Step functions
    * Cut the range of a variable into $K$ distinct regions in order to produce a qualitative variable
        + Fitting a piecewise constant function
- Regression splines
    * More flexible than polynomial and step functions
    * Involves dividing the range of $X$ into $K$ distinct regions
    * Within each region, a polynomial function is fitted to the data
        + Polynomials are constrained so they join smoothly at the region boundaries, or knots
- Smoothing splines
    * Result from minimizing a residual sum of squares criterion subject to a smoothness penalty
- Local regression
    * Similar to splines but differs in an important way
        + Regions are allowed to overlap and they do so in a smooth way
- Generalized additive models
    * Allows us to extend the methods above to deal with multiple predictors

## 6.1. Polynomial Regression

$$ y_{i} = \beta_{0} + \beta_{1}x_{i} + \epsilon_{i} $$

$$ y_{i} = \beta_{0} + \beta_{1}x_{i} + \beta_{2}x_{i}^{2} + \beta_{3}x_{i}^{3} + \dots + \beta_{d}x_{i}^{d} + \epsilon_{i} $$

- Polynomial regression allows us to produce an extremely non-linear curve
- Coefficients of the polynomial regression can be easily estimated using least squares linear regression because it is just a standard linear model with predictors $x_{i}$, $x_{i}^{2}$, $x_{i}^{3}$, ..., $x_{i}^{d}$
- Least squares regression returns variance estimates for each of the fitted coefficients $\hat{\beta}_{j}$, as well as the covariances between pairs of coefficient estimates
    * These can be used to compute the estimated variance of $\hat{f}(x_{0})$
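
As a rough illustration of the points above, the following NumPy sketch fits a cubic by ordinary least squares and computes the estimated variance of $\hat{f}(x_{0})$ as $\ell_{0}^{T}\hat{C}\ell_{0}$, where $\ell_{0} = (1, x_{0}, \dots, x_{0}^{d})$ and $\hat{C}$ is the estimated covariance matrix of the coefficients. The helper names (`fit_polynomial`, `pointwise_variance`) and the simulated data are assumptions made for this example, not part of the notes.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least squares fit of a degree-d polynomial.

    Returns the coefficient estimates and their estimated covariance matrix.
    """
    # Design matrix with columns 1, x, x^2, ..., x^d
    X = np.vander(x, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Unbiased estimate of the noise variance (n - p degrees of freedom)
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, cov

def pointwise_variance(x0, cov):
    """Estimated variance of f_hat(x0) = l0^T C l0, with l0 = (1, x0, ..., x0^d)."""
    ell = x0 ** np.arange(cov.shape[0])
    return ell @ cov @ ell

# Simulated data (hypothetical): a cubic signal plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 200)
y = 1.0 + 2.0 * x - 0.5 * x**3 + rng.normal(scale=0.3, size=x.size)

beta_hat, cov_hat = fit_polynomial(x, y, degree=3)
var_f0 = pointwise_variance(0.5, cov_hat)
se_f0 = np.sqrt(var_f0)  # pointwise standard error at x0 = 0.5
```

The square root of this variance is the pointwise standard error, so plotting $\hat{f}(x_{0}) \pm 2\,\text{se}$ over a grid of $x_{0}$ values gives an approximate 95% band around the fitted curve.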

## 6.2. Step Functions

- Using polynomial functions of features as predictors in a linear model imposes a global structure on the non-linear function of $X$
    * Using step functions avoids imposing such a global structure
- A step function breaks the range of $X$ into bins and fits a different constant in each bin
    * This converts a continuous variable into an ordered categorical variable
        + We can then use least squares to fit a linear model using $C_{1}(X), C_{2}(X), ... , C_{K}(X)$ as predictors

$$y_{i} = \beta_{0} + \beta_{1}C_{1}(x_{i}) + ... + \beta_{K}C_{K}(x_{i}) + \epsilon_{i}$$
where, for a given value of $X$, at most one of $C_{1}, C_{2}, ... , C_{K}$ can be non-zero

- Unless there are natural breakpoints in the predictors, piecewise-constant functions can miss the trend within each bin, because the fit is flat between cutpoints
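
The step-function model above can be sketched in NumPy as follows. The function name `step_design`, the cutpoints, and the simulated data are assumptions made for this example. The indicators are taken as $C_{k}(x) = I(c_{k} \leq x < c_{k+1})$, with the last bin open on the right, so the region $x < c_{1}$ acts as the baseline absorbed by the intercept.

```python
import numpy as np

def step_design(x, cutpoints):
    """Intercept plus indicator columns C_1..C_K for bins defined by sorted cutpoints."""
    # np.digitize returns 0 for x < c_1, k for c_k <= x < c_{k+1}, and K for x >= c_K
    bins = np.digitize(x, cutpoints)
    K = len(cutpoints)
    indicators = (bins[:, None] == np.arange(1, K + 1)).astype(float)
    return np.column_stack([np.ones(len(x)), indicators])

# Simulated data (hypothetical): a true step function with levels 1, 4 and 2
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 300)
true_level = np.where(x < 3, 1.0, np.where(x < 7, 4.0, 2.0))
y = true_level + rng.normal(scale=0.2, size=x.size)

X_step = step_design(x, cutpoints=[3.0, 7.0])
beta_step, *_ = np.linalg.lstsq(X_step, y, rcond=None)
# beta_step[0] estimates the mean of y in the baseline bin (x < 3);
# beta_step[k] estimates the difference between bin k and the baseline
```

With this coding, each row of the indicator columns contains at most one non-zero entry, matching the constraint stated under the model equation.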

## 6.3. Basis Functions

- Polynomial and piecewise-constant regression models are in fact special cases of a basis function approach

$$y_{i} = \beta_{0} + \beta_{1}b_{1}(x_{i}) + \beta_{2}b_{2}(x_{i}) +... + \beta_{K}b_{K}(x_{i}) + \epsilon_{i}$$

where the basis functions $b_{1}(\cdot), b_{2}(\cdot),..., b_{K}(\cdot)$ are fixed and known

- For polynomial regression, the basis functions are $b_{j}(x_{i}) = x^{j}_{i}$ and for piecewise constant functions, they are $b_{j}(x_{i}) = I(c_{j} \leq x_{i} < c_{j+1})$
    * This is just a standard linear model with predictors $b_{1}(x_{i}), b_{2}(x_{i}),..., b_{K}(x_{i})$, so least squares can be used to estimate the coefficients
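
To make the "special cases" point concrete, here is a minimal sketch in which a single fitting routine accepts any list of basis functions. The name `fit_basis`, the cutpoints, and the noise-free toy data are assumptions chosen for illustration.

```python
import numpy as np

def fit_basis(x, y, basis_functions):
    """Least squares fit of y on an intercept plus b_1(x), ..., b_K(x)."""
    columns = [np.ones(len(x))] + [b(x) for b in basis_functions]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X

# Polynomial basis: b_j(x) = x^j for j = 1, 2, 3
poly_basis = [lambda x, j=j: x**j for j in (1, 2, 3)]

# Piecewise-constant basis: b_j(x) = I(c_j <= x < c_{j+1}), last bin open on the right
cuts = [3.0, 7.0, np.inf]
step_basis = [
    lambda x, lo=lo, hi=hi: ((lo <= x) & (x < hi)).astype(float)
    for lo, hi in zip(cuts[:-1], cuts[1:])
]

# Noise-free toy data (hypothetical): an exact quadratic
x = np.linspace(0, 10, 100)
y = 0.5 + 1.0 * x - 0.1 * x**2

beta_poly, X_poly = fit_basis(x, y, poly_basis)   # polynomial regression
beta_step, X_step = fit_basis(x, y, step_basis)   # piecewise-constant regression
```

Only the list of basis functions changes between the two fits; the estimation step is identical, which is the sense in which both models are ordinary linear models in the transformed predictors.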

## 6.4. Regression Splines

### 6.4.1. Piecewise Polynomials

- Piecewise polynomial regression involves fitting separate low-degree polynomials over different regions of $X$
$$ y_{i} = \beta_{0} + \beta_{1}x_{i} + \beta_{2}x_{i}^{2} + \beta_{3}x_{i}^{3} + \epsilon_{i} $$

where the coefficients $\beta_{0}, \beta_{1}, \beta_{2}, \beta_{3}$ differ in different parts of the range of $X$. The points where the coefficients change are called *knots*. For example, with a single knot at a point $c$:

$$ y_{i} = \begin{cases} \beta_{01} + \beta_{11}x_{i} + \beta_{21}x_{i}^{2} + \beta_{31}x_{i}^{3} + \epsilon_{i} & \text{if } x_{i} < c \\ \beta_{02} + \beta_{12}x_{i} + \beta_{22}x_{i}^{2} + \beta_{32}x_{i}^{3} + \epsilon_{i} & \text{if } x_{i} \geq c \end{cases} $$

- Using more knots leads to a more flexible piecewise polynomial
    * If we place $K$ different knots throughout the range of $X$, then we will end up fitting $K+1$ different cubic polynomials
        + However, the fitted function is generally discontinuous at the knots, since nothing forces adjacent polynomials to agree there, and it looks ridiculous
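
The discontinuity can be seen in a small sketch that fits two unconstrained cubics on either side of a single knot. The helper names, the knot location `c = 3.0`, and the simulated data are assumptions made for this example.

```python
import numpy as np

def fit_cubic(x, y):
    """Least squares cubic fit; returns (beta_0, beta_1, beta_2, beta_3)."""
    X = np.vander(x, 4, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_piecewise_cubic(x, y, knot):
    """Fit separate cubics to the observations with x < knot and x >= knot."""
    left = x < knot
    return fit_cubic(x[left], y[left]), fit_cubic(x[~left], y[~left])

def eval_cubic(beta, x0):
    """Evaluate the fitted cubic at x0 (np.polyval expects highest degree first)."""
    return float(np.polyval(beta[::-1], x0))

# Simulated data (hypothetical): a smooth signal plus noise
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 6, size=200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

c = 3.0  # assumed knot location
beta_left, beta_right = fit_piecewise_cubic(x, y, knot=c)

# Gap between the two fitted pieces at the knot; nothing forces it to be zero
jump = eval_cubic(beta_right, c) - eval_cubic(beta_left, c)
n_params = beta_left.size + beta_right.size  # 8 free coefficients with one knot
```

Each piece is fitted using only its own observations, so the two cubics together use eight free coefficients and `jump` measures how far apart they land at the knot.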

### 6.4.2. Constraints and Splines