# Generalized Additive (Linear) Models (GAM)
---

1. Generalized linear models (GLMs) build on linear regression to model outcomes whose distribution is not Gaussian. They keep the weighted sum of the features from linear regression, but connect that weighted sum to the expected mean of the output distribution through a possibly nonlinear function.
1. For example, logistic regression is a generalized linear model: it assumes a Bernoulli distribution for the outcome and links the expected mean to the weighted sum using the logit function.
1. Generalized additive models (GAMs) further relax the restriction that the relationship must be a simple weighted sum, and instead assume that the outcome can be modeled by a sum of arbitrary functions of each feature. This makes it possible to model potentially non-linear relations between the features and the output.

## Preliminary
---

### Statistics

#### Gaussian (normal) distribution
> TODO

## Linear regression from a probabilistic view
---

### Linear combination of Gaussian distributions

The Gaussian distribution is closed under linear transformations, which means:

1. Let $X \sim \mathcal{N}(\mu, \sigma^{2})$. Then the random variable $Y = aX + b$ also follows a Gaussian distribution. Scaling by $a$ multiplies the variance by $a^{2}$, and the shift $b$ only moves the mean:

    $$ Y \sim \mathcal{N}(a\mu + b, a^{2}\sigma^{2}) $$
    
1. Let $X \sim \mathcal{N}(\mu_{X}, \sigma_{X}^{2})$ and $Y \sim \mathcal{N}(\mu_{Y}, \sigma_{Y}^{2})$. Then, if $X$ and $Y$ are independent, the random variable $Z = X + Y$ also follows a Gaussian distribution:

    $$ Z \sim \mathcal{N}(\mu_{X} + \mu_{Y}, \sigma_{X}^{2} + \sigma_{Y}^{2}) $$
    
1. Let $X_1, \dots, X_n$ be $n$ mutually independent Gaussian random variables with means $\mu_{1}, \dots, \mu_{n}$ and variances $\sigma_{1}^{2}, \dots, \sigma_{n}^{2}$. If the random variable $Y$ is a linear combination of the $X_{i}$ with coefficients $w_{1}, \dots, w_{n}$ and biases $b_{1}, \dots, b_{n}$:

    $$ Y = \sum_{i=1}^{n} (w_{i}X_{i} + b_{i}) $$
    
    then $Y$ also follows a Gaussian distribution:
    
    $$ Y \sim \mathcal{N}\left(\sum_{i=1}^{n} (w_{i}\mu_{i} + b_{i}), \sum_{i=1}^{n} w_{i}^2\sigma_{i}^2\right) $$
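
The closure property can be checked numerically. The sketch below is a simple Monte Carlo comparison, not a proof: it draws samples of three independent Gaussian variables, forms their linear combination, and compares the empirical mean and variance with $\sum_{i} (w_{i}\mu_{i} + b_{i})$ and $\sum_{i} w_{i}^2\sigma_{i}^2$. All parameter values, the seed, and the sample size are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is reproducible

# Hypothetical parameters for three independent Gaussian variables X_1..X_3.
mu = np.array([1.0, -2.0, 0.5])     # means
sigma = np.array([1.0, 0.5, 2.0])   # standard deviations
w = np.array([2.0, -1.0, 0.3])      # coefficients
b = np.array([0.1, 0.2, 0.3])       # biases

# Each column of X holds samples of one X_i.
n_samples = 200_000
X = rng.normal(loc=mu, scale=sigma, size=(n_samples, 3))

# Y = sum_i (w_i * X_i + b_i), computed row by row.
Y = (X * w + b).sum(axis=1)

# Values predicted by the closure property.
theory_mean = float(np.sum(w * mu + b))
theory_var = float(np.sum(w**2 * sigma**2))

print("empirical mean:", Y.mean(), "theoretical mean:", theory_mean)
print("empirical var: ", Y.var(), "theoretical var: ", theory_var)
```

The empirical values should approach the theoretical ones as `n_samples` grows. Matching the first two moments does not by itself show that $Y$ is Gaussian; a histogram or a normality test would be needed for that.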

### Assumptions of linear regression

Given a dataset $\mathbf{X} \in \mathbb{R}^{n \times d}$ in which each feature is treated as a random variable ($X_{1}, X_{2}, \dots, X_{d}$), linear regression of the form

$$ y = \mathbf{w}^{\top}\mathbf{x} + b $$

relies on the following assumptions to work properly:
1. The input features are independent of each other.
1. Given the input features, the output follows a Gaussian distribution.
1. The true relationship between each feature and the output is linear.
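
When these assumptions hold, the model is easy to recover from data. The sketch below generates synthetic data that satisfies them by construction (independent features, a linear relationship, Gaussian noise) and fits the weights with ordinary least squares, which coincides with maximum likelihood estimation under Gaussian noise. The values of `w_true`, `b_true`, and `noise_std` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth parameters.
w_true = np.array([1.5, -2.0])
b_true = 0.5
noise_std = 0.1

# Independent features and a linear relationship with Gaussian noise.
n = 1000
X = rng.normal(size=(n, 2))
y = X @ w_true + b_true + rng.normal(scale=noise_std, size=n)

# Append a column of ones so the bias is estimated together with the weights.
X_aug = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_hat, b_hat = coef[:2], coef[2]

print("estimated weights:", w_hat)
print("estimated bias:   ", b_hat)
```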

However, in most real-world applications these assumptions rarely hold. Each violation can be addressed by the corresponding method below:
1. Use PCA to extract a set of uncorrelated features (a weaker property than full independence) from features that are not naturally mutually independent.
1. Use **GLM** to model non-Gaussian output distribution.
1. Use **GAM** to model non-linear relation between features and the output. 

## Generalized Linear Models (GLM)
---

**Generalized linear models (GLM)** extend the linear regression model by using a possibly non-linear function that connects the weighted sum of the input features with the expected mean of the output distribution, which need not be Gaussian.

GLM can be expressed as:

$$ g(\mathbb{E}(Y)) = \sum_{i=1}^{n} w_{i}X_{i} + b $$ 

where $X_{1}, X_{2}, \dots, X_{n}$ are the input random variables and $Y$ is the output random variable.

The components of GLM can be formalized as:
1. Random component: the probability distribution of the output variable $Y$. Its expected value (mean) is $\mathbb{E}(Y)$.
1. Systematic component: the weighted sum $\sum_{i=1}^{n} w_{i}X_{i} + b$.
1. Link function: the function $g(\cdot)$, often non-linear, that relates the expected value of the random component to the systematic component.
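
The three components can be sketched in a few lines for the Bernoulli case with a logit link, which is the logistic regression setup. This is a minimal illustration rather than a fitting procedure: the weights `w`, the bias `b`, and the inputs `X` are hypothetical values, not estimates from data.

```python
import numpy as np

def logit(p):
    """Link function g: maps a mean in (0, 1) to the real line."""
    return np.log(p / (1.0 - p))

def inverse_logit(eta):
    """Inverse link g^{-1}: maps the weighted sum back to a mean in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical weights, bias, and inputs (one row per observation).
w = np.array([0.8, -1.5])
b = 0.25
X = np.array([[0.0, 0.0],
              [1.0, 0.5],
              [-2.0, 1.0]])

# Systematic component: the weighted sum of the features plus the bias.
eta = X @ w + b

# Link function: g(E(Y)) = eta, so E(Y) = g^{-1}(eta).
mean_y = inverse_logit(eta)

# Random component: Y follows a Bernoulli distribution with mean E(Y).
rng = np.random.default_rng(0)
y = rng.binomial(n=1, p=mean_y)

print("weighted sum:", eta)
print("E(Y):        ", mean_y)
print("sampled Y:   ", y)
```

Applying `logit` to `mean_y` returns the weighted sum `eta`, which is the relation $g(\mathbb{E}(Y)) = \sum_{i} w_{i}X_{i} + b$ written out numerically.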

### Logistic regression as a GLM for Bernoulli distribution 

> TODO

## Generalized Additive Models (GAM)
---

$$ g(\mathbb{E}(Y)) = \sum_{i=1}^{n} f_{i}(X_{i}) $$ 
where each $f_{i}(\cdot)$ can be an arbitrarily defined function of the corresponding feature $X_{i}$.
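
As a rough sketch of the additive structure, the code below evaluates the right-hand side of the formula with one hand-picked function per feature and an identity link. The functions in `shape_functions` are hypothetical choices made for illustration. A real GAM would estimate them from data (for example with splines), which is not shown here.

```python
import numpy as np

# Hypothetical functions, one per feature. In a fitted GAM these would be
# learned from data rather than written by hand.
shape_functions = [
    lambda x: np.sin(x),     # non-linear effect of feature 1
    lambda x: 0.5 * x**2,    # non-linear effect of feature 2
    lambda x: 2.0 * x,       # linear effect, as in a plain weighted sum
]

def additive_predictor(X, functions):
    """Return the sum of functions[i] applied to column i of X, per row."""
    return sum(f(X[:, i]) for i, f in enumerate(functions))

# Two observations with three features each.
X = np.array([[0.0, 1.0, 2.0],
              [np.pi / 2, 2.0, -1.0]])

# With an identity link, E(Y) equals the additive predictor.
eta = additive_predictor(X, shape_functions)
print(eta)
```

If every function is replaced by a linear one of the form `w_i * x`, the additive predictor reduces to the weighted sum used in a GLM, up to the bias term.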

> TODO

## References
---

1. https://christophm.github.io/interpretable-ml-book/extend-lm.html
1. http://www.stat.ucla.edu/~nchristo/introstatistics/introstats_normal_linear_combinations.pdf