### Introduction

Linear regression and logistic regression (for classification) are special cases of a broader family of models called Generalized Linear Models (GLMs).

#### Exponential Family

We say that a class of distributions is in the exponential family if it can be written in the form
$$
p(y;\eta) = b(y) \exp(\eta^{T}T(y) - a(\eta))
$$
Here, $\eta$ is called the Natural or Canonical parameter of the distribution.

$T(y)$ is called the Sufficient Statistic. In most cases, $T(y)=y$.

$a(\eta)$ is called the Log Partition Function. 

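The quantity $e^{-a(\eta)}$ essentially plays the role of a normalization constant: it makes sure that the distribution $p(y;\eta)$ sums (or integrates) over $y$ to $1$, i.e. $a(\eta) = \log \int b(y)\exp(\eta^{T}T(y))\,dy$, with the integral replaced by a sum when $y$ is discrete.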

A fixed choice of $T$, $a$ and $b$ defines a family (or set) of distributions that is parameterized by $\eta$; as we vary $\eta$, we get different distributions within this family.
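To make the role of $a(\eta)$ concrete, here is a minimal sketch for a discrete $y$ with finite support and a scalar $\eta$ (so $\eta^{T}T(y)$ is just $\eta \cdot T(y)$). It computes $a(\eta)$ as the log of the normalizing sum and checks that the resulting $p(y;\eta)$ sums to 1. The helper names `log_partition` and `exp_family_pmf`, and the example choice $y \in \{0,1\}$, $b(y)=1$, $T(y)=y$, are illustrative assumptions, not a library API.

```python
import math
from typing import Callable, Sequence


def log_partition(eta: float,
                  support: Sequence[int],
                  b: Callable[[int], float],
                  T: Callable[[int], float]) -> float:
    """a(eta) = log sum_y b(y) * exp(eta * T(y)) over a finite support."""
    return math.log(sum(b(y) * math.exp(eta * T(y)) for y in support))


def exp_family_pmf(y: int, eta: float,
                   support: Sequence[int],
                   b: Callable[[int], float],
                   T: Callable[[int], float]) -> float:
    """p(y; eta) = b(y) * exp(eta * T(y) - a(eta)), with a(eta) from the support."""
    return b(y) * math.exp(eta * T(y) - log_partition(eta, support, b, T))


# Illustrative choice: y in {0, 1}, b(y) = 1, T(y) = y.
support = [0, 1]


def b(y: int) -> float:
    return 1.0


def T(y: int) -> float:
    return float(y)


for eta in (-2.0, 0.0, 3.5):
    total = sum(exp_family_pmf(y, eta, support, b, T) for y in support)
    print(f"eta={eta:+.1f}  sum over y of p(y; eta) = {total:.6f}")
```

Whatever $\eta$ is, dividing by $e^{a(\eta)}$ (equivalently, multiplying by $e^{-a(\eta)}$) forces the probabilities to sum to 1.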

#### Bernoulli Distribution
The Bernoulli distribution with mean $\phi$, written Bernoulli($\phi$), specifies a
distribution over $y \in \{0,1\}$ such that
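$$
\begin{aligned}
p(y;\phi) &= \phi^y(1-\phi)^{1-y} \\
&= \exp\left(\log\left(\phi^y(1-\phi)^{1-y}\right)\right) \\
&= \exp\left(y\log\phi + (1-y)\log(1-\phi)\right) \\
&= \exp\left(y\log\left(\cfrac{\phi}{1-\phi}\right) + \log(1-\phi)\right)
\end{aligned}
$$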
Consider
$$
\eta = \log\left( \cfrac {\phi}{1-\phi} \right)
$$
Solving for $\phi$, this can be rewritten as
$$
\phi = \cfrac {1}{1+e^{-\eta}}
$$
which is the sigmoid (logistic) function.
As we vary $\phi$, we obtain Bernoulli distributions with different means.
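A small sketch of the two directions of this mapping: `logit` sends a mean $\phi \in (0,1)$ to the natural parameter $\eta = \log(\phi/(1-\phi))$, and `sigmoid` uses $\phi = 1/(1+e^{-\eta})$, which follows from solving that equation for $\phi$. The function names are our own; this plain `math` version is a sketch and is not guarded against overflow for very large negative $\eta$.

```python
import math


def logit(phi: float) -> float:
    """Natural parameter of Bernoulli(phi): eta = log(phi / (1 - phi))."""
    return math.log(phi / (1.0 - phi))


def sigmoid(eta: float) -> float:
    """Inverse of logit: phi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


for phi in (0.1, 0.5, 0.9):
    eta = logit(phi)
    print(f"phi={phi:.1f}  eta={eta:+.4f}  sigmoid(eta)={sigmoid(eta):.4f}")
```

The round trip `sigmoid(logit(phi))` recovers `phi`, and $\phi = 0.5$ corresponds to $\eta = 0$.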

$$
p(y;\phi) = \exp \left(y\log\left( \cfrac {\phi}{1-\phi} \right) + \log(1-\phi)\right)
$$
Compare with 
$$
p(y;\eta) = b(y)\exp(\eta^{T}T(y) - a(\eta))
$$
we obtain
$$
\begin{aligned}
b(y) &= 1 \\
\eta &= \log\left( \cfrac {\phi}{1-\phi} \right) \\
T(y) &= y \\
a(\eta) &= -\log(1-\phi) = \log(1+e^{\eta})
\end{aligned}
$$
This shows that the Bernoulli distribution can be written in the form $p(y;\eta) = b(y)\exp(\eta^{T}T(y) - a(\eta))$ with an appropriate choice of $T$, $a$ and $b$.
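As a numerical sanity check, the sketch below evaluates the Bernoulli pmf both directly as $\phi^y(1-\phi)^{1-y}$ and through the exponential-family form with $b(y)=1$, $T(y)=y$, $\eta=\log(\phi/(1-\phi))$ and $a(\eta) = -\log(1-\phi) = \log(1+e^{\eta})$. The function names are hypothetical, chosen for this illustration.

```python
import math


def bernoulli_pmf(y: int, phi: float) -> float:
    """Standard form: phi^y * (1 - phi)^(1 - y)."""
    return phi ** y * (1.0 - phi) ** (1 - y)


def bernoulli_exp_family(y: int, phi: float) -> float:
    """Exponential-family form: b(y) * exp(eta * T(y) - a(eta)) with
    b(y) = 1, T(y) = y, eta = log(phi / (1 - phi)), a(eta) = log(1 + e^eta)."""
    eta = math.log(phi / (1.0 - phi))
    a = math.log(1.0 + math.exp(eta))
    return 1.0 * math.exp(eta * y - a)


for phi in (0.2, 0.5, 0.75):
    for y in (0, 1):
        assert math.isclose(bernoulli_pmf(y, phi), bernoulli_exp_family(y, phi))
print("exponential-family form matches phi^y (1 - phi)^(1 - y)")
```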

#### Gaussian Distribution
To simplify things, let's set $\sigma = 1$:
$$
p(y;\mu) = \cfrac{1}{\sqrt{2 \pi}} \exp \left(- \cfrac{(y-\mu)^2}{2} \right) 
$$
Expanding the square, $-\frac{(y-\mu)^2}{2} = -\frac{y^2}{2} + y\mu - \frac{\mu^2}{2}$, and rearranging the terms, we obtain
$$
p(y;\mu) = \cfrac{1}{\sqrt{2 \pi}} \exp \left(\cfrac{-y^2}{2} \right) \cdot \exp \left(y\mu-\cfrac{\mu^2}{2} \right)
$$
Compare this with 
$$
p(y;\eta) = b(y)\exp(\eta^{T}T(y) - a(\eta))
$$
we obtain
$$
\begin{aligned}
b(y) &= \cfrac{1}{\sqrt{2 \pi}} \exp \left(\cfrac{-y^2}{2} \right) \\
\eta &= \mu \\
T(y) &= y \\
a(\eta) &= \cfrac {\eta^2}{2}
\end{aligned}
$$
This shows that the Gaussian distribution can be written in the form $p(y;\eta) = b(y)\exp(\eta^{T}T(y) - a(\eta))$ with an appropriate choice of $T$, $a$ and $b$.
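The same kind of check works for the unit-variance Gaussian: the sketch below compares the usual $\mathcal N(\mu, 1)$ density with $b(y)\exp(\eta y - a(\eta))$, using $b(y) = e^{-y^2/2}/\sqrt{2\pi}$, $T(y)=y$, $\eta=\mu$ and $a(\eta)=\eta^2/2$. The function names are hypothetical and only the standard `math` module is used.

```python
import math


def gaussian_pdf(y: float, mu: float) -> float:
    """N(mu, 1) density: (1 / sqrt(2 pi)) * exp(-(y - mu)^2 / 2)."""
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)


def gaussian_exp_family(y: float, mu: float) -> float:
    """Exponential-family form with b(y) = exp(-y^2 / 2) / sqrt(2 pi),
    T(y) = y, eta = mu and a(eta) = eta^2 / 2."""
    eta = mu
    b = math.exp(-0.5 * y ** 2) / math.sqrt(2.0 * math.pi)
    a = 0.5 * eta ** 2
    return b * math.exp(eta * y - a)


for mu in (-1.0, 0.0, 2.5):
    for y in (-3.0, 0.0, 0.7, 4.0):
        assert math.isclose(gaussian_pdf(y, mu), gaussian_exp_family(y, mu))
print("exponential-family form matches the N(mu, 1) density")
```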

Many other distributions are also members of the exponential family,
e.g. the multinomial, Poisson, gamma, beta and Dirichlet distributions.

### Constructing GLMs
Consider a classification or regression problem where we would like to predict the value of some random variable $y$ as a function of $\mathbf x$.

To derive a GLM for this problem, we will make the following three assumptions about the conditional distribution of $y$ given $\mathbf x$ and about our model:

1. The distribution of $y$ given $\mathbf x$, parameterized by $\mathbf w$, follows some exponential family distribution with natural parameter $\eta$:
$$
y \mid \mathbf x;\mathbf w \sim \text{ExponentialFamily}(\eta)
$$

2. Given $\mathbf x$, our goal is to predict the expected value of $T(y)$. Since $T(y)=y$ in most examples, the prediction $h(\mathbf x)$ made by our hypothesis should satisfy
$$
h(\mathbf x)=E[y \mid \mathbf x]
$$

3. The natural parameter and the inputs are related linearly:
$$
\eta=\mathbf w^T\mathbf x
$$

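The resulting models are often very effective for modelling different
types of distributions over $y$. For example, taking $y$ to be Bernoulli yields logistic regression, and taking $y$ to be Gaussian yields ordinary least squares linear regression.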
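Putting the three assumptions together, a GLM prediction is computed by forming $\eta = \mathbf w^T\mathbf x$ and mapping it to the mean of the chosen distribution (this mapping is sometimes called the canonical response function). The sketch below assumes a scalar $\eta$; `glm_predict`, `bernoulli_mean`, `gaussian_mean` and the weight and input values are hypothetical, chosen only for illustration. For the Bernoulli case the mean is $\phi = 1/(1+e^{-\eta})$ (logistic regression), and for the unit-variance Gaussian it is $\mu = \eta$ (linear regression).

```python
import math
from typing import Callable, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def glm_predict(w: Sequence[float], x: Sequence[float],
                mean_of_eta: Callable[[float], float]) -> float:
    """h(x) = E[y | x; w]: eta = w^T x (assumption 3), then mean_of_eta maps
    the natural parameter to the mean of the chosen distribution (1 and 2)."""
    eta = dot(w, x)
    return mean_of_eta(eta)


def bernoulli_mean(eta: float) -> float:
    # Bernoulli: E[y] = phi = 1 / (1 + e^{-eta})
    return 1.0 / (1.0 + math.exp(-eta))


def gaussian_mean(eta: float) -> float:
    # Unit-variance Gaussian: E[y] = mu = eta
    return eta


w = [0.5, -1.0, 2.0]   # hypothetical weights
x = [1.0, 3.0, 0.25]   # hypothetical input; first entry acts as an intercept term

print("Bernoulli GLM (logistic regression):", glm_predict(w, x, bernoulli_mean))
print("Gaussian GLM (linear regression):   ", glm_predict(w, x, gaussian_mean))
```

Only the response function changes between the two models; the linear predictor $\mathbf w^T\mathbf x$ is shared, which is what makes both of them GLMs.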