```python
import numpy as np
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
```

## Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is a widely used method for estimating the parameters of a probability distribution from data. It chooses the parameters that maximize a likelihood function, so that under the assumed (parameterized) statistical model the observed data is most probable.

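Likelihood function: $L(\theta \mid D) = P(D; \theta)$, read as a function of the parameters $\theta$ with the observed data $D$ held fixed.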

You **maximize** the likelihood function over $\theta$: the estimate is the value of $\theta$ that gives the highest probability of observing the data.

$\hat{\theta} = \arg\max_\theta P(D; \theta)$
 - Select the parameter that returns the highest probability of observing the data
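To make the argmax concrete, here is a minimal sketch on a simpler model than the regression below: a made-up set of coin flips treated as i.i.d. Bernoulli($p$) draws. It evaluates the log-likelihood on a grid of candidate $p$ values and keeps the best one, which should agree with the known closed-form Bernoulli MLE (the fraction of heads). The data and the helper name `bernoulli_log_likelihood` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical data: 10 coin flips (1 = heads, 0 = tails), 7 heads in total
flips = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

def bernoulli_log_likelihood(p, data):
    """log P(D; p) for i.i.d. Bernoulli(p) observations."""
    heads = data.sum()
    tails = data.size - heads
    return heads * np.log(p) + tails * np.log(1 - p)

# Candidate parameters 0.01, 0.02, ..., 0.99 (avoid p = 0 and p = 1, where log blows up)
grid = np.linspace(0.01, 0.99, 99)
log_liks = np.array([bernoulli_log_likelihood(p, flips) for p in grid])

p_hat_grid = grid[np.argmax(log_liks)]  # argmax over the grid
p_hat_closed = flips.mean()             # closed-form Bernoulli MLE: fraction of heads

print(p_hat_grid, p_hat_closed)  # both approximately 0.7
```

Maximizing the log-likelihood rather than the likelihood itself gives the same argmax, because the logarithm is strictly increasing.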

For supervised learning with $N$ independent examples, MLE maximizes $P(Y \mid X, \theta) = \prod_{i=1}^N P(y_i \mid x_i, \theta)$
   - If we choose parameter $\theta$, how likely is it that each input $x_i$ gives rise to its label $y_i$?
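A small sketch of this product for the one-feature regression data used in this notebook, under an assumption the text does not state: each label is Gaussian around $\theta x_i$ with a known noise level, here set hypothetically to $\sigma = 1$. Taking logs turns the product into a sum, and a grid search over $\theta$ picks the value under which the observed labels are most probable. The helper names are illustrative.

```python
import numpy as np

# Training data from this notebook (one feature, five examples)
x = np.array([-3, -1, 0, 1, 3], dtype=float)
y = np.array([-1.2, -0.7, 0.14, 0.67, 1.67])

SIGMA = 1.0  # assumed known noise standard deviation (hypothetical choice)

def gaussian_pdf(v, mean, sigma):
    return np.exp(-0.5 * ((v - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def likelihood(theta):
    # P(y | x, theta) = prod_i P(y_i | x_i, theta), with y_i ~ N(theta * x_i, sigma^2)
    return np.prod(gaussian_pdf(y, theta * x, SIGMA))

def log_likelihood(theta):
    # The log turns the product into a sum, which avoids underflow when N is large
    return np.sum(np.log(gaussian_pdf(y, theta * x, SIGMA)))

thetas = np.linspace(-2, 2, 4001)  # candidate parameters, step 0.001
log_liks = np.array([log_likelihood(t) for t in thetas])
theta_hat = thetas[np.argmax(log_liks)]
print(theta_hat)  # approximately 0.499
```

With $\sigma$ fixed, the location of the maximum does not depend on the value chosen for $\sigma$; it only rescales the log-likelihood.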

#### Approach 1 - Using Linear Algebra

Finding the parameters $\theta$ is a kind of reverse engineering: we know the inputs and the outputs, and we must recover the weight that maps one to the other.

$$\underbrace{\begin{pmatrix} -3 \\ -1 \\ 0 \\ 1 \\ 3 \end{pmatrix}}_{\vec{x}} \, w \approx \underbrace{\begin{pmatrix} -1.2 \\ -0.7 \\ 0.14 \\ 0.67 \\ 1.67 \end{pmatrix}}_{\vec{y}}$$

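Which parameter $w$, when multiplied by $\vec{x}$, lands on $\vec{y}$? No single $w$ fits all five points exactly, so we look for the $w$ that lands as close as possible.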
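To connect this best-fit picture back to MLE, assume, as is standard for linear regression, that each label is the model's prediction plus i.i.d. Gaussian noise: $y_i = \theta^\top x_i + \epsilon_i$ with $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. The log of the likelihood product then becomes a sum of squared errors:

$$\log P(Y \mid X, \theta) = \sum_{i=1}^N \log \mathcal{N}(y_i \mid \theta^\top x_i, \sigma^2) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^N \left(y_i - \theta^\top x_i\right)^2$$

Only the last term depends on $\theta$, so maximizing the likelihood is the same as minimizing $\lVert \vec{y} - X\theta \rVert^2$. Setting the gradient to zero gives the normal equations:

$$\nabla_\theta \lVert \vec{y} - X\theta \rVert^2 = -2X^\top(\vec{y} - X\theta) = 0 \quad\Longrightarrow\quad X^\top X\,\theta = X^\top \vec{y}$$

For the data above, $X^\top X = \sum_i x_i^2 = 20$ and $X^\top \vec{y} = \sum_i x_i y_i = 9.98$, so $\hat{\theta} = 9.98 / 20 = 0.499$.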

```python
# Define training set
X = np.array([-3, -1, 0, 1, 3]).reshape(-1, 1)  # 5x1 matrix: N=5 examples, D=1 feature
y = np.array([-1.2, -0.7, 0.14, 0.67, 1.67]).reshape(-1, 1)  # 5x1 column of labels
```

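```python
def max_likeli_est(X, y):
    """MLE of theta for linear regression, assuming Gaussian noise.

    Solves the normal equations (X^T X) theta = X^T y directly, which is
    numerically preferable to forming the inverse (X^T X)^{-1} explicitly.
    """
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    return theta
```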

```python
max_likeli_est(X, y)
```

Output:

```
array([[0.499]])
```
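As a sanity check, the same estimate can be reached in other ways. This sketch compares the normal-equation solve used in `max_likeli_est` with NumPy's least-squares solver `np.linalg.lstsq` and with the scalar formula $\sum_i x_i y_i / \sum_i x_i^2$, which applies here only because there is a single feature and no intercept term.

```python
import numpy as np

X = np.array([-3, -1, 0, 1, 3], dtype=float).reshape(-1, 1)
y = np.array([-1.2, -0.7, 0.14, 0.67, 1.67]).reshape(-1, 1)

# 1) Normal equations, as in max_likeli_est
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 2) NumPy's least-squares solver; rcond=None uses the machine-precision cutoff
theta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)

# 3) Scalar closed form for one feature, no intercept: sum(x * y) / sum(x^2)
theta_scalar = np.sum(X * y) / np.sum(X ** 2)

print(theta_normal.item(), theta_lstsq.item(), theta_scalar)  # all approximately 0.499
```

`lstsq` does not form $X^\top X$ at all, which makes it the more robust choice when the columns of $X$ are nearly collinear.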

## Maximum A Posteriori (MAP) Estimation