# Maximum likelihood estimation, naive Bayes

## Estimating the parameters of distributions

We’re moving now from probability to statistics. The basic question: given some data

$x^{(1)},\ldots,x^{(m)}$,

how do I find a distribution that captures this data “well”?

In general (if we can pick from the space of all distributions), this is a hard question, 
but if we pick from a particular parameterized family of distributions $p(X;\theta)$, the 
question is (at least a little bit) easier.

The question becomes: how do I find parameters $\theta$ of this distribution that fit the data?

## Maximum likelihood estimation

Given a distribution $p(X;\theta)$ and a collection of observed (independent) data points 
$x^{(1)},\ldots,x^{(m)}$, the probability of observing this data is simply the product of the individual probabilities:

$$p(x^{(1)},\ldots,x^{(m)};\theta) = \prod_{i=1}^m p(x^{(i)};\theta)$$
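To make the product formula concrete, here is a minimal sketch that evaluates it for a Bernoulli (coin flip) model under two candidate parameter values. The data `flips` and the helper `bernoulli_likelihood` are hypothetical, made up for illustration rather than taken from the lecture.

```python
import numpy as np

# Hypothetical data: 10 independent coin flips, 1 = heads, 0 = tails (7 heads).
flips = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

def bernoulli_likelihood(theta, x):
    """Product over i of p(x_i; theta), with p(1; theta) = theta and p(0; theta) = 1 - theta."""
    per_point = np.where(x == 1, theta, 1.0 - theta)
    return np.prod(per_point)

lik_fair = bernoulli_likelihood(0.5, flips)    # theta = 0.5
lik_biased = bernoulli_likelihood(0.7, flips)  # theta = 0.7

print(f"likelihood with theta=0.5: {lik_fair:.6f}")
print(f"likelihood with theta=0.7: {lik_biased:.6f}")
```

With 7 heads out of 10, the data are more probable under $\theta = 0.7$ than under $\theta = 0.5$, which is the kind of comparison that parameter fitting formalizes.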

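Basic idea of maximum likelihood estimation (MLE): find the parameters that 
maximize the probability of the observed data. Because the logarithm is monotonically increasing 
(and scaling by $\frac{1}{m}$ does not move the maximizer), this is equivalent to maximizing the average log probability:

$$
\operatorname*{maximize}_\theta \; \prod_{i=1}^m p(x^{(i)};\theta) \;\;\Longleftrightarrow\;\; \operatorname*{maximize}_\theta \; \ell(\theta), \qquad \ell(\theta) = \frac{1}{m} \sum_{i=1}^m \log p(x^{(i)};\theta)
$$

where $\ell(\theta)$ is called the log likelihood of the data.
This seems “obvious”, but there are many other ways of fitting parameters.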
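The sketch below illustrates why the log form is preferred in practice. It assumes a synthetic normal sample (the sample size, seed, and grid are arbitrary choices for illustration). The raw product of densities underflows to zero in floating point, while the average log likelihood stays finite and can still be maximized, here by a simple grid search over $\mu$ with $\sigma$ held fixed.

```python
import numpy as np
from scipy import stats

# Assumed synthetic data: 2000 draws from a normal with mean 5 and std 2.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=2000)

# Raw likelihood: a product of 2000 numbers that are each below 0.2.
likelihood = np.prod(stats.norm.pdf(x, loc=5.0, scale=2.0))

# Average log likelihood: a mean of log densities, numerically well behaved.
avg_log_likelihood = np.mean(stats.norm.logpdf(x, loc=5.0, scale=2.0))

print(likelihood)          # underflows to 0.0
print(avg_log_likelihood)  # a finite negative number

# Grid search over mu (sigma fixed at 2) using the average log likelihood.
mu_grid = np.linspace(3.0, 7.0, 401)  # spacing of 0.01
avg_ll = np.array([np.mean(stats.norm.logpdf(x, loc=mu, scale=2.0)) for mu in mu_grid])
mu_best = mu_grid[np.argmax(avg_ll)]

print(mu_best, x.mean())   # the grid maximizer is the grid point closest to the sample mean
```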

In [3]:
import numpy as np
from scipy import stats
from scipy.optimize import minimize

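np.random.seed(100)
# Synthetic sample (mean=5, std=2); defined here but not used in the fit below.
data = np.random.normal(loc=5, scale=2, size=1000)

# Small observed sample that the normal distribution is fitted to.
data_new = [190, 172, 170, 182, 184]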

def neg_log_likelihood(params, data):
    mu, sigma = params[0], params[1]
    # Negative log likelihood for normal distribution
    nll = -np.sum(stats.norm.logpdf(data, loc=mu, scale=sigma))
    return nll

initial_guess = [0, 1]  # initial guess for (mu, sigma)
result = minimize(
    neg_log_likelihood,
    initial_guess,
    args=(data_new,),
    bounds=[(None, None), (1e-6, None)],  # mu unbounded, sigma > 0
)

mu_mle, sigma_mle = result.x
print(f"Estimated mean (mu): {mu_mle}")
print(f"Estimated standard deviation (sigma): {sigma_mle}")

Estimated mean (mu): 179.59999879116265
Estimated standard deviation (sigma): 7.525952115737213
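For the normal distribution the maximizer is also available in closed form: $\hat\mu = \frac{1}{m}\sum_{i=1}^m x^{(i)}$ and $\hat\sigma^2 = \frac{1}{m}\sum_{i=1}^m (x^{(i)} - \hat\mu)^2$ (note the division by $m$, not $m-1$). As a sanity check on the numerical result, a short sketch that recomputes both estimates directly from the same five data points:

```python
import numpy as np

# Same five observations as in the cell above.
data_new = np.array([190, 172, 170, 182, 184], dtype=float)

# Closed-form maximum likelihood estimates for a normal distribution.
mu_hat = data_new.mean()
sigma_hat = np.sqrt(np.mean((data_new - mu_hat) ** 2))  # equals data_new.std(ddof=0)

print(f"Closed-form mean (mu): {mu_hat}")
print(f"Closed-form standard deviation (sigma): {sigma_hat}")
```

These should agree with the optimizer's estimates above up to its convergence tolerance (about 179.6 and 7.526).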


## Naive Bayes classifier

In [4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
print("Number of mislabeled points out of a total %d points : %d"
      % (X_test.shape[0], (y_test != y_pred).sum()))

Number of mislabeled points out of a total 75 points : 4


In [7]:
from sklearn.naive_bayes import GaussianNB

# Toy training set: seven 2-D points with binary labels.
X = [[0, 0],
     [0, 1],
     [0, 2],
     [2, 1],
     [2, 0],
     [1, 2],
     [1, 0]]

y = [1, 0, 1, 0, 1, 0, 1]

gnb = GaussianNB()
y_pred = gnb.fit(X, y).predict([[0, 2]])
print(y_pred)

[0]
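To connect this back to maximum likelihood, here is a minimal sketch of what a Gaussian naive Bayes prediction amounts to on the toy data above. For each class, estimate the prior as the class frequency and fit an independent normal to each feature using the MLE mean and standard deviation, then pick the class with the largest log prior plus summed log densities. This is a hand-rolled illustration, not scikit-learn's implementation; in particular it omits the small variance smoothing term that `GaussianNB` adds for numerical stability, and the names `x_query` and `log_scores` are made up here.

```python
import numpy as np
from scipy import stats

# Same toy data as in the cell above.
X = np.array([[0, 0], [0, 1], [0, 2], [2, 1], [2, 0], [1, 2], [1, 0]], dtype=float)
y = np.array([1, 0, 1, 0, 1, 0, 1])
x_query = np.array([0.0, 2.0])  # the point being classified

log_scores = {}
for c in np.unique(y):
    Xc = X[y == c]
    prior = len(Xc) / len(X)   # MLE of the class prior
    mu = Xc.mean(axis=0)       # per-feature MLE mean
    sigma = Xc.std(axis=0)     # per-feature MLE std (ddof=0)
    # Naive Bayes assumption: features are independent given the class,
    # so the per-feature log densities simply add up.
    log_scores[int(c)] = np.log(prior) + stats.norm.logpdf(x_query, loc=mu, scale=sigma).sum()

prediction = max(log_scores, key=log_scores.get)
print(log_scores)
print(prediction)
```

The query point `[0, 2]` appears in the training set with label 1, yet the fitted model assigns it to class 0: under the per-class Gaussians, a second feature value of 2 is much more typical of class 0 than of class 1.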
