# Estimación máximo-verosímil

## Definición
El **estimador máximo-verosímil** de un vector de parámetros $\;\boldsymbol{\theta}$ con respecto a un conjunto de $N$ datos $\;\mathcal{D}=\{(\boldsymbol{x}_n,\boldsymbol{y}_n)\}$ independientes e idénticamente distribuidos según una fdp (o fp) $\;p(\boldsymbol{y}\mid\boldsymbol{x}, \boldsymbol{\theta})$ es:
$$\begin{align*}
\hat{\boldsymbol{\theta}}_{\text{mle}}%
&=\operatorname*{argmax}\limits_{\boldsymbol{\theta}}\; \operatorname{L}(\boldsymbol{\theta})%
\quad\text{con}\quad%
\operatorname{L}(\boldsymbol{\theta})%
=p(\mathcal{D}\mid\boldsymbol{\theta})%
=\prod_{n=1}^N \;p(\boldsymbol{y}_n\mid\boldsymbol{x}_n, \boldsymbol{\theta})\\%
&=\operatorname*{argmax}\limits_{\boldsymbol{\theta}}\; \operatorname{LL}(\boldsymbol{\theta})%
\quad\text{con}\quad%
\operatorname{LL}(\boldsymbol{\theta})%
=\log \operatorname{L}(\boldsymbol{\theta})%
=\sum_{n=1}^N \;\log p(\boldsymbol{y}_n\mid\boldsymbol{x}_n, \boldsymbol{\theta})\\%
&=\operatorname*{argmin}\limits_{\boldsymbol{\theta}}\; \operatorname{NLL}(\boldsymbol{\theta})%
\quad\text{con}\quad%
\operatorname{NLL}(\boldsymbol{\theta})%
=-\operatorname{LL}(\boldsymbol{\theta})%
=-\sum_{n=1}^N \;\log p(\boldsymbol{y}_n\mid\boldsymbol{x}_n, \boldsymbol{\theta})
\end{align*}$$

## MLE para la Bernoulli

Si $\;p(\boldsymbol{y}_n\mid\boldsymbol{x}_n, \boldsymbol{\theta})%
=p(y_n\mid\theta)=\operatorname{Ber}(\theta)$, entonces 
$\;\hat{\theta}=\dfrac{N_1}{N}$ con $\;N_1=\mathbb{I}(y_n=1)$.

In [22]:
from scipy.stats import bernoulli

t, N = 0.20, 100
Y = bernoulli(t).rvs(N)
ht = Y.mean(axis=0)
print('t = {:.2f} y ht = {:.2f}'.format(t, ht))

t = 0.20 y ht = 0.20


## MLE para la categórica

Si $\;p(\boldsymbol{y}_n\mid\boldsymbol{x}_n, \boldsymbol{\theta})%
=p(y_n\mid\boldsymbol{\theta})=\operatorname{Cat}(\boldsymbol{\theta})$, entonces 
$\;\hat{\theta}_c=\dfrac{N_c}{N}$ con $\;N_c=\mathbb{I}(y_n=c)$.

In [34]:
from scipy.stats import multinomial

t, N = [0.3, 0.2, 0.5], 100
Y = multinomial(N, t).rvs(N)
ht = Y.mean(axis=0) / N
print(t, ht)

[0.3, 0.2, 0.5] [0.3  0.21 0.49]


## MLE para la Gaussiana univariada

Si $\;p(\boldsymbol{y}_n\mid\boldsymbol{x}_n, \boldsymbol{\theta})%
=p(y_n\mid\boldsymbol{\theta})=\mathcal{N}(\mu, \sigma^2)$, entonces 
$\;\hat{\mu}=\bar{y}\;$ y $\;\hat{\sigma}^2=s^2-\bar{y}^2,\;$ donde $\bar{y}$ es la media empírica y $s^2$ es la suma de cuadrados empírica.

In [73]:
import numpy as np
from scipy.stats import norm

m, v, N = 0.0, 1.0, 100
Y = norm(m, v).rvs(N)
hm = Y.mean(axis=0) / N
s2 = np.dot(Y, Y.T) / N
hv = s2 - hm * hm
print('m={:.4f} v={:.4f} hm={:.4f} hv={:.4f}'.format(m, v, hm, hv))

m=0.0000 v=1.0000 hm=0.0011 hv=1.1563
