# Model fitting

**Joint log-likelihood:** $\;\boldsymbol{\theta}'=\{\boldsymbol{\theta}_c\},\,$ $\boldsymbol{\theta}_c=\{\boldsymbol{\theta}_{dc}\}$
$$\begin{align*}
\log p(\mathcal{D}\mid\boldsymbol{\theta})%
&=\sum_c N_c\log\pi_c + \sum_c\sum_{n:y_n=c}\log p(\boldsymbol{x}_n\mid y=c,\boldsymbol{\theta}_c)\\%
&=\sum_c N_c\log\pi_c + \sum_d\sum_c\sum_{n:y_n=c}\log p(x_{nd}\mid y=c,\boldsymbol{\theta}_{dc})%
\end{align*}$$
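
As a minimal sketch of the decomposition above, the following function evaluates the joint log-likelihood for binary features, assuming Bernoulli class-conditionals $p(x_{nd}\mid y=c,\theta_{dc})=\theta_{dc}^{x_{nd}}(1-\theta_{dc})^{1-x_{nd}}$. The function name `bernoulli_joint_loglik`, the 0-based class labels, and the `(C, D)` layout of `theta` are illustrative assumptions, not part of the notation used here.

```python
import numpy as np

def bernoulli_joint_loglik(X, y, pi, theta):
    """Joint log-likelihood log p(D | pi, theta) for binary features.

    X: (N, D) array with entries in {0, 1}
    y: (N,) integer class labels in {0, ..., C-1} (0-based, an assumption)
    pi: (C,) class priors
    theta: (C, D) array with theta[c, d] = p(x_d = 1 | y = c)
    """
    X = np.asarray(X)
    y = np.asarray(y)
    C = len(pi)
    N_c = np.bincount(y, minlength=C)        # number of samples per class
    prior_term = np.sum(N_c * np.log(pi))    # sum_c N_c log pi_c
    feature_term = 0.0
    for c in range(C):
        Xc = X[y == c]                       # samples with y_n = c
        # sum over d and over n with y_n = c of log p(x_nd | y = c, theta_dc)
        feature_term += np.sum(Xc * np.log(theta[c]) + (1 - Xc) * np.log(1 - theta[c]))
    return prior_term + feature_term

# Tiny hypothetical data set: 4 samples, 2 features, 2 classes
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([0, 1, 0, 1])
pi = np.array([0.5, 0.5])
theta = np.array([[0.7, 0.3], [0.2, 0.8]])
ll = bernoulli_joint_loglik(X, y, pi, theta)
print(ll)
```

The loop over classes mirrors the outer sum over $c$; the sums over $d$ and $n$ are carried out by `np.sum` on the per-class block.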

**Maximization over $\boldsymbol{\theta}'$:** $\;$the prior term does not depend on $\boldsymbol{\theta}'$, so $\hat{\boldsymbol{\theta}}'\,$ can be found by maximizing separately over
each $\boldsymbol{\theta}_{dc}$ (possibly subject to constraints)
$$\hat{\boldsymbol{\theta}}'=\operatorname*{argmax}_{\boldsymbol{\theta}'\in\mathcal{C}_{\boldsymbol{\theta}'}}\sum_d\sum_c\sum_{n:y_n=c}\log p(x_{nd}\mid y=c,\boldsymbol{\theta}_{dc})%
\;\Leftrightarrow\;%
\hat{\boldsymbol{\theta}}_{dc}=\operatorname*{argmax}_{\boldsymbol{\theta}_{dc}\in\mathcal{C}_{\boldsymbol{\theta}_{dc}}}\sum_{n:y_n=c}\log p(x_{nd}\mid y=c,\boldsymbol{\theta}_{dc})\quad\text{para todo $d$ y $c$}$$
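
One way to see why the equivalence holds, under the assumption (not stated explicitly above) that the constraint set factorizes as $\mathcal{C}_{\boldsymbol{\theta}'}=\prod_{d,c}\mathcal{C}_{\boldsymbol{\theta}_{dc}}$: the objective is a sum of terms, each depending on a single $\boldsymbol{\theta}_{dc}$, so the maximum of the sum is the sum of the maxima. The symbol $L_{dc}$ is introduced here only as shorthand.
$$\begin{align*}
L_{dc}(\boldsymbol{\theta}_{dc})&=\sum_{n:y_n=c}\log p(x_{nd}\mid y=c,\boldsymbol{\theta}_{dc})\\%
\max_{\boldsymbol{\theta}'\in\mathcal{C}_{\boldsymbol{\theta}'}}\sum_d\sum_c L_{dc}(\boldsymbol{\theta}_{dc})%
&=\sum_d\sum_c\,\max_{\boldsymbol{\theta}_{dc}\in\mathcal{C}_{\boldsymbol{\theta}_{dc}}}L_{dc}(\boldsymbol{\theta}_{dc})%
\end{align*}$$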

## Maximization: Bernoulli
If the features are binary, $x_{nd}\in\{0,1\}$, it is easy to verify that:
$$\hat{\theta}_{dc}=\frac{N_{dc}}{N_c}%
\quad\text{con}\quad%
N_{dc}=\sum_{n:y_n=c}\mathbb{I}(x_{nd}=1)$$
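
A sketch of the verification, assuming $p(x_{nd}\mid y=c,\theta_{dc})=\theta_{dc}^{x_{nd}}(1-\theta_{dc})^{1-x_{nd}}$ and $0<N_{dc}<N_c$ so that the maximum lies in the interior of $(0,1)$: for fixed $d$ and $c$, the objective and its stationarity condition are
$$\begin{align*}
L_{dc}(\theta_{dc})&=N_{dc}\log\theta_{dc}+(N_c-N_{dc})\log(1-\theta_{dc})\\%
\frac{\partial L_{dc}}{\partial\theta_{dc}}&=\frac{N_{dc}}{\theta_{dc}}-\frac{N_c-N_{dc}}{1-\theta_{dc}}=0%
\quad\Rightarrow\quad\hat{\theta}_{dc}=\frac{N_{dc}}{N_c}%
\end{align*}$$
Since $L_{dc}$ is concave in $\theta_{dc}$, this stationary point is the maximum.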

**Example:** $\;C=2$, $\;\pi_1=\pi_2=0.5$, $\;D=2$,
$\;\boldsymbol{\theta}=[\boldsymbol{\theta}_1;\boldsymbol{\theta}_2]$, $\;\boldsymbol{\theta}_1=(0.7, 0.3)^t$, $\;\boldsymbol{\theta}_2=(0.2, 0.8)^t$

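```python
import numpy as np
from scipy.stats import multinomial, bernoulli

N = 100  # N >= 2 so that each class has at least one sample
# Draw N - 2 one-hot class labels; the remaining two samples go one to each class
yy = multinomial(1, [0.5, 0.5]).rvs(N - 2)
N1 = yy[yy[:, 0] == 1].shape[0] + 1  # class-1 count, plus its guaranteed sample
# Class-1 data: two independent Bernoulli features stacked as columns
xxy1 = np.hstack((np.vstack(bernoulli(0.7).rvs(N1)), np.vstack(bernoulli(0.3).rvs(N1))))
theta1 = xxy1.mean(axis=0)  # column means equal N_dc / N_c for c = 1
N2 = N - N1
# Class-2 data
xxy2 = np.hstack((np.vstack(bernoulli(0.2).rvs(N2)), np.vstack(bernoulli(0.8).rvs(N2))))
theta2 = xxy2.mean(axis=0)  # column means equal N_dc / N_c for c = 2
print("theta1: ", theta1, " theta2: ", theta2)
```

```
theta1:  [0.63157895 0.33333333]  theta2:  [0.25581395 0.8372093 ]
```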
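
When the data arrive as a single labeled matrix rather than being generated class by class, the same estimate can be computed by counting. The following is a minimal sketch; the function name `fit_bernoulli`, the 0-based labels, and the `(C, D)` output layout are illustrative assumptions, and every class is assumed to have at least one sample.

```python
import numpy as np

def fit_bernoulli(X, y, C):
    """Return theta_hat of shape (C, D) with theta_hat[c, d] = N_dc / N_c."""
    X = np.asarray(X)
    y = np.asarray(y)
    N_c = np.bincount(y, minlength=C)                            # samples per class
    # N_dc: number of class-c samples with x_nd = 1, one row per class
    N_dc = np.stack([X[y == c].sum(axis=0) for c in range(C)])
    return N_dc / N_c[:, None]

# Hypothetical labeled data: 5 samples, 2 binary features, 2 classes
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1], [0, 1]])
y = np.array([0, 0, 0, 1, 1])
theta_hat = fit_bernoulli(X, y, C=2)
print(theta_hat)
```

Note that an estimate can be exactly 0 or 1 when a feature value never (or always) occurs within a class, as happens for the second class in this small data set.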


## Maximization: categorical
If the features are categorical, $\;x_{nd}\in\{1,\dotsc,K\}$, it is easy to verify that:
$$\hat{\theta}_{dck}=\frac{N_{dck}}{N_c}%
\quad\text{con}\quad%
N_{dck}=\sum_{n:y_n=c}\mathbb{I}(x_{nd}=k)$$
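
A sketch of the verification, assuming $p(x_{nd}=k\mid y=c,\boldsymbol{\theta}_{dc})=\theta_{dck}$ with the constraint $\sum_k\theta_{dck}=1$: for fixed $d$ and $c$, introduce a Lagrange multiplier $\lambda$ (a symbol used only in this derivation) and set the partial derivatives to zero.
$$\begin{align*}
\mathcal{L}(\boldsymbol{\theta}_{dc},\lambda)&=\sum_k N_{dck}\log\theta_{dck}+\lambda\Bigl(1-\sum_k\theta_{dck}\Bigr)\\%
\frac{\partial\mathcal{L}}{\partial\theta_{dck}}&=\frac{N_{dck}}{\theta_{dck}}-\lambda=0%
\quad\Rightarrow\quad\theta_{dck}=\frac{N_{dck}}{\lambda}\\%
\sum_k\theta_{dck}=1&\quad\Rightarrow\quad\lambda=\sum_k N_{dck}=N_c%
\quad\Rightarrow\quad\hat{\theta}_{dck}=\frac{N_{dck}}{N_c}%
\end{align*}$$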


**Example:** $\;C=2$, $\;\pi_1=\pi_2=0.5$, $\;D=2$, $\;K_1=K_2=3$
$$\begin{align*}
\boldsymbol{\theta}_{11}&=(0.6, 0.2, 0.2)^t & \boldsymbol{\theta}_{12}&=(0.3, 0.4, 0.3)^t\\
\boldsymbol{\theta}_{21}&=(0.1, 0.3, 0.6)^t & \boldsymbol{\theta}_{22}&=(0.3, 0.2, 0.5)^t%
\end{align*}$$

```python
import numpy as np
from scipy.stats import multinomial

N = 100  # N >= 2 so that each class has at least one sample
# Draw N - 2 one-hot class labels; the remaining two samples go one to each class
yy = multinomial(1, [0.5, 0.5]).rvs(N - 2)
N1 = yy[yy[:, 0] == 1].shape[0] + 1
N2 = N - N1
# thetadc: estimate for feature d in class c. Each sample is a one-hot row,
# so the column means equal N_dck / N_c.
theta11 = multinomial(1, [0.6, 0.2, 0.2]).rvs(N1).mean(axis=0)
theta21 = multinomial(1, [0.1, 0.3, 0.6]).rvs(N1).mean(axis=0)
theta12 = multinomial(1, [0.3, 0.4, 0.3]).rvs(N2).mean(axis=0)
theta22 = multinomial(1, [0.3, 0.2, 0.5]).rvs(N2).mean(axis=0)
print("theta11: ", theta11, " theta12: ", theta12)
print("theta21: ", theta21, " theta22: ", theta22)
```

```
theta11:  [0.63461538 0.09615385 0.26923077]  theta12:  [0.35416667 0.41666667 0.22916667]
theta21:  [0.13461538 0.26923077 0.59615385]  theta22:  [0.41666667 0.25       0.33333333]
```
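
If the categorical features are stored as integer codes instead of one-hot rows, the counts $N_{dck}$ can be obtained with `np.bincount`. This is a minimal sketch: the function name `fit_categorical`, the 0-based codes $\{0,\dotsc,K-1\}$ (rather than $\{1,\dotsc,K\}$), the 0-based labels, and the `(C, D, K)` output layout are all illustrative assumptions.

```python
import numpy as np

def fit_categorical(X, y, C, K):
    """Return theta_hat of shape (C, D, K) with theta_hat[c, d, k] = N_dck / N_c."""
    X = np.asarray(X)
    y = np.asarray(y)
    D = X.shape[1]
    theta_hat = np.zeros((C, D, K))
    for c in range(C):
        Xc = X[y == c]                      # samples with y_n = c (assumed non-empty)
        for d in range(D):
            # bincount gives N_dck for k = 0, ..., K-1; dividing by N_c normalizes
            theta_hat[c, d] = np.bincount(Xc[:, d], minlength=K) / len(Xc)
    return theta_hat

# Hypothetical labeled data: 4 samples, 2 features with K = 3 categories, 2 classes
X = np.array([[0, 2], [0, 1], [1, 2], [2, 0]])
y = np.array([0, 0, 0, 1])
theta_hat = fit_categorical(X, y, C=2, K=3)
print(theta_hat[0])   # estimates for the first class, one row per feature
```

Each row `theta_hat[c, d]` sums to one by construction, which is the constraint the estimate must satisfy.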


## Maximization: Gaussian
If the features are real-valued and normally distributed, $\;x_{nd}\in\mathbb{R}$, it is easy to verify that:
$$\begin{align*}
\hat{\mu}_{dc}&=\frac{1}{N_c}\sum_{n:y_n=c}x_{nd}\\%
\hat{\sigma}_{dc}^2&=\frac{1}{N_c}\sum_{n:y_n=c}(x_{nd}-\hat{\mu}_{dc})^2%
\end{align*}$$
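
A sketch of the verification, assuming $p(x_{nd}\mid y=c,\boldsymbol{\theta}_{dc})=\mathcal{N}(x_{nd}\mid\mu_{dc},\sigma_{dc}^2)$: for fixed $d$ and $c$, write the objective and set both partial derivatives to zero.
$$\begin{align*}
L_{dc}(\mu_{dc},\sigma_{dc}^2)&=-\frac{N_c}{2}\log(2\pi\sigma_{dc}^2)-\frac{1}{2\sigma_{dc}^2}\sum_{n:y_n=c}(x_{nd}-\mu_{dc})^2\\%
\frac{\partial L_{dc}}{\partial\mu_{dc}}&=\frac{1}{\sigma_{dc}^2}\sum_{n:y_n=c}(x_{nd}-\mu_{dc})=0%
\quad\Rightarrow\quad\hat{\mu}_{dc}=\frac{1}{N_c}\sum_{n:y_n=c}x_{nd}\\%
\frac{\partial L_{dc}}{\partial\sigma_{dc}^2}&=-\frac{N_c}{2\sigma_{dc}^2}+\frac{1}{2\sigma_{dc}^4}\sum_{n:y_n=c}(x_{nd}-\hat{\mu}_{dc})^2=0%
\quad\Rightarrow\quad\hat{\sigma}_{dc}^2=\frac{1}{N_c}\sum_{n:y_n=c}(x_{nd}-\hat{\mu}_{dc})^2%
\end{align*}$$
The variance estimate divides by $N_c$ rather than $N_c-1$, so it is the biased maximum-likelihood estimator; this is what NumPy's `np.var` computes with its default `ddof=0`.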

**Example:** $\;C=2$, $\;\pi_1=\pi_2=0.5$, $\;D=2$,
$\;\boldsymbol{\theta}=[\boldsymbol{\theta}_1;\boldsymbol{\theta}_2]$
$$\begin{align*}
\boldsymbol{\theta}_1&=(\boldsymbol{\mu}_1,\mathbf{\Sigma}_1)%
&\boldsymbol{\mu}_1&=(\mu_{11},\mu_{21})^t=(-2,0)^t%
&\mathbf{\Sigma}_1&=\operatorname{diag}(\sigma_{11}^2, \sigma_{21}^2)=\mathbf{I}_2\\%
\boldsymbol{\theta}_2&=(\boldsymbol{\mu}_2,\mathbf{\Sigma}_2)%
&\boldsymbol{\mu}_2&=(\mu_{12},\mu_{22})^t=(2,0)^t%
&\mathbf{\Sigma}_2&=\operatorname{diag}(\sigma_{12}^2, \sigma_{22}^2)=\mathbf{I}_2%
\end{align*}$$

```python
import numpy as np
from scipy.stats import multinomial, multivariate_normal

N = 100  # N >= 2 so that each class has at least one sample
# Draw N - 2 one-hot class labels; the remaining two samples go one to each class
yy = multinomial(1, [0.5, 0.5]).rvs(N - 2)
N1 = yy[yy[:, 0] == 1].shape[0] + 1
N2 = N - N1
# Class-1 data: diagonal covariance, so the features are independent
xxy1 = multivariate_normal([-2, 0], np.eye(2)).rvs(N1)
mu1 = xxy1.mean(axis=0)
Sigma1 = np.var(xxy1, axis=0)  # per-feature ML variances (ddof=0), i.e. the diagonal of Sigma_1
# Class-2 data
xxy2 = multivariate_normal([2, 0], np.eye(2)).rvs(N2)
mu2 = xxy2.mean(axis=0)
Sigma2 = np.var(xxy2, axis=0)  # diagonal of Sigma_2
print("mu1: ", mu1, " Sigma1: ", Sigma1)
print("mu2: ", mu2, " Sigma2: ", Sigma2)
```

```
mu1:  [-2.04576436  0.18229609]  Sigma1:  [0.89499539 1.12635571]
mu2:  [ 2.0795843  -0.18044901]  Sigma2:  [1.00070955 1.10808814]
```
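
For data given as a single labeled matrix, the per-class means and variances can be computed by masking on the label. This is a minimal sketch; the function name `fit_gaussian`, the 0-based labels, and the `(C, D)` output layout are illustrative assumptions, and every class is assumed to have at least one sample.

```python
import numpy as np

def fit_gaussian(X, y, C):
    """Return (mu_hat, var_hat), each of shape (C, D), using ML estimates."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu_hat = np.stack([X[y == c].mean(axis=0) for c in range(C)])
    # ndarray.var uses ddof=0 by default, i.e. it divides by N_c
    var_hat = np.stack([X[y == c].var(axis=0) for c in range(C)])
    return mu_hat, var_hat

# Hypothetical labeled data: 4 samples, 2 real features, 2 classes
X = np.array([[-2.0, 0.0], [-1.0, 2.0], [2.0, 1.0], [4.0, -1.0]])
y = np.array([0, 0, 1, 1])
mu_hat, var_hat = fit_gaussian(X, y, C=2)
print(mu_hat)
print(var_hat)
```

Passing `ddof=1` to `var` would give the unbiased sample variance instead, which differs from the maximum-likelihood estimate by a factor of $N_c/(N_c-1)$.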
