# 9.3.2 Model fitting


**Joint log-likelihood:** $\;\boldsymbol{\theta}=(\boldsymbol{\pi},\boldsymbol{\theta}'),\,$ $\boldsymbol{\theta}'=\{\boldsymbol{\theta}_c\},\,$ $\boldsymbol{\theta}_c=\{\boldsymbol{\theta}_{cd}\}$
$$\begin{align*}
\log p(\mathcal{D}\mid\boldsymbol{\theta})%
&=\sum_c N_c\log\pi_c + \sum_c\sum_{n:y_n=c}\log p(\boldsymbol{x}_n\mid y=c,\boldsymbol{\theta}_c)\\%
&=\sum_c N_c\log\pi_c + \sum_d\sum_c\sum_{n:y_n=c}\log p(x_{nd}\mid y=c,\boldsymbol{\theta}_{cd})%
\end{align*}$$
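
As a rough numerical illustration of this decomposition, the sketch below evaluates the joint log-likelihood with the prior term and the per-feature terms computed separately, and compares the result with a direct sample-by-sample sum. Bernoulli class-conditionals are assumed only for the sake of the example; the function name `joint_log_likelihood`, the 0-based class labels and the synthetic data are hypothetical choices, not part of the original text.

```python
import numpy as np

def joint_log_likelihood(X, y, pi, theta):
    """Joint log-likelihood of a naive Bayes model with Bernoulli features.

    X: (N, D) binary matrix, y: (N,) labels in {0, ..., C-1} (0-based, an assumption),
    pi: (C,) class priors, theta: (C, D) with theta[c, d] = p(x_d = 1 | y = c).
    """
    C = len(pi)
    N_c = np.bincount(y, minlength=C)        # class counts N_c
    prior_term = np.sum(N_c * np.log(pi))    # sum_c N_c log pi_c
    # log p(x_nd | y_n, theta_{y_n d}) for every sample n and feature d
    log_px = X * np.log(theta[y]) + (1 - X) * np.log(1 - theta[y])
    return prior_term + log_px.sum()         # sum over d, c and n: y_n = c

# Synthetic data (hypothetical), drawn from the model itself
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
theta = np.array([[0.7, 0.3], [0.2, 0.8]])
y = rng.choice(2, size=200, p=pi)
X = (rng.random((200, 2)) < theta[y]).astype(int)

ll = joint_log_likelihood(X, y, pi, theta)
# Direct evaluation, one sample at a time, for comparison
ll_direct = sum(
    np.log(pi[c]) + np.sum(x * np.log(theta[c]) + (1 - x) * np.log(1 - theta[c]))
    for x, c in zip(X, y)
)
print(ll, ll_direct)
```

Both values should agree up to floating-point error, since the two expressions differ only in the order of summation.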

**Maximization over $\boldsymbol{\theta}'$:** $\;\hat{\boldsymbol{\theta}}'\,$ can be found by maximizing separately over
each $\boldsymbol{\theta}_{cd}$ (possibly subject to constraints), because each $\boldsymbol{\theta}_{cd}$ appears in only one term of the sum
$$\hat{\boldsymbol{\theta}}'=\operatorname*{argmax}_{\boldsymbol{\theta}'\in\mathcal{C}_{\boldsymbol{\theta}'}}\sum_d\sum_c\sum_{n:y_n=c}\log p(x_{nd}\mid y=c,\boldsymbol{\theta}_{cd})%
\;\Leftrightarrow\;%
\hat{\boldsymbol{\theta}}_{cd}=\operatorname*{argmax}_{\boldsymbol{\theta}_{cd}\in\mathcal{C}_{\boldsymbol{\theta}_{cd}}}\sum_{n:y_n=c}\log p(x_{nd}\mid y=c,\boldsymbol{\theta}_{cd})\quad\text{for all $c$ and $d$}$$
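
The separate maximization can be checked numerically. The following is a minimal sketch, assuming Bernoulli class-conditionals with the constraint $\theta_{cd}\in(0,1)$: each $(c,d)$ term is maximized on its own with `scipy.optimize.minimize_scalar` and the result is compared with the per-class sample mean. The synthetic data, the 0-based indices and the variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Synthetic data (hypothetical): theta_true[c, d] = p(x_d = 1 | y = c), 0-based indices
rng = np.random.default_rng(1)
theta_true = np.array([[0.7, 0.3], [0.2, 0.8]])
y = rng.choice(2, size=500)
X = (rng.random((500, 2)) < theta_true[y]).astype(int)

C, D = theta_true.shape
theta_num = np.empty((C, D))
for c in range(C):
    Xc = X[y == c]                      # samples of class c
    for d in range(D):
        n1 = Xc[:, d].sum()             # number of ones in feature d
        n0 = len(Xc) - n1               # number of zeros in feature d
        # Negative of the (c, d) term of the log-likelihood
        neg_ll = lambda t: -(n1 * np.log(t) + n0 * np.log(1 - t))
        # The bounds play the role of the constraint set for theta_cd
        res = minimize_scalar(neg_ll, bounds=(1e-6, 1 - 1e-6), method="bounded")
        theta_num[c, d] = res.x

# Per-class sample means, for comparison with the numerical maximizers
theta_mean = np.array([X[y == c].mean(axis=0) for c in range(C)])
print(theta_num, theta_mean, sep="\n")
```

Each one-dimensional problem only touches the samples of one class and one feature, which is what makes the joint maximization cheap.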

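## Maximization: Bernoulli
If the features are binary, $x_{nd}\in\{0,1\}$, it is easy to verify that the estimate is the fraction of class-$c$ samples with $x_{nd}=1$, where $N_c$ is the number of samples of class $c$:
$$\hat{\theta}_{cd}=\frac{N_{cd}}{N_c}%
\quad\text{with}\quad%
N_{cd}=\sum_{n:y_n=c}\mathbb{I}(x_{nd}=1)$$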
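
A sketch of the verification, assuming $0<N_{cd}<N_c$ so that the maximizer lies in the interior of $[0,1]$: with $p(x_{nd}\mid y=c,\theta_{cd})=\theta_{cd}^{x_{nd}}(1-\theta_{cd})^{1-x_{nd}}$, the $(c,d)$ term of the log-likelihood is
$$\sum_{n:y_n=c}\log p(x_{nd}\mid y=c,\theta_{cd})=N_{cd}\log\theta_{cd}+(N_c-N_{cd})\log(1-\theta_{cd})$$
and setting its derivative with respect to $\theta_{cd}$ to zero gives
$$\frac{N_{cd}}{\theta_{cd}}-\frac{N_c-N_{cd}}{1-\theta_{cd}}=0%
\;\Rightarrow\;%
\hat{\theta}_{cd}=\frac{N_{cd}}{N_c}$$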

**Example:** $\;C=2,\,$ $\pi_1=\pi_2=0.5,\,$ $D=2,\,$ $\boldsymbol{\theta}_1=(0.7, 0.3)^t,\,$ $\boldsymbol{\theta}_2=(0.2, 0.8)^t$

In [1]:
import numpy as np; np.set_printoptions(precision=2); from scipy.stats import multinomial, bernoulli
N = 100 # >= 2 so that each class gets at least one sample
# draw N - 2 one-hot class labels from the prior
y = multinomial(1, [0.5, 0.5]).rvs(N - 2, random_state=23)
# per-class counts plus one, so that N1 + N2 = N and both are >= 1
N1, N2 = np.bincount(y[:, 0], minlength=2) + 1
# one column per feature, one row per sample
x1 = np.array([bernoulli(0.7).rvs(N1, random_state=23), bernoulli(0.3).rvs(N1, random_state=7)]).T
x2 = np.array([bernoulli(0.2).rvs(N2, random_state=23), bernoulli(0.8).rvs(N2, random_state=7)]).T
# the mean of a binary column is N_cd / N_c
t1 = x1.mean(axis=0); t2 = x2.mean(axis=0); print(f'theta1 = {t1} theta2 = {t2}')

theta1 = [0.71 0.27] theta2 = [0.21 0.83]


## Maximization: categorical
If the features are categorical, $\;x_{nd}\in\{1,\dotsc,K\}$, it is easy to verify that the estimate is the fraction of class-$c$ samples with $x_{nd}=k$:
$$\hat{\theta}_{cdk}=\frac{N_{cdk}}{N_c}%
\quad\text{with}\quad%
N_{cdk}=\sum_{n:y_n=c}\mathbb{I}(x_{nd}=k)$$
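
A sketch of the verification: the $(c,d)$ term of the log-likelihood is $\sum_k N_{cdk}\log\theta_{cdk}$, to be maximized subject to $\sum_k\theta_{cdk}=1$. Introducing a Lagrange multiplier $\lambda$,
$$\mathcal{L}(\boldsymbol{\theta}_{cd},\lambda)=\sum_k N_{cdk}\log\theta_{cdk}+\lambda\Bigl(1-\sum_k\theta_{cdk}\Bigr)%
\qquad%
\frac{\partial\mathcal{L}}{\partial\theta_{cdk}}=\frac{N_{cdk}}{\theta_{cdk}}-\lambda=0%
\;\Rightarrow\;%
\theta_{cdk}=\frac{N_{cdk}}{\lambda}$$
and summing over $k$ with the constraint gives $\lambda=\sum_k N_{cdk}=N_c$, hence $\hat{\theta}_{cdk}=N_{cdk}/N_c$.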

**Example:** $\;C=2$, $\;\pi_1=\pi_2=0.5$, $\;D=2$, $\;K_1=K_2=3$
$$\begin{align*}
\boldsymbol{\theta}_{11}&=(0.6, 0.2, 0.2)^t & \boldsymbol{\theta}_{12}&=(0.1, 0.3, 0.6)^t\\
\boldsymbol{\theta}_{21}&=(0.3, 0.4, 0.3)^t & \boldsymbol{\theta}_{22}&=(0.3, 0.2, 0.5)^t%
\end{align*}$$

In [2]:
import numpy as np; np.set_printoptions(precision=2); from scipy.stats import multinomial
N = 100 # >= 2 so that each class gets at least one sample
# draw N - 2 one-hot class labels from the prior
y = multinomial(1, [0.5, 0.5]).rvs(N - 2, random_state=23)
# per-class counts plus one, so that N1 + N2 = N and both are >= 1
N1, N2 = np.bincount(y[:, 0], minlength=2) + 1
# the mean of the one-hot samples gives the relative frequencies N_cdk / N_c
t11 = multinomial(1, [0.6, 0.2, 0.2]).rvs(N1, random_state=23).mean(axis=0)
t12 = multinomial(1, [0.1, 0.3, 0.6]).rvs(N1, random_state=23).mean(axis=0)
t21 = multinomial(1, [0.3, 0.4, 0.3]).rvs(N2, random_state=23).mean(axis=0)
t22 = multinomial(1, [0.3, 0.2, 0.5]).rvs(N2, random_state=23).mean(axis=0)
print(f'theta11 = {t11} theta12 = {t12}\ntheta21 = {t21} theta22 = {t22}')

theta11 = [0.65 0.17 0.17] theta12 = [0.12 0.21 0.67]
theta21 = [0.31 0.38 0.31] theta22 = [0.31 0.21 0.48]
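
When the data come as a labeled matrix rather than as separate per-class samples, the same counts can be accumulated directly. The following is a minimal sketch under stated assumptions: the function name `fit_categorical_nb`, the 0-based coding of classes and categories, and the synthetic data are all hypothetical, and every class is assumed to have at least one sample.

```python
import numpy as np

def fit_categorical_nb(X, y, C, K):
    """Return theta[c, d, k] = N_cdk / N_c for integer-coded features.

    X: (N, D) array with entries in {0, ..., K-1} (0-based, an assumption);
    y: (N,) labels in {0, ..., C-1}. Every class must be non-empty.
    """
    D = X.shape[1]
    theta = np.zeros((C, D, K))
    for c in range(C):
        Xc = X[y == c]                                   # samples of class c
        for d in range(D):
            counts = np.bincount(Xc[:, d], minlength=K)  # N_cdk for each k
            theta[c, d] = counts / len(Xc)               # divide by N_c
    return theta

# Synthetic data (hypothetical) using the parameters of the example above
rng = np.random.default_rng(0)
theta_true = np.array([[[0.6, 0.2, 0.2], [0.1, 0.3, 0.6]],
                       [[0.3, 0.4, 0.3], [0.3, 0.2, 0.5]]])  # theta_true[c, d, k]
y = rng.choice(2, size=2000)
X = np.array([[rng.choice(3, p=theta_true[c, d]) for d in range(2)] for c in y])

theta_hat = fit_categorical_nb(X, y, C=2, K=3)
print(theta_hat)
```

By construction each `theta_hat[c, d]` sums to one, and with more samples the estimates approach the true parameters.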


## Maximization: Gaussian
If the features are real-valued, $\;x_{nd}\in\mathbb{R}$, and modeled as normally distributed within each class, it is easy to verify that:
$$\hat{\mu}_{cd}=\frac{1}{N_c}\sum_{n:y_n=c}x_{nd}\qquad\text{and}\qquad%
\hat{\sigma}_{cd}^2=\frac{1}{N_c}\sum_{n:y_n=c}(x_{nd}-\hat{\mu}_{cd})^2$$

**Example:** $\;C=2,\,$ $\pi_1=\pi_2=0.5,\,$ $D=2$
$$\begin{align*}
\boldsymbol{\theta}_1&=(\boldsymbol{\mu}_1,\mathbf{\Sigma}_1)%
&\boldsymbol{\mu}_1&=(\mu_{11},\mu_{12})^t=(-2,0)^t%
&\mathbf{\Sigma}_1&=\operatorname{diag}(\sigma_{11}^2, \sigma_{12}^2)=\mathbf{I}_2\\%
\boldsymbol{\theta}_2&=(\boldsymbol{\mu}_2,\mathbf{\Sigma}_2)%
&\boldsymbol{\mu}_2&=(\mu_{21},\mu_{22})^t=(2,0)^t%
&\mathbf{\Sigma}_2&=\operatorname{diag}(\sigma_{21}^2, \sigma_{22}^2)=\mathbf{I}_2%
\end{align*}$$

In [3]:
import numpy as np; np.set_printoptions(precision=2); from scipy.stats import multinomial, multivariate_normal
N = 100 # >= 2 so that each class gets at least one sample
# draw N - 2 one-hot class labels from the prior
y = multinomial(1, [0.5, 0.5]).rvs(N - 2, random_state=23)
# per-class counts plus one, so that N1 + N2 = N and both are >= 1
N1, N2 = np.bincount(y[:, 0], minlength=2) + 1
# np.var divides by N_c by default (ddof=0), which matches the estimate above
# no random_state is passed to rvs here, so the printed values change from run to run
x1 = multivariate_normal([-2, 0], np.eye(2)).rvs(N1); m1 = x1.mean(axis=0); S1 = np.var(x1, axis=0)
x2 = multivariate_normal([ 2, 0], np.eye(2)).rvs(N2); m2 = x2.mean(axis=0); S2 = np.var(x2, axis=0)
print(f'mu1 = {m1} Sigma1 = {S1}\nmu2 = {m2} Sigma2 = {S2}')

mu1 = [-1.92  0.22] Sigma1 = [0.84 1.23]
mu2 = [1.98 0.01] Sigma2 = [0.88 1.13]
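
The same estimates can be obtained from a single labeled data matrix. The following is a minimal sketch, assuming 0-based class labels and at least one sample per class; the function name `fit_gaussian_nb` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_gaussian_nb(X, y, C):
    """Per-class, per-feature estimates of mean and variance (diagonal covariance).

    X: (N, D) real-valued matrix; y: (N,) labels in {0, ..., C-1} (0-based, an assumption).
    """
    mu = np.array([X[y == c].mean(axis=0) for c in range(C)])
    # ddof=0 divides by N_c, as in the formula above (not by N_c - 1)
    var = np.array([X[y == c].var(axis=0, ddof=0) for c in range(C)])
    return mu, var

# Synthetic data (hypothetical) using the parameters of the example above
rng = np.random.default_rng(0)
mu_true = np.array([[-2.0, 0.0], [2.0, 0.0]])
y = rng.choice(2, size=2000)
X = mu_true[y] + rng.standard_normal((2000, 2))  # unit variance in every dimension

mu_hat, var_hat = fit_gaussian_nb(X, y, C=2)
print(mu_hat, var_hat, sep="\n")
```

Dividing by $N_c$ rather than $N_c-1$ makes the variance estimate slightly biased downward, which is negligible for large $N_c$.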
