# Derivations
---
## Log-likelihood
For data points $x_1, \dots, x_m \in \mathbb{R}$ drawn independently from a mixture of two one-dimensional Gaussians, the density of each point is  
$$
p(x_i) = \pi_1 \,\mathcal{N}(x_i \mid \mu_1, \sigma_1^2) + \pi_2 \,\mathcal{N}(x_i \mid \mu_2, \sigma_2^2),
$$  
where $\pi_1, \pi_2 \geq 0$, $\pi_1 + \pi_2 = 1$, and $\sigma_1^2, \sigma_2^2 > 0$.  
The log-likelihood is  
$$
\ell(\theta) = \sum_{i=1}^m \log \Big( \pi_1 \,\mathcal{N}(x_i \mid \mu_1, \sigma_1^2) + \pi_2 \,\mathcal{N}(x_i \mid \mu_2, \sigma_2^2) \Big).
$$  
Explanation: Taking the log turns the product of per-point densities $\prod_{i=1}^m p(x_i)$ into a sum, which is easier to differentiate and more numerically stable.  
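
To make the numerical-stability point concrete, here is a minimal NumPy sketch of the log-likelihood. The function names, the data, and the parameter values are assumptions chosen for illustration, not part of the derivation. Working with log-densities and `np.logaddexp` is one possible way to avoid underflow when a point lies far from both means.

```python
import numpy as np


def log_normal_pdf(x, mu, var):
    """Elementwise log of N(x | mu, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)


def gmm_log_likelihood(x, pi, mu, var):
    """Log-likelihood of a 2-component, 1-D Gaussian mixture.

    x has shape (m,); pi, mu and var each have shape (2,).
    """
    x = np.asarray(x, dtype=float)
    pi, mu, var = (np.asarray(a, dtype=float) for a in (pi, mu, var))
    # log(pi_k) + log N(x_i | mu_k, var_k), one column per component: shape (m, 2)
    log_terms = np.log(pi) + log_normal_pdf(x[:, None], mu, var)
    # log(a + b) evaluated as logaddexp(log a, log b), which avoids underflow
    per_point = np.logaddexp(log_terms[:, 0], log_terms[:, 1])
    return per_point.sum()


# Hypothetical data and parameters, for illustration only
x = np.array([-1.0, 0.0, 2.5, 3.0])
pi = np.array([0.4, 0.6])
mu = np.array([0.0, 3.0])
var = np.array([1.0, 0.5])

print(gmm_log_likelihood(x, pi, mu, var))
```

For moderate inputs this agrees with the direct formula. For extreme inputs such as `x = 1000.0` it should still return a finite value, where exponentiating first would underflow to `log(0)`.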

---
## Responsibility
Define the responsibility of component $k \in \{1,2\}$ for point $x_i$:  
$$
r_{ik} = \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}
{\pi_1 \,\mathcal{N}(x_i \mid \mu_1, \sigma_1^2) + \pi_2 \,\mathcal{N}(x_i \mid \mu_2, \sigma_2^2)}.
$$  
These satisfy  
$$
r_{i1} + r_{i2} = 1 \quad \text{for each } i.
$$  
Explanation: $r_{ik}$ is the posterior probability, under the current parameters, that component $k$ generated point $x_i$. It is a soft assignment rather than a hard one.  
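
The definition above can be sketched in NumPy as follows. The function name `responsibilities`, the data, and the parameter values are assumptions for illustration. Subtracting the row maximum in log space is an optional safeguard that leaves the ratio unchanged.

```python
import numpy as np


def responsibilities(x, pi, mu, var):
    """Return r with shape (m, 2), where r[i, k] is r_{ik}."""
    x = np.asarray(x, dtype=float)
    pi, mu, var = (np.asarray(a, dtype=float) for a in (pi, mu, var))
    # Log of each numerator pi_k * N(x_i | mu_k, var_k): shape (m, 2)
    log_num = np.log(pi) - 0.5 * (
        np.log(2.0 * np.pi * var) + (x[:, None] - mu) ** 2 / var
    )
    # Subtracting the row maximum leaves the ratio unchanged and avoids 0/0
    log_num -= log_num.max(axis=1, keepdims=True)
    num = np.exp(log_num)
    return num / num.sum(axis=1, keepdims=True)


# Hypothetical data and parameters, for illustration only
x = np.array([-1.0, 0.0, 2.5, 3.0])
r = responsibilities(x, pi=[0.4, 0.6], mu=[0.0, 3.0], var=[1.0, 0.5])

print(r)
print(r.sum(axis=1))  # each row sums to 1 up to floating-point rounding
```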

---
## Gradients
Let  
$$
N_k = \sum_{i=1}^m r_{ik}
$$  
be the effective number of points assigned to component $k$.  

---
### Means
The gradient with respect to the means is  
$$
\frac{\partial \ell}{\partial \mu_k} = \sum_{i=1}^m r_{ik} \,\frac{(x_i - \mu_k)}{\sigma_k^2}, \quad k \in \{1,2\}.
$$  
Explanation: The gradient is a responsibility-weighted sum of residuals $x_i - \mu_k$, so a gradient-ascent step moves $\mu_k$ toward the responsibility-weighted average of the data points.
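
A small NumPy sketch of this gradient follows, with hypothetical data, parameter values, and function names. It also evaluates the algebraically equivalent form $N_k(\bar{x}_k - \mu_k)/\sigma_k^2$, where $\bar{x}_k = \frac{1}{N_k}\sum_i r_{ik} x_i$ is the responsibility-weighted mean. That form makes the "pull toward the weighted average" reading explicit.

```python
import numpy as np


def responsibilities(x, pi, mu, var):
    """r[i, k] = pi_k N(x_i | mu_k, var_k) / p(x_i), shape (m, 2)."""
    num = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    return num / num.sum(axis=1, keepdims=True)


def grad_mu(x, pi, mu, var):
    """Gradient of the log-likelihood with respect to (mu_1, mu_2)."""
    r = responsibilities(x, pi, mu, var)
    return (r * (x[:, None] - mu)).sum(axis=0) / var


# Hypothetical data and parameters, for illustration only
x = np.array([-1.0, 0.0, 2.5, 3.0])
pi = np.array([0.4, 0.6])
mu = np.array([0.0, 3.0])
var = np.array([1.0, 0.5])

r = responsibilities(x, pi, mu, var)
N = r.sum(axis=0)                                 # effective counts N_k
weighted_mean = (r * x[:, None]).sum(axis=0) / N  # responsibility-weighted means
g = grad_mu(x, pi, mu, var)

# The same gradient written as N_k * (weighted mean - mu_k) / var_k
print(g)
print(N * (weighted_mean - mu) / var)
```

The sign of each entry of `g` matches the sign of `weighted_mean - mu`, so an ascent step moves each mean toward its weighted average.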

---
### Variances
For the 1-dimensional variances $\sigma_k^2$, the gradient is  
$$
\frac{\partial \ell}{\partial \sigma_k^2}
= \frac{1}{2} \sum_{i=1}^m r_{ik} \left[ \frac{(x_i - \mu_k)^2}{(\sigma_k^2)^2} - \frac{1}{\sigma_k^2} \right], \quad k \in \{1,2\}.
$$  
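Explanation: Points with $(x_i - \mu_k)^2 > \sigma_k^2$ push the variance up and points with $(x_i - \mu_k)^2 < \sigma_k^2$ push it down. The gradient vanishes when $\sigma_k^2$ equals the responsibility-weighted mean squared deviation from $\mu_k$.  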
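
One way to sanity-check this formula is to compare it against central finite differences of the log-likelihood. The sketch below does so in NumPy. The function names, data, parameter values, and step size `eps` are all assumptions chosen for illustration.

```python
import numpy as np


def weighted_densities(x, pi, mu, var):
    """pi_k * N(x_i | mu_k, var_k) for every point and component, shape (m, 2)."""
    return pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def log_likelihood(x, pi, mu, var):
    return np.log(weighted_densities(x, pi, mu, var).sum(axis=1)).sum()


def grad_var(x, pi, mu, var):
    """Gradient of the log-likelihood with respect to (var_1, var_2)."""
    dens = weighted_densities(x, pi, mu, var)
    r = dens / dens.sum(axis=1, keepdims=True)  # responsibilities
    sq = (x[:, None] - mu) ** 2                 # squared deviations
    return 0.5 * (r * (sq / var**2 - 1.0 / var)).sum(axis=0)


# Hypothetical data and parameters, for illustration only
x = np.array([-1.0, 0.0, 2.5, 3.0])
pi = np.array([0.4, 0.6])
mu = np.array([0.0, 3.0])
var = np.array([1.0, 0.5])

analytic = grad_var(x, pi, mu, var)

# Central finite differences, perturbing one variance at a time
eps = 1e-6
numeric = np.array([
    (log_likelihood(x, pi, mu, var + eps * e)
     - log_likelihood(x, pi, mu, var - eps * e)) / (2.0 * eps)
    for e in np.eye(2)
])

print(analytic)
print(numeric)
```

The two printed vectors should agree to several decimal places. Note that the derivative is taken with respect to $\sigma_k^2$, not $\sigma_k$.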

---
### Mixing weights
The gradient of the log-likelihood with respect to the mixing weights is  
$$
\frac{\partial \ell}{\partial \pi_k} = \frac{N_k}{\pi_k}, \quad k \in \{1,2\}.
$$  
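Explanation: This is the partial derivative with the constraint $\pi_1 + \pi_2 = 1$ ignored. It is never negative, so unconstrained gradient ascent would increase both weights, and the constraint has to be enforced separately.  
Since $\pi_1 + \pi_2 = 1$ must hold, a common choice is to reparameterize with a softmax over unconstrained parameters $\alpha_1, \alpha_2 \in \mathbb{R}$: $\pi_k = \operatorname{softmax}(\alpha)_k$.  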
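
As a worked step, the gradient with respect to the softmax parameters follows from the chain rule. This sketch assumes the softmax is written out as $\pi_k = e^{\alpha_k} / (e^{\alpha_1} + e^{\alpha_2})$ with unconstrained $\alpha_1, \alpha_2 \in \mathbb{R}$, and it uses $N_1 + N_2 = m$, which holds because $r_{i1} + r_{i2} = 1$ for each $i$:  
$$
\frac{\partial \pi_j}{\partial \alpha_k} = \pi_j (\delta_{jk} - \pi_k),
\qquad
\frac{\partial \ell}{\partial \alpha_k}
= \sum_{j=1}^{2} \frac{N_j}{\pi_j} \, \pi_j (\delta_{jk} - \pi_k)
= N_k - \pi_k \sum_{j=1}^{2} N_j
= N_k - m \, \pi_k.
$$  
Here $\delta_{jk}$ is 1 if $j = k$ and 0 otherwise. Under this parameterization the gradient vanishes when $\pi_k = N_k / m$, so the weight of a component increases when its share of the effective counts exceeds its current weight.  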

---
We derived the gradients of the log-likelihood with respect to the mixing weights, means, and variances of a 2-component, 1-dimensional Gaussian mixture model. These results are what is required to solve the estimation problem in the assignment.