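# Module 4.6: GMM + EM Algorithm (Advanced)

## Learning Objectives

After completing this notebook, you will be able to:

1. Explain the concept of a Gaussian Mixture Model (GMM)
2. Explain the Expectation-Maximization (EM) algorithm
3. Implement the E-step and M-step for a GMM
4. Compare GMM (soft clustering) with K-Means (hard clustering)

## Background

A GMM assumes the data are generated by a mixture of K Gaussian distributions. Unlike K-Means, which assigns each point to exactly one cluster, a GMM yields probabilistic assignments: each point receives a probability of belonging to each component.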

---

## References

- Bishop PRML Ch. 9.2-9.3: Mixtures of Gaussians and EM

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

plt.rcParams['figure.figsize'] = [10, 6]
np.random.seed(42)

print("Environment ready!")

---

## Part 1: Gaussian Mixture Model

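### Model Definition

A GMM assumes each data point $x$ is drawn from a mixture of K Gaussian distributions:

$$p(x) = \sum_{k=1}^K \pi_k \mathcal{N}(x | \mu_k, \Sigma_k)$$

where:
- $\pi_k$: mixing coefficient (weight) of the k-th component, with $\pi_k \ge 0$ and $\sum_k \pi_k = 1$
- $\mu_k$: mean of the k-th Gaussian
- $\Sigma_k$: covariance matrix of the k-th Gaussian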
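
As a quick sanity check on the definition, the sketch below evaluates a one-dimensional mixture density on a grid and confirms numerically that it integrates to roughly 1. The helper names (`gaussian_pdf_1d`, `mixture_pdf_1d`) and the parameter values are illustrative assumptions, not part of this notebook's data.

```python
import numpy as np

def gaussian_pdf_1d(x, mu, sigma2):
    """Density of a univariate Gaussian N(x | mu, sigma2)."""
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def mixture_pdf_1d(x, weights, mus, sigma2s):
    """p(x) = sum_k pi_k N(x | mu_k, sigma2_k), evaluated pointwise."""
    x = np.asarray(x, dtype=float)
    return sum(w * gaussian_pdf_1d(x, m, s2)
               for w, m, s2 in zip(weights, mus, sigma2s))

# Illustrative parameters (hypothetical, chosen only for this demo)
demo_weights = [0.3, 0.7]
demo_mus = [-2.0, 3.0]
demo_sigma2s = [1.0, 0.5]

grid = np.linspace(-12.0, 12.0, 20001)
density = mixture_pdf_1d(grid, demo_weights, demo_mus, demo_sigma2s)

# Riemann-sum approximation of the integral of p(x) over the grid
dx = grid[1] - grid[0]
total_mass = float(np.sum(density) * dx)
print(f"Approximate integral of p(x): {total_mass:.4f}")
```

Because the weights sum to 1 and each component is itself a normalized density, the mixture is a valid density as well.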

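### Multivariate Gaussian Distribution

$$\mathcal{N}(x | \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

where $D$ is the dimension of $x$ and $|\Sigma|$ is the determinant of $\Sigma$.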

In [None]:
def multivariate_gaussian(x, mean, cov):
    """
    Compute the probability density of a multivariate Gaussian.

    Parameters
    ----------
    x : np.ndarray, shape (N, D) or (D,)
        Data points
    mean : np.ndarray, shape (D,)
        Mean vector
    cov : np.ndarray, shape (D, D)
        Covariance matrix

    Returns
    -------
    np.ndarray, shape (N,) or float
        Probability density values
    """
    x = np.atleast_2d(x)
    N, D = x.shape

    # Normalization constant
    det_cov = np.linalg.det(cov)
    norm_const = 1.0 / (np.power(2 * np.pi, D/2) * np.sqrt(det_cov))

    # Exponent term
    cov_inv = np.linalg.inv(cov)
    diff = x - mean  # shape (N, D)

    # Quadratic form (x-μ)^T Σ^{-1} (x-μ), one value per row
    exponent = np.sum(diff @ cov_inv * diff, axis=1)  # shape (N,)
    
    pdf = norm_const * np.exp(-0.5 * exponent)
    
    return pdf.squeeze()


# Test
print("=== Testing the multivariate Gaussian ===")

mean = np.array([0, 0])
cov = np.array([[1, 0.5], [0.5, 1]])

# The density should be highest at the mean
print(f"p([0, 0]) = {multivariate_gaussian([0, 0], mean, cov):.4f}")
print(f"p([1, 1]) = {multivariate_gaussian([1, 1], mean, cov):.4f}")
print(f"p([3, 3]) = {multivariate_gaussian([3, 3], mean, cov):.6f}")

In [None]:
# Generate GMM test data
def generate_gmm_data(n_samples=300, seed=42):
    """
    Generate test data from a 3-component GMM.
    """
    np.random.seed(seed)

    # Three Gaussian components
    means = [
        np.array([0, 0]),
        np.array([4, 4]),
        np.array([8, 0])
    ]
    
    covs = [
        np.array([[1, 0.5], [0.5, 1]]),
        np.array([[1.5, -0.3], [-0.3, 0.5]]),
        np.array([[0.8, 0], [0, 2]])
    ]
    
    weights = [0.3, 0.4, 0.3]
    
    X_list = []
    y_list = []  # true component labels
    
    for k in range(3):
        n_k = int(n_samples * weights[k])
        X_k = np.random.multivariate_normal(means[k], covs[k], n_k)
        X_list.append(X_k)
        y_list.append(np.full(n_k, k))
    
    X = np.vstack(X_list)
    y = np.concatenate(y_list)
    
    idx = np.random.permutation(len(y))
    return X[idx], y[idx], means, covs, weights


# Visualize
X, y_true, true_means, true_covs, true_weights = generate_gmm_data(n_samples=300)

def plot_ellipse(mean, cov, ax, n_std=2, **kwargs):
    """Draw the n_std contour ellipse of a Gaussian."""
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    angle = np.degrees(np.arctan2(eigenvectors[1, 0], eigenvectors[0, 0]))
    width, height = 2 * n_std * np.sqrt(eigenvalues)
    
    ellipse = Ellipse(xy=mean, width=width, height=height, angle=angle, **kwargs)
    ax.add_patch(ellipse)


fig, ax = plt.subplots(figsize=(10, 8))

colors = ['blue', 'red', 'green']
for k in range(3):
    mask = y_true == k
    ax.scatter(X[mask, 0], X[mask, 1], c=colors[k], s=30, alpha=0.6, label=f'Component {k}')
    plot_ellipse(true_means[k], true_covs[k], ax, n_std=2, 
                fill=False, edgecolor=colors[k], linewidth=2)

ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_title('GMM test data (true distribution)')
ax.legend()
ax.grid(True, alpha=0.3)
ax.set_aspect('equal')
plt.show()

---

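## Part 2: The EM Algorithm

### The Problem

The maximum-likelihood estimate of the GMM parameters has no closed-form solution: the log-likelihood $\sum_i \log \sum_k \pi_k \mathcal{N}(x_i | \mu_k, \Sigma_k)$ has a sum inside the logarithm, so the parameters of different components are coupled.

### The EM Algorithm

**E-step (Expectation)**: compute the "responsibility" of each component for each point

$$\gamma_{ik} = \frac{\pi_k \mathcal{N}(x_i | \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(x_i | \mu_j, \Sigma_j)}$$

**M-step (Maximization)**: update the parameters using the responsibilities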

$$N_k = \sum_{i=1}^N \gamma_{ik}$$

$$\mu_k = \frac{1}{N_k} \sum_{i=1}^N \gamma_{ik} x_i$$

$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^N \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^T$$

$$\pi_k = \frac{N_k}{N}$$
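
The E-step formula divides by a sum of densities that can underflow to zero for points far from every component. One common remedy, sketched here as an alternative and not as this notebook's implementation, is to work in log space and normalize with the log-sum-exp trick. The function names `log_gaussian` and `e_step_log` and the demo parameters are assumptions made for illustration.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log-density of N(x | mean, cov) for each row of X, shape (N,)."""
    D = X.shape[1]
    diff = X - mean
    # slogdet returns (sign, log|cov|) without forming the determinant itself
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term via a linear solve instead of an explicit inverse
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (D * np.log(2 * np.pi) + logdet + maha)

def e_step_log(X, weights, means, covs):
    """Return responsibilities (N, K) and the total log-likelihood."""
    K = len(weights)
    log_wp = np.column_stack([
        np.log(weights[k]) + log_gaussian(X, means[k], covs[k])
        for k in range(K)
    ])
    # log-sum-exp: subtract the row-wise maximum before exponentiating
    row_max = log_wp.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(
        np.sum(np.exp(log_wp - row_max), axis=1, keepdims=True)
    )
    gamma = np.exp(log_wp - log_norm)
    return gamma, float(np.sum(log_norm))

# Hypothetical two-component setup for the demo
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
demo_weights = np.array([0.5, 0.5])
demo_means = np.array([[0.0, 0.0], [5.0, 5.0]])
demo_covs = np.array([np.eye(2), np.eye(2)])

gamma_demo, ll_demo = e_step_log(X_demo, demo_weights, demo_means, demo_covs)

# A far-away point: plain densities underflow, log-space ones do not
far_gamma, _ = e_step_log(np.array([[100.0, 100.0]]),
                          demo_weights, demo_means, demo_covs)
print(gamma_demo.sum(axis=1)[:3], far_gamma)
```

Each row of responsibilities still sums to 1, even for the distant point, because the normalization never divides by an underflowed zero.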

In [None]:
class GaussianMixture:
    """
    Gaussian Mixture Model with EM algorithm
    
    Parameters
    ----------
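    n_components : int
        Number of mixture components K
    max_iter : int
        Maximum number of EM iterations
    tol : float
        Convergence tolerance (change in log-likelihood)
    reg_covar : float
        Covariance regularization (keeps the matrices non-singular)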
    """
    
    def __init__(self, n_components=3, max_iter=100, tol=1e-4, reg_covar=1e-6):
        self.n_components = n_components
        self.max_iter = max_iter
        self.tol = tol
        self.reg_covar = reg_covar
        
        self.weights_ = None  # π_k
        self.means_ = None    # μ_k
        self.covariances_ = None  # Σ_k
        self.history = None
    
    def _initialize(self, X):
        """
        Initialize the parameters.
        """
        N, D = X.shape
        K = self.n_components

        # Weights: uniform
        self.weights_ = np.ones(K) / K

        # Means: K randomly chosen data points
        indices = np.random.choice(N, K, replace=False)
        self.means_ = X[indices].copy()

        # Covariances: identity matrices
        self.covariances_ = np.array([np.eye(D) for _ in range(K)])
    
    def _compute_responsibilities(self, X):
        """
        E-step: compute the responsibilities γ_{ik}.

        Returns
        -------
        responsibilities : np.ndarray, shape (N, K)
            γ_{ik} = P(z_i = k | x_i)
        """
        N = X.shape[0]
        K = self.n_components

        # Weighted density of every component at every point
        weighted_probs = np.zeros((N, K))

        for k in range(K):
            weighted_probs[:, k] = self.weights_[k] * multivariate_gaussian(
                X, self.means_[k], self.covariances_[k]
            )

        # Normalize each row; the small constant guards against division by zero
        sum_probs = np.sum(weighted_probs, axis=1, keepdims=True) + 1e-10
        responsibilities = weighted_probs / sum_probs
        
        return responsibilities
    
    def _m_step(self, X, responsibilities):
        """
        M-step: update the parameters.
        """
        N, D = X.shape
        K = self.n_components

        # N_k: effective number of samples per component
        N_k = np.sum(responsibilities, axis=0)  # shape (K,)

        # Update weights
        self.weights_ = N_k / N

        # Update means
        for k in range(K):
            self.means_[k] = np.sum(responsibilities[:, k:k+1] * X, axis=0) / N_k[k]

        # Update covariances
        for k in range(K):
            diff = X - self.means_[k]  # shape (N, D)
            # Responsibility-weighted sum of outer products
            weighted_diff = responsibilities[:, k:k+1] * diff  # shape (N, D)
            self.covariances_[k] = (weighted_diff.T @ diff) / N_k[k]

            # Regularize the diagonal (avoids singular matrices)
            self.covariances_[k] += self.reg_covar * np.eye(D)
    
    def _compute_log_likelihood(self, X):
        """
        Compute the log-likelihood of X under the current parameters.
        """
        N = X.shape[0]
        K = self.n_components

        # Weighted component densities
        probs = np.zeros((N, K))
        for k in range(K):
            probs[:, k] = self.weights_[k] * multivariate_gaussian(
                X, self.means_[k], self.covariances_[k]
            )
        
        # log-likelihood = Σ log(Σ_k π_k N(x|μ_k, Σ_k))
        return np.sum(np.log(np.sum(probs, axis=1) + 1e-10))
    
    def fit(self, X, verbose=True):
        """
        Fit the GMM with the EM algorithm.
        """
        self._initialize(X)
        self.history = {'log_likelihood': []}
        
        prev_ll = -np.inf
        
        for iteration in range(self.max_iter):
            # E-step
            responsibilities = self._compute_responsibilities(X)
            
            # M-step
            self._m_step(X, responsibilities)
            
            # Compute the log-likelihood
            ll = self._compute_log_likelihood(X)
            self.history['log_likelihood'].append(ll)
            
            if verbose and (iteration + 1) % 10 == 0:
                print(f"Iter {iteration+1}: Log-likelihood = {ll:.4f}")
            
            # Check convergence
            if abs(ll - prev_ll) < self.tol:
                if verbose:
                    print(f"Converged at iteration {iteration+1}")
                break
            
            prev_ll = ll
        
        return self
    
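    def predict(self, X):
        """
        Predict the most likely component for each point (hard assignment).
        """
        responsibilities = self._compute_responsibilities(X)
        return np.argmax(responsibilities, axis=1)

    def predict_proba(self, X):
        """
        Predict the probability of each component for each point (soft assignment).
        """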
        return self._compute_responsibilities(X)


# Test
print("=== Testing GMM ===")

gmm = GaussianMixture(n_components=3, max_iter=100)
gmm.fit(X, verbose=True)

# Component order is arbitrary, so compare the weights up to a permutation
print(f"\nLearned weights: {gmm.weights_}")
print(f"True weights: {true_weights}")

In [None]:
# Visualize the GMM result

def plot_gmm_result(X, gmm, title="GMM Result"):
    """Plot the GMM clustering result."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))

    # Hard assignment
    labels = gmm.predict(X)
    colors = ['blue', 'red', 'green']
    
    for k in range(gmm.n_components):
        mask = labels == k
        axes[0].scatter(X[mask, 0], X[mask, 1], c=colors[k], s=30, alpha=0.6)
        plot_ellipse(gmm.means_[k], gmm.covariances_[k], axes[0], n_std=2,
                    fill=False, edgecolor=colors[k], linewidth=2)
        axes[0].scatter(gmm.means_[k][0], gmm.means_[k][1], c='black', marker='X', s=150)
    
    axes[0].set_title('Hard Assignment (argmax γ)')
    axes[0].grid(True, alpha=0.3)
    axes[0].set_aspect('equal')
    
    # Soft assignment (shown by blending colors)
    responsibilities = gmm.predict_proba(X)

    # RGB color blending (assumes exactly 3 components)
    color_matrix = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])  # blue, red, green
    point_colors = responsibilities @ color_matrix
    
    axes[1].scatter(X[:, 0], X[:, 1], c=point_colors, s=30, alpha=0.6)
    
    for k in range(gmm.n_components):
        plot_ellipse(gmm.means_[k], gmm.covariances_[k], axes[1], n_std=2,
                    fill=False, edgecolor=colors[k], linewidth=2)
    
    axes[1].set_title('Soft Assignment (color ∝ γ)')
    axes[1].grid(True, alpha=0.3)
    axes[1].set_aspect('equal')
    
    plt.suptitle(title, fontsize=14)
    plt.tight_layout()
    return fig


plot_gmm_result(X, gmm, title='GMM clustering result')
plt.show()

# Log-likelihood curve
plt.figure(figsize=(10, 5))
plt.plot(gmm.history['log_likelihood'])
plt.xlabel('Iteration')
plt.ylabel('Log-likelihood')
plt.title('EM convergence curve')
plt.grid(True, alpha=0.3)
plt.show()

---

## Part 3: GMM vs K-Means

In [None]:
# Compare GMM and K-Means
print("=== GMM vs K-Means ===")

# K-Means as implemented earlier (compact version)
class KMeansSimple:
    def __init__(self, n_clusters=3, max_iter=100):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.centroids = None
        self.labels_ = None
    
    def fit(self, X):
        N = X.shape[0]
        indices = np.random.choice(N, self.n_clusters, replace=False)
        self.centroids = X[indices].copy()
        
        for _ in range(self.max_iter):
            distances = np.sum((X[:, np.newaxis, :] - self.centroids[np.newaxis, :, :]) ** 2, axis=2)
            self.labels_ = np.argmin(distances, axis=1)
            
            new_centroids = np.array([X[self.labels_ == k].mean(axis=0) if np.sum(self.labels_ == k) > 0 
                                      else self.centroids[k] for k in range(self.n_clusters)])
            
            if np.allclose(new_centroids, self.centroids):
                break
            self.centroids = new_centroids
        
        return self
    
    def predict(self, X):
        distances = np.sum((X[:, np.newaxis, :] - self.centroids[np.newaxis, :, :]) ** 2, axis=2)
        return np.argmin(distances, axis=1)


# Generate data with elliptical clusters
np.random.seed(42)
X_ellipse, y_ellipse, _, _, _ = generate_gmm_data(n_samples=300)

# Fit both models
kmeans = KMeansSimple(n_clusters=3)
kmeans.fit(X_ellipse)

gmm = GaussianMixture(n_components=3)
gmm.fit(X_ellipse, verbose=False)

# Visual comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

colors = ['blue', 'red', 'green']

# K-Means
for k in range(3):
    mask = kmeans.labels_ == k
    axes[0].scatter(X_ellipse[mask, 0], X_ellipse[mask, 1], c=colors[k], s=30, alpha=0.6)
axes[0].scatter(kmeans.centroids[:, 0], kmeans.centroids[:, 1], c='black', marker='X', s=200)
axes[0].set_title('K-Means (Hard, Spherical)')
axes[0].grid(True, alpha=0.3)
axes[0].set_aspect('equal')

# GMM
labels_gmm = gmm.predict(X_ellipse)
for k in range(3):
    mask = labels_gmm == k
    axes[1].scatter(X_ellipse[mask, 0], X_ellipse[mask, 1], c=colors[k], s=30, alpha=0.6)
    plot_ellipse(gmm.means_[k], gmm.covariances_[k], axes[1], n_std=2,
                fill=False, edgecolor=colors[k], linewidth=2)
axes[1].set_title('GMM (Soft, Elliptical)')
axes[1].grid(True, alpha=0.3)
axes[1].set_aspect('equal')

plt.suptitle('K-Means vs GMM', fontsize=14)
plt.tight_layout()
plt.show()

print("\nObservations:")
print("- K-Means implicitly assumes spherical clusters")
print("- GMM can learn elliptical clusters")
print("- GMM provides probabilistic soft assignments")

---

## Exercises

### Exercise 1: Choosing the Number of Components with BIC/AIC
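
Both criteria penalize the log-likelihood by the number of free parameters: BIC adds $k \log N$ and AIC adds $2k$. The sketch below counts the parameters of a full-covariance GMM and evaluates both scores. The function names and the log-likelihood value are hypothetical, used only to show the arithmetic.

```python
import numpy as np

def gmm_param_count(K, D):
    """Free parameters of a GMM with K full-covariance components in D dims."""
    n_weights = K - 1                # weights sum to 1, so one is determined
    n_means = K * D
    n_covs = K * D * (D + 1) // 2    # each covariance is a symmetric D x D matrix
    return n_weights + n_means + n_covs

def bic_score(log_likelihood, n_params, n_samples):
    """BIC = -2 * log-likelihood + k * log(N); lower is better."""
    return -2.0 * log_likelihood + n_params * np.log(n_samples)

def aic_score(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * k; lower is better."""
    return -2.0 * log_likelihood + 2.0 * n_params

# K = 3 components in D = 2 dimensions: 2 + 6 + 9 free parameters
n_params_demo = gmm_param_count(K=3, D=2)

demo_ll = -1200.0  # hypothetical log-likelihood, not a result from this notebook
bic_demo = bic_score(demo_ll, n_params_demo, n_samples=300)
aic_demo = aic_score(demo_ll, n_params_demo)
print(n_params_demo, bic_demo, aic_demo)
```

For any sample size above about 7, $\log N > 2$, so BIC penalizes extra components more heavily than AIC and tends to select smaller models.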

In [None]:
# Exercise 1 solution: choose K with BIC

def compute_bic(gmm, X):
    """
    Compute the BIC (Bayesian Information Criterion).

    BIC = -2 * log-likelihood + k * log(N)

    where k is the number of free parameters.
    """
    N, D = X.shape
    K = gmm.n_components

    # Number of free parameters
    # K-1 weights (sum to 1)
    # K * D means
    # K * D*(D+1)/2 covariance parameters (symmetric)
    n_params = (K - 1) + K * D + K * D * (D + 1) / 2
    
    # Log-likelihood
    ll = gmm._compute_log_likelihood(X)
    
    # BIC (lower is better)
    bic = -2 * ll + n_params * np.log(N)
    
    return bic


# Try different values of K
print("=== Choosing the number of components with BIC ===")

K_range = range(1, 7)
bics = []
lls = []

for K in K_range:
    gmm = GaussianMixture(n_components=K, max_iter=100)
    gmm.fit(X, verbose=False)
    
    bic = compute_bic(gmm, X)
    ll = gmm._compute_log_likelihood(X)
    
    bics.append(bic)
    lls.append(ll)
    print(f"K={K}: BIC = {bic:.2f}, Log-likelihood = {ll:.2f}")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(K_range, lls, 'bo-', linewidth=2)
axes[0].set_xlabel('Number of components (K)')
axes[0].set_ylabel('Log-likelihood')
axes[0].set_title('Log-likelihood vs K')
axes[0].grid(True, alpha=0.3)

axes[1].plot(K_range, bics, 'ro-', linewidth=2)
axes[1].set_xlabel('Number of components (K)')
axes[1].set_ylabel('BIC (lower is better)')
axes[1].set_title('BIC vs K')
axes[1].axvline(x=K_range[np.argmin(bics)], color='green', linestyle='--', label=f'Best K={K_range[np.argmin(bics)]}')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nBest K (BIC): {K_range[np.argmin(bics)]}")

---

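## Summary

### What This Notebook Covered

1. **Gaussian Mixture Model**:
   - A mixture of K Gaussian distributions
   - Parameters: $\{\pi_k, \mu_k, \Sigma_k\}$

2. **EM algorithm**:
   - E-step: compute the responsibilities
   - M-step: update the parameters
   - Each iteration never decreases the log-likelihood, but EM may converge to a local optimum

3. **GMM vs K-Means**:
   - Soft vs hard assignment
   - Elliptical vs spherical clusters
   - Probabilistic model vs distance-based model

4. **Model selection**:
   - Choosing the number of components with BIC/AIC

### Key Takeaways

1. A GMM is a generative model; it can be used for density estimation and sampling
2. EM is a general-purpose optimization method for latent-variable models
3. Initialization matters: run EM several times and keep the result with the highest log-likelihood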
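
The last point can be sketched with a compact one-dimensional EM that is restarted from several random initializations, keeping the run with the highest log-likelihood. This is a simplified stand-in and not the `GaussianMixture` class from this notebook; `fit_gmm_1d`, the synthetic data, and the number of restarts are all illustrative assumptions.

```python
import numpy as np

def _weighted_densities(x, weights, means, variances):
    """pi_k * N(x_i | mu_k, var_k) for every point and component, shape (N, K)."""
    sq = (x[:, None] - means) ** 2
    return weights * np.exp(-0.5 * sq / variances) / np.sqrt(2 * np.pi * variances)

def fit_gmm_1d(x, K, rng, n_iter=100):
    """Run EM on 1-D data; return (weights, means, variances, log-likelihood)."""
    N = x.shape[0]
    weights = np.full(K, 1.0 / K)
    means = rng.choice(x, size=K, replace=False)  # random data points as initial means
    variances = np.full(K, np.var(x))
    for _ in range(n_iter):
        # E-step: responsibilities
        dens = _weighted_densities(x, weights, means, variances)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters
        N_k = gamma.sum(axis=0)
        weights = N_k / N
        means = (gamma * x[:, None]).sum(axis=0) / N_k
        variances = (gamma * (x[:, None] - means) ** 2).sum(axis=0) / N_k + 1e-6
    dens = _weighted_densities(x, weights, means, variances)
    ll = float(np.log(dens.sum(axis=1)).sum())
    return weights, means, variances, ll

# Synthetic 1-D data from three Gaussians (hypothetical demo data)
data_rng = np.random.default_rng(0)
x_demo = np.concatenate([
    data_rng.normal(-4.0, 0.7, 150),
    data_rng.normal(0.0, 0.7, 100),
    data_rng.normal(5.0, 1.0, 150),
])

# Five restarts, each with its own seeded generator
results = [fit_gmm_1d(x_demo, K=3, rng=np.random.default_rng(seed))
           for seed in range(5)]
all_lls = [r[3] for r in results]
best_weights, best_means, best_variances, best_ll = max(results, key=lambda r: r[3])

print("log-likelihood per restart:", np.round(all_lls, 2))
print("best means:", np.sort(best_means))
```

Selecting by log-likelihood is reasonable here because every restart fits the same number of components; comparing models with different K would call for a criterion such as BIC.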

### Next Steps

In the next notebook, we will combine HOG + SVM to build a complete image classifier.