# EM vs Gibbs Sampler - Results

This experiment compares the performance of Expectation Maximization (EM) and a Gibbs Sampler (GS) for fitting Gaussian Mixture Models.

- 500 runs each for K = 3 and K = 6 clusters
- 1000 data points in each run
- Univariate data

During data generation, the component means were drawn from a Uniform[-10, 10] distribution and the standard deviations from a Uniform[0.25, 5] distribution.
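The generation step described above might look like the following sketch. The mixing weights are not stated in this notebook, so the sketch assumes each point picks its component uniformly at random; the function name `generate_mixture` and the seed are hypothetical.

```python
import random

def generate_mixture(k, n=1000, seed=0):
    """Draw n univariate points from a k-component Gaussian mixture."""
    rng = random.Random(seed)
    # Component parameters follow the ranges stated in the text.
    means = [rng.uniform(-10, 10) for _ in range(k)]
    stds = [rng.uniform(0.25, 5) for _ in range(k)]
    # Assumption: equal mixing weights (not confirmed by the notebook).
    labels = [rng.randrange(k) for _ in range(n)]
    points = [rng.gauss(means[z], stds[z]) for z in labels]
    return points, labels, means, stds

points, labels, means, stds = generate_mixture(k=3)
```

The returned `labels` are the ground-truth assignments that clustering scores such as ARS are computed against.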

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load Data
gs3 = pd.read_csv("gs-k-3.csv")
gs6 = pd.read_csv("gs-k-6.csv")

em3 = pd.read_csv("em-k-3.csv")
em6 = pd.read_csv("em-k-6.csv")

## The Data

The Gibbs Sampler results contain the following metrics:

- RS: Rand Score
- ARS: Adjusted Rand Score
- SS: Silhouette Score

for each of the three GS variants:

- Base GS
- GS with Multiple Initializations
- GS with Burn In
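Since ARS is the metric used in the comparison plots, here is a minimal pure-Python sketch of how it can be computed from the pair counts of a contingency table. The helper name `ars_from_labels` is hypothetical; in practice `sklearn.metrics.adjusted_rand_score` computes the same quantity, and this notebook does not state which implementation produced the CSV files.

```python
from collections import Counter
from math import comb

def ars_from_labels(labels_true, labels_pred):
    """Adjusted Rand Score: pair agreement corrected for chance."""
    n = len(labels_true)
    # Pairs of points that share a cluster in both labelings.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Pairs that share a cluster within each labeling separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Relabeling the clusters does not change the score.
perfect = ars_from_labels([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0])
```

A score of 1 means the two partitions agree up to a renaming of the clusters, and values near 0 mean agreement no better than chance, which is why ARS suits mixture models where component labels are arbitrary.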



In [None]:
# GS with K = 3
gs3

The EM dataframe holds the Adjusted Rand Score (ARS) for EM in two modes:

- EM with Many Random Initializations (`gmm_mri_ars`)
- EM with K-Means Initialization (`gmm_kmeans_ars`)

The final column holds the ARS for standard K-Means clustering (`kmeans_ars`).
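The two EM modes could be reproduced roughly as follows with scikit-learn's `GaussianMixture`. This is a sketch under assumptions: the notebook does not say which library or settings produced the results, the toy data here is not the experiment's data, and `n_init=10` is an arbitrary choice.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy univariate data: two well-separated components, 200 points each.
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
y_true = np.repeat([0, 1], 200)
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D array

# EM with many random initializations (n_init=10 is an assumed value).
gmm_random = GaussianMixture(
    n_components=2, init_params="random", n_init=10, random_state=0
)
# EM initialized from a K-Means fit.
gmm_kmeans = GaussianMixture(
    n_components=2, init_params="kmeans", random_state=0
)

ars_random = adjusted_rand_score(y_true, gmm_random.fit_predict(X))
ars_kmeans = adjusted_rand_score(y_true, gmm_kmeans.fit_predict(X))
```

With `n_init` greater than 1, `GaussianMixture` keeps the fit with the best lower bound on the log-likelihood, which is one common way to implement "many random initializations".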

In [None]:
# EM with K = 3
em3

## Results

The box plots show the distribution of ARS across the 500 runs for each method. The plots are interactive.

### K = 3

In [None]:
fig = go.Figure()
fig.add_trace(go.Box(y=gs3['gs_base_ars'], name="GS Base"))
fig.add_trace(go.Box(y=gs3['gs_burnin_ars'], name="GS Burn In"))
fig.add_trace(go.Box(y=gs3['gs_multi_ars'], name="GS Multi"))
fig.add_trace(go.Box(y=em3['gmm_mri_ars'], name="EM Multi Init"))
fig.add_trace(go.Box(y=em3['gmm_kmeans_ars'], name="EM K-Means Init"))
fig.add_trace(go.Box(y=em3['kmeans_ars'], name="Standard K-Means"))
fig.update_layout(title_text="K = 3", yaxis_title="Adjusted Rand Score")
fig.show()

### K = 6

In [None]:
fig = go.Figure()
fig.add_trace(go.Box(y=gs6['gs_base_ars'], name="GS Base"))
fig.add_trace(go.Box(y=gs6['gs_burnin_ars'], name="GS Burn In"))
fig.add_trace(go.Box(y=gs6['gs_multi_ars'], name="GS Multi"))
fig.add_trace(go.Box(y=em6['gmm_mri_ars'], name="EM Multi Init"))
fig.add_trace(go.Box(y=em6['gmm_kmeans_ars'], name="EM K-Means Init"))
fig.add_trace(go.Box(y=em6['kmeans_ars'], name="Standard K-Means"))
fig.update_layout(title_text="K = 6", yaxis_title="Adjusted Rand Score")
fig.show()

## Conclusions

The Gibbs Sampler with multiple initializations and both EM variants perform better than standard K-Means in terms of ARS.

Between GS and EM, GS with multiple initializations appears to perform best, by a slight margin over EM.
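One way to put numbers behind this visual comparison is to rank the methods by median ARS. The sketch below assumes the column names used in the plotting cells; `median_ars` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

def median_ars(gs, em):
    """Return the median ARS per method, highest first."""
    columns = {
        "GS Base": gs["gs_base_ars"],
        "GS Burn In": gs["gs_burnin_ars"],
        "GS Multi": gs["gs_multi_ars"],
        "EM Multi Init": em["gmm_mri_ars"],
        "EM K-Means Init": em["gmm_kmeans_ars"],
        "Standard K-Means": em["kmeans_ars"],
    }
    medians = pd.Series({name: col.median() for name, col in columns.items()})
    return medians.sort_values(ascending=False)
```

Calling `median_ars(gs3, em3)` and `median_ars(gs6, em6)` would give one ranking per value of K. The median is used here because box plots are centered on it, but the mean or a paired test across runs would be equally reasonable choices.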