
<h1 style="background: linear-gradient(to right, #49A, #0FB); color: white; padding: 20px;">Program 7 Implementation: EM vs. k-Means Clustering</h1>

<ol start="7">
    <li>Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same 
data set for clustering using the k-Means algorithm. Compare the results of these two 
algorithms and comment on the quality of clustering. You can use Python ML 
library classes/APIs in the program.</li>
</ol>
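
The task statement asks for data stored in a .CSV file, while the cells in this notebook load Iris straight from scikit-learn. A minimal sketch of the CSV route follows. The file name `iris.csv` and the `species` label column are assumptions made for illustration; the file is written to a temporary directory first so the example runs on its own.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_iris

# Write Iris to a temporary CSV so the example is self-contained.
# "iris.csv" and the "species" column name are hypothetical choices.
iris = load_iris(as_frame=True)
csv_path = os.path.join(tempfile.mkdtemp(), "iris.csv")
iris.frame.rename(columns={"target": "species"}).to_csv(csv_path, index=False)

# Read the CSV back and split it into features and true labels.
df = pd.read_csv(csv_path)
X = df.drop(columns=["species"]).to_numpy()  # 4 numeric feature columns
y = df["species"].to_numpy()                 # integer class labels

print(X.shape, y.shape)
```

With your own file, only the `pd.read_csv` path and the label column name should need to change; `X` can then be passed to `KMeans` and `GaussianMixture` in place of `iris.data`.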



In [1]:
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
data = iris.data  # Features
target = iris.target  # True labels

# Standardize the data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# Apply k-Means clustering
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans_labels = kmeans.fit_predict(data_scaled)

# Apply EM clustering (Gaussian Mixture Model)
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(data_scaled)

# Evaluate the clustering quality using Adjusted Rand Index (ARI)
kmeans_ari = adjusted_rand_score(target, kmeans_labels)
gmm_ari = adjusted_rand_score(target, gmm_labels)

# Print results
print("Adjusted Rand Index:")
print(f"k-Means: {kmeans_ari:.3f}")
print(f"Gaussian Mixture Model (EM): {gmm_ari:.3f}")

# Compare clustering quality
if kmeans_ari > gmm_ari:
    print("k-Means produced better clustering quality based on ARI.")
elif kmeans_ari < gmm_ari:
    print("Gaussian Mixture Model (EM) produced better clustering quality based on ARI.")
else:
    print("Both algorithms produced the same clustering quality based on ARI.")


Adjusted Rand Index:
k-Means: 0.433
Gaussian Mixture Model (EM): 0.516
Gaussian Mixture Model (EM) produced better clustering quality based on ARI.
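
Cluster ids are arbitrary: cluster 0 from k-Means need not correspond to class 0 in `target`. The Adjusted Rand Index is used here because it compares partitions rather than label values. The toy labels below are made up for illustration and are not taken from the Iris run.

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Same grouping as true_labels, but with the cluster ids renamed.
renamed = [2, 2, 2, 0, 0, 0, 1, 1, 1]

# One point from the second group is assigned to the third cluster.
one_error = [0, 0, 0, 1, 1, 2, 2, 2, 2]

ari_renamed = adjusted_rand_score(true_labels, renamed)
ari_one_error = adjusted_rand_score(true_labels, one_error)

print(f"renamed ids: {ari_renamed:.3f}")
print(f"one error:   {ari_one_error:.3f}")
```

A pure renaming of cluster ids scores a perfect 1.0, while a single misassigned point lowers the score. Plain accuracy would wrongly report the renamed labelling as entirely incorrect.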


In [2]:
import pandas as pd
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.datasets import load_iris

# Load the Iris dataset (features are left unscaled here, unlike In [1])
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
labels_true = data.target

# Apply Gaussian Mixture Model (EM Algorithm)
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(df)

# Apply k-Means Algorithm
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans_labels = kmeans.fit_predict(df)

# Evaluate the results
ari_gmm = adjusted_rand_score(labels_true, gmm_labels)
ari_kmeans = adjusted_rand_score(labels_true, kmeans_labels)

silhouette_gmm = silhouette_score(df, gmm_labels)
silhouette_kmeans = silhouette_score(df, kmeans_labels)

# Print results
print("Adjusted Rand Index (ARI):")
print(f"GMM: {ari_gmm:.4f}")
print(f"k-Means: {ari_kmeans:.4f}\n")

print("Silhouette Scores:")
print(f"GMM: {silhouette_gmm:.4f}")
print(f"k-Means: {silhouette_kmeans:.4f}\n")

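# Compare and comment (ties are reported explicitly)
if ari_gmm > ari_kmeans:
    print("GMM provides better clustering based on ARI.")
elif ari_gmm < ari_kmeans:
    print("k-Means provides better clustering based on ARI.")
else:
    print("GMM and k-Means score the same on ARI.")

if silhouette_gmm > silhouette_kmeans:
    print("GMM provides better clustering based on Silhouette Score.")
elif silhouette_gmm < silhouette_kmeans:
    print("k-Means provides better clustering based on Silhouette Score.")
else:
    print("GMM and k-Means score the same on Silhouette Score.")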


Adjusted Rand Index (ARI):
GMM: 0.9039
k-Means: 0.7163

Silhouette Scores:
GMM: 0.5012
k-Means: 0.5512

GMM provides better clustering based on ARI.
k-Means provides better clustering based on Silhouette Score.
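
The two metrics disagree because they measure different things. ARI checks agreement with the true species. The silhouette score rewards compact clusters that are well separated in Euclidean distance, which is close to what k-Means optimizes. To comment on quality beyond a single number, one option is a contingency table of species against cluster id. The sketch below reuses the settings from In [2]; the variable names are illustrative.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

iris = load_iris()
X = iris.data
# Map integer targets to species names for readable row labels.
species = pd.Series(iris.target_names[iris.target], name="species")

# Same model settings as In [2], fitted on the unscaled features.
gmm_labels = GaussianMixture(n_components=3, random_state=42).fit_predict(X)
kmeans_labels = KMeans(n_clusters=3, random_state=42).fit_predict(X)

# Rows are true species, columns are cluster ids, cells are sample counts.
gmm_table = pd.crosstab(species, pd.Series(gmm_labels, name="gmm_cluster"))
kmeans_table = pd.crosstab(species, pd.Series(kmeans_labels, name="kmeans_cluster"))

print(gmm_table, "\n")
print(kmeans_table)
```

Each row sums to 50, the number of samples per species. A row concentrated in one column means that species was recovered cleanly; a row spread across columns shows where the algorithm confused it with another species.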
