# Machine Learning – Clustering Algorithms


## Q1: Difference between K-Means and Hierarchical Clustering

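K-Means partitions data into a fixed number of clusters, k, by minimizing the within-cluster sum of squared distances to each centroid. Hierarchical Clustering builds a tree of nested clusters (a dendrogram) and does not require the number of clusters in advance; the tree can be cut at any level afterwards.

K-Means scales well to large datasets because each iteration only compares points to k centroids. Standard agglomerative Hierarchical Clustering works from pairwise distances, so its cost grows at least quadratically with the number of points; it is better suited to exploring the structure of small or medium datasets.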
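The contrast can be sketched with scikit-learn. This is a minimal illustration, not a benchmark: the synthetic data and the `distance_threshold=10.0` cut are assumptions chosen only to show that K-Means takes a cluster count while agglomerative clustering can instead be cut by distance.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic data, assumed here purely for illustration.
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# K-Means: the number of clusters must be given up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Agglomerative clustering: build the full merge tree, then cut it at a
# distance threshold instead of naming a cluster count.
agg = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0).fit(X)

print("K-Means clusters:", len(set(kmeans_labels)))
print("Agglomerative clusters at this threshold:", agg.n_clusters_)
print("Merge steps recorded in the tree:", agg.children_.shape[0])
```

Changing the threshold changes how many clusters the same tree yields, which is what makes the hierarchical approach convenient for exploration.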

## Q2: Silhouette Score

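The Silhouette Score measures how close a data point is to the other points in its own cluster compared with the points in the nearest neighboring cluster. It ranges from -1 to 1: values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest a point may have been assigned to the wrong cluster.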
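For a single point, the score is s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster. The sketch below computes it with scikit-learn; the blob dataset and K-Means settings are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic data, assumed here purely for illustration.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# One value per point, each in [-1, 1].
per_point = silhouette_samples(X, labels)

# The overall score is the mean of the per-point values.
score = silhouette_score(X, labels)

print("Mean silhouette score:", round(score, 3))
print("Lowest per-point value:", round(per_point.min(), 3))
```

Looking at the per-point values as well as the mean helps spot individual points that sit between clusters.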

## Q3: Core parameters of DBSCAN

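DBSCAN uses `eps` (the neighborhood radius) and `min_samples` (the minimum number of points, including the point itself, that must lie within `eps` for a point to count as a core point). Together they define how dense a region must be to form a cluster; points that are not reachable from any core point are labeled as noise (`-1`).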
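A tiny hand-built example makes the roles of the two parameters concrete. The coordinates and the values `eps=0.5`, `min_samples=3` are assumptions picked so the outcome is easy to verify by hand.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of four points each, plus one isolated point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [10.0, 0.0],
])

# Every grouped point has 4 points (itself included) within eps=0.5,
# which meets min_samples=3, so all of them are core points.
db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print(labels)                    # the isolated point gets the label -1
print(n_clusters, n_noise)       # 2 clusters, 1 noise point
```

Raising `min_samples` above 4 here would leave no core points at all, and every point would be reported as noise.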

## Q4: Importance of Feature Scaling in Clustering

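Most clustering algorithms, including K-Means, DBSCAN, and Hierarchical Clustering, rely on distance calculations. Without scaling, a feature with a large numeric range (for example, income in dollars) dominates the distance, and features with small ranges (for example, age in years) are effectively ignored. Feature scaling puts all features on a comparable scale so that each contributes to the distance.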
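The effect on distances can be checked directly. In this sketch the four customers and their income and age values are hypothetical, made up only to show the imbalance before and after `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [annual income in dollars, age in years].
X = np.array([
    [30000.0, 25.0],
    [32000.0, 60.0],
    [90000.0, 27.0],
    [88000.0, 58.0],
])

def dist(a, b):
    """Euclidean distance between two rows."""
    return float(np.linalg.norm(a - b))

# Raw distances: the income column swamps the age column.
raw_01 = dist(X[0], X[1])   # similar income, very different age
raw_02 = dist(X[0], X[2])   # very different income, similar age

# After standardization each column has mean 0 and unit variance.
X_scaled = StandardScaler().fit_transform(X)
scaled_01 = dist(X_scaled[0], X_scaled[1])
scaled_02 = dist(X_scaled[0], X_scaled[2])

print("raw:   ", round(raw_01, 1), round(raw_02, 1))
print("scaled:", round(scaled_01, 2), round(scaled_02, 2))
```

On the raw data the age gap barely registers; after scaling, a large age gap and a large income gap produce distances of similar size.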

## Q5: Elbow Method

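The Elbow Method plots inertia (the within-cluster sum of squared distances) against the number of clusters. Inertia generally falls as clusters are added, so the chosen number is the "elbow" of the curve: the point after which adding more clusters yields only small further reductions.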
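The curve can be computed by fitting K-Means for a range of k and recording `inertia_`. This is a minimal sketch: the blob dataset and the range 1 to 8 are assumptions, and the elbow is still read off by eye (or from the printed drops) rather than detected automatically.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data, assumed here purely for illustration.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Fit K-Means for each candidate k and record the inertia.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_

# Print the inertia and the drop relative to the previous k.
previous = None
for k, value in inertias.items():
    drop = "" if previous is None else f"(drop {previous - value:.1f})"
    print(k, round(value, 1), drop)
    previous = value
```

Passing `list(inertias.keys())` and `list(inertias.values())` to `plt.plot` gives the usual elbow plot.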

In [None]:
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Generate 300 points around 4 centers and fit K-Means with k=4.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
labels = kmeans.fit_predict(X)

# Color each point by its assigned cluster and mark the centroids.
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker='x', c='red')
plt.show()

In [None]:
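from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

# Standardize the 13 wine features so each contributes equally to distances.
wine = load_wine()
X = StandardScaler().fit_transform(wine.data)

# Note: eps=1.2 is likely too small for 13 standardized features,
# in which case every point is labeled noise (-1) and no cluster forms.
dbscan = DBSCAN(eps=1.2, min_samples=5)
labels = dbscan.fit_predict(X)

# Count distinct labels, excluding the noise label -1.
clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Number of clusters found:", clusters)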

Number of clusters found: 0


In [None]:
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

# Two interleaving half-moons: a non-convex shape that suits DBSCAN.
X, _ = make_moons(n_samples=200, noise=0.1, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

dbscan = DBSCAN(eps=0.3, min_samples=5)
labels = dbscan.fit_predict(X_scaled)

# Color each point by its DBSCAN label; noise points have label -1.
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=labels)
plt.show()

In [None]:
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

# X is not defined in this cell; it is reused from an earlier cell.
X_pca = PCA(n_components=2).fit_transform(X)
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_pca)

# Color each point by its agglomerative cluster label.
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels)
plt.show()

## Q10: Real-world customer segmentation workflow

In an e-commerce setting, clustering can segment customers based on purchasing behavior. Data preprocessing includes handling missing values, encoding categorical variables, and scaling features. K-Means is a reasonable choice when segments are expected to be compact and roughly spherical; DBSCAN is better suited to irregularly shaped segments and to data containing outliers, which it labels as noise. For K-Means, the number of clusters can be chosen with the Elbow Method or the Silhouette Score. Marketing teams can then use the segments for targeted campaigns and personalization.
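The workflow above can be sketched end to end with scikit-learn. Everything about the data here is hypothetical: the column names (`annual_spend`, `orders_per_year`, `preferred_channel`), the randomly generated values, and the candidate range of k are assumptions for illustration, and K-Means with silhouette-based selection is only one of the options the text mentions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical customer table with randomly generated values.
rng = np.random.default_rng(0)
n = 300
customers = pd.DataFrame({
    "annual_spend": rng.gamma(shape=2.0, scale=500.0, size=n),
    "orders_per_year": rng.poisson(lam=6, size=n).astype(float),
    "preferred_channel": rng.choice(["web", "app", "store"], size=n),
})
# Simulate missing values in one numeric column.
customers.loc[rng.choice(n, size=15, replace=False), "annual_spend"] = np.nan

numeric_cols = ["annual_spend", "orders_per_year"]
categorical_cols = ["preferred_channel"]

# Preprocessing: impute and scale numeric columns, one-hot encode categories.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer(
    transformers=[
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)
features = preprocess.fit_transform(customers)

# Choose k by the highest mean silhouette score over a small candidate range.
scores = {}
for k in range(2, 7):
    candidate = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    scores[k] = silhouette_score(features, candidate)
best_k = max(scores, key=scores.get)

# Fit the final model and attach the segment label to each customer.
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=0)
customers["segment"] = final_model.fit_predict(features)

print("Chosen k:", best_k)
print(customers.groupby("segment")[numeric_cols].mean().round(1))
```

The per-segment averages printed at the end are the kind of summary a marketing team would use to describe each segment; with real data, the segments should also be checked for business meaning rather than accepted on the score alone.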