NAME : ASMI JAIN

DIV : TY-3

ROLL NO : 25

BATCH : B

SUBJECT : DWM

EXPERIMENT : 4


**AIM** : Implementation of Clustering Algorithm (K-means / Agglomerative) using Python

**INTRODUCTION** : Clustering is an unsupervised learning technique used to group similar data points.

K-means: A centroid-based clustering algorithm that partitions data into K clusters by minimizing intra-cluster variance.
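
The quantity that K-means minimizes can be written out explicitly. The notation here is introduced for illustration and does not appear in the original write-up: $C_1, \dots, C_K$ are the clusters and $\mu_k$ is the mean (centroid) of cluster $C_k$. The within-cluster sum of squares, often called inertia, is then:

$$
J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2
$$

Each assignment step and each centroid update can only lower or preserve $J$, which is why the iteration eventually stops.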

Agglomerative Clustering: A bottom-up hierarchical clustering approach that starts with each data point as its own cluster and repeatedly merges the two closest clusters according to a linkage criterion.
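
The merge loop can be sketched in plain Python for 1-D data. This is a minimal illustration, not part of the original experiment: it assumes single linkage (the distance between two clusters is the smallest gap between any pair of their members), and the function names `single_linkage` and `agglomerative` are hypothetical.

```python
def single_linkage(a, b):
    """Distance between two clusters: smallest gap between any pair of members."""
    return min(abs(x - y) for x in a for y in b)


def agglomerative(points, n_clusters):
    """Bottom-up clustering of 1-D points until n_clusters remain."""
    clusters = [[p] for p in points]  # start with every point as its own cluster
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


result = agglomerative([2, 4, 9, 1, 5], n_clusters=2)
print(sorted(sorted(c) for c in result))  # [[1, 2, 4, 5], [9]]
```

On the five sample points used later in this report, single linkage isolates 9 and groups the rest, which differs from the K-means split. A different linkage criterion (complete, average, Ward) may produce a different grouping.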

**PROCEDURE:**
1. Import necessary libraries.
2. Load and preprocess the dataset.
3. Apply K-means clustering.
4. Apply Agglomerative clustering.
5. Visualize and analyze the results.
6. Evaluate clustering performance using metrics like silhouette score.

import numpy as np

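def kmean(data):
    """Cluster 1-D data into two groups with K-means (K = 2)."""
    n = len(data)
    if n < 2:
        # Returning None here would break the four-value unpacking at the call site.
        raise ValueError("Not enough data points for k-means clustering.")

    # Use the first two data points as the initial centroids.
    m1 = data[0]
    m2 = data[1]

    while True:
        c1 = []
        c2 = []
        # Assignment step: each point joins the cluster with the nearer centroid.
        for i in data:
            if abs(m1 - i) <= abs(m2 - i):
                c1.append(i)
            else:
                c2.append(i)

        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        new_m1 = np.mean(c1) if c1 else m1
        new_m2 = np.mean(c2) if c2 else m2

        # Converged once neither centroid moves.
        if new_m1 == m1 and new_m2 == m2:
            break

        m1 = new_m1
        m2 = new_m2

    return c1, c2, m1, m2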


n = int(input("Enter the number of data points: "))
data = []
print("Enter the data points:")
for _ in range(n):
    data.append(int(input()))

c1, c2, m1, m2 = kmean(data)

print("Cluster 1:", c1)
print("Mean of Cluster 1:", m1)
print("Cluster 2:", c2)
print("Mean of Cluster 2:", m2)

Enter the number of data points: 5
Enter the data points:
2
4
9
1
5
Cluster 1: [2, 1]
Mean of Cluster 1: 1.5
Cluster 2: [4, 9, 5]
Mean of Cluster 2: 6.0


**CONCLUSION** :

K-means is efficient for large datasets but requires the number of clusters, K, to be specified in advance.
Agglomerative Clustering provides a hierarchical structure but is computationally expensive for large datasets.
The choice of clustering technique depends on dataset size, shape, and required interpretability.

**REVIEW QUESTIONS**

---



1. What is the K-means clustering algorithm, and how does it work?

  Ans: K-means is an unsupervised clustering algorithm that partitions data into K clusters. It initializes K centroids, assigns each point to the nearest centroid, recalculates each centroid as the mean of its assigned points, and repeats the assignment and update steps until the centroids stop changing. Each iteration reduces the intra-cluster variance, so the algorithm converges to a local minimum of that objective.

---



2. How do you determine the optimal number of clusters in K-means?

  Ans: The optimal number of clusters (K) is found using:

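  Elbow Method: Plot inertia (the within-cluster sum of squared distances) against K; the “elbow point”, where adding more clusters stops reducing inertia sharply, is taken as the optimal K.

  Silhouette Score: Measures how well each point fits its own cluster compared with the nearest other cluster. Values range from -1 to 1, and higher values indicate better clustering.

  Gap Statistic: Compares the within-cluster dispersion with the dispersion expected under a reference distribution, typically uniform random data.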
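
The elbow method can be illustrated on the five sample points from the experiment. This sketch does not run K-means iterations. It relies on an assumption that holds for 1-D data: the best clusters are contiguous runs of the sorted points, so for a small dataset the lowest inertia for each K can be found by trying every set of cut positions. The names `sse` and `best_inertia` are hypothetical, and K must lie between 1 and the number of points.

```python
from itertools import combinations


def sse(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)


def best_inertia(points, k):
    """Lowest inertia over all splits of the sorted points into k contiguous groups."""
    pts = sorted(points)
    best = None
    # Choose k - 1 cut positions between consecutive sorted points.
    for cuts in combinations(range(1, len(pts)), k - 1):
        bounds = (0, *cuts, len(pts))
        total = sum(sse(pts[lo:hi]) for lo, hi in zip(bounds, bounds[1:]))
        if best is None or total < best:
            best = total
    return best


data = [2, 4, 9, 1, 5]
for k in range(1, 5):
    print(k, round(best_inertia(data, k), 2))
```

Plotting the printed pairs gives the elbow curve. For this data the inertia drops steeply up to K = 3 and flattens afterwards.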


---
3. What are the common distance metrics used in Agglomerative Clustering?

  Ans:
  Euclidean Distance (most common)

  Manhattan Distance (L1 norm)

  Cosine Distance (1 - cosine similarity; suited to high-dimensional data)

  Mahalanobis Distance (accounts for correlations)

  Hamming Distance (for categorical data)
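
Four of these metrics are short enough to write out directly. The sketch below uses only the standard library, and the function names are chosen here for illustration. Mahalanobis distance is left out because it needs the inverse covariance matrix of the data. The cosine version assumes neither vector is all zeros.

```python
import math


def euclidean(u, v):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of absolute coordinate differences (L1 norm)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def hamming(u, v):
    """Number of positions at which the two sequences differ."""
    return sum(a != b for a, b in zip(u, v))


u, v = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(euclidean(u, v))  # 5.0
print(manhattan(u, v))  # 7.0
print(round(cosine_distance(u, v), 4))
print(hamming(("red", "S", "yes"), ("red", "M", "yes")))  # 1
```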

---



GITHUB LINK : https://github.com/asmi-04/DWM-ASMI-25.git