# K-Medoids Clustering Algorithm
K-Medoids is a partitioning-based clustering algorithm similar to K-Means. However, instead of using the mean of the data points in a cluster as the cluster center, K-Medoids selects one of the actual data points as the center (called the **medoid**). This makes the algorithm more robust to noise and outliers.

## Why K-Medoids?
K-Medoids is a good choice when you need:
- **Robustness to Outliers**: A medoid must be an actual data point, so a few extreme values cannot drag a cluster center away from the bulk of the data the way they drag a mean.
- **Flexibility with Distance Metrics**: K-Medoids only needs pairwise dissimilarities between points, so it supports metrics such as Manhattan, cosine, or Hamming distance.

---

## How K-Medoids Solves K-Means Problems

### 1. Outlier Sensitivity
- **K-Means**: Centroids can be distorted by outliers, making clusters less accurate.
- **K-Medoids**: Uses medoids (actual data points) as cluster centers, reducing the impact of outliers.
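
The difference is easy to see on a tiny one-dimensional sample. The sketch below uses made-up values (an assumption for illustration, not data from this document) and compares the mean with the medoid when one extreme value is present.

```python
import numpy as np

# Hypothetical 1-D sample: four values near 2 plus one extreme outlier.
values = np.array([1.0, 2.0, 2.0, 3.0, 100.0])

# The mean (the K-Means style center) is pulled toward the outlier.
mean_center = values.mean()

# The medoid is the sample value with the smallest total distance to all others.
pairwise = np.abs(values[:, None] - values[None, :])
medoid_center = values[np.argmin(pairwise.sum(axis=1))]

print(mean_center)    # 21.6
print(medoid_center)  # 2.0
```

The mean lands far from every typical value, while the medoid stays inside the main group.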

### 2. Distance Metrics
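- **K-Means**: Minimizes squared Euclidean distance; the mean is the optimal cluster center only under that objective.
- **K-Medoids**: Works with any dissimilarity measure, because it only needs pairwise distances between data points. This makes it adaptable to data where an average is not meaningful, such as categorical records.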
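
As a rough illustration, the sketch below finds the medoid of a few categorical records under Hamming distance, where averaging the category codes would not be meaningful. The `records` array is hypothetical and only serves the example.

```python
import numpy as np

# Hypothetical categorical records encoded as integer codes (one row per item).
records = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
])

# Hamming distance: number of positions in which two records differ.
hamming = (records[:, None, :] != records[None, :, :]).sum(axis=2)

# The medoid is the record with the smallest total Hamming distance to the rest.
medoid_index = int(np.argmin(hamming.sum(axis=1)))
print(medoid_index, records[medoid_index])  # 0 [0 1 1 0]
```

Only the distance matrix changes when the metric changes; the medoid selection itself stays the same.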

### 3. Interpretability
- **K-Means**: Produces centroids that are averages and usually do not correspond to any real data point.
- **K-Medoids**: Medoids are actual data points, which makes clusters easier to interpret.

---

## Algorithm Steps

### 1. Initialization
Randomly select \( k \) points from the dataset as initial medoids.

### 2. Assignment
Assign each data point to the cluster of the nearest medoid:

$$
\text{Cluster}(x_i) = \arg \min_{j \in \{1, \dots, k\}} d(x_i, m_j)
$$

Where:
- \( d(x_i, m_j) \) is the distance between a data point \( x_i \) and a medoid \( m_j \).
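
The assignment rule can be sketched in vectorized form. This is a minimal example assuming Manhattan distance; the helper name `assign_to_medoids` and the sample arrays are illustrative choices, not part of the algorithm definition.

```python
import numpy as np

def assign_to_medoids(points, medoids):
    """Return, for each point, the index of its nearest medoid (Manhattan distance)."""
    # distances[i, j] = d(x_i, m_j), computed by broadcasting over a new axis.
    distances = np.abs(points[:, None, :] - medoids[None, :, :]).sum(axis=2)
    return np.argmin(distances, axis=1)

points = np.array([[0, 1], [1, 3], [4, 7], [5, 8], [8, 10]])
medoids = np.array([[1, 3], [5, 8]])
labels = assign_to_medoids(points, medoids)
print(labels)  # [0 0 1 1 1]
```

When two medoids are equally close, `np.argmin` picks the one with the lower index.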

### 3. Update
For each cluster, choose as the new medoid the point in the cluster whose total distance to all other points in that cluster is smallest:

$$
m_j = \arg \min_{x \in C_j} \sum_{x_i \in C_j} d(x, x_i)
$$
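
For a single cluster, the update can be sketched as follows. The helper `update_medoid` is a hypothetical name, and Manhattan distance is assumed.

```python
import numpy as np

def update_medoid(cluster_points):
    """Return the cluster member with the smallest total Manhattan distance to the others."""
    # pairwise[a, b] = d(x_a, x_b) for every pair of cluster members.
    pairwise = np.abs(cluster_points[:, None, :] - cluster_points[None, :, :]).sum(axis=2)
    # cost[a] = total distance from candidate a to all cluster members.
    cost = pairwise.sum(axis=1)
    return cluster_points[np.argmin(cost)], cost

cluster = np.array([[7, 9], [8, 10], [9, 12]])
medoid, cost = update_medoid(cluster)
print(cost)    # [7 5 8]
print(medoid)  # [ 8 10]
```

Building the full pairwise matrix costs time and memory quadratic in the cluster size, which is the main reason this step is slower than computing a mean.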

### 4. Stopping Criterion
The algorithm stops as soon as either condition holds:
1. The medoids do not change between two consecutive iterations (convergence).
2. A maximum number of iterations is reached.

---

## Mathematical Expressions

### Distance Calculation
For a given distance metric (e.g., Manhattan distance):

$$
d(x, y) = \sum_{i=1}^{n} |x_i - y_i|
$$
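
As a quick worked check, take two points from the example dataset used later in this document, \( x = (1, 3) \) and \( y = (3, 5) \):

$$
d(x, y) = |1 - 3| + |3 - 5| = 2 + 2 = 4
$$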

### Total Distance Within a Cluster
For a cluster \( C_j \), the total distance is:

$$
\text{Total Distance} = \sum_{x_i \in C_j} d(m_j, x_i)
$$
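
A small sketch of this quantity, assuming Manhattan distance. The function name `total_cost` is illustrative. It sums over every point, so with a single cluster it reduces to the formula above.

```python
import numpy as np

def total_cost(points, medoids, labels):
    """Sum of Manhattan distances from each point to the medoid of its cluster."""
    # medoids[labels] lines up each point with the medoid it is assigned to.
    return int(np.abs(points - medoids[labels]).sum())

points = np.array([[0, 1], [1, 3], [2, 2], [3, 5]])
medoids = np.array([[1, 3]])
labels = np.zeros(len(points), dtype=int)  # every point belongs to cluster 0
print(total_cost(points, medoids, labels))  # 9
```

Tracking this value across iterations is one way to check that the clustering is improving.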

### Medoid Update
The new medoid \( m_j \) minimizes the total distance:

$$
m_j = \arg \min_{x \in C_j} \sum_{x_i \in C_j} d(x, x_i)
$$

---

## Advantages of K-Medoids
1. **Robustness**: Handles outliers better than K-Means.
2. **Flexibility**: Works with any distance metric.
3. **Interpretability**: Uses actual data points as cluster centers.


In [13]:
import numpy as np
np.random.seed(42)


In [14]:
# Function to calculate Manhattan distance
def manhattan_distance(p1, p2):
    return np.sum(np.abs(p1 - p2))

In [15]:

# K-Medoids implementation
def k_medoids(points, k, max_iters=100):
    # Step 1: Initialize medoids by sampling k distinct points
    medoids = points[np.random.choice(len(points), k, replace=False)]

    for _ in range(max_iters):
        # Step 2: Assign points to the nearest medoid
        clusters = {i: [] for i in range(k)}
        for point in points:
            distances = [manhattan_distance(point, medoid) for medoid in medoids]
            nearest_medoid = np.argmin(distances)
            clusters[nearest_medoid].append(point)

        # Step 3: Update medoids
        new_medoids = []
        for i, cluster_points in clusters.items():
            if not cluster_points:
                # An empty cluster (possible when the data contains duplicate points)
                # keeps its old medoid instead of crashing np.argmin on an empty list
                new_medoids.append(medoids[i])
                continue
            cluster_points = np.array(cluster_points)
            # Select the point that minimizes the sum of distances to the rest of the cluster
            costs = [sum(manhattan_distance(p, candidate) for p in cluster_points) for candidate in cluster_points]
            new_medoids.append(cluster_points[np.argmin(costs)])
        new_medoids = np.array(new_medoids)

        # Check for convergence
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    # Note: if max_iters is reached without convergence, clusters reflect the previous medoids
    return clusters, medoids

In [16]:
# Example dataset
data_points = np.array([[0, 1], [1, 3], [2, 2], [3, 5], [4, 7], [5, 8], [6, 8], [7, 9], [8, 10], [9, 12]])
num_clusters = 3

In [17]:

# Run K-Medoids
clusters, medoids = k_medoids(data_points, num_clusters)

In [18]:

# Print results
for i, cluster in clusters.items():
    print(f"Cluster {i+1}:")
    for point in cluster:
        print(f"  {point}")
    print(f"Medoid: {medoids[i]}")
    print()


Cluster 1:
  [7 9]
  [ 8 10]
  [ 9 12]
Medoid: [ 8 10]

Cluster 2:
  [0 1]
  [1 3]
  [2 2]
  [3 5]
Medoid: [1 3]

Cluster 3:
  [4 7]
  [5 8]
  [6 8]
Medoid: [5 8]



In [20]:
import numpy as np
np.random.seed(42)

from sklearn.metrics import pairwise_distances
from sklearn.utils import shuffle

def k_medoids(data, n_clusters, max_iter=100):
    # Initialize medoids randomly
    medoid_indices = shuffle(np.arange(data.shape[0]))[:n_clusters]
    medoids = data[medoid_indices]
    
    # Loop until convergence or max iterations
    for iteration in range(max_iter):
        # Step 1: Assign points to the closest medoid
        distances = pairwise_distances(data, medoids, metric='euclidean')
        labels = np.argmin(distances, axis=1)

        # Step 2: Update medoids to minimize total cost
        new_medoids = []
        for cluster in range(n_clusters):
            cluster_points = data[labels == cluster]
            if len(cluster_points) > 0:
                # Cost of each cluster point as a candidate medoid
                cost = pairwise_distances(cluster_points, cluster_points).sum(axis=1)
                new_medoids.append(cluster_points[np.argmin(cost)])
            else:
                # Keep the old medoid so the medoid array stays aligned with the labels
                new_medoids.append(medoids[cluster])

        # Check for convergence
        new_medoids = np.array(new_medoids)
        if np.array_equal(new_medoids, medoids):
            break

        medoids = new_medoids

    return medoids, labels

# Example dataset
data = np.array([[0, 1], [1, 3], [2, 2], [3, 5], [4, 7], [5, 8], [6, 8], [7, 9], [8, 10], [9, 12]])

# Perform K-Medoids clustering
n_clusters = 3
medoids, labels = k_medoids(data, n_clusters)

# Print the results
for i in range(n_clusters):
    cluster_points = data[labels == i]
    print(f"Cluster {i + 1}:")
    for point in cluster_points:
        print(f"  {point}")
    print(f"Medoid: {medoids[i]}")
    print()


Cluster 1:
  [7 9]
  [ 8 10]
  [ 9 12]
Medoid: [ 8 10]

Cluster 2:
  [0 1]
  [1 3]
  [2 2]
  [3 5]
Medoid: [1 3]

Cluster 3:
  [4 7]
  [5 8]
  [6 8]
Medoid: [5 8]

