Conversation

@MeKaustubh07
Contributor

K-Medoids (PAM) Algorithm – Summary Review

Overview

K-Medoids (Partitioning Around Medoids, PAM) clusters data into k groups using actual data points (medoids) as centers. Compared with K-Means, it is more robust to outliers, and its centers are easier to interpret because each one is a real observation.


Algorithm Summary

  1. Initialize: Randomly choose k data points as medoids
  2. Assign: Each point → nearest medoid
  3. Update: Replace medoid with point minimizing total dissimilarity
  4. Repeat steps 2 and 3 until the medoids no longer change
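
The four steps above can be sketched as follows. This is a minimal illustration in plain Python, not the R6 code from this PR: the names `k_medoids`, `assign`, `init`, `max_iter`, and `seed`, and the Manhattan default, are assumptions made for the example. The update step here searches only within each cluster, a common simplification; the classic PAM swap phase considers every non-medoid point as a candidate.

```python
import random


def manhattan(a, b):
    """L1 distance between two equal-length numeric tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids, dist):
    """Step 2: label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, dist=manhattan, init=None, max_iter=100, seed=0):
    """Return (medoid indices, labels). All names here are illustrative."""
    rng = random.Random(seed)
    # Step 1: random distinct indices, unless the caller supplies a start.
    medoids = list(init) if init is not None else rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # possible with duplicate points; keep old medoid
                new_medoids.append(medoids[c])
                continue
            # Step 3: member with the smallest summed distance to its cluster.
            new_medoids.append(
                min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members))
            )
        if new_medoids == medoids:  # Step 4: medoids unchanged, so stop
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(pts, k=2, init=[0, 3])
print(medoids, labels)  # [0, 3] [0, 0, 0, 1, 1, 1]
```

With a random start instead of `init`, this alternating scheme can settle in a poor local optimum, which is one reason better seeding and multiple runs are worth having.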

Key Traits:

  • Uses real points (medoids)
  • Works with any distance metric
  • Resistant to outliers
  • Easy to interpret

Strengths ✅

  • Well-structured R6 OOP design
  • Handles edge cases, validation, and convergence
  • Supports multiple distance metrics
  • Helpful methods: fit(), predict(), silhouette_score(), get_medoids()
  • Excellent documentation and examples
  • Includes quality metrics (silhouette, inertia)
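
For reference, the two quality metrics mentioned above can be computed roughly as follows. This is a hedged sketch in plain Python rather than the PR's R6 methods; the function names are hypothetical, and "inertia" is assumed here to mean the summed (unsquared) distance from each point to its medoid, which may differ from the PR's definition.

```python
def euclidean(a, b):
    """L2 distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def silhouette_score(points, labels, dist=euclidean):
    """Mean silhouette over all points; always lies in [-1, 1]."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:  # guard against all-identical points
            total += (b - a) / max(a, b)
    return total / len(points)


def total_cost(points, medoids, labels, dist=euclidean):
    """Assumed 'inertia': summed distance from each point to its medoid."""
    return sum(dist(p, points[medoids[lab]]) for p, lab in zip(points, labels))


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(pts, [0, 0, 1, 1])  # well-separated: close to 1
bad = silhouette_score(pts, [0, 1, 0, 1])   # mixed clusters: negative
cost = total_cost(pts, [0, 2], [0, 0, 1, 1])
```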

Improvements 🔧

  • Optimize the O(n²) distance matrix computation and storage
  • Add smarter initialization (k-means++-style seeding)
  • Enable parallel processing
  • Add early stopping & multi-run options
  • Add visualization (plot_clusters())
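
As one possible shape for the smarter-initialization item, here is a sketch of k-means++-style seeding adapted to medoids: the first medoid is drawn uniformly, and each later one is drawn with probability proportional to its squared distance from the nearest medoid chosen so far. The function `seed_medoids` and its signature are hypothetical and are not part of the PR.

```python
import random


def manhattan(a, b):
    """L1 distance between two equal-length numeric tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def seed_medoids(points, k, dist=manhattan, seed=0):
    """Pick k distinct point indices, favouring points far from earlier picks."""
    rng = random.Random(seed)
    medoids = [rng.randrange(len(points))]
    # Distance from every point to its nearest chosen medoid so far.
    nearest = [dist(p, points[medoids[0]]) for p in points]
    while len(medoids) < k:
        weights = [d * d for d in nearest]
        if sum(weights) > 0:
            # Already-chosen points have weight 0, so they are not redrawn.
            nxt = rng.choices(range(len(points)), weights=weights)[0]
        else:
            # All remaining points coincide with a medoid: take any unused index.
            nxt = next(i for i in range(len(points)) if i not in medoids)
        medoids.append(nxt)
        nearest = [min(d, dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return medoids


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
start = seed_medoids(pts, k=2)
```

Because far-away points get much larger weights, the two seeds usually land in different groups, though this is probabilistic rather than guaranteed.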

Complexity

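  • Time: O(n² × d + i × k × (n-k)²), where n is the number of points, d the number of dimensions, k the number of clusters, and i the number of iterations
  • Space: O(n²) for the pairwise distance matrix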
  • Scalability: Best for small–medium datasets; use CLARA for large ones
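
To make the quadratic terms concrete, the sketch below builds a full pairwise distance matrix and counts the distance evaluations. It is an illustration in plain Python with hypothetical names, not the PR's code: n points need n(n-1)/2 evaluations, each costing O(d), and the stored matrix has n² entries.

```python
def euclidean(a, b):
    """L2 distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def distance_matrix(points, dist=euclidean):
    """Return (matrix, number of distance evaluations performed)."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]  # O(n^2) space
    calls = 0
    for i in range(n):
        for j in range(i + 1, n):  # symmetry: compute each pair once
            matrix[i][j] = matrix[j][i] = dist(points[i], points[j])
            calls += 1
    return matrix, calls


pts = [(0, 0), (3, 4), (6, 8), (0, 5)]
D, calls = distance_matrix(pts)
print(calls)    # 6, i.e. 4 * 3 / 2
print(D[0][1])  # 5.0
```

Doubling n roughly quadruples both the work and the memory, which is why sampling-based variants such as CLARA are suggested for large datasets.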

K-Medoids vs K-Means

| Aspect | K-Medoids | K-Means |
|---|---|---|
| Centers | Real data points | Means |
| Outlier Sensitivity | Low | High |
| Distance Metrics | Any | Euclidean |
| Interpretability | High | Moderate |
| Speed | Slower | Faster |
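
A small one-dimensional example shows the outlier-sensitivity difference. This is an illustrative sketch with made-up data and hypothetical helper names: the mean is dragged toward the outlier, while the medoid stays on a typical data point.

```python
def mean(values):
    """Arithmetic mean, the kind of center K-Means uses."""
    return sum(values) / len(values)


def medoid(values):
    """The data point with the smallest summed absolute distance to all others."""
    return min(values, key=lambda v: sum(abs(v - u) for u in values))


data = [1, 2, 3, 4, 100]  # 100 is an outlier
center_mean = mean(data)
center_medoid = medoid(data)
print(center_mean)    # 22.0
print(center_medoid)  # 3
```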

Use Cases

✅ Customer segmentation
✅ Document clustering
✅ Noisy or mixed data
❌ Avoid for huge or high-dimensional datasets


Testing

  • Verify clustering correctness and silhouette range
  • Check Euclidean & Manhattan distances
  • Edge cases: k=1, identical points, high-dim data

Verdict ⭐⭐⭐⭐⭐

Excellent, production-ready implementation — clean, robust, well-documented, and ideal for educational or research use.
Next steps: add optimization, visualization, and CLARA variant for scalability.

Copilot AI review requested due to automatic review settings October 18, 2025 16:46
Copilot AI left a comment

Pull Request Overview

This PR introduces a complete implementation of the K-Medoids (PAM) clustering algorithm in R using R6 object-oriented design. K-Medoids is a robust alternative to K-Means that uses actual data points as cluster centers, making it more resistant to outliers.

Key changes:

  • Full K-Medoids implementation with support for Euclidean and Manhattan distance metrics
  • Comprehensive API including fit(), predict(), silhouette_score(), and helper methods
  • Extensive documentation with roxygen2 comments and four detailed usage examples

@MeKaustubh07 MeKaustubh07 requested a review from Copilot October 20, 2025 06:22
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

@MeKaustubh07 MeKaustubh07 requested a review from Copilot October 20, 2025 06:23
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

@siriak siriak merged commit 95da54d into TheAlgorithms:master Oct 25, 2025
2 checks passed

2 participants