Skip to content

K-medoids is a clustering algorithm related to K-means. In contrast to the K-means algorithm, K-medoids chooses datapoints as centers of the clusters.There are 2 Initialization,Assign and Update methods implemented, so there can be 8 combinations to achive the best results in a given dataset. Also the Clara algorithm is implemented

Notifications You must be signed in to change notification settings

billDrett/K-Medoids-Clustering

Repository files navigation

K-medoids Clustering Algorithm

Introduction

K-medoids is a clustering algorithm related to K-means. In contrast to the K-means algorithm, K-medoids chooses datapoints as centers of the clusters. There are eight combinations of Initialization, Assignment and Update algorithms to achieve the best results in the given dataset. Also Clara algorithm approach is implemented.

Initialization

  • K-medoids++
  • Concentrate(Park-Jun, paper)
  • Assignment

  • PAM assignment(simplest approach)
  • Assignment by LSH/DBH: Reverse Approach
  • Update

  • Update a la Lloyd’s
  • CLARANS update
  • Clara Algorithm

    Silhouette value is used to see the quality of the clusters.

    Support

    The K-medoids algorithm supports:

  • Euclidean Vector Space
  • Cosine Vector Space
  • Hamming Metric Space
  • Distance Matrix Metric Space
  • See the datatypes file for the format of each metric space.

    Compile

    Use make command to compile and make clean to delete object files(there is a MakeFile).

    Run

    ./medoids –d <input file> –c <configuration file> -ο <output file> -complete
    -d <input file>: The input file name

    –c <configuration file>: configuration file name
    -ο <output file>: Output file name
    -complete: Output has the members of each cluster

    Functionality

    The input date are clustered in k clusters. The k-medoids clustering works like this. 1)Initialize centroids
    2)Assign the rest of the points to the centroids
    3)Update the centroids
    Go to 2 until no better clusters are found
    All initialization, assign and update methods are used and the results of the clusters are writed to the output file as with the silhouette value of each cluster

    K-medoids API

    void KMedoids<T>::run(InitializationType initial, AssignmentType Assign, UpdateType update, int s = 2)
    initial is ={InitializationPP, InitializationConcentrate}
    Assign is ={PamAssign, LSHAssign}
    update is={Lloyds,Clarans}
    void KMedoids<T>::clara(int s = 5)
    is the clara algorithm, and s is the number of iterations

    More information about the algorithms in the Papers

    Files

    Format of input file:
    Vector space
    @metric_space vector
    @metric {euclidean, cosine} //default: euclidean
    item_id1 x11 x12 ... x1d
    item_id2
    .
    .
    item_idN xN1 xN2 ... xNd

    Hamming
    @metric_space hamming
    item_id1 B1
    ....
    item_idN

    Distance Matrix
    @metric_space matrix
    item_id1 x11 x12 ... x1d
    item_id2
    .
    .
    item_idQ xQ1 xQ2 ... xQd

    Configuration file
    number_of_clusters: int // k value
    number_of_hash_functions: int //default 4
    number_of_hash_tables: int //default 5
    clarans_set_fraction: int //default max{0.12*k(N-k),250}, k is the number of clusters and N the number of the datapoints
    clarans_iterations: int //default 2

    About

    K-medoids is a clustering algorithm related to K-means. In contrast to the K-means algorithm, K-medoids chooses datapoints as centers of the clusters.There are 2 Initialization,Assign and Update methods implemented, so there can be 8 combinations to achive the best results in a given dataset. Also the Clara algorithm is implemented

    Resources

    Stars

    Watchers

    Forks

    Releases

    No releases published

    Packages

    No packages published