Skip to content

Clustering toy datasets using K-means algorithm and Spectral Clusting algorithm

License

Notifications You must be signed in to change notification settings

PJunhyuk/kmeans-and-spectral

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kmeans-and-spectral

Clustering toy datasets using K-means algorithm and Spectral Clustering algorithm

Usage

  1. Download Matlab functions in src folder, and toy dataset in toydata folder
  2. Open toydata file
  3. Set data, and set K which is suitable for it (and sigma for spectral_clust function)
  • Suggested sigma value is 1

Sample Usage

Clustering data_Aggregation using K-means algorithm
load('data_Aggregation.mat');
my_kmeans(D, 7);
Clustering data_TwoDiamonds using Spectral Clustering algorithm
load('data_TwoDiamonds.mat');
spectral_clust(D, 2, 1);
Show correct answer of data_Spiral using given label(L)
load('data_Spiral.mat');
visualize_result(D, L);

Results

You can check clustering results in form of 'label', and you can check visualized result by using 'visualize_result' function and 'visualize_centroids' function. It is also built in 'my_kmeans' function and 'spectral_clust' function.

And you can check cluster times, and time required on each implementation.

Sample Results

Result-set of data_Aggregation

Alt Text

Result-set of data_Spiral

Alt Text

Result-set of data_TwoDiamonds

Alt Text All data-set is in order: K-means, Spectral Clustering, and given label.

K-means mechanism

Alt Text

Spectral Clustering mechanism

  1. N : number of data
  2. d : dimension of data
  3. data = D - N by d matrix
  4. matrix_A : affinity matrix, A_ij = exp(- (data_i - data_j)^2 / (2*sigma^2) ) - N by N matrix
  5. matrix_D : diagonal matrix whose (i,i)-element is the sum of matrix_A's i-th row - N by N matrix
  6. matrix_L : (D^(-1/2)) * A * (D^(-1/2)) - N by N matrix
  7. matrix_X : union of k largest eigenvectors of matrix_L - N by K matrix
  8. matrix_L : renormalizing each of X's rows to have unit length - N by K matrix
  9. Set matrix_L to data, and run K-means algorithm, and save it's result on label.
  10. Visualize data using that label.

Code Structure

  • my_kmeans.m : Calculate label using k-mean algorithm.
  • initialize_centroids.m : Initliaze centroids by randomly pick K points.
  • make_clusters.m : Make clusters by finding nearest centroids of each point.
  • euclidean_distance.m : Calculate euclidean distance between p1, p2.
  • find_empty_cluster.m : Find empty cluster, and return it's index. If all clusters are non-empty, return 0.
  • re_set_that_centroid.m : Re-set centroid of empty_cluster_index. New centroid of empty_cluster_index is farthest data from current cluster.
  • set_centroids.m : Set new centroids of cluster, finding average of each clusters.
  • check_change_of_centroids.m : Check change of centroids. Return 0 if previous_centroids and centroids are same.
  • visualize_result.m : Plot graph of data.
  • visualize_centroids.m : Plot graph of centroids.
  • spectral_clust.m : Calculate label using the function of spectral clustering.
  • my_kmeans_no_visualize.m : alculate label using k-mean algorithm, with no data visualization - for spectral_clust function.

Reference

On Spectral Clustering: Analysis and an algorithm

License

MIT License

About

Clustering toy datasets using K-means algorithm and Spectral Clusting algorithm

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages