# **CLARANS Algorithm**

CLARANS (Clustering Large Applications based on RANdomized Search) is a partitioning method of clustering that is particularly useful in spatial data mining. By spatial data mining, we mean recognizing patterns and relationships in spatial data, such as distance-related, direction-related or topological data (e.g. data plotted on a road map).

The CLARA algorithm was introduced as an extension of K-Medoids. Instead of the entire dataset, it uses only random samples of the input data and computes the best medoids within those samples. It thus scales better than K-Medoids to large datasets. However, the algorithm may give poor clustering results if one or more of the sampled medoids are far from the actual best medoids. CLARANS builds on CLARA: rather than fixing a sample up front, it performs a randomized search over candidate sets of medoids drawn from the whole dataset.
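The sampling idea behind CLARA described above can be sketched in plain Python. This is only an illustration under simplifying assumptions: the helper names (`euclidean`, `total_cost`, `clara_sketch`) and the toy points are made up for this example, and an exhaustive search over each small sample stands in for the K-Medoids (PAM) step that CLARA would normally run.

```python
import itertools
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(euclidean(p, m) for m in medoids) for p in points)


def clara_sketch(points, k, n_samples=5, sample_size=6, seed=0):
    """Pick medoids from random samples, keep the set that is best on ALL points."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        # Simplification: brute-force the best k medoids inside the sample
        candidate = min(itertools.combinations(sample, k),
                        key=lambda meds: total_cost(sample, meds))
        # Judge the candidate medoids on the full dataset, not just the sample
        cost_on_all = total_cost(points, candidate)
        if cost_on_all < best_cost:
            best_medoids, best_cost = list(candidate), cost_on_all
    return best_medoids, best_cost


# Hypothetical toy data: two loose groups of 2D points
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.3, 0.3), (0.2, 0.4),
              (5.0, 5.0), (5.1, 5.2), (5.2, 4.9), (4.9, 5.1), (5.3, 5.3), (4.8, 4.8)]
clara_medoids, clara_cost = clara_sketch(toy_points, k=2)
print(clara_medoids, clara_cost)
```

Because the medoids can only come from the drawn samples, an unlucky set of samples limits how good the result can be, which is the weakness noted above.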

To learn more about it, please refer to the [Comprehensive Guide to CLARANS](https://analyticsindiamag.com/comprehensive-guide-to-clarans-clustering-algorithm/).

## **Practical Implementation**

Here’s a demonstration of using the CLARANS algorithm on the sklearn library’s [Breast Cancer Wisconsin dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-dataset). Though the dataset is primarily used for binary classification tasks, we use it to show how the CLARANS algorithm can group the data points, each of which falls under one of the two target categories (‘malignant’ or ‘benign’), into separate clusters. The pyclustering data mining library provides the Python implementation of CLARANS used here. A step-wise explanation of the code follows:

### **Install pyclustering library**

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels scikit-image pyclustering --user -q

In [None]:
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

### **Import required libraries and modules**

In [None]:
#Class for implementing CLARANS algorithm
from pyclustering.cluster.clarans import clarans
#To execute a function with execution time recorded
from pyclustering.utils import timedcall
#sklearn package for using a toy dataset
from sklearn import datasets
#Class for plotting multi-dimensional data
from pyclustering.cluster import cluster_visualizer_multidim 

### **Import the Breast Cancer dataset**

In [None]:
bc_dataset = datasets.load_breast_cancer()
#Display the dataset
bc_dataset 

### **Extract the data points from the loaded dataset**

In [None]:
#get the Breast Cancer data
bc_data = bc_dataset.data

In [None]:
bc_data

Convert the dataset from a numpy array to a list, because the pyclustering implementation of CLARANS takes a list of lists as its input.

In [None]:
bc_data = bc_data.tolist()

#Get a glimpse of the dataset: display the first five rows as a list of lists
print(bc_data[:5])

### **Instantiate the CLARANS class**

In [None]:
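#Arguments: data, number of clusters (2), numlocal (3 local minima to obtain), maxneighbor (at most 5 neighbors examined)
clarans_obj = clarans(bc_data, 2, 3, 5)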
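To give a feel for what the last two arguments control, here is a rough pure-Python sketch of the randomized search that CLARANS is based on. It is not pyclustering's implementation. It assumes the positional arguments in the call above are the number of clusters, `numlocal` and `maxneighbor`, in that order, and the helper names (`euclidean`, `medoid_cost`, `clarans_sketch`) and toy data are hypothetical.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def medoid_cost(data, medoid_idx):
    """Sum of distances from every point to its nearest medoid (given by index)."""
    return sum(min(euclidean(p, data[m]) for m in medoid_idx) for p in data)


def clarans_sketch(data, number_clusters, numlocal, maxneighbor, seed=0):
    rng = random.Random(seed)
    n = len(data)
    best, best_cost = None, float("inf")
    for _ in range(numlocal):                      # one local search per restart
        current = rng.sample(range(n), number_clusters)
        current_cost = medoid_cost(data, current)
        tried = 0
        while tried < maxneighbor:
            # A neighbor differs from the current solution in exactly one medoid
            neighbor = list(current)
            non_medoids = [i for i in range(n) if i not in current]
            neighbor[rng.randrange(number_clusters)] = rng.choice(non_medoids)
            neighbor_cost = medoid_cost(data, neighbor)
            if neighbor_cost < current_cost:
                current, current_cost = neighbor, neighbor_cost
                tried = 0                          # improvement: reset the counter
            else:
                tried += 1
        if current_cost < best_cost:               # keep the best local minimum
            best, best_cost = current, current_cost
    return best, best_cost


# Hypothetical toy data: two loose groups of 2D points
toy_data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
            [5.0, 5.0], [5.1, 5.2], [5.2, 4.9], [4.9, 5.1]]
search_medoids, search_cost = clarans_sketch(toy_data, 2, 3, 5)
print(search_medoids, search_cost)
```

In this sketch, a larger `numlocal` means more random restarts and a larger `maxneighbor` means a more thorough search around each candidate solution, both at the price of a longer running time.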

### **Analyze the clusters**

The process() method performs the cluster analysis as per the CLARANS algorithm. We wrap the call to process() in the timedcall() function so that the time taken to execute process() is recorded as well.

In [None]:
#calls the clarans method 'process' to run the algorithm
(tks, res) = timedcall(clarans_obj.process)
print("Execution time : ", tks, "\n")
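For readers who want to see what such a timing wrapper amounts to, the following is a minimal standard-library stand-in. It assumes, as the tuple unpacking in the cell above suggests, that the wrapper returns the elapsed time together with the function's result. The helper name `timed` is hypothetical and this is not pyclustering's own code.

```python
import time


def timed(function, *args):
    """Run function(*args) and return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = function(*args)
    elapsed = time.perf_counter() - start
    return elapsed, result


# Time a simple built-in call as a stand-in for clarans_obj.process
elapsed, total = timed(sum, range(1_000_000))
print("Execution time : ", elapsed)
```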

### **Get the clusters allocated by the algorithm**

In [None]:
#returns the clusters, each as a list of data point indices
clst = clarans_obj.get_clusters()

### **Get the list of medoids of the clusters allocated by the algorithm**

In [None]:
#returns the medoids
med = clarans_obj.get_medoids()

### **Print the results**

In [None]:
print("Index of clusters' points :\n",clst)
print("\nLabel class of each point :\n ",bc_dataset.target)
print("\nIndex of the best medoids : ",med)
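Comparing the printed index lists with the label array by eye is tedious. One simple way to summarize the agreement is cluster purity: the fraction of points whose label matches the majority label of their cluster. The sketch below uses hypothetical toy clusters and labels, and the helper name `cluster_purity` is made up for this example.

```python
from collections import Counter


def cluster_purity(clusters, labels):
    """Fraction of points whose label equals the majority label of their cluster."""
    matched = 0
    total = 0
    for cluster in clusters:
        if not cluster:
            continue  # ignore empty clusters
        counts = Counter(labels[i] for i in cluster)
        matched += counts.most_common(1)[0][1]  # size of the majority label group
        total += len(cluster)
    return matched / total


# Toy example: clusters hold point indices, labels hold the true classes
toy_clusters = [[0, 1, 2, 5], [3, 4]]
toy_labels = [0, 0, 0, 1, 1, 1]
purity = cluster_purity(toy_clusters, toy_labels)
print(purity)
```

Assuming clst holds lists of point indices as printed above, the same function could be applied to the notebook's results as cluster_purity(clst, bc_dataset.target).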

### **Visualize the Clusters**

Here, the input data has 30 features. The cluster_visualizer class of the pyclustering library can be used to visualize 1D, 2D or 3D data, while for data with more than three dimensions, the cluster_visualizer_multidim class can be used as follows:

In [None]:
from pyclustering.cluster import cluster_visualizer_multidim

In [None]:
vis = cluster_visualizer_multidim()
vis.append_clusters(clst, bc_data, marker="*", markersize=5)
#Plot only the selected pairs of feature indices, with at most two plots per row
vis.show(pair_filter=[[1,2],[1,3],[27,28],[27,29]], max_row_size=2)

# **Related Articles:**


> * [Comprehensive Guide to CLARANS](https://analyticsindiamag.com/comprehensive-guide-to-clarans-clustering-algorithm/)

> * [Comprehensive Guide to K-Medoids](https://analyticsindiamag.com/comprehensive-guide-to-k-medoids-clustering-algorithm/)

> * [Clustering Algorithm every Data Science Practitioner should know](https://analyticsindiamag.com/clustering-techniques-every-data-science-beginner-should-swear-by/)

> * [Beginner Guide to K-Means](https://analyticsindiamag.com/beginners-guide-to-k-means-clustering/)

> * [Is K-Means the best algorithm?](https://analyticsindiamag.com/is-k-means-clustering-really-the-best-unsupervised-learning-algorithm/)