OptimalCluster

Check out https://shreyas-bk.github.io/OptimalCluster/

Pip package: https://pypi.org/project/optimalcluster

OptimalCluster is the Python implementation of various algorithms to find the optimal number of clusters. The algorithms include elbow, elbow-k_factor, silhouette, gap statistics, gap statistics with standard error, and gap statistics without log. Various types of visualizations are also supported.

For references on the different algorithms, visit the following sites:

elbow : Elbow Method

elbow_kf : my own implementation, details to be added soon

silhouette : Silhouette Score

gap_stat : Paper | Python

gap_stat_wolog : Paper
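
As background for the silhouette entry above, here is a minimal sketch of the textbook mean silhouette score, using only the Python standard library. The function name `mean_silhouette` and the toy points are hypothetical, and the code is not taken from the package's implementation; it assumes Euclidean distance and at least two clusters.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette score over all points (hypothetical helper)."""
    # Group the points by cluster label
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # A point alone in its cluster scores 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other points of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two tight, well-separated pairs of points
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])  # about 0.90
bad = mean_silhouette(points, [0, 1, 0, 1])   # about -0.45
print(good, bad)
```

A score near 1 means points sit much closer to their own cluster than to the nearest other cluster, so a silhouette-based search would typically pick the number of clusters with the highest mean score.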

Installation

To install the OptimalCluster package from the Python Package Index (PyPI), run the following command:

pip install OptimalCluster

Documentation

Visit this link : Documentation

Example

Visit this link for an interactive Colab demo : Example

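import matplotlib.pyplot as plt
import pandas as pd
# make_blobs lives in sklearn.datasets; the samples_generator submodule
# no longer exists in current scikit-learn releases
from sklearn.datasets import make_blobs
from OptimalCluster.opticlust import Optimal
opt = Optimal({'max_iter':350})
plt.rcParams['figure.figsize'] = 6, 6
# Generate 1000 two-dimensional points around 3 centers and plot them
x, y = make_blobs(1000, n_features=2, centers=3)
plt.scatter(x[:, 0], x[:, 1])
plt.show()
df = pd.DataFrame(x, columns=['A', 'B'])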

opt.elbow(df)
Optimal number of clusters is:  3 
3
opt.elbow(df,display=True,visualize=True)

Optimal number of clusters is:  3 
3
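
To illustrate the idea behind the elbow calls above, the following sketch computes the within-cluster sum of squares (WCSS) for several values of k and picks the k where the curve falls furthest below the straight line joining its endpoints. This is one common way to locate an elbow, not necessarily the rule the package uses. The helper names (`kmeans_wcss`, `elbow_k`), the farthest-point initialization, and the toy data are all assumptions made for this example; only the standard library is used.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_wcss(points, k, n_iter=20):
    """Run a tiny Lloyd-style k-means and return the final WCSS."""
    # Deterministic farthest-point initialization (an assumption of this sketch)
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        # Assign every point to its nearest center
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def elbow_k(wcss):
    """Return the k (starting at 1) furthest below the chord between the endpoints."""
    n = len(wcss)
    first, last = wcss[0], wcss[-1]
    gaps = [first + (last - first) * i / (n - 1) - w for i, w in enumerate(wcss)]
    return gaps.index(max(gaps)) + 1

# Three unit squares of points placed far apart, so the true number of clusters is 3
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]

wcss = [kmeans_wcss(points, k) for k in range(1, 7)]
print(wcss)
print(elbow_k(wcss))  # 3
```

The WCSS drops sharply up to k = 3 and only marginally afterwards, which is the bend the elbow method looks for.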
x, y = make_blobs(1000, n_features=2, centers=5)
plt.scatter(x[:, 0], x[:, 1])
plt.show()
df = pd.DataFrame(x,columns=['A','B'])

opt.elbow(df,display=True,visualize=True,method='lin',sq_er=0.5)

Optimal number of clusters is:  5 
5
x, y = make_blobs(1000, n_features=3, centers=8)
plt.scatter(x[:, 0], x[:, 1])
plt.show()
df = pd.DataFrame(x,columns=['A','B','C'])

opt.elbow_kf(df,display=True,visualize=True)

Optimal number of clusters is:  7  with k_factor: 0.29 . Lesser k_factor may be due to overlapping clusters, try increasing the se_weight parameter to 2.0
7
opt.elbow_kf(df,se_weight=2.5)
Optimal number of clusters is:  8  with k_factor: 0.88 . 
8
opt.elbow_kf(df,se_weight=3)
Optimal number of clusters is:  8  with k_factor: 0.88 . 
8
opt.elbow_kf(df,se_weight=3.5)
Optimal number of clusters is:  8  with k_factor: 1.0 . 
8
x, y = make_blobs(1000, n_features=2, centers=10)
plt.scatter(x[:, 0], x[:, 1])
plt.show()
df = pd.DataFrame(x,columns=['A','B'])

opt.gap_stat(df,display=True,visualize=True)

10
x, y = make_blobs(1000, n_features=3, centers=12)
df = pd.DataFrame(x,columns=['A','B','C'])
opt.gap_stat_se(df,display=True,visualize=True,upper=20)

11
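
For reference, the gap statistic and its standard-error selection rule, as usually defined in the literature, can be written as follows. Here W_k is the pooled within-cluster dispersion for k clusters, W*_kb is the same quantity on the b-th of B reference datasets, and sd_k is the standard deviation of log W*_kb over the B reference datasets. That gap_stat and gap_stat_se correspond to these formulas is an assumption based on their names; the package may differ in details such as how the reference data is generated.

```latex
\begin{align}
W_k &= \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i, i' \in C_r} d_{ii'} \\
\mathrm{Gap}(k) &= \frac{1}{B} \sum_{b=1}^{B} \log W^{*}_{kb} - \log W_k \\
s_k &= \mathrm{sd}_k \sqrt{1 + \frac{1}{B}} \\
\hat{k} &= \min \left\{ k : \mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1} \right\}
\end{align}
```

Because the standard-error rule picks the smallest k that satisfies the inequality, it can return a slightly smaller value than the number of generated centers when clusters overlap.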
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
iris = load_iris()
# Combine the four features and the target column into one DataFrame
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])
# Run the gap statistic without log on a single feature
opt.gap_stat_wolog(df[['petal width (cm)']],display=True,visualize=True)

3
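
The "without log" variant is usually written by dropping the logarithm from the gap definition, so the dispersions are compared on their original scale. With W_k the within-cluster dispersion on the data and W*_kb the dispersion on the b-th of B reference datasets, a sketch of that definition is given below; whether gap_stat_wolog follows it exactly is an assumption.

```latex
\begin{equation}
\mathrm{Gap}^{*}(k) = \frac{1}{B} \sum_{b=1}^{B} W^{*}_{kb} - W_k
\end{equation}
```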

Contributions are welcome; please open a pull request.

TODO

  • Add an increment_step parameter to elbow_kf, with a default of 0.5
  • Add a verbose parameter to the methods
  • Add checks for the upper and lower parameters
