# cluster-search

> Package to optimise clustering using model and parameter search

## Install

You can install via pip with the following command:

```sh
pip install cluster-search
```

## How to use

Below is an example that uses `grid_search` to search over several clustering models and their parameters, ranking every combination by its silhouette score (higher is better) when clustering the iris dataset:
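For reference, the silhouette coefficient of a sample $i$ in cluster $C_I$ compares $a(i)$, its mean distance to the other members of its own cluster, with $b(i)$, its mean distance to the members of the nearest other cluster. This is the standard definition (the one scikit-learn's `silhouette_score` implements, with Euclidean $d$ by default). We assume, based on its name, that the `avg_silhouette_score` column reports the mean $\bar{s}$ over all $n$ samples.

$$
a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\, j \neq i} d(i, j), \qquad
b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j)
$$

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\bar{s} = \frac{1}{n} \sum_{i=1}^{n} s(i)
$$

Each $s(i)$ lies in $[-1, 1]$ (with $s(i) = 0$ by convention when $|C_I| = 1$), so values close to 1 indicate compact, well-separated clusters.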

```python
from sklearn import datasets
from cluster_search.cluster_grid_search import cluster, grid_search
```

```python
# Load the iris measurements as a pandas DataFrame (150 rows, 4 feature columns)
iris = datasets.load_iris(as_frame=True)
X = iris.data
X.head(5)
```

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


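```python
# Candidate models, taken from the `cluster` module imported above
cluster_models = [
    cluster.KMeans,
    cluster.AffinityPropagation
]

# One dict of candidate parameter values per model, matched by position to `cluster_models`
model_kwargs_list = [
    {"n_clusters": [2, 3, 4], "init": ["k-means++", "random"]},
    {"damping": [0.6, 0.7, 0.8]}
]
```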
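Each dict maps a parameter name to a list of candidate values, and the search tries every combination. The sketch below shows that expansion with scikit-learn's `ParameterGrid`; it is an assumption that `grid_search` expands the grids in exactly this way, but the resulting 6 + 3 = 9 combinations do match the nine rows of the results table.

```python
from sklearn.model_selection import ParameterGrid

model_kwargs_list = [
    {"n_clusters": [2, 3, 4], "init": ["k-means++", "random"]},
    {"damping": [0.6, 0.7, 0.8]},
]

# ParameterGrid sorts each dict's keys, then takes the Cartesian product of the value lists
expanded = [list(ParameterGrid(kwargs)) for kwargs in model_kwargs_list]

for combos in expanded:
    print(len(combos), combos[0])
# 6 {'init': 'k-means++', 'n_clusters': 2}
# 3 {'damping': 0.6}
```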

```python
grid_search(
    data_to_cluster=X,
    cluster_models=cluster_models,
    model_kwargs_list=model_kwargs_list,
    sort=True,  # order the rows by score, highest first (as in the output below)
    highlight=True
)
```


|   | cluster_model | model_params | avg_silhouette_score |
|---|---|---|---|
| 0 | KMeans | {'init': 'k-means++', 'n_clusters': 2} | 0.681046 |
| 3 | KMeans | {'init': 'random', 'n_clusters': 2} | 0.681046 |
| 1 | KMeans | {'init': 'k-means++', 'n_clusters': 3} | 0.552819 |
| 4 | KMeans | {'init': 'random', 'n_clusters': 3} | 0.552819 |
| 5 | KMeans | {'init': 'random', 'n_clusters': 4} | 0.498051 |
| 2 | KMeans | {'init': 'k-means++', 'n_clusters': 4} | 0.497455 |
| 7 | AffinityPropagation | {'damping': 0.7} | 0.474338 |
| 8 | AffinityPropagation | {'damping': 0.8} | 0.468801 |
| 6 | AffinityPropagation | {'damping': 0.6} | 0.345462 |
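To see roughly what a search like this involves, or to reproduce it without the package, here is a minimal sketch using only scikit-learn and pandas. `manual_cluster_grid_search` is a hypothetical helper, not part of cluster-search. The assumption that `grid_search` fits each combination and scores the labels with `silhouette_score` is ours. Combinations that yield fewer than two clusters (for example, when AffinityPropagation fails to converge) cannot be scored, so they get `NaN`. Scores for `init="random"` may differ slightly from the table between runs, because KMeans initialisation is random unless `random_state` is set.

```python
import numpy as np
import pandas as pd
from sklearn import cluster, datasets
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid


def manual_cluster_grid_search(X, cluster_models, model_kwargs_list):
    """Hypothetical helper: score every (model, params) combination by mean silhouette."""
    rows = []
    for model_cls, kwargs in zip(cluster_models, model_kwargs_list):
        for params in ParameterGrid(kwargs):
            labels = model_cls(**params).fit_predict(X)
            n_labels = len(np.unique(labels))
            # silhouette_score requires 2 <= n_labels <= n_samples - 1
            if 2 <= n_labels <= len(X) - 1:
                score = silhouette_score(X, labels)
            else:
                score = np.nan
            rows.append({
                "cluster_model": model_cls.__name__,
                "model_params": params,
                "avg_silhouette_score": score,
            })
    results = pd.DataFrame(rows)
    # Highest score first; unscorable (NaN) rows sort last by default
    return results.sort_values("avg_silhouette_score", ascending=False)


X = datasets.load_iris(as_frame=True).data
results = manual_cluster_grid_search(
    X,
    [cluster.KMeans, cluster.AffinityPropagation],
    [{"n_clusters": [2, 3, 4], "init": ["k-means++", "random"]},
     {"damping": [0.6, 0.7, 0.8]}],
)
print(results)
```

Keeping the original row index after sorting, as the table above also does, makes it easy to trace each row back to its position in the expanded grid.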
