# scikit-clarans examples

This notebook contains practical examples of using the `scikit-clarans` library, which implements the CLARANS clustering algorithm. It covers a quickstart, API examples, cookbook recipes, pandas and pipeline integration, concurrency, testing, profiling, and logging.

Table of contents

1. Installation & Import
2. Quickstart
3. Function examples
4. Classes & objects
5. Cookbook
6. Data handling & pandas
7. Async & Concurrency
8. Testing & Examples as tests
9. Error handling & edge cases
10. Performance & Profiling
11. Extending & Plugins
12. Real-world example (End-to-end)
13. Typing & Static checking
14. Debugging & Logging

> Note: Code cells are intended to run in a standard Python environment (numpy, scikit-learn, matplotlib). Optional dependencies are wrapped in try/except blocks.

## 1. Installation & Import

Install from the repository root using pip:

```
pip install .
```

Or install from PyPI:

```
pip install scikit-clarans
```

Below is a basic import check.

In [6]:
# Basic import test: confirm the package and its core dependencies are importable
try:
    import clarans
    from clarans import CLARANS
    print("clarans imported from:", clarans.__file__)
except ImportError as e:
    print("Import failed:", e)

try:
    import sklearn
    import numpy as np
    print("scikit-learn:", sklearn.__version__, "numpy:", np.__version__)
except ImportError as e:
    print("Dependency check failed:", e)

Import failed: No module named 'clarans'
scikit-learn: 1.6.1 numpy: 2.0.2
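
When the import fails as in the output above, a useful first check is whether the distribution is installed in the active environment at all. The following is a minimal sketch using only the standard library. The distribution name `scikit-clarans` is taken from the pip command above, and `installed_version` is a hypothetical helper, not part of the library.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical helper; "scikit-clarans" is the name used in the pip command
version = installed_version("scikit-clarans")
print("scikit-clarans:", version if version else "not installed in this environment")
```

If this reports "not installed", the notebook kernel is probably using a different interpreter than the one `pip` installed into.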


## 2. Quickstart

A short 'hello world' example that exercises the library's main functionality:
- Generate 2D sample data
- Fit the model
- Plot the result inline in the notebook

In [5]:
%matplotlib inline
import matplotlib.pyplot as plt
from clarans import CLARANS
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=4, n_features=2, random_state=42)
model = CLARANS(n_clusters=4, numlocal=3, init="k-medoids++", random_state=42)
model.fit(X)

print("Medoid indices:", model.medoid_indices_)

plt.figure(figsize=(6, 5))
plt.scatter(X[:, 0], X[:, 1], c=model.labels_, s=20, cmap="tab10")
plt.scatter(model.cluster_centers_[:, 0], model.cluster_centers_[:, 1], c="black", marker="*", s=200)
plt.title("CLARANS quickstart example")
plt.tight_layout()
plt.show()

ModuleNotFoundError: No module named 'clarans'
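
To give a feel for what `numlocal` controls, here is a simplified NumPy sketch of the CLARANS search as it is commonly described: several restarts, each making random medoid swaps until a run of tries brings no improvement. It is not the library's implementation. The names `clarans_sketch`, `total_cost`, and `max_neighbor` are chosen for this illustration, and Euclidean distance is assumed.

```python
import numpy as np


def total_cost(X, medoids):
    """Sum of Euclidean distances from each point to its nearest medoid."""
    d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
    return d.min(axis=1).sum()


def clarans_sketch(X, n_clusters, numlocal=2, max_neighbor=50, seed=0):
    """Illustrative randomized search; assumes n_clusters < len(X)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    best_medoids, best_cost = None, np.inf
    for _ in range(numlocal):  # one local search per restart
        current = rng.choice(n, size=n_clusters, replace=False)
        current_cost = total_cost(X, current)
        failures = 0
        while failures < max_neighbor:
            # Neighbor = current medoid set with one medoid swapped out
            candidate = current.copy()
            non_medoids = np.setdiff1d(np.arange(n), current)
            candidate[rng.integers(n_clusters)] = rng.choice(non_medoids)
            candidate_cost = total_cost(X, candidate)
            if candidate_cost < current_cost:
                current, current_cost = candidate, candidate_cost
                failures = 0  # improvement found: restart the counter
            else:
                failures += 1
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost
    return best_medoids, best_cost


# Two well-separated groups of three points each (made-up data)
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
demo_medoids, demo_cost = clarans_sketch(X_demo, n_clusters=2)
print("medoid indices:", demo_medoids, "cost:", round(float(demo_cost), 3))
```

Raising `numlocal` buys more independent restarts at a roughly proportional cost in runtime.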

## 3. Function examples

Helper functions live in two modules:
- `clarans.initialization`: `initialize_k_medoids_plus_plus`, `initialize_build`, `initialize_heuristic`
- `clarans.utils`: `calculate_cost(X, medoid_indices, metric)`; `metric` can be omitted, as in the call below

In [None]:
from clarans.initialization import initialize_k_medoids_plus_plus, initialize_build, initialize_heuristic
from clarans.utils import calculate_cost

# Reuse the X from the quickstart cell
medoids_pp = initialize_k_medoids_plus_plus(X, n_clusters=4, random_state=42)
medoids_build = initialize_build(X, n_clusters=4)
medoids_heur = initialize_heuristic(X, n_clusters=4)

print("k-medoids++ medoids:", medoids_pp)
print("BUILD medoids:", medoids_build)
print("heuristic medoids:", medoids_heur)

print("Cost (k-medoids++):", calculate_cost(X, medoids_pp))


k-medoids++ medoids: [102 319 190 199]
BUILD medoids: [276 267 333 242]
heuristic medoids: [276 260 386 247]
Cost (k-medoids++): 603.858302452664
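
The cost printed above can be read as a k-medoids objective. The sketch below assumes the usual definition, the sum of distances from each sample to its nearest medoid, with Euclidean distance. Whether `calculate_cost` computes exactly this is an assumption, and `medoid_cost` is a hypothetical stand-in written for illustration.

```python
import numpy as np


def medoid_cost(X, medoid_indices):
    """Sum of Euclidean distances from each sample to its nearest medoid."""
    X = np.asarray(X, dtype=float)
    medoids = X[np.asarray(medoid_indices)]
    # Pairwise distances, shape (n_samples, n_medoids)
    dists = np.linalg.norm(X[:, None, :] - medoids[None, :, :], axis=2)
    return float(dists.min(axis=1).sum())


# Made-up points: medoids at index 0 and 2, others at distance 1 and 2
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 2.0]])
print(medoid_cost(points, [0, 2]))  # 3.0
```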


## 4. Classes & objects

`CLARANS` is a scikit-learn-compatible estimator. After fitting, it exposes attributes such as `medoid_indices_`, `cluster_centers_`, and `labels_`. You can also call `predict` on new data.

In [None]:
model = CLARANS(n_clusters=4, init="heuristic", random_state=0)
model.fit(X)

print("n_iter_:", getattr(model, "n_iter_", None))
print("medoid_indices_:", model.medoid_indices_)
print("cluster_centers_ shape:", model.cluster_centers_.shape)

# Predict on a couple of new points
new_points = [[0, 0], [3, 3]]
print("Predictions for new points:", model.predict(new_points))


n_iter_: 43
medoid_indices_: [242 254  17  98]
cluster_centers_ shape: (4, 2)
Predictions for new points: [1 1]
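
For a medoid-based model, prediction on new data is typically a nearest-center lookup. The following sketch shows that idea with plain NumPy, assuming Euclidean distance. `nearest_center` and the center coordinates are made up for illustration and are not taken from the library.

```python
import numpy as np


def nearest_center(points, centers):
    """Return, for each point, the index of the closest center."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


centers = np.array([[0.0, 0.0], [5.0, 5.0]])  # made-up centers
print(nearest_center([[1, 1], [4, 6]], centers))  # [0 1]
```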


## 5. Cookbook: Short recipes

Short, copy-and-paste-ready recipes for common tasks, suitable for a README or docs page.

In [None]:
# Use sparse input (scipy sparse)
try:
    from scipy import sparse
    Xs = sparse.csr_matrix(X)
    CLARANS(n_clusters=3).fit(Xs)
    print("Sparse input example: OK")
except Exception as e:
    print("Sparse example skipped:", e)

# Use cosine metric
CLARANS(n_clusters=4, metric="cosine").fit(X)
print("Cosine metric example: OK")

# Custom initialization by providing array-like centers
import numpy as np
centers = np.array([[0.0, 0.0], [3.0, 3.0], [1.5, -1.0], [-2.0, 2.0]])
CLARANS(n_clusters=4, init=centers).fit(X)
print("Custom init example: OK")

Sparse input example: OK
Cosine metric example: OK
Custom init example: OK
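
The cosine recipe changes what "close" means. Assuming `metric="cosine"` follows the usual definition (one minus cosine similarity), vectors pointing in the same direction have distance 0 regardless of length. The helper `cosine_distance` below is a hypothetical illustration of that definition.

```python
import numpy as np


def cosine_distance(u, v):
    """1 - cosine similarity; undefined for all-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(u @ v) / float(np.linalg.norm(u) * np.linalg.norm(v))


print(cosine_distance([1, 0], [3, 0]))  # same direction, different length
print(cosine_distance([1, 0], [0, 2]))  # orthogonal vectors
```

This is why the cosine metric suits data where direction matters more than magnitude, and why all-zero rows need handling before fitting.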


## 6. Data handling & pandas

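A pandas `DataFrame` can be converted to a NumPy array with `.to_numpy()` (or the older `.values` attribute) before fitting. The cell below also shows building the estimator from a config dict.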

In [None]:
import pandas as pd

config = {"n_clusters": 4, "numlocal": 3, "init": "k-medoids++"}
model = CLARANS(**config)
model.fit(X)
print("Config example medoids:", model.medoid_indices_)

# Pandas example
df = pd.DataFrame(X, columns=["x1", "x2"])
model = CLARANS(n_clusters=4).fit(df.values)
print("DataFrame -> .values -> OK")

Config example medoids: [254 242  98  24]
DataFrame -> .values -> OK
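
Real DataFrames often mix numeric and non-numeric columns. The sketch below selects the numeric columns before conversion and attaches cluster labels back to the frame. The data and `placeholder_labels` are made up and stand in for `model.labels_`.

```python
import pandas as pd

df_mixed = pd.DataFrame({
    "x1": [0.1, 0.2, 5.0, 5.1],
    "x2": [1.0, 1.1, 4.0, 4.2],
    "name": ["a", "b", "c", "d"],  # non-numeric column to exclude
})

# Keep only numeric columns, then convert to a NumPy array for fitting
features = df_mixed.select_dtypes(include="number").to_numpy()

placeholder_labels = [0, 0, 1, 1]  # stand-in for model.labels_
result = df_mixed.assign(cluster=placeholder_labels)

print(features.shape)
print(result)
```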


## 7. Async & Concurrency

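Run several fits with different seeds concurrently using `concurrent.futures.ThreadPoolExecutor` and compare their costs. Threads give a speedup only to the extent that the underlying numerical code releases the GIL.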

In [None]:
from concurrent.futures import ThreadPoolExecutor

seeds = [0, 1, 2, 3]

def run(seed):
    m = CLARANS(n_clusters=4, random_state=seed).fit(X)
    return seed, calculate_cost(X, m.medoid_indices_)

with ThreadPoolExecutor(max_workers=4) as ex:
    for seed, cost in ex.map(run, seeds):
        print(f"seed={seed} cost={cost:.2f}")

seed=0 cost=492.03
seed=1 cost=494.91
seed=2 cost=493.17
seed=3 cost=496.24
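
A common follow-up is to keep the run with the lowest cost. The sketch below shows the selection step only. To stay runnable without the library, `run` is a stand-in that looks up the costs printed above instead of fitting a model.

```python
from concurrent.futures import ThreadPoolExecutor

# Costs copied from the output above; they stand in for real fits here
RECORDED_COSTS = {0: 492.03, 1: 494.91, 2: 493.17, 3: 496.24}


def run(seed):
    """Stand-in for fitting with `seed`; returns (seed, cost)."""
    return seed, RECORDED_COSTS[seed]


with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(run, RECORDED_COSTS))

# Pick the (seed, cost) pair with the smallest cost
best_seed, best_cost = min(results, key=lambda pair: pair[1])
print(f"best seed={best_seed} cost={best_cost:.2f}")  # best seed=0 cost=492.03
```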


## 8. Testing & Examples as tests

A small pytest-style example that demonstrates testing the core API.

In [None]:
# Minimal pytest-style test example

def test_clarans_basic():
    X_small, _ = make_blobs(n_samples=30, centers=3, n_features=2, random_state=0)
    m = CLARANS(n_clusters=3, random_state=0).fit(X_small)
    assert len(m.medoid_indices_) == 3

print('Pytest-style example: function defined (run with pytest)')

Pytest-style example: function defined (run with pytest)
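
Beyond counting medoids, a test can check structural invariants of any clustering result. The helper below is a hypothetical sketch in plain Python; in a real test it would be called with `m.labels_` and `m.medoid_indices_`.

```python
def check_clustering(labels, medoid_indices, n_samples, n_clusters):
    """Assert basic invariants of a medoid-based clustering result."""
    assert len(labels) == n_samples, "one label per sample"
    assert len(medoid_indices) == n_clusters, "one medoid per cluster"
    assert len(set(medoid_indices)) == n_clusters, "medoids must be distinct"
    assert all(0 <= i < n_samples for i in medoid_indices), "medoids index into X"
    assert set(labels) <= set(range(n_clusters)), "labels lie in [0, n_clusters)"


# Made-up result for five samples and three clusters
check_clustering(labels=[0, 0, 1, 1, 2], medoid_indices=[0, 2, 4],
                 n_samples=5, n_clusters=3)
print("invariants hold")
```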


## 9. Error handling & edge cases

An example that intentionally triggers an error and shows how to catch it.

In [None]:
# n_clusters >= n_samples should raise ValueError
try:
    small = X[:3]
    CLARANS(n_clusters=3).fit(small)
    print("This should not have succeeded")
except ValueError as e:
    print("Caught expected ValueError:", e)

Caught expected ValueError: n_clusters must be less than n_samples
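
When `n_clusters` comes from user input, it can be cheaper to validate it before fitting. The sketch below mirrors the rule and message shown above. `validate_n_clusters` and the lower-bound check are hypothetical additions, not part of the library.

```python
def validate_n_clusters(n_clusters, n_samples):
    """Raise ValueError if n_clusters cannot work for n_samples."""
    if n_clusters < 1:
        # Hypothetical extra check, added for illustration
        raise ValueError("n_clusters must be at least 1")
    if n_clusters >= n_samples:
        raise ValueError("n_clusters must be less than n_samples")


for k in (2, 3):
    try:
        validate_n_clusters(k, n_samples=3)
        print(f"k={k}: ok")
    except ValueError as e:
        print(f"k={k}: rejected ({e})")
```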


## 10. Performance & Profiling

Use `%timeit` for quick timing and `cProfile` for detailed profiling.

In [None]:
# Quick timing with the %timeit magic (IPython/Jupyter only)
# Uncomment the line below when running in Jupyter
# %timeit -n 3 CLARANS(n_clusters=4, numlocal=2, random_state=0).fit(X)

# Simple cProfile example; the report is printed from an in-memory buffer
import cProfile
pr = cProfile.Profile()
pr.enable()
CLARANS(n_clusters=4, numlocal=2, random_state=0).fit(X)
pr.disable()

import io, pstats
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
ps.print_stats(10)
print(s.getvalue())

         1051733 function calls (1049987 primitive calls) in 6.139 seconds

   Ordered by: cumulative time
   List reduced from 267 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1726    0.070    0.000    5.952    0.003 c:\Users\HP\AppData\Local\Programs\Python\Python313\Lib\site-packages\sklearn\utils\_param_validation.py:187(wrapper)
     1726    0.029    0.000    5.170    0.003 c:\Users\HP\AppData\Local\Programs\Python\Python313\Lib\site-packages\sklearn\metrics\pairwise.py:695(pairwise_distances_argmin_min)
     1726    3.795    0.002    4.136    0.002 c:\Users\HP\AppData\Local\Programs\Python\Python313\Lib\site-packages\sklearn\metrics\_pairwise_distances_reduction\_dispatcher.py:184(compute)
     1726    0.022    0.000    0.896    0.001 c:\Users\HP\AppData\Local\Programs\Python\Python313\Lib\site-packages\sklearn\metrics\pairwise.py:83(check_pairwise_arrays)
     3453    0.121    0.000    0.833    0.000 c:\Users\HP\AppD
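
To keep a profile for later inspection, the statistics can be written to a file and reloaded. The sketch below uses a stand-in `workload` function in place of a `CLARANS` fit so that it runs without the library. `strip_dirs()` shortens the long file paths seen in the report above.

```python
import cProfile
import os
import pstats
import tempfile


def workload():
    """Stand-in for a model fit."""
    return sum(i * i for i in range(50_000))


profiler = cProfile.Profile()
profiler.runcall(workload)

# Save the raw statistics to a file in a temporary directory
stats_path = os.path.join(tempfile.mkdtemp(), "fit.prof")
profiler.dump_stats(stats_path)

# Reload the file and print a compact report
stats = pstats.Stats(stats_path)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
```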

## 11. Extending & Plugins

Example showing how to subclass `CLARANS` to add logging or custom behavior.

In [None]:
class VerboseCLARANS(CLARANS):
    def fit(self, X, y=None):
        print("Starting VerboseCLARANS fit with config:", {"n_clusters": self.n_clusters, "numlocal": self.numlocal})
        return super().fit(X, y)

m = VerboseCLARANS(n_clusters=4, numlocal=2).fit(X)
print('Verbose fit done; medoids:', m.medoid_indices_)

Starting VerboseCLARANS fit with config: {'n_clusters': 4, 'numlocal': 2}
Verbose fit done; medoids: [280  98 242 333]
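
The same behavior can be packaged as a reusable mixin rather than a one-off subclass. This is a sketch under assumptions: `TimedFitMixin` and `DummyEstimator` are hypothetical, and the dummy class stands in for `CLARANS` so the block runs without the library.

```python
import time


class TimedFitMixin:
    """Record the wall-clock duration of fit() in `fit_seconds_`."""

    def fit(self, X, y=None):
        start = time.perf_counter()
        result = super().fit(X, y)  # delegate to the next class in the MRO
        self.fit_seconds_ = time.perf_counter() - start
        return result


class DummyEstimator:
    """Stand-in for CLARANS so the sketch runs on its own."""

    def fit(self, X, y=None):
        self.n_seen_ = len(X)
        return self


class TimedDummy(TimedFitMixin, DummyEstimator):
    pass


m_timed = TimedDummy().fit([[0, 0], [1, 1]])
print(m_timed.n_seen_, m_timed.fit_seconds_ >= 0)  # 2 True
```

With the library installed, the equivalent would presumably be `class TimedCLARANS(TimedFitMixin, CLARANS): pass`, which leaves the constructor signature untouched.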


## 12. Real-world example (End-to-end)

Small pipeline demo: scaling -> clustering -> collecting labels in a DataFrame.

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([('scaler', StandardScaler()), ('clarans', CLARANS(n_clusters=4))])
pipe.fit(X)
labels = pipe.named_steps['clarans'].labels_

import pandas as pd
output = pd.DataFrame(labels, columns=['label'])
print(output)

     label
0        3
1        0
2        3
3        1
4        1
..     ...
395      1
396      1
397      2
398      1
399      0

[400 rows x 1 columns]
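
To write the labels to disk, one option is the standard-library `csv` module (pandas users can call `DataFrame.to_csv` instead). In the sketch below, the five labels are the first rows of the output above, and the file name and the `sample` column header are arbitrary choices.

```python
import csv
import os
import tempfile

first_labels = [3, 0, 3, 1, 1]  # first five labels from the output above

out_path = os.path.join(tempfile.mkdtemp(), "labels.csv")
with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sample", "label"])
    writer.writerows(enumerate(first_labels))  # (row index, label) pairs

# Read the file back to confirm what was written
with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows[:3])
```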


## 13. Typing & Static checking

Use type annotations to help mypy and linters understand your API.

In [None]:
from typing import List

def choose_medoids(X: List[List[float]], k: int) -> List[int]:
    """A small helper signature to indicate expected types."""
    # Placeholder body: returns the first k indices; a real implementation would select medoids
    return list(range(k))

print('Typing example: function signature provided')

Typing example: function signature provided
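
For array-based code, `numpy.typing` gives more precise annotations than nested lists. The function below is a hypothetical helper that picks `k` distinct random indices. It illustrates the annotations only and is not one of the library's initializers.

```python
from typing import Optional

import numpy as np
import numpy.typing as npt


def choose_random_medoids(
    X: npt.ArrayLike, k: int, seed: Optional[int] = None
) -> npt.NDArray[np.intp]:
    """Return k distinct row indices of X, chosen uniformly at random."""
    n_samples = np.asarray(X).shape[0]
    if not 1 <= k <= n_samples:
        raise ValueError("k must be between 1 and the number of samples")
    rng = np.random.default_rng(seed)
    return rng.choice(n_samples, size=k, replace=False).astype(np.intp)


picked = choose_random_medoids([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], k=2, seed=0)
print(picked)
```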


## 14. Debugging & Logging

Configure logging to monitor the algorithm's behavior during development.

In [None]:
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('clarans_examples')
logger.info('This is an informational message from the examples notebook')

# Wrap a small run and log elapsed time (perf_counter is a monotonic clock meant for timing)
import time
start = time.perf_counter()
CLARANS(n_clusters=3, random_state=0).fit(X[:200])
logger.info('Run finished in %.3f seconds', time.perf_counter() - start)

INFO:clarans_examples:This is an informational message from the examples notebook
INFO:clarans_examples:Run finished in 2.694 seconds
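
The timing pattern above can be wrapped in a small context manager so that any block is logged the same way. `log_elapsed` is a hypothetical helper, and the summation stands in for a model fit.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("clarans_examples")


@contextmanager
def log_elapsed(label, log=logger):
    """Log how long the enclosed block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.info("%s finished in %.3f seconds", label, time.perf_counter() - start)


with log_elapsed("stand-in workload"):
    total = sum(range(100_000))  # placeholder for a model fit
```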
