# Adjusted Rand Index (ARI)

The Adjusted Rand Index (ARI) measures the similarity between two partitions of the same data, for example a clustering result and a set of ground-truth labels, while correcting for agreement that would occur by chance. It adjusts the Rand Index so that random labelings score close to 0 and identical partitions score exactly 1; negative values indicate less agreement than expected by chance.

The formula for Adjusted Rand Index is expressed as follows:


$$
ARI = \frac{\text{RI} - \text{Expected}}{\text{Max} - \text{Expected}}
$$



```{admonition} Explanation!
:class: tip, dropdown
$$
\begin{aligned}
\text{RI} & = \text{Rand Index} \\
\text{Expected} & = \text{Expected Rand Index under independence} \\
\text{Max} & = \text{Maximum possible Rand Index (equal to 1)}
\end{aligned}
$$
```
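The correction for chance can be made concrete with a small pure-Python sketch. It assumes the usual permutation model (random labelings with the cluster sizes held fixed) and works with counts of co-clustered pairs instead of the normalized Rand Index, which is algebraically equivalent to the expression above. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions, not part of any library.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI of two labelings of the same items (assumes at least two items)."""
    n = len(labels_a)
    total_pairs = comb(n, 2)

    # Pairs placed together in both partitions (cells of the contingency table).
    together_both = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition separately (row and column sums).
    together_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    together_b = sum(comb(v, 2) for v in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # chance-level agreement
    maximum = (together_a + together_b) / 2           # best achievable agreement

    if maximum == expected:
        # Only happens when both partitions are the same trivial partition
        # (one big cluster, or all singletons).
        return 1.0
    return (together_both - expected) / (maximum - expected)


ari_example = adjusted_rand_index([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
ari_renamed = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
print(ari_example)  # about 0.444
print(ari_renamed)  # 1.0, because only the grouping matters, not the label names
```

The second call shows that renaming the clusters leaves the score unchanged, which is why ARI can compare cluster IDs against class labels directly.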



The Rand Index is calculated using the formula:

$$
RI = \frac{a + b}{a + b + c + d}
$$

```{admonition} Explanation!
:class: tip, dropdown
$$
\begin{aligned}
a & = \text{Number of pairs of elements that are in the same cluster in both partitions} \\
b & = \text{Number of pairs of elements that are in different clusters in both partitions} \\
c & = \text{Number of pairs of elements that are in the same cluster in the first partition and in different clusters in the second partition} \\
d & = \text{Number of pairs of elements that are in different clusters in the first partition and in the same cluster in the second partition}
\end{aligned}
$$
```
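As a rough illustration of the four pair counts, the sketch below enumerates every pair of elements in two small labelings and computes the Rand Index from them. The helper names `pair_counts` and `rand_index` and the toy labelings are assumptions made for this example; the brute-force loop is only practical for small inputs.

```python
from itertools import combinations


def pair_counts(first, second):
    """Return (a, b, c, d) as defined above for two labelings of the same items."""
    a = b = c = d = 0
    for i, j in combinations(range(len(first)), 2):
        same_first = first[i] == first[j]
        same_second = second[i] == second[j]
        if same_first and same_second:
            a += 1  # together in both partitions
        elif not same_first and not same_second:
            b += 1  # apart in both partitions
        elif same_first:
            c += 1  # together in the first, apart in the second
        else:
            d += 1  # apart in the first, together in the second
    return a, b, c, d


def rand_index(first, second):
    """Fraction of pairs on which the partitions agree (needs at least two items)."""
    a, b, c, d = pair_counts(first, second)
    return (a + b) / (a + b + c + d)


first = [0, 0, 1, 1, 2, 2]
second = [0, 0, 1, 2, 2, 2]
counts = pair_counts(first, second)
ri = rand_index(first, second)
print(counts)  # (2, 10, 1, 2)
print(ri)      # 0.8
```

The four counts always sum to the total number of pairs, n(n-1)/2, which is 15 for these six elements.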

## Example: Digits Dataset

In [None]:
%%capture
# Hide this cell
import warnings
warnings.filterwarnings('ignore')
%pip install matplotlib scikit-learn ipywidgets

The dataset looks like this:

In [None]:
import matplotlib.pyplot as plt
from sklearn import datasets
# Load the digits dataset
digits = datasets.load_digits()

# Display the first ten images with their labels
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for ax, image, label in zip(axes.ravel(), digits.images, digits.target):
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title(f"Label: {label}")
    ax.axis('off')

plt.show()



Suppose we cluster the images with the KMeans algorithm, using 10 clusters (one per digit, 0-9):

```python
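from sklearn.cluster import KMeans
X, y_true = digits.data, digits.target  # pixel features and ground-truth digit labels
kmeans = KMeans(n_clusters=10, random_state=42)
y_pred = kmeans.fit_predict(X)  # cluster ID assigned to each image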
```

Calculate Adjusted Rand Index:

```python
from sklearn.metrics import adjusted_rand_score
ari_score = adjusted_rand_score(y_true, y_pred)  # compares groupings, not label names
```
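To get a feel for the scale of the score before applying it to real clusters, the following sketch checks two reference points with scikit-learn's `adjusted_rand_score`: a labeling that differs from the truth only by cluster names, and a purely random labeling. The synthetic labels (10 balanced classes of 100 items each) and the variable names are assumptions chosen for illustration, not the digits data.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Synthetic ground truth: 10 classes with 100 items each (illustrative only).
labels_true = np.repeat(np.arange(10), 100)

# Same grouping, but every cluster ID is renamed (0 -> 3, 1 -> 4, ...).
labels_renamed = (labels_true + 3) % 10
ari_renamed = adjusted_rand_score(labels_true, labels_renamed)

# Labels drawn uniformly at random, unrelated to the ground truth.
rng = np.random.default_rng(0)
labels_random = rng.integers(0, 10, size=labels_true.size)
ari_random = adjusted_rand_score(labels_true, labels_random)

print(f"Renamed clusters: {ari_renamed:.4f}")  # 1.0000
print(f"Random labels:    {ari_random:.4f}")   # close to 0
```

A score for a real clustering can then be read against these two anchors: 1 for a perfect match up to renaming, and roughly 0 for a labeling with no relation to the truth.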

In [None]:
%%capture
# Hide this cell
%pip install bokeh



In [None]:
# Hide this cell
from bokeh.plotting import figure, show, output_notebook
output_notebook()

In [None]:
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import Category10_10
from bokeh.io import output_notebook

# Load the digits dataset
digits = datasets.load_digits()
data = digits.data
target = digits.target

# Reduce dimensionality for visualization (using PCA)
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data)

# Perform K-means clustering
kmeans = KMeans(n_clusters=10, random_state=42)
clusters = kmeans.fit_predict(data)

# Calculate Adjusted Rand Index (ARI)
ari = adjusted_rand_score(target, clusters)
print(f"Adjusted Rand Index (ARI): {ari}")

# Create a Bokeh figure
output_notebook()
p = figure(title=f"Interactive Clustering of Digits Dataset\nARI: {ari:.4f}", width=800, height=600)

# Map cluster labels to colors
colors = [Category10_10[i] for i in clusters]
source = ColumnDataSource(data=dict(x=data_pca[:, 0], y=data_pca[:, 1], color=colors, digit=target, cluster=clusters))

# Add glyphs to the plot
scatter = p.scatter(x='x', y='y', size=8, color='color', legend_field='digit', source=source, fill_alpha=0.6, line_alpha=0.6)

# Customize plot aesthetics
p.title.text_font_size = '16pt'
p.legend.title = 'Digit'
p.legend.label_text_font_size = '10pt'
p.xaxis.axis_label = 'Principal Component 1'
p.yaxis.axis_label = 'Principal Component 2'

# Add hover tooltips showing the true digit and the assigned cluster
hover = HoverTool()
hover.tooltips = [("Digit", "@digit"), ("Cluster", "@cluster")]
hover.renderers = [scatter]
p.add_tools(hover)

# Show the interactive plot
show(p)






Adjusted Rand Index (ARI): 0.6649616935161553
