## Affinity Propagation clustering algorithm

Affinity Propagation is attractive because it chooses the number of clusters from the data provided, rather than requiring it as an input.

AffinityPropagation creates clusters by sending messages between pairs of samples until convergence.
The messages sent between points belong to one of two categories. 
* The first is the responsibility `r(i, k)`
    * which is the accumulated evidence that sample `k` should be the exemplar for sample `i`
* The second is the availability `a(i, k)`
    * which is the accumulated evidence that sample `i` should choose sample `k` to be its exemplar, taking into account how strongly all other samples support `k` as an exemplar

The messages sent between pairs represent the suitability for one sample to be the exemplar of the other, which is updated in response to the values from other pairs. 
<div style='color : seagreen'>
<h3>In this way, exemplars are chosen by samples</h3>
<ul>
<li>if they are similar enough to many samples, and</li>
<li>if they are chosen by many samples to be representative of themselves.</li>
</ul>
</div>

This updating happens iteratively until convergence, at which point the final exemplars are chosen, and hence the final clustering is given.

A dataset is then described using a small number of exemplars, which are identified as those most representative of other samples.
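
The message passing described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not scikit-learn's implementation: the function name `affinity_propagation_sketch` and the toy data are made up for this example, the update rules follow the standard responsibility/availability formulation, and the loop runs for a fixed number of iterations instead of checking for convergence. The diagonal of `S` holds the preferences, and `damping` blends old and new messages.

```python
import numpy as np


def affinity_propagation_sketch(S, damping=0.5, n_iter=200):
    """Toy dense message passing (hypothetical helper, for illustration only).

    S is an (n, n) similarity matrix whose diagonal holds the preferences.
    """
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)

    for _ in range(n_iter):
        # r(i, k) <- s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i, k) <- min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) <- sum over i' != k of max(0, r(i', k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]       # keep r(k, k) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp  # drop each sample's own term
        self_avail = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new

    # Sample k is an exemplar when a(k, k) + r(k, k) > 0.
    exemplars = np.flatnonzero((A + R).diagonal() > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Every other sample joins its most similar exemplar.
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Toy data: two well-separated groups of three points on a line.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
S = -(x[:, None] - x[None, :]) ** 2  # negative squared distance
np.fill_diagonal(S, -10.0)           # same preference for every point
exemplars, labels = affinity_propagation_sketch(S)
print(exemplars, labels)
```

The sketch keeps two dense `n x n` message matrices, which is also where the memory cost of the method comes from.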


The two important parameters are
* `preference` <br>which controls how many exemplars are used
    * Each point has its own preference value
        * points with larger preference values are more likely to be chosen as exemplars.
    * The number of exemplars, i.e. of clusters, is influenced by the input preference values.
    * If the preferences are not passed as arguments, they are set to the `median` of the input similarities.
* `damping` <br>the damping factor, which damps the responsibility and availability messages to avoid numerical oscillations when updating them.
    * The damping factor, in the range `[0.5, 1.0)`, is the extent to which the current value is maintained relative to incoming values.
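
To get a feel for `preference`, the sketch below compares the value used later in this notebook (`-50`) with the median similarity that is used when no preference is given. It assumes the default `affinity="euclidean"`, for which the similarity is the negative squared Euclidean distance, and it rebuilds the same blobs used in the Data section so that it runs on its own. The cluster count for the default preference is printed rather than assumed.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import euclidean_distances

X, _ = make_blobs(
    n_samples=300,
    centers=[[1, 1], [-1, -1], [1, -1]],
    cluster_std=0.5,
    random_state=0,
)

# Similarities under the default affinity: negative squared distances.
S = -euclidean_distances(X, squared=True)
median_similarity = np.median(S)
print(f"median similarity: {median_similarity:.2f}")

# A preference well below the median makes points reluctant to be exemplars.
af_low = AffinityPropagation(preference=-50, random_state=0).fit(X)
n_clusters = len(af_low.cluster_centers_indices_)
print(f"preference=-50  -> {n_clusters} clusters")

# Leaving preference unset falls back to the median similarity.
af_default = AffinityPropagation(random_state=0).fit(X)
print(f"default preference -> {len(af_default.cluster_centers_indices_)} clusters")
```

If the default run emits a convergence warning, raising `damping` or `max_iter` is the usual first thing to try.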

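#### **The main drawback of Affinity Propagation is its complexity.** 
* Time complexity is of the order `O(N² T)`, where `N` is the number of samples and `T` is the number of iterations until convergence.
* Memory complexity is of the order `O(N²)` when a dense similarity matrix is used.
* This makes Affinity Propagation most appropriate for small to medium sized datasets.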
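
A rough back-of-the-envelope calculation shows why. The algorithm works on pairwise similarities, so a dense similarity matrix has one entry per pair of samples. The helper below is hypothetical and assumes 8-byte floats; it counts a single matrix only, while the responsibility and availability matrices need the same amount again.

```python
def dense_similarity_bytes(n_samples, bytes_per_value=8):
    """Bytes needed for one dense n_samples x n_samples float64 matrix."""
    return n_samples * n_samples * bytes_per_value


for n in (300, 10_000, 100_000):
    gigabytes = dense_similarity_bytes(n) / 1e9
    print(f"{n:>7} samples -> {gigabytes:.4f} GB")
```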

In [1]:
import numpy as np

from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

import matplotlib.pyplot as plt

# Clustering metrics used by evaluate_clustering below.
from sklearn.metrics import (
    homogeneity_score,
    completeness_score,
    v_measure_score,
    adjusted_rand_score,
    adjusted_mutual_info_score,
    silhouette_score,
)


### Data

In [28]:
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=300, centers=centers, cluster_std=0.5, random_state=0
)
plt.scatter(X[:, 0], X[:, 1], c=labels_true);
plt.title('Data');

<img src='./plots/data.png'>

In [8]:
af = AffinityPropagation(preference=-50, random_state=0).fit(X)

In [9]:
def evaluate_clustering(data, labels, y_preds):
    """Score predicted cluster labels against the true labels."""
    clustering_metrics = {
        # Supervised scores: compare predictions with the ground-truth labels.
        "homogeneity_score": homogeneity_score(labels, y_preds),
        "completeness_score": completeness_score(labels, y_preds),
        "v_measure_score": v_measure_score(labels, y_preds),
        "adjusted_rand_score": adjusted_rand_score(labels, y_preds),
        "adjusted_mutual_info_score": adjusted_mutual_info_score(labels, y_preds),
        # Unsupervised score: uses only the data and the predicted labels.
        "silhouette_score": silhouette_score(data, y_preds),
    }
    return clustering_metrics

In [5]:
evaluate_clustering(X, labels_true, af.labels_)

{'homogeneity_score': 0.8715595298385134,
 'completeness_score': 0.8715859753374195,
 'v_measure_score': 0.8715727523873623,
 'adjusted_rand_score': 0.9119626080431966,
 'adjusted_mutual_info_score': 0.8707815164449694,
 'silhouette_score': 0.5575114103770364}
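
As a quick sanity check on these numbers, the V-measure (with its default `beta=1`) is the harmonic mean of homogeneity and completeness, so it can be recomputed by hand from the two scores printed above. The variable names here are just for illustration.

```python
# Scores copied from the output above.
homogeneity = 0.8715595298385134
completeness = 0.8715859753374195

# Harmonic mean of the two scores.
v_measure = 2 * homogeneity * completeness / (homogeneity + completeness)
print(v_measure)
```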

### Results

In [10]:
cluster_center_indices = af.cluster_centers_indices_
labels = af.labels_

In [24]:

# Colour each sample by its cluster and mark the exemplars in black.
plt.scatter(X[:, 0], X[:, 1], c=labels, alpha=0.7);
exemplars = X[cluster_center_indices]
plt.scatter(exemplars[:, 0], exemplars[:, 1], marker='o', s=250, c='k');

# Draw a line from every sample to the exemplar of its cluster.
# Looping over the exemplars avoids hard-coding the number of clusters.
for c, (xx, yy) in enumerate(exemplars):
    for x, y in X[labels == c]:
        plt.plot([xx, x], [yy, y], alpha=0.5)

plt.title(f"Estimated number of clusters: {len(cluster_center_indices)}");

<img src='./plots/affinity-propagation-clusters.png'>