The main drawback of Affinity Propagation is its complexity. The algorithm has a time complexity of the order \(O(N^2 T)\), where \(N\) is the number of samples and \(T\) is the number of iterations until convergence. Further, the memory complexity is of the order \(O(N^2)\) if a dense similarity matrix is used, but reducible if a sparse similarity matrix is used. This makes Affinity Propagation most appropriate for small to medium-sized datasets.
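To get a feel for the quadratic memory cost, here is a rough back-of-the-envelope sketch. It assumes a dense implementation that keeps three \(N \times N\) arrays of 64-bit floats (similarities, responsibilities, availabilities). The helper name `dense_matrix_bytes` and the count of three arrays are illustrative assumptions, not a statement about any particular library's internals.

```python
def dense_matrix_bytes(n_samples: int, n_matrices: int = 3, bytes_per_value: int = 8) -> int:
    """Bytes needed for `n_matrices` dense n x n arrays (hypothetical helper)."""
    return n_matrices * n_samples * n_samples * bytes_per_value


for n in (1_000, 10_000, 100_000):
    gib = dense_matrix_bytes(n) / 2**30
    print(f"N = {n:>7,}: roughly {gib:,.2f} GiB")
```

Multiplying the sample count by 10 multiplies the estimate by 100, which is why the method is usually reserved for small to medium-sized datasets.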

# 🔁 Message Updates & Parameter Tuning — Affinity Propagation

---

## Responsibilities Update — Competitive, Individual Judgment

Each point \(i\) looks at all candidates \(k\) and asks:
**“Right now, which \(k\) looks most promising for me?”**

Mathematically:

$$
r(i,k) = S(i,k) - \max_{k' \neq k} \big[a(i,k') + S(i,k')\big]
$$

**Meaning:**
- \(S(i,k)\): similarity between point \(i\) and candidate \(k\) (higher = better).
- \(a(i,k') + S(i,k')\): the current attractiveness of all other candidates \(k'\).
- Subtracting the maximum competing attractiveness gives:
  → “How much better is \(k\) compared to my next best choice?”

If \(r(i,k)\) is positive, \(i\) believes \(k\) is clearly the best exemplar for it.
If negative, \(i\) prefers another exemplar.
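The responsibility update can be sketched in NumPy as follows. This is a minimal illustration, assuming dense \(N \times N\) arrays `S` (similarities, with the preference on the diagonal) and `A` (availabilities) and at least two points. The function name `update_responsibilities` is made up for this example. It uses the usual trick of tracking the largest and second-largest \(a + S\) per row, so the "best competitor" is available for every \(k\) without a double loop.

```python
import numpy as np


def update_responsibilities(S: np.ndarray, A: np.ndarray) -> np.ndarray:
    """r(i,k) = S(i,k) - max_{k' != k} [a(i,k') + S(i,k')] for every pair (i, k)."""
    n = S.shape[0]
    rows = np.arange(n)
    AS = A + S

    # Largest a + S in each row, and the column where it occurs.
    best_k = np.argmax(AS, axis=1)
    best = AS[rows, best_k]

    # Second-largest a + S in each row (mask out the winner first).
    AS_masked = AS.copy()
    AS_masked[rows, best_k] = -np.inf
    second = AS_masked.max(axis=1)

    # For most k, the strongest competitor is the row maximum ...
    R = S - best[:, None]
    # ... except for the winning column, whose strongest competitor is the runner-up.
    R[rows, best_k] = S[rows, best_k] - second
    return R


# Toy similarities (negative squared distances of the 1-D points 0, 1, 5),
# with an arbitrary preference of -10 on the diagonal.
S = np.array([[-10.0, -1.0, -25.0],
              [-1.0, -10.0, -16.0],
              [-25.0, -16.0, -10.0]])
R = update_responsibilities(S, np.zeros_like(S))
print(R)
```

With all availabilities at zero, point 0 gets \(r(0,1) = -1 - (-10) = 9\): candidate 1 beats its next best option (itself, at \(-10\)) by 9.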

---

## Why Not Use \(S(i,k) + a(i,k) - \max_{k' \neq k} [a(i,k') + S(i,k')]\) ?

At first glance, it seems logical to include \(a(i,k)\) directly —
but doing so **breaks the separation** between the two roles in message passing.

**Reasoning:**

- \(r(i,k)\) represents *point \(i\)’s personal preference* for candidate \(k\).
- \(a(i,k)\) represents *how much \(k\) is supported by others* — that’s information flowing from the exemplar side.

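When computing responsibilities, each point should **compare all exemplars on equal footing**,
so we use the \(a(i,k')\) terms of the competing candidates in the subtraction but do not add \(a(i,k)\) to the left side.

The availability \(a(i,k)\) still matters, but it enters only once, at decision time, when point \(i\) picks the \(k\) that maximizes \(a(i,k) + r(i,k)\).
If \(a(i,k)\) were also included in \(r(i,k)\), we would “double count” the influence of \(k\)’s own availability,
biasing responsibilities toward already popular exemplars.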

This separation keeps the updates balanced:

| Perspective | Handles | Formula role |
|--------------|----------|---------------|
| Responsibility \(r\) | Competition among exemplars | Uses others’ \(a(i,k')\) to compare options |
| Availability \(a\) | Cooperation among exemplars | Aggregates positive \(r(i',k)\) votes |

That’s why the official update is:

$$
r(i,k) = S(i,k) - \max_{k' \neq k}\big[a(i,k') + S(i,k')\big]
$$

and not

$$
r(i,k) = S(i,k) + a(i,k) - \max_{k' \neq k}\big[a(i,k') + S(i,k')\big].
$$

---

## Availabilities Update — Collective, Group Decision

Each candidate \(k\) listens to all points and asks:
**“Do I have enough supporters to justify being a real exemplar?”**

Mathematically:

$$
a(i,k) =
\begin{cases}
\min\Big(0,\ r(k,k) + \sum_{i' \notin \{i,k\}} \max(0, r(i',k))\Big), & i \neq k \\[6pt]
\sum_{i' \neq k} \max(0, r(i',k)), & i = k
\end{cases}
$$

**Interpretation:**
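- \(r(k,k)\): how much \(k\) itself wants to be an exemplar (its self-responsibility).
- \(\sum_{i'} \max(0, r(i',k))\): the total positive support that other points send to \(k\); negative responsibilities are ignored rather than counted against it.
- The \(\min(0, \dots)\) cap keeps \(a(i,k)\) at or below zero for \(i \neq k\). Strong support can at most cancel the penalty, and if \(k\) lacks support the availability becomes negative, discouraging others from choosing it.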

Together, responsibilities and availabilities act as a negotiation:
points vote for exemplars, and exemplars rise or fall depending on their support.
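The availability update can be sketched in the same vectorized style. This is an illustrative implementation of the two cases of the formula above, assuming a dense \(N \times N\) responsibility array `R`. The function name `update_availabilities` is hypothetical.

```python
import numpy as np


def update_availabilities(R: np.ndarray) -> np.ndarray:
    """Availabilities a(i,k) computed from responsibilities R, covering both cases."""
    n = R.shape[0]
    diag = np.arange(n)

    # Keep only positive responsibilities, but leave r(k,k) unclipped.
    Rp = np.maximum(R, 0.0)
    Rp[diag, diag] = R[diag, diag]

    # Column totals: r(k,k) + sum over i' != k of max(0, r(i',k)).
    col_sums = Rp.sum(axis=0)

    # Subtracting row i's own entry drops i' = i from the sum (and drops
    # r(k,k) on the diagonal, which leaves exactly the i = k case).
    A = col_sums[None, :] - Rp

    # Off-diagonal entries are capped at zero; self-availabilities are not.
    self_avail = A[diag, diag].copy()
    A = np.minimum(A, 0.0)
    A[diag, diag] = self_avail
    return A


# Responsibilities for three toy points (values chosen for illustration).
R = np.array([[-9.0, 9.0, -24.0],
              [9.0, -9.0, -15.0],
              [-15.0, -6.0, 6.0]])
A = update_availabilities(R)
print(A)
```

In this toy input, candidate 1 has \(r(1,1) = -9\) and no positive support from point 2, so \(a(0,1) = \min(0, -9 + 0) = -9\): point 0 is discouraged from choosing it.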

---

## ⚙️ Parameters You Can Tune

| Parameter | Role | What It Controls | Effect of Increasing | Effect of Decreasing |
|------------|------|------------------|----------------------|----------------------|
| **`preference`** | Self-similarity \(S(k,k)\) | How much each point “wants” to be an exemplar | ✅ More exemplars → **more clusters** | 🔻 Fewer exemplars → **fewer clusters** |
| **`damping` (λ ∈ [0.5, 1))** | Mixing factor for message updates | Stability vs. speed | ✅ More stable (less oscillation) but slower | 🔻 Faster but risk of oscillation |
| **`max_iter`** | Maximum number of update iterations | How long messages may keep updating | ✅ More room to reach convergence | 🔻 May stop before the messages converge |
| **`convergence_iter`** | Consecutive iterations with no change in the clusters before stopping | Strictness of convergence | ✅ More confidence that the result is stable | 🔻 May terminate prematurely |
| **`affinity`** | Distance or similarity metric | How similarities are computed | Not ordinal. `'euclidean'`: similarities computed from the data (negative squared Euclidean distance) | Not ordinal. `'precomputed'`: you supply the similarity matrix |
| **`random_state`** | Random seed | Repeatability | Not ordinal. Fixed integer: repeatable results | Not ordinal. `None`: results may differ between runs |
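The parameter names in the table match scikit-learn's `AffinityPropagation` estimator, so a minimal usage sketch might look like the following. The toy data (two Gaussian blobs) and the specific values are made up for illustration; they are starting points, not recommendations.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy data: two well-separated blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])

model = AffinityPropagation(
    damping=0.7,            # must lie in [0.5, 1)
    max_iter=200,
    convergence_iter=15,
    preference=None,        # None -> median of the input similarities
    affinity="euclidean",   # negative squared Euclidean distance
    random_state=0,
)
labels = model.fit_predict(X)

print("exemplar indices:", model.cluster_centers_indices_)
print("number of clusters:", len(model.cluster_centers_indices_))
print("iterations run:", model.n_iter_)
```

If `n_iter_` equals `max_iter`, the run most likely hit the iteration limit instead of converging, which is a cue to raise `damping` or `max_iter`.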

---

## 🎯 Tuning Guidelines

1. Start with:
   - `preference = median(S)`
   - `damping = 0.7`
2. If oscillations occur → increase `damping` to 0.8 or 0.9.
3. Too many clusters → lower `preference`.
4. Too few clusters → raise `preference`.
5. Always allow enough iterations (`max_iter ≥ 200`, `convergence_iter ≥ 15`).
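To see why a larger `damping` calms oscillations, it helps to write the mixing step out. The sketch below assumes the common convention in which `damping` is the weight kept on the *previous* message. The helper name `damp` is hypothetical.

```python
import numpy as np


def damp(old: np.ndarray, new: np.ndarray, damping: float = 0.7) -> np.ndarray:
    """Blend previous messages with freshly computed ones (hypothetical helper)."""
    if not 0.5 <= damping < 1.0:
        raise ValueError("damping must be in [0.5, 1)")
    return damping * old + (1.0 - damping) * new


old = np.array([0.0])
new = np.array([10.0])   # a large jump proposed by the raw update
for lam in (0.5, 0.7, 0.9):
    print(f"damping={lam}: message moves to {damp(old, new, lam)[0]:.1f}")
```

A higher `damping` lets each message take only a small step toward its new value, so messages that would otherwise flip back and forth settle down, at the cost of needing more iterations.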

---

## 💡 Typical Heuristics for `preference`

- Start with the **median** of all similarity values → reasonable baseline.
- **Lower it** → every point becomes a less attractive exemplar candidate → **fewer clusters**.
- **Raise it** → every point becomes a more attractive exemplar candidate → **more clusters**.
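One way to explore these heuristics is to sweep `preference` around the median and count the resulting clusters. The sketch below uses scikit-learn with a precomputed similarity matrix. The toy data and the scaling factors are assumptions chosen for illustration, and the exact counts depend on the data and on whether each run converges.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import euclidean_distances

# Toy data: three blobs along the diagonal (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in (0.0, 5.0, 10.0)])

# Similarity = negative squared Euclidean distance.
S = -euclidean_distances(X, squared=True)
median_pref = float(np.median(S))

n_clusters = {}
for factor in (0.1, 1.0, 10.0):
    # Similarities are negative, so factor < 1 raises the preference
    # and factor > 1 lowers it.
    pref = factor * median_pref
    model = AffinityPropagation(
        affinity="precomputed", preference=pref, damping=0.7, random_state=0
    )
    model.fit(S)
    n_clusters[factor] = len(model.cluster_centers_indices_)
    print(f"preference={pref:10.2f} -> {n_clusters[factor]} clusters")
```

Because the similarities here are negative, "raising" the preference means moving it toward zero. A run that fails to converge reports no exemplars, so a count of 0 signals a convergence problem, not an empty clustering.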

---

## 🪄 Quick Summary

| Behavior | Adjustment |
|-----------|-------------|
| Too many clusters | ↓ `preference` |
| Too few clusters | ↑ `preference` |
| Oscillating / unstable | ↑ `damping` |
| Stops before the messages converge | ↑ `max_iter` |
| Declares convergence too early | ↑ `convergence_iter` |

---

In summary, Affinity Propagation doesn’t choose *K* explicitly —
you shape the clustering behavior indirectly through **`preference`** and **`damping`**,
balancing the competitive and cooperative forces between responsibilities and availabilities.
