# **Elbow Method**

```python
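import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Candidate cluster counts (same range as the silhouette section below).
k_values = range(2, 8)

# inertia_ is scikit-learn's name for the within-cluster sum of squares (WCSS).
ssd = []
for k in k_values:
    km = KMeans(n_clusters=k, random_state=32)
    # Note: the silhouette section fits on df_scaled; use the same input
    # in both sections if the two curves are to be compared.
    km.fit(df)
    ssd.append(km.inertia_)

plt.plot(k_values, ssd, marker='o')
plt.title('Elbow Method - Optimal number of clusters')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Inertia (WCSS)')
plt.grid(True)
plt.show()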
```

# **Silhouette Score**

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

k_values = range(2, 8)  # the silhouette score is undefined for k=1

# Fit once per k so that each k contributes exactly one score.
silhouette_scores = []
for k in k_values:
    km = KMeans(n_clusters=k, max_iter=150, random_state=32)
    km.fit(df_scaled)
    silhouette_scores.append(silhouette_score(df_scaled, km.labels_))
    print(f"Silhouette score for k={k}: {silhouette_scores[-1]}")

plt.plot(k_values, silhouette_scores, marker='o', color='green')
plt.title('Silhouette Score vs. Number of Clusters')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Average Silhouette Score')
plt.grid(True)
plt.show()
```

---

## 📘 Definitions

### 🔹 **Elbow Method**
- **Purpose**: To find the optimal number of clusters `k` by plotting the **Within-Cluster Sum of Squares (WCSS)**, which scikit-learn reports as `inertia_`, against a range of values of `k`.
- **Logic**: WCSS always decreases as `k` increases, so its minimum is not informative. The point where the decrease slows sharply (the "elbow") suggests a good `k`, because adding more clusters beyond it yields only small gains.
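
To make the WCSS definition concrete, the sketch below recomputes it by hand: for each point, take the squared distance to the centroid of its assigned cluster, then sum over all points. The data is synthetic (`make_blobs`) and the variable names are illustrative assumptions, not part of the notebook above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (assumption): 300 points scattered around 4 centers.
X, _ = make_blobs(n_samples=300, centers=4, random_state=32)

km = KMeans(n_clusters=4, n_init=10, random_state=32).fit(X)

# WCSS: squared distance from each point to its own cluster centroid, summed.
assigned_centers = km.cluster_centers_[km.labels_]
wcss = float(np.sum((X - assigned_centers) ** 2))

print(f"manual WCSS = {wcss:.4f}, km.inertia_ = {km.inertia_:.4f}")
```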

### 🔹 **Silhouette Method**
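- **Purpose**: Measures how similar a data point is to its own cluster compared with the nearest other cluster. The `k` with the highest average score is preferred.
- **Metric**: **Silhouette Coefficient** (ranges from -1 to 1):
  - **+1**: Data point is well matched to its own cluster and poorly matched to others.
  - **0**: Data point lies on or near the boundary between two clusters.
  - **-1**: Data point is probably assigned to the wrong cluster.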
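
As a rough illustration of how the coefficient is built, the sketch below computes it by hand for one point as `(b - a) / max(a, b)`, where `a` is the mean distance to the other members of the point's own cluster and `b` is the smallest mean distance to the members of any other cluster. The synthetic data and the helper `silhouette_of_point` are hypothetical, and the helper assumes no cluster contains a single point.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic data (assumption): 200 points scattered around 3 centers.
X, _ = make_blobs(n_samples=200, centers=3, random_state=32)
labels = KMeans(n_clusters=3, n_init=10, random_state=32).fit_predict(X)

def silhouette_of_point(i):
    """Hand-computed silhouette coefficient of point i (illustrative helper)."""
    dists = np.linalg.norm(X - X[i], axis=1)
    own = labels == labels[i]
    # a: mean distance to the other members of the same cluster (self excluded).
    a = dists[own].sum() / (own.sum() - 1)
    # b: smallest mean distance to the members of any other cluster.
    b = min(dists[labels == c].mean() for c in np.unique(labels) if c != labels[i])
    return (b - a) / max(a, b)

manual = silhouette_of_point(0)
per_point = silhouette_samples(X, labels)  # one coefficient per point

print(f"manual: {manual:.4f}, silhouette_samples: {per_point[0]:.4f}")
print(f"mean of per-point values: {per_point.mean():.4f}")
print(f"silhouette_score: {silhouette_score(X, labels):.4f}")
```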
---

### 📊 **Comparison Table: Elbow Method vs. Silhouette Method**

| **Aspect**                | **Elbow Method**                             | **Silhouette Method**                          |
|---------------------------|-----------------------------------------------|------------------------------------------------|
| **Purpose**               | Finds `k` where further WCSS reduction levels off | Finds optimal `k` by maximizing silhouette score |
| **Metric Used**           | WCSS (Within-Cluster Sum of Squares)          | Silhouette Coefficient                         |
| **Range of `k`**          | Can include `k=1`                             | Must start from `k=2` (undefined for `k=1`)     |
| **Plot Characteristic**   | Sharp "elbow" point where WCSS decrease slows | Peak point where silhouette score is highest    |
| **Best `k` Selection**    | Where WCSS reduction slows significantly      | Where silhouette score is highest               |
| **Interpretability**      | Visual and subjective                         | More quantitative and objective                 |
| **Cluster Quality Insight**| Measures cohesion only, not separation       | Provides insight into cohesion & separation     |
| **Computation Time**      | Faster (only requires inertia)                | Slower on large datasets (needs pairwise distances) |
| **Robustness**            | May be ambiguous if elbow is not clear        | Gives a numeric criterion, but favors compact, well-separated clusters |
| **Common Use Case**       | Quick estimation of cluster count             | More precise evaluation of cluster quality      |

---

### ✅ Summary Use Cases

| **Scenario**                             | **Recommended Method**      |
|------------------------------------------|-----------------------------|
| Want fast estimation of cluster number   | Elbow Method                |
| Want to evaluate cluster quality         | Silhouette Method           |
| Clusters may not be well-separated       | Silhouette Method           |
| Data has a very obvious elbow            | Elbow Method                |
| Need numerical validation                | Silhouette Method           |
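
In practice the two methods are often run together. The sketch below collects both metrics in one pass over `k`. The helper `score_k_range` and the synthetic data are hypothetical names chosen for illustration, not part of the notebook above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def score_k_range(X, k_values, random_state=32):
    """Return {k: (inertia, silhouette)}; silhouette is None when k < 2."""
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        # The silhouette score needs at least two clusters.
        sil = silhouette_score(X, km.labels_) if k >= 2 else None
        results[k] = (km.inertia_, sil)
    return results

# Synthetic data (assumption): 300 points scattered around 4 centers.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=32)
results = score_k_range(X, range(1, 8))

# Numerical choice: the k with the highest average silhouette score.
scored = [k for k in results if results[k][1] is not None]
best_k = max(scored, key=lambda k: results[k][1])

for k, (inertia, sil) in results.items():
    sil_text = "n/a" if sil is None else f"{sil:.3f}"
    print(f"k={k}: inertia={inertia:.1f}, silhouette={sil_text}")
print(f"k with highest silhouette score: {best_k}")
```

Reading the inertia column for an elbow and checking it against the silhouette maximum gives two independent signals; when they disagree, the data may not have a clear cluster structure.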

---