# K-Means Clustering 

[- Sklearn Documentation ](https://scikit-learn.org/stable/modules/clustering.html#k-means)

[- UNSUPERVISED LEARNING ("Apprentissage non-supervisé"), by Machine Learnia ](https://www.youtube.com/watch?v=FTtzd31IAOw)

## - Clustering

**- Clustering** is an **unsupervised** machine learning method for **identifying and grouping similar data points in larger datasets** without using predefined labels or a target variable. Clustering (sometimes called cluster analysis) is typically used to organize data into structures that are easier to understand and manipulate.

**- Clustering** of unlabeled data can be performed with the **`sklearn.cluster`** module.
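As a minimal sketch of that module's shared interface (assuming a recent scikit-learn; `AgglomerativeClustering` is used as a second estimator purely for illustration), two clustering estimators can be fitted on the same unlabeled data, and their `labels_` read back the same way:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic data; the true labels returned by make_blobs are discarded
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "AgglomerativeClustering": AgglomerativeClustering(n_clusters=3),
}

for name, model in models.items():
    model.fit(X)                      # learns clusters from X alone, no y is passed
    labels = model.labels_            # one cluster index per sample
    print(name, np.bincount(labels))  # number of samples assigned to each cluster
```

Only `X` is passed to `fit`, which is what makes the task unsupervised.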

## - K-Means clustering algorithm description

**The KMeans algorithm** clusters data by **trying to separate samples into n groups of equal variance**, **minimizing a criterion** known as the **inertia** or within-cluster sum-of-squares (see below). This algorithm requires the number of clusters to be specified in advance. It scales well to large numbers of samples and has been used across a wide range of application areas in many different fields.

**The K-means** algorithm aims to choose **centroids** that **minimize the inertia**, or within-cluster sum-of-squares criterion:
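Written out in the notation used by the scikit-learn documentation, with $n$ samples $x_i$ and a set $C$ of centroids $\mu_j$, the criterion sums each sample's squared Euclidean distance to its nearest centroid:

$$
\sum_{i=1}^{n} \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right)
$$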

![r_kmean_formula.png](attachment:r_kmean_formula.png)
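To make the formula concrete, here is a small sketch (assuming NumPy broadcasting and the same synthetic data as the implementation section) that computes the inertia by hand and compares it with the `inertia_` attribute of a fitted `KMeans` model:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42, cluster_std=0.60)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Squared Euclidean distance from every sample to every centroid: shape (n_samples, k)
sq_dists = ((X[:, np.newaxis, :] - kmeans.cluster_centers_[np.newaxis, :, :]) ** 2).sum(axis=2)

# Inertia: for each sample keep the distance to its nearest centroid, then sum
manual_inertia = sq_dists.min(axis=1).sum()

print(manual_inertia, kmeans.inertia_)
```

The two values should agree up to floating-point rounding.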

![OqnnS.png](attachment:OqnnS.png)

![K-means-Algorithm-Process.png](attachment:K-means-Algorithm-Process.png)

![k-means-process-1024x709.webp](attachment:k-means-process-1024x709.webp)

![1515242729.png](attachment:1515242729.png)
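The process illustrated above (initialize centroids, assign each sample to its nearest centroid, move each centroid to the mean of its samples, repeat) can be sketched from scratch in NumPy. This is a plain Lloyd's algorithm with random initialization, written only for illustration; `lloyd_kmeans` is a hypothetical helper, and scikit-learn's own implementation differs (for example, it uses `k-means++` initialization and several restarts by default).

```python
import numpy as np
from sklearn.datasets import make_blobs


def lloyd_kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    """Hypothetical helper: plain Lloyd's algorithm, not scikit-learn's implementation."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k distinct samples as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # 2. Assignment step: each sample goes to its nearest centroid (squared distance)
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # 3. Update step: move each centroid to the mean of its assigned samples
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        # 4. Convergence check: stop when the centroids barely move
        shift = ((new_centroids - centroids) ** 2).sum()
        centroids = new_centroids
        if shift <= tol:
            break
    # Final assignment and inertia computed with the final centroids
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dists.argmin(axis=1)
    inertia = sq_dists.min(axis=1).sum()
    return centroids, labels, inertia


X, _ = make_blobs(n_samples=300, centers=3, random_state=42, cluster_std=0.60)
centroids, labels, inertia = lloyd_kmeans(X, k=3)
print(centroids)
print(inertia)
```

Because the result depends on the starting centroids, a single random initialization can end in a poor local minimum; running several seeds and keeping the lowest inertia is the usual remedy.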

## - K-Means model implementation

```python
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Create synthetic data with three clusters
X, y = make_blobs(n_samples=300, centers=3, random_state=42, cluster_std=0.60)

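# Create KMeans model with 3 clusters
# n_init: number of runs with different centroid seeds (the best run is kept)
# random_state: fixes the seeds so results are reproducible
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)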

# Fit the model to the data
kmeans.fit(X)

# Get cluster centers and labels
centers = kmeans.cluster_centers_
labels = kmeans.labels_

# Visualize the data and cluster centers
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', edgecolor='k', s=50)
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='X', s=200, label='Centroids')
plt.title('K-means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
```


![Screenshot%202024-01-15%20at%2014.42.53.png](attachment:Screenshot%202024-01-15%20at%2014.42.53.png)
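Once fitted, the model can also assign new samples to the learned clusters. A short sketch, where the two points in `X_new` are hypothetical and chosen only for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42, cluster_std=0.60)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Hypothetical new samples (not part of the training data)
X_new = np.array([[0.0, 0.0], [-5.0, 5.0]])

# predict() assigns each sample to its nearest learned centroid
new_labels = kmeans.predict(X_new)
print(new_labels)

# transform() gives the Euclidean distance from each sample to every centroid
distances = kmeans.transform(X_new)
print(distances.round(2))
```

The predicted label of each new sample is the column index of its smallest distance in `distances`.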

## - Elbow method 

The **elbow method** is a graphical heuristic for choosing a **suitable K value** in k-means clustering. The elbow graph plots the within-cluster sum of squares (WCSS, the same quantity as the inertia) on the y-axis against different values of K on the x-axis. WCSS generally decreases as K grows, so the usual choice is the K at the "elbow": the point after which adding more clusters reduces WCSS only marginally.

![1682277078758.png](attachment:1682277078758.png)

![ny3Ht.png](attachment:ny3Ht.png)
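An elbow graph like the ones above can be produced by fitting one model per candidate K and recording its `inertia_` (the WCSS). A minimal sketch, reusing the synthetic three-blob data from the implementation section; the range of K from 1 to 10 is an arbitrary choice:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42, cluster_std=0.60)

k_values = list(range(1, 11))
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(model.inertia_)  # inertia_ is the WCSS of the fitted model

# Plot WCSS against K and look for the "elbow"
plt.plot(k_values, wcss, marker="o")
plt.title("Elbow method")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS (inertia)")
plt.xticks(k_values)
plt.show()
```

On this data, the curve should drop steeply up to K = 3 and flatten afterwards, matching the three generated blobs.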