# Lesson 8: K-Means Clustering

In this notebook, we'll use scikit-learn's K-Means implementation to find clusters in a synthetic two-dimensional dataset, then plot the cluster assignments and centroids.
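
For intuition, K-Means is usually described as alternating between two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The sketch below is a minimal NumPy version of that loop (often called Lloyd's algorithm). It is not scikit-learn's implementation; the function name `lloyd_kmeans`, its parameters, and the toy data are all made up for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iters=100, seed=0):
    """Hypothetical helper: plain Lloyd's algorithm, for illustration only."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: label each point with the index of its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two tight, well-separated groups of 2-D points (made-up toy data).
toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
toy_labels, toy_centroids = lloyd_kmeans(toy, k=2)
print(toy_labels)
print(toy_centroids)
```

The first three points should end up sharing one label and the last three the other. Which group is called 0 and which is called 1 depends on the random initialization.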

## 1. Import Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

## 2. Generate Sample Data

We'll use scikit-learn's `make_blobs` function to create a synthetic dataset: 300 two-dimensional points drawn from four Gaussian blobs.

In [None]:
# X holds the point coordinates; y holds the true blob labels, which K-Means never sees
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Plot the data to see what it looks like
plt.scatter(X[:, 0], X[:, 1])
plt.title('Sample Data for Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.grid(True)
plt.show()
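
As a quick sanity check on what `make_blobs` returns, the short sketch below regenerates the same data under separate names (`X_demo` and `y_demo` are placeholder names, chosen so the notebook's `X` and `y` are left alone) and prints the array shapes and the number of points per blob.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Regenerate the same synthetic data under separate (placeholder) names.
X_demo, y_demo = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# One row per sample, one column per feature (make_blobs defaults to 2 features).
print(X_demo.shape)
# One integer label per sample, identifying the blob it was drawn from.
print(y_demo.shape)
# Distinct labels and how many samples carry each one.
blob_ids, blob_counts = np.unique(y_demo, return_counts=True)
print(blob_ids, blob_counts)
```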

## 3. Create and Train the K-Means Model

Let's create a K-Means model with K=4, since the scatter plot shows four distinct groups (and we generated the data with `centers=4`).

In [None]:
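# n_init=10 runs K-Means from 10 random initializations and keeps the best result;
# random_state=0 makes the run reproducible
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(X)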
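
Here K=4 was chosen by eye. When the right K is less obvious, one common heuristic is to fit the model for several values of K and compare the inertia (the sum of squared distances from each point to its assigned centroid). The sketch below does this for K from 1 to 8 on the same synthetic data; the range of K and the variable names are arbitrary choices for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in Section 2, under a separate (placeholder) name.
X_elbow, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Fit K-Means for several values of K and record the inertia of each fit.
k_values = list(range(1, 9))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_elbow)
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"K={k}: inertia={inertia:.1f}")
```

Inertia tends to keep shrinking as K grows, so the value to look for is the point where the improvement levels off (the "elbow") rather than the minimum.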

## 4. Visualize the Clusters

Now let's plot the data again, but this time we'll color the points based on the cluster they were assigned to. We'll also plot the cluster centroids.

In [None]:
# Cluster index (0 to 3) assigned to each point
y_kmeans = kmeans.predict(X)
# Centroid coordinates, one row per cluster
centers = kmeans.cluster_centers_

plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X', label='Centroids')
plt.title('K-Means Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.grid(True)
plt.show()
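
Because this dataset is synthetic, the true blob labels returned by `make_blobs` are available, so the clustering can also be checked numerically. The sketch below uses scikit-learn's `adjusted_rand_score`, which compares two labelings while ignoring how the clusters happen to be numbered. It refits the model on a fresh copy of the data so that it runs on its own; the variable names are placeholders.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Same synthetic data as in Section 2, keeping the true blob labels this time.
X_check, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Fit K-Means and get the cluster index assigned to each point.
labels_check = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_check)

# 1.0 means the two labelings agree perfectly (up to renaming the clusters);
# values near 0 mean agreement no better than chance.
ari = adjusted_rand_score(y_true, labels_check)
print(f"Adjusted Rand index: {ari:.3f}")
```

This kind of check is only possible when ground-truth labels exist, which is rarely the case for real clustering problems.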