# Clustering Basics

Clustering is an **unsupervised learning technique** where the goal is to group similar data points together.

Unlike supervised learning, clustering does not require labeled data. Instead, the algorithm tries to find natural groupings (clusters) in the dataset.

## Common Clustering Algorithms:
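- **K-Means** → Partitions data into `k` clusters by assigning each point to its nearest cluster centroid.
- **Hierarchical Clustering** → Builds a tree (dendrogram) of nested clusters by repeatedly merging or splitting groups.
- **DBSCAN** → Groups points that lie in dense regions and labels points in sparse regions as noise.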
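To make the list concrete, here is a minimal sketch that runs scikit-learn's implementation of each algorithm on the same Iris features. The parameter values (`n_clusters=3` for the first two, `eps=0.5` and `min_samples=5` for DBSCAN) are illustrative assumptions, not tuned choices. Note that DBSCAN does not take a number of clusters; it infers one from density.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris

X = load_iris().data  # 150 samples x 4 features; the species labels are not used

# Illustrative (untuned) settings for each algorithm
models = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=42),
    "Hierarchical (agglomerative)": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    results[name] = labels
    # DBSCAN marks noise points with the label -1, so exclude it from the count
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"{name}: {n_clusters} clusters found")
```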

In this notebook, we will focus on **KMeans clustering**.
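The idea behind K-Means can be sketched from scratch in NumPy as Lloyd's algorithm: pick `k` starting centroids, assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. The function `kmeans_lloyd` below is a hypothetical helper written for illustration; it uses simple random initialization, whereas scikit-learn's `KMeans` uses k-means++ initialization by default, so results can differ.

```python
import numpy as np
from sklearn.datasets import load_iris


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's algorithm with random initialization."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster happens to become empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


X = load_iris().data
labels, centroids = kmeans_lloyd(X, k=3)
print(centroids.round(2))
```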

## Importing Libraries and Dataset

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()

## Applying KMeans Clustering
We will cluster the Iris dataset into 3 groups. We choose `k = 3` because we already know the dataset contains 3 species; the algorithm itself never sees the species labels.

In [None]:
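# Create a KMeans model with 3 clusters.
# n_init=10 runs 10 initializations and keeps the best result; setting it
# explicitly avoids a FutureWarning in some scikit-learn versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
# Fit on the feature columns only, so re-running this cell does not
# feed the 'cluster' column back into the model.
df['cluster'] = kmeans.fit_predict(df[iris.feature_names])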

df.head()
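After fitting, the model exposes the learned cluster centers (`cluster_centers_`) and the within-cluster sum of squared distances (`inertia_`). A minimal, self-contained sketch for inspecting them, along with how many points landed in each cluster, assuming the same settings as the cell above:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Coordinates of each cluster center, one row per cluster, in the original feature units
centers = pd.DataFrame(kmeans.cluster_centers_, columns=iris.feature_names)
print(centers.round(2))

# Number of points assigned to each cluster
sizes = pd.Series(labels).value_counts().sort_index()
print(sizes)

# Inertia: sum of squared distances from each point to its assigned center
print(f"Inertia: {kmeans.inertia_:.2f}")
```

Lower inertia means tighter clusters, but it always decreases as `k` grows, so it is not on its own a criterion for choosing `k`.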

## Visualizing Clusters

In [None]:
plt.scatter(df['sepal length (cm)'], df['sepal width (cm)'], 
            c=df['cluster'], cmap='viridis', alpha=0.7)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('KMeans Clustering on Iris Dataset')
plt.show()
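It can help to overlay the cluster centers on the same scatter plot. The centers live in the full 4-dimensional feature space; columns 0 and 1 of `cluster_centers_` are sepal length and sepal width (the order of `iris.feature_names`), so the sketch below plots their projection onto the two sepal features. Marker style and colors are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df["cluster"] = kmeans.fit_predict(df[iris.feature_names])

fig, ax = plt.subplots()
ax.scatter(df["sepal length (cm)"], df["sepal width (cm)"],
           c=df["cluster"], cmap="viridis", alpha=0.7)
# Project the 4-D centers onto the two sepal features (columns 0 and 1)
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           c="red", marker="X", s=200, label="Centroids")
ax.set_xlabel("Sepal Length (cm)")
ax.set_ylabel("Sepal Width (cm)")
ax.set_title("KMeans Clusters with Centroids (sepal features only)")
ax.legend()
plt.show()
```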

## Understanding the Results
- KMeans divided the dataset into **3 clusters**.
- The colors show the grouping made by the algorithm.
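- These clusters do not necessarily match the actual Iris species labels. They group flowers by similarity across all four measurements, while the plot shows only the two sepal features, so clusters can appear to overlap in the plot even when they are better separated in the full feature space.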
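Because Iris happens to come with species labels, we can check how well the clusters line up with them after clustering is done; the labels are used only for this evaluation. A minimal sketch using a cross-tabulation and scikit-learn's `adjusted_rand_score`, which is invariant to how the clusters are numbered:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
X = iris.data
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Cross-tabulate true species against cluster IDs.
# Cluster numbers are arbitrary, so look at which cluster each species mostly falls into.
species = pd.Series(iris.target_names[iris.target], name="species")
table = pd.crosstab(species, pd.Series(labels, name="cluster"))
print(table)

# Adjusted Rand Index: 1.0 means identical partitions, values near 0.0 mean chance-level agreement
ari = adjusted_rand_score(iris.target, labels)
print(f"Adjusted Rand Index: {ari:.3f}")
```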

## Key Notes:
- KMeans requires choosing the number of clusters `k` beforehand.
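- It works best when clusters are roughly spherical and similar in size, because it assigns each point to its nearest centroid; it can struggle with elongated, nested, or irregularly shaped clusters.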
- Choosing the right `k` can be done using the **Elbow Method** or **Silhouette Score** (we’ll cover this later).
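As a brief preview of those two methods, a minimal sketch that fits KMeans for a range of `k` values and records both quantities. The range `2..8` is an arbitrary assumption; the silhouette score needs at least 2 clusters. For the Elbow Method, look for the `k` where inertia stops dropping sharply; for the silhouette score, higher is better.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X = load_iris().data

inertias = {}
silhouettes = {}
for k in range(2, 9):  # silhouette_score requires at least 2 clusters
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_  # Elbow Method: plot these against k
    silhouettes[k] = silhouette_score(X, model.labels_)  # in [-1, 1], higher is better
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")

best_k = max(silhouettes, key=silhouettes.get)
print(f"Highest silhouette score at k={best_k}")
```

The two criteria do not always agree, and neither is guaranteed to recover the "true" number of groups, so they are best treated as guides rather than answers.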