# Introduction to Unsupervised Learning

Unsupervised learning is a type of machine learning in which the model is trained on unlabeled data, i.e., inputs with no corresponding target outputs. Rather than learning to predict a known answer, the algorithm tries to find hidden patterns, groupings, or structure in the data itself.

Key characteristics:
- No target variable (unlike supervised learning).
- Focuses on exploring the structure of data.
- Commonly used for **clustering, dimensionality reduction, and anomaly detection**.

## Real-World Examples
- Customer segmentation in marketing.
- Grouping similar news articles.
- Detecting fraudulent transactions.
- Data compression using dimensionality reduction.

## Types of Unsupervised Learning

1. **Clustering** → Grouping data points into clusters (e.g., KMeans, Hierarchical, DBSCAN).
2. **Dimensionality Reduction** → Reducing the number of features while keeping essential information (e.g., PCA, t-SNE, UMAP).
3. **Anomaly Detection** → Identifying outliers that do not conform to the general pattern (e.g., Isolation Forest, One-Class SVM).
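
To make the three families above more concrete, here is a minimal sketch that applies one representative of each to the same unlabeled data. The synthetic dataset (`make_blobs`), the sample and feature counts, and the choices `n_clusters=3` and `n_components=2` are assumptions made purely for illustration, not recommended settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic data: 300 points with 5 features (illustrative values).
# make_blobs also returns generating labels; we discard them to mimic an unlabeled setting.
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)

# 1. Clustering: assign each point to one of 3 groups.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# 2. Dimensionality reduction: project the 5 features down to 2.
X_2d = PCA(n_components=2).fit_transform(X)

# 3. Anomaly detection: +1 marks inliers, -1 marks suspected outliers.
outlier_flags = IsolationForest(random_state=0).fit_predict(X)

print("Cluster labels shape:", cluster_labels.shape)
print("Reduced data shape:", X_2d.shape)
print("Flag values present:", np.unique(outlier_flags))
```

Note that none of the three estimators receives a target variable: each one is fit on `X` alone.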

## Example Dataset
Let's look at a simple dataset, the Iris dataset (150 flower samples described by four numeric measurements), and see how unsupervised learning can be applied to it.

In [None]:
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset and keep only the four feature columns (no labels)
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Preview the first five rows
df.head()

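Notice that `df` contains **features only**. The Iris dataset does ship with species labels (`iris.target`), but we deliberately leave them out here, because in real-world unsupervised tasks the labels (target) are usually not available.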
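
As a quick sanity check, the sketch below confirms that the feature table holds no label column, even though the loaded dataset object carries species names. It reloads the data so that it runs on its own.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# The feature table: 150 samples, 4 measurements each
print(df.shape)

# The labels exist in the dataset object, but we never added them to df
print(list(iris.target_names))
print("target" in df.columns)
```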

In [None]:
import matplotlib.pyplot as plt

# Quick visualization of two features
plt.scatter(df['sepal length (cm)'], df['sepal width (cm)'], alpha=0.7)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('Iris Data - Feature Scatter Plot')
plt.show()
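
As a small preview of what clustering does with these two features, the following sketch runs KMeans on sepal length and sepal width only. The choice of `n_clusters=3` is an assumption made for illustration; in a genuinely unlabeled setting the number of clusters is not known in advance and has to be chosen by other means.

In [None]:
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Use the same two features as in the scatter plot: sepal length and sepal width
X = load_iris().data[:, :2]

# n_clusters=3 is an illustrative assumption, not something the data tells us
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Points per cluster:", np.bincount(labels))
print("Cluster centers (sepal length, sepal width):")
print(kmeans.cluster_centers_.round(2))
```

Passing `c=labels` to `plt.scatter` would color the earlier scatter plot by cluster assignment.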

## Next Steps
In the upcoming notebooks, we will explore:
- Clustering methods like KMeans, Hierarchical, and DBSCAN.
- Dimensionality reduction with PCA and t-SNE.
- Evaluation metrics for unsupervised models.
- Practical pipelines combining these methods.