# Module 01: Supervised vs Unsupervised Learning

**Difficulty**: ⭐ Beginner  
**Estimated Time**: 75 minutes  
**Prerequisites**: 
- [Module 00: Introduction to Machine Learning](00_introduction_to_machine_learning.ipynb)
- Understanding of basic ML concepts

## Learning Objectives

By the end of this notebook, you will be able to:

1. Distinguish between supervised and unsupervised learning approaches
2. Identify when to use classification vs regression
3. Understand the difference between clustering and classification
4. Apply both supervised and unsupervised algorithms to real datasets
5. Recognize the appropriate ML approach for different problem types
6. Implement examples of each learning paradigm

## 1. Setup and Imports

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Scikit-learn for ML
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Supervised learning algorithms
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestRegressor

# Unsupervised learning algorithms
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Metrics
from sklearn.metrics import (
    accuracy_score, mean_squared_error, r2_score,
    silhouette_score, classification_report
)

# Configuration
%matplotlib inline
# Note: the 'seaborn-v0_8-*' style names require matplotlib >= 3.6
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
np.random.seed(42)

print("All libraries imported successfully!")

## 2. Supervised Learning Deep Dive

### What is Supervised Learning?

**Supervised learning** is learning from **labeled data**. Each training example consists of:
- **Input features (X)**: The data we have
- **Output labels (y)**: The correct answer we want to predict

The algorithm learns to map inputs to outputs by finding patterns in the labeled examples.

### The Two Types of Supervised Learning

#### 1. Classification (Discrete Outputs)

**Goal**: Predict which category/class an input belongs to

**Examples**:
- Email: Spam or Not Spam (binary classification)
- Image: Cat, Dog, or Bird (multi-class classification)
- Customer: Will churn or stay (binary classification)
- Sentiment: Positive, Negative, or Neutral (multi-class)

**Common Algorithms**:
- Logistic Regression
- Decision Trees
- Random Forests
- Support Vector Machines (SVM)
- Neural Networks

#### 2. Regression (Continuous Outputs)

**Goal**: Predict a numerical value

**Examples**:
- House price based on features (price in dollars)
- Temperature prediction (degrees)
- Stock price (value)
- Sales forecast (quantity)

**Common Algorithms**:
- Linear Regression
- Polynomial Regression
- Ridge/Lasso Regression
- Random Forest Regressor
- Neural Networks
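
To make the discrete-versus-continuous distinction concrete, here is a minimal sketch on a made-up dataset. The feature (hours studied) and both targets (`passed`, `score`) are hypothetical values invented for illustration; only the scikit-learn estimators are real.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: hours studied (a single feature) for eight students
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]])

# Classification target: discrete labels (0 = fail, 1 = pass)
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Regression target: continuous values (exam score in points)
score = np.array([35.0, 42.0, 50.0, 58.0, 63.0, 71.0, 80.0, 86.0])

classifier = LogisticRegression().fit(hours, passed)
regressor = LinearRegression().fit(hours, score)

new_hours = np.array([[2.5], [6.5]])
predicted_class = classifier.predict(new_hours)  # always one of the known labels
predicted_score = regressor.predict(new_hours)   # can be any real number

print("Predicted classes:", predicted_class)
print("Predicted scores:", np.round(predicted_score, 1))
```

The same inputs feed both models; what changes is the type of target, and therefore the type of prediction.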

## 3. Example: Classification Problem

Let's build a classification model to identify iris species based on flower measurements.

In [None]:
# Load iris dataset
iris = datasets.load_iris()
X_iris = iris.data
y_iris = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris
)

# Train a Decision Tree classifier
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Classification Accuracy: {accuracy:.2%}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

In [None]:
# Visualize classification results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Actual vs Predicted (using two features for visualization)
feature_idx = [2, 3]  # Petal length and width
colors = ['red', 'green', 'blue']

for i, species in enumerate(iris.target_names):
    mask = y_test == i
    axes[0].scatter(
        X_test[mask, feature_idx[0]],
        X_test[mask, feature_idx[1]],
        c=colors[i],
        label=f'Actual {species}',
        marker='o',
        s=100,
        alpha=0.6
    )

# Mark misclassifications with X
misclassified = y_test != y_pred
if misclassified.any():
    axes[0].scatter(
        X_test[misclassified, feature_idx[0]],
        X_test[misclassified, feature_idx[1]],
        marker='x',
        s=200,
        c='black',
        linewidths=3,
        label='Misclassified'
    )

axes[0].set_xlabel(iris.feature_names[feature_idx[0]])
axes[0].set_ylabel(iris.feature_names[feature_idx[1]])
axes[0].set_title('Classification Results (Test Set)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris.target_names,
            yticklabels=iris.target_names, ax=axes[1])
axes[1].set_title('Confusion Matrix')
axes[1].set_ylabel('Actual')
axes[1].set_xlabel('Predicted')

plt.tight_layout()
plt.show()

print("Notice: Classification assigns discrete labels (species names)")
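
One way to see what the classifier actually learned is to print its decision rules. The sketch below retrains the same kind of tree so that it runs on its own, then uses scikit-learn's `export_text` helper, which is not imported elsewhere in this notebook.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Same settings as the classifier trained above
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

# Render the learned if/else thresholds as plain text
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each leaf ends in a `class:` line, which shows that the model's output is always one of a fixed set of labels.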

## 4. Example: Regression Problem

Let's build a regression model to predict house prices based on various features.

In [None]:
# Load California housing dataset
housing = datasets.fetch_california_housing()
X_housing = housing.data
y_housing = housing.target  # Median house value (in $100,000s)

# Use only the first 1000 samples for faster computation
# (the rows are not shuffled, so this slice may not be representative of the whole state)
X_housing = X_housing[:1000]
y_housing = y_housing[:1000]

print("Dataset shape:", X_housing.shape)
print("Features:", housing.feature_names)
print("Target: Median house value in $100,000s")
print(f"Price range: ${y_housing.min()*100000:.0f} - ${y_housing.max()*100000:.0f}")

In [None]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_housing, y_housing, test_size=0.2, random_state=42
)

# Train a Linear Regression model
reg = LinearRegression()
reg.fit(X_train, y_train)

# Make predictions
y_pred_train = reg.predict(X_train)
y_pred_test = reg.predict(X_test)

# Evaluate
train_r2 = r2_score(y_train, y_pred_train)
test_r2 = r2_score(y_test, y_pred_test)
test_rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))

print(f"Training R² Score: {train_r2:.3f}")
print(f"Test R² Score: {test_r2:.3f}")
print(f"Test RMSE: ${test_rmse*100000:.2f}")
print(f"\nInterpretation: Model explains {test_r2:.1%} of variance in house prices")

In [None]:
# Visualize regression results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Actual vs Predicted
axes[0].scatter(y_test, y_pred_test, alpha=0.6, edgecolors='k')
axes[0].plot(
    [y_test.min(), y_test.max()],
    [y_test.min(), y_test.max()],
    'r--', lw=2, label='Perfect Prediction'
)
axes[0].set_xlabel('Actual Price ($100,000s)')
axes[0].set_ylabel('Predicted Price ($100,000s)')
axes[0].set_title('Regression: Actual vs Predicted Prices')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: Residuals (prediction errors)
residuals = y_test - y_pred_test
axes[1].scatter(y_pred_test, residuals, alpha=0.6, edgecolors='k')
axes[1].axhline(y=0, color='r', linestyle='--', lw=2)
axes[1].set_xlabel('Predicted Price ($100,000s)')
axes[1].set_ylabel('Residual (Actual - Predicted)')
axes[1].set_title('Residual Plot')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Notice: Regression predicts continuous values (not discrete categories)")
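
The two regression metrics used above can be computed by hand, which may help in interpreting them. This is a small sketch with hypothetical actual and predicted values (not taken from the housing model); it checks the manual formulas against `mean_squared_error` and `r2_score`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical values in the same units as the housing target ($100,000s)
y_actual = np.array([1.5, 2.0, 3.2, 4.1, 2.7])
y_predicted = np.array([1.7, 1.8, 3.0, 3.6, 3.0])

residuals = y_actual - y_predicted

# RMSE: square root of the mean squared residual (same units as the target)
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R²: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

rmse_sklearn = np.sqrt(mean_squared_error(y_actual, y_predicted))
r2_sklearn = r2_score(y_actual, y_predicted)

print(f"RMSE manual: {rmse_manual:.4f}, scikit-learn: {rmse_sklearn:.4f}")
print(f"R² manual:   {r2_manual:.4f}, scikit-learn: {r2_sklearn:.4f}")
```

An R² of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values are possible when the model does worse than that baseline.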

## 5. Unsupervised Learning Deep Dive

### What is Unsupervised Learning?

**Unsupervised learning** works with **unlabeled data**. We only have:
- **Input features (X)**: The data
- **No labels (y)**: We don't know the "correct answer"

The algorithm tries to find hidden patterns, structures, or groupings in the data.

### Main Types of Unsupervised Learning

#### 1. Clustering

**Goal**: Group similar data points together

**Examples**:
- Customer segmentation (group customers with similar behavior)
- Document categorization (group similar articles)
- Image compression (group similar colors)
- Anomaly detection (find outliers that don't fit any group)

**Common Algorithms**:
- K-Means
- DBSCAN
- Hierarchical Clustering
- Gaussian Mixture Models

#### 2. Dimensionality Reduction

**Goal**: Reduce the number of features while preserving as much information as possible

**Examples**:
- Visualizing high-dimensional data in 2D/3D
- Feature extraction before training ML models
- Data compression
- Noise reduction

**Common Algorithms**:
- Principal Component Analysis (PCA)
- t-SNE
- UMAP
- Autoencoders

### Key Difference from Supervised Learning

| Aspect | Supervised | Unsupervised |
|--------|-----------|-------------|
| **Data** | Labeled (X and y) | Unlabeled (only X) |
| **Goal** | Predict labels | Find patterns |
| **Evaluation** | Compare predictions to true labels | No ground truth; use metrics like silhouette score |
| **Use Case** | When you know what to predict | When exploring data or don't have labels |
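
In scikit-learn, the first row of the table shows up directly in the `fit` call: supervised estimators take both `X` and `y`, while unsupervised estimators take only `X`. A minimal sketch on the Iris data, with variable names chosen for illustration:

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Supervised: the labels y are passed to fit()
supervised = DecisionTreeClassifier(max_depth=3, random_state=42)
supervised.fit(X, y)
predicted_species = supervised.predict(X)
print("Predicted label values:", np.unique(predicted_species))

# Unsupervised: fit() sees only X; y is never used
unsupervised = KMeans(n_clusters=3, random_state=42, n_init=10)
unsupervised.fit(X)
print("Cluster ids:", np.unique(unsupervised.labels_))
```

The cluster ids are arbitrary integers chosen by the algorithm; unlike the predicted labels, they carry no species meaning.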

## 6. Example: Clustering (K-Means)

Let's use K-Means to find groups in the Iris dataset, pretending we don't know the species labels.

In [None]:
# Use iris data but ignore the labels (unsupervised!)
X_iris_unsupervised = iris.data

# Apply K-Means clustering with 3 clusters
# We choose 3 because we suspect there might be 3 groups
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
cluster_labels = kmeans.fit_predict(X_iris_unsupervised)

# Evaluate clustering quality using silhouette score
# Score ranges from -1 to 1; higher is better
silhouette = silhouette_score(X_iris_unsupervised, cluster_labels)

print(f"Silhouette Score: {silhouette:.3f}")
print(f"Cluster assignments (first 10 samples): {cluster_labels[:10]}")
print(f"\nCluster sizes:")
unique, counts = np.unique(cluster_labels, return_counts=True)
for cluster, count in zip(unique, counts):
    print(f"  Cluster {cluster}: {count} samples")
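
To see where a silhouette value comes from, the sketch below computes it by hand for one point of a hypothetical one-dimensional toy dataset and compares it with `silhouette_samples`. For a single point, the value is `(b - a) / max(a, b)`, where `a` is the mean distance to the other points in its own cluster and `b` is the mean distance to the points of the nearest other cluster.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Hypothetical toy data: two well-separated groups on a line
points = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

i = 0  # compute the silhouette value of the first point by hand
distances = np.abs(points[:, 0] - points[i, 0])

own_cluster = labels == labels[i]
own_cluster[i] = False                       # exclude the point itself
a = distances[own_cluster].mean()            # mean distance within its own cluster
b = distances[labels != labels[i]].mean()    # mean distance to the only other cluster

s_manual = (b - a) / max(a, b)
s_sklearn = silhouette_samples(points, labels)[i]

print(f"a = {a:.2f}, b = {b:.2f}, silhouette = {s_manual:.3f}")
print(f"scikit-learn value: {s_sklearn:.3f}")
```

With more than two clusters, `b` would be the smallest of the mean distances to each other cluster. `silhouette_score` is the average of these per-point values.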

In [None]:
# Visualize clusters
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Use petal measurements for visualization
petal_length = X_iris_unsupervised[:, 2]
petal_width = X_iris_unsupervised[:, 3]

# Plot 1: K-Means clusters (what algorithm found)
scatter1 = axes[0].scatter(
    petal_length, petal_width,
    c=cluster_labels, cmap='viridis',
    s=100, alpha=0.6, edgecolors='k'
)
# Plot cluster centers
centers = kmeans.cluster_centers_
axes[0].scatter(
    centers[:, 2], centers[:, 3],
    c='red', marker='X', s=300,
    edgecolors='black', linewidths=2,
    label='Cluster Centers'
)
axes[0].set_xlabel('Petal Length (cm)')
axes[0].set_ylabel('Petal Width (cm)')
axes[0].set_title('Unsupervised: K-Means Clusters')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
plt.colorbar(scatter1, ax=axes[0], label='Cluster')

# Plot 2: True species labels (for comparison)
# Cluster ids are arbitrary, so the colors here need not match those in Plot 1
scatter2 = axes[1].scatter(
    petal_length, petal_width,
    c=iris.target, cmap='viridis',
    s=100, alpha=0.6, edgecolors='k'
)
axes[1].set_xlabel('Petal Length (cm)')
axes[1].set_ylabel('Petal Width (cm)')
axes[1].set_title('True Species Labels (for comparison)')
axes[1].grid(True, alpha=0.3)
plt.colorbar(scatter2, ax=axes[1], label='Species')

plt.tight_layout()
plt.show()

print("Notice: Clustering found groups WITHOUT being told the species labels!")
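
Because cluster ids are arbitrary, comparing them to species labels with plain accuracy would be misleading. When true labels happen to be available, as they are here, one option is the adjusted Rand index, which ignores how clusters are numbered. The following is a sketch that reruns the clustering so it is self-contained; `adjusted_rand_score` and `confusion_matrix` are real scikit-learn functions not used in the cell above.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, confusion_matrix

iris = datasets.load_iris()
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
cluster_labels = kmeans.fit_predict(iris.data)

# Rows: true species, columns: cluster ids
table = confusion_matrix(iris.target, cluster_labels)
print("Species (rows) vs cluster id (columns):")
print(table)

# Adjusted Rand index: 1.0 is a perfect match, about 0.0 is chance level
ari = adjusted_rand_score(iris.target, cluster_labels)
print(f"Adjusted Rand index: {ari:.3f}")

# Renaming the clusters (0 -> 1, 1 -> 2, 2 -> 0) leaves the score unchanged
relabeled = (cluster_labels + 1) % 3
ari_relabeled = adjusted_rand_score(iris.target, relabeled)
print(f"After renaming clusters: {ari_relabeled:.3f}")
```

This kind of check is only possible when labels exist; in a truly unlabeled setting, internal metrics such as the silhouette score are what remain.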

## 7. Example: Dimensionality Reduction (PCA)

Let's use PCA to reduce the Iris dataset from 4 features to 2 for easy visualization.

In [None]:
# Apply PCA to reduce from 4D to 2D
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_iris_unsupervised)

# Check how much variance is preserved
explained_variance = pca.explained_variance_ratio_
print(f"Variance explained by PC1: {explained_variance[0]:.1%}")
print(f"Variance explained by PC2: {explained_variance[1]:.1%}")
print(f"Total variance preserved: {explained_variance.sum():.1%}")
print(f"\nOriginal shape: {X_iris_unsupervised.shape}")
print(f"Reduced shape: {X_pca.shape}")

In [None]:
# Visualize PCA results
plt.figure(figsize=(10, 6))

for i, species in enumerate(iris.target_names):
    mask = iris.target == i
    plt.scatter(
        X_pca[mask, 0], X_pca[mask, 1],
        label=species, s=100, alpha=0.7, edgecolors='k'
    )

plt.xlabel(f'First Principal Component ({explained_variance[0]:.1%} variance)')
plt.ylabel(f'Second Principal Component ({explained_variance[1]:.1%} variance)')
plt.title('PCA: 4D Iris Data Reduced to 2D')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"PCA reduced 4 features to 2 while keeping {explained_variance.sum():.1%} of the variance!")
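
For readers curious about what PCA does internally, the sketch below reproduces the explained-variance ratios with a plain singular value decomposition in NumPy. It illustrates the underlying idea under the assumption of mean-centered data and is not a description of scikit-learn's exact implementation.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data

# Step 1: center each feature at zero
X_centered = X - X.mean(axis=0)

# Step 2: SVD; the rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: squared singular values are proportional to variance per component
component_variance = S ** 2 / (X.shape[0] - 1)
ratio_manual = component_variance / component_variance.sum()

# Step 4: project onto the first two directions
X_manual = X_centered @ Vt[:2].T

pca = PCA(n_components=2).fit(X)
print("Manual ratios:      ", np.round(ratio_manual[:2], 4))
print("scikit-learn ratios:", np.round(pca.explained_variance_ratio_, 4))
```

The projected coordinates agree with `pca.transform(X)` up to the sign of each component, since the direction of a principal axis is only defined up to a flip.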

## 8. Supervised vs Unsupervised: Side-by-Side Comparison

Let's directly compare both approaches on the same dataset.

In [None]:
# Prepare iris data
X = iris.data
y_true = iris.target

# SUPERVISED: Train a classifier (knows the labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_true, test_size=0.3, random_state=42
)
supervised_model = LogisticRegression(max_iter=200)
supervised_model.fit(X_train, y_train)
y_pred_supervised = supervised_model.predict(X_test)
supervised_accuracy = accuracy_score(y_test, y_pred_supervised)

# UNSUPERVISED: Apply clustering (doesn't know the labels)
unsupervised_model = KMeans(n_clusters=3, random_state=42, n_init=10)
y_pred_unsupervised = unsupervised_model.fit_predict(X)
unsupervised_silhouette = silhouette_score(X, y_pred_unsupervised)

# Compare results
print("=" * 60)
print("SUPERVISED LEARNING (Logistic Regression)")
print("=" * 60)
print(f"Training data: {len(X_train)} labeled samples")
print(f"Test accuracy: {supervised_accuracy:.2%}")
print("Evaluation: Compared predictions to true labels")

print("\n" + "=" * 60)
print("UNSUPERVISED LEARNING (K-Means Clustering)")
print("=" * 60)
print(f"Training data: {len(X)} unlabeled samples")
print(f"Silhouette score: {unsupervised_silhouette:.3f} (quality of clusters)")
print("Evaluation: No true labels; measured cluster cohesion and separation")

print("\n" + "=" * 60)
print("KEY DIFFERENCES")
print("=" * 60)
print("• Supervised NEEDS labels; Unsupervised works WITHOUT labels")
print("• Supervised PREDICTS labels; Unsupervised DISCOVERS patterns")
print("• Supervised has a clear accuracy metric; Unsupervised uses internal metrics")

## 9. When to Use Each Approach

### Choose Supervised Learning When:

✅ You have labeled training data  
✅ You know what you want to predict  
✅ You can measure success objectively (accuracy, error, etc.)  
✅ Examples: Spam detection, price prediction, medical diagnosis  

### Choose Unsupervised Learning When:

✅ You don't have labeled data (or it's expensive to obtain)  
✅ You want to explore and discover patterns  
✅ You need to reduce dimensionality or compress data  
✅ Examples: Customer segmentation, anomaly detection, recommendation systems  

### Can You Use Both?

**Yes!** Common combinations:

1. **Unsupervised → Supervised**:
   - Use PCA to reduce features, then train classifier
   - Use clustering to create new features

2. **Semi-Supervised Learning**:
   - Small amount of labeled data + large amount of unlabeled data
   - Use unsupervised methods to learn structure, then supervised methods to predict

3. **Unsupervised for Preprocessing**:
   - Anomaly detection to clean data
   - Feature extraction before supervised learning
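
The first combination above (PCA followed by a classifier) can be sketched with a scikit-learn pipeline. The choice of two components and of logistic regression is an assumption made for illustration; `make_pipeline` is a real scikit-learn helper that is not imported elsewhere in this notebook.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# Unsupervised steps (scaling, PCA) feed a supervised classifier.
# Fitting the pipeline on the training split only keeps test data
# out of the scaler and PCA statistics.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=200),
)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy with 2 principal components: {test_accuracy:.2%}")
```

Wrapping the steps in one pipeline means a single `fit` and `predict` call handles the whole chain, which makes it harder to accidentally apply the transformations inconsistently.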

## 10. Practice Exercises

### Exercise 1: Classification vs Regression

For each problem, identify whether it's classification or regression:

1. Predicting tomorrow's temperature
2. Detecting fraudulent credit card transactions
3. Estimating the number of sales next month
4. Classifying emails as important, promotional, or spam
5. Predicting whether a patient has a disease

Write your answers below:

In [None]:
# Your answers here (as comments)
# 1. 
# 2. 
# 3. 
# 4. 
# 5. 

### Exercise 2: Build a Regression Model

Use the diabetes dataset from sklearn to build a regression model:
1. Load the data using `datasets.load_diabetes()`
2. Split into train/test sets
3. Train a `RandomForestRegressor`
4. Evaluate using R² score
5. Compare to `LinearRegression` - which performs better?

In [None]:
# Your code here


### Exercise 3: Clustering with Different Numbers of Clusters

Apply K-Means to the iris dataset with k=2, 3, 4, and 5 clusters. Calculate the silhouette score for each. Which value of k gives the best score?

In [None]:
# Your code here
# Hint: Loop through k values, fit KMeans, calculate silhouette_score


### Exercise 4: PCA with Different Components

Apply PCA to the iris dataset with 1, 2, and 3 components. For each, print the total variance explained. How many components do you need to explain at least 95% of the variance?

In [None]:
# Your code here


## 11. Summary

### Key Concepts Learned

1. **Supervised Learning**:
   - Uses labeled data (X and y)
   - Two types: Classification (discrete) and Regression (continuous)
   - Goal: Learn to predict labels for new data
   - Evaluated by comparing predictions to true labels

2. **Unsupervised Learning**:
   - Uses unlabeled data (only X)
   - Two main types: Clustering and Dimensionality Reduction
   - Goal: Discover hidden patterns or structure
   - Evaluated using internal metrics (no ground truth)

3. **Key Differences**:
   - Supervised needs labels; unsupervised doesn't
   - Supervised predicts; unsupervised explores
   - Different use cases and evaluation methods

4. **Practical Skills**:
   - Built classification and regression models
   - Applied clustering (K-Means)
   - Used dimensionality reduction (PCA)
   - Compared both approaches on same data

### Decision Tree: Which Approach to Use?

```
Do you have labeled data?
├── YES: Supervised Learning
│   ├── Predicting categories? → Classification
│   └── Predicting numbers? → Regression
└── NO: Unsupervised Learning
    ├── Finding groups? → Clustering
    └── Reducing features? → Dimensionality Reduction
```
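
The same decision tree can be written as a small function. `choose_approach` and its argument values are hypothetical names made up for this sketch; they are not part of any library.

```python
def choose_approach(has_labels: bool, goal: str) -> str:
    """Hypothetical helper that mirrors the decision tree above."""
    if has_labels:
        # Supervised branch: the goal is the kind of value to predict
        options = {"categories": "Classification", "numbers": "Regression"}
    else:
        # Unsupervised branch: the goal is the kind of structure to find
        options = {"groups": "Clustering", "fewer features": "Dimensionality Reduction"}
    if goal not in options:
        raise ValueError(f"Unknown goal {goal!r}; expected one of {sorted(options)}")
    return options[goal]


print(choose_approach(True, "categories"))
print(choose_approach(True, "numbers"))
print(choose_approach(False, "groups"))
print(choose_approach(False, "fewer features"))
```

Real projects rarely reduce to two questions, but the function shows that the first branch depends on the data (labels or no labels) and the second on the goal.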

### Next Steps

In the next module, we'll dive deep into:
- **Data preparation techniques**
- **Train/test splitting strategies**
- **Feature scaling and encoding**
- **Handling missing data**
- **Data leakage and how to avoid it**

### Additional Resources

- [Scikit-learn Choosing the Right Estimator](https://scikit-learn.org/stable/tutorial/machine_learning_map/)
- [Supervised vs Unsupervised Learning](https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning)
- [Andrew Ng's ML Course - Week 1](https://www.coursera.org/learn/machine-learning)