# Introduction to Unsupervised Learning

**Learning Objectives:**
- Understand the fundamental concepts of unsupervised learning
- Learn about clustering algorithms and their applications
- Explore dimensionality reduction techniques
- Know where association rule mining fits among unsupervised techniques
- Apply unsupervised learning to real-world datasets

**Expected Duration:** 60-90 minutes

**Prerequisites:**
- Basic Python programming
- Understanding of basic statistics
- Familiarity with NumPy and Pandas
- Basic knowledge of supervised learning concepts

## 1. What is Unsupervised Learning?

Unsupervised learning is a type of machine learning where algorithms work with unlabeled data. The goal is to discover hidden patterns, structures, or relationships within the data without explicit guidance.

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score
from sklearn.datasets import make_blobs, make_moons, make_circles, load_iris
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist, squareform
import ipywidgets as widgets
from IPython.display import display, HTML
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Set style for visualizations
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

## 2. Types of Unsupervised Learning

### 2.1 Clustering
Clustering groups similar data points together based on their characteristics.

In [None]:
# Create synthetic clustering datasets
fig, axes = plt.subplots(2, 3, figsize=(18, 12))

# Dataset 1: Well-separated blobs
X1, y1 = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)
axes[0, 0].scatter(X1[:, 0], X1[:, 1], c=y1, cmap='viridis', alpha=0.6)
axes[0, 0].set_title('Well-separated Clusters')

# Dataset 2: Overlapping clusters
X2, y2 = make_blobs(n_samples=300, centers=3, cluster_std=2.0, random_state=42)
axes[0, 1].scatter(X2[:, 0], X2[:, 1], c=y2, cmap='viridis', alpha=0.6)
axes[0, 1].set_title('Overlapping Clusters')

# Dataset 3: Different cluster sizes
X3, y3 = make_blobs(n_samples=300, centers=3, cluster_std=[0.5, 1.5, 2.5], random_state=42)
axes[0, 2].scatter(X3[:, 0], X3[:, 1], c=y3, cmap='viridis', alpha=0.6)
axes[0, 2].set_title('Different Cluster Sizes')

# Dataset 4: Moon shapes
X4, y4 = make_moons(n_samples=300, noise=0.1, random_state=42)
axes[1, 0].scatter(X4[:, 0], X4[:, 1], c=y4, cmap='viridis', alpha=0.6)
axes[1, 0].set_title('Moon-shaped Clusters')

# Dataset 5: Concentric circles
X5, y5 = make_circles(n_samples=300, noise=0.1, factor=0.5, random_state=42)
axes[1, 1].scatter(X5[:, 0], X5[:, 1], c=y5, cmap='viridis', alpha=0.6)
axes[1, 1].set_title('Concentric Circles')

# Dataset 6: Anisotropic clusters (blobs stretched by a linear transformation)
X6, y6 = make_blobs(n_samples=300, centers=3, random_state=42)
X6 = X6 @ np.array([[0.6, -0.6], [-0.4, 0.8]])
axes[1, 2].scatter(X6[:, 0], X6[:, 1], c=y6, cmap='viridis', alpha=0.6)
axes[1, 2].set_title('Anisotropic Clusters')

plt.tight_layout()
plt.show()

print("Different types of clustering challenges:")
print("1. Well-separated clusters: Easy to identify")
print("2. Overlapping clusters: Hard to distinguish boundaries")
print("3. Different cluster sizes: Requires robust algorithms")
print("4. Moon shapes: Centroid-based methods such as K-Means fail on non-convex clusters")
print("5. Concentric circles: Need density- or connectivity-based algorithms")
print("6. Anisotropic clusters: Methods that assume spherical clusters struggle")

### 2.2 Dimensionality Reduction
Dimensionality reduction lowers the number of features while preserving as much of the important information (for example, variance) as possible.
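
One way to make "preserving important information" concrete is to check how well the original features can be rebuilt from the reduced representation. The sketch below is illustrative only: the names `X_demo` and `errors` are not part of this notebook, and mean squared reconstruction error is just one possible yardstick.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Hypothetical 10-dimensional demo data (not the notebook's X_high_dim)
X_demo, _ = make_blobs(n_samples=200, centers=4, n_features=10, random_state=42)

errors = {}
for k in (2, 5, 10):
    pca_k = PCA(n_components=k).fit(X_demo)
    # Project down to k components, then map back to the original 10 features
    X_restored = pca_k.inverse_transform(pca_k.transform(X_demo))
    errors[k] = float(np.mean((X_demo - X_restored) ** 2))

for k, err in errors.items():
    print(f"{k:>2} components -> mean squared reconstruction error: {err:.4f}")
```

Keeping all 10 components rebuilds the data essentially exactly, while fewer components trade reconstruction accuracy for a more compact representation.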

In [None]:
# Create high-dimensional dataset for dimensionality reduction
X_high_dim, y_high_dim = make_blobs(n_samples=500, centers=4, n_features=10, 
                                   cluster_std=1.5, random_state=42)

print(f"High-dimensional dataset shape: {X_high_dim.shape}")
print(f"Number of features: {X_high_dim.shape[1]}")
print(f"Number of samples: {X_high_dim.shape[0]}")
print(f"Number of actual clusters: {len(np.unique(y_high_dim))}")

# Visualize feature correlations
plt.figure(figsize=(12, 8))
correlation_matrix = np.corrcoef(X_high_dim.T)
sns.heatmap(correlation_matrix, annot=False, cmap='coolwarm', center=0)
plt.title('Feature Correlation Matrix (10 dimensions)')
plt.show()

# Apply PCA for dimensionality reduction
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_high_dim)

plt.figure(figsize=(10, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_high_dim, cmap='viridis', alpha=0.6)
plt.title('Data Reduced to 2D using PCA')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()

print(f"\nPCA Results:")
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
print(f"Total explained variance: {sum(pca.explained_variance_ratio_):.2%}")

## 3. Clustering Algorithms

### 3.1 K-Means Clustering
K-Means partitions data into K clusters by assigning each point to its nearest centroid and then moving each centroid to the mean of its assigned points.
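
The assign-then-update loop behind K-Means (often called Lloyd's algorithm) fits in a few lines of NumPy. This is a minimal sketch for intuition, not scikit-learn's implementation: `lloyd_kmeans`, `X_demo`, and the random-point initialization are assumptions made for the example.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain NumPy K-Means (Lloyd's algorithm); illustrative, not optimized."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return labels, centroids

# Two tight, well-separated groups of points
X_demo = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])
labels_demo, centroids_demo = lloyd_kmeans(X_demo, k=2)
print("Labels:", labels_demo)
print("Centroids:\n", centroids_demo)
```

The cluster IDs themselves are arbitrary; only the grouping matters. scikit-learn's `KMeans` adds smarter initialization and multiple restarts (`n_init`) on top of this basic loop.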

In [None]:
# K-Means clustering implementation
def kmeans_clustering(X, n_clusters=3, random_state=42):
    """Perform K-Means clustering and return results"""
    
    # Initialize and fit K-Means
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state, n_init=10)
    cluster_labels = kmeans.fit_predict(X)
    
    # Calculate metrics
    silhouette_avg = silhouette_score(X, cluster_labels)
    calinski_score = calinski_harabasz_score(X, cluster_labels)
    davies_score = davies_bouldin_score(X, cluster_labels)
    
    return {
        'model': kmeans,
        'labels': cluster_labels,
        'centroids': kmeans.cluster_centers_,
        'silhouette': silhouette_avg,
        'calinski': calinski_score,
        'davies': davies_score
    }

# Apply K-Means to different datasets
datasets = [
    ('Well-separated', X1),
    ('Overlapping', X2),
    ('Different sizes', X3)
]

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for idx, (name, X) in enumerate(datasets):
    results = kmeans_clustering(X, n_clusters=3)
    
    # Plot clusters
    scatter = axes[idx].scatter(X[:, 0], X[:, 1], c=results['labels'], 
                               cmap='viridis', alpha=0.6)
    axes[idx].scatter(results['centroids'][:, 0], results['centroids'][:, 1], 
                     c='red', marker='x', s=200, linewidths=3)
    axes[idx].set_title(f'{name}\nSilhouette: {results["silhouette"]:.3f}')
    axes[idx].set_xlabel('Feature 1')
    axes[idx].set_ylabel('Feature 2')
    
    print(f"{name} - Silhouette: {results['silhouette']:.3f}, "
          f"Calinski: {results['calinski']:.1f}, "
          f"Davies: {results['davies']:.3f}")

plt.tight_layout()
plt.show()
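
The silhouette score reported above can be reproduced by hand, which helps when interpreting it. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the mean distance to the members of the nearest other cluster, and the point's score is `(b - a) / max(a, b)`. The helper `silhouette_manual` and the toy data below are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def silhouette_manual(X, labels):
    """Mean silhouette coefficient computed directly from its definition."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # a singleton cluster gets a score of 0 by convention
        # a: mean distance to the other points in the same cluster
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                  [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])
labels_toy = np.array([0, 0, 0, 1, 1, 1])

manual = silhouette_manual(X_toy, labels_toy)
reference = silhouette_score(X_toy, labels_toy)
print(f"Manual: {manual:.4f}, scikit-learn: {reference:.4f}")
```

Values near 1 mean points sit much closer to their own cluster than to any other; values near 0 indicate overlapping clusters.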

### 3.2 Finding Optimal Number of Clusters

In [None]:
# Find optimal number of clusters using Elbow Method and Silhouette Analysis
def find_optimal_clusters(X, max_clusters=10):
    """Find optimal number of clusters using multiple methods"""
    
    silhouette_scores = []
    inertia_values = []
    calinski_scores = []
    
    for n_clusters in range(2, max_clusters + 1):
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        cluster_labels = kmeans.fit_predict(X)
        
        silhouette_scores.append(silhouette_score(X, cluster_labels))
        inertia_values.append(kmeans.inertia_)
        calinski_scores.append(calinski_harabasz_score(X, cluster_labels))
    
    # Plot results
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    
    # Elbow Method
    axes[0].plot(range(2, max_clusters + 1), inertia_values, 'bo-')
    axes[0].set_xlabel('Number of Clusters')
    axes[0].set_ylabel('Inertia')
    axes[0].set_title('Elbow Method')
    
    # Silhouette Analysis
    axes[1].plot(range(2, max_clusters + 1), silhouette_scores, 'go-')
    axes[1].set_xlabel('Number of Clusters')
    axes[1].set_ylabel('Silhouette Score')
    axes[1].set_title('Silhouette Analysis')
    
    # Calinski-Harabasz Index
    axes[2].plot(range(2, max_clusters + 1), calinski_scores, 'ro-')
    axes[2].set_xlabel('Number of Clusters')
    axes[2].set_ylabel('Calinski-Harabasz Score')
    axes[2].set_title('Calinski-Harabasz Index')
    
    plt.tight_layout()
    plt.show()
    
    # Find optimal clusters
    optimal_silhouette = np.argmax(silhouette_scores) + 2
    optimal_calinski = np.argmax(calinski_scores) + 2
    
    print(f"Optimal number of clusters:")
    print(f"- Silhouette method: {optimal_silhouette}")
    print(f"- Calinski-Harabasz method: {optimal_calinski}")
    
    return optimal_silhouette, optimal_calinski

# Apply to synthetic dataset
X_synth, y_synth = make_blobs(n_samples=300, centers=4, cluster_std=1.5, random_state=42)
optimal_k = find_optimal_clusters(X_synth, max_clusters=8)

# Visualize with optimal clusters
results = kmeans_clustering(X_synth, n_clusters=optimal_k[0])

plt.figure(figsize=(10, 6))
plt.scatter(X_synth[:, 0], X_synth[:, 1], c=results['labels'], cmap='viridis', alpha=0.6)
plt.scatter(results['centroids'][:, 0], results['centroids'][:, 1], 
           c='red', marker='x', s=200, linewidths=3)
plt.title(f'K-Means with {optimal_k[0]} Clusters (Optimal)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
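
The elbow plot above is read by eye. As a rough sketch of how that reading could be automated, the heuristic below picks the point of the inertia curve that lies furthest below the straight line joining its first and last points. The function name `elbow_by_max_distance` and the demo variables are hypothetical, and this is only one of several possible elbow heuristics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def elbow_by_max_distance(ks, inertias):
    """Return the k whose inertia lies furthest below the first-to-last chord."""
    ks = np.asarray(ks, dtype=float)
    inertias = np.asarray(inertias, dtype=float)
    # Rescale both axes to [0, 1] so the result does not depend on units
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (inertias - inertias[-1]) / (inertias[0] - inertias[-1])
    # After rescaling, the chord runs from (0, 1) to (1, 0), i.e. y = 1 - x
    gap = (1.0 - x) - y
    return int(ks[np.argmax(gap)])

X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.5, random_state=42)
ks = list(range(1, 9))
inertias = [KMeans(n_clusters=k, random_state=42, n_init=10).fit(X_demo).inertia_
            for k in ks]

k_elbow = elbow_by_max_distance(ks, inertias)
print(f"Elbow heuristic suggests k = {k_elbow}")
```

Treat the result as a suggestion to compare against the silhouette and Calinski-Harabasz choices rather than as a definitive answer.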

### 3.3 Advanced Clustering Algorithms

In [None]:
# Compare different clustering algorithms
def compare_clustering_algorithms(X, true_labels=None):
    """Compare multiple clustering algorithms"""
    
    # K-Means
    kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
    kmeans_labels = kmeans.fit_predict(X)
    
    # DBSCAN
    dbscan = DBSCAN(eps=0.5, min_samples=5)
    dbscan_labels = dbscan.fit_predict(X)
    
    # Agglomerative Clustering
    agg = AgglomerativeClustering(n_clusters=3, linkage='ward')
    agg_labels = agg.fit_predict(X)
    
    # Calculate metrics
    results = {
        'K-Means': {
            'labels': kmeans_labels,
            'silhouette': silhouette_score(X, kmeans_labels),
            'n_clusters': len(np.unique(kmeans_labels))
        },
        'DBSCAN': {
            'labels': dbscan_labels,
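            # DBSCAN labels noise points as -1; the silhouette needs at least 2 distinct labels
            'silhouette': silhouette_score(X, dbscan_labels) if len(np.unique(dbscan_labels)) > 1 else -1,
            # Count clusters without the noise label
            'n_clusters': len(set(dbscan_labels) - {-1})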
        },
        'Agglomerative': {
            'labels': agg_labels,
            'silhouette': silhouette_score(X, agg_labels),
            'n_clusters': len(np.unique(agg_labels))
        }
    }
    
    return results

# Test on different datasets
test_datasets = [
    ('Well-separated', X1),
    ('Moons', X4),
    ('Circles', X5)
]

fig, axes = plt.subplots(3, 3, figsize=(18, 15))

for row_idx, (name, X) in enumerate(test_datasets):
    results = compare_clustering_algorithms(X)
    
    for col_idx, (algorithm, result) in enumerate(results.items()):
        ax = axes[row_idx, col_idx]
        scatter = ax.scatter(X[:, 0], X[:, 1], c=result['labels'], 
                           cmap='viridis', alpha=0.6)
        # Show the dataset name in every panel title, not only in the first row
        ax.set_title(f'{name} - {algorithm}\nSilhouette: {result["silhouette"]:.3f}')
        ax.set_xlabel('Feature 1')
        ax.set_ylabel('Feature 2')

plt.tight_layout()
plt.show()

# Print performance summary
print("Clustering Algorithm Performance Summary:")
for name, X in test_datasets:
    print(f"\n{name} Dataset:")
    results = compare_clustering_algorithms(X)
    for algorithm, result in results.items():
        print(f"  {algorithm}: Silhouette={result['silhouette']:.3f}, "
              f"Clusters={result['n_clusters']}")
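
Unlike K-Means and agglomerative clustering, DBSCAN does not take a cluster count and may leave some points unassigned, labeling them `-1` (noise). The toy example below, with hypothetical data and parameter values chosen for the illustration, shows how to separate noise from real clusters before counting clusters or computing a silhouette score.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Two tight groups plus one far-away outlier (hypothetical toy data)
group_a = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
group_b = group_a + 5.0
outlier = np.array([[10.0, -10.0]])
X_toy = np.vstack([group_a, group_b, outlier])

labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X_toy)

# Noise points carry the label -1 and are not a cluster
is_noise = labels == -1
n_clusters = len(set(labels[~is_noise].tolist()))
n_noise = int(is_noise.sum())
print(f"Clusters: {n_clusters}, noise points: {n_noise}")

# Score only the points that were actually assigned to a cluster
if n_clusters > 1:
    score = silhouette_score(X_toy[~is_noise], labels[~is_noise])
    print(f"Silhouette (noise excluded): {score:.3f}")
```

Whether to exclude noise from the score is a judgment call; excluding it rates the clusters themselves, while including it penalizes the model for the points it could not assign.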

## 4. Dimensionality Reduction Techniques

### 4.1 Principal Component Analysis (PCA)

In [None]:
# Comprehensive PCA analysis
def pca_comprehensive_analysis(X, n_components=None):
    """Comprehensive PCA analysis with visualization"""
    
    # Standardize the data
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    # Apply PCA
    pca = PCA(n_components=n_components)
    X_pca = pca.fit_transform(X_scaled)
    
    return pca, X_pca

# Load Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
feature_names = iris.feature_names

# Apply PCA
pca, X_pca = pca_comprehensive_analysis(X_iris, n_components=2)

# Visualize results
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Original data (first two features)
axes[0].scatter(X_iris[:, 0], X_iris[:, 1], c=y_iris, cmap='viridis', alpha=0.6)
axes[0].set_xlabel(feature_names[0])
axes[0].set_ylabel(feature_names[1])
axes[0].set_title('Original Data (First 2 Features)')

# PCA results
axes[1].scatter(X_pca[:, 0], X_pca[:, 1], c=y_iris, cmap='viridis', alpha=0.6)
axes[1].set_xlabel('Principal Component 1')
axes[1].set_ylabel('Principal Component 2')
axes[1].set_title('PCA (2 Components)')

# Explained variance
axes[2].bar(range(1, len(pca.explained_variance_ratio_) + 1), 
          pca.explained_variance_ratio_)
axes[2].set_xlabel('Principal Component')
axes[2].set_ylabel('Explained Variance Ratio')
axes[2].set_title('Explained Variance')
axes[2].set_ylim([0, 1])

plt.tight_layout()
plt.show()

print("PCA Analysis Results:")
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
print(f"Cumulative explained variance: {np.cumsum(pca.explained_variance_ratio_)}")
print(f"Total variance retained: {sum(pca.explained_variance_ratio_):.2%}")

# Show feature contributions
print("\nFeature Contributions to Principal Components:")
for i, component in enumerate(pca.components_):
    print(f"\nPC{i+1}:")
    for j, (feature, loading) in enumerate(zip(feature_names, component)):
        print(f"  {feature}: {loading:.3f}")
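
The explained variance ratios and component scores printed above come from a singular value decomposition of the centered data. The sketch below reproduces them with plain NumPy on the standardized Iris features; the variable names are assumptions for the example, and component signs are compared in absolute value because the sign of a principal direction is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

# PCA by hand: center the data, then take its singular value decomposition
X_centered = X_scaled - X_scaled.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Squared singular values are proportional to the variance along each component
explained_variance_ratio = S**2 / np.sum(S**2)

# Rows of Vt are the principal directions; project onto the first two
X_manual = X_centered @ Vt[:2].T

# Reference result from scikit-learn
pca_ref = PCA(n_components=2).fit(X_scaled)
X_ref = pca_ref.transform(X_scaled)

print("Manual ratios:      ", np.round(explained_variance_ratio[:2], 4))
print("scikit-learn ratios:", np.round(pca_ref.explained_variance_ratio_, 4))
```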

### 4.2 t-Distributed Stochastic Neighbor Embedding (t-SNE)

In [None]:
# t-SNE for visualization
def tsne_visualization(X, y, perplexity=30):
    """Apply t-SNE for visualization (y is unused; kept for a uniform call signature)"""
    
    # Standardize data
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    # Apply t-SNE with its default iteration count; newer scikit-learn
    # releases renamed the n_iter argument to max_iter
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    X_tsne = tsne.fit_transform(X_scaled)
    
    return X_tsne

# Compare PCA and t-SNE
X_tsne = tsne_visualization(X_iris, y_iris, perplexity=30)

fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# PCA
axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=y_iris, cmap='viridis', alpha=0.6)
axes[0].set_xlabel('Principal Component 1')
axes[0].set_ylabel('Principal Component 2')
axes[0].set_title('PCA')

# t-SNE
axes[1].scatter(X_tsne[:, 0], X_tsne[:, 1], c=y_iris, cmap='viridis', alpha=0.6)
axes[1].set_xlabel('t-SNE Component 1')
axes[1].set_ylabel('t-SNE Component 2')
axes[1].set_title('t-SNE')

plt.tight_layout()
plt.show()

# t-SNE with different perplexity values
fig, axes = plt.subplots(2, 3, figsize=(18, 10))

perplexities = [5, 10, 20, 30, 40, 50]

for idx, perplexity in enumerate(perplexities):
    row = idx // 3
    col = idx % 3
    
    X_tsne = tsne_visualization(X_iris, y_iris, perplexity=perplexity)
    
    axes[row, col].scatter(X_tsne[:, 0], X_tsne[:, 1], c=y_iris, cmap='viridis', alpha=0.6)
    axes[row, col].set_title(f't-SNE (Perplexity={perplexity})')
    axes[row, col].set_xlabel('t-SNE 1')
    axes[row, col].set_ylabel('t-SNE 2')

plt.tight_layout()
plt.show()

print("t-SNE Parameter Effects:")
print("- Low perplexity (5-10): Focuses on local structure")
print("- Medium perplexity (20-40): Balanced view of local and global structure")
print("- High perplexity (50+): Emphasizes global structure")
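
Perplexity can be read as an effective number of neighbors. In t-SNE it is defined as 2 raised to the Shannon entropy (in bits) of each point's neighbor distribution. The helper below is a small sketch of that definition; the function name `perplexity_of` and the example distributions are made up for illustration.

```python
import numpy as np

def perplexity_of(p):
    """Perplexity of a discrete distribution: 2 ** (Shannon entropy in bits)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with zero probability contribute nothing
    entropy_bits = -np.sum(p * np.log2(p))
    return float(2.0 ** entropy_bits)

# Attention spread evenly over 8 neighbors
uniform = np.full(8, 1 / 8)
# Attention concentrated almost entirely on one neighbor
peaked = np.array([0.97, 0.01, 0.01, 0.01])

print(f"Uniform over 8 neighbors: {perplexity_of(uniform):.2f}")
print(f"Peaked distribution:      {perplexity_of(peaked):.2f}")
```

A uniform distribution over 8 neighbors has perplexity 8, while a distribution dominated by a single neighbor has perplexity close to 1. Setting a higher perplexity in t-SNE therefore asks each point to take more neighbors into account.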

## 5. Real-World Application: Customer Segmentation

Let's apply unsupervised learning to segment customers based on their behavior.

In [None]:
# Create customer segmentation dataset
def create_customer_data(n_samples=1000):
    """Create synthetic customer data for segmentation"""
    np.random.seed(42)
    
    # Customer segments
    segments = ['High Value', 'Medium Value', 'Low Value', 'New Customer']
    segment_probs = [0.15, 0.35, 0.35, 0.15]
    
    # Generate segment-specific characteristics
    data = []
    for i, segment in enumerate(segments):
        n_segment = int(n_samples * segment_probs[i])
        
        if segment == 'High Value':
            age = np.random.normal(45, 10, n_segment)
            income = np.random.normal(120000, 30000, n_segment)
            spending = np.random.normal(5000, 1000, n_segment)
            frequency = np.random.normal(25, 5, n_segment)
            tenure = np.random.normal(60, 15, n_segment)
        elif segment == 'Medium Value':
            age = np.random.normal(38, 12, n_segment)
            income = np.random.normal(70000, 20000, n_segment)
            spending = np.random.normal(2500, 500, n_segment)
            frequency = np.random.normal(15, 4, n_segment)
            tenure = np.random.normal(36, 12, n_segment)
        elif segment == 'Low Value':
            age = np.random.normal(55, 15, n_segment)
            income = np.random.normal(40000, 15000, n_segment)
            spending = np.random.normal(800, 200, n_segment)
            frequency = np.random.normal(5, 2, n_segment)
            tenure = np.random.normal(24, 10, n_segment)
        else:  # New Customer
            age = np.random.normal(30, 8, n_segment)
            income = np.random.normal(60000, 25000, n_segment)
            spending = np.random.normal(1500, 800, n_segment)
            frequency = np.random.normal(3, 2, n_segment)
            tenure = np.random.normal(3, 2, n_segment)
        
        # Ensure positive values
        income = np.maximum(income, 20000)
        spending = np.maximum(spending, 100)
        frequency = np.maximum(frequency, 1)
        tenure = np.maximum(tenure, 1)
        
        segment_data = pd.DataFrame({
            'age': age,
            'income': income,
            'spending': spending,
            'frequency': frequency,
            'tenure': tenure,
            'segment': segment
        })
        
        data.append(segment_data)
    
    return pd.concat(data, ignore_index=True)

# Create and explore the dataset
customer_df = create_customer_data(1000)
print("Customer Segmentation Dataset:")
print(customer_df.head())
print(f"\nDataset shape: {customer_df.shape}")
print(f"\nSegment distribution:")
print(customer_df['segment'].value_counts())

# Visualize customer characteristics
plt.figure(figsize=(15, 10))

plt.subplot(2, 3, 1)
sns.boxplot(data=customer_df, x='segment', y='income')
plt.xticks(rotation=45)
plt.title('Income by Segment')

plt.subplot(2, 3, 2)
sns.boxplot(data=customer_df, x='segment', y='spending')
plt.xticks(rotation=45)
plt.title('Spending by Segment')

plt.subplot(2, 3, 3)
sns.scatterplot(data=customer_df, x='income', y='spending', hue='segment', alpha=0.6)
plt.title('Income vs Spending')

plt.subplot(2, 3, 4)
sns.scatterplot(data=customer_df, x='tenure', y='frequency', hue='segment', alpha=0.6)
plt.title('Tenure vs Frequency')

plt.subplot(2, 3, 5)
sns.boxplot(data=customer_df, x='segment', y='age')
plt.xticks(rotation=45)
plt.title('Age by Segment')

plt.subplot(2, 3, 6)
customer_df['segment'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.title('Segment Distribution')

plt.tight_layout()
plt.show()

In [None]:
# Apply clustering to customer data
# Prepare features for clustering
X_customers = customer_df.drop('segment', axis=1)
scaler = StandardScaler()
X_customers_scaled = scaler.fit_transform(X_customers)

# Find optimal number of clusters
print("Finding optimal number of clusters for customer data...")
optimal_k_customers = find_optimal_clusters(X_customers_scaled, max_clusters=8)

# Apply K-Means with optimal clusters
n_clusters = optimal_k_customers[0]
kmeans_customers = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
customer_clusters = kmeans_customers.fit_predict(X_customers_scaled)

# Add cluster labels to dataframe
customer_df['cluster'] = customer_clusters

# Analyze clusters
print(f"\nCustomer Segmentation Results:")
print(f"Number of clusters found: {n_clusters}")
print(f"Silhouette score: {silhouette_score(X_customers_scaled, customer_clusters):.3f}")

# Cluster characteristics
cluster_analysis = customer_df.groupby('cluster').agg({
    'age': 'mean',
    'income': 'mean',
    'spending': 'mean',
    'frequency': 'mean',
    'tenure': 'mean',
    'segment': lambda x: x.value_counts().index[0]
}).round(2)

print("\nCluster Characteristics:")
display(cluster_analysis)

# Visualize clusters
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# PCA visualization
pca_customers = PCA(n_components=2)
X_pca_customers = pca_customers.fit_transform(X_customers_scaled)

axes[0, 0].scatter(X_pca_customers[:, 0], X_pca_customers[:, 1], 
                   c=customer_clusters, cmap='viridis', alpha=0.6)
axes[0, 0].set_title('Customer Clusters (PCA)')
axes[0, 0].set_xlabel('Principal Component 1')
axes[0, 0].set_ylabel('Principal Component 2')

# Income vs Spending
axes[0, 1].scatter(customer_df['income'], customer_df['spending'], 
                   c=customer_clusters, cmap='viridis', alpha=0.6)
axes[0, 1].set_title('Income vs Spending by Cluster')
axes[0, 1].set_xlabel('Income')
axes[0, 1].set_ylabel('Spending')

# Tenure vs Frequency
axes[1, 0].scatter(customer_df['tenure'], customer_df['frequency'], 
                   c=customer_clusters, cmap='viridis', alpha=0.6)
axes[1, 0].set_title('Tenure vs Frequency by Cluster')
axes[1, 0].set_xlabel('Tenure')
axes[1, 0].set_ylabel('Frequency')

# Cluster size distribution
cluster_sizes = customer_df['cluster'].value_counts().sort_index()
axes[1, 1].bar(cluster_sizes.index, cluster_sizes.values)
axes[1, 1].set_title('Cluster Size Distribution')
axes[1, 1].set_xlabel('Cluster')
axes[1, 1].set_ylabel('Number of Customers')

plt.tight_layout()
plt.show()

# Provide business insights
print("\nBusiness Insights:")
for cluster in range(n_clusters):
    cluster_data = customer_df[customer_df['cluster'] == cluster]
    print(f"\nCluster {cluster}:")
    print(f"  Size: {len(cluster_data)} customers ({len(cluster_data)/len(customer_df):.1%})")
    print(f"  Average income: ${cluster_data['income'].mean():,.0f}")
    print(f"  Average spending: ${cluster_data['spending'].mean():,.0f}")
    print(f"  Average tenure: {cluster_data['tenure'].mean():.1f} months")
    print(f"  Most common segment: {cluster_data['segment'].value_counts().index[0]}")
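
Because this synthetic dataset carries its generating segment, the clusters can be checked against it. Cluster IDs are arbitrary, so a cross-tabulation and the adjusted Rand index (which ignores label names) are more informative than comparing IDs directly. The tiny frame `demo_df` below is a hypothetical stand-in for `customer_df`, used so the sketch runs on its own.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical stand-in for customer_df with known segments and found clusters
demo_df = pd.DataFrame({
    'segment': ['High Value'] * 3 + ['Low Value'] * 3 + ['New Customer'] * 2,
    'cluster': [1, 1, 1, 0, 0, 2, 2, 2],
})

# Rows: discovered clusters, columns: generating segments
table = pd.crosstab(demo_df['cluster'], demo_df['segment'])
print(table)

# 1.0 means identical groupings; values near 0 mean chance-level agreement
ari = adjusted_rand_score(demo_df['segment'], demo_df['cluster'])
print(f"Adjusted Rand index: {ari:.3f}")
```

With real customer data there are no true segments to compare against, so this check only applies to synthetic or otherwise labeled benchmarks.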

## 6. Interactive Exploration

### 6.1 Parameter Tuning Widget
Explore how different algorithms and parameters affect clustering results.

In [None]:
# Interactive clustering widget
def interactive_clustering(X, n_clusters, algorithm='kmeans'):
    """Interactive clustering with different algorithms"""
    
    # Scale data
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    # Apply clustering
    if algorithm == 'kmeans':
        model = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
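    elif algorithm == 'dbscan':
        # DBSCAN infers the number of clusters itself, so n_clusters is ignored
        model = DBSCAN(eps=0.5, min_samples=5)
    elif algorithm == 'agglomerative':
        model = AgglomerativeClustering(n_clusters=n_clusters)
    else:
        raise ValueError(f"Unknown algorithm: {algorithm}")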
    
    labels = model.fit_predict(X_scaled)
    
    # Calculate metrics
    if len(np.unique(labels)) > 1:
        silhouette = silhouette_score(X_scaled, labels)
    else:
        silhouette = -1
    
    return labels, silhouette

# Create widgets
algorithm_widget = widgets.Dropdown(
    options=['kmeans', 'dbscan', 'agglomerative'],
    value='kmeans',
    description='Algorithm:'
)

clusters_widget = widgets.IntSlider(
    value=3, min=2, max=8, step=1,
    description='Clusters:'
)

output_widget = widgets.Output()

def on_button_click(b):
    with output_widget:
        output_widget.clear_output()
        
        labels, silhouette = interactive_clustering(
            X_customers_scaled, clusters_widget.value, algorithm_widget.value
        )
        
        print(f"Algorithm: {algorithm_widget.value}")
        # Exclude DBSCAN's noise label (-1) from the cluster count
        print(f"Number of clusters: {len(set(labels) - {-1})}")
        print(f"Silhouette score: {silhouette:.3f}")
        
        # Visualize
        plt.figure(figsize=(10, 6))
        plt.scatter(X_pca_customers[:, 0], X_pca_customers[:, 1], 
                   c=labels, cmap='viridis', alpha=0.6)
        plt.title(f'{algorithm_widget.value.title()} Clustering Results')
        plt.xlabel('Principal Component 1')
        plt.ylabel('Principal Component 2')
        plt.show()

button = widgets.Button(description="Apply Clustering")
button.on_click(on_button_click)

# Display widgets
display(widgets.VBox([
    algorithm_widget, clusters_widget, button, output_widget
]))

## 7. Key Concepts Summary

### What We Learned:
1. **Types of Unsupervised Learning**: Clustering and dimensionality reduction
2. **Clustering Algorithms**: K-Means, DBSCAN, and hierarchical clustering
3. **Evaluation Metrics**: Silhouette score, Calinski-Harabasz index, Davies-Bouldin index
4. **Dimensionality Reduction**: PCA and t-SNE for visualization and feature extraction
5. **Real-World Applications**: Customer segmentation as an example of pattern discovery
6. **Parameter Tuning**: How to optimize clustering parameters for better results

### Best Practices:
- Standardize features before distance-based clustering when they are measured on different scales
- Use multiple evaluation metrics to assess clustering quality
- Consider the business context when interpreting clusters
- Visualize results to validate clustering effectiveness
- Try multiple algorithms and compare results
- Handle outliers appropriately
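
To see why feature scaling matters for distance-based methods, consider four hypothetical customers described by age (years) and income (dollars). The values and the helper `nearest_to_first` below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: age in years, income in dollars (hypothetical customers)
customers = np.array([
    [25.0, 50_000.0],  # 0: reference customer
    [60.0, 51_000.0],  # 1: very different age, nearly the same income
    [27.0, 58_000.0],  # 2: nearly the same age, somewhat higher income
    [45.0, 90_000.0],  # 3: different on both features
])

def nearest_to_first(X):
    """Row index of the point closest (Euclidean) to the first row."""
    distances = np.linalg.norm(X[1:] - X[0], axis=1)
    return int(np.argmin(distances)) + 1

nearest_raw = nearest_to_first(customers)
nearest_scaled = nearest_to_first(StandardScaler().fit_transform(customers))

print(f"Nearest neighbor on raw features:          customer {nearest_raw}")
print(f"Nearest neighbor on standardized features: customer {nearest_scaled}")
```

On raw features the income column dominates the distance, so the 60-year-old counts as the closest match. After standardization both features contribute comparably and the 27-year-old becomes the nearest neighbor.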

### Common Challenges:
- Determining the optimal number of clusters
- Handling high-dimensional data
- Dealing with different cluster shapes and sizes
- Interpreting cluster meanings
- Scalability with large datasets

### Next Steps:
- Explore advanced clustering algorithms (GMM, Spectral Clustering)
- Learn about anomaly detection techniques
- Study association rule mining (Apriori, FP-Growth)
- Dive into deep learning for unsupervised learning (Autoencoders)
- Learn about reinforcement learning concepts

## 8. Exercises and Challenges

### Exercise 1: Cluster Analysis
Apply K-Means clustering to a dataset of your choice and determine the optimal number of clusters using the elbow method and silhouette analysis.

### Exercise 2: Dimensionality Reduction
Take a high-dimensional dataset and apply PCA to reduce it to 2D or 3D. Visualize the results and interpret the principal components.

### Exercise 3: Algorithm Comparison
Compare at least three different clustering algorithms on the same dataset. Evaluate them using multiple metrics and discuss their strengths and weaknesses.

### Exercise 4: Real-World Application
Find a real-world dataset (e.g., customer data, image data, text data) and apply unsupervised learning techniques to discover patterns or insights.

### Exercise 5: Parameter Optimization
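Create a systematic approach to optimize clustering parameters. Consider manual parameter sweeps scored with an internal metric such as the silhouette score. GridSearchCV is built around cross-validated scoring, so for clustering it usually needs a custom scoring function.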
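
As a possible starting point for this exercise, the sketch below sweeps DBSCAN parameters with scikit-learn's `ParameterGrid` and ranks the settings by silhouette score on the non-noise points. The data, grid values, and variable names are assumptions chosen for the example, and the silhouette score is only one possible selection criterion.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid

# Hypothetical demo data: three compact, well-separated blobs
X_demo, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                       cluster_std=0.5, random_state=42)

param_grid = ParameterGrid({'eps': [0.3, 0.5, 1.0], 'min_samples': [3, 5]})

results = []
for params in param_grid:
    labels = DBSCAN(**params).fit_predict(X_demo)
    mask = labels != -1  # drop noise points before scoring
    n_clusters = len(set(labels[mask].tolist()))
    if n_clusters < 2:
        continue  # the silhouette score needs at least two clusters
    results.append({
        **params,
        'n_clusters': n_clusters,
        'n_noise': int((~mask).sum()),
        'silhouette': silhouette_score(X_demo[mask], labels[mask]),
    })

best = max(results, key=lambda r: r['silhouette'])
print("Best setting:", best)
```

Reporting the number of noise points next to the score helps catch settings that look good only because they discard most of the data.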

**Challenge**: Build a complete unsupervised learning pipeline that includes data preprocessing, clustering, evaluation, and business interpretation.

## 9. Further Learning Resources

### Books:
- "An Introduction to Statistical Learning" by James, Witten, Hastie, and Tibshirani
- "Pattern Recognition and Machine Learning" by Christopher Bishop
- "Mining of Massive Datasets" by Leskovec, Rajaraman, and Ullman

### Online Courses:
- Andrew Ng's Machine Learning Course (Coursera)
- Unsupervised Learning in Python (DataCamp)
- Deep Learning Specialization (Coursera)

### Documentation:
- [Scikit-learn Clustering Documentation](https://scikit-learn.org/stable/modules/clustering.html)
- [Scikit-learn Dimensionality Reduction Documentation](https://scikit-learn.org/stable/modules/decomposition.html)
- [UMAP Documentation](https://umap-learn.readthedocs.io/)

### Practice Platforms:
- Kaggle (kaggle.com)
- UCI Machine Learning Repository
- Google Dataset Search

### Community:
- Stack Overflow
- Reddit r/MachineLearning
- Towards Data Science (Medium)
- KDnuggets