# DTSA 5510: Unsupervised Algorithms in Machine Learning

## Course Overview and Quick Reference Guide

This notebook is a quick reference for the key concepts, techniques, and implementations covered in this course.

### Course Objectives
- Understanding unsupervised learning principles
- Implementing clustering algorithms
- Applying dimensionality reduction techniques
- Analyzing patterns in unlabeled data

In [None]:
# Import common libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.decomposition import PCA, NMF
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Display settings
%matplotlib inline
sns.set_theme()  # the 'seaborn' Matplotlib style name was deprecated in Matplotlib 3.6
np.random.seed(42)

## Week 1: Introduction to Unsupervised Learning

### Key Concepts
- 

### Important Terms
- 

### Code Examples

In [None]:
def preprocess_data(X):
    """Preprocess data for unsupervised learning"""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    return X_scaled, scaler
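
As a quick check of what standardization does, the sketch below scales synthetic data and then reuses the fitted scaler on new rows. The data and the `*_demo` names are assumptions made for illustration; they are not part of the course material.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): two features on very different scales
rng = np.random.default_rng(42)
X_demo = np.column_stack([rng.normal(0, 1, 100), rng.normal(50, 1000, 100)])

scaler_demo = StandardScaler()
X_demo_scaled = scaler_demo.fit_transform(X_demo)

# After scaling, each column has mean ~0 and standard deviation ~1
col_means = X_demo_scaled.mean(axis=0)
col_stds = X_demo_scaled.std(axis=0)

# Reuse the already fitted scaler on new rows instead of refitting it
X_new_scaled = scaler_demo.transform(X_demo[:5])
```

Returning the fitted scaler, as `preprocess_data` does, is what makes the last step possible: new data is transformed with the statistics learned from the original data.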

## Week 2: K-Means Clustering

### Key Concepts
- 

### Important Parameters
- 

### Code Examples

In [None]:
def kmeans_clustering(X, n_clusters=3):
    """Perform K-means clustering"""
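    X = np.asarray(X)  # accept DataFrames as well as arrays for the slicing below
    # n_init is set explicitly because its default changed across scikit-learn versions
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    labels = kmeans.fit_predict(X)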
    
    # Plot results if 2D
    if X.shape[1] == 2:
        plt.figure(figsize=(10, 6))
        plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
        plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], 
                    marker='x', s=200, linewidths=3, color='r', label='Centroids')
        plt.title('K-means Clustering Results')
        plt.legend()
        plt.show()
        
    return labels, kmeans.cluster_centers_
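
One common way to pick `n_clusters` is to fit K-means for several values of k and compare inertia (the elbow method) and the silhouette score. The sketch below is a minimal version under stated assumptions: `sweep_k` is a hypothetical helper, and the blob data with hand-picked centers is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def sweep_k(X, k_values=range(2, 9)):
    """Return (k, inertia, silhouette) for each candidate k (hypothetical helper)."""
    results = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        results.append((k, km.inertia_, silhouette_score(X, km.labels_)))
    return results

# Synthetic data (an assumption): four well-separated blobs at hand-picked centers
blob_centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X_blobs, _ = make_blobs(n_samples=300, centers=blob_centers,
                        cluster_std=0.6, random_state=42)

sweep = sweep_k(X_blobs)
# Pick the k with the highest silhouette score
best_k = max(sweep, key=lambda row: row[2])[0]
```

On real data the two criteria may disagree, so it helps to treat the result as a starting point rather than a final answer.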

## Week 3: Hierarchical Clustering

### Key Concepts
- 

### Important Methods
- 

### Code Examples

In [None]:
def hierarchical_clustering(X, n_clusters=3):
    """Perform hierarchical clustering"""
    clustering = AgglomerativeClustering(n_clusters=n_clusters)
    labels = clustering.fit_predict(X)
    
    # Plot dendrogram (Ward linkage, matching AgglomerativeClustering's default)
    from scipy.cluster.hierarchy import dendrogram, linkage
    plt.figure(figsize=(10, 7))
    dendrogram(linkage(X, method='ward'))
    plt.title('Hierarchical Clustering Dendrogram')
    plt.xlabel('Sample Index')
    plt.ylabel('Distance')
    plt.show()
    
    return labels
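
The function above builds the hierarchy twice, once in scikit-learn and once in SciPy. As an alternative sketch, flat cluster labels can be cut directly from the SciPy linkage matrix with `fcluster`. The synthetic data and variable names below are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic data (an assumption for illustration): three well-separated groups
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_groups = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Build the Ward linkage matrix once
Z = linkage(X_groups, method="ward")

# Cut the tree into at most 3 flat clusters; fcluster labels start at 1, not 0
flat_labels = fcluster(Z, t=3, criterion="maxclust")
```

The same `Z` can be passed to `dendrogram`, so the plot and the labels come from one hierarchy.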

## Week 4: Density-Based Clustering

### Key Concepts
- 

### Important Parameters
- 

### Code Examples

In [None]:
def dbscan_clustering(X, eps=0.5, min_samples=5):
    """Perform DBSCAN clustering"""
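    X = np.asarray(X)  # accept DataFrames as well as arrays for the slicing below
    dbscan = DBSCAN(eps=eps, min_samples=min_samples)
    # Points labeled -1 are noise, not members of a cluster
    labels = dbscan.fit_predict(X)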
    
    # Plot results if 2D
    if X.shape[1] == 2:
        plt.figure(figsize=(10, 6))
        plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
        plt.title('DBSCAN Clustering Results')
        plt.show()
        
    return labels
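
A common heuristic for choosing `eps` is to look at the sorted distance from each point to its k-th nearest neighbor, with k equal to `min_samples`. The sketch below computes that curve and uses a high percentile as a starting guess. The helper `k_distance_curve`, the percentile rule, and the moons data are all assumptions for illustration, not a prescribed method.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

def k_distance_curve(X, k=5):
    """Sorted distance from each point to its k-th nearest neighbor, counting itself."""
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    # Querying with the training data returns each point as its own first neighbor
    distances, _ = nn.kneighbors(X)
    return np.sort(distances[:, -1])

X_moons, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
curve = k_distance_curve(X_moons, k=5)

# Hypothetical starting rule: a high percentile of the curve
eps_guess = float(np.percentile(curve, 95))
labels_moons = DBSCAN(eps=eps_guess, min_samples=5).fit_predict(X_moons)

# Summarize the result: noise points carry the label -1
n_noise = int(np.sum(labels_moons == -1))
n_found = len(set(labels_moons.tolist()) - {-1})
```

Plotting `curve` and looking for the bend is the usual visual version of this heuristic; the percentile is only a stand-in for reading the plot.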

## Week 5: Principal Component Analysis

### Key Concepts
- 

### Important Components
- 

### Code Examples

In [None]:
def perform_pca(X, n_components=2):
    """Perform PCA dimensionality reduction"""
    pca = PCA(n_components=n_components)
    X_pca = pca.fit_transform(X)
    
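    # Plot cumulative explained variance against the component count (starting at 1)
    plt.figure(figsize=(10, 6))
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    plt.plot(np.arange(1, len(cumulative) + 1), cumulative, marker='o')
    plt.xlabel('Number of Components')
    plt.ylabel('Cumulative Explained Variance Ratio')
    plt.title('PCA Cumulative Explained Variance')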
    plt.show()
    
    return X_pca, pca
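
The cumulative curve can also be used to choose `n_components` numerically. The sketch below finds the smallest number of components whose cumulative explained variance reaches a threshold. The helper `components_for_variance` and the 0.95 threshold are assumptions for illustration; the Iris data ships with scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def components_for_variance(X, threshold=0.95):
    """Smallest component count whose cumulative variance ratio reaches threshold."""
    pca_full = PCA().fit(X)  # keep all components
    cumulative = np.cumsum(pca_full.explained_variance_ratio_)
    # Index of the first entry >= threshold, converted to a 1-based count
    n_keep = int(np.searchsorted(cumulative, threshold) + 1)
    # Guard against floating-point rounding when threshold is close to 1.0
    return min(n_keep, len(cumulative)), cumulative

X_iris = StandardScaler().fit_transform(load_iris().data)
n_keep, cumulative = components_for_variance(X_iris, threshold=0.95)
```

scikit-learn also accepts a float between 0 and 1 as `n_components` with the full SVD solver, which applies a similar rule internally.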

## Week 6: t-SNE and UMAP

### Key Concepts
- 

### Important Parameters
- 

### Code Examples

In [None]:
def perform_tsne(X, perplexity=30):
    """Perform t-SNE dimensionality reduction"""
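    # perplexity must be smaller than the number of samples;
    # recent scikit-learn versions raise a ValueError otherwise
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)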
    X_tsne = tsne.fit_transform(X)
    
    # Plot results
    plt.figure(figsize=(10, 6))
    plt.scatter(X_tsne[:, 0], X_tsne[:, 1])
    plt.title('t-SNE Visualization')
    plt.show()
    
    return X_tsne

## Week 7: Matrix Factorization

### Key Concepts
- 

### Important Methods
- 

### Code Examples

In [None]:
def perform_nmf(X, n_components=2):
    """Perform Non-negative Matrix Factorization"""
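    # NMF requires non-negative input: StandardScaler output contains
    # negative values and will be rejected with a ValueError
    nmf = NMF(n_components=n_components, random_state=42)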
    W = nmf.fit_transform(X)
    H = nmf.components_
    
    return W, H, nmf
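
Because NMF only accepts non-negative input, data is often rescaled to a non-negative range first. The sketch below uses `MinMaxScaler` for that and then measures how well `W @ H` reconstructs the input. The synthetic data, the `nndsvda` initialization, and the `max_iter` value are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (an assumption for illustration) with negative entries
rng = np.random.default_rng(42)
X_raw = rng.normal(size=(50, 6))

# Rescale each feature to [0, 1] so every entry is non-negative
X_nonneg = MinMaxScaler().fit_transform(X_raw)

nmf_demo = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=42)
W_demo = nmf_demo.fit_transform(X_nonneg)  # shape: (n_samples, n_components)
H_demo = nmf_demo.components_              # shape: (n_components, n_features)

# Relative Frobenius-norm reconstruction error of the factorization
rel_error = np.linalg.norm(X_nonneg - W_demo @ H_demo) / np.linalg.norm(X_nonneg)
```

A lower `rel_error` means the low-rank product approximates the data more closely; increasing `n_components` generally reduces it.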

## Week 8: Advanced Topics and Applications

### Key Concepts
- 

### Important Applications
- 

### Code Examples

In [None]:
def evaluate_clustering(X, labels):
    """Evaluate clustering results"""
    from sklearn.metrics import silhouette_score, calinski_harabasz_score
    
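    # Both scores need at least 2 distinct labels; DBSCAN noise (-1)
    # is treated as its own cluster unless it is filtered out first
    metrics = {
        'silhouette': silhouette_score(X, labels),
        'calinski_harabasz': calinski_harabasz_score(X, labels)
    }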
    
    return metrics
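
When the labels come from DBSCAN, it can be useful to drop noise points before scoring so that noise is not counted as a cluster. The sketch below shows one way to do that; `silhouette_without_noise` is a hypothetical helper and the tiny dataset is made up for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def silhouette_without_noise(X, labels):
    """Silhouette score over non-noise points, or None if it is undefined."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    mask = labels != -1  # DBSCAN marks noise with -1
    n_clusters = len(set(labels[mask].tolist()))
    # silhouette_score needs 2 <= n_clusters <= n_samples - 1
    if n_clusters < 2 or n_clusters >= int(mask.sum()):
        return None
    return silhouette_score(X[mask], labels[mask])

# Made-up example: two tight pairs plus one far-away noise point
X_small = np.array([[0, 0], [0, 1], [10, 10], [10, 11], [50, 50]], dtype=float)
labels_small = np.array([0, 0, 1, 1, -1])

score_small = silhouette_without_noise(X_small, labels_small)
score_all_noise = silhouette_without_noise(X_small, np.full(5, -1))
```

Returning `None` instead of raising keeps parameter sweeps running when a setting produces fewer than two clusters.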

## Additional Resources and References

### Useful Libraries
- scikit-learn: Machine learning algorithms
- umap-learn: UMAP implementation
- SciPy: Scientific computing
- Yellowbrick: ML visualization

### External Links
- Course materials
- Algorithm implementations
- Research papers

### Personal Notes
- Key algorithms
- Parameter selection
- Best practices