# 7.0 Unsupervised learning algorithms
This lesson provides an overview of several key unsupervised learning algorithms and their practical implementation.

**Lesson Objectives:** By the end of the lesson, students should be able to:
* Understand the basic concepts of unsupervised learning algorithms.
* Implement key unsupervised learning algorithms using real-world datasets.

Unsupervised learning algorithms are used when the data has no labeled outputs or target variables. Because there is no predefined target to predict, the goal is to infer the underlying structure or distribution of the data by discovering patterns, groupings, or relationships within it.

# 7.1. Clustering
Clustering algorithms group similar data points together based on their features. This is useful for discovering inherent structures in the data.

## 7.1.1. K-Means Clustering
K-Means is one of the most popular clustering algorithms. It partitions data into *k* distinct, non-overlapping clusters based on feature similarity.

**How it works:**
1. Choose k initial centroids (randomly or using heuristics).
2. Assign each data point to the closest centroid.
3. Recompute the centroids based on the points assigned to them.
4. Repeat steps 2 and 3 until convergence (centroids don't change significantly).

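**Key Concept:** The algorithm minimizes the within-cluster sum of squares (WCSS), also known as inertia: the sum of squared distances from each point to its assigned centroid. Lower inertia means more compact clusters. K-Means converges to a local minimum, so the result can depend on the initial centroids.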
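To make the four steps concrete, here is a minimal NumPy sketch of the loop. It is an illustration only, not how scikit-learn implements `KMeans`; the function name `kmeans_sketch`, its parameters, and the toy data are assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, tol=1e-6, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not the sklearn implementation)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its closest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids barely move
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    # WCSS (inertia): squared distance from each point to its assigned centroid
    wcss = ((X - centroids[labels]) ** 2).sum()
    return centroids, labels, wcss

# Toy data: two Gaussian blobs centered at (0, 0) and (5, 5)
rng = np.random.default_rng(1)
toy = np.vstack([rng.normal(0, 0.5, size=(50, 2)), rng.normal(5, 0.5, size=(50, 2))])
toy_centroids, toy_labels, toy_wcss = kmeans_sketch(toy, k=2)
print(toy_centroids)
print("WCSS:", toy_wcss)
```

Running the sketch with different `seed` values is a quick way to see how the starting centroids can change the result.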

In [None]:
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate synthetic dataset
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

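# Apply KMeans with 4 clusters
# (n_init=10 restarts from 10 random initializations; random_state makes the run reproducible)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)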
kmeans.fit(X)

# Get cluster centers and labels
centroids = kmeans.cluster_centers_
labels = kmeans.labels_

# Plot the data and cluster centers
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], marker='X', s=200, c='red')
plt.title("K-Means Clustering")
plt.show()
# In this example, K-Means finds 4 clusters in the data, 
# and the red 'X' marks the centroids of each cluster.

## 7.1.2. Hierarchical Clustering
Hierarchical clustering builds a tree of nested clusters, usually visualized as a dendrogram. It can be agglomerative (bottom-up) or divisive (top-down).

**How it works:**
* **Agglomerative:** Starts with each point as its own cluster, and iteratively merges the closest pairs of clusters.
* **Divisive:** Starts with all points in a single cluster, and recursively splits it into smaller clusters.

**Key Concept:** The result is a tree-like structure that represents different levels of similarity. The user can choose a level of the tree to "cut" to get a specific number of clusters.
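The agglomerative procedure can be sketched in a few lines of NumPy. This is a deliberately naive illustration that assumes single linkage (cluster distance is the distance between the closest pair of members); the function name `agglomerative_sketch` and the toy points are hypothetical. Stopping the merge loop when `n_clusters` remain is equivalent to cutting the tree at that level.

```python
import numpy as np

def agglomerative_sketch(points, n_clusters):
    """Naive single-linkage agglomerative clustering (illustrative, slow for large data)."""
    # Start with every point in its own cluster
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all points
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    merge_heights = []  # distance at which each merge happened (the tree's levels)
    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members of the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merge_heights.append(d)
        # Merge the closest pair of clusters
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Convert the list of clusters into one label per point
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels, merge_heights

toy_points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
toy_labels, toy_heights = agglomerative_sketch(toy_points, n_clusters=2)
print(toy_labels)
print(toy_heights)
```

For real datasets, `scipy.cluster.hierarchy` provides `linkage`, `dendrogram`, and `fcluster` to build, plot, and cut the full tree far more efficiently.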

In [None]:
from sklearn.cluster import AgglomerativeClustering

# Apply Agglomerative Clustering (scikit-learn's default linkage is Ward)
agg_clustering = AgglomerativeClustering(n_clusters=4)
labels = agg_clustering.fit_predict(X)

# Plot the hierarchical clustering result
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title("Agglomerative Clustering")
plt.show()
# This method also finds 4 clusters but uses a different approach than K-Means.

# 7.2. Dimensionality Reduction
Dimensionality reduction techniques aim to reduce the number of features in the data while preserving as much information as possible. These techniques are especially useful when dealing with high-dimensional data (many features) that may be sparse or redundant.

## 7.2.1. Principal Component Analysis (PCA)

PCA is a technique for reducing the dimensionality of data by projecting it onto fewer dimensions (principal components) that explain the most variance in the data.

**How it works:** PCA identifies the directions (principal components) along which the data varies the most and projects the data onto these directions. When features are correlated, the first few principal components often retain most of the variance.

**Key Concept:** PCA finds the eigenvectors and eigenvalues of the data's covariance matrix. The eigenvectors determine the directions of the new feature axes, and each eigenvalue equals the variance captured along its eigenvector, which is how the components are ranked.
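The eigenvector view can be checked directly with NumPy. The sketch below is illustrative only (the helper name `pca_eig` and the toy data are assumptions); scikit-learn's `PCA` reaches an equivalent projection through a singular value decomposition, possibly with flipped signs on some components.

```python
import numpy as np

def pca_eig(X, n_components):
    """PCA via eigendecomposition of the covariance matrix (illustrative helper)."""
    # Center the data so the covariance is computed around the mean
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]   # directions of the new axes
    projected = X_centered @ components      # coordinates along those axes
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return projected, components, explained_ratio

# Toy data: three features with very different spreads
rng = np.random.default_rng(0)
toy = rng.normal(size=(100, 3)) * np.array([5.0, 1.0, 0.1])
toy_proj, toy_components, toy_ratio = pca_eig(toy, n_components=2)
print(toy_ratio)  # share of total variance captured by each kept component
```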

In [None]:
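from sklearn.decomposition import PCA
import numpy as np

# X from section 7.1 already has only 2 features, so generate a
# 10-feature synthetic dataset that actually needs reducing
X_high, y_high = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)

# Reduce the data to 2 dimensions using PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_high)

# Fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_)

# Plot the data in 2D, colored by the blob each point was generated from
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_high, cmap='viridis')
plt.title("PCA - Dimensionality Reduction")
plt.show()
# This example projects the 10-feature data onto its first two principal components.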

## 7.2.2. t-Distributed Stochastic Neighbor Embedding (t-SNE)
t-SNE is a non-linear dimensionality reduction technique used for visualizing high-dimensional data in 2D or 3D. It is particularly good for visualizing clusters.

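**How it works:** t-SNE converts pairwise distances into probability distributions that measure similarity, one in the high-dimensional space and one in the low-dimensional space. It then moves the low-dimensional points to minimize the Kullback-Leibler divergence between the two distributions.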
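For reference, the quantities involved can be written out as follows. This follows the standard t-SNE formulation; the notation is introduced here and is not from the lesson itself: $x_i$ are the original points, $y_i$ their low-dimensional positions, $n$ the number of points, and $\sigma_i$ a per-point bandwidth derived from the perplexity setting.

$$
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}
$$

$$
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
$$

$$
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$

The heavy-tailed Student-t kernel in $q_{ij}$ is what gives the method its name: it lets moderately distant points sit far apart in the embedding while close neighbors stay together.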

**Key Concept:** Unlike PCA, which focuses on variance, t-SNE focuses on preserving local structure, making it better at capturing clusters. t-SNE can often reveal more visually distinct clusters compared to PCA, especially in cases where the data has complex relationships.

In [None]:
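from sklearn.manifold import TSNE

# X from section 7.1 has only 2 features, so embed a 10-feature synthetic dataset instead
X_high, y_high = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)

# Apply t-SNE for dimensionality reduction
# (t-SNE is stochastic; random_state makes the layout reproducible)
tsne = TSNE(n_components=2, random_state=0)
X_tsne = tsne.fit_transform(X_high)

# Plot the 2D data after t-SNE, colored by the blob each point was generated from
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y_high, cmap='viridis')
plt.title("t-SNE Visualization")
plt.show()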

# 7.3. Anomaly Detection
Anomaly detection identifies outliers or unusual data points that do not conform to expected patterns.

**Isolation Forest:** Isolation Forest is an unsupervised algorithm specifically designed for anomaly detection. It partitions the data recursively by randomly selecting a feature and then randomly selecting a split value between the minimum and maximum values of that feature.

**How it works:** The idea is that anomalies are easier to isolate because they are few and lie far from the majority of the data points. The algorithm builds a forest of random trees; points that end up alone after only a few splits (a short path from the root) receive a higher anomaly score.
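The "fewer splits" intuition can be demonstrated without building a full forest. The sketch below is a simplified illustration, not scikit-learn's implementation: the helper `isolation_depth` and the toy data are assumptions, and it only counts how many random splits it takes until a chosen point is alone.

```python
import numpy as np

def isolation_depth(X, point, rng, max_depth=50):
    """Count random splits until `point` (a row of X) is alone in its partition."""
    depth = 0
    subset = X
    while len(subset) > 1 and depth < max_depth:
        # Pick a random feature and a random split value within its current range
        feature = rng.integers(subset.shape[1])
        lo, hi = subset[:, feature].min(), subset[:, feature].max()
        if lo == hi:
            break  # all remaining values are identical on this feature; stop
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the point
        if point[feature] < split:
            subset = subset[subset[:, feature] < split]
        else:
            subset = subset[subset[:, feature] >= split]
        depth += 1
    return depth

rng = np.random.default_rng(0)
cluster = rng.normal(0, 1, size=(200, 2))   # dense cluster around the origin
outlier = np.array([8.0, 8.0])              # one point far from the cluster
data = np.vstack([cluster, outlier])
# Use the cluster point closest to the origin as a typical inlier
inlier = cluster[np.argmin(np.linalg.norm(cluster, axis=1))]

# Average over many random trees, as a forest would
avg_outlier = np.mean([isolation_depth(data, outlier, rng) for _ in range(200)])
avg_inlier = np.mean([isolation_depth(data, inlier, rng) for _ in range(200)])
print("average splits to isolate the outlier:", avg_outlier)
print("average splits to isolate the inlier: ", avg_inlier)
```

The outlier needs noticeably fewer splits on average, which is the signal Isolation Forest turns into an anomaly score.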

In [None]:
from sklearn.ensemble import IsolationForest
import numpy as np

# Generate some synthetic data with outliers:
# append 10 uniformly scattered points to X from section 7.1
rng = np.random.default_rng(0)
X_outliers = np.vstack([X, rng.uniform(low=-10, high=10, size=(10, 2))])

# Fit Isolation Forest model (random_state makes the random splits reproducible)
model = IsolationForest(random_state=0)
# fit_predict returns 1 for inliers and -1 for outliers
outliers = model.fit_predict(X_outliers)

# Plot data points and highlight outliers
plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c=outliers, cmap='coolwarm')
plt.title("Isolation Forest - Anomaly Detection")
plt.show()
# Points labeled -1 are the predicted anomalies; these are mostly the points
# that lie far from the main clusters.

# 7.4. Association Rule Learning
Association rule learning is used to discover interesting relationships or patterns in large datasets, typically in market basket analysis.

**Apriori Algorithm:** The Apriori algorithm is used to mine frequent itemsets and generate association rules. It’s widely used for market basket analysis to find relationships between products purchased together.

**How it works:** The algorithm finds all itemsets that appear frequently together in the dataset and then generates rules like “If a customer buys bread, they are likely to also buy butter.”
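The two stages can be sketched in plain Python. This is a simplified illustration of the Apriori idea, not mlxtend's implementation; the function names `frequent_itemsets` and `association_rules_sketch` are hypothetical. The toy transactions encode the same five baskets as the Bread/Butter/Jam table used in this section.

```python
from itertools import combinations

transactions = [
    {"Bread", "Butter"},
    {"Bread", "Butter", "Jam"},
    {"Bread", "Jam"},
    {"Butter"},
    {"Bread", "Butter", "Jam"},
]

def frequent_itemsets(transactions, min_support):
    """Level-wise search: only extend itemsets whose subsets are all frequent."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Support = fraction of transactions that contain the whole itemset
        level = {}
        for cand in candidates:
            support = sum(cand <= t for t in transactions) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate that has an infrequent k-item subset (the Apriori property)
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def association_rules_sketch(frequent, min_lift=1.0):
    """Turn frequent itemsets into rules (antecedent, consequent, support, confidence, lift)."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = support / frequent[antecedent]
                lift = confidence / frequent[consequent]
                if lift >= min_lift:
                    rules.append((set(antecedent), set(consequent), support, confidence, lift))
    return rules

freq = frequent_itemsets(transactions, min_support=0.5)
for rule in association_rules_sketch(freq, min_lift=1.0):
    print(rule)
```

On these five baskets, Bread and Butter appear together often, but the lift of that pair is slightly below 1 (about 0.94), so only the Bread and Jam rules pass a lift threshold of 1. Lift compares the co-occurrence rate with what independence would predict, which is why a frequent pair is not automatically an interesting rule.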

In [None]:
from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd

# Example dataset: transactions with items
data = {'Bread': [1, 1, 1, 0, 1],
        'Butter': [1, 1, 0, 1, 1],
        'Jam': [0, 1, 1, 0, 1]}

# Convert the 0/1 flags to booleans, the input type mlxtend recommends for apriori
df = pd.DataFrame(data).astype(bool)

# Find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

print(rules)