# Expert Level - Traffic Sign Recognition

## Table of Contents

1. **T1: Dataset Loading** - Load images and extract class labels
2. **T2: Preprocessing Methods** - Compare different preprocessing techniques
3. **T3: Feature Extraction** - Implement and compare different feature extraction methods
4. **T4: Classification Models** - Test different classifiers and find best combinations

## Baseline Setup

- **Preprocessing**: Resize + Grayscale + Gaussian Blur
- **Features**: HOG (Histogram of Oriented Gradients)
- **Classifier**: SVM (Support Vector Machine)

**Note**: See `concepts.ipynb` for visualizations and concept explanations.

In [1]:
# Imports
import cv2
import numpy as np
import glob
import os
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import image as mpimg
from skimage.feature import hog, local_binary_pattern
from skimage.color import rgb2hsv
from skimage.measure import regionprops
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
import warnings
warnings.filterwarnings('ignore')

# Set style for better plots
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print("Libraries imported successfully!")



Libraries imported successfully!


## T1: Dataset Loading

Load images and extract class IDs from filenames.


In [2]:
def load_dataset():
    """Load dataset from disk."""
    # Get project root (go up from expert/ folder)
    current_dir = os.getcwd()
    if 'expert' in current_dir:
        project_root = os.path.dirname(current_dir)
    else:
        project_root = current_dir
    
    dataset_path = os.path.join(project_root, "datasets", "dataset1") + os.sep
    
    X = []
    y = []
    image_files = glob.glob(dataset_path + '*.png', recursive=True)
    
    for i in image_files:
        filename = os.path.basename(i)
        class_id = filename[:3]
        y.append(class_id)
        
        img = cv2.imread(i)
        if img is not None:
            X.append(img)
    
    return X, y

# Load dataset
X, y = load_dataset()
print(f"✓ Loaded {len(X)} images")
print(f"✓ {len(set(y))} unique classes")
print(f"✓ Sample class IDs: {sorted(set(y))[:10]}...")



✓ Loaded 5998 images
✓ 58 unique classes
✓ Sample class IDs: ['000', '001', '002', '003', '004', '005', '006', '007', '008', '009']...


## T2: Preprocessing Methods

Four preprocessing techniques:

1. **Simple**: Resize + Grayscale
2. **Blur**: Resize + Grayscale + Gaussian Blur (reduces noise)
3. **Histogram Eq**: Equalize histograms + Resize + Grayscale (improves contrast)
4. **Advanced**: Resize + Grayscale + Blur + Histogram Eq + Normalization


In [3]:
def preprocessing_simple(X):
    """Simple: Resize + Grayscale"""
    X_processed = []
    for x in X:
        temp_x = cv2.resize(x, (48, 48))
        temp_x = cv2.cvtColor(temp_x, cv2.COLOR_BGR2GRAY)
        X_processed.append(temp_x)
    return X_processed

def preprocessing_blur(X):
    """Blur: Resize + Grayscale + Gaussian Blur"""
    X_processed = []
    for x in X:
        temp_x = cv2.resize(x, (48, 48))
        temp_x = cv2.cvtColor(temp_x, cv2.COLOR_BGR2GRAY)
        temp_x = cv2.GaussianBlur(temp_x, (3, 3), 0)
        X_processed.append(temp_x)
    return X_processed

def preprocessing_histogram_eq(X):
    """Histogram Equalization: Equalize each channel + Resize + Grayscale"""
    X_processed = []
    for x in X:
        b, g, r = cv2.split(x)
        bH = cv2.equalizeHist(b)
        gH = cv2.equalizeHist(g)
        rH = cv2.equalizeHist(r)
        result = cv2.merge((bH, gH, rH))
        result = cv2.resize(result, (48, 48))
        result = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)
        X_processed.append(result)
    return X_processed

def preprocessing_advanced(X):
    """Advanced: Resize + Grayscale + Blur + Histogram Equalization + Normalization"""
    X_processed = []
    for x in X:
        temp_x = cv2.resize(x, (48, 48))
        temp_x = cv2.cvtColor(temp_x, cv2.COLOR_BGR2GRAY)
        temp_x = cv2.GaussianBlur(temp_x, (3, 3), 0)
        temp_x = cv2.equalizeHist(temp_x)
        temp_x = temp_x.astype(np.float32) / 255.0
        X_processed.append(temp_x)
    return X_processed

print("✓ Preprocessing functions defined")



✓ Preprocessing functions defined


## T3: Feature Extraction

Implement different feature extraction methods and compare their performance.

In [4]:
def feature_HOG(X_processed):
    """HOG: Histogram of Oriented Gradients"""
    X_features = []
    for x in X_processed:
        x_feature = hog(x, orientations=8, pixels_per_cell=(10, 10),
                        cells_per_block=(1, 1), visualize=False)
        X_features.append(x_feature)
    return np.array(X_features)

def feature_LBP(X_processed):
    """LBP: Local Binary Pattern"""
    X_features = []
    radius = 3
    n_points = 8 * radius
    for x in X_processed:
        x_feature = local_binary_pattern(x, n_points, radius)
        x_feature = x_feature.reshape(-1)
        X_features.append(x_feature)
    return np.array(X_features)

def feature_pyramid(X_processed):
    """Feature Pyramid: Multi-scale Laplacian pyramid"""
    num_layers = 3
    X_features = []
    for x in X_processed:
        gaussian_pyr = [x]
        image = x
        for i in range(1, num_layers):
            image = cv2.pyrDown(image)
            gaussian_pyr.append(image)
        
        laplacian_pyr = [gaussian_pyr[num_layers - 1]]
        for i in range(num_layers - 1, 0, -1):
            expanded = cv2.pyrUp(gaussian_pyr[i])
            laplacian = cv2.subtract(gaussian_pyr[i - 1], expanded)
            laplacian_pyr.append(laplacian)
        
        flattened_arrays = [arr.flatten() for arr in laplacian_pyr]
        x_feature = np.concatenate(flattened_arrays)
        X_features.append(x_feature)
    return np.array(X_features)

def feature_FFT(X_processed):
    """FFT: Frequency domain magnitude spectrum"""
    X_features = []
    for x in X_processed:
        f = np.fft.fft2(x)
        f_shift = np.fft.fftshift(f)
        magnitude_spectrum = 20 * np.log(np.abs(f_shift) + 1)
        magnitude_spectrum = magnitude_spectrum.reshape(-1)
        X_features.append(magnitude_spectrum)
    return np.array(X_features)

def feature_HuMoments(X_processed):
    """Hu Moments: 7 invariant shape moments"""
    X_features = []
    for x in X_processed:
        ret, binary = cv2.threshold(x, 127, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if len(contours) == 0:
            hu_moments = np.zeros(7)
        else:
            moments = cv2.moments(contours[0])
            hu_moments = cv2.HuMoments(moments).reshape(-1)
        X_features.append(hu_moments)
    return np.array(X_features)

def feature_multi(X, X_processed):
    """Multi-feature: HOG + Color (HSV) + Shape features"""
    X_features = []
    for x, x_processed in zip(X, X_processed):
        # HOG
        hog_feature = hog(x_processed, orientations=8, pixels_per_cell=(10, 10),
                          cells_per_block=(1, 1), visualize=False)
        
        # Color features (HSV)
        hsv_image = rgb2hsv(x)
        hue_hist = np.histogram(hsv_image[:, :, 0], bins=8, range=(0, 1))[0]
        sat_hist = np.histogram(hsv_image[:, :, 1], bins=8, range=(0, 1))[0]
        val_hist = np.histogram(hsv_image[:, :, 2], bins=8, range=(0, 1))[0]
        
        # Shape features
        label_img = np.uint8(x_processed > 0)
        props = regionprops(label_img)[0]
        shape_features = [props.area, props.perimeter, props.eccentricity]
        
        # Combine
        x_features = np.concatenate((hog_feature, hue_hist, sat_hist, val_hist, shape_features))
        X_features.append(x_features)
    return np.array(X_features)

print("✓ Feature extraction functions defined")



✓ Feature extraction functions defined


## T4: Classification Models

Test different classifiers and find the best preprocessing + feature + classifier combinations.

### T4.1: Compare Preprocessing Methods

**Goal**: Compare preprocessing techniques.

**Setup**: HOG features + SVM classifier, vary preprocessing method.

In [5]:
# Exploration 1: Compare Preprocessing Methods
print("=" * 60)
print("T4.1: Compare Preprocessing Methods")
print("=" * 60)
print("Testing: HOG features + SVM classifier\n")

# Test different preprocessing methods
preprocessing_methods = {
    'Simple': preprocessing_simple,
    'Blur': preprocessing_blur,
    'Histogram Eq': preprocessing_histogram_eq,
    'Advanced': preprocessing_advanced,
}

results_preprocessing = []
classifier = SVC()

for prep_name, prep_func in preprocessing_methods.items():
    # Preprocess
    X_processed = prep_func(X)
    
    # Extract features
    X_features = feature_HOG(X_processed)
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        X_features, y, test_size=0.2, random_state=42
    )
    
    # Train and evaluate
    classifier.fit(X_train, y_train)
    accuracy = classifier.score(X_test, y_test)
    
    results_preprocessing.append({
        'Preprocessing': prep_name,
        'Accuracy': accuracy
    })
    print(f"  {prep_name:20s}: {accuracy:.4f}")

df_preprocessing = pd.DataFrame(results_preprocessing)
print("\n" + df_preprocessing.to_string(index=False))

# Find best preprocessing
best_prep = df_preprocessing.loc[df_preprocessing['Accuracy'].idxmax()]
print(f"\n✓ Best preprocessing: {best_prep['Preprocessing']} ({best_prep['Accuracy']:.4f})")



T4.1: Compare Preprocessing Methods
Testing: HOG features + SVM classifier

  Simple              : 0.9417
  Blur                : 0.9558
  Histogram Eq        : 0.9475
  Advanced            : 0.9442

Preprocessing  Accuracy
       Simple  0.941667
         Blur  0.955833
 Histogram Eq  0.947500
     Advanced  0.944167

✓ Best preprocessing: Blur (0.9558)


### T4.2: Compare Classifiers

**Goal**: Compare classifier performance.

**Setup**: Blur preprocessing + HOG features, vary classifier type.

In [6]:
# Exploration 2: Compare Classifiers
print("=" * 60)
print("T4.2: Compare Classifiers")
print("=" * 60)
print("Testing: Blur preprocessing + HOG features\n")

# Preprocess (using blur from baseline)
X_processed = preprocessing_blur(X)
print(f"✓ Preprocessed: {len(X_processed)} images")

# Extract features
X_features = feature_HOG(X_processed)
print(f"✓ Features extracted: {X_features.shape}")

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_features, y, test_size=0.2, random_state=42
)

# Define classifiers
classifiers = {
    'SVM': SVC(),
    'Random Forest': RandomForestClassifier(),
    'kNN': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(),
    'Naive Bayes': GaussianNB(),
    'MLP': MLPClassifier(max_iter=500)
}

results_classifiers = []
for name, classifier in classifiers.items():
    classifier.fit(X_train, y_train)
    accuracy = classifier.score(X_test, y_test)
    results_classifiers.append({'Classifier': name, 'Accuracy': accuracy})
    print(f"  {name:20s}: {accuracy:.4f}")

df_classifiers = pd.DataFrame(results_classifiers)
print("\n" + df_classifiers.to_string(index=False))

# Find best classifier
best_classifier = df_classifiers.loc[df_classifiers['Accuracy'].idxmax()]
print(f"\n✓ Best classifier: {best_classifier['Classifier']} ({best_classifier['Accuracy']:.4f})")



T4.2: Compare Classifiers
Testing: Blur preprocessing + HOG features

✓ Preprocessed: 5998 images
✓ Features extracted: (5998, 128)
  SVM                 : 0.9558
  Random Forest       : 0.9750
  kNN                 : 0.8633
  Decision Tree       : 0.9250
  Naive Bayes         : 0.7992
  MLP                 : 0.9667

   Classifier  Accuracy
          SVM  0.955833
Random Forest  0.975000
          kNN  0.863333
Decision Tree  0.925000
  Naive Bayes  0.799167
          MLP  0.966667

✓ Best classifier: Random Forest (0.9750)


## Exploration 3: Advanced Techniques & Combinations

**Goal**: Explore advanced techniques and find best combinations.

### Part A: Multi-feature + PCA + Scaling

**PCA**: Reduces dimensions while preserving information. Finds principal components (directions of max variance), projects data, keeps top N components.

**Why Scaling**: Features have different scales (HOG: 0-1, area: 0-2304). Classifiers (SVM, kNN, MLP) are scale-sensitive. StandardScaler normalizes to mean=0, std=1.

**Setup**: Advanced preprocessing + Multi-feature (HOG+Color+Shape) + PCA + Scaling


In [7]:
# Exploration 3A: Multi-feature + PCA + Scaling
print("=" * 60)
print("T4.3: Advanced Techniques (Multi-feature + PCA + Scaling)")
print("=" * 60)

# Preprocess
X_processed = preprocessing_advanced(X)
print(f"✓ Preprocessed: {len(X_processed)} images")

# Extract multi-features
X_features = feature_multi(X, X_processed)
print(f"✓ Features extracted: {X_features.shape} (before PCA)")

# Apply PCA
pca = PCA(n_components=100)
X_features = pca.fit_transform(X_features)
print(f"✓ After PCA: {X_features.shape} (reduced from {feature_multi(X[:1], X_processed[:1]).shape[1]} to 100)")

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_features, y, test_size=0.2, random_state=42
)

# Test classifiers with scaling
results_advanced = []
for name, classifier in classifiers.items():
    # Scale features
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    classifier.fit(X_train_scaled, y_train)
    accuracy = classifier.score(X_test_scaled, y_test)
    results_advanced.append({'Classifier': name, 'Accuracy': accuracy})
    print(f"  {name:20s}: {accuracy:.4f}")

df_advanced = pd.DataFrame(results_advanced)
print("\n" + df_advanced.to_string(index=False))



T4.3: Advanced Techniques (Multi-feature + PCA + Scaling)
✓ Preprocessed: 5998 images
✓ Features extracted: (5998, 155) (before PCA)
✓ After PCA: (5998, 100) (reduced from 155 to 100)
  SVM                 : 0.9600
  Random Forest       : 0.9500
  kNN                 : 0.7950
  Decision Tree       : 0.8950
  Naive Bayes         : 0.7575
  MLP                 : 0.9617

   Classifier  Accuracy
          SVM  0.960000
Random Forest  0.950000
          kNN  0.795000
Decision Tree  0.895000
  Naive Bayes  0.757500
          MLP  0.961667


### Part B: All Combinations Comparison

**Goal**: Test all combinations of preprocessing × feature × classifier to find optimal setup.


In [8]:
# Exploration 3B: All Combinations
print("=" * 60)
print("T3.1: Compare Feature Extraction Methods")
print("=" * 60)
print("Testing: preprocessing × feature extraction × classifier\n")

# Define methods (subset for speed - can expand)
preprocessing_methods = {
    'Simple': preprocessing_simple,
    'Blur': preprocessing_blur,
    'Histogram Eq': preprocessing_histogram_eq,
}

feature_methods = {
    'HOG': feature_HOG,
    'LBP': feature_LBP,
    'Pyramid': feature_pyramid,
    'FFT': feature_FFT,
    'Hu Moments': feature_HuMoments,
}

# Test combinations (using SVM for speed - can test all classifiers)
results_combinations = []
classifier = SVC()

for prep_name, prep_func in preprocessing_methods.items():
    X_processed = prep_func(X)
    
    for feat_name, feat_func in feature_methods.items():
        try:
            X_features = feat_func(X_processed)
            
            X_train, X_test, y_train, y_test = train_test_split(
                X_features, y, test_size=0.2, random_state=42
            )
            
            classifier.fit(X_train, y_train)
            accuracy = classifier.score(X_test, y_test)
            
            results_combinations.append({
                'Preprocessing': prep_name,
                'Feature': feat_name,
                'Accuracy': accuracy
            })
            print(f"  {prep_name:15s} × {feat_name:15s}: {accuracy:.4f}")
        except Exception as e:
            print(f"  {prep_name:15s} × {feat_name:15s}: ERROR - {str(e)[:30]}")

df_combinations = pd.DataFrame(results_combinations)
if len(df_combinations) > 0:
    print("\n" + df_combinations.to_string(index=False))
    
    # Find best combination
    best = df_combinations.loc[df_combinations['Accuracy'].idxmax()]
    print(f"\n✓ Best combination: {best['Preprocessing']} × {best['Feature']} = {best['Accuracy']:.4f}")


T3.1: Compare Feature Extraction Methods
Testing: preprocessing × feature extraction × classifier

  Simple          × HOG            : 0.9417
  Simple          × LBP            : 0.9158
  Simple          × Pyramid        : 0.8175
  Simple          × FFT            : 0.7950
  Simple          × Hu Moments     : 0.0858
  Blur            × HOG            : 0.9558
  Blur            × LBP            : 0.9275
  Blur            × Pyramid        : 0.7533
  Blur            × FFT            : 0.5825
  Blur            × Hu Moments     : 0.0875
  Histogram Eq    × HOG            : 0.9475
  Histogram Eq    × LBP            : 0.9075
  Histogram Eq    × Pyramid        : 0.8775
  Histogram Eq    × FFT            : 0.8283
  Histogram Eq    × Hu Moments     : 0.0850

Preprocessing    Feature  Accuracy
       Simple        HOG  0.941667
       Simple        LBP  0.915833
       Simple    Pyramid  0.817500
       Simple        FFT  0.795000
       Simple Hu Moments  0.085833
         Blur        HOG  0.95

In [9]:
# Summary
print("=" * 60)
print("SUMMARY & CONCLUSIONS")
print("=" * 60)

print("\n1. Preprocessing Methods Comparison:")
print(df_preprocessing.to_string(index=False))

print("\n2. Classifiers Comparison:")
print(df_classifiers.to_string(index=False))

print("\n3. Advanced Approach (Multi-feature + PCA + Scaling):")
print(df_advanced.to_string(index=False))

if len(results_combinations) > 0:
    print("\n4. Best Combinations (Preprocessing × Feature):")
    print(df_combinations.nlargest(5, 'Accuracy').to_string(index=False))

# Compare best accuracies
best_prep = df_preprocessing['Accuracy'].max()
best_classifier = df_classifiers['Accuracy'].max()
best_advanced = df_advanced['Accuracy'].max()
best_combination = df_combinations['Accuracy'].max() if len(results_combinations) > 0 else 0

print("\n" + "=" * 60)
print("BEST RESULTS:")
print("=" * 60)
print(f"Best Preprocessing:        {df_preprocessing.loc[df_preprocessing['Accuracy'].idxmax(), 'Preprocessing']:20s} ({best_prep:.4f})")
print(f"Best Classifier:           {df_classifiers.loc[df_classifiers['Accuracy'].idxmax(), 'Classifier']:20s} ({best_classifier:.4f})")
print(f"Advanced Approach:         Multi-feature+PCA+Scaling ({best_advanced:.4f})")
if best_combination > 0:
    best_combo = df_combinations.loc[df_combinations['Accuracy'].idxmax()]
    print(f"Best Combination:         {best_combo['Preprocessing']} × {best_combo['Feature']:20s} ({best_combination:.4f})")

print("\n" + "=" * 60)
print("KEY INSIGHTS:")
print("=" * 60)
if best_advanced > best_classifier:
    improvement = (best_advanced - best_classifier) * 100
    print(f"✓ Multi-feature + PCA improves by {improvement:.2f}% over single features")
else:
    print("✓ Single features perform well")

if best_combination > 0 and best_combination > best_advanced:
    print(f"✓ Best combination outperforms advanced approach")
    print(f"  → Use: {df_combinations.loc[df_combinations['Accuracy'].idxmax(), 'Preprocessing']} + {df_combinations.loc[df_combinations['Accuracy'].idxmax(), 'Feature']}")
else:
    print("✓ Advanced approach (multi-feature + PCA) is optimal")



SUMMARY & CONCLUSIONS

1. Preprocessing Methods Comparison:
Preprocessing  Accuracy
       Simple  0.941667
         Blur  0.955833
 Histogram Eq  0.947500
     Advanced  0.944167

2. Classifiers Comparison:
   Classifier  Accuracy
          SVM  0.955833
Random Forest  0.975000
          kNN  0.863333
Decision Tree  0.925000
  Naive Bayes  0.799167
          MLP  0.966667

3. Advanced Approach (Multi-feature + PCA + Scaling):
   Classifier  Accuracy
          SVM  0.960000
Random Forest  0.950000
          kNN  0.795000
Decision Tree  0.895000
  Naive Bayes  0.757500
          MLP  0.961667

4. Best Combinations (Preprocessing × Feature):
Preprocessing Feature  Accuracy
         Blur     HOG  0.955833
 Histogram Eq     HOG  0.947500
       Simple     HOG  0.941667
         Blur     LBP  0.927500
       Simple     LBP  0.915833

BEST RESULTS:
Best Preprocessing:        Blur                 (0.9558)
Best Classifier:           Random Forest        (0.9750)
Advanced Approach:         Mult