
# Random-Subspace SVM — Demo Experiment

This notebook gives a **5‑minute, reproducible** demo loosely inspired by the paper *Regularized ERM on Random Subspaces* (AISTATS 2021).
It builds a small **Random‑Subspace SVM** (an ensemble of SVMs, each trained on a random subset of the input features) and compares it to a standard SVM on a synthetic dataset.

**What you'll see**
- quick install and run
- simple, well‑commented reference implementation (scikit‑learn compatible)
- sanity‑check metrics and a clean plot

> Tip: Run `Runtime → Run all` to reproduce the results end‑to‑end.


In [None]:

# Minimal dependencies for this demo.
# Uncomment the next line to install them (once per environment).
# %pip install -q numpy scipy scikit-learn matplotlib
import numpy as np
from typing import List
from time import perf_counter

import matplotlib.pyplot as plt
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, ConfusionMatrixDisplay
from sklearn.svm import SVC, LinearSVC
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.utils import check_random_state



## 1) Create a toy high‑dimensional dataset
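We simulate a binary classification task with **50 features**: 10 informative, 10 redundant (linear combinations of the informative ones), and 30 pure noise, with 2% of labels assigned at random (`flip_y=0.02`).  
This is a common regime in which random‑subspace ensembles are expected to help.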


In [None]:

RANDOM_SEED = 7  # single seed for data generation, the train/test split, and the ensemble

X, y = make_classification(
    n_samples=2000,
    n_features=50,
    n_informative=10,
    n_redundant=10,
    n_repeated=0,
    n_classes=2,
    n_clusters_per_class=2,
    class_sep=1.2,
    flip_y=0.02,
    random_state=RANDOM_SEED,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=RANDOM_SEED, stratify=y
)

X_train.shape, X_test.shape
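To see what the generator puts in those 50 columns, here is a small check. It assumes `shuffle=False` (the cell above uses the default `shuffle=True`, so there the informative columns sit at unknown positions). Without shuffling, scikit-learn places the informative columns first, then the redundant ones, then the noise columns; because redundant columns are linear combinations of the informative ones, the first 20 columns should have rank 10.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same settings as the dataset above, except shuffle=False keeps columns in generation order.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=50, n_informative=10, n_redundant=10,
    n_repeated=0, n_classes=2, n_clusters_per_class=2, class_sep=1.2,
    flip_y=0.02, shuffle=False, random_state=7,
)

rank_signal = np.linalg.matrix_rank(X_demo[:, :20])  # informative + redundant columns
rank_all = np.linalg.matrix_rank(X_demo)             # 10 informative directions + 30 noise columns
print(rank_signal, rank_all)  # 10 40
```

In the shuffled dataset used by the notebook the same structure exists; the 20 signal‑carrying columns are simply scattered among the 50.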



## 2) A tiny Random‑Subspace SVM (sklearn‑compatible)

We train **M base SVMs**, each on a random **subset of `k` features** drawn without replacement.  
At inference, we either **average the decision functions** (`voting='soft'`) or take a majority vote (`voting='hard'`) to obtain a stable, regularized predictor.

This keeps the demo lightweight while conveying the core idea.
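The two aggregation rules can disagree. A minimal sketch with made‑up decision values for a single test point from three base models (the numbers are illustrative, not outputs of this notebook):

```python
import numpy as np

# Hypothetical decision_function outputs of M=3 base models on one test point.
# Positive score -> class 1, negative -> class 0.
scores = np.array([2.0, -0.5, -0.5])

# Soft voting: average the signed scores, then threshold at 0.
soft_pred = int(scores.mean() >= 0)              # mean = 1/3 -> class 1

# Hard voting: each model votes for class 1 when its own score is positive;
# class 1 wins with at least half the votes (ties go to class 1).
votes = (scores > 0).astype(int)
hard_pred = int(votes.sum() >= len(scores) / 2)  # 1 vote of 3 -> class 0

print(soft_pred, hard_pred)  # 1 0
```

Soft voting lets one confident model outweigh several marginal ones; hard voting ignores the margins entirely.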


In [None]:

from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class RandomSubspaceSVM(ClassifierMixin, BaseEstimator):
    """Ensemble of SVMs trained on random feature subsets (binary labels only).

    Parameters
    ----------
    base_estimator : sklearn classifier or None
        Typically `LinearSVC(dual=False)` or `SVC(kernel="rbf")`.
        If None, `LinearSVC(dual=False)` is used.
    M : int
        Number of base models (subspaces).
    k : int
        Subspace dimension (number of features per model).
    voting : {'hard', 'soft'}
        'hard' uses a majority vote; 'soft' averages decision_function outputs.
    random_state : int, numpy RandomState, or None
        Controls which feature subsets are drawn (reproducibility).
    scale : bool
        If True, each base learner is a StandardScaler followed by the SVM.
    """
    def __init__(self, base_estimator=None, M=25, k=20, voting='soft', random_state=None, scale=True):
        # Store arguments unchanged (sklearn convention) so that get_params()/clone(),
        # which cross_val_score relies on, behave correctly.
        self.base_estimator = base_estimator
        self.M = M
        self.k = k
        self.voting = voting
        self.random_state = random_state
        self.scale = scale

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = np.unique(y)
        if len(self.classes_) != 2:
            raise ValueError(f"Only binary labels are supported (got {len(self.classes_)} classes).")
        if self.voting not in ('soft', 'hard'):
            raise ValueError(f"voting must be 'soft' or 'hard' (got {self.voting!r}).")
        if self.M < 1:
            raise ValueError(f"M must be >= 1 (got {self.M}).")
        n_features = X.shape[1]
        if self.k <= 0 or self.k > n_features:
            raise ValueError(f"k must be in [1, {n_features}] (got {self.k}).")

        base = self.base_estimator if self.base_estimator is not None else LinearSVC(dual=False)
        rs = check_random_state(self.random_state)
        self.subspaces_: List[np.ndarray] = []
        self.models_: List[Pipeline] = []

        for _ in range(self.M):
            # Sort once and reuse the SAME index array at fit and predict time,
            # so the column order seen by each base model never changes.
            feat_idx = np.sort(rs.choice(n_features, size=self.k, replace=False))
            steps = [('scaler', StandardScaler())] if self.scale else []
            steps.append(('clf', clone(base)))
            pipe = Pipeline(steps)
            pipe.fit(X[:, feat_idx], y)
            self.subspaces_.append(feat_idx)
            self.models_.append(pipe)
        return self

    def decision_function(self, X):
        """Average signed score over base models; positive values favour classes_[1]."""
        check_is_fitted(self, 'models_')
        X = check_array(X)
        scores = np.zeros(X.shape[0])
        for pipe, idx in zip(self.models_, self.subspaces_):
            Xs = X[:, idx]
            clf = pipe.named_steps['clf']
            if hasattr(clf, 'decision_function'):
                s = pipe.decision_function(Xs)
            elif hasattr(clf, 'predict_proba'):
                # Map probabilities to a signed score in [-1, 1]: P(classes_[1]) - P(classes_[0]).
                proba = pipe.predict_proba(Xs)
                s = proba[:, 1] - proba[:, 0]
            else:
                # Last resort: +1 for classes_[1], -1 for classes_[0].
                s = np.where(pipe.predict(Xs) == self.classes_[1], 1.0, -1.0)
            scores += s
        return scores / len(self.models_)

    def predict(self, X):
        check_is_fitted(self, 'models_')
        X = check_array(X)
        if self.voting == 'soft':
            # Soft voting: threshold the averaged decision function at 0.
            positive = self.decision_function(X) >= 0
        else:
            # Hard voting: count votes for classes_[1]; ties (even M) go to classes_[1].
            votes = np.zeros(X.shape[0])
            for pipe, idx in zip(self.models_, self.subspaces_):
                votes += pipe.predict(X[:, idx]) == self.classes_[1]
            positive = votes >= len(self.models_) / 2
        return np.where(positive, self.classes_[1], self.classes_[0])
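One detail worth checking in any subspace ensemble: the column order used at prediction time must match the order used at fit time. A fitted linear model only knows column positions, not feature identities, so slicing with a reordered index silently pairs each weight with the wrong feature. A small self‑contained demonstration on toy data (assumed for illustration):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] > 0).astype(int)   # only column 0 carries signal

idx = np.array([3, 0, 4])        # unsorted subset that contains column 0
clf = LinearSVC(dual=False).fit(X[:, idx], y)

# Same column order as training: the large weight still multiplies column 0.
acc_same = (clf.predict(X[:, idx]) == y).mean()
# Sorted order [0, 3, 4]: the large weight now multiplies the noise column 3.
acc_shuffled = (clf.predict(X[:, np.sort(idx)]) == y).mean()
print(f"same order: {acc_same:.2f}  reordered: {acc_shuffled:.2f}")
```

No error is raised in the reordered case, which is why storing and reusing one index array per base model matters.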



## 3) Train baseline SVM vs. Random‑Subspace SVM
We compare an RBF‑kernel SVM baseline (scikit‑learn defaults `C=1.0`, `gamma='scale'`, untuned) to an ensemble of `M=25` linear SVMs on `k=20`‑dimensional subspaces.


In [None]:

# Baseline
baseline = Pipeline([
    ('scaler', StandardScaler(with_mean=True)),
    ('clf', SVC(kernel='rbf', C=1.0, gamma='scale', probability=False))
])

# Random‑Subspace ensemble
rssvm = RandomSubspaceSVM(
    base_estimator=LinearSVC(dual=False), 
    M=25, k=20, voting='soft', random_state=RANDOM_SEED, scale=True
)

t0 = perf_counter(); baseline.fit(X_train, y_train); t_baseline = perf_counter()-t0
t0 = perf_counter(); rssvm.fit(X_train, y_train);     t_rssvm   = perf_counter()-t0

yhat_base = baseline.predict(X_test)
yhat_rss  = rssvm.predict(X_test)

acc_base = accuracy_score(y_test, yhat_base)
acc_rss  = accuracy_score(y_test, yhat_rss)

print(f"Baseline SVM (RBF):  acc={acc_base:.3f}  train_time={t_baseline*1e3:.0f} ms")
print(f"Random‑Subspace SVM: acc={acc_rss:.3f}  train_time={t_rssvm*1e3:.0f} ms")
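Some back‑of‑the‑envelope arithmetic for the `M=25`, `k=20` setting used above, treating each subspace as `k` features drawn uniformly without replacement from the 50 (as the class does): the number of informative features in one subspace follows a hypergeometric distribution, and with 25 independent draws almost every feature is used by some base model.

```python
from math import comb

n_features, n_informative, k, M = 50, 10, 20, 25

# Expected number of informative features in one k-subset (hypergeometric mean).
expected_informative = k * n_informative / n_features  # 4.0

# Probability that a subspace contains no informative feature at all.
p_none = comb(n_features - n_informative, k) / comb(n_features, k)

# Probability that a given feature is left out of all M subspaces.
p_feature_unused = (1 - k / n_features) ** M

print(expected_informative, round(p_none, 4), p_feature_unused)
```

Counting the 10 redundant columns as signal‑carrying too, the expected count per subspace doubles to 8.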



## 4) Quick cross‑validation (sanity check)
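We run a 3‑fold CV on the training set (fast and indicative; tune as needed). Note that the ensemble here uses a lighter configuration (`M=15`, `k=15`) than the `M=25`, `k=20` model evaluated above.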


In [None]:

scores_base = cross_val_score(baseline, X_train, y_train, cv=3, n_jobs=None)
scores_rss  = cross_val_score(
    RandomSubspaceSVM(base_estimator=LinearSVC(dual=False), M=15, k=15, voting='soft', random_state=RANDOM_SEED),
    X_train, y_train, cv=3, n_jobs=None
)
print(f"CV accuracy — Baseline SVM:  mean={scores_base.mean():.3f} ± {scores_base.std():.3f}")
print(f"CV accuracy — RSSVM (M=15,k=15): mean={scores_rss.mean():.3f} ± {scores_rss.std():.3f}")
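Passing an integer `cv=3` lets scikit‑learn build the folds internally. For a paired comparison it can help to build one splitter explicitly and reuse it, so every model is scored on identical folds and per‑fold differences are meaningful. A sketch on a smaller synthetic problem; the dataset size and the plain `LinearSVC` stand‑in for the ensemble are assumptions made to keep it self‑contained:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=600, n_features=50, n_informative=10,
                           n_redundant=10, random_state=7)

# One splitter, shuffled with a fixed seed, shared by every model under comparison.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=7)

scores_a = cross_val_score(make_pipeline(StandardScaler(), SVC(kernel="rbf")), X, y, cv=cv)
scores_b = cross_val_score(make_pipeline(StandardScaler(), LinearSVC(dual=False)), X, y, cv=cv)

# Same folds for both models, so the per-fold difference is a paired comparison.
diff = scores_a - scores_b
print(scores_a.round(3), scores_b.round(3), diff.mean().round(3))
```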



## 5) Visualize performance
A simple confusion matrix for each model on the test set.


In [None]:

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
disp1 = ConfusionMatrixDisplay.from_predictions(y_test, yhat_base, ax=axes[0], colorbar=False)
axes[0].set_title("Baseline SVM (RBF)")

disp2 = ConfusionMatrixDisplay.from_predictions(y_test, yhat_rss, ax=axes[1], colorbar=False)
axes[1].set_title("Random‑Subspace SVM")

plt.tight_layout()
plt.show()
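For reading the plots: scikit‑learn's confusion matrix puts true labels on rows and predicted labels on columns, so per‑class recall is the diagonal divided by the row sums. A tiny worked example on hand‑picked labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred)    # rows = true class, columns = predicted class
recall = cm.diagonal() / cm.sum(axis=1)  # per-class recall: 1/2 for class 0, 2/3 for class 1
print(cm)  # [[1 1]
           #  [1 2]]
print(recall)
```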



## 6) Next steps
- Replace this lightweight reference with the **full experimental pipeline** in `scripts/` when ready.
- Try other subspace sizes `k`, number of models `M`, or an RBF base SVM.
- Port this class to the library (e.g., `src/`) and add unit tests.
- Consider Nyström kernel approximations for large‑scale experiments.
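For the Nyström direction, scikit‑learn ships `Nystroem`, which builds an approximate kernel feature map from a random subset of training points; a linear SVM can then be trained on the transformed features. A minimal sketch; `n_components=200` and the dataset settings are illustrative assumptions, not tuned values:

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           n_redundant=10, class_sep=1.2, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7, stratify=y)

# RBF kernel approximated using n_components randomly sampled training points,
# followed by a linear SVM in the resulting feature space.
nystrom_svm = make_pipeline(
    StandardScaler(),
    Nystroem(kernel="rbf", n_components=200, random_state=7),
    LinearSVC(dual=False),
)
nystrom_svm.fit(X_tr, y_tr)
acc = nystrom_svm.score(X_te, y_te)
print(f"Nystroem + LinearSVC: acc={acc:.3f}")
```

Training cost then grows with `n_components` rather than with the full kernel matrix, which is the usual motivation for large‑scale runs.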

**Reproducibility:** set `RANDOM_SEED` and keep `train/test` splits fixed when comparing settings.
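A quick way to confirm that a fixed seed pins down the split, shown on a small stand‑in array (the same holds for the notebook's `train_test_split` call):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Two calls with the same random_state give identical partitions.
a = train_test_split(X, y, test_size=0.25, random_state=7, stratify=y)
b = train_test_split(X, y, test_size=0.25, random_state=7, stratify=y)
same_split = all(np.array_equal(p, q) for p, q in zip(a, b))
print(same_split)  # True
```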
