# Benchmark: Rust Linear Regression vs scikit-learn

This notebook benchmarks my Rust-based `MyRustLinearRegression`
(exposed via the `rust_core` module) against scikit-learn:

- `LinearRegression` (closed-form solution)
- `SGDRegressor` (iterative gradient-based optimizer)

The goal is to compare the combined **training + prediction time** (wall-clock time for one `fit` followed by one `predict`) on the same synthetic dataset.


## Imports

In [1]:
import importlib
import time
import platform
import numpy as np

import rust_core
from sklearn.linear_model import LinearRegression, SGDRegressor

importlib.reload(rust_core)

print("Python version:", platform.python_version())
print("Platform:", platform.platform())
print("NumPy version:", np.__version__)
print("Rust Core path:", rust_core.__file__)


Python version: 3.11.9
Platform: Linux-6.14.0-35-generic-x86_64-with-glibc2.39
NumPy version: 2.0.1
Rust Core path: /home/jonaslilletvedt/miniconda3/envs/rust-ml/lib/python3.11/site-packages/rust_core/__init__.py


## Generate synthetic dataset

In [2]:
# Use a reasonably large dataset to see performance differences
rng = np.random.default_rng(0)

n_train = 100_000
n_test = 10_000
n_features = 40
X_train = rng.normal(size=(n_train, n_features))
w_true = rng.normal(size=n_features)
y_train = X_train @ w_true + rng.normal(scale=0.1, size=n_train)

X_test = rng.normal(size=(n_test, n_features))

# Column-wise normalization (same as in functional tests)
norms = np.linalg.norm(X_train, axis=0)
norms[norms == 0.0] = 1.0

X_train_scaled = X_train / norms
X_test_scaled = X_test / norms

print("X_train_scaled:", X_train_scaled.shape)
print("X_test_scaled:", X_test_scaled.shape)
print("y_train:", y_train.shape)


X_train_scaled: (100000, 40)
X_test_scaled: (10000, 40)
y_train: (100000,)
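
One consequence of the column-wise normalization is that a model fit on the scaled features learns weights in scaled units: `X_scaled @ (w_true * norms)` equals `X @ w_true`, so the learned weights have to be divided by `norms` before they can be compared with `w_true`. The following is a minimal sketch of that relationship on a small dataset. It uses `np.linalg.lstsq` as a stand-in solver (an assumption for illustration, not one of the benchmarked models) and fits no intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small dataset built the same way as the benchmark data, just fewer rows.
X = rng.normal(size=(1_000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + rng.normal(scale=0.1, size=1_000)

# Same column-wise normalization as in the cell above.
norms = np.linalg.norm(X, axis=0)
norms[norms == 0.0] = 1.0
X_scaled = X / norms

# Least-squares fit on the scaled features (stand-in solver, no intercept).
w_scaled, *_ = np.linalg.lstsq(X_scaled, y, rcond=None)

# Undo the scaling so the weights are comparable to w_true.
w_recovered = w_scaled / norms

max_abs_err = float(np.max(np.abs(w_recovered - w_true)))
print("max |w_recovered - w_true|:", max_abs_err)
```

The same division by `norms` would apply to the coefficients of any model trained on `X_train_scaled`.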


## Benchmark helper

In [3]:
def bench(name, func, repeat=5):
    """Run `func` multiple times and print min/mean runtime."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    times = np.array(times)
    print(f"{name}: min={times.min():.4f}s  mean={times.mean():.4f}s over {repeat} runs")
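
As a cross-check on the helper above, the standard library's `timeit.repeat` performs the same kind of measurement and also disables garbage collection while timing. The sketch below is an optional alternative; `bench_timeit` is a hypothetical name, and the workload is a stand-in rather than one of the benchmarked models.

```python
import timeit


def bench_timeit(name, func, repeat=5):
    """Time `func` with timeit.repeat, one call per measurement."""
    # number=1 means each of the `repeat` measurements is a single call.
    times = timeit.repeat(func, repeat=repeat, number=1)
    mean = sum(times) / len(times)
    print(f"{name}: min={min(times):.4f}s  mean={mean:.4f}s over {repeat} runs")
    return times


# Stand-in workload for illustration only.
times = bench_timeit(
    "sum of squares",
    lambda: sum(i * i for i in range(100_000)),
    repeat=3,
)
```

Returning the raw timings, rather than only printing them, makes it easier to store or plot them later.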


## scikit-learn: `LinearRegression` (closed-form baseline)

In [4]:
def run_sklearn_linear():
    model = LinearRegression()
    model.fit(X_train_scaled, y_train)
    _ = model.predict(X_test_scaled)

# Warm-up run (lazy imports, memory allocation, CPU caches, etc.)
run_sklearn_linear()

bench("sklearn LinearRegression", run_sklearn_linear)


sklearn LinearRegression: min=0.0263s  mean=0.0281s over 5 runs


## scikit-learn: `SGDRegressor` (iterative, closer to the Rust implementation)

In [5]:
def run_sklearn_sgd():
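    # Note: max_iter is an upper bound on the number of epochs. With the
    # default tol=1e-3, fitting can stop earlier once the loss stops improving.
    model = SGDRegressor(
        learning_rate="constant",
        eta0=0.05,
        max_iter=1_000,
        penalty=None,
        random_state=0,
    )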
    model.fit(X_train_scaled, y_train)
    _ = model.predict(X_test_scaled)

# Warm-up
run_sklearn_sgd()

bench("sklearn SGDRegressor (1k iters)", run_sklearn_sgd)


sklearn SGDRegressor (1k iters): min=0.6409s  mean=0.6704s over 5 runs
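
The "1k iters" label is an upper bound: `SGDRegressor` treats `max_iter` as a maximum number of epochs and, with its default `tol=1e-3`, may stop earlier. The fitted attribute `n_iter_` reports how many epochs were actually run. Below is a minimal sketch on a small stand-in dataset (not the benchmark data, and with a smaller `eta0` chosen for the unscaled features) showing how to inspect it and how `tol=None` forces a fixed number of epochs.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)

# Small stand-in dataset, for illustration only.
X = rng.normal(size=(2_000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=2_000)

common = dict(learning_rate="constant", eta0=0.01, penalty=None, random_state=0)

# Default stopping rule (tol=1e-3): may stop long before max_iter epochs.
early = SGDRegressor(max_iter=1_000, **common).fit(X, y)

# tol=None disables the stopping rule, so exactly max_iter epochs are run.
fixed = SGDRegressor(max_iter=20, tol=None, **common).fit(X, y)

print("epochs with default tol:", early.n_iter_)
print("epochs with tol=None:   ", fixed.n_iter_)
```

If the aim is to compare an equal number of passes over the data, setting `tol=None` on the scikit-learn side is one way to get there. Whether one scikit-learn epoch is comparable to one iteration of the Rust model depends on how the Rust model updates its weights.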


## Rust: `MyRustLinearRegression`

In [6]:
def run_rust_linear(iterations=1_000):
    model = rust_core.MyRustLinearRegression(
        learning_rate=0.05,
        iterations=iterations,
        mode=rust_core.Mode.Regression,
    )
    model.fit(X_train_scaled, y_train)
    _ = model.predict(X_test_scaled)

# Warm-up
run_rust_linear(iterations=1_000)

bench("rust_core MyRustLinearRegression (1k iters)", lambda: run_rust_linear(1_000))


rust_core MyRustLinearRegression (1k iters): min=6.3376s  mean=6.4164s over 5 runs


## Ideas for further experiments

- Vary `n_train`, `n_test` and `n_features` to see how each implementation scales.
- Plot runtime vs. iterations for the Rust model.
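- Compare not only runtime but also **accuracy**: prediction MSE on held-out data (this needs a `y_test` generated the same way as `y_train`), and the error of the learned weights against `w_true` after undoing the column scaling.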
- Dump benchmark results to a CSV file so you can track progress over time as you optimize.
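
The size sweep and the CSV logging from the list above can be combined in one small helper. The sketch below is one possible shape, using only the standard library. `record_benchmark`, the column names, and the stand-in workload are all hypothetical. It assumes every row written to a given file uses the same set of parameters, since the header is written only once.

```python
import csv
import tempfile
import time
from datetime import datetime
from pathlib import Path


def record_benchmark(csv_path, name, func, repeat=5, **params):
    """Time `func` `repeat` times and append one summary row to `csv_path`."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)

    row = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "name": name,
        **params,  # e.g. n_train, n_features, iterations
        "repeat": repeat,
        "min_s": min(times),
        "mean_s": sum(times) / len(times),
    }

    path = Path(csv_path)
    write_header = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row


# Demo: sweep a size parameter with a stand-in workload, in a temp directory.
csv_path = Path(tempfile.mkdtemp()) / "benchmarks.csv"
for n in (1_000, 10_000):
    record_benchmark(
        csv_path,
        "stand-in workload",
        lambda n=n: sum(i * i for i in range(n)),
        repeat=3,
        n_train=n,
    )

with csv_path.open(newline="") as f:
    rows = list(csv.DictReader(f))
print(rows)
```

In the notebook, the stand-in lambda would be replaced by a function that builds the dataset for the given `n_train` and runs one of the `run_*` functions. The resulting file can be loaded later to plot runtime against dataset size or iteration count.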
