# Linear Regression – Functional Test Notebook

This notebook **numerically tests** the Rust and C++ linear regression models
exposed to Python through the `coreflux_rust` module and the `coreflux_cpp` wrapper.

The goal here is *correctness*, not speed:
- Verify that training runs without errors
- Check that predictions have the expected shape
- Compare predictions against scikit-learn's `LinearRegression` on simple synthetic data
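The first two checks can be sketched as a small smoke test. The `smoke_test` helper and the `MeanModel` stand-in below are hypothetical names, not part of `coreflux_rust` or `coreflux_cpp`. The sketch only assumes a model that exposes `fit(X, y)` and `predict(X)`, as the models in this notebook do.

```python
import numpy as np


class MeanModel:
    """Hypothetical stand-in model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(X.shape[0], self.mean_)


def smoke_test(model, X_train, y_train, X_test):
    """Fit, predict, and check that predictions are finite with shape (n_test,)."""
    model.fit(X_train, y_train)  # any exception raised here fails the test
    pred = np.asarray(model.predict(X_test))
    assert pred.shape == (X_test.shape[0],), f"unexpected shape {pred.shape}"
    assert np.all(np.isfinite(pred)), "predictions contain NaN or inf"
    return pred


X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([10.0, 20.0])
pred = smoke_test(MeanModel(), X, y, X)
print(pred)
```

Any of the models tested in this notebook could be passed in place of `MeanModel`, provided they return a NumPy-compatible array from `predict`.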


## Imports


In [28]:
import importlib
import sys
from pathlib import Path
import numpy as np

# Ensure repo root is on sys.path so we can import the C++ wrapper
repo_root = Path.cwd().resolve().parents[1]
if str(repo_root) not in sys.path:
    sys.path.append(str(repo_root))

import coreflux_rust
from wrapper import coreflux_cpp
from sklearn.linear_model import LinearRegression


## Simple numerical test data

In [29]:
# Simple 2D linear regression toy example

X_train = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [7.0, 8.0],
], dtype=np.float64)

y_train = np.array([10.0, 20.0, 30.0, 40.0], dtype=np.float64)

X_test = np.array([
    [2.0, 3.0],
    [6.0, 7.0],
], dtype=np.float64)

print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape)


X_train shape: (4, 2)
y_train shape: (4,)
X_test shape: (2, 2)


### Feature scaling

In [30]:
# Column-wise L2 normalization: divide each feature column by its L2 norm.
# The norms come from the training set and are reused for the test set.
norms = np.linalg.norm(X_train, axis=0)
norms[norms == 0.0] = 1.0

X_train_scaled = X_train / norms
X_test_scaled = X_test / norms

print("X_train_scaled:\n", X_train_scaled)
print("X_test_scaled:\n", X_test_scaled)


X_train_scaled:
 [[0.10910895 0.18257419]
 [0.32732684 0.36514837]
 [0.54554473 0.54772256]
 [0.76376262 0.73029674]]
X_test_scaled:
 [[0.21821789 0.27386128]
 [0.65465367 0.63900965]]
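The scaling step can also be wrapped in a helper so the training norms are computed once and reused. This is a minimal sketch; `fit_column_norms` is a hypothetical name introduced here for illustration.

```python
import numpy as np


def fit_column_norms(X):
    """Return the L2 norm of each column, with zero norms replaced by 1."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0.0] = 1.0  # avoid division by zero for all-zero columns
    return norms


X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
X_test = np.array([[2.0, 3.0], [6.0, 7.0]])

norms = fit_column_norms(X_train)  # computed on the training data only
X_train_scaled = X_train / norms
X_test_scaled = X_test / norms     # test data reuses the training norms

# Each scaled training column should now have unit L2 norm.
print(np.linalg.norm(X_train_scaled, axis=0))
```

Scaling the test set with its own norms would put the two sets on different scales, which is why the training norms are reused.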


## Reference: scikit-learn `LinearRegression` on the same data

In [31]:
lr_sklearn = LinearRegression()
lr_sklearn.fit(X_train_scaled, y_train)

pred_sklearn = lr_sklearn.predict(X_test_scaled)
print("sklearn predictions:", pred_sklearn)
print("Shape:", pred_sklearn.shape)


sklearn predictions: [15. 35.]
Shape: (2,)
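One property of this toy data is worth keeping in mind when comparing models: the second feature is always the first plus one, so the design matrix with an intercept column is rank-deficient and the fitted coefficients are not unique. The predictions for these test points are still well defined, because the test rows follow the same relation and `y = 5 * x1 + 5` holds exactly. The sketch below checks this with plain NumPy. It is an independent cross-check under those observations, not a description of how scikit-learn computes its fit, and the explicit `rcond` value is an arbitrary choice made here.

```python
import numpy as np

X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y_train = np.array([10.0, 20.0, 30.0, 40.0])
X_test = np.array([[2.0, 3.0], [6.0, 7.0]])

# Same column-wise L2 scaling as in the notebook, plus an intercept column.
norms = np.linalg.norm(X_train, axis=0)
A_train = np.column_stack([np.ones(len(X_train)), X_train / norms])
A_test = np.column_stack([np.ones(len(X_test)), X_test / norms])

# Three columns (intercept + two features), but only two are independent.
rank = np.linalg.matrix_rank(A_train)
print("rank:", rank)

# lstsq returns the minimum-norm least-squares solution; rcond discards
# the numerically zero singular value.
w, *_ = np.linalg.lstsq(A_train, y_train, rcond=1e-10)
pred = A_test @ w
print("predictions:", pred)
```

Because of the rank deficiency, comparing predictions (as this notebook does) is more meaningful than comparing fitted coefficients across implementations.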


## Test: Rust `LinearRegression`

In [32]:
lr_rust = coreflux_rust.LinearRegresssion(
    learning_rate=0.05,
    iterations=100_000,
    mode=coreflux_rust.Mode.Regression,
)

lr_rust.fit(X_train_scaled, y_train)
print("Rust model (100k iters) fitted successfully.")

pred_rust = lr_rust.predict(X_test_scaled)
print("Predictions (100k iters):", pred_rust)
print("Shape:", pred_rust.shape)


Rust model (100k iters) fitted successfully.
Predictions (100k iters): [15. 35.]
Shape: (2,)
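The `learning_rate` and `iterations` parameters suggest an iterative, gradient-based fit. For orientation, here is a minimal NumPy sketch of batch gradient descent on mean squared error using the same hyperparameters. It is an assumption about the general technique, not the Rust implementation; `fit_gd`, the zero initialization, and the loss scaling are all choices made for this illustration.

```python
import numpy as np


def fit_gd(X, y, learning_rate=0.05, iterations=100_000):
    """Batch gradient descent on mean squared error; returns (weights, bias)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(iterations):
        residual = X @ w + b - y  # shape (n_samples,)
        grad_w = (2.0 / n_samples) * (X.T @ residual)
        grad_b = (2.0 / n_samples) * residual.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y_train = np.array([10.0, 20.0, 30.0, 40.0])
X_test = np.array([[2.0, 3.0], [6.0, 7.0]])

# Same column-wise L2 scaling as in the notebook.
norms = np.linalg.norm(X_train, axis=0)
w, b = fit_gd(X_train / norms, y_train)
pred_gd = (X_test / norms) @ w + b
print("gradient descent predictions:", pred_gd)
```

On the scaled data this step size is small enough for the iteration to converge, which is consistent with the iterative models matching the closed-form reference closely.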


## Test: C++ `LinearRegression`


In [None]:
# Optimized C++ implementation via ctypes wrapper
lr_cpp = coreflux_cpp.LinearRegressionV21(
    learning_rate=0.05,
    iterations=100_000,
)

lr_cpp.fit(X_train_scaled, y_train)
print("C++ model fitted successfully.")

pred_cpp = lr_cpp.predict(X_test_scaled)
print("C++ predictions:", pred_cpp)
print("Shape:", pred_cpp.shape)


C++ model fitted successfully.
C++ predictions: [15. 35.]
Shape: (2,)


## Compare Rust and C++ vs scikit-learn numerically

In [34]:
def print_diff(name, a, b):
    diff = a - b
    print(f"{name} diff:")
    print("  values:", diff)
    print("  L2 norm:", np.linalg.norm(diff))
    print("  max abs:", np.max(np.abs(diff)))
    print()

print("Rust (100k iters) vs sklearn:")
print_diff("100k iters", pred_rust, pred_sklearn)

print("C++ (100k iters) vs sklearn:")
print_diff("C++", pred_cpp, pred_sklearn)


Rust (100k iters) vs sklearn:
100k iters diff:
  values: [ 1.4033219e-13 -1.3500312e-13]
  L2 norm: 1.9472792812579195e-13
  max abs: 1.4033219031261979e-13

C++ (100k iters) vs sklearn:
C++ diff:
  values: [ 1.4033219e-13 -1.3500312e-13]
  L2 norm: 1.9472792812579195e-13
  max abs: 1.4033219031261979e-13
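To turn the printed differences into a pass/fail check, the comparison can be expressed as an assertion with an explicit tolerance. The helper name `assert_matches_reference` and the tolerance `atol=1e-9` are assumptions chosen for this sketch; the arrays reuse the values recorded in the outputs above.

```python
import numpy as np


def assert_matches_reference(name, pred, reference, atol=1e-9):
    """Raise AssertionError if pred deviates from reference by more than atol."""
    np.testing.assert_allclose(
        pred, reference, rtol=0.0, atol=atol,
        err_msg=f"{name} deviates from reference",
    )


# Values taken from the outputs recorded in this notebook.
pred_sklearn = np.array([15.0, 35.0])
diff_recorded = np.array([1.4033219e-13, -1.3500312e-13])
pred_other = pred_sklearn + diff_recorded

assert_matches_reference("Rust", pred_other, pred_sklearn)
print("within tolerance")
```

An absolute tolerance several orders of magnitude above the observed `1e-13` differences leaves room for platform-dependent rounding while still catching real regressions.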



## Notes

- This notebook is meant for **functional testing**, while a separate notebook
  can be used for **performance benchmarking**.
