# Comparison of computation times

We compare the training time and goodness of fit (R²) of LinearRegression and SGDRegressor, first as the number of features increases and then as the number of samples increases.

In [6]:
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
import time

In [7]:

# Function to compare models
def compare_models(n_samples, n_features):
    print(f"\n--- Dataset: {n_samples} samples, {n_features} features ---")

    # Generate synthetic data
    X, y = make_regression(n_samples=n_samples, n_features=n_features, noise=0.1, random_state=42)

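    # Linear Regression (direct least-squares solve)
    start = time.perf_counter()  # monotonic, high-resolution timer
    linreg = LinearRegression().fit(X, y)
    linreg_time = time.perf_counter() - start
    linreg_r2 = r2_score(y, linreg.predict(X))  # R² on the training data

    # SGD Regressor (iterative; stops after max_iter epochs or once tol is met)
    start = time.perf_counter()
    sgd = SGDRegressor(max_iter=1000, tol=1e-3, random_state=42).fit(X, y)
    sgd_time = time.perf_counter() - start
    sgd_r2 = r2_score(y, sgd.predict(X))  # R² on the training data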

    # Display results
    print(f"LinearRegression - Training time: {linreg_time:.4f}s | R²: {linreg_r2:.4f}")
    print(f"SGDRegressor     - Training time: {sgd_time:.4f}s | R²: {sgd_r2:.4f}")

In [8]:
# Run comparisons
compare_models(10_000, 10)
compare_models(10_000, 100)
compare_models(10_000, 1_000)
compare_models(1_000_000, 100)


--- Dataset: 10000 samples, 10 features ---
LinearRegression - Training time: 0.0265s | R²: 1.0000
SGDRegressor     - Training time: 0.0319s | R²: 1.0000

--- Dataset: 10000 samples, 100 features ---
LinearRegression - Training time: 0.8863s | R²: 1.0000
SGDRegressor     - Training time: 0.0628s | R²: 1.0000

--- Dataset: 10000 samples, 1000 features ---
LinearRegression - Training time: 15.8792s | R²: 1.0000
SGDRegressor     - Training time: 0.3645s | R²: 1.0000

--- Dataset: 1000000 samples, 100 features ---
LinearRegression - Training time: 9.6525s | R²: 1.0000
SGDRegressor     - Training time: 6.9456s | R²: 1.0000
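
To make the gap easier to read, the timings printed above can be turned into speedup ratios. The sketch below only re-uses the numbers from this run; the names `timings`, `speedup`, and `speedups` are illustrative, and the ratios will differ on other hardware.

```python
# Training times in seconds, copied from the benchmark output above.
# Keys are (n_samples, n_features).
timings = {
    (10_000, 10): {"LinearRegression": 0.0265, "SGDRegressor": 0.0319},
    (10_000, 100): {"LinearRegression": 0.8863, "SGDRegressor": 0.0628},
    (10_000, 1_000): {"LinearRegression": 15.8792, "SGDRegressor": 0.3645},
    (1_000_000, 100): {"LinearRegression": 9.6525, "SGDRegressor": 6.9456},
}


def speedup(row):
    """Return how many times faster SGDRegressor trained than LinearRegression."""
    return row["LinearRegression"] / row["SGDRegressor"]


speedups = {shape: speedup(row) for shape, row in timings.items()}

for (n_samples, n_features), ratio in speedups.items():
    print(f"{n_samples:>9,} samples x {n_features:<5,} features: {ratio:5.1f}x")
```

A ratio below 1 means LinearRegression was the faster of the two, which happens only for the 10-feature dataset.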


## Key Takeaways

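- **LinearRegression** was the faster model only on the **smallest, low-dimensional dataset** (10 features: 0.0265 s vs. 0.0319 s), where its direct least-squares solution has little overhead.
- **SGDRegressor** scales much better as the **number of features increases**: it trained about 14× faster at 100 features and about 44× faster at 1,000 features. With 1,000,000 samples and 100 features its advantage was smaller (about 1.4×).
- **Performance (R²)** is identical to four decimal places in every run. This is a favorable setting: the data are synthetic, linear with very low noise (`noise=0.1`), and R² is measured on the training set, so both methods recover nearly the same coefficients.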
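- The **computational cost** of LinearRegression grows roughly as **O(n·p²)** for n samples and p features (with n ≥ p), because scikit-learn solves the least-squares problem with a singular value decomposition. **SGDRegressor** costs roughly **O(k·n·p)**, where k is the number of epochs, so it is linear in both samples and features.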
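
The difference in how the two costs scale can be illustrated with a back-of-the-envelope model. This is a sketch under stated assumptions, not a prediction of wall-clock time: it uses the leading-order terms n·p² for an SVD-based least-squares solve and k·n·p for SGD, ignores constant factors, and fixes a hypothetical epoch count `n_epochs` (in practice `tol` decides when SGD stops). The function names are illustrative.

```python
def ols_cost(n_samples, n_features):
    """Leading-order operation count of an SVD-based least-squares solve: n * p^2."""
    return n_samples * n_features ** 2


def sgd_cost(n_samples, n_features, n_epochs):
    """Leading-order operation count of SGD: k * n * p."""
    return n_epochs * n_samples * n_features


n = 10_000
n_epochs = 20  # hypothetical value, chosen only for illustration

for p_small, p_large in [(10, 100), (100, 1_000)]:
    ols_growth = ols_cost(n, p_large) / ols_cost(n, p_small)
    sgd_growth = sgd_cost(n, p_large, n_epochs) / sgd_cost(n, p_small, n_epochs)
    print(
        f"{p_small} -> {p_large} features: "
        f"least-squares cost x{ols_growth:.0f}, SGD cost x{sgd_growth:.0f}"
    )
```

In this model, multiplying the feature count by 10 multiplies the least-squares cost by 100 but the SGD cost by only 10. The measured times above grow more slowly than the model suggests (for example, about 18× for LinearRegression between 100 and 1,000 features), which is expected since constant factors, optimized linear algebra routines, and early stopping all affect real timings.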


## Conclusion

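For **small, low-dimensional datasets**, *LinearRegression* is **simple, fast, and exact**.  
For **high-dimensional or very large datasets**, *SGDRegressor* trained **much faster** in this benchmark, and it can be **memory-friendly** because it supports incremental training on mini-batches. In exchange, it requires choosing hyperparameters (`max_iter`, `tol`, learning rate) and is sensitive to feature scaling, so it is the better choice mainly when the direct solution becomes too slow or the data do not fit in memory.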
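
The memory-friendliness mentioned above comes from the fact that SGD only needs one batch of rows at a time. A minimal sketch of this idea uses `SGDRegressor.partial_fit`; here the full array is generated in memory purely for illustration, and `batch_size` and the number of passes are hypothetical values rather than tuned settings.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score

# Synthetic data standing in for a dataset that would be read in chunks.
X, y = make_regression(n_samples=20_000, n_features=20, noise=0.1, random_state=42)

sgd = SGDRegressor(random_state=42)
batch_size = 1_000  # hypothetical chunk size
n_passes = 3        # hypothetical number of passes over the data

# Each partial_fit call updates the model using only the rows in that batch.
for _ in range(n_passes):
    for start in range(0, X.shape[0], batch_size):
        rows = slice(start, start + batch_size)
        sgd.partial_fit(X[rows], y[rows])

r2 = r2_score(y, sgd.predict(X))
print(f"R² after mini-batch training: {r2:.4f}")
```

With real data, the inner loop would read each chunk from disk or a database instead of slicing an in-memory array, and the features would typically be standardized first.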