This project is a hands-on experimental lab designed to understand how different gradient-based optimizers and learning-rate schedules influence the performance of linear regression models. It focuses on building everything from scratch—loss functions, gradients, optimizers, and schedulers—to analyze convergence speed, training stability, and generalization.
This repository implements a complete workflow for regularized linear regression (MSE + L2).
It provides a reproducible environment to compare:
- Batch / Mini-batch / Stochastic Gradient Descent
- Optimizers: SGD, Momentum, Adagrad, RMSProp, Adam
- Learning-rate schedules: Inverse Time, Step Decay, Exponential, Polynomial, Cosine Annealing, SGDR
Key features:
- Fully from-scratch implementations (loss, gradients, optimizers, LR schedulers)
- Modular structure for rapid experimentation
- Consistent logging: train/test loss, best iteration, parameter stability
- Visualization tools for loss curves and learning rate evolution
- Experiment-driven workflow (each Ex1-xx isolates one technique)
Optimizers:
- SGD
- SGD with Momentum
- RMSProp
- Adam
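
For reference, the core update rules look roughly like this (a minimal NumPy sketch; function names, signatures, and defaults are illustrative and not necessarily those used in `gdlib`):

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """One SGD-with-Momentum update: the velocity accumulates past gradients."""
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```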
Gradient descent variants:
- Batch Gradient Descent
- Mini-batch Gradient Descent
- Stochastic Gradient Descent
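
These three variants differ only in how many examples feed each gradient step. The helper below (hypothetical, not part of `gdlib`) makes the distinction explicit:

```python
import numpy as np

def sample_batch(n_samples, variant, batch_size=32, rng=None):
    """Return the indices used for one gradient step under each variant."""
    rng = rng or np.random.default_rng()
    if variant == "batch":        # full dataset every step
        return np.arange(n_samples)
    if variant == "mini-batch":   # a random subset of fixed size
        return rng.choice(n_samples, size=batch_size, replace=False)
    if variant == "stochastic":   # a single random example
        return rng.choice(n_samples, size=1)
    raise ValueError(f"unknown variant: {variant}")
```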
Learning-rate schedules:
- Fixed learning rate
- Inverse Time Decay
- Step Decay
- Exponential Decay
- Polynomial Decay
- Cosine Annealing
- Cosine Annealing with Warm Restarts (SGDR)
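
Each schedule is just a function of the iteration count. A minimal sketch of a few of them, with illustrative names and default constants (not necessarily those used in the notebooks):

```python
import math

def inverse_time_decay(lr0, t, k=0.01):
    """lr = lr0 / (1 + k * t)."""
    return lr0 / (1 + k * t)

def step_decay(lr0, t, drop=0.5, every=100):
    """Multiply the initial rate by `drop` every `every` iterations."""
    return lr0 * (drop ** (t // every))

def exponential_decay(lr0, t, k=0.01):
    """Smooth exponential decay: lr = lr0 * exp(-k * t)."""
    return lr0 * math.exp(-k * t)

def polynomial_decay(lr0, t, total_iters, power=2.0, lr_min=0.0):
    """Decay from lr0 to lr_min following (1 - t/T)^power."""
    progress = min(t / total_iters, 1.0)
    return lr_min + (lr0 - lr_min) * (1 - progress) ** power

def cosine_annealing(lr0, t, total_iters, lr_min=0.0):
    """Anneal from lr0 down to lr_min along half a cosine wave."""
    progress = t / total_iters
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * progress))

def sgdr(lr0, t, cycle_len=100, lr_min=0.0):
    """Warm restarts: repeat the cosine cycle every `cycle_len` iterations
    (the fixed-cycle variant; the original SGDR grows the cycle length)."""
    return cosine_annealing(lr0, t % cycle_len, cycle_len, lr_min)
```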
Regularization:
- L2 (Ridge)
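
The underlying objective in every experiment is the standard ridge (MSE + L2) loss. A from-scratch sketch of the loss and its gradient (illustrative names, not necessarily the `gdlib` API):

```python
import numpy as np

def ridge_loss(X, y, w, lam=0.1):
    """(1/n) * ||Xw - y||^2 + lam * ||w||^2."""
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)

def ridge_gradient(X, y, w, lam=0.1):
    """Gradient of the ridge loss with respect to w."""
    n = X.shape[0]
    residual = X @ w - y
    return (2.0 / n) * (X.T @ residual) + 2.0 * lam * w
```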
Project structure:

```
GradientDescentLab/
├── gdlib/       # Core implementations: optimizers, LR schedulers, utilities
├── notebooks/   # Experiment notebooks (Ex1-xx)
├── images/      # Exported plots and visual results
└── README.md
```
Install dependencies: `pip install -r requirements.txt`

Launch Jupyter: `jupyter lab`

Reproduce an experiment:
1. Open any notebook inside `notebooks/`
2. Select the optimizer and learning-rate schedule
3. Set hyperparameters (learning rate, decay, batch size, regularization)
4. Run training
5. Visualize the results (loss curves, learning-rate curve)
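
As an illustration of what a full run boils down to, here is a self-contained mini-batch training loop on synthetic data (plain NumPy with made-up hyperparameters; the notebooks use the `gdlib` implementations instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative only).
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr0, lam, batch_size, n_iters = 0.1, 0.01, 32, 500

for t in range(n_iters):
    # Inverse time decay of the learning rate.
    lr = lr0 / (1 + 0.01 * t)

    # Sample a mini-batch.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]

    # Gradient of the MSE + L2 loss on the mini-batch.
    grad = (2.0 / batch_size) * Xb.T @ (Xb @ w - yb) + 2.0 * lam * w

    # Plain SGD step.
    w -= lr * grad

print("learned weights:", w)
```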
The following comparisons are selected from a broader set of 44 controlled experiments.
The groups highlighted here represent the most informative and practically relevant outcomes for understanding optimizer behavior in linear regression training.
- Combines Adam’s adaptive updates with SGD-style stochasticity
- Produces fast early convergence with moderate variance
- Useful for examining the trade-off between stability and exploration
- Shows clear improvements in escaping shallow minima
- Achieves better mid-training generalization compared to fixed learning rates
- Demonstrates strong performance in scenarios requiring periodic learning-rate resets
- Provides the most stable and predictable convergence among all tested setups
- Reduces gradient noise while retaining efficiency
- Achieves consistently strong generalization across experiments
More detailed plots and experiment logs are available in the images/ folder and notebooks.