
feat: Add decoupled weight decay (AdamW) to Adam optimizer#1037

Merged
siriak merged 1 commit into TheAlgorithms:master from AliAlimohammadi:add-adamw-weight-decay on Apr 24, 2026

Conversation

@AliAlimohammadi
Contributor

Description

Extends the existing Adam optimizer in src/machine_learning/optimization/adam.rs to support decoupled weight decay (AdamW), as introduced in Decoupled Weight Decay Regularization (Loshchilov & Hutter, 2019).

Rather than adding a separate AdamW struct, a single weight_decay: f64 field (defaulting to 0.0) is added to the existing Adam struct. When weight_decay is 0.0 the update is identical to standard Adam. When positive, the decay term $\lambda \cdot \theta_{t-1}$ is subtracted directly from the parameters after the adaptive gradient step, keeping it independent of the second moment scaling — the key correction AdamW makes over naive L2 regularisation inside Adam.

The step signature changes from step(&mut self, gradients: &[f64]) to step(&mut self, gradients: &[f64], params: &[f64]), since the decoupled decay term requires the current parameter values. All existing tests are updated accordingly (zero-initialised params preserve the original expected values).

Algorithm

Both variants share the same moment update rules:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

They differ only in the parameter update step:

Adam — weight decay is absent (or equivalently, folded into $g_t$ as L2 regularisation, where it gets scaled down by $1/\sqrt{\hat{v}_t}$):

$$\theta_t = \theta_{t-1} - \alpha \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

AdamW — weight decay is applied directly to $\theta_{t-1}$, fully decoupled from the adaptive scaling so its effect is constant regardless of gradient history:

$$\theta_t = \theta_{t-1} - \alpha \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} - \alpha \lambda \theta_{t-1}$$

Setting $\lambda = 0$ in the AdamW update recovers Adam exactly, both mathematically and in the implementation (verified by
test_adamw_step_weight_decay_zero_matches_adam).

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Adam::step gains a required params: &[f64] argument. Existing callers must pass the current parameter slice; passing &vec![0.0; n] restores the original behaviour exactly.

Checklist

  • I ran cargo clippy --all -- -D warnings just before my last commit and fixed any issue that was found.
  • I ran cargo fmt just before my last commit.
  • I ran cargo test just before my last commit and all tests passed.
  • I added my algorithm to the corresponding mod.rs file within its own folder, and in any parent folder(s).
  • I added my algorithm to DIRECTORY.md with the correct link.
  • I checked CONTRIBUTING.md and my code follows its guidelines.

@AliAlimohammadi
Contributor Author

@siriak, this is ready to be merged.

@codecov-commenter

Codecov Report

❌ Patch coverage is 73.83178% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.94%. Comparing base (e08f5a5) to head (96a8444).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/machine_learning/optimization/adam.rs | 73.83% | 28 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #1037      +/-   ##
==========================================
- Coverage   96.01%   95.94%   -0.07%
==========================================
  Files         392      392
  Lines       29722    29806      +84
==========================================
+ Hits        28537    28597      +60
- Misses       1185     1209      +24
```

☔ View full report in Codecov by Sentry.


Copilot AI left a comment


Pull request overview

This PR extends the existing Adam optimizer implementation to support decoupled weight decay (AdamW) via a new weight_decay: f64 field, and updates the optimizer API to accept the current parameter vector so AdamW can be applied correctly.

Changes:

  • Add weight_decay to Adam (default 0.0) to enable AdamW behavior when > 0.
  • Change Adam::step signature to step(&mut self, gradients: &[f64], params: &[f64]).
  • Update and expand tests to cover initialization, mismatch handling, and AdamW properties.
Comments suppressed due to low confidence (1)

src/machine_learning/optimization/adam.rs:105

  • step only asserts gradients.len() == params.len(), but it still indexes self.m[i]/self.v[i] based on gradients.len(). If the optimizer was constructed with a different params_len (or reused with differently sized parameter vectors), this will panic with an out-of-bounds access. Consider also asserting params.len() == self.m.len() (and self.v.len()), and updating the panic message/docs accordingly.
```rust
    pub fn step(&mut self, gradients: &[f64], params: &[f64]) -> Vec<f64> {
        assert_eq!(
            gradients.len(),
            params.len(),
            "gradients and params must have the same length"
        );

        let mut updated_params = vec![0.0; params.len()];
        self.t += 1;

        for i in 0..gradients.len() {
            // update biased first moment estimate and second raw moment estimate
            self.m[i] = self.betas.0 * self.m[i] + (1.0 - self.betas.0) * gradients[i];
            self.v[i] = self.betas.1 * self.v[i] + (1.0 - self.betas.1) * gradients[i].powf(2f64);
```


Four comment threads on src/machine_learning/optimization/adam.rs (collapsed).
Member

@siriak left a comment


Looks good, thanks!

@siriak siriak merged commit 3bccfa8 into TheAlgorithms:master Apr 24, 2026
11 checks passed
@AliAlimohammadi deleted the add-adamw-weight-decay branch April 24, 2026 08:26