Added Nesterov and Adam Optimizers #13718
Description
This PR implements three advanced optimizers in pure NumPy as part of the effort to add neural network optimizers to the repository: Adam, Nesterov Accelerated Gradient (NAG), and Muon.
It addresses part of issue #13662 (add a neural network optimizers module to enhance training capabilities).
What does this PR do?
Adam Optimizer
Nesterov Accelerated Gradient (NAG)
Muon Optimizer
All implementations provide clean, educational code without external deep learning frameworks.
Implementation Details
Adam
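For readers skimming the diff, a minimal NumPy sketch of the Adam update rule; the function name, signature, and default hyperparameters below are illustrative and may differ from the code in this PR.

```python
import numpy as np


def adam_step(
    param: np.ndarray,
    grad: np.ndarray,
    m: np.ndarray,
    v: np.ndarray,
    t: int,
    lr: float = 0.001,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """One Adam step with bias-corrected moment estimates (t is 1-indexed)."""
    m = beta1 * m + (1 - beta1) * grad      # exponential average of gradients
    v = beta2 * v + (1 - beta2) * grad**2   # exponential average of squared gradients
    m_hat = m / (1 - beta1**t)              # bias correction for the first moment
    v_hat = v / (1 - beta2**t)              # bias correction for the second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```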
Nesterov Accelerated Gradient
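Likewise, a hedged sketch of the Nesterov look-ahead update; `nesterov_step` and its defaults are illustrative, not necessarily the PR's API.

```python
from collections.abc import Callable

import numpy as np


def nesterov_step(
    param: np.ndarray,
    velocity: np.ndarray,
    grad_fn: Callable[[np.ndarray], np.ndarray],
    lr: float = 0.01,
    momentum: float = 0.9,
) -> tuple[np.ndarray, np.ndarray]:
    """One NAG step: the gradient is evaluated at the look-ahead point, not at param."""
    lookahead = param + momentum * velocity     # peek ahead along the momentum direction
    grad = grad_fn(lookahead)                   # gradient at the look-ahead point
    velocity = momentum * velocity - lr * grad  # update velocity with that gradient
    return param + velocity, velocity
```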
Muon
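And a rough sketch of the Muon idea: accumulate momentum, then approximately orthogonalize the 2D update with a Newton-Schulz iteration before applying it. The helper names, iteration coefficients, and defaults here are taken from common Muon write-ups and are assumptions, not necessarily what this PR uses.

```python
import numpy as np


def newton_schulz(g: np.ndarray, steps: int = 5) -> np.ndarray:
    """Approximately orthogonalize a 2D matrix via a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315    # coefficients reported in common Muon write-ups
    x = g / (np.linalg.norm(g) + 1e-7)   # scale so the iteration converges
    transpose = g.shape[0] > g.shape[1]
    if transpose:                        # iterate in the wide orientation
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transpose else x


def muon_step(
    param: np.ndarray,
    grad: np.ndarray,
    buf: np.ndarray,
    lr: float = 0.02,
    momentum: float = 0.95,
    ns_steps: int = 5,
) -> tuple[np.ndarray, np.ndarray]:
    """One Muon step: momentum accumulation, orthogonalized update, then apply."""
    buf = momentum * buf + grad
    return param - lr * newton_schulz(buf, steps=ns_steps), buf
```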
Features
✅ Complete docstrings with parameter descriptions for all three optimizers
✅ Type hints for all function parameters and return values
✅ Doctests for correctness validation
✅ Usage examples demonstrating optimizer behavior on optimization problems
✅ PEP8 compliant code formatting
✅ Pure NumPy implementations - no framework dependencies
✅ Configurable hyperparameters (learning rates, momentum, betas, epsilon, NS steps)
Testing
All doctests pass.
Linting passes.
Example outputs demonstrate proper convergence behavior for all optimizers.
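As an illustration of the kind of convergence check these examples perform, here is a self-contained loop that drives a simple quadratic toward its minimum; the objective and the Adam-style update below are illustrative, not code from this PR.

```python
import numpy as np

x = np.array([3.0, -2.0])                # start away from the minimum of f(x) = sum(x**2)
m = np.zeros_like(x)
v = np.zeros_like(x)
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 201):
    g = 2 * x                            # gradient of the quadratic objective
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    if t % 50 == 0:
        print(f"step {t:3d}  loss {np.sum(x**2):.6f}")  # printed loss trends toward zero
```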
References
Adam
Nesterov Accelerated Gradient
Muon
Why Combine These Three?
These optimizers span the spectrum of neural network optimization techniques.
Together, they give learners a comprehensive view, from established best practices (Adam) to innovative research directions (Muon).
Relation to Issue #13662
This PR continues the optimizer sequence outlined in #13662.
Combined with the other optimizer PRs, it completes the neural network optimizers module with six fundamental optimizers, covering classical through cutting-edge optimization techniques.
Checklist
Summary
This PR provides three advanced optimizers representing both industry-standard techniques (Adam) and cutting-edge research (Muon), contributing significantly to the neural network optimizers module for educational purposes.
This PR, together with the following PRs, collectively addresses issue #13662:
Related PRs:
Fixes #13662