Skip gradient updates when gradients contain NaN or Inf #1081
Conversation
- Add check before `Optimisers.update` to detect NaN/Inf in gradients
- Skip update but still increment iteration counter when detected
- Add warning message (`maxlog=10`) to inform users
- Fixes issue where NaN gradients corrupt all subsequent updates

Addresses: https://discourse.julialang.org/t/how-to-ignore-minibatches-with-nan-gradients-optimizing-a-hybrid-lux-model-using-optimization-jl/132615
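A minimal sketch of the guard this commit describes, using assumed names (`guarded_step`, plain-array parameters) rather than the package's actual internals: check the gradient before `Optimisers.update`, warn at most a few times, and skip the update while still counting the iteration.

```julia
using Optimisers

# Sketch only: skip the parameter update when the gradient is not finite,
# but keep incrementing the iteration counter.
function guarded_step(state, θ, G, iterations)
    if !all(isfinite, G)                           # catches both NaN and Inf
        @warn "Gradient contains NaN or Inf; skipping this update." maxlog=10
        return state, θ, iterations + 1            # θ and optimiser state untouched
    end
    state, θ = Optimisers.update(state, θ, G)      # normal Optimisers.jl update
    return state, θ, iterations + 1
end

# Usage sketch with a plain vector of parameters:
θ = randn(3)
state = Optimisers.setup(Optimisers.Adam(0.01), θ)
state, θ, iters = guarded_step(state, θ, [NaN, 1.0, 2.0], 0)      # skipped, iters == 1
state, θ, iters = guarded_step(state, θ, [0.1, 0.2, 0.3], iters)  # applied, iters == 2
```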
- Test with custom gradient function that injects NaN periodically
- Test with custom gradient function that injects Inf periodically
- Verify iterations complete and parameters remain finite
- Verify optimizer doesn't crash when encountering bad gradients
Use `any(isnan, G) || any(isinf, G)` instead of a lambda function to correctly handle array elements and hierarchical structures
Use `any(.!(isfinite.(G)))` to properly handle arrays with broadcasting
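For reference, a quick illustration (with assumed example arrays) of how these predicates behave on a plain gradient vector; `!all(isfinite, G)` is an equivalent form that avoids the intermediate broadcast array.

```julia
G = [0.5, NaN, 2.0]
any(isnan, G) || any(isinf, G)   # true: element-wise predicates, no lambda needed
any(.!(isfinite.(G)))            # true: broadcasting form from this commit
!all(isfinite, G)                # true: equivalent, with no temporary array

H = [0.5, Inf, 2.0]
any(isnan, H) || any(isinf, H)   # true: the second clause catches Inf
!all(isfinite, H)                # true
```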
- Add `has_nan_or_inf()` helper function
- Uses `Functors.fmap` to recursively check all elements
- Handles arbitrary nested structures (arrays, ComponentArrays, etc.)
- Checks if any element is not finite (catches both NaN and Inf)
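One way such a helper could be written (an assumed sketch, not necessarily the PR's exact code): `Functors.fmap` visits every leaf of a nested gradient structure, and a `Ref` accumulates whether any leaf holds a non-finite value.

```julia
using Functors

# Assumed implementation of the helper described above.
function has_nan_or_inf(grad)
    found = Ref(false)
    fmap(grad) do leaf
        if leaf isa AbstractArray{<:Number}
            found[] |= !all(isfinite, leaf)
        elseif leaf isa Number
            found[] |= !isfinite(leaf)
        end
        leaf   # fmap rebuilds the structure; only the side effect matters here
    end
    return found[]
end

has_nan_or_inf([1.0, 2.0, NaN])                              # true
has_nan_or_inf((weight = [1.0 2.0; 3.0 4.0], bias = [Inf]))  # true
has_nan_or_inf((weight = ones(2, 2), bias = zeros(2)))       # false
```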
- Use Zygote for gradient computation
- Inject NaN/Inf via callback that modifies `state.grad`
- This better simulates real-world scenarios where autodiff produces NaN
- Avoids issues with custom gradient function signatures
- Apply SciMLStyle formatting to OptimizationOptimisers.jl
- Remove JuliaFormatter from runtime dependencies
- Remove Functors dependency and use simple `all(isfinite, G)` check
- Make warning conditional on `cache.progress` flag
- Rewrite tests to use functions that return NaN/Inf in certain regions instead of callback-based approach
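A minimal illustration of the revised check and the progress-gated warning; `cache` here is a stand-in NamedTuple, not the package's actual cache type.

```julia
# Stand-in for the solver cache's progress flag.
cache = (progress = true,)
G = [1.0, NaN]

if !all(isfinite, G)   # simple check; no Functors needed for a flat gradient array
    cache.progress &&
        @warn "Gradient contains NaN or Inf; skipping this update." maxlog=10
end
```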
- Use sqrt and max to produce NaN when x goes negative
- Use 1/x pattern to produce Inf gradients
- Functions naturally produce problematic gradients during optimization
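Illustrative objectives in the spirit of this commit (not necessarily the test suite's exact functions): under reverse-mode AD the chain rule for `sqrt(max(x, 0.0))` evaluates to `Inf * 0 = NaN` once `x` goes negative, and the derivative `-1/x^2` of `1/x` overflows to `-Inf` for tiny `x`.

```julia
using Zygote

f_nan(x) = sqrt(max(x, 0.0)) + x^2   # NaN gradient for x < 0: sqrt'(0) = Inf times the zero derivative of max
f_inf(x) = 1 / x + x^2               # gradient -1/x^2 overflows for very small x

Zygote.gradient(f_nan, -0.1)     # (NaN,)
Zygote.gradient(f_inf, 1e-200)   # (-Inf,)
```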
SciMLBase = "0bca4576-84f4-4d90-8ffe-ffa030f20462"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
Optimization = "7f7a1694-90dd-40f0-9382-eb1efda571ba"
This should not be here 😅
ugh
yeah
#1083 will fix it
Summary
This PR addresses the issue reported in the Julia Discourse thread about handling NaN gradients during optimization with minibatches. When training neural networks with minibatches, some batches may produce NaN or Inf gradients due to numerical instability. Previously, applying these invalid gradients would corrupt all subsequent parameter updates, preventing the optimizer from converging.
Changes
Implementation
- Added `has_nan_or_inf()` helper function that uses `Functors.fmap` to recursively check all gradient elements
- Updated `OptimizationOptimisers.jl` to skip parameter updates when NaN/Inf is detected

Dependencies

- Added `Functors` as a dependency for robust NaN/Inf checking across arbitrary nested structures

Tests

- Added tests covering gradients that contain NaN or Inf, verifying that optimization completes and parameters stay finite
Fixes
Addresses: https://discourse.julialang.org/t/how-to-ignore-minibatches-with-nan-gradients-optimizing-a-hybrid-lux-model-using-optimization-jl/132615
Testing
All existing tests pass. New tests verify:

- Optimization runs to completion when some gradients contain NaN or Inf
- Parameters remain finite and the optimizer does not crash on bad gradients
🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com