Gate fp8 NaN gradient sanitization on quantization config
The NaN sanitization introduced for fp8 delayed-scaling under FSDP ran on
every step regardless of quantization mode, adding a ~2-3% step-time
regression on non-fp8 workloads (a per-float-grad jnp.nan_to_num
tree_map). The failure mode only occurs under fp8, so gate the block
on config.quantization in {"fp8", "fp8_full", "nanoo_fp8"}.
Non-fp8 workloads skip the tree_map entirely; fp8 behavior is
unchanged (verified: step 1 loss still finite under gpt3-52k + FSDP=8).
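A minimal sketch of the gating described above, assuming a `quantization` config field with the values listed; the function name `sanitize_grads` is hypothetical:

```python
import jax
import jax.numpy as jnp

# Quantization modes whose delayed-scaling path can produce NaN grads.
FP8_QUANT_MODES = {"fp8", "fp8_full", "nanoo_fp8"}

def sanitize_grads(grads, quantization):
    """Replace NaNs in floating-point gradient leaves, fp8 modes only."""
    if quantization not in FP8_QUANT_MODES:
        # Non-fp8 workloads skip the per-leaf tree_map entirely,
        # avoiding the step-time overhead.
        return grads
    return jax.tree_util.tree_map(
        lambda g: jnp.nan_to_num(g)
        if jnp.issubdtype(g.dtype, jnp.floating) else g,
        grads,
    )
```

Gating on membership in a small set (rather than a prefix check like `startswith("fp8")`) keeps the behavior explicit when new quantization modes are added.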