fix(async_ckpt): import inspect in async_utils on core_r0.17.0#4597

Merged
ko3n1g merged 1 commit into NVIDIA:core_r0.17.0 from ko3n1g:ko3n1g/fix/async-utils-import-inspect
May 4, 2026

@ko3n1g ko3n1g commented May 4, 2026

Claude summary

What

Add the missing import inspect to megatron/training/async_utils.py on core_r0.17.0.

Why

init_persistent_async_worker calls inspect.signature(...) in three places (around lines 74, 84, 99) but inspect is not imported on this branch, causing a hard failure during initialize_megatron:

File "/opt/megatron-lm/megatron/training/async_utils.py", line 79, in init_persistent_async_worker
    if "mp_mode" not in inspect.signature(get_write_results_queue).parameters:
NameError: name 'inspect' is not defined. Did you forget to import 'inspect'?

The inspect.signature(...) usage was introduced by the NVRx async checkpoint compatibility backport (#4453, commit bb8e34cb86). On main, import inspect was already present from an earlier change, so the backport diff did not include it. When that diff was applied to core_r0.17.0, the new calls landed without the import.
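
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of signature check the traceback shows. The two `get_write_results_queue_*` functions and `call_compat` are hypothetical stand-ins written for illustration. They are not the NVRx or Megatron-LM implementations, and the real branch logic in init_persistent_async_worker may differ.

```python
import inspect  # the import that was missing on core_r0.17.0


def get_write_results_queue_old():
    """Hypothetical stand-in for an API version without `mp_mode`."""
    return "queue"


def get_write_results_queue_new(mp_mode="spawn"):
    """Hypothetical stand-in for an API version that accepts `mp_mode`."""
    return f"queue[{mp_mode}]"


def call_compat(fn, mp_mode="spawn"):
    # Pass `mp_mode` only when the callee's signature declares it,
    # so the same caller works against both API versions.
    if "mp_mode" not in inspect.signature(fn).parameters:
        return fn()
    return fn(mp_mode=mp_mode)
```

Without the top-level `import inspect`, the first call to `call_compat` would raise the same NameError shown above.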

Repro

Any test that exercises the persistent async checkpoint worker on core_r0.17.0, e.g. moe/gpt3_moe_mcore_te_tp4_ep2_etp2_pp2_resume_torch_dist_dist_optimizer — see https://github.com/NVIDIA/Megatron-LM/actions/runs/24986791266/job/73546733227.

Fix

One-line change: add import inspect to the imports.
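
A small sketch of why this only surfaced at runtime: Python resolves global names when a function body executes, not when the module is loaded, so a module with the import missing still imports cleanly. The `BROKEN` and `FIXED` sources and the `check` and `run_check` helpers below are hypothetical and only mimic the shape of the affected code.

```python
# Module source that uses `inspect` without importing it.
BROKEN = '''
def check(fn):
    return "mp_mode" in inspect.signature(fn).parameters
'''

# Same source with the one-line fix applied.
FIXED = "import inspect\n" + BROKEN


def run_check(source):
    namespace = {}
    # Defining `check` succeeds in both cases; nothing is looked up yet.
    exec(source, namespace)
    try:
        # The name `inspect` is resolved only now, when `check` runs.
        return namespace["check"](lambda mp_mode=None: None)
    except NameError as exc:
        return f"NameError: {exc}"


broken_result = run_check(BROKEN)
fixed_result = run_check(FIXED)
```

Here `broken_result` holds a NameError message while `fixed_result` is `True`. This matches the failure appearing during initialize_megatron rather than at import time.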

The NVRx backport (NVIDIA#4453) introduced inspect.signature() calls in
init_persistent_async_worker but did not add the corresponding
`import inspect`. The import existed on main from an earlier change so
the backport diff did not surface it; on core_r0.17.0 it was missing,
causing a NameError when the persistent async checkpoint worker is
initialized.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@ko3n1g ko3n1g requested review from a team as code owners May 4, 2026 08:51

ko3n1g commented May 4, 2026

/ok to test

@ko3n1g ko3n1g added the Run CICD label May 4, 2026
@ko3n1g ko3n1g merged commit aa6dbb9 into NVIDIA:core_r0.17.0 May 4, 2026
57 of 61 checks passed