fix: find_best_model accepts named .pt files without epoch numbers #55

Merged
ivanbasov merged 2 commits into NVIDIA:main from ivanbasov:fix/find-best-model-named-pt on Apr 8, 2026

@ivanbasov (Member)

Summary

Regression fix for #51. After the model rename, copying Ising-Decoder-SurfaceCode-1-Fast.pt into an output models/ dir and running WORKFLOW=inference via local_run.sh failed with:

Found 0 model files:
FileNotFoundError: No valid model checkpoint files found in .../models

Root cause: find_best_model had a hard-coded guard requiring filenames to start with PreDecoderModelMemory_ and to encode an epoch number in the third dot-separated segment. The renamed files satisfy neither condition.

Fix: When no epoch-numbered PreDecoderModelMemory_* checkpoints are found, fall back to any .pt file in the directory. The candidates are sorted and the last one is used, so the choice is deterministic. The epoch-numbered logic for training checkpoints is unchanged.
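
For reviewers who want the selection order spelled out, here is a minimal sketch of the behavior described above. It is not the code in this PR: `parse_epoch` and `pick_model` are hypothetical names, the filenames used to exercise it are made up, and treating "best" as "highest epoch" is an assumption about the unchanged training-checkpoint path.

```python
from pathlib import Path
from typing import List, Optional, Tuple

CHECKPOINT_PREFIX = "PreDecoderModelMemory_"  # prefix named in this PR


def parse_epoch(filename: str) -> Optional[int]:
    """Return the epoch encoded in a training checkpoint name, or None."""
    if not filename.startswith(CHECKPOINT_PREFIX):
        return None
    parts = filename.split(".")
    # Assumption: the third dot-separated segment holds the epoch number.
    if len(parts) < 4 or parts[-1] != "pt" or not parts[2].isdigit():
        return None
    return int(parts[2])


def pick_model(model_dir: str) -> Tuple[Path, Optional[int]]:
    """Choose a checkpoint; fall back to any .pt file if none carry an epoch."""
    pt_files: List[Path] = sorted(Path(model_dir).glob("*.pt"))
    numbered = [(parse_epoch(p.name), p) for p in pt_files]
    numbered = [(e, p) for e, p in numbered if e is not None]
    if numbered:
        # Assumption: "best" means highest epoch; the real criterion may differ.
        epoch, path = max(numbered, key=lambda item: item[0])
        return path, epoch
    if pt_files:
        # Fallback described in this PR: sorted, last one wins.
        return pt_files[-1], None
    raise FileNotFoundError(
        f"No valid model checkpoint files found in {model_dir}")
```

With only Ising-Decoder-SurfaceCode-1-Fast.pt in the directory, the sketch returns that file with an epoch of None, which corresponds to the "epoch n/a" case in the test plan. Epoch-numbered checkpoints, when present, still take priority over named files.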

Test plan

  • cp models/Ising-Decoder-SurfaceCode-1-Fast.pt outputs/predecoder_model_1/models/ && WORKFLOW=inference EXPERIMENT_NAME=predecoder_model_1 bash code/scripts/local_run.sh — should now print Found 1 model file(s): [*] Ising-Decoder-SurfaceCode-1-Fast.pt (epoch n/a) and load successfully
  • Training checkpoint workflow (epoch-numbered PreDecoderModelMemory_* files) unaffected

🤖 Generated with Claude Code

ivanbasov and others added 2 commits April 8, 2026 10:23
The old code required filenames to start with PreDecoderModelMemory_ and
encode an epoch number. After the model rename to Ising-Decoder-SurfaceCode-1-
{Fast,Accurate}.pt, copying one of these files into the models dir and running
inference via local_run.sh would fail with "Found 0 model files".

Fall back to any .pt file (sorted, last wins) when no epoch-numbered
PreDecoderModelMemory_ checkpoints are found in the directory.

Fixes regression reported in NVIDIA/Ising-Decoding#51

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ivanbasov ivanbasov force-pushed the fix/find-best-model-named-pt branch from 7ba3085 to 955688b Compare April 8, 2026 17:23
@ivanbasov ivanbasov requested a review from bmhowe23 April 8, 2026 17:24
@bmhowe23 (Collaborator) left a comment

Thanks, @ivanbasov.

@ivanbasov ivanbasov merged commit 2a4e0f4 into NVIDIA:main Apr 8, 2026
17 checks passed
@ivanbasov ivanbasov deleted the fix/find-best-model-named-pt branch April 8, 2026 18:29
ivanbasov added a commit that referenced this pull request Apr 10, 2026
* fix: find_best_model now accepts named .pt files without epoch numbers

The old code required filenames to start with PreDecoderModelMemory_ and
encode an epoch number. After the model rename to Ising-Decoder-SurfaceCode-1-
{Fast,Accurate}.pt, copying one of these files into the models dir and running
inference via local_run.sh would fail with "Found 0 model files".

Fall back to any .pt file (sorted, last wins) when no epoch-numbered
PreDecoderModelMemory_ checkpoints are found in the directory.

Fixes regression reported in #51

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: fix yapf formatting in find_best_model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>