[BUG] Model loading logic fails to filter weight shards correctly for models with non-standard naming (e.g., Mistral-Large-Instruct) #467


The current weight-loading logic in `modeling_utils.py` identifies weight files with a broad `glob.glob("*.safetensors")` search. This is brittle: it breaks when a checkpoint uses a non-standard naming convention for its shards.
For instance, `Mistral-Large-Instruct` names its shards `consolidated-00001-of-00051.safetensors` instead of the standard `model-00001-of-00051.safetensors`. Because the glob matches every `.safetensors` file in the directory, the loader cannot tell which files belong to the checkpoint: it may try to load unrelated files or fail to identify the correct shard sequence.
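To illustrate what a stricter filter than `*.safetensors` could look like, here is a minimal sketch of a prefix-agnostic shard matcher. It assumes shards follow the `<prefix>-NNNNN-of-MMMMM.safetensors` pattern seen in both examples above; `SHARD_RE` and `complete_shard_sets` are hypothetical names, not part of `modeling_utils.py`. It groups files by prefix and shard count and keeps only sequences where every index is present, so stray files and partial sets are ignored.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<prefix>-NNNNN-of-MMMMM.safetensors" for any prefix,
# e.g. "model-00001-of-00051.safetensors" or "consolidated-00001-of-00051.safetensors".
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<idx>\d+)-of-(?P<total>\d+)\.safetensors$")


def complete_shard_sets(filenames):
    """Return {(prefix, total): [ordered shard filenames]} for complete shard sequences only."""
    groups = defaultdict(dict)  # (prefix, total) -> {shard index: filename}
    for name in filenames:
        m = SHARD_RE.match(name)
        if m is None:
            continue  # not a sharded file, e.g. "adapter.safetensors"
        groups[(m["prefix"], int(m["total"]))][int(m["idx"])] = name

    complete = {}
    for (prefix, total), by_index in groups.items():
        # Keep the set only if every index 1..total is present.
        if set(by_index) == set(range(1, total + 1)):
            complete[(prefix, total)] = [by_index[i] for i in range(1, total + 1)]
    return complete
```

For example, given `model-00001-of-00002.safetensors`, `model-00002-of-00002.safetensors`, a lone `consolidated-00001-of-00003.safetensors`, and `adapter.safetensors`, only the complete `model` pair is returned. This only narrows the glob; it still cannot decide between two complete sets, which is why the index file is the more reliable source.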


The loader should prioritize the `model.safetensors.index.json` manifest. Its `weight_map` maps each parameter name to the shard file that stores it, so the set of filenames in that map is exactly the set of files to load, regardless of their prefix. The existing glob method can be kept as a fallback for models that lack an index file.
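A minimal sketch of the proposed resolution order, assuming the checkpoint is a local directory and the index file has the usual top-level `weight_map` key. `resolve_weight_files` and `INDEX_NAME` are hypothetical names for illustration, not the current `modeling_utils.py` API. The index is read first and its values are deduplicated into the shard list; the glob is used only when no index exists.

```python
import glob
import json
import os

INDEX_NAME = "model.safetensors.index.json"


def resolve_weight_files(checkpoint_dir):
    """Hypothetical helper: list the .safetensors files to load, preferring the index manifest."""
    index_path = os.path.join(checkpoint_dir, INDEX_NAME)
    if os.path.isfile(index_path):
        with open(index_path, "r", encoding="utf-8") as f:
            index = json.load(f)
        # weight_map: {"parameter.name": "shard-file.safetensors", ...}
        # Many parameters share a shard, so deduplicate before sorting.
        shard_names = sorted(set(index["weight_map"].values()))
        missing = [
            n for n in shard_names
            if not os.path.isfile(os.path.join(checkpoint_dir, n))
        ]
        if missing:
            raise FileNotFoundError(f"Shards listed in {INDEX_NAME} are missing: {missing}")
        return [os.path.join(checkpoint_dir, n) for n in shard_names]

    # Fallback for checkpoints without an index (e.g. a single model.safetensors).
    return sorted(glob.glob(os.path.join(checkpoint_dir, "*.safetensors")))
```

With an index present, files in the directory that the `weight_map` does not reference are never returned, whatever their prefix. Failing loudly on missing shards, rather than silently falling back to the glob, is a design choice here: a partial download then surfaces as an error instead of a model with absent weights.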

<img width="1831" height="350" alt="Image" src="https://github.com/user-attachments/assets/5d736138-54fe-47b3-b260-cfb44490a0f2" />
