
[BUG] Model loading logic fails to filter weight shards correctly for models with non-standard naming (e.g., Mistral-Large-Instruct) #467

Description

@rubik-hua

The current weight loading mechanism in modeling_utils.py relies on a broad glob.glob("*.safetensors") search to identify weight files. This approach is brittle: it matches every file ending in .safetensors and fails when a checkpoint uses a non-standard naming convention for its shards.
For instance, the Mistral-Large-Instruct model names its shards consolidated-00001-of-00051.safetensors rather than the standard model-00001-of-00051.safetensors. Without a strict filter, the loader may process unrelated files or fail to locate the correct shard sequence.

The loader should prioritize the model.safetensors.index.json manifest file. This JSON contains a definitive weight_map that maps each parameter name to the shard file that stores it. By parsing this map, we can extract the exact set of required files, which makes loading robust regardless of the filename prefix. The glob method can be retained as a fallback for models that lack an index file.
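
A minimal sketch of the proposed lookup, to make the suggestion concrete. The function name `resolve_weight_files` and its `index_name` parameter are hypothetical and not taken from modeling_utils.py. The sketch assumes the index file sits next to the shards and that `weight_map` values are filenames relative to the model directory.

```python
import glob
import json
import os


def resolve_weight_files(model_dir, index_name="model.safetensors.index.json"):
    """Return the sorted list of shard paths to load from model_dir.

    Hypothetical helper: prefers the index manifest, falls back to glob.
    """
    index_path = os.path.join(model_dir, index_name)
    if os.path.isfile(index_path):
        with open(index_path, "r", encoding="utf-8") as f:
            weight_map = json.load(f)["weight_map"]
        # Many parameters map to the same shard, so deduplicate the filenames.
        shard_names = sorted(set(weight_map.values()))
        paths = [os.path.join(model_dir, name) for name in shard_names]
        # Fail early if the manifest references a shard that is not on disk.
        missing = [p for p in paths if not os.path.isfile(p)]
        if missing:
            raise FileNotFoundError(f"Shards listed in index but missing: {missing}")
        return paths
    # Fallback for checkpoints without an index file: the original broad search.
    return sorted(glob.glob(os.path.join(model_dir, "*.safetensors")))
```

With this approach, a directory that contains consolidated-*.safetensors shards plus other .safetensors files would load only the files named in the manifest, whatever their prefix.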

