Support I64 token mappings in safetensors model loading by maxprogrammer007 · Pull Request #41 · MinishLab/model2vec-rs

maxprogrammer007 · 2026-05-22T03:47:35Z

This PR fixes model loading for vocab-quantized models whose mapping tensor is stored as I64 in the safetensors header. The loader previously assumed all mappings were 32-bit integers and decoded the raw bytes with chunks_exact(4). For I64 data this splits each 8-byte value into two 4-byte halves, which produced incorrect token ID mappings for models like potion-code-16M.
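To illustrate the failure mode described above, here is a minimal standalone sketch. It is not the loader's actual code: `decode_as_i32` is a hypothetical helper that mimics the old assumption, and the token IDs are made up. It assumes the tensor bytes are little-endian, as in the safetensors format.

```rust
/// Hypothetical helper mimicking the old behaviour: treat every
/// 4 bytes as one little-endian 32-bit integer, whatever the real dtype is.
fn decode_as_i32(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() {
    // Three made-up token IDs, serialized as little-endian I64 (8 bytes each).
    let ids: [i64; 3] = [5, 7, 9];
    let bytes: Vec<u8> = ids.iter().flat_map(|v| v.to_le_bytes()).collect();

    // Reading the 24 bytes as I32 yields six entries instead of three:
    // each value is followed by its (zero) upper half.
    let wrong = decode_as_i32(&bytes);
    println!("{:?}", wrong); // prints [5, 0, 7, 0, 9, 0]
}
```

For small non-negative IDs the result is twice as long as expected, with a zero after every real entry, so every index past the first points at the wrong token.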

Changes

- Decode mapping based on t.dtype() instead of assuming I32
- Support both I32 and I64 mappings
- Return a clear error for unsupported mapping dtypes
- Add a regression test covering both I32 and I64 decoding
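The changes listed above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the code merged in this PR: `MappingDtype`, `DecodeError`, and this `decode_token_mapping` signature are hypothetical stand-ins (the real loader reads the dtype from the safetensors tensor via t.dtype()), and the bytes are assumed to be little-endian.

```rust
/// Hypothetical stand-in for the dtype reported in the safetensors header.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MappingDtype {
    I32,
    I64,
    F32, // example of a dtype that is not valid for a token mapping
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DecodeError {
    UnsupportedDtype(MappingDtype),
    Misaligned { len: usize, width: usize },
    Negative(i64),
}

/// Decode a token mapping by dispatching on the dtype instead of assuming I32.
fn decode_token_mapping(dtype: MappingDtype, bytes: &[u8]) -> Result<Vec<usize>, DecodeError> {
    // Element width in bytes; anything other than I32/I64 is rejected up front.
    let width = match dtype {
        MappingDtype::I32 => 4,
        MappingDtype::I64 => 8,
        other => return Err(DecodeError::UnsupportedDtype(other)),
    };
    if bytes.len() % width != 0 {
        return Err(DecodeError::Misaligned { len: bytes.len(), width });
    }
    bytes
        .chunks_exact(width)
        .map(|chunk| {
            // Widen both cases to i64 so the sign check is shared.
            let value = if width == 4 {
                i64::from(i32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
            } else {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(chunk);
                i64::from_le_bytes(buf)
            };
            if value < 0 {
                Err(DecodeError::Negative(value))
            } else {
                Ok(value as usize)
            }
        })
        .collect()
}

fn main() {
    // The same made-up IDs serialized with both supported dtypes.
    let i32_bytes: Vec<u8> = [5i32, 7, 9].iter().flat_map(|v| v.to_le_bytes()).collect();
    let i64_bytes: Vec<u8> = [5i64, 7, 9].iter().flat_map(|v| v.to_le_bytes()).collect();

    // Both decode to the same mapping.
    assert_eq!(decode_token_mapping(MappingDtype::I32, &i32_bytes), Ok(vec![5, 7, 9]));
    assert_eq!(decode_token_mapping(MappingDtype::I64, &i64_bytes), Ok(vec![5, 7, 9]));

    // Any other dtype is reported as an error rather than silently misread.
    assert_eq!(
        decode_token_mapping(MappingDtype::F32, &i32_bytes),
        Err(DecodeError::UnsupportedDtype(MappingDtype::F32))
    );
}
```

The assertions in `main` mirror the idea of the regression test: the same IDs serialized as I32 and as I64 should decode to identical mappings. The length and sign checks are additions made for this sketch and are not claimed to be part of the PR.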

Validation

cargo test decode_token_mapping --lib
cargo test quantized_models_match_float32

fixes #40

Pringled · 2026-05-23T10:19:47Z

@maxprogrammer007 thanks for fixing this. A small formatting issue caused CI to fail; I've fixed it on your branch. Will release this soon!

maxprogrammer007 and others added 2 commits May 22, 2026 09:15

Update model.rs

2eba86b

fix CI

eb6d985

Pringled merged commit 5d5d425 into MinishLab:main May 23, 2026
4 checks passed
