Release v2.1.0 — Pitfall #8: Tekken / Pixtral tokenizer skips the BOS · dgrauet/claude-skill-mlx-porting

Added

Pitfall #8: Tekken / Pixtral tokenizer skips the BOS. When wiring `mlx-lm` Mistral-family text encoders (Ministral3, Mistral Small 3, Pixtral) to a diffusion DiT, `add_special_tokens=True` does NOT auto-prepend the BOS token (`<s>`). Token 0 then enters the attention stack at an out-of-distribution magnitude and the error compounds layer by layer: in the ERNIE-Image port it diverges from HF transformers by 100× starting at layer 2. The content tokens remain fine, but the DiT receives conditioning it was not trained on.
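A rough sketch of how a layer-by-layer comparison like this can be set up. It is not the measurement script from the ERNIE-Image port: the function name `per_layer_norm_ratio` and the nested-list layout `[layer][token][dim]` are hypothetical, and it assumes the per-layer hidden states of both implementations have already been dumped to plain Python lists.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _l2_norm(vec: Vector) -> float:
    """Euclidean norm of one hidden-state vector."""
    return math.sqrt(sum(x * x for x in vec))


def per_layer_norm_ratio(
    ref_layers: Sequence[Sequence[Vector]],
    test_layers: Sequence[Sequence[Vector]],
    token_index: int = 0,
) -> List[float]:
    """For each layer, return ||test|| / ||ref|| for one token position.

    ref_layers / test_layers are indexed as [layer][token][dim]
    (hypothetical layout). A ratio near 1.0 means the two
    implementations agree in magnitude at that layer.
    """
    ratios = []
    for ref, test in zip(ref_layers, test_layers):
        ref_norm = _l2_norm(ref[token_index])
        test_norm = _l2_norm(test[token_index])
        # Guard against an all-zero reference vector.
        ratios.append(test_norm / max(ref_norm, 1e-12))
    return ratios


# Toy data: token 0 matches at layer 0 and is 100x too large at layer 1.
ref = [[[3.0, 4.0]], [[3.0, 4.0]]]
test = [[[3.0, 4.0]], [[300.0, 400.0]]]
ratios = per_layer_norm_ratio(ref, test, token_index=0)
```

Running the same comparison on a content-token index is a quick way to check that only token 0 is affected.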

The pitfall documents the symptom, the layer-by-layer measurement from the ERNIE-Image port that isolated it, and a one-line fix for the pipeline's `_tokenize` helper.
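A minimal sketch of the kind of fix involved, assuming the tokenizer has already returned a flat list of token ids. The helper name `ensure_bos` is hypothetical and is not the code from the skill; the BOS id is passed in explicitly rather than hard-coded.

```python
from typing import List, Sequence


def ensure_bos(token_ids: Sequence[int], bos_id: int) -> List[int]:
    """Return token_ids with bos_id as the first token.

    If the sequence already starts with BOS it is returned unchanged
    (as a new list), so calling this twice never duplicates the token.
    """
    if token_ids and token_ids[0] == bos_id:
        return list(token_ids)
    return [bos_id] + list(token_ids)


# Toy ids for illustration only; 1 stands in for the BOS id here.
fixed = ensure_bos([1234, 5678], bos_id=1)
unchanged = ensure_bos([1, 1234, 5678], bos_id=1)
```

Inside a `_tokenize` helper this would typically be applied right after encoding, with `bos_id` read from the tokenizer (Hugging Face tokenizers expose it as `bos_token_id`). If the pipeline also builds an attention mask, the mask needs to grow by one position to match.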

Why this matters

Pitfall #7 (checkerboard trap, shipped in v2.0.0) gave us the diagnostic procedure. Pitfall #8 is a follow-up trap that the same port surfaced. Both are now codified so future MLX diffusion ports using Mistral-family text encoders won't have to rediscover them.

Skill asset

The `mlx-porting.skill` artifact below is built by the release workflow (introduced in v2.0.0) and can be dropped straight into Claude Code via `/skill install` or unpacked into `~/.claude/skills/`.

Commits

`2bad819` — feat(pitfalls): add #8 — Tekken/Pixtral tokenizer skips BOS
