Added
- Pitfall #8: Tekken / Pixtral tokenizer skips the BOS. When wiring `mlx-lm` Mistral-family text encoders (Ministral3, Mistral Small 3, Pixtral) to a diffusion DiT, `add_special_tokens=True` does NOT auto-prepend the BOS token. Without it, token 0 enters the attention stack at an out-of-distribution magnitude and the error compounds layer by layer (in the ERNIE-Image port it diverged from HF transformers by 100× starting at layer 2). The content tokens themselves remain fine, but the DiT receives conditioning it was not trained on.
The pitfall documents the symptom, the layer-by-layer measurement from the ERNIE-Image port that isolated it, and a one-line fix for the pipeline's `_tokenize` helper.
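As a rough illustration of what such a fix can look like, here is a minimal sketch of a guard that prepends the BOS id only when it is missing. The helper name `ensure_bos` is hypothetical and is not taken from the pitfall text. The commented usage line also assumes the tokenizer exposes `encode` and `bos_token_id` in the Hugging Face style, which should be checked against the actual pipeline.

```python
from typing import List


def ensure_bos(token_ids: List[int], bos_id: int) -> List[int]:
    """Return a copy of token_ids that starts with bos_id.

    Hypothetical helper: prepends bos_id only if it is not already
    at position 0, so calling it twice never yields a double BOS.
    """
    if token_ids and token_ids[0] == bos_id:
        return list(token_ids)
    return [bos_id] + list(token_ids)


# Assumed usage inside the pipeline's `_tokenize` helper (names are
# assumptions, not the project's actual code):
#   ids = ensure_bos(tokenizer.encode(prompt), tokenizer.bos_token_id)
```

Guarding on the first id rather than prepending unconditionally keeps the helper safe to use if a later tokenizer version starts adding the BOS itself.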
Why this matters
Pitfall #7 (checkerboard trap, shipped in v2.0.0) gave us the diagnostic procedure. Pitfall #8 is a follow-up trap that the same port surfaced. Both are now codified so future MLX diffusion ports using Mistral-family text encoders won't have to rediscover them.
Skill asset
The `mlx-porting.skill` artifact below is built by the release workflow (introduced in v2.0.0) and can be dropped straight into Claude Code via `/skill install` or unpacked into `~/.claude/skills/`.
Commits
- `2bad819` — feat(pitfalls): add #8 — Tekken/Pixtral tokenizer skips BOS