Releases: dgrauet/claude-skill-mlx-porting
v2.1.0 — Pitfall #8: Tekken / Pixtral tokenizer skips the BOS
Added
- Pitfall #8 — Tekken / Pixtral tokenizer skips the BOS. When wiring `mlx-lm` Mistral-family text encoders (Ministral3, Mistral Small 3, Pixtral) to a diffusion DiT, `add_special_tokens=True` does NOT auto-prepend `
`. Token 0 then enters the attention stack at an out-of-distribution magnitude and compounds layer-by-layer (diverges from HF transformers by 100× starting at layer 2 in the ERNIE-Image burn). The content tokens remain fine but the DiT receives conditioning it was not trained on.
The pitfall documents the symptom, the layer-by-layer measurement from the ERNIE-Image port that isolated it, and a one-line fix for the pipeline's `_tokenize` helper.
Why this matters
Pitfall #7 (checkerboard trap, shipped in v2.0.0) gave us the diagnostic procedure. Pitfall #8 is a follow-up trap that the same port surfaced. Both are now codified so future MLX diffusion ports using Mistral-family text encoders won't have to rediscover them.
Skill asset
The `mlx-porting.skill` artifact below is built by the release workflow (introduced in v2.0.0) and can be dropped straight into Claude Code via `/skill install` or unpacked into `~/.claude/skills/`.
Commits
- `2bad819` — feat(pitfalls): add #8 — Tekken/Pixtral tokenizer skips BOS
v2.0.0 — Rename to mlx-porting + checkerboard pitfall + CI
Breaking
- Skill directory renamed
porting-pytorch-to-mlx/→mlx-porting/. Same move inside the packaged.skilltarball. - Frontmatter
nameupdated accordingly:mlx-porting. - Existing v1.0.0 installs (the
~/.claude/skills/porting-pytorch-to-mlx/layout) will keep working but will NOT receive these updates — reinstall from source or download the newmlx-porting.skillartifact below.
Added
- Pitfall #7 — The checkerboard trap in
references/common-pitfalls.md. Covers the four recurring causes (mx.tilevsmx.repeat, pixel-shuffle axis order, text-encoderhidden_states[-2]off-by-one, scheduler dtype leaking fp32 into a bf16 DiT) and a three-test diagnostic procedure to run before shipping every port. - SKILL.md upgrades: new reading-time checklist bullet flagging the checkerboard trap, plus a caveat in Step 5 that small-scale random-weight parity is necessary but insufficient.
- Helpers in
scripts/parity_helpers.py: `detect_checkerboard(image)` (autocorrelation-based) and `noise_decode_check(decode_fn, shape)` to wire the diagnostic as a permanent smoke test. - GitHub Actions:
- `ci.yml` validates frontmatter, `evals.json` schema, cross-references, and python syntax on every push and PR, plus a `.skill` packaging smoke test.
- `release.yml` auto-builds `mlx-porting.skill` on tag push and attaches it to the release (this release is the first to use it).
- CI badge on the README.
Why this matters
Every MLX port I've shipped has hit a checkerboard-looking output at some point because layer-level parity passes with random weights at small scale but the bug only manifests at production scale. This release codifies the fix: the 3-test diagnostic catches 95% of the class in under 90 seconds, and the rule "never tweak sampling parameters to mask a spatial-operator bug" is now part of the checklist.
Asset
`mlx-porting.skill` below is produced by the new release workflow. Drop it into Claude Code via `/skill install mlx-porting.skill` or unpack under `~/.claude/skills/`.
v1.0.0 — Initial release
First public release of the porting-pytorch-to-mlx Claude Code skill.
Install
Download porting-pytorch-to-mlx.skill and drop it into Claude Code, or clone the repo and copy the source:
```bash
git clone https://github.com/dgrauet/claude-skill-mlx-porting.git
cp -r claude-skill-mlx-porting/porting-pytorch-to-mlx ~/.claude/skills/
```
What's included
- SKILL.md — 7-step porting workflow + six reading-time traps
- 6 reference files — MLX docs, common pitfalls, attention patterns, weight conversion, parity testing, repo layout
- scripts/parity_helpers.py — reusable PyTorch↔MLX helpers
- evals/evals.json — 5 representative test cases
Measured performance
- Triggering accuracy: 100% (precision + recall = 1.0 across 20 queries)
- Pass-rate lift vs baseline Opus 4.7: +10 to +25 percentage points on workflow-intensive tasks
See the README for full details.