Releases: dgrauet/claude-skill-mlx-porting

v2.1.0 — Pitfall #8: Tekken / Pixtral tokenizer skips the BOS

20 Apr 12:00

Added

  • Pitfall #8: Tekken / Pixtral tokenizer skips the BOS. When wiring `mlx-lm` Mistral-family text encoders (Ministral3, Mistral Small 3, Pixtral) to a diffusion DiT, `add_special_tokens=True` does NOT auto-prepend the BOS token `<s>`. The token at position 0 then enters the attention stack at an out-of-distribution magnitude, and the error compounds layer by layer (in the ERNIE-Image port, hidden states diverged from HF transformers by 100× starting at layer 2). The content tokens remain fine, but the DiT receives conditioning it was not trained on.

The pitfall documents the symptom, the layer-by-layer measurement from the ERNIE-Image port that isolated it, and a one-line fix for the pipeline's `_tokenize` helper.
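The shape of such a fix can be sketched as follows. This is a minimal illustration rather than the code shipped in the skill: `ensure_bos` is a hypothetical helper, the token ids are made up, and the real BOS id should be read from the tokenizer in use instead of being hard-coded.

```python
from typing import List


def ensure_bos(token_ids: List[int], bos_id: int) -> List[int]:
    """Return a copy of token_ids that starts with bos_id exactly once.

    Hypothetical helper: guards against a tokenizer that silently
    omits the BOS token even when special tokens are requested.
    """
    if token_ids and token_ids[0] == bos_id:
        # BOS already present, so do not add a second one.
        return list(token_ids)
    return [bos_id] + list(token_ids)


# Illustrative ids only; in a real pipeline, take the BOS id from the
# tokenizer (for example its `bos_token_id` attribute, if it exposes one).
BOS_ID = 1

print(ensure_bos([1234, 5678], BOS_ID))     # [1, 1234, 5678]
print(ensure_bos([1, 1234, 5678], BOS_ID))  # [1, 1234, 5678]
```

Checking before prepending keeps the helper idempotent, so it stays safe if a later tokenizer release starts emitting the BOS on its own.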

Why this matters

Pitfall #7 (checkerboard trap, shipped in v2.0.0) gave us the diagnostic procedure. Pitfall #8 is a follow-up trap that the same port surfaced. Both are now codified so future MLX diffusion ports using Mistral-family text encoders won't have to rediscover them.

Skill asset

The `mlx-porting.skill` artifact below is built by the release workflow (introduced in v2.0.0) and can be dropped straight into Claude Code via `/skill install` or unpacked into `~/.claude/skills/`.

Commits

  • `2bad819` — feat(pitfalls): add #8 — Tekken/Pixtral tokenizer skips BOS

v2.0.0 — Rename to mlx-porting + checkerboard pitfall + CI

20 Apr 00:23

Breaking

  • Skill directory renamed porting-pytorch-to-mlx/ → mlx-porting/. The same rename applies inside the packaged .skill tarball.
  • Frontmatter name updated accordingly: mlx-porting.
  • Existing v1.0.0 installs (the ~/.claude/skills/porting-pytorch-to-mlx/ layout) will keep working but will NOT receive these updates — reinstall from source or download the new mlx-porting.skill artifact below.

Added

  • Pitfall #7 — The checkerboard trap in references/common-pitfalls.md. Covers the four recurring causes (mx.tile vs mx.repeat, pixel-shuffle axis order, text-encoder hidden_states[-2] off-by-one, scheduler dtype leaking fp32 into a bf16 DiT) and a three-test diagnostic procedure to run before shipping every port.
  • SKILL.md upgrades: new reading-time checklist bullet flagging the checkerboard trap, plus a caveat in Step 5 that small-scale random-weight parity is necessary but insufficient.
  • Helpers in scripts/parity_helpers.py: `detect_checkerboard(image)` (autocorrelation-based) and `noise_decode_check(decode_fn, shape)` to wire the diagnostic as a permanent smoke test.
  • GitHub Actions:
    • `ci.yml` validates frontmatter, `evals.json` schema, cross-references, and python syntax on every push and PR, plus a `.skill` packaging smoke test.
    • `release.yml` auto-builds `mlx-porting.skill` on tag push and attaches it to the release (this release is the first to use it).
  • CI badge on the README.
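The first cause listed under Pitfall #7, `mx.tile` vs `mx.repeat`, is easy to miss because both operations return the same shape with different element order. The sketch below uses plain Python lists to show the two semantics. `repeat_each` and `tile_whole` are hypothetical stand-ins, not MLX calls. One plausible route into the trap (an assumption, since the release notes do not spell it out) is that PyTorch's `Tensor.repeat` tiles the whole tensor, so translating it by name to a repeat-style call changes the layout.

```python
from typing import List, Sequence


def repeat_each(values: Sequence[int], n: int) -> List[int]:
    """Repeat semantics: each element is duplicated in place."""
    return [v for v in values for _ in range(n)]


def tile_whole(values: Sequence[int], n: int) -> List[int]:
    """Tile semantics: the whole sequence is concatenated n times."""
    return list(values) * n


row = [1, 2, 3]
print(repeat_each(row, 2))  # [1, 1, 2, 2, 3, 3]
print(tile_whole(row, 2))   # [1, 2, 3, 1, 2, 3]

# Same length, different order: a shape-only check cannot tell them apart.
assert len(repeat_each(row, 2)) == len(tile_whole(row, 2))
assert repeat_each(row, 2) != tile_whole(row, 2)
```

Because the lengths match, a shape assertion passes in both cases; only a value-level comparison against the reference implementation exposes the swap.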

Why this matters

Every MLX port I've shipped has hit a checkerboard-looking output at some point because layer-level parity passes with random weights at small scale but the bug only manifests at production scale. This release codifies the fix: the 3-test diagnostic catches 95% of the class in under 90 seconds, and the rule "never tweak sampling parameters to mask a spatial-operator bug" is now part of the checklist.

Asset

`mlx-porting.skill` below is produced by the new release workflow. Drop it into Claude Code via `/skill install mlx-porting.skill` or unpack under `~/.claude/skills/`.

v1.0.0 — Initial release

19 Apr 16:51

First public release of the porting-pytorch-to-mlx Claude Code skill.

Install

Download porting-pytorch-to-mlx.skill and drop it into Claude Code, or clone the repo and copy the source:

```bash
git clone https://github.com/dgrauet/claude-skill-mlx-porting.git
cp -r claude-skill-mlx-porting/porting-pytorch-to-mlx ~/.claude/skills/
```

What's included

  • SKILL.md — 7-step porting workflow + six reading-time traps
  • 6 reference files — MLX docs, common pitfalls, attention patterns, weight conversion, parity testing, repo layout
  • scripts/parity_helpers.py — reusable PyTorch↔MLX helpers
  • evals/evals.json — 5 representative test cases

Measured performance

  • Triggering accuracy: 100% (precision and recall both 1.0 across 20 queries)
  • Pass-rate lift vs baseline Opus 4.7: +10 to +25 percentage points on workflow-intensive tasks

See the README for full details.