
chore(dev→main): routine PR cadence for CodeRabbit review#3

Merged
Snider merged 89 commits into main from dev
Apr 27, 2026

Conversation

@Snider (Contributor) commented Apr 27, 2026

Routine dev→main PR opened by Hephaestus PR-cadence task.

This PR exists to:

  1. Trigger CodeRabbit auto-review of accumulated dev work
  2. Surface any blocking review feedback before merge
  3. Keep main current with dev once approved

ahead_by: 86 commit(s) (per gh api compare)

If CodeRabbit clears with no blocking comments, this PR is eligible for gh pr merge --merge (real merge commit, no force-push). Conflicts and review feedback should be addressed on the dev branch before merge.

Co-authored-by: Hephaestus <hephaestus@cladius>

Summary by CodeRabbit

  • New Features

    • GGUF checkpoint support, Gemma 4 (text + vision), frame-oriented compute sessions with Metal kernels, LoRA training and in-place merge, direct root API for model load/generate.
  • Module Changes

    • Module path moved to dappco.re/go/mlx; native libraries now integrated as git submodules.
  • Improvements

    • CPU/GPU device selection with CPU fallback, improved memory/cache lifecycle, sampler chain refinements (MinP), build/docs updated to require submodule init.

Snider and others added 30 commits April 7, 2026 16:48
Replace all 22 forge.lthn.ai/core/go-{inference,io,log} imports with
dappco.re/go/core/{inference,io,log}. Bump deps: inference v0.3.0,
io v0.4.1, log v0.1.2. Regenerate go.sum.

Co-Authored-By: Virgil <virgil@lethean.io>
…re primitives

Replaced fmt, strings, sort, os, io, sync, encoding/json, path/filepath,
errors, log, reflect with core.Sprintf, core.E, core.Contains, core.Trim,
core.Split, core.Join, core.JoinPath, slices.Sort, c.Fs(), c.Lock(),
core.JSONMarshal, core.ReadAll and other CoreGO v0.8.0 primitives.

Framework boundary exceptions preserved where stdlib types are required
by external interfaces (Gin, net/http, CGo, Wails, bubbletea).

Co-Authored-By: Virgil <virgil@lethean.io>
- internal/metal/gemma4.go: config + model for gemma4 and gemma4_text —
  wrapper weight sanitisation, dense and MoE decoder execution, mixed
  head sizes (256/512), proportional RoPE per attention type, K=V full
  attention, V RMSNormNoScale, shared-KV reuse, per-layer input gating,
  cache allocation, logit softcapping, LoRA attachment
- fast.go / ops.go / nn.go: low-level ops required by the backend
  (gather_qmm, primitive extensions)
- model.go / generate.go / close.go: wire model loading, chat
  formatting, metadata and cleanup for the new arch
- gemma4_test.go: covers config, weight sanitisation, and the
  compile-only surfaces (runtime Metal allocation gated behind
  GO_MLX_RUN_METAL_TESTS=1)

Co-Authored-By: Virgil <virgil@lethean.io>
- gemma4.go: attention scales by effective per-layer head size; MoE
  configs default to 128 experts / top-8 when MoE enabled; router
  fallback handling hardened
- lora.go: adapter resolution understands Gemma 4 targets so
  WithAdapterPath(...) can resolve Gemma 4 projections
- optim.go + training.go: NewAdamW accepts either a learning rate or
  AdamWConfig (RFC API shape)
- tokenizer.go: exposes BOS, EOS, TokenID, IDToken
- tests cover all new paths (Good triads)
- docs/*: README and docs refreshed — Gemma 4 mention, current
  dappco.re/... module path

Co-Authored-By: Virgil <virgil@lethean.io>
- gemma4.go: router inputs now use pre_feedforward_layernorm_2 (the
  MoE normalised path) instead of the dense path; matches Gemma 4
  architecture where router and experts branch from a normalised
  hidden state
- internal/metal/training.go: exposes internal tokenizer from
  metal.Model so higher layers can tokenise without reaching through
  unexported fields
- register_metal.go / training.go / mlx_stub.go: nomlx build-tag
  behaviour tightened; stub paths now fail cleanly with the same
  error shape as the metal path
- tests: attention_test + mlx_test updated for the new behaviour

Co-Authored-By: Virgil <virgil@lethean.io>
- lora.go: LoRAConfig accepts RFC aliases (Scale, TargetLayers, Lambda);
  Batch and TrainConfig exported; LoRAAdapter.Step(batch, targets,
  optimizer) runs padded masked cross-entropy training step through the
  existing autodiff/AdamW path
- gemma3.go / gemma4.go / qwen3.go: adapters now bind back to their
  model instances so loss computation sees the adapted projections
- api_common.go / api_darwin.go / api_stub.go: RFC-facing API surface
  stitched together with nomlx fallback
- training.go: exported types for consumer packages
- lora_test.go: alias + batch length coverage

Remaining RFC gap: GGUF loading (repo still safetensors-only).

Co-Authored-By: Virgil <virgil@lethean.io>
- GGUF support wired in: model_files.go accepts safetensors or GGUF,
  gguf.go + gguf_bridge.cpp bridge to MLX, CMake + mlx_build_config.h
  enable GGUF on macOS, vendor lib/gguflib/ added for the read path
- lora_merge.go: real in-place merge/dequantize/materialize flow
  (replaces the stub path), surfaced via api_darwin.go
- sample.go: sampler chain reordered to match RFC
- training_stub.go + api_stub.go: nomlx fallback for non-darwin
- Tests: sample_test chain order, gemma4_test GGUF load, api_test
  for generate options + buffered output

Submodule pointer (lib/mlx no_gguf.cpp guard) intentionally left out —
separate commit after verification.

Co-Authored-By: Virgil <virgil@lethean.io>
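The sampler-chain reorder above brings Min-P into the pipeline. As a hedged sketch of the filtering rule only (the function name and plain-slice representation are illustrative — the real sample.go operates on MLX arrays): a token survives when its probability is at least minP times the top token's probability.

```go
package main

import (
	"fmt"
	"math"
)

// minPFilter is an illustrative sketch of Min-P sampling: keep only
// tokens whose probability is at least minP times the probability of
// the most likely token. Returns the surviving token indices.
func minPFilter(probs []float64, minP float64) []int {
	maxProb := math.Inf(-1)
	for _, p := range probs {
		if p > maxProb {
			maxProb = p
		}
	}
	threshold := minP * maxProb
	var kept []int
	for i, p := range probs {
		if p >= threshold {
			kept = append(kept, i)
		}
	}
	return kept
}

func main() {
	probs := []float64{0.5, 0.3, 0.15, 0.05}
	// With minP = 0.2 the threshold is 0.2 * 0.5 = 0.1, so the last
	// token (p = 0.05) is filtered out before sampling.
	fmt.Println(minPFilter(probs, 0.2))
}
```

In a real chain this filter runs before temperature/softmax re-normalisation over the surviving set, which is why its position in the sampler order matters.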
- gemma4.go: parser accepts nested text_config.quantization and nested
  layer_types (multimodal Gemma 4 releases wrap these under text_config)
- gemma4.go: weight sanitisation canonicalises language_model.*
  wrapper prefixes BEFORE stripping multimodal weights and splitting
  experts.gate_up_proj — upstream HF checkpoints ship with this prefix
- gemma4_test.go: regression coverage for both paths

Co-Authored-By: Virgil <virgil@lethean.io>
- model.go loadModel falls back to nested text_config.model_type and
  architectures when top-level model_type is absent — matches
  multimodal wrapper checkpoints where the LM config is nested
- gemma4.go LoRA application now also covers router.proj,
  per_layer_input_gate, per_layer_projection (matching the existing
  adapter resolution surface)
- regression tests for both paths

Co-Authored-By: Virgil <virgil@lethean.io>
Pass #8 — 12 files, ~236/+47 lines. Implementation notes in the
codex log at /tmp (not persisted). Build green on compile-only pass;
runtime metal tests deferred to on-device runs.

Co-Authored-By: Virgil <virgil@lethean.io>
- MetalAvailable() and backend.Available() now use the real MLX
  runtime probe instead of always returning true; model loading now
  fails cleanly when Metal isn't actually available
- training_stub.go: non-Metal builds now expose the full training /
  array / LoRA / autodiff API surface (was partial) so cross-platform
  consumers get a consistent type surface
- unsupported_stub_test.go: compile-check ensures non-Metal builds
  present the same package API
- internal/metal/device.go and register_metal_stub.go: adjoining
  stubs for the probe/stub paths

Co-Authored-By: Virgil <virgil@lethean.io>
- gemma4.go: experts.gate_up_proj sanitisation now splits along the
  correct feature axis for 2D expert tensors too, fixing
  bias/scales handling during weight remapping
- gemma4.go: MoE decoder path matches RFC branch structure — each
  branch normalised independently and summed into residual
  directly, no extra combined post-feedforward norm
- gemma4_test.go: regression for 2D gate_up_proj.biases split

Co-Authored-By: Virgil <virgil@lethean.io>
- api_darwin.go: (*Model).Close() now clears the model and tokenizer
  handles before returning — close is idempotent, prevents reuse of
  freed state
- api_test.go: regression for repeated close + close-error behaviour
- mlxlm/bridge.py: Python subprocess bridge accepts min_p and passes
  it through to mlx_lm.sample_utils.make_sampler

Min-P not yet wired through the metal adapter — dappco.re/go/core/
inference doesn't expose WithMinP, deferred until the inference iface
adds it.

Co-Authored-By: Virgil <virgil@lethean.io>
- internal/metal/generate.go: metal.ModelInfo gains ContextLength;
  derived from loaded model config, explicit load override wins
- internal/metal/generate.go: Close() now clears the MLX allocator
  cache after freeing weights — matches RFC memory-release contract
- api_darwin.go: ModelInfo returns effective context length from
  native info instead of echoing only the caller's override
- api_test.go: covers both override and native fallback

Co-Authored-By: Virgil <virgil@lethean.io>
- gemma4.go parseGemma4Config: num_kv_shared_layers defaults to RFC
  value 20 when the field is omitted; explicit 0 still honoured
- gemma4.go parseGemma4Config: partial rope_parameters now merged
  with RFC defaults for full_attention and sliding_attention
  branches, instead of leaving missing fields at zero
- gemma4_test.go: default shared-KV, explicit zero override, partial
  RoPE merging coverage

Co-Authored-By: Virgil <virgil@lethean.io>
- api_darwin.go: Model.Info() now reports the requested context length
  (from WithContextLength) rather than only the model's native max
- options_darwin.go (new): centralised generation option translation
- register_metal.go: inference backend forwards generation options
  consistently across Generate, Chat, Classify, and BatchGenerate via
  the new options layer
- api_test.go: coverage for the translation

Co-Authored-By: Virgil <virgil@lethean.io>
Cache ownership is now planned explicitly with a CacheIndexByLayer
map: a shared layer that is the first instance of its attention type
is promoted to a real KV owner and gets a stable cache slot. NewCache()
allocates from the ownership plan instead of assuming every KV owner
lives in a simple prefix.

- gemma4.go: CacheIndexByLayer, BuildCacheLayout, promoted-owner path
- gemma4_test.go: NewCache_PromotedOwner Good coverage while keeping
  the normal shared-layer behaviour intact

Co-Authored-By: Virgil <virgil@lethean.io>
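The ownership planning above can be sketched in miniature. This is a hypothetical model of CacheIndexByLayer, not the real gemma4.go API: each layer either owns a KV cache slot or points at the slot of an earlier layer with the same attention type, and the first shared instance of a type is promoted to a real owner.

```go
package main

import "fmt"

// layerSpec is an illustrative stand-in for the per-layer attention
// metadata the planner consumes.
type layerSpec struct {
	attnType string // e.g. "full" or "sliding"
	shared   bool   // true if this layer reuses an earlier layer's KV
}

// buildCacheLayout assigns a stable cache slot per layer. A shared
// layer that is the first of its attention type is promoted to an
// owner; later shared layers of that type map to the promoted slot.
func buildCacheLayout(layers []layerSpec) map[int]int {
	idx := map[int]int{}
	ownerByType := map[string]int{} // attnType -> owning cache slot
	next := 0
	for i, l := range layers {
		if owner, ok := ownerByType[l.attnType]; ok && l.shared {
			idx[i] = owner // reuse the existing owner's slot
			continue
		}
		// Non-shared layer, or first shared instance: becomes an owner.
		idx[i] = next
		ownerByType[l.attnType] = next
		next++
	}
	return idx
}

func main() {
	layers := []layerSpec{
		{"full", false}, {"sliding", true}, {"sliding", true}, {"full", true},
	}
	fmt.Println(buildCacheLayout(layers))
}
```

The point of planning up front is that NewCache() can size and index the cache from the map instead of assuming KV owners occupy a simple prefix of the layer list.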
- gemma4.go: when tie_word_embeddings=false and lm_head.weight is
  missing, loader errors instead of masking the problem by reusing
  embed_tokens weights. Dedicated output-projection helper builds the
  untied path.
- LoadGemma4 wires the helper in
- gemma4_test.go: regression for missing-lm_head error behaviour

Co-Authored-By: Virgil <virgil@lethean.io>
Weight sanitisation now frees discarded raw tensors immediately when
stripping multimodal wrapper weights, splitting experts.gate_up_proj,
or replacing canonical duplicates. Keeps load-time memory bounded
instead of waiting for Go GC — matters for large Gemma 4 wrappers.

- gemma4.go: sanitizeWeights now calls .Free() on discarded tensors
  in the three spots where originals are superseded
- gemma4_test.go: regression verifying the canonical path

Co-Authored-By: Virgil <virgil@lethean.io>
- lora.go: LoRAAdapter.Step now applies adapter-weight regularisation
  via LoRAConfig.Lambda — previously the field was accepted but
  ignored, training proceeded as if Lambda=0
- lora.go: free intermediate logits handle inside loss closure to
  avoid holding extra graph memory during training
- lora_test.go: Regularization + Step_AppliesLambdaRegularization
  triads (Metal-runtime ones gated behind GO_MLX_RUN_METAL_TESTS)

Co-Authored-By: Virgil <virgil@lethean.io>
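The Lambda fix above amounts to adding a weight-decay-style penalty to the training loss. A minimal sketch, assuming an L2 penalty on the adapter weights (the real Step works on MLX arrays through the autodiff path; the scalar formulation here is illustrative):

```go
package main

import "fmt"

// regularizedLoss sketches the loss LoRAAdapter.Step now minimises:
// cross-entropy plus lambda times the squared L2 norm of the adapter
// weights. With lambda = 0 it degenerates to the old (unregularised)
// behaviour the commit describes.
func regularizedLoss(crossEntropy float64, adapterWeights [][]float64, lambda float64) float64 {
	penalty := 0.0
	for _, row := range adapterWeights {
		for _, w := range row {
			penalty += w * w
		}
	}
	return crossEntropy + lambda*penalty
}

func main() {
	weights := [][]float64{{1, 2}} // squared norm = 5
	fmt.Println(regularizedLoss(1.0, weights, 0.5))
}
```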
…ayers

InspectAttention assumed cache_index == layer_index, which was wrong
for Gemma 4 shared layers and promoted owners (the cache layout uses
CacheIndexByLayer to map). Inspection results were attributed to the
wrong cache slot for those layers.

- generate.go: resolve each layer to its actual cache owner via the
  layout map before inspecting
- generate_test.go (new): regression for the shared/promoted resolution

Co-Authored-By: Virgil <virgil@lethean.io>
…en match

- tokenizer.go: SentencePiece encode now inserts the ▁ boundary marker
  for every word, not just at the start of the whole segment. Round-trip
  for multi-word prompts like "hello hello" now produces stable tokens.
- tokenizer.go: special-token matching is deterministic and longest-
  first — overlapping tokens (e.g. <|im_start|> vs <|im_>) resolve to
  the longer match consistently
- options_darwin.go: backend option bridge forward-compat for a future
  inference.GenerateConfig.MinP field — MinP propagates if/when the
  shared inference iface exposes it
- tokenizer_test.go: per-word boundary + special-token longest-first
  regressions

Co-Authored-By: Virgil <virgil@lethean.io>
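The longest-first matching rule above is easy to get wrong when overlapping special tokens are stored in map order. A hedged sketch (the real tokenizer.go uses core.* string helpers; strings.HasPrefix stands in here):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchSpecial returns the special token that prefixes text, always
// preferring the longest candidate so overlapping tokens such as
// <|im_start|> and <|im_ resolve deterministically.
func matchSpecial(text string, specials []string) (string, bool) {
	// Sort a copy longest-first; ties keep a stable order.
	sorted := append([]string(nil), specials...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return len(sorted[i]) > len(sorted[j])
	})
	for _, s := range sorted {
		if strings.HasPrefix(text, s) {
			return s, true
		}
	}
	return "", false
}

func main() {
	specials := []string{"<|im_", "<|im_start|>"}
	tok, ok := matchSpecial("<|im_start|>user", specials)
	fmt.Println(tok, ok)
}
```

Sorting once up front makes the match independent of declaration order, which is the determinism property the commit calls out.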
Three substantive additions matching the go-ml consumer contract:

- adapter.go: Message, GenOpts, Result, TokenCallback, InferenceAdapter
  with NewInferenceAdapter + NewMLXBackend. Buffered Generate/Chat,
  streaming callbacks, model access, close, attention passthrough
- gguf_info.go: pure-Go GGUF inspection + discovery — GGUFInfo,
  DiscoveredModel, ReadGGUFInfo, DiscoverModels. Parses GGUF headers/
  metadata directly, infers quant bits from tensor types, falls back
  to nearby config.json, returns loadable paths for safetensors dirs
  and .gguf files
- medium.go + api_common.go + api_darwin.go: WithMedium option +
  LoadModelFromMedium. Models staged into temp dir before native
  load; staged copy cleaned up on Close
- adapter_test, gguf_info_test, api_test: coverage for the new
  surface; unsupported_stub_test gets the new exports too

Closes the ScoreExact / ReadGGUFInfo / DiscoverModels MVP gaps from
sprint-beta1-plan.md. HTTP / llama / scoring / orchestrator still
belong in core/go-ml.

Co-Authored-By: Virgil <virgil@lethean.io>
- lora.go: Save now writes weights plus adjacent adapter_config.json;
  load derives rank, alpha, num_layers, and target layer paths from
  the saved adapter so it round-trips through the existing applyLoaded
  LoRA loader
- lora_test.go: config emission, directory saves, save→reload round-
  trip through applyLoadedLoRA

Co-Authored-By: Virgil <virgil@lethean.io>
- api_darwin.go: Encode strips the implicit BOS token; TokenID falls
  back from plain text ("hello") to a single-token encoded lookup
  when the internal vocab stores SentencePiece forms ("▁hello");
  IDToken returns decoded text where possible
- api_tokenizer_test.go (new): implicit-BOS stripping, explicit
  special-token preservation, normalised SentencePiece lookups

Co-Authored-By: Virgil <virgil@lethean.io>
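The implicit-BOS stripping above reduces to a small guard. A hypothetical sketch (the hasBOS flag mirrors the presence flag a later commit adds; names are illustrative, not the real api_darwin.go surface):

```go
package main

import "fmt"

// stripImplicitBOS removes a leading BOS token that the native
// tokenizer prepends on Encode, so root-API callers see only the
// prompt tokens. When the tokenizer never declared a BOS (hasBOS
// false), a matching leading ID is an ordinary token and is kept.
func stripImplicitBOS(ids []int, bosID int, hasBOS bool) []int {
	if hasBOS && len(ids) > 0 && ids[0] == bosID {
		return ids[1:]
	}
	return ids
}

func main() {
	fmt.Println(stripImplicitBOS([]int{2, 17, 42}, 2, true))
	fmt.Println(stripImplicitBOS([]int{2, 17, 42}, 2, false))
}
```

The hasBOS guard is the same idea the later "phantom token ID 0" fix applies: presence of a special token is a declared property, never inferred from an ID value.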
- internal/metal/*.go guarded with darwin && arm64 && !nomlx build constraint
  — nomlx now excludes the native backend module-wide, not just the root-package stubs
- gemma4.go: explicit VNorm RMSNormModule state on Gemma4Attention per spec

Verified: go test . ./mlxlm (native) + go test -tags nomlx ./... (exclusion)

Co-Authored-By: Virgil <virgil@lethean.io>
- mlxlm/backend.go: forward min_p for Generate + Chat via reflection-based
  field lookup (same pattern as native backend) — stays compatible with older
  go/inference versions

Verified: GOWORK=off go test . ./mlxlm passes

Co-Authored-By: Virgil <virgil@lethean.io>
- tokenizer.go: hasBOS/hasEOS flags — token ID 0 no longer treated as phantom
  special token when tokenizer never declared BOS/EOS
- api_darwin.go: root-tokenizer BOS stripping respects presence flag
- generate.go + batch.go: EOS stop checks respect presence flag
- tests: regression coverage for no-special-tokens + BOSEOS round-trip

Verified: targeted go test runs pass (internal/metal + root tokenizer)

Co-Authored-By: Virgil <virgil@lethean.io>
- internal/tokenizer/: standalone pure-Go tokenizer (SentencePiece-style)
- tokenizer_common.go: shared root wrapper — BOS/EOS accessors, Encode/Decode
- api_tokenizer_darwin.go: native MLX loader (darwin && arm64 && !nomlx)
- api_tokenizer_stub.go: pure-Go fallback for nomlx / non-arm64 / non-darwin
- api_tokenizer_test.go: widened to run on all build targets

Per RFC: tokenizer must work on all platforms, not just native MLX path.
CI/cross-compile/ROCm builds now have tokenizer support.

Verified: GOWORK=off go test -tags nomlx ./... passes.
Native path not re-verified (pulls full MLX C++ build).

Co-Authored-By: Virgil <virgil@lethean.io>
- gemma4.go: dense and expert branches combined and then passed through
  post_feedforward_layernorm before the residual add (was applying layernorm
  before combine, mismatching spec)
- gemma4_test.go: MoE block coverage + per-layer input loading/forward execution

Verified: go test ./internal/metal -run 'Gemma4|AttentionCacheIndexByLayer' + go test . passed

Co-Authored-By: Virgil <virgil@lethean.io>
Snider and others added 18 commits April 15, 2026 04:22
Adds tests/cli/mlx/Taskfile.yaml with canonical build/test/vet/default
targets plus go-mlx-specific subtasks (build-mlx, build-stub, test-mlx,
mlxlm-build) covering the CGO Metal backend, stub build, and mlxlm
subprocess backend.

default deps-chains build/test/vet per Wave 2 convention.

Co-authored-by: Codex <noreply@openai.com>
Via-codex-lane: supervised by Cerberus on Athena #101 request
Closes tasks.lthn.sh/view.php?id=256
internal/metal/gemma4.go used strings.{HasPrefix,HasSuffix,TrimPrefix,
TrimSuffix,Contains} for weight key sanitisation. Non-CGO file with no
WASM exemption — AX-6 banned-imports rule applies. Swapped to core.*
equivalents; dropped the strings import.

Closes tasks.lthn.sh/view.php?id=251

Co-authored-by: Codex <noreply@openai.com>
internal/metal/model.go used strings.{HasPrefix,TrimPrefix,Contains}
for weight name path resolution and model-type switch matching. Swapped
to core.{HasPrefix,TrimPrefix,Contains}; dropped the strings import.

Closes tasks.lthn.sh/view.php?id=252

Co-authored-by: Codex <noreply@openai.com>
… (AX-6)

mlxlm/backend.go is the subprocess-based Python inference bridge:
- os/exec → core.Process().Start/Run via mlxlmCore
- sync.Mutex/RWMutex → Core named locks (core.New().Lock("mlxlm.X"))
- bufio.Scanner JSONL reader → bounded line reader over io.Reader

Verification: go vet + go test ./mlxlm/... pass.

Follow-up (out of this ticket's scope): the bounded line reader uses
raw io.Reader — also AX-6 flagged but not in #253's allowlist. Separate
ticket to route through coreio.Medium.

Closes tasks.lthn.sh/view.php?id=253

Co-authored-by: Codex <noreply@openai.com>
go.mod + 30 *.go files updated: dappco.re/go/core/{inference,io} →
dappco.re/go/{inference,io}.

Closes tasks.lthn.sh/view.php?id=596

Co-authored-by: Codex <noreply@openai.com>
- Bump dappco.re/go/* deps to v0.8.0-alpha.1 in go.mod (any forge.lthn.ai/core/* paths migrated to canonical dappco.re/go/* form)

Co-Authored-By: Athena <athena@lthn.ai>
errors is structural for sentinel error declaration in tokenizer;
core.E is downstream of go-mlx.

Co-authored-by: Codex <noreply@openai.com>
Replaced errors.New → core.NewError, strings.ToLower(strings.TrimSpace(...))
→ core.Lower(core.Trim(...)). Annotated retained iter.Seq + time.Duration
public API types as AX-6 structural.

Co-authored-by: Codex <noreply@openai.com>
…#594)

internal/metal/gemma4_vision.go (1246 lines, new):
- SigLIP-style vision tower: patch embedder, encoder layers, 2-D RoPE,
  pooling, standardize path
- Projector with weight-prefix extraction supporting:
  vision_tower.*, vision_model.*, embed_vision.*, multi_modal_projector.*
- ForwardMultiModal: replaces image token embeddings with projected
  vision soft tokens

internal/metal/gemma4.go: config/load/cleanup wiring for vision tower
internal/metal/close.go: cleanup hooks for vision components
internal/metal/ops.go: Conv2d primitive (NHWC input, [out_ch,kh,kw,in_ch] weight)
internal/metal/gemma4_test.go: parser + sanitizer coverage for vision configs

Verification: gofmt + diff --check pass. go test ./internal/metal is
blocked by missing vendored MLX sources (lib/mlx, mlx/c/private/mlx.h)
in this checkout — full test will run on a developer machine with
the proper Apple Metal stack.

Co-authored-by: Codex <noreply@openai.com>
Closes tasks.lthn.sh/view.php?id=594
Per OFM B2 (Clotho audit): ScoreContent local_native path requires a
sidecar daemon binding $XDG_RUNTIME_DIR/ofm/violet.sock that PHP can
speak to over UDS. Process-local, no HTTP — keeps inference traffic
off the network, no auth/TLS overhead.

Lands:
* pkg/daemon/server.go — UDS Server, JSON-line protocol, 0600 socket
  perms (filesystem-permission auth), graceful shutdown via ctx
* pkg/daemon/dispatch.go — action registry mapping embed/score/
  generate/info to handlers; stub responses; info reports daemon
  name + version + supported actions
* cmd/violet/main.go — wires Server, reads model paths from
  $XDG_CONFIG_HOME/ofm/violet.toml or env-var fallback, help/version
  handling
* tests/cli/violet/main.go — AX-10 CLI smoke: spawn daemon → connect
  → send {"action":"info"} → assert response shape
* Unit tests cover server bind/accept/dispatch and dispatch registry

Acceptance: socket bind works, smoke `printf '%s\n' '{"action":"info"}' |
nc -U <sock>` returned `{"actions":[...],"name":"violet","version":"dev"}`.

Architectural defaults documented in code: single binary, load-once-
at-startup (restart to swap), response-complete only (streaming TODO).

GOWORK=off go test/build pass. Plain go build blocked by go.work not
including this worktree.

Co-authored-by: Codex <noreply@openai.com>
Closes tasks.lthn.sh/view.php?id=1016
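The JSON-line-over-UDS protocol above can be sketched end to end. This is a minimal illustration under assumptions: infoResponse/serveOnce are hypothetical stand-ins for the pkg/daemon dispatch registry and server, and only the {"action":"info"} shape from the smoke test is modelled.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// infoResponse builds the response shape the smoke test expects for
// {"action":"info"}.
func infoResponse() map[string]any {
	return map[string]any{
		"name":    "violet",
		"version": "dev",
		"actions": []string{"embed", "score", "generate", "info"},
	}
}

// serveOnce answers one JSON-line request on a Unix socket and exits:
// process-local, no HTTP, no auth/TLS, as described above.
func serveOnce(sock string) error {
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return err
	}
	defer ln.Close()
	conn, err := ln.Accept()
	if err != nil {
		return err
	}
	defer conn.Close()
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return err
	}
	var req map[string]string
	if err := json.Unmarshal([]byte(line), &req); err != nil {
		return err
	}
	return json.NewEncoder(conn).Encode(infoResponse()) // one JSON line back
}

func main() {
	sock := filepath.Join(os.TempDir(), "violet-demo.sock")
	os.Remove(sock)
	go serveOnce(sock)
	var conn net.Conn
	var err error
	for i := 0; i < 50; i++ { // wait for the listener to come up
		if conn, err = net.Dial("unix", sock); err == nil {
			break
		}
		time.Sleep(10 * time.Millisecond)
	}
	if err != nil {
		panic(err)
	}
	fmt.Fprintln(conn, `{"action":"info"}`)
	var resp map[string]any
	if err := json.NewDecoder(conn).Decode(&resp); err != nil {
		panic(err)
	}
	fmt.Println(resp["name"], resp["version"])
}
```

The real server additionally sets 0600 socket permissions (filesystem-permission auth) and shuts down gracefully via context, which this sketch omits.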
… (#255)

Per RFC §4 memory management: runtime.GC() is required at specific points
in the model lifecycle (CGO MLX cleanup), but AX-6 bans runtime in
consumer packages.

Resolution: Option A — wrap in a public mlx.GC() helper, scope the runtime
import to one internal file.

Lands:
* mlx.go — exports public GC() that delegates to metal.RuntimeGC()
* internal/metal/gc.go — // AX-6-exception: runtime import scoped here
  so consumers can call mlx.GC() instead of runtime.GC() directly
* internal/metal/gc_test.go — TestRuntimeGC_{Good,Bad,Ugly} per AX-10

Note: pre-existing runtime imports in internal/metal/{gguf,export,io,
io_custom,metal_kernel,grad,array,metal}.go remain — they're CGO lifecycle
adjacent and out of this lane's allowlist. mlx/cgo build itself is blocked
in sandbox (missing mlx/c/mlx.h headers); supervisor runs full build at
merge.

Co-authored-by: Codex <noreply@openai.com>
Closes tasks.lthn.sh/view.php?id=255
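The Option A pattern above is a general way to scope a banned import to one file. A sketch in a single package for illustration (in the real repo this spans mlx.go and internal/metal/gc.go; gcCalls is instrumentation added here, not part of the real API):

```go
package main

import (
	"fmt"
	"runtime"
)

// gcCalls counts invocations for demonstration purposes only.
var gcCalls int

// runtimeGC is the single AX-6-exception site that imports runtime,
// standing in for metal.RuntimeGC().
func runtimeGC() {
	runtime.GC()
	gcCalls++
}

// GC is the public wrapper consumers call instead of runtime.GC(),
// standing in for mlx.GC(). Consumer packages never import runtime.
func GC() { runtimeGC() }

func main() {
	GC() // e.g. after Model.Close(), per the RFC memory-management notes
	fmt.Println("GC calls:", gcCalls)
}
```

The design point is that the lint rule stays simple (ban the import outright) while the one legitimate use lives behind an audited, commented exception.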
internal/metal/gemma4_vision.go:
- Gemma4VisionConfig (image_size, patch_size, num_channels, hidden_size,
  num_hidden_layers, num_attention_heads, intermediate_size, layer_norm_eps)
- Gemma4VisionLayer + MultiModalProjector type aliases
- Post-layernorm support, channel-aware patch prep
- Variadic Gemma4VisionEncoder.Forward
- Multimodal forward path: encodes images, projects vision tokens,
  injects into image-token positions in text embedding stream

internal/metal/model.go:
- Split gemma4_text vs gemma4 routing
- text path unloads vision sidecars
- multimodal path retains vision tower + projector discovered at load

Codex notes: gofmt clean; cgo build blocked in sandbox (no MLX C
headers — fatal error: 'mlx/c/mlx.h' file not found). Operators verify
on Apple Silicon at runtime where MLX headers are present.

Closes tasks.lthn.sh/view.php?id=254

Co-authored-by: Codex <noreply@openai.com>
# Conflicts:
#	api_common.go
#	training_stub.go
coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

Adds a root-package MLX API (model loading, generation, tokenizer, compute sessions), GGUF/safetensors discovery and I/O, Gemma4/Qwen loaders, Metal-backed frame compute with kernels and metrics, LoRA training/merge and adapter APIs, extensive internal Metal/MLX C/C++ integration and forwarding units, CLI, submodule integration, docs and tests.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Module & Build | go.mod, .gitignore, .gitmodules, CMakeLists.txt, cpp/CMakeLists.txt | Module path changed to dappco.re/go/mlx; added .gitmodules for lib submodules; updated .gitignore to exclude lib/mlx-c/examples/; enabled MLX_BUILD_GGUF in CMake. |
| Documentation & README | README.md, docs/* | Updated import paths to dappco.re, added Compute Guide and submodule build instructions, Quick Start and model/docs updated for GGUF/safetensors and Gemma4, bumped Go version in README. |
| Root API Types & Shape Helpers | api_common.go, api_shape_common.go, api_shape_test.go, attention_snapshot_test.go | New root types/options for load/generate/attention, option helpers, shape/int normalisers and tests. |
| Darwin Metal API & Stub | api_darwin.go, api_stub.go, api_test.go, api_tokenizer_darwin.go, api_tokenizer_stub.go, api_tokenizer_test.go | Added Darwin/arm64 Metal-backed Model implementation and tokenizer loader; platform stub for unsupported builds and accompanying tests. |
| Inference Adapter | adapter.go, adapter_test.go | New InferenceAdapter wrapper over inference.TextModel providing buffered/streaming Generate/Chat, attention inspection, backend factory forcing metal, with tests. |
| GGUF Discovery & I/O | gguf_info.go, gguf_info_test.go, internal/metal/gguf.go, internal/metal/gguf_bridge.cpp, internal/metal/gguflib_impl.c | GGUF metadata reader/discovery and tests; C bridge and small C compilation unit includes for GGUF support. |
| Compute API & Metal Backend | compute.go, compute_darwin.go, compute_stub.go, compute_*.go, compute_*.go/test.go | New Compute/Session interfaces, Darwin Metal backend (buffers, kernels, metrics), comprehensive tests and cross-platform stub. |
| Model loading & architectures | internal/metal/model.go, internal/metal/model_files.go, internal/metal/gemma3.go, internal/metal/gemma4*.go, internal/metal/qwen3.go, internal/metal/*_test.go | Model root resolution, safetensors/gguf loading, probe/detect logic, Gemma3/Gemma4 (including vision) and Qwen3 loaders plus tests. |
| Generation, Sampling & Attention | internal/metal/generate.go, sample.go, generate_test.go, sample_test.go | Added MinP option, reordered sampler chain, device-scoped withDevice execution, attention extraction, and related tests. |
| Batch/KV Cache/Fast ops | internal/metal/batch.go, internal/metal/cache.go, internal/metal/fast.go, and tests | Batch/Classify refactor to device-wrapped execution, KV cache rotation/order fixes and explicit frees, RMSNorm/RoPE tweaks and tests. |
| LoRA Training & Merge | internal/metal/lora.go, internal/metal/lora_merge.go, internal/metal/lora_test.go | LoRA RFC-compatible config normalisation, training Step/Batch API, Save adapters as package+JSON, Merge to materialise weights, and tests. |
| Array/Ops/Kernel/Export | internal/metal/array.go, internal/metal/ops.go, internal/metal/metal_kernel.go, internal/metal/export.go, tests | Array helpers (Bool/Bytes/SetFloat64), new ops (Conv2d, GatherMM/GatherQMM, Any, Arange, IsNaN), MetalKernel + config APIs, closure export/import bridge and tests. |
| IO from Reader/Writer | internal/metal/io_custom.go, internal/metal/io.go, internal/metal/io_custom_test.go | Custom safetensors reader/writer vtable for io.ReadWriteSeeker, iterator/map helpers and round-trip tests. |
| Internal MLX C/C++ Integration | internal/metal/mlx_build_config.h, many internal/metal/mlx_*.cpp, internal/metal/mlxc_*.cpp | Added build config header and ~240 thin forwarding C++ translation units that include upstream lib/mlx/lib/mlx-c sources guarded by __has_include with helpful #error messages. |
| Array/GC/Export/Memory & Misc internals | many internal/metal/* files | Added RuntimeGC, export/import closures, device selection/withDevice, DefaultStream CPU/GPU resolution, AdamW config refactor, SwitchLinear for MoE, GC/finalizer and memory lifecycle improvements and tests. |
| Internal Tests & Test harness | many internal/metal/*_test.go, internal/metal/testmain_test.go | Extensive Darwin/arm64 Metal integration and unit tests for kernels, compute pipelines, generation, arrays, export/import, LoRA, Gemma4, GGUF and more; test main helper to skip when Metal unavailable. |
| CLI Tool | cmd/violet/main.go, cmd/violet/main_test.go | New violet daemon CLI with TOML-like config parsing, env overrides and tests. |
| CI Submodules | .forgejo/workflows/test.yml | Actions checkout now fetches submodules recursively in workflow. |
| CUDA fallback stub | internal/metal/mlx_mlx_backend_cuda_no_cuda.cpp | Stub implementation reporting CUDA unavailable and erroring on CUDA kernel use. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant API as mlx API
    participant Model as Metal Model (internal/metal)
    participant MLX as MLX-C (lib/mlx-c)
    participant GPU as Metal/GPU

    User->>API: LoadModel(path, opts)
    API->>Model: resolveModelRoot -> loadModelWeights
    Model->>MLX: LoadSafetensors or LoadGGUF
    MLX-->>Model: weight tensors
    Model->>Model: probeModelType -> select loader (Gemma3/Gemma4/Qwen)
    Model-->>API: *Model
    User->>API: model.Generate(prompt, opts)
    API->>Model: m.withDevice(fn)
    Model->>GPU: dispatch compute via MLX
    GPU-->>Model: logits / tensors
    Model-->>API: generated text/result
    API-->>User: result
```
```mermaid
sequenceDiagram
    participant User
    participant Compute as mlx Compute API
    participant Session as Session (internal/metal)
    participant Kernel as MetalKernel
    participant GPU as Metal/GPU

    User->>Compute: NewSession(opts)
    Compute->>Session: create backend/session
    User->>Session: BeginFrame()
    User->>Session: NewPixelBuffer / NewByteBuffer
    User->>Session: Run(kernelName, args)
    Session->>Kernel: compile/load MSL if needed
    Kernel->>GPU: dispatch
    GPU-->>Kernel: complete
    Kernel-->>Session: outputs
    User->>Session: Read buffer -> get results
    User->>Session: FinishFrame() -> FrameMetrics
```

@Snider (Contributor, Author) commented Apr 27, 2026

@coderabbitai review

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai Bot left a comment


Actionable comments posted: 24

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
internal/metal/slice.go (2)

16-27: ⚠️ Potential issue | 🔴 Critical

Add validation for empty and mismatched slice parameters before CGO pointer dereference.

Empty slices cause panics when dereferencing &slice[0], and the loops assume len(starts) == len(ends) without validation. Both Slice() and SliceUpdateInplace() are affected.

Proposed fix
 func Slice(a *Array, starts, ends []int32) *Array {
+	if len(starts) == 0 || len(starts) != len(ends) {
+		panic("Slice: starts/ends must be non-empty and equal length")
+	}
 	out := newArray("SLICE", a)
 	cStarts := make([]C.int, len(starts))
 	cEnds := make([]C.int, len(ends))
@@
 func SliceUpdateInplace(a, update *Array, starts, ends []int32) *Array {
+	if len(starts) == 0 || len(starts) != len(ends) {
+		panic("SliceUpdateInplace: starts/ends must be non-empty and equal length")
+	}
 	out := newArray("SLICE_UPDATE", a, update)
 	cStarts := make([]C.int, len(starts))
 	cEnds := make([]C.int, len(ends))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/slice.go` around lines 16 - 27, The Slice() and
SliceUpdateInplace() implementations must validate slice parameters before
creating C arrays and taking &slice[0]; check that len(starts) == len(ends) (and
> 0 if your code dereferences &slice[0]) and return an error or handle the no-op
case when lengths mismatch or are zero. Change the code that builds cStarts,
cEnds and strides (symbols: cStarts, cEnds, strides) to either pass nil and
C.size_t(0) to mlx_slice when the Go slices are empty, or return early with a
clear error for mismatched lengths, and only take &cStarts[0], &cEnds[0],
&strides[0] after confirming non-zero lengths; apply the same checks in both
Slice() and SliceUpdateInplace() and keep the mlx_slice(...) call and
DefaultStream().ctx usage unchanged otherwise.

42-47: ⚠️ Potential issue | 🟠 Major

Add bounds validation for axis parameter before array indexing.

After normalisation, if axis is out of range, accessing starts[ax] or ends[ax] will cause an uncontrolled panic. This should be validated explicitly:

Proposed fix
 	ax := axis
 	if ax < 0 {
 		ax = ndim + ax
 	}
+	if ax < 0 || ax >= ndim {
+		panic("SliceAxis: axis out of range")
+	}
 	starts[ax] = start
 	ends[ax] = end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/slice.go` around lines 42 - 47, After normalising axis into ax
(using ax := axis; if ax < 0 { ax = ndim + ax }), add an explicit bounds check
to ensure 0 <= ax < ndim before using starts[ax] or ends[ax]; if it fails,
return a clear error (or panic with a descriptive message) rather than indexing
out of range. Update the code around the ax/axis logic (variables ax, axis,
ndim, starts, ends) to perform this validation and handle the out-of-range case
consistently with the surrounding function's error-handling conventions.
internal/metal/io.go (1)

52-54: ⚠️ Potential issue | 🟠 Major

Remove leaked mlx_array allocation in iterator error path.

When mlx_map_string_to_array_iterator_next returns non-zero (error/end of iteration), the mlx_array allocated at line 52 is not freed. Compare with internal/metal/export.go (lines 137–140), which correctly frees on error:

Correct pattern from export.go
value := C.mlx_array_new()
if C.mlx_map_string_to_array_iterator_next(&key, &value, it) != 0 {
    C.mlx_array_free(value)
    break
}

Apply the same cleanup in io.go.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/io.go` around lines 52 - 54, In the iterator loop in
internal/metal/io.go, the mlx_array allocated via value := C.mlx_array_new() is
leaked when C.mlx_map_string_to_array_iterator_next(&key, &value, it) returns
non-zero; update that error/end-of-iteration branch to call
C.mlx_array_free(value) before breaking (same pattern as in
internal/metal/export.go). Ensure you reference the local variable value and the
iterator it and free value on the non-zero return path from
C.mlx_map_string_to_array_iterator_next.
internal/metal/lora.go (1)

747-820: ⚠️ Potential issue | 🟠 Major

Free adapter tensors that are skipped during injection.

All adapter arrays are materialised up front, but only successfully injected pairs become reachable from the model. Unrecognised names, incomplete pairs, and missing target layers currently leak their arrays for the lifetime of the process.

Suggested cleanup
 	scale := config.Alpha / float32(config.Rank)
 	injected := 0
+	kept := make(map[*Array]struct{})
 
 	for key, pair := range pairs {
 		if pair.matrixA == nil || pair.matrixB == nil {
 			core.Warn("adapter: incomplete LoRA pair, skipping",
 				"layer", key.layerIdx, "proj", key.projPath)
@@
 		lora := &LoRALinear{
 			Base:  linear,
 			A:     pair.matrixA,
 			B:     pair.matrixB,
@@
 		}
 		linear.LoRA = lora
+		kept[pair.matrixA] = struct{}{}
+		kept[pair.matrixB] = struct{}{}
 		injected++
 	}
+
+	freed := make(map[*Array]struct{})
+	for _, arr := range weights {
+		if arr == nil {
+			continue
+		}
+		if _, ok := kept[arr]; ok {
+			continue
+		}
+		if _, ok := freed[arr]; ok {
+			continue
+		}
+		Free(arr)
+		freed[arr] = struct{}{}
+	}
 
 	if injected == 0 {
 		return core.E("lora.applyLoadedLoRA", "no LoRA layers injected from "+adapterDir, nil)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/lora.go` around lines 747 - 820, Materialize currently makes
all adapter Arrays reachable but arrays for unrecognised names, incomplete
pairs, or missing target layers are never freed; to fix this, call the Array
cleanup method (e.g., arr.Free() or the project’s standard release API) for each
arr that will not be injected: after parseLoRAWeightName when layerIdx < 0 free
the arr, when iterating pairs free pair.matrixA and pair.matrixB for any pair
skipped due to missing partner, and also free both matrices when
resolveLinear(model, key.layerIdx, key.projPath) returns nil; ensure you only
free arrays that are not later attached to LoRALinear (do not free matrices
after successfully setting linear.LoRA) and keep references consistent so you
don’t double-free.
♻️ Duplicate comments (2)
internal/metal/mlx_gen_cpu_compiled_preamble.cpp (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Verify that the build system correctly processes this forwarding stub.

This file exhibits the same pattern as internal/metal/mlxc_linalg.cpp—a bare relative path to a generated source file rather than valid C++ code. Confirm that your build system has special logic to handle these redirect stubs, particularly for generated sources under lib/generated/.

Please refer to the verification script in the previous comment for internal/metal/mlxc_linalg.cpp to check the build configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_gen_cpu_compiled_preamble.cpp` at line 1, The file
internal/metal/mlx_gen_cpu_compiled_preamble.cpp is a forwarding stub containing
only a relative include path ("../../lib/generated/cpu/compiled_preamble.cpp");
update the build configuration so these stubs are recognized and handled as
redirects to generated sources: ensure the build system (e.g., CMakeLists or
your custom build script) treats files matching this pattern
(internal/metal/mlx_gen_cpu_compiled_preamble.cpp and other stubs pointing into
lib/generated/) as aliases rather than real C++ sources, add logic to resolve
the relative path and add the actual file at
lib/generated/cpu/compiled_preamble.cpp to the compile list and dependency
graph, and add or run the same verification script you used for
internal/metal/mlxc_linalg.cpp to assert the redirect is correctly processed
during configure/build.
internal/metal/mlx_mlx_backend_metal_nojit_kernels.cpp (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Missing trailing newline.

The AI summary notes this file has no trailing newline, which may trigger compiler warnings or cause issues if the build system concatenates files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_metal_nojit_kernels.cpp` at line 1, The file
nojit_kernels.cpp is missing a trailing newline at EOF, which can trigger
warnings; open the file referenced as nojit_kernels.cpp and add a single newline
character at the end of the file (ensure the editor saves the file with a final
'\n') so the file ends with a newline rather than EOF immediately after the last
character.
🟠 Major comments (33)
internal/metal/gc_test.go-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add SPDX licence header to this new Go test file.

This new Go file also needs the required SPDX header at the top.

Proposed fix
+// SPDX-Licence-Identifier: EUPL-1.2
 package metal_test

As per coding guidelines **/*.go: Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gc_test.go` at line 1, Add the required SPDX license header to
the top of the new Go test file that declares package metal_test: insert the
single-line comment containing the exact SPDX identifier "//
SPDX-Licence-Identifier: EUPL-1.2" as the first line above the "package
metal_test" declaration so the file conforms to the repository's Go file
licensing guideline.
internal/metal/gc.go-1-2 (1)

1-2: ⚠️ Potential issue | 🟠 Major

Add SPDX licence header to this new Go file.

This file is missing the required SPDX header at the top.

Proposed fix
+// SPDX-Licence-Identifier: EUPL-1.2
 // AX-6-exception: runtime import scoped here so consumers can call mlx.GC() instead of runtime.GC() directly.
 package metal

As per coding guidelines **/*.go: Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gc.go` around lines 1 - 2, The new Go file in package metal
(internal/metal/gc.go) is missing the required SPDX header; add the single-line
SPDX comment `// SPDX-Licence-Identifier: EUPL-1.2` at the very top of the file
before the package declaration so every new .go file follows the coding
guideline for SPDX licence headers.
internal/metal/mlx_mlx_io_gguf_quants.cpp-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Resolve the missing include target before merge.

Line 1 includes a non-existent file ../../lib/mlx/mlx/io/gguf_quants.cpp. This will cause a compilation failure; the target file does not exist in the repository.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_io_gguf_quants.cpp` at line 1, The file
internal/metal/mlx_mlx_io_gguf_quants.cpp currently includes a non-existent path
("../../lib/mlx/mlx/io/gguf_quants.cpp"); fix this by either adding the missing
source to the repo or changing the include to the correct existing target (e.g.,
include the appropriate header like "mlx/io/gguf_quants.h" or the correct
relative path to gguf_quants.cpp), and update any build/BUILD or CMakeLists
entries if you add the source file; locate the include statement in
internal/metal/mlx_mlx_io_gguf_quants.cpp and replace it with the correct
include or add the missing file to the project.
internal/metal/mlx_mlx_io_gguf.cpp-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Submodule lib/mlx must be initialised for build to succeed.

The include path at line 1 targets a file that does not exist in fresh clones because the lib/mlx submodule is not initialised. The .gitmodules configuration is correct, but builds will fail unless CI runs git submodule update --init --recursive or equivalent. Update the build documentation to clarify this requirement, or refactor the include to avoid requiring a submodule checkout.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_io_gguf.cpp` at line 1, The build fails because the
source file internal/metal/mlx_mlx_io_gguf.cpp includes
"../../lib/mlx/mlx/io/gguf.cpp" which requires the lib/mlx submodule to be
initialized; either update the build/documentation to require running "git
submodule update --init --recursive" (or equivalent) before building and add a
note to README/BUILD.md, or refactor the include in mlx_mlx_io_gguf.cpp to avoid
the submodule dependency (e.g., provide a vendored copy, a CMake option to
download/enable the submodule, or conditional include/fallback) and ensure the
chosen approach updates build scripts (CMakeLists/Makefile) and CI to match.
internal/metal/io.go-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Build constraints across all CGO files in internal/metal do not match the required project standard.

The internal/metal/ directory contains 50+ CGO files all using //go:build darwin && arm64 && !nomlx, but the coding guideline specifies //go:build darwin && arm64. Whilst the !nomlx feature flag is used consistently throughout, it contradicts the documented constraint standard in docs/development.md.

All affected CGO files in internal/metal/**/*.go must be updated to remove the && !nomlx suffix.

Example change (applies to all internal/metal CGO files)
-//go:build darwin && arm64 && !nomlx
+//go:build darwin && arm64
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/io.go` at line 1, Multiple CGO files use the nonstandard build
tag "//go:build darwin && arm64 && !nomlx"; update each occurrence to the
project-standard "//go:build darwin && arm64". Search for the exact string
"//go:build darwin && arm64 && !nomlx" in the internal/metal package, replace it
with "//go:build darwin && arm64" in every CGO file, ensure the change stays as
the first line of each file, and then run the build/tests (or go vet) to confirm
no build tags or CGO issues remain.
internal/metal/io_custom.go-211-220 (1)

211-220: ⚠️ Potential issue | 🟠 Major

Memory leak: C.CString allocations are never freed.

Each call to goIOLabel allocates a new C string via C.CString(), but these allocations are never freed. The comment states MLX-C only reads transiently, but the Go side must still free the memory. If MLX-C calls this callback multiple times per stream, memory will leak.

Consider caching the C string in the ioStream struct and freeing it in goIOFree, or documenting that MLX-C calls this at most once per stream lifetime.

Proposed fix: cache the C label string
 type ioStream struct {
 	rws    io.ReadWriteSeeker
 	label  string
+	clabel *C.char // cached C string, freed in goIOFree
 	size   int64
 	good   bool
 	open   bool
 }

 //export goIOLabel
 func goIOLabel(desc unsafe.Pointer) *C.char {
 	s := lookupIOStream(desc)
 	if s == nil {
 		return C.CString("<unknown go stream>")
 	}
-	// MLX-C does not free this; it only reads it transiently.
-	// We return a C string that lives until the stream is freed.
-	return C.CString(s.label)
+	if s.clabel == nil {
+		s.clabel = C.CString(s.label)
+	}
+	return s.clabel
 }

 //export goIOFree
 func goIOFree(desc unsafe.Pointer) {
+	s := lookupIOStream(desc)
+	if s != nil && s.clabel != nil {
+		C.free(unsafe.Pointer(s.clabel))
+	}
 	unregisterIOStream(desc)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/io_custom.go` around lines 211 - 220, goIOLabel currently
allocates a new C string with C.CString(s.label) on every call, leaking memory
because those buffers are never freed; change to cache the C string pointer on
the ioStream object (e.g., add a field like cLabel *C.char on the struct
returned by lookupIOStream), have goIOLabel return the cached cLabel (allocating
it once with C.CString when nil), and free that cached C string in goIOFree when
the stream is destroyed; update lookupIOStream, ioStream struct, goIOLabel, and
goIOFree accordingly to ensure each allocation is freed exactly once.
attention_snapshot_test.go-1-2 (1)

1-2: ⚠️ Potential issue | 🟠 Major

Add the required SPDX licence header to this new test file.

The file is newly added and is missing the mandatory SPDX line at the top.

Suggested fix
+// SPDX-Licence-Identifier: EUPL-1.2
+
 package mlx

As per coding guidelines, **/*.go: Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@attention_snapshot_test.go` around lines 1 - 2, Add the required SPDX header
to the top of the new test file: insert the line "// SPDX-Licence-Identifier:
EUPL-1.2" as the very first line of attention_snapshot_test.go (above the
package mlx declaration) so the file begins with the mandated license
identifier.
docs/development.md-34-34 (1)

34-34: ⚠️ Potential issue | 🟠 Major

Fix the replace directive to the migrated module path.

Line 34 still points to dappco.re/go/core/inference, which conflicts with the new top-level namespace and can mislead local setup.

Suggested fix
-replace dappco.re/go/core/inference => ../go-inference
+replace dappco.re/go/inference => ../go-inference
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/development.md` at line 34, The replace directive still references the
old module path "replace dappco.re/go/core/inference => ../go-inference"; update
that directive so the left-hand module path matches the migrated top-level
namespace (i.e., change "dappco.re/go/core/inference" to the new module path
used in the repo) while keeping the right-hand local replacement
"../go-inference" so local setups resolve correctly.
internal/metal/testmain_test.go-1-3 (1)

1-3: ⚠️ Potential issue | 🟠 Major

Add the required SPDX licence header to this new Go file.

The SPDX header is required but currently missing before the build constraint.

Suggested fix
+// SPDX-Licence-Identifier: EUPL-1.2
+
 //go:build darwin && arm64 && !nomlx

As per coding guidelines, **/*.go: Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/testmain_test.go` around lines 1 - 3, Add the required SPDX
license header to the top of the new Go file: insert the line "//
SPDX-Licence-Identifier: EUPL-1.2" immediately before the existing build
constraint comment "//go:build darwin && arm64 && !nomlx" in
internal/metal/testmain_test.go so the file begins with the SPDX header followed
by the build tag and package declaration.
api_tokenizer_darwin.go-1-3 (1)

1-3: ⚠️ Potential issue | 🟠 Major

Add the required SPDX licence header to this new Go file.

The file starts with a build tag but omits the mandatory SPDX licence identifier.

Suggested fix
+// SPDX-Licence-Identifier: EUPL-1.2
+
 //go:build darwin && arm64 && !nomlx

As per coding guidelines, **/*.go: Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api_tokenizer_darwin.go` around lines 1 - 3, Add the required SPDX license
header by inserting the exact comment "// SPDX-Licence-Identifier: EUPL-1.2"
immediately after the existing build tag line ("//go:build darwin && arm64 &&
!nomlx") and before the "package mlx" declaration in api_tokenizer_darwin.go so
the file contains the SPDX identifier while keeping the build tag and package
declaration intact.
api_tokenizer_stub.go-1-3 (1)

1-3: ⚠️ Potential issue | 🟠 Major

Add the required SPDX licence header to this new Go file.

Line 1 starts with the build tag, but the mandatory SPDX header is missing.

Suggested fix
+// SPDX-Licence-Identifier: EUPL-1.2
+
 //go:build !(darwin && arm64) || nomlx

As per coding guidelines, **/*.go: Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api_tokenizer_stub.go` around lines 1 - 3, Add the mandatory SPDX license
header to the top of the new Go file by inserting the comment line with the
identifier before the existing build tag; specifically, place "//
SPDX-Licence-Identifier: EUPL-1.2" immediately above the existing build tag line
so the file (which contains the build tag and the package mlx declaration)
includes the required SPDX header.
cmd/violet/main_test.go-1-2 (1)

1-2: ⚠️ Potential issue | 🟠 Major

Add the required SPDX licence header to this new test file.

The file is newly added and needs the SPDX header at the top.

Suggested fix
+// SPDX-Licence-Identifier: EUPL-1.2
+
 package main

As per coding guidelines, **/*.go: Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/violet/main_test.go` around lines 1 - 2, Add the required SPDX license
header to the top of the new test file: insert the comment line //
SPDX-Licence-Identifier: EUPL-1.2 as the very first line above the package main
declaration in main_test.go so the package main test file includes the mandated
SPDX header.
internal/metal/fast.go-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Keep the CGO build constraint in the mandated form.

This file now uses darwin && arm64 && !nomlx, but the repository rule for internal/metal CGO files requires the plain darwin && arm64 constraint.

Suggested fix
-//go:build darwin && arm64 && !nomlx
+//go:build darwin && arm64

As per coding guidelines, internal/metal/**/*.go: All CGO files must carry the build constraint //go:build darwin && arm64 as this project is darwin/arm64 only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/fast.go` at line 1, The build constraint at the top of
internal/metal/fast.go must be changed from "//go:build darwin && arm64 &&
!nomlx" to the mandated "//go:build darwin && arm64" (remove the "&& !nomlx"
clause); update the file's leading build tag accordingly and ensure any
corresponding old-style "+build" line (if present) matches the same "darwin &&
arm64" constraint so the CGO file conforms to the repository rule for
internal/metal CGO files.
internal/metal/device.go-1-14 (1)

1-14: ⚠️ Potential issue | 🟠 Major

Align this CGO file header with the repo rules.

This new file is missing the SPDX header, and the build tag now diverges from the required darwin && arm64 form by adding && !nomlx.

Suggested fix
-//go:build darwin && arm64 && !nomlx
+// SPDX-Licence-Identifier: EUPL-1.2
+//go:build darwin && arm64

As per coding guidelines, internal/metal/**/*.go: All CGO files must carry the build constraint //go:build darwin && arm64 as this project is darwin/arm64 only, and **/*.go: Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/device.go` around lines 1 - 14, This CGO source (package
metal, in internal/metal/device.go) must be updated to follow repo rules:
replace the current build constraint with the exact required line "//go:build
darwin && arm64" (remove the "&& !nomlx" suffix) and add the SPDX license header
"// SPDX-Licence-Identifier: EUPL-1.2" at the top of the file above the build
tag so the file carries the mandated license and correct darwin/arm64-only build
constraint.
internal/metal/gguf.go-44-57 (1)

44-57: ⚠️ Potential issue | 🟠 Major

Potential memory leak when iterator yields early.

When yield returns false (Line 54-56), the loop breaks but value allocated at Line 46 via mlx_array_new() is not freed. The array is only wrapped with a finalizer when iteration continues.

Proposed fix
 		for {
 			var key *C.char
 			value := C.mlx_array_new()
 			if C.mlx_map_string_to_array_iterator_next(&key, &value, it) != 0 {
+				C.mlx_array_free(value)
 				break
 			}

 			name := C.GoString(key)
 			arr := &Array{ctx: value, name: name}
 			runtime.SetFinalizer(arr, finalizeArray)
 			if !yield(name, arr) {
+				// Note: value is now owned by arr with a finalizer, no leak here
 				break
 			}
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gguf.go` around lines 44 - 57, The loop allocates a C array
with mlx_array_new() into value but doesn’t free it when breaking early; change
the loop in the iterator to call the appropriate C deallocator (e.g.
C.mlx_array_free(value) or the project’s mlx array free function) whenever you
break without wrapping value in an Array: free value before breaking when
mlx_map_string_to_array_iterator_next indicates no next item and also free value
before breaking when yield(name, arr) returns false (only wrap and set
finalizeArray when you intend to keep the array).
internal/metal/optim.go-84-101 (1)

84-101: ⚠️ Potential issue | 🟠 Major

AdamWConfig cannot express zero-valued overrides.

mergeAdamWConfig treats 0 as "not set", so NewAdamW(&AdamWConfig{WeightDecay: 0}) still leaves the default decay in place. The same problem applies to LearningRate, Beta1, Beta2, and Eps, which means valid caller input is silently ignored. Please use an explicit presence signal here instead of zero checks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/optim.go` around lines 84 - 101, mergeAdamWConfig currently
treats zero as "unset" so callers cannot intentionally override fields to 0;
change AdamWConfig to carry an explicit presence signal (preferably make numeric
fields pointers like *float64 for LearningRate, Beta1, Beta2, Eps, WeightDecay
or add corresponding booleans like HasLearningRate, HasWeightDecay, etc.), then
update mergeAdamWConfig to check for nil (or the Has* flags) and apply overrides
when present; also update any constructors like NewAdamW and call sites to pass
pointer values (or set flags) so explicit zero values are preserved.
internal/metal/metal.go-108-116 (1)

108-116: ⚠️ Potential issue | 🟠 Major

Avoid deriving MLX_METALLIB_PATH from runtime.Caller.

This approach breaks with -trimpath builds (which strip path information), relocated binaries, or module-cache builds—none can reconstruct ../../dist/lib/mlx.metallib at runtime. Initialisation becomes brittle unless the source tree layout is preserved alongside the binary distribution.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/metal.go` around lines 108 - 116, The current code uses
runtime.Caller to derive MLX_METALLIB_PATH which breaks with -trimpath/relocated
binaries; replace that logic in the init/initialisation block (the code that
checks core.Env("MLX_METALLIB_PATH") and currently uses runtime.Caller to build
metallib) with a robust approach: first try to read an explicit
environment/config value (core.Env), if empty derive a path relative to the
running executable using os.Executable() and filepath.Dir (instead of
runtime.Caller), falling back to failing fast with a clear error log if neither
yields a valid metallib path; set the env with os.Setenv("MLX_METALLIB_PATH",
<computed-path>) only when the computed path exists. Use the same symbols:
core.Env, os.Setenv, os.Executable (or filepath.Dir), and the metallib variable
name to locate and update the code.
internal/metal/ops.go-13-18 (1)

13-18: ⚠️ Potential issue | 🟠 Major

Add missing SPDX licence header.

The file is missing the required licence header. Add // SPDX-Licence-Identifier: EUPL-1.2 as the first line before the build constraint. The optionalInt semantics are correct: treating 0 as "no value" is intentional for quantisation contexts where 0 indicates "use MLX default" for groupSize and bits parameters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/ops.go` around lines 13 - 18, Add the missing SPDX license
header as the very first line of the file: insert the line "//
SPDX-Licence-Identifier: EUPL-1.2" before the build constraint and any package
declarations so the file (which contains the function optionalInt) includes the
required license identifier; do not change the optionalInt semantics (treating 0
as "no value")—only prepend the SPDX header.
internal/metal/gemma3.go-228-247 (1)

228-247: 🛠️ Refactor suggestion | 🟠 Major

Add unit test validation for quantised layer zero-value semantics and SPDX licence header.

The zero groupSize/bits values are intentional semantics meaning "defer to MLX affine defaults" (consistent with gemma4_test.go precedent), but gemma3.go lacks explicit unit test coverage validating this behaviour. Additionally, the file is missing the required SPDX-Licence-Identifier header (// SPDX-Licence-Identifier: EUPL-1.2) per coding guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gemma3.go` around lines 228 - 247, Add the SPDX license header
to the top of gemma3.go and add unit tests that explicitly validate the
zero-value semantics of quantization parameters: when quantConfig.GroupSize and
quantConfig.Bits are zero, ensure NewQuantizedLinear(...) and the Embedding
fields (embed.GroupSize/embed.Bits) preserve zeros and therefore defer to MLX
affine defaults; locate the creation sites referenced by NewQuantizedLinear,
NewLinear, Embedding, and the weight(...) calls in gemma3.go to add tests that
construct a quantConfig with GroupSize=0 and Bits=0, assert the returned
QuantizedLinear/Embedding contain zeros for GroupSize/Bits (and that scales !=
nil branch is exercised), and add a short test mirroring the gemma4_test.go
precedent to encode the expected behavior.
internal/metal/gemma4.go-1228-1233 (1)

1228-1233: ⚠️ Potential issue | 🟠 Major

Free tensors that never become part of the model on load failure.

The failure defer only tears down arrays already reachable from m. If loading aborts before gemma4FreeUnusedWeights runs, tensors still sitting in weights or visionWeights stay live, so a failed load can retain GPU memory.

Suggested cleanup
 	loadSucceeded := false
 	defer func() {
 		if loadSucceeded {
 			return
 		}
+		retained := gemma4RetainedWeights(m)
+		gemma4FreeUnusedWeights(weights, retained)
+		gemma4FreeUnusedWeights(visionWeights, retained)
 		closeGemma4(m)
 		ClearCache()
 	}()

Also applies to: 1296-1303

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gemma4.go` around lines 1228 - 1233, The code calls
loadModelWeights then creates visionWeights :=
sanitizeGemma4VisionWeights(rawWeights) and weights :=
sanitizeGemma4Weights(rawWeights) but on early return the tensors in
weights/visionWeights remain allocated; ensure you free those arrays when load
fails: after detecting err from loadModelWeights or any subsequent error before
returning from LoadGemma4, call gemma4FreeUnusedWeights(weights) and
gemma4FreeUnusedWeights(visionWeights) (or the appropriate cleanup function used
elsewhere) to release GPU tensors, then return the wrapped error; apply the same
fix to the other similar block around the sanitize calls (the one referenced at
1296-1303).
gguf_info.go-290-317 (1)

290-317: ⚠️ Potential issue | 🟠 Major

Bound GGUF collection sizes before allocating.

These counts and array lengths are taken straight from the file and then fed into make and recursion. A malformed GGUF can trigger enormous allocations or a runtime panic long before you return an error.

Suggested guard
+	const maxGGUFEntries = 1 << 20
+
 	var tensorCount uint64
 	if err := binary.Read(file, binary.LittleEndian, &tensorCount); err != nil {
 		return nil, nil, fmt.Errorf("mlx: read gguf tensor count: %w", err)
 	}
 	var metadataCount uint64
 	if err := binary.Read(file, binary.LittleEndian, &metadataCount); err != nil {
 		return nil, nil, fmt.Errorf("mlx: read gguf metadata count: %w", err)
 	}
+	if tensorCount > maxGGUFEntries || metadataCount > maxGGUFEntries {
+		return nil, nil, errors.New("mlx: gguf header counts are unreasonably large")
+	}
...
 		var length uint64
 		if err := binary.Read(reader, binary.LittleEndian, &length); err != nil {
 			return nil, err
 		}
+		if length > maxGGUFEntries {
+			return nil, errors.New("gguf array is unreasonably large")
+		}
 		values := make([]any, 0, length)

Also applies to: 387-399

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gguf_info.go` around lines 290 - 317, The code reads metadataCount and
tensorCount directly from the file and uses them in make(...) and for range
loops, which allows a malformed GGUF to trigger enormous allocations or panics;
validate these counts before allocating/iterating: add sane upper bounds (e.g.
const maxMetadataItems, maxTensorItems) and check that metadataCount and
tensorCount are <= those bounds and also convertible to int safely (compare
against uint64(math.MaxInt) or a chosen max), return a descriptive error if they
exceed limits, then allocate metadata := make(map[string]any,
int(metadataCount)) and iterate with for i := uint64(0); i < metadataCount; i++
(same for tensors slice of ggufTensorInfo) so the functions
readGGUFString/readGGUFValue and the subsequent parsing cannot cause unbounded
allocation or panic.
internal/metal/lora.go-399-415 (1)

399-415: ⚠️ Potential issue | 🟠 Major

Do not reuse mutable KV caches inside the gradient closure.

ValueAndGrad is being given a stateful lossFn: every evaluation mutates the same cache slice. That can feed stale KV state into later value/gradient passes and silently corrupt training.

Suggested change
-	caches := adapter.Model.NewCache()
 	defer Free(inputs, targetIDs, lossMask, attnMask)
-	defer freeCaches(caches)
 
 	argnums := make([]int, len(params))
 	for i := range params {
 		argnums[i] = i
 	}
 
 	lossFn := func(current []*Array) []*Array {
 		adapter.SetAllParams(current)
+		caches := adapter.Model.NewCache()
+		defer freeCaches(caches)
 		logits := adapter.Model.ForwardMasked(inputs, attnMask, caches)
 		loss := MaskedCrossEntropyLoss(logits, targetIDs, lossMask)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/lora.go` around lines 399 - 415, The closure assigned to
lossFn reuses a single mutable cache slice created by adapter.Model.NewCache()
(caches) and thus mutates shared KV state across ValueAndGrad evaluations;
change lossFn to allocate or clone fresh caches inside the closure (e.g., call
adapter.Model.NewCache() or deep-copy caches at the start of lossFn), use those
fresh caches for adapter.Model.ForwardMasked(inputs, attnMask, cachesLocal) and
ensure you Free()/freeCaches(cachesLocal) after each forward to avoid leaks,
while keeping the outer param setup via adapter.SetAllParams(current) intact.
internal/metal/gemma4.go-506-593 (1)

506-593: ⚠️ Potential issue | 🟠 Major

Reject negative config dimensions before allocating.

These fields come straight from JSON into signed int32s and then drive make, slicing, and integer range. A malformed config.json with a negative size will panic here instead of returning a parse error.

Suggested guard
+	if cfg.NumHiddenLayers < 0 ||
+		cfg.HiddenSize < 0 ||
+		cfg.IntermediateSize < 0 ||
+		cfg.NumAttentionHeads < 0 ||
+		cfg.NumKeyValueHeads < 0 ||
+		cfg.HeadDim < 0 ||
+		cfg.GlobalHeadDim < 0 ||
+		cfg.HiddenSizePerLayerInput < 0 ||
+		cfg.SlidingWindow < 0 ||
+		cfg.SlidingWindowPattern < 0 ||
+		cfg.MaxPositionEmbeddings < 0 ||
+		cfg.NumKVSharedLayers < 0 {
+		return nil, core.E("gemma4.parseConfig", "negative dimensions are invalid", nil)
+	}
+
 	if cfg.HeadDim == 0 && cfg.HiddenSize > 0 && cfg.NumAttentionHeads > 0 {
 		cfg.HeadDim = cfg.HiddenSize / cfg.NumAttentionHeads
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gemma4.go` around lines 506 - 593, Validate and reject any
negative dimension/config fields after you apply defaults and before allocating
or ranging (e.g., before mergeGemma4RopeParameters and certainly before the
LayerTypes allocation loop); specifically check signed fields like
cfg.NumHiddenLayers, cfg.HiddenSize, cfg.HeadDim, cfg.HiddenSizePerLayerInput,
cfg.VocabSize, cfg.VocabSizePerLayerInput, cfg.MaxPositionEmbeddings,
cfg.SlidingWindow, cfg.SlidingWindowPattern, cfg.NumKVSharedLayers,
cfg.ImageTokenID and any pointer-backed counts (cfg.NumExperts, cfg.TopKExperts)
for < 0 and return an error via core.E("gemma4.parseConfig", "...", nil) when
they are invalid so the function fails with a parse error instead of panicking
during make/slicing/range operations (adjust the error message to indicate which
field is negative).
internal/metal/gemma4_vision.go-471-523 (1)

471-523: ⚠️ Potential issue | 🟠 Major

Fail fast when required vision weights are missing.

This constructor happily builds layers with nil QProj/KProj/VProj/OProj or MLP linears, then Forward dereferences them later. Please validate the mandatory weights here and return an error instead of deferring the crash to inference time.
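A minimal standalone sketch of that fail-fast shape. The `Linear`/`attnLayer` types and `validateAttn` helper here are illustrative stand-ins, not the real `Gemma4VisionAttention` fields:

```go
package main

import "fmt"

// Linear stands in for the real projection type; nil means the weight was absent.
type Linear struct{}

// attnLayer mirrors the mandatory attention projections named above.
type attnLayer struct {
	QProj, KProj, VProj, OProj *Linear
}

// validateAttn returns an error naming the first missing projection instead of
// letting a later Forward call dereference nil.
func validateAttn(idx int, l attnLayer) error {
	checks := []struct {
		name string
		w    *Linear
	}{
		{"q_proj", l.QProj}, {"k_proj", l.KProj}, {"v_proj", l.VProj}, {"o_proj", l.OProj},
	}
	for _, c := range checks {
		if c.w == nil {
			return fmt.Errorf("vision layer %d: missing required weight %s", idx, c.name)
		}
	}
	return nil
}

func main() {
	ok := attnLayer{&Linear{}, &Linear{}, &Linear{}, &Linear{}}
	bad := attnLayer{QProj: &Linear{}} // K/V/O projections missing
	fmt.Println(validateAttn(0, ok) == nil)
	fmt.Println(validateAttn(3, bad) != nil)
}
```

The same pattern extends to the MLP linears and norms; the constructor returns the error so the caller can surface the missing weight file.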

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gemma4_vision.go` around lines 471 - 523, The constructor
currently builds Gemma4VisionEncoderLayer entries even when gemma4VisionLinear
returns nil for mandatory projections, causing panics later in Forward; modify
the layer construction loop to validate that required attention linears (QProj,
KProj, VProj, OProj) and MLP linears (GateProj, UpProj, DownProj) are non-nil
after calling gemma4VisionLinear and that required norms from gemma4VisionNorm
exist, and return a descriptive error (instead of silently continuing) if any
are missing; ensure the error bubbles up from whichever function instantiates
the Encoder (the function that creates Gemma4VisionEncoderLayer /
Gemma4VisionAttention / Gemma4VisionMLP) so callers can handle missing weight
files.
internal/metal/metal_kernel.go-115-125 (1)

115-125: ⚠️ Potential issue | 🟠 Major

Validate handles before calling into MLX.

Apply dereferences config and every input array unconditionally. A nil, freed, or otherwise invalid handle will panic in Go or cross into C as a null handle instead of returning a normal error.
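A sketch of the Go-side guard, with `uintptr` stand-ins for the real CGO handles. `validateApply` is a hypothetical helper; the real `Apply` would run these checks before building `inputVec`/`outputVec` or calling into C:

```go
package main

import (
	"errors"
	"fmt"
)

// Array, MetalKernelConfig, and MetalKernel stand in for the real CGO-backed
// types; ctx is the opaque C handle that must be non-nil before crossing into MLX.
type Array struct{ ctx uintptr }
type MetalKernelConfig struct{ ctx uintptr }
type MetalKernel struct{ ctx uintptr }

// validateApply performs the Go-side checks before any C call would be made.
func (k *MetalKernel) validateApply(cfg *MetalKernelConfig, inputs []*Array) error {
	if k == nil || k.ctx == 0 {
		return errors.New("mlx: metal kernel handle is nil or freed")
	}
	if cfg == nil || cfg.ctx == 0 {
		return errors.New("mlx: metal kernel config handle is nil or freed")
	}
	for i, a := range inputs {
		if a == nil || a.ctx == 0 {
			return fmt.Errorf("mlx: input array %d is nil or freed", i)
		}
	}
	return nil
}

func main() {
	k := &MetalKernel{ctx: 1}
	cfg := &MetalKernelConfig{ctx: 1}
	fmt.Println(k.validateApply(cfg, []*Array{{ctx: 1}}) == nil)
	fmt.Println(k.validateApply(cfg, []*Array{nil}) != nil)
}
```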

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/metal_kernel.go` around lines 115 - 125, Check and validate
all Go-side handles before calling into MLX in MetalKernel.Apply: verify k and
k.ctx, config and config.ctx, and each input Array and its ctx are non-nil (and
return a descriptive error if any are invalid) before creating/using
inputVec/outputVec and calling C.mlx_fast_metal_kernel_apply; ensure you still
defer freeing inputVec/outputVec on early returns. This prevents panics or null
handles crossing into the C call (symbols: MetalKernel.Apply, MetalKernelConfig,
Array, k.ctx, config.ctx, a.ctx, DefaultStream, mlx_fast_metal_kernel_apply).
api_darwin.go-236-248 (1)

236-248: ⚠️ Potential issue | 🟠 Major

Add a cancellation path to the channel-based stream helpers.

Both helpers write to an unbuffered channel from a goroutine under context.Background(). If the caller stops receiving early, the goroutine blocks and the backend keeps generating with no way to stop it.

Also applies to: 252-268

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api_darwin.go` around lines 236 - 248, Update the channel-based stream
helpers (e.g., Model.GenerateStream and the similar helper at lines 252-268) to
accept a context.Context parameter, pass that context into m.model.Generate
instead of context.Background(), and use select when sending tokens to out to
handle ctx.Done(): e.g., inside the goroutine iterate over tokens and for each
token do select { case out <- Token{...}: case <-ctx.Done(): return }, ensuring
the goroutine returns and the stream stops when the caller cancels; also
close(out) via defer as already present.
internal/metal/gemma4_vision.go-584-588 (1)

584-588: ⚠️ Potential issue | 🟠 Major

Keep multimodal failure paths non-destructive.

When image encoding fails you fall back with llmTokens, and when feature counts do not match you still inject as many rows as you can. Both paths leave image positions converted to PAD embeddings instead of preserving the original token sequence.

Also applies to: 646-668
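A simplified illustration of the non-destructive partial-match path, using plain slices instead of MLX arrays; `injectMatched` is hypothetical:

```go
package main

import "fmt"

// injectMatched copies feature rows into embedding positions only where a
// matching feature exists; unmatched positions keep their original values.
func injectMatched(embeds []float32, positions []int, features []float32) []float32 {
	out := append([]float32(nil), embeds...) // never mutate the original sequence
	n := len(positions)
	if len(features) < n {
		n = len(features) // partial match: inject only what we have
	}
	for i := 0; i < n; i++ {
		out[positions[i]] = features[i]
	}
	return out
}

func main() {
	embeds := []float32{1, 2, 3, 4}
	// Two image positions but only one feature row available.
	got := injectMatched(embeds, []int{1, 3}, []float32{9})
	fmt.Println(got) // position 3 keeps its original embedding
}
```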

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gemma4_vision.go` around lines 584 - 588, When image encoding
via m.encodeGemma4Images(imagePixels) fails or yields mismatched feature counts,
do not overwrite the original token embeddings for image positions with PADs;
instead preserve the original llm token sequence and only perform a
non-destructive fallback. Concretely: in the error path where imageFeatures ==
nil or !imageFeatures.Valid() (and the similar logic around the 646-668 region),
call Free(h) as before but return m.Forward(llmTokens, caches) without altering
the token buffer; in the partial-match path only inject embeddings for the exact
number of matched image tokens (using the corresponding indices) and leave any
unmatched image token positions unchanged. Update the code around
encodeGemma4Images, Free(h), and the injection logic so image positions are
preserved unless a valid replacement embedding exists.
internal/metal/metal_kernel.go-1-3 (1)

1-3: ⚠️ Potential issue | 🟠 Major

Bring the file header into policy compliance.

This new CGO file is missing the SPDX header, and the build constraint does not match the repository rule for internal/metal CGO files.

As per coding guidelines, internal/metal/**/*.go: "All CGO files must carry the build constraint //go:build darwin && arm64 as this project is darwin/arm64 only" and **/*.go: "Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/metal_kernel.go` around lines 1 - 3, Add the required SPDX
header and correct the build constraint at the top of the file: replace the
existing build line "//go:build darwin && arm64 && !nomlx" with the
repository-mandated "//go:build darwin && arm64" and prepend the SPDX licence
header line "// SPDX-Licence-Identifier: EUPL-1.2" above the build constraint so
the package declaration "package metal" and any CGO contents comply with the
internal/metal CGO policy.
internal/metal/export.go-1-3 (1)

1-3: ⚠️ Potential issue | 🟠 Major

Bring the file header into policy compliance.

This new CGO file is missing the SPDX header, and the build constraint does not match the repository rule for internal/metal CGO files.

As per coding guidelines, internal/metal/**/*.go: "All CGO files must carry the build constraint //go:build darwin && arm64 as this project is darwin/arm64 only" and **/*.go: "Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/export.go` around lines 1 - 3, This file is missing the SPDX
header and its build constraint includes an extra flag; add the SPDX license
header as the first line using the canonical tag "// SPDX-License-Identifier:
EUPL-1.2" and change the existing build constraint line to exactly "//go:build
darwin && arm64" (remove "&& !nomlx"), keeping the package metal declaration
unchanged so the file complies with internal/metal CGO rules.
adapter.go-105-123 (1)

105-123: ⚠️ Potential issue | 🟠 Major

Cancel generation when the callback wants to stop.

Breaking the receive loop on callback error leaves the underlying generation running on the original context. That can keep GPU work and goroutines alive until the model finishes anyway.

Proposed fix
 func (adapter *InferenceAdapter) GenerateStream(ctx context.Context, prompt string, opts GenOpts, cb TokenCallback) error {
 	if adapter == nil || adapter.model == nil {
 		return errors.New("mlx: inference adapter is nil")
 	}
 	if cb == nil {
 		return errors.New("mlx: token callback is nil")
 	}
+	ctx, cancel := context.WithCancel(ctx)
+	defer cancel()
 
 	var callbackErr error
 	for token := range adapter.model.Generate(ctx, prompt, genOptsToInference(opts)...) {
 		if err := cb(token.Text); err != nil {
 			callbackErr = err
+			cancel()
 			break
 		}
 	}
 	if callbackErr != nil {
 		return callbackErr
@@
 func (adapter *InferenceAdapter) ChatStream(ctx context.Context, messages []Message, opts GenOpts, cb TokenCallback) error {
 	if adapter == nil || adapter.model == nil {
 		return errors.New("mlx: inference adapter is nil")
 	}
 	if cb == nil {
 		return errors.New("mlx: token callback is nil")
 	}
+	ctx, cancel := context.WithCancel(ctx)
+	defer cancel()
 
 	var callbackErr error
 	for token := range adapter.model.Chat(ctx, messages, genOptsToInference(opts)...) {
 		if err := cb(token.Text); err != nil {
 			callbackErr = err
+			cancel()
 			break
 		}
 	}
 	if callbackErr != nil {
 		return callbackErr

Also applies to: 148-166

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter.go` around lines 105 - 123, The GenerateStream loop must cancel the
underlying generation when the TokenCallback signals stop: wrap the incoming ctx
with context.WithCancel and pass that derived ctx to adapter.model.Generate;
when cb(token.Text) returns an error call cancel() immediately and record the
callback error, but continue draining the returned token channel (discarding
tokens) until it closes so the generator goroutine can exit cleanly; finally
return the callback error (or adapter.model.Err() if no callback error). Apply
the same change pattern to the other GenerateStream variant referenced (lines
148-166) and use symbols GenerateStream, adapter.model.Generate, TokenCallback,
genOptsToInference, and adapter.model.Err() to locate and update the code.
internal/metal/gemma4_vision.go-1-3 (1)

1-3: ⚠️ Potential issue | 🟠 Major

Add the SPDX licence header.

This new Go file is missing the repository-required header.

As per coding guidelines, **/*.go: "Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gemma4_vision.go` around lines 1 - 3, Add the required SPDX
licence header to the top of this new Go file: insert the line "//
SPDX-Licence-Identifier: EUPL-1.2" immediately above the build tag or package
declaration (the file with package "metal" and the build tag "//go:build darwin
&& arm64 && !nomlx") so every new .go file includes the repository-required
header.
internal/metal/metal_kernel.go-235-240 (1)

235-240: ⚠️ Potential issue | 🟠 Major

Handle scalar outputs without indexing shape[0].

A scalar output uses an empty shape, so &cShape[0] panics here. Pass a nil pointer when len(shape) == 0.

Proposed fix
 func (c *MetalKernelConfig) AddOutputArg(shape []int32, dtype DType) {
 	cShape := make([]C.int, len(shape))
 	for i, s := range shape {
 		cShape[i] = C.int(s)
 	}
-	C.mlx_fast_metal_kernel_config_add_output_arg(c.ctx, &cShape[0], C.size_t(len(cShape)), C.mlx_dtype(dtype))
+	var shapePtr *C.int
+	if len(cShape) > 0 {
+		shapePtr = &cShape[0]
+	}
+	C.mlx_fast_metal_kernel_config_add_output_arg(c.ctx, shapePtr, C.size_t(len(cShape)), C.mlx_dtype(dtype))
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/metal_kernel.go` around lines 235 - 240, In AddOutputArg on
MetalKernelConfig, avoid indexing cShape[0] for scalar outputs (empty shape)
which causes a panic; when len(shape)==0 pass a nil pointer to
C.mlx_fast_metal_kernel_config_add_output_arg instead of &cShape[0], otherwise
pass &cShape[0] with C.size_t(len(cShape)) as before, keeping the cShape slice
and dtype conversion logic unchanged.
api_darwin.go-1-3 (1)

1-3: ⚠️ Potential issue | 🟠 Major

Add the SPDX licence header.

This new Go file is missing the repository-required header.

As per coding guidelines, **/*.go: "Add SPDX licence header // SPDX-Licence-Identifier: EUPL-1.2 to every new file".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api_darwin.go` around lines 1 - 3, Add the required SPDX header to this new
Go file by inserting the line "// SPDX-Licence-Identifier: EUPL-1.2" at the very
top of the file before the existing build tag and package declaration (the file
containing the "//go:build darwin && arm64 && !nomlx" build tag and "package
mlx").

24+ findings dispositioned across CGO bridge, model loading, tensor
lifecycle, and stream cancellation surfaces.

Code fixes:
- export.go: zero-arg unary callback shape; CGO callback panic recovery
- internal/metal/*.cpp: C++ forwarding + symlink findings
- slice.go: empty/mismatched slice handling + axis bounds checks
- io.go + gguf.go + io_custom.go: iterator + label leak fixes
- gguf_info.go: bounded GGUF counts/arrays (no unbounded reads)
- gemma3.go: quant zero-semantics test coverage
- gemma4.go: failed-load tensor cleanup + negative config validation
- gemma4_vision.go: missing vision weights + non-destructive fallback
- lora.go: KV cache reuse + skipped adapter tensor cleanup
- metal.go: runtime.Caller metallib path resolution
- metal_kernel.go: handle validation + scalar output shape
- optim.go: zero-valued AdamW overrides handled
- api_darwin.go + adapter.go: stream cancellation paths now propagate
  context.Context through GenerateStream / ChatStream

Doc + build:
- SPDX + build-tag policy conformance
- docs/development.md: migrated replace path
- lib/mlx, lib/mlx-c, lib/generated: submodule build docs

Disposition replies (RESOLVED-COMMENT):
- AX-6/testify: no testify imports in go-mlx; remaining go.sum entries
  are transitive checksums from replaced sibling modules
- SonarCloud/GHAS: no PR annotations/checks beyond CodeRabbit;
  code-scanning API returned no analysis

Verification: gofmt -l clean, GOWORK=off go vet + go test -count=1 ./...
pass with explicit GOPATH/GOMODCACHE/GOCACHE.

Note: lib/mlx submodule was already dirty on entry (pre-existing WIP),
left untouched.

Closes findings on #3

Co-authored-by: Codex <noreply@openai.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 36

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
internal/metal/mlx_mlx_primitives.cpp (1)

1-2: ⚠️ Potential issue | 🔴 Critical

Initialise the lib/mlx git submodule or update the include path.

Line 1 includes ../../lib/mlx/mlx/primitives.cpp, but lib/mlx is an uninitialised git submodule (contains only a .git marker file). The lib/mlx/mlx/ subdirectory does not exist, so compilation will fail with a "file not found" error. This applies to all five forwarding files in internal/metal/ that reference the missing lib/mlx hierarchy (this file plus mlx_mlx_distributed_primitives.cpp, mlx_mlx_backend_metal_primitives.cpp, mlx_mlx_backend_gpu_primitives.cpp, and mlx_mlx_backend_cpu_primitives.cpp).

Either initialise the submodule (git submodule update --init lib/mlx) or replace the #include-based forwarding with proper CMake source list configuration to avoid including .cpp files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_primitives.cpp` around lines 1 - 2, The forwarding
file internal/metal/mlx_mlx_primitives.cpp is including an inexistent submodule
file ("#include \"../../lib/mlx/mlx/primitives.cpp\""), causing build failures;
fix by either initializing the lib/mlx git submodule (git submodule update
--init lib/mlx) so the referenced path exists, or stop including .cpp files from
internal/metal (mlx_mlx_primitives.cpp and the other forwarding files
mlx_mlx_distributed_primitives.cpp, mlx_mlx_backend_metal_primitives.cpp,
mlx_mlx_backend_gpu_primitives.cpp, mlx_mlx_backend_cpu_primitives.cpp) and
instead add the real mlx source files into your CMakeLists (add_library /
target_sources or target_link_libraries) so the mlx sources are compiled
directly and the include path adjusted accordingly. Ensure any include
directives or target_include_directories point to the correct lib/mlx include
directory if you keep the submodule.
docs/development.md (1)

304-312: ⚠️ Potential issue | 🟡 Minor

Refresh the dependency graph namespace.

The dependency graph still points at forge.lthn.ai/core/go-inference, while the rest of the guide has moved to dappco.re/go/inference. Leaving the old path here will mislead local workspace setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/development.md` around lines 304 - 312, Update the dependency graph to
use the new namespace: replace the old module path
forge.lthn.ai/core/go-inference with dappco.re/go/inference in the "Dependency
Graph" block (the go-mlx entry referencing forge.lthn.ai/core/go-inference),
ensuring the rest of the entries (e.g., mlx-c v0.4.1 and Apple MLX) remain
unchanged so local workspace setup points to dappco.re/go/inference.
docs/models.md (1)

8-25: ⚠️ Potential issue | 🟡 Minor

Clarify architecture detection for direct .gguf loads.

Line 8 still says architecture is detected from config.json, but Line 25 now allows passing a single .gguf file directly. That makes the opening description inaccurate for one of the newly documented input forms.

Proposed doc fix
-go-mlx loads transformer models from either HuggingFace safetensors shards or GGUF checkpoints. Architecture is auto-detected from the `model_type` field in `config.json`.
+go-mlx loads transformer models from either HuggingFace safetensors shards or GGUF checkpoints. For safetensors directories, architecture is detected from the `model_type` field in `config.json`; for direct GGUF loads, it is detected from the checkpoint metadata.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/models.md` around lines 8 - 25, The opening sentence incorrectly states
architecture is always auto-detected from the `model_type` field in
`config.json`; update it to note two cases: when loading a model directory the
architecture is taken from the `model_type` in `config.json`, and when passing a
single `.gguf` file (used by inference.LoadModel) the architecture is detected
from the GGUF checkpoint metadata instead. Reference `model_type`, `.gguf`, and
`inference.LoadModel` in the revised sentence so readers know which source is
used for each input form.
♻️ Duplicate comments (6)
internal/metal/mlx_mlx_distributed_jaccl_no_jaccl.cpp (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Hard build dependency: upstream no_jaccl.cpp path likely missing (submodule/init or forwarding path).

Line 1 directly includes an upstream .cpp via a relative path; the provided static analysis reports that file as not found, which would fail compilation in any environment where lib/mlx isn’t present or the path differs. This is consistent with the earlier (now outdated) review finding, so please treat it as an actionable blocker.

♻️ Proposed fix: guard the include with a clear error
-#include "../../lib/mlx/mlx/distributed/jaccl/no_jaccl.cpp"
+#if __has_include("../../lib/mlx/mlx/distributed/jaccl/no_jaccl.cpp")
+#include "../../lib/mlx/mlx/distributed/jaccl/no_jaccl.cpp"
+#else
+#error "Missing upstream source: lib/mlx/mlx/distributed/jaccl/no_jaccl.cpp. Check that the lib/mlx subtree/submodule is initialised (e.g., git submodule update --init --recursive) or fix the forwarding include path."
+#endif
#!/bin/bash
set -euo pipefail

echo "== Checking expected upstream path =="
TARGET="lib/mlx/mlx/distributed/jaccl/no_jaccl.cpp"
if [ -f "$TARGET" ]; then
  echo "✓ Found: $TARGET"
else
  echo "✗ Not found: $TARGET"
fi

echo
echo "== Locating any no_jaccl.cpp in the repo (read-only) =="
# Prefer fd if available; fall back to find
if command -v fd >/dev/null 2>&1; then
  fd -a "no_jaccl\.cpp$" -t f . || true
else
  find . -type f -name "no_jaccl.cpp" 2>/dev/null || true
fi

echo
echo "== Checking submodule status (read-only) =="
if command -v git >/dev/null 2>&1 && git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git submodule status --recursive || true
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_distributed_jaccl_no_jaccl.cpp` at line 1, The direct
relative include of "../../lib/mlx/mlx/distributed/jaccl/no_jaccl.cpp" is a hard
build dependency that can break when the upstream path is absent; update
internal/metal/mlx_mlx_distributed_jaccl_no_jaccl.cpp to guard the include with
a preprocessor existence check (e.g. use __has_include or equivalent) and
provide a clear compile-time fallback: if the file exists include the upstream
no_jaccl.cpp, otherwise emit a descriptive `#error` (or include a local stub) that
names the missing "lib/mlx/mlx/distributed/jaccl/no_jaccl.cpp" and explains how
to resolve it; reference the include target
"../../lib/mlx/mlx/distributed/jaccl/no_jaccl.cpp" and ensure the message
appears at compile time so CI/devs get actionable guidance.
internal/metal/mlx_mlx_backend_cpu_arg_reduce.cpp (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Fix build-breaking missing forwarded include target.

This TU forwards via #include "../../lib/mlx/mlx/backend/cpu/arg_reduce.cpp", but clang reports that file is not found. As-is, this is very likely to break the build (and it matches the earlier stub problem flagged on prior commits).

Suggested resolution (choose one)
  • If the CPU implementation should exist in-repo: add/restore lib/mlx/mlx/backend/cpu/arg_reduce.cpp so this include resolves correctly (and ensure the relative path is correct).
  • If the implementation is provided by an external dependency (e.g., FetchContent): remove or disable this forwarding TU and update the corresponding CMake target(s) so the build no longer compiles this missing-include file.

Also verify the link strategy to avoid duplicate-symbol errors if arg_reduce.cpp is both compiled as its own translation unit elsewhere and included here.

#!/bin/bash
set -euo pipefail

echo "== Check referenced implementation file exists =="
if [ -f "lib/mlx/mlx/backend/cpu/arg_reduce.cpp" ]; then
  echo "✓ lib/mlx/mlx/backend/cpu/arg_reduce.cpp exists"
else
  echo "✗ lib/mlx/mlx/backend/cpu/arg_reduce.cpp does NOT exist"
fi

echo "== Locate any arg_reduce.cpp in the repo =="
fd -t f 'arg_reduce\.cpp$' -a

echo "== Find references to this forwarding TU =="
rg -n "internal/metal/mlx_mlx_backend_cpu_arg_reduce\.cpp" -S .

echo "== Find build references to arg_reduce.cpp / arg_reduce =="
rg -n --glob 'CMakeLists.txt' --glob '*.cmake' 'arg_reduce(\.cpp)?' -S .
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_cpu_arg_reduce.cpp` at line 1, The forwarding
translation unit internal/metal/mlx_mlx_backend_cpu_arg_reduce.cpp (which includes
"../../lib/mlx/mlx/backend/cpu/arg_reduce.cpp") fails because the target
implementation is missing; either restore/add the implementation file at
lib/mlx/mlx/backend/cpu/arg_reduce.cpp (ensuring the relative path matches the
include) or remove/disable this forwarding TU and update the CMake target that
references it so the build no longer compiles the missing file; also check the
CMake targets that reference "arg_reduce.cpp" (or an arg_reduce target) to avoid
duplicate-symbols if the implementation will be built elsewhere.
internal/metal/mlx_mlx_backend_cpu_copy.cpp (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Blocker: included copy.cpp path appears missing, so this TU won’t compile.

internal/metal/mlx_mlx_backend_cpu_copy.cpp includes ../../lib/mlx/mlx/backend/cpu/copy.cpp, but clang reports the referenced file as “file not found”. This will break builds unless the lib/ subtree is always present in CI/dev checkouts.

Please either:

  1. fix the include path to the real location of copy.cpp, and/or
  2. ensure the build process initialises/fetches the required lib/mlx/... sources before compiling these shim files (and confirm CI does the same), and/or
  3. avoid compiling upstream .cpp via relative #include (prefer build-system inclusion or include a header and link the proper object sources).
Proposed “fail fast” guard (optional while you fix the path)
 // internal/metal/mlx_mlx_backend_cpu_copy.cpp
+#if !__has_include("../../lib/mlx/mlx/backend/cpu/copy.cpp")
+#error "Missing upstream source: ../../lib/mlx/mlx/backend/cpu/copy.cpp (ensure submodules/sources are present and path is correct)"
+#endif
 #include "../../lib/mlx/mlx/backend/cpu/copy.cpp"
#!/bin/bash
# Description: Verify whether the referenced upstream source exists and whether this shim pattern is used consistently.

set -euo pipefail

echo "=== Resolve/locate copy.cpp ==="
# Check the exact path this file references
if test -f "internal/metal/../../lib/mlx/mlx/backend/cpu/copy.cpp"; then
  echo "FOUND: internal/metal/../../lib/mlx/mlx/backend/cpu/copy.cpp"
else
  echo "MISSING: internal/metal/../../lib/mlx/mlx/backend/cpu/copy.cpp"
  echo "--- Nearby matches (search for copy.cpp under lib/) ---"
  fd -t f "copy.cpp" lib || true
fi

echo
echo "=== Inspect other internal/metal shim include patterns ==="
# Find similar shim files and print their include targets
rg -n --glob 'internal/metal/mlx_mlx_backend_*.cpp' '^\s*#include\s+"?\.\./\.\./' internal/metal || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_cpu_copy.cpp` at line 1, The shim file
internal/metal/mlx_mlx_backend_cpu_copy.cpp currently includes
"../../lib/mlx/mlx/backend/cpu/copy.cpp" which is missing; fix by either
updating that include to the correct path to the upstream copy.cpp, or stop
including the upstream .cpp here and instead include the appropriate header
(e.g., the copy.h API) and add the upstream copy.cpp to the build system as a
compiled source to be linked; alternatively ensure the build step
fetches/initializes the lib/mlx subtree before compiling these shims so the
relative path exists—modify the include in
internal/metal/mlx_mlx_backend_cpu_copy.cpp and/or the build rules accordingly.
internal/metal/mlx_mlx_backend_metal_scaled_dot_product_attention.cpp (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Build-blocker: included MLX Metal implementation file is missing.

Line 1 includes ../../lib/mlx/mlx/backend/metal/scaled_dot_product_attention.cpp, and the provided static analysis reports it as “file not found”. That will break compilation unless CI/dev reliably checks out/initialises lib/mlx (submodule or vendored content) before any compiler/linter step evaluates this TU.

🛠️ Proposed fix (process/build-system)
  • Ensure the build pipeline runs submodule initialisation before compile/lint, e.g. git submodule update --init --recursive (or the repo’s equivalent).
  • If this repo supports a “vendored lib” fallback, make sure the fallback is present in CI images/artifacts.
  • Confirm the relative path is correct in the build working directory context (it should resolve relative to the including file, but the underlying file must exist).
#!/bin/bash
set -euo pipefail

target="lib/mlx/mlx/backend/metal/scaled_dot_product_attention.cpp"
forwarder="internal/metal/mlx_mlx_backend_metal_scaled_dot_product_attention.cpp"

echo "== Check whether lib/mlx submodule exists and is initialised =="
git submodule status --recursive -- lib/mlx || true

echo "== Check forwarding file type (symlink vs regular file) =="
# Note: `stat -c` is GNU-only and fails on macOS (this is a darwin/arm64 repo);
# ls + readlink work on both.
ls -la "$forwarder" || true
readlink "$forwarder" || true
test -e "$forwarder" && echo "✓ Forwarder exists"

echo "== Check include target existence =="
if test -f "$target"; then
  echo "✓ Target exists: $target"
else
  echo "✗ Target missing: $target"
  echo "Suggestion: run: git submodule update --init --recursive"
  exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_metal_scaled_dot_product_attention.cpp` at
line 1, The TU
internal/metal/mlx_mlx_backend_metal_scaled_dot_product_attention.cpp includes
../../lib/mlx/mlx/backend/metal/scaled_dot_product_attention.cpp which is
missing; fix by ensuring the lib/mlx source is available to the build (run git
submodule update --init --recursive or restore the vendored lib) or adjust the
include to point to a bundled fallback, and verify the relative path resolves
correctly in CI/build working directory so the include in
internal/metal/mlx_mlx_backend_metal_scaled_dot_product_attention.cpp finds
lib/mlx/mlx/backend/metal/scaled_dot_product_attention.cpp at compile time.
internal/metal/mlx_mlx_distributed_ring_no_ring.cpp (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Critical: compile will fail — included no_ring.cpp target is missing

internal/metal/mlx_mlx_distributed_ring_no_ring.cpp (Line 1) includes ../../lib/mlx/mlx/distributed/ring/no_ring.cpp, but the static analysis indicates that file does not exist. This will break compilation for any configuration that compiles this TU.

Fix options:

  1. Ensure the upstream lib/mlx/mlx/distributed/ring/no_ring.cpp exists in this repo (or is correctly provided via the expected submodule/wiring), and that the relative path resolves during build.
  2. If no_ring is not meant to ship, remove/exclude this forwarding TU from the internal/metal build instead of including a missing implementation.
  3. If the intent is to ship a stub, create the missing stub no_ring.cpp (matching the public symbols/namespaces from dist/include/mlx/distributed/ring/ring.h) rather than forwarding to a non-existent file.
#!/bin/bash
set -euo pipefail

echo "== Check forwarded target existence =="
target="lib/mlx/mlx/distributed/ring/no_ring.cpp"
if [ -f "$target" ]; then
  echo "OK: Found $target"
else
  echo "MISSING: $target"
fi

echo "== Check whether this internal file is a symlink (and its real target) =="
ls -la internal/metal/mlx_mlx_distributed_ring_no_ring.cpp

echo "== Search for other references to the same forwarded path =="
rg -n --hidden --no-ignore-vcs "distributed/ring/no_ring\\.cpp" internal/metal || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_distributed_ring_no_ring.cpp` at line 1, The include
in internal/metal/mlx_mlx_distributed_ring_no_ring.cpp points to a missing file
../../lib/mlx/mlx/distributed/ring/no_ring.cpp which breaks compilation; fix by
either (A) adding the upstream implementation at
lib/mlx/mlx/distributed/ring/no_ring.cpp so the relative include resolves, (B)
if no_ring is not intended to be shipped, remove this forwarding TU from the
internal/metal build (or delete
internal/metal/mlx_mlx_distributed_ring_no_ring.cpp), or (C) supply a small stub
implementation at lib/mlx/mlx/distributed/ring/no_ring.cpp that defines the
public symbols expected by dist/include/mlx/distributed/ring/ring.h (match
namespaces and exported function/class signatures) so dependent TUs link
correctly. Ensure whichever option you choose is reflected in the build files so
the TU is not compiled against a missing source.
internal/metal/mlx_mlx_backend_metal_conv.cpp (1)

1-1: ⚠️ Potential issue | 🔴 Critical

🚨 Build blocker: include target may be missing (duplicate of prior broken-symlink finding).

#include "../../lib/mlx/mlx/backend/metal/conv.cpp" is reported as not found by clang, and there’s already a prior review pointing to the same class of problem for this file (broken symlink / missing lib/mlx/mlx/backend/ directory). This would prevent compilation.

Please ensure the upstream backend sources (or correct symlink targets) are present in the checkout, and confirm CMake isn’t compiling both the upstream .cpp and this shim (duplicate symbols risk).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_metal_conv.cpp` at line 1, The include in
internal/metal/mlx_mlx_backend_metal_conv.cpp (`#include`
"../../lib/mlx/mlx/backend/metal/conv.cpp") is failing because the upstream path
or symlink is missing and may also cause duplicate-symbols if both the upstream
conv.cpp and this shim are compiled; fix by restoring or correcting the
symlink/checkout so ../../lib/mlx/mlx/backend/metal/conv.cpp exists or replace
the include with an appropriate header-only include, and update CMakeLists (or
the build target) so only one translation unit provides conv implementation
(ensure either this shim is removed from sources if upstream conv.cpp is built,
or this shim is the sole source and upstream conv.cpp is excluded).
🟠 Major comments (18)
internal/metal/mlx_mlx_backend_metal_custom_kernel.cpp-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Fix broken include path: referenced file does not exist.

The shim includes ../../lib/mlx/mlx/backend/metal/custom_kernel.cpp, but this path cannot be resolved from internal/metal/: no implementation file exists at the expected location in the repository. This will cause a compilation failure.

Confirm the correct relative path from internal/metal/ to the actual implementation file, or ensure the lib/mlx/ directory structure is present in the build context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_metal_custom_kernel.cpp` at line 1, The
include in internal/metal/mlx_mlx_backend_metal_custom_kernel.cpp is broken:
replace or correct the incorrect relative include
"../../lib/mlx/mlx/backend/metal/custom_kernel.cpp" by locating the actual
implementation file and updating the include to the correct relative path (or
add the missing lib/mlx directory to the build). Search the repo for the
implementation (look for custom_kernel.cpp or custom_kernel.hpp in
mlx/backend/metal) and update the `#include` in
mlx_mlx_backend_metal_custom_kernel.cpp to point to that file, or change it to
include the proper header instead of a .cpp if appropriate so the file resolves
during compilation.
internal/metal/mlx_mlx_backend_cpu_threefry.cpp-1-1 (1)

1-1: 🛠️ Refactor suggestion | 🟠 Major

Confirm no double-compilation / multiple-definition risk from including a .cpp.

This file uses an include-forwarder pattern by directly including an upstream implementation .cpp. That’s fine if—and only if—the upstream lib/.../threefry.cpp is compiled only via this forwarder and not also as a separate compilation unit in your Metal/MLX build wiring.

Please confirm the build system does not compile lib/mlx/mlx/backend/cpu/threefry.cpp directly elsewhere (or via another forwarding TU), otherwise you may hit duplicate symbol/linker errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_cpu_threefry.cpp` at line 1, Confirm the build
system does not compile threefry.cpp as a separate translation unit anywhere
else; if it does, remove the duplicate rule or stop including threefry.cpp
directly from mlx_mlx_backend_cpu_threefry.cpp to avoid
multiple-definition/linker errors. Either (A) make this forwarder the sole
compilation unit for the threefry implementation (ensure no other build target
lists threefry.cpp), or (B) stop including the .cpp here and instead compile the
upstream threefry.cpp once and include a proper header for its declarations.
Search build rules and targets for any references to threefry.cpp and update
them so the implementation is compiled exactly once.
internal/metal/mlx_mlx_distributed_distributed.cpp-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Potential multiple-definition/linker issues: ensure distributed.cpp is compiled only once.

Because this is a .cpp-including-a-.cpp shim, there’s a real risk of duplicate symbol definitions if lib/mlx/mlx/distributed/distributed.cpp is also compiled in another target (or another shim). If you see “multiple definition”/ODR-style linker errors later, this is the first place to look.

Confirm the build graph compiles distributed.cpp exactly once—either via this shim or via a direct compilation unit, but not both.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_distributed_distributed.cpp` at line 1, This shim
directly includes lib/mlx/mlx/distributed/distributed.cpp causing possible
duplicate symbol/ODR linker errors; locate the inclusion in
internal/metal/mlx_mlx_distributed_distributed.cpp and remove the .cpp include —
either compile lib/mlx/mlx/distributed/distributed.cpp as a single translation
unit in the build graph and link the object, or replace the include with the
corresponding header (e.g., distributed.h) and link the compiled object; ensure
the build targets do not also compile distributed.cpp elsewhere so that
distributed.cpp is present in exactly one compilation unit.
internal/metal/mlx_mlx_distributed_mpi_no_mpi.cpp-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Include target file does not exist—this will cause compilation to fail.

The file internal/metal/mlx_mlx_distributed_mpi_no_mpi.cpp attempts to include ../../lib/mlx/mlx/distributed/mpi/no_mpi.cpp, but this file does not exist in the repository. There is no no_mpi.cpp anywhere in the codebase, and lib/mlx/ is an empty directory. The build will fail when attempting to compile this translation unit.

Verify that the referenced source file exists or that the include path is correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_distributed_mpi_no_mpi.cpp` at line 1, The translation
unit internal/metal/mlx_mlx_distributed_mpi_no_mpi.cpp includes a non-existent
file "../../lib/mlx/mlx/distributed/mpi/no_mpi.cpp", causing build failure;
either point the include to the correct existing source/header, add the missing
implementation file, or remove/replace this TU. Locate the include directive in
internal/metal/mlx_mlx_distributed_mpi_no_mpi.cpp and (a) if there is an
intended header, change the include to that header path, (b) if the
implementation belongs in lib/mlx, add the missing no_mpi.cpp into
lib/mlx/mlx/distributed/mpi/, or (c) if this TU is unnecessary, delete it or
conditionally compile it behind a proper guard so the build no longer references
the missing file.
internal/metal/mlx_mlx_linalg.cpp-1-1 (2)

1-1: ⚠️ Potential issue | 🟠 Major

ODR / multiple-definition risk: including a .cpp implementation directly.

This file includes a .cpp implementation to “forward” it into internal/metal. That pattern is fragile: if the same lib/mlx/mlx/linalg.cpp is also compiled as its own translation unit (or included by multiple shims), you can get duplicate symbol / ODR violations at link time. If this is an intentional build wiring, please ensure (and preferably document) that the implementation is included/compiled exactly once across the whole target.

Optionally, add a short comment explaining the intent and the “compile exactly once” constraint, to prevent future regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_linalg.cpp` at line 1, Remove the direct inclusion of
the implementation file "lib/mlx/mlx/linalg.cpp" from
internal/metal/mlx_mlx_linalg.cpp and instead either (A) include the
corresponding public header (e.g., linalg.h / linalg.hpp) and link against the
separately compiled lib/mlx translation unit, or (B) if the intent is to force a
single TU shim, add a clear comment above the include stating that this
translation unit intentionally provides the sole compilation of linalg.cpp and
must be built only once; ensure build rules are adjusted so linalg.cpp is not
compiled twice (search for linalg.cpp and the shim filename to update targets).
Ensure references to symbols defined in linalg (functions/classes in the linalg
translation unit) come from the header declarations, not from multiple compiled
definitions.

1-1: ⚠️ Potential issue | 🟠 Major

Compilation will fail: included file does not exist.

The path ../../lib/mlx/mlx/linalg.cpp resolves to a non-existent file. This is a systemic issue affecting roughly 150 shim files in internal/metal/ which all reference missing external source trees (lib/mlx/, lib/mlx-c/, lib/generated/). These external dependencies must be present (via git submodules, code generation, or other means) before this file can compile.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_linalg.cpp` at line 1, The compile error is caused by
the include in internal/metal/mlx_mlx_linalg.cpp referencing a non-existent file
("../../lib/mlx/mlx/linalg.cpp"); fix it by ensuring the external source trees
are present and referenced correctly—either add the required repositories as git
submodules (e.g., lib/mlx, lib/mlx-c, lib/generated) and run git submodule
update --init --recursive, or generate the missing sources before build and
update the build system to point to the correct directories; alternatively, if
these shims should include a header instead, replace the direct .cpp include
with the proper header include and adjust the build to compile the external
sources. Ensure the change touches the include in mlx_mlx_linalg.cpp and/or the
project setup so the path "../../lib/mlx/mlx/linalg.cpp" (and analogous includes
across internal/metal/* shims) resolves at build time.
internal/metal/mlx_mlx_backend_cpu_reduce.cpp-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Fix missing include target: lib/mlx/mlx/backend/cpu/reduce.cpp does not exist.

The include directive on line 1 references ../../lib/mlx/mlx/backend/cpu/reduce.cpp, which does not exist in the repository. This will cause a compilation failure. Either the path is incorrect, or the upstream file needs to be added to the repository.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_cpu_reduce.cpp` at line 1, The include
directive is referencing a non-existent file
"../../lib/mlx/mlx/backend/cpu/reduce.cpp" which breaks compilation; update the
include in internal/metal/mlx_mlx_backend_cpu_reduce.cpp to point to the correct
existing target (e.g., the proper relative path or header for the CPU reduce
implementation) or add the missing source file to the repository; specifically
check and fix the include that names reduce.cpp (or replace with the appropriate
reduce.hpp or backend header) so references to the reduce implementation
resolve.
compute.go-175-199 (1)

175-199: ⚠️ Potential issue | 🟠 Major

Guard descriptor arithmetic before the backend allocates from it.

Width*bytesPerPixel and Height*Stride are both computed in int, and the Darwin backend later converts the dimensions to int32. Large inputs can overflow the validation arithmetic, wrap SizeBytes(), or truncate during allocation, meaning the buffer can be created with a layout different from the validated descriptor. Please reject values that exceed the safe int/int32 range here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute.go` around lines 175 - 199, PixelBufferDesc.Validate and SizeBytes
must guard against integer overflow/truncation: ensure Width, Height, Stride and
the computed products (Width * bytesPerPixel and Height * Stride) are checked
against platform int and int32 limits before accepting the descriptor or
returning SizeBytes. Modify Validate (and SizeBytes if used standalone) to
perform the arithmetic in a wider type (e.g., int64), reject descriptors where
any dimension or computed byte-size exceeds math.MaxInt or the safe int32 range
(or where intermediate multiplication would overflow), and return
ComputeErrorInvalidDescriptor/ComputeErrorUnsupportedPixelFormat as appropriate;
reference PixelBufferDesc.Width, Height, Stride, Format.BytesPerPixel(),
Validate, and SizeBytes when locating the changes.
gguf_info.go-149-157 (1)

149-157: ⚠️ Potential issue | 🟠 Major

Add GGUF-metadata fallbacks for the core shape fields.

ReadGGUFInfo() parses the metadata map, but VocabSize, HiddenSize, ContextLength, and QuantGroup are still sourced only from config.json. For a standalone .gguf, those fields come back as zero even though the checkpoint metadata is already available. Please populate these fields from GGUF metadata before falling back to the sidecar config.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@gguf_info.go` around lines 149 - 157, ReadGGUFInfo should populate VocabSize,
HiddenSize, ContextLength and QuantGroup from the GGUF metadata map first, and
only fall back to config.* methods if those metadata entries are missing or
invalid. In the GGUFInfo construction (GGUFInfo{...}) update the VocabSize,
HiddenSize, ContextLength and QuantGroup assignments to: try to parse the
corresponding keys from the parsed metadata map (e.g., look up "vocab_size",
"dim"/"hidden_size", "n_ctx"/"context_length", and any quant group key present)
and use those integer values when present/valid; otherwise call
config.vocabSize(), config.hiddenSize(), config.contextLength(),
config.quantGroup() as the fallback. Ensure parsing handles numeric types and
conversion errors gracefully so zeros are not used when metadata exists.
compute_darwin.go-688-707 (1)

688-707: ⚠️ Potential issue | 🟠 Major

Reject buffers that were created by a different session.

These helpers verify type and open-state, but they never check buffer.session == session. That lets one session dispatch with another session’s buffer, so the owning session can Close() and free the underlying metal.Array while this session is still using it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute_darwin.go` around lines 688 - 707, The helpers pixelBufferLocked and
byteBufferLocked currently validate type and open state but must also reject
buffers owned by a different session; after the type assertion and nil check in
pixelBufferLocked (for *pixelBuffer) and byteBufferLocked (for *byteBuffer), add
a check if buffer.session != session and return
computeErr(ComputeErrorInvalidBuffer,
"require_pixel_buffer"/"require_byte_buffer", kernel, role, role+" must belong
to this session") before calling buffer.requireOpenLocked(); keep the existing
requireOpenLocked() call and error handling otherwise.
internal/metal/lora_merge.go-19-45 (1)

19-45: ⚠️ Potential issue | 🟠 Major

Do not free the pre-merge weight if it may be shared with another module.

mergeLinearLoRA unconditionally frees oldWeight after replacing linear.Weight with the merged result. Tied weights (embedding-to-lm_head sharing) exist in Gemma3, Gemma4, and Qwen3 models. Although standard LoRA targets focus on attention and MLP layers, the mechanism supports arbitrary custom targets via LoRAConfig.TargetKeys. If LoRA is applied to a layer whose weight is tied to another module, the peer reference becomes dangling immediately after merge.

Recommend adding ownership-aware cleanup—either skip freeing if the weight is shared, document that LoRA target selection must exclude tied layers, or use reference counting to handle shared ownership safely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/lora_merge.go` around lines 19 - 45, The current merge code
unconditionally calls Free(oldWeight, ...) after replacing linear.Weight with
merged, which can leave dangling references when weights are shared (e.g.,
embedding-to-lm_head); modify mergeLinearLoRA (the block using mergedLoRAWeight,
Dequantize, Add, Materialize, Detach, Free and assigning linear.Weight) to check
ownership before freeing: add or call a weight-ownership predicate (e.g.,
WeightIsShared(oldWeight) or linear.IsOwner flag) and only Free
oldWeight/oldScales/oldBiases when the caller owns the buffer; if no ownership
metadata exists, either incrementally track reference counts for weights or skip
freeing shared buffers and document that LoRA targets must not include tied
layers; ensure you still Free(delta) and zero-out linear.LoRA fields when
leaving the merged state.
internal/metal/lora_test.go-867-891 (1)

867-891: ⚠️ Potential issue | 🟠 Major

Do not share one weight handle across three Linears in this test.

closeGemma4(model) frees Router.Proj, PerLayerInputGate, and PerLayerProjection, so reusing the same weight array here turns teardown into repeated frees of the same handle. The extra defer Free(weight) makes that worse.

Use three distinct weight arrays here, or remove the model-level cleanup and free the shared handle exactly once.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/lora_test.go` around lines 867 - 891, The test currently
creates one weight handle via weight := FromValues(...) and defers Free(weight)
but then constructs three Linear instances (routerProj, perLayerInputGate,
perLayerProjection) that all share that handle and are later freed by
closeGemma4(model), causing double free; fix it by creating three distinct
weight handles (e.g., weightRouter, weightInputGate, weightProjection using
FromValues) and pass each to its respective NewLinear call (routerProj,
perLayerInputGate, perLayerProjection), keep closeGemma4(model) to perform
model-level cleanup, and remove the extra defer Free(weight) so each handle is
freed exactly once by the model teardown.
internal/metal/cache.go-211-217 (1)

211-217: ⚠️ Potential issue | 🟠 Major

Do not return the parent arrays after storing slices into the cache.

This branch keeps c.keys/c.values as slices of fullK/fullV, then returns fullK/fullV directly. Callers are allowed to Free the returned K/V pair, so freeing the prompt view here can invalidate the cached window that the next decode step depends on.

Proposed fix
 	if trim := cap - c.maxSize; trim > 0 {
 		// Preserve the full multi-token prompt for the current attention pass,
 		// while storing only the bounded sliding window for future decode steps.
 		c.keys = Slice(fullK, []int32{0, 0, int32(trim), 0}, []int32{B, H, int32(cap), Dk})
 		c.values = Slice(fullV, []int32{0, 0, int32(trim), 0}, []int32{B, H, int32(cap), Dv})
 		c.idx = int(c.keys.Shape()[2])
-		return fullK, fullV
+		return Slice(fullK, []int32{0, 0, 0, 0}, []int32{B, H, int32(cap), Dk}),
+			Slice(fullV, []int32{0, 0, 0, 0}, []int32{B, H, int32(cap), Dv})
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/cache.go` around lines 211 - 217, The code stores slices into
the cache (c.keys = Slice(fullK, ...), c.values = Slice(fullV, ...)) but then
returns the parent tensors fullK/fullV which callers may Free and thereby
invalidate the cache; change the return to avoid returning the parent arrays —
either return the cached slices (c.keys, c.values) or return independent copies
of those slices so freeing the return value cannot free the cached window.
Update the branch that sets c.keys/c.values and c.idx to return the safe tensor
references (c.keys/c.values or their clones) instead of fullK/fullV.
internal/metal/io_custom.go-184-199 (1)

184-199: ⚠️ Potential issue | 🟠 Major

read_at_offset must not mutate the stream cursor position.

The callback seeks to off and leaves the cursor there. This breaks stream semantics: any subsequent call to tell observes the wrong position, and mixed offset/sequential access patterns can corrupt MLX's internal IO state. Use io.ReaderAt when available, or save and restore the position around the positional read.

Suggested fix
 func goIOReadAtOffset(desc unsafe.Pointer, data *C.char, n C.size_t, off C.size_t) {
 	s := lookupIOStream(desc)
 	if s == nil {
 		return
 	}
+	buf := unsafe.Slice((*byte)(unsafe.Pointer(data)), int(n))
+	if ra, ok := s.rws.(io.ReaderAt); ok {
+		_, err := ra.ReadAt(buf, int64(off))
+		if err != nil {
+			s.good = false
+		}
+		return
+	}
+	pos, err := s.rws.Seek(0, io.SeekCurrent)
+	if err != nil {
+		s.good = false
+		return
+	}
+	defer func() {
+		if _, err := s.rws.Seek(pos, io.SeekStart); err != nil {
+			s.good = false
+		}
+	}()
 	_, err := s.rws.Seek(int64(off), io.SeekStart)
 	if err != nil {
 		s.good = false
 		return
 	}
-	buf := unsafe.Slice((*byte)(unsafe.Pointer(data)), int(n))
 	_, err = io.ReadFull(s.rws, buf)
 	if err != nil {
 		s.good = false
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/io_custom.go` around lines 184 - 199, The goIOReadAtOffset
callback currently seeks s.rws to off and leaves the cursor changed; update it
so positional reads do not mutate the stream position by: if the underlying
s.rws implements io.ReaderAt (type assertion), use its ReadAt to fill buf;
otherwise save the current position via s.rws.Seek(0, io.SeekCurrent), perform
the Seek(off, io.SeekStart) + io.ReadFull, and always restore the saved position
with a final Seek regardless of read success so tell remains correct; keep using
lookupIOStream, set s.good = false on read/seek errors as before, and ensure
restoration errors are handled (and set s.good = false) so the stream state
remains consistent.
internal/metal/gemma4_vision.go-598-620 (1)

598-620: ⚠️ Potential issue | 🟠 Major

Reject incomplete multimodal projector weights during construction.

This only treats the projector as missing when Projection == nil && Linear1 == nil. If Linear1 loads without Linear2 (or vice versa), Forward later falls back to returning normalised vision features, and multimodal injection quietly stops working when the hidden sizes differ. Fail fast here unless you have either a single Projection or the full Linear1 + Linear2 pair.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/gemma4_vision.go` around lines 598 - 620, The constructor
buildGemma4MultiModalProjector must reject incomplete multimodal projector
weights: change the readiness check so that you only accept the projector when
either Projection is present OR both Linear1 and Linear2 are present;
specifically, replace the current condition that checks Projection==nil &&
Linear1==nil with a check that fails fast (return nil) when visionCfg.HiddenSize
!= textCfg.HiddenSize AND NOT (projector.Projection != nil || (projector.Linear1
!= nil && projector.Linear2 != nil)); keep the rest of the initialization
unchanged so Forward won’t silently disable multimodal injection.
internal/metal/export.go-271-291 (1)

271-291: ⚠️ Potential issue | 🟠 Major

Return an error when the closure/imported handle is nil.

Line 284, Line 322, Line 389, and Line 424 all dereference cls or f unconditionally. A nil handle from caller code currently becomes a panic in a public API instead of a normal error return.

Also applies to: 301-329, 377-397, 404-431

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/export.go` around lines 271 - 291, The code dereferences the
closure/import handle (cls and in other funcs f) without nil-checks, causing
panics for callers who pass nil; add an early nil guard in ExportFunction (and
the other funcs that use cls/f) that checks if cls == nil || cls.ctx == nil (or
f == nil || f.ctx == nil) and return a proper error via core.E (e.g.,
core.E("mlx.ExportFunction", core.Sprintf("nil closure handle"), nil)) instead
of dereferencing, preserving existing error-return style; apply the same pattern
to every function in this file that currently accesses cls.ctx or f.ctx
unguarded.
internal/metal/lora.go-181-207 (1)

181-207: ⚠️ Potential issue | 🟠 Major

Normalise or reject non-positive LoRA ranks here.

normalizeLoRAConfig only fixes Rank == 0. A negative rank still reaches later shape construction and alpha / rank, which turns a bad caller config into invalid tensor dimensions instead of a predictable failure path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/lora.go` around lines 181 - 207, normalizeLoRAConfig currently
only fixes Rank == 0 but allows negative ranks; update the function so it treats
non-positive ranks uniformly: if cfg.Rank <= 0 { cfg.Rank = 8 } (or another
agreed default) before any uses, then proceed with the existing Alpha/Scale
logic (so Alpha / float32(cfg.Rank) won't divide by a negative/zero). Modify
normalizeLoRAConfig to perform this check at the top (referencing cfg.Rank,
cfg.Alpha, cfg.Scale) so downstream shape/alpha computations are safe and
deterministic.
internal/metal/export.go-314-320 (1)

314-320: ⚠️ Potential issue | 🟠 Major

Guard kwarg arrays before reading arr.ctx.

Unlike the positional-argument paths, the kwarg loops do not check for nil or invalid arrays. A caller passing map[string]*Array{"x": nil} will panic at Line 318 or Line 417.

Also applies to: 415-419

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/export.go` around lines 314 - 320, The kwarg loop that builds
kwargsMap reads arr.ctx without guarding for nil, so a map entry like
map[string]*Array{"x": nil} will panic; update the loop over kwargs in export.go
to check each arr for nil (and possibly arr.ctx == nil/invalid) before calling
C.mlx_map_string_to_array_insert, either skipping nil entries or returning an
error; ensure you still free the C string (cName) in all code paths and apply
the same nil-guard fix to the similar loop at the other occurrence (around the
415-419 block) to avoid dereferencing nil arr.ctx.
🟡 Minor comments (6)
docs/development.md-69-72 (1)

69-72: ⚠️ Potential issue | 🟡 Minor

Update the CMake example to match the new GGUF requirement.

This section now says MLX_BUILD_GGUF=ON, but the later CMake example still shows OFF and “GGUF not supported”. That contradiction will send contributors towards the wrong build configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/development.md` around lines 69 - 72, Update the CMake example so it
matches the new GGUF requirement: change the MLX_BUILD_GGUF flag in the example
from OFF to ON and update the explanatory text that currently says “GGUF not
supported” to reflect that GGUF is now supported; locate the CMake snippet and
the accompanying sentence referencing MLX_BUILD_GGUF and ensure both the flag
and description consistently indicate MLX_BUILD_GGUF=ON (GGUF supported).
README.md-3-11 (1)

3-11: ⚠️ Potential issue | 🟡 Minor

Keep the stated Go version consistent.

The badge says Go 1.26, while the metadata line still says Go 1.25. Readers will not know which toolchain the project actually expects.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 3 - 11, The README has inconsistent Go versions: the
shield badge shows "Go 1.26" while the metadata line states "Language: Go 1.25";
pick the correct toolchain version and make both places match by updating either
the badge or the metadata string (the "Go-1.26" badge markup and the "Language:
Go 1.25" text) so the README consistently reflects the intended Go version.
README.md-15-29 (1)

15-29: ⚠️ Potential issue | 🟡 Minor

Check err before deferring Close() in the quick-start snippet.

As written, the example defers model.Close() before handling a failed LoadModel call, so the first copy/paste error path can panic instead of showing the load failure cleanly.

Proposed doc fix
 model, err := inference.LoadModel("/Volumes/Data/lem/safetensors/gemma-3-1b/")
+if err != nil {
+    panic(err)
+}
 defer model.Close()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 15 - 29, The example defers model.Close() before
checking the error from inference.LoadModel, which can cause a panic if
LoadModel fails; change the snippet to check err immediately after calling
inference.LoadModel and only defer model.Close() after confirming err == nil
(i.e., move the defer model.Close() to after the error check for the result of
inference.LoadModel) so that the LoadModel error is handled cleanly.
internal/metal/io_custom_test.go-15-18 (1)

15-18: ⚠️ Potential issue | 🟡 Minor

Track the written length separately from the cursor.

Bytes() currently slices to b.pos, so any backwards Seek before the final read will truncate valid data that was already written past the current cursor. This helper needs a high-water mark, not just the live position.

Proposed fix
 type bytesRWS struct {
 	data []byte
 	pos  int
+	end  int
 }
 
 func newBytesRWS(initial []byte) *bytesRWS {
 	cp := make([]byte, len(initial))
 	copy(cp, initial)
-	return &bytesRWS{data: cp, pos: 0}
+	return &bytesRWS{data: cp, pos: 0, end: len(cp)}
 }
 
 func newBytesRWSSize(size int) *bytesRWS {
-	return &bytesRWS{data: make([]byte, size), pos: 0}
+	return &bytesRWS{data: make([]byte, size), pos: 0, end: 0}
 }
 
 func (b *bytesRWS) Read(p []byte) (int, error) {
-	if b.pos >= len(b.data) {
+	if b.pos >= b.end {
 		return 0, io.EOF
 	}
-	n := copy(p, b.data[b.pos:])
+	n := copy(p, b.data[b.pos:b.end])
 	b.pos += n
 	return n, nil
 }
 
 func (b *bytesRWS) Write(p []byte) (int, error) {
 	// Grow if needed
 	needed := b.pos + len(p)
 	if needed > len(b.data) {
 		grown := make([]byte, needed)
 		copy(grown, b.data)
 		b.data = grown
 	}
 	n := copy(b.data[b.pos:], p)
 	b.pos += n
+	if b.pos > b.end {
+		b.end = b.pos
+	}
 	return n, nil
 }
 
 func (b *bytesRWS) Bytes() []byte {
-	return b.data[:b.pos]
+	return b.data[:b.end]
 }

Also applies to: 39-49, 71-73

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/io_custom_test.go` around lines 15 - 18, The bytesRWS helper
currently uses pos as the source of truth for Bytes(), which causes backwards
Seek to truncate previously written data; add a separate high-water mark field
(e.g., hi or high) to bytesRWS and update all write paths (Write, WriteAt, or
any method that appends/writes into data) to set high = max(high, pos after
write, pos+len(buf) as appropriate), leave Seek to only adjust pos, and change
Bytes() to return data[:high] so reads/Bytes() reflect the furthest written
extent rather than the current cursor.
internal/metal/device.go-28-38 (1)

28-38: ⚠️ Potential issue | 🟡 Minor

Remove unnecessary mlx_device_new() call in currentDefaultDevice().

Line 30 allocates a device that is immediately overwritten by mlx_get_default_device(&dev) on line 33. Use a zero-initialised variable instead:

var dev C.mlx_device
defer C.mlx_device_free(dev)

if rc := C.mlx_get_default_device(&dev); rc != 0 {

This avoids the wasteful allocation and matches the correct ownership semantics where mlx_get_default_device writes the default device into the provided pointer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/device.go` around lines 28 - 38, In currentDefaultDevice(),
remove the unnecessary allocation call to C.mlx_device_new() and instead declare
a zero-initialised C.mlx_device variable (dev) that you pass to
C.mlx_get_default_device(&dev), keeping the existing defer
C.mlx_device_free(dev); this avoids allocating a device that is immediately
overwritten and preserves correct ownership semantics when calling
C.mlx_get_default_device.
adapter.go-84-101 (1)

84-101: ⚠️ Potential issue | 🟡 Minor

Normalise nil contexts in the buffered APIs as well.

Generate and Chat pass ctx straight through, while GenerateStream and ChatStream already default nil to context.Background(). Keeping the buffered path unguarded makes nil-context behaviour inconsistent and pushes a needless panic/error edge into the backend.

Also applies to: 136-153

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapter.go` around lines 84 - 101, The Generate method on InferenceAdapter
currently forwards a potentially nil ctx to adapter.model.Generate causing
inconsistent behavior versus GenerateStream/ChatStream; update Generate (and
similarly Chat, and the buffered implementations referenced around functions
Generate/Chat at the 136-153 region) to normalize a nil context by setting ctx =
context.Background() when ctx == nil before calling adapter.model.Generate or
adapter.model.Chat, ensuring genOptsToInference(opts) is still applied and
preserving existing error/metrics handling.

Comment thread compute_darwin.go
Comment thread compute_darwin.go
Comment thread internal/metal/mlx_mlx_backend_common_compiled.cpp
Comment thread internal/metal/mlx_mlx_backend_common_reduce.cpp
Comment thread internal/metal/mlx_mlx_backend_cpu_distributed.cpp
Comment thread internal/metal/mlx_mlx_distributed_utils.cpp
Comment thread internal/metal/mlx_mlx_einsum.cpp
Comment thread internal/metal/mlx_mlx_export.cpp
Comment thread internal/metal/mlx_mlx_fft.cpp
Comment thread internal/metal/mlx_mlx_io_no_gguf.cpp
Round 2 follow-up to 061da30. Closes residual CodeRabbit findings.

Code:
- compute.go: session close synchronizes before freeing resources
- compute.go: filter kernels reject source/destination stride mismatches
- compute.go: pixel/byte buffer helpers reject buffers from another
  session
- compute.go: pixel descriptor validation catches int overflow before
  allocation
- gguf_info.go: GGUF metadata fallback for vocab/hidden/context sizes
  when explicit fields absent
- internal/metal/cache.go: rotating KV cache trim no longer returns
  parent arrays (was leaking shared state)
- internal/metal/io_custom.go: read_at_offset preserves stream cursor;
  test helper tracks high-water length
- internal/metal/export.go: nil handle + nil kwarg array errors
- internal/metal/gemma4_vision.go: multimodal projector rejects
  incomplete Linear1/Linear2 pairs
- internal/metal/lora.go: rank normalization handles negative ranks
- internal/metal/lora_merge.go: no longer explicitly frees possibly
  tied base weights
- internal/metal/lora_test.go: Gemma4 LoRA test uses distinct weights
- internal/metal/device.go: currentDefaultDevice doesn't allocate-
  then-overwrite

C++ forwarding hygiene:
- internal/metal/*.cpp: added __has_include guards + #error
  diagnostics to forwarding TUs (single local compile entrypoints)
- docs/development.md: clarified wrapper purpose

API:
- adapter.go: buffered Generate/Chat normalize nil contexts

Doc:
- README + docs: Go 1.26 consistency; quick-start error check; direct
  .gguf architecture wording; dependency namespace alignment

Verification: gofmt clean, GOWORK=off go vet + go test -count=1 ./...
pass with explicit cache paths.

Note: pre-existing 'lib/mlx' submodule dirty marker untouched.

Closes residual findings on #3

Co-authored-by: Codex <noreply@openai.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (1)
internal/metal/mlx_mlx_backend_metal_event.cpp (1)

1-4: ⚠️ Potential issue | 🔴 Critical

This forwarding TU still hard-fails when lib/mlx is not present at that fixed relative path.

The same path issue previously reported remains, and the new #error now makes it a deterministic compile failure when the file is absent (matching the static-analysis failure). Please move this to build-system wiring (target source/link/include config) or a build-provided macro path instead of a fixed relative .cpp include.

#!/bin/bash
set -euo pipefail

wrapper="internal/metal/mlx_mlx_backend_metal_event.cpp"
rel="../../lib/mlx/mlx/backend/metal/event.cpp"

echo "== Wrapper =="
test -f "$wrapper" && echo "OK: $wrapper exists"

echo "== Resolved forwarded path =="
wrapper_dir="$(dirname "$wrapper")"
resolved="$(cd "$wrapper_dir" && realpath -m "$rel")"
echo "$resolved"
if [ -f "$resolved" ]; then
  echo "OK: forwarded source exists"
else
  echo "FAIL: forwarded source missing"
fi

echo "== Build references =="
rg -n --type=cmake -C2 'mlx_mlx_backend_metal_event\.cpp|internal/metal|FetchContent|add_subdirectory\(.*mlx|MLX_BUILD' .

echo "== event.cpp candidates in repo =="
fd -t f -a 'event.cpp' .

Expected result: either the resolved forwarded file exists, or CMake excludes this TU unless that dependency layout is present.

🧹 Nitpick comments (7)
docs/models.md (1)

27-33: Clarify the mutually exclusive weight-file requirement in the directory checklist.

This list reads a bit ambiguously with the trailing “or”. Consider nesting the alternatives under a single “weights” bullet so readers do not interpret it as needing both formats.

Suggested doc tweak
-When loading a directory, it must contain:
+When loading a directory, it must contain:
 
 - `config.json` -- model configuration (architecture, dimensions, quantisation)
 - `tokenizer.json` -- HuggingFace BPE tokeniser
-- One or more `*.safetensors` files -- model weights (multi-shard supported), or
-- Exactly one `*.gguf` file -- model weights in GGUF format
+- Weights, in one of these formats:
+  - One or more `*.safetensors` files (multi-shard supported), or
+  - Exactly one `*.gguf` file
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/models.md` around lines 27 - 33, The checklist is ambiguous about model
weight files; update the documentation to group the weight-file alternatives
under a single "weights" bullet that states the options are mutually exclusive
(either one or more `*.safetensors` shards OR exactly one `*.gguf` file), and
keep `config.json` and `tokenizer.json` as separate required bullets; explicitly
mention `config.json`, `tokenizer.json`, and the mutually exclusive `weights`
options (`*.safetensors` shards or a single `*.gguf`) so readers don’t think
both formats are required.
internal/metal/mlx_mlx_io_no_gguf.cpp (1)

1-1: Consider guarding __has_include for defensive consistency (optional refactoring).

Although the project enforces C++17, which guarantees __has_include support, this pattern is used across 100+ forwarding translation units. Adding defined(__has_include) would be a defensive measure, but applying it only to this file creates inconsistency. Apply systematically across all similar files if adopted.

♻️ Proposed guard if adopted project-wide
-#if __has_include("../../lib/mlx/mlx/io/no_gguf.cpp")
+#if defined(__has_include) && __has_include("../../lib/mlx/mlx/io/no_gguf.cpp")
 #include "../../lib/mlx/mlx/io/no_gguf.cpp"
 #else
 #error "Missing forwarded source: ../../lib/mlx/mlx/io/no_gguf.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
 #endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_io_no_gguf.cpp` at line 1, The file uses __has_include
directly in the preprocessor check (see the token __has_include in
mlx_mlx_io_no_gguf.cpp); to be defensive, change the conditional to first test
defined(__has_include) before calling __has_include, and if you adopt this
change do the same across the other forwarding translation units that use
__has_include to keep behavior consistent; update the opening preprocessor guard
around no_gguf.cpp to use defined(__has_include) && __has_include(...) and apply
the same pattern to the other similar files.
internal/metal/mlx_mlx_backend_cpu_binary.cpp (1)

1-5: Guard __has_include for widened toolchain compatibility (optional).

The codebase enforces C++17, where __has_include is a standard preprocessor feature supported by all compliant compilers (GCC 5.0+, Clang 3.6+, MSVC 2015+). The current pattern across 100+ forwarding includes is technically sound and requires no change.

If you wish to improve robustness for non-standard or experimental toolchains, the suggested guard can be applied uniformly across the affected files in internal/metal/. However, this refactoring is optional given the project's C++17 requirement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_cpu_binary.cpp` around lines 1 - 5, The
guarded include uses the preprocessor feature __has_include directly; to improve
robustness for non-standard toolchains, first ensure __has_include is defined
(e.g., add a pre-check that defines a fallback macro when __has_include is not
defined) and then keep the existing conditional include and `#error` message
as-is; target the macro symbol __has_include and the existing `#error` string
("Missing forwarded source: ../../lib/mlx/mlx/backend/cpu/binary.cpp. Initialise
submodules with git submodule update --init --recursive or fix the forwarding
include path.") so the change uniformly wraps the current logic and can be
applied to other forwarding includes in internal/metal/.
README.md (1)

1-3: Use UK English spelling "Licence" in the badge.

The badge text uses American spelling "License" but the coding guidelines require UK English spelling. The badge URL path uses EUPL--1.2 which is fine, but the display text should match UK conventions.

-[![License: EUPL-1.2](https://img.shields.io/badge/License-EUPL--1.2-blue.svg)](LICENSE.md)
+[![Licence: EUPL-1.2](https://img.shields.io/badge/Licence-EUPL--1.2-blue.svg)](LICENSE.md)

As per coding guidelines: "Use UK English spelling (colour, organisation, centre, initialise, behaviour) in all code and documentation".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 1 - 3, Update the README badge display text to UK
English by changing the badge alt/link text "[![License: EUPL-1.2]" to
"[![Licence: EUPL-1.2]"; leave the URL/target (the image URL and the link to
LICENSE.md) unchanged. This targets the badge markdown snippet currently showing
"License" so it reads "Licence" to conform with the project's UK English
spelling guideline.
compute_test.go (1)

29-50: Consider aligning test names with suffix convention.

The coding guidelines specify _Bad as a suffix for expected error conditions. Currently TestPixelBufferDesc_Validate_BadStride has _Bad in the middle. For consistency, consider renaming to TestPixelBufferDesc_Validate_Stride_Bad.

Similarly for TestPixelBufferDesc_Validate_BadByteLengthOverflow at line 64.

Suggested naming
-func TestPixelBufferDesc_Validate_BadStride(t *testing.T) {
+func TestPixelBufferDesc_Validate_Stride_Bad(t *testing.T) {
-func TestPixelBufferDesc_Validate_BadByteLengthOverflow(t *testing.T) {
+func TestPixelBufferDesc_Validate_ByteLengthOverflow_Bad(t *testing.T) {

As per coding guidelines: "Use test suffix convention: _Good for happy path, _Bad for expected error conditions, _Ugly for panic/edge cases".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute_test.go` around lines 29 - 50, Rename the test functions to follow
the `_Bad` suffix convention: change TestPixelBufferDesc_Validate_BadStride to
TestPixelBufferDesc_Validate_Stride_Bad and change
TestPixelBufferDesc_Validate_BadByteLengthOverflow to
TestPixelBufferDesc_Validate_ByteLengthOverflow_Bad (update any references/calls
and the function names themselves so TestPixelBufferDesc_Validate_Stride_Bad and
TestPixelBufferDesc_Validate_ByteLengthOverflow_Bad replace the old
identifiers).
compute_darwin.go (1)

216-237: Consider upper bound validation for byte buffer size.

NewByteBuffer validates size > 0 but does not guard against values exceeding math.MaxInt32. On 64-bit systems, the int32(size) conversion at line 231 could silently overflow for extremely large allocations.

In practice this is unlikely (buffers > 2GB are rare), but adding an upper bound check would make the validation complete and consistent with PixelBufferDesc.Validate().

Optional defensive check
 func (session *computeSession) NewByteBuffer(size int) (ByteBuffer, error) {
 	if size <= 0 {
 		return nil, computeErr(ComputeErrorInvalidAllocation, "new_byte_buffer", "", "size", "byte buffer size must be positive")
 	}
+	if size > math.MaxInt32 {
+		return nil, computeErr(ComputeErrorInvalidAllocation, "new_byte_buffer", "", "size", "byte buffer size exceeds int32 limit")
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute_darwin.go` around lines 216 - 237, NewByteBuffer currently only
checks size > 0 but then casts size to int32 for metal.Zeros causing a potential
overflow; add an upper-bound validation that rejects sizes larger than
math.MaxInt32 (or the same limit used by PixelBufferDesc.Validate()) and return
computeErr(ComputeErrorInvalidAllocation, "new_byte_buffer", "", "size", "...")
when exceeded; update NewByteBuffer (around the int32(size) conversion and
before constructing byteBuffer/bufferBase) to perform this check so the
int32(size) cast is safe and session.buffers registration remains unchanged
(symbols: NewByteBuffer, byteBuffer, bufferBase, metal.Zeros, session.buffers).
internal/metal/io_custom_test.go (1)

96-103: Add explicit Array cleanup via t.Cleanup in tests.

These tests allocate *Array values but do not explicitly release them. Although finalizers will eventually free GPU memory, the coding guidelines prefer explicit Free() calls for deterministic release. Add t.Cleanup with Free() to ensure immediate reclamation, particularly important for larger test suites where GC timing may not be reliable.

Affected tests:

  • TestIOCustom_RoundTrip_Good: clean up a, b (allocated at lines 96–97) and loaded map
  • TestIOCustom_WithMetadata_Good: clean up a (line 155) and loaded map
  • TestIOCustom_IteratorBreak_Ugly: clean up a, b, c (lines 216–218)
Suggested pattern
 func TestIOCustom_RoundTrip_Good(t *testing.T) {
 	a := FromValues([]float32{1, 2, 3, 4}, 2, 2)
 	b := FromValues([]float32{10, 20, 30}, 3)
+	t.Cleanup(func() {
+		if a != nil {
+			a.Free()
+		}
+		if b != nil {
+			b.Free()
+		}
+	})
 	Materialize(a, b)
@@
 	loaded, err := LoadAllSafetensorsFromReader(reader, int64(len(written)), "test-memory")
 	if err != nil {
 		t.Fatalf("LoadAllSafetensorsFromReader: %v", err)
 	}
+	t.Cleanup(func() {
+		for _, arr := range loaded {
+			if arr != nil {
+				arr.Free()
+			}
+		}
+	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/io_custom_test.go` around lines 96 - 103, The tests allocate
GPU-backed *Array values without deterministic cleanup; for each affected test
(TestIOCustom_RoundTrip_Good, TestIOCustom_WithMetadata_Good,
TestIOCustom_IteratorBreak_Ugly) add t.Cleanup calls that invoke Free() on every
allocated Array (e.g., a.Free(), b.Free(), c.Free()) and also free any arrays in
the loaded map to ensure deterministic release; locate the allocations
(variables named a, b, c and the loaded map) and register t.Cleanup closures
that iterate and call Free() for each Array to guarantee deterministic GPU
memory reclamation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/development.md`:
- Around line 33-35: The fenced code block containing the line "replace
dappco.re/go/inference => ../go-inference" lacks a language tag; add "go" after
the opening backticks so the block reads "```go" to satisfy MD040 and restore Go
syntax highlighting for that snippet.
- Around line 72-75: The docs state MLX_BUILD_GGUF=ON earlier but the later
CMake snippet still sets MLX_BUILD_GGUF=OFF; update the later snippet and any
accompanying text to consistently use MLX_BUILD_GGUF=ON (and keep
MLX_BUILD_SAFETENSORS=ON / BUILD_SHARED_LIBS=ON as documented) so local build
instructions are not misleading; locate the CMake example that contains
MLX_BUILD_GGUF and change it to ON and adjust any explanatory lines to match.

In `@gguf_info_test.go`:
- Around line 227-236: The test function TestReadGGUFInfo_Bad_InvalidMagic
should be renamed so the `_Bad` suffix is at the end (e.g.
TestReadGGUFInfo_InvalidMagic_Bad); update the function name in this file (and
any references) to the new identifier, keep the test body unchanged, and run `go
test` to ensure the renamed test still runs and passes the naming convention
check.

In `@internal/metal/io_custom_test.go`:
- Around line 56-70: The Seek implementation in bytesRWS uses
io.ErrUnexpectedEOF for invalid whence and negative newPos which is incorrect;
update bytesRWS.Seek to return descriptive errors instead (for example
errors.New("bytesRWS.Seek: invalid whence") when the whence switch hits default,
and errors.New("bytesRWS.Seek: negative position") when newPos < 0) so callers
can distinguish malformed seek parameters from EOF/read errors; replace the two
io.ErrUnexpectedEOF returns accordingly while keeping all other logic in
bytesRWS.Seek unchanged.

In `@internal/metal/mlx_mlx_backend_metal_eval.cpp`:
- Line 4: CI is failing because C++ wrapper files like
internal/metal/mlx_mlx_backend_metal_eval.cpp contain a `#error` that triggers
when lib/mlx and other submodules are not present; update the test workflow to
ensure submodules are checked out/initialised by adding submodules: recursive to
the actions/checkout@v4 step in the test.yml workflow and/or run git submodule
update --init --recursive before the build so CGO_ENABLED=1 C++ compilation
succeeds.

---

Nitpick comments:
In `@compute_darwin.go`:
- Around line 216-237: NewByteBuffer currently only checks size > 0 but then
casts size to int32 for metal.Zeros causing a potential overflow; add an
upper-bound validation that rejects sizes larger than math.MaxInt32 (or the same
limit used by PixelBufferDesc.Validate()) and return
computeErr(ComputeErrorInvalidAllocation, "new_byte_buffer", "", "size", "...")
when exceeded; update NewByteBuffer (around the int32(size) conversion and
before constructing byteBuffer/bufferBase) to perform this check so the
int32(size) cast is safe and session.buffers registration remains unchanged
(symbols: NewByteBuffer, byteBuffer, bufferBase, metal.Zeros, session.buffers).

In `@compute_test.go`:
- Around line 29-50: Rename the test functions to follow the `_Bad` suffix
convention: change TestPixelBufferDesc_Validate_BadStride to
TestPixelBufferDesc_Validate_Stride_Bad and change
TestPixelBufferDesc_Validate_BadByteLengthOverflow to
TestPixelBufferDesc_Validate_ByteLengthOverflow_Bad (update any references/calls
and the function names themselves so TestPixelBufferDesc_Validate_Stride_Bad and
TestPixelBufferDesc_Validate_ByteLengthOverflow_Bad replace the old
identifiers).

In `@docs/models.md`:
- Around line 27-33: The checklist is ambiguous about model weight files; update
the documentation to group the weight-file alternatives under a single "weights"
bullet that states the options are mutually exclusive (either one or more
`*.safetensors` shards OR exactly one `*.gguf` file), and keep `config.json` and
`tokenizer.json` as separate required bullets; explicitly mention `config.json`,
`tokenizer.json`, and the mutually exclusive `weights` options (`*.safetensors`
shards or a single `*.gguf`) so readers don’t think both formats are required.

In `@internal/metal/io_custom_test.go`:
- Around line 96-103: The tests allocate GPU-backed *Array values without
deterministic cleanup; for each affected test (TestIOCustom_RoundTrip_Good,
TestIOCustom_WithMetadata_Good, TestIOCustom_IteratorBreak_Ugly) add t.Cleanup
calls that invoke Free() on every allocated Array (e.g., a.Free(), b.Free(),
c.Free()) and also free any arrays in the loaded map to ensure deterministic
release; locate the allocations (variables named a, b, c and the loaded map) and
register t.Cleanup closures that iterate and call Free() for each Array to
guarantee deterministic GPU memory reclamation.

In `@internal/metal/mlx_mlx_backend_cpu_binary.cpp`:
- Around line 1-5: The guarded include uses the preprocessor feature
__has_include directly; to improve robustness for non-standard toolchains, first
ensure __has_include is defined (e.g., add a pre-check that defines a fallback
macro when __has_include is not defined) and then keep the existing conditional
include and `#error` message as-is; target the macro symbol __has_include and the
existing `#error` string ("Missing forwarded source:
../../lib/mlx/mlx/backend/cpu/binary.cpp. Initialise submodules with git
submodule update --init --recursive or fix the forwarding include path.") so the
change uniformly wraps the current logic and can be applied to other forwarding
includes in internal/metal/.

In `@internal/metal/mlx_mlx_io_no_gguf.cpp`:
- Line 1: The file uses __has_include directly in the preprocessor check (see
the token __has_include in mlx_mlx_io_no_gguf.cpp); to be defensive, change the
conditional to first test defined(__has_include) before calling __has_include,
and if you adopt this change do the same across the other forwarding translation
units that use __has_include to keep behavior consistent; update the opening
preprocessor guard around no_gguf.cpp to use defined(__has_include) &&
__has_include(...) and apply the same pattern to the other similar files.

In `@README.md`:
- Around line 1-3: Update the README badge display text to UK English by
changing the badge alt/link text "[![License: EUPL-1.2]" to "[![Licence:
EUPL-1.2]"; leave the URL/target (the image URL and the link to LICENSE.md)
unchanged. This targets the badge markdown snippet currently showing "License"
so it reads "Licence" to conform with the project's UK English spelling
guideline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bbd59bee-7b0e-40cf-9f10-7fb435099ddf

📥 Commits

Reviewing files that changed from the base of the PR and between 061da30 and 0024239.

📒 Files selected for processing (164)
  • README.md
  • adapter.go
  • compute.go
  • compute_darwin.go
  • compute_darwin_test.go
  • compute_test.go
  • docs/development.md
  • docs/models.md
  • gguf_info.go
  • gguf_info_test.go
  • internal/metal/cache.go
  • internal/metal/device.go
  • internal/metal/export.go
  • internal/metal/export_test.go
  • internal/metal/gemma4_vision.go
  • internal/metal/io_custom.go
  • internal/metal/io_custom_test.go
  • internal/metal/lora.go
  • internal/metal/lora_merge.go
  • internal/metal/lora_test.go
  • internal/metal/mlx_gen_cpu_compiled_preamble.cpp
  • internal/metal/mlx_gen_metal_jit_binary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_gather.cpp
  • internal/metal/mlx_gen_metal_jit_gather_axis.cpp
  • internal/metal/mlx_gen_metal_jit_gather_front.cpp
  • internal/metal/mlx_gen_metal_jit_hadamard.cpp
  • internal/metal/mlx_gen_metal_jit_masked_scatter.cpp
  • internal/metal/mlx_gen_metal_jit_reduce_utils.cpp
  • internal/metal/mlx_gen_metal_jit_scatter.cpp
  • internal/metal/mlx_gen_metal_jit_scatter_axis.cpp
  • internal/metal/mlx_gen_metal_jit_ternary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_unary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_utils.cpp
  • internal/metal/mlx_mlx_array.cpp
  • internal/metal/mlx_mlx_backend_common_broadcasting.cpp
  • internal/metal/mlx_mlx_backend_common_common.cpp
  • internal/metal/mlx_mlx_backend_common_compiled.cpp
  • internal/metal/mlx_mlx_backend_common_load.cpp
  • internal/metal/mlx_mlx_backend_common_reduce.cpp
  • internal/metal/mlx_mlx_backend_common_slicing.cpp
  • internal/metal/mlx_mlx_backend_common_utils.cpp
  • internal/metal/mlx_mlx_backend_cpu_arg_reduce.cpp
  • internal/metal/mlx_mlx_backend_cpu_available.cpp
  • internal/metal/mlx_mlx_backend_cpu_binary.cpp
  • internal/metal/mlx_mlx_backend_cpu_cholesky.cpp
  • internal/metal/mlx_mlx_backend_cpu_compiled.cpp
  • internal/metal/mlx_mlx_backend_cpu_conv.cpp
  • internal/metal/mlx_mlx_backend_cpu_copy.cpp
  • internal/metal/mlx_mlx_backend_cpu_distributed.cpp
  • internal/metal/mlx_mlx_backend_cpu_eig.cpp
  • internal/metal/mlx_mlx_backend_cpu_eigh.cpp
  • internal/metal/mlx_mlx_backend_cpu_encoder.cpp
  • internal/metal/mlx_mlx_backend_cpu_eval.cpp
  • internal/metal/mlx_mlx_backend_cpu_fft.cpp
  • internal/metal/mlx_mlx_backend_cpu_gemms_bnns.cpp
  • internal/metal/mlx_mlx_backend_cpu_gemms_cblas.cpp
  • internal/metal/mlx_mlx_backend_cpu_hadamard.cpp
  • internal/metal/mlx_mlx_backend_cpu_indexing.cpp
  • internal/metal/mlx_mlx_backend_cpu_inverse.cpp
  • internal/metal/mlx_mlx_backend_cpu_jit_compiler.cpp
  • internal/metal/mlx_mlx_backend_cpu_logsumexp.cpp
  • internal/metal/mlx_mlx_backend_cpu_luf.cpp
  • internal/metal/mlx_mlx_backend_cpu_masked_mm.cpp
  • internal/metal/mlx_mlx_backend_cpu_matmul.cpp
  • internal/metal/mlx_mlx_backend_cpu_primitives.cpp
  • internal/metal/mlx_mlx_backend_cpu_qrf.cpp
  • internal/metal/mlx_mlx_backend_cpu_quantized.cpp
  • internal/metal/mlx_mlx_backend_cpu_reduce.cpp
  • internal/metal/mlx_mlx_backend_cpu_scan.cpp
  • internal/metal/mlx_mlx_backend_cpu_select.cpp
  • internal/metal/mlx_mlx_backend_cpu_softmax.cpp
  • internal/metal/mlx_mlx_backend_cpu_sort.cpp
  • internal/metal/mlx_mlx_backend_cpu_svd.cpp
  • internal/metal/mlx_mlx_backend_cpu_threefry.cpp
  • internal/metal/mlx_mlx_backend_cpu_unary.cpp
  • internal/metal/mlx_mlx_backend_gpu_copy.cpp
  • internal/metal/mlx_mlx_backend_gpu_primitives.cpp
  • internal/metal/mlx_mlx_backend_gpu_slicing.cpp
  • internal/metal/mlx_mlx_backend_metal_allocator.cpp
  • internal/metal/mlx_mlx_backend_metal_binary.cpp
  • internal/metal/mlx_mlx_backend_metal_compiled.cpp
  • internal/metal/mlx_mlx_backend_metal_conv.cpp
  • internal/metal/mlx_mlx_backend_metal_copy.cpp
  • internal/metal/mlx_mlx_backend_metal_custom_kernel.cpp
  • internal/metal/mlx_mlx_backend_metal_device.cpp
  • internal/metal/mlx_mlx_backend_metal_distributed.cpp
  • internal/metal/mlx_mlx_backend_metal_eval.cpp
  • internal/metal/mlx_mlx_backend_metal_event.cpp
  • internal/metal/mlx_mlx_backend_metal_fence.cpp
  • internal/metal/mlx_mlx_backend_metal_fft.cpp
  • internal/metal/mlx_mlx_backend_metal_hadamard.cpp
  • internal/metal/mlx_mlx_backend_metal_indexing.cpp
  • internal/metal/mlx_mlx_backend_metal_logsumexp.cpp
  • internal/metal/mlx_mlx_backend_metal_matmul.cpp
  • internal/metal/mlx_mlx_backend_metal_metal.cpp
  • internal/metal/mlx_mlx_backend_metal_nojit_kernels.cpp
  • internal/metal/mlx_mlx_backend_metal_normalization.cpp
  • internal/metal/mlx_mlx_backend_metal_primitives.cpp
  • internal/metal/mlx_mlx_backend_metal_quantized.cpp
  • internal/metal/mlx_mlx_backend_metal_reduce.cpp
  • internal/metal/mlx_mlx_backend_metal_resident.cpp
  • internal/metal/mlx_mlx_backend_metal_rope.cpp
  • internal/metal/mlx_mlx_backend_metal_scaled_dot_product_attention.cpp
  • internal/metal/mlx_mlx_backend_metal_scan.cpp
  • internal/metal/mlx_mlx_backend_metal_slicing.cpp
  • internal/metal/mlx_mlx_backend_metal_softmax.cpp
  • internal/metal/mlx_mlx_backend_metal_sort.cpp
  • internal/metal/mlx_mlx_backend_metal_ternary.cpp
  • internal/metal/mlx_mlx_backend_metal_unary.cpp
  • internal/metal/mlx_mlx_backend_metal_utils.cpp
  • internal/metal/mlx_mlx_compile.cpp
  • internal/metal/mlx_mlx_device.cpp
  • internal/metal/mlx_mlx_distributed_distributed.cpp
  • internal/metal/mlx_mlx_distributed_jaccl_no_jaccl.cpp
  • internal/metal/mlx_mlx_distributed_mpi_no_mpi.cpp
  • internal/metal/mlx_mlx_distributed_nccl_no_nccl.cpp
  • internal/metal/mlx_mlx_distributed_ops.cpp
  • internal/metal/mlx_mlx_distributed_primitives.cpp
  • internal/metal/mlx_mlx_distributed_ring_no_ring.cpp
  • internal/metal/mlx_mlx_distributed_utils.cpp
  • internal/metal/mlx_mlx_dtype.cpp
  • internal/metal/mlx_mlx_dtype_utils.cpp
  • internal/metal/mlx_mlx_einsum.cpp
  • internal/metal/mlx_mlx_export.cpp
  • internal/metal/mlx_mlx_fast.cpp
  • internal/metal/mlx_mlx_fft.cpp
  • internal/metal/mlx_mlx_graph_utils.cpp
  • internal/metal/mlx_mlx_io_gguf.cpp
  • internal/metal/mlx_mlx_io_gguf_quants.cpp
  • internal/metal/mlx_mlx_io_load.cpp
  • internal/metal/mlx_mlx_io_no_gguf.cpp
  • internal/metal/mlx_mlx_io_safetensors.cpp
  • internal/metal/mlx_mlx_linalg.cpp
  • internal/metal/mlx_mlx_ops.cpp
  • internal/metal/mlx_mlx_primitives.cpp
  • internal/metal/mlx_mlx_random.cpp
  • internal/metal/mlx_mlx_scheduler.cpp
  • internal/metal/mlx_mlx_transforms.cpp
  • internal/metal/mlx_mlx_utils.cpp
  • internal/metal/mlx_mlx_version.cpp
  • internal/metal/mlxc_array.cpp
  • internal/metal/mlxc_closure.cpp
  • internal/metal/mlxc_compile.cpp
  • internal/metal/mlxc_device.cpp
  • internal/metal/mlxc_distributed.cpp
  • internal/metal/mlxc_distributed_group.cpp
  • internal/metal/mlxc_error.cpp
  • internal/metal/mlxc_export.cpp
  • internal/metal/mlxc_fast.cpp
  • internal/metal/mlxc_fft.cpp
  • internal/metal/mlxc_io.cpp
  • internal/metal/mlxc_io_types.cpp
  • internal/metal/mlxc_linalg.cpp
  • internal/metal/mlxc_map.cpp
  • internal/metal/mlxc_memory.cpp
  • internal/metal/mlxc_metal.cpp
  • internal/metal/mlxc_ops.cpp
  • internal/metal/mlxc_random.cpp
  • internal/metal/mlxc_stream.cpp
  • internal/metal/mlxc_string.cpp
  • internal/metal/mlxc_transforms.cpp
  • internal/metal/mlxc_transforms_impl.cpp
  • internal/metal/mlxc_vector.cpp
  • internal/metal/mlxc_version.cpp
✅ Files skipped from review due to trivial changes (11)
  • internal/metal/mlx_gen_metal_jit_masked_scatter.cpp
  • internal/metal/mlx_gen_metal_jit_scatter.cpp
  • internal/metal/mlx_gen_metal_jit_unary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_ternary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_gather_axis.cpp
  • internal/metal/mlx_gen_metal_jit_utils.cpp
  • internal/metal/mlx_gen_metal_jit_gather.cpp
  • internal/metal/mlx_gen_metal_jit_hadamard.cpp
  • internal/metal/device.go
  • adapter.go
  • internal/metal/io_custom.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • internal/metal/mlx_gen_metal_jit_scatter_axis.cpp
  • internal/metal/mlx_gen_metal_jit_binary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_gather_front.cpp
  • internal/metal/mlx_gen_metal_jit_reduce_utils.cpp
  • internal/metal/mlx_gen_cpu_compiled_preamble.cpp
  • internal/metal/cache.go
  • compute_darwin_test.go
  • internal/metal/gemma4_vision.go
  • gguf_info.go

…st cleanup on PR #3

Round 3 follow-up to 0024239.

Code:
- compute_darwin.go: byte buffer rejects size > math.MaxInt32 (overflow)
- internal/metal/io_custom_test.go: deterministic t.Cleanup releases;
  seek errors use descriptive types (was io.ErrUnexpectedEOF generic)
- internal/metal/*.cpp (144 forwarding TUs): __has_include guards
  applied with consistent defined(__has_include) && ... shape
- .forgejo/workflows/test.yml: checkout submodules recursively
  (addresses missing-required-sources class)

Tests:
- gguf_info_test.go: _Bad suffix rename per AX-10
- compute_test.go: _Bad suffix rename on both tests

Doc:
- docs/development.md: fence language 'go'; MLX_BUILD_GGUF ON
- docs/models.md: grouped mutually exclusive safetensors/GGUF options
- README.md: 'Licence' UK English on badge

Verification: gofmt clean, GOWORK=off go vet + go test -count=1 ./...
pass with explicit cache paths.

Note: lib/mlx pre-existing dirty submodule untouched.

Closes residual r3 findings on #3

Co-authored-by: Codex <noreply@openai.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/models.md (1)

12-23: ⚠️ Potential issue | 🟠 Major

Fix non-compiling example: log.Fatal used without importing log.

The “Loading a Model” example calls log.Fatal(err) but the shown import block doesn’t include log, so the snippet won’t compile if users paste it into a new file.

🛠️ Proposed fix (add `log` import)
 import (
+    "log"
     "dappco.re/go/inference"
     _ "dappco.re/go/mlx"
 )

Alternatively, replace log.Fatal(err) with panic(err) and remove the need for log.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/models.md` around lines 12 - 23, The example fails to compile because it
calls log.Fatal(err) but does not import the log package; update the import
block to include "log" (so the snippet imports "log" along with
"dappco.re/go/inference" and the blank import "_ \"dappco.re/go/mlx\"") or
alternatively replace the call to log.Fatal(err) with panic(err) and remove the
log import; ensure the rest of the example uses inference.LoadModel and defers
m.Close() as shown.
♻️ Duplicate comments (1)
internal/metal/mlx_mlx_distributed_ops.cpp (1)

1-5: ⚠️ Potential issue | 🔴 Critical

Fix the forwarded #include target path (currently likely wrong).

The forwarding TU points at ../../lib/mlx/mlx/distributed/ops.cpp. If the upstream source is actually located at ../../lib/mlx/mlx/ops.cpp, this will fail the build and trigger the #error. This appears to be the same include-path problem already flagged in an earlier review cycle.

🛠️ Proposed fix
-#if defined(__has_include) && __has_include("../../lib/mlx/mlx/distributed/ops.cpp")
-#include "../../lib/mlx/mlx/distributed/ops.cpp"
+#if defined(__has_include) && __has_include("../../lib/mlx/mlx/ops.cpp")
+#include "../../lib/mlx/mlx/ops.cpp"
 #else
-#error "Missing forwarded source: ../../lib/mlx/mlx/distributed/ops.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
+#error "Missing forwarded source: ../../lib/mlx/mlx/ops.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
 #endif

To confirm which file exists in this repo’s current submodule state, run:

#!/bin/bash
set -euo pipefail

echo "== Submodule status (recursive) =="
git submodule status --recursive || true

echo "== Checking forwarded source candidates =="
if [ -f lib/mlx/mlx/ops.cpp ]; then
  echo "FOUND: lib/mlx/mlx/ops.cpp"
else
  echo "MISSING: lib/mlx/mlx/ops.cpp"
fi

if [ -f lib/mlx/mlx/distributed/ops.cpp ]; then
  echo "FOUND: lib/mlx/mlx/distributed/ops.cpp"
else
  echo "MISSING: lib/mlx/mlx/distributed/ops.cpp"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_distributed_ops.cpp` around lines 1 - 5, The forwarded
include in the TU uses the wrong target path
("../../lib/mlx/mlx/distributed/ops.cpp") which triggers the `#error`; update the
include to the correct upstream file path (e.g., "../../lib/mlx/mlx/ops.cpp")
inside the existing `#if` defined(__has_include) && __has_include(...) guard and
keep the `#error` as a fallback, then verify by running the provided
submodule/file-existence checks to confirm the actual file present before
committing the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@compute_darwin.go`:
- Around line 282-289: ensureFrameLocked() opens a frame before validation, so
if session.runLocked(kernel, args) returns an error you must close/rollback that
implicit frame before returning; modify the error path after err != nil to call
the session frame-close API (e.g. session.clearFrameLocked() or
session.rollbackFrameLocked() — whichever actual method exists in the codebase
to undo the frame opened by ensureFrameLocked()) so the frame is not left active
and subsequent BeginFrame()/metrics remain correct, then return the error.
- Around line 95-100: replaceLocked currently calls metal.Free on the old
*metal.Array immediately, which can free GPU memory while pending work is still
on the default stream; instead push the old array onto a per-session retired
slice (e.g. computeSession.retired) in bufferBase.replaceLocked and remove the
direct metal.Free call, then after metal.Synchronize(metal.DefaultStream()) in
computeSession.syncLocked drain retired by calling metal.Free on each entry and
clear the slice; also ensure the same drain logic runs after the final sync in
Close so retired arrays are freed only after GPU work completes.

In `@docs/development.md`:
- Around line 57-59: Edit the sentence that groups submodule setup with
generated sources so it clearly separates submodule dependencies from generated
code: state that `internal/metal/` includes source files from the git submodules
`lib/mlx` and `lib/mlx-c` (which must be initialized), and then in a separate
sentence note that `lib/generated` contains generated sources (not a submodule)
and must be present for C++ includes to succeed; update the wording around the
existing mention of `internal/metal/`, `lib/mlx`, `lib/mlx-c`, and
`lib/generated` to make the distinction explicit.

In `@internal/metal/io_custom_test.go`:
- Around line 14-15: The comment for the bytesRWS type is inaccurate: it claims
to wrap bytes.Buffer but the implementation uses and manages a []byte directly;
update the bytesRWS type comment to state that it implements io.ReadWriteSeeker
over an internal []byte buffer (managing position and length), describe that it
provides Read, Write and Seek semantics on that byte slice, and remove the
reference to bytes.Buffer so the comment matches the actual implementation of
bytesRWS.

In `@internal/metal/mlx_mlx_backend_cpu_qrf.cpp`:
- Around line 1-2: The forwarding TU internal/metal/mlx_mlx_backend_cpu_qrf.cpp
fails because the forwarded implementation file
../../lib/mlx/mlx/backend/cpu/qrf.cpp is missing and the `#error` gets triggered;
fix by ensuring the mlx submodule is initialized and the file is present (run
git submodule update --init --recursive) or update the include path to point to
the correct existing implementation, and remove or adjust the __has_include
guard accordingly so that the forwarded include in
internal/metal/mlx_mlx_backend_cpu_qrf.cpp (the __has_include check and the
include of qrf.cpp) resolves to an actual file at build time.

In `@internal/metal/mlx_mlx_backend_cpu_scan.cpp`:
- Around line 1-5: The current shim hard-errors when __has_include is not
defined; change the conditional so that when __has_include is defined it uses
the guarded include, but when it's undefined it simply attempts the direct
include of "../../lib/mlx/mlx/backend/cpu/scan.cpp" instead of emitting `#error`.
Update the conditional around __has_include (the preprocessor branch in this
file that currently uses `#if` defined(__has_include) && __has_include(...),
`#else`, `#endif`) to include the file directly in the `#else` branch so compilers
without __has_include will rely on the normal include failure behavior.

In `@internal/metal/mlx_mlx_backend_metal_indexing.cpp`:
- Around line 1-5: The current preprocessor fallback conflates two different
errors; update the conditional in
internal/metal/mlx_mlx_backend_metal_indexing.cpp to distinguish a missing
forwarded file from a toolchain lacking __has_include support: keep the existing
`#if` defined(__has_include) &&
__has_include("../../lib/mlx/mlx/backend/metal/indexing.cpp") path that includes
the forwarded source, add an `#elif` defined(__has_include) branch that triggers a
clear `#error` stating the forwarded source was not found (submodule/path issue),
and add a separate `#else` branch that emits a distinct `#error` saying the
compiler/toolchain does not support __has_include so presence of the forwarded
file cannot be detected; reference the preprocessor symbol __has_include and the
existing `#error` to locate and modify the include logic.

In `@internal/metal/mlx_mlx_backend_metal_softmax.cpp`:
- Around line 1-2: The forwarded source include guard is pulling
"../../lib/mlx/mlx/backend/metal/softmax.cpp" via __has_include and then
including it, but that file is missing causing the preprocessor to hit the
`#error` and abort compilation; fix by ensuring the lib/mlx submodule is
initialized (run git submodule update --init --recursive) or correct the
relative include path used in internal/metal/mlx_mlx_backend_metal_softmax.cpp
(the __has_include(...) and the subsequent `#include` should point to an existing
file), or add a proper fallback implementation or build-time guard so the
include is only used when the referenced softmax.cpp actually exists.

In `@internal/metal/mlx_mlx_distributed_distributed.cpp`:
- Around line 1-5: This translation unit unconditionally forwards
"../../lib/mlx/mlx/distributed/distributed.cpp" and can hard-fail if the
submodule is missing; wrap the include in a feature guard (e.g. change the
top-level check to `#if` defined(ENABLE_MLX_DISTRIBUTED) && defined(__has_include)
&& __has_include(...)) or similar so the file only compiles when
ENABLE_MLX_DISTRIBUTED is defined, and then add corresponding build-system logic
to define ENABLE_MLX_DISTRIBUTED from CMake or cgo (update CMakeLists or
metal.go cgo directives to set -DENABLE_MLX_DISTRIBUTED only when the
distributed submodule is present or the user requests distributed support) so
the `#error` path cannot trigger for builds that do not require distributed
functionality.

In `@internal/metal/mlx_mlx_einsum.cpp`:
- Around line 1-5: The file internal/metal/mlx_mlx_einsum.cpp forwards
../../lib/mlx/mlx/einsum.cpp and currently errors out if that submodule file is
missing; update the build/CI scripts to explicitly initialise submodules before
compilation by running git submodule update --init --recursive (or equivalent in
your CI runner) as a pre-build step, and add a brief note to the project build
docs (or CI job description) indicating this requirement so the include in
internal/metal/mlx_mlx_einsum.cpp succeeds reliably.

In `@internal/metal/mlx_mlx_graph_utils.cpp`:
- Around line 1-5: The current include guard uses defined(__has_include) and
falls through to an `#error` that blames missing forwarded sources even when the
real problem is lack of compiler support for __has_include; update the
preprocessor logic in mlx_mlx_graph_utils.cpp so the failure message
distinguishes the two cases — if defined(__has_include) is false emit an `#error`
explaining the compiler lacks __has_include support (and suggest a modern
compiler or a manual include), otherwise keep the existing include/#error that
references "../../lib/mlx/mlx/graph_utils.cpp" and submodule initialization;
reference the symbols __has_include and the forwarded path
"../../lib/mlx/mlx/graph_utils.cpp" when making the conditional and messages.

In `@internal/metal/mlx_mlx_scheduler.cpp`:
- Around line 1-5: The CONTRIBUTING.md must be updated to document the Git
submodule requirement to match README.md and CI behavior: add a brief step
instructing developers to run git submodule update --init --recursive (or
equivalent) during local setup, reference that the codebase uses forwarded
sources guarded by the __has_include check and the failing `#error` directive when
submodules are missing, and optionally note that CI uses submodules: recursive
so local setup should mirror that to avoid the forwarded-source error.

In `@README.md`:
- Line 2: The README references mismatched licence filenames (badge links to
LICENSE.md, section links to LICENCE) and the repository lacks the UK-spelled
LICENCE file; create a new LICENCE file in the repo root containing the EUPL-1.2
licence text, update the README badge link (in README.md) to point to LICENCE
instead of LICENSE.md, and ensure any other README links or references that use
"LICENSE.md" are changed to the UK spelling "LICENCE" so both badge and section
are consistent with project UK English guidelines.

---

Outside diff comments:
In `@docs/models.md`:
- Around line 12-23: The example fails to compile because it calls
log.Fatal(err) but does not import the log package; update the import block to
include "log" (so the snippet imports "log" along with "dappco.re/go/inference"
and the blank import "_ \"dappco.re/go/mlx\"") or alternatively replace the call
to log.Fatal(err) with panic(err) and remove the log import; ensure the rest of
the example uses inference.LoadModel and defers m.Close() as shown.

---

Duplicate comments:
In `@internal/metal/mlx_mlx_distributed_ops.cpp`:
- Around line 1-5: The forwarded include in the TU uses the wrong target path
("../../lib/mlx/mlx/distributed/ops.cpp") which triggers the `#error`; update the
include to the correct upstream file path (e.g., "../../lib/mlx/mlx/ops.cpp")
inside the existing `#if` defined(__has_include) && __has_include(...) guard and
keep the `#error` as a fallback, then verify by running the provided
submodule/file-existence checks to confirm the actual file present before
committing the change.

---

Nitpick comments:
In `@gguf_info_test.go`:
- Around line 288-299: In writeGGUFValue, avoid panic-causing type assertions by
validating the dynamic type before use: replace value.(string) and
value.(uint32) with type-checked assertions (e.g., using a type switch or
ok-style asserts) and call t.Fatalf with a clear message if the value is not the
expected type; keep the existing writeGGUFString call for strings and the
binary.Write for uint32 (ensuring you still check and fail on its error), and
reference writeGGUFValue and writeGGUFString when making the change.
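The panic-free shape of that helper is a plain type switch. The function below is a hypothetical stand-in for `writeGGUFValue` (it returns a string instead of writing to a buffer, and returns an error where the test helper would call `t.Fatalf`):

```go
package main

import "fmt"

// writeValue validates the dynamic type with a type switch instead of
// unchecked assertions like value.(string), which panic on a bad input.
func writeValue(value any) (string, error) {
	switch v := value.(type) {
	case string:
		return fmt.Sprintf("string:%s", v), nil
	case uint32:
		return fmt.Sprintf("uint32:%d", v), nil
	default:
		return "", fmt.Errorf("unsupported GGUF value type %T", value)
	}
}

func main() {
	s, err := writeValue("hello")
	fmt.Println(s, err)
	_, err = writeValue(3.14) // float64 is not a supported type
	fmt.Println(err)
}
```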

In `@internal/metal/io_custom_test.go`:
- Around line 146-148: The test assumes the tensor is rank-2 by indexing
wShape[0] and wShape[1]; guard the shape length first by checking len(wShape) >=
2 and produce a clear test failure if not (e.g., t.Fatalf or t.Errorf with the
actual shape), then proceed to compare wShape[0] and wShape[1] to the expected
values; refer to wShape and the call w.Shape() in
internal/metal/io_custom_test.go to locate where to add the length check.
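The rank guard is a one-line length check before any indexing. A hypothetical helper shape (returning an error where the test would call `t.Fatalf`):

```go
package main

import "fmt"

// checkShape fails with a clear message unless the shape is rank-2,
// then compares both dimensions against the expected values.
func checkShape(wShape []int, wantRows, wantCols int) error {
	if len(wShape) < 2 {
		return fmt.Errorf("expected rank-2 shape, got %v", wShape)
	}
	if wShape[0] != wantRows || wShape[1] != wantCols {
		return fmt.Errorf("shape mismatch: got %v, want [%d %d]", wShape, wantRows, wantCols)
	}
	return nil
}

func main() {
	fmt.Println(checkShape([]int{4}, 4, 8))    // clear failure, no index panic
	fmt.Println(checkShape([]int{4, 8}, 4, 8)) // <nil>
}
```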

In `@internal/metal/mlx_mlx_backend_common_broadcasting.cpp`:
- Line 4: Update the existing `#error` in
internal/metal/mlx_mlx_backend_common_broadcasting.cpp (the preprocessor
directive currently instructing to "Initialise submodules with git submodule
update --init --recursive or fix the forwarding include path") to keep the
guidance but make it build-system specific: replace the generic command with the
repository's canonical setup command or entrypoint (for example the project's
documented bootstrap/make/script invocation) and include a short hint about the
missing forwarded source ../../lib/mlx/mlx/backend/common/broadcasting.cpp so
developers know what to fix; leave the `#error` in place to fail fast but change
the message text to reference the repo's canonical build/init command and the
missing file path.
- Around line 1-5: The forwarded include uses a fragile relative path with
__has_include; change to probe and include via a build-injected root macro
(e.g., FORWARD_INCLUDE_ROOT or PROJECT_SRC_ROOT) so the probe and include become
independent of working directory: replace the literal "../../lib/mlx/..." usage
in the `#if/__has_include` and `#include` with a path built from that macro (and add
a clear `#error` fallback if the macro is not defined), update build scripts
(CMake target_compile_definitions or include directories) to define the macro to
the repository root, and add a short CI check to verify supported
compilers/flags on the matrix actually evaluate __has_include for .cpp files so
the fallback path/error is exercised if needed.

In `@internal/metal/mlx_mlx_backend_common_reduce.cpp`:
- Around line 1-5: Change the preprocessor guard to test for support of the
__has_include operator before referencing it: first use `#if`
defined(__has_include) and inside that branch check `#if` __has_include("...") to
include "../../lib/mlx/mlx/backend/common/reduce.cpp", otherwise emit the same
`#error`; keep the original error text and include the existing include path
string so the check only evaluates __has_include when it is defined (affecting
the current forwarding include block in this file).

In `@internal/metal/mlx_mlx_backend_cpu_available.cpp`:
- Line 2: This TU forwards the implementation by including
lib/mlx/mlx/backend/cpu/available.cpp into
internal/metal/mlx_mlx_backend_cpu_available.cpp which can cause
ODR/duplicate-link errors if the original cpp is also built; verify and fix your
build so only the forwarding TU is compiled (search your CMakeLists/target
sources for "lib/mlx/mlx/backend/cpu/available.cpp" and remove it from other
targets), or alternatively stop including the .cpp and instead expose the needed
declarations via a header or make the symbols in available.cpp static/inline;
locate the include in internal/metal/mlx_mlx_backend_cpu_available.cpp and the
implementation in lib/mlx/mlx/backend/cpu/available.cpp to apply the change.

In `@internal/metal/mlx_mlx_backend_cpu_cholesky.cpp`:
- Line 4: The conditional `#error` in
internal/metal/mlx_mlx_backend_cpu_cholesky.cpp is triggering static-analysis
noise; fix by either (A) ensuring the CI/pre-build step always initialises
submodules (run git submodule update --init --recursive before compiling so the
forwarded source exists and the `#error` is never reachable), or (B) add a CI/tool
suppression for preprocessor diagnostics that flag the `#error` when __has_include
would succeed at compile time; do not change runtime/build behaviour of the shim
or remove the `#error` in the source.

In `@internal/metal/mlx_mlx_backend_cpu_compiled.cpp`:
- Line 2: Add a short explanatory comment at the top of the translation unit
mlx_mlx_backend_cpu_compiled.cpp (which simply includes
../../lib/mlx/.../compiled.cpp) stating that this forwarding TU is the single,
canonical compilation unit for compiled.cpp to avoid multiple-definition linking
errors; alternatively add a build-system assertion/check that ensures
compiled.cpp is only compiled once (i.e., only via this shim) and not compiled
directly or via any other forwarding shims.
- Around line 1-5: The current guard uses defined(__has_include) which can be
incorrect because __has_include is a preprocessor operator, not a macro; change
the conditional to use the operator form directly, e.g. use `#if`
__has_include("../../lib/mlx/mlx/backend/cpu/compiled.cpp") before the `#include`
"../../lib/mlx/mlx/backend/cpu/compiled.cpp" and keep the `#else` `#error` fallback
unchanged, then verify the new guard works with GCC/Clang/AppleClang in CI where
the forwarded compilation of compiled.cpp is exercised.

In `@internal/metal/mlx_mlx_backend_cpu_copy.cpp`:
- Around line 1-5: The compile-time guard using __has_include(...) with a .cpp
forwarded source may fail on some toolchains; update the shim in
mlx_mlx_backend_cpu_copy.cpp to use a portable check and fallback: instead of
probing for "../../lib/mlx/mlx/backend/cpu/copy.cpp" with __has_include, probe
for a corresponding header (e.g., a public "copy.h") or remove the .cpp probe
and make the build/system (CMake/Bazel) ensure the .cpp is present; modify the
include logic around the current __has_include and the included path so it
checks a header or defers to the build-system check and emits a clear error if
the header/source is missing, referencing the existing include of
"../../lib/mlx/mlx/backend/cpu/copy.cpp" in this file.

In `@internal/metal/mlx_mlx_backend_cpu_eig.cpp`:
- Around line 1-5: The current preprocessor falls back to a single `#error` for
both lack of __has_include support and missing forwarded source; change the
logic to first test for the macro's availability (e.g., `#ifndef` __has_include)
and emit a clear error like "__has_include not supported; update compiler or
provide alternative include check", and only if __has_include exists use it to
test for "../../lib/mlx/mlx/backend/cpu/eig.cpp" and emit a different error like
"forwarded source ../../lib/mlx/mlx/backend/cpu/eig.cpp not found; initialise
submodules or fix include path" so the messages distinguish missing feature vs
missing file.

In `@internal/metal/mlx_mlx_backend_cpu_eigh.cpp`:
- Line 4: Update the preprocessor error in
internal/metal/mlx_mlx_backend_cpu_eigh.cpp by enhancing the `#error` message to
point developers to the repository's submodule/init documentation (e.g., add a
path like docs/submodules.md or CONTRIBUTING.md) and keep the existing
instructions about running git submodule update --init --recursive; modify the
string inside the `#error` directive so it includes that doc path and preserves
the original troubleshooting command.
- Around line 1-5: The forwarding TU uses __has_include and a fail-fast `#error`
for "../../lib/mlx/mlx/backend/cpu/eigh.cpp"; ensure you don't break unrelated
builds by making the build system only compile
internal/metal/mlx_mlx_backend_cpu_eigh.cpp for targets/configs that include
lib/mlx (or define a build macro like BUILD_WITH_MLX), or alternatively add a
stable guard macro check (e.g., `#if` defined(BUILD_WITH_MLX) &&
__has_include(...)) before the include; verify the relative path
"../../lib/mlx/mlx/backend/cpu/eigh.cpp" remains correct for your repo layout if
directories are reorganised.

In `@internal/metal/mlx_mlx_backend_cpu_fft.cpp`:
- Around line 4-5: The file internal/metal/mlx_mlx_backend_cpu_fft.cpp currently
hard-fails with a `#error` when the forwarded source is missing; replace this with
a safe conditional: use a preprocessor check (e.g., `#if`
__has_include("../../lib/mlx/mlx/backend/cpu/fft.cpp")) to include the forwarded
file when present and otherwise provide a minimal stub implementation or a clear
`#pragma` message warning, and update the message to mention both git submodule
update --init --recursive and git clone --recurse-submodules; alternatively,
gate adding this translation unit in the build system (CMake) so the .cpp is
only compiled when the forwarded file exists.

In `@internal/metal/mlx_mlx_backend_cpu_gemms_bnns.cpp`:
- Around line 1-5: Split the preprocessor check into two steps: first test
defined(__has_include) and if that is false emit a clear `#error` stating the
compiler/toolchain does not support __has_include (so the forwarded-include
presence cannot be determined), and otherwise (when defined(__has_include) is
true) evaluate __has_include("../../lib/mlx/mlx/backend/cpu/gemms/bnns.cpp") and
only then include the file or emit the existing "Missing forwarded source"
error; reference the existing use of __has_include and the include line that
pulls in "../../lib/mlx/mlx/backend/cpu/gemms/bnns.cpp" to locate where to add
the separate `#error` branch.
- Line 4: Replace the unconditional preprocessor error in the forwarding shim
(the lone `#error` that references "../../lib/mlx/mlx/backend/cpu/gemms/bnns.cpp")
with a guarded check that only triggers the hard failure when the BNNS backend
is required (e.g. ifndef/if defined HAVE_BNNS or a build macro like
USE_BNNS/MLX_ENABLE_BNNS); otherwise provide a non-fatal “backend unavailable”
path (compile an empty stub or no-op implementations) so builds that do not
enable BNNS can proceed. Locate the shim containing the `#error` and add the macro
guard around the `#error` and include-forwarding logic, or invert it to `#if`
defined(HAVE_BNNS) / `#include` "…/bnns.cpp" / `#else` /* stub */ / `#endif`, ensuring
the symbol name you choose matches the project’s BNNS build option.

In `@internal/metal/mlx_mlx_backend_cpu_gemms_cblas.cpp`:
- Around line 1-5: The forwarded include uses a brittle relative path
("../../lib/mlx/mlx/backend/cpu/gemms/cblas.cpp"); update the guard and include
to use an include-rooted path (e.g., check
__has_include("mlx/backend/cpu/gemms/cblas.cpp") and `#include`
"mlx/backend/cpu/gemms/cblas.cpp") and ensure the build adds the submodule
source directory to the compiler include paths (target_include_directories or
-I) so the forwarding unit (internal/metal/mlx_mlx_backend_cpu_gemms_cblas.cpp)
no longer hardcodes ../../ but relies on the include tree.

In `@internal/metal/mlx_mlx_backend_cpu_jit_compiler.cpp`:
- Around line 1-5: The current header guard uses "#if defined(__has_include) &&
__has_include(...)" which triggers the `#error` on toolchains that lack
__has_include; change the conditional in this include block so that when
__has_include is undefined it falls back to attempting a direct `#include` of
"../../lib/mlx/mlx/backend/cpu/jit_compiler.cpp" (so the compiler will produce a
normal file-not-found diagnostics if the file is missing) rather than
immediately emitting the custom `#error`; keep the __has_include &&
__has_include(...) path to prefer the guarded include and only use the explicit
include fallback when __has_include is not available.

In `@internal/metal/mlx_mlx_backend_cpu_luf.cpp`:
- Line 1: The preprocessor guard currently uses `#if defined(__has_include) &&
__has_include("...")` which is unnecessary on standard-compliant preprocessors
and can be simplified: replace the directive with `#if __has_include("...")`
(i.e., change the check around `__has_include` in the forwarding shim containing
the current `#if defined(__has_include) &&
__has_include("../../lib/mlx/mlx/backend/cpu/luf.cpp")` line); if you prefer to
maintain the existing compatibility pattern, apply this same simplification
uniformly across the other forwarding shims instead of changing only this file
so behavior remains consistent.

In `@internal/metal/mlx_mlx_backend_cpu_masked_mm.cpp`:
- Around line 1-5: The forwarding shim uses a guarded include that may
mis-handle some compilers: replace the preprocessor check "#if
defined(__has_include) &&
__has_include(\"../../lib/mlx/mlx/backend/cpu/masked_mm.cpp\")" with the simpler
"#if __has_include(\"../../lib/mlx/mlx/backend/cpu/masked_mm.cpp\")" in the TU
(internal/metal/mlx_mlx_backend_cpu_masked_mm.cpp) so the __has_include operator
is evaluated directly; keep the same `#include` and `#error` branches and then run
CI or build locally across supported compilers (esp. GCC) to verify the pattern
matches other forwarding shims.
- Line 4: Update the existing `#error` in
internal/metal/mlx_mlx_backend_cpu_masked_mm.cpp to include the exact expected
include path spelling (“../../lib/mlx/mlx/backend/cpu/masked_mm.cpp”) and,
optionally, the relative source of this shim (“from internal/metal/”) so users
know where the forwarding include is missing; modify the error string in the
`#error` directive to combine the existing advice (git submodule update --init
--recursive) with the precise expected path and the optional shim location to
make the message unambiguous.

In `@internal/metal/mlx_mlx_backend_cpu_primitives.cpp`:
- Around line 1-5: The forwarded include path is brittle because it hardcodes
"../../lib/mlx/mlx/backend/cpu/primitives.cpp"; change the file to use a central
macro for the forwarded-source root (e.g. MLX_FORWARDED_SRC_ROOT) and
build-provided include base so the include becomes `#if` defined(__has_include) &&
__has_include(MLX_FORWARDED_SRC_ROOT "/backend/cpu/primitives.cpp") then
`#include` MLX_FORWARDED_SRC_ROOT "/backend/cpu/primitives.cpp" else keep the
`#error` but mention MLX_FORWARDED_SRC_ROOT in the message; ensure the macro can
be set from the build system (or default to "../../lib/mlx/mlx") so the
forwarding unit uses a single stable identifier instead of the duplicated
literal path.

In `@internal/metal/mlx_mlx_backend_cpu_qrf.cpp`:
- Around line 3-5: Wrap the forwarding include / hard `#error` in a build-guard so
missing QRF source only fails when the QRF backend is enabled: change the TU so
the forwarded include and the existing `#error` line are inside a preprocessor
check like `#if` defined(MLX_BACKEND_CPU_QRF) (or your project’s equivalent macro
such as ENABLE_QRF/MLX_ENABLE_QRF); if the macro is not defined, make the file a
no-op (or emit a non-fatal `#pragma` message) so the missing
"../../lib/mlx/.../qrf.cpp" forwarding include does not break builds that
intentionally exclude the CPU QRF backend. Ensure you update the presence check
around the forwarded source/#error in this TU to reference that macro name.

In `@internal/metal/mlx_mlx_backend_cpu_reduce.cpp`:
- Around line 1-5: The forwarded-include fragility in
internal/metal/mlx_mlx_backend_cpu_reduce.cpp comes from hardcoding the upstream
path "../../lib/mlx/mlx/backend/cpu/reduce.cpp"; replace that pattern by
centralising the forwarding logic: create a single header or macro (e.g.,
MLX_FORWARDED_INCLUDE or a header like "mlx_forwarded_include.h") that defines
the forwarded include path and the __has_include check, then update this TU to
use that macro/header to include reduce.cpp (referencing the existing include
check and the target file reduce.cpp) so the path is maintained in one place and
not duplicated across many TUs.

In `@internal/metal/mlx_mlx_backend_cpu_softmax.cpp`:
- Around line 1-5: The TU mlx_mlx_backend_cpu_softmax.cpp forwards/embeds the
upstream implementation (softmax.cpp) which risks ODR/multiple-definition if
upstream softmax.cpp is also compiled as a separate translation unit; update the
build or code so exactly one definition exists — either (a) modify the build
(CMakeLists) to exclude the original softmax.cpp from normal compilation when
the forwarding TU is enabled, or (b) stop forwarding the .cpp and instead
provide a header-only/API wrapper or mark functions inline/templated in the
upstream softmax implementation; locate the forwarding include in
mlx_mlx_backend_cpu_softmax.cpp and the upstream softmax implementation
(softmax.cpp) in the MLX sources and apply the conditional exclusion or refactor
so symbols from softmax are defined only once.

In `@internal/metal/mlx_mlx_backend_cpu_svd.cpp`:
- Around line 1-5: The forwarded include of
../../lib/mlx/mlx/backend/cpu/svd.cpp from
internal/metal/mlx_mlx_backend_cpu_svd.cpp can cause multiple-definition linker
errors if the original svd.cpp is also built as a separate translation unit; fix
by ensuring the upstream TU is not compiled (update the build system to remove
../../lib/mlx/mlx/backend/cpu/svd.cpp from the CMake target that builds the
project) or alternatively stop forwarding the .cpp and include a proper
header/inline implementation instead (or make the implementation internal-only
by converting functions in svd.cpp to static/inline or put them in an anonymous
namespace), and update references in the code to use the header-based
declarations; target the forwarded file name svd.cpp and the forwarding file
mlx_mlx_backend_cpu_svd.cpp when making the change.

In `@internal/metal/mlx_mlx_backend_cpu_threefry.cpp`:
- Around line 1-5: The forwarded include using a hard-coded relative path in the
__has_include check (the line including
"../../lib/mlx/mlx/backend/cpu/threefry.cpp" and the accompanying `#error`) is
brittle; change it to prefer a repo-rooted or installed-style include (e.g.,
__has_include(<mlx/backend/cpu/threefry.cpp>) with `#include`
<mlx/backend/cpu/threefry.cpp>) and keep the existing relative include as a
fallback (or introduce a small FORWARDED_INCLUDE macro shared by other stubs) so
compilation works whether the repo layout changes; update the __has_include
checks to test the new angle-bracket path first and only trigger the `#error` if
neither option is found.

In `@internal/metal/mlx_mlx_backend_cpu_unary.cpp`:
- Line 4: The preprocessor `#error` directive in
internal/metal/mlx_mlx_backend_cpu_unary.cpp (the line containing "#error
\"Missing forwarded source: ../../lib/mlx/.../unary.cpp\"") is triggering
static-analysis noise; update the file to only emit that `#error` for actual build
configurations or to silence analyzers: either wrap the `#error` in a build-only
guard macro (e.g., conditionally require a BUILD_FORWARDING_PRESENT or
FORWARDED_SOURCE_EXISTS macro before emitting the `#error`) so static analysis
runs that define nothing won't see it, or surround the `#error` with analyzer
suppression pragmas/comments appropriate for your CI (clang-tidy/cppcheck) to
ignore this forwarding-pattern; ensure the forwarded-path string remains intact
so real missing-source builds still fail.

In `@internal/metal/mlx_mlx_backend_gpu_slicing.cpp`:
- Around line 1-5: The wrapper mlx_mlx_backend_gpu_slicing.cpp forwards the
upstream gpu/slicing.cpp and risks duplicate compilation if that upstream file
is also listed as a separate compile unit; ensure the build does not compile
gpu/slicing.cpp twice by either removing the upstream
../../lib/mlx/mlx/backend/gpu/slicing.cpp from the CMake target sources or by
guarding the upstream file with a unique include guard/macro (e.g., define
MLX_BACKEND_GPU_SLICING_FORWARD_INCLUDED before including) so only the wrapper
version is compiled; update the CMakeLists.txt to reference only the wrapper or
add the macro definition to the target compile definitions to prevent duplicate
linkage.

In `@internal/metal/mlx_mlx_backend_metal_binary.cpp`:
- Around line 1-5: The forwarded include uses a fragile relative path via
__has_include and `#include "../../lib/mlx/mlx/backend/metal/binary.cpp"`; change
to rely on a stable include root provided by the build system (e.g., add lib/ to
the compiler include paths) and then test for and include the file with a
root-relative path such as __has_include("mlx/backend/metal/binary.cpp") /
`#include` "mlx/backend/metal/binary.cpp" while preserving the current `#error`
fallback; update the build configuration to add the lib/ directory to the
include search paths so __has_include and the include can resolve reliably.

In `@internal/metal/mlx_mlx_backend_metal_compiled.cpp`:
- Line 1: The preprocessor guard using defined(__has_include) can be
toolchain-sensitive; update the include check around the __has_include usage in
the forwarding TU guard (the expression wrapping __has_include) by either
switching to the simpler form `#if __has_include("...")` or adding an explicit
fallback that emits an `#error` when __has_include is unavailable, ensuring you
change the guard that currently uses defined(__has_include) &&
__has_include(...) (the macro check in this TU) and run CI across clang/gcc/msvc
to verify it compiles on all supported toolchains.

In `@internal/metal/mlx_mlx_backend_metal_copy.cpp`:
- Around line 1-2: The forwarded include uses a brittle relative path; update
the include logic in this shim so it first checks for and includes a configured
include-path style header (e.g., use __has_include(<mlx/backend/metal/copy.cpp>)
and include <mlx/backend/metal/copy.cpp>), and fall back to the existing
relative include (the current
__has_include("../../lib/mlx/mlx/backend/metal/copy.cpp") and include) if the
configured-path include is not found; modify the include branch logic around the
existing __has_include check in this file so both forms are tested and the
non-relative form is preferred.

In `@internal/metal/mlx_mlx_backend_metal_custom_kernel.cpp`:
- Around line 1-5: The forwarded-include is brittle due to a hard-coded relative
path; change the include in mlx_mlx_backend_metal_custom_kernel.cpp to use a
build-system-provided project include (e.g. replace the quoted
"../../lib/mlx/mlx/backend/metal/custom_kernel.cpp" with a namespaced angled
include like <mlx/backend/metal/custom_kernel.cpp>), but keep the existing
__has_include check and the `#error` fallback so the file still fails fast when
the submodule/include path is missing; update only the string inside the
existing __has_include and `#include` lines (they reference __has_include and the
forwarded source path) so the preprocessor logic and error message remain
intact.

In `@internal/metal/mlx_mlx_backend_metal_distributed.cpp`:
- Line 4: Update the preprocessor error message emitted by the '#error' in
internal/metal/mlx_mlx_backend_metal_distributed.cpp to also suggest running
'git submodule sync --recursive' in addition to 'git submodule update --init
--recursive' so users are guided to resync submodule URLs/config after changes;
keep the original actionable text and filename reference
('../../lib/mlx/mlx/backend/metal/distributed.cpp') but append the sync
recommendation to improve resilience in fresh clones or after URL changes.
- Around line 1-5: The current preprocessor guard uses defined(__has_include)
with the operator in one expression which can be unreliable; change the include
guard to a nested conditional that first checks `#if` defined(__has_include) and
only then evaluates the
__has_include("../../lib/mlx/mlx/backend/metal/distributed.cpp") operator,
falling back to the `#error` path otherwise, so the forwarded include string
("../../lib/mlx/mlx/backend/metal/distributed.cpp") is only tested when
supported; after updating the guard around the include of the forwarded source,
verify the pattern compiles on your CI toolchains (clang, gcc, MSVC) or match
the same pattern used in other forwarding TUs like this file to ensure
consistency.

In `@internal/metal/mlx_mlx_backend_metal_eval.cpp`:
- Around line 1-5: The translation unit forwards
../../lib/mlx/mlx/backend/metal/eval.cpp and emits a hard `#error` if it is
missing (see the __has_include check and the error string), so update every C++
compile workflow and developer setup that builds this file to initialise
submodules recursively; specifically, add or ensure a step that runs git
submodule update --init --recursive (or configure your CI's checkout action to
fetch submodules) before any compile step that touches this TU, and verify the
change covers all CI jobs that compile Metal backend code.

In `@internal/metal/mlx_mlx_backend_metal_fence.cpp`:
- Around line 1-5: The forwarded-include currently uses a brittle relative path
in the __has_include/#include for "../../lib/mlx/mlx/backend/metal/fence.cpp";
change the forwarding to use a stable include name (e.g.
"mlx/backend/metal/fence.cpp" or similar project-root-relative header) and
update the build system (CMake) to add the MLX lib root to the compiler include
directories so the check and include resolve reliably; also search the tree for
other forwarding translation units that include the same fence.cpp and
consolidate/remove duplicates to avoid duplicate-symbol/link errors (refer to
the current forwarding TU that contains the __has_include/#error block and the
upstream source fence.cpp).

In `@internal/metal/mlx_mlx_backend_metal_hadamard.cpp`:
- Around line 1-5: Replace the brittle hard-coded relative include in the
forwarding TU with a robust include mechanism: change the include to use a
build-provided include root or macro (e.g. use `#include`
<mlx/backend/metal/hadamard.cpp> with the build system adding the appropriate
include directory, or use a build-defined macro like MLX_BACKEND_ROOT and
include via MLX_BACKEND_ROOT "/backend/metal/hadamard.cpp"); update the `#error`
message to mention the expected include root or build flag (e.g. instruct to add
the mlx backend include directory or define MLX_BACKEND_ROOT via the build
system) and keep the existing has_include check but point it at the new include
form so the file fails fast if the build isn’t configured.

In `@internal/metal/mlx_mlx_backend_metal_logsumexp.cpp`:
- Around line 1-5: The current TU
internal/metal/mlx_mlx_backend_metal_logsumexp.cpp uses a brittle relative
forwarded include and unconditionally `#error` if
"../../lib/mlx/mlx/backend/metal/logsumexp.cpp" is missing; change this to be
build-system resilient by either (A) switching the include to a stable project
include path (e.g., "mlx/backend/metal/logsumexp.cpp") and removing the hard
`#error` in the __has_include check—emit a `#pragma` message/#warning and provide a
safe no-op stub or define a feature macro (e.g., NO_METAL_LOGSUMEXP) so
compilation can continue if the file is absent, or (B) move the existence check
into CMake and only compile this TU when the Metal backend/source is present
(use target_include_directories to expose the lib/ include path and gate the
source file). Keep the __has_include guard but do not hard-fail; reference the
existing identifiers __has_include, the current include path
"../../lib/mlx/mlx/backend/metal/logsumexp.cpp", and the TU filename to locate
where to apply the change.

In `@internal/metal/mlx_mlx_backend_metal_nojit_kernels.cpp`:
- Line 2: The forwarding TU includes the implementation file nojit_kernels.cpp
directly which will cause duplicate symbol/ODR issues unless the build system
guarantees the original nojit_kernels.cpp is not compiled separately; update the
build rules so only one copy is built: either remove nojit_kernels.cpp from the
list of sources in the CMake target(s) that also compile
internal/metal/mlx_mlx_backend_metal_nojit_kernels.cpp, or conditionally add the
include-based TU (internal/metal/mlx_mlx_backend_metal_nojit_kernels.cpp)
instead of the standalone nojit_kernels.cpp via CMake logic; verify the target
source lists and any platform-specific generator expressions so that
nojit_kernels.cpp is present in exactly one translation unit.

In `@internal/metal/mlx_mlx_backend_metal_quantized.cpp`:
- Around line 1-5: The current preprocessor check conflates two failure modes;
update the conditional in internal/metal/mlx_mlx_backend_metal_quantized.cpp to
detect when __has_include is not supported versus when the forwarded file is
absent: first use `#if` defined(__has_include) to guard support and emit a clear
`#error` like "compiler does not support __has_include" if not defined, then
inside that branch use `#if`
__has_include("../../lib/mlx/mlx/backend/metal/quantized.cpp") to include the
forwarded file and in the corresponding `#else` emit the existing
submodule-missing error (mentioning
"../../lib/mlx/mlx/backend/metal/quantized.cpp"); keep the include path and
error text exact so callers can locate the missing file or feature.
- Line 4: Replace the single blunt preprocessor `#error` that references the
forwarded source "../../lib/mlx/mlx/backend/metal/quantized.cpp" with a clearer
diagnostic split using __has_include: first test for compiler/support (e.g. a
specific feature macro) and emit a descriptive `#error` for truly unsupported
compilers/standards, and otherwise use __has_include to check for the forwarded
file and emit a targeted `#error` that instructs to run "git submodule update
--init --recursive" (or fix the include path); this keeps the original guarded
include logic but separates "unsupported" vs "missing forwarded source" messages
so static analyzers and humans get clearer intent.

In `@internal/metal/mlx_mlx_backend_metal_slicing.cpp`:
- Around line 1-2: The forwarded include uses a brittle hard-coded relative path
in the __has_include and include directives; update it to reference a
build-system-provided macro (e.g. MLX_VENDOR_DIR) instead so the shim survives
tree reorgs: change the __has_include check to test for the macro-derived path
(using token/string concatenation) and use the same macro-derived path in the
subsequent `#include`, ensuring the macro is documented as being set by the build
(fallback to the current relative path only if the macro is undefined).

In `@internal/metal/mlx_mlx_backend_metal_softmax.cpp`:
- Around line 3-5: Keep the existing `#error` in mlx_mlx_backend_metal_softmax.cpp
but update its wording to match the repository's standard submodule
initialization command and any CI specifics; locate the preprocessor block that
emits `#error` "Missing forwarded source:
../../lib/mlx/mlx/backend/metal/softmax.cpp..." and replace the message text to
include the project-preferred git submodule command (e.g., whether to include
--recursive or a particular submodule name/branch) and any CI setup notes so
developers get exact, actionable instructions.

In `@internal/metal/mlx_mlx_backend_metal_sort.cpp`:
- Around line 1-5: Update the failing `#error` message in the __has_include
fallback so it reflects all supported dependency acquisition methods (e.g., git
submodules and CMake FetchContent) and prints the resolved relative include path
being checked (the "../../lib/mlx/mlx/backend/metal/sort.cpp" forwarded include
from internal/metal/), e.g., replace the current single-option message with one
that lists both recovery steps and includes the exact forwarded path string and
a brief hint to verify CMake/FetchContent or submodule setup; edit the
preprocessor block that uses __has_include and the `#error` line to make this
clearer.

In `@internal/metal/mlx_mlx_distributed_primitives.cpp`:
- Around line 1-5: Current preprocessor logic lumps compiler lacking
__has_include and the file actually missing into one `#else` branch; split the
cases so diagnostics are specific. Change the guards to first test
defined(__has_include) and if not defined emit an error stating the compiler
does not support __has_include, then inside the defined branch use
__has_include("../../lib/mlx/mlx/distributed/primitives.cpp") and include the
forwarded source on success or emit a distinct error stating the forwarded
source "../../lib/mlx/mlx/distributed/primitives.cpp" is missing and to
initialise submodules or fix the path.

In `@internal/metal/mlx_mlx_fft.cpp`:
- Around line 1-5: The current preprocessor block always emits a hard `#error`
when __has_include is not defined; change it to fall back to directly including
the forwarded source path ("../../lib/mlx/mlx/fft.cpp") when __has_include is
unavailable so the compiler will report a normal include error if the file is
truly missing. Concretely, keep the check that uses __has_include to include the
file when available, but when __has_include is not defined, replace the `#error`
branch with a plain `#include` "../../lib/mlx/mlx/fft.cpp" fallback so compilation
behaves correctly on compilers without __has_include support.

In `@internal/metal/mlx_mlx_io_gguf_quants.cpp`:
- Around line 1-2: The forwarded include uses a brittle relative path
("../../lib/mlx/mlx/io/gguf_quants.cpp"); change it to a stable project include
and adjust build include directories instead (e.g., replace the relative include
in internal/metal/mlx_mlx_io_gguf_quants.cpp with a canonical path such as
<mlx/io/gguf_quants.cpp> or another agreed-on project include prefix) and add
the corresponding include directory in your CMakeLists (or build system) so the
file is found without fragile relative traversal.
- Around line 1-5: The forwarding source
internal/metal/mlx_mlx_io_gguf_quants.cpp appears to be dead — it only contains
a conditional include for ../../lib/mlx/mlx/io/gguf_quants.cpp and is not
referenced by CMake or cgo; either remove this file as vestigial (and update
docs/cleanup any mention in the internal/metal forwarding pattern) or make it a
real compilation entrypoint by adding it to the build (add the file to the
appropriate CMakeLists.txt target or the Go cgo directives) and ensure the
include guard path and submodule requirement remain correct; search for
references to mlx_mlx_io_gguf_quants.cpp, verify MLX_BUILD_GGUF usage, then
choose deletion if unreferenced or add the file to the build if intended for
future GGUF support.

In `@internal/metal/mlx_mlx_io_load.cpp`:
- Around line 1-5: The forwarded-include guard using __has_include with the
relative path "../../lib/mlx/mlx/io/load.cpp" can fail on some
toolchains/working-directory setups; update the build and source to make
resolution robust: either (a) add the upstream lib/mlx directory to the
compiler's include path and change the include to a canonical project include
(e.g. "mlx/io/load.cpp"), or (b) add fallback checks or build-time validation in
the buildscript (CMake/Makefile) to test for the presence of
../../lib/mlx/mlx/io/load.cpp and fail early with a clearer message;
specifically adjust the usage around __has_include and the literal include
string "../../lib/mlx/mlx/io/load.cpp" and run the CI/build configurations that
previously failed to verify the TU compiles everywhere.

In `@internal/metal/mlx_mlx_linalg.cpp`:
- Line 4: The hard `#error` in the forwarded-include sentinel causes noisy
failures for static analysis/IDE indexing; change it to a conditional that keeps
the `#error` for normal builds but soft-fails for analysis-only passes (e.g., use
a build macro like ANALYSIS_MODE or STATIC_ANALYSIS), e.g., wrap the existing
`#error` inside `#if` !defined(ANALYSIS_MODE) ... `#error` ... `#endif` and add an `#else`
branch that emits a compile-time warning/message (pragma message or similar) so
tooling can proceed; update the sentinel in mlx_mlx_linalg.cpp around the
existing `#error` to use this conditional macro name
(ANALYSIS_MODE/STATIC_ANALYSIS) so CI/dev builds still hard-fail while
analysis/IDE runs get a friendly notice.

In `@internal/metal/mlx_mlx_primitives.cpp`:
- Line 2: The TU currently does a direct `#include` of
../../lib/mlx/mlx/primitives.cpp (in internal/metal/mlx_mlx_primitives.cpp)
which can produce duplicate-definition link errors if primitives.cpp is also
compiled as its own TU; verify your build wiring so primitives.cpp is only
compiled once: either (A) stop including the .cpp here and add
../../lib/mlx/mlx/primitives.cpp as a source file in your CMake/Make target
(e.g. add to target_sources or src list) so it's compiled once, or (B) keep this
forwarding include but remove any other target/source entries that compile
primitives.cpp separately (search CMakeLists.txt for references to
primitives.cpp or targets that would build lib/mlx); update the build files
accordingly and rerun a full parallel build to confirm no multiple-definition
errors.
- Around line 1-5: Replace the non-idiomatic preprocessor guard that uses
defined(__has_include) with the conventional __has_include check: locate the
conditional around the include of "../../lib/mlx/mlx/primitives.cpp" (the `#if`
defined(__has_include) && __has_include(...) / `#include`
"../../lib/mlx/mlx/primitives.cpp" block) and change it to use `#if`
__has_include(...) so the preprocessor directly tests availability of the
forwarded source while keeping the same `#include` and `#error` fallback intact.

In `@internal/metal/mlx_mlx_transforms.cpp`:
- Around line 1-5: The wrapper translation unit mlx_mlx_transforms.cpp directly
forwards ../../lib/mlx/mlx/transforms.cpp which can cause multiple-definition
linker errors if the upstream transforms.cpp is also compiled into the build;
ensure only one compilation unit is built by either removing the upstream
../../lib/mlx/mlx/transforms.cpp entry from the build source list (so only this
wrapper TU is compiled) or stop including the upstream .cpp here and instead
include the corresponding header (or expose functions via a single upstream TU)
so symbols from transforms.cpp are compiled exactly once; locate the forwarding
include in mlx_mlx_transforms.cpp and reconcile it with your build system/source
lists to prevent double-linking.

In `@internal/metal/mlx_mlx_utils.cpp`:
- Line 4: The preprocessor `#error` in internal/metal/mlx_mlx_utils.cpp is too
vague about where the forwarded source should come from; change it to use a
CMake-injected macro (e.g., MLX_EXPECTED_INCLUDE_ROOT or MLX_BUILD_DOCS_URL) so
the message prints the configured expected include root and a link to the
submodule-init docs; update the `#error` text to include the macro value when
defined and a clear fallback explaining to run "git submodule update --init
--recursive" or consult the referenced build docs, so users building with
CMake/Ninja see an actionable, environment-aware message.

In `@internal/metal/mlxc_distributed.cpp`:
- Around line 1-5: The current forwarding shim uses __has_include to include
"../../lib/mlx-c/mlx/c/distributed.cpp" and emits a clear `#error` if missing; to
improve portability for toolchains without __has_include, add a fallback that
attempts the include unguarded when __has_include is not defined, or explicitly
document/guard the code path; update the conditional around the include (the
macro check using __has_include and the include of
"../../lib/mlx-c/mlx/c/distributed.cpp") so that when __has_include is not
available the file is still included (or an alternative diagnostic is provided),
preserving the existing `#error` message behavior when the forwarded source truly
cannot be found.

In `@README.md`:
- Around line 72-90: The example currently ignores errors from
session.NewPixelBuffer calls (src, rgba, scaled), making it unreliable; update
the Frame Compute snippet to capture and handle each returned error from
NewPixelBuffer (e.g., src, err := session.NewPixelBuffer(...); if err != nil {
panic(err) } ), doing this for the calls that produce src, rgba, and scaled so
they follow the same error-checking pattern used elsewhere in the example;
reference the NewPixelBuffer calls and the variables src, rgba, and scaled when
applying the changes.
- Around line 30-33: The Quick Start streaming loop using model.Generate
currently only prints tokens; update it to check the generator's terminal error
state by calling model.Err() (or m.Err() if the iterator variable is named m)
after the for tok := range model.Generate(...) loop and handle/report any
non-nil error (e.g., log or fmt.Println the error) so users see generation
failures; locate the loop that iterates over model.Generate and add a post-loop
check like err := model.Err(); if err != nil { /* report error */ }.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f27b9526-a425-4f61-9a9b-d9d0217f2124

📥 Commits

Reviewing files that changed from the base of the PR and between 0024239 and f746e36.

📒 Files selected for processing (152)
  • .forgejo/workflows/test.yml
  • README.md
  • compute_darwin.go
  • compute_test.go
  • docs/development.md
  • docs/models.md
  • gguf_info_test.go
  • internal/metal/io_custom_test.go
  • internal/metal/mlx_gen_cpu_compiled_preamble.cpp
  • internal/metal/mlx_gen_metal_jit_binary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_gather.cpp
  • internal/metal/mlx_gen_metal_jit_gather_axis.cpp
  • internal/metal/mlx_gen_metal_jit_gather_front.cpp
  • internal/metal/mlx_gen_metal_jit_hadamard.cpp
  • internal/metal/mlx_gen_metal_jit_masked_scatter.cpp
  • internal/metal/mlx_gen_metal_jit_reduce_utils.cpp
  • internal/metal/mlx_gen_metal_jit_scatter.cpp
  • internal/metal/mlx_gen_metal_jit_scatter_axis.cpp
  • internal/metal/mlx_gen_metal_jit_ternary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_unary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_utils.cpp
  • internal/metal/mlx_mlx_array.cpp
  • internal/metal/mlx_mlx_backend_common_broadcasting.cpp
  • internal/metal/mlx_mlx_backend_common_common.cpp
  • internal/metal/mlx_mlx_backend_common_compiled.cpp
  • internal/metal/mlx_mlx_backend_common_load.cpp
  • internal/metal/mlx_mlx_backend_common_reduce.cpp
  • internal/metal/mlx_mlx_backend_common_slicing.cpp
  • internal/metal/mlx_mlx_backend_common_utils.cpp
  • internal/metal/mlx_mlx_backend_cpu_arg_reduce.cpp
  • internal/metal/mlx_mlx_backend_cpu_available.cpp
  • internal/metal/mlx_mlx_backend_cpu_binary.cpp
  • internal/metal/mlx_mlx_backend_cpu_cholesky.cpp
  • internal/metal/mlx_mlx_backend_cpu_compiled.cpp
  • internal/metal/mlx_mlx_backend_cpu_conv.cpp
  • internal/metal/mlx_mlx_backend_cpu_copy.cpp
  • internal/metal/mlx_mlx_backend_cpu_distributed.cpp
  • internal/metal/mlx_mlx_backend_cpu_eig.cpp
  • internal/metal/mlx_mlx_backend_cpu_eigh.cpp
  • internal/metal/mlx_mlx_backend_cpu_encoder.cpp
  • internal/metal/mlx_mlx_backend_cpu_eval.cpp
  • internal/metal/mlx_mlx_backend_cpu_fft.cpp
  • internal/metal/mlx_mlx_backend_cpu_gemms_bnns.cpp
  • internal/metal/mlx_mlx_backend_cpu_gemms_cblas.cpp
  • internal/metal/mlx_mlx_backend_cpu_hadamard.cpp
  • internal/metal/mlx_mlx_backend_cpu_indexing.cpp
  • internal/metal/mlx_mlx_backend_cpu_inverse.cpp
  • internal/metal/mlx_mlx_backend_cpu_jit_compiler.cpp
  • internal/metal/mlx_mlx_backend_cpu_logsumexp.cpp
  • internal/metal/mlx_mlx_backend_cpu_luf.cpp
  • internal/metal/mlx_mlx_backend_cpu_masked_mm.cpp
  • internal/metal/mlx_mlx_backend_cpu_matmul.cpp
  • internal/metal/mlx_mlx_backend_cpu_primitives.cpp
  • internal/metal/mlx_mlx_backend_cpu_qrf.cpp
  • internal/metal/mlx_mlx_backend_cpu_quantized.cpp
  • internal/metal/mlx_mlx_backend_cpu_reduce.cpp
  • internal/metal/mlx_mlx_backend_cpu_scan.cpp
  • internal/metal/mlx_mlx_backend_cpu_select.cpp
  • internal/metal/mlx_mlx_backend_cpu_softmax.cpp
  • internal/metal/mlx_mlx_backend_cpu_sort.cpp
  • internal/metal/mlx_mlx_backend_cpu_svd.cpp
  • internal/metal/mlx_mlx_backend_cpu_threefry.cpp
  • internal/metal/mlx_mlx_backend_cpu_unary.cpp
  • internal/metal/mlx_mlx_backend_gpu_copy.cpp
  • internal/metal/mlx_mlx_backend_gpu_primitives.cpp
  • internal/metal/mlx_mlx_backend_gpu_slicing.cpp
  • internal/metal/mlx_mlx_backend_metal_allocator.cpp
  • internal/metal/mlx_mlx_backend_metal_binary.cpp
  • internal/metal/mlx_mlx_backend_metal_compiled.cpp
  • internal/metal/mlx_mlx_backend_metal_conv.cpp
  • internal/metal/mlx_mlx_backend_metal_copy.cpp
  • internal/metal/mlx_mlx_backend_metal_custom_kernel.cpp
  • internal/metal/mlx_mlx_backend_metal_device.cpp
  • internal/metal/mlx_mlx_backend_metal_distributed.cpp
  • internal/metal/mlx_mlx_backend_metal_eval.cpp
  • internal/metal/mlx_mlx_backend_metal_event.cpp
  • internal/metal/mlx_mlx_backend_metal_fence.cpp
  • internal/metal/mlx_mlx_backend_metal_fft.cpp
  • internal/metal/mlx_mlx_backend_metal_hadamard.cpp
  • internal/metal/mlx_mlx_backend_metal_indexing.cpp
  • internal/metal/mlx_mlx_backend_metal_logsumexp.cpp
  • internal/metal/mlx_mlx_backend_metal_matmul.cpp
  • internal/metal/mlx_mlx_backend_metal_metal.cpp
  • internal/metal/mlx_mlx_backend_metal_nojit_kernels.cpp
  • internal/metal/mlx_mlx_backend_metal_normalization.cpp
  • internal/metal/mlx_mlx_backend_metal_primitives.cpp
  • internal/metal/mlx_mlx_backend_metal_quantized.cpp
  • internal/metal/mlx_mlx_backend_metal_reduce.cpp
  • internal/metal/mlx_mlx_backend_metal_resident.cpp
  • internal/metal/mlx_mlx_backend_metal_rope.cpp
  • internal/metal/mlx_mlx_backend_metal_scaled_dot_product_attention.cpp
  • internal/metal/mlx_mlx_backend_metal_scan.cpp
  • internal/metal/mlx_mlx_backend_metal_slicing.cpp
  • internal/metal/mlx_mlx_backend_metal_softmax.cpp
  • internal/metal/mlx_mlx_backend_metal_sort.cpp
  • internal/metal/mlx_mlx_backend_metal_ternary.cpp
  • internal/metal/mlx_mlx_backend_metal_unary.cpp
  • internal/metal/mlx_mlx_backend_metal_utils.cpp
  • internal/metal/mlx_mlx_compile.cpp
  • internal/metal/mlx_mlx_device.cpp
  • internal/metal/mlx_mlx_distributed_distributed.cpp
  • internal/metal/mlx_mlx_distributed_jaccl_no_jaccl.cpp
  • internal/metal/mlx_mlx_distributed_mpi_no_mpi.cpp
  • internal/metal/mlx_mlx_distributed_nccl_no_nccl.cpp
  • internal/metal/mlx_mlx_distributed_ops.cpp
  • internal/metal/mlx_mlx_distributed_primitives.cpp
  • internal/metal/mlx_mlx_distributed_ring_no_ring.cpp
  • internal/metal/mlx_mlx_distributed_utils.cpp
  • internal/metal/mlx_mlx_dtype.cpp
  • internal/metal/mlx_mlx_dtype_utils.cpp
  • internal/metal/mlx_mlx_einsum.cpp
  • internal/metal/mlx_mlx_export.cpp
  • internal/metal/mlx_mlx_fast.cpp
  • internal/metal/mlx_mlx_fft.cpp
  • internal/metal/mlx_mlx_graph_utils.cpp
  • internal/metal/mlx_mlx_io_gguf.cpp
  • internal/metal/mlx_mlx_io_gguf_quants.cpp
  • internal/metal/mlx_mlx_io_load.cpp
  • internal/metal/mlx_mlx_io_no_gguf.cpp
  • internal/metal/mlx_mlx_io_safetensors.cpp
  • internal/metal/mlx_mlx_linalg.cpp
  • internal/metal/mlx_mlx_ops.cpp
  • internal/metal/mlx_mlx_primitives.cpp
  • internal/metal/mlx_mlx_random.cpp
  • internal/metal/mlx_mlx_scheduler.cpp
  • internal/metal/mlx_mlx_transforms.cpp
  • internal/metal/mlx_mlx_utils.cpp
  • internal/metal/mlx_mlx_version.cpp
  • internal/metal/mlxc_array.cpp
  • internal/metal/mlxc_closure.cpp
  • internal/metal/mlxc_compile.cpp
  • internal/metal/mlxc_device.cpp
  • internal/metal/mlxc_distributed.cpp
  • internal/metal/mlxc_distributed_group.cpp
  • internal/metal/mlxc_error.cpp
  • internal/metal/mlxc_export.cpp
  • internal/metal/mlxc_fast.cpp
  • internal/metal/mlxc_fft.cpp
  • internal/metal/mlxc_io.cpp
  • internal/metal/mlxc_io_types.cpp
  • internal/metal/mlxc_linalg.cpp
  • internal/metal/mlxc_map.cpp
  • internal/metal/mlxc_memory.cpp
  • internal/metal/mlxc_metal.cpp
  • internal/metal/mlxc_ops.cpp
  • internal/metal/mlxc_random.cpp
  • internal/metal/mlxc_stream.cpp
  • internal/metal/mlxc_string.cpp
  • internal/metal/mlxc_transforms.cpp
  • internal/metal/mlxc_transforms_impl.cpp
  • internal/metal/mlxc_vector.cpp
  • internal/metal/mlxc_version.cpp
✅ Files skipped from review due to trivial changes (10)
  • internal/metal/mlx_gen_metal_jit_gather_axis.cpp
  • internal/metal/mlx_gen_metal_jit_scatter_axis.cpp
  • internal/metal/mlx_gen_metal_jit_binary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_ternary_ops.cpp
  • internal/metal/mlx_gen_metal_jit_utils.cpp
  • internal/metal/mlx_gen_metal_jit_scatter.cpp
  • internal/metal/mlx_gen_metal_jit_reduce_utils.cpp
  • internal/metal/mlx_gen_metal_jit_hadamard.cpp
  • internal/metal/mlx_gen_metal_jit_gather.cpp
  • compute_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/metal/mlx_gen_metal_jit_masked_scatter.cpp
  • internal/metal/mlx_gen_metal_jit_gather_front.cpp
  • internal/metal/mlx_gen_metal_jit_unary_ops.cpp
  • internal/metal/mlx_gen_cpu_compiled_preamble.cpp

Comment thread compute_darwin.go
Comment on lines +95 to +100
func (base *bufferBase) replaceLocked(next *metal.Array) {
if base.array != nil {
metal.Free(base.array)
}
base.array = next
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== internal/metal synchronisation and lifetime APIs ==\n'
rg -n -C3 --type=go 'func .*Synchronize|DefaultStream|func .*Apply|func .*Free|func .*FromValues|func .*Zeros' internal/metal

printf '\n== array replacement/free call sites in compute_darwin.go ==\n'
rg -n -C3 'replaceLocked|metal\.Free|kernel\.Apply|syncLocked|Upload\(' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 50371


🏁 Script executed:

head -15 compute_darwin.go && printf '\n---\n' && sed -n '90,105p' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 667


🏁 Script executed:

rg -n -i 'license|initialized|color|behavior|centre' compute_darwin.go || echo "No US spelling variants found"

Repository: dAppCore/go-mlx

Length of output: 89


🏁 Script executed:

sed -n '1,350p' compute_darwin.go | grep -n -A5 'func.*Run\|func.*Upload\|func.*syncLocked\|Synchronize' | head -100

Repository: dAppCore/go-mlx

Length of output: 1214


🏁 Script executed:

# Look for explicit Synchronize calls in compute_darwin.go
rg -n 'Synchronize|syncLocked|metal\.Free' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 407


🏁 Script executed:

sed -n '119,165p' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 1362


🏁 Script executed:

sed -n '275,320p' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 1341


🏁 Script executed:

sed -n '250,275p' compute_darwin.go && printf '\n---\n' && sed -n '340,380p' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 2101


🏁 Script executed:

# Find ensureFrameLocked and runLocked definitions
rg -n -A15 'func.*ensureFrameLocked|func.*runLocked' compute_darwin.go | head -80

Repository: dAppCore/go-mlx

Length of output: 1313


🏁 Script executed:

# Check the frame state handling in case of errors
sed -n '275,310p' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 1123


🏁 Script executed:

# Look for where kernel inputs/outputs are used and freed
rg -n -B3 -A8 'kernel\.Apply|dst\.replaceLocked|src\.array' compute_darwin.go | head -100

Repository: dAppCore/go-mlx

Length of output: 1289


🏁 Script executed:

# Check if metal.FromValues creates a reference that persists on GPU
rg -n -A5 'func.*FromValues' internal/metal/array.go | head -30

Repository: dAppCore/go-mlx

Length of output: 231


🏁 Script executed:

# Verify the exact sequence: does kernel.Apply block until completion?
sed -n '124,160p' internal/metal/metal_kernel.go

Repository: dAppCore/go-mlx

Length of output: 1293


🏁 Script executed:

# Check if MLX has internal refcounting for arrays
rg -n 'refcount|reference|retain' internal/metal/array.go | head -20

Repository: dAppCore/go-mlx

Length of output: 436


🏁 Script executed:

# Verify frame state cleanup on errors and whether failed frames leak
sed -n '160,180p' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 456


🏁 Script executed:

# Check if src.array in Upload is the same as inputs to a kernel
# Look at how buffers are created and used
sed -n '50,120p' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 1834


🏁 Script executed:

# Examine the full picture: does Upload() risk freeing an in-flight input?
sed -n '119,135p' compute_darwin.go && printf '\n---\n' && sed -n '775,800p' compute_darwin.go

Repository: dAppCore/go-mlx

Length of output: 1480


Defer freeing old metal.Array references until the stream synchronises.

Lines 95–100: replaceLocked() calls metal.Free() immediately, but kernel.Apply() only queues work on the default stream without blocking. When a buffer is reused (e.g., by Upload() or as a destination in kernel.Apply()), the GPU may still be reading from or writing to the old array at the moment metal.Free() is called, causing a use-after-free on the device.

Move freed arrays into a deferred list and drain it after metal.Synchronize() completes:

Suggested fix
type computeSession struct {
	// ... existing fields ...
	retired []*metal.Array
}

func (base *bufferBase) replaceLocked(next *metal.Array) {
	if base.array != nil {
		base.session.retired = append(base.session.retired, base.array)
	}
	base.array = next
}

func (session *computeSession) syncLocked() error {
	// ... existing sync code ...
	metal.Synchronize(metal.DefaultStream())
	
	// Drain retired arrays after GPU work completes
	for _, arr := range session.retired {
		metal.Free(arr)
	}
	session.retired = session.retired[:0]
	
	// ... rest of sync ...
}

Also call the drain logic in Close() after its final sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute_darwin.go` around lines 95 - 100, replaceLocked currently calls
metal.Free on the old *metal.Array immediately, which can free GPU memory while
pending work is still on the default stream; instead push the old array onto a
per-session retired slice (e.g. computeSession.retired) in
bufferBase.replaceLocked and remove the direct metal.Free call, then after
metal.Synchronize(metal.DefaultStream()) in computeSession.syncLocked drain
retired by calling metal.Free on each entry and clear the slice; also ensure the
same drain logic runs after the final sync in Close so retired arrays are freed
only after GPU work completes.

Comment thread compute_darwin.go
Comment on lines +282 to +289
session.ensureFrameLocked()

start := time.Now()
err := session.runLocked(kernel, args)
dispatchDuration := time.Since(start)
if err != nil {
return err
}

⚠️ Potential issue | 🟠 Major

Rollback the implicit frame when Run() fails.

ensureFrameLocked() executes before validation and kernel lookup. If runLocked() returns an error, the session is left with an active frame even though no pass was dispatched, so the next BeginFrame() can fail and later frame metrics are polluted by the failed call. Please clear the implicitly-opened frame again on error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@compute_darwin.go` around lines 282 - 289, ensureFrameLocked() opens a frame
before validation, so if session.runLocked(kernel, args) returns an error you
must close/rollback that implicit frame before returning; modify the error path
after err != nil to call the session frame-close API (e.g.
session.clearFrameLocked() or session.rollbackFrameLocked() — whichever actual
method exists in the codebase to undo the frame opened by ensureFrameLocked())
so the frame is not left active and subsequent BeginFrame()/metrics remain
correct, then return the error.

Comment thread docs/development.md
Comment on lines +57 to +59
The forwarding translation units in `internal/metal/` include source files from
`lib/mlx`, `lib/mlx-c`, and `lib/generated`; leaving those submodules empty will
make the C++ includes fail before the Go package can build.

⚠️ Potential issue | 🟡 Minor

Clarify that lib/generated is not a submodule.

Line 58 currently reads as if lib/generated is part of the submodule initialisation flow, which can mislead setup steps. Please separate submodule dependencies from generated sources in this sentence.

✏️ Suggested patch
-The forwarding translation units in `internal/metal/` include source files from
-`lib/mlx`, `lib/mlx-c`, and `lib/generated`; leaving those submodules empty will
-make the C++ includes fail before the Go package can build.
+The forwarding translation units in `internal/metal/` include source files from
+`lib/mlx` and `lib/mlx-c`, and also rely on generated sources under
+`lib/generated`; leaving submodules uninitialised or generated sources absent
+will make the C++ includes fail before the Go package can build.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/development.md` around lines 57 - 59, Edit the sentence that groups
submodule setup with generated sources so it clearly separates submodule
dependencies from generated code: state that `internal/metal/` includes source
files from the git submodules `lib/mlx` and `lib/mlx-c` (which must be
initialized), and then in a separate sentence note that `lib/generated` contains
generated sources (not a submodule) and must be present for C++ includes to
succeed; update the wording around the existing mention of `internal/metal/`,
`lib/mlx`, `lib/mlx-c`, and `lib/generated` to make the distinction explicit.

Comment on lines +14 to +15
// bytesRWS wraps a bytes.Buffer to satisfy io.ReadWriteSeeker.
// bytes.Buffer only provides Read and Write; this adds Seek support.

⚠️ Potential issue | 🟡 Minor

Update the bytesRWS type comment to match the implementation.

The comment says it wraps bytes.Buffer, but the type stores and manages a []byte directly.

✏️ Proposed comment fix
-// bytesRWS wraps a bytes.Buffer to satisfy io.ReadWriteSeeker.
-// bytes.Buffer only provides Read and Write; this adds Seek support.
+// bytesRWS provides an in-memory io.ReadWriteSeeker over a byte slice.
+// It tracks a current position and a high-water mark for valid bytes.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// bytesRWS wraps a bytes.Buffer to satisfy io.ReadWriteSeeker.
// bytes.Buffer only provides Read and Write; this adds Seek support.
// bytesRWS provides an in-memory io.ReadWriteSeeker over a byte slice.
// It tracks a current position and a high-water mark for valid bytes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/io_custom_test.go` around lines 14 - 15, The comment for the
bytesRWS type is inaccurate: it claims to wrap bytes.Buffer but the
implementation uses and manages a []byte directly; update the bytesRWS type
comment to state that it implements io.ReadWriteSeeker over an internal []byte
buffer (managing position and length), describe that it provides Read, Write and
Seek semantics on that byte slice, and remove the reference to bytes.Buffer so
the comment matches the actual implementation of bytesRWS.

Comment on lines +1 to +2
#if defined(__has_include) && __has_include("../../lib/mlx/mlx/backend/cpu/qrf.cpp")
#include "../../lib/mlx/mlx/backend/cpu/qrf.cpp"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Check that the forwarded implementation actually exists
test -f "lib/mlx/mlx/backend/cpu/qrf.cpp" && echo "OK: lib/mlx/mlx/backend/cpu/qrf.cpp exists" || { echo "FAIL: missing lib/mlx/mlx/backend/cpu/qrf.cpp"; exit 1; }

# 2) Spot-check: are other forwarding shims using the same pattern?
rg -n --glob 'internal/metal/*mlx_mlx_backend_cpu_*.cpp' '__has_include\("\.\./\.\./lib/mlx/mlx/backend/cpu/' internal/metal | head -n 50

# 3) Optional: confirm the shim includes exactly the same path it checks
rg -n 'mlx_mlx_backend_cpu_qrf\.cpp|backend/cpu/qrf\.cpp' internal/metal/mlx_mlx_backend_cpu_qrf.cpp

Repository: dAppCore/go-mlx

Length of output: 105


The forwarded implementation file is missing; compilation will fail.

The file lib/mlx/mlx/backend/cpu/qrf.cpp does not exist at the expected path. Since __has_include("../../lib/mlx/mlx/backend/cpu/qrf.cpp") will evaluate to false, the include on line 2 is skipped and the #error on line 4 is triggered. This will cause compilation to fail for any code that includes this forwarding translation unit.

Ensure that the submodule is initialised with git submodule update --init --recursive and that lib/mlx/mlx/backend/cpu/qrf.cpp is present before submitting this change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_backend_cpu_qrf.cpp` around lines 1 - 2, The
forwarding TU internal/metal/mlx_mlx_backend_cpu_qrf.cpp fails because the
forwarded implementation file ../../lib/mlx/mlx/backend/cpu/qrf.cpp is missing
and the `#error` gets triggered; fix by ensuring the mlx submodule is initialized
and the file is present (run git submodule update --init --recursive) or update
the include path to point to the correct existing implementation, and remove or
adjust the __has_include guard accordingly so that the forwarded include in
internal/metal/mlx_mlx_backend_cpu_qrf.cpp (the __has_include check and the
include of qrf.cpp) resolves to an actual file at build time.
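A pre-build guard can surface the missing file with a friendlier message than the C++ #error. A sketch only — the path comes from the error message above, and the function name is hypothetical:

```shell
# check_forwarded_source fails fast, with a pointer to the fix, when a
# forwarded implementation file is absent.
check_forwarded_source() {
  if [ ! -f "$1" ]; then
    echo "missing $1 - run: git submodule update --init --recursive" >&2
    return 1
  fi
}

# Example guard before invoking the Go build:
# check_forwarded_source lib/mlx/mlx/backend/cpu/qrf.cpp && go build ./...
```

Running such a check in CI before compilation turns a cryptic preprocessor failure into a one-line actionable message.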

Comment on lines +1 to +5
#if defined(__has_include) && __has_include("../../lib/mlx/mlx/distributed/distributed.cpp")
#include "../../lib/mlx/mlx/distributed/distributed.cpp"
#else
#error "Missing forwarded source: ../../lib/mlx/mlx/distributed/distributed.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
#endif

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find where this forwarding TU is referenced/added to the build.
rg -n --hidden --no-mmap "mlx_mlx_distributed_distributed\.cpp" CMakeLists.txt cmake **/*.cmake .github .forgejo || true

# 2) Look for any distributed/feature flag controlling inclusion of this file.
rg -n --hidden --no-mmap -S "distributed" CMakeLists.txt cmake **/*.cmake .github .forgejo || true

Repository: dAppCore/go-mlx

Length of output: 349


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# First, explore the repository structure
echo "=== Repository structure ==="
ls -la

# Find CMakeLists.txt files
echo -e "\n=== CMakeLists.txt files ==="
find . -name "CMakeLists.txt" -type f 2>/dev/null | head -20

# Find the file mentioned in the review
echo -e "\n=== Locate mlx_mlx_distributed_distributed.cpp ==="
find . -name "*mlx_mlx_distributed_distributed*" -type f 2>/dev/null

# Find all .cmake files
echo -e "\n=== .cmake files ==="
find . -name "*.cmake" -type f 2>/dev/null | head -20

Repository: dAppCore/go-mlx

Length of output: 4356


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Root CMakeLists.txt ==="
cat -n CMakeLists.txt

echo -e "\n=== cpp/CMakeLists.txt ==="
cat -n cpp/CMakeLists.txt

echo -e "\n=== Search for mlx_mlx_distributed in build configs ==="
rg -n "mlx_mlx_distributed" CMakeLists.txt cpp/CMakeLists.txt 2>/dev/null || echo "No matches found"

echo -e "\n=== Search for distributed feature flags ==="
rg -n -i "distributed" CMakeLists.txt cpp/CMakeLists.txt 2>/dev/null || echo "No matches found"

Repository: dAppCore/go-mlx

Length of output: 1765


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== internal/metal directory contents ==="
find ./internal/metal -type f | head -20

echo -e "\n=== Check for CMakeLists.txt in internal ==="
find ./internal -name "CMakeLists.txt" -type f

echo -e "\n=== Look for build config references in internal/metal ==="
ls -la ./internal/metal/

echo -e "\n=== Check for any build-related files in cpp directory ==="
find ./cpp -type f | grep -E "\.(txt|cmake|mk)$" || true

echo -e "\n=== Search for how C++ files are compiled in the codebase ==="
rg -n "add_library\|add_executable\|target_sources" CMakeLists.txt cpp/CMakeLists.txt 2>/dev/null || echo "No matches"

Repository: dAppCore/go-mlx

Length of output: 16871



🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Look for cgo directives in Go files ==="
rg -n "^//\s*#cgo" internal/metal/*.go | head -30

echo -e "\n=== Look for build tags in Go files ==="
rg -n "^//\s*\+build" internal/metal/*.go | head -20

echo -e "\n=== Look for 'distributed' references in Go files ==="
rg -n -i "distributed" internal/metal/*.go | head -20

echo -e "\n=== Check metal.go for compilation details ==="
head -50 internal/metal/metal.go

Repository: dAppCore/go-mlx

Length of output: 103


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Search for #cgo directives broadly ==="
rg -n "#cgo" . --type go | head -30

echo -e "\n=== Look at the beginning of compute_darwin.go (likely has cgo) ==="
head -100 compute_darwin.go

echo -e "\n=== Look at beginning of metal.go ==="
head -50 internal/metal/metal.go

Repository: dAppCore/go-mlx

Length of output: 5598


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Search for .cpp file references in Go code ==="
rg -n "\.cpp" internal/metal/*.go | head -30

echo -e "\n=== Look for export.go which might list sources ==="
head -100 internal/metal/export.go

echo -e "\n=== Search for any build constraints or feature flags ==="
rg -n "MLX_BUILD|BUILD_.*DISTRIBUTED|DISTRIBUTED" internal/metal/ | head -20

Repository: dAppCore/go-mlx

Length of output: 110


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Content of mlx_mlx_distributed_distributed.cpp ==="
cat internal/metal/mlx_mlx_distributed_distributed.cpp

echo -e "\n=== Content of a similar forwarding file ==="
cat internal/metal/mlx_mlx_backend_metal_distributed.cpp

echo -e "\n=== Content of mlx_mlx_array.cpp (another forwarding file) ==="
cat internal/metal/mlx_mlx_array.cpp

echo -e "\n=== Search for generate or build scripts ==="
find . -name "*.py" -o -name "*.sh" -o -name "*.pl" | xargs grep -l "mlx_mlx_" 2>/dev/null | head -10

Repository: dAppCore/go-mlx

Length of output: 1268


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Search for cgo SOURCES or file listings ==="
rg -n "cgo.*SOURCES|LDFLAGS.*\.cpp|\.cpp.*LDFLAGS" . --type go

echo -e "\n=== Check if cpp files are linked via LDFLAGS or similar ==="
rg -n "LDFLAGS.*internal/metal|mlx_mlx" . --type go | head -20

echo -e "\n=== Look at the full metal.go file to see all cgo directives ==="
wc -l internal/metal/metal.go

echo -e "\n=== Check git to see how/when these files were added ==="
git log --oneline internal/metal/mlx_mlx_distributed_distributed.cpp 2>/dev/null | head -5 || echo "Git log not available"

echo -e "\n=== Check if there's a Makefile ==="
find . -name "Makefile" -o -name "makefile" | head -10

Repository: dAppCore/go-mlx

Length of output: 107


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check build tags on all Go files in metal package ==="
head -3 internal/metal/*.go | grep -E "^==>|go:build" | head -30

echo -e "\n=== Look for any conditional imports or distributed feature flags in Go ==="
rg -n -i "feature|distributed.*flag|ifdef.*distributed" internal/metal/ | head -20

echo -e "\n=== Check if distributed files have build constraints ==="
rg -n "^//" internal/metal/mlx_mlx_distributed*.cpp

echo -e "\n=== Look at how Go build system handles cgo files ==="
rg -n "//go:build|//\+build" internal/metal/*.go

Repository: dAppCore/go-mlx

Length of output: 1132


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check the Go documentation: cgo compiles all .c/.cpp/.h files in package ==="
echo "By default, cgo compiles all .c, .cxx, .cpp, .cgo.c files in the package."
echo ""

echo "=== List all .cpp and .c files in internal/metal ==="
find internal/metal -maxdepth 1 \( -name "*.cpp" -o -name "*.c" \) | wc -l
find internal/metal -maxdepth 1 \( -name "*.cpp" -o -name "*.c" \) | sort | tail -20

echo -e "\n=== Check if there's any CXXFLAGS or compile flag controlling distributed ==="
rg -n "MLX_DISTRIBUTED|DISTRIBUTED.*FLAG" internal/metal/mlx_build_config.h

echo -e "\n=== Show full mlx_build_config.h ==="
cat internal/metal/mlx_build_config.h

Repository: dAppCore/go-mlx

Length of output: 986


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Final verification: Show mlx_build_config.h content ==="
cat internal/metal/mlx_build_config.h

echo -e "\n=== Confirm: no feature flag controls distributed compilation ==="
echo "Evidence: mlx_build_config.h is empty/minimal"
wc -c internal/metal/mlx_build_config.h

echo -e "\n=== Summary of findings ==="
echo "1. File internal/metal/mlx_mlx_distributed_distributed.cpp exists"
echo "2. All files in internal/metal/ are compiled by cgo for darwin && arm64"
echo "3. No CMake or cgo configuration provides feature flags to conditionally exclude this file"
echo "4. The file's conditional logic is only at preprocessor level (#if __has_include)"
echo "5. Build WILL fail if submodule not initialized, regardless of distributed feature need"

Repository: dAppCore/go-mlx

Length of output: 1178


This translation unit is compiled unconditionally and lacks a feature flag to exclude it from builds that don't require distributed functionality.

Verification confirms that internal/metal/mlx_mlx_distributed_distributed.cpp is compiled for all darwin && arm64 builds without any CMake or cgo-level conditional logic. The build configuration contains no distributed feature flag (checked mlx_build_config.h and cgo directives in metal.go). All 148 .cpp files in internal/metal/ are compiled by cgo by default. If the submodule is not initialised, the build will hard-fail with the #error message regardless of whether distributed support is actually needed.

This should be guarded at the build system level (CMake or cgo) with a distributed feature flag to prevent unconditional compilation failures for users who don't require this functionality.

🧰 Tools
🪛 Clang (14.0.6)

[error] 4-4: "Missing forwarded source: ../../lib/mlx/mlx/distributed/distributed.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."

(clang-diagnostic-error)

🪛 Cppcheck (2.20.0)

[error] 4-4: #error "Missing forwarded source

(preprocessorErrorDirective)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_distributed_distributed.cpp` around lines 1 - 5, This
translation unit unconditionally forwards
"../../lib/mlx/mlx/distributed/distributed.cpp" and can hard-fail if the
submodule is missing; wrap the include in a feature guard (e.g. change the
top-level check to `#if defined(ENABLE_MLX_DISTRIBUTED) && defined(__has_include) && __has_include(...)`) or similar so the file only compiles when
ENABLE_MLX_DISTRIBUTED is defined, and then add corresponding build-system logic
to define ENABLE_MLX_DISTRIBUTED from CMake or cgo (update CMakeLists or
metal.go cgo directives to set -DENABLE_MLX_DISTRIBUTED only when the
distributed submodule is present or the user requests distributed support) so
the `#error` path cannot trigger for builds that do not require distributed
functionality.
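Concretely, the guarded shim the prompt describes might look like the fragment below. This is a sketch only: ENABLE_MLX_DISTRIBUTED is a hypothetical flag that the build system would have to define (e.g. via -DENABLE_MLX_DISTRIBUTED in the cgo CXXFLAGS) when distributed support is wanted.

```cpp
// Sketch: forward the distributed sources only when the build opts in.
// ENABLE_MLX_DISTRIBUTED is a hypothetical flag, not an existing one.
#if defined(ENABLE_MLX_DISTRIBUTED)
#  if defined(__has_include) && __has_include("../../lib/mlx/mlx/distributed/distributed.cpp")
#    include "../../lib/mlx/mlx/distributed/distributed.cpp"
#  else
#    error "ENABLE_MLX_DISTRIBUTED is set but lib/mlx is missing. Run git submodule update --init --recursive."
#  endif
#endif  // without the flag this TU compiles to an empty object file
```

With this shape, a checkout without the submodule still builds unless the user explicitly requests distributed support.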

Comment on lines +1 to +5
#if defined(__has_include) && __has_include("../../lib/mlx/mlx/einsum.cpp")
#include "../../lib/mlx/mlx/einsum.cpp"
#else
#error "Missing forwarded source: ../../lib/mlx/mlx/einsum.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
#endif

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

repo_root="$(pwd)"
target_rel="lib/mlx/mlx/einsum.cpp"
target_abs="$repo_root/$target_rel"

echo "== Check physical file existence =="
if [[ -f "$target_abs" ]]; then
  echo "OK: found $target_rel"
  ls -l "$target_abs"
else
  echo "ERROR: missing $target_rel"
  echo "If submodules aren’t initialised in this checkout, run locally:"
  echo "  git submodule update --init --recursive"
fi

echo
echo "== Check whether file is tracked by git (if this is a vendored file) =="
git ls-files --error-unmatch "$target_rel" >/dev/null 2>&1 \
  && echo "OK: tracked" \
  || echo "WARN: not tracked in git index (expected if it lives only inside submodules)"

echo
echo "== Sanity: do any einsum.cpp exist elsewhere? =="
find . -type f -name 'einsum.cpp' -print

Repository: dAppCore/go-mlx

Length of output: 439


Confirm submodule initialisation is enforced in the build pipeline

The include target lib/mlx/mlx/einsum.cpp is currently absent from the repository. This file is expected to exist only after running git submodule update --init --recursive, but the error guard in this code will cause a hard compilation failure if the submodule is not properly initialised. Ensure that the CI/build process explicitly initialises submodules before attempting to compile this translation unit, or consider documenting this requirement more prominently in the build setup.

🧰 Tools
🪛 Clang (14.0.6)

[error] 4-4: "Missing forwarded source: ../../lib/mlx/mlx/einsum.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."

(clang-diagnostic-error)

🪛 Cppcheck (2.20.0)

[error] 4-4: #error "Missing forwarded source

(preprocessorErrorDirective)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_einsum.cpp` around lines 1 - 5, The file
internal/metal/mlx_mlx_einsum.cpp forwards ../../lib/mlx/mlx/einsum.cpp and
currently errors out if that submodule file is missing; update the build/CI
scripts to explicitly initialise submodules before compilation by running git
submodule update --init --recursive (or equivalent in your CI runner) as a
pre-build step, and add a brief note to the project build docs (or CI job
description) indicating this requirement so the include in
internal/metal/mlx_mlx_einsum.cpp succeeds reliably.

Comment on lines +1 to +5
#if defined(__has_include) && __has_include("../../lib/mlx/mlx/graph_utils.cpp")
#include "../../lib/mlx/mlx/graph_utils.cpp"
#else
#error "Missing forwarded source: ../../lib/mlx/mlx/graph_utils.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
#endif

⚠️ Potential issue | 🟡 Minor

Clarify the preprocessor failure mode for toolchains without __has_include.

Right now, if __has_include isn’t supported (or defined(__has_include) evaluates false), the code falls through to the #error message that blames missing forwarded sources/submodules. That can mislead builds when the real cause is compiler feature support.

🛠️ Proposed tweak
-#if defined(__has_include) && __has_include("../../lib/mlx/mlx/graph_utils.cpp")
-#include "../../lib/mlx/mlx/graph_utils.cpp"
-#else
-#error "Missing forwarded source: ../../lib/mlx/mlx/graph_utils.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
-#endif
+#if defined(__has_include)
+#  if __has_include("../../lib/mlx/mlx/graph_utils.cpp")
+#    include "../../lib/mlx/mlx/graph_utils.cpp"
+#  else
+#    error "Missing forwarded source: ../../lib/mlx/mlx/graph_utils.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
+#  endif
+#else
+#  error "`__has_include` is not supported by this compiler/toolchain; adjust the forwarding shim or upgrade your compiler."
+#endif
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#if defined(__has_include) && __has_include("../../lib/mlx/mlx/graph_utils.cpp")
#include "../../lib/mlx/mlx/graph_utils.cpp"
#else
#error "Missing forwarded source: ../../lib/mlx/mlx/graph_utils.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
#endif
#if defined(__has_include)
#  if __has_include("../../lib/mlx/mlx/graph_utils.cpp")
#    include "../../lib/mlx/mlx/graph_utils.cpp"
#  else
#    error "Missing forwarded source: ../../lib/mlx/mlx/graph_utils.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
#  endif
#else
#  error "__has_include is not supported by this compiler/toolchain; adjust the forwarding shim or upgrade your compiler."
#endif
🧰 Tools
🪛 Clang (14.0.6)

[error] 4-4: "Missing forwarded source: ../../lib/mlx/mlx/graph_utils.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."

(clang-diagnostic-error)

🪛 Cppcheck (2.20.0)

[error] 4-4: #error "Missing forwarded source

(preprocessorErrorDirective)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_graph_utils.cpp` around lines 1 - 5, The current
include guard uses defined(__has_include) and falls through to an `#error` that
blames missing forwarded sources even when the real problem is lack of compiler
support for __has_include; update the preprocessor logic in
mlx_mlx_graph_utils.cpp so the failure message distinguishes the two cases — if
defined(__has_include) is false emit an `#error` explaining the compiler lacks
__has_include support (and suggest a modern compiler or a manual include),
otherwise keep the existing include/#error that references
"../../lib/mlx/mlx/graph_utils.cpp" and submodule initialization; reference the
symbols __has_include and the forwarded path "../../lib/mlx/mlx/graph_utils.cpp"
when making the conditional and messages.

Comment on lines +1 to +5
#if defined(__has_include) && __has_include("../../lib/mlx/mlx/scheduler.cpp")
#include "../../lib/mlx/mlx/scheduler.cpp"
#else
#error "Missing forwarded source: ../../lib/mlx/mlx/scheduler.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."
#endif

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check for submodule recursive init in CI/build workflows/docs.
set -euo pipefail

rg -n --hidden --no-ignore-vcs 'submodule update|--init|--recursive' . \
  --glob '.github/**' --glob '.gitlab/**' --glob '.forgejo/**' --glob '.circleci/**' --glob '**/*.yml' --glob '**/*.yaml'

Repository: dAppCore/go-mlx

Length of output: 41


🏁 Script executed:

# Check if the file exists
ls -la internal/metal/mlx_mlx_scheduler.cpp 2>/dev/null || echo "File not found"

Repository: dAppCore/go-mlx

Length of output: 144


🏁 Script executed:

# Look for Makefiles, build scripts, and configuration files that might handle submodule init
find . -type f \( -name 'Makefile' -o -name '*.mk' -o -name 'build.sh' -o -name 'setup.sh' -o -name 'configure' \) 2>/dev/null | head -20

Repository: dAppCore/go-mlx

Length of output: 41


🏁 Script executed:

# Check README and documentation for submodule setup instructions
find . -type f \( -name 'README*' -o -name 'CONTRIBUTING*' -o -name 'BUILD*' -o -name 'SETUP*' \) 2>/dev/null

Repository: dAppCore/go-mlx

Length of output: 124


🏁 Script executed:

# List all CI/workflow configuration files to ensure we covered all bases
find . -type f \( -path '*/.github/*' -o -path '*/.gitlab-ci.yml' -o -path '*/.circleci/*' -o -path '*/.travis.yml' -o -path '*/tox.ini' \) 2>/dev/null | head -20

Repository: dAppCore/go-mlx

Length of output: 41


🏁 Script executed:

# Read README to check for submodule setup instructions
cat README.md

Repository: dAppCore/go-mlx

Length of output: 4836


🏁 Script executed:

# Read CONTRIBUTING to check for submodule setup instructions
cat CONTRIBUTING.md

Repository: dAppCore/go-mlx

Length of output: 1144


🏁 Script executed:

# Check for any CI config files in repo root or at root level
ls -la | grep -E '\.(yml|yaml)$|^\.github|^\.gitlab|^\.circleci'

Repository: dAppCore/go-mlx

Length of output: 126


🏁 Script executed:

# Check git submodules configuration
cat .gitmodules 2>/dev/null || echo "No .gitmodules file"

Repository: dAppCore/go-mlx

Length of output: 394


🏁 Script executed:

# List all files at repo root to see what exists
ls -la | head -30

Repository: dAppCore/go-mlx

Length of output: 2017


🏁 Script executed:

# Check .forgejo directory structure and CI configuration
find .forgejo -type f 2>/dev/null | head -20

Repository: dAppCore/go-mlx

Length of output: 124


🏁 Script executed:

# Read Forgejo workflow files if they exist
cat .forgejo/workflows/* 2>/dev/null || echo "No workflow files found in .forgejo/workflows"

Repository: dAppCore/go-mlx

Length of output: 690


CI/build robustness is already in place; enhance CONTRIBUTING.md to document the submodule requirement.

The .forgejo/workflows/test.yml correctly initialises submodules recursively via submodules: recursive before compilation. The #error directive serves its purpose as a fail-fast mechanism. However, CONTRIBUTING.md should be updated to match README.md by documenting the submodule initialisation step for developers setting up the project locally, ensuring consistency across all development entry points.

🧰 Tools
🪛 Clang (14.0.6)

[error] 4-4: "Missing forwarded source: ../../lib/mlx/mlx/scheduler.cpp. Initialise submodules with git submodule update --init --recursive or fix the forwarding include path."

(clang-diagnostic-error)

🪛 Cppcheck (2.20.0)

[error] 4-4: #error "Missing forwarded source

(preprocessorErrorDirective)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/metal/mlx_mlx_scheduler.cpp` around lines 1 - 5, The CONTRIBUTING.md
must be updated to document the Git submodule requirement to match README.md and
CI behavior: add a brief step instructing developers to run git submodule update
--init --recursive (or equivalent) during local setup, reference that the
codebase uses forwarded sources guarded by the __has_include check and the
failing `#error` directive when submodules are missing, and optionally note that
CI uses submodules: recursive so local setup should mirror that to avoid the
forwarded-source error.
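
The `#error` fail-fast only fires at compile time, deep inside the cgo build. For setup docs, a lighter complement is a pre-build check that reports missing forwarded sources up front. A minimal sketch, assuming the `lib/mlx/mlx/*.cpp` submodule layout discussed above — the helper name and two-file list are illustrative, not part of the repository:

```shell
# List forwarded MLX sources missing under a checkout root and, if any
# are absent, print the same remediation the #error messages point to.
# check_forwarded_sources and its file list are illustrative only.
check_forwarded_sources() {
  root="$1"
  missing=0
  for f in mlx/graph_utils.cpp mlx/scheduler.cpp; do
    if [ ! -f "$root/lib/mlx/$f" ]; then
      echo "missing: lib/mlx/$f"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "ok"
  else
    echo "hint: git submodule update --init --recursive"
  fi
}

# An empty directory reports every forwarded source as missing.
tmp=$(mktemp -d)
check_forwarded_sources "$tmp"
```

Run before `go build` in a fresh clone; a CI checkout with `submodules: recursive` should never hit the missing branch.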

Comment thread README.md
-[![Go Reference](https://pkg.go.dev/badge/forge.lthn.ai/core/go-mlx.svg)](https://pkg.go.dev/forge.lthn.ai/core/go-mlx)
-[![License: EUPL-1.2](https://img.shields.io/badge/License-EUPL--1.2-blue.svg)](LICENSE.md)
+[![Go Reference](https://pkg.go.dev/badge/dappco.re/go/mlx.svg)](https://pkg.go.dev/dappco.re/go/mlx)
+[![Licence: EUPL-1.2](https://img.shields.io/badge/Licence-EUPL--1.2-blue.svg)](LICENSE.md)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for licence files in repository root
echo "=== Checking for licence files ==="
ls -la | grep -i license || echo "No LICENSE files found in root"
ls -la | grep -i licence || echo "No LICENCE files found in root"

# Also check git-tracked files
echo -e "\n=== Git-tracked files matching 'licence' or 'license' ==="
git ls-files | grep -iE 'licen[cs]e' || echo "No licence files in git"

Repository: dAppCore/go-mlx

Length of output: 270


🏁 Script executed:

#!/bin/bash
# Read the README.md file to check line 2 and surrounding context
echo "=== Line 2 context in README.md ==="
sed -n '1,5p' README.md

# Check line 151 and surrounding context
echo -e "\n=== Line 151 context in README.md ==="
sed -n '148,154p' README.md

Repository: dAppCore/go-mlx

Length of output: 510


Create missing licence file and align both links to consistent UK spelling.

The Licence badge and section both reference non-existent files. The badge links to LICENSE.md (US spelling, does not exist) while the section links to LICENCE (UK spelling, does not exist). Per the project's coding guidelines requiring UK English spelling, create LICENCE and update the badge link to match.

🛠️ Proposed fix
-[![Licence: EUPL-1.2](https://img.shields.io/badge/Licence-EUPL--1.2-blue.svg)](LICENSE.md)
+[![Licence: EUPL-1.2](https://img.shields.io/badge/Licence-EUPL--1.2-blue.svg)](LICENCE)

Then create a LICENCE file in the repository root containing the EUPL-1.2 licence text.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[![Licence: EUPL-1.2](https://img.shields.io/badge/Licence-EUPL--1.2-blue.svg)](LICENSE.md)
[![Licence: EUPL-1.2](https://img.shields.io/badge/Licence-EUPL--1.2-blue.svg)](LICENCE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 2, The README references mismatched licence filenames
(badge links to LICENSE.md, section links to LICENCE) and the repository lacks
the UK-spelled LICENCE file; create a new LICENCE file in the repo root
containing the EUPL-1.2 licence text, update the README badge link (in
README.md) to point to LICENCE instead of LICENSE.md, and ensure any other
README links or references that use "LICENSE.md" are changed to the UK spelling
"LICENCE" so both badge and section are consistent with project UK English
guidelines.
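
Mismatches like LICENSE.md vs LICENCE can be caught mechanically. A hedged sketch of a relative-link check for Markdown files — `check_local_links` is a name invented here, and the grep pattern is a simplification rather than a full Markdown parser:

```shell
# Extract ](target) link destinations from a Markdown file, skip
# absolute http(s) URLs, and flag relative targets missing on disk.
check_local_links() {
  file="$1"
  dir=$(dirname "$file")
  grep -o ']([^)]*)' "$file" | sed 's/^](//; s/)$//' | grep -v '^http' |
  while read -r target; do
    [ -e "$dir/$target" ] || echo "broken link: $target"
  done
}

# Demo: a badge pointing at LICENSE.md when no such file exists.
tmp=$(mktemp -d)
printf '[![Licence](https://img.shields.io/badge/Licence-EUPL--1.2-blue.svg)](LICENSE.md)\n' > "$tmp/README.md"
check_local_links "$tmp/README.md"
```

A check like this in CI would have flagged both the badge's LICENSE.md target and the section's LICENCE target as soon as the file was absent.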

@Snider Snider merged commit d71e18b into main Apr 27, 2026
1 check passed
Snider added a commit that referenced this pull request Apr 27, 2026
…n PR #3

Round 4 follow-up to f746e36.

Code:
- compute_darwin.go: old metal.Array frees deferred until after
  stream sync (was racing free with in-flight kernels)
- compute_darwin.go: implicit frames rolled back on failed Run
- internal/metal/*.cpp distributed forwarding TUs: opt-out
  MLX_ENABLE_DISTRIBUTED guard added across all distributed units
- internal/metal/mlx_build_config.h: build config hooks for opt-out

Doc:
- LICENCE: missing top-level LICENCE file added
- README.md + CONTRIBUTING.md: setup guidance + corrected examples
- docs/development.md + docs/models.md: clarified submodule +
  generated-source workflow

Tests:
- compute_darwin_test.go + gguf_info_test.go + io_custom_test.go:
  related coverage updates

Disposition replies (stale in current checkout, no action):
- qrf/softmax/einsum missing-source: submodules initialised, files
  exist
- duplicate distributed_ops path comment: lib/mlx/mlx/distributed/
  ops.cpp matches TU name, no rename

Verification: gofmt clean, GOWORK=off go vet + go test -count=1 ./...
pass with explicit cache paths.

Closes residual r4 findings on #3

Co-authored-by: Codex <noreply@openai.com>