Context
mlx-server depends on ekryski/mlx-swift-lm, pinned exact: "3.32.1-alpha". The maintainer of that library is pivoting to a new project, FFAI (thewafflehaus/FFAI, docs at https://ffai.dev). mlx-swift-lm should be treated as effectively frozen.
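For reference, an exact pin of this kind usually looks like the manifest fragment below. This is a minimal sketch, not mlx-server's actual Package.swift: the GitHub URL is inferred from the ekryski/mlx-swift-lm slug, the tools version is an assumption, and targets and product names are left out because they are not given here.

```swift
// swift-tools-version:5.9
// Sketch only: the URL and tools version are assumptions, and targets are omitted.
import PackageDescription

let package = Package(
    name: "mlx-server",
    dependencies: [
        // `exact:` accepts only this version, so SwiftPM never floats to a newer tag.
        .package(url: "https://github.com/ekryski/mlx-swift-lm", exact: "3.32.1-alpha"),
    ]
)
```

With the committed Package.resolved, this fixes both the version requirement and the resolved revision.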
Risk
A frozen upstream means:
- [...] InferenceEngine actor comment) is blocked: it cannot be built on a dead upstream.
Not an emergency: the dependency is pinned and Package.resolved is committed, so the build is reproducible and current releases are unaffected.
Destination: FFAI (gated)
FFAI is the likely future foundation: active, dependency-light (pre-compiled Metal kernels, no MLX/Python/JIT), with a clean API (Model.load, model.generate, streaming generateStream()) and chat templates via swift-transformers. Its API shape maps cleanly onto mlx-server's InferenceEngine, so a migration is a contained change to one file.
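One way to keep such a migration contained is to put a small protocol between the engine and whichever library backs it. The sketch below is illustrative only: GenerationBackend, EchoBackend, and InferenceEngineSketch are hypothetical names, and none of the signatures are taken from mlx-swift-lm, FFAI, or mlx-server's real InferenceEngine.

```swift
import Foundation

// Hypothetical seam: these names and signatures are assumptions for illustration.
protocol GenerationBackend: Sendable {
    func generate(prompt: String, maxTokens: Int) async throws -> String
    func generateStream(prompt: String, maxTokens: Int) -> AsyncThrowingStream<String, Error>
}

// Stand-in backend so the sketch runs without either library.
// It "generates" by echoing the first `maxTokens` words of the prompt.
struct EchoBackend: GenerationBackend {
    func generate(prompt: String, maxTokens: Int) async throws -> String {
        prompt.split(separator: " ").prefix(maxTokens).joined(separator: " ")
    }

    func generateStream(prompt: String, maxTokens: Int) -> AsyncThrowingStream<String, Error> {
        let words = prompt.split(separator: " ").prefix(maxTokens).map(String.init)
        return AsyncThrowingStream { continuation in
            for word in words { continuation.yield(word) }
            continuation.finish()
        }
    }
}

// Stand-in for the engine: it only talks to the protocol, so swapping
// the backing library means writing one new conforming type.
actor InferenceEngineSketch {
    private let backend: any GenerationBackend

    init(backend: any GenerationBackend) {
        self.backend = backend
    }

    func complete(_ prompt: String, maxTokens: Int = 256) async throws -> String {
        try await backend.generate(prompt: prompt, maxTokens: maxTokens)
    }

    func stream(_ prompt: String, maxTokens: Int = 256) -> AsyncThrowingStream<String, Error> {
        backend.generateStream(prompt: prompt, maxTokens: maxTokens)
    }
}
```

Under this design, an FFAI-backed type would conform to the same protocol once the blockers clear, and the rest of the server would not need to change.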
But FFAI v0.1.0 is not migratable yet. Two hard blockers:
- Tool calling is stubbed in FFAI. Migrating now would regress mlx-server, which has tool calling today.
- Qwen 3.5/3.6 MoE ("hybrid") architectures are not planned until FFAI Phase 5 or later. mlx-server's current target model (Qwen3.6-35B-A3B) would not load.
FFAI also does not have batching yet, so the pivot does not change the concurrency answer.
Plan
- Near-term: stay on the pinned mlx-swift-lm 3.32.1-alpha. It works and supports tool calling and the MoE model.
- Migrate to FFAI when it ships: (a) tool calling, and (b) the model architectures mlx-server needs.
- Fallback: if FFAI stalls, re-base on Apple's ml-explore/mlx-swift-examples instead.
- Update the mlx-server README, which currently sells mlx-swift-lm's roadmap as a reason to use it; that pitch is now stale.
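The first three plan items amount to a small decision rule, which could be sketched as follows. The type and field names are hypothetical, and "stalls" is modelled as a single boolean purely for illustration; the plan itself does not define how stalling would be judged.

```swift
// Hypothetical snapshot of FFAI's state; field names are illustrative only.
struct FFAIStatus {
    var supportsToolCalling: Bool
    var supportsNeededArchitectures: Bool  // e.g. the Qwen 3.5/3.6 MoE ("hybrid") family
    var isActivelyDeveloped: Bool          // false models "FFAI stalls"
}

enum NextStep: Equatable {
    case stayOnPinnedMLXSwiftLM
    case migrateToFFAI
    case rebaseOnMLXSwiftExamples
}

func nextStep(for ffai: FFAIStatus) -> NextStep {
    // Gate (a) and (b): both must ship before migrating.
    if ffai.supportsToolCalling && ffai.supportsNeededArchitectures {
        return .migrateToFFAI
    }
    // Fallback: FFAI has stalled before clearing the gate.
    if !ffai.isActivelyDeveloped {
        return .rebaseOnMLXSwiftExamples
    }
    // Near-term default: the pin works and already covers both requirements.
    return .stayOnPinnedMLXSwiftLM
}

// FFAI v0.1.0 as described above: active, but both blockers are open.
let current = FFAIStatus(
    supportsToolCalling: false,
    supportsNeededArchitectures: false,
    isActivelyDeveloped: true
)
print(nextStep(for: current))  // prints "stayOnPinnedMLXSwiftLM"
```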
Related
Concurrency / Phase 2 batching is coupled to this decision: whichever foundation mlx-server lands on must be the one that can provide batched generation.