Plan migration off mlx-swift-lm (upstream pivoting to FFAI)

## Context

mlx-server depends on `ekryski/mlx-swift-lm`, pinned `exact: "3.32.1-alpha"`. The maintainer of that library is pivoting to a new project, [FFAI](https://github.com/thewafflehaus/FFAI) (`thewafflehaus/FFAI`, docs at https://ffai.dev). `mlx-swift-lm` should be treated as effectively frozen.
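For reference, the pin described above would look roughly like this in `Package.swift`. This is a sketch, not a copy of the real manifest: the repository URL is inferred from the `ekryski/mlx-swift-lm` slug, and the package, target, and product names are assumptions.

```swift
// swift-tools-version:5.9
// Sketch only: URL inferred from the `ekryski/mlx-swift-lm` slug;
// package, target, and product names below are hypothetical.
import PackageDescription

let package = Package(
    name: "mlx-server",
    dependencies: [
        // `exact:` pins one version, so SwiftPM never resolves to another tag.
        .package(url: "https://github.com/ekryski/mlx-swift-lm", exact: "3.32.1-alpha"),
    ],
    targets: [
        .executableTarget(
            name: "mlx-server",
            dependencies: [
                // Assumed product name; use whatever the library actually exports.
                .product(name: "MLXLLM", package: "mlx-swift-lm"),
            ]
        ),
    ]
)
```

Together with the committed `Package.resolved`, an exact pin means a frozen upstream cannot change what gets built.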

## Risk

A frozen upstream means:
- No new model architectures.
- No bug fixes.
- No batched generation. This blocks the planned Phase 2 multi-slot concurrency (see #9-area work and the `InferenceEngine` actor comment), which cannot be built on a dead upstream.
- Eventual breakage against new Xcode / Swift / MLX versions.

Not an emergency: the dependency is pinned and `Package.resolved` is committed, so the build is reproducible and current releases are unaffected.

## Destination: FFAI (gated)

FFAI is the likely future foundation: active, dependency-light (pre-compiled Metal kernels, no MLX/Python/JIT), with a clean API (`Model.load`, `model.generate`, streaming `generateStream()`) and chat templates via swift-transformers. Its API shape maps cleanly onto mlx-server's `InferenceEngine`, so a migration is a contained change to one file.
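One way to keep the migration contained to one file is to have `InferenceEngine` talk to its library through a narrow seam. The sketch below is illustrative only: `GenerationBackend`, `EchoBackend`, and `complete(prompt:)` are hypothetical names, the actor is a simplified stand-in for the real `InferenceEngine`, and no real mlx-swift-lm or FFAI call is shown because their exact signatures are not assumed here.

```swift
// Hypothetical seam between the server and its inference library.
// None of these names come from mlx-swift-lm or FFAI.
protocol GenerationBackend: Sendable {
    /// Streams generated text chunks for a prompt.
    func generateStream(prompt: String) -> AsyncStream<String>
}

/// Stand-in backend that echoes the prompt word by word. A real adapter
/// would wrap the library's own load and generate calls here.
struct EchoBackend: GenerationBackend {
    func generateStream(prompt: String) -> AsyncStream<String> {
        AsyncStream { continuation in
            for word in prompt.split(separator: " ") {
                continuation.yield(String(word))
            }
            continuation.finish()
        }
    }
}

/// Simplified stand-in for mlx-server's `InferenceEngine` actor.
actor InferenceEngine {
    private let backend: any GenerationBackend

    init(backend: any GenerationBackend) {
        self.backend = backend
    }

    /// Collects the whole stream into a single space-joined string.
    func complete(prompt: String) async -> String {
        var chunks: [String] = []
        for await chunk in backend.generateStream(prompt: prompt) {
            chunks.append(chunk)
        }
        return chunks.joined(separator: " ")
    }
}
```

With a seam like this, swapping foundations would mean writing one new adapter type, while the HTTP layer and the actor stay untouched.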

But FFAI v0.1.0 is **not a viable migration target yet**. Two hard blockers:
1. **Tool calling** is stubbed in FFAI. Migrating now would regress mlx-server, which has tool calling today.
2. **Qwen 3.5/3.6 MoE ("hybrid") architectures** are not planned until FFAI Phase 5 or later. mlx-server's current target model (Qwen3.6-35B-A3B) would not load.

FFAI also does not have batching yet, so the pivot does not change the concurrency answer.
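The gate above can be written down as a small checklist. This is a sketch under stated assumptions: `BackendCapabilities` and `migrationBlockers` are hypothetical names, the architecture label is an illustrative string, and the FFAI values simply restate what this issue says about v0.1.0. Batching is recorded but does not gate the migration, since neither foundation has it.

```swift
/// Hypothetical description of what a candidate foundation supports.
struct BackendCapabilities {
    var toolCalling: Bool
    var supportedArchitectures: Set<String>
    var batchedGeneration: Bool  // tracked for Phase 2, not part of the gate
}

/// Returns the hard blockers for migrating; an empty array means the gate is open.
func migrationBlockers(
    for backend: BackendCapabilities,
    requiredArchitectures: Set<String>
) -> [String] {
    var blockers: [String] = []
    if !backend.toolCalling {
        blockers.append("tool calling")
    }
    // Sort so the report order is stable across runs.
    let missing = requiredArchitectures.subtracting(backend.supportedArchitectures)
    for arch in missing.sorted() {
        blockers.append("architecture: \(arch)")
    }
    return blockers
}

// FFAI v0.1.0 as described in this issue. The supported set is left empty for
// illustration; only the absence of the required architecture matters here.
let ffaiToday = BackendCapabilities(
    toolCalling: false,
    supportedArchitectures: [],
    batchedGeneration: false
)
let blockers = migrationBlockers(for: ffaiToday, requiredArchitectures: ["qwen3.6-moe"])
print(blockers)  // ["tool calling", "architecture: qwen3.6-moe"]
```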

## Plan

- **Near-term:** stay on the pinned `mlx-swift-lm 3.32.1-alpha`. It works and supports tool calling and the MoE model.
- **Migration trigger:** move to FFAI once it ships both (a) tool calling and (b) the model architectures mlx-server needs.
- **Fallback:** if FFAI stalls, re-base on Apple's `ml-explore/mlx-swift-examples` instead.
- Update the mlx-server README, which currently sells `mlx-swift-lm`'s roadmap as a reason to use it; that pitch is now stale.

## Related

Concurrency / Phase 2 batching is coupled to this decision: whichever foundation mlx-server lands on must be the one that can provide batched generation.

Plan migration off mlx-swift-lm (upstream pivoting to FFAI) #11
