Run a transformer that doesn't fit on your Mac across your Mac and your iPhone.
A two-week solo investigation: split Qwen 2.5 7B (FP16, ~15 GB of weights) between an M1 Pro MacBook (16 GB) and an iPhone 13 Pro (6 GB), route hidden activations over Wi-Fi, and measure what happens. Working prototype, with the receipts attached.
The split pipeline runs 1.68× faster than solo FP16 on the same Mac under memory pressure (1.59×–1.85× across days, depending on Wi-Fi state). The full paper, with methods, regime analysis, and limitations, is in PAPER.md.
Distribution helps when the model doesn't fit on one device. If your model fits (e.g. you can quantize 7B to Q4_K_M and run it in Ollama at ~29 tok/s), distribution loses by ~850×. DIM addresses one specific regime — FP16 required, weights spilling to swap — and that's the regime the paper measures. See Regime Analysis.
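The regime boundary can be sketched as a back-of-envelope check. This is a hypothetical helper using the numbers quoted above, not code from the repo:

```python
# Back-of-envelope check for when splitting a model across devices can pay
# off. Hypothetical sketch, not code from the repo: distribution only wins
# when FP16 weights spill to swap AND quantization is off the table.

def should_distribute(weight_bytes: int, free_ram_bytes: int,
                      quantized_fits: bool) -> bool:
    """True only in DIM's regime: FP16 required, weights spilling to swap."""
    fits_in_ram = weight_bytes <= free_ram_bytes
    return not fits_in_ram and not quantized_fits

# Qwen 2.5 7B in FP16: ~15 GB of weights on a 16 GB M1 Pro, with macOS and
# other apps resident (assumed ~10 GB usable) -- the weights spill.
weights = 15 * 1024**3
usable_ram = 10 * 1024**3
print(should_distribute(weights, usable_ram, quantized_fits=False))  # → True
```

If a Q4_K_M quantization fits in RAM (the ~29 tok/s Ollama case), the same check returns False and running locally wins.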
```
Mac (host, PyTorch FP16)          iPhone (CoreML FP16)            Mac (host, cont'd)
─────────────────────────         ─────────────────────           ──────────────────
Embedding (INT8)
Head blocks 0..10 (FP16) ──TCP──► Middle blocks 11..16  ──TCP──► Tail blocks 17..27
                                  (FP16 forward)                  LM head (INT8)
                                                                  → next-token logits
```
The Mac runs the embedding, head transformer blocks, tail transformer blocks, and LM head. The iPhone runs six contiguous middle transformer blocks via CoreML. Hidden activations cross over Wi-Fi as FP16 tensors (~896 KB per token). Full design: docs/ARCHITECTURE.md.
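The split and the activation hand-off can be sketched as follows. This is illustrative code only: the block indices match the README (0..10 / 11..16 / 17..27), but the framing protocol shown here is an assumption, and the real pipeline and wire format live in system/pipeline/ and docs/ARCHITECTURE.md:

```python
# Sketch of the head/middle/tail block split and an FP16 activation
# hand-off over TCP. Illustrative only; the repo's actual wire format
# is defined in docs/ARCHITECTURE.md.
import socket
import struct

import numpy as np

# Block assignment per device (28 transformer blocks total in Qwen 2.5 7B).
HEAD, MIDDLE, TAIL = range(0, 11), range(11, 17), range(17, 28)

def send_activations(sock: socket.socket, hidden: np.ndarray) -> None:
    """Frame = 4-byte big-endian payload length, then raw FP16 bytes."""
    payload = hidden.astype(np.float16).tobytes()
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_activations(sock: socket.socket, shape: tuple[int, ...]) -> np.ndarray:
    """Read one frame and reinterpret it with the agreed-upon shape."""
    (size,) = struct.unpack(">I", _recv_exact(sock, 4))
    return np.frombuffer(_recv_exact(sock, size), dtype=np.float16).reshape(shape)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """recv() may return partial data; loop until exactly n bytes arrive."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf += chunk
    return buf
```

Raw FP16 bytes keep the per-hop payload small relative to the weights themselves; the trade-offs against INT8 on the wire are discussed in docs/ARCHITECTURE.md.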
```
DIM/
├── PAPER.md        ← research-paper-style writeup, the main read
├── README.md       ← you are here
├── system/         ← the working distributed system (Python + Swift)
│   ├── pipeline/       ← head/middle/tail Qwen pipeline + evaluation harness
│   ├── ios_worker/     ← iOS app (Xcode project, Swift, CoreML inference)
│   ├── swift_worker/   ← macOS Swift CLI worker (early prototype)
│   └── benchmarks/     ← single-device baselines
├── simulation/     ← Python cluster simulator (devices.json, scenarios)
├── analysis/       ← figure + table generation pipeline
│   ├── figures/        ← all 8 PNGs in PAPER.md
│   └── tables/         ← run summary table
├── docs/           ← ARCHITECTURE, SETUP, REPRODUCING, LIMITATIONS
│   └── notes/          ← internal lab notes (kept for completeness)
└── reports/        ← 20 dated lab-notebook entries from the 2-week build
```
Three steps to reproduce the headline figure on your own machines (not the full setup — that's in docs/SETUP.md):
```sh
git clone https://github.com/dannydyl/DIM.git
cd DIM
python -m venv .venv && source .venv/bin/activate
pip install matplotlib numpy
# Regenerate every figure and table from the committed JSONL data
python analysis/generate_all.py
```

To run the system end-to-end (split a model, build the iOS app, take measurements), follow docs/SETUP.md and docs/REPRODUCING.md.
| Component | Requirement |
|---|---|
| macOS | 14+ |
| Xcode | 15+ |
| Python | 3.11+ |
| iPhone | 12 Pro or later, iOS 17+ |
| Apple Developer account | Free personal team is enough |
| Disk | ~60 GB for model weights |
| Hugging Face token | Required for downloading Qwen 2.5 7B |
- The full paper, with figures and methodology: PAPER.md
- Why FP16 over the wire and not INT8: docs/ARCHITECTURE.md
- Step-by-step setup, including iOS signing: docs/SETUP.md
- Exact reproduction commands: docs/REPRODUCING.md
- Honest list of what this work does not do: docs/LIMITATIONS.md
- Two weeks of dated lab notes (dead ends included): reports/
Working prototype, end of the initial two-week sprint. Not maintained as a product; not packaged for non-technical users. Open-sourced as a faithful record of one specific configuration that worked, intended for anyone curious about distributed local inference on consumer Apple hardware.
MIT — see LICENSE.
If this work is useful in your own writing, citing the repo URL and the commit hash is fine. There is no formal publication.
