
DIM — Distributed Inference Model

Run a transformer that doesn't fit on your Mac across your Mac and your iPhone.

A two-week solo investigation: split Qwen 2.5 7B (FP16, ~15 GB of weights) between an M1 Pro MacBook (16 GB) and an iPhone 13 Pro (6 GB), route hidden activations over Wi-Fi, and measure what happens. Working prototype, with the receipts attached.

Headline

Distributed beats memory-pressed solo

1.68× faster than solo FP16 on the same Mac under memory pressure (1.59×–1.85× across days, depending on Wi-Fi state). The full paper, with methods, regime analysis, and limitations, is in PAPER.md.

The honest caveat

Distribution helps when the model doesn't fit on one device. If your model fits (e.g. you can quantize 7B to Q4_K_M and run it in Ollama at ~29 tok/s), distribution loses by ~850×. DIM addresses one specific regime — FP16 required, weights spilling to swap — and that's the regime the paper measures. See Regime Analysis.
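A back-of-envelope sketch of that regime check. The parameter count (~7.6 B for Qwen 2.5 7B), the 4 GB headroom, and the ~4.5 bits/param figure for Q4_K_M are my own illustrative assumptions, not numbers from the paper:

```python
def fits_in_ram(n_params: float, bytes_per_param: float, ram_gb: float,
                headroom_gb: float = 4.0) -> bool:
    """Rough check: do the weights fit in RAM with headroom for activations/OS?"""
    weights_gb = n_params * bytes_per_param / 1e9
    return weights_gb <= ram_gb - headroom_gb

# Qwen 2.5 7B at FP16: ~7.6e9 params * 2 bytes ≈ 15.2 GB of weights
print(fits_in_ram(7.6e9, 2.0, ram_gb=16))    # False: swap regime, where DIM helps
print(fits_in_ram(7.6e9, 0.5625, ram_gb=16)) # True: ~Q4_K_M fits, so run it solo
```

If the second check passes, a quantized single-device setup (e.g. Ollama) is the right tool; DIM only targets the first case.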

Architecture

Mac (host, PyTorch FP16)             iPhone (CoreML FP16)            Mac (host, cont'd)
─────────────────────────            ─────────────────────           ──────────────────
Embedding (INT8)
Head blocks 0..10 (FP16)  ── TCP ──► Middle blocks 11..16  ── TCP ─► Tail blocks 17..27
                                     FP16 forward                    LM head (INT8)
                                                                     → next-token logits

The Mac runs the embedding, head transformer blocks, tail transformer blocks, and LM head. The iPhone runs six contiguous middle transformer blocks via CoreML. Hidden activations cross over Wi-Fi as FP16 tensors (~896 KB per token). Full design: docs/ARCHITECTURE.md.
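The ~896 KB figure is consistent with shipping FP16 hidden states for a full 128-token context each step, at Qwen 2.5 7B's hidden size of 3584 (128 × 3584 × 2 bytes ≈ 896 KB). A minimal sketch of that arithmetic plus a length-prefixed framing for the TCP hop; the framing is a hypothetical illustration, not DIM's actual wire format:

```python
import struct

HIDDEN = 3584      # Qwen 2.5 7B hidden size
FP16_BYTES = 2

def activation_payload(n_tokens: int) -> int:
    """Bytes of hidden state crossing the Wi-Fi link for one pipeline hop."""
    return n_tokens * HIDDEN * FP16_BYTES

def frame(payload: bytes) -> bytes:
    """Length-prefix a tensor blob so the receiver knows how many bytes to read."""
    return struct.pack(">I", len(payload)) + payload

print(activation_payload(1) / 1024)    # 7.0   (per token, one hop)
print(activation_payload(128) / 1024)  # 896.0 (128-token context, one hop)
```

Note that each generated token crosses the link twice (Mac → iPhone and iPhone → Mac), so the network cost per token is double the single-hop payload.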

Repo layout

DIM/
├── PAPER.md            ← research-paper-style writeup, the main read
├── README.md           ← you are here
├── system/             ← the working distributed system (Python + Swift)
│   ├── pipeline/         ← head/middle/tail Qwen pipeline + evaluation harness
│   ├── ios_worker/       ← iOS app (Xcode project, Swift, CoreML inference)
│   ├── swift_worker/     ← macOS Swift CLI worker (early prototype)
│   └── benchmarks/       ← single-device baselines
├── simulation/         ← Python cluster simulator (devices.json, scenarios)
├── analysis/           ← figure + table generation pipeline
│   ├── figures/         ← all 8 PNGs in PAPER.md
│   └── tables/          ← run summary table
├── docs/               ← ARCHITECTURE, SETUP, REPRODUCING, LIMITATIONS
│   └── notes/           ← internal lab notes (kept for completeness)
└── reports/            ← 20 dated lab-notebook entries from the 2-week build

Quickstart

Enough to regenerate the headline figure and tables from the committed data on your own machine (not the full measurement setup — that's in docs/SETUP.md):

git clone https://github.com/dannydyl/DIM.git
cd DIM
python -m venv .venv && source .venv/bin/activate
pip install matplotlib numpy

# Regenerate every figure and table from the committed JSONL data
python analysis/generate_all.py

To run the system end-to-end (split a model, build the iOS app, take measurements), follow docs/SETUP.md and docs/REPRODUCING.md.

Requirements

Component                 Requirement
macOS                     14+
Xcode                     15+
Python                    3.11+
iPhone                    12 Pro or later, iOS 17+
Apple Developer account   Free personal team is enough
Disk                      ~60 GB for model weights
Hugging Face token        Required for downloading Qwen 2.5 7B

Status

Working prototype, end of the initial two-week sprint. Not maintained as a product; not packaged for non-technical users. Open-sourced as a faithful record of one specific configuration that worked, intended for anyone curious about distributed local inference on consumer Apple hardware.

License

MIT — see LICENSE.

Citation

If this work is useful in your own writing, citing the repo URL and the commit hash is fine. There is no formal publication.
