A single-binary caching reverse proxy for HTTP APIs. Run it locally, point your
SDKs at http://localhost:8080/<service>/... instead of the real origin, and
external API responses get recorded the first time and replayed forever after
— deterministically, including for POST bodies.
It helps with:

- Saving tokens and API costs: by caching repeated requests to expensive upstreams (like LLM providers), Middly intercepts duplicate prompts during iterative testing.
- Deterministic testing environments: Middly replays cached responses, so outputs are consistent and reproducible. This removes network volatility, drops the need for a live connection, and makes testing entirely predictable.
- Zero-bloat, drop-in deployment: compiled as a single, statically linked binary with no CGO dependencies, Middly drops cleanly into any local workspace or CI pipeline on any OS and architecture, with no toolchain requirements or heavy installs.
- Download and extract the latest release for your system.
- Move the executable to your system's PATH to run it globally.

Linux & macOS:

```sh
chmod +x middly
sudo mv middly /usr/local/bin/
```

Windows: move `middly.exe` to a dedicated folder (e.g. `C:\Tools`) and add that folder to your System Environment Variables.

Note: macOS users may need to clear the quarantine flag:

```sh
xattr -d com.apple.quarantine middly_darwin_arm64
```

Pure-Go SQLite (modernc.org/sqlite) — no CGO, single static binary.
```sh
middly --routes='<path>=<upstream_url>' --mode=<mode_type> --db=<database_file>
middly  # defaults: port 8080, db ./cache.db, mode record, routes /openai, /anthropic, ...

# Point your client at the proxy:
export OPENAI_BASE_URL=http://localhost:8080/openai/v1
export OPENAI_API_KEY=sk-*  # if the API key is needed
python my_script.py  # first run hits the real API and caches every call
python my_script.py  # second run is fully offline, sub-ms per call

# Open the dashboard at http://localhost:8080/dashboard.

middly --mode=replay       # never forwards to the upstream; replays everything locally
middly --mode=passthrough  # forwards every request; never reads or writes the cache
```

| flag | default | meaning |
|---|---|---|
| `--port` | 8080 | listen port |
| `--db` | cache.db | SQLite file path |
| `--mode` | record | record \| replay \| passthrough |
| `--clear-cache` | false | wipe the cache on startup |
| `--verbose` | false | log every proxied request |
| `--ttl` | 0 (off) | expire entries older than this Go duration |
| `--include-auth` | false | hash the Authorization header (off = team-share) |
| `--routes` | (built-ins) | `/prefix=https://host,...` overrides |
MIDDLY_MODE env var sets the mode when --mode is left at its default.
| prefix | target |
|---|---|
| /openai | https://api.openai.com |
| /anthropic | https://api.anthropic.com |
| /stripe | https://api.stripe.com |
| /weather | https://api.weather.com |
Add or override with --routes='/foo=https://example.com,/bar=https://other'.
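The `--routes` value is a comma-separated list of `prefix=upstream` pairs. A minimal sketch of how such a value could be parsed into a prefix-to-upstream map; `parseRoutes` is a hypothetical name for illustration, not middly's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseRoutes splits a --routes style value ("/foo=https://example.com,...")
// into a prefix → upstream map. Malformed entries without "=" are skipped.
func parseRoutes(spec string) map[string]string {
	routes := map[string]string{}
	for _, pair := range strings.Split(spec, ",") {
		prefix, upstream, ok := strings.Cut(pair, "=")
		if !ok {
			continue // skip malformed entries
		}
		routes[strings.TrimSpace(prefix)] = strings.TrimSpace(upstream)
	}
	return routes
}

func main() {
	r := parseRoutes("/foo=https://example.com,/bar=https://other")
	fmt.Println(r["/foo"], r["/bar"])
}
```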
| request | status | time (without mid) | time (with mid) | cost (without mid) | cost (with mid) | in (without / with) | out (without / with) | tokens (without / with) |
|---|---|---|---|---|---|---|---|---|
| long-essay | 200 | 35.96s | 55ms | $0.227 | $0 | 61 / 0 | 2048 / 0 | 2109 / 0 |
| code-generation | 200 | 33.80s | 2ms | $0.227 | $0 | 72 / 0 | 2048 / 0 | 2120 / 0 |
| system-design | 200 | 34.60s | 2ms | $0.227 | $0 | 69 / 0 | 2048 / 0 | 2117 / 0 |
| step-by-step-math | 200 | 24.57s | 2ms | $0.171 | $0 | 73 / 0 | 1540 / 0 | 1613 / 0 |
| code-review | 200 | 32.97s | 1ms | $0.202 | $0 | 78 / 0 | 1818 / 0 | 1896 / 0 |
| translation-summary | 200 | 33.79s | 2ms | $0.227 | $0 | 55 / 0 | 2048 / 0 | 2103 / 0 |
| sql-deep-dive | 200 | 22.32s | 2ms | $0.151 | $0 | 91 / 0 | 1353 / 0 | 1444 / 0 |
| creative-writing | 200 | 35.64s | 2ms | $0.227 | $0 | 56 / 0 | 2048 / 0 | 2104 / 0 |
| TOTAL | — | 253.66s | 68ms | $1.659 | $0 | 555 / 0 | 14951 / 0 | 15506 / 0 |
cmd/ CLI entry point — flag parsing, wiring, signal handling
canonical/ Deterministic request → SHA-256 hash
cache/ SQLite-backed response store (WAL, connection-pooled)
proxy/ httputil.ReverseProxy + cache lookup + capture-on-miss
dashboard/ HTMX UI at /dashboard
canonical.Canonicalize produces a stable string of the form
METHOD\nNAMESPACE/path?sorted_query\nheader1: v\nheader2: v\nNORMALIZED_BODY
then hashes it with SHA-256. The hash is stable regardless of:
- header order
- query parameter order
- JSON object key order (recursively)
- noisy/hop-by-hop headers (stripped before hashing)
- common cache-buster query params (stripped before hashing)
The Authorization header is excluded by default so a cache produced by
one developer can be replayed by another. Pass --include-auth if your
upstream's response actually depends on the credential (e.g. multi-tenant
billing data).
The Namespace (route prefix) is folded into the hash so /openai/v1/x and
/stripe/v1/x cannot collide.
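The canonical form above can be sketched with the standard library alone: `url.Values.Encode` sorts query keys, and `encoding/json` emits map keys in sorted order, which recursively normalizes JSON bodies. `cacheKey` is a hypothetical stand-in for `canonical.Canonicalize` (header normalization is omitted here for brevity):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// sortedQuery re-encodes a query string with keys in sorted order, so
// ?b=2&a=1 and ?a=1&b=2 produce the same canonical form.
func sortedQuery(raw string) string {
	v, _ := url.ParseQuery(raw)
	return v.Encode() // url.Values.Encode emits keys in sorted order
}

// normalizeJSON re-marshals a JSON body; encoding/json writes map keys in
// sorted order, which normalizes object key order recursively. Non-JSON
// bodies pass through untouched.
func normalizeJSON(body []byte) string {
	var v interface{}
	if err := json.Unmarshal(body, &v); err != nil {
		return string(body)
	}
	out, _ := json.Marshal(v)
	return string(out)
}

// cacheKey folds the namespace (route prefix) into the hash, so the same
// path under /openai and /stripe cannot collide.
func cacheKey(method, namespace, path, rawQuery string, body []byte) string {
	s := strings.Join([]string{
		method,
		namespace + path + "?" + sortedQuery(rawQuery),
		normalizeJSON(body),
	}, "\n")
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := cacheKey("POST", "/openai", "/v1/x", "b=2&a=1", []byte(`{"x":1,"a":2}`))
	b := cacheKey("POST", "/openai", "/v1/x", "a=1&b=2", []byte(`{"a":2,"x":1}`))
	fmt.Println(a == b) // same request in a different order → same hash
}
```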
cache/cache.go exposes Get / Put / Clear / Count / Recent / Export / Import
over a single SQLite file. Pragmas applied:
- `journal_mode=WAL`: concurrent readers plus a single writer, no global lock
- `synchronous=NORMAL`: durable enough for a dev cache, much faster than FULL
- `busy_timeout=5000`: graceful handling of contention
Connection pool: MaxOpenConns=8, MaxIdleConns=8.
The proxy uses httputil.ReverseProxy with three hooks:
- Director: rewrites the URL to the upstream and strips X-Forwarded-For.
- ModifyResponse: wraps `resp.Body` with a tee-style `captureBody` that fills a `bytes.Buffer` as the client reads, then on EOF/Close fires a one-shot callback that persists the captured bytes to SQLite in a goroutine, so cache-write latency never blocks the client.
- FlushInterval = 100ms: keeps SSE / chunked streams responsive.
This preserves status codes, headers, chunked encoding and trailers; the
cache stores only the decoded body and lets net/http re-encode on replay.
| mode | cache read | cache write | upstream call |
|---|---|---|---|
| record | yes | yes (on miss) | on miss |
| replay | yes | no | never (miss → 502) |
| passthrough | no | no | always |
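The mode matrix above can be encoded as a small decision function; `decision` and `decide` are illustrative names, not middly's API:

```go
package main

import "fmt"

// decision captures what a single request does in a given mode.
type decision struct {
	read, write, upstream bool
}

// decide encodes the mode table: record reads, and writes/forwards only on
// a miss; replay only reads (a miss returns 502, upstream is never called);
// passthrough always forwards and never touches the cache.
func decide(mode string, cacheHit bool) decision {
	switch mode {
	case "record":
		return decision{read: true, write: !cacheHit, upstream: !cacheHit}
	case "replay":
		return decision{read: true}
	default: // passthrough
		return decision{upstream: true}
	}
}

func main() {
	fmt.Println(decide("record", false)) // a miss in record mode writes and forwards
}
```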
/dashboard renders a small HTMX page that polls /dashboard/stats and
/dashboard/recent every second. No SPA framework, no build step — just a
single HTML template and HTMX from a CDN. Includes a clear cache button
(POST /dashboard/clear).
middly --port=8080 --mode=replay
```sh
go test ./...
```

Covers:
- `canonical/`: header order, JSON key order, query order, blacklist, namespace isolation, auth handling
- `proxy/`: record-then-replay, replay miss → 502, passthrough never caches, namespace isolation across upstreams
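The record-then-replay property can be checked stand-alone with an in-memory cache in front of a counting `httptest` upstream. middly's real tests exercise its proxy package; every name below is illustrative:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// recordThenReplay returns how many times the upstream was hit after the
// same request is made twice through a cache: the first call records, the
// second replays, so the answer should be exactly one.
func recordThenReplay() int {
	hits := 0
	upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		hits++
		fmt.Fprint(w, "cached response")
	}))
	defer upstream.Close()

	cache := map[string][]byte{}
	get := func(key string) []byte {
		if body, ok := cache[key]; ok {
			return body // replay: served from cache, no upstream call
		}
		resp, err := http.Get(upstream.URL)
		if err != nil {
			return nil
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		cache[key] = body // record on miss
		return body
	}

	get("GET /v1/x")
	get("GET /v1/x") // second call replays from the cache
	return hits
}

func main() {
	fmt.Println(recordThenReplay()) // upstream was hit once
}
```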
cmd/main.go CLI / wiring
canonical/ hashing
cache/ sqlite store
proxy/ reverse proxy + stats
dashboard/ HTMX UI + templates
