The serving layer for customer-facing analytics.
In plain terms: your product's dashboards and stats pages run queries against an expensive data warehouse (Snowflake, Databricks, BigQuery) on every page load. Offloader serves those same reads from cheap, pre-computed snapshots on your own servers instead — cutting warehouse cost and speeding up responses. New to the idea? Start with What Offloader is, in plain language.
Offloader is a self-hostable container that moves repeated product-facing analytical reads off Databricks, Snowflake, BigQuery, and similar warehouses by serving approved object-storage snapshots through governed REST endpoints — on infrastructure you already operate. There is no Offloader cloud, and your private data never leaves your environment. (An optional managed CDN edge can serve already-public data — public only, opt-in; see serving public data.)
Status: the V1 gateway is feature-complete and validated against real production data. The commercial offer is a paid diagnostic plus offload pilot, not a broad data platform.
New here → What Offloader is, in plain language (the words and the mental model, no jargon).
Then, by what you want to do:
- Try it — Quickstart: run it against a bundled example in ~15 minutes, no cloud needed.
- Define your endpoints — Config guide: what the offloader.yml + datasets/ / endpoints/ / keys/ files look like.
- Run it in production — Operator guide · Deployment.
- Security — Security model: what's protected, and what you own.
- Replace an existing serving API — Cutover runbook: the safe, gradual switch-over.
- Cost case — ROI diagnostic · Benchmarks.
Deeper / internal: Architecture · Product plan · V1 release plan · Operations plan · V1 task map.
Offloader is:
- A self-hostable serving container for bounded production analytics endpoints.
- A manifest-backed snapshot materializer and query runtime.
- A contract registry for REST APIs over approved serving datasets.
- A freshness, observability, and finance-grade ROI reporting layer.
- A two-port service: one API port for product traffic, one admin/metrics port for customer-owned observability and access controls.
Offloader is not:
- A warehouse replacement.
- A BI tool.
- A general SQL workspace.
- A streaming database.
- An ELT/data modeling tool.
- A hosted cloud service.
- A control plane, RBAC system, or SSO provider.
The engineer's-eye view (the plain-language version is in concepts):
- Serve named, versioned REST endpoints over approved Parquet/CSV snapshots, materialized into DuckDB and swapped in atomically.
- Read snapshots from the local filesystem OR remote object storage: s3://, gs://, https:// via DuckDB httpfs, with S3/GCS-HMAC or GCS-OAuth-bearer credentials from env (never a request).
- Boot fully stateless: point OFFLOADER_CONFIG at a gs://… prefix and the whole project (datasets, endpoints, keys) is fetched from the bucket at startup, so config and data live in the same place and nothing is mounted. With OFFLOADER_CONFIG_SYNC_INTERVAL it also hot-reloads bucket changes with zero downtime, blue-green even across a schema change.
- Follow a producer that publishes on its own schedule: a Databricks commit-protocol resolver discovers the latest _committed_<tid> in GCS and refreshes per-dataset, isolated so one slow/broken source never blocks the rest; warm-start serves the on-disk snapshot instantly on restart.
- Enforce API keys, endpoint allowlists, compiler-inserted tenant filters, and column allowlists, or run fully public (auth: none, accepted only when no endpoint is tenant-scoped).
- Serve nested STRUCT/MAP/LIST columns as native JSON, and the same query ergonomics: combinations, per-param value aliases, applied defaults, and an allowlist-bounded ?columns= subset.
- Scale: a DuckDB read-connection pool + per-request serving in the caller process (~5–6k req/s cached, p99 < 60ms on 50KB nested payloads; validated at 66 datasets / 67 endpoints against a real GCS bucket; see docs/benchmarks.md).
- Expose generated docs/OpenAPI + a client /schema, Prometheus metrics (pool, refresh, per-endpoint latency), and redacted diagnostics on a separate admin port.
- Preserve the previous good snapshot on refresh failure (rollback), and ship a signed container image via CI on every version tag.
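
The "swapped in atomically" and "preserve the previous good snapshot on refresh failure" behaviours above can be pictured with a small sketch. This is not Offloader's implementation (the gateway is Elixir/Phoenix over DuckDB); it is a minimal Python illustration under stated assumptions: `SnapshotSlot`, `refresh`, and the loader functions are hypothetical names, and a plain dict stands in for a materialized snapshot.

```python
import threading
from typing import Callable, Optional


class SnapshotSlot:
    """Holds the snapshot currently being served (hypothetical name)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current: Optional[dict] = None
        self.last_error: Optional[str] = None

    def current(self) -> Optional[dict]:
        # Readers only ever see a fully built snapshot.
        with self._lock:
            return self._current

    def refresh(self, load: Callable[[], dict]) -> bool:
        """Build the next snapshot off to the side, then swap it in.

        If loading fails, the previous snapshot stays in place.
        """
        try:
            candidate = load()  # slow work happens outside the lock
        except Exception as exc:
            self.last_error = str(exc)
            return False
        with self._lock:
            self._current = candidate  # a single reference swap
            self.last_error = None
        return True


def load_v1() -> dict:
    # Stand-in for materializing a snapshot from Parquet/CSV.
    return {"version": 1, "rows": [{"day": "2024-01-01", "orders": 10}]}


def load_broken() -> dict:
    # Stand-in for a source that is unreachable during refresh.
    raise IOError("source unavailable")


slot = SnapshotSlot()
ok_first = slot.refresh(load_v1)       # first snapshot becomes current
ok_second = slot.refresh(load_broken)  # failed refresh leaves it untouched
served_version = slot.current()["version"]
```

The point of the design is that readers never observe a half-loaded snapshot: the candidate is built completely before the swap, and a failed build changes nothing except the recorded error.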
Offloader was built, and proven, to take over production serving from a real warehouse-backed serving API:
offloader import-schema converts a serving_schema.json into a whole project, and
offloader shadow-diff gates the cutover on proven response parity against the live
system. See the cutover runbook for the shadow → canary → cutover playbook.
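
To make "response parity" concrete, here is a minimal Python sketch of comparing two JSON responses for the same request, one from the live system and one from the candidate. It is an illustration only and does not describe how offloader shadow-diff works internally; `diff_paths` and the sample payloads are hypothetical, and values are compared with Python's `==`.

```python
from typing import Any, List


def diff_paths(live: Any, shadow: Any, path: str = "$") -> List[str]:
    """Return the JSON paths at which two decoded responses differ.

    Object key order is ignored; array order and length are significant.
    """
    if isinstance(live, dict) and isinstance(shadow, dict):
        out: List[str] = []
        for key in sorted(set(live) | set(shadow)):
            if key not in live or key not in shadow:
                out.append(f"{path}.{key}")  # key present on one side only
            else:
                out.extend(diff_paths(live[key], shadow[key], f"{path}.{key}"))
        return out
    if isinstance(live, list) and isinstance(shadow, list):
        if len(live) != len(shadow):
            return [path]
        out = []
        for i, (a, b) in enumerate(zip(live, shadow)):
            out.extend(diff_paths(a, b, f"{path}[{i}]"))
        return out
    return [] if live == shadow else [path]


# Hypothetical responses for the same request.
live = {"total": 3, "rows": [{"id": 1, "n": 2}, {"id": 2, "n": 1}]}
shadow_ok = {"rows": [{"n": 2, "id": 1}, {"n": 1, "id": 2}], "total": 3}
shadow_bad = {"total": 3, "rows": [{"id": 1, "n": 2}, {"id": 2, "n": 5}]}

parity = diff_paths(live, shadow_ok)     # same content, different key order
mismatch = diff_paths(live, shadow_bad)  # one value differs
```

A cutover gate built on this idea would replay a sample of real requests against both systems and proceed only when every comparison comes back empty.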
The primary product surface is the container plus environment variables. A standard deployment needs only:
docker run \
-e OFFLOADER_CONFIG=/etc/offloader/offloader.yml \
-e OFFLOADER_CACHE_DIR=/var/lib/offloader/cache \
-e OFFLOADER_SECRET_KEY_BASE="$(openssl rand -base64 48)" \
-v ./offloader.yml:/etc/offloader/offloader.yml:ro \
-v offloader-cache:/var/lib/offloader/cache \
-p 4000:4000 \
-p 127.0.0.1:4001:4001 \
ghcr.io/andrewdryga/offloader:<version>

The full env-var reference is in the config guide.
gateway/ Elixir/Phoenix self-hostable container: REST APIs, auth,
tenant enforcement, env-driven config, manifest refresh,
DuckDB materialization, admin/metrics port
tools/ Optional helper CLI: config/manifest validation, serving-schema
import (import-schema), cutover response-diff (shadow-diff),
diagnostics, endpoint tests, support bundles
deploy/ Container deployment notes and examples; no managed cloud scaffold
docs/ Product, architecture, security, operations, and release docs
examples/ Local demo manifests, endpoint configs, and sample datasets
dev/ Local verification, benchmark, and deployment-check scripts
legal/ Contracting templates (diagnostic + pilot SOW) — fill per deal
The release is gated against a citeable V1 gate checklist (technical + commercial proof gates). The development gate is:
make check # format, warnings-as-errors, tests
make e2e # manifest -> HTTP smoke
make deploy-check # build the prod image, boot it, verify both ports

Offloader is source-available under the Functional Source License 1.1 (FSL-1.1-ALv2). In plain terms: you may read, run, modify, and self-host it for any purpose except offering it to others as a competing product or hosted service, and each release automatically becomes Apache-2.0 two years after it ships. So you self-host freely with full source and no lock-in today, and it becomes fully open-source on a clock.