Offloader

The serving layer for customer-facing analytics.

In plain terms: your product's dashboards and stats pages run queries against an expensive data warehouse (Snowflake, Databricks, BigQuery) on every page load. Offloader serves those same reads from cheap, pre-computed snapshots on your own servers instead — cutting warehouse cost and speeding up responses. New to the idea? Start with What Offloader is, in plain language.

Offloader is a self-hostable container that moves repeated product-facing analytical reads off Databricks, Snowflake, BigQuery, and similar warehouses by serving approved object-storage snapshots through governed REST endpoints — on infrastructure you already operate. There is no Offloader cloud, and your private data never leaves your environment. (An optional managed CDN edge can serve already-public data — public only, opt-in; see serving public data.)

Status: the V1 gateway is feature-complete and validated against real production data. The commercial offer is a paid diagnostic plus offload pilot, not a broad data platform.

Documentation

New here → What Offloader is, in plain language (the words and the mental model, no jargon).

Then, by what you want to do:

Try it — Quickstart: run it against a bundled example in ~15 minutes, no cloud needed.
Define your endpoints — Config guide: what the offloader.yml file and the datasets/, endpoints/, and keys/ directories look like.
Run it in production — Operator guide · Deployment.
Security — Security model: what's protected, and what you own.
Replace an existing serving API — Cutover runbook: the safe, gradual switch-over.
Cost case — ROI diagnostic · Benchmarks.

Deeper / internal: Architecture · Product plan · V1 release plan · Operations plan · V1 task map.

Product boundary

Offloader is:

A self-hostable serving container for bounded production analytics endpoints.
A manifest-backed snapshot materializer and query runtime.
A contract registry for REST APIs over approved serving datasets.
A freshness, observability, and finance-grade ROI reporting layer.
A two-port service: one API port for product traffic, one admin/metrics port for customer-owned observability and access controls.

Offloader is not:

A warehouse replacement.
A BI tool.
A general SQL workspace.
A streaming database.
An ELT/data modeling tool.
A hosted cloud service.
A control plane, RBAC system, or SSO provider.

What it does

The engineer's-eye view (the plain-language version is in concepts):

Serve named, versioned REST endpoints over approved Parquet/CSV snapshots, materialized into DuckDB and swapped in atomically.
Read snapshots from the local filesystem OR remote object storage — s3://, gs://, https:// via DuckDB httpfs, with S3/GCS-HMAC or GCS-OAuth-bearer credentials from env (never a request).
Boot fully stateless: point OFFLOADER_CONFIG at a gs://… prefix and the whole project (datasets, endpoints, keys) is fetched from the bucket at startup — config and data in the same place, nothing mounted. With OFFLOADER_CONFIG_SYNC_INTERVAL it also hot-reloads bucket changes with zero downtime, blue-green even across a schema change.
Follow a producer that publishes on its own schedule: a Databricks commit-protocol resolver discovers the latest _committed_<tid> in GCS and refreshes per-dataset, isolated so one slow/broken source never blocks the rest; warm-start serves the on-disk snapshot instantly on restart.
Enforce API keys, endpoint allowlists, compiler-inserted tenant filters, and column allowlists — or run fully public (auth: none, accepted only when no endpoint is tenant-scoped).
Serve nested STRUCT/MAP/LIST columns as native JSON, and the same query ergonomics: combinations, per-param value aliases, applied defaults, and an allowlist-bounded ?columns= subset.
Scale: a DuckDB read-connection pool + per-request serving in the caller process (~5–6k req/s cached, p99 < 60ms on 50KB nested payloads; validated at 66 datasets / 67 endpoints against a real GCS bucket — see docs/benchmarks.md).
Expose generated docs/OpenAPI + a client /schema, Prometheus metrics (pool, refresh, per-endpoint latency), and redacted diagnostics on a separate admin port.
Preserve the previous good snapshot on refresh failure (rollback), and ship a signed container image via CI on every version tag.
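The atomic swap and rollback behaviour in the list above can be pictured with a small sketch. This is not the gateway's code (the gateway is Elixir and materializes snapshots into DuckDB); it is a minimal Python illustration of the pattern, assuming a snapshot is just a dictionary. The class name SnapshotHolder, the load callback, and the "must contain rows" validation rule are all hypothetical.

```python
import threading
from typing import Callable, Optional


class SnapshotHolder:
    """Holds the snapshot being served and replaces it only on success."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current: Optional[dict] = None
        self.last_error: Optional[str] = None

    def current(self) -> Optional[dict]:
        # Readers always see whichever snapshot was last swapped in.
        with self._lock:
            return self._current

    def refresh(self, load: Callable[[], dict]) -> bool:
        """Build a candidate off to the side; swap only if it loads cleanly."""
        try:
            candidate = load()  # may raise on a broken or missing source
            # Placeholder validation (an assumption of this sketch).
            if not candidate.get("rows"):
                raise ValueError("candidate snapshot has no rows")
        except Exception as exc:
            # Refresh failed: keep serving the previous good snapshot.
            self.last_error = str(exc)
            return False
        with self._lock:
            self._current = candidate  # single reference swap
        self.last_error = None
        return True


def broken_source() -> dict:
    raise IOError("source unavailable")


holder = SnapshotHolder()
holder.refresh(lambda: {"version": 1, "rows": [{"tenant": "a", "n": 3}]})
holder.refresh(broken_source)  # fails, so version 1 stays in place
print(holder.current()["version"], holder.last_error)
```

The point of building the candidate before taking the lock is that readers are never blocked by a slow or failing refresh; they only ever observe a complete snapshot.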

Replacing an existing serving API

Offloader was built to replace a real warehouse-backed serving API in production, and has been proven in that role. offloader import-schema converts a serving_schema.json into a whole project, and offloader shadow-diff gates the cutover on proven response parity against the live system. See the cutover runbook for the shadow → canary → cutover playbook.
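To make "response parity" concrete, here is a minimal sketch of a structural comparison between a live response and a candidate response. It does not reproduce how offloader shadow-diff works internally; the function name diff_json, the path notation, and the strict type comparison are assumptions made for illustration.

```python
import json
from typing import Any, List


def diff_json(expected: Any, actual: Any, path: str = "$") -> List[str]:
    """Return human-readable differences between two decoded JSON values."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs: List[str] = []
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}"
            if key not in actual:
                diffs.append(f"{child}: missing in candidate")
            elif key not in expected:
                diffs.append(f"{child}: unexpected in candidate")
            else:
                diffs.extend(diff_json(expected[key], actual[key], child))
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [f"{path}: length {len(expected)} != {len(actual)}"]
        diffs = []
        for i, (e, a) in enumerate(zip(expected, actual)):
            diffs.extend(diff_json(e, a, f"{path}[{i}]"))
        return diffs
    # Compare types as well as values so that 1 and true do not count as equal.
    if type(expected) is not type(actual) or expected != actual:
        return [f"{path}: {expected!r} != {actual!r}"]
    return []


live = json.loads('{"total": 42, "rows": [{"day": "2024-01-01", "n": 7}]}')
candidate = json.loads('{"total": 42, "rows": [{"day": "2024-01-01", "n": 8}]}')
print(diff_json(live, candidate))  # one difference, at $.rows[0].n
```

An empty list means the two payloads match exactly. A real cutover gate would likely also need rules for acceptable differences such as row ordering or float formatting, which this sketch deliberately leaves out.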

Runtime configuration

The primary product surface is the container plus environment variables. A standard deployment needs only:

docker run \
  -e OFFLOADER_CONFIG=/etc/offloader/offloader.yml \
  -e OFFLOADER_CACHE_DIR=/var/lib/offloader/cache \
  -e OFFLOADER_SECRET_KEY_BASE="$(openssl rand -base64 48)" \
  -v "$(pwd)/offloader.yml":/etc/offloader/offloader.yml:ro \
  -v offloader-cache:/var/lib/offloader/cache \
  -p 4000:4000 \
  -p 127.0.0.1:4001:4001 \
  ghcr.io/andrewdryga/offloader:<version>

The full env-var reference is in the config guide.
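A deploy script might check the environment before starting the container. The sketch below uses only the three variable names that appear in the docker run command above; treating all three as required, and the helper name missing_vars, are assumptions of this example and not part of Offloader or its CLI.

```python
from typing import List, Mapping

# Names taken from the docker run command; treating all three as required
# is an assumption of this sketch.
REQUIRED = (
    "OFFLOADER_CONFIG",
    "OFFLOADER_CACHE_DIR",
    "OFFLOADER_SECRET_KEY_BASE",
)


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


# Example with an explicit mapping; pass os.environ to check the real one.
example_env = {"OFFLOADER_CONFIG": "/etc/offloader/offloader.yml"}
print(missing_vars(example_env))
```

Taking the environment as a parameter keeps the check easy to test and lets the same function validate a .env file that has been parsed into a dictionary.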

Repository layout

gateway/          Elixir/Phoenix self-hostable container: REST APIs, auth,
                  tenant enforcement, env-driven config, manifest refresh,
                  DuckDB materialization, admin/metrics port
tools/            Optional helper CLI: config/manifest validation, serving-schema
                  import (import-schema), cutover response-diff (shadow-diff),
                  diagnostics, endpoint tests, support bundles
deploy/           Container deployment notes and examples; no managed cloud scaffold
docs/             Product, architecture, security, operations, and release docs
examples/         Local demo manifests, endpoint configs, and sample datasets
dev/              Local verification, benchmark, and deployment-check scripts
legal/            Contracting templates (diagnostic + pilot SOW) — fill per deal

Project status (for contributors)

The release is gated against a citeable V1 gate checklist (technical + commercial proof gates). The development gate is:

make check        # format, warnings-as-errors, tests
make e2e          # manifest -> HTTP smoke
make deploy-check # build the prod image, boot it, verify both ports

License

Offloader is source-available under the Functional Source License 1.1 (FSL-1.1-ALv2). In plain terms: you may read, run, modify, and self-host it for any purpose except offering it to others as a competing product or hosted service — and each release automatically becomes Apache-2.0 two years after it ships. So you self-host freely with full source and no lock-in today, and it becomes fully open-source on a clock.
