sheet-data-toolkit

Enrich and generate datasets directly inside a Google Sheet, backed by any data warehouse.

Why

Analysts and ops teams live in spreadsheets, not query editors. The gap between "I have a list of ids and want the rest of the row" (or "I want a filtered list of records") and "someone runs a query for me and pastes the result back" is pure friction — and it repeats every day. This toolkit closes that gap: a menu and a couple of sidebars turn a Google Sheet into a front-end for your warehouse, without giving every analyst raw query access or building a bespoke internal tool from scratch.

It's shipped as a documented skeleton: real, runnable reference code against an example schema, plus the docs to port it to your own backend. The value is in the two engineering-heavy pieces most teams get wrong the first time — safely streaming large results back into a browser-side script, and deciding when a result is too big for a spreadsheet at all — solved once and documented in depth in docs/engineering-notes.md.

Architecture

 ┌────────────┐   menu / sidebar    ┌──────────────┐
 │ Google      │ ───────────────────▶│  Apps Script │
 │ Sheet       │                     │  (Code.gs)   │
 │ (UI)        │◀─────────────────── │              │
 └────────────┘   values + colors    └──────┬───────┘
                                             │ submit(mode, params)
                                             │ poll(run_id)
                                             │ fetch_result(run_id, output_ref)
                                             ▼
                                     ┌──────────────┐
                                     │ DataBackend  │   <-- the seam
                                     │ (interface)  │
                                     └──────┬───────┘
                                             │ implemented by
                                             ▼
                                     ┌──────────────┐
                                     │ Adapter      │
                                     │ (Databricks, │
                                     │  Snowflake…) │
                                     └──────┬───────┘
                                             │ runs a notebook / query
                                             ▼
                                     ┌──────────────┐
                                     │ Query engine │
                                     │ (Spark, SQL) │
                                     └──────┬───────┘
                                             │ reads/writes
                                             ▼
                                     ┌──────────────┐
                                     │ Storage      │
                                     │ (table +     │
                                     │  result file)│
                                     └──────────────┘

Full walkthrough, hop by hop: docs/architecture.md.

The three modes

Mode	What you do	What you get
Complete	Paste a column of ids into a sheet	Each id's row filled in with a fixed field set (`name, status, region, tier, created_at`); unmatched ids flagged and highlighted
Generate	Pick filters in a sidebar (status, region, tier, activity, created-at range)	A generated list of matching records, written to a sheet tab, an `.xlsx`, or a CSV — chosen automatically if the result is too large for the requested format
Expand	Pick metrics + a number of months, on top of an already-Completed sheet	Extra columns appended per metric per month (e.g. `activity_metric_2026-06-01`)

Screenshot placeholder: a short GIF of pasting ids and running Complete, and of the Generate sidebar with filters checked, belongs here once this is deployed against a real sheet.

Every mode operates against one example table, entities — see schema/example-schema.md.

Quick start

Copy the config template.
```
cp apps-script/config.example.gs apps-script/config.gs
```
Fill in backendHost, backendToken, clusterId. config.gs is git-ignored — it holds your API token.
Deploy the Apps Script project. Open your target Google Sheet → Extensions → Apps Script, and add Code.gs, config.gs, Sidebar.html, SidebarExpand.html, and appsscript.json. Reload the sheet; a "Sheet Data Toolkit" menu should appear.
Create the entities table (or point at your own equivalent) in your warehouse — see schema/example-schema.md.
Port the three notebooks under backend/notebooks/ into your workspace, and point an adapter (start from backend/adapters/databricks.py) at your platform's job/query API.
Create the sheet tabs referenced in config.gs's SHEET_NAMES (Complete #1/#2, Generate #1/#2 by default).

Full step-by-step, including a Snowflake/BigQuery adapter sketch: docs/implementing-your-backend.md.

Repo layout

sheet-data-toolkit/
├── apps-script/
│   ├── Code.gs                    # menu, orchestration, streaming reader, batch writer
│   ├── config.example.gs          # copy to config.gs and fill in your backend
│   ├── appsscript.json            # Apps Script manifest
│   ├── Sidebar.html               # Generate mode UI
│   └── SidebarExpand.html         # Expand mode UI
├── backend/
│   ├── interface.py                # DataBackend contract + destination cap logic
│   ├── test_interface.py
│   ├── adapters/
│   │   ├── databricks.py           # reference adapter (Jobs API + DBFS)
│   │   └── README.md               # how to write your own adapter
│   └── notebooks/
│       ├── complete_records.py     # Complete mode
│       ├── generate_list.py        # Generate mode
│       └── expand_metrics.py       # Expand mode
├── schema/
│   └── example-schema.md           # the `entities` table every mode references
├── examples/
│   └── field-catalog.example.json  # filter <-> field <-> widget data contract
└── docs/
    ├── architecture.md
    ├── request-response-contract.md
    ├── implementing-your-backend.md
    └── engineering-notes.md

Docs

docs/architecture.md — the full flow, hop by hop, and where the DataBackend seam sits.
docs/request-response-contract.md — the exact request/response JSON for each mode.
docs/implementing-your-backend.md — step-by-step guide to porting this to your own stack.
docs/engineering-notes.md — the five patterns worth understanding before you edit Code.gs or the notebooks: the streaming UTF-8-safe reader, the batch write with found/not-found reorder, the dynamic filter sidebar, the per-destination cap with auto-fallback to CSV, and the distributed export without collect().

Implementing for your stack

The only backend-specific file is the adapter. If you're not on Databricks, start at docs/implementing-your-backend.md and backend/adapters/README.md — both include sketches for Snowflake and BigQuery adapters that satisfy the same submit/poll/fetch_result contract.

Provenance

This toolkit is generalized from an internal production tool built at a company for its data analysts, with every company-specific detail — hostnames, table names, credentials, business domain — removed or replaced with the generic entities example used throughout this repo. The architecture, the three modes, and the five engineering patterns in docs/engineering-notes.md are the same ones that ran in production; only the identifying details changed.

License

Apache-2.0 — see LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

sheet-data-toolkit

Why

Architecture

The three modes

Quick start

Repo layout

Docs

Implementing for your stack

Provenance

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
apps-script		apps-script
backend		backend
docs		docs
examples		examples
schema		schema
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

sheet-data-toolkit

Why

Architecture

The three modes

Quick start

Repo layout

Docs

Implementing for your stack

Provenance

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages