Skip to content

404mqs/sheet-data-toolkit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sheet-data-toolkit

Enrich and generate datasets directly inside a Google Sheet, backed by any data warehouse.

Why

Analysts and ops teams live in spreadsheets, not query editors. The gap between "I have a list of ids and want the rest of the row" (or "I want a filtered list of records") and "someone runs a query for me and pastes the result back" is pure friction — and it repeats every day. This toolkit closes that gap: a menu and a couple of sidebars turn a Google Sheet into a front-end for your warehouse, without giving every analyst raw query access or building a bespoke internal tool from scratch.

It's shipped as a documented skeleton: real, runnable reference code against an example schema, plus the docs to port it to your own backend. The value is in the two engineering-heavy pieces most teams get wrong the first time — safely streaming large results back into a browser-side script, and deciding when a result is too big for a spreadsheet at all — solved once and documented in depth in docs/engineering-notes.md.

Architecture

 ┌────────────┐   menu / sidebar    ┌──────────────┐
 │ Google      │ ───────────────────▶│  Apps Script │
 │ Sheet       │                     │  (Code.gs)   │
 │ (UI)        │◀─────────────────── │              │
 └────────────┘   values + colors    └──────┬───────┘
                                             │ submit(mode, params)
                                             │ poll(run_id)
                                             │ fetch_result(run_id, output_ref)
                                             ▼
                                     ┌──────────────┐
                                     │ DataBackend  │   <-- the seam
                                     │ (interface)  │
                                     └──────┬───────┘
                                             │ implemented by
                                             ▼
                                     ┌──────────────┐
                                     │ Adapter      │
                                     │ (Databricks, │
                                     │  Snowflake…) │
                                     └──────┬───────┘
                                             │ runs a notebook / query
                                             ▼
                                     ┌──────────────┐
                                     │ Query engine │
                                     │ (Spark, SQL) │
                                     └──────┬───────┘
                                             │ reads/writes
                                             ▼
                                     ┌──────────────┐
                                     │ Storage      │
                                     │ (table +     │
                                     │  result file)│
                                     └──────────────┘

Full walkthrough, hop by hop: docs/architecture.md.

The three modes

Mode What you do What you get
Complete Paste a column of ids into a sheet Each id's row filled in with a fixed field set (name, status, region, tier, created_at); unmatched ids flagged and highlighted
Generate Pick filters in a sidebar (status, region, tier, activity, created-at range) A generated list of matching records, written to a sheet tab, an .xlsx, or a CSV — chosen automatically if the result is too large for the requested format
Expand Pick metrics + a number of months, on top of an already-Completed sheet Extra columns appended per metric per month (e.g. activity_metric_2026-06-01)

Screenshot placeholder: a short GIF of pasting ids and running Complete, and of the Generate sidebar with filters checked, belongs here once this is deployed against a real sheet.

Every mode operates against one example table, entities — see schema/example-schema.md.

Quick start

  1. Copy the config template.

    cp apps-script/config.example.gs apps-script/config.gs
    

    Fill in backendHost, backendToken, clusterId. config.gs is git-ignored — it holds your API token.

  2. Deploy the Apps Script project. Open your target Google Sheet → Extensions → Apps Script, and add Code.gs, config.gs, Sidebar.html, SidebarExpand.html, and appsscript.json. Reload the sheet; a "Sheet Data Toolkit" menu should appear.

  3. Create the entities table (or point at your own equivalent) in your warehouse — see schema/example-schema.md.

  4. Port the three notebooks under backend/notebooks/ into your workspace, and point an adapter (start from backend/adapters/databricks.py) at your platform's job/query API.

  5. Create the sheet tabs referenced in config.gs's SHEET_NAMES (Complete #1/#2, Generate #1/#2 by default).

Full step-by-step, including a Snowflake/BigQuery adapter sketch: docs/implementing-your-backend.md.

Repo layout

sheet-data-toolkit/
├── apps-script/
│   ├── Code.gs                    # menu, orchestration, streaming reader, batch writer
│   ├── config.example.gs          # copy to config.gs and fill in your backend
│   ├── appsscript.json            # Apps Script manifest
│   ├── Sidebar.html               # Generate mode UI
│   └── SidebarExpand.html         # Expand mode UI
├── backend/
│   ├── interface.py                # DataBackend contract + destination cap logic
│   ├── test_interface.py
│   ├── adapters/
│   │   ├── databricks.py           # reference adapter (Jobs API + DBFS)
│   │   └── README.md               # how to write your own adapter
│   └── notebooks/
│       ├── complete_records.py     # Complete mode
│       ├── generate_list.py        # Generate mode
│       └── expand_metrics.py       # Expand mode
├── schema/
│   └── example-schema.md           # the `entities` table every mode references
├── examples/
│   └── field-catalog.example.json  # filter <-> field <-> widget data contract
└── docs/
    ├── architecture.md
    ├── request-response-contract.md
    ├── implementing-your-backend.md
    └── engineering-notes.md

Docs

  • docs/architecture.md — the full flow, hop by hop, and where the DataBackend seam sits.
  • docs/request-response-contract.md — the exact request/response JSON for each mode.
  • docs/implementing-your-backend.md — step-by-step guide to porting this to your own stack.
  • docs/engineering-notes.md — the five patterns worth understanding before you edit Code.gs or the notebooks: the streaming UTF-8-safe reader, the batch write with found/not-found reorder, the dynamic filter sidebar, the per-destination cap with auto-fallback to CSV, and the distributed export without collect().

Implementing for your stack

The only backend-specific file is the adapter. If you're not on Databricks, start at docs/implementing-your-backend.md and backend/adapters/README.md — both include sketches for Snowflake and BigQuery adapters that satisfy the same submit/poll/fetch_result contract.

Provenance

This toolkit is generalized from an internal production tool built at a company for its data analysts, with every company-specific detail — hostnames, table names, credentials, business domain — removed or replaced with the generic entities example used throughout this repo. The architecture, the three modes, and the five engineering patterns in docs/engineering-notes.md are the same ones that ran in production; only the identifying details changed.

License

Apache-2.0 — see LICENSE.

About

Enrich and generate datasets directly inside a Google Sheet, backed by any data warehouse.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors