v0.1.3

db-tycoon-stephen released this 28 Apr 22:56
· 90 commits to main since this release
5bbfe33

Tycoon v0.1.3

Released: 2026-04-28

Headline

Five of the seven planned v0.1.3 themes shipped; the two XL items
(tycoon data sync cloud↔local snapshots, and one-command
MotherDuck + Nao + LM Studio setup) are deferred to v0.1.4.

  • Templates are now parameterized. tycoon init --template X --param name=value substitutes runtime values into
    tycoon.yml and source configs at scaffold time — so
    github-analytics and weather-station are actually runnable
    pipelines, not init-only demos.
  • Observability v2 landed. Every ingest captures dlt's
    trace.pickle (byte sizes + per-step durations + per-job
    detail); every dbt invocation snapshots manifest.json and
    diffs against the previous run (SQL-hash changes, added /
    removed / retyped columns, model adds/removes). All surfaces in
    tycoon data history show <id> and the existing Rill dashboards.
  • csv-import is buildable end-to-end on fresh install: the
    template now ships dbt_project/ + a sample CSV, and CI runs
    the full init → ingest → transform pipeline on every PR via a
    new offline_e2e marker.
  • Warehouse alignment now covers Snowflake, BigQuery, and
    Redshift, not just DuckDB and MotherDuck.

What changed

Added

See the detailed "Planned scope" sections below. Quick index:

  • Template parameterization (--param name=value) — §1
  • csv-import template ships a buildable dbt project + offline e2e
    through dbt run — §3
  • dlt trace enrichment (byte sizes, per-step durations, per-job
    detail) — §4
  • dbt manifest schema-diff (SQL hash + column changes across
    invocations) — §5
  • Snowflake / BigQuery / Redshift dbt-profile warehouse alignment
    in tycoon register dbt — §2
  • Nao context surface for coding agents: auto-generated
    AGENTS.md pointer + tycoon ask context cat-command —
    see "Nao context surface" section below.

Changed

  • tycoon data history show <load_id> now renders pipeline
    duration, total bytes written, and a per-step breakdown when a
    dlt trace is captured. The per-table view gains a Bytes column.
  • tycoon data history show <invocation_id> (dbt) now appends a
    "Schema changes vs. previous run" table when a manifest diff
    recorded changes.
  • _extract_dbt_duckdb_path is retained as a thin shim over the
    new structured _extract_dbt_warehouse_target. Existing
    callers unchanged.
  • Dependency bumps. dlt[duckdb] 1.25.0 → 1.26.0,
    dagster 1.13.0 → 1.13.2 (and the three sibling packages
    dagster-webserver, dagster-dbt 0.29.0 → 0.29.2,
    dagster-dlt 0.29.0 → 0.29.2), nao-core 0.1.7 → 0.1.8,
    typer 0.24.1 → 0.25.0, uvicorn 0.44.0 → 0.46.0,
    pydantic 2.13.2 → 2.13.3, fastapi 0.136.0 → 0.136.1.
    All 328 tests still pass after the bumps; nothing in tycoon's
    call surface required code changes.
  • Rill 0.86 docstring refresh. Rill 0.86 was released the
    same day as this version. The rill_generator.py module
    docstring now references 0.86, but the generator's output
    (Parquet bridge via local_file connector) is unchanged.
    Rill 0.86's "DuckLake live connector" was probed for v0.1.3
    scope but deferred — SQLite-backed DuckLake catalogs hold an
    exclusive OS-level lock while attached, breaking the
    "Rill running while pipelines write" workflow that the
    Parquet bridge supports today. Migration is on the v0.1.4
    candidate list pending either Rill shared-lock support or a
    Postgres-catalog story.

Fixed

  • generate_rill_config no longer accepts an unused
    warehouse_db_path param (carried-forward known issue from
    v0.1.2). The function only ever introspected raw_db_path;
    the warehouse path was a leftover from an earlier draft. All
    10 call sites in tests/test_explore.py plus the single
    production caller in src/tycoon/commands/explore.py are
    updated. Tightens the public surface.
  • Type diagnostics in src/tycoon/ingestion/runner.py cleared
    (carried-forward known issue from v0.1.2). The
    rest_api_source call now casts its config dict to dlt's
    exported RESTAPIConfig typed-dict type; the env-var warning
    loop pulls the matched ${VAR} string from
    _check_unexpanded_env_vars directly instead of re-running the
    regex with a suppressed # type: ignore[union-attr].
    _check_unexpanded_env_vars now returns list[tuple[str, str]]
    (key + matched-var pairs) — internal-only helper; no public-API
    impact.

Upgrade notes

pip install -U database-tycoon

No breaking changes. Four things are worth knowing:

  1. New observability tables. .tycoon/metadata.duckdb gains
    five new tables (dlt_trace_runs, dlt_trace_steps,
    dlt_trace_jobs, dbt_manifest_snapshots,
    dbt_schema_changes). Existing projects pick these up
    automatically on the next ingest / dbt run; nothing to
    migrate. Delete .tycoon/metadata.duckdb if you want to start
    fresh — it's still fully disposable.
  2. Template {owner} / {repo} placeholders are now
    {{ owner }} / {{ repo }}. Anyone with an in-tree fork of
    github-analytics or weather-station should update their
    placeholder syntax; dbt Jinja ({{ ref(...) }}) still passes
    through untouched.
  3. Cloud-adapter alignment is new behavior. Running tycoon register dbt against a Snowflake, BigQuery, or Redshift dbt
    project will now prompt to update stack.warehouse. Decline
    the prompt to keep the old behavior.
  4. tycoon ask init / ask sync now write an AGENTS.md
    at the project root. In an existing project, the first
    re-run of either command creates a new AGENTS.md,
    pointing coding agents at .tycoon/nao/databases/**,
    .tycoon/nao/repos/dbt/, and .tycoon/nao/RULES.md. Commit it
    — it's a static pointer file that stays useful in fresh
    clones. If you already have a hand-authored AGENTS.md,
    tycoon detects the missing <!-- @generated by tycoon ask -->
    sentinel and leaves your file alone with a warning.

Known limitations (carried forward from v0.1.2)

  • tycoon data query is DuckDB/MotherDuck-only. Snowflake and
    BigQuery warehouses can be registered and aligned, but the
    query command doesn't dispatch to them yet. Coming with the
    deferred tycoon data sync work (§7).
  • External dlt pipelines can't be run via tycoon data sources run; only tycoon-managed dlt sources can. Out of v0.1.3 scope.

Known issues (carried from v0.1.2 — still open)

  • None remain open: all three carried-forward known issues were
    resolved during the v0.1.3 cycle. See "Fixed" above.

What's next (v0.1.4 candidates)

Both deferred v0.1.3 themes carry forward.

  • tycoon data sync: cloud ↔ local DuckDB snapshots (issue #12).
    Snapshot a MotherDuck warehouse to a local DuckDB file (for
    offline analysis, AI training, air-gapped demos) and upload the
    other direction. Uses DuckDB's native ATTACH 'md:...' + COPY FROM; tycoon handles auth + path resolution + naming.
  • One-command MotherDuck + Nao + LM Studio setup (issue #7).
    A single command that walks through MotherDuck auth (OAuth or
    token), Nao init + schema sync, and LM Studio detection +
    wiring. Today each of these is a separate 2–5 step process and
    ordering matters.

Scope detail

The themes targeted for v0.1.3, with the five that landed marked
✅ and the two deferred items kept here for context.

1. Template parameterization ✅ landed

Templates can now declare parameters that get substituted at tycoon init time, turning placeholder-laden templates into concrete working
projects on the first run.

Format. Each template gains an optional template.yml metadata
file next to its tycoon.yml:

parameters:
  - name: owner
    description: GitHub username or organization
    example: octocat
    required: true
  - name: repo
    description: Repository name
    example: hello-world
    required: true

CLI. A new repeatable --param name=value flag on tycoon init:

tycoon init --template github-analytics \
  --param owner=acme --param repo=widgets

Missing required parameters are prompted for interactively. The
template.yml metadata itself never lands in the scaffolded project.
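
To make the flag handling concrete, here is a minimal sketch of how repeated name=value strings might be parsed and checked against a template's declared parameters. The function names (parse_params, missing_required) are hypothetical and not tycoon's internals; the declared list simply mirrors the template.yml shape shown above.

```python
def parse_params(raw: list[str]) -> dict[str, str]:
    """Turn repeated ``name=value`` strings into a dict; later values win."""
    params: dict[str, str] = {}
    for item in raw:
        # partition splits on the first "=", so values may contain "=".
        name, sep, value = item.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"expected name=value, got {item!r}")
        params[name.strip()] = value
    return params


def missing_required(declared: list[dict], given: dict[str, str]) -> list[str]:
    """Names of required parameters that were not supplied on the CLI."""
    return [
        p["name"]
        for p in declared
        if p.get("required") and p["name"] not in given
    ]


# Hypothetical stand-in for the parsed ``parameters:`` list of a template.yml.
declared = [
    {"name": "owner", "required": True},
    {"name": "repo", "required": True},
]
given = parse_params(["owner=acme"])
print(missing_required(declared, given))  # the names an interactive prompt would ask for
```

Anything returned by missing_required would be the set of names to prompt for interactively.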

Substitution. {{ name }} placeholders (with or without
whitespace) in .yml, .yaml, .sql, .md, and .txt files under
the template directory get replaced with the resolved values. Unknown
placeholders are left intact, so dbt Jinja ({{ ref('x') }}) passes
through untouched.
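
The substitution rule can be sketched with a single regular expression. This is an illustration under assumptions, not the shipped implementation: the pattern and the substitute helper are hypothetical, and the sketch only matches bare identifiers, which is why a call expression like {{ ref('x') }} never matches.

```python
import re

# Matches {{ name }} or {{name}} where name is a bare identifier.
# Expressions such as {{ ref('x') }} do not match and are left alone.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def substitute(text: str, params: dict[str, str]) -> str:
    """Replace known placeholders; keep unknown ones exactly as written."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names fall back to the original matched text.
        return params.get(name, match.group(0))

    return PLACEHOLDER.sub(replace, text)


line = "repos/{{ owner }}/{{repo}} from {{ ref('x') }} and {{ other }}"
print(substitute(line, {"owner": "acme", "repo": "widgets"}))
```

One consequence of this approach: a dbt variable that happens to share a name with a declared template parameter would also be replaced, so parameter names are best kept distinctive.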

Templates updated.

  • github-analytics: declares owner + repo. All four dlt
    resources (issues, pulls, stargazers, contributors) now use
    the substituted values.
  • weather-station: declares station_id + office + gridX +
    gridY, matching the NOAA API's path segments.

e2e upgrades. Both templates' @pytest.mark.e2e tests now run
actual ingestion (--max-records 5 caps) instead of stopping at
init, with xfail-on-upstream-flake semantics so rate-limiting or
API downtime doesn't hard-fail CI.

2. Snowflake / BigQuery warehouse alignment ✅ landed

v0.1.2 extended the v0.1.1 warehouse-alignment check from DuckDB-only
to MotherDuck. v0.1.3 extends it to Snowflake, BigQuery, Redshift,
and "anything else." The tradeoff called out in the original scoping
turned out to be the right framing: cloud profiles aren't
structurally comparable by a single "path" field, so the alignment
check reduces to adapter-type equality (plus account-match for
Snowflake).

New structured extractor. _extract_dbt_warehouse_target(dbt_dir)
returns a frozen DbtWarehouseTarget with four fields:

  • adapter_type — raw dbt adapter name (duckdb, snowflake,
    bigquery, redshift, or anything dbt reports).
  • identifier — the single best locator: filesystem path for local
    DuckDB, md:<name> for MotherDuck, account for Snowflake,
    project for BigQuery, host for Redshift.
  • display — human-friendly string for prompts (e.g.,
    snowflake://acme-us-east-1/ANALYTICS).
  • details — per-adapter extras (database / schema / dataset /
    warehouse / role / method / location / keyfile-path), kept so
    callers can render richer warnings without re-parsing profiles.

A .tycoon_warehouse_type helper property maps the raw adapter to
tycoon's WarehouseType enum, handling the MotherDuck-via-DuckDB
overlap (md:* paths resolve to motherduck, everything else with
adapter_type == "duckdb" resolves to duckdb). Unknown adapters
(e.g. databricks) return None.
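
A rough sketch of the shape described above, for orientation only. The field types, the property name without its leading dot, and the WarehouseType members listed here are assumptions; the real enum and dataclass in tycoon may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class WarehouseType(str, Enum):
    # Assumed members for illustration; not copied from tycoon's source.
    duckdb = "duckdb"
    motherduck = "motherduck"
    snowflake = "snowflake"
    bigquery = "bigquery"
    redshift = "redshift"


@dataclass(frozen=True)
class DbtWarehouseTarget:
    adapter_type: str  # raw dbt adapter name
    identifier: str    # path, md:<name>, account, project, or host
    display: str       # human-friendly string for prompts
    details: dict = field(default_factory=dict)  # per-adapter extras

    @property
    def tycoon_warehouse_type(self) -> Optional[WarehouseType]:
        # MotherDuck is reached through the duckdb adapter, so the
        # identifier prefix decides which of the two it is.
        if self.adapter_type == "duckdb":
            if self.identifier.startswith("md:"):
                return WarehouseType.motherduck
            return WarehouseType.duckdb
        try:
            return WarehouseType(self.adapter_type)
        except ValueError:
            return None  # unknown adapter, e.g. databricks


target = DbtWarehouseTarget("duckdb", "md:analytics", "md:analytics")
print(target.tycoon_warehouse_type)
```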

Register-time alignment. register_dbt now branches on adapter
type:

  • DuckDB / MotherDuck — unchanged from v0.1.2 (path-normalized
    comparison, updates both database.warehouse and stack.warehouse).
  • Snowflake / BigQuery / Redshift / unknown — a new
    _align_cloud_warehouse helper warns when stack.warehouse
    disagrees with the dbt adapter type and offers to update it.
    database.warehouse is explicitly left alone (it only makes sense
    for DuckDB/MotherDuck). For Snowflake, any pre-existing
    warehouse_connection.account in tycoon.yml is compared to the
    dbt profile's account and a non-fatal mismatch warning is emitted.
  • Unknown adapters produce an informational warning but no
    stack.warehouse change (we don't want to push users into a
    WarehouseType.other we can't yet reason about).
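
The cloud branch boils down to a few comparisons. The sketch below is a hypothetical reduction of that logic (check_cloud_alignment and its return shape are invented for illustration); it returns the warnings to show plus a flag saying whether an update to stack.warehouse should be offered.

```python
from typing import Optional

CLOUD_ADAPTERS = {"snowflake", "bigquery", "redshift"}


def check_cloud_alignment(
    stack_warehouse: str,
    adapter_type: str,
    yml_account: Optional[str] = None,
    profile_account: Optional[str] = None,
) -> tuple[list[str], bool]:
    """Return (warnings, offer_update) for a cloud dbt profile."""
    warnings: list[str] = []

    # Unknown adapters: informational warning only, never an update offer.
    if adapter_type not in CLOUD_ADAPTERS:
        warnings.append(
            f"dbt adapter {adapter_type!r} is not recognized; "
            "stack.warehouse left unchanged"
        )
        return warnings, False

    # Alignment reduces to adapter-type equality.
    offer_update = stack_warehouse != adapter_type
    if offer_update:
        warnings.append(
            f"stack.warehouse is {stack_warehouse!r} but dbt targets {adapter_type!r}"
        )

    # Snowflake only: non-fatal account comparison when both sides are set.
    if (
        adapter_type == "snowflake"
        and yml_account
        and profile_account
        and yml_account != profile_account
    ):
        warnings.append(
            f"account mismatch: tycoon.yml has {yml_account!r}, "
            f"dbt profile has {profile_account!r}"
        )
    return warnings, offer_update


print(check_cloud_alignment("duckdb", "snowflake"))
```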

_extract_dbt_duckdb_path is kept as a thin backwards-compatible
shim over the new structured extractor, so existing init-wizard
callers that only care about the DuckDB/MotherDuck subset stay
untouched.

Tests. 11 new register tests (6 for the structured extractor
covering duckdb/md/snowflake/bigquery/unknown/missing-profile; 4 for
the CLI-level alignment covering Snowflake/BigQuery-update,
already-aligned no-op, and account-mismatch warning). Full suite
328 passed.

3. dbt build step in e2e tests ✅ landed

The csv-import e2e test now runs the full init → ingest → transform
pipeline on every PR (via the offline_e2e marker). New in v0.1.3:

  • csv-import template ships a real dbt project. A new
    dbt_project/ subdirectory under the template bundles
    dbt_project.yml, profiles.yml, and a
    models/staging/stg_widgets.sql + schema.yml pair. Scaffolded
    verbatim by tycoon init --template csv-import.
  • Sample data included. data/input/widgets.csv ships with 10
    rows so tycoon data sources run files && tycoon data transform run
    works with zero manual setup.
  • Staging model stg_widgets casts / trims raw CSV rows into
    typed columns (widget_id INTEGER, widget_name VARCHAR,
    quantity INTEGER) with unique + not_null tests on the PK.
  • Gotcha worth documenting: dlt's filesystem source + read_csv()
    transformer produces a single unioned _read_csv table per schema
    (not one table per CSV file). The staging model references that
    dlt-internal table name; users adding more CSVs can filter or split
    downstream.
  • e2e assertions extended: row-count and column-type checks on
    main.stg_widgets, plus a cross-check that the observability layer
    captured the dbt run invocation and the per-node success.

4. dlt trace enrichment (observability v2a) ✅ landed

v0.1.2's dlt observability captured _dlt_loads + per-table row
counts. Trace-level detail (byte sizes, per-step durations, per-job
error messages) lives in ~/.dlt/pipelines/<name>/trace.pickle — dlt
ships these as pickled PipelineTrace objects, not JSON. v0.1.3 adds
a capture_dlt_trace helper that unpickles the trace, calls
.asdict(), and inserts it into three new tables in the metadata
DB:

  • dlt_trace_runs — one row per pipeline run
    (transaction_id, pipeline_name, started_at, finished_at, duration_s,
    engine_version, success, exception).
  • dlt_trace_steps — one row per (transaction_id, step)
    covering extract / normalize / load / run with per-step duration
    and step_exception.
  • dlt_trace_jobs — one row per (transaction_id, job_id) for
    load-step packages: table_name, file_format, state, file_size_bytes,
    elapsed_s, failed_message.

The capture hook runs after every successful ingest and reuses the
existing pipeline.pipeline_name to locate trace.pickle. All
operations are best-effort — a missing or malformed trace never
propagates to the caller.
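
The best-effort contract can be illustrated with a small loader. This is a sketch under assumptions: load_trace_dict and its pipelines_dir argument are hypothetical names, and the only dlt behavior relied on is what is stated above (a pickled trace object with an .asdict() method at the default pipelines location).

```python
import pickle
from pathlib import Path
from typing import Optional


def load_trace_dict(
    pipeline_name: str, pipelines_dir: Optional[Path] = None
) -> Optional[dict]:
    """Read <pipelines_dir>/<name>/trace.pickle; return None on any failure."""
    base = pipelines_dir or Path.home() / ".dlt" / "pipelines"
    trace_path = base / pipeline_name / "trace.pickle"
    try:
        with trace_path.open("rb") as fh:
            # Unpickling executes code from the file, so this is only
            # appropriate for trace files written by your own pipelines.
            trace = pickle.load(fh)
        return trace.asdict()
    except Exception:
        # Missing file, malformed pickle, or unexpected object shape:
        # observability must never break the ingest itself.
        return None


print(load_trace_dict("pipeline-that-does-not-exist", Path("/nonexistent")))
```

Swallowing every exception is deliberate here because the capture is an add-on to the ingest; a production version would presumably log the failure rather than drop it silently.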

Surfaces:

  • tycoon data history show <load_id> now prints pipeline name,
    total duration, total bytes written, and a Steps table (extract /
    normalize / load durations + status). The per-table view gains a
    Bytes column when trace data exists.
  • Three new Parquet files under data/parquet/_tycoon/ keep Rill's
    local_file connector in sync.

Tests. tests/test_observability_trace.py covers the dict-form
capture (insert + idempotency + missing-id + step-exception),
disk-form capture (missing file, pickled round-trip), and the
Parquet-export inclusion. tests/test_history.py verifies the
enriched drilldown renders Duration + Bytes when a trace is
captured.

5. dbt manifest.json schema-diff (observability v2b) ✅ landed

The v0.1.2 dbt_runs / dbt_nodes capture answers "what happened
during this invocation" but doesn't catch "this model's SQL
changed between runs" or "a column got renamed / dropped."
v0.1.3 fills the gap by snapshotting target/manifest.json after
every dbt invocation and diffing against the previous snapshot.

Two new tables:

  • dbt_manifest_snapshots — one row per captured manifest
    (invocation_id, generated_at, dbt_schema_version,
    fingerprint_json). The fingerprint is a compact JSON blob of
    {unique_id: {resource_type, checksum, columns: {name: type}}}
    filtered to model / seed / snapshot resources — small enough to
    keep the snapshot row cheap while preserving everything the diff
    needs.
  • dbt_schema_changes — one row per detected change, with
    columns invocation_id / prev_invocation_id / change_type /
    unique_id / column_name / old_value / new_value. Six
    change types: model_added, model_removed, sql_changed,
    column_added, column_removed, column_type_changed. First
    snapshot records zero changes (nothing to diff against).
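
The fingerprint and diff described above can be sketched as two pure functions. The function names and the tuple layout are assumptions made for illustration; the manifest keys used (nodes, resource_type, checksum.checksum, columns.<name>.data_type) follow dbt's manifest.json layout.

```python
from typing import Optional

KEPT = {"model", "seed", "snapshot"}


def fingerprint(manifest: dict) -> dict:
    """Reduce a manifest to {unique_id: {resource_type, checksum, columns}}."""
    out = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") not in KEPT:
            continue
        columns = node.get("columns") or {}
        out[unique_id] = {
            "resource_type": node["resource_type"],
            "checksum": (node.get("checksum") or {}).get("checksum"),
            "columns": {n: (c or {}).get("data_type") for n, c in columns.items()},
        }
    return out


def diff_fingerprints(prev: Optional[dict], curr: dict) -> list[tuple]:
    """Rows of (change_type, unique_id, column_name, old_value, new_value)."""
    if prev is None:
        return []  # first snapshot: nothing to diff against
    changes = []
    for uid in sorted(curr.keys() - prev.keys()):
        changes.append(("model_added", uid, None, None, None))
    for uid in sorted(prev.keys() - curr.keys()):
        changes.append(("model_removed", uid, None, None, None))
    for uid in sorted(prev.keys() & curr.keys()):
        old, new = prev[uid], curr[uid]
        if old["checksum"] != new["checksum"]:
            changes.append(("sql_changed", uid, None, old["checksum"], new["checksum"]))
        oc, nc = old["columns"], new["columns"]
        for col in sorted(nc.keys() - oc.keys()):
            changes.append(("column_added", uid, col, None, nc[col]))
        for col in sorted(oc.keys() - nc.keys()):
            changes.append(("column_removed", uid, col, oc[col], None))
        for col in sorted(oc.keys() & nc.keys()):
            if oc[col] != nc[col]:
                changes.append(("column_type_changed", uid, col, oc[col], nc[col]))
    return changes


before = fingerprint({"nodes": {"model.p.a": {
    "resource_type": "model", "checksum": {"checksum": "aaa"},
    "columns": {"id": {"data_type": "integer"}}}}})
after = fingerprint({"nodes": {"model.p.a": {
    "resource_type": "model", "checksum": {"checksum": "bbb"},
    "columns": {"id": {"data_type": "bigint"}}}}})
for row in diff_fingerprints(before, after):
    print(row)
```

Note that a pure name-based diff like this one reports a renamed column as one column_removed plus one column_added.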

Capture hook runs best-effort after every tycoon data transform run/test/build — wired into _capture_dbt_and_refresh_safe next to
the existing capture_dbt_safe. Missing or malformed manifests
never propagate.

Surfaces:

  • tycoon data history show <invocation_id> appends a "Schema
    changes vs. previous run" table (Change · Node · Column · Old →
    New) below the existing Nodes table when changes were captured.
  • Both tables are part of the Parquet export
    (data/parquet/_tycoon/dbt_manifest_snapshots.parquet and
    data/parquet/_tycoon/dbt_schema_changes.parquet),
    ready for Rill dashboards.

Tests. tests/test_observability_manifest.py covers fingerprint
extraction (model/seed/snapshot filtering, missing checksum +
columns), the pure diff function (add / remove model, sql change,
column added / removed / type changed, first-capture no-op), and a
manifest.json round-trip with duplicate-invocation idempotency,
missing-file no-op, and safe-wrapper exception swallowing.
tests/test_history.py::test_show_dbt_surfaces_schema_changes
verifies the drilldown rendering.

Nao context surface for coding agents ✅ landed

A late-add to v0.1.3 (after the rest of the release notes were
"finalized") that shipped because it was small, additive, and
made the existing tycoon ask work meaningfully more useful for
agent-driven workflows.

What it does. tycoon ask init and tycoon ask sync now also
write an AGENTS.md at the project root pointing at the
Nao-synced context tree. Coding agents that auto-read AGENTS.md
(Claude Code, Cursor, Windsurf, etc.) get oriented to the
project's data context for free — they know to look at:

  • .tycoon/nao/databases/type=<engine>/database=<name>/schema=<schema>/table=<table>/{columns,preview}.md
  • .tycoon/nao/repos/dbt/models/ (synced dbt SQL + YAML)
  • .tycoon/nao/RULES.md (project-specific agent rules)

Sentinel-based ownership. The generated AGENTS.md carries
an <!-- @generated by tycoon ask --> sentinel near the top.
On subsequent ask init / ask sync runs, tycoon refreshes the
file only if the sentinel is present in the first 500 chars.
If a user wrote their own AGENTS.md, tycoon prints a warning
and leaves it alone — no clobbering.
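
The ownership rule is simple enough to sketch in a few lines. write_agents_md below is a hypothetical helper, not tycoon's function; the sentinel string and the 500-character window come from the description above.

```python
from pathlib import Path

SENTINEL = "<!-- @generated by tycoon ask -->"


def write_agents_md(project_root: Path, body: str) -> bool:
    """Write AGENTS.md unless a hand-authored one exists. True if written."""
    target = project_root / "AGENTS.md"
    if target.exists():
        head = target.read_text(encoding="utf-8")[:500]
        if SENTINEL not in head:
            # No sentinel near the top: treat the file as user-owned.
            print(f"warning: {target} is not tycoon-generated; leaving it alone")
            return False
    # New file, or a previously generated one: safe to (re)write.
    target.write_text(f"{SENTINEL}\n\n{body}", encoding="utf-8")
    return True
```

Checking only the head of the file keeps the test cheap and means a user who deletes the sentinel line takes ownership of the file from then on.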

tycoon ask context cat-command for piping context into any
agent harness without launching the chat UI:

tycoon ask context                       # list every synced table
tycoon ask context --table dim_users     # cat columns.md + preview.md
tycoon ask context --schema mart         # all tables in schema mart
tycoon ask context --rules-only          # cat RULES.md
tycoon ask context --include-dbt         # appends synced dbt model SQL

Output is plain markdown on stdout so it composes cleanly:

tycoon ask context --table dim_users | claude -p "explain this table"

Tests. tests/test_nao.py::TestAgentsMd covers the pure
generator + write logic (sentinel-detected overwrite, user-file
preservation, missing-file write). tests/test_ask_context.py
covers the CLI surface — listing mode, table/schema filters,
rules-only, missing-context errors, and filter-no-match
diagnostics.

6. One-command MotherDuck + Nao + LM Studio setup (deferred to v0.1.4)

Follow-through on issue #7. A single command (tycoon register
stack or similar) walks through:
(a) MotherDuck auth (OAuth or token), (b) Nao init + schema sync,
(c) LM Studio (or any local OpenAI-compatible endpoint) detection +
wiring. Today each of these is a separate 2–5 step process and
ordering matters.

7. tycoon data sync: cloud ↔ local DuckDB snapshots (deferred to v0.1.4)

Follow-through on issue #12. A new subcommand that snapshots a
MotherDuck warehouse to a local
DuckDB file (for offline analysis / AI training / air-gapped
demos), and conversely uploads a local DuckDB to MotherDuck. Uses
DuckDB's native ATTACH 'md:...' + COPY FROM under the hood;
tycoon's job is just the auth + path resolution + naming conventions.

Candidates carried forward from v0.1.2 "What's next"

These are the same themes enumerated above, referenced here for
continuity with the v0.1.2 release notes. If any slip past v0.1.3 they
should be moved to a v0.1.4 "What's next" section rather than left
undocumented.

Known issues to address

  • Pre-existing ty type diagnostics in src/tycoon/ingestion/runner.py
    lines 84 (dlt rest_api_source expecting RESTAPIConfig not dict)
    and 260 (regex match.group on None) — carried from v0.1.2.
  • src/tycoon/scaffolding/rill_generator.py line 210: warehouse_db_path
    public API param is accepted but unused inside generate_rill_config
    — carried from v0.1.2. Either wire it in or deprecate the parameter.

Test + CI goals for v0.1.3

  • Coverage floor ratchet: raise from 60% (v0.1.2 floor) to
    62–63% if feasible. Each new capture helper + parameterization
    path should ship with tests that push the overall number up.
  • Full e2e gate on every PR: once dbt build runs in csv-import's
    e2e, promote any future template that acquires a full offline
    pipeline into the offline_e2e marker too.
  • Dependabot: a follow-up to v0.1.2's dependency bumps — configure
    .github/dependabot.yml so Python and Actions deps get automated PRs
    we can review at a glance instead of bundling them into release
    cycles by hand.