
v0.1.2


db-tycoon-stephen released this 19 Apr 22:34
e35eae9

Tycoon v0.1.2

Released: 2026-04-19

Six themes in this release: MotherDuck alignment;
tycoon register warehouse; network-gated e2e tests plus a new
PR-gated CI workflow that runs the default suite on every pull request
and push to main; a Nao/ask cleanup pass; a new Tycoon observability
layer that captures dlt + dbt run history into a dedicated metadata
DuckDB and surfaces it as two auto-generated Rill dashboards; and a
terminal-side view of that history via tycoon data history plus a new
Runs column on tycoon data status. Together they finish the register
family, extend the warehouse-alignment story from v0.1.1, fix a cluster
of papercuts around tycoon ask that surfaced during the MotherDuck +
Nao + LM Studio dogfood walkthrough, and close the observability gap
between "I ran a pipeline" and "I can see exactly what it did, now and
historically".

Headline

Warehouse alignment now covers MotherDuck

The alignment check added in v0.1.1 prevented the "dbt writes to
data/warehouse.duckdb, tycoon data query reads from somewhere else"
divergence — but only for local DuckDB. v0.1.2 extends it to MotherDuck:
if your dbt project's profile targets md:theirs and tycoon.yml has a
local DuckDB warehouse (or the other way around), the wizard and
tycoon register dbt both prompt to adopt the dbt-side value.

Snowflake/BigQuery intentionally not covered yet — dbt profiles for
those warehouses are structurally different (no single "connection
string" to compare), and we'd rather ship MotherDuck cleanly than
half-cover the others. tycoon doctor still reports credential
readiness for Snowflake/BigQuery.
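To make the alignment rule concrete, here is a minimal sketch of the comparison. warehouses_aligned is a hypothetical helper, not tycoon's actual _extract_dbt_duckdb_path logic: md: values keep their prefix and must match exactly, a cloud value never matches a local path, and two local paths are compared after resolving both against one project root (a simplifying assumption; dbt resolves profile paths relative to where it runs).

```python
from pathlib import Path

MD_PREFIX = "md:"


def warehouses_aligned(dbt_path: str, tycoon_warehouse: str, project_root: str = ".") -> bool:
    """Return True when dbt's profile path and tycoon.yml's warehouse name the same target.

    Hypothetical sketch: MotherDuck values are compared verbatim (prefix kept),
    local DuckDB paths are resolved against a single assumed project root.
    """
    dbt_is_md = dbt_path.startswith(MD_PREFIX)
    tycoon_is_md = tycoon_warehouse.startswith(MD_PREFIX)
    if dbt_is_md or tycoon_is_md:
        # Cloud vs. local is always a mismatch; two md: targets must match exactly.
        return dbt_is_md and tycoon_is_md and dbt_path == tycoon_warehouse
    root = Path(project_root)
    # resolve() normalizes "./data/x.duckdb" vs "data/x.duckdb" to the same absolute path.
    return (root / dbt_path).resolve() == (root / tycoon_warehouse).resolve()
```

When this returns False, the wizard and tycoon register dbt would offer the dbt-side value as the replacement.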

tycoon register warehouse

The register family gains one more member:

tycoon register warehouse

An interactive prompt — "cloud or local?". Pick local and you're asked
for a DuckDB path. Pick cloud and you get a MotherDuck name prompt plus
instructions for grabbing a token from
app.motherduck.com/token if
MOTHERDUCK_TOKEN isn't already set. Updates
database.warehouse and stack.warehouse in tycoon.yml; prompts
before overwriting an existing warehouse setting.

We considered a parallel tycoon register ingestion and shelved it —
for external ingestion (Airbyte, Fivetran) tycoon only records the
choice; the command would have been a one-line config flip, and the
wizard already covers that case. If it turns out to be useful, it's
two dozen lines in a follow-up.

Network-gated e2e template tests

A new @pytest.mark.e2e marker, deselected from the default pytest
run, powers per-template end-to-end tests. The default suite stays fast
and fully offline; pytest -m e2e opts into the network path.

One test per built-in template exists today; coverage depth varies by
template:

  • csv-import — full ingest + row-count assertion (no network).
  • nyc-transit — fetches nyc-dot with --max-records 50; flaky
    upstream responses xfail rather than fail the build.
  • github-analytics — skipped without GITHUB_TOKEN; init-only
    today (template URLs still hard-code {owner}/{repo} placeholders).
  • weather-station — init-only today (URL templates need
    {station_id} / {office} fill-in; follow-up will add defaults).

A new .github/workflows/e2e.yml runs the suite on workflow_dispatch
only — no cron, no default-branch runs, no minutes burned on flaky
upstream timeouts. Someone clicks "Run workflow" when they want an
upstream compatibility check.
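The release registers the e2e marker in [tool.pytest.ini_options] with default-deselect behavior; the exact option isn't shown here. As an illustration only, this is one way to get equivalent behavior from a conftest.py using pytest's pytest_collection_modifyitems and pytest_deselected hooks. It is a sketch, not tycoon's configuration.

```python
# conftest.py (illustrative sketch, not tycoon's actual mechanism)


def pytest_configure(config):
    # Register the marker so `pytest --strict-markers` accepts @pytest.mark.e2e.
    config.addinivalue_line("markers", "e2e: network-gated end-to-end template tests")


def pytest_collection_modifyitems(config, items):
    markexpr = config.getoption("markexpr", default="") or ""
    if "e2e" in markexpr:
        # An explicit -m expression mentions e2e (e.g. `pytest -m e2e`):
        # leave selection to pytest's built-in marker filtering.
        return
    kept = [item for item in items if item.get_closest_marker("e2e") is None]
    deselected = [item for item in items if item.get_closest_marker("e2e") is not None]
    if deselected:
        # Reported as "N deselected" in the summary line, not as skips.
        config.hook.pytest_deselected(items=deselected)
        items[:] = kept
```

Deselecting rather than skipping keeps the default run's output clean: network tests don't show up as a wall of "s" markers.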

Tycoon observability: dlt + dbt run history, now a first-class citizen

tycoon data status has always shown the last dlt load time and
current row counts — useful, but silent about history. And dbt has
never had any tycoon-side observability at all: target/run_results.json
gets overwritten on every invocation, so unless you archived it
yourself you had no way to see "did my last build succeed? how long
did it take? which model is getting slower over time?"

v0.1.2 introduces a small observability layer that closes both gaps at
once.

A dedicated metadata DuckDB at .tycoon/metadata.duckdb. Four
tables, disposable by design — delete the file to reset history:

  • dlt_runs — one row per _dlt_loads entry across every source schema
  • dlt_rows_by_table — per-load row counts per table (via
    count(*) GROUP BY _dlt_load_id)
  • dbt_runs — one row per dbt invocation (command, elapsed, success,
    models_ok, models_error, tests_passed, tests_failed, dbt_version,
    target, invocation_id)
  • dbt_nodes — one row per model/test/seed/snapshot per invocation
    (status, execution_time, rows_affected, compile_time, message)

All writes are INSERT … ON CONFLICT DO NOTHING, so re-capturing the
same load or invocation is a no-op. You can also query the metadata DB
directly:

tycoon data query --db .tycoon/metadata.duckdb \
  "SELECT invocation_id, started_at, elapsed_s, success
   FROM dbt_runs ORDER BY started_at DESC LIMIT 10"
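The idempotency guarantee is easy to see in miniature. This sketch uses Python's built-in sqlite3 as a stand-in for DuckDB (both accept INSERT ... ON CONFLICT (...) DO NOTHING) and a hypothetical four-column slice of dbt_runs keyed on invocation_id:

```python
import sqlite3

# Stand-in: in-memory SQLite instead of .tycoon/metadata.duckdb, and a
# hypothetical subset of the dbt_runs columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dbt_runs (
        invocation_id TEXT PRIMARY KEY,  -- natural key from run_results.json
        started_at    TEXT,
        elapsed_s     REAL,
        success       INTEGER
    )
    """
)


def capture_run(conn: sqlite3.Connection, row: tuple) -> None:
    """Insert one invocation; a duplicate invocation_id is silently ignored."""
    conn.execute(
        "INSERT INTO dbt_runs (invocation_id, started_at, elapsed_s, success) "
        "VALUES (?, ?, ?, ?) ON CONFLICT (invocation_id) DO NOTHING",
        row,
    )


run = ("deadbeef-0001", "2026-04-19T22:00:00", 12.5, 1)
capture_run(conn, run)
capture_run(conn, run)  # re-capturing the same invocation is a no-op
print(conn.execute("SELECT count(*) FROM dbt_runs").fetchone()[0])  # prints 1
```

Keying on an identifier the source tool already generates (dlt's load_id, dbt's invocation_id) is what lets capture run after every command without tracking what was mirrored before.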

Capture hooks. Two thin best-effort calls, each wrapped in
try/except so observability can never break the operation it's
attached to:

  • The dlt ingestion runner mirrors new loads into the metadata DB at
    the end of every successful tycoon data sources run.
  • tycoon data transform run/test/build parses
    target/run_results.json immediately after each dbt invocation and
    inserts the invocation + every node.
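For the dbt hook, most of the work is turning run_results.json into rows. A minimal sketch, assuming dbt's run_results.json layout (metadata.invocation_id, metadata.dbt_version, elapsed_time, args.which when present, and a results list whose entries carry unique_id, status, execution_time, adapter_response, and message). The function names and the sink callback are hypothetical, compile_time is omitted, and the try/except mirrors the best-effort contract described above.

```python
import json
import logging
from pathlib import Path
from typing import Any, Callable

log = logging.getLogger("tycoon.observability.sketch")  # hypothetical logger name

FAILED_STATUSES = {"error", "fail"}


def parse_run_results(doc: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split a run_results.json document into one dbt_runs row plus dbt_nodes rows."""
    meta = doc.get("metadata", {})
    nodes = [
        {
            "invocation_id": meta.get("invocation_id"),
            "unique_id": r["unique_id"],  # e.g. "model.my_project.orders"
            "status": r.get("status"),
            "execution_time": r.get("execution_time"),
            "rows_affected": (r.get("adapter_response") or {}).get("rows_affected"),
            "message": r.get("message"),
        }
        for r in doc.get("results", [])
    ]

    def count(prefix: str, statuses: set[str]) -> int:
        return sum(1 for n in nodes if n["unique_id"].startswith(prefix) and n["status"] in statuses)

    run = {
        "invocation_id": meta.get("invocation_id"),
        "command": (doc.get("args") or {}).get("which"),  # "run" / "test" / "build"
        "dbt_version": meta.get("dbt_version"),
        "elapsed_s": doc.get("elapsed_time"),
        "success": not any(n["status"] in FAILED_STATUSES for n in nodes),
        "models_ok": count("model.", {"success"}),
        "models_error": count("model.", {"error"}),
        "tests_passed": count("test.", {"pass"}),
        "tests_failed": count("test.", {"fail", "error"}),
    }
    return run, nodes


def capture_dbt_best_effort(
    project_dir: Path, sink: Callable[[dict[str, Any], list[dict[str, Any]]], None]
) -> None:
    """Parse target/run_results.json and hand the rows to `sink`; never raise."""
    try:
        doc = json.loads((project_dir / "target" / "run_results.json").read_text())
        run, nodes = parse_run_results(doc)
        sink(run, nodes)
    except Exception:  # observability must never break the dbt command it follows
        log.warning("dbt run capture skipped", exc_info=True)
```

Reading the file immediately after each invocation matters because dbt overwrites it on the next run.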

Two auto-generated Rill dashboards. Each appears only when its
table has data, so new projects don't start with empty explores:

  • _tycoon_dlt_usage.yaml — dlt load timeline, success rate, rows
    loaded per schema/table/load.
  • _tycoon_dbt_usage.yaml — dbt run timeline, success rate, avg
    duration, models built, model errors, tests passed/failed.

Parquet snapshots under data/parquet/_tycoon/ are re-exported from
the metadata DB after every capture, so Rill stays current without you
touching a scaffold command.

One caveat worth flagging on the dlt side. Row counts derive from
count(*) GROUP BY _dlt_load_id — exact for write_disposition=append,
best-effort for replace and merge (older loads' row counts drop as
rows get overwritten). Byte sizes + per-job durations (parseable from
~/.dlt/pipelines/<name>/trace.json) are deferred, as is dbt
schema-diff via manifest.json snapshotting.
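The replace/merge caveat follows directly from the query. In this sketch (sqlite3 stand-in, hypothetical pokemon table and load ids), a merge rewrites one row from the first load, so that load's historical count drops from 3 to 2. This only models the effect on _dlt_load_id; dlt's real merge mechanics are more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical dlt-loaded table: each row carries the _dlt_load_id of the load that last wrote it.
conn.execute("CREATE TABLE pokemon (id INTEGER PRIMARY KEY, name TEXT, _dlt_load_id TEXT)")


def rows_by_load(conn: sqlite3.Connection) -> dict[str, int]:
    """Per-load row counts, the same shape as dlt_rows_by_table."""
    return dict(
        conn.execute(
            "SELECT _dlt_load_id, count(*) FROM pokemon GROUP BY _dlt_load_id ORDER BY _dlt_load_id"
        ).fetchall()
    )


# Load 1 appends three rows.
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [(1, "bulbasaur", "1700000001.0"), (2, "ivysaur", "1700000001.0"), (3, "venusaur", "1700000001.0")],
)
print(rows_by_load(conn))  # {'1700000001.0': 3}

# Load 2 merges: it upserts id 3 and adds id 4. The overwritten row now carries
# load 2's id, so load 1's count shrinks even though load 1 really wrote 3 rows.
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?) "
    "ON CONFLICT (id) DO UPDATE SET name = excluded.name, _dlt_load_id = excluded._dlt_load_id",
    [(3, "venusaur", "1700000002.0"), (4, "charmander", "1700000002.0")],
)
print(rows_by_load(conn))  # {'1700000001.0': 2, '1700000002.0': 2}
```

With append-only loads no row is ever rewritten, so every load keeps its exact count.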

What's deliberately not hooked. tycoon run dbt … — the generic
CLI passthrough — skips observability capture. That command is pure
forwarding by design, and making it aware of a specific tool would
break the contract. Users who want history graduate to
tycoon data transform.

tycoon data history and the status Runs column

The metadata DB is great, but SQL'ing it by hand isn't the nicest UX.
v0.1.2 adds a first-class terminal view:

tycoon data history                          # last 20 runs, dlt + dbt mixed
tycoon data history --tool dbt --limit 50    # dbt-only, last 50
tycoon data history --source pokeapi         # all dlt loads for a given source
tycoon data history show deadbeef            # per-node / per-table drilldown

Short id prefixes resolve against both dlt load_ids and dbt
invocation_ids. Ambiguous prefixes error out with the candidates
listed, so you never operate on the wrong run by accident. The
--source filter accepts either a source name from tycoon.yml
(pokeapi) or a schema literal (raw_pokeapi) — both resolve to the
same filter. When --source is active, dbt runs are hidden from the
list since they aren't source-scoped.
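The resolution rules can be sketched in a few lines. resolve_run_id and resolve_source_schema are hypothetical names, and the raw_ schema prefix is inferred from the pokeapi / raw_pokeapi example rather than taken from tycoon's code:

```python
class AmbiguousRunId(LookupError):
    """Raised when a short id prefix matches more than one captured run."""


def resolve_run_id(prefix: str, run_ids: list[str]) -> str:
    """Expand a short prefix to the single dlt load_id / dbt invocation_id it names."""
    matches = sorted({rid for rid in run_ids if rid.startswith(prefix)})
    if not matches:
        raise LookupError(f"no run matches {prefix!r}")
    if len(matches) > 1:
        # Refuse to guess: list every candidate so the user can lengthen the prefix.
        raise AmbiguousRunId(f"{prefix!r} is ambiguous; candidates: {', '.join(matches)}")
    return matches[0]


def resolve_source_schema(source: str, configured: set[str], schema_prefix: str = "raw_") -> str:
    """Accept a tycoon.yml source name or a schema literal; return the schema to filter on."""
    if source in configured:
        return f"{schema_prefix}{source}"  # assumed naming convention
    return source  # already a schema literal such as "raw_pokeapi"
```

Because dlt load_ids and dbt invocation_ids are matched from the same pool, a single show command can drill into either kind of run.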

tycoon data status gains a new Runs column pulled from the same
metadata DB, showing the total number of captured dlt loads per source.
When any source has run history, a "Drill in with tycoon data history"
hint is printed beneath the table. The Runs column falls back to a
placeholder when the metadata DB doesn't exist yet.

Three small quality-of-life polish items landed alongside:

  1. Scaffolded .gitignore excludes .tycoon/metadata.duckdb*, so running
    git add . in a new project won't accidentally commit run history.
  2. tycoon data clean learns --metadata — by default (including
    with --all), the observability metadata DB is preserved so
    routine clean cycles don't nuke run history. --metadata is the
    explicit opt-in to wipe it.
  3. tycoon doctor gains an observability check — prints one of
    "metadata DB not yet created", "no runs captured yet", or
    "N dlt load(s), M dbt run(s) captured" so "why are my dashboards
    empty?" is diagnosable in one glance.
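The doctor check reduces to three branches. A minimal sketch, with sqlite3 standing in for DuckDB and a hypothetical observability_status function; only the table names and the three messages come from these notes.

```python
import sqlite3
from pathlib import Path


def observability_status(db_path: Path) -> str:
    """Return the one-line doctor message for the metadata DB (hypothetical sketch)."""
    if not db_path.exists():
        return "metadata DB not yet created"
    conn = sqlite3.connect(db_path)
    try:
        dlt_loads = conn.execute("SELECT count(*) FROM dlt_runs").fetchone()[0]
        dbt_runs = conn.execute("SELECT count(*) FROM dbt_runs").fetchone()[0]
    finally:
        conn.close()
    if dlt_loads == 0 and dbt_runs == 0:
        return "no runs captured yet"
    return f"{dlt_loads} dlt load(s), {dbt_runs} dbt run(s) captured"
```

Checking for the file first matters: opening a missing path would silently create an empty database and mask the "not yet created" case.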

CI: tests now gate every PR

Until v0.1.2, the test suite only ran when someone remembered to invoke
uv run pytest locally. There was no PR workflow; the only CI jobs were
a tag-triggered PyPI publish and the manual e2e.yml. That's a
real hole.

New .github/workflows/ci.yml runs on every pull_request and push
to main:

  • Full default pytest -q suite (unit + offline-e2e). The csv-import
    template's full init → sources add → sources run → row-count
    pipeline now runs on every PR, not just when someone clicks "Run
    workflow".
  • uvx ruff check lint gate
  • Matrix on Python 3.12 + 3.13 so regressions on either minor
    version get caught before release
  • Concurrency-gated so pushes cancel superseded runs

The network-gated e2e tests (nyc-transit, github-analytics,
weather-station) stay behind the original e2e marker and the
manual e2e.yml workflow: they need credentials and hit flaky
upstream APIs, which makes them unsuitable for per-PR gating.

Three contributor-facing additions round out the testing story:

  • Coverage floor: CI now fails if overall coverage drops below 60%.
    The baseline is ~65%, leaving ~5 percentage points of drift
    headroom. The floor lives in [tool.coverage.report].fail_under and
    is meant to ratchet upward 1–2 points per release as real tests get
    added. Setting it below the baseline is deliberate: aspirational
    targets are how coverage gates end up ignored, and the point is a
    regression gate that actually works. The jump from 61% to 65% came
    from two targeted test-suite additions (the FastAPI server tests and
    Dagster smoke tests listed under Added).
  • CONTRIBUTING.md: first-time-contributor guide covering dev
    setup, what CI gates on, the three test-marker tiers (default,
    offline_e2e, e2e), code conventions, and the release process.
  • Optional pre-commit hooks: .pre-commit-config.yaml runs ruff
    (--fix mode) plus a handful of standard hygiene hooks before each
    commit. Opt in via uvx pre-commit install. CI is still the source
    of truth; the hooks just catch failures before the PR opens.

What changed

Added

  • Tycoon observability layer
    (#13): new
    src/tycoon/observability.py module owns .tycoon/metadata.duckdb
    with four tables (dlt_runs, dlt_rows_by_table, dbt_runs,
    dbt_nodes). Capture helpers capture_dlt (mirrors load history from
    each raw DB) and capture_dbt (parses target/run_results.json) write
    idempotently via ON CONFLICT DO NOTHING. The ingestion runner calls
    capture_dlt_safe after every successful run_source; the dbt
    transform command calls capture_dbt_safe after every run / test
    / build. Both are best-effort — observability failures never
    propagate. rill_generator.refresh_usage_dashboards re-exports the
    four Parquets under data/parquet/_tycoon/ and idempotently writes
    _tycoon_dlt_usage.yaml / _tycoon_dbt_usage.yaml (plus their
    metrics_view + sources). Dashboards appear only when their backing
    table is non-empty.
  • tycoon data history — terminal view of recent dlt + dbt runs
    with --tool {all,dlt,dbt}, --limit N, and --source <name>
    filters. tycoon data history show <id> drills into a specific run
    (short id prefix resolution; ambiguous prefixes error out with
    candidates listed). --source accepts either a config name from
    tycoon.yml or a raw schema literal.
  • tycoon data status gains a Runs column pulled from
    dlt_runs in the metadata DB, plus a drill-in hint when any source
    has history. The column falls back to a placeholder when the
    metadata DB doesn't exist yet.
  • tycoon doctor now checks observability — reports one of
    "metadata DB not yet created", "no runs captured yet", or
    "N dlt load(s), M dbt run(s) captured".
  • Scaffolded .gitignore excludes .tycoon/metadata.duckdb* so
    run history never accidentally gets committed.
  • tycoon data clean --metadata — new flag for explicitly wiping
    the observability metadata DB. By default (including --all), the
    metadata DB is preserved so routine clean cycles don't nuke history.
  • .github/workflows/ci.yml — PR + main-push gate running the full
    default pytest suite (unit + offline-e2e) plus uvx ruff check on
    every change. Matrix on Python 3.12 + 3.13. Concurrency-gated.
  • offline_e2e pytest marker — promotes the csv-import template's
    full init → ingest → row-count pipeline into the default pytest
    run so CI gates on real integration, not just unit tests.
  • Ruff configuration in pyproject.toml (line-length 120,
    target py312) with two per-file ignores for legitimate patterns.
  • Coverage gate via pytest-cov: floor at 60% in
    [tool.coverage.report].fail_under; baseline ~65%. CI uploads
    coverage.xml as an artifact on the 3.12 matrix leg.
  • FastAPI server tests (11 new) — /, /health, /check-updates
    (mocked httpx), /api/status, /api/run/pipeline/{source_name},
    /api/run/dbt (including 404 / 409 paths), and the
    /ws/logs/{run_id} WebSocket.
  • Dagster orchestration smoke tests (10 new) — defs imports
    cleanly, build_ingestion_assets handles missing and present
    configs, dashed source names are sanitized, resource factories
    return valid Dagster resources. Catches the
    DagsterInvalidDefinitionError class of bug (legacy #4 / #13) at
    import time.
  • CONTRIBUTING.md — onboarding doc: dev setup, CI gate details,
    test marker semantics, code conventions, release process.
  • .pre-commit-config.yaml — optional pre-commit hooks (ruff +
    standard hygiene) mirroring CI. Opt in via uvx pre-commit install.
  • MotherDuck warehouse alignment: _extract_dbt_duckdb_path and
    the register dbt / init-wizard alignment flow now handle
    path: md:<name> dbt profiles, preserving the prefix and comparing
    against tycoon's md:* warehouse value.
  • tycoon register warehouse: cloud (MotherDuck) or local (DuckDB)
    interactive prompt; surfaces MOTHERDUCK_TOKEN guidance when absent;
    prompts before overwriting an existing warehouse.
  • @pytest.mark.e2e marker (registered in
    [tool.pytest.ini_options]) with default-deselect behavior and
    tests/test_templates_e2e.py covering all four built-in templates.
  • .github/workflows/e2e.yml: manual-trigger-only CI job for the
    e2e suite, with GITHUB_TOKEN secret available for the
    github-analytics slot.

Changed

  • Init wizard's warehouse-alignment branch now triggers when the
    chosen warehouse is either DuckDB or MotherDuck (previously DuckDB
    only). If alignment swaps a local path for an md:* target (or vice
    versa), stack.warehouse is updated accordingly.

Fixed

  • tycoon init no longer emits database.raw == database.warehouse
    (#11). The
    local-DuckDB wizard branch produced a tycoon.yml where both fields
    pointed at the same file; tycoon data transform run then failed with
    dbt-duckdb's "Unique file handle conflict" error. Scaffolding now
    points raw at a separate sibling file.
  • tycoon ask sync / ask chat now work out of the box
    (#6). Replaced
    python -m nao_core (which nao-core doesn't support — no __main__.py)
    with venv-colocated nao binary resolution, mirroring how
    tycoon data transform finds dbt.
  • MotherDuck URLs pass through Nao config verbatim
    (#5). Previously
    md:my_catalog was path-joined to ../../md:my_catalog, breaking every
    MotherDuck + Nao stack silently.
  • ask.include_schemas is glob-expanded before writing nao config
    (#10). Bare
    names like mart silently matched nothing under Nao's
    fnmatch(schema.table, pattern) filter; they're now auto-expanded to
    mart.*. Already-qualified patterns are left alone.
  • Nao's chat SQLite lives at .tycoon/nao/db.sqlite
    (#8) instead of
    inside the venv. Chat history and local Nao user accounts survive
    uv sync and tycoon upgrades.
  • tycoon doctor recognizes cached MotherDuck OAuth
    (#3). It used to
    unconditionally error with "MOTHERDUCK_TOKEN is not set" even when the
    user had a live browser-OAuth session; it now reports token (env),
    OAuth (cached session), or not configured.
  • nao-core bumped 0.0.59 → 0.1.7
    (#9) to silence
    the nagging "run nao upgrade" banner on every invocation. Emitted
    config now uses templates instead of the deprecated accessors key.
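Two of the Nao fixes above come down to small string rules. A minimal sketch with hypothetical helper names: MotherDuck URLs bypass path joining (the ../.. base is taken from the bug example, not from tycoon's code), and bare schema names become schema.* globs, where "already qualified" is assumed to mean "contains a dot".

```python
import posixpath
from fnmatch import fnmatch


def nao_db_path(warehouse: str, rel_base: str = "../..") -> str:
    """Return the database value to write into Nao's config."""
    if warehouse.startswith("md:"):
        # Pass MotherDuck URLs through verbatim; joining would yield "../../md:...".
        return warehouse
    return posixpath.join(rel_base, warehouse)


def expand_include_schemas(patterns: list[str]) -> list[str]:
    """Turn bare schema names into schema.* globs; leave qualified patterns alone."""
    return [p if "." in p else f"{p}.*" for p in patterns]
```

The expansion is needed because Nao matches fnmatch(schema.table, pattern): fnmatch("mart.orders", "mart") is False, while fnmatch("mart.orders", "mart.*") is True.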

Upgrade notes

pip install -U database-tycoon

No breaking changes. Existing tycoon.yml files continue to work
unchanged; the new commands (register warehouse, data history), the
--metadata flag, and the test markers are purely additive.

Dependency bumps: rich, dlt, duckdb, pydantic, fastapi,
dagster*, nao-core, pytest all advanced minor-or-patch versions.
No API changes expected from any of them.

Known limitations (carried forward)

  • Snowflake and BigQuery: dbt can transform against them, but
    tycoon data query is DuckDB/MotherDuck-only for now, and warehouse
    alignment doesn't cover their dbt profiles yet.
  • External dlt pipelines still can't be run via
    tycoon data sources run — only tycoon-managed dlt sources are.
  • github-analytics and weather-station e2e tests only verify init
    today; full ingest requires template-side support for injecting
    runtime values ({owner}/{repo}, {station_id}).

What's next (v0.1.3 candidates)

  • Template parameterization: support for injecting values into
    template URL patterns at sources run time, so github-analytics
    and weather-station e2e tests can actually hit live endpoints
    end-to-end.
  • Snowflake/BigQuery warehouse alignment: structural comparison of
    dbt profiles for non-DuckDB warehouses, now that the MotherDuck path
    is proven.
  • dbt build in e2e: extend the e2e tests to run
    tycoon data transform run after ingest. Waiting on templates
    shipping at least one model to build — today they're ingest-only.
  • Observability v2: byte sizes + per-job durations for dlt (via
    ~/.dlt/pipelines/<name>/trace.json); dbt schema-diff via
    manifest.json snapshotting (detect renamed models, added columns,
    changed SQL hashes between invocations); capture hooks from
    tycoon run dbt … if we decide to break the pure-passthrough
    contract.