Duckle v0.3.0

Released by @github-actions on 11 Jun 13:49 · 101 commits to main since this release

Duckle v0.3.0 - local-first, embedded ETL on DuckDB.

Patched since the initial v0.3.0 release

The binaries on this release have been refreshed. Re-download the binary for
your OS below to pick these up (the version number is unchanged).

2026-06-16 re-roll - workspace-relative paths, clearer auto-layout, materialization control, JSON zip

  • Auto-layout now arranges the canvas by dependency depth (issue #36): nodes
    flow left to right by their position in the graph, siblings stacked and
    centered, with generous spacing so edges and connectors stay readable. The
    previous layout placed every node on a single row and ignored the wiring.
  • Workspace-relative paths (issue #37): a built-in ${workspace} placeholder
    (alias ${projectroot}) resolves to the active workspace root, so source and
    sink paths can be written relative to it and a whole workspace folder stays
    portable when it is copied or moved. No context needs to be defined; it
    resolves on the canvas, in schema autodetect, and in headless / scheduled runs.
  • Schema autodetect now resolves context variables before inspecting a source,
    so a path (or any field) bound to a context variable can be detected, not just
    a hand-typed literal. Previously autodetect sent the raw ${...} placeholder
    to the engine and could not find the file.
  • Per-stage materialization control on every node's Basic tab. Choose how a step
    is stored: Auto (a view for a single consumer, a table when several steps read
    it), View (always lazy), Memory (read once, held as a table buffered in RAM),
    or Disk (read once, streamed through a temporary Parquet file to keep memory
    low for very large intermediates). In addition, a source that feeds a filter
    or quality validator that splits rows into pass and reject is now read once
    automatically instead of being scanned twice.
  • New Zip Arrays to Table transform (Transform > Array). Turn a record that
    carries a list of column names and a list of row-arrays into a normal table
    with one column per name and one row per array - the common "headings + rows"
    JSON shape, with no hand-written SQL.
  • The local webhook source no longer drops requests on macOS under load.
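
The "headings + rows" shape that Zip Arrays to Table targets can be illustrated in plain Python. This is a minimal sketch of the idea only, not Duckle's implementation; the field names `columns` and `rows` and the function name `zip_arrays_to_table` are assumptions chosen for the example.

```python
from typing import Any


def zip_arrays_to_table(record: dict[str, Any],
                        names_key: str = "columns",
                        rows_key: str = "rows") -> list[dict[str, Any]]:
    """Pair a list of column names with each row-array to build row dicts."""
    names = record[names_key]
    table = []
    for row in record[rows_key]:
        # A row-array with the wrong length cannot be mapped onto the headings.
        if len(row) != len(names):
            raise ValueError(f"row has {len(row)} values, expected {len(names)}")
        table.append(dict(zip(names, row)))
    return table


# Hypothetical input record: one list of headings, one list of row-arrays.
payload = {
    "columns": ["id", "name"],
    "rows": [[1, "Ada"], [2, "Grace"]],
}
table = zip_arrays_to_table(payload)
```

Here `table` holds one dict per row-array, keyed by the column names.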

2026-06-15 re-roll - local accounts, faster context switching, corruption-safe workspaces

  • Local multi-account profiles: create one or more named profiles (with an
    optional picture), shown in the top-right corner, and switch between them for
    quick context switching. Each profile remembers its own workspace folder, so
    switching profiles swaps the whole project context in one click. Profiles are
    stored only on this device, are never transmitted, and have no password -
    they separate working contexts, they are not a security boundary.
  • A single corrupt workspace file no longer hides everything (issue #35):
    if a context, connection or pipeline JSON file is hand-edited into invalid
    JSON, Duckle now skips just that file and shows a banner naming it, instead of
    silently failing to load the entire workspace. If a structural file
    (duckle.json / repository.json) is invalid, the workspace stays
    read-only and is never overwritten, so the good files on disk are protected
    until you fix or restore the broken one.
  • Account-switching fixes (all part of the new profiles feature):
    • Switching to a profile that points at the already-open workspace, or
      deleting a profile and falling back to such a profile, no longer resets the
      canvas to the default sample.
    • On a cold start Duckle opens the active profile's workspace, not the last
      globally used folder.
    • The account menu and editor are no longer clipped by the top bar, so the
      dropdown opens and you can switch, add, edit and remove profiles.
  • Project: a documentation site is now live at
    https://duckle.org.
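
The skip-and-name behaviour described above for corrupt workspace files (issue #35) might look roughly like the following. This is a sketch under stated assumptions, not Duckle's code: the function name `load_workspace_files` and the flat folder of `*.json` files are hypothetical.

```python
import json
from pathlib import Path


def load_workspace_files(folder: Path) -> tuple[dict[str, object], list[str]]:
    """Parse every *.json file in folder; return (loaded, names of skipped files)."""
    loaded: dict[str, object] = {}
    skipped: list[str] = []
    for path in sorted(folder.glob("*.json")):
        try:
            loaded[path.name] = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Skip only this file; the caller can name it in a banner.
            skipped.append(path.name)
    return loaded, skipped
```

The function never writes to disk, so a file that failed to parse is left exactly as it was for the user to fix or restore.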

2026-06-14 re-roll (issue fixes)

  • src.xml now captures text inside CDATA sections instead of skipping it, so a
    value written by snk.xml (which uses CDATA for complex cells) round-trips back
    correctly (issue #33).
  • A pipeline run via a schedule now resolves workspace context the same way the
    canvas does, so a context-based value (for example an Oracle password stored
    as a context variable) is substituted before the run. Previously the raw
    placeholder reached the driver and a job that worked from the canvas failed
    under a schedule with errors like ORA-01017 (issue #32).
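
The context substitution that scheduled runs now perform can be sketched as a simple placeholder pass over each field before it reaches a driver. This is an illustration only; the function name, the variable names and the fail-on-unknown behaviour are assumptions, not Duckle's actual resolver.

```python
import re

# Matches ${name} and captures the name between the braces.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_placeholders(value: str, context: dict[str, str]) -> str:
    """Replace each ${name} in value with context[name]; fail on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in context:
            # Failing here is clearer than letting the raw placeholder reach a driver.
            raise KeyError(f"undefined context variable: {name}")
        return context[name]

    return _PLACEHOLDER.sub(substitute, value)


# Hypothetical context variables for illustration.
context = {"oracle_host": "db.example.internal", "oracle_password": "s3cret"}
password = resolve_placeholders("${oracle_password}", context)
dsn = resolve_placeholders("${oracle_host}:1521/ORCL", context)
```

Running the same resolver on the canvas and in the scheduler is what keeps the two paths consistent.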

2026-06-13 re-roll (UI follow-up)

  • The run-failure banner in the Output panel now has a dismiss (X) button on
    the right; it reappears on the next failed run.
  • The component properties panel always opens on the Basic tab. Switching to
    Schema / Preview / Advanced / Validation is your choice and stays put while a
    component is selected; picking a different component resets back to Basic.

2026-06-13 re-roll - full-codebase correctness audit (67 fixes)

A multi-pass review of the whole codebase (engine, connectors, transforms,
desktop, frontend) with every finding independently verified before fixing.
Binaries refreshed; re-download below to pick these up.

  • Reliability and data safety:
    • Cancelling a run now cancels only that run; a nested sub-pipeline (Iterate /
      ForEach / Run Job / Parallelize) no longer resets or steals another run's cancel.
    • The DuckDB engine and the local AI model now download atomically (written to
      a temp file and renamed into place), so an interrupted download can never
      leave a half-written file that looks installed.
    • A partial "Run from here" no longer advances incremental / change-feed
      watermarks, so a later full run cannot skip rows that a preview loaded but
      never wrote to a sink.
    • In fast batched runs, a failing transform is now blamed on the correct node
      instead of the downstream sink.
    • Autosave keeps a tab marked unsaved if the write actually failed, instead of
      silently losing edits.
  • Formats and connectors:
    • Cloud (S3 / GCS / Azure) sources and sinks now reject Avro / ORC clearly
      instead of silently reading them as CSV or writing Parquet to a .avro/.orc path.
    • CSV "Windows-1252" encoding now works (it was previously rejected).
    • Kafka "Initial offset" (earliest / latest) is now honoured; the default
      reads the available backlog.
    • Snowflake and Databricks requests use the merged OS + bundled trust store
      again (fixes a corporate-proxy / Zscaler TLS regression). The Snowflake sink
      waits for an async statement to finish (no false success), REST page-based
      pagination starts from the correct page, and the Snowflake source handles
      gzipped partitions and typed columns.
    • Identifier escaping hardened for ClickHouse, Cassandra and Oracle; the XML
      sink emits only valid element names; the MongoDB source no longer drops a row
      when one value fails to convert.
  • Transforms and SQL:
    • Window aggregate (aggwin) with an Order by keeps the per-partition total
      instead of silently becoming a running total.
    • INTERSECT / EXCEPT match by column name; NTILE uses the requested bucket
      count; rank direction is parsed correctly.
    • Denormalize, array-collect and JSON array-agg now produce a deterministic
      element order.
    • An aggregation on a named column with no function is rejected instead of
      silently becoming a row count.
  • Write-mode safety:
    • A SQLite / DuckDB sink with an unrecognised write mode (a typo like "appnd")
      now errors instead of dropping and recreating the table.
    • Upsert with blank conflict columns is rejected; a database port above 65535
      fails a range check instead of wrapping around to the wrong port.
  • Hardening:
    • Closed several secret-leak paths in SQL export, the MCP server and git push
      output; the per-workspace secret key file is created owner-only.
    • UTF-8 panic guards in CSV type sniffing and the Map node, a size cap on the
      webhook body, temp-file cleanup, atomic config writes, and safer git
      clone/checkout argument handling.
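
The "written to a temp file and renamed into place" pattern used for atomic downloads and config writes can be sketched as follows. This is a generic illustration of the technique with a hypothetical function name, not the code Duckle ships.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(target: Path, data: bytes) -> None:
    """Write data to a temp file beside target, then rename it into place."""
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in as a single step on the same filesystem.
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the partial temp file so nothing half-written is left behind.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader of `target` sees either the old complete file or the new complete file, never a partial write.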

2026-06-12 re-roll - secrets at rest, connector correctness, desktop reliability

  • Secrets at rest: saved connection secrets (passwords, tokens, keys) and the
    cached git token are now encrypted with a per-workspace AES-256-GCM key under
    .duckle/keys/. Only secret fields are encrypted; host, database and user
    names stay readable. The key is gitignored, so a committed connections/
    folder can be shared without exposing credentials. Existing plaintext values
    are encrypted on the next save, and ${ENV:...} placeholders are never
    encrypted.
  • Connector correctness:
    • Databricks sink no longer counts a still-running or failed write as
      success: it inspects the statement state, polls it to completion and fails
      loudly on error.
    • RabbitMQ source acknowledges messages only after the batch is durably
      written, so a write failure leaves them queued for redelivery instead of
      dropping them.
    • Webhook source answers 200 only after the batch is persisted (503 on
      failure), so a sender never treats a never-stored event as delivered.
    • The Avro sink infers a nullable schema, so a null in any column no longer
      aborts the whole file.
    • The XML sink splits a literal ]]> across two CDATA sections when writing
      nested values, keeping the output well-formed.
    • The MongoDB sink delete propagation matches boolean and numeric flag
      columns, not only strings.
    • Snowflake and Databricks delete propagation escapes backslashes in the
      delete value so it matches the source value.
    • The Snowflake source errors clearly on an unnamed result column instead of
      risking misaligned columns.
  • Desktop reliability:
    • A cached git token that cannot be decrypted (missing workspace key) now
      reports a clear error instead of a confusing push-authentication failure.
    • The local model download integrity check no longer deletes a valid
      download on a transient read error.
    • Auto Layout now persists the new node positions.
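
The CDATA fix for a literal ]]> relies on a standard XML technique: close the section after "]]" and reopen it before ">". A minimal sketch, with a hypothetical helper name and element name, checked by parsing the result with Python's standard library:

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any literal ']]>' across two sections."""
    # ']]>' would end the section early, so close after ']]' and reopen before '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


value = "a ]]> b"
document = f"<cell>{to_cdata(value)}</cell>"
# The parser joins adjacent CDATA sections back into one text value.
parsed = ET.fromstring(document).text
```

After parsing, `parsed` equals the original `value`, so the split is invisible to a reader of the file.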

2026-06-12 re-roll

  • Engine correctness pass:
    • Cloud (S3 / GCS / Azure / HTTP) JSON sources now honour the Records path
      option and the 100 MB object-size cap, same as local JSON sources.
    • Inline dbt models with a non-identifier name (for example my-model) no
      longer fail with table-not-found on read-back; the written and read names
      now always agree.
    • A Distinct node with an Order by but no key columns now errors instead of
      silently dropping the ordering.
    • The GCP Pub/Sub source persists rows before acknowledging them, so a
      failure no longer loses the batch.
    • A REST response of {} now yields zero rows (like []), not one empty row.
    • BigQuery ATTACH escapes its project / dataset values.
  • Multi-source guards and dbt:
    • Wiring a second input into a join / diff / SCD lookup port now errors
      (it was silently dropped). Map nodes still accept multiple lookups.
    • A dbt node exposes all its upstream tables to the project as the list
      variable duckle_inputs (each is also a real table that dbt reads via sources).
  • App and community:
    • The native webview right-click menu (Back / Reload / Print) is suppressed
      on the app header, footer and chrome. The canvas and Projects tree keep
      their own menus, and text fields keep copy / paste.
    • A "Report a bug" button in the status bar and a Discord banner in the
      README link to the community server.
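
The REST rule above ({} yields zero rows, like []) can be expressed as a small normalization step. The sketch below is an assumption about the shape of such a rule, including the treatment of a non-empty object as a single row; it is not taken from Duckle's source.

```python
from typing import Any


def rows_from_response(body: Any) -> list[Any]:
    """Normalize a decoded JSON body into a list of rows."""
    if isinstance(body, list):
        return list(body)
    if isinstance(body, dict):
        # An empty object carries no fields, so treat it like an empty array.
        return [body] if body else []
    raise TypeError(f"unsupported response body: {type(body).__name__}")
```

With this rule, `rows_from_response({})` and `rows_from_response([])` both return an empty list.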

2026-06-11 re-roll

  • Multiple DuckDB sources reading from the same database file in one pipeline
    no longer fail with the error 'database with name "duckle_src" already
    exists'. Each attach-backed stage releases its alias so any number can target
    one file.
  • History tab (#29): the tab rendered with a transparent background, so the
    canvas showed through; it now has an opaque background like the Run and Plan
    tabs.
  • JSON source Records path (#27): the plain JSON source exposes the Records
    path option too, not just the JSONL source.
  • Saved connections in node parameters (#30): a kind-filtered "Saved
    connection" picker at the top of each credential block auto-fills the
    fields below.
  • Copy and paste components (#28): Ctrl/Cmd+C and Ctrl/Cmd+V copy and paste
    components within a pipeline or across pipelines.

Highlights

dbt, rebuilt around speed

  • dbt Fusion is now the default engine: a Rust dbt runtime that parses and
    builds projects far faster than the Python toolchain. dbt Core
    (Apache-licensed, via dbt-duckdb) stays as an automatic fallback when Fusion
    is unavailable.
  • Free first-launch provisioning: Duckle fetches and sets up its dbt engine
    for you on first use. No Python setup, no separate install, no cost.
  • Multi-source dbt models: an xf.dbt node accepts several upstream inputs at
    once. Each wired source materializes a real table named by its node id, and
    your models read them through dbt sources - cross-system modeling inside
    one dbt build.
  • A flagship Customer 360 example ships in the gallery: six sources across
    four system types feed a dbt build (staging, intermediate and mart models
    with tests), then enrich and fan out to four sinks.

JSON ingestion: nested record paths

  • The JSON and JSONL sources have a Records path option. Point it at the key
    that holds the record array in a REST envelope (for example data, or a
    dotted path like response.records) and Duckle unnests it into proper
    columns, recursively flattening nested structs.
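
To make the Records path idea concrete, the sketch below follows a dotted path to the record array and flattens nested structs into columns. It is an illustration in plain Python, not Duckle's engine logic; the function names and the underscore-joined column names are assumptions.

```python
from typing import Any


def records_at(document: Any, path: str) -> list[Any]:
    """Follow a dotted path such as 'response.records' to the record array."""
    node = document
    for key in path.split("."):
        node = node[key]
    if not isinstance(node, list):
        raise TypeError(f"{path!r} does not point at an array")
    return node


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Recursively flatten nested dicts into one level of columns."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested struct: recurse, joining names with an underscore (an assumption).
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Hypothetical REST envelope for illustration.
envelope = {
    "response": {
        "records": [{"id": 1, "address": {"city": "Oslo", "geo": {"lat": 59.9}}}]
    }
}
rows = [flatten(r) for r in records_at(envelope, "response.records")]
```

Each element of `rows` is a flat dict whose keys play the role of column names.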

Native brand icons across the palette

  • Sources, sinks and SaaS connectors show full-colour brand logos on the
    canvas, in the palette and in quick-add search. Transforms keep their
    themed glyphs.

Local multi-account profiles

  • Create named profiles with an optional picture, shown top-right, each bound
    to its own workspace folder, and switch between them for quick context
    switching. Stored only on this device, never transmitted, no password.

Canvas and panels

  • The right-hand properties panel is collapsible to the edge, with the state
    remembered between sessions.
  • Each pipeline remembers its own canvas viewport (pan and zoom).

Operations (Phase 2)

  • Structured error taxonomy in the engine for clearer, categorised failures.
  • Prometheus metrics: Duckle writes a .prom textfile under the workspace logs/
    directory for node_exporter's textfile collector or Grafana Alloy (no HTTP
    endpoint, no agent - works headless and air-gapped).
  • Backfill controls: inspect and reset incremental watermarks from the UI and
    the runner.
  • A Runs history tab to review past executions and trends.
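
For readers unfamiliar with the textfile approach to Prometheus metrics, the sketch below renders two metrics in the Prometheus text exposition format and writes them to a .prom file via a rename. The metric names, function names and file name are hypothetical; the metrics Duckle actually emits are not listed in these notes.

```python
import os
from pathlib import Path


def render_metrics(runs_total: int, last_duration_seconds: float) -> str:
    """Render two hypothetical metrics in the Prometheus text exposition format."""
    return (
        "# HELP example_pipeline_runs_total Pipeline runs completed.\n"
        "# TYPE example_pipeline_runs_total counter\n"
        f"example_pipeline_runs_total {runs_total}\n"
        "# HELP example_last_run_duration_seconds Duration of the last run.\n"
        "# TYPE example_last_run_duration_seconds gauge\n"
        f"example_last_run_duration_seconds {last_duration_seconds}\n"
    )


def write_textfile(path: Path, content: str) -> None:
    """Write to a sibling .tmp file, then rename, so a scrape never sees a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(content, encoding="utf-8")
    os.replace(tmp, path)
```

A collector that watches the directory for *.prom files ignores the temporary file because of its different suffix.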

Reliability

  • A single corrupt or hand-edited workspace JSON file no longer blocks the
    whole workspace from loading; the bad file is skipped and named, and good
    files on disk are protected from being overwritten.
  • All helper subprocesses (dbt, git, shell, the DuckDB CLI) run without
    flashing a console window on Windows.
  • Engine staging is resilient when a binary is locked by a running instance:
    Duckle stages the update aside and swaps it in rather than failing with an
    access error.

Install

Download the raw executable for your OS below and run it - no installer.

Upgrading

This is a drop-in upgrade from v0.2.0. Existing pipelines and connections work
unchanged. The dbt engine provisions itself the first time you run a dbt node.