Dagu DuckDB Action

Official Dagu action for running DuckDB SQL through the DuckDB CLI.

This action keeps DuckDB out of the Dagu core binary, so Dagu can remain portable and cgo-free. The action pins the DuckDB CLI with Dagu tools, which uses aqua internally.

Usage

type: graph

steps:
  - id: query
    action: duckdb@v1
    with:
      query: |
        SELECT 42 AS answer, 'duckdb' AS engine;

  - id: print_result
    depends: [query]
    run: printf '%s\n' '${query.outputs.result}'

The default output format is DuckDB JSON mode, so result is a JSON string:

[{"answer":42,"engine":"duckdb"}]

Existing DuckDB Files

Use database to run SQL against an existing DuckDB file:

type: graph

steps:
  - id: query_existing_db
    action: duckdb@v1
    with:
      database: /data/analytics.duckdb
      query: |
        SELECT count(*) AS users FROM users;

Use workdir when the database path or files referenced by SQL should be resolved relative to a directory:

type: graph

steps:
  - id: query_project_db
    action: duckdb@v1
    with:
      workdir: /data/project
      database: analytics.duckdb
      query: |
        SELECT * FROM read_csv_auto('events.csv') LIMIT 10;

The database file must exist on the worker that runs the action. In distributed shared-nothing mode, use a shared mount or an absolute path available on that worker. :memory: is scoped to one action invocation, so it cannot share state between multiple action steps.

Multiple Operations

For tightly coupled operations, run multiple SQL statements in one action. This keeps them in one DuckDB process and lets you use a transaction boundary:

type: graph

steps:
  - id: update_metrics
    action: duckdb@v1
    with:
      database: /data/analytics.duckdb
      query: |
        BEGIN TRANSACTION;

        CREATE TABLE IF NOT EXISTS metrics (
          name VARCHAR,
          value INTEGER
        );

        INSERT INTO metrics VALUES ('runs', 1);

        UPDATE metrics
        SET value = value + 1
        WHERE name = 'runs';

        COMMIT;

        SELECT * FROM metrics;

For separate DAG visibility, use multiple action steps against the same database file and connect them with depends:

type: graph

steps:
  - id: insert_rows
    action: duckdb@v1
    with:
      database: /data/analytics.duckdb
      query: |
        INSERT INTO metrics VALUES ('jobs', 10);

  - id: update_rows
    depends: [insert_rows]
    action: duckdb@v1
    with:
      database: /data/analytics.duckdb
      query: |
        UPDATE metrics SET value = value + 5 WHERE name = 'jobs';

  - id: select_rows
    depends: [update_rows]
    action: duckdb@v1
    with:
      database: /data/analytics.duckdb
      readonly: true
      query: |
        SELECT * FROM metrics WHERE name = 'jobs';

  - id: print_result
    depends: [select_rows]
    run: printf '%s\n' '${select_rows.outputs.result}'

Keep write operations ordered with depends. Parallel writes to the same DuckDB file can conflict because DuckDB uses file-level locking semantics.

Artifacts and Large Results

The result output is for small values that need to flow through Dagu variables: counts, IDs, status rows, or compact JSON. Do not use ${step.outputs.result} as a transport for large rowsets.

When the query result itself should be kept with the DAG run, stream DuckDB stdout directly to a run artifact:

type: graph

tools:
  - duckdb/duckdb@v1.5.2

steps:
  - id: export_rows
    run: |
      duckdb -batch -bail -no-stdin -csv /data/source.duckdb \
        -c "SELECT id, name, score FROM source_table WHERE score >= 80"
    stdout:
      artifact: exports/selected_rows.csv

This keeps the CSV out of Dagu output variables while making it available in the run's Artifacts tab.

You can then load that artifact file in a later DuckDB step when the artifact directory is readable by that step:

type: graph

tools:
  - duckdb/duckdb@v1.5.2

steps:
  - id: export_rows
    run: |
      duckdb -batch -bail -no-stdin -csv /data/source.duckdb \
        -c "SELECT id, name, score FROM source_table WHERE score >= 80"
    stdout:
      artifact: exports/selected_rows.csv

  - id: insert_rows
    depends: [export_rows]
    action: duckdb@v1
    with:
      database: /data/target.duckdb
      query: |
        INSERT INTO target_table
        SELECT *
        FROM read_csv_auto('${DAG_RUN_ARTIFACTS_DIR}/exports/selected_rows.csv');

For large or typed datasets, prefer writing Parquet from SQL:

steps:
  - id: export_parquet
    action: duckdb@v1
    with:
      database: /data/source.duckdb
      query: |
        COPY (
          SELECT id, name, score
          FROM source_table
          WHERE score >= 80
        )
        TO '${DAG_RUN_ARTIFACTS_DIR}/exports/selected_rows.parquet'
        (FORMAT parquet);

  - id: insert_parquet
    depends: [export_parquet]
    action: duckdb@v1
    with:
      database: /data/target.duckdb
      query: |
        INSERT INTO target_table
        SELECT *
        FROM read_parquet('${DAG_RUN_ARTIFACTS_DIR}/exports/selected_rows.parquet');

In distributed shared-nothing mode, an artifact path may be worker-local while the run is still executing. For cross-worker data handoff, use a shared mounted path, object storage, or keep the operation inside one DuckDB statement with ATTACH / INSERT INTO ... SELECT.

Inputs

Name	Type	Required	Default	Description
`query`	string	Yes		SQL passed to `duckdb -c`.
`database`	string	No	transient in-memory database	Database file path. Use an absolute path when the file lives outside the action workspace.
`workdir`	string	No	action workspace	Directory to `cd` into before running DuckDB. Use this when SQL references local files with relative paths.
`format`	string	No	`json`	Output format: `json`, `csv`, `table`, `markdown`, `line`, `list`, or `column`.
`readonly`	boolean	No	`false`	Open the database in read-only mode.

Outputs

Name	Type	Description
`result`	string	Raw DuckDB stdout in the selected format.

Local Development

Use source: to call a local checkout:

steps:
  - id: query
    action: source:file:///path/to/duckdb@local
    with:
      query: SELECT 1 AS ok;

Remote actions run in their own action workspace. If a query needs files from a caller workspace, pass workdir and use paths that exist on the worker running the action.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
examples		examples
scripts		scripts
.gitignore		.gitignore
README.md		README.md
dagu-action.yaml		dagu-action.yaml
workflow.yaml		workflow.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Dagu DuckDB Action

Usage

Existing DuckDB Files

Multiple Operations

Artifacts and Large Results

Inputs

Outputs

Local Development

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Dagu DuckDB Action

Usage

Existing DuckDB Files

Multiple Operations

Artifacts and Large Results

Inputs

Outputs

Local Development

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages