Skip to content

duckerlabs/ducker

Repository files navigation

ducker: [noun] a person or thing that ducks

🦆 Ducker

Small docker image with DuckDB and extensions pre-installed!

The extensions included and loaded are:

Quickstart

alias ducker='docker run --rm -i $([ ! -t 0 ] || echo "-t") -v $(pwd):/data -w /data duckerlabs/ducker'

then ducker gives you a DuckDB shell with the included extensions already enabled!

Test your setup with

echo "SELECT 42" | ducker

or get the first 5 lines of a csv file named "albums.csv", with the following PRQL query:

ducker -c 'from `albums.csv` | take 5;'

Config

If there is a .env file in the directory that you are calling ducker from, then that will be read in and added to the environment inside the container.

Furthermore, if there is a .duckdbrc file in the current directory, then it will be executed at startup after have any environment variable references substituted using the envsubst utility.

This means that for working with files on S3, having a .duckdbrc file like the following in your current directory allows you to specify your S3 credentials via a .env file.

set s3_endpoint='${S3_ENDPOINT}';
set s3_access_key_id='${S3_ACCESS_KEY_ID}';
set s3_secret_access_key='${S3_SECRET_ACCESS_KEY}';
set s3_use_ssl=${S3_USE_SSL};
set s3_region='${S3_REGION}';
set s3_url_style='${S3_URL_STYLE}';

Examples

We can use the example from the duckdb-prql extension.

We start ducker with:

ducker

As PRQL does not support DDL commands, we use SQL for defining our tables:

CREATE TABLE invoices AS SELECT * FROM
  read_csv_auto('https://raw.githubusercontent.com/PRQL/prql/main/prql-compiler/tests/integration/data/chinook/invoices.csv');
CREATE TABLE customers AS SELECT * FROM
  read_csv_auto('https://raw.githubusercontent.com/PRQL/prql/main/prql-compiler/tests/integration/data/chinook/customers.csv');

Then we can query using PRQL:

from invoices
filter invoice_date >= @1970-01-16
derive [
  transaction_fees = 0.8,
  income = total - transaction_fees
]
filter income > 1
group customer_id (
  aggregate [
    average total,
    sum_income = sum income,
    ct = count,
  ]
)
sort [-sum_income]
take 10
join c=customers [==customer_id]
derive name = f"{c.last_name}, {c.first_name}"
select [
  c.customer_id, name, sum_income
]

which returns:

┌─────────────┬─────────────────────┬────────────┐
│ customer_id │        name         │ sum_income │
│    int64    │       varchar       │   double   │
├─────────────┼─────────────────────┼────────────┤
│           6 │ Holý, Helena        │      43.83 │
│           7 │ Gruber, Astrid      │      36.83 │
│          24 │ Ralston, Frank      │      37.83 │
│          25 │ Stevens, Victor     │      36.83 │
│          26 │ Cunningham, Richard │      41.83 │
│          28 │ Barnett, Julia      │      37.83 │
│          37 │ Zimmermann, Fynn    │      37.83 │
│          45 │ Kovács, Ladislav    │      39.83 │
│          46 │ O'Reilly, Hugh      │      39.83 │
│          57 │ Rojas, Luis         │      40.83 │
├─────────────┴─────────────────────┴────────────┤
│ 10 rows                              3 columns │
└────────────────────────────────────────────────┘

Acknowledgements

This repo is adapted from https://github.com/davidgasquez/docker-duckdb.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published