Small docker image with DuckDB and extensions pre-installed!
The extensions included and loaded are:
- fts
- httpfs
- icu
- json
- parquet
- postgres_scanner
- sqlite_scanner
- substrait
- duckdb-prql
- This means
ducker
can quack SQL or PRQL!
- This means
alias ducker='docker run --rm -i $([ ! -t 0 ] || echo "-t") -v $(pwd):/data -w /data duckerlabs/ducker'
then ducker
gives you a DuckDB shell with the included extensions already enabled!
Test your setup with
echo "SELECT 42" | ducker
or get the first 5 lines of a csv file named "albums.csv", with the following PRQL query:
ducker -c 'from `albums.csv` | take 5;'
If there is a .env
file in the directory that you are calling ducker
from, then that will be read in
and added to the environment inside the container.
Furthermore, if there is a .duckdbrc
file in the current directory, then it will be executed at startup
after have any environment variable references substituted using the envsubst
utility.
This means that for working with files on S3, having a .duckdbrc
file like the following in your current
directory allows you to specify your S3 credentials via a .env
file.
set s3_endpoint='${S3_ENDPOINT}';
set s3_access_key_id='${S3_ACCESS_KEY_ID}';
set s3_secret_access_key='${S3_SECRET_ACCESS_KEY}';
set s3_use_ssl=${S3_USE_SSL};
set s3_region='${S3_REGION}';
set s3_url_style='${S3_URL_STYLE}';
We can use the example from the duckdb-prql extension.
We start ducker
with:
ducker
As PRQL does not support DDL commands, we use SQL for defining our tables:
CREATE TABLE invoices AS SELECT * FROM
read_csv_auto('https://raw.githubusercontent.com/PRQL/prql/main/prql-compiler/tests/integration/data/chinook/invoices.csv');
CREATE TABLE customers AS SELECT * FROM
read_csv_auto('https://raw.githubusercontent.com/PRQL/prql/main/prql-compiler/tests/integration/data/chinook/customers.csv');
Then we can query using PRQL:
from invoices
filter invoice_date >= @1970-01-16
derive [
transaction_fees = 0.8,
income = total - transaction_fees
]
filter income > 1
group customer_id (
aggregate [
average total,
sum_income = sum income,
ct = count,
]
)
sort [-sum_income]
take 10
join c=customers [==customer_id]
derive name = f"{c.last_name}, {c.first_name}"
select [
c.customer_id, name, sum_income
]
which returns:
┌─────────────┬─────────────────────┬────────────┐
│ customer_id │ name │ sum_income │
│ int64 │ varchar │ double │
├─────────────┼─────────────────────┼────────────┤
│ 6 │ Holý, Helena │ 43.83 │
│ 7 │ Gruber, Astrid │ 36.83 │
│ 24 │ Ralston, Frank │ 37.83 │
│ 25 │ Stevens, Victor │ 36.83 │
│ 26 │ Cunningham, Richard │ 41.83 │
│ 28 │ Barnett, Julia │ 37.83 │
│ 37 │ Zimmermann, Fynn │ 37.83 │
│ 45 │ Kovács, Ladislav │ 39.83 │
│ 46 │ O'Reilly, Hugh │ 39.83 │
│ 57 │ Rojas, Luis │ 40.83 │
├─────────────┴─────────────────────┴────────────┤
│ 10 rows 3 columns │
└────────────────────────────────────────────────┘
This repo is adapted from https://github.com/davidgasquez/docker-duckdb.