This repository was archived by the owner on May 17, 2024. It is now read-only.

Add support for DuckDB #176

@extrobe

DuckDB is an in-process database. You typically create it as a session, then discard it once you're done (though that's not the only way to use it).

https://duckdb.org

It's awesome for a few reasons that apply to data-diff. Namely, you can query raw csv/txt/parquet files directly, as though they were tables (e.g. select posting_date, count(*) as r_count from '/Users/me/data.csv' group by posting_date).
We use this ability to load PROD vs UAT files from our system and compare their output. Being able to pass this across to data-diff would be incredible.

Whilst simply being able to reference csv files in data-diff might be another option, doing this via DuckDB would let you perform some basic transformations along the way, such as renaming fields or selecting a reduced range.

Metadata

    Labels

    enhancement (New feature or request)
    new-db-driver (Request to add a new database driver)
