Summary
Add managed import modes so users can choose whether discovered CSV and Parquet files are mounted as source-backed views or materialized into DuckDB tables.
This should improve performance for CSV-heavy projects and make Parquet-backed projects easier to transport as a single DuckDB-based datasight project.
Problem
Today, users who start from flat files do not have an explicit, inspectable ingestion mode:
- CSV files often perform better when imported into DuckDB tables instead of being queried through repeated scans.
- Parquet files are portable, but source-backed references make project sharing and relocation more fragile.
- The current project setup does not make the tradeoff between view and table visible enough.
Proposed scope
- Add an ingestion mode concept with three values: auto, view, and table.
- Expose this in the CLI for project setup and file-add flows.
- Persist import metadata so users can tell how a table/view was created and from which source.
- Materialize managed tables into the project DuckDB database with deterministic naming.
- Make auto prefer table for CSV and view for Parquet unless a later design decision changes that default.
CLI sketch
datasight init --import-mode auto
datasight add generation.csv --import-mode table --table-name generation_fuel
datasight add data/*.parquet --import-mode view
Acceptance criteria
- Users can choose auto|view|table during relevant CLI ingestion flows.
- CSV and Parquet ingestion paths both support managed-table materialization.
- Imported objects carry source metadata that can be surfaced later in the UI and diagnostics.
- The chosen import mode is visible to the user after ingest.
- Tests cover CSV and Parquet imports for each supported mode.
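One possible shape for the import metadata referenced in the criteria above, as a sketch only. The ImportRecord fields, make_record, and describe are hypothetical names; where the record is stored (a metadata table in the project database, a sidecar file, or something else) is left open.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class ImportRecord:
    """Hypothetical per-object import metadata."""
    object_name: str     # name of the view or table in DuckDB
    object_type: str     # "view" or "table" (the resolved mode)
    requested_mode: str  # "auto", "view", or "table" as passed by the user
    source_path: str     # path of the originating file
    source_format: str   # "csv" or "parquet", taken from the file suffix
    imported_at: str     # ISO 8601 timestamp in UTC


def make_record(object_name, object_type, requested_mode, source_path,
                imported_at=None):
    """Build a record, stamping the current UTC time unless one is given."""
    if imported_at is None:
        imported_at = datetime.now(timezone.utc).isoformat()
    source_format = Path(source_path).suffix.lstrip(".").lower()
    return ImportRecord(object_name, object_type, requested_mode,
                        source_path, source_format, imported_at)


def describe(record: ImportRecord) -> str:
    """One-line summary that a CLI could print after ingest."""
    return (f"{record.object_name}: {record.object_type} "
            f"from {record.source_path} (requested mode: {record.requested_mode})")


def to_json(record: ImportRecord) -> str:
    """Serialize the record so the UI and diagnostics can read it later."""
    return json.dumps(asdict(record), sort_keys=True)
```

Storing both the requested mode and the resolved object type lets diagnostics distinguish an explicit table import from one chosen by auto.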
Notes
- This is the lowest-risk entry point into the broader “data prep and portability” roadmap.
- Follow-on UI affordances can read from the same import metadata.