
Add managed import modes for CSV and Parquet ingestion #46

@daniel-thom

Summary

Add managed import modes so users can choose whether discovered CSV and Parquet files are mounted as source-backed views or materialized into DuckDB tables.

This should improve performance for CSV-heavy projects and make Parquet-backed projects easier to transport as a single DuckDB-based datasight project.

Problem

Today, users who start from flat files do not have an explicit, inspectable ingestion mode:

  • Queries over CSV data are often faster once the files are imported into DuckDB tables, because a source-backed view re-reads and re-parses the file on every query.
  • Parquet files are portable, but views that reference them by path make a project fragile to share or relocate: the references break if the source files do not travel with it.
  • The current project setup does not show users the tradeoff between a view and a table.

Proposed scope

  • Add an ingestion mode concept with auto, view, and table.
  • Expose this in the CLI for project setup and file-add flows.
  • Persist import metadata so users can tell how a table/view was created and from which source.
  • Materialize managed tables into the project DuckDB database with deterministic naming.
  • Make auto prefer table for CSV and view for Parquet unless a later design decision changes that default.

CLI sketch

datasight init --import-mode auto
datasight add generation.csv --import-mode table --table-name generation_fuel
datasight add data/*.parquet --import-mode view
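
For discussion, here is a minimal sketch of how these flags could be parsed. It uses `argparse` from the standard library purely for illustration; the actual datasight CLI may be built on a different framework. The rule that `--table-name` only applies to a single input file is an assumption, not something this issue specifies.

```python
import argparse

IMPORT_MODES = ("auto", "view", "table")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the CLI sketch above."""
    parser = argparse.ArgumentParser(prog="datasight")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init", help="set up a project")
    init.add_argument("--import-mode", choices=IMPORT_MODES, default="auto")

    add = sub.add_parser("add", help="add files to a project")
    add.add_argument("files", nargs="+")
    add.add_argument("--import-mode", choices=IMPORT_MODES, default="auto")
    add.add_argument("--table-name", default=None)
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    """Parse arguments and apply the cross-option check."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: an explicit name is ambiguous when several files are given.
    if args.command == "add" and args.table_name and len(args.files) > 1:
        parser.error("--table-name requires exactly one input file")
    return args
```

Restricting `--import-mode` with `choices` rejects typos at parse time, so the ingestion code only ever sees `auto`, `view`, or `table`.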

Acceptance criteria

  • Users can choose auto|view|table during relevant CLI ingestion flows.
  • CSV and Parquet ingestion paths both support managed-table materialization.
  • Imported objects carry source metadata that can be surfaced later in the UI and diagnostics.
  • The chosen import mode is visible to the user after ingest.
  • Tests cover CSV and Parquet imports for each supported mode.

Notes

  • This is the lowest-risk entry point into the broader “data prep and portability” roadmap.
  • Follow-on UI affordances can read from the same import metadata.

Labels: enhancement (New feature or request)
