Redesign IO format hierarchy before adding new DataFrame sources #450

@zaleslaw

Problem

The current hierarchy of SupportedCodeGenerationFormat and SupportedDataFrameFormat is hard to extend for new IO sources.

It works for most existing file-based formats, but adding non-file or metadata-rich sources such as JDBC requires workarounds and duplicated logic.

This makes future source integration harder and riskier.

Expected

Redesign the format hierarchy so new DataFrame sources can be added consistently.

Design scope

Clarify how the hierarchy should support:

  • file-based formats
  • non-file sources such as JDBC
  • metadata-based sources
  • code generation support
  • shared capabilities between sources
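
One possible shape for such a hierarchy, given here only as a discussion starter, is a small root interface plus independent capability interfaces that a source can combine. Every name below (`DataSource`, `FileBased`, `UrlBased`, `MetadataBased`, `CodeGenerating`, `CsvSource`, `JdbcSource`, `resolve`, `ColumnSchema`) is hypothetical and does not reflect the existing `SupportedCodeGenerationFormat` or `SupportedDataFrameFormat` API. The JDBC schema is a hard-coded placeholder rather than a real metadata query.

```kotlin
// Hypothetical sketch: none of these names exist in the library today.

/** Minimal schema stand-in for illustration: column name -> type name. */
typealias ColumnSchema = Map<String, String>

/** Root of the sketched hierarchy: every IO source only needs a name. */
interface DataSource {
    val name: String
}

/** Capability: the source is recognised by a file extension. */
interface FileBased : DataSource {
    val extensions: Set<String>
    fun acceptsExtension(ext: String): Boolean = ext.lowercase() in extensions
}

/** Capability: the source is addressed by a connection URL (for example JDBC). */
interface UrlBased : DataSource {
    fun acceptsUrl(url: String): Boolean
}

/** Capability: the schema can be obtained from metadata without reading rows. */
interface MetadataBased : DataSource {
    fun schemaFromMetadata(location: String): ColumnSchema
}

/** Capability: the source can produce a typed declaration for code generation. */
interface CodeGenerating : DataSource {
    fun generateDeclaration(typeName: String, schema: ColumnSchema): String =
        schema.entries.joinToString(
            separator = "\n",
            prefix = "interface $typeName {\n",
            postfix = "\n}",
        ) { (column, type) -> "    val $column: $type" }
}

/** A file-based format only declares its extensions. */
object CsvSource : FileBased, CodeGenerating {
    override val name = "CSV"
    override val extensions = setOf("csv", "tsv")
}

/** A non-file source combines URL matching with metadata-based schema discovery. */
object JdbcSource : UrlBased, MetadataBased, CodeGenerating {
    override val name = "JDBC"
    override fun acceptsUrl(url: String) = url.startsWith("jdbc:")

    // Placeholder assumption: a real implementation would query database metadata.
    override fun schemaFromMetadata(location: String): ColumnSchema =
        linkedMapOf("id" to "Int", "name" to "String")
}

/** Picks the first registered source whose capabilities match the location. */
fun resolve(sources: List<DataSource>, location: String): DataSource? =
    sources.firstOrNull { source ->
        (source is UrlBased && source.acceptsUrl(location)) ||
            (source is FileBased &&
                source.acceptsExtension(location.substringAfterLast('.', "")))
    }

fun main() {
    val sources: List<DataSource> = listOf(CsvSource, JdbcSource)
    val source = resolve(sources, "jdbc:postgresql://localhost/db")
    println(source?.name)
    if (source is MetadataBased && source is CodeGenerating) {
        println(source.generateDeclaration("Person", source.schemaFromMetadata("person")))
    }
}
```

With this layout, adding a new source would mean implementing only the capabilities it has, and shared behaviour such as `generateDeclaration` lives in one place. Whether capabilities are better modelled as interfaces, a sealed hierarchy, or something else is exactly what the design discussion would need to settle.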

Acceptance criteria

  • Current limitations of SupportedCodeGenerationFormat and SupportedDataFrameFormat are documented
  • New hierarchy/design is proposed and agreed
  • JDBC use case is covered by the design
  • Adding a new IO source has a clear extension path
  • Existing formats continue to work
  • Migration impact for existing APIs is checked before 1.0

Motivation

This should be done before 1.0 because the format hierarchy affects internal architecture and future source integrations.

After 1.0, changing this model may be harder due to API and compatibility constraints. Several new IO sources are expected, so the extension point should be clarified before release.

Labels: API (if it touches our API)
