docs: [RFC-98] Design doc of DSv2 read support #18276

Open
geserdugarov wants to merge 3 commits into apache:master from geserdugarov:dsv2-rfc

@geserdugarov geserdugarov commented Mar 4, 2026

Describe the issue this Pull Request addresses

Corresponding discussion thread: #13955
Proof of concept is available for this design: #18277

Summary and Changelog

Full design doc for the claimed "Spark Datasource V2 Read" support request: #13609

Impact

None. It will be done later. Design doc discussion only at this stage.

Risk Level

None. This changes the design doc only.

Documentation Update

None. It will be done later. Design doc discussion only at this stage.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Mar 4, 2026
The approach is hybrid: DSv2 for reads, DSv1 fallback for writes (`V2TableWithV1Fallback`).

The overall proposed architecture for this hybrid approach is shown in the following diagram:
Contributor

Are there any considerations for why hybrid mode is used here instead of migrating directly to v2?

Contributor Author

Clarified this in the "Implementation" chapter in 41997c4.

</td>
<td>
<pre>
df.write.format("hudi_v2").mode(...).save(path)
Contributor

This is not user-friendly though, and might bring more compatibility/migration burden in the long term.

Contributor Author

@geserdugarov geserdugarov Mar 5, 2026

I agree with you, but I don't see any alternatives. I want to add this to unblock incremental development of these huge changes, and in the end make DSv2 the default and only path. I've added this to the "Future Work" chapter in 41997c4.

@geserdugarov
Contributor Author

@yihua, @vinothchandar, if you don't mind, please review the approach proposed here.
