
[python][daft] Make Daft Paimon write sink serializable #8022

Merged: JingsongLi merged 1 commit into apache:master from kerwin-zk:fix-daft-write-sink-serializable on May 29, 2026

@kerwin-zk (Contributor) commented May 28, 2026

Purpose

PaimonDataSink currently keeps the FileStoreTable and WriteBuilder directly in the sink object. When Daft runs with the Ray runner, the sink needs to be serialized and sent to workers. For OSS/Jindo tables, the table can indirectly hold PyArrow/Jindo filesystem objects that are not picklable, causing Ray serialization failures.

  Checking PaimonDataSink daft.pickle roundtrip ...
  Traceback (most recent call last):
    File ".../test_paimon_daft_filesystem.py", line 72, in <module>
      restored_sink = loads(dumps(PaimonDataSink(table, mode="overwrite")))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ...
    File "<stringsource>", line 2, in _fs.FileSystem.__reduce_cython__
  TypeError: no default __reduce__ due to non-trivial __cinit__
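The failure can be reproduced in miniature with plain `pickle`, which here stands in for the serializer Daft and Ray actually use. This is a minimal sketch that assumes a `threading.Lock` as a stand-in for the native filesystem handle: like the Cython `_fs.FileSystem` in the traceback, it refuses to pickle, although the error message differs. `HypotheticalFileSystem`, `HypotheticalTable`, and `NaiveSink` are illustrative names, not Paimon or Daft classes.

```python
import pickle
import threading


class HypotheticalFileSystem:
    """Stand-in for a native filesystem handle (e.g. a PyArrow/Jindo FS).

    A threading.Lock is used only because, like the Cython filesystem in the
    traceback, it cannot be pickled.
    """

    def __init__(self) -> None:
        self._handle = threading.Lock()


class HypotheticalTable:
    """Stand-in for a table that indirectly holds the filesystem via its FileIO."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.file_io = HypotheticalFileSystem()


class NaiveSink:
    """Keeps the table directly, so pickling walks into the filesystem handle."""

    def __init__(self, table: HypotheticalTable, mode: str = "append") -> None:
        self.table = table
        self.mode = mode


def roundtrip(obj):
    """Serialize and deserialize, roughly what shipping to a Ray worker requires."""
    return pickle.loads(pickle.dumps(obj))


try:
    roundtrip(NaiveSink(HypotheticalTable("oss://bucket/db/t"), mode="overwrite"))
except TypeError as exc:
    # The error surfaces from the nested lock, not from the sink itself.
    print(f"pickling failed: {exc}")
```

The sink itself holds nothing exotic; the error comes from an object two levels down, which is why it is easy to miss until the Ray runner tries to ship the sink.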

This PR makes the Daft Paimon write sink serializable by storing only reconstructable table state when the sink is pickled.
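The general technique is Python's `__getstate__`/`__setstate__` protocol: pickle plain data describing the table, and rebuild the table on the receiving side. A minimal sketch, assuming a hypothetical `load_table(path)` loader and the same lock-based stand-in for the unpicklable filesystem; none of these names are the actual pypaimon or Daft API.

```python
import pickle
import threading


class HypotheticalTable:
    """Stand-in table whose FileIO holds an unpicklable native handle."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fs_handle = threading.Lock()  # cannot be pickled


def load_table(path: str) -> HypotheticalTable:
    """Hypothetical loader: rebuilds the table (and a fresh filesystem) from its path."""
    return HypotheticalTable(path)


class SerializableSink:
    def __init__(self, table: HypotheticalTable, mode: str = "append") -> None:
        self.table = table
        self.mode = mode

    def __getstate__(self) -> dict:
        # Ship only plain, reconstructable data; never the table itself.
        return {"table_path": self.table.path, "mode": self.mode}

    def __setstate__(self, state: dict) -> None:
        # Rebuild the table on the receiving side, creating a new filesystem there.
        self.mode = state["mode"]
        self.table = load_table(state["table_path"])


original = SerializableSink(HypotheticalTable("oss://bucket/db/t"), "overwrite")
restored = pickle.loads(pickle.dumps(original))
print(restored.table.path, restored.mode)  # oss://bucket/db/t overwrite
```

Because `__setstate__` runs in the receiving process, each worker opens its own filesystem handle instead of trying to copy the driver's.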

Tests

CI

@YannByron self-assigned this May 28, 2026
@leaves12138 (Contributor) left a comment

LGTM.

I reviewed the pickle/reconstruction path for PaimonDataSink. The state now avoids carrying the table/FileIO objects, reconstructs the table from catalog options + identifier (or table path fallback), and preserves the commit user / overwrite mode when rebuilding the write builder. The changed files compile and the PR checks are green.
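The reconstruction path described in this review can be sketched as follows: rebuild from catalog options plus identifier, fall back to the table path, and carry the commit user and overwrite mode into the rebuilt write builder. Every name here (`table_from_catalog`, `table_from_path`, `HypotheticalWriteBuilder`, `PaimonLikeSink`) is hypothetical and stands in for whatever the merged pypaimon code actually uses. The `"<warehouse>/<identifier>"` path rule in `table_from_catalog` is an assumption for illustration only.

```python
import pickle
import threading
import uuid
from dataclasses import dataclass
from typing import Optional


class HypotheticalTable:
    """Stand-in table; its native filesystem handle cannot be pickled."""

    def __init__(self, path: str, catalog_options: Optional[dict] = None,
                 identifier: Optional[str] = None) -> None:
        self.path = path
        self.catalog_options = catalog_options
        self.identifier = identifier
        self._fs_handle = threading.Lock()  # unpicklable, like a native FileIO


def table_from_catalog(options: dict, identifier: str) -> HypotheticalTable:
    # Hypothetical: assumes the path is simply "<warehouse>/<identifier>".
    return HypotheticalTable(f"{options['warehouse']}/{identifier}", options, identifier)


def table_from_path(path: str) -> HypotheticalTable:
    # Hypothetical fallback when no catalog information is available.
    return HypotheticalTable(path)


@dataclass
class HypotheticalWriteBuilder:
    table: HypotheticalTable
    commit_user: str
    overwrite: bool


class PaimonLikeSink:
    def __init__(self, table: HypotheticalTable, mode: str = "append",
                 commit_user: Optional[str] = None) -> None:
        self.table = table
        self.mode = mode
        self.commit_user = commit_user or str(uuid.uuid4())
        self.write_builder = self._build_writer()

    def _build_writer(self) -> HypotheticalWriteBuilder:
        # Rebuilt from plain fields so the builder itself is never pickled.
        return HypotheticalWriteBuilder(self.table, self.commit_user, self.mode == "overwrite")

    def __getstate__(self) -> dict:
        # Only strings and dicts: enough to rebuild, nothing that holds FileIO.
        return {
            "catalog_options": self.table.catalog_options,
            "identifier": self.table.identifier,
            "table_path": self.table.path,
            "mode": self.mode,
            "commit_user": self.commit_user,
        }

    def __setstate__(self, state: dict) -> None:
        # Prefer catalog options + identifier; fall back to the table path.
        if state["catalog_options"] is not None and state["identifier"] is not None:
            self.table = table_from_catalog(state["catalog_options"], state["identifier"])
        else:
            self.table = table_from_path(state["table_path"])
        self.mode = state["mode"]
        self.commit_user = state["commit_user"]  # keep the original commit user
        self.write_builder = self._build_writer()


sink = PaimonLikeSink(table_from_catalog({"warehouse": "oss://bucket/wh"}, "db.t"), mode="overwrite")
restored = pickle.loads(pickle.dumps(sink))
print(restored.table.path, restored.write_builder.overwrite, restored.commit_user == sink.commit_user)
# oss://bucket/wh/db.t True True
```

In this sketch the receiving side needs the same catalog configuration (or access to the table path) as the driver, because it reopens the table rather than receiving it.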

@JingsongLi merged commit 59722ab into apache:master on May 29, 2026
10 of 11 checks passed