Search before asking
Paimon version
1.4.1
Compute Engine
PyPaimon
Minimal reproduce step
DataBlobWriter._split_data always selects the full normal and blob column lists from the table schema. When a write is narrowed with TableWrite.with_write_type, each batch contains only the narrowed columns, so pa.RecordBatch.select(...) can reference column names that are missing from the batch and raise KeyError when the partial write includes blob columns.
What doesn't meet your expectations?
A partial write that includes blob columns should succeed. Proposed fix: pass write_cols from FileStoreWrite into DataBlobWriter and narrow normal_column_names and blob_file_column_names to that subset, opening blob writers only for the blob columns in the subset, so that _split_data selects only columns present in the batch.
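The narrowing step could look roughly like the sketch below. This is not the PyPaimon implementation: the helper name narrow_columns and its signature are hypothetical, and the only assumption taken from this report is that write_cols is either None (full write) or the list of column names given to with_write_type.

```python
from typing import List, Optional, Sequence, Tuple


def narrow_columns(
    normal_column_names: Sequence[str],
    blob_column_names: Sequence[str],
    write_cols: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: restrict both column lists to write_cols.

    Schema order is preserved. When write_cols is None the full lists
    are returned unchanged, which matches a non-partial write.
    """
    if write_cols is None:
        return list(normal_column_names), list(blob_column_names)

    wanted = set(write_cols)
    normal = [name for name in normal_column_names if name in wanted]
    blob = [name for name in blob_column_names if name in wanted]
    return normal, blob


# Example: the schema has two normal and two blob columns, but the
# partial write only carries "id" and "image".
normal, blob = narrow_columns(
    ["id", "name"], ["image", "video"], write_cols=["id", "image"]
)
```

With lists narrowed this way, a later select on the batch only names columns the batch actually contains, and a blob writer would only be opened for "image".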
Anything else?
No response
Are you willing to submit a PR?