[Bug] DataBlobWriter._split_data raises KeyError when a partial write includes blob columns #7849

@SteNicholas

Description

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

1.4.1

Compute Engine

PyPaimon

Minimal reproduce step

DataBlobWriter._split_data always selects the full normal and blob column lists from the table schema. When a write is narrowed with TableWrite.with_write_type, each batch contains only the narrowed columns, so pa.RecordBatch.select(...) can reference column names that are missing from the batch and raise KeyError when a partial write includes blob columns.

What doesn't meet your expectations?

A partial write that includes blob columns should succeed. Proposed fix: pass write_cols from FileStoreWrite into DataBlobWriter, narrow normal_column_names and blob_file_column_names to that subset, and open blob writers only for the blob columns in the subset, so that each split selects only columns present in the batch.
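To illustrate the narrowing idea, here is a minimal pure-Python sketch. It is not the PyPaimon implementation: `narrow_columns`, `select`, and `split_data` are hypothetical helpers, the batch is modeled as a plain dict instead of a `pa.RecordBatch`, and the column names are made up. The stand-in `select` only mimics the behavior reported above, where selecting a missing column raises KeyError.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Batch = Dict[str, list]  # stand-in for pa.RecordBatch: column name -> values


def narrow_columns(
    normal_column_names: Sequence[str],
    blob_column_names: Sequence[str],
    write_cols: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Restrict both column lists to write_cols, keeping schema order."""
    if write_cols is None:
        # Full write: keep every column from the table schema.
        return list(normal_column_names), list(blob_column_names)
    wanted = set(write_cols)
    normal = [c for c in normal_column_names if c in wanted]
    blob = [c for c in blob_column_names if c in wanted]
    return normal, blob


def select(batch: Batch, names: Sequence[str]) -> Batch:
    """Hypothetical stand-in for RecordBatch.select: KeyError on a missing name."""
    return {name: batch[name] for name in names}


def split_data(
    batch: Batch,
    normal_column_names: Sequence[str],
    blob_column_names: Sequence[str],
    write_cols: Optional[Sequence[str]] = None,
) -> Tuple[Batch, Batch]:
    """Split a batch into its normal part and its blob part."""
    normal, blob = narrow_columns(normal_column_names, blob_column_names, write_cols)
    return select(batch, normal), select(batch, blob)


# Illustrative schema (assumed names): two normal columns, two blob columns.
NORMAL = ["id", "name"]
BLOB = ["picture", "video"]

# A partial write carries only the narrowed columns.
partial_batch = {"id": [1, 2], "picture": [b"a", b"b"]}

# Selecting the full schema lists fails on the first missing column.
try:
    split_data(partial_batch, NORMAL, BLOB)
    full_schema_failed = False
except KeyError:
    full_schema_failed = True

# Narrowing to write_cols first makes the split succeed.
normal_part, blob_part = split_data(
    partial_batch, NORMAL, BLOB, write_cols=["id", "picture"]
)
```

In this sketch the narrowed blob list (`["picture"]`) is also the natural place to decide which blob writers to open, which matches the intent of opening writers only for blob columns in the subset.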

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests