[python] Support null blob #7847

Merged
XiaoHongbo-Hope merged 6 commits into apache:master from XiaoHongbo-Hope:support_null_blob on May 14, 2026
@XiaoHongbo-Hope
Contributor

Purpose

Tests

Contributor

@leaves12138 left a comment

+1

@leaves12138
Contributor

leaves12138 commented May 14, 2026

Thanks for the patch. The null encoding/decoding is aligned with the Java blob format.

One edge case: BlobFormatWriter.write_value accepts col_data is None before validating that the field is actually a BLOB field. This means a non-BLOB single-column Arrow table with all-null values, e.g. pa.int64() or pa.string(), can now be written as a valid blob file, while non-null values of the same type are still rejected. Could we validate the field type before accepting nulls, or add a write_blob-level check that only large_binary is supported?
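To illustrate the ordering being asked for, here is a minimal sketch that validates the field type before accepting a null. It is not the actual `BlobFormatWriter.write_value` code: `Field`, `BLOB_TYPE`, and `check_blob_value` are hypothetical stand-ins, and the type is modelled as a plain string so the example runs without pyarrow. With pyarrow, the equivalent check would presumably be `pa.types.is_large_binary(field.type)`.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for pa.large_binary(); not a pypaimon constant.
BLOB_TYPE = "large_binary"


@dataclass
class Field:
    """Hypothetical minimal field description (name plus type name)."""
    name: str
    type: str


def check_blob_value(field: Field, value: Optional[bytes]) -> Optional[bytes]:
    """Validate the field type first, then accept a null or bytes value."""
    # Type check comes first, so an all-null int64 or string column is
    # rejected just like a non-null one would be.
    if field.type != BLOB_TYPE:
        raise TypeError(
            f"field {field.name!r} has type {field.type}, expected {BLOB_TYPE}"
        )
    # Only a genuine BLOB field may carry a null.
    if value is None:
        return None
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError(f"field {field.name!r} expects bytes or None")
    return bytes(value)
```

With this ordering, `check_blob_value(Field("n", "int64"), None)` raises `TypeError` instead of silently producing a blob file, which matches how non-null values of the same type are already handled.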

Also, the file_io.write_blob null cases currently only assert that the file exists. It would be better to add a public API round-trip test, e.g. write [b"data", None, b"more"] through write_blob and read it back with FormatBlobReader to verify the null value is preserved.
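The shape of such a round-trip test can be sketched as follows. Since the exact signatures of `write_blob` and `FormatBlobReader` are not shown in this thread, the sketch uses an in-memory toy codec (`toy_write` and `toy_read`, both hypothetical) in their place. The toy encoding, a length prefix with `-1` marking null, is an assumption made for illustration and is not the Paimon blob format.

```python
import struct
from typing import List, Optional

# Hypothetical null marker for the toy codec; not Paimon's encoding.
NULL_LEN = -1


def toy_write(values: List[Optional[bytes]]) -> bytes:
    """Encode each value as an 8-byte signed length followed by the payload."""
    out = bytearray()
    for v in values:
        if v is None:
            out += struct.pack("<q", NULL_LEN)
        else:
            out += struct.pack("<q", len(v))
            out += v
    return bytes(out)


def toy_read(data: bytes) -> List[Optional[bytes]]:
    """Decode the toy format back into a list of bytes or None."""
    values: List[Optional[bytes]] = []
    pos = 0
    while pos < len(data):
        (n,) = struct.unpack_from("<q", data, pos)
        pos += 8
        if n == NULL_LEN:
            values.append(None)
        else:
            values.append(data[pos:pos + n])
            pos += n
    return values


def test_round_trip_preserves_null() -> None:
    values = [b"data", None, b"more"]
    # Assert on the decoded content, not merely that output was produced.
    assert toy_read(toy_write(values)) == values
    # A null must stay distinguishable from an empty blob.
    assert toy_read(toy_write([b"", None])) == [b"", None]
```

In the real test, the two toy functions would be replaced by a write through `write_blob` and a read through `FormatBlobReader`; the assertions on the decoded values are the part that goes beyond checking that the file exists.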

@XiaoHongbo-Hope XiaoHongbo-Hope merged commit 26de5f7 into apache:master May 14, 2026
6 checks passed