[python] support blob type and blob write and read #6390
Branch updated from ed4c2fb to a4de958.
Branch updated from a4de958 to d0eadd9.
Pull Request Overview
This PR adds comprehensive support for the BLOB (Binary Large Object) data type in the Python Paimon library, including data structures, I/O operations, and format handling.
- Implements the BLOB data type with BlobData and BlobRef classes for in-memory and reference-based storage
- Adds a blob-specific file format writer and reader with compression and indexing
- Integrates BLOB support into the existing serialization, type conversion, and file I/O systems
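To make the in-memory vs. reference-based distinction concrete, here is a minimal sketch. Only the class names BlobData and BlobRef come from the PR; the Blob base class, the `to_data()` method, and the `(path, offset, length)` fields of the reference are assumptions for illustration, not the PR's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class Blob(ABC):
    """Hypothetical common interface: every blob can be materialized as bytes."""

    @abstractmethod
    def to_data(self) -> bytes:
        ...


@dataclass(frozen=True)
class BlobData(Blob):
    """In-memory blob: the bytes are held directly in the object."""
    data: bytes

    def to_data(self) -> bytes:
        return self.data


@dataclass(frozen=True)
class BlobRef(Blob):
    """Reference-based blob (hypothetical fields): a pointer to a byte range in a file."""
    path: str
    offset: int
    length: int

    def to_data(self) -> bytes:
        # Read only the referenced byte range, and only when asked for it.
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read(self.length)
        if len(data) != self.length:
            raise EOFError(f"expected {self.length} bytes, got {len(data)}")
        return data
```

The design point is that a reference stays cheap to pass around and serialize, while the payload is loaded lazily; an in-memory blob trades that for immediate access.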
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| blob.py | Core BLOB data structures and interface definitions |
| blob_format_writer.py | Writer for Paimon's blob file format with compression |
| format_blob_reader.py | Reader for blob files with decompression and indexing |
| generic_row.py | Serialization support for BLOB fields |
| data_types.py | Type system integration for BLOB |
| file_io.py | File I/O operations for blob format |
| delta_varint_compressor.py | Compression utility for blob index data |
| core_options.py | Configuration constant for blob format |
| split_read.py | Integration of blob reader into split reading |
| blob_test.py | Comprehensive test suite for all blob functionality |
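The table describes delta_varint_compressor.py as a compression utility for blob index data. A common scheme for integer index data is delta encoding followed by variable-length (LEB128-style) integers. The sketch below assumes that scheme, plus a zigzag mapping so negative deltas also encode compactly. The function names `compress`/`decompress` and the zigzag step are assumptions; Paimon's actual byte layout may differ.

```python
from typing import List


def compress(values: List[int]) -> bytes:
    """Delta-encode the values, zigzag-map each delta, then write it as a varint."""
    out = bytearray()
    prev = 0
    for v in values:
        delta = v - prev
        prev = v
        # Zigzag: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4 (small magnitudes stay small).
        z = 2 * delta if delta >= 0 else -2 * delta - 1
        # Varint: 7 bits per byte; a set high bit means "more bytes follow".
        while z >= 0x80:
            out.append((z & 0x7F) | 0x80)
            z >>= 7
        out.append(z)
    return bytes(out)


def decompress(data: bytes) -> List[int]:
    """Inverse of compress: decode varints, undo zigzag, then prefix-sum the deltas."""
    values: List[int] = []
    prev = 0
    z = shift = 0
    for b in data:
        z |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
            continue
        delta = z >> 1 if (z & 1) == 0 else -((z + 1) >> 1)
        prev += delta
        values.append(prev)
        z = shift = 0
    return values
```

For example, `compress([64])` yields `b"\x80\x01"`: the delta 64 zigzags to 128, which needs two varint bytes. Runs of similar-sized entries produce small deltas, which is why this kind of encoding suits index data.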
Branch updated from 25f7cfe to 9a9ce9c.
Pull Request Overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
```python
bin_length = self.position - previous_pos + 12
self.lengths.append(bin_length)
```
Copilot (AI) commented on Oct 14, 2025:
The magic number 12 represents the combined size of length (8 bytes) and CRC (4 bytes) fields. Consider defining this as a named constant like METADATA_SIZE = 12 to improve code clarity and maintainability.
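Applying that suggestion might look like the sketch below. Only `METADATA_SIZE`, `bin_length`, `position`, `previous_pos`, and `lengths` come from the review. The `BlobRecordWriter` class, the record layout (payload, then an 8-byte length, then a 4-byte CRC32), little-endian encoding, storing the payload length in the length field, and computing the CRC over the payload are all hypothetical.

```python
import io
import struct
import zlib
from typing import BinaryIO, List

# Hypothetical record layout: [payload][8-byte length][4-byte CRC32]
LENGTH_SIZE = 8
CRC_SIZE = 4
METADATA_SIZE = LENGTH_SIZE + CRC_SIZE  # 12, the former magic number


class BlobRecordWriter:
    """Hypothetical writer that tracks each record's total on-disk length."""

    def __init__(self, out: BinaryIO) -> None:
        self.out = out
        self.position = 0
        self.lengths: List[int] = []

    def write_blob(self, payload: bytes) -> None:
        previous_pos = self.position
        self.out.write(payload)
        self.position += len(payload)
        # Total record size = payload bytes + length field + CRC field.
        bin_length = self.position - previous_pos + METADATA_SIZE
        # Trailer (assumed): payload length, then CRC32 of the payload.
        self.out.write(struct.pack("<q", len(payload)))
        self.out.write(struct.pack("<I", zlib.crc32(payload)))
        self.position += METADATA_SIZE
        self.lengths.append(bin_length)
```

Splitting the constant into `LENGTH_SIZE` and `CRC_SIZE` documents where 12 comes from, so a later format change touches one definition rather than every arithmetic site.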
Pull Request Overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Branch updated from ac5df56 to 23b5e5c.
Pull Request Overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
Branch updated from 87e4747 to 297d764.
Branch updated from b9c511e to 5f47706.
Branch updated from 5f47706 to d70eca7.
leaves12138 left a comment:
+1
Purpose
Support blob type and blob write and read
Tests
API and Format
Documentation