[core] Support external-storage BLOB columns and MERGE INTO updates for descriptor-based BLOB columns #7328
Merged
JingsongLi merged 3 commits into apache:master on Mar 3, 2026
Force-pushed from fc2f6cf to 807fd1d.
JingsongLi (Contributor) reviewed on Mar 3, 2026 and left a comment:
Commented on the options.
    @Immutable
    public static final ConfigOption<String> BLOB_COPIED_DESCRIPTOR_FIELD =
            key("blob-copied-descriptor-field")
Contributor
blob-external-storage-field
    @Immutable
    public static final ConfigOption<String> BLOB_COPIED_DESCRIPTOR_TARGET_DIR =
            key("blob-copied-descriptor-target-dir")
Contributor
blob-external-storage-path
Support a third type of BLOB column: external-storage. When configured, raw BLOB data is written to a configured external storage path at write time and the BLOB value in the row is replaced with a BlobDescriptor pointing to the external data.

This enables a BLOB hierarchy in a single table:
- Raw BLOB: data stored in .blob files within the table
- Descriptor BLOB: data stored externally, descriptor inline
- External-storage BLOB: data written to external storage, descriptor inline

New options:
- blob-external-storage-field: comma-separated field names
- blob-external-storage-path: external storage path for raw data

External-storage fields must be a subset of blob-descriptor-field. Orphan file cleanup is not applied to the external storage path.
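The subset rule on the two field options can be illustrated with a small, self-contained check. This is only a sketch under stated assumptions: the class `BlobOptionCheck` and its methods are hypothetical names invented for illustration, not Paimon's actual validation code, and the options are modeled as a plain `Map<String, String>`.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not Paimon's actual validation code. */
final class BlobOptionCheck {

    /** Splits a comma-separated option value into trimmed, non-empty field names. */
    static Set<String> parseFields(String value) {
        if (value == null) {
            return new LinkedHashSet<>();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    /** Returns the external-storage fields that are missing from blob-descriptor-field. */
    static Set<String> fieldsNotInDescriptorSet(Map<String, String> options) {
        Set<String> descriptor = parseFields(options.get("blob-descriptor-field"));
        Set<String> external = parseFields(options.get("blob-external-storage-field"));
        external.removeAll(descriptor);
        return external;
    }

    /** Throws if any external-storage field is not also a descriptor field. */
    static void validate(Map<String, String> options) {
        Set<String> missing = fieldsNotInDescriptorSet(options);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                    "blob-external-storage-field must be a subset of blob-descriptor-field, "
                            + "but these fields are not: " + missing);
        }
    }
}
```

With `blob-descriptor-field` set to `image, video` and `blob-external-storage-field` set to `video`, `validate` returns normally; adding a field such as `audio` only to the external-storage option makes it throw.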
…-into.

Previously, any table with BLOB columns was rejected by DataEvolution merge-into. Now the validation checks whether the BLOB columns being written are raw-data or descriptor-based:
- Raw-data BLOB columns: update is still rejected
- Descriptor-based BLOB columns, including external-storage fields: update is allowed

Supported in both Flink (DataEvolutionMergeIntoAction) and Spark (DataEvolutionPaimonWriter).
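The refined merge-into rule can be sketched as a per-column check. The class `MergeIntoBlobCheck`, its method names, and the way column sets are passed in are all assumptions made for illustration; the real logic lives in DataEvolutionMergeIntoAction and DataEvolutionPaimonWriter and may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not the code in the Flink action or the Spark writer. */
final class MergeIntoBlobCheck {

    /**
     * Returns the updated columns that are BLOB columns storing raw data,
     * i.e. BLOB columns that are not listed as descriptor-based fields.
     * External-storage fields are a subset of the descriptor fields, so they pass.
     */
    static List<String> rawBlobColumnsIn(
            List<String> updatedColumns, Set<String> blobColumns, Set<String> descriptorFields) {
        List<String> rejected = new ArrayList<>();
        for (String column : updatedColumns) {
            if (blobColumns.contains(column) && !descriptorFields.contains(column)) {
                rejected.add(column);
            }
        }
        return rejected;
    }

    /** Throws if the update touches any raw-data BLOB column. */
    static void checkUpdatable(
            List<String> updatedColumns, Set<String> blobColumns, Set<String> descriptorFields) {
        List<String> rejected = rawBlobColumnsIn(updatedColumns, blobColumns, descriptorFields);
        if (!rejected.isEmpty()) {
            throw new UnsupportedOperationException(
                    "MERGE INTO cannot update raw-data BLOB columns: " + rejected);
        }
    }
}
```

The point of the design is that the check depends on the columns being written rather than on whether the table contains any BLOB column at all, which is what the earlier validation rejected.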
Force-pushed from 807fd1d to 5b49724.
Contributor
Author
@JingsongLi Thanks for the suggestion! I've updated the names for the two configs you mentioned, along with the related code, to ensure consistency. Additionally, I've updated the PR title, description, and commit message. PTAL~
JingsongLi (Contributor) reviewed on Mar 3, 2026 and left a comment:
Can you also update the documentation (blob.md)?
Contributor
Author
Thanks @JingsongLi, the documentation in blob.md has been updated.
JingsongLi (Contributor) approved these changes on Mar 3, 2026 and left a comment:
Thanks @JunRuiLee, looks good to me!
Purpose
This PR adds support for descriptor-based BLOB fields backed by external storage. For these fields, Paimon writes raw BLOB data to a configured external storage path at write time and stores only serialized BlobDescriptors inline in data files. The change also adds validation for the new external-storage BLOB options and verifies that raw-data BLOB fields, descriptor-based BLOB fields, and descriptor-based BLOB fields backed by external storage can coexist in the same table.
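The write-time behavior described above (raw bytes go to the external path, only a descriptor stays in the row) might be pictured roughly as follows. This is a sketch under loud assumptions: `ExternalBlobSketch` and its nested `Descriptor` class are hypothetical stand-ins, the `uri`/`offset`/`length` fields are an assumed descriptor shape rather than Paimon's BlobDescriptor API, and a local directory plays the role of the configured external storage path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical sketch of the write path; not Paimon's implementation. */
final class ExternalBlobSketch {

    /** Assumed descriptor shape: where the raw bytes live and how many there are. */
    static final class Descriptor {
        final String uri;
        final long offset;
        final long length;

        Descriptor(String uri, long offset, long length) {
            this.uri = uri;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Writes the raw BLOB bytes to a new file under the external directory and
     * returns a descriptor pointing at it. The caller would store the descriptor
     * in the row instead of the raw bytes.
     */
    static Descriptor writeExternally(Path externalDir, byte[] rawBlob) {
        try {
            Files.createDirectories(externalDir);
            // One file per BLOB value; the random name is an illustrative choice.
            Path target = externalDir.resolve(UUID.randomUUID() + ".blob");
            Files.write(target, rawBlob);
            return new Descriptor(target.toUri().toString(), 0L, rawBlob.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the data file holds only the descriptor, the external files are outside the table's own file layout, which is consistent with the note that orphan file cleanup is not applied to the external storage path.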
This PR also refines MERGE INTO validation for BLOB columns in Flink and Spark. Updates are still rejected for raw-data BLOB columns, but are now allowed for descriptor-based BLOB columns, including those backed by external storage.
Tests
UT:
IT:
API and Format
Documentation
Generative AI tooling