[Feature] add binlog meta module (row type) (1/3)#62058
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Found 2 correctness issues.
be/src/storage/rowset_builder.cpp:GroupRowsetBuilder::commit_txn()only attaches the row-binlog rowset to local cleanup state, butTxnManagerstill persists/publishes a single rowset. In the normal publish path only the data rowset is made visible, so the companion row-binlog rowset is never added to tablet metadata and is also not recoverable across restart.be/src/storage/tablet/tablet.cpp:Tablet::create_initial_rowset()builds the initial row-binlog rowset withtablet_schema()instead ofrow_binlog_tablet_schema(), so the version-1 row-binlog rowset meta carries the base schema rather than the row-binlog schema.
Checkpoint Conclusions
- Goal / completion / proof: The goal is to add row-binlog schema/meta support. The FE schema/meta pieces are largely in place, but the BE implementation is not end-to-end correct because the paired row-binlog rowset does not participate in the normal txn publish lifecycle, and there is no test covering that path. The added UTs cover builder/schema shape only, not publish/bootstrap behavior.
- Small / focused change: The patch is feature-scoped but broad across FE, BE, thrift and proto. The main remaining problems are integration gaps rather than code organization.
- Concurrency: The modified paths do involve concurrent txn/publish flows. I did not find a new lock-order regression in the inspected code, but the rowset lifecycle across txn/publish is incomplete.
- Lifecycle management: Yes, special lifecycle management was added for the companion row-binlog rowset. This checkpoint fails: the extra rowset is tracked for local cleanup, but not for durable txn persistence/publish.
- Configuration items: New binlog format / historical-value options were added and FE/BE serialization appears wired through. I did not verify dynamic-update semantics beyond that.
- Compatibility: Proto/thrift/meta formats were extended. I did not prove a mixed-version break from the inspected code, but I also did not find explicit rolling-upgrade handling for the new feature.
- Parallel code paths: Yes. The initial-rowset/bootstrap path is a functionally parallel path to normal write/publish and currently uses the wrong schema for row-binlog rowsets.
- Special conditional checks: The new checks mostly follow existing style; no separate blocking issue here.
- Test coverage: Insufficient for a storage feature of this risk level. Missing coverage for commit/publish/recovery and initial-rowset creation with row binlog enabled.
- Observability: Existing logs are probably adequate for debugging the inspected paths; no blocking observability issue found.
- Transaction / persistence correctness: This checkpoint fails. The extra row-binlog rowset is not carried through the persisted txn state and publish flow.
- Data write / modification correctness: This checkpoint fails. The feature intends paired data+row-binlog writes, but only the data rowset becomes visible in the normal BE publish flow.
- FE/BE variable passing: The create-tablet path appears updated to pass row-binlog schema/config. I did not find a concrete missing FE->BE field propagation bug in the reviewed code.
- Performance: No primary performance blocker identified; correctness issues dominate.
- Other issues: The bootstrap row-binlog schema bug above is an additional independent issue.
Overall opinion: not ready to merge until the BE txn/publish lifecycle handles the row-binlog companion rowset atomically and the row-binlog initial rowset is built with the correct schema.
There was a problem hiding this comment.
Pull request overview
This PR introduces row-format binlog metadata and plumbing across FE/BE (schema/meta persistence, create-tablet RPC, rowset building, and reporting) to support the new binlog.format = ROW mode described in #61956.
Changes:
- Extends RPC/metadata schemas (Thrift/Proto) to carry row binlog schema, row-binlog rowset metas, and binlog size/file count reporting.
- Adds FE support for row binlog schema generation, row-binlog index meta management, and DDL restrictions / allowed alter ops for row-binlog tables.
- Adds BE support for row binlog rowset lifecycle (separate path, rowset meta flag, group rowset builder/writer) plus delete-bitmap separation (
binlog_delvec).
Reviewed changes
Copilot reviewed 100 out of 100 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| gensrc/thrift/MasterService.thrift | Add binlog size/file count fields to tablet report payload. |
| gensrc/thrift/BackendService.thrift | Add binlog size/file count to tablet stat report. |
| gensrc/thrift/AgentService.thrift | Add binlog format enum, historical-value flag, and row binlog schema in create-tablet request. |
| gensrc/proto/olap_file.proto | Add row-binlog flags/metas/schemas and delete-bitmap binlog marker; add binlog format + historical-value fields. |
| fe/fe-core/src/test/java/org/apache/doris/task/AgentTaskTest.java | Add unit test coverage for create-tablet thrift including row binlog schema. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/commands/TruncateTableCommandTest.java | Add truncate test for row-binlog-enabled table. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/commands/ShowDataCommandTest.java | Update/extend SHOW DATA validation tests for BinlogSize column. |
| fe/fe-core/src/test/java/org/apache/doris/common/proc/ProcServiceTest.java | Add tests ensuring proc titles include binlog columns. |
| fe/fe-core/src/test/java/org/apache/doris/catalog/OlapTableRowBinlogSchemaTest.java | New unit tests for row binlog schema generation (before/after/system cols). |
| fe/fe-core/src/test/java/org/apache/doris/catalog/CreateTableTest.java | Add create-table tests for row binlog, temp tables, and unsupported key models. |
| fe/fe-core/src/test/java/org/apache/doris/binlog/MockBinlogConfigCache.java | Update mock config creation and enable checks for CCR-vs-row modes. |
| fe/fe-core/src/test/java/org/apache/doris/binlog/BinlogTestUtils.java | Split CCR vs ROW test config helpers. |
| fe/fe-core/src/test/java/org/apache/doris/alter/SchemaChangeHandlerTest.java | Add tests for schema change behavior/limits with row binlog enabled. |
| fe/fe-core/src/main/java/org/apache/doris/task/CreateReplicaTask.java | Add optional row binlog schema payload when creating base index replicas. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/ShowDataCommand.java | Add BinlogSize/LocalBinlogSize columns and aggregate reporting. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/ReplaceTableOp.java | Allow replace-table op on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/ReplacePartitionOp.java | Allow replace-partition op on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/RenameTableOp.java | Allow rename-table op on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/RenameRollupOp.java | Allow rename-rollup op on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/RenamePartitionOp.java | Allow rename-partition op on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/ModifyTablePropertiesOp.java | Add row-binlog allowlist logic; validate binlog.* properties via PropertyAnalyzer. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/ModifyTableCommentOp.java | Allow comment changes on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/ModifyPartitionOp.java | Allow partition-property changes on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/ModifyDistributionOp.java | Allow distribution changes on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/ModifyColumnCommentOp.java | Allow column comment changes on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/DropRollupOp.java | Allow drop-rollup on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/DropPartitionOp.java | Allow drop-partition on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/DropColumnOp.java | Allow drop-column on row binlog tables (schema-change checks enforced elsewhere). |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/AlterOp.java | Introduce allowOpRowBinlog() default deny hook. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/AddRollupOp.java | Allow add-rollup on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/AddPartitionOp.java | Allow add-partition on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/AddPartitionLikeOp.java | Allow add-partition-like on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/AddColumnsOp.java | Allow multi-add-columns on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/info/AddColumnOp.java | Allow add-column on row binlog tables. |
| fe/fe-core/src/main/java/org/apache/doris/master/ReportHandler.java | Pass row binlog meta when recovering/creating base-index replicas. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/InternalCatalog.java | Validate ROW binlog constraints; create row binlog meta on table creation; pass row binlog meta into create-replica tasks. |
| fe/fe-core/src/main/java/org/apache/doris/common/util/PropertyAnalyzer.java | Extend binlog property parsing for format + need_historical_value. |
| fe/fe-core/src/main/java/org/apache/doris/common/util/BufferSizeUtil.java | Reserve extra id buffer (incl. binlog index id). |
| fe/fe-core/src/main/java/org/apache/doris/common/proc/TabletsProcDir.java | Add proc columns for BinlogSize/BinlogFileNum. |
| fe/fe-core/src/main/java/org/apache/doris/common/proc/ReplicasProcNode.java | Add proc columns for BinlogSize/BinlogFileNum. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/TabletStatMgr.java | Track table/binlog sizes and update replica binlog stats from BE reports. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/Tablet.java | Add tablet-level binlog size aggregation helper. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/TableProperty.java | Persist/load binlog format + need_historical_value properties. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/RowBinlogTableWrapper.java | New wrapper to represent/scan row-binlog as a dup-key table schema view. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/Replica.java | Add persisted replica binlog size and binlog file count fields. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/Partition.java | Add partition-level binlog size aggregation helper. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTableWrapper.java | New base wrapper class delegating locks/partition access to origin table. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java | Add row-binlog meta/index management, schema generation, and binlog size stats. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/MaterializedIndexMeta.java | Add row binlog index id marker and helpers. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/MaterializedIndex.java | Add index-level binlog size aggregation helper. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/LocalTabletInvertedIndex.java | Propagate binlog size/file count from tablet reports into replica. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java | Replay binlog config updates with “force” merge semantics. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/Database.java | Treat DB-level “enabled” binlog as CCR snapshot mode only. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/Column.java | Add row-binlog system column constants and column builders. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/CloudTabletStatMgr.java | Update stats ctor call site for added binlog fields. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/BinlogConfig.java | Add binlog format + need_historical_value; split CCR vs ROW enabling; property merge validation. |
| fe/fe-core/src/main/java/org/apache/doris/binlog/TableBinlog.java | Treat “enabled” for CCR binlog GC decisions. |
| fe/fe-core/src/main/java/org/apache/doris/binlog/DBBinlog.java | Treat “enabled” for CCR binlog GC decisions. |
| fe/fe-core/src/main/java/org/apache/doris/binlog/BinlogUtils.java | Add helper to wrap row-binlog table names. |
| fe/fe-core/src/main/java/org/apache/doris/binlog/BinlogManager.java | Interpret DB-level enable as CCR snapshot mode. |
| fe/fe-core/src/main/java/org/apache/doris/binlog/BinlogConfigCache.java | Make DB/table enable checks CCR-only. |
| fe/fe-core/src/main/java/org/apache/doris/backup/RestoreJob.java | Disallow restore into row-binlog-enabled local table; pass row binlog meta to create tasks. |
| fe/fe-core/src/main/java/org/apache/doris/backup/BackupHandler.java | Disallow backup for row-binlog-enabled tables. |
| fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeJobV2.java | Pass row binlog meta when creating shadow replicas. |
| fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeHandler.java | Enforce light-schema-only rules; propagate add/drop changes into row-binlog schema; update binlog config via PropertyAnalyzer + merge rules. |
| fe/fe-core/src/main/java/org/apache/doris/alter/RollupJobV2.java | Update create-replica task signature (rowBinlogMeta param). |
| fe/fe-core/src/main/java/org/apache/doris/alter/AlterOperations.java | Add row-binlog allowlist enforcement for alter ops. |
| fe/fe-core/src/main/java/org/apache/doris/alter/Alter.java | Invoke row-binlog allowlist checks when needed. |
| be/test/olap/rowset/group_rowset_builder_test.cpp | New test validating row-binlog rowset meta/schema hash behavior in GroupRowsetBuilder flow. |
| be/src/storage/txn/txn_manager.cpp | Gate “add to binlog” by CCR-only enable. |
| be/src/storage/tablet/tablet.h | Add row binlog paths/schema accessors and enable checks; add row-binlog rowset map. |
| be/src/storage/tablet/tablet.cpp | Initialize row-binlog rowsets; add dual-rowset add_rowset; report binlog sizes; create initial rowsets using group writer when enabled. |
| be/src/storage/tablet/tablet_meta.h | Add row-binlog schema/hash/rowset metas and binlog delete bitmap; expose binlog size/file count. |
| be/src/storage/tablet/tablet_meta.cpp | Persist/load row-binlog schemas/rowsets; split delete bitmap into normal vs binlog; factor schema init helper. |
| be/src/storage/tablet/tablet_manager.cpp | Include binlog size/file count in tablet stat report. |
| be/src/storage/tablet/base_tablet.h | Add row-binlog version tracker and row-binlog rowset map. |
| be/src/storage/tablet/base_tablet.cpp | Reconstruct version tracker for row-binlog graph when needed. |
| be/src/storage/snapshot/snapshot_manager.cpp | Minor formatting/no-op change in meta conversion flow. |
| be/src/storage/rowset/vertical_beta_rowset_writer.h | Add vertical row-binlog writer stub; remove final on base vertical writer template. |
| be/src/storage/rowset/rowset_writer_context.h | Add binlog writer options in writer context. |
| be/src/storage/rowset/rowset_meta.h | Add is_row_binlog flag helpers. |
| be/src/storage/rowset/rowset_fwd.h | Add forward decls/typedefs for rowset writer/builder. |
| be/src/storage/rowset/rowset_factory.h | Add factory method for empty GroupRowsetWriter. |
| be/src/storage/rowset/rowset_factory.cpp | Create RowBinlogRowsetWriter and GroupRowsetWriter for binlog/group flows. |
| be/src/storage/rowset/pending_rowset_helper.h | Allow pending guard to cover multiple rowset ids. |
| be/src/storage/rowset/pending_rowset_helper.cpp | Implement multi-rowset-id pending guard. |
| be/src/storage/rowset/group_rowset_writer.h | New GroupRowsetWriter that forwards flush/build to underlying writers. |
| be/src/storage/rowset/group_rowset_writer.cpp | Implement group flush/build. |
| be/src/storage/rowset/beta_rowset_writer.h | Add RowBinlogRowsetWriter type. |
| be/src/storage/rowset/beta_rowset_writer.cpp | Mark rowsets as row-binlog based on context; propagate index_id; copy row-binlog flag in meta build. |
| be/src/storage/rowset_builder.h | Introduce GroupRowsetBuilder + RowBinlogRowsetBuilder; add attach-rowset/pending-id hooks. |
| be/src/storage/rowset_builder.cpp | Implement group builder flow and pending-rowset-id attachment; update init/profile behavior. |
| be/src/storage/data_dir.cpp | Include row-binlog rowset ids when loading delete bitmaps; route bitmap to normal vs binlog delvec. |
| be/src/storage/binlog.h | Add row binlog prefixes/suffix constants. |
| be/src/storage/binlog_config.h | Add binlog format + historical-value fields and helpers. |
| be/src/storage/binlog_config.cpp | Parse/serialize new binlog format + historical-value fields. |
| be/src/load/delta_writer/delta_writer.h | Remove now-unnecessary RowsetBuilder downcast helper. |
| be/src/load/delta_writer/delta_writer.cpp | Use BaseRowsetBuilder::commit_txn() polymorphically. |
| be/src/load/delta_writer/delta_writer_context.h | Add WriteRequestType and group-write request structure. |
| be/src/load/channel/tablets_channel.cpp | Minor include/format adjustments (optional include). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code Review: [Feature] add binlog meta module (row type)+3025 / -443 across 95 files This PR introduces row-level binlog support for OlapTable — a new Critical Issues1.
|
| Issue | Severity | Location |
|---|---|---|
Boolean.parseBoolean() never throws — need_historical_value validation unreachable |
Low | PropertyAnalyzer.java |
GroupRowsetWriter methods use LOG(FATAL) instead of returning Status::NotSupported — crashes BE on accidental invocation |
Medium | group_rowset_writer.h |
PendingRowsetGuard::drop() resets to 1-element vector with default RowsetId instead of empty |
Low | pending_rowset_helper.cpp |
binlog_format() getter returns int32_t for a BinlogFormatPB field (implicit enum-to-int narrowing) |
Low | binlog_config.h |
_need_before field is set but never read (dead code) |
Low | rowset_writer_context.h |
Resource leak window: if _txn_rs_builder->init() fails after _row_binlog_rowset_builder->init() succeeds, binlog rowset files may be orphaned |
Medium | rowset_builder.cpp |
Design Concerns
-
OlapTableWrapperextendsOlapTable: This creates a fragile split-state where the wrapper's ownindexIdToMeta,fullSchema, etc. can drift from the origin table. Any new method added toOlapTablethat accesses internal state won't automatically delegate. An interface-based or composition pattern would be cleaner. -
Schema evolution is partially manual: ADD/DROP COLUMN are handled but MODIFY COLUMN is explicitly blocked. Future schema change operations added to Doris could silently drift the row binlog schema if they're not also blocked/handled.
Test Coverage
- No regression tests (checkbox unchecked in PR)
- No tests for
OlapTableWrapper/RowBinlogTableWrapper - No test for
BinlogConfigserialization round-trip with new fields (especially nullbinlogFormatlegacy compat) - No test for backup/restore rejection when row binlog is enabled
- FE UT increment coverage: 53.25% (303/569 lines)
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
it will be introduced by write module PR(not ready). |
FE UT Coverage ReportIncrement line coverage |
|
run buildall |
1 similar comment
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
f1a7fce to
574d4bf
Compare
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 29085 ms |
TPC-DS: Total hot run time: 178468 ms |
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found blocking correctness issues in the current row-binlog metadata plumbing. Critical checkpoint conclusions: the PR goal is to add row-binlog metadata/schema plumbing, but production BE tablet creation can fail for row-binlog tables because the initial row-binlog rowset writes to a directory that is never created; tests do not prove the production path because they create that directory manually. The change is broad across FE/BE metadata and not yet fully integrated with all lifecycle paths. Concurrency/locking changes reviewed here do not introduce a confirmed new lock-order issue. Lifecycle/persistence is affected: row-binlog hidden index names are created but table rename is allowed without updating them. No new config item issue found beyond already-known review threads. FE/BE protocol enum compatibility for the remaining formats appears aligned. Testing exists, but it misses the production create-table directory path and rename lifecycle case. Observability is not the main concern for these failures. User focus: no additional user-provided review focus was supplied.
e39265c
e39265c to
c286a46
Compare
|
run buildall |
2 similar comments
|
run buildall |
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
ref #57921 |
What problem does this PR solve?
Issue: #61956
Row Binlog Schema
row binlog table is actually a duplicate table model within origin table.
FE Meta
We reuse MaterializedIndexMeta to restore binlog meta (including schema, col_unique_id)
RowBinlogTableWrapper(OlapTableWrapper) is provided for read row binlog
RowBinlog only support LightSchemaChange now
BE Meta
RowBinlog is actually a rowset with TabletMeta, it has own data directory.
introduce binlog_delvec for partial-update parellel
We use GroupRowsetBuilder to construct data rowset and row binlog rowset.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)