[bug](cloud restore) rewrite table properties and partition info in cloud restore #63696
xy720 wants to merge 9 commits into
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
/reviews
/review
run buildall
/review
Review conclusion: request changes. The PR is small and targeted at making restored dynamic partition properties cloud-safe, but the restore cleanup is incomplete: it leaves dynamic_partition.storage_medium in the persisted dynamic partition properties even though cloud-mode property rewriting removes that key during normal DDL. That lets the dynamic partition scheduler keep generating per-partition storage_medium/storage_cooldown_time properties after restore, so a restored table can still diverge from cloud create/alter behavior.
Critical checkpoints:
- Goal/test: the goal is only partially met; no test covers cloud restore dynamic partition property cleanup.
- Scope: the change is small and focused.
- Concurrency/lifecycle/config: no new concurrency, lifecycle, or config risks found.
- Compatibility/persistence: restore mutates persisted table properties; the missing storage_medium cleanup affects post-restore behavior.
- Parallel paths: normal cloud DDL removes dynamic_partition.storage_medium via CloudPropertyAnalyzer, but restore does not.
- Testing: existing non-cloud unit coverage is not extended for the new cloud path.
- Observability/performance: no additional concerns found.
User focus: no additional user-provided review focus was specified.
TPC-H: Total hot run time: 31772 ms
Code review completed for PR 63696.
Summary opinion: no blocking correctness issues found. The change is small and focused: during restore in cloud mode, unsupported dynamic partition restore properties are removed and the in-memory DynamicPartitionProperty is rebuilt before the restored table is used by the dynamic partition scheduler.
Critical checkpoint conclusions:
- Goal and proof: The goal is to prevent restored cloud tables from carrying source-cluster dynamic partition replication/storage-policy settings. The code accomplishes this for the restore path; existing restore flow also forces reserveReplica=false in cloud mode before calling this method.
- Scope: The modification is minimal and localized to TableProperty.resetPropertiesForRestore.
- Concurrency/lifecycle: No new concurrency or special lifecycle risk identified; restore mutates table metadata in the existing restore preparation flow before publishing the restored table.
- Configuration/compatibility: No new config item or serialization/storage format change.
- Parallel paths: Reviewed the restore caller and cloud restore constructor; the cloud restore path reaches this same reset method.
- Data correctness/persistence: The properties map and rebuilt dynamicPartitionProperty remain consistent after removing unsupported dynamic properties.
- Tests: I did not run tests in this review. The PR does not add a dedicated test, so residual risk is lack of explicit coverage for cloud restore with dynamic partition properties.
- Observability/performance: No new observability need or performance concern for this metadata-only path.
User focus: no additional user-provided review focus was specified.
I reviewed the actual GitHub PR diff, which contains the cloud restore cleanup in TableProperty. I did not find a blocking issue.
Critical checkpoint conclusions:
- Goal/test proof: the change removes cloud-unsupported dynamic partition restore properties before rebuilding DynamicPartitionProperty, which matches the stated restore failure mode. I did not see a new test in this PR, so coverage for cloud restore remains the main gap.
- Scope/focus: the final diff is small and focused on restore-time table property normalization.
- Concurrency/lifecycle: no concurrency or special lifecycle concern identified.
- Configuration/compatibility: no new config or persisted format change; it only strips source-cluster dynamic partition keys during restore in cloud mode.
- Parallel paths: non-cloud restore still rebuilds dynamic partition properties only when dynamic partition is present; cloud path now normalizes unsupported keys before rebuild.
- Tests: no tests were added or changed in the final PR diff.
- Observability: existing behavior is sufficient for this small normalization.
- Transactions/persistence/data writes: restore mutates the staged remote table properties before use; I did not find an edit-log/replay issue in this change.
- Performance: no performance concern identified.
User focus response: the focus file only contained ; I found no additional focus-specific issue.
TPC-H: Total hot run time: 30994 ms
TPC-DS: Total hot run time: 172438 ms
TPC-DS: Total hot run time: 171690 ms
FE Regression Coverage Report
Increment line coverage
/review
run buildall
I found blocking issues in the actual GitHub PR diff. The already-open TableProperty dynamic storage-medium cleanup thread was treated as known context and is not duplicated here.
Critical checkpoint conclusions:
- Goal/test coverage: the PR aims to sanitize restored table/partition properties for cloud mode, but the current code does not fully accomplish that goal and one changed file does not compile. No new restore test is included for these cloud-mode paths.
- Scope/focus: the final PR diff is small and focused on restore property cleanup.
- Concurrency/lifecycle: no concurrency or special lifecycle changes were introduced.
- Config/compatibility: no new config was added; this affects restore behavior in cloud mode only.
- Parallel paths: table-level and partition-level restore cleanup are both touched, but the table-level rewrite is ordered before later replica reset and can be undone.
- Tests: missing cloud-mode restore coverage for table properties, partition properties, and dynamic partition properties.
- Data correctness/observability: restore metadata can still retain unsupported replica allocation properties in cloud mode; existing logs are not directly relevant.
User focus: no additional user-provided review focus was supplied.
run buildall
/review
Summary: I found one additional cloud-restore correctness gap and left an inline comment. I did not repeat the already-known review threads about TableProperty rewrite ordering, the earlier missing Config import, or dynamic partition storage-medium cleanup.
Critical checkpoint conclusions:
- Goal/test: The PR aims to strip unsupported cloud storage/replica properties during restore. It partially addresses full-table restore metadata, but an existing-partition restore path still persists source partition storage metadata. No new regression/unit test coverage is included in the PR.
- Scope/focus: The code changes are small, but the fix is incomplete across parallel restore paths.
- Concurrency/lifecycle: The touched code runs in restore metadata mutation paths under existing restore/table locking; I did not find new lock-order or lifecycle issues.
- Config/compatibility: No new config item or storage format/protocol change is introduced.
- Parallel code paths: Not fully covered; restoring partitions into an existing table still copies remote DataProperty/in-memory metadata.
- Transaction/persistence/data correctness: The incomplete metadata cleanup can persist unsupported source-cluster partition storage settings in cloud mode.
- FE/BE variable passing and observability: Not applicable to this PR.
- Performance: No material performance issue found.
User focus: No additional user-provided review focus was specified.
TPC-H: Total hot run time: 31447 ms
TPC-DS: Total hot run time: 172472 ms
TPC-H: Total hot run time: 31669 ms
run buildall
/review
TPC-DS: Total hot run time: 171977 ms
I found one additional blocking issue in the current head: cloud restore rewrites the persisted table-property map but leaves the already-built derived TableProperty fields stale, so unsupported source in-memory/storage settings can still affect runtime behavior after restore. I did not repeat the already-known threads about replica reset ordering, dynamic storage_medium, PartitionInfo import, or partition-only restore storage metadata.
Critical checkpoints:
- Goal/test: the PR aims to sanitize unsupported source-cluster properties during cloud restore. The current code is incomplete because derived fields are not rebuilt after the map rewrite; I did not find added test coverage proving restored cloud table properties and later partition creation use cloud defaults.
- Scope: the patch is small and focused, but the cleanup is spread across TableProperty, PartitionInfo, and RestoreJob, which makes missed parallel state more likely.
- Concurrency/lifecycle: no new concurrency primitive is introduced. The affected lifecycle issue is object state consistency: TableProperty is deserialized and built before resetPropertiesForRestore mutates its properties map.
- Configuration/compatibility: no new config or storage-format compatibility change.
- Parallel paths: full-table and partition restore paths are touched; existing review threads already cover other incomplete paths/keys.
- Conditional checks: cloud-mode branching is clear, but it must keep both persisted properties and derived fields consistent.
- Tests: no new regression/unit test was present in the diff; cloud restore tests should cover table properties, partition properties, dynamic partition properties, and future partition creation after restore.
- Observability: no new observability need beyond deterministic tests for the restored metadata.
- Transactions/persistence/data correctness: the issue affects persisted/restored metadata semantics and can make future partitions inherit unsupported settings despite the persisted property map being sanitized.
- FE/BE variables/performance: no new FE-BE variable passing or notable performance concern found.
User focus: no additional user-provided review focus was supplied.
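The stale-derived-field problem described in this review can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the Doris TableProperty class: it assumes a holder that caches a boolean parsed from an in_memory key when it is constructed. Rewriting only the map leaves the cached value from the source cluster in place, while rewriting and then rebuilding brings the two back in sync.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table-property holder; this is NOT the Doris TableProperty class.
final class DerivedFieldSketch {
    private final Map<String, String> properties = new HashMap<>();
    private boolean inMemory; // derived field, cached from the map

    DerivedFieldSketch(Map<String, String> source) {
        properties.putAll(source);
        rebuildDerivedFields();
    }

    // Re-reads the cached field from the current contents of the map.
    private void rebuildDerivedFields() {
        inMemory = Boolean.parseBoolean(properties.getOrDefault("in_memory", "false"));
    }

    // Rewrites only the map; the cached field keeps the source-cluster value.
    void sanitizeMapOnly() {
        properties.remove("in_memory");
    }

    // Rewrites the map, then rebuilds the cached field from it.
    void sanitizeAndRebuild() {
        properties.remove("in_memory");
        rebuildDerivedFields();
    }

    boolean isInMemory() {
        return inMemory;
    }

    boolean hasInMemoryKey() {
        return properties.containsKey("in_memory");
    }

    // Builds a holder as if it had been deserialized from a source-cluster backup.
    static DerivedFieldSketch fromSource() {
        Map<String, String> source = new HashMap<>();
        source.put("in_memory", "true");
        return new DerivedFieldSketch(source);
    }
}
```

In this sketch, a holder that only calls sanitizeMapOnly() reports no in_memory key yet still answers true from isInMemory(), which is the inconsistency the review asks the restore path to avoid.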
TPC-H: Total hot run time: 31742 ms
TPC-DS: Total hot run time: 171808 ms
/review
I found one additional blocker beyond the already-open review threads. The PR is trying to sanitize cloud restore metadata, but the dynamic-partition storage-policy path is still not normalized, so restored dynamic partitions can continue to use source-cluster storage policy settings.
Critical checkpoint conclusions:
- Goal/test: The goal is only partially met; existing threads plus the new comment show cloud restore can still preserve unsupported source properties. I did not see a regression test covering these cloud restore property combinations.
- Scope/clarity: The change is focused, but the property cleanup is split across table, partition, and restore paths, which has led to missed derived/dynamic fields.
- Concurrency/lifecycle: No new concurrency or non-trivial lifecycle issue found in the reviewed code paths; restore/replay ordering was considered.
- Configuration/compatibility: No new config or storage-format incompatibility found.
- Parallel paths: Full-table restore and selected-partition restore now both have some normalization, but existing threads still cover gaps in ordering/derived fields and partition-level restore handling.
- Tests: Missing cloud-mode restore coverage for unsupported table, partition, and dynamic partition properties.
- Observability/performance: No additional observability or performance issue found.
User focus: no additional user-provided review focus was specified.
/review
I reviewed the current PR diff and the existing inline review context. I am not adding duplicate inline comments because the remaining blocker is already covered by #63696 (comment).
Blocking issue still present:
TableProperty.resetPropertiesForRestore() still calls PropertyAnalyzer.getInstance().rewriteForceProperties(properties) before setReplicaAlloc(replicaAlloc). In cloud restore with reserveReplica=false, setReplicaAlloc() writes default.replication_allocation back into properties after the cloud analyzer deleted that key, and no second cloud rewrite runs before persisting the restored table property. This can still leave a restored cloud table carrying the restore replica allocation property that the PR is trying to strip. Please move the cloud rewrite after restore-specific mutations or make this restore path avoid persisting the unsupported default replica allocation in cloud mode.
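The ordering problem can be reproduced on a plain map. The sketch below is only an illustration under stated assumptions: cloudRewrite and setReplicaAlloc are hypothetical stand-ins that model the behavior described above (one deletes the default.replication_allocation key, the other writes it back), and the allocation value is a placeholder string. They are not the Doris methods themselves.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only; the helpers model the described behavior, not the Doris implementation.
final class RestoreOrderingSketch {
    static final String REPLICA_KEY = "default.replication_allocation";

    // Hypothetical stand-in for the cloud rewrite: drops the unsupported key.
    static void cloudRewrite(Map<String, String> props) {
        props.remove(REPLICA_KEY);
    }

    // Hypothetical stand-in for the restore-specific mutation: writes the key back.
    static void setReplicaAlloc(Map<String, String> props, String alloc) {
        props.put(REPLICA_KEY, alloc);
    }

    // Order described in the review: rewrite first, so the later mutation reintroduces the key.
    static Map<String, String> rewriteThenSet() {
        Map<String, String> props = new HashMap<>();
        props.put(REPLICA_KEY, "source-cluster-alloc"); // placeholder value
        cloudRewrite(props);
        setReplicaAlloc(props, "restore-alloc"); // placeholder value
        return props;
    }

    // Suggested order: restore-specific mutations first, cloud rewrite last.
    static Map<String, String> setThenRewrite() {
        Map<String, String> props = new HashMap<>();
        props.put(REPLICA_KEY, "source-cluster-alloc"); // placeholder value
        setReplicaAlloc(props, "restore-alloc"); // placeholder value
        cloudRewrite(props);
        return props;
    }
}
```

With the first ordering the key survives in the final map; with the second it does not, which matches the request to run the cloud rewrite after the restore-specific mutations.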
Checkpoint conclusions:
- Goal/test: The goal is to normalize unsupported table and partition restore metadata in cloud mode. The current code only partially accomplishes this because the replica allocation property can still be reintroduced. No new regression or unit test is added in this PR.
- Scope: The change is focused on restore metadata normalization, but the ordering in TableProperty still leaves a correctness gap.
- Concurrency/lifecycle: No new thread or lifecycle management issue found in the modified code; restore metadata mutation follows existing restore paths.
- Config/compatibility: No new configuration item or storage format incompatibility found.
- Parallel paths: Full-table and selected-partition restore paths are both touched; selected-partition sanitization is now present in both normal and replay paths.
- Data correctness/persistence: The remaining property reintroduction means persisted restored table properties can still contain cloud-unsupported replica metadata.
- Observability/performance: No additional observability or performance issue found.
- User focus: No additional user-provided review focus was specified.
FE UT Coverage Report
Increment line coverage
What problem does this PR solve?
In cloud mode, restore now rewrites all unsupported table properties, and some partition info, carried over from the source cluster.
These table properties (e.g., dynamic_partition.replication_num, dynamic_partition.replication_allocation, dynamic_partition.storage_policy, in_memory, storage_medium, min_load_replica_num...) are not applicable in cloud mode.
Partition info that is not applicable in cloud mode is rewritten as well.
If kept, these settings cause critical problems after restore.
For example, the dynamic partition scheduler creates new partitions with the source cluster's replication settings, leading to write failures like: "alive replica num < 1 load required replica num 2".
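As a rough illustration of the intended cleanup, the sketch below strips the keys named above from a copy of a property map. It is not the code in this PR: the class and method names are hypothetical, the key set is only the partial list given in this description, and the real change may rewrite some values rather than delete them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are illustrative and the key list is partial.
final class CloudPropertyStripSketch {
    // Keys this description lists as not applicable in cloud mode.
    static final Set<String> UNSUPPORTED_IN_CLOUD = new HashSet<>(Arrays.asList(
            "dynamic_partition.replication_num",
            "dynamic_partition.replication_allocation",
            "dynamic_partition.storage_policy",
            "in_memory",
            "storage_medium",
            "min_load_replica_num"));

    // Returns a copy of the source properties without the unsupported keys.
    static Map<String, String> strip(Map<String, String> source) {
        Map<String, String> cleaned = new HashMap<>(source);
        cleaned.keySet().removeAll(UNSUPPORTED_IN_CLOUD);
        return cleaned;
    }

    // Example input resembling properties restored from a non-cloud source cluster.
    static Map<String, String> sample() {
        Map<String, String> source = new HashMap<>();
        source.put("dynamic_partition.enable", "true");
        source.put("dynamic_partition.replication_num", "2");
        source.put("in_memory", "true");
        return strip(source);
    }
}
```

In this sketch only dynamic_partition.enable survives in the sample, so a scheduler reading the cleaned map would no longer see the source cluster's replication count of 2.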
Release note
None