[feat](snapshot) Support storage vault for create/list snapshot#62523
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run feut |
7c0435e to
aa4b30c
Compare
|
/review |
There was a problem hiding this comment.
Found 2 blocking issues.
CloudSnapshotHandler.submitJobchanges thecloud_snapshot_handler_classextension-point signature from(long, String)to(long, String, String). Because the handler is loaded reflectively from config, existing custom handlers compiled against the old API will no longer override the invoked method, andADMIN CREATE CLUSTER SNAPSHOTwill fall back to the baseNotImplementedExceptionafter upgrade.ClusterOptions.enableStorageVaultis currently a no-op in the docker regression framework. The new code only injectsENABLE_STORAGE_VAULT=1, but nothing reads that variable, and non-SQL-mode cloud startup still creates the instance viacreate_doris_instance()withobj_info, which keepsenable_storage_vault=false.
Critical checkpoints:
- Goal of this PR: partially met. The schema/parser surface changes are present, but the reviewed runtime paths do not safely deliver end-to-end storage-vault support.
- Scope/focus: small and focused, but incomplete because the new flag is ineffective and the handler API change is incompatible.
- Concurrency/locking: no new concurrency risks in the touched code.
- Lifecycle / extension points: broken. The reflective
cloud_snapshot_handler_classhook is not backward compatible. - Configuration changes: no new config items.
- Compatibility / rolling upgrade: not preserved because the handler method signature changed.
- Parallel code paths: not handled consistently. The regression docker startup has SQL-mode and non-SQL-mode flows, but
enableStorageVaultis only passed as an unused env var. - Special checks / validation:
vault_namevalidation itself is fine. - Test coverage: insufficient. Added tests only cover local validation/schema filling; there is no end-to-end test proving storage-vault snapshot create/list behavior or the new docker option.
- Test result changes: none to review.
- Observability: no additional issue found in the touched code.
- Transaction / persistence: not applicable in the reviewed diff.
- Data-write atomicity / crash safety: not applicable in the reviewed diff.
- FE-BE / protocol propagation: new proto fields were added, but the runtime paths above are not proven end-to-end by the current implementation/tests.
- Performance: no material issue found in the touched code.
Please address the compatibility break and wire the regression-framework option through a real startup path before merging.
aa4b30c to
f46a758
Compare
|
/review |
There was a problem hiding this comment.
Findings:
docker/runtime/doris-compose/cluster.py: this patch removesimport uuid, butMS.docker_env()still usesuuid.uuid4()to buildDORIS_CLOUD_PREFIX. Starting a cloud cluster now fails withNameErrorbefore the meta-service container comes up.fe/fe-core/src/main/java/org/apache/doris/cloud/snapshot/CloudSnapshotHandler.java: changingsubmitJob(long, String)tosubmitJob(long, String, String)breaks existing custom handlers loaded viacloud_snapshot_handler_class. Those handlers will still instantiate, butADMIN CREATE CLUSTER SNAPSHOTnow dispatches to the new base overload and hitsNotImplementedExceptionunless every implementation is rebuilt and updated.docker/runtime/doris-compose/resource/common.sh: the new storage-vault branch creates the instance in vault mode, but it still leavesdefault_storage_vault_nameempty on the normal startup path.SuiteClusterdefaultssqlModeNodeMgrtofalse, so nothing later creates/sets a default vault there; ordinary DDL will still fail withNo default storage vaultunless every table specifiesstorage_vault_nameexplicitly.fe/fe-core/src/main/java/org/apache/doris/catalog/SchemaTable.java: exposingVAULT_IDunconditionally is not mixed-version safe.SchemaScanOperatorX::prepare()requires every FE slot to exist in the BE scanner, so a new FE queryinginformation_schema.cluster_snapshotsagainst an old BE will fail withno match column for this column(VAULT_ID)during rolling upgrade.
Critical checkpoint conclusions:
- Goal of the task: Partially accomplished. The parser/schema pieces were updated, but the create/list snapshot storage-vault flow is not safely completed end to end because of the handler compatibility break and the incomplete docker/test path. No end-to-end test demonstrates create/list snapshot working in storage-vault mode.
- Scope/minimality: The change set is focused on snapshot/vault plumbing, but the docker-compose changes introduce unrelated regressions in cluster startup and test-cluster initialization.
- Concurrency: No new concurrency-specific issue was identified in the changed code.
- Lifecycle / extension points:
CloudSnapshotHandleris instantiated via reflection from config, so its method signature is effectively part of a public extension point. The new signature is not backward compatible. - Configuration: No new config item is added.
- Compatibility: There are blocking compatibility issues for both configured snapshot handlers and FE/BE mixed-version schema scans.
- Parallel code paths: The SQL node-manager path creates a default vault, but the normal startup path does not. The new behavior is inconsistent across those two paths.
- Special conditions / checks: The new
ENABLE_STORAGE_VAULTbranch is insufficient by itself because it does not establish a default vault for the common regression-test flow. - Test coverage: Insufficient for the new behavior. The added tests only cover parser/schema scanner updates; nothing exercises docker startup, non-SQL-mode vault clusters, configured custom handlers, or mixed-version upgrade behavior.
- Test result files: Not applicable.
- Observability: No major new observability gap was introduced beyond the functional regressions above.
- Transaction / persistence: No new transaction- or edit-log-specific defect was identified in the touched code.
- Data write / modification paths: The changed test-cluster initialization path now provisions vault-mode instances that still cannot serve ordinary default-vault DDL.
- New variables across components:
VAULT_ID/resource_idwas added to the FE/BE schema-scan contract, but rollout compatibility was not handled. - Performance: No material performance issue was identified in the touched code.
- Other issues: None beyond the findings above.
Overall opinion: request changes.
| import jsonpickle | ||
| import os | ||
| import os.path | ||
| import uuid |
There was a problem hiding this comment.
Removing import uuid here breaks the unchanged uuid.uuid4() call in MS.docker_env() later in this file. Starting a cloud cluster now raises NameError: name 'uuid' is not defined before the meta-service container can come up.
72318d0 to
a7ab688
Compare
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
a7ab688 to
8421ca1
Compare
|
run buildall |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
8421ca1 to
77b6ca2
Compare
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
/review |
There was a problem hiding this comment.
I completed a full pass over the PR diff and did not find any new distinct issues beyond the existing inline review threads. I am not adding duplicate inline comments.
The existing blockers still appear relevant at the current head:
CloudSnapshotHandler.submitJob(long, String)was replaced bysubmitJob(long, String, String), which breaks the config-loadedcloud_snapshot_handler_classextension point during rollout.- The storage-vault docker startup path can create a vault-mode instance without establishing a default storage vault in the common non-
sqlModeNodeMgrflow, so ordinary DDL can still fail withNo default storage vault. - Exposing
VAULT_IDininformation_schema.cluster_snapshotsis not FE/BE mixed-version safe unless old BEs tolerate the added schema-scan slot or FE hides it during rollout.
Critical checkpoint conclusions:
- Goal/test proof: Partially met. The schema, proto, SQL property, and docker flag plumbing are present, but end-to-end create/list snapshot behavior in storage-vault mode is not proven by tests and the known blockers prevent safe rollout.
- Scope/focus: The diff is small and focused, but incomplete for the full feature behavior.
- Concurrency/locking: No new concurrency or locking issue was identified in the touched code.
- Lifecycle/static initialization: The main lifecycle concern is the reflective
cloud_snapshot_handler_classextension API compatibility break; no C++ static initialization issue was identified. - Configuration: No new Doris config item is added. The docker env option is not a dynamic runtime config.
- Compatibility/incompatible changes: FE/BE schema-scan mixed-version compatibility and custom snapshot-handler compatibility are not preserved.
- Parallel code paths: SQL-mode and non-SQL-mode docker cluster setup remain inconsistent for default vault creation.
- Special checks:
vault_namelocal validation is acceptable, but rollout/default-vault checks are missing in the broader flow. - Test coverage/results: Existing tests cover validation and BE schema filling only. They do not cover custom handler compatibility, mixed-version schema scan behavior, docker vault-mode startup, or end-to-end snapshot create/list with a vault. No result-file issue was found.
- Observability: No additional observability issue was found in this diff.
- Transaction/persistence/data writes: The PR only adds proto fields and plumbing in this repository; no new edit-log or transaction defect was identified, but the default-vault provisioning path remains functionally unsafe.
- FE-BE/protocol propagation: New proto/schema fields are added, but rollout compatibility is not handled.
- Performance: No material performance issue was identified.
- User focus: No additional user-provided review focus was supplied.
Overall opinion: request changes until the existing rollout and docker storage-vault setup blockers are resolved and covered by tests.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)