Skip to content

[feat](snapshot) Support storage vault for create/list snapshot#62523

Merged
w41ter merged 1 commit into
apache:masterfrom
wyxxxcat:snapshot-storage-vault/cm_vault
May 13, 2026
Merged

[feat](snapshot) Support storage vault for create/list snapshot#62523
w41ter merged 1 commit into
apache:masterfrom
wyxxxcat:snapshot-storage-vault/cm_vault

Conversation

@wyxxxcat
Copy link
Copy Markdown
Collaborator

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

run feut

@wyxxxcat wyxxxcat force-pushed the snapshot-storage-vault/cm_vault branch from 7c0435e to aa4b30c Compare April 16, 2026 02:08
@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found 2 blocking issues.

  1. CloudSnapshotHandler.submitJob changes the cloud_snapshot_handler_class extension-point signature from (long, String) to (long, String, String). Because the handler is loaded reflectively from config, existing custom handlers compiled against the old API will no longer override the invoked method, and ADMIN CREATE CLUSTER SNAPSHOT will fall back to the base NotImplementedException after upgrade.
  2. ClusterOptions.enableStorageVault is currently a no-op in the docker regression framework. The new code only injects ENABLE_STORAGE_VAULT=1, but nothing reads that variable, and non-SQL-mode cloud startup still creates the instance via create_doris_instance() with obj_info, which keeps enable_storage_vault=false.

Critical checkpoints:

  • Goal of this PR: partially met. The schema/parser surface changes are present, but the reviewed runtime paths do not safely deliver end-to-end storage-vault support.
  • Scope/focus: small and focused, but incomplete because the new flag is ineffective and the handler API change is incompatible.
  • Concurrency/locking: no new concurrency risks in the touched code.
  • Lifecycle / extension points: broken. The reflective cloud_snapshot_handler_class hook is not backward compatible.
  • Configuration changes: no new config items.
  • Compatibility / rolling upgrade: not preserved because the handler method signature changed.
  • Parallel code paths: not handled consistently. The regression docker startup has SQL-mode and non-SQL-mode flows, but enableStorageVault is only passed as an unused env var.
  • Special checks / validation: vault_name validation itself is fine.
  • Test coverage: insufficient. Added tests only cover local validation/schema filling; there is no end-to-end test proving storage-vault snapshot create/list behavior or the new docker option.
  • Test result changes: none to review.
  • Observability: no additional issue found in the touched code.
  • Transaction / persistence: not applicable in the reviewed diff.
  • Data-write atomicity / crash safety: not applicable in the reviewed diff.
  • FE-BE / protocol propagation: new proto fields were added, but the runtime paths above are not proven end-to-end by the current implementation/tests.
  • Performance: no material issue found in the touched code.

Please address the compatibility break and wire the regression-framework option through a real startup path before merging.

@wyxxxcat wyxxxcat force-pushed the snapshot-storage-vault/cm_vault branch from aa4b30c to f46a758 Compare April 16, 2026 02:54
@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings:

  1. docker/runtime/doris-compose/cluster.py: this patch removes import uuid, but MS.docker_env() still uses uuid.uuid4() to build DORIS_CLOUD_PREFIX. Starting a cloud cluster now fails with NameError before the meta-service container comes up.
  2. fe/fe-core/src/main/java/org/apache/doris/cloud/snapshot/CloudSnapshotHandler.java: changing submitJob(long, String) to submitJob(long, String, String) breaks existing custom handlers loaded via cloud_snapshot_handler_class. Those handlers will still instantiate, but ADMIN CREATE CLUSTER SNAPSHOT now dispatches to the new base overload and hits NotImplementedException unless every implementation is rebuilt and updated.
  3. docker/runtime/doris-compose/resource/common.sh: the new storage-vault branch creates the instance in vault mode, but it still leaves default_storage_vault_name empty on the normal startup path. SuiteCluster defaults sqlModeNodeMgr to false, so nothing later creates/sets a default vault there; ordinary DDL will still fail with No default storage vault unless every table specifies storage_vault_name explicitly.
  4. fe/fe-core/src/main/java/org/apache/doris/catalog/SchemaTable.java: exposing VAULT_ID unconditionally is not mixed-version safe. SchemaScanOperatorX::prepare() requires every FE slot to exist in the BE scanner, so a new FE querying information_schema.cluster_snapshots against an old BE will fail with no match column for this column(VAULT_ID) during rolling upgrade.

Critical checkpoint conclusions:

  • Goal of the task: Partially accomplished. The parser/schema pieces were updated, but the create/list snapshot storage-vault flow is not safely completed end to end because of the handler compatibility break and the incomplete docker/test path. No end-to-end test demonstrates create/list snapshot working in storage-vault mode.
  • Scope/minimality: The change set is focused on snapshot/vault plumbing, but the docker-compose changes introduce unrelated regressions in cluster startup and test-cluster initialization.
  • Concurrency: No new concurrency-specific issue was identified in the changed code.
  • Lifecycle / extension points: CloudSnapshotHandler is instantiated via reflection from config, so its method signature is effectively part of a public extension point. The new signature is not backward compatible.
  • Configuration: No new config item is added.
  • Compatibility: There are blocking compatibility issues for both configured snapshot handlers and FE/BE mixed-version schema scans.
  • Parallel code paths: The SQL node-manager path creates a default vault, but the normal startup path does not. The new behavior is inconsistent across those two paths.
  • Special conditions / checks: The new ENABLE_STORAGE_VAULT branch is insufficient by itself because it does not establish a default vault for the common regression-test flow.
  • Test coverage: Insufficient for the new behavior. The added tests only cover parser/schema scanner updates; nothing exercises docker startup, non-SQL-mode vault clusters, configured custom handlers, or mixed-version upgrade behavior.
  • Test result files: Not applicable.
  • Observability: No major new observability gap was introduced beyond the functional regressions above.
  • Transaction / persistence: No new transaction- or edit-log-specific defect was identified in the touched code.
  • Data write / modification paths: The changed test-cluster initialization path now provisions vault-mode instances that still cannot serve ordinary default-vault DDL.
  • New variables across components: VAULT_ID / resource_id was added to the FE/BE schema-scan contract, but rollout compatibility was not handled.
  • Performance: No material performance issue was identified in the touched code.
  • Other issues: None beyond the findings above.

Overall opinion: request changes.

import jsonpickle
import os
import os.path
import uuid
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing import uuid here breaks the unchanged uuid.uuid4() call in MS.docker_env() later in this file. Starting a cloud cluster now raises NameError: name 'uuid' is not defined before the meta-service container can come up.

Comment thread docker/runtime/doris-compose/resource/common.sh
Comment thread fe/fe-core/src/main/java/org/apache/doris/catalog/SchemaTable.java
@wyxxxcat wyxxxcat force-pushed the snapshot-storage-vault/cm_vault branch 2 times, most recently from 72318d0 to a7ab688 Compare April 16, 2026 03:51
@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 57.14% (4/7) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 28.57% (2/7) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (12/12) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.76% (27444/37205)
Line Coverage 57.43% (296696/516660)
Region Coverage 54.66% (247248/452322)
Branch Coverage 56.32% (107162/190267)

@wyxxxcat wyxxxcat marked this pull request as ready for review April 17, 2026 06:26
@wyxxxcat wyxxxcat closed this Apr 27, 2026
@wyxxxcat wyxxxcat reopened this Apr 27, 2026
@wyxxxcat wyxxxcat force-pushed the snapshot-storage-vault/cm_vault branch from a7ab688 to 8421ca1 Compare April 27, 2026 09:18
@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (12/12) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.66% (21613/37484)
Line Coverage 40.78% (212037/519943)
Region Coverage 37.35% (169914/454921)
Branch Coverage 38.16% (73026/191346)

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 10.53% (2/19) 🎉
Increment coverage report
Complete coverage report

@wyxxxcat wyxxxcat force-pushed the snapshot-storage-vault/cm_vault branch from 8421ca1 to 77b6ca2 Compare April 29, 2026 08:09
@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

run buildall

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 6, 2026
@wyxxxcat
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completed a full pass over the PR diff and did not find any new distinct issues beyond the existing inline review threads. I am not adding duplicate inline comments.

The existing blockers still appear relevant at the current head:

  1. CloudSnapshotHandler.submitJob(long, String) was replaced by submitJob(long, String, String), which breaks the config-loaded cloud_snapshot_handler_class extension point during rollout.
  2. The storage-vault docker startup path can create a vault-mode instance without establishing a default storage vault in the common non-sqlModeNodeMgr flow, so ordinary DDL can still fail with No default storage vault.
  3. Exposing VAULT_ID in information_schema.cluster_snapshots is not FE/BE mixed-version safe unless old BEs tolerate the added schema-scan slot or FE hides it during rollout.

Critical checkpoint conclusions:

  • Goal/test proof: Partially met. The schema, proto, SQL property, and docker flag plumbing are present, but end-to-end create/list snapshot behavior in storage-vault mode is not proven by tests and the known blockers prevent safe rollout.
  • Scope/focus: The diff is small and focused, but incomplete for the full feature behavior.
  • Concurrency/locking: No new concurrency or locking issue was identified in the touched code.
  • Lifecycle/static initialization: The main lifecycle concern is the reflective cloud_snapshot_handler_class extension API compatibility break; no C++ static initialization issue was identified.
  • Configuration: No new Doris config item is added. The docker env option is not a dynamic runtime config.
  • Compatibility/incompatible changes: FE/BE schema-scan mixed-version compatibility and custom snapshot-handler compatibility are not preserved.
  • Parallel code paths: SQL-mode and non-SQL-mode docker cluster setup remain inconsistent for default vault creation.
  • Special checks: vault_name local validation is acceptable, but rollout/default-vault checks are missing in the broader flow.
  • Test coverage/results: Existing tests cover validation and BE schema filling only. They do not cover custom handler compatibility, mixed-version schema scan behavior, docker vault-mode startup, or end-to-end snapshot create/list with a vault. No result-file issue was found.
  • Observability: No additional observability issue was found in this diff.
  • Transaction/persistence/data writes: The PR only adds proto fields and plumbing in this repository; no new edit-log or transaction defect was identified, but the default-vault provisioning path remains functionally unsafe.
  • FE-BE/protocol propagation: New proto/schema fields are added, but rollout compatibility is not handled.
  • Performance: No material performance issue was identified.
  • User focus: No additional user-provided review focus was supplied.

Overall opinion: request changes until the existing rollout and docker storage-vault setup blockers are resolved and covered by tests.

@w41ter w41ter merged commit 22e7484 into apache:master May 13, 2026
35 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. cloud dev/4.1.x dev/4.1.x-conflict snapshot

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants