[feat](snapshot) Support storage vault for clone instance by wyxxxcat · Pull Request #63217 · apache/doris

wyxxxcat · 2026-05-13T09:04:20Z

No description provided.

wyxxxcat · 2026-05-13T09:04:33Z

run cloudut

hello-stephen · 2026-05-13T09:05:01Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

github-actions · 2026-05-13T10:02:49Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-05-13T10:02:51Z

PR approved by anyone and no changes requested.

wyxxxcat · 2026-05-13T14:51:58Z

run buildall

wyxxxcat · 2026-05-13T14:52:03Z

/review

github-actions

I found blocking correctness issues in the cascade/update paths.

Critical checkpoint conclusions:

Goal/test: the PR attempts to cascade object-store and storage-vault credential changes to snapshot-derived instances, and adds positive/disabled tests. The tests do not cover retry/partial-failure behavior or non-default credential provider types.
Scope/focus: the change is mostly localized, but the helper refactor changes existing ALTER_OBJ_INFO semantics for role_arn credential providers.
Concurrency/transactions: no new shared in-process locking is introduced, but the RPC handlers now perform multiple independent KV transactions after committing the root instance. Several new paths return an error after the root commit, leaving partial state and making MetaServiceProxy retries non-idempotent.
Lifecycle/static init: no special lifecycle or cross-TU static initialization concern found.
Config/compatibility: no new config or storage-format compatibility concern found.
Parallel paths: similar cascade failure handling appears in alter_storage_vault, alter_obj_store_info, and update_ak_sk; the same retry/partial-success issue needs a consistent fix.
Tests: current tests only cover success and snapshot-disabled cases. Missing negative/retry tests for derived-instance transaction failures and provider-type preservation.
Observability: logs include instance ids, but returning failure after root commit is the main correctness issue.
Transaction/persistence: root and derived instance updates are committed separately; failure after the first commit can leave durable partial updates while the RPC reports failure.
Performance: no major performance issue found in the reviewed diff.

No additional user-provided focus points were supplied.

github-actions

I found blocking correctness issues in the cascade/update paths.

Critical checkpoint conclusions:

Goal/test: the PR adds cascade behavior for storage vault/object credential updates and tests the happy path plus snapshot-disabled path, but failure/retry and role-provider cases are not covered.
Scope: the factoring is mostly focused, but the extracted helper changes existing role_arn semantics.
Concurrency/lifecycle: no new locks or static lifecycle concerns found, but the post-commit cascade creates partial-update windows.
RPC/idempotency: not satisfied. Several error returns happen after the root instance has already committed, and at least rename/update_ak_sk retries cannot complete the cascade.
Compatibility/persistence: storage vault and instance metadata are persisted, but partial cascade can leave derived instances stale.
Tests: missing negative/retry tests for cascade failures and missing coverage for non-default cred_provider_type.
Observability: logs include useful instance ids, but they do not compensate for the partial update correctness issue.
User focus: no additional user-provided review focus was supplied.

wyxxxcat · 2026-05-14T00:52:28Z

run buildall

wyxxxcat · 2026-05-15T02:20:17Z

/review

github-actions

Blocking issues remain from the existing inline threads, so I cannot approve this PR yet. I did not add duplicate inline comments because the current review context already raises the same concrete blockers.

Critical checkpoint conclusions:

Goal/test: The PR aims to cascade storage vault / object credential changes to clone-derived instances and adds unit coverage for happy-path S3 vault cascading and snapshot-disabled skip behavior. The tests do not cover the already-raised retry/idempotency failures after root commit or role-based provider-type preservation.
Scope/focus: The implementation is focused on meta-service resource cascade handling, but the cascade error semantics change existing best-effort behavior in a way that can expose partial commits.
Concurrency/lifecycle: No new shared in-memory concurrency or special static lifecycle issue was found in the changed paths; each cascade instance uses its own transaction.
RPC/idempotency: Not satisfied. Existing threads already identify that storage-vault, object-store, and update_ak_sk cascades can return errors after the root transaction commits, and retries cannot reliably resume. This violates the meta-service retry contract for retryable/maybe-committed KV failures.
Compatibility/parallel paths: Storage vault and object-store cascade paths were both updated, but the role_arn ALTER_OBJ_INFO helper regresses cred_provider_type handling, as already raised inline.
Error handling/observability: Status is generally set on local return paths and logs include instance identifiers, but post-root cascade failures currently report RPC failure despite durable partial success.
Data correctness/persistence: The main correctness risk is persistent divergence between root and derived instances when a cascade fails after the root commit, plus the existing provider-type persistence regression for role-based object-store updates.
Performance: No additional clear performance blocker beyond the extra per-derived-instance transactions, which are consistent with the intended transaction-size isolation.
Tests: Existing new tests cover only successful cascade and snapshot-disabled skip; missing negative/retry/idempotency and role-provider cases correspond to the existing blockers.

User focus: No additional user-provided review focus was specified, and I found no separate issue for focus points.

dismiss

wyxxxcat requested review from dataroaring, gavinchou and w41ter as code owners May 13, 2026 09:04

gavinchou previously approved these changes May 13, 2026

View reviewed changes

gavinchou added the dev/4.1.x label May 13, 2026

github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 13, 2026

github-actions Bot added the reviewed label May 13, 2026

wyxxxcat force-pushed the cm_vault_for_snapshot branch from 794740c to 896dbe2 Compare May 13, 2026 14:48

wyxxxcat dismissed gavinchou’s stale review via 86aa010 May 13, 2026 14:51

wyxxxcat force-pushed the cm_vault_for_snapshot branch from 896dbe2 to 86aa010 Compare May 13, 2026 14:51

github-actions Bot removed the approved Indicates a PR has been approved by one committer. label May 13, 2026

github-actions Bot requested changes May 13, 2026

View reviewed changes

Comment thread cloud/src/meta-service/meta_service_resource.cpp

Comment thread cloud/src/meta-service/meta_service_resource.cpp

Comment thread cloud/src/meta-service/meta_service_resource.cpp

Comment thread cloud/src/meta-service/meta_service_resource.cpp

github-actions Bot requested changes May 13, 2026

View reviewed changes

Comment thread cloud/src/meta-service/meta_service_resource.cpp

Comment thread cloud/src/meta-service/meta_service_resource.cpp

Comment thread cloud/src/meta-service/meta_service_resource.cpp

1

869f852

wyxxxcat force-pushed the cm_vault_for_snapshot branch from 86aa010 to 869f852 Compare May 14, 2026 00:52

github-actions Bot previously requested changes May 15, 2026

View reviewed changes

gavinchou approved these changes May 15, 2026

View reviewed changes

gavinchou merged commit 9e23a2f into apache:master May 15, 2026
31 checks passed

github-actions Bot added the dev/4.1.x-conflict label May 15, 2026

Conversation

wyxxxcat commented May 13, 2026 • edited by gavinchou Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wyxxxcat commented May 13, 2026

Uh oh!

hello-stephen commented May 13, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

wyxxxcat commented May 13, 2026

Uh oh!

wyxxxcat commented May 13, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wyxxxcat commented May 14, 2026

Uh oh!

wyxxxcat commented May 15, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wyxxxcat commented May 13, 2026 •

edited by gavinchou

Loading