[fix](workload) enforce explicit compute group form for workload DDLs#63505
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
… (CIR-20210, CIR-20211)
Cloud mode and non-cloud mode used to share fuzzy fallback behaviour for the
compute-group qualifier on workload DDLs. The fallback misbehaved when the user
omitted the qualifier in cloud mode:
CREATE WORKLOAD POLICY ... 'workload_group'='superset'
-> "Unable to find the compute group: <default_compute_group>" (CIR-20210)
DROP WORKLOAD GROUP superset
-> "Can not find workload group superset in compute group ." (CIR-20211)
The literal Tag.VALUE_DEFAULT_COMPUTE_GROUP_NAME was being passed downstream as
if it were a real cluster name.
Enforce a strict contract instead of papering over with a session-derived
fallback:
- Cloud mode : require '<compute_group>.<workload_group>' for workload
policy 'workload_group' property, and require FOR/FROM
clause on CREATE / ALTER / DROP WORKLOAD GROUP.
- Non-cloud mode: forbid the qualifier on all four; use Tag.VALUE_DEFAULT_TAG
internally.
Both invalid inputs now surface clear UserException messages that tell the user
how to spell the statement correctly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
TPC-H: Total hot run time: 31013 ms |
TPC-DS: Total hot run time: 168920 ms |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Review result: request changes. The PR correctly narrows workload-group DDL semantics overall, but I found two correctness/usability issues in the new validation paths that should be fixed before merge.
Critical checkpoint conclusions:
- Goal and tests: The goal is to require explicit compute-group qualification in cloud mode and forbid it in non-cloud mode. The new tests cover some invalid workload policy property forms, but they miss empty-segment and trailing-dot cases, and the DROP error message currently tells users to use syntax that the parser rejects.
- Scope: The implementation is small and focused.
- Concurrency: No new shared mutable state or lock ordering changes were introduced. Existing policy alteration still validates properties while holding the policy write lock, but this PR does not materially change that locking pattern.
- Lifecycle/static initialization: No special lifecycle or static initialization risk found.
- Configuration: No new configuration items.
- Compatibility/persistence: No storage/EditLog format change. Existing persisted policies are not revalidated by this path, so no persistence compatibility issue was found.
- Parallel paths: CREATE/ALTER/DROP workload group paths were updated, but DROP's cloud-mode error text is inconsistent with the actual grammar.
- Conditional checks: The new
workload_groupsplit checks are directionally correct but incomplete for empty components/trailing dots. - Test coverage/results: Tests are unit-level and only cover rejection before Env lookup; no end-to-end DDL regression test is added, and the malformed empty-segment cases are uncovered. I did not run tests in this review.
- Observability: No new observability needed for these validation-only changes.
- Transactions/data writes: No transaction or data write correctness issue found.
- FE/BE variables: No new FE/BE protocol variables.
- Performance: No meaningful performance concern found.
User focus: No additional user-provided review focus was present.
| String originCgStr = computeGroup; | ||
| if (StringUtils.isEmpty(computeGroup)) { | ||
| computeGroup = Tag.VALUE_DEFAULT_COMPUTE_GROUP_NAME; | ||
| throw new UserException("Must specify compute group via 'FROM <compute_group>' " |
There was a problem hiding this comment.
The parser still accepts DROP WORKLOAD GROUP ... FOR <compute_group> and does not accept FROM here (DorisParser.g4 line 354). If a cloud user omits the clause and follows this error message, the suggested statement will fail to parse. Please either change the grammar to support FROM or make this message say FOR <compute_group> consistently.
| throw new UserException("Must specify compute group via 'FROM <compute_group>' " | |
| throw new UserException("Must specify compute group via 'FOR <compute_group>' " |
|
|
||
| import com.google.common.annotations.VisibleForTesting; | ||
| import com.google.common.base.Preconditions; | ||
| import com.google.common.collect.ImmutableList; |
There was a problem hiding this comment.
This validation still lets malformed values through because String.split("\\.") discards trailing empty segments. For example, in non-cloud mode workload_group='wg.' becomes ['wg'] and is accepted even though the new contract says only <workload_group> with no qualifier is allowed; in cloud mode .wg has length 2 and reaches the compute-group lookup instead of being rejected as an invalid <compute_group>.<workload_group> form. Please preserve empty segments, for example with split("\\.", -1), and reject empty cg/wg components before lookup; add tests for these cases.
… FOR grammar and reject empty dot segments Two correctness/usability issues raised in the /review run on PR apache#63505: 1. DropWorkloadGroupCommand suggested 'FROM <compute_group>' in cloud mode, but the parser (DorisParser.g4 line 354) only accepts FOR for DROP WORKLOAD GROUP. Following the suggested text would lead the user to a syntax error. Switched both the missing-qualifier and not-supported messages to 'FOR' so they match the grammar and the wording in CREATE/ALTER. 2. WorkloadSchedPolicyMgr#checkProperties used split("\\.") which silently discards trailing empty segments. Inputs like 'wg.' in non-cloud mode collapsed to ["wg"] and slipped past the strict-form check, and '.wg' in cloud mode had length 2 and reached the compute-group lookup with an empty cg. Switched to split("\\.", -1) and reject empty cg/wg components before any Env lookup. Added three regression tests covering trailing-dot and leading-dot in both modes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
run buildall |
TPC-H: Total hot run time: 31565 ms |
|
/review |
TPC-DS: Total hot run time: 172847 ms |
There was a problem hiding this comment.
Review result: no additional blocking issues found.\n\nCritical checkpoint conclusions:\n- Goal and proof: The PR enforces explicit compute-group qualification for workload group DDLs/policy binding in cloud mode and rejects qualifiers in non-cloud mode. The added WorkloadSchedPolicyMgr tests cover invalid policy property formats, including the previously raised dot-segment cases.\n- Scope/focus: The changes are focused on workload group command validation and workload policy property validation.\n- Concurrency/lifecycle: No new shared mutable state, threads, locks, or non-trivial lifecycle management introduced.\n- Config/compatibility: No new config items or storage/protocol formats. This is a deliberate user-visible validation change.\n- Parallel paths: CREATE, ALTER, DROP workload group and workload policy property validation are all updated consistently for cloud/non-cloud behavior. Existing review context for DROP grammar wording and split trailing-empty handling is resolved in the current head.\n- Error handling: UserException/AnalysisException usage follows the existing FE command patterns; user-facing messages identify the expected form.\n- Tests: Unit coverage was added for workload policy property validation, including the existing review-thread cases. I did not run tests in this automated review pass.\n- Observability/performance/persistence/data correctness: No new observability needs identified; no hot-path performance, EditLog, transaction, or visible-version concerns found.\n\nUser focus: No additional user-provided review focus was specified.
FE UT Coverage ReportIncrement line coverage |
| } else { | ||
| if (StringUtils.isEmpty(computeGroup)) { | ||
| computeGroup = Tag.DEFAULT_BACKEND_TAG.value; | ||
| if (!StringUtils.isEmpty(computeGroup)) { |
There was a problem hiding this comment.
这个修复不对
我们再非cloud mode 下,也是支持for 来指定的。
只是不指定的时候,默认就是Tag.DEFAULT_BACKEND_TAG.value;
…ce group Per PR apache#63505 review: in non-cloud mode the FOR <compute_group> clause is also supported syntactically — the value refers to a resource group (Tag), with Tag.VALUE_DEFAULT_TAG as the default when omitted. The grammar is shared with cloud mode purely for consistency. Added comments at each site explaining the duality. Also extends the workload_group property in WorkloadSchedPolicyMgr to accept the '<resource_group>.<workload_group>' form in non-cloud mode, while still rejecting empty segments and over-qualified inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
run buildall |
|
/review |
There was a problem hiding this comment.
Reviewed PR #63505 at 966e2d2.
No blocking correctness issues found in the current diff. The previous review concerns are addressed in the current head: DROP workload group messages now match the FOR grammar, workload_group parsing preserves trailing empty dot segments and rejects empty components, and non-cloud FOR is now treated as a resource group/tag instead of being rejected.
Critical checkpoint conclusions:
- Goal/test evidence: The change enforces explicit cloud workload-group qualification and preserves non-cloud resource-group behavior. Added unit tests cover malformed workload_group property forms; positive Env-backed cloud/non-cloud lookups are not covered.
- Scope/focus: The changes are limited to workload group DDL resolution, workload policy property validation, and focused tests.
- Concurrency/lifecycle: No new shared mutable state or lifecycle-sensitive objects were introduced beyond temporary ConnectContext construction for validation.
- Configuration/compatibility: No new config or storage/protocol format changes. Existing policy persistence stores workload group IDs, so no new replay path was identified.
- Parallel paths: CREATE/ALTER/DROP workload group and workload policy property validation were reviewed together for cloud/non-cloud behavior.
- Error handling: Invalid/missing cloud compute group forms now raise UserException before fallback to default names; dot-format errors are rejected before lookup.
- Test coverage: Targeted tests cover invalid format paths. I attempted
./run-fe-ut.sh --run org.apache.doris.resource.workloadschedpolicy.WorkloadSchedPolicyMgrTest, but it could not proceed becausethirdparty/installed/bin/protocis missing in the runner. - Observability/performance: No new hot-path or observability concerns found; this remains validation/control-plane logic.
User focus: No additional user-provided review focus was specified.
TPC-H: Total hot run time: 31932 ms |
TPC-DS: Total hot run time: 172816 ms |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
…#63505) Cloud mode and non-cloud mode used to share fuzzy fallback behaviour for the compute-group qualifier on workload DDLs. The fallback misbehaved when the user omitted the qualifier in cloud mode: CREATE WORKLOAD POLICY ... 'workload_group'='superset' -> "Unable to find the compute group: <default_compute_group>" DROP WORKLOAD GROUP superset -> "Can not find workload group superset in compute group ." The literal Tag.VALUE_DEFAULT_COMPUTE_GROUP_NAME was being passed downstream as if it were a real cluster name. Enforce a strict contract instead of papering over with a session-derived fallback: - Cloud mode : require '<compute_group>.<workload_group>' for workload policy 'workload_group' property, and require FOR/FROM clause on CREATE / ALTER / DROP WORKLOAD GROUP. - Non-cloud mode: forbid the qualifier on all four; use Tag.VALUE_DEFAULT_TAG internally. Both invalid inputs now surface clear UserException messages that tell the user how to spell the statement correctly. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FE Regression Coverage ReportIncrement line coverage |
…oad DDLs (#3763) ## Summary Documents the contract change introduced by apache/doris#63505, cross-checked against the actual Java code and `DorisParser.g4` grammar (not just the PR description): - **Cloud mode (storage-compute decoupled)**: - `CREATE / ALTER / DROP WORKLOAD GROUP` must explicitly carry the `FOR <compute_group>` clause; omitting it now raises `Must specify compute group via 'FOR <compute_group>' in cloud mode.` - The `workload_group` property of `CREATE / ALTER WORKLOAD POLICY` must use the fully qualified `<compute_group>.<workload_group>` form; bare names, extra dots, and empty segments are rejected. - **Non-cloud mode (storage-compute coupled)**: - The `FOR` clause is optional; the value is a resource group (Tag), with the grammar shared with cloud mode for consistency. Omitting it falls back to `Tag.VALUE_DEFAULT_TAG`. - The `workload_group` property accepts either `<workload_group>` or `<resource_group>.<workload_group>`. ## Files updated (8 files × 4 locations = 32) EN current + EN version-4.x + ZH current + ZH version-4.x of: - `sql-manual/.../CREATE-WORKLOAD-GROUP.md` - `sql-manual/.../ALTER-WORKLOAD-GROUP.md` - `sql-manual/.../DROP-WORKLOAD-GROUP.md` - `sql-manual/.../CREATE-WORKLOAD-POLICY.md` (the EN file was a stale copy of CREATE WORKLOAD GROUP content; rewritten to the correct policy structure as part of this change) - `sql-manual/.../ALTER-WORKLOAD-POLICY.md` - `admin-manual/workload-management/workload-group-bind-compute-group.md` - `admin-manual/workload-management/workload-group.md` - `admin-manual/workload-management/sql-blocking.md` ## Test plan - [ ] Render the updated pages locally and verify the new caution / note blocks render correctly. - [ ] Verify internal cross-links (no `.md` / `.mdx` extensions) resolve. - [ ] Verify EN current and EN version-4.x stay byte-identical for these 8 files. - [ ] Verify ZH current and ZH version-4.x stay byte-identical for these 8 files. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cloud mode and non-cloud mode used to share fuzzy fallback behaviour for the
compute-group qualifier on workload DDLs. The fallback misbehaved when the user
omitted the qualifier in cloud mode:
CREATE WORKLOAD POLICY ... 'workload_group'='superset'
-> "Unable to find the compute group: <default_compute_group>"
DROP WORKLOAD GROUP superset
-> "Can not find workload group superset in compute group ."
The literal Tag.VALUE_DEFAULT_COMPUTE_GROUP_NAME was being passed downstream as
if it were a real cluster name.
Enforce a strict contract instead of papering over with a session-derived
fallback:
Cloud mode : require '<compute_group>.<workload_group>' for workload
policy 'workload_group' property, and require FOR/FROM
clause on CREATE / ALTER / DROP WORKLOAD GROUP.
Non-cloud mode: forbid the qualifier on all four; use Tag.VALUE_DEFAULT_TAG
internally.
Both invalid inputs now surface clear UserException messages that tell the user
how to spell the statement correctly.