Skip to content

[fix](streaming-job) reject silent-no-op ALTER keys and unsupported load.* properties#62680

Merged
JNSimba merged 3 commits intoapache:masterfrom
JNSimba:fix/streaming-job-reject-unsupported-props
Apr 29, 2026
Merged

[fix](streaming-job) reject silent-no-op ALTER keys and unsupported load.* properties#62680
JNSimba merged 3 commits intoapache:masterfrom
JNSimba:fix/streaming-job-reject-unsupported-props

Conversation

@JNSimba
Copy link
Copy Markdown
Member

@JNSimba JNSimba commented Apr 21, 2026

What problem does this PR solve?

Three categories of CREATE/ALTER JOB properties are accepted by the
validator but silently ignored by the streaming runtime. Users set them,
see no error, yet observe no effect — an easy source of confusion and
hard-to-diagnose drift. Reject them upfront so the failure is visible.

  1. from-to path: schema is a PG source-identity namespace (peer of
    database). The TVF path (checkUnmodifiableProperties / cdc_stream)
    already rejects it; the from-to path (checkUnmodifiableSourceProperties)
    did not. Now aligned.
  2. cdc_stream TVF path: snapshot_split_key, snapshot_split_size
    and snapshot_parallelism are materialized into split metadata
    (remainingSplits, chunkHighWatermarkMap) on first fetch and never
    re-read at runtime. ALTER on these is a silent no-op; reject.
  3. target properties (load.*): StreamingMultiTblTask#generateStreamLoadProps
    only honors load.max_filter_ratio and load.strict_mode. Any other
    load.* (e.g. load.where, load.columns, load.merge_type) is
    accepted by DataSourceConfigValidator.validateTarget but dropped at
    runtime. Added an allow-list and reject unsupported load.* keys.

Related: DORIS-25224, DORIS-25225, DORIS-25227.

Release note

Reject ALTER JOB on `schema` (from-to path) and on
`snapshot_split_key` / `snapshot_split_size` / `snapshot_parallelism`
(cdc_stream TVF path), as these changes are ignored by the runtime.
Also reject unsupported `load.*` target properties; only
`load.max_filter_ratio` and `load.strict_mode` are honored.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test
    • No need to test
  • Behavior changed:

    • Yes.
  • Does this need documentation?

    • Yes.
    • No.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 21, 2026

/review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens streaming job validation to reject job properties that are currently accepted but silently ignored by the streaming runtime, making misconfigurations fail fast and visible to users.

Changes:

  • Disallow ALTER on schema in the FROM-TO (non-TVF) source properties path to align with existing TVF validation.
  • Disallow ALTER on snapshot_split_key / snapshot_split_size / snapshot_parallelism for cdc_stream TVF jobs since they are not re-read at runtime.
  • Introduce an allow-list for load.* target properties (only load.max_filter_ratio and load.strict_mode), plus regression coverage for the new rejections.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
regression-test/suites/job_p0/streaming_job/cdc/tvf/test_streaming_job_cdc_stream_mysql.groovy Adds regression coverage that ALTER on snapshot_* TVF properties is rejected.
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job_create_alter.groovy Adds regression coverage for rejecting unsupported load.* keys at CREATE and rejecting ALTER of schema.
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/AlterJobCommand.java Enforces schema as unmodifiable in FROM-TO ALTER and adds snapshot_* to unmodifiable cdc_stream TVF properties.
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/DataSourceConfigValidator.java Adds load.* allow-listing and rejects unsupported load.* keys during target validation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 21, 2026

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking issue.

  1. snapshot_parallelism is still live in the cdc_stream path. Although snapshot_split_key and snapshot_split_size are baked into persisted split metadata, JdbcTvfSourceOffsetProvider.ensureInitialized() refreshes snapshotParallelism from the current TVF properties and JdbcSourceOffsetProvider.getNextOffset() uses that field to batch remainingSplits for the next snapshot task. So a paused mid-snapshot job can still change post-resume behavior after ALTER, which means the new rejection and test overstate it as a silent no-op.

Critical checkpoints:

  • Goal: Partially accomplished. The PR correctly tightens unsupported load.* validation and aligns schema immutability for the FROM...TO path, but the snapshot_parallelism part does not match runtime behavior.
  • Scope/minimality: Mostly focused, but snapshot_parallelism was grouped with two truly materialized snapshot keys even though its lifecycle is different.
  • Concurrency: No new lock-order or thread-safety regression found in the touched code. The problem is the existing scheduler/provider interaction, not a new race.
  • Lifecycle: Relevant. The TVF offset provider is re-initialized from current TVF props during runtime, which is why snapshot_parallelism is still effective.
  • New configs: None.
  • Compatibility/behavior change: Yes; as written this introduces a user-visible rejection for a property that can still affect resumed jobs.
  • Parallel paths: FROM...TO create/alter validation looks consistent; the mismatch is specific to the cdc_stream TVF path.
  • Special conditions/comments: The new snapshot_* are materialized and never re-read comment is not true for snapshot_parallelism.
  • Tests: Regression coverage was added, but the new TVF test only asserts rejection and does not cover the mid-snapshot resume case where snapshot_parallelism can still matter. I did not run the regression suite in this review environment.
  • Test results/output files: No .out changes in this PR.
  • Observability: No additional logging or metrics are needed for this validator change.
  • Transaction/persistence/data writes: No persistence-format issue found in this patch.
  • FE/BE variable passing: No new FE-BE variables are involved.
  • Performance: No meaningful performance regression in the validation logic itself.
  • Other issues: None beyond the snapshot_parallelism mismatch.

Please either keep snapshot_parallelism mutable here, or freeze it in the runtime before rejecting ALTER as a no-op.

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/16) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/16) 🎉
Increment coverage report
Complete coverage report

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 22, 2026

run external

JNSimba and others added 2 commits April 22, 2026 10:50
…oad.* properties

Three categories of CREATE/ALTER JOB keys are accepted but silently ignored
by the streaming runtime, leaving users wondering why their config has no
effect. Reject them upfront:

- from-to path: reject ALTER on schema (PG source-identity namespace).
  The TVF path already rejects it; the from-to path is now aligned.
- cdc_stream TVF path: reject ALTER on snapshot_split_key /
  snapshot_split_size / snapshot_parallelism. These are materialized into
  split metadata at first fetch and never re-read, so ALTER would be a
  silent no-op.
- target properties: whitelist load.max_filter_ratio and load.strict_mode;
  reject any other load.* key (e.g. load.where, load.columns) since
  StreamingMultiTblTask forwards only those two.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HashSet iteration order is non-deterministic, so the error message
rendered from ALLOW_LOAD_KEYS.toString() was unstable across runs.
Switch to ImmutableSortedSet so the rendered set is always sorted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JNSimba JNSimba force-pushed the fix/streaming-job-reject-unsupported-props branch from 81884b3 to aad5a97 Compare April 22, 2026 02:50
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 22, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/23) 🎉
Increment coverage report
Complete coverage report

Resolve conflicts with master:
- DataSourceConfigValidator: merge ImmutableSortedSet (PR) and
  Jackson ObjectMapper (master, apache#62490) imports.
- AlterJobCommand cdc_stream branch: union the unmodifiable property
  list — keep PR's snapshot_split_key / snapshot_split_size /
  snapshot_parallelism alongside master's slot_name / publication_name
  (apache#62526).
- test_streaming_mysql_job_create_alter: keep both PR's schema rejection
  case and master's snapshot_*/exclude_columns/target_table cases (apache#62553).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 27, 2026

run buildall

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 27, 2026

run p0

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 27, 2026

run cloud_p0

@JNSimba
Copy link
Copy Markdown
Member Author

JNSimba commented Apr 27, 2026

run nonConcurrent

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/60) 🎉
Increment coverage report
Complete coverage report

1 similar comment
@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/60) 🎉
Increment coverage report
Complete coverage report

Copy link
Copy Markdown
Contributor

@liaoxin01 liaoxin01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@JNSimba JNSimba merged commit 284caa2 into apache:master Apr 29, 2026
31 of 32 checks passed
github-actions Bot pushed a commit that referenced this pull request May 2, 2026
…oad.* properties (#62680)

### What problem does this PR solve?

Three categories of CREATE/ALTER JOB properties are accepted by the
validator but silently ignored by the streaming runtime. Users set them,
see no error, yet observe no effect — an easy source of confusion and
hard-to-diagnose drift. Reject them upfront so the failure is visible.

1. **from-to path**: `schema` is a PG source-identity namespace (peer of
`database`). The TVF path (`checkUnmodifiableProperties` / `cdc_stream`)
already rejects it; the from-to path
(`checkUnmodifiableSourceProperties`)
   did not. Now aligned.
2. **cdc_stream TVF path**: `snapshot_split_key`, `snapshot_split_size`
   and `snapshot_parallelism` are materialized into split metadata
   (`remainingSplits`, `chunkHighWatermarkMap`) on first fetch and never
   re-read at runtime. ALTER on these is a silent no-op; reject.
3. **target properties (`load.*`)**:
`StreamingMultiTblTask#generateStreamLoadProps`
   only honors `load.max_filter_ratio` and `load.strict_mode`. Any other
   `load.*` (e.g. `load.where`, `load.columns`, `load.merge_type`) is
   accepted by `DataSourceConfigValidator.validateTarget` but dropped at
   runtime. Added an allow-list and reject unsupported `load.*` keys.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.x

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants