branch-4.1: [Improve](Streamingjob) support exclude_columns for Postgres streaming job #61267 #61537
Merged
yiguolei merged 1 commit into branch-4.1 on Mar 20, 2026
…g job (#61267)

### What problem does this PR solve?

Add column-level filtering support for PostgreSQL CDC streaming jobs via the `table.<tableName>.exclude_columns` property. Users can specify a comma-separated list of columns to exclude from synchronization.

**Syntax example:**

```sql
CREATE JOB my_job
ON STREAMING
FROM POSTGRES (
    ...
    "include_tables" = "my_table",
    "table.my_table.exclude_columns" = "secret,internal_col"
)
TO DATABASE my_db (...)
```

#### Changes

FE (validation & table creation)

- DataSourceConfigKeys: add TABLE and TABLE_EXCLUDE_COLUMNS_SUFFIX constants
- DataSourceConfigValidator: recognize table.<name>.exclude_columns as a valid per-table config key (using suffix allowlist)
- StreamingJobUtils.generateCreateTableCmds(): parse excluded columns, validate they exist in the upstream PG table and are not PK columns, then exclude them from the Doris CREATE TABLE statement

cdc_client (DML filtering & schema change handling)

- ConfigUtil: add parseExcludeColumns(config, tableName) utility
- DebeziumJsonDeserializer: skip excluded fields when building INSERT/UPDATE/DELETE rows
- PostgresDebeziumJsonDeserializer: skip DROP/ADD DDL for excluded columns during schema change detection, so the Doris table is never modified for columns it was never meant to have

#### Behavior

| Scenario | Behavior |
|-----------------------------|-----------------------------------------------------------|
| Snapshot / incremental DML | Excluded column values are not written to Doris |
| PG DROP excluded column | DDL skipped; stored schema updated; sync continues |
| PG ADD excluded column back | DDL skipped; sync continues; Doris never gains the column |
| Exclude non-existent column | CREATE JOB fails with clear error |
| Exclude PK column | CREATE JOB fails with clear error |

#### Tests

- test_streaming_postgres_job_col_filter.groovy: covers validation errors, snapshot filtering, incremental DML filtering, DROP excluded column, re-ADD excluded column; uses Awaitility polling instead of fixed sleeps
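
For readers who want a feel for the flow described above, here is a rough sketch of the three steps: parsing the per-table property, validating it at job creation, and dropping excluded fields from a row. It is written in Python for brevity and is not the code in this PR (the real implementation is Java, in the FE and cdc_client). The function names `parse_exclude_columns`, `validate_exclude_columns`, and `filter_row` are hypothetical, and exact-match, case-sensitive column comparison is an assumption.

```python
from typing import Any, Dict, Iterable, Set


def parse_exclude_columns(config: Dict[str, str], table_name: str) -> Set[str]:
    """Read table.<tableName>.exclude_columns and split it on commas.

    Returns an empty set when the property is absent or blank.
    """
    raw = config.get(f"table.{table_name}.exclude_columns", "")
    return {col.strip() for col in raw.split(",") if col.strip()}


def validate_exclude_columns(
    excluded: Set[str],
    upstream_columns: Iterable[str],
    pk_columns: Iterable[str],
) -> None:
    """Reject excluded columns that do not exist upstream or are primary keys.

    Assumption: names are compared exactly (case-sensitive).
    """
    missing = sorted(excluded - set(upstream_columns))
    if missing:
        raise ValueError(f"exclude_columns not found in upstream table: {missing}")
    primary = sorted(excluded & set(pk_columns))
    if primary:
        raise ValueError(f"primary key columns cannot be excluded: {primary}")


def filter_row(row: Dict[str, Any], excluded: Set[str]) -> Dict[str, Any]:
    """Return a copy of the row without the excluded fields."""
    return {name: value for name, value in row.items() if name not in excluded}


if __name__ == "__main__":
    config = {
        "include_tables": "my_table",
        "table.my_table.exclude_columns": "secret,internal_col",
    }
    excluded = parse_exclude_columns(config, "my_table")
    validate_exclude_columns(
        excluded,
        upstream_columns=["id", "name", "secret", "internal_col"],
        pk_columns=["id"],
    )
    print(filter_row({"id": 1, "name": "a", "secret": "x", "internal_col": 7}, excluded))
```

Validating once at job creation, rather than per row, matches the behavior table: a bad `exclude_columns` value fails `CREATE JOB` up front instead of surfacing later during sync.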
Contributor
run buildall
yiguolei approved these changes on Mar 20, 2026
Cherry-picked from #61267