[Improve](StreamingJob) Support schema change for PostgreSQL streaming job#61182

Merged
JNSimba merged 7 commits into apache:master from JNSimba:cdc-schemachange
Mar 12, 2026

Conversation

@JNSimba
Member

@JNSimba JNSimba commented Mar 10, 2026

What problem does this PR solve?

Summary

Added Schema Change support to the CDC pipeline of PostgreSQL Streaming Jobs, enabling Doris target tables to automatically follow DDL changes (ADD COLUMN / DROP COLUMN) from upstream PostgreSQL without manual intervention.

Background

Unlike MySQL Binlog, PostgreSQL WAL does not contain explicit DDL events. Schema Changes can only be detected by diffing the afterSchema field in the DML record with the locally cached schema.

Implementation

Detection process (three stages):

  1. First diff (in memory, name comparison): Compares the field names in the afterSchema of the current DML record with the cached tableSchemas. If they differ, proceeds to the next step.

  2. JDBC refresh: Fetches the current real-time schema via PostgreSQL JDBC (fresh).

  3. Second diff (exact comparison): Based on the afterSchema (not fresh), it only processes column changes already perceived in the current DML record, avoiding premature execution of subsequent DDL changes for which no DML record has yet been generated.

  • ADD only → generates ALTER TABLE … ADD COLUMN

  • DROP only → generates ALTER TABLE … DROP COLUMN

  • ADD + DROP simultaneously → Rename Guard: If it's determined to be a potential column renaming, no DDL is executed; only the cache is updated, and a WARN log is printed prompting the user to manually execute RENAME in Doris.
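
The three stages and the decision rules above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class SchemaDiffSketch, its method names, the Action enum, and the representation of a schema as a set of column names (plus a name-to-type map standing in for the JDBC result) are all assumptions made for the example. In particular, the sketch treats any simultaneous ADD + DROP as a potential rename.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and data shapes are illustrative assumptions.
class SchemaDiffSketch {
    enum Action { NONE, ADD_ONLY, DROP_ONLY, RENAME_GUARD }

    // Names present in `other` but missing from `base`.
    static List<String> added(Set<String> base, Set<String> other) {
        List<String> out = new ArrayList<>();
        for (String name : other) {
            if (!base.contains(name)) {
                out.add(name);
            }
        }
        return out;
    }

    // Names present in the cached schema but missing from the record's afterSchema.
    static List<String> dropped(Set<String> cached, Set<String> after) {
        return added(after, cached);
    }

    // Decision rule: ADD only, DROP only, or ADD + DROP (treated as a possible rename).
    static Action classify(List<String> added, List<String> dropped) {
        if (added.isEmpty() && dropped.isEmpty()) return Action.NONE;
        if (dropped.isEmpty()) return Action.ADD_ONLY;
        if (added.isEmpty()) return Action.DROP_ONLY;
        return Action.RENAME_GUARD;
    }

    // Second diff: column types come from the fresh JDBC schema, but only for
    // columns the current DML record already carries. Columns that exist only
    // in `fresh` are left for a later record.
    static Map<String, String> columnsToAdd(
            Set<String> cached, Set<String> after, Map<String, String> fresh) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : added(cached, after)) {
            String type = fresh.get(name);
            if (type != null) {
                out.put(name, type);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> cached = new LinkedHashSet<>(List.of("id", "name"));
        Set<String> after = new LinkedHashSet<>(List.of("id", "name", "age"));
        // "email" was added upstream too, but no DML record carries it yet.
        Map<String, String> fresh = new LinkedHashMap<>();
        fresh.put("id", "int4");
        fresh.put("name", "text");
        fresh.put("age", "int4");
        fresh.put("email", "text");

        System.out.println(classify(added(cached, after), dropped(cached, after)));
        System.out.println(columnsToAdd(cached, after, fresh));
    }
}
```

In this example the email column is deliberately left out of the result: it exists in the fresh schema, but the current record does not carry it, so it is picked up only when a later record includes it.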

Idempotency: SchemaChangeManager silently handles "Can not add column which already exists" / "Column does not exist" errors, ensuring retry safety.
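
As a rough illustration of the retry-safe behavior described above, the sketch below swallows only the two error messages quoted in this description and rethrows everything else. The class IdempotentDdlSketch, the DdlExecutor interface, and the use of RuntimeException are assumptions for the example; how SchemaChangeManager actually receives and inspects the error from Doris is not shown here.

```java
// Hypothetical sketch; only the two error strings come from the PR description.
class IdempotentDdlSketch {
    // Stand-in for whatever actually sends the DDL to Doris.
    interface DdlExecutor {
        void execute(String sql);
    }

    // True if the error means the DDL was already applied by an earlier attempt.
    static boolean isIdempotentError(String message) {
        if (message == null) {
            return false;
        }
        return message.contains("Can not add column which already exists")
                || message.contains("Column does not exist");
    }

    // Returns true if the DDL ran, false if it was skipped as already applied.
    static boolean executeIdempotently(DdlExecutor executor, String sql) {
        try {
            executor.execute(sql);
            return true;
        } catch (RuntimeException e) {
            if (isIdempotentError(e.getMessage())) {
                return false; // safe to continue: a retry hit an already-applied change
            }
            throw e; // anything else is a real failure
        }
    }

    public static void main(String[] args) {
        boolean applied = executeIdempotently(sql -> {
            throw new RuntimeException("Can not add column which already exists");
        }, "ALTER TABLE t ADD COLUMN age INT");
        System.out.println("applied = " + applied);
    }
}
```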

Limitations

  • RENAME COLUMN is not supported: If a simultaneous ADD + DROP triggers the Rename Guard, the DDL is skipped and the user must manually execute ALTER TABLE … RENAME COLUMN in Doris. The data flow then resumes automatically.
  • MODIFY COLUMN is not supported: Column type changes are ignored by design. Since type modifications do not change column names, they cannot be detected during the name diff stage, so no DDL is generated and the Doris column type remains unchanged.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JNSimba
Member Author

JNSimba commented Mar 10, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 27724 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ebd93268178d1004ccf1f8f65b3891e4025fec9b, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17682	4556	4328	4328
q2	q3	10645	782	513	513
q4	4674	370	252	252
q5	7532	1191	1032	1032
q6	172	173	143	143
q7	784	853	692	692
q8	9310	1465	1337	1337
q9	4938	4747	4678	4678
q10	6304	1888	1650	1650
q11	450	262	252	252
q12	738	572	468	468
q13	18061	2938	2202	2202
q14	227	225	227	225
q15	927	791	826	791
q16	776	740	686	686
q17	707	866	415	415
q18	5970	5273	5298	5273
q19	1119	998	603	603
q20	495	519	394	394
q21	4696	2117	1519	1519
q22	371	302	271	271
Total cold run time: 96578 ms
Total hot run time: 27724 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4637	4669	4576	4576
q2	q3	3901	4390	3817	3817
q4	930	1227	847	847
q5	4119	4366	4310	4310
q6	183	173	149	149
q7	1791	1655	1562	1562
q8	2527	2683	2531	2531
q9	7475	7446	7510	7446
q10	3736	4025	3634	3634
q11	515	452	422	422
q12	491	629	468	468
q13	2930	3237	2350	2350
q14	280	290	271	271
q15	859	818	804	804
q16	733	772	702	702
q17	1154	1466	1441	1441
q18	7242	6774	6544	6544
q19	905	906	926	906
q20	2138	2169	2025	2025
q21	3981	3495	3421	3421
q22	472	410	365	365
Total cold run time: 50999 ms
Total hot run time: 48591 ms

@doris-robot

TPC-DS: Total hot run time: 153550 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ebd93268178d1004ccf1f8f65b3891e4025fec9b, data reload: false

query5	4379	637	508	508
query6	326	231	209	209
query7	4220	472	265	265
query8	351	270	234	234
query9	8733	2698	2718	2698
query10	519	381	340	340
query11	7385	5912	5580	5580
query12	189	131	128	128
query13	1265	477	364	364
query14	5675	3821	3639	3639
query14_1	2847	2827	2833	2827
query15	201	201	177	177
query16	987	465	472	465
query17	1116	746	627	627
query18	2445	458	358	358
query19	220	218	189	189
query20	144	132	134	132
query21	233	143	124	124
query22	4815	4893	4718	4718
query23	16637	16192	15834	15834
query23_1	16017	15959	15953	15953
query24	7535	1663	1270	1270
query24_1	1310	1289	1284	1284
query25	659	563	497	497
query26	1799	299	177	177
query27	2736	488	291	291
query28	4476	1869	1842	1842
query29	837	573	475	475
query30	316	248	213	213
query31	1359	1307	1233	1233
query32	78	75	69	69
query33	520	336	276	276
query34	922	924	575	575
query35	625	685	599	599
query36	1092	1136	973	973
query37	133	97	82	82
query38	2904	2948	2854	2854
query39	892	841	843	841
query39_1	827	822	874	822
query40	234	154	136	136
query41	65	61	60	60
query42	303	304	301	301
query43	248	248	234	234
query44	
query45	203	197	179	179
query46	898	988	615	615
query47	2126	2178	2041	2041
query48	323	324	237	237
query49	622	458	383	383
query50	689	284	210	210
query51	4173	4080	4055	4055
query52	289	293	283	283
query53	289	346	284	284
query54	294	270	270	270
query55	93	90	91	90
query56	322	318	323	318
query57	1356	1360	1271	1271
query58	288	282	275	275
query59	1385	1417	1282	1282
query60	343	346	315	315
query61	149	151	147	147
query62	637	595	540	540
query63	303	278	281	278
query64	5071	1296	999	999
query65	
query66	1459	462	362	362
query67	16458	16313	16299	16299
query68	
query69	390	305	286	286
query70	898	972	981	972
query71	339	294	308	294
query72	2913	2641	2484	2484
query73	554	571	324	324
query74	9995	9960	9795	9795
query75	2855	2745	2433	2433
query76	2297	1045	686	686
query77	370	397	313	313
query78	11197	11314	10679	10679
query79	3139	795	594	594
query80	1741	632	539	539
query81	567	285	252	252
query82	1004	155	118	118
query83	328	273	259	259
query84	253	114	107	107
query85	921	489	442	442
query86	407	310	296	296
query87	3166	3082	3020	3020
query88	3593	2686	2648	2648
query89	423	368	346	346
query90	1979	178	177	177
query91	161	158	136	136
query92	76	76	69	69
query93	1191	845	519	519
query94	648	325	288	288
query95	589	407	312	312
query96	644	530	233	233
query97	2455	2484	2439	2439
query98	247	219	227	219
query99	972	1005	924	924
Total cold run time: 237283 ms
Total hot run time: 153550 ms

@JNSimba JNSimba requested a review from Copilot March 10, 2026 10:06
@JNSimba
Member Author

JNSimba commented Mar 10, 2026

/review

@JNSimba
Member Author

JNSimba commented Mar 10, 2026

run buildall

Copilot AI left a comment

Pull request overview

Adds PostgreSQL CDC streaming-job schema change (ADD/DROP) support by detecting schema diffs from DML records, executing corresponding Doris DDLs, and persisting schema state for restart/retry safety.

Changes:

  • Introduced schema diffing, Doris DDL generation/execution, and persisted tableSchemas across FE ↔ cdc_client.
  • Added PostgreSQL-specific deserializer logic that refreshes PG schema via JDBC upon detecting schema changes.
  • Added regression tests (basic + advanced) and unit tests for PG type mapping.

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_postgres_job_sc_advanced.groovy | Adds advanced PG schema-change regression scenarios (offset=latest, double-ADD, rename guard, defaults, NOT NULL). |
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_postgres_job_sc.groovy | Adds baseline PG schema-change regression coverage (ADD/DROP/RENAME/MODIFY). |
| regression-test/data/job_p0/streaming_job/cdc/test_streaming_postgres_job_sc_advanced.out | Golden output for advanced regression assertions. |
| regression-test/data/job_p0/streaming_job/cdc/test_streaming_postgres_job_sc.out | Golden output for baseline regression assertions. |
| fs_brokers/cdc_client/src/test/java/org/apache/doris/cdcclient/utils/SchemaChangeHelperTest.java | Unit tests for PG type-name → Doris-type mapping used during DDL generation. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/SchemaChangeManager.java | Executes schema-change DDLs on Doris FE and swallows idempotent errors for retries. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/SchemaChangeHelper.java | Builds ALTER TABLE SQL and maps PG column metadata to Doris column types. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/HttpUtil.java | Centralizes Basic auth header used by FE HTTP calls. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/postgres/PostgresSourceReader.java | Injects PG schema refresher into deserializer and adds JDBC per-table schema refresh helper. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/mysql/MySqlSourceReader.java | Switches to new deserializer result type and enables MySQL schema-change emission. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/SourceReader.java | Changes deserialize contract to return DeserializeResult and adds schema persistence hooks. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/JdbcIncrementalSourceReader.java | Loads/persists table schemas for stream split startup and restart correctness. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/AbstractCdcSourceReader.java | Implements schema JSON serialize/deserialize and in-memory schema application logic. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/deserialize/SourceRecordDeserializer.java | Adds setTableSchemas hook for schema-aware deserializers. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/deserialize/PostgresDebeziumJsonDeserializer.java | Detects PG schema changes via after-schema diff + JDBC refresh, generates Doris DDL, returns SCHEMA_CHANGE results. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/deserialize/MySqlDebeziumJsonDeserializer.java | Introduces MySQL-specific deserializer stub for schema-change events. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/deserialize/DeserializeResult.java | New unified deserialize result type carrying DML, DDLs, and updated schemas. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/deserialize/DebeziumJsonDeserializer.java | Updates base deserializer to return DeserializeResult and hold tableSchemas. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/sink/HttpPutBuilder.java | Reuses centralized Basic auth header helper. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/sink/DorisBatchStreamLoad.java | Extends commitOffset to include persisted tableSchemas and parses FE JSON response code. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/service/PipelineCoordinator.java | Executes DDLs before DML on SCHEMA_CHANGE results and persists updated schemas back to FE. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/common/DorisType.java | Adds Doris type constants used by schema-change type mapping. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/common/Constants.java | Adds DORIS_TARGET_DB key for deserializer context. |
| fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcSourceOffsetProvider.java | Persists tableSchemas alongside offsets for restart/retry correctness. |
| fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingMultiTblTask.java | Passes persisted tableSchemas into cdc_client read/write requests. |
| fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java | Stores tableSchemas from commitOffset requests into the offset provider. |
| fe/fe-common/src/main/java/org/apache/doris/job/cdc/request/JobBaseRecordRequest.java | Adds tableSchemas field to requests sent to cdc_client. |
| fe/fe-common/src/main/java/org/apache/doris/job/cdc/request/CommitOffsetRequest.java | Adds tableSchemas field to FE commitOffset API payload. |

Contributor

@github-actions github-actions bot left a comment

Code Review Summary — PR #61182: PostgreSQL Schema Change Support for CDC Streaming Jobs

Overall this is a well-structured and carefully designed feature. The two-pass diff approach (fast in-memory name comparison → JDBC refresh → exact diff) is clever, the rename guard is a good safety measure, and the idempotent DDL execution handles retries gracefully. Tests are comprehensive.

I found 3 substantive issues (2 quoting/injection bugs, 1 potential NPE) and 2 minor nits. Details below.


Critical Checkpoint Conclusions (per code-review skill §1.3)

| Checkpoint | Conclusion |
| --- | --- |
| Goal | ✅ Code accomplishes stated goal. PG ADD/DROP COLUMN detected and applied; RENAME guarded; tests prove all scenarios. |
| Modification size | ✅ Reasonably scoped. New files are well-separated by concern. |
| Concurrency | tableSchemas uses ConcurrentHashMap. Processing is single-threaded per task (writeRecords sequential). No lock concerns. |
| Lifecycle | ✅ No circular references. JDBC connections use try-with-resources. |
| Config items | ✅ None added. |
| Incompatible changes | commitOffset signature changed but is internal-only API. |
| Parallel code paths | ✅ MySQL path correctly mirrors PG structure. includeSchemaChanges=true intentional for future MySQL support. |
| Persistence / EditLog | tableSchemas persisted via existing JdbcSourceOffsetProvider Gson path. Replay logic updated in replayIfNeed. |
| Test coverage | ⚠️ Two comprehensive regression suites + unit tests for type mapping. Missing: unit tests for quoteDefaultValue, identifier, diffSchemaByName, buildAddColumnSql. |
| Observability | ✅ Good INFO-level logging at detection, DDL execution, and idempotent skip. |

Substantive Issues

  1. Potential NPE in PostgresDebeziumJsonDeserializer: refreshSingleTableSchema() can return null if the table was dropped between detection and JDBC refresh. The caller dereferences the result without a null check, which causes an NPE. See inline comment.

  2. quoteDefaultValue, unescaped single quotes: If a PG default value contains ' (e.g. it's), the generated SQL will be malformed: DEFAULT 'it's'. Need to escape: defaultValue.replace("'", "\\'"). See inline comment.

  3. identifier, unescaped backticks: A column named a`b produces `a`b`, which is malformed Doris SQL. Need: name.replace("`", "``"). See inline comment.
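
The two quoting fixes suggested in issues 2 and 3 could look roughly like the sketch below. The class SqlQuotingSketch is hypothetical, and its two methods are stand-ins for the helpers named in the review rather than their real implementations. Escaping backslashes before single quotes is an extra precaution added here, on the assumption that the target accepts MySQL-style backslash escapes in string literals.

```java
// Hypothetical stand-ins for the helpers discussed in the review.
class SqlQuotingSketch {
    // Wraps a column name in backticks, doubling any backtick inside it.
    static String identifier(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    // Wraps a default value in single quotes. Backslashes are escaped first
    // (an assumption, see lead-in) so that the escapes added for quotes are
    // not themselves reinterpreted.
    static String quoteDefaultValue(String value) {
        String escaped = value.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }

    public static void main(String[] args) {
        System.out.println(identifier("a`b"));
        System.out.println("DEFAULT " + quoteDefaultValue("it's"));
    }
}
```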

Minor Nits

  1. CommitOffsetRequest.java — new tableSchemas field is private while all 7 existing fields are public. Functionally fine (Lombok), but visually inconsistent.
  2. JobBaseRecordRequest.java — same pattern: new field private vs existing protected.

Things Done Well

  • Two-pass diff design avoids unnecessary JDBC round-trips for non-schema-change records
  • Rename guard (simultaneous ADD+DROP → skip + warn) is a safe default
  • DorisBatchStreamLoad.commitOffset improvement: checking JSON code field instead of just HTTP 200
  • tryLoadTableSchemasFromRequest() documentation explaining MySQL schema-mismatch is excellent
  • DeserializeResult with clear type enum and factory methods is clean design
  • Regression tests follow conventions (qt_ prefix, DROP before CREATE, ordered results)

@doris-robot

TPC-H: Total hot run time: 27731 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ebd93268178d1004ccf1f8f65b3891e4025fec9b, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17683	4477	4325	4325
q2	q3	10645	788	517	517
q4	4761	354	258	258
q5	7564	1212	1002	1002
q6	171	175	147	147
q7	802	842	661	661
q8	9291	1474	1332	1332
q9	4725	4751	4743	4743
q10	6282	1904	1656	1656
q11	456	265	255	255
q12	736	565	483	483
q13	18045	2971	2175	2175
q14	235	231	212	212
q15	947	794	802	794
q16	772	724	676	676
q17	720	863	428	428
q18	6032	5400	5241	5241
q19	1155	991	623	623
q20	509	493	388	388
q21	4747	2113	1536	1536
q22	426	353	279	279
Total cold run time: 96704 ms
Total hot run time: 27731 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4579	4560	4710	4560
q2	q3	3900	4341	3854	3854
q4	882	1186	780	780
q5	4099	4509	4323	4323
q6	185	181	144	144
q7	1751	1613	1525	1525
q8	2485	2702	2561	2561
q9	7741	7451	7440	7440
q10	3767	3984	3600	3600
q11	501	440	418	418
q12	480	592	456	456
q13	2749	3351	2563	2563
q14	279	297	278	278
q15	840	833	838	833
q16	705	774	721	721
q17	1200	1500	1324	1324
q18	7272	6928	6640	6640
q19	969	916	916	916
q20	2077	2171	2021	2021
q21	3974	3498	3365	3365
q22	486	433	381	381
Total cold run time: 50921 ms
Total hot run time: 48703 ms

@doris-robot

TPC-DS: Total hot run time: 153622 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ebd93268178d1004ccf1f8f65b3891e4025fec9b, data reload: false

query5	4322	656	532	532
query6	315	224	221	221
query7	4238	479	263	263
query8	353	257	246	246
query9	8701	2798	2746	2746
query10	521	380	336	336
query11	7281	5883	5558	5558
query12	184	130	131	130
query13	1270	458	346	346
query14	5658	3841	3550	3550
query14_1	2888	2880	2830	2830
query15	207	190	180	180
query16	978	471	461	461
query17	900	731	622	622
query18	2441	451	357	357
query19	212	217	185	185
query20	134	131	133	131
query21	231	144	122	122
query22	4841	5283	5133	5133
query23	16623	15992	15642	15642
query23_1	15902	15820	15952	15820
query24	7644	1672	1303	1303
query24_1	1275	1278	1286	1278
query25	664	517	499	499
query26	1258	290	207	207
query27	2708	461	288	288
query28	4517	1848	1842	1842
query29	843	581	494	494
query30	312	252	210	210
query31	1346	1299	1217	1217
query32	77	75	73	73
query33	506	333	276	276
query34	924	939	579	579
query35	650	660	596	596
query36	1087	1115	1020	1020
query37	129	101	92	92
query38	2915	2915	2898	2898
query39	927	862	859	859
query39_1	842	830	828	828
query40	243	152	136	136
query41	65	61	56	56
query42	304	301	303	301
query43	244	257	232	232
query44	
query45	205	189	185	185
query46	901	980	621	621
query47	2138	2110	2031	2031
query48	320	328	225	225
query49	628	459	385	385
query50	690	279	218	218
query51	4110	4103	4148	4103
query52	288	294	283	283
query53	295	337	278	278
query54	305	276	282	276
query55	93	91	84	84
query56	326	335	322	322
query57	1383	1374	1292	1292
query58	289	280	275	275
query59	1342	1459	1312	1312
query60	334	349	322	322
query61	149	145	164	145
query62	627	602	550	550
query63	310	285	273	273
query64	5124	1279	1017	1017
query65	
query66	1470	477	359	359
query67	16381	16369	16275	16275
query68	
query69	395	316	289	289
query70	985	964	964	964
query71	341	318	311	311
query72	2782	2621	2413	2413
query73	547	563	336	336
query74	9996	9928	9785	9785
query75	2889	2743	2476	2476
query76	2303	1029	661	661
query77	366	391	326	326
query78	11136	11277	10640	10640
query79	3238	817	609	609
query80	1732	647	541	541
query81	601	285	239	239
query82	1006	149	118	118
query83	340	281	247	247
query84	300	121	95	95
query85	908	487	450	450
query86	494	303	296	296
query87	3142	3106	2995	2995
query88	4440	2680	2655	2655
query89	429	365	345	345
query90	2158	180	175	175
query91	169	163	136	136
query92	90	76	72	72
query93	2799	820	503	503
query94	648	332	281	281
query95	584	338	331	331
query96	654	531	231	231
query97	2454	2519	2407	2407
query98	251	224	216	216
query99	1037	973	917	917
Total cold run time: 240200 ms
Total hot run time: 153622 ms

@JNSimba
Member Author

JNSimba commented Mar 10, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 27793 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 03d43c8c72ad8cb545c1375690e76f1d3fafaa80, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17700	4518	4343	4343
q2	q3	10652	791	510	510
q4	4676	366	253	253
q5	7569	1183	1012	1012
q6	175	178	154	154
q7	777	844	685	685
q8	9310	1486	1322	1322
q9	4644	4727	4726	4726
q10	6253	1894	1637	1637
q11	482	254	242	242
q12	698	561	475	475
q13	18071	2975	2175	2175
q14	227	236	222	222
q15	928	790	802	790
q16	763	729	675	675
q17	718	872	401	401
q18	6012	5490	5346	5346
q19	1171	982	602	602
q20	500	494	388	388
q21	4661	2040	1552	1552
q22	398	347	283	283
Total cold run time: 96385 ms
Total hot run time: 27793 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4732	4571	4508	4508
q2	q3	3870	4360	3843	3843
q4	879	1216	784	784
q5	4057	4345	4330	4330
q6	179	178	151	151
q7	1780	1640	1551	1551
q8	2453	2696	2704	2696
q9	7549	7464	7383	7383
q10	3772	4045	3713	3713
q11	540	449	421	421
q12	503	579	477	477
q13	2734	3132	2283	2283
q14	282	291	277	277
q15	853	808	860	808
q16	733	769	776	769
q17	1145	1358	1347	1347
q18	7086	6765	6560	6560
q19	896	868	895	868
q20	2077	2263	1969	1969
q21	3928	3543	3361	3361
q22	462	481	392	392
Total cold run time: 50510 ms
Total hot run time: 48491 ms

@JNSimba
Member Author

JNSimba commented Mar 10, 2026

/review

@doris-robot

TPC-DS: Total hot run time: 152518 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 03d43c8c72ad8cb545c1375690e76f1d3fafaa80, data reload: false

query5	4343	644	528	528
query6	352	239	224	224
query7	4217	471	264	264
query8	353	253	241	241
query9	8697	2733	2708	2708
query10	507	394	375	375
query11	7339	5901	5565	5565
query12	189	128	127	127
query13	1272	456	358	358
query14	5564	3842	3563	3563
query14_1	2811	2806	2863	2806
query15	216	199	173	173
query16	989	481	462	462
query17	1106	735	629	629
query18	2464	456	355	355
query19	223	211	189	189
query20	139	127	130	127
query21	227	144	125	125
query22	4743	5037	4610	4610
query23	16551	15965	15661	15661
query23_1	15959	15807	15904	15807
query24	7420	1677	1294	1294
query24_1	1286	1255	1223	1223
query25	579	553	411	411
query26	1244	259	146	146
query27	2802	488	291	291
query28	4499	1836	1840	1836
query29	828	558	461	461
query30	319	248	208	208
query31	1341	1286	1218	1218
query32	77	77	79	77
query33	497	334	274	274
query34	951	929	565	565
query35	622	684	596	596
query36	1045	1131	1004	1004
query37	132	95	85	85
query38	2873	2937	2810	2810
query39	888	883	841	841
query39_1	818	843	839	839
query40	237	150	135	135
query41	63	61	57	57
query42	308	297	303	297
query43	238	259	228	228
query44	
query45	196	191	184	184
query46	885	1027	604	604
query47	2101	2105	2040	2040
query48	322	313	228	228
query49	641	455	384	384
query50	675	292	218	218
query51	4094	4187	4060	4060
query52	288	301	285	285
query53	293	330	280	280
query54	298	262	257	257
query55	94	90	85	85
query56	319	311	308	308
query57	1366	1354	1283	1283
query58	283	279	278	278
query59	1339	1472	1263	1263
query60	334	341	330	330
query61	150	145	150	145
query62	608	595	544	544
query63	313	278	281	278
query64	5050	1277	980	980
query65	
query66	1444	457	351	351
query67	16396	16532	16162	16162
query68	
query69	396	319	278	278
query70	1036	940	945	940
query71	331	316	299	299
query72	2843	2650	2515	2515
query73	542	544	319	319
query74	9999	9892	9755	9755
query75	2825	2743	2458	2458
query76	2297	1030	683	683
query77	346	362	344	344
query78	11202	11466	10649	10649
query79	1999	789	605	605
query80	1704	635	539	539
query81	576	272	243	243
query82	1024	151	117	117
query83	345	272	241	241
query84	252	118	92	92
query85	1205	473	438	438
query86	427	303	333	303
query87	3192	3074	2993	2993
query88	3499	2654	2618	2618
query89	425	369	343	343
query90	1874	184	173	173
query91	166	153	140	140
query92	81	73	71	71
query93	1705	822	512	512
query94	644	334	284	284
query95	590	385	321	321
query96	656	508	242	242
query97	2461	2509	2443	2443
query98	231	219	208	208
query99	974	971	934	934
Total cold run time: 235081 ms
Total hot run time: 152518 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/9) 🎉
Increment coverage report
Complete coverage report

@JNSimba
Member Author

JNSimba commented Mar 10, 2026

run cloud_p0

@JNSimba
Member Author

JNSimba commented Mar 11, 2026

run external

@JNSimba
Member Author

JNSimba commented Mar 11, 2026

run buildall

@JNSimba
Member Author

JNSimba commented Mar 11, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 27504 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3a4c61476a751f13945d9dcba05361c9a27fc40e, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17679	4539	4295	4295
q2	q3	10651	768	508	508
q4	4677	374	251	251
q5	7553	1180	1024	1024
q6	178	180	154	154
q7	780	842	680	680
q8	9337	1465	1285	1285
q9	4836	4687	4730	4687
q10	6325	1888	1638	1638
q11	447	269	238	238
q12	778	575	472	472
q13	18082	2955	2198	2198
q14	230	223	218	218
q15	940	814	819	814
q16	748	722	681	681
q17	734	861	400	400
q18	6118	5386	5156	5156
q19	1131	996	625	625
q20	495	505	389	389
q21	4456	2083	1486	1486
q22	393	327	305	305
Total cold run time: 96568 ms
Total hot run time: 27504 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4623	4570	4544	4544
q2	q3	3881	4460	3883	3883
q4	883	1214	799	799
q5	4043	4373	4346	4346
q6	186	170	139	139
q7	1792	1677	1581	1581
q8	2484	2735	2565	2565
q9	7545	7295	7398	7295
q10	3730	3949	3664	3664
q11	529	446	429	429
q12	486	601	477	477
q13	2814	3228	2354	2354
q14	289	299	280	280
q15	856	818	870	818
q16	756	774	718	718
q17	1181	1435	1433	1433
q18	7061	6857	6543	6543
q19	906	857	865	857
q20	2078	2184	2027	2027
q21	3878	3527	3291	3291
q22	483	414	374	374
Total cold run time: 50484 ms
Total hot run time: 48417 ms

@doris-robot

TPC-DS: Total hot run time: 153571 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3a4c61476a751f13945d9dcba05361c9a27fc40e, data reload: false

query5	4324	638	528	528
query6	336	234	212	212
query7	4221	475	279	279
query8	354	258	240	240
query9	8743	2770	2733	2733
query10	505	394	351	351
query11	7433	5892	5633	5633
query12	188	129	125	125
query13	1263	450	369	369
query14	5749	3862	3547	3547
query14_1	2863	2846	2893	2846
query15	203	196	177	177
query16	998	468	454	454
query17	1112	749	619	619
query18	2453	461	363	363
query19	226	216	190	190
query20	137	134	134	134
query21	229	145	124	124
query22	4743	4856	4778	4778
query23	15857	15679	15323	15323
query23_1	15448	16298	16045	16045
query24	8191	1684	1319	1319
query24_1	1280	1295	1281	1281
query25	603	559	469	469
query26	1660	291	171	171
query27	2841	499	312	312
query28	4975	1949	1946	1946
query29	952	653	527	527
query30	327	263	239	239
query31	1417	1352	1293	1293
query32	83	78	81	78
query33	527	355	291	291
query34	1276	992	598	598
query35	664	694	592	592
query36	1098	1092	983	983
query37	134	98	86	86
query38	3065	2930	2852	2852
query39	905	880	848	848
query39_1	828	835	837	835
query40	229	153	137	137
query41	65	60	59	59
query42	314	300	302	300
query43	246	241	222	222
query44	
query45	199	189	188	188
query46	904	990	615	615
query47	2157	2176	2067	2067
query48	317	325	231	231
query49	618	460	390	390
query50	679	280	214	214
query51	4155	4110	4134	4110
query52	289	291	289	289
query53	293	337	292	292
query54	300	278	283	278
query55	90	86	84	84
query56	316	329	310	310
query57	1380	1344	1272	1272
query58	287	284	277	277
query59	1328	1463	1265	1265
query60	344	326	331	326
query61	149	148	149	148
query62	639	606	545	545
query63	308	280	273	273
query64	5081	1281	985	985
query65	
query66	1455	478	361	361
query67	16422	16424	16277	16277
query68	
query69	402	303	282	282
query70	1018	984	971	971
query71	340	315	317	315
query72	2802	2688	2395	2395
query73	546	558	332	332
query74	10003	9949	9759	9759
query75	2871	2767	2455	2455
query76	2276	1039	686	686
query77	368	395	318	318
query78	11168	11334	10677	10677
query79	2680	810	640	640
query80	1739	641	569	569
query81	570	288	246	246
query82	1009	154	121	121
query83	331	271	245	245
query84	251	122	100	100
query85	899	474	432	432
query86	422	328	285	285
query87	3187	3113	3019	3019
query88	3559	2678	2648	2648
query89	421	378	345	345
query90	2022	187	186	186
query91	171	159	138	138
query92	76	72	66	66
query93	1074	837	511	511
query94	646	336	312	312
query95	577	344	388	344
query96	640	518	229	229
query97	2490	2457	2439	2439
query98	234	216	215	215
query99	999	942	900	900
Total cold run time: 237599 ms
Total hot run time: 153571 ms

@JNSimba
Member Author

JNSimba commented Mar 11, 2026

run external

@JNSimba JNSimba requested a review from Copilot March 11, 2026 07:34
@JNSimba
Member Author

JNSimba commented Mar 11, 2026

/review

Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 3 comments.


Contributor

@github-actions github-actions bot left a comment

Code Review: PostgreSQL CDC Schema Change Support

Reviewed all 28 changed files (2120+ additions). The overall design is solid — the three-stage detection (name diff → JDBC refresh → filtered diff), rename guard, and idempotent DDL execution are well thought out. Below are findings organized by the critical checkpoints from our review guidelines.

Critical Checkpoint Conclusions

1. Goal & correctness: The PR accomplishes its goal of enabling ADD/DROP COLUMN detection for PG CDC. The rename guard (simultaneous ADD+DROP → skip DDL) is a reasonable safety measure. The limitation of not supporting MODIFY COLUMN type is explicitly documented and acceptable given the Kafka Connect type ambiguity.

2. Modification scope: The change is focused and well-structured. DeserializeResult cleanly replaces the previous List<String> return type. The AbstractCdcSourceReader base class properly abstracts common schema tracking logic.

3. Concurrency: tableSchemas uses ConcurrentHashMap ✅. applySchemaChange() uses putAll (non-atomic) but is only called from the single-threaded writeRecords() path in PipelineCoordinator ✅. Within writeRecords(), the forceFlush → DDL → apply → continue-DML ordering is correct and ensures data consistency.

4. Lifecycle management: Schema lifecycle is clean: FE persists via @SerializedName("ts") in JdbcSourceOffsetProvider, replayed on restart via replayIfNeed(), loaded into CDC client via tryLoadTableSchemasFromRequest(). No circular references detected.

5. Configuration items: No new user-facing configuration items added (schema change detection is automatic). ✅

6. Incompatible changes: The commitOffset signature changed to accept tableSchemas parameter. The DeserializeResult replaces List<String>. Both are internal CDC client APIs, not public interfaces. The @SerializedName("ts") field in JdbcSourceOffsetProvider is additive and backward-compatible (Gson ignores unknown fields on deserialization). ✅

7. Parallel code paths: MySQL path (MySqlDebeziumJsonDeserializer) has a TODO for handleSchemaChangeEvent that returns empty(). This is pre-existing behavior (previously returned Collections.emptyList()). Not a regression.

8. Transaction & persistence: tableSchemas is persisted atomically with offset in commitOffset() under writeLock() in StreamingInsertJob.java:1148-1173. FE restart correctly restores via replayIfNeed(). ✅

9. Test coverage: Two comprehensive regression tests cover: snapshot+binlog schema change, ADD/DROP/rename-guard scenarios, offset=latest (no snapshot) path, double ADD, DROP+ADD rename guard, UPDATE after rename guard, ADD with DEFAULT, ADD NOT NULL with DEFAULT. Unit tests cover PG type mapping. ✅ The .out files have the auto-generated header comment.

10. Observability: Good INFO-level logging at key decision points (DDL detection, execution, schema serialization). WARN-level for rename guard skip and idempotent DDL skip. ✅

11. Performance: refreshSingleTableSchema opens a new JDBC connection per schema change detection event. This is acceptable since schema changes are rare. No hot-path performance concerns.
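
The idempotent DDL skip mentioned under Observability can be pictured with the sketch below. The two error strings are the ones quoted in the PR description; everything else (`IdempotentDdlSketch`, `Kind`, `isAlreadyApplied`, and the use of a substring match) is an assumption for illustration, since this thread does not show how SchemaChangeManager matches errors.

```java
/** Illustrative only: class, enum, and method names are hypothetical. */
final class IdempotentDdlSketch {

    enum Kind { ADD, DROP }

    /**
     * Returns true when a failed DDL can be treated as already applied,
     * so that a retry after a partial failure does not abort the job.
     * Substring matching is an assumption, not the PR's confirmed logic.
     */
    static boolean isAlreadyApplied(Kind kind, String errorMessage) {
        if (errorMessage == null) {
            return false;
        }
        switch (kind) {
            case ADD:
                // Error text quoted in the PR description for a repeated ADD COLUMN.
                return errorMessage.contains("Can not add column which already exists");
            case DROP:
                // Error text quoted in the PR description for a repeated DROP COLUMN.
                return errorMessage.contains("Column does not exist");
            default:
                return false;
        }
    }
}
```

A caller would log at WARN and continue when this returns true, and rethrow otherwise.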

Issues Found

See inline comments for specific issues. Summary:

  1. [Medium] Hardcoded auth header in HttpUtil.getAuthHeader(): "Basic YWRtaW46" (base64 of admin:). Pre-existing pattern, but now centralized. Should be parameterized or at minimum documented.

  2. [Low] refreshSingleTableSchema swallows exception context — wraps in bare RuntimeException instead of IOException, which means the IOException catch in PostgresDebeziumJsonDeserializer.deserialize() won't catch JDBC failures from the refresher. The RuntimeException will propagate uncaught up to PipelineCoordinator.writeRecords() where it's caught by the generic Exception handler — so it's functionally safe, but the error message could be improved.

  3. [Suggestion] bpchar UTF-8 multiplier — multiplying length by 3 is reasonable for UTF-8 but could exceed Doris CHAR(255) limit. The code handles this by switching to VARCHAR when len > 255, which is correct.

  4. [Info] commitOffset response parsing — now requires code == 0 in response body (previously just HTTP 200). This is stricter but correct — it properly validates the FE response.
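
Two of the points above (items 1 and 3) are easy to check with a few lines of Java. The sketch below is not the PR's code: `ReviewIssueSketches`, `basicAuthHeader`, and `mapBpchar` are hypothetical names. It assumes the 255 limit is compared against the length after the UTF-8 multiplier is applied, which the review text does not state explicitly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: names are hypothetical and not taken from the PR. */
final class ReviewIssueSketches {

    /** Item 1: build the Basic auth header from parameters instead of a literal. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    static final int DORIS_CHAR_MAX = 255;   // CHAR limit cited in the review
    static final int UTF8_MULTIPLIER = 3;    // multiplier cited in the review

    /**
     * Item 3: map a PostgreSQL bpchar(n) to a Doris type string.
     * Assumption: the limit check uses the multiplied length.
     */
    static String mapBpchar(int pgLength) {
        int len = pgLength * UTF8_MULTIPLIER;
        return len > DORIS_CHAR_MAX ? "VARCHAR(" + len + ")" : "CHAR(" + len + ")";
    }
}
```

With user `admin` and an empty password, `basicAuthHeader` yields the same `Basic YWRtaW46` value the review flags as hardcoded. Under the stated assumption, `mapBpchar(85)` stays a `CHAR(255)` while `mapBpchar(86)` becomes `VARCHAR(258)`.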

@JNSimba
Member Author

JNSimba commented Mar 11, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 27731 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8e11fed9a3dcbc3544cb009160bef62ec74ef33b, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17627	4489	4304	4304
q2	
q3	10637	792	511	511
q4	4673	363	266	266
q5	7548	1195	1021	1021
q6	176	174	145	145
q7	780	848	667	667
q8	9297	1462	1336	1336
q9	4834	4784	4754	4754
q10	6256	1903	1664	1664
q11	470	253	239	239
q12	715	580	468	468
q13	18034	2902	2210	2210
q14	239	228	219	219
q15	893	792	797	792
q16	743	728	657	657
q17	711	854	439	439
q18	6084	5483	5274	5274
q19	1236	1014	590	590
q20	497	509	386	386
q21	4739	2174	1510	1510
q22	424	322	279	279
Total cold run time: 96613 ms
Total hot run time: 27731 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4664	4628	4527	4527
q2	
q3	3869	4336	3833	3833
q4	891	1205	783	783
q5	4104	4433	4361	4361
q6	178	174	140	140
q7	1753	1664	1488	1488
q8	2475	2665	2711	2665
q9	7678	7352	7429	7352
q10	3729	4121	3631	3631
q11	516	443	423	423
q12	473	573	437	437
q13	2740	3135	2340	2340
q14	279	285	272	272
q15	833	820	848	820
q16	746	800	800	800
q17	1198	1418	1379	1379
q18	7392	6829	6600	6600
q19	893	878	868	868
q20	2049	2164	2000	2000
q21	3964	3463	3376	3376
q22	467	437	388	388
Total cold run time: 50891 ms
Total hot run time: 48483 ms

@doris-robot

TPC-DS: Total hot run time: 153314 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8e11fed9a3dcbc3544cb009160bef62ec74ef33b, data reload: false

query5	4342	631	539	539
query6	332	242	211	211
query7	4222	454	277	277
query8	355	256	240	240
query9	8699	2797	2777	2777
query10	537	388	354	354
query11	7416	5861	5531	5531
query12	188	131	125	125
query13	1272	461	363	363
query14	6308	3822	3576	3576
query14_1	2858	2831	2832	2831
query15	207	199	185	185
query16	983	468	461	461
query17	1122	715	613	613
query18	2575	457	360	360
query19	223	214	189	189
query20	142	135	133	133
query21	233	150	131	131
query22	4830	5090	4652	4652
query23	16001	15588	15430	15430
query23_1	15434	16300	15912	15912
query24	7317	1699	1323	1323
query24_1	1293	1294	1266	1266
query25	576	509	447	447
query26	1260	287	162	162
query27	2915	516	319	319
query28	4822	1938	2003	1938
query29	837	681	477	477
query30	318	255	211	211
query31	1366	1298	1220	1220
query32	80	76	75	75
query33	514	338	282	282
query34	931	940	574	574
query35	637	662	611	611
query36	1096	1137	999	999
query37	135	94	84	84
query38	2941	2939	2910	2910
query39	875	875	826	826
query39_1	833	831	827	827
query40	230	153	142	142
query41	62	60	58	58
query42	308	299	294	294
query43	239	253	226	226
query44	
query45	196	189	198	189
query46	879	978	612	612
query47	2139	2151	2068	2068
query48	309	313	229	229
query49	650	463	378	378
query50	684	273	206	206
query51	4141	4038	4055	4038
query52	295	293	280	280
query53	295	336	280	280
query54	308	274	265	265
query55	96	89	85	85
query56	331	329	332	329
query57	1359	1348	1290	1290
query58	307	283	279	279
query59	1359	1454	1260	1260
query60	354	337	328	328
query61	151	143	145	143
query62	621	596	541	541
query63	313	284	270	270
query64	4964	1308	1021	1021
query65	
query66	1423	471	354	354
query67	16428	16387	16303	16303
query68	
query69	381	327	288	288
query70	1025	895	1000	895
query71	338	312	310	310
query72	2794	2756	2623	2623
query73	545	544	318	318
query74	9976	9886	9795	9795
query75	2873	2766	2481	2481
query76	2291	1056	697	697
query77	383	404	339	339
query78	11190	11369	10671	10671
query79	1113	786	593	593
query80	864	662	531	531
query81	515	276	244	244
query82	1327	158	123	123
query83	365	265	246	246
query84	249	119	105	105
query85	952	477	440	440
query86	388	310	326	310
query87	3148	3107	3004	3004
query88	3509	2644	2643	2643
query89	430	373	341	341
query90	1757	186	185	185
query91	171	188	139	139
query92	81	76	71	71
query93	906	827	495	495
query94	507	313	306	306
query95	603	346	379	346
query96	646	516	226	226
query97	2464	2500	2401	2401
query98	236	230	220	220
query99	1016	993	920	920
Total cold run time: 233004 ms
Total hot run time: 153314 ms

@JNSimba
Member Author

JNSimba commented Mar 11, 2026

run p0

@JNSimba
Member Author

JNSimba commented Mar 11, 2026

run external

@JNSimba
Member Author

JNSimba commented Mar 11, 2026

run external

@JNSimba
Member Author

JNSimba commented Mar 11, 2026

run external

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Mar 12, 2026
@github-actions
Contributor

PR approved by anyone and no changes requested.

@JNSimba JNSimba merged commit 8ccfa80 into apache:master Mar 12, 2026
28 of 30 checks passed

Labels

approved Indicates a PR has been approved by one committer. reviewed

6 participants