[fix](streaming-job) Support delete sign in CDC stream TVF #65146

Open
JNSimba wants to merge 5 commits into apache:master from JNSimba:fix/cdc-stream-include-delete-sign

Conversation

@JNSimba

JNSimba commented Jul 2, 2026

Member
  1. Added the include_delete_sign parameter to the CDC stream TVF; when it is set to true, the TVF output includes the __DORIS_DELETE_SIGN__ column.
  2. Added regression test cases for MySQL and PostgreSQL to cover insert, delete, and update paths involving jobs and TVFs.
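
As a rough illustration of item 1, the schema switch can be modeled as a function that appends the delete-sign column only when the property is enabled. This is a minimal sketch, not the code in CdcStreamTableValuedFunction: the function name `tvf_output_columns`, the string-valued property map, and the choice to append the column last are all assumptions made for illustration.

```python
DELETE_SIGN_COLUMN = "__DORIS_DELETE_SIGN__"


def tvf_output_columns(source_columns, properties):
    """Return the column names an illustrative CDC stream TVF would expose.

    Hypothetical model: properties is a dict of string values, and the
    delete-sign column is appended after the source columns.
    """
    columns = list(source_columns)
    # Assumption: an absent property behaves like "false".
    if properties.get("include_delete_sign", "false") == "true":
        columns.append(DELETE_SIGN_COLUMN)
    return columns


if __name__ == "__main__":
    print(tvf_output_columns(["id", "name"], {}))
    print(tvf_output_columns(["id", "name"], {"include_delete_sign": "true"}))
```

With the property left out, the sketch returns the source columns unchanged, which mirrors the default behavior described in the reviews on this PR.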

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JNSimba

JNSimba commented Jul 2, 2026

Member Author

/review

@github-actions github-actions Bot left a comment

Contributor

I found one blocking regression-test issue on the refreshed head.

Critical checkpoints:

  • Goal: TVF schema exposure for CDC delete sign is implemented, but the new regression coverage cannot run in normal comparison mode because expected output files are missing.
  • Scope/focus: implementation is small and localized to CdcStreamTableValuedFunction plus tests.
  • Concurrency/lifecycle: no new concurrency or lifecycle management introduced.
  • Config/compatibility: include_delete_sign is a TVF property validated as boolean; no new FE image/edit-log field found.
  • Parallel paths: MySQL and PostgreSQL delete-sign paths are covered by the new suites in intent, but both need generated .out files.
  • Tests: FE unit test was attempted but blocked before Java execution by missing thirdparty/installed/bin/protoc; git diff --check passed.

Subagent conclusions:

  • optimizer-rewrite: no candidates.
  • tests-session-config: no candidates beyond MAIN-1 after convergence.
  • C1 was invalidated by the PR head moving from 173721a to 1b4595d.
  • C2 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the current ledger/comment set.

User focus: no additional focus was provided.

@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

Codex automated review failed and did not complete.

Error: Codex completed, but no new pull request review was submitted for the current head SHA.
Workflow run: https://github.com/apache/doris/actions/runs/28571002190

Please inspect the workflow logs and rerun the review after the underlying issue is resolved.

@JNSimba

JNSimba commented Jul 2, 2026

Member Author

/review

@JNSimba

JNSimba commented Jul 2, 2026

Member Author

run buildall

@github-actions github-actions Bot left a comment

Contributor

Automated review summary for PR 65146.

I did not find any additional non-duplicate inline issues in the effective four-file PR diff. The existing inline thread #65146 (comment) remains the blocking item: the two new CDC delete-sign regression suites add qt_* checks but the matching .out files are not present under regression-test/data/job_p0/streaming_job/cdc/tvf/.
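
The missing-output problem described above can be checked mechanically. The sketch below compares the `qt_*` tags used in a suite with the tags present in an expected-output file. It assumes that each result block in a `.out` file starts with a header line of the form `-- !tag --`; the helper name `missing_expected_tags` and the sample suite text are hypothetical.

```python
import re

# Matches qt_<tag> calls in a regression suite (illustrative pattern).
QT_CALL = re.compile(r"\bqt_(\w+)")
# Assumed .out header format: "-- !<tag> --" on its own line.
OUT_TAG = re.compile(r"^-- !(\w+) --\s*$", re.MULTILINE)


def missing_expected_tags(suite_text, out_text):
    """Return the sorted qt_ tags that have no matching block in out_text."""
    wanted = set(QT_CALL.findall(suite_text))
    present = set(OUT_TAG.findall(out_text))
    return sorted(wanted - present)


if __name__ == "__main__":
    suite = 'qt_snapshot """select 1"""\nqt_after_delete """select 2"""\n'
    # An absent .out file is modeled here as empty text.
    print(missing_expected_tags(suite, ""))
```

An empty result means every `qt_*` check in the suite has an expected-output block to compare against.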

Critical checkpoint conclusions:

  • Goal/test proof: the FE change exposes __DORIS_DELETE_SIGN__ only when include_delete_sign=true, and the new unit/regression tests target that behavior; regression expected outputs are still missing as already commented.
  • Scope: implementation is narrowly contained to CDC stream TVF schema exposure plus tests.
  • Concurrency/lifecycle: no new shared mutable concurrency or lifecycle ownership path is introduced.
  • Configuration/compatibility: the new TVF property is optional, strictly boolean-validated, defaults false, and is not persisted as new job state.
  • Parallel paths: MySQL and PostgreSQL CDC TVF paths are both covered; CDC client already emits the delete-sign field on READ/CREATE/UPDATE/DELETE records.
  • Test coverage/results: FE unit coverage exists, but the regression .out files must be generated and committed before the suites can pass.

Subagent conclusions: optimizer-rewrite and tests-session-config subagents found no new valuable candidates beyond the existing missing-output thread. Convergence round 1 ended with both subagents replying NO_NEW_VALUABLE_FINDINGS for the same ledger and proposed final comment set.
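
The configuration checkpoint above (optional, strictly boolean-validated, default false) can be sketched as follows. This is an illustrative model only: the helper name `parse_strict_bool` is hypothetical, and accepting only the exact lowercase strings `true` and `false` is an assumption, since the thread does not say whether the real validation is case-sensitive.

```python
def parse_strict_bool(properties, key, default=False):
    """Parse an optional boolean property, rejecting anything ambiguous.

    Assumption: only the exact strings "true" and "false" are valid.
    """
    if key not in properties:
        return default
    raw = properties[key]
    if raw == "true":
        return True
    if raw == "false":
        return False
    raise ValueError(f"{key} must be 'true' or 'false', got {raw!r}")


if __name__ == "__main__":
    print(parse_strict_bool({}, "include_delete_sign"))
    print(parse_strict_bool({"include_delete_sign": "true"}, "include_delete_sign"))
```

Rejecting values such as `yes` or `1` outright, instead of coercing them, keeps a mistyped property from silently changing the output schema.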

@hello-stephen

Contributor
TPC-H: Total hot run time: 30057 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1b4595d671c542f14a4368415874fe6ca79f3850, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17743	4374	4201	4201
q2	2167	327	204	204
q3	10250	1453	838	838
q4	4681	469	336	336
q5	7538	855	567	567
q6	185	174	143	143
q7	781	839	622	622
q8	9346	1524	1582	1524
q9	5635	4481	4402	4402
q10	6787	1808	1523	1523
q11	515	363	344	344
q12	713	563	445	445
q13	18152	3377	2804	2804
q14	289	269	250	250
q15	q16	790	774	717	717
q17	1040	1041	942	942
q18	6974	5820	5558	5558
q19	1307	1372	1150	1150
q20	823	678	529	529
q21	6348	2982	2638	2638
q22	457	378	320	320
Total cold run time: 102521 ms
Total hot run time: 30057 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	5313	4923	5077	4923
q2	326	327	210	210
q3	5007	5324	4722	4722
q4	2110	2157	1388	1388
q5	4816	4921	4646	4646
q6	249	187	139	139
q7	1871	1798	1588	1588
q8	2442	2170	2205	2170
q9	7790	7488	7273	7273
q10	4712	4663	4178	4178
q11	538	404	382	382
q12	744	745	530	530
q13	3104	3456	2786	2786
q14	284	279	252	252
q15	q16	684	706	613	613
q17	1295	1281	1279	1279
q18	7440	7040	7004	7004
q19	1130	1090	1151	1090
q20	2226	2241	1956	1956
q21	5339	4702	4450	4450
q22	519	459	398	398
Total cold run time: 57939 ms
Total hot run time: 51977 ms
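
For readers checking the numbers: each row in the tables above appears to list one cold run, two hot runs, and the faster of the two hot runs, and the totals are the sums of the first and last numeric columns. Summing Round 1 this way gives 102521 ms and 30057 ms, matching the reported totals. The sketch below shows that arithmetic; the function name `summarize_runs` is hypothetical, and a fused label such as `q15	q16` is treated as a single row.

```python
def summarize_runs(rows_text):
    """Return (total_cold_ms, total_hot_ms) for whitespace-separated rows.

    Each row is: one or more labels, a cold time, hot times, and the best
    hot time in the last column. Non-numeric fields are treated as labels.
    """
    total_cold = 0
    total_hot = 0
    for line in rows_text.strip().splitlines():
        numbers = [int(field) for field in line.split() if field.isdigit()]
        if len(numbers) < 2:
            continue  # skip separators and headers
        total_cold += numbers[0]
        total_hot += numbers[-1]
    return total_cold, total_hot


if __name__ == "__main__":
    sample = """
    q6 185 174 143 143
    q14 289 269 250 250
    q15 q16 790 774 717 717
    """
    print(summarize_runs(sample))
```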

@hello-stephen

Contributor
TPC-DS: Total hot run time: 175109 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1b4595d671c542f14a4368415874fe6ca79f3850, data reload: false

query5	4300	662	506	506
query6	514	220	218	218
query7	4921	638	351	351
query8	346	197	176	176
query9	8748	4129	4107	4107
query10	482	362	308	308
query11	5949	2466	2179	2179
query12	166	103	100	100
query13	1238	630	420	420
query14	6269	5465	5014	5014
query14_1	4335	4315	4291	4291
query15	223	202	183	183
query16	1005	456	437	437
query17	1137	736	556	556
query18	2430	485	331	331
query19	207	188	161	161
query20	112	111	102	102
query21	233	164	135	135
query22	13661	13630	13422	13422
query23	17612	16563	16181	16181
query23_1	16279	16294	16312	16294
query24	7667	1835	1316	1316
query24_1	1321	1298	1327	1298
query25	593	445	372	372
query26	1346	373	216	216
query27	2575	643	399	399
query28	4459	2005	2026	2005
query29	1095	631	486	486
query30	358	270	225	225
query31	1153	1109	985	985
query32	118	61	62	61
query33	522	318	258	258
query34	1183	1148	646	646
query35	818	833	675	675
query36	1361	1349	1209	1209
query37	172	107	90	90
query38	1898	1728	1651	1651
query39	923	920	903	903
query39_1	896	882	878	878
query40	286	167	139	139
query41	64	63	65	63
query42	93	94	93	93
query43	341	340	293	293
query44	1521	773	776	773
query45	216	200	193	193
query46	1097	1237	760	760
query47	2329	2310	2209	2209
query48	418	403	284	284
query49	580	457	322	322
query50	1089	427	336	336
query51	4439	4367	4339	4339
query52	86	92	79	79
query53	269	327	207	207
query54	290	231	207	207
query55	76	71	68	68
query56	286	299	284	284
query57	1451	1397	1297	1297
query58	311	260	259	259
query59	1650	1711	1451	1451
query60	303	271	250	250
query61	154	149	155	149
query62	735	685	570	570
query63	255	208	209	208
query64	2547	793	600	600
query65	4894	4799	4783	4783
query66	1857	537	392	392
query67	29796	29798	29599	29599
query68	3130	1694	1042	1042
query69	450	312	274	274
query70	1111	953	947	947
query71	395	332	318	318
query72	3064	2842	2529	2529
query73	842	816	446	446
query74	5174	5016	4755	4755
query75	2803	2714	2234	2234
query76	2311	1276	813	813
query77	383	416	297	297
query78	12674	12646	11923	11923
query79	1523	1189	787	787
query80	1323	568	507	507
query81	527	340	285	285
query82	684	166	131	131
query83	399	344	299	299
query84	335	170	178	170
query85	1016	605	509	509
query86	570	293	262	262
query87	1893	1838	1771	1771
query88	3772	2816	2812	2812
query89	471	408	348	348
query90	1956	210	196	196
query91	202	190	160	160
query92	62	64	57	57
query93	1729	1545	1008	1008
query94	847	363	310	310
query95	768	477	555	477
query96	1072	820	352	352
query97	2726	2681	2584	2584
query98	217	208	200	200
query99	1190	1169	1007	1007
Total cold run time: 261697 ms
Total hot run time: 175109 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 29.17 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1b4595d671c542f14a4368415874fe6ca79f3850, data reload: false

query1	0.01	0.01	0.01
query2	0.11	0.04	0.04
query3	0.24	0.14	0.14
query4	1.06	0.14	0.13
query5	0.28	0.23	0.22
query6	1.06	1.11	1.06
query7	0.03	0.01	0.01
query8	0.06	0.04	0.04
query9	0.40	0.31	0.32
query10	0.55	0.57	0.55
query11	0.20	0.14	0.14
query12	0.19	0.15	0.15
query13	0.49	0.49	0.48
query14	1.04	1.02	1.02
query15	0.73	0.71	0.68
query16	0.32	0.32	0.32
query17	1.16	1.18	1.10
query18	0.23	0.21	0.22
query19	2.04	1.90	1.95
query20	0.02	0.01	0.01
query21	11.36	0.28	0.13
query22	3.33	0.05	0.04
query23	11.93	0.40	0.12
query24	1.88	0.40	0.19
query25	0.08	0.04	0.04
query26	0.59	0.24	0.15
query27	0.03	0.03	0.03
query28	2.72	1.12	0.60
query29	9.23	4.74	3.44
query30	0.26	0.16	0.16
query31	1.85	0.69	0.32
query32	3.27	0.62	0.49
query33	3.21	3.24	3.19
query34	11.16	5.87	5.48
query35	5.50	5.49	5.47
query36	0.52	0.47	0.43
query37	0.10	0.07	0.06
query38	0.07	0.04	0.03
query39	0.05	0.03	0.03
query40	0.18	0.16	0.16
query41	0.09	0.03	0.03
query42	0.05	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 77.73 s
Total hot run time: 29.17 s

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/40) 🎉
Increment coverage report
Complete coverage report

@JNSimba

JNSimba commented Jul 3, 2026

Member Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29836 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6dd0e273b05bbb6e0dc7d055688b7b43f5f81f23, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17681	4089	4103	4089
q2	2122	321	202	202
q3	10276	1396	822	822
q4	4685	468	345	345
q5	7526	890	578	578
q6	182	175	139	139
q7	813	855	633	633
q8	9602	1486	1630	1486
q9	6131	4434	4410	4410
q10	6919	1780	1532	1532
q11	519	356	325	325
q12	735	569	443	443
q13	18098	3311	2775	2775
q14	273	263	246	246
q15	q16	789	786	721	721
q17	961	981	989	981
q18	7025	5763	5716	5716
q19	1464	1270	1136	1136
q20	808	670	532	532
q21	5951	2747	2426	2426
q22	444	369	299	299
Total cold run time: 103004 ms
Total hot run time: 29836 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4434	4421	4401	4401
q2	292	326	216	216
q3	4587	4984	4454	4454
q4	2067	2152	1386	1386
q5	4473	4339	4325	4325
q6	236	175	130	130
q7	2165	1859	1615	1615
q8	2527	2151	2135	2135
q9	7966	7842	7893	7842
q10	4721	4727	4273	4273
q11	594	571	384	384
q12	771	776	557	557
q13	3290	3667	2979	2979
q14	310	302	289	289
q15	q16	734	759	676	676
q17	1391	1369	1355	1355
q18	8067	7416	6736	6736
q19	1160	1098	1144	1098
q20	2219	2237	1939	1939
q21	5285	4585	4432	4432
q22	530	476	411	411
Total cold run time: 57819 ms
Total hot run time: 51633 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 173741 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6dd0e273b05bbb6e0dc7d055688b7b43f5f81f23, data reload: false

query5	4333	648	504	504
query6	472	230	216	216
query7	4847	606	352	352
query8	345	196	180	180
query9	8797	4058	4057	4057
query10	480	350	314	314
query11	5926	2353	2134	2134
query12	158	105	105	105
query13	1280	602	442	442
query14	6301	5333	4993	4993
query14_1	4322	4310	4333	4310
query15	225	214	179	179
query16	973	457	452	452
query17	1112	714	608	608
query18	2415	468	340	340
query19	201	188	145	145
query20	108	111	105	105
query21	239	155	134	134
query22	13685	13645	13424	13424
query23	17423	16521	16145	16145
query23_1	16216	16252	16255	16252
query24	7643	1773	1303	1303
query24_1	1352	1317	1323	1317
query25	542	465	367	367
query26	1362	343	213	213
query27	2624	609	396	396
query28	4808	2058	2103	2058
query29	1083	640	521	521
query30	349	277	233	233
query31	1132	1100	1017	1017
query32	110	77	69	69
query33	552	340	260	260
query34	1207	1145	645	645
query35	802	794	696	696
query36	1398	1387	1271	1271
query37	154	108	95	95
query38	1888	1709	1649	1649
query39	927	925	905	905
query39_1	888	882	891	882
query40	253	172	146	146
query41	73	70	71	70
query42	99	98	94	94
query43	332	331	288	288
query44	1449	805	797	797
query45	213	190	186	186
query46	1163	1240	761	761
query47	2401	2375	2230	2230
query48	412	406	288	288
query49	614	427	332	332
query50	1034	432	335	335
query51	4419	4543	4293	4293
query52	92	91	79	79
query53	277	292	219	219
query54	298	254	226	226
query55	78	76	70	70
query56	311	321	324	321
query57	1460	1457	1339	1339
query58	332	271	264	264
query59	1645	1698	1458	1458
query60	318	283	275	275
query61	179	174	174	174
query62	712	638	596	596
query63	250	216	217	216
query64	2597	819	649	649
query65	4905	4830	4761	4761
query66	1814	510	390	390
query67	29635	29568	29449	29449
query68	3196	1567	936	936
query69	406	303	277	277
query70	1053	985	967	967
query71	359	319	308	308
query72	2880	2586	2276	2276
query73	831	778	421	421
query74	5158	4972	4758	4758
query75	2626	2592	2221	2221
query76	2321	1189	806	806
query77	360	381	290	290
query78	12422	12409	11805	11805
query79	1410	1219	761	761
query80	653	548	444	444
query81	456	331	284	284
query82	594	168	127	127
query83	410	329	302	302
query84	290	164	137	137
query85	921	630	515	515
query86	375	316	280	280
query87	1844	1830	1768	1768
query88	3782	2835	2822	2822
query89	454	405	365	365
query90	1956	201	194	194
query91	207	190	158	158
query92	64	65	57	57
query93	1629	1550	976	976
query94	555	373	325	325
query95	793	513	482	482
query96	1103	805	325	325
query97	2688	2688	2544	2544
query98	215	206	200	200
query99	1174	1180	1016	1016
Total cold run time: 259641 ms
Total hot run time: 173741 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6dd0e273b05bbb6e0dc7d055688b7b43f5f81f23, data reload: false

query1	0.01	0.00	0.00
query2	0.10	0.05	0.06
query3	0.26	0.14	0.13
query4	1.62	0.14	0.14
query5	0.24	0.26	0.22
query6	1.20	1.08	1.08
query7	0.04	0.00	0.00
query8	0.06	0.04	0.04
query9	0.39	0.31	0.33
query10	0.55	0.55	0.60
query11	0.21	0.15	0.14
query12	0.19	0.15	0.15
query13	0.48	0.47	0.47
query14	1.03	1.01	1.01
query15	0.61	0.60	0.59
query16	0.33	0.33	0.31
query17	1.13	1.11	1.13
query18	0.22	0.21	0.21
query19	2.01	1.94	1.92
query20	0.02	0.01	0.01
query21	15.43	0.23	0.13
query22	4.88	0.06	0.05
query23	16.13	0.31	0.13
query24	2.99	0.43	0.30
query25	0.12	0.07	0.04
query26	0.72	0.22	0.15
query27	0.05	0.03	0.05
query28	3.49	0.94	0.53
query29	12.49	4.27	3.46
query30	0.27	0.14	0.14
query31	2.77	0.59	0.32
query32	3.22	0.60	0.49
query33	3.23	3.23	3.21
query34	15.66	4.30	3.52
query35	3.51	3.53	3.53
query36	0.56	0.45	0.43
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.02
query40	0.17	0.16	0.15
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 96.73 s
Total hot run time: 25.26 s

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/9) 🎉
Increment coverage report
Complete coverage report

@JNSimba

JNSimba commented Jul 3, 2026

Member Author

run external

@JNSimba

JNSimba commented Jul 3, 2026

Member Author

run cloud_p0

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/9) 🎉
Increment coverage report
Complete coverage report

@JNSimba

JNSimba commented Jul 3, 2026

Member Author

run external

@JNSimba

JNSimba commented Jul 3, 2026

Member Author

/review

@github-actions github-actions Bot left a comment

Contributor

Automated review result: I did not find additional blocking issues in the current head.

Critical checkpoint conclusions:

  • Goal and coverage: the PR exposes include_delete_sign on cdc_stream, keeps the default schema unchanged, validates invalid values, and covers MySQL/PostgreSQL snapshot, insert, update, and delete-sign delete behavior.
  • Scope: the code change is focused on TVF schema exposure and targeted tests; it does not introduce persistence, edit-log, or FE-BE protocol changes.
  • Data correctness: CDC client rows already emit __DORIS_DELETE_SIGN__; the PR only exposes that field as an optional TVF column and the regression suites explicitly bind it into Unique Key MoW target tables.
  • Parallel paths: scheduled streaming tasks preserve the original TVF properties during JdbcTvfSourceOffsetProvider.rewriteTvfParams(), so include_delete_sign remains available after task-time offset rewrite.
  • Concurrency/lifecycle: no new shared mutable lifecycle or lock path was introduced.
  • Compatibility/config: default false preserves existing output schema; the new option is strict boolean validated and is intentionally removed from the HTTP payload because it is an FE schema switch.
  • Tests/results: FE unit coverage and MySQL/PostgreSQL regression suites with matching .out files are present. The prior inline thread about missing .out files is stale for this head.
  • Validation: local FE/regression execution was not possible because thirdparty/installed and thirdparty/installed/bin/protoc are missing in this worktree. GitHub currently reports FE UT, CheckStyle, P0 regression, non-concurrent regression, and related checks passing; external regression is still pending.
  • User focus: no additional user-provided focus was supplied.
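
To make the data-correctness checkpoint concrete, here is a small in-memory model of what binding the delete sign into a unique-key target table achieves: rows with `__DORIS_DELETE_SIGN__` equal to 1 remove the key, and all other rows upsert it. This is a conceptual sketch, not Doris storage code; the function name `apply_changes` and the `id` key column are assumptions.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def apply_changes(table, rows, key="id"):
    """Apply CDC rows to a dict keyed by primary key (illustrative model)."""
    for row in rows:
        if row.get(DELETE_SIGN, 0) == 1:
            # A delete-sign row removes the key if it exists.
            table.pop(row[key], None)
        else:
            # Any other row inserts or overwrites the key.
            table[row[key]] = {k: v for k, v in row.items() if k != DELETE_SIGN}
    return table


if __name__ == "__main__":
    changes = [
        {"id": 1, "name": "a", DELETE_SIGN: 0},   # snapshot / insert
        {"id": 2, "name": "b", DELETE_SIGN: 0},   # insert
        {"id": 1, "name": "a2", DELETE_SIGN: 0},  # update
        {"id": 2, "name": "b", DELETE_SIGN: 1},   # delete
    ]
    print(apply_changes({}, changes))
```

Without the delete-sign column in the TVF output, the last row in this example would be indistinguishable from an upsert, and the deleted key would remain in the target.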

Subagent conclusions:

  • optimizer-rewrite: no candidate became an inline comment; suspicious ALTER/JDBC TVF rewrite points were dismissed with code evidence.
  • tests-session-config: no candidate became an inline comment; the existing missing-output concern is addressed by the current .out files.
  • Final convergence round: both live subagents returned NO_NEW_VALUABLE_FINDINGS for the same current ledger/comment set after the proposed final comment set was empty.
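
The two property-handling points in this review (the option survives the task-time offset rewrite, and it is stripped from the HTTP payload) can be sketched together. Everything here is hypothetical: the function names, the `offset` and `type` keys, and the sample offset value are invented for illustration and are not taken from JdbcTvfSourceOffsetProvider.

```python
# Assumption: keys that only affect the FE-side schema are not forwarded.
FE_ONLY_KEYS = frozenset({"include_delete_sign"})


def rewrite_tvf_params(properties, offset):
    """Return a copy of the TVF properties with a new offset (hypothetical key)."""
    rewritten = dict(properties)  # keep every original property
    rewritten["offset"] = offset
    return rewritten


def build_http_payload(properties):
    """Drop FE-only switches before sending properties to the CDC client."""
    return {k: v for k, v in properties.items() if k not in FE_ONLY_KEYS}


if __name__ == "__main__":
    props = {"type": "mysql", "include_delete_sign": "true"}
    task_props = rewrite_tvf_params(props, "example-offset")
    print(task_props)
    print(build_http_payload(task_props))
```

Copying the properties during the rewrite, and filtering only at the payload boundary, is what keeps the schema switch visible to the FE while the client request stays unchanged.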

@JNSimba

JNSimba commented Jul 3, 2026

Member Author

run external

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 0.00% (0/100) 🎉
Increment coverage report
Complete coverage report
