@JNSimba
Member

@JNSimba JNSimba commented Dec 10, 2025

What problem does this PR solve?

Issue Number: close #58896

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 10, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JNSimba JNSimba changed the title from "[Proposal] Extend streaming job to support MySQL synchronization" to "[Feature] Extend streaming job to support MySQL synchronization" Dec 10, 2025
@JNSimba JNSimba changed the title from "[Feature] Extend streaming job to support MySQL synchronization" to "[Feature](Streaming Job) Extend streaming job to support MySQL synchronization" Dec 10, 2025

Copilot AI left a comment

Pull request overview

This PR extends streaming jobs to support MySQL synchronization via CDC (Change Data Capture), enabling users to sync data from MySQL databases to Doris in real-time. The implementation includes a new CDC client service and modifications to the streaming job framework.

Key Changes:

  • Introduces a CDC client Spring Boot application that interfaces with MySQL using Flink CDC connectors
  • Adds support for FROM MySQL TO Database syntax in job creation
  • Implements split-based data reading for both snapshot and binlog phases
  • Adds RPC endpoints for BE-FE communication to handle CDC operations
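
The split-based reading mentioned in the list above can be pictured with a small sketch. It is illustrative only and is not taken from this PR: the names `SnapshotSplit`, `BinlogSplit`, and `plan_splits`, the integer primary-key chunking, and the sample binlog file name and position are all assumptions. The idea is that the snapshot phase is cut into bounded key ranges that can be read independently, and a single binlog split then picks up changes from a recorded offset.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SnapshotSplit:
    """Hypothetical bounded chunk of one table: low <= key < high."""
    table: str
    low: int
    high: int


@dataclass(frozen=True)
class BinlogSplit:
    """Hypothetical unbounded split that starts at a binlog offset."""
    binlog_file: str
    position: int


Split = Union[SnapshotSplit, BinlogSplit]


def plan_splits(table: str, min_key: int, max_key: int, chunk_size: int,
                binlog_file: str, position: int) -> List[Split]:
    """Return snapshot chunks covering [min_key, max_key], then one binlog split."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    splits: List[Split] = []
    low = min_key
    while low <= max_key:
        high = low + chunk_size
        splits.append(SnapshotSplit(table, low, high))
        low = high
    # The binlog split is planned last, after all snapshot chunks.
    splits.append(BinlogSplit(binlog_file, position))
    return splits


# Example values are made up for illustration.
plan = plan_splits("orders", 1, 10, 4, "mysql-bin.000001", 154)
for split in plan:
    print(split)
```

With keys 1 through 10 and a chunk size of 4, the plan holds three snapshot splits followed by one binlog split. A real CDC reader would also have to reconcile changes that arrive while a chunk is being read, which this sketch leaves out.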

Reviewed changes

Copilot reviewed 85 out of 85 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job.groovy` | Regression test for MySQL streaming job with CDC |
| `gensrc/proto/internal_service.proto` | Adds RPC interface for CDC client communication |
| `fs_brokers/cdc_client/**` | Complete CDC client implementation using Spring Boot |
| `fe/fe-core/.../streaming/**` | Extends streaming job framework with multi-table task support |
| `fe/fe-core/.../offset/jdbc/**` | JDBC offset provider for tracking MySQL binlog positions |
| `fe/fe-core/.../util/StreamingJobUtils.java` | Utility functions for streaming job management |
| `docker/thirdparties/docker-compose/mysql/my.cnf` | Enables MySQL binlog for CDC |

@JNSimba
Member Author

JNSimba commented Dec 21, 2025

run feut

@JNSimba
Member Author

JNSimba commented Dec 21, 2025

run check_coverage

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 34.13% (57/167) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 58.22% (20146/34603)
Line Coverage 43.94% (195530/444943)
Region Coverage 38.36% (155381/405063)
Branch Coverage 39.18% (66146/168812)
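
The percentages in these coverage reports are consistent with covered items divided by total items, shown to two decimal places. A minimal sketch of that arithmetic follows, assuming this is how the figures are derived; the helper name `coverage_percent` is hypothetical and not part of the CI tooling.

```python
def coverage_percent(covered: int, total: int) -> str:
    """Format covered/total as a percentage with two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{covered / total * 100:.2f}%"


# Counts copied from the report above.
print(coverage_percent(57, 167))        # increment line coverage
print(coverage_percent(20146, 34603))   # function coverage
```

Note that the increment figure has a much smaller denominator (167 changed lines) than the whole-codebase figures, so a handful of additional covered lines moves it by several points.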

@JNSimba
Member Author

JNSimba commented Dec 21, 2025

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.67% (1760/2209)
Line Coverage 65.37% (30949/47346)
Region Coverage 66.01% (15425/23368)
Branch Coverage 56.49% (8204/14522)

@doris-robot

TPC-H: Total hot run time: 35025 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 32e4834b31a5fa692fed1ca6e2a78baa20496ff0, data reload: false

------ Round 1 ----------------------------------
q1	17602	4274	4074	4074
q2	2012	364	229	229
q3	10198	1378	738	738
q4	10294	837	312	312
q5	8857	2146	1935	1935
q6	239	172	137	137
q7	1005	842	703	703
q8	9359	1469	1227	1227
q9	7281	5360	5384	5360
q10	6893	2382	1982	1982
q11	540	331	295	295
q12	716	736	578	578
q13	17795	3721	3038	3038
q14	288	294	276	276
q15	605	514	525	514
q16	704	690	616	616
q17	720	770	575	575
q18	8190	7086	7059	7059
q19	1501	973	622	622
q20	408	363	240	240
q21	4251	3947	3549	3549
q22	1051	1011	966	966
Total cold run time: 110509 ms
Total hot run time: 35025 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4240	4083	4087	4083
q2	337	410	309	309
q3	2158	2722	2237	2237
q4	1341	1768	1314	1314
q5	4499	4637	4744	4637
q6	224	181	136	136
q7	2088	1986	1786	1786
q8	2642	2485	2521	2485
q9	7678	7559	7522	7522
q10	3096	3232	2844	2844
q11	595	524	576	524
q12	711	760	615	615
q13	3547	4024	3311	3311
q14	344	287	307	287
q15	558	517	546	517
q16	676	707	630	630
q17	1250	1467	1375	1375
q18	7836	7652	7613	7613
q19	818	795	808	795
q20	1885	1964	1795	1795
q21	4692	4326	4195	4195
q22	1105	1028	961	961
Total cold run time: 52320 ms
Total hot run time: 49971 ms
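
The rows above carry no column headers. Judging from the data, the four numbers per query appear to be one cold run, two hot runs, and the faster of the two hot runs, all in milliseconds; this reading is inferred, not documented in the report. Under that reading, summing the first and last columns of Round 1 reproduces the reported totals of 110509 ms and 35025 ms. The sketch below shows the calculation on three sample rows; the function name `summarize` is hypothetical.

```python
from typing import Tuple

# Three Round 1 rows copied from the report: query name, then four timings in ms.
SAMPLE = """\
q1\t17602\t4274\t4074\t4074
q2\t2012\t364\t229\t229
q6\t239\t172\t137\t137
"""


def summarize(report: str) -> Tuple[int, int]:
    """Return (cold_total, hot_total) for rows shaped: name cold hot1 hot2 best."""
    cold_total = 0
    hot_total = 0
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip banners, totals, and blank lines
        cold, hot1, hot2, best = (int(value) for value in fields[1:])
        # Assumption: the last column is the faster of the two hot runs.
        if best != min(hot1, hot2):
            raise ValueError(f"unexpected row layout: {line!r}")
        cold_total += cold
        hot_total += best
    return cold_total, hot_total


cold_ms, hot_ms = summarize(SAMPLE)
print(f"cold: {cold_ms} ms, hot: {hot_ms} ms")
```

Because the hot total uses the better of two runs per query, it is less sensitive to one-off noise than the cold total, which comes from a single run.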

@doris-robot

TPC-DS: Total hot run time: 178510 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 32e4834b31a5fa692fed1ca6e2a78baa20496ff0, data reload: false

query5	4512	576	429	429
query6	323	218	236	218
query7	4225	470	283	283
query8	321	281	261	261
query9	8790	2523	2517	2517
query10	531	377	325	325
query11	15214	15121	14868	14868
query12	174	121	119	119
query13	1272	518	388	388
query14	6145	3076	2767	2767
query14_1	2653	2666	2642	2642
query15	210	202	175	175
query16	790	468	468	468
query17	1154	710	609	609
query18	2508	450	358	358
query19	238	236	212	212
query20	125	117	112	112
query21	225	144	115	115
query22	4037	3982	3898	3898
query23	16659	16217	15931	15931
query23_1	16076	16086	15913	15913
query24	7311	1670	1207	1207
query24_1	1277	1252	1239	1239
query25	580	494	446	446
query26	1255	271	165	165
query27	2744	459	310	310
query28	4500	2121	2114	2114
query29	824	559	466	466
query30	312	251	228	228
query31	862	708	616	616
query32	79	73	70	70
query33	550	350	299	299
query34	896	911	543	543
query35	803	811	738	738
query36	870	967	844	844
query37	128	91	75	75
query38	2788	2854	2823	2823
query39	757	737	699	699
query39_1	706	713	693	693
query40	218	135	120	120
query41	71	64	63	63
query42	111	102	108	102
query43	428	449	385	385
query44	1324	748	736	736
query45	189	193	186	186
query46	868	977	606	606
query47	1679	1743	1624	1624
query48	320	325	238	238
query49	619	431	340	340
query50	666	295	216	216
query51	3826	3767	3810	3767
query52	104	108	96	96
query53	333	357	288	288
query54	276	247	242	242
query55	78	74	70	70
query56	298	299	290	290
query57	1154	1123	1105	1105
query58	265	252	245	245
query59	2513	2503	2397	2397
query60	314	304	290	290
query61	194	158	153	153
query62	689	674	625	625
query63	325	291	307	291
query64	4903	1289	988	988
query65	4003	3959	3949	3949
query66	1435	423	312	312
query67	15398	14904	14938	14904
query68	7250	983	718	718
query69	499	349	306	306
query70	1070	970	947	947
query71	373	297	274	274
query72	6024	4913	5008	4913
query73	688	595	307	307
query74	8960	8772	8817	8772
query75	3203	3133	2786	2786
query76	3913	1116	727	727
query77	512	384	290	290
query78	9504	9552	8888	8888
query79	1705	873	616	616
query80	931	681	555	555
query81	575	275	235	235
query82	398	129	100	100
query83	260	249	239	239
query84	258	123	103	103
query85	918	499	464	464
query86	425	309	288	288
query87	3076	3063	2998	2998
query88	3300	2297	2311	2297
query89	468	415	392	392
query90	2102	164	155	155
query91	169	166	148	148
query92	77	69	65	65
query93	1500	903	560	560
query94	528	303	280	280
query95	581	332	296	296
query96	604	472	208	208
query97	2243	2319	2227	2227
query98	207	196	192	192
query99	1257	1278	1231	1231
Total cold run time: 259409 ms
Total hot run time: 178510 ms

@doris-robot

ClickBench: Total hot run time: 28.35 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 32e4834b31a5fa692fed1ca6e2a78baa20496ff0, data reload: false

query1	0.06	0.04	0.04
query2	0.14	0.07	0.07
query3	0.33	0.08	0.09
query4	1.60	0.10	0.10
query5	0.28	0.25	0.25
query6	1.18	0.66	0.64
query7	0.03	0.03	0.03
query8	0.07	0.06	0.07
query9	0.59	0.52	0.51
query10	0.56	0.57	0.56
query11	0.26	0.13	0.13
query12	0.26	0.14	0.14
query13	0.63	0.63	0.61
query14	1.00	0.99	1.00
query15	0.88	0.81	0.83
query16	0.40	0.39	0.39
query17	1.05	1.02	1.04
query18	0.25	0.23	0.22
query19	1.87	1.85	1.88
query20	0.02	0.01	0.02
query21	15.39	0.28	0.24
query22	4.99	0.10	0.09
query23	15.43	0.40	0.22
query24	2.37	0.47	0.29
query25	0.10	0.09	0.10
query26	0.18	0.16	0.17
query27	0.10	0.09	0.09
query28	3.78	1.36	1.16
query29	12.54	4.15	3.28
query30	0.33	0.13	0.11
query31	2.80	0.67	0.43
query32	3.23	0.59	0.49
query33	3.05	3.01	3.04
query34	17.00	5.14	4.58
query35	4.63	4.66	4.63
query36	0.63	0.50	0.48
query37	0.24	0.08	0.08
query38	0.20	0.05	0.06
query39	0.07	0.05	0.05
query40	0.20	0.17	0.14
query41	0.12	0.07	0.07
query42	0.08	0.05	0.05
query43	0.06	0.05	0.05
Total cold run time: 98.98 s
Total hot run time: 28.35 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 88.60% (101/114) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.46% (18927/35401)
Line Coverage 39.32% (175449/446220)
Region Coverage 33.86% (135744/400871)
Branch Coverage 34.77% (58495/168234)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 34.73% (58/167) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 58.22% (20146/34603)
Line Coverage 43.94% (195517/444943)
Region Coverage 38.36% (155396/405063)
Branch Coverage 39.18% (66142/168812)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 79.04% (132/167) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 64.36% (22269/34603)
Line Coverage 50.78% (225944/444943)
Region Coverage 45.63% (184836/405063)
Branch Coverage 46.88% (79134/168812)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 69.51% (766/1102) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 79.04% (132/167) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 64.38% (22276/34603)
Line Coverage 50.79% (225990/444943)
Region Coverage 45.64% (184889/405063)
Branch Coverage 46.89% (79158/168812)

@JNSimba
Member Author

JNSimba commented Dec 21, 2025

run external

@JNSimba
Member Author

JNSimba commented Dec 22, 2025

run external

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Dec 22, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@JNSimba JNSimba merged commit b7fb2bb into apache:master Dec 22, 2025
27 of 29 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 22, 2025
…onization (#58898)

### What problem does this PR solve?

Issue Number: close #58896
JNSimba added a commit that referenced this pull request Dec 23, 2025
…onization (#58898)

### What problem does this PR solve?

Issue Number: close #58896
yiguolei pushed a commit that referenced this pull request Dec 26, 2025
…MySQL synchronization #58898 (#59228)

Cherry-picked from #58898

Co-authored-by: wudi <wudi@selectdb.com>

Labels

approved, dev/4.0.3-merged, reviewed

Development

Successfully merging this pull request may close these issues.

[Proposal] Extend streaming job to support MySQL synchronization

9 participants