[improve](streaming-job) async chunk splitting for cdc source job by JNSimba · Pull Request #63079 · apache/doris

JNSimba · 2026-05-08T09:38:05Z

Summary

StreamingInsertJob (CDC FROM-TO and cdc_stream TVF paths) used to call splitChunks() synchronously inside CREATE STREAMING JOB, asking cdc_client to cut every chunk of every table before returning. On large tables, or tables with non-uniform primary keys, this can take 30+ minutes, far beyond the 60s BRPC timeout between BE and cdc_client, and the SQL client blocks the whole time.

This PR makes splitting tick-driven by the FE scheduler:

CREATE returns immediately; no more synchronous splitChunks().
On each scheduler tick, advanceSplits() issues one short fetchSplits RPC (default batchSize=100) and pushes that batch into remainingSplits. Tasks dispatch as soon as the first batch lands, so end-to-end first-byte latency stays close to flink-cdc's.
cdc_client is stateless: every RPC reconstructs ChunkSplitter from the (currentSplittingTable, nextSplitStart, nextSplitId) triple supplied by FE. flink-cdc internals are untouched (only the public ChunkSplitter API is used).
Crash recovery state lives in three places:
- editlog persists committedSplitProgress (3-field SplitProgress) plus the existing chunkHighWatermarkMap / binlogOffsetPersist
- streaming_job_meta system table holds the full chunk_list JSON per table (UPSERT on each advanceSplits call)
- cdc_client memory holds nothing
Both FROM-TO (multi-table) and TVF (single-table) paths share the same SourceOffsetProvider#initSplitProgress / noMoreSplits / advanceSplits interface; StreamingJobSchedulerTask.handlePendingState pre-advances one batch so the first task doesn't wait a full max_interval.
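
The tick-driven flow described above can be illustrated with a minimal, self-contained sketch. It is not the Doris implementation: the class `AsyncSplitSketch`, the numeric single-column key, the `rowCount`/`chunkSize` parameters, and the in-process `fetchSplits` stand-in for the RPC are all assumptions made for illustration. Only the cursor triple (currentSplittingTable, nextSplitStart, nextSplitId), remainingSplits, advanceSplits, and noMoreSplits are names taken from this description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of tick-driven, resumable chunk splitting; not the Doris classes. */
final class AsyncSplitSketch {

    /** One chunk [start, end) of a table; end == null marks the table's last chunk. */
    static final class Split {
        final String table;
        final int id;
        final long start;
        final Long end;

        Split(String table, int id, long start, Long end) {
            this.table = table;
            this.id = id;
            this.start = start;
            this.end = end;
        }
    }

    // Resume cursor owned by the FE side and sent with every fetch.
    String currentSplittingTable; // null: no table is being split right now
    Long nextSplitStart;          // null: start at the beginning of the table
    int nextSplitId;

    final Deque<String> remainingTables = new ArrayDeque<>();
    final List<Split> remainingSplits = new ArrayList<>();

    /** Stand-in for the stateless client: cuts at most batchSize chunks from the cursor. */
    static List<Split> fetchSplits(String table, long rowCount, long chunkSize,
                                   Long start, int nextId, int batchSize) {
        List<Split> batch = new ArrayList<>();
        long cursor = (start == null) ? 0L : start;
        while (batch.size() < batchSize) {
            long end = cursor + chunkSize;
            boolean last = end >= rowCount;
            Long endOrNull = last ? null : Long.valueOf(end);
            batch.add(new Split(table, nextId++, cursor, endOrNull));
            if (last) {
                break;
            }
            cursor = end;
        }
        return batch;
    }

    boolean noMoreSplits() {
        return currentSplittingTable == null && remainingTables.isEmpty();
    }

    /** One scheduler tick: fetch one batch, queue it, and move the cursor forward. */
    void advanceSplits(long rowCount, long chunkSize, int batchSize) {
        if (currentSplittingTable == null) {
            if (remainingTables.isEmpty()) {
                return;
            }
            currentSplittingTable = remainingTables.poll();
            nextSplitStart = null;
            nextSplitId = 0;
        }
        List<Split> batch = fetchSplits(currentSplittingTable, rowCount, chunkSize,
                nextSplitStart, nextSplitId, batchSize);
        if (batch.isEmpty()) {
            return; // nothing fetched; the cursor stays where it was
        }
        remainingSplits.addAll(batch);
        Split tail = batch.get(batch.size() - 1);
        if (tail.end == null) {
            currentSplittingTable = null; // table finished; next tick picks the next table
            nextSplitStart = null;
        } else {
            nextSplitStart = tail.end;
            nextSplitId = tail.id + 1;
        }
    }
}
```

With one table of 100 rows, a chunk size of 5, and a batch size of 8, three ticks produce 8, 8, and 4 splits, and the cursor after the first tick is (t1, 40, 8). Because the fetch function receives the whole cursor as arguments, it needs no memory of earlier calls.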

Detailed design lives in the linked plan.

Changes

fe-common: FetchTableSplitsRequest adds nextSplitStart (Object[]) / nextSplitId / batchSize.
fe-core:
- SourceOffsetProvider adds 3 default methods: initSplitProgress / advanceSplits / noMoreSplits.
- JdbcSourceOffsetProvider implements the async state machine (committed/cdc SplitProgress, advanceSplits, dedup, system-table UPSERT, replay path).
- JdbcTvfSourceOffsetProvider.initOnCreate no longer pre-splits; relies on the same scheduler tick path.
- StreamingInsertJob carries syncTables (@SerializedName("st")); initSourceJob / initInsertJob initialize SplitProgress; advanceSplitsIfNeed() mirrors fetchMeta error handling (PAUSE on failure).
- StreamingJobSchedulerTask.handlePendingState / handleRunningState call advanceSplitsIfNeed() each tick; PENDING handler pre-advances and short-circuits if PAUSED.
- StreamingJobUtils.upsertChunkList covers id-allocation via MAX(id)+1 lookup.
cdc_client/JdbcIncrementalSourceReader: getSourceSplits() rebuilt around the public ChunkSplitter API (no more in-memory loop / reflection hack).
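
As a rough illustration of the pause-on-failure behavior and the PENDING pre-advance listed above, the sketch below uses hypothetical types (`AdvanceSplitsGuardSketch`, `SplitAdvancer`, a three-value `JobStatus`, and a `failureReason` field). The real StreamingInsertJob and StreamingJobSchedulerTask carry far more state; only the method names advanceSplitsIfNeed, handlePendingState, advanceSplits, and noMoreSplits come from this PR.

```java
/** Hypothetical sketch of "pause the job when a split fetch fails"; not the Doris classes. */
final class AdvanceSplitsGuardSketch {

    enum JobStatus { PENDING, RUNNING, PAUSED }

    /** Minimal stand-in for the split-related part of a source offset provider. */
    interface SplitAdvancer {
        void advanceSplits() throws Exception;

        boolean noMoreSplits();
    }

    JobStatus status = JobStatus.PENDING;
    String failureReason;

    /** Called on every tick; a failed fetch pauses the job instead of propagating. */
    void advanceSplitsIfNeed(SplitAdvancer provider) {
        if (status == JobStatus.PAUSED || provider.noMoreSplits()) {
            return;
        }
        try {
            provider.advanceSplits();
        } catch (Exception e) {
            status = JobStatus.PAUSED;
            failureReason = e.getMessage();
        }
    }

    /** PENDING handler: pre-advance one batch, then short-circuit if that paused the job. */
    boolean handlePendingState(SplitAdvancer provider) {
        advanceSplitsIfNeed(provider);
        if (status == JobStatus.PAUSED) {
            return false; // do not dispatch a task for a paused job
        }
        status = JobStatus.RUNNING;
        return true;
    }
}
```

Catching the failure inside the tick keeps one bad RPC from taking down the scheduler loop, while the PAUSED state makes the failure visible on the job itself.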

Tests

SplitProgressTest — copy/null-field semantics.
JdbcSourceOffsetProviderAsyncSplitTest — covers advanceSplits (first call / continue same table / cross-table switch / dedup / empty batch), noMoreSplits, updateOffset committed-progress advancement (mid-chunk vs last chunk vs replay missing-split path), and computeCdcRemainingTables.
Regression case (separate commit, not in this PR yet): test_streaming_postgres_job_async_split.groovy — 100 rows × snapshot_split_size=5 → 20 splits across multiple ticks; asserts CREATE returns < 30s, full snapshot count + DISTINCT id, then INSERT/UPDATE/DELETE in binlog phase.

Test plan

mvn test -pl fe/fe-core -Dtest=JdbcSourceOffsetProviderAsyncSplitTest,SplitProgressTest
Run test_streaming_postgres_job_async_split regression locally
PG/MySQL non-uniform PK large-table manual test: confirm CREATE returns in seconds, SHOW STREAMING JOB immediately reflects the new job, snapshot completes, binlog phase healthy
FE restart mid-snapshot: confirm cdc-side resumes from system-table position, no duplicate / lost rows
cdc_client kill mid-snapshot: confirm FE retries on next tick, no duplicate / lost rows
cdc_stream TVF + StreamingInsertJob path: confirm CREATE no longer blocks

Copilot

Pull request overview

This PR moves StreamingInsertJob (CDC FROM-TO and cdc_stream TVF) snapshot chunk splitting from a synchronous CREATE STREAMING JOB path to an incremental, scheduler-tick-driven flow. The goal is to avoid long blocking CREATE times and BRPC timeouts on large / skewed PK tables by fetching snapshot splits in small batches and persisting progress for recovery.

Changes:

Adds split-progress APIs to SourceOffsetProvider and implements an async split state machine in JdbcSourceOffsetProvider (plus new FE tests).
Introduces FetchTableSplitsRequest fields to drive stateless, resumable split generation (nextSplitStart/nextSplitId/batchSize) and rebuilds cdc_client split fetching around flink-cdc ChunkSplitter.
Persists per-table chunk lists incrementally via StreamingJobUtils.upsertChunkList, and advances splits each scheduler tick (including a pre-advance in PENDING).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

File	Description
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/reader/JdbcIncrementalSourceReader.java	Reworks `/api/fetchSplits` handling to drive flink-cdc `ChunkSplitter` directly (stateless batch split generation).
fe/fe-core/src/test/java/org/apache/doris/job/offset/jdbc/SplitProgressTest.java	Unit tests for `SplitProgress` default state and deep-copy semantics.
fe/fe-core/src/test/java/org/apache/doris/job/offset/jdbc/JdbcSourceOffsetProviderAsyncSplitTest.java	Unit tests covering async split advancement, dedup, noMoreSplits, and committed-progress advancement.
fe/fe-core/src/main/java/org/apache/doris/job/util/StreamingJobUtils.java	Adds per-table chunk_list UPSERT support with id reuse / allocation.
fe/fe-core/src/main/java/org/apache/doris/job/offset/SourceOffsetProvider.java	Adds default split-progress hooks (`initSplitProgress`, `advanceSplits`, `noMoreSplits`).
fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcTvfSourceOffsetProvider.java	Removes create-time pre-splitting; re-init split progress on replay; relies on scheduler-driven split fetching.
fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcSourceOffsetProvider.java	Implements async split progress, scheduler-driven split fetching, persistence to system table, and restart replay logic.
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingJobSchedulerTask.java	Calls `advanceSplitsIfNeed()` each tick and pre-advances once in PENDING before dispatch.
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java	Persists `syncTables`, initializes split progress on CREATE, and adds `advanceSplitsIfNeed()` that pauses job on failure.
fe/fe-common/src/main/java/org/apache/doris/job/cdc/request/FetchTableSplitsRequest.java	Adds `nextSplitStart`, `nextSplitId`, and `batchSize` fields to support resumable batched split fetching.

Comments suppressed due to low confidence (1)

fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcTvfSourceOffsetProvider.java:320

replayIfNeed() comment still says snapshot splits in the meta table are "written by initOnCreate", but initOnCreate() is now an intentional no-op and meta writes come from scheduler-driven advanceSplits()/upsertChunkList. Updating this comment will avoid confusion when debugging TVF recovery behavior.

        // Re-init transient split progress fields lost across FE restart.
        // syncTables itself is persisted on StreamingInsertJob; cdcSplitProgress is rebuilt empty
        // here and advanceSplits will resume from the system table on next tick.
        if (cdcSplitProgress == null) {
            initSplitProgress(job.getSyncTables());
        }
        if (currentOffset == null) {
            // No committed txn yet. If snapshot splits exist in the meta table (written by
            // initOnCreate), restore remainingSplits so getNextOffset() returns snapshot splits
            // instead of a BinlogSplit (which would incorrectly skip the snapshot phase).

JNSimba · 2026-05-11T06:06:36Z

/review

github-actions

I found one additional blocking issue beyond the existing review threads.

Critical checkpoint conclusions:

Goal/test coverage: the PR moves CDC chunk splitting from CREATE-time blocking to scheduler-driven async fetching and adds focused unit tests, but the TVF FE-restart path is not covered and can stop fetching the remaining chunks.
Scope: the change is focused on async split progress, though it touches both FROM-TO and TVF paths.
Concurrency: existing threads already cover the broad splitsLock/RPC concern; I did not add a duplicate. The new issue is lifecycle/replay state, not a new lock-order finding.
Lifecycle/replay: blocking issue found for cdc_stream TVF restart because transient cachedSyncTables is not restored.
Configuration/compatibility: no new config or wire-incompatible persistent format issue found beyond the added optional request fields.
Parallel paths: FROM-TO replay initializes split progress, but the TVF override does not do the equivalent unconditionally.
Testing: unit tests cover the base provider state machine, but miss TVF replay after FE restart. No end-to-end regression result is included in this PR.
Observability/performance/transactions: no additional distinct issue found beyond existing review threads.
User focus: no additional user-provided review focus was supplied.

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

JNSimba · 2026-05-12T11:07:50Z

run buildall

JNSimba · 2026-05-12T11:08:09Z

/review

github-actions

Reviewed the full PR diff and existing review threads. I found two additional correctness issues that are not duplicates of the existing comments.

Critical checkpoint conclusions:

Goal/test: The PR moves CDC snapshot split generation to scheduler-driven batches and adds tests for the base async state machine, but the cdc_stream TVF path still has an untested concurrent update path and the split-key type cache lacks collision coverage.
Scope/focus: The implementation is mostly focused, but it now routes both non-TVF and TVF CDC through shared async splitting state, which exposes subclass state-management gaps.
Concurrency: Blocking issue found. JdbcTvfSourceOffsetProvider.updateOffset() mutates inherited split lists/maps without splitsLock while scheduler-driven advanceSplits() and inherited readers use that lock.
Lifecycle/restart: Existing review threads already cover TVF restart/cache restoration concerns; I did not duplicate them.
Config: New CDC RPC/batch configs are present with descriptions and are read dynamically.
Compatibility/protocol: FE now sends batchSize and resume fields; existing threads already cover composite-boundary contract concerns.
Parallel paths: MySQL and generic JDBC split fetching were both updated; the cache-key collision issue affects the generic JDBC path and should be considered for MySQL if chunk key/database identity can vary.
Tests: Added tests cover base provider state, but no TVF concurrency/restart continuation test or cache-collision test covers the issues below.
Observability: Logs were added around split fetch and progress; no additional blocker found.
Persistence/transactions: Snapshot progress is persisted through the meta table/EditLog or txn attachment depending on path; existing unresolved threads cover restart truncation risks.
Performance: Existing review threads already cover long lock holds and repeated sorting; I did not duplicate them.

User focus: no additional user-provided review focus was supplied.
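
To make the concurrency finding concrete, here is a minimal sketch of the invariant the review asks for: every writer of the shared split state takes the same lock. The class `SplitStateLockSketch` and its string split ids are assumptions for illustration; the real providers track richer split objects, and the actual fix may look different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: both writers of the split state share one lock. */
final class SplitStateLockSketch {
    private final Object splitsLock = new Object();
    private final List<String> remainingSplits = new ArrayList<>();
    private final Map<String, Long> chunkHighWatermarkMap = new HashMap<>();

    /** Scheduler path: append a freshly fetched batch. */
    void advanceSplits(List<String> batch) {
        synchronized (splitsLock) {
            remainingSplits.addAll(batch);
        }
    }

    /** Commit path: must use the same lock, or it races with advanceSplits. */
    boolean updateOffset(String splitId, long highWatermark) {
        synchronized (splitsLock) {
            if (!remainingSplits.remove(splitId)) {
                return false; // unknown or already finished split
            }
            chunkHighWatermarkMap.put(splitId, highWatermark);
            return true;
        }
    }

    int remainingCount() {
        synchronized (splitsLock) {
            return remainingSplits.size();
        }
    }

    int finishedCount() {
        synchronized (splitsLock) {
            return chunkHighWatermarkMap.size();
        }
    }
}
```

A subclass that overrides the commit path without taking the lock would reintroduce the race, which is why the lock has to cover the override as well as the base implementation.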

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

…k splitting

JNSimba · 2026-05-12T12:20:42Z

run buildall

hello-stephen · 2026-05-12T12:39:38Z

FE UT Coverage Report

Increment line coverage 42.86% (111/259) 🎉
Increment coverage report
Complete coverage report

… fix unevenly-cut resume

JNSimba · 2026-05-12T12:57:05Z

/review

github-actions

Automated review result: request changes. I did not add new inline comments because the blocking findings I confirmed are already covered by existing review threads and should not be duplicated.

Critical checkpoint conclusions:

Goal/test proof: The PR aims to move CDC snapshot split generation to scheduler-driven batches and add resume/restart coverage. Tests were added, but current code still has unresolved correctness and concurrency concerns already raised in inline threads, including snapshot truncation/races and split-fetch batching behavior.
Scope/focus: The feature is focused, but the change is broad across FE scheduling, persisted split metadata, CDC client split generation, and TVF recovery; several edge paths still need tightening before merge.
Concurrency: Not OK. Existing threads cover split-state access/locking issues and long blocking RPC/internal SQL work while holding splitsLock. These can delay task commits/scheduling or race with TVF updateOffset paths.
Lifecycle/restart: Not OK. Existing TVF restart comments cover mid-snapshot replay/resume hazards where transient split progress/table cache reconstruction can truncate snapshot progress after restored chunks are consumed.
Configuration: A new dynamic FE batch-size config is read at split-fetch time, which is appropriate, but existing comments note the batch contract/enforcement is still incomplete in CDC client paths.
Compatibility/storage format: No new external storage format incompatibility found beyond the new persisted split-progress fields, but replay behavior must be correct across FE restart before this is safe.
Parallel paths: Not fully OK. MySQL, generic JDBC/Postgres, non-TVF, and cdc_stream TVF paths all need consistent batch limits, startup-mode handling, and resume semantics; existing threads identify gaps.
Conditional checks/error handling: Some existing concerns remain around splitId parsing, composite split boundaries, SQL literal construction, and error clarity.
Test coverage/results: Regression and unit tests were added, but coverage does not yet prove the unresolved edge cases from existing threads are fixed; no tests were run by this review.
Observability: Logging was added for split fetch/progress, sufficient for basic diagnosis, but correctness blockers remain.
Transaction/persistence/data correctness: Not OK until the existing replay, upsert, and split-progress concerns are resolved; these can affect snapshot completeness/duplicates after restart or retries.
FE/BE variable passing: The new batchSize field is passed from FE now, but existing comments show downstream enforcement still needs work.
Performance: Existing comments cover heavy work under splitsLock and repeated full-list sorting/materialization; these are relevant for large split counts.

User focus: No additional user-provided review focus was supplied.

Existing inline review threads should be addressed rather than duplicated here, especially the split-state concurrency/locking, TVF restart recovery, SQL upsert safety, batch-size enforcement, split-key/composite-boundary handling, and split metadata determinism issues.

Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

…plits and sink resolveSplitKeyClass to PG reader

github-actions

I reviewed the current head against the existing review threads and did not add duplicate inline comments. I cannot approve this revision yet because the already-open review threads still cover blocking correctness/reliability issues in the async CDC split path, including restart/resume cursor reconstruction, empty/invalid split-batch handling, durable split-meta consistency, unsafe meta-table SQL/upsert semantics, batch-size enforcement, and regression-test races.

Checkpoint conclusions:

Goal/test proof: the PR aims to make CDC snapshot split generation asynchronous and adds unit/regression coverage, but the existing unresolved threads show important restart, resume, and batching cases are not yet proven safe.
Scope/clarity: the feature is focused, but it changes FE scheduling, durable meta, cdc_client split generation, and TVF replay paths together, increasing integration risk.
Concurrency: split state is now lock-protected in many FE paths, but existing threads identify remaining scheduler/commit/replay races and long-running state transitions that need resolution.
Lifecycle/restart: FE restart and checkpoint replay are central to this change; existing comments identify cases where restored split progress can truncate or duplicate snapshots.
Configuration: the new mutable split batch-size config still needs positive-value validation to avoid wedging split advancement.
Compatibility/storage: no storage-format compatibility issue was found, but the new internal meta-table persistence/upsert semantics remain a correctness concern.
Parallel code paths: both multi-table JDBC and cdc_stream TVF paths were reviewed; existing comments already cover distinct TVF and non-TVF recovery gaps.
Conditional checks/error handling: several new checks exist, but existing comments call out empty fetch results and malformed/invalid state needing fail-fast or bounded retry behavior.
Test coverage/results: coverage is broad, but existing test-race comments and missing targeted restart/error tests remain blockers.
Observability/performance: logs were added, but existing comments still cover scheduler overhead/lock and retry-observability concerns.
Data correctness: unresolved issues can lead to duplicate, skipped, or truncated snapshot chunks, so the core data-correctness checkpoint is not satisfied yet.

User focus: no additional user-provided review focus was specified.
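
The positive-value validation mentioned under Configuration could be as small as the following sketch. The helper name `effectiveBatchSize` and the upper bound are assumptions; only the default of 100 comes from the PR description.

```java
/** Hypothetical sketch of clamping a mutable batch-size config before it reaches the RPC. */
final class BatchSizeSketch {
    static final int DEFAULT_BATCH_SIZE = 100; // default stated in the PR summary
    static final int MAX_BATCH_SIZE = 10_000;  // assumed upper bound, for illustration only

    /** Returns a batch size that always lets split advancement make progress. */
    static int effectiveBatchSize(int configured) {
        if (configured <= 0) {
            return DEFAULT_BATCH_SIZE; // zero or negative would fetch nothing on every tick
        }
        return Math.min(configured, MAX_BATCH_SIZE);
    }
}
```

Falling back to the default rather than throwing keeps a bad runtime change from wedging a job that is already mid-snapshot. Rejecting the value at config-set time would be a stricter alternative.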

hello-stephen · 2026-05-15T05:04:01Z

TPC-H: Total hot run time: 29397 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit aae059f3a34ec601fe1dcf27ece4c7f2afee36f9, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17862	3906	3806	3806
q2	q3	10725	864	593	593
q4	4664	464	344	344
q5	7452	1321	1128	1128
q6	181	164	138	138
q7	912	934	759	759
q8	9321	1408	1262	1262
q9	5605	5394	5294	5294
q10	6241	2072	1818	1818
q11	457	264	254	254
q12	667	422	295	295
q13	18146	3265	2741	2741
q14	295	280	264	264
q15	q16	890	870	793	793
q17	930	973	704	704
q18	6482	5662	5584	5584
q19	1332	1326	1042	1042
q20	555	399	264	264
q21	5165	2460	1967	1967
q22	472	383	347	347
Total cold run time: 98354 ms
Total hot run time: 29397 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4657	4610	4744	4610
q2	q3	4650	4801	4177	4177
q4	2122	2170	1397	1397
q5	4972	5040	5235	5040
q6	206	167	136	136
q7	2028	1784	1618	1618
q8	3336	3150	3156	3150
q9	8497	8528	8418	8418
q10	4504	4496	4368	4368
q11	585	418	399	399
q12	711	759	513	513
q13	3297	3716	2836	2836
q14	442	325	286	286
q15	q16	798	804	733	733
q17	1314	1320	1292	1292
q18	7934	7006	7190	7006
q19	1193	1147	1160	1147
q20	2228	2216	1966	1966
q21	6186	5591	4927	4927
q22	544	515	427	427
Total cold run time: 60204 ms
Total hot run time: 54446 ms

hello-stephen · 2026-05-15T05:14:55Z

TPC-DS: Total hot run time: 169833 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit aae059f3a34ec601fe1dcf27ece4c7f2afee36f9, data reload: false

query5	4308	669	528	528
query6	312	213	205	205
query7	4211	584	300	300
query8	319	232	216	216
query9	8813	4045	4059	4045
query10	451	345	303	303
query11	5792	2384	2194	2194
query12	180	128	125	125
query13	1299	632	415	415
query14	5998	5474	5046	5046
query14_1	4376	4353	4372	4353
query15	219	205	183	183
query16	1086	441	423	423
query17	1144	772	641	641
query18	2627	510	401	401
query19	216	204	156	156
query20	150	139	134	134
query21	222	144	122	122
query22	13583	13549	13450	13450
query23	17193	16369	16044	16044
query23_1	16182	16165	16135	16135
query24	7360	1750	1320	1320
query24_1	1342	1360	1371	1360
query25	561	486	432	432
query26	1298	313	170	170
query27	2724	612	331	331
query28	4428	1961	1966	1961
query29	978	622	523	523
query30	290	224	199	199
query31	1106	1072	925	925
query32	89	75	74	74
query33	534	344	286	286
query34	1183	1109	664	664
query35	754	783	683	683
query36	1308	1326	1124	1124
query37	155	106	96	96
query38	3241	3154	3060	3060
query39	917	914	894	894
query39_1	878	908	888	888
query40	237	154	135	135
query41	66	64	63	63
query42	111	108	109	108
query43	334	326	295	295
query44	
query45	210	201	192	192
query46	1061	1246	735	735
query47	2305	2275	2116	2116
query48	383	431	278	278
query49	633	517	435	435
query50	725	292	230	230
query51	4296	4263	4209	4209
query52	104	110	97	97
query53	256	292	210	210
query54	322	287	252	252
query55	96	92	85	85
query56	324	334	312	312
query57	1430	1390	1317	1317
query58	319	277	284	277
query59	1578	1632	1430	1430
query60	345	350	339	339
query61	186	179	177	177
query62	669	625	573	573
query63	245	208	216	208
query64	2505	888	757	757
query65	
query66	1732	530	414	414
query67	30063	29937	29319	29319
query68	
query69	477	359	311	311
query70	1093	1041	929	929
query71	297	275	271	271
query72	2825	2756	2378	2378
query73	852	805	441	441
query74	5081	4905	4744	4744
query75	2749	2654	2330	2330
query76	2308	1162	843	843
query77	434	421	343	343
query78	12962	12967	12416	12416
query79	1485	1024	734	734
query80	1359	584	492	492
query81	537	286	242	242
query82	1046	166	121	121
query83	372	280	255	255
query84	258	147	140	140
query85	1037	500	441	441
query86	454	338	353	338
query87	3435	3323	3216	3216
query88	3597	2678	2669	2669
query89	439	394	338	338
query90	1942	178	178	178
query91	175	161	141	141
query92	78	78	76	76
query93	1078	977	558	558
query94	728	337	292	292
query95	671	479	343	343
query96	1030	763	318	318
query97	2685	2687	2547	2547
query98	234	229	226	226
query99	1089	1137	991	991
Total cold run time: 254149 ms
Total hot run time: 169833 ms

hello-stephen · 2026-05-15T07:31:01Z

FE Regression Coverage Report

Increment line coverage 4.40% (14/318) 🎉
Increment coverage report
Complete coverage report

…ed to binlog

…F sourceProperties is restored

JNSimba · 2026-05-15T08:13:25Z

/review

JNSimba · 2026-05-15T08:14:08Z

run buildall

…start regression case

JNSimba · 2026-05-15T08:18:00Z

run buildall

github-actions

I cannot approve this PR yet. The current head still needs the existing blocking async CDC split review threads resolved, especially the snapshot replay/transition cases, non-progressing empty split batches, batch-size/config validation, and meta-table persistence/escaping/id-allocation concerns. I did not add duplicate inline comments because those issues are already present in the existing review context.

Critical checkpoint conclusions:

Goal/test proof: the goal is scheduler-driven async CDC snapshot split generation and recovery. The added tests cover several paths, but existing threads identify restart and scheduler interleaving cases that are not yet safely proven.
Scope/focus: the change is broad across FE scheduling, offset replay, cdc_client split generation, internal meta persistence, and regression tests; it is not a purely local change.
Concurrency: split state is now lock-protected in several places and long RPC/UPSERT work was moved outside the lock, but existing scheduler/task success interleavings and empty-batch progress behavior remain blocking concerns.
Lifecycle/replay: FE restart replay is the highest-risk part; existing comments describe cases where restored binlog/snapshot/meta state can re-cut, truncate, or duplicate snapshot work.
Configuration: the new mutable split batch size is still used as an RPC batch size and needs positive-value validation/clamping per the existing thread.
Compatibility: no storage-format compatibility issue was found, but FE/cdc_client request semantics around split boundaries and startup modes must stay consistent with existing comments.
Parallel paths: both normal JDBC and TVF paths were modified; existing comments already cover distinct replay/locking gaps across both.
Conditional checks: several new conditions are intended to distinguish snapshot vs binlog and complete vs mid-split states; the existing review threads show remaining edge cases requiring clearer handling.
Test coverage/results: tests were added, but existing comments call out races/coverage gaps around sampler assertions and restart timing.
Observability: logs were added around split progress; no additional observability-only blocker found.
Transaction/persistence: internal meta-table writes are part of durable split recovery; existing comments on ignored/unsafe meta persistence and SQL construction remain important.
Data correctness: unresolved issues can duplicate, skip, or truncate snapshot chunks after scheduler ticks or FE restart, so data correctness is not yet established.
Performance: long lock hold was improved, but existing comments still note batch-size enforcement and repeated full-list processing concerns.

User focus: no additional user-provided review focus was supplied.
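
The "long work outside the lock" pattern referred to under Concurrency, together with a guard against stale or empty batches, might look like the sketch below. `FetchOutsideLockSketch`, the `SplitFetcher` interface, and the string split ids are hypothetical; this is one way to structure it, not a description of the code at the current head.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: snapshot the cursor under the lock, fetch without it, re-check on apply. */
final class FetchOutsideLockSketch {

    /** Stand-in for the slow fetchSplits RPC. */
    interface SplitFetcher {
        List<String> fetch(String table, int nextSplitId, int batchSize);
    }

    private final Object splitsLock = new Object();
    private final List<String> remainingSplits = new ArrayList<>();
    private String currentTable = "t1";
    private int nextSplitId = 0;

    /** Returns true only if the fetched batch was applied and the cursor moved. */
    boolean advanceSplits(SplitFetcher fetcher, int batchSize) {
        final String table;
        final int startId;
        synchronized (splitsLock) {
            table = currentTable;
            startId = nextSplitId;
        }
        // Slow call: no lock is held, so commits and other readers are not blocked.
        List<String> batch = fetcher.fetch(table, startId, batchSize);
        synchronized (splitsLock) {
            if (!table.equals(currentTable) || startId != nextSplitId) {
                return false; // cursor moved while we were fetching; drop the stale batch
            }
            if (batch.isEmpty()) {
                return false; // no progress; the caller can count these and bound retries
            }
            remainingSplits.addAll(batch);
            nextSplitId += batch.size();
            return true;
        }
    }

    int remainingCount() {
        synchronized (splitsLock) {
            return remainingSplits.size();
        }
    }

    int nextSplitId() {
        synchronized (splitsLock) {
            return nextSplitId;
        }
    }
}
```

Returning a boolean gives the scheduler a signal for the empty-batch case, so a source that keeps returning nothing can be surfaced as an error instead of silently repeating every tick.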

hello-stephen · 2026-05-15T08:34:25Z

TPC-H: Total hot run time: 30810 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e78d8905bc1cd85e9f862530fe0713abbaf0cde2, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17890	3802	3778	3778
q2	q3	10757	1378	793	793
q4	4680	474	347	347
q5	7570	2231	2158	2158
q6	261	179	138	138
q7	934	763	624	624
q8	9345	1652	1565	1565
q9	6695	4902	4909	4902
q10	6457	2108	1775	1775
q11	461	268	245	245
q12	687	435	290	290
q13	18255	3432	2787	2787
q14	268	257	238	238
q15	q16	820	788	703	703
q17	915	973	1037	973
q18	7008	5688	5432	5432
q19	1206	1229	1078	1078
q20	529	396	257	257
q21	5849	2583	2432	2432
q22	448	359	295	295
Total cold run time: 101035 ms
Total hot run time: 30810 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4218	4143	4104	4104
q2	q3	4488	4904	4319	4319
q4	2118	2164	1389	1389
q5	4403	4229	4296	4229
q6	232	177	130	130
q7	2166	1968	1627	1627
q8	2525	2095	2022	2022
q9	7831	7703	7842	7703
q10	4551	4481	4102	4102
q11	591	400	374	374
q12	847	784	522	522
q13	3254	3572	3058	3058
q14	296	316	283	283
q15	q16	745	730	653	653
q17	1312	1311	1307	1307
q18	7857	7406	7053	7053
q19	1073	1065	1117	1065
q20	2193	2252	1927	1927
q21	5320	4628	4517	4517
q22	528	461	433	433
Total cold run time: 56548 ms
Total hot run time: 50817 ms

hello-stephen · 2026-05-15T08:45:21Z

TPC-DS: Total hot run time: 170989 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e78d8905bc1cd85e9f862530fe0713abbaf0cde2, data reload: false

query5	4351	655	510	510
query6	315	223	206	206
query7	4269	543	321	321
query8	327	230	213	213
query9	8828	3981	3984	3981
query10	450	343	309	309
query11	5775	2456	2228	2228
query12	184	128	127	127
query13	1336	609	430	430
query14	5949	5344	5096	5096
query14_1	4321	4321	4345	4321
query15	207	204	182	182
query16	1021	450	407	407
query17	1119	726	585	585
query18	2524	473	354	354
query19	213	195	155	155
query20	147	134	128	128
query21	217	140	115	115
query22	13838	13512	13465	13465
query23	17269	16359	15997	15997
query23_1	16205	16170	16146	16146
query24	7600	1747	1280	1280
query24_1	1315	1265	1298	1265
query25	565	498	399	399
query26	1308	317	174	174
query27	2714	545	361	361
query28	4462	1928	1940	1928
query29	993	642	489	489
query30	301	244	199	199
query31	1096	1064	950	950
query32	86	80	73	73
query33	533	344	291	291
query34	1205	1177	645	645
query35	777	802	666	666
query36	1324	1371	1233	1233
query37	165	112	93	93
query38	3265	3200	3143	3143
query39	981	988	933	933
query39_1	911	957	919	919
query40	246	146	130	130
query41	65	64	63	63
query42	115	116	111	111
query43	330	326	291	291
query44	
query45	216	198	193	193
query46	1079	1233	729	729
query47	2300	2296	2159	2159
query48	397	390	296	296
query49	636	492	385	385
query50	1039	342	251	251
query51	4268	4290	4275	4275
query52	108	104	96	96
query53	260	288	206	206
query54	325	275	271	271
query55	99	91	88	88
query56	327	314	326	314
query57	1418	1386	1307	1307
query58	316	287	271	271
query59	1555	1639	1416	1416
query60	331	334	315	315
query61	180	171	177	171
query62	674	649	569	569
query63	248	203	212	203
query64	2522	864	760	760
query65	
query66	1688	468	354	354
query67	30028	30081	30006	30006
query68	
query69	463	334	299	299
query70	1010	976	978	976
query71	314	272	271	271
query72	3008	2653	2416	2416
query73	839	745	417	417
query74	5088	4941	4703	4703
query75	2840	2629	2256	2256
query76	2314	1132	768	768
query77	403	410	342	342
query78	12084	12119	11608	11608
query79	1429	1044	742	742
query80	681	543	452	452
query81	454	277	239	239
query82	1228	157	120	120
query83	352	276	249	249
query84	252	143	113	113
query85	879	535	454	454
query86	410	341	338	338
query87	3467	3334	3197	3197
query88	3574	2664	2659	2659
query89	447	382	338	338
query90	1892	184	181	181
query91	179	175	138	138
query92	76	81	73	73
query93	1571	1461	838	838
query94	535	361	307	307
query95	653	478	348	348
query96	1115	813	331	331
query97	2735	2696	2571	2571
query98	243	237	230	230
query99	1133	1128	984	984
Total cold run time: 253609 ms
Total hot run time: 170989 ms

hello-stephen · 2026-05-15T08:58:29Z

TPC-H: Total hot run time: 31292 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2918b239599ffdc4159d6fe0a8e712a7be534731, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17653	4021	4020	4020
q2	q3	10772	1386	795	795
q4	4681	473	353	353
q5	7718	2256	2076	2076
q6	238	172	139	139
q7	951	766	626	626
q8	9368	1785	1582	1582
q9	5146	4966	4895	4895
q10	6370	2078	1767	1767
q11	427	280	241	241
q12	627	430	294	294
q13	18144	3343	2752	2752
q14	261	259	236	236
q15	q16	816	780	705	705
q17	932	977	866	866
q18	6968	5908	5590	5590
q19	1833	1352	1113	1113
q20	668	445	295	295
q21	6087	2850	2628	2628
q22	585	376	319	319
Total cold run time: 100245 ms
Total hot run time: 31292 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4870	4802	4709	4709
q2	q3	4830	5217	4585	4585
q4	2105	2207	1402	1402
q5	4903	4684	4655	4655
q6	240	177	127	127
q7	1863	1747	1515	1515
q8	2356	2109	2057	2057
q9	7702	7213	7254	7213
q10	4471	4391	3970	3970
q11	520	372	347	347
q12	705	727	509	509
q13	2994	3407	2769	2769
q14	264	277	255	255
q15	q16	669	693	607	607
q17	1262	1232	1236	1232
q18	7526	6788	6843	6788
q19	1120	1137	1110	1110
q20	2216	2215	1925	1925
q21	5282	4633	4517	4517
q22	528	445	402	402
Total cold run time: 56426 ms
Total hot run time: 50694 ms

hello-stephen · 2026-05-15T09:09:22Z

TPC-DS: Total hot run time: 168963 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2918b239599ffdc4159d6fe0a8e712a7be534731, data reload: false

query5	4331	666	513	513
query6	317	215	206	206
query7	4291	553	293	293
query8	336	229	222	222
query9	8838	4039	4065	4039
query10	440	338	313	313
query11	5764	2381	2212	2212
query12	186	133	128	128
query13	1296	599	444	444
query14	5973	5306	5013	5013
query14_1	4329	4322	4312	4312
query15	204	206	181	181
query16	1005	458	430	430
query17	1166	715	578	578
query18	2468	490	353	353
query19	222	200	169	169
query20	137	131	129	129
query21	218	138	114	114
query22	13635	13557	13368	13368
query23	17272	16471	16034	16034
query23_1	16171	16082	16174	16082
query24	7635	1767	1291	1291
query24_1	1320	1304	1305	1304
query25	580	511	442	442
query26	1315	331	177	177
query27	2712	550	342	342
query28	4463	1963	1952	1952
query29	1024	640	511	511
query30	325	241	205	205
query31	1117	1080	944	944
query32	91	77	76	76
query33	555	362	306	306
query34	1173	1152	647	647
query35	768	803	675	675
query36	1350	1318	1192	1192
query37	157	108	96	96
query38	3220	3128	3021	3021
query39	922	940	915	915
query39_1	889	894	884	884
query40	243	158	126	126
query41	73	73	68	68
query42	114	111	111	111
query43	325	324	287	287
query44	
query45	218	205	202	202
query46	1047	1185	737	737
query47	2357	2369	2200	2200
query48	412	422	300	300
query49	658	510	408	408
query50	970	358	252	252
query51	4294	4262	4218	4218
query52	107	106	96	96
query53	274	296	202	202
query54	330	284	286	284
query55	100	94	89	89
query56	303	318	331	318
query57	1431	1401	1291	1291
query58	303	284	285	284
query59	1638	1614	1422	1422
query60	340	375	303	303
query61	154	152	154	152
query62	669	622	558	558
query63	235	198	208	198
query64	2388	813	619	619
query65	
query66	1721	476	342	342
query67	30098	30018	29294	29294
query68	
query69	462	339	296	296
query70	1039	1032	987	987
query71	305	278	277	277
query72	2962	2687	2377	2377
query73	842	736	436	436
query74	5079	4945	4727	4727
query75	2665	2618	2236	2236
query76	2300	1114	766	766
query77	396	413	335	335
query78	12181	12142	11551	11551
query79	1473	1003	782	782
query80	661	537	450	450
query81	458	279	247	247
query82	1390	164	129	129
query83	362	275	254	254
query84	263	138	111	111
query85	868	580	446	446
query86	408	334	300	300
query87	3436	3376	3195	3195
query88	3503	2621	2614	2614
query89	442	389	335	335
query90	1995	183	181	181
query91	177	171	153	153
query92	79	76	75	75
query93	1468	1403	876	876
query94	556	348	315	315
query95	678	384	426	384
query96	1004	803	343	343
query97	2697	2663	2575	2575
query98	238	234	241	234
query99	1118	1109	962	962
Total cold run time: 253570 ms
Total hot run time: 168963 ms

hello-stephen · 2026-05-15T11:04:31Z

FE Regression Coverage Report

Increment line coverage 69.77% (247/354) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-05-15T11:31:28Z

FE Regression Coverage Report

Increment line coverage 69.77% (247/354) 🎉
Increment coverage report
Complete coverage report

JNSimba · 2026-05-15T12:31:42Z

/review

github-actions

I found one additional blocking issue beyond the existing review threads: async split resume casts the persisted split boundary with Jackson convertValue directly, which does not preserve the temporal conversion behavior already used when reconstructing snapshot splits for execution.

Critical checkpoint conclusions:

The async split goal is mostly covered by the new scheduler-driven design and tests, but temporal split-key resume is not proven and can fail after the first batch.
The change is focused but still touches high-risk split lifecycle paths.
Concurrency issues around split state already have existing review threads, and I did not duplicate them.
Lifecycle/replay paths were reviewed; this new issue affects mid-table resume after the FE/CDC JSON round-trip.
The new mutable batch-size config is covered by an existing review thread.
No storage-format compatibility issue was found in this pass.
The parallel MySQL and generic JDBC paths both need the same temporal-bound fix.
Regression coverage is broad but does not cover temporal chunk-key async resume.
Observability is adequate for this issue because the failure would surface as a fetchSplits failure, but the root cause would be hard to diagnose from tests.
No BE transaction/version/delete-bitmap concerns apply.
No additional user focus was provided.
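To illustrate the temporal-boundary concern raised in this review, here is a minimal sketch of restoring a typed split key after a JSON round trip. It is not the PR's code: `SplitBoundaryRestore` and `restoreBoundary` are hypothetical names, the supported target classes are an assumption chosen for illustration, and no Jackson API is used so the example stays self-contained.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

class SplitBoundaryRestore {
    /**
     * Hypothetical helper: rebuilds a typed split key from its JSON-decoded form.
     * After a JSON round trip, a temporal boundary typically arrives as a String
     * and an integer boundary may arrive as Integer instead of Long.
     */
    static Object restoreBoundary(Object raw, Class<?> keyClass) {
        if (raw == null || keyClass.isInstance(raw)) {
            return raw; // open-ended bound, or already the expected type
        }
        if (keyClass == LocalDateTime.class) {
            return LocalDateTime.parse(raw.toString()); // expects ISO-8601 text
        }
        if (keyClass == LocalDate.class) {
            return LocalDate.parse(raw.toString());
        }
        if (keyClass == Long.class && raw instanceof Number) {
            return ((Number) raw).longValue();
        }
        // Fail loudly instead of passing an untyped value to the splitter.
        throw new IllegalArgumentException("Unsupported split key conversion: "
                + raw.getClass().getName() + " -> " + keyClass.getName());
    }

    public static void main(String[] args) {
        // A DATETIME boundary that was serialized to JSON comes back as a String.
        Object restored = restoreBoundary("2026-05-15T12:31:42", LocalDateTime.class);
        System.out.println(restored.getClass().getSimpleName() + " " + restored);

        // A BIGINT boundary may decode as Integer when the value is small.
        Object id = restoreBoundary(Integer.valueOf(42), Long.class);
        System.out.println(id.getClass().getSimpleName() + " " + id);
    }
}
```

The point of the explicit per-class branch is that a generic cast or value conversion can silently hand the chunk splitter a String where it expects a temporal type, which is the failure mode the review describes for mid-table resume.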

JNSimba · 2026-05-15T14:49:56Z

/review

JNSimba · 2026-05-15T14:50:01Z

run buildall

github-actions

Summary: Requesting changes for one additional correctness issue found in the current head. The existing review threads already cover the other major async split concerns, including temporal boundary conversion, TVF replay cursor restoration, invalid batch size, long lock holds, meta-table SQL safety, and duplicate snapshot reuse after a committed snapshot offset.

Critical checkpoint conclusions:

Goal/test coverage: The PR aims to make CDC snapshot splitting asynchronous and adds regression/unit coverage, but the empty-first-batch path can still start binlog consumption before snapshot splits exist, so the goal is not fully met.
Scope/focus: The change is focused on async CDC split state, but the new scheduler/provider interaction needs one more guard.
Concurrency: The PR improves several split-list locks; no new distinct lock-order issue was found beyond existing review threads.
Lifecycle/restart: Restart handling is heavily touched; no additional restart issue beyond the existing threads and the empty-first-batch dispatch issue was identified.
Configuration: The dynamic batch-size validation issue is already covered by an existing thread; no additional config issue found.
Compatibility/protocol: No additional FE-BE protocol compatibility issue found beyond existing split boundary contract threads.
Parallel paths: MySQL/JDBC/TVF paths were checked; no additional uncopied path issue found beyond existing threads.
Tests/results: Tests were reviewed at code level; no additional test-only blocker beyond existing sampler-race threads.
Observability/performance: Existing threads already cover the main performance/observability risks; no additional blocking issue found.
Data correctness: The new issue can skip the initial snapshot and lose/duplicate CDC data when the first async split fetch returns no splits.

User focus: No additional user-provided review focus was supplied.

github-actions · 2026-05-15T14:56:22Z

+            } else {
+                // only binlog
+                return currentOffset == null
+                        ? new JdbcOffset(Collections.singletonList(new BinlogSplit())) : currentOffset;


This fallback can still start a binlog task before any snapshot split has been produced. In the first PENDING scheduler tick for an initial/snapshot job, advanceSplitsIfNeed() may call advanceSplits(), which sets cdcSplitProgress.currentSplittingTable and then returns if /api/fetchSplits yields an empty batch. handlePendingState() still creates the first streaming task, and because currentOffset == null and remainingSplits is empty, this branch returns an empty BinlogSplit. That task can then read from the binlog without the initial snapshot ever being scheduled. This is distinct from the already-raised post-commit reuse issue: it happens before the first snapshot task exists. Please make snapshot modes return a non-consumable/defer state (or have the scheduler skip task creation) while checkNeedSplitChunks(sourceProperties) is true, remainingSplits is empty, and noMoreSplits() is false.
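The condition requested in this comment can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the PR's implementation: `SnapshotDispatchGuard`, `Decision`, and `decide` are hypothetical names, and the three parameters stand in for `checkNeedSplitChunks(sourceProperties)`, the size of `remainingSplits`, and `noMoreSplits()`.

```java
class SnapshotDispatchGuard {
    /** Hypothetical outcome of one scheduler tick. */
    enum Decision { DISPATCH_SNAPSHOT, DEFER, DISPATCH_BINLOG }

    /**
     * Decides what the scheduler may dispatch on this tick.
     *
     * @param needSplitChunks     job is in a snapshot mode that requires chunk splitting
     * @param remainingSplitCount number of snapshot splits fetched but not yet dispatched
     * @param noMoreSplits        true once every table has been fully split
     */
    static Decision decide(boolean needSplitChunks, int remainingSplitCount, boolean noMoreSplits) {
        if (remainingSplitCount > 0) {
            return Decision.DISPATCH_SNAPSHOT; // a snapshot split is ready to run
        }
        if (needSplitChunks && !noMoreSplits) {
            // Splitting is still in progress and the batch was empty:
            // wait for the next tick instead of falling back to binlog.
            return Decision.DEFER;
        }
        return Decision.DISPATCH_BINLOG; // binlog-only job, or snapshot fully split
    }

    public static void main(String[] args) {
        // First PENDING tick where fetchSplits returned an empty batch.
        System.out.println(decide(true, 0, false));
        // A batch has landed.
        System.out.println(decide(true, 100, false));
        // Binlog-only job.
        System.out.println(decide(false, 0, false));
    }
}
```

Whether the deferral lives in the offset provider (returning a non-consumable state) or in the scheduler (skipping task creation) is left open by the review; the sketch only captures the predicate.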

hello-stephen · 2026-05-15T15:18:28Z

TPC-H: Total hot run time: 31183 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 58ebbd61708f63480c6839fd9662d1c747f7fa06, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17799	3923	3833	3833
q2	q3	10748	1386	787	787
q4	4682	469	342	342
q5	7600	2226	2096	2096
q6	331	177	141	141
q7	945	764	625	625
q8	9345	1743	1565	1565
q9	6968	4868	5037	4868
q10	6452	2079	1784	1784
q11	430	271	240	240
q12	692	424	302	302
q13	18194	3403	2783	2783
q14	261	255	232	232
q15	q16	814	785	699	699
q17	848	901	852	852
q18	6775	5829	5552	5552
q19	1216	1218	1189	1189
q20	538	425	279	279
q21	6029	2774	2696	2696
q22	453	366	318	318
Total cold run time: 101120 ms
Total hot run time: 31183 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4608	4485	4669	4485
q2	q3	4784	5184	4585	4585
q4	2138	2342	1391	1391
q5	4844	4718	4579	4579
q6	232	189	133	133
q7	1857	1746	1470	1470
q8	2378	1878	1897	1878
q9	7167	7194	7040	7040
q10	4447	4367	3966	3966
q11	518	375	346	346
q12	700	713	508	508
q13	2963	3420	2801	2801
q14	272	278	249	249
q15	q16	676	737	604	604
q17	1250	1217	1211	1211
q18	7306	6906	6637	6637
q19	1109	1071	1110	1071
q20	2201	2196	1931	1931
q21	5255	4564	4431	4431
q22	519	468	451	451
Total cold run time: 55224 ms
Total hot run time: 49767 ms

hello-stephen · 2026-05-15T15:29:29Z

TPC-DS: Total hot run time: 169024 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 58ebbd61708f63480c6839fd9662d1c747f7fa06, data reload: false

query5	4330	640	517	517
query6	325	216	200	200
query7	4264	526	299	299
query8	321	228	230	228
query9	8834	3959	3959	3959
query10	435	363	296	296
query11	5782	2424	2164	2164
query12	194	126	123	123
query13	1291	629	444	444
query14	5980	5337	5001	5001
query14_1	4326	4316	4286	4286
query15	209	198	181	181
query16	1007	448	411	411
query17	1135	720	589	589
query18	2453	484	346	346
query19	223	203	158	158
query20	138	134	130	130
query21	213	137	120	120
query22	13655	13621	13355	13355
query23	17218	16274	16060	16060
query23_1	16121	16133	16187	16133
query24	7521	1730	1317	1317
query24_1	1322	1305	1321	1305
query25	592	493	444	444
query26	1313	329	176	176
query27	2695	562	343	343
query28	4496	1967	1942	1942
query29	979	641	508	508
query30	312	246	202	202
query31	1114	1081	931	931
query32	87	75	71	71
query33	547	360	301	301
query34	1181	1134	657	657
query35	758	785	687	687
query36	1308	1312	1196	1196
query37	157	106	90	90
query38	3206	3186	3063	3063
query39	922	912	885	885
query39_1	879	881	891	881
query40	242	151	129	129
query41	73	68	67	67
query42	112	114	111	111
query43	317	324	286	286
query44	
query45	212	211	214	211
query46	1140	1177	725	725
query47	2262	2329	2176	2176
query48	410	415	292	292
query49	648	498	405	405
query50	963	344	255	255
query51	4319	4275	4229	4229
query52	105	105	99	99
query53	258	295	213	213
query54	342	284	274	274
query55	94	90	88	88
query56	310	324	316	316
query57	1407	1362	1332	1332
query58	306	289	279	279
query59	1587	1610	1414	1414
query60	327	326	317	317
query61	179	202	159	159
query62	662	616	560	560
query63	249	201	204	201
query64	2405	795	637	637
query65	
query66	1757	502	350	350
query67	30054	29980	29852	29852
query68	
query69	471	335	321	321
query70	976	1005	896	896
query71	313	276	270	270
query72	2991	2874	2361	2361
query73	846	794	430	430
query74	5076	4926	4727	4727
query75	2655	2586	2262	2262
query76	2345	1115	766	766
query77	403	403	326	326
query78	12029	12123	11684	11684
query79	1426	1014	738	738
query80	660	530	458	458
query81	452	275	251	251
query82	1382	164	126	126
query83	373	279	254	254
query84	250	142	112	112
query85	880	531	483	483
query86	384	322	330	322
query87	3396	3328	3218	3218
query88	3511	2654	2670	2654
query89	441	396	335	335
query90	1958	182	177	177
query91	178	182	142	142
query92	75	79	72	72
query93	1502	1442	855	855
query94	547	349	291	291
query95	683	376	444	376
query96	1112	757	345	345
query97	2713	2682	2555	2555
query98	252	226	227	226
query99	1120	1085	965	965
Total cold run time: 252545 ms
Total hot run time: 169024 ms

hello-stephen · 2026-05-15T16:09:46Z

FE UT Coverage Report

Increment line coverage 39.56% (127/321) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-05-15T17:46:24Z

FE Regression Coverage Report

Increment line coverage 2.49% (8/321) 🎉
Increment coverage report
Complete coverage report

JNSimba · 2026-05-15T23:08:46Z

run external

hello-stephen · 2026-05-16T01:07:44Z

FE Regression Coverage Report

Increment line coverage 2.49% (8/321) 🎉
Increment coverage report
Complete coverage report

[feature](streaming-job) async chunk splitting for StreamingInsertJob

d3ade29

JNSimba changed the title ~~[feature](streaming-job) async chunk splitting for StreamingInsertJob~~ [improve](streaming-job) async chunk splitting for StreamingInsertJob May 8, 2026

JNSimba requested a review from Copilot May 11, 2026 03:56

Copilot started reviewing on behalf of JNSimba May 11, 2026 03:57

Copilot AI reviewed May 11, 2026

JNSimba requested a review from Copilot May 11, 2026 06:06

Copilot started reviewing on behalf of JNSimba May 11, 2026 06:07

github-actions Bot requested changes May 11, 2026

Comment thread fe/fe-core/src/main/java/org/apache/doris/job/offset/jdbc/JdbcTvfSourceOffsetProvider.java Outdated

Copilot AI reviewed May 11, 2026

[fix](streaming-job) various fixes for async chunk splitting

ebc5ab1

JNSimba marked this pull request as ready for review May 12, 2026 11:07

github-actions Bot requested changes May 12, 2026

Comment thread ...-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java

Comment thread ...ient/src/main/java/org/apache/doris/cdcclient/source/reader/JdbcIncrementalSourceReader.java Outdated

JNSimba requested a review from Copilot May 12, 2026 11:22

Copilot started reviewing on behalf of JNSimba May 12, 2026 11:23

Copilot AI reviewed May 12, 2026

[fix](streaming-job) review fixes and regression cases for async chunk splitting

109d22c

[fix](streaming-job) cast split key by JDBC driver's natural class to fix unevenly-cut resume

b7c5368

JNSimba requested a review from Copilot May 12, 2026 12:56

Copilot started reviewing on behalf of JNSimba May 12, 2026 12:59

github-actions Bot requested changes May 12, 2026

Copilot AI reviewed May 12, 2026

[opt](streaming-job) release splitsLock during RPC/UPSERT in advanceSplits and sink resolveSplitKeyClass to PG reader

8b89cde

JNSimba added the dev/4.1.x label May 13, 2026

github-actions Bot requested changes May 15, 2026

JNSimba added 2 commits May 15, 2026 16:10

[fix](streaming-job) skip advanceSplits once currentOffset transitioned to binlog

9be5001

[fix](streaming-job) call ensureInitialized before replayIfNeed so TVF sourceProperties is restored

e78d890

[test](streaming-job) raise PG replication slot cap and add binlog-restart regression case

2918b23

github-actions Bot requested changes May 15, 2026

JNSimba changed the title ~~[improve](streaming-job) async chunk splitting for StreamingInsertJob~~ [improve](streaming-job) async chunk splitting for cdc source job May 15, 2026

github-actions Bot requested changes May 15, 2026

Comment thread ...ient/src/main/java/org/apache/doris/cdcclient/source/reader/JdbcIncrementalSourceReader.java

Comment thread ...c_client/src/main/java/org/apache/doris/cdcclient/source/reader/mysql/MySqlSourceReader.java

[fix](streaming-job) convert async split resume boundary via convertBounds to keep temporal types intact

58ebbd6

github-actions Bot requested changes May 15, 2026

Conversation

JNSimba commented May 8, 2026

Summary

Changes

Tests

Test plan

hello-stephen commented May 8, 2026

Copilot AI left a comment

Pull request overview

Reviewed changes

JNSimba commented May 11, 2026

github-actions Bot left a comment

Copilot AI left a comment

Pull request overview

JNSimba commented May 12, 2026

JNSimba commented May 12, 2026

github-actions Bot left a comment

Copilot AI left a comment

Pull request overview

JNSimba commented May 12, 2026

hello-stephen commented May 12, 2026

FE UT Coverage Report

JNSimba commented May 12, 2026

github-actions Bot left a comment

Copilot AI left a comment

Pull request overview

github-actions Bot left a comment

hello-stephen commented May 15, 2026