[refactor](fe) Add CatalogProvider SPI framework and migrate ES as pilot#61604

Open
morningman wants to merge 18 commits into apache:master from
morningman:wt-catalog-spi

@morningman
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: This PR introduces a Service Provider Interface (SPI) framework for external datasource catalogs, enabling dynamic loading and ClassLoader isolation for catalog plugins. ES (Elasticsearch) is migrated as the first pilot.

Phase 1 — SPI Framework:

  • CatalogProvider: SPI interface with methods for catalog lifecycle management
  • CatalogProviderRegistry: thread-safe provider registry
  • CatalogPluginLoader: plugin JAR loading with URLClassLoader isolation
  • FE startup integration: loadPlugins() called before loadImage()
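A minimal sketch of the first two Phase 1 pieces, assuming a heavily reduced interface: the method names `getType()` and `createCatalog(...)`, the `Map<String, String>` properties argument, and the `Object` return type are illustrative assumptions, not the SPI signatures in this PR. The registry maps a catalog type string to exactly one provider and is safe for concurrent registration (plugin loading) and lookup (query threads).

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class CatalogSpiSketch {

    /** Hypothetical, reduced SPI: the real CatalogProvider carries more lifecycle methods. */
    public interface CatalogProvider {
        /** Catalog type this provider serves, e.g. "es". */
        String getType();

        /** Builds a catalog from user properties; Object stands in for ExternalCatalog. */
        Object createCatalog(long catalogId, String name, Map<String, String> props);
    }

    /** Thread-safe type -> provider mapping, shared by plugin loading and query threads. */
    public static final class CatalogProviderRegistry {
        private final ConcurrentHashMap<String, CatalogProvider> providers = new ConcurrentHashMap<>();

        /** Atomically registers a provider; returns false if its type is already taken. */
        public boolean register(CatalogProvider provider) {
            String type = provider.getType().toLowerCase(Locale.ROOT);
            return providers.putIfAbsent(type, provider) == null;
        }

        /** Case-insensitive lookup; empty when no loaded plugin serves the type. */
        public Optional<CatalogProvider> get(String type) {
            return Optional.ofNullable(providers.get(type.toLowerCase(Locale.ROOT)));
        }
    }
}
```

Using `putIfAbsent` means two plugins claiming the same type cannot silently overwrite each other; the loser can be logged and skipped.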

Phase 2 — ES Migration (Pilot):

  • EsCatalogProvider implementing CatalogProvider SPI
  • CatalogFactory: SPI provider lookup before switch-case fallback
  • ExternalCatalog.buildDbForInit: SPI provider lookup before switch-case fallback
  • ExternalCatalog: three abstract methods (initLocalObjectsImpl, listTableNamesFromRemote, tableExist) converted to concrete defaults that delegate to a provider held in a transient field
  • GsonUtils: ES types changed from registerSubtype to registerCompatibleSubtype for plugin-agnostic persistence
  • PhysicalPlanTranslator.visitPhysicalEsScan: SPI-based ScanNode creation with fallback
  • fe-catalogs/catalog-es/ Maven module with shade plugin for self-contained plugin JAR
  • ES source files migrated to independent module
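The ExternalCatalog change can be pictured with the simplified stand-in below. It is a sketch under stated assumptions: `BaseCatalog`, the `Supplier` resolver, and the three provider method names are hypothetical, and the real ExternalCatalog methods take more parameters. The point it shows is that the formerly abstract methods now have concrete bodies that forward to a lazily resolved, transient provider, while legacy subclasses can still override them.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Supplier;

public class DelegatingCatalogSketch {

    /** Hypothetical SPI slice covering just the three formerly abstract operations. */
    public interface CatalogProvider {
        void initLocalObjects(BaseCatalog catalog);

        List<String> listTableNames(BaseCatalog catalog, String dbName);

        boolean tableExists(BaseCatalog catalog, String dbName, String tableName);
    }

    /** Stand-in for ExternalCatalog with simplified signatures. */
    public static class BaseCatalog {
        // Hypothetical resolver, e.g. a registry lookup by catalog type; not persisted.
        private final transient Supplier<CatalogProvider> resolver;
        // Gson skips transient fields by default, so the provider never reaches the image or EditLog.
        protected transient CatalogProvider provider;

        public BaseCatalog(Supplier<CatalogProvider> resolver) {
            this.resolver = resolver;
        }

        /** Resolves the provider once, then runs the overridable init hook. */
        public final void initLocalObjects() {
            if (provider == null) {
                provider = Objects.requireNonNull(resolver.get(), "no provider for this catalog type");
            }
            initLocalObjectsImpl();
        }

        // Formerly abstract; legacy subclasses may still override these three methods.
        protected void initLocalObjectsImpl() {
            provider.initLocalObjects(this);
        }

        public List<String> listTableNamesFromRemote(String dbName) {
            return provider.listTableNames(this, dbName);
        }

        public boolean tableExist(String dbName, String tableName) {
            return provider.tableExists(this, dbName, tableName);
        }
    }
}
```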

Key Design Decisions:

  • Plugin-agnostic persistence: All external catalogs serialize/deserialize as base types. Old EditLog entries (e.g. "clazz":"EsExternalCatalog") are handled via registerCompatibleSubtype
  • Lazy initialization: plugin JARs are loaded at FE startup, but a catalog binds to its provider only when it is first accessed via makeSureInitialized()
  • Graceful degradation: Missing plugin marks catalog as unavailable without blocking FE startup
  • ClassLoader isolation: Each plugin gets isolated URLClassLoader preventing dependency conflicts
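The lazy-initialization and graceful-degradation decisions can be sketched together as follows. Everything here is an assumption for illustration: `LazyCatalogSketch`, the `Function` used in place of the registry, and the error message are hypothetical. The catalog does no plugin work when constructed (as during replay); on first access it looks up its provider, and a missing plugin marks the catalog unavailable and fails only requests against that catalog.

```java
import java.util.Optional;
import java.util.function.Function;

public class LazyCatalogSketch {

    private final String type;
    // Hypothetical stand-in for a registry lookup by catalog type.
    private final Function<String, Optional<Object>> providerLookup;
    private transient Object provider;   // bound on first access, never persisted
    private String unavailableReason;    // non-null once a missing plugin is detected
    private boolean initialized;

    public LazyCatalogSketch(String type, Function<String, Optional<Object>> providerLookup) {
        this.type = type;
        this.providerLookup = providerLookup;
    }

    /**
     * Binds the provider on first access only. In this sketch, constructing the catalog
     * (as replay would) never calls the lookup, so a missing plugin cannot block startup.
     */
    public synchronized void makeSureInitialized() {
        if (!initialized) {
            provider = providerLookup.apply(type).orElse(null);
            if (provider == null) {
                unavailableReason = "No connector plugin loaded for catalog type '" + type + "'";
            }
            initialized = true;
        }
        if (unavailableReason != null) {
            throw new IllegalStateException(unavailableReason);
        }
    }

    /** True until a lookup has failed; a never-accessed catalog is assumed available. */
    public synchronized boolean isAvailable() {
        return unavailableReason == null;
    }
}
```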

Release note

None

Check List (For Author)

  • Test: No need to test - structural refactoring with backward-compatible SPI fallback
  • Behavior changed: No
  • Does this need documentation: No

…rce plugin support

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: External data source code (Iceberg, Paimon, Hive, ES, etc.) is
tightly coupled into fe-core via hardcoded switch-case and instanceof chains.
This makes it impossible to add new data sources without modifying core code.

This commit adds the SPI framework infrastructure:
- CatalogProvider: SPI interface for external datasource plugins
- CatalogProviderRegistry: thread-safe type-to-provider mapping
- CatalogPluginLoader: plugin discovery with ClassLoader isolation
- Env.java: load plugins before EditLog replay
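One way a loader like CatalogPluginLoader could combine ClassLoader isolation with `ServiceLoader` discovery is sketched below. The `Provider` interface and `loadPlugin` method are hypothetical; only the JDK calls (`URLClassLoader`, `ServiceLoader.load(Class, ClassLoader)`, `Files.newDirectoryStream`) are real. Each plugin directory gets its own loader whose parent defines the SPI type, so core and plugin agree on `Provider` while each plugin's other classes stay private to its loader.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class PluginLoaderSketch {

    /** Hypothetical SPI type; stands in for CatalogProvider. */
    public interface Provider {
        String getType();
    }

    /** Loads all providers declared in META-INF/services by the JARs in one plugin directory. */
    public static List<Provider> loadPlugin(Path pluginDir) throws IOException {
        List<URL> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toUri().toURL());
            }
        }
        // One loader per plugin; deliberately not closed, since provider classes
        // must stay loadable for as long as the FE process uses them.
        URLClassLoader loader = new URLClassLoader(
                jars.toArray(new URL[0]), Provider.class.getClassLoader());
        List<Provider> found = new ArrayList<>();
        for (Provider p : ServiceLoader.load(Provider.class, loader)) {
            found.add(p);
        }
        return found;
    }
}
```

`URLClassLoader` uses the JDK's default parent-first delegation, which isolates plugins from each other but lets classes already on the FE classpath win. A child-first loader would be needed if a plugin must shadow a core dependency version; the PR text does not say which strategy CatalogPluginLoader uses.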

### Release note

None

### Check List (For Author)

- Test: No need to test - framework only, no behavioral changes
- Behavior changed: No
- Does this need documentation: No
…Factory and ExternalCatalog

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: This is Phase 2 of the external datasource SPI refactoring.
It implements the first CatalogProvider (ES) and wires the SPI lookup into
CatalogFactory and ExternalCatalog.buildDbForInit, with fallback to the existing
hardcoded switch-case for non-migrated datasources.

Changes:
- Add createCatalog() to CatalogProvider SPI interface
- Implement EsCatalogProvider with all SPI methods
- Add META-INF/services registration for ServiceLoader discovery
- CatalogFactory: try SPI provider before switch-case fallback
- ExternalCatalog.buildDbForInit: try SPI provider before switch-case fallback
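The "SPI provider before switch-case fallback" pattern in CatalogFactory can be sketched like this. The map of `Function`s stands in for the provider registry, and the returned strings only label which path built the catalog; the type names `hms` and `iceberg` in the fallback are illustrative. Migrated types never reach the switch, so each datasource can be moved to a plugin without touching the others.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class CatalogFactorySketch {

    // Hypothetical stand-in for the provider registry: catalog type -> builder.
    private final Map<String, Function<String, String>> providers;

    public CatalogFactorySketch(Map<String, Function<String, String>> providers) {
        this.providers = providers;
    }

    /** SPI provider first; only unmigrated types reach the legacy switch. */
    public String createCatalog(String type, String name) {
        String key = type.toLowerCase(Locale.ROOT);
        Function<String, String> provider = providers.get(key);
        if (provider != null) {
            return provider.apply(name); // migrated datasource, e.g. "es"
        }
        switch (key) { // hardcoded fallback that shrinks as datasources migrate
            case "hms":
                return "legacy-hms:" + name;
            case "iceberg":
                return "legacy-iceberg:" + name;
            default:
                throw new IllegalArgumentException("Unknown catalog type: " + type);
        }
    }
}
```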

### Release note

None

### Check List (For Author)

- Test: No need to test - SPI wiring with fallback, no behavioral change
- Behavior changed: No
- Does this need documentation: No
… GsonUtils, PhysicalPlanTranslator, Maven module

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: Completes the ES datasource SPI migration (Phase 2) by:

1. ExternalCatalog: Convert 3 abstract methods (initLocalObjectsImpl,
   listTableNamesFromRemote, tableExist) to concrete SPI-delegating defaults.
   Add transient CatalogProvider field with auto-resolution in initLocalObjects().
   Subclasses still override for backward compatibility.

2. GsonUtils: Change ES registerSubtype to registerCompatibleSubtype for all 3
   type adapter factories (Catalog/Database/Table). Old "EsExternalCatalog" JSON
   now deserializes to ExternalCatalog. Add ExternalCatalog as registered subtype
   for new serialization.

3. PhysicalPlanTranslator: visitPhysicalEsScan now uses CatalogProvider SPI to
   create ScanNode, with fallback to direct EsScanNode for backward compat.

4. Maven module: Create fe-catalogs/catalog-es/ with pom.xml (provided fe-core
   dependency, shade plugin for fat JAR). Register as module in parent pom.xml.
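The GsonUtils change in item 2 boils down to label resolution: a legacy `"clazz"` value such as `"EsExternalCatalog"` must still be readable, but new entries are written under the base type's label. The sketch below models only that resolution with plain maps. It does not reproduce Doris's RuntimeTypeAdapterFactory or the signature of its registerCompatibleSubtype method; `addSubtype` and `addLegacyAlias` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class LegacyTypeLabelSketch {

    /** Stand-in for the persisted base type. */
    public static class ExternalCatalog { }

    private final Map<String, Class<?>> readLabels = new HashMap<>();  // label -> class, for reading
    private final Map<Class<?>, String> writeLabels = new HashMap<>(); // class -> label, for writing

    /** Normal subtype: readable and writable under this label. */
    public void addSubtype(Class<?> type, String label) {
        readLabels.put(label, type);
        writeLabels.put(type, label);
    }

    /** Read-only alias: old JSON with this label maps to the given type; it is never written. */
    public void addLegacyAlias(Class<?> type, String legacyLabel) {
        readLabels.put(legacyLabel, type);
    }

    /** Maps the "clazz" value found in persisted JSON to the class to instantiate. */
    public Class<?> resolve(String clazz) {
        Class<?> t = readLabels.get(clazz);
        if (t == null) {
            throw new IllegalArgumentException("Unknown clazz: " + clazz);
        }
        return t;
    }

    /** Label written for new entries. */
    public String labelFor(Class<?> type) {
        String label = writeLabels.get(type);
        if (label == null) {
            throw new IllegalArgumentException("Unregistered type: " + type.getName());
        }
        return label;
    }
}
```

Because legacy labels exist only on the read side, an image written after the upgrade no longer mentions plugin class names, which is what lets fe-core replay it without the ES classes on its classpath.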

### Release note

None

### Check List (For Author)

- Test: No need to test - structural refactoring with SPI fallback, no behavioral change
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: Migrates ES datasource code to the independent
fe-catalogs/catalog-es Maven module:

- Move EsCatalogProvider.java from fe-core to catalog-es (git mv)
- Move META-INF/services SPI registration from fe-core to catalog-es (git mv)
- Copy all ES source files (22 files) to catalog-es module

The ES code remains in fe-core temporarily for backward compatibility
(CatalogFactory switch-case fallback, GsonUtils imports). Phase 4 will
remove these duplicates from fe-core once all datasources are migrated.

### Release note

None

### Check List (For Author)

- Test: No need to test - file migration only, no logic change
- Behavior changed: No
- Does this need documentation: No
@Thearas
Contributor

Thearas commented Mar 23, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

… Env, ExternalCatalog

### What problem does this PR solve?

Problem Summary: Fix CI checkstyle failure - SPI imports must appear
after other datasource sub-package imports in lexicographical order.

### Release note

None

### Check List (For Author)

- Test: No need to test - import reorder only
- Behavior changed: No
- Does this need documentation: No
…nector-es

### What problem does this PR solve?

Problem Summary: Rename the connector plugin module directory:
- fe-catalogs/catalog-es -> fe-connectors/connector-es
- Update parent pom.xml module reference
- Update connector-es pom.xml (artifactId, name, dependencies)

The SPI classes remain in fe-core/datasource/spi/ as they reference
fe-core types (ExternalCatalog, ScanNode, etc.) and cannot be extracted
to an independent module without first abstracting those dependencies.

### Release note

None

### Check List (For Author)

- Test: No need to test - directory rename only
- Behavior changed: No
- Does this need documentation: No
…ctory to connectors/

### What problem does this PR solve?

Problem Summary: Add build.sh support for building the ES connector plugin
independently via `--connector-es` flag. Also update CatalogPluginLoader
to load plugins from `connectors/` directory (matching build output layout)
instead of the previous `catalogs/` directory.

Changes:
- Add `--connector-es` option to build.sh (default OFF, ON in full build)
- Build fe-connectors/connector-es Maven module when flag is set
- Copy doris-connector-es.jar to output/fe/connectors/es/
- CatalogPluginLoader: CATALOGS_DIR -> CONNECTORS_DIR
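A sketch of how a loader might discover plugins under the `connectors/` layout produced by build.sh (for example `connectors/es/doris-connector-es.jar`). It assumes one subdirectory per plugin and treats a missing or empty `connectors/` directory as "nothing to load"; `ConnectorDirScanSketch` and `scan` are hypothetical names, and `CONNECTORS_DIR` here is only a local constant mirroring the renamed field.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConnectorDirScanSketch {

    /** Directory under the FE home, assumed from the build output layout. */
    static final String CONNECTORS_DIR = "connectors";

    /** Returns plugin name -> JARs, one entry per non-empty subdirectory such as connectors/es/. */
    public static Map<String, List<Path>> scan(Path feHome) throws IOException {
        Map<String, List<Path>> plugins = new TreeMap<>();
        Path root = feHome.resolve(CONNECTORS_DIR);
        if (!Files.isDirectory(root)) {
            return plugins; // no connectors built: nothing to load, startup continues
        }
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root, Files::isDirectory)) {
            for (Path dir : dirs) {
                List<Path> jars = new ArrayList<>();
                try (DirectoryStream<Path> js = Files.newDirectoryStream(dir, "*.jar")) {
                    js.forEach(jars::add);
                }
                if (!jars.isEmpty()) {
                    plugins.put(dir.getFileName().toString(), jars);
                }
            }
        }
        return plugins;
    }
}
```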

### Release note

None

### Check List (For Author)

- Test: No need to test - build script change only
- Behavior changed: No
- Does this need documentation: No
@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26982 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d44580b7b51fddd42d110aa32eca7521130bb625, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17635	4556	4303	4303
q2
q3	10641	781	523	523
q4	4679	363	260	260
q5	7560	1191	1020	1020
q6	184	178	149	149
q7	803	831	678	678
q8	9298	1460	1393	1393
q9	4870	4732	4720	4720
q10	6235	1957	1676	1676
q11	459	265	237	237
q12	718	597	473	473
q13	18067	2936	2208	2208
q14	228	235	218	218
q15
q16	736	770	673	673
q17	749	857	442	442
q18	6082	5442	5276	5276
q19	1100	972	625	625
q20	542	497	380	380
q21	4411	1825	1422	1422
q22	561	403	306	306
Total cold run time: 95558 ms
Total hot run time: 26982 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4806	4595	4665	4595
q2
q3	3864	4357	3834	3834
q4	923	1215	842	842
q5	4092	4449	4389	4389
q6	183	172	141	141
q7	1804	1677	1511	1511
q8	2516	2780	2560	2560
q9	7802	7513	7421	7421
q10	3841	4027	3631	3631
q11	503	433	423	423
q12	483	588	444	444
q13	2773	3175	2443	2443
q14	292	344	286	286
q15
q16	738	785	737	737
q17	1140	1399	1406	1399
q18	7128	6915	6737	6737
q19	940	901	939	901
q20	2080	2148	2001	2001
q21	3978	3500	3510	3500
q22	448	437	391	391
Total cold run time: 50334 ms
Total hot run time: 48186 ms

@doris-robot

TPC-DS: Total hot run time: 168410 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d44580b7b51fddd42d110aa32eca7521130bb625, data reload: false

query5	4319	630	487	487
query6	343	223	204	204
query7	4204	467	267	267
query8	350	261	230	230
query9	8737	2678	2683	2678
query10	523	380	334	334
query11	6956	5117	4882	4882
query12	176	132	123	123
query13	1263	448	346	346
query14	5685	3719	3422	3422
query14_1	2829	2798	2762	2762
query15	212	191	179	179
query16	993	464	446	446
query17	879	717	602	602
query18	2425	444	352	352
query19	214	223	187	187
query20	133	129	127	127
query21	207	130	110	110
query22	13276	13884	14582	13884
query23	16586	15829	15869	15829
query23_1	15783	15435	15305	15305
query24	7197	1652	1212	1212
query24_1	1232	1249	1230	1230
query25	559	486	423	423
query26	1234	265	147	147
query27	2782	488	297	297
query28	4489	1848	1835	1835
query29	837	565	482	482
query30	297	230	189	189
query31	1000	949	892	892
query32	77	72	72	72
query33	515	337	281	281
query34	892	886	520	520
query35	659	687	606	606
query36	1073	1117	976	976
query37	129	94	81	81
query38	2962	2935	2846	2846
query39	862	837	812	812
query39_1	805	790	807	790
query40	238	149	137	137
query41	63	58	59	58
query42	261	257	255	255
query43	240	237	216	216
query44	
query45	204	191	184	184
query46	891	987	605	605
query47	2132	3046	2096	2096
query48	303	315	229	229
query49	637	466	384	384
query50	689	272	212	212
query51	4092	4056	4005	4005
query52	260	267	256	256
query53	288	344	293	293
query54	325	285	295	285
query55	100	92	91	91
query56	343	335	332	332
query57	1935	1856	1799	1799
query58	298	285	291	285
query59	2805	2983	2752	2752
query60	357	355	337	337
query61	191	185	180	180
query62	641	606	551	551
query63	314	289	277	277
query64	5226	1398	1137	1137
query65	
query66	1481	481	376	376
query67	24315	24351	24401	24351
query68	
query69	426	324	322	322
query70	978	980	969	969
query71	344	315	306	306
query72	2844	2670	2248	2248
query73	541	541	319	319
query74	9639	9607	9387	9387
query75	2874	2760	2469	2469
query76	2296	1024	664	664
query77	364	418	301	301
query78	10983	11141	10492	10492
query79	1153	768	556	556
query80	1363	628	559	559
query81	540	264	222	222
query82	988	156	124	124
query83	340	270	242	242
query84	303	126	98	98
query85	919	510	460	460
query86	408	332	300	300
query87	3152	3152	3051	3051
query88	3548	2674	2645	2645
query89	441	370	343	343
query90	2032	195	176	176
query91	174	162	149	149
query92	75	79	67	67
query93	1013	840	494	494
query94	642	329	299	299
query95	589	346	320	320
query96	638	505	239	239
query97	2466	2514	2420	2420
query98	235	223	225	223
query99	1040	976	925	925
Total cold run time: 249621 ms
Total hot run time: 168410 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 21.21% (28/132) 🎉
Increment coverage report
Complete coverage report

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 27070 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e88a9a1c17bfab85da4be2dbc99b955e206a0e90, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17653	4495	4305	4305
q2
q3	10649	761	521	521
q4	4671	374	264	264
q5	7550	1205	1035	1035
q6	183	174	151	151
q7	812	858	665	665
q8	9685	1485	1376	1376
q9	4969	4706	4758	4706
q10	6326	1946	1680	1680
q11	474	257	261	257
q12	759	588	470	470
q13	18043	2924	2169	2169
q14	229	237	221	221
q15
q16	755	746	680	680
q17	752	889	431	431
q18	5828	5424	5206	5206
q19	1413	994	640	640
q20	542	500	390	390
q21	4497	1867	1598	1598
q22	430	364	305	305
Total cold run time: 96220 ms
Total hot run time: 27070 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4750	4673	4705	4673
q2
q3	3889	4379	3855	3855
q4	888	1198	800	800
q5	4168	4452	4299	4299
q6	191	171	153	153
q7	1781	1646	1540	1540
q8	2526	2775	2645	2645
q9	7484	7498	7402	7402
q10	3826	3988	3644	3644
q11	507	446	422	422
q12	510	624	452	452
q13	2706	3548	2354	2354
q14	289	303	280	280
q15
q16	747	768	743	743
q17	1184	1403	1386	1386
q18	7190	6851	6557	6557
q19	957	951	912	912
q20	2076	2243	2196	2196
q21	4013	3626	3371	3371
q22	476	430	396	396
Total cold run time: 50158 ms
Total hot run time: 48080 ms

@doris-robot

TPC-DS: Total hot run time: 168897 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e88a9a1c17bfab85da4be2dbc99b955e206a0e90, data reload: false

query5	4342	624	503	503
query6	339	235	208	208
query7	4210	461	264	264
query8	354	249	234	234
query9	8757	2734	2745	2734
query10	516	394	352	352
query11	6966	5159	4906	4906
query12	182	138	128	128
query13	1280	457	354	354
query14	5764	3782	3445	3445
query14_1	2848	2861	2845	2845
query15	202	198	181	181
query16	975	474	377	377
query17	900	741	642	642
query18	2457	457	355	355
query19	218	218	190	190
query20	139	127	127	127
query21	211	143	119	119
query22	13209	13946	14620	13946
query23	16405	15917	15771	15771
query23_1	15686	15697	15680	15680
query24	7660	1649	1224	1224
query24_1	1235	1228	1235	1228
query25	621	463	426	426
query26	1225	267	149	149
query27	2788	486	301	301
query28	4926	1854	1905	1854
query29	832	568	470	470
query30	295	234	195	195
query31	1012	954	867	867
query32	80	74	72	72
query33	523	321	277	277
query34	894	891	528	528
query35	648	691	608	608
query36	1064	1207	974	974
query37	137	91	86	86
query38	2999	2897	2889	2889
query39	848	819	818	818
query39_1	793	787	810	787
query40	231	153	136	136
query41	63	99	58	58
query42	266	259	259	259
query43	237	247	214	214
query44	
query45	208	188	185	185
query46	884	980	614	614
query47	2102	2158	2036	2036
query48	311	312	227	227
query49	638	459	382	382
query50	685	272	210	210
query51	4170	4124	4071	4071
query52	264	272	254	254
query53	287	336	284	284
query54	306	277	275	275
query55	98	87	84	84
query56	314	316	322	316
query57	1900	1698	1746	1698
query58	287	283	268	268
query59	2781	2948	2721	2721
query60	341	347	320	320
query61	159	160	154	154
query62	627	587	535	535
query63	310	279	277	277
query64	5085	1288	1049	1049
query65	
query66	1463	453	359	359
query67	24270	24363	24232	24232
query68	
query69	405	305	287	287
query70	971	964	954	954
query71	335	329	306	306
query72	2839	2726	2451	2451
query73	542	545	324	324
query74	9575	9565	9374	9374
query75	2828	2807	2492	2492
query76	2291	1025	692	692
query77	362	372	295	295
query78	10922	11103	10523	10523
query79	2601	767	597	597
query80	1809	612	552	552
query81	554	261	227	227
query82	993	151	115	115
query83	332	270	246	246
query84	297	122	95	95
query85	918	509	474	474
query86	412	340	310	310
query87	3118	3167	3066	3066
query88	3558	2675	2663	2663
query89	416	368	342	342
query90	2011	181	168	168
query91	171	171	143	143
query92	78	76	74	74
query93	1135	825	500	500
query94	651	325	290	290
query95	591	336	389	336
query96	636	524	230	230
query97	2489	2496	2392	2392
query98	242	221	218	218
query99	1018	1006	921	921
Total cold run time: 252009 ms
Total hot run time: 168897 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 21.21% (28/132) 🎉
Increment coverage report
Complete coverage report

@morningman morningman requested a review from zclllyybb as a code owner March 23, 2026 23:07
…ke connector-es default ON

### What problem does this PR solve?

Problem Summary: Complete the ES connector decoupling by removing the
duplicated ES external catalog classes from fe-core (EsExternalCatalog,
EsExternalDatabase, EsExternalTable, EsScanNode, ESCatalogAction) since
they now live in the connector-es plugin module.

Changes:
- Delete EsExternalCatalog, EsExternalDatabase, EsExternalTable,
  EsScanNode from fe-core/datasource/es/
- Delete ESCatalogAction (ES-specific HTTP API, tightly coupled to ES internals)
- GsonUtils: replace class references with string literals for backward compat
- CatalogFactory: remove ES switch-case fallback (fully SPI-driven now)
- PhysicalPlanTranslator: remove ES fallback, error if plugin not loaded
- ExternalCatalog: remove ES case from createDatabase switch
- Env: replace instanceof EsExternalCatalog with type string check
- build.sh: change BUILD_CONNECTOR_ES default from 0 to 1
- Add AGENTS.md guide for creating new connector plugins
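With EsExternalCatalog gone from fe-core, two of the changes above follow a simple pattern: identify ES catalogs by their type string instead of `instanceof`, and fail planning explicitly when no plugin can build the ScanNode. The sketch assumes the ES catalog type string is `"es"`; `ScanNodeResolutionSketch`, `Catalog`, and the factory map are hypothetical stand-ins for Env, ExternalCatalog, and the provider lookup in PhysicalPlanTranslator.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class ScanNodeResolutionSketch {

    /** Minimal catalog view: only the type string stored with the catalog. */
    public interface Catalog {
        String getType();
    }

    /** Replaces `catalog instanceof EsExternalCatalog`, whose class no longer exists in fe-core. */
    public static boolean isEsCatalog(Catalog catalog) {
        return "es".equals(catalog.getType().toLowerCase(Locale.ROOT));
    }

    /** No fallback: if no plugin serves the catalog type, planning fails with an explicit message. */
    public static Object createScanNode(Catalog catalog, Map<String, Function<Catalog, Object>> factories) {
        Function<Catalog, Object> factory = factories.get(catalog.getType().toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalStateException(
                    "Connector plugin for catalog type '" + catalog.getType() + "' is not loaded");
        }
        return factory.apply(catalog);
    }
}
```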

Note: ES utility classes (EsRestClient, EsUtil, EsRepository, etc.)
remain in fe-core as they are shared with internal EsTable support.

### Release note

None

### Check List (For Author)

- Test: No need to test - refactoring, needs full CI validation
- Behavior changed: No
- Does this need documentation: No

4 participants