[refactor](fe) Add CatalogProvider SPI framework and migrate ES as pilot #61604
Open
morningman wants to merge 18 commits into apache:master
…rce plugin support

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: External data source code (Iceberg, Paimon, Hive, ES, etc.) is tightly coupled into fe-core via hardcoded switch-case and instanceof chains. This makes it impossible to add new data sources without modifying core code.

This commit adds the SPI framework infrastructure:
- CatalogProvider: SPI interface for external datasource plugins
- CatalogProviderRegistry: thread-safe type-to-provider mapping
- CatalogPluginLoader: plugin discovery with ClassLoader isolation
- Env.java: load plugins before EditLog replay

### Release note

None

### Check List (For Author)

- Test: No need to test - framework only, no behavioral changes
- Behavior changed: No
- Does this need documentation: No
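The "thread-safe type-to-provider mapping" described in this commit could look roughly like the sketch below. It is a simplified illustration, not the code from the PR: the `CatalogProvider` interface here is a hypothetical one-method stand-in, and the method names `getType`, `register`, and `get`, as well as the case-insensitive lookup, are assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the CatalogProvider SPI.
// The real interface in the PR has more methods and references fe-core types.
interface CatalogProvider {
    /** Catalog type string this provider handles, for example "es". */
    String getType();
}

// Sketch of a thread-safe type-to-provider mapping.
final class CatalogProviderRegistry {
    private final Map<String, CatalogProvider> providers = new ConcurrentHashMap<>();

    /** Registers a provider; returns false if the type is already taken. */
    boolean register(CatalogProvider provider) {
        // putIfAbsent is atomic, so two plugins racing for one type cannot both win.
        return providers.putIfAbsent(normalize(provider.getType()), provider) == null;
    }

    /** Looks up the provider for a catalog type, if one was registered. */
    Optional<CatalogProvider> get(String type) {
        return Optional.ofNullable(providers.get(normalize(type)));
    }

    // Assumption: type names are matched case-insensitively.
    private static String normalize(String type) {
        return type.toLowerCase(Locale.ROOT);
    }
}
```

Returning `Optional` from `get` keeps the "no provider registered" case explicit, which matters while non-migrated datasources still go through the hardcoded path.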
…Factory and ExternalCatalog

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: This is Phase 2 of the external datasource SPI refactoring. It implements the first CatalogProvider (ES) and wires the SPI lookup into CatalogFactory and ExternalCatalog.buildDbForInit, with fallback to the existing hardcoded switch-case for non-migrated datasources.

Changes:
- Add createCatalog() to CatalogProvider SPI interface
- Implement EsCatalogProvider with all SPI methods
- Add META-INF/services registration for ServiceLoader discovery
- CatalogFactory: try SPI provider before switch-case fallback
- ExternalCatalog.buildDbForInit: try SPI provider before switch-case fallback

### Release note

None

### Check List (For Author)

- Test: No need to test - SPI wiring with fallback, no behavioral change
- Behavior changed: No
- Does this need documentation: No
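The "try SPI provider before switch-case fallback" pattern can be sketched as follows. This is an illustrative reduction under stated assumptions, not the actual `CatalogFactory`: the class name `CatalogFactorySketch` is hypothetical, catalogs are represented as plain strings, and a `Function` stands in for the provider's `createCatalog()`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of SPI-first catalog creation with a legacy fallback.
final class CatalogFactorySketch {
    // type -> creator; the Function stands in for CatalogProvider.createCatalog().
    private final Map<String, Function<String, String>> spiProviders = new ConcurrentHashMap<>();

    void registerProvider(String type, Function<String, String> creator) {
        spiProviders.put(type, creator);
    }

    String createCatalog(String type, String name) {
        // 1. Prefer a plugin-supplied provider when one is registered.
        Function<String, String> creator = spiProviders.get(type);
        if (creator != null) {
            return creator.apply(name);
        }
        // 2. Otherwise fall back to the hardcoded path for non-migrated datasources.
        switch (type) {
            case "iceberg":
                return "legacy:iceberg:" + name;
            case "paimon":
                return "legacy:paimon:" + name;
            default:
                throw new IllegalArgumentException("Unknown catalog type: " + type);
        }
    }
}
```

With this ordering, migrating one datasource means registering its provider and later deleting its `case`, without touching the others.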
… GsonUtils, PhysicalPlanTranslator, Maven module

### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: Completes the ES datasource SPI migration (Phase 2) by:

1. ExternalCatalog: Convert 3 abstract methods (initLocalObjectsImpl, listTableNamesFromRemote, tableExist) to concrete SPI-delegating defaults. Add transient CatalogProvider field with auto-resolution in initLocalObjects(). Subclasses still override for backward compatibility.
2. GsonUtils: Change ES registerSubtype to registerCompatibleSubtype for all 3 type adapter factories (Catalog/Database/Table). Old "EsExternalCatalog" JSON now deserializes to ExternalCatalog. Add ExternalCatalog as registered subtype for new serialization.
3. PhysicalPlanTranslator: visitPhysicalEsScan now uses CatalogProvider SPI to create ScanNode, with fallback to direct EsScanNode for backward compat.
4. Maven module: Create fe-catalogs/catalog-es/ with pom.xml (provided fe-core dependency, shade plugin for fat JAR). Register as module in parent pom.xml.

### Release note

None

### Check List (For Author)

- Test: No need to test - structural refactoring with SPI fallback, no behavioral change
- Behavior changed: No
- Does this need documentation: No
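The GsonUtils change in item 2 relies on a label being accepted on read without being used on write. The sketch below models only that idea with two plain maps. It is not the Doris type adapter factory: `SubtypeLabels`, `ExternalCatalogStub`, `typeFor`, and `labelFor` are hypothetical names, and the read-only behavior of `registerCompatibleSubtype` is an assumption inferred from the commit message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the fe-core ExternalCatalog class.
class ExternalCatalogStub {
}

// Sketch of label bookkeeping for polymorphic JSON (the "clazz" field).
final class SubtypeLabels {
    private final Map<String, Class<?>> labelToType = new HashMap<>();
    private final Map<Class<?>, String> typeToLabel = new HashMap<>();

    /** Read and write: the label is used in both directions. */
    SubtypeLabels registerSubtype(Class<?> type, String label) {
        labelToType.put(label, type);
        typeToLabel.put(type, label);
        return this;
    }

    /** Read only (assumed): an old label is still accepted but never written. */
    SubtypeLabels registerCompatibleSubtype(Class<?> type, String label) {
        labelToType.put(label, type);
        return this;
    }

    /** Type to instantiate when this label is found in stored JSON. */
    Class<?> typeFor(String label) {
        return labelToType.get(label);
    }

    /** Label to emit when serializing an instance of this type. */
    String labelFor(Class<?> type) {
        return typeToLabel.get(type);
    }
}
```

Under these assumptions, metadata written before the migration with the label "EsExternalCatalog" resolves to the base class, while anything written afterwards carries only the "ExternalCatalog" label, so fe-core no longer needs the ES subclass on its classpath to replay metadata.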
### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: Migrates ES datasource code to the independent fe-catalogs/catalog-es Maven module:
- Move EsCatalogProvider.java from fe-core to catalog-es (git mv)
- Move META-INF/services SPI registration from fe-core to catalog-es (git mv)
- Copy all ES source files (22 files) to catalog-es module

The ES code remains in fe-core temporarily for backward compatibility (CatalogFactory switch-case fallback, GsonUtils imports). Phase 4 will remove these duplicates from fe-core once all datasources are migrated.

### Release note

None

### Check List (For Author)

- Test: No need to test - file migration only, no logic change
- Behavior changed: No
- Does this need documentation: No
… Env, ExternalCatalog

### What problem does this PR solve?

Problem Summary: Fix CI checkstyle failure - SPI imports must appear after other datasource sub-package imports in lexicographical order.

### Release note

None

### Check List (For Author)

- Test: No need to test - import reorder only
- Behavior changed: No
- Does this need documentation: No
…nector-es

### What problem does this PR solve?

Problem Summary: Rename the connector plugin module directory:
- fe-catalogs/catalog-es -> fe-connectors/connector-es
- Update parent pom.xml module reference
- Update connector-es pom.xml (artifactId, name, dependencies)

The SPI classes remain in fe-core/datasource/spi/ as they reference fe-core types (ExternalCatalog, ScanNode, etc.) and cannot be extracted to an independent module without first abstracting those dependencies.

### Release note

None

### Check List (For Author)

- Test: No need to test - directory rename only
- Behavior changed: No
- Does this need documentation: No
…ctory to connectors/

### What problem does this PR solve?

Problem Summary: Add build.sh support for building the ES connector plugin independently via `--connector-es` flag. Also update CatalogPluginLoader to load plugins from `connectors/` directory (matching build output layout) instead of the previous `catalogs/` directory.

Changes:
- Add `--connector-es` option to build.sh (default OFF, ON in full build)
- Build fe-connectors/connector-es Maven module when flag is set
- Copy doris-connector-es.jar to output/fe/connectors/es/
- CatalogPluginLoader: CATALOGS_DIR -> CONNECTORS_DIR

### Release note

None

### Check List (For Author)

- Test: No need to test - build script change only
- Behavior changed: No
- Does this need documentation: No
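Plugin discovery from a `connectors/` directory, with one class loader per plugin and `ServiceLoader` lookup through `META-INF/services`, might be sketched as below. This is an assumption-laden illustration rather than the PR's `CatalogPluginLoader`: the `connectors/<name>/*.jar` layout is inferred from the build output path above, `PluginLoaderSketch` and the one-method `CatalogProvider` are hypothetical, and the sketch uses the default parent-first delegation of `URLClassLoader`, which may be weaker than the isolation the PR implements.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical, simplified stand-in for the CatalogProvider SPI.
interface CatalogProvider {
    String getType();
}

final class PluginLoaderSketch {
    /** Loads providers from each sub-directory of connectorsDir (assumed layout: connectors/es/*.jar). */
    static List<CatalogProvider> loadPlugins(Path connectorsDir) throws IOException {
        List<CatalogProvider> found = new ArrayList<>();
        if (!Files.isDirectory(connectorsDir)) {
            return found; // no plugin directory: nothing to load
        }
        try (DirectoryStream<Path> pluginDirs =
                Files.newDirectoryStream(connectorsDir, p -> Files.isDirectory(p))) {
            for (Path pluginDir : pluginDirs) {
                List<URL> jars = new ArrayList<>();
                try (DirectoryStream<Path> jarFiles = Files.newDirectoryStream(pluginDir, "*.jar")) {
                    for (Path jar : jarFiles) {
                        jars.add(jar.toUri().toURL());
                    }
                }
                if (jars.isEmpty()) {
                    continue;
                }
                // One class loader per plugin directory. It is deliberately not closed,
                // because the loaded providers keep using its classes.
                URLClassLoader loader = new URLClassLoader(
                        jars.toArray(new URL[0]), CatalogProvider.class.getClassLoader());
                // ServiceLoader reads META-INF/services/<interface name> from the plugin JARs.
                for (CatalogProvider provider : ServiceLoader.load(CatalogProvider.class, loader)) {
                    found.add(provider);
                }
            }
        }
        return found;
    }
}
```

Giving each connector its own loader means two plugins can ship different versions of the same third-party library without clashing, as long as the shared SPI interface itself comes from the parent loader.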
…fix DORIS_HOME reference
run buildall
TPC-H: Total hot run time: 26982 ms
TPC-DS: Total hot run time: 168410 ms
FE UT Coverage Report: Increment line coverage
run buildall
TPC-H: Total hot run time: 27070 ms
TPC-DS: Total hot run time: 168897 ms
FE UT Coverage Report: Increment line coverage
…ke connector-es default ON

### What problem does this PR solve?

Problem Summary: Complete the ES connector decoupling by removing the duplicated ES external catalog classes from fe-core (EsExternalCatalog, EsExternalDatabase, EsExternalTable, EsScanNode, ESCatalogAction) since they now live in the connector-es plugin module.

Changes:
- Delete EsExternalCatalog, EsExternalDatabase, EsExternalTable, EsScanNode from fe-core/datasource/es/
- Delete ESCatalogAction (ES-specific HTTP API, tightly coupled to ES internals)
- GsonUtils: replace class references with string literals for backward compat
- CatalogFactory: remove ES switch-case fallback (fully SPI-driven now)
- PhysicalPlanTranslator: remove ES fallback, error if plugin not loaded
- ExternalCatalog: remove ES case from createDatabase switch
- Env: replace instanceof EsExternalCatalog with type string check
- build.sh: change BUILD_CONNECTOR_ES default from 0 to 1
- Add AGENTS.md guide for creating new connector plugins

Note: ES utility classes (EsRestClient, EsUtil, EsRepository, etc.) remain in fe-core as they are shared with internal EsTable support.

### Release note

None

### Check List (For Author)

- Test: No need to test - refactoring, needs full CI validation
- Behavior changed: No
- Does this need documentation: No
79a0a9c to 4f455da
bfa4923 to f7b656b
### What problem does this PR solve?

Issue Number: close #xxx

Problem Summary: This PR introduces a Service Provider Interface (SPI) framework for external datasource catalogs, enabling dynamic loading and ClassLoader isolation for catalog plugins. ES (Elasticsearch) is migrated as the first pilot.

Phase 1: SPI Framework
- `CatalogProvider` SPI interface with methods for catalog lifecycle management
- `CatalogProviderRegistry` thread-safe provider registry
- `CatalogPluginLoader` with URLClassLoader isolation for plugin JARs
- `loadPlugins()` called before `loadImage()`

Phase 2: ES Migration (Pilot)
- `EsCatalogProvider` implementing CatalogProvider SPI
- `CatalogFactory`: SPI provider lookup before switch-case fallback
- `ExternalCatalog.buildDbForInit`: SPI provider lookup before switch-case fallback
- `ExternalCatalog`: Three abstract methods converted to concrete SPI-delegating defaults (`initLocalObjectsImpl`, `listTableNamesFromRemote`, `tableExist`) with transient `provider` field
- `GsonUtils`: ES types changed from `registerSubtype` to `registerCompatibleSubtype` for plugin-agnostic persistence
- `PhysicalPlanTranslator.visitPhysicalEsScan`: SPI-based ScanNode creation with fallback
- `fe-catalogs/catalog-es/` Maven module with shade plugin for self-contained plugin JAR

Key Design Decisions:
- … (`"clazz":"EsExternalCatalog"`) are handled via `registerCompatibleSubtype`
- … `makeSureInitialized()`

### Release note

None

### Check List (For Author)