
[#10093] fix(core): Introduce ClassLoaderPool to share ClassLoaders across same-type catalogs #10480

Open
LuciferYang wants to merge 3 commits into apache:main from LuciferYang:fix-10093

Conversation

@LuciferYang
Contributor

LuciferYang commented Mar 19, 2026

What changes were proposed in this pull request?

Introduce a ClassLoaderPool with reference counting to share IsolatedClassLoader instances across catalogs of the same type, and centralize ClassLoader resource cleanup into the pool's lifecycle.

Core mechanism: Catalogs with identical isolation-relevant properties share a single IsolatedClassLoader. The isolation key includes the provider, package, authorization provider, Kerberos identity, and backend URIs (metastore.uris, jdbc-url, fs.defaultFS). Operators can extend the isolation dimensions via gravitino.catalog.classloader.isolation.extra-properties without code changes. The pool uses ConcurrentHashMap.compute() for atomic acquire/release, and performs cleanup (JDBC driver deregistration, ThreadLocal clearing, MySQL AbandonedConnectionCleanupThread shutdown) only when the last catalog releases the shared ClassLoader.

New classes:

  • ClassLoaderKey — Map<String, String>-based key for ClassLoader sharing, decoupled from specific property names
  • ClassLoaderPool — thread-safe pool with reference counting and lifecycle management
  • PooledClassLoaderEntry — holds a shared ClassLoader and its reference count

Changes to existing classes:

  • CatalogManager — integrates pool into catalog creation, test connection, and close paths; fixes ClassLoader leak in testConnection() and getResolvedProperties(); defines built-in isolation property keys and supports configurable extra keys
  • Configs — adds gravitino.catalog.classloader.isolation.extra-properties configuration
  • ClassLoaderResourceCleanerUtils — broadens ThreadLocal cleanup from webserver-only to all application threads; adds MySQL cleanup
  • Removes scattered cleanup from JdbcCatalogOperations, IcebergCatalogWrapper, IcebergCatalogOperations, and PaimonCatalogOperations

Why are the changes needed?

Concurrent catalog creation with different names but the same provider type causes OutOfMemoryError: Metaspace. Each catalog creates an independent IsolatedClassLoader that loads all provider JARs into Metaspace. With MaxMetaspaceSize=512m (default) and Iceberg catalogs consuming ~30-80 MB each, ~10 catalogs exhaust the limit.

This patch addresses four root causes:

  1. No ClassLoader sharing — same-type catalogs loaded identical classes into separate Metaspace regions
  2. ClassLoader leak in testConnection() — wrapper was never closed after connection test
  3. Incomplete ThreadLocal cleanup — only cleaned webserver threads, missing ForkJoinPool and other threads
  4. Inconsistent cleanup — only 2 of 9+ catalog types called ClassLoaderResourceCleanerUtils

Fix: #10093

Does this PR introduce any user-facing change?

Yes. A new optional server configuration is added:

  • gravitino.catalog.classloader.isolation.extra-properties — comma-separated list of additional catalog property keys used to determine ClassLoader isolation. Supplements the built-in defaults (package, authorization-provider, Kerberos identity, metastore.uris, jdbc-url, fs.defaultFS) and cannot remove them. Default is empty.

How was this patch tested?

Unit tests (TestClassLoaderPool — 19 tests): acquire/release semantics, reference counting, concurrent access with 20 threads, close-during-acquire race, double-release resilience, Kerberos key isolation, backend URI isolation (metastore URIs, JDBC URLs, fs.defaultFS), authorization provider isolation, package property isolation.

Integration tests (TestClassLoaderPoolIntegration — 3 tests): same-type catalogs share ClassLoader instance, drop one doesn't affect others, manager close cleans up pool.

Existing tests: TestCatalogManager and TestJdbcCatalogOperations pass without modification.

Benchmark (JDK 17, -XX:MaxMetaspaceSize=512m, fileset provider, 10 concurrent threads):

Metaspace growth (committed KB):

Catalogs | Baseline (main) | ClassLoaderPool | Reduction
100 | +890 | +261 | 3.4x
500 | +3,280 | +82 | 40x
1,000 | +6,416 | +9 | 713x
5,000 | +13,969 | +67 | 209x
10,000 | +40,394 | +11 | 3,672x

Classes loaded:

Catalogs | Baseline | ClassLoaderPool | Reduction
1,000 | 21,772 | 11,387 | 48%
10,000 | 60,373 | 11,961 | 80%

Baseline Metaspace grows O(N) with catalog count. The pool stays flat at ~8.7 MB — O(number of distinct keys). No OOM or performance regression on either version. For Iceberg catalogs (~50 MB/ClassLoader), baseline OOMs at ~10 catalogs; with the pool, catalogs sharing the same key reuse a single ClassLoader, so Metaspace scales with the number of distinct configurations rather than the number of catalog instances.

Future extensibility

ClassLoaderKey stores isolation properties as a generic Map<String, String>, decoupled from any specific property names. This makes the pool infrastructure key-agnostic — only the logic that builds the key needs to know which properties matter.

If new ClassLoader-scoped static state is discovered in the future, the isolation criteria can be extended by adding property keys to the server configuration:

gravitino.catalog.classloader.isolation.extra-properties = custom.backend.endpoint

CatalogManager reads this list at startup and extracts matching values from catalog properties when building the key. Operators can then add isolation criteria for environment-specific static state without code changes or recompilation. The ClassLoaderKey, ClassLoaderPool, and PooledClassLoaderEntry require no modification — only the source of property keys changes.
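As a concrete illustration of that key-building step, the sketch below parses the extra-properties value and merges it with the built-in defaults. The built-in property names are taken from this discussion; the class and method names (IsolationKeyBuilder, buildIsolationKey) are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of building the isolation-key map from catalog properties.
// Built-in keys follow the PR description; the class itself is hypothetical.
class IsolationKeyBuilder {
  private static final List<String> BUILT_IN_KEYS = List.of(
      "package", "authorization-provider",
      "authentication.type", "authentication.kerberos.principal",
      "authentication.kerberos.keytab-uri",
      "metastore.uris", "jdbc-url", "fs.defaultFS");

  // extraConfig is the raw value of
  // gravitino.catalog.classloader.isolation.extra-properties (may be null/blank)
  static Map<String, String> buildIsolationKey(
      Map<String, String> catalogProps, String extraConfig) {
    List<String> keys = new ArrayList<>(BUILT_IN_KEYS);
    if (extraConfig != null && !extraConfig.isBlank()) {
      Arrays.stream(extraConfig.split(","))
          .map(String::trim)
          .filter(s -> !s.isEmpty())
          .forEach(keys::add);
    }
    // Only properties actually set on the catalog contribute to the key, so
    // catalogs that omit a property can still share when the rest matches.
    Map<String, String> key = new LinkedHashMap<>();
    for (String k : keys) {
      String v = catalogProps.get(k);
      if (v != null) key.put(k, v);
    }
    return key;
  }
}
```

Two catalogs then share a ClassLoader exactly when their extracted maps are equal, so adding a key to the config can only split existing sharing groups, never merge them.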

@yuqi1129
Contributor

We had tried it in this way (see #2644), but aborted it because classloader sharing carries potential risks, including:

  • A static value in the classloader may be changed by one catalog and then used by another
  • Properties used during classloader initialization may affect its behaviour, but they cannot be changed afterward, even though another catalog of the same type may have different properties

These two factors affect the correctness of the Gravitino catalogs, so we hesitated to move forward.

@LuciferYang
Contributor Author

LuciferYang commented Mar 19, 2026

We had tried it in this way (see #2644), but aborted it because classloader sharing carries potential risks, including:

  • A static value in the classloader may be changed by one catalog and then used by another
  • Properties used during classloader initialization may affect its behaviour, but they cannot be changed afterward, even though another catalog of the same type may have different properties

These two factors affect the correctness of the Gravitino catalogs, so we hesitated to move forward.

@yuqi1129 Thank you for the context on #2644. The three concerns raised by @jerryshao are valid for that PR's approach — here's how this PR differs and addresses each one.


Background: how this PR differs from #2644

PR #2644 proposed reusing ClassLoaders during alter operations on a single catalog, passing the old ClassLoader to the new CatalogWrapper. This created complex lifecycle issues — especially when catalog names change (cache invalidation closes the wrapper, which would close the shared ClassLoader) and when altered properties require a fresh ClassLoader.

This PR takes a fundamentally different approach: a reference-counted pool that shares ClassLoaders across different catalog instances based on a composite key. The alter path is unaffected — alter still invalidates the old cache entry (releasing the pool reference) and creates a new wrapper (acquiring from the pool). The pool handles whether to reuse or create a new ClassLoader based on whether the key changed.


Addressing the three specific concerns

1. "The classloader reuse for alteration makes the code a little broken"

This PR does not change the alter path's control flow. When a catalog is altered:

  • The old CatalogWrapper is invalidated from the Caffeine cache → the removal listener calls wrapper.close() → pool.release(poolEntry) decrements refCount
  • A new CatalogWrapper is created via createCatalogWrapper() → pool.acquire(key, factory) either reuses an existing ClassLoader (if other catalogs share the same key) or creates a new one
  • The alter code in CatalogManager remains unchanged — it still invalidates and recreates as before

The complex setShouldCloseClassLoader / conditional invalidation pattern from #2644 is not needed because the pool's reference counting handles the lifecycle automatically.

2. "If we modify some properties that require rebooting and refreshing the classloader, with this we cannot support it"

This concern applies when alter changes properties that affect ClassLoader construction. In this PR, the ClassLoaderKey is built from the catalog's current properties at creation time. If an alter changes a property that is part of the key (e.g., Catalog.PROPERTY_PACKAGE, authorization plugin path, or Kerberos identity), the new wrapper will have a different key and get a fresh ClassLoader. The old ClassLoader's refCount is decremented by the close of the old wrapper, and will be cleaned up when no other catalogs reference it.

If the alter changes a property that is NOT part of the key (e.g., metastore URI, JDBC URL), the ClassLoader is reused — which is correct, because these properties do not affect ClassLoader construction (they are consumed by CatalogOperations.initialize(), which runs per-instance on the new wrapper).

3. "Sharing classloader between catalogs is dangerous, because some static variables that were created based on logic A will be used by another catalog that are based on logic B"

I audited all catalog implementations for static mutable state:

  • Instance-level state (safe): All catalog operations classes store runtime configuration in instance fields — dataSource, clientPool, icebergCatalogWrapper, hadoopConf, fileSystemCache, etc. These are per-catalog-instance and are not affected by ClassLoader sharing.

  • UserGroupInformation.loginUser (addressed): The only unkeyed static state that affects correctness. ClassLoaderKey includes Kerberos principal and keytab via isolation properties, so catalogs with different identities get separate ClassLoaders.

  • IcebergHiveCachedClientPool.clientPoolCache (safe): Static, but keyed by (metastore URI, catalog name, user credentials). Different catalog configurations produce different cache entries — sharing the ClassLoader does not cause cross-contamination.

  • FileSystem.CACHE (safe): Static, but keyed by (scheme, authority, UGI). Same isolation guarantee as above.

The key insight is that ClassLoader construction in Gravitino depends only on JAR paths (provider + package + auth plugin), not on catalog runtime properties. Two catalogs with the same ClassLoaderKey load exactly the same classes — their "logic A" and "logic B" differ only at the instance level (different metastore URIs, different JDBC URLs, etc.), which is stored in instance fields and not affected by ClassLoader sharing.


Future extensibility

ClassLoaderKey stores isolation properties as a generic Map<String, String>, decoupled from any specific property names. This makes the pool infrastructure key-agnostic — only the logic that builds the key needs to know which properties matter.

If new ClassLoader-scoped static state is discovered in the future, the isolation criteria can be extended by introducing a server configuration such as:

gravitino.catalog.classloader.isolation.properties = authentication.type,authentication.kerberos.principal,authentication.kerberos.keytab-uri

CatalogManager would read this list at startup and extract matching values from catalog properties when building the key. Operators can then add isolation criteria for environment-specific static state without code changes or recompilation. The ClassLoaderKey, ClassLoaderPool, and PooledClassLoaderEntry require no modification — only the source of property keys changes. If the community prefers, I can implement this configurable approach in the current PR.

Since I don't have a thorough understanding of the project yet, please correct me if I've misunderstood anything.

@roryqi roryqi requested a review from yuqi1129 March 19, 2026 14:49
@github-actions

github-actions bot commented Mar 19, 2026

Code Coverage Report

Overall Project 64.92% -0.15% 🟢
Files changed 60.01% 🟢

Module Coverage
aliyun 1.73% 🔴
api 47.14% 🟢
authorization-common 85.96% 🟢
aws 1.1% 🔴
azure 2.6% 🔴
catalog-common 9.8% -3.83% 🔴
catalog-fileset 80.02% 🟢
catalog-hive 80.98% 🟢
catalog-jdbc-clickhouse 79.06% 🟢
catalog-jdbc-common 42.49% -10.77% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 45.07% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 87.13% -2.52% 🟢
catalog-lakehouse-paimon 77.69% -0.9% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.83% 🟢
common 49.42% 🟢
core 80.99% -0.31% 🟢
filesystem-hadoop3 76.97% 🟢
flink 38.86% 🔴
flink-runtime 0.0% 🔴
gcp 14.2% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 45.82% 🟢
iceberg-common 51.87% -8.2% 🟢
iceberg-rest-server 66.54% +0.38% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 23.88% 🔴
lance-rest-server 57.84% 🟢
lineage 53.02% 🟢
optimizer 82.87% 🟢
optimizer-api 21.95% 🔴
server 85.6% 🟢
server-common 69.43% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 31.62% 🔴
Files
Module File Coverage
catalog-common ClassLoaderResourceCleanerUtils.java 0.0% 🔴
catalog-jdbc-common JdbcCatalogOperations.java 5.69% 🔴
catalog-lakehouse-iceberg IcebergCatalogOperations.java 80.74% 🟢
catalog-lakehouse-paimon PaimonCatalogOperations.java 74.55% 🟢
core ClassLoaderKey.java 100.0% 🟢
Configs.java 98.87% 🟢
PooledClassLoaderEntry.java 88.24% 🟢
ClassLoaderPool.java 84.85% 🟢
CatalogManager.java 66.05% 🟢
IsolatedClassLoader.java 48.48% 🔴
iceberg-common IcebergCatalogWrapper.java 0.0% 🔴
iceberg-rest-server CatalogWrapperForREST.java 70.98% 🟢

@yuqi1129
Contributor

yuqi1129 left a comment

Thanks for this PR — the OOM/Metaspace problem is real and the benchmark numbers are impressive. I spent some time studying the design carefully and have a few concerns about correctness under certain real-world configurations. I'd like to discuss these before merging.


Background: What ClassLoader sharing gives up

IsolatedClassLoader was designed to give every catalog its own isolated class space so that third-party library static state (Hadoop UGI, FileSystem cache, JDBC DriverManager, HiveConf, etc.) cannot bleed between catalogs. ClassLoaderPool partially relaxes that isolation. The question is whether ClassLoaderKey captures all the dimensions along which static state can diverge between two catalogs of the same type.


Concern 1: ClassLoaderKey is missing critical backend-URI dimensions

The current key is:

provider + packageProperty + authorizationPkgPath + kerberosPrincipal + kerberosKeytab

This is correct for isolating Kerberos identity, but it doesn't account for catalogs that point to different backends of the same type. Consider:

Scenario — two Iceberg catalogs with different HMS URIs:

catalog-A: provider=lakehouse-iceberg, metastore.uris=thrift://hms-A:9083
catalog-B: provider=lakehouse-iceberg, metastore.uris=thrift://hms-B:9083

Both produce the same ClassLoaderKey and share one IsolatedClassLoader. Inside that ClassLoader, HiveConf has a static configuration space. HiveConf.ConfVars and the valuesPerLabel cache are static. If catalog-B's initialization overwrites catalog-A's HMS URI in HiveConf, catalog-A starts talking to the wrong metastore.

Same problem exists for:

  • fs.defaultFS (Hadoop FileSystem.CACHE is a per-ClassLoader static keyed on URI + conf + UserGroupInformation; two catalogs pointing at different HDFS clusters may cross-contaminate the FileSystem cache)
  • JDBC URL (less likely to corrupt static state, but the AbandonedConnectionCleanupThread and driver registry are global per-ClassLoader)

Suggested fix: extend ClassLoaderKey to include the backend URI(s) that anchor static state. For Iceberg/Paimon Hive-backend: metastore.uris. For JDBC catalogs: jdbc-url. For fileset catalogs: fs.defaultFS.


Concern 2: FileSystem.closeAll() during doFinalCleanup can disconnect live catalogs

closeStatsDataClearerInFileSystem calls FileSystem.closeAll() — a static method that closes every cached FileSystem in that ClassLoader's cache. doFinalCleanup runs only when refCount reaches 0 (i.e., the last catalog sharing this ClassLoader is closed), so under the current logic it won't fire while other catalogs are live.

However, if Concern 1 is fixed and two catalogs with different HDFS URIs are given separate keys, this is safe. But as long as ClassLoaderKey is under-specified (same key for different backends), there is a window where catalog-A's cleanup kills catalog-B's live HDFS connections.


Concern 3: ThreadLocal cross-contamination during shared lifetime

With per-catalog ClassLoaders, ThreadLocal values from different catalogs are of different Class objects (loaded by different ClassLoaders), so they are naturally isolated. With a shared ClassLoader, Catalog A and Catalog B load the same Class, meaning a ThreadLocal set by catalog-A's code on a Jetty thread is visible to catalog-B's code running on the same thread.

Concretely: if Iceberg or Hive sets a per-request ThreadLocal (e.g., Hadoop SecurityContext, Hive SessionState, or Iceberg's ResolvingFileIO context), catalog-B could pick up a stale value left by catalog-A. This is especially risky for Hive SessionState, which is a ThreadLocal singleton used throughout HMS interaction.
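The hazard reduces to a toy example. SharedSession below is a stand-in for any library class holding per-request ThreadLocal state (it is not a Gravitino or Hive class); with a single shared Class object there is exactly one ThreadLocal slot per thread for all catalogs:

```java
// Stand-in for a library class with per-request ThreadLocal state (e.g. a
// session holder). Under a shared ClassLoader, both "catalogs" see the same
// Class object and therefore the same ThreadLocal slot on a given thread.
class SharedSession {
  static final ThreadLocal<String> CURRENT = new ThreadLocal<>();
}

class ThreadLocalLeakDemo {
  static String simulate() {
    // "catalog-A" handles a request on this thread and forgets to clear.
    SharedSession.CURRENT.set("session-of-catalog-A");
    // "catalog-B" runs next on the same (pooled) thread and reads stale state.
    String seenByB = SharedSession.CURRENT.get();
    SharedSession.CURRENT.remove(); // explicit cleanup avoids the leak
    return seenByB;
  }
}
```

With per-catalog ClassLoaders, each catalog would load its own copy of SharedSession, so catalog-B's read would return null instead of catalog-A's leftover value.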


Comparison with industry practice

The closest industry analogue is Trino's plugin ClassLoader: a single ClassLoader is shared across all connector instances of the same plugin. Trino makes this safe because its connectors are stateless — all per-request state lives in ConnectorSession, not in static fields. Gravitino's catalogs are stateful (they hold HMS connections, HDFS FileSystems, UGI login state), which is the key difference that makes sharing riskier here.


Summary of concerns

Issue | Severity | Affected catalogs
ClassLoaderKey missing metastore.uris / fs.defaultFS | High | Iceberg-Hive, Hive, Paimon-Hive, fileset
FileSystem.closeAll() can disconnect live catalog (if keys are under-specified) | High (conditional on above) | Any catalog using HDFS
ThreadLocal cross-contamination between shared-ClassLoader catalogs | Medium | Iceberg, Hive (SessionState)
AWS SDK v2 static state not cleaned up (only v1 handled) | Low | Iceberg/Paimon with S3 backend

Suggestion

The OOM fix is valuable and I'd like to see it merged. One path forward:

  1. Extend ClassLoaderKey to include the backend URI dimensions that anchor per-ClassLoader static state. This makes sharing conservative (fewer catalogs will share) but correct.
  2. Or, limit sharing to the case where all config properties are identical (not just the 5 current dimensions). This is the safest interpretation of "same type" — if two catalogs are truly identical in configuration, they genuinely cannot diverge in static state.
  3. Add a test that creates two same-type catalogs pointing at different backends and verifies they get separate ClassLoader entries in the pool.

Happy to discuss the trade-offs — the refCount/cleanup mechanics look solid and the testConnection leak fix is a clear win regardless.

@LuciferYang
Contributor Author

@yuqi1129 Thanks for your valuable advice. I'm a bit busy today, but I'll take care of this tomorrow.

@LuciferYang
Contributor Author

The failure of the S3 integration test appears to be unrelated to the current PR.

@LuciferYang
Contributor Author

LuciferYang commented Mar 24, 2026

Thanks @yuqi1129 for the thorough review. Here is the status of each concern.

Concern 1: ClassLoaderKey missing backend-URI dimensions — Fixed

Extended the isolation key with metastore.uris, jdbc-url, and fs.defaultFS. Also added authorization-provider (determines which authorization plugin JARs are loaded).

The full set of built-in default isolation keys (catalog property names) is now:

Category | Property keys
Classpath | package (Catalog.PROPERTY_PACKAGE), authorization-provider
Kerberos identity | authentication.type, authentication.kerberos.principal, authentication.kerberos.keytab-uri
Backend URIs | metastore.uris, jdbc-url, fs.defaultFS

These defaults cannot be removed. Operators can add more via a new server config:

gravitino.catalog.classloader.isolation.extra-properties = custom.backend.endpoint

ClassLoaderKey stores isolation properties as a generic Map<String, String>, decoupled from specific property names — only CatalogManager.buildClassLoaderKey needs to know which properties matter. This makes the pool infrastructure key-agnostic and extensible without modifying pool classes.

Concern 2: FileSystem.closeAll() can disconnect live catalogs — Resolved by Concern 1 fix

With backend URIs now in the key, catalogs pointing at different HDFS clusters get separate ClassLoaders. doFinalCleanup only runs when refCount reaches 0, so FileSystem.closeAll() cannot affect live catalogs sharing the same ClassLoader.

Concern 3: ThreadLocal cross-contamination — Not fully resolved, inherent trade-off

This is a genuine limitation of ClassLoader sharing. With per-catalog ClassLoaders, ThreadLocal values are naturally isolated because each ClassLoader loads its own copy of the class. With a shared ClassLoader, catalogs on the same thread can see each other's ThreadLocal state.

The Concern 1 fix reduces the blast radius — catalogs sharing a ClassLoader now have identical backend configurations, so leaked ThreadLocal state is less likely to cause incorrect behavior (e.g., talking to the wrong metastore). But it does not eliminate the problem. If a library sets a ThreadLocal with per-catalog state (e.g., Hive SessionState, Hadoop SecurityContext), cross-contamination is still possible between catalogs sharing the same ClassLoader on the same thread.

What this PR does:

  • Reduces exposure by isolating catalogs with different backends into separate ClassLoaders (Concern 1 fix)
  • Provides extra-properties config as an operational escape hatch — if a specific ThreadLocal issue surfaces, operators can add the relevant property key to force separation without a code release

What this PR does not do:

  • It does not guarantee ThreadLocal isolation between catalogs sharing a ClassLoader. This is an inherent trade-off of sharing and cannot be fully solved at the key level.

Open question for the community: Is this trade-off acceptable given the Metaspace savings, or should we consider additional safeguards (e.g., a per-catalog opt-out property to force a dedicated ClassLoader)? Happy to discuss.

Concern 4: AWS SDK v2 static state — Not addressed in this PR

Low severity, and not related to the key design. It can be addressed as a follow-up once this one is merged.

Suggestion 3: Tests for different-backend isolation — Added

New tests:

  • testDifferentMetastoreUrisCreateDifferentEntries
  • testSameMetastoreUrisShareEntry
  • testDifferentJdbcUrlsCreateDifferentEntries
  • testDifferentDefaultFsCreateDifferentEntries
  • testKeyWithAuthorizationProvider

Total: 19 unit tests + 3 integration tests.

@yuqi1129
Contributor

Thanks @yuqi1129 for the thorough review. Here is the status of each concern.

Concern 1: ClassLoaderKey missing backend-URI dimensions — Fixed

Extended the isolation key with metastore.uris, jdbc-url, and fs.defaultFS. Also added authorization-provider (determines which authorization plugin JARs are loaded).

The full set of built-in default isolation keys (catalog property names) is now:

Category | Property keys
Classpath | package (Catalog.PROPERTY_PACKAGE), authorization-provider
Kerberos identity | authentication.type, authentication.kerberos.principal, authentication.kerberos.keytab-uri
Backend URIs | metastore.uris, jdbc-url, fs.defaultFS
These defaults cannot be removed. Operators can add more via a new server config:

gravitino.catalog.classloader.isolation.extra-properties = custom.backend.endpoint

ClassLoaderKey stores isolation properties as a generic Map<String, String>, decoupled from specific property names — only CatalogManager.buildClassLoaderKey needs to know which properties matter. This makes the pool infrastructure key-agnostic and extensible without modifying pool classes.

Concern 2: FileSystem.closeAll() can disconnect live catalogs — Resolved by Concern 1 fix

With backend URIs now in the key, catalogs pointing at different HDFS clusters get separate ClassLoaders. doFinalCleanup only runs when refCount reaches 0, so FileSystem.closeAll() cannot affect live catalogs sharing the same ClassLoader.

Concern 3: ThreadLocal cross-contamination — Not fully resolved, inherent trade-off

This is a genuine limitation of ClassLoader sharing. With per-catalog ClassLoaders, ThreadLocal values are naturally isolated because each ClassLoader loads its own copy of the class. With a shared ClassLoader, catalogs on the same thread can see each other's ThreadLocal state.

The Concern 1 fix reduces the blast radius — catalogs sharing a ClassLoader now have identical backend configurations, so leaked ThreadLocal state is less likely to cause incorrect behavior (e.g., talking to the wrong metastore). But it does not eliminate the problem. If a library sets a ThreadLocal with per-catalog state (e.g., Hive SessionState, Hadoop SecurityContext), cross-contamination is still possible between catalogs sharing the same ClassLoader on the same thread.

What this PR does:

  • Reduces exposure by isolating catalogs with different backends into separate ClassLoaders (Concern 1 fix)
  • Provides extra-properties config as an operational escape hatch — if a specific ThreadLocal issue surfaces, operators can add the relevant property key to force separation without a code release

What this PR does not do:

  • It does not guarantee ThreadLocal isolation between catalogs sharing a ClassLoader. This is an inherent trade-off of sharing and cannot be fully solved at the key level.

Open question for the community: Is this trade-off acceptable given the Metaspace savings, or should we consider additional safeguards (e.g., a per-catalog opt-out property to force a dedicated ClassLoader)? Happy to discuss.

Concern 4: AWS SDK v2 static state — Not addressed in this PR

Low severity, and not related to the key design. It can be addressed as a follow-up once this one is merged.

Suggestion 3: Tests for different-backend isolation — Added

New tests:

  • testDifferentMetastoreUrisCreateDifferentEntries
  • testSameMetastoreUrisShareEntry
  • testDifferentJdbcUrlsCreateDifferentEntries
  • testDifferentDefaultFsCreateDifferentEntries
  • testKeyWithAuthorizationProvider

Total: 19 unit tests + 3 integration tests.

I will take time to review it again. Thanks for your quick response.

@LuciferYang
Contributor Author

Thank you @yuqi1129



Development

Successfully merging this pull request may close these issues.

[Bug report] Create catalog concurrently encounter OutOfMemoryError: Metaspace (authorization disable, cache enable)
