Skip to content

[#10844] fix(hive): close HiveClientFactory on pool shutdown#10854

Merged
diqiu50 merged 3 commits intoapache:mainfrom
yuqi1129:issue_10844
Apr 24, 2026
Merged

[#10844] fix(hive): close HiveClientFactory on pool shutdown#10854
diqiu50 merged 3 commits intoapache:mainfrom
yuqi1129:issue_10844

Conversation

@yuqi1129
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This PR makes Hive client classloader cleanup deterministic when Hive client pools are closed.

  • Override HiveClientPool#close() to call super.close() and then always close HiveClientFactory in finally.
  • Add TestHiveClientPool assertion to verify factory-close path is invoked.
  • Add dev/ci/hive_catalog_oom_repro.sh stress script to repeatedly create schema/table, load table, and drop catalog for long-run metaspace/OOM reproduction.

Why are the changes needed?

HiveClientFactory owns backendClassLoader (HiveClientClassLoader) and only closes it in HiveClientFactory#close().
Previously, HiveClientPool only closed pooled HiveClient instances, so the factory close path could be skipped, making classloader cleanup depend on GC timing and causing classloader accumulation under long-running churn.

Fix: #10844

Does this PR introduce any user-facing change?

No API changes.
A new internal CI/dev stress script is added for reproduction and verification.

How was this patch tested?

  • ./gradlew :catalogs:hive-metastore-common:test --tests org.apache.gravitino.hive.TestHiveClientPool --no-daemon
  • bash -n dev/ci/hive_catalog_oom_repro.sh

Ensure HiveClientPool closes its HiveClientFactory after pool clients are closed, so backend HiveClientClassLoader gets explicitly closed instead of relying on GC timing.

Also add a regression assertion in TestHiveClientPool and a stress script (dev/ci/hive_catalog_oom_repro.sh) that repeatedly creates/loads/drops Hive catalogs for long-run OOM reproduction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 23, 2026 08:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes Hive client classloader cleanup deterministic by ensuring HiveClientFactory is always closed when a HiveClientPool is shut down, and adds tooling to help reproduce long-running Hive catalog churn/metaspace issues.

Changes:

  • Override HiveClientPool#close() to call super.close() and then close the underlying HiveClientFactory in a finally block.
  • Extend TestHiveClientPool to assert the factory-close path is invoked.
  • Add a dev/ci/hive_catalog_oom_repro.sh stress script that repeatedly creates/drops Hive catalogs/schemas/tables.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
dev/ci/hive_catalog_oom_repro.sh Adds a churn/stress repro script for long-run metaspace/OOM investigation.
catalogs/hive-metastore-common/src/test/java/org/apache/gravitino/hive/TestHiveClientPool.java Adds verification that the pool’s factory-close path is executed on close.
catalogs/hive-metastore-common/src/main/java/org/apache/gravitino/hive/HiveClientPool.java Ensures HiveClientFactory (and its classloader) is closed deterministically when the pool closes.

Comment thread dev/ci/hive_catalog_oom_repro.sh Outdated
Comment thread dev/ci/hive_catalog_oom_repro.sh Outdated
@github-actions
Copy link
Copy Markdown

Code Coverage Report

Overall Project 65.27% 🟢
Files changed 71.43% 🟢

Module Coverage
aliyun 1.73% 🔴
api 47.27% 🟢
authorization-common 85.96% 🟢
aws 1.1% 🔴
azure 2.6% 🔴
catalog-common 10.2% 🔴
catalog-fileset 80.02% 🟢
catalog-glue 75.36% 🟢
catalog-hive 81.83% 🟢
catalog-jdbc-clickhouse 79.06% 🟢
catalog-jdbc-common 43.93% 🟢
catalog-jdbc-doris 80.28% 🟢
catalog-jdbc-hologres 54.03% 🟢
catalog-jdbc-mysql 79.23% 🟢
catalog-jdbc-oceanbase 78.38% 🟢
catalog-jdbc-postgresql 82.05% 🟢
catalog-jdbc-starrocks 78.27% 🟢
catalog-kafka 77.01% 🟢
catalog-lakehouse-generic 45.07% 🟢
catalog-lakehouse-hudi 79.1% 🟢
catalog-lakehouse-iceberg 87.16% 🟢
catalog-lakehouse-paimon 77.71% 🟢
catalog-model 77.72% 🟢
cli 44.51% 🟢
client-java 77.63% 🟢
common 48.67% 🟢
core 81.47% 🟢
filesystem-hadoop3 76.97% 🟢
flink 40.55% 🟢
flink-runtime 0.0% 🔴
gcp 14.2% 🔴
hadoop-common 10.39% 🔴
hive-metastore-common 46.83% +0.31% 🟢
iceberg-common 55.2% 🟢
iceberg-rest-server 67.25% 🟢
integration-test-common 0.0% 🔴
jobs 66.17% 🟢
lance-common 23.88% 🔴
lance-rest-server 57.84% 🟢
lineage 53.02% 🟢
optimizer 82.95% 🟢
optimizer-api 21.95% 🔴
server 85.62% 🟢
server-common 69.76% 🟢
spark 32.79% 🔴
spark-common 39.09% 🔴
trino-connector 34.27% 🔴
Files
Module File Coverage
hive-metastore-common HiveClientPool.java 71.43% 🟢

@yuqi1129 yuqi1129 self-assigned this Apr 23, 2026
@yuqi1129 yuqi1129 requested a review from diqiu50 April 23, 2026 11:43
Copy link
Copy Markdown
Contributor

@diqiu50 diqiu50 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@diqiu50 diqiu50 merged commit a22daa0 into apache:main Apr 24, 2026
31 of 34 checks passed
laserninja pushed a commit to laserninja/gravitino that referenced this pull request Apr 24, 2026
…pache#10854)

### What changes were proposed in this pull request?
This PR makes Hive client classloader cleanup deterministic when Hive
client pools are closed.

- Override `HiveClientPool#close()` to call `super.close()` and then
always close `HiveClientFactory` in `finally`.
- Add `TestHiveClientPool` assertion to verify factory-close path is
invoked.
- Add `dev/ci/hive_catalog_oom_repro.sh` stress script to repeatedly
create schema/table, load table, and drop catalog for long-run
metaspace/OOM reproduction.

### Why are the changes needed?
`HiveClientFactory` owns `backendClassLoader` (`HiveClientClassLoader`)
and only closes it in `HiveClientFactory#close()`.
Previously, `HiveClientPool` only closed pooled `HiveClient` instances,
so the factory close path could be skipped, making classloader cleanup
depend on GC timing and causing classloader accumulation under
long-running churn.

Fix: apache#10844

### Does this PR introduce _any_ user-facing change?
No API changes.
A new internal CI/dev stress script is added for reproduction and
verification.

### How was this patch tested?
- `./gradlew :catalogs:hive-metastore-common:test --tests
org.apache.gravitino.hive.TestHiveClientPool --no-daemon`
- `bash -n dev/ci/hive_catalog_oom_repro.sh`

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yuqi1129 added a commit that referenced this pull request Apr 29, 2026
…ry on pool shutdown (#10854) (#10891)

**Cherry-pick Information:**
- Original commit: a22daa0
- Target branch: `branch-1.2`
- Status: ✅ Clean cherry-pick (no conflicts)

Co-authored-by: Qi Yu <yuqi@datastrato.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug report] Gravitino-server Metaspace OOM after long-running

3 participants