[fix](catalog) avoid catalog/cache lock inversion in resetToUninitialized#61103

Draft
suxiaogang223 wants to merge 2 commits into apache:branch-3.1 from suxiaogang223:codex/fix-catalog-cache-deadlock-3.1

@suxiaogang223 suxiaogang223 commented Mar 6, 2026

What problem does this PR solve?

This fixes a potential Java deadlock caused by a lock-order inversion between the ExternalCatalog object monitor (taken by its synchronized methods) and the internal per-key locks of Caffeine/ConcurrentHashMap during cache invalidation and loading.

The deadlock can occur when one thread holds the catalog monitor and invalidates the cache while another thread holds a cache-internal lock and calls back into catalog initialization. Each thread then waits for the lock the other one holds.
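For reviewers, the two conflicting lock orders can be sketched roughly as below. This is an illustration only, not the Doris code: `InversionSketch`, its fields, and the method bodies are hypothetical, and a plain `ConcurrentHashMap` stands in for the real cache. Calling the two paths one after another on a single thread is safe; the hazard only arises when they run concurrently on the same key.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an external catalog; not the Doris class.
class InversionSketch {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private boolean initialized = false;

    // Path A: takes the catalog monitor first, then the cache's internal lock.
    synchronized void resetAndInvalidate(String key) {
        initialized = false;
        // remove() may block while another thread is computing this key.
        cache.remove(key);
    }

    // Path B: takes the cache's internal lock first (inside computeIfAbsent),
    // then the catalog monitor through the loader callback.
    String load(String key) {
        return cache.computeIfAbsent(key, k -> {
            // Blocks if Path A currently holds the catalog monitor.
            makeSureInitialized();
            return "meta-of-" + k;
        });
    }

    synchronized void makeSureInitialized() {
        initialized = true;
    }

    synchronized boolean isInitialized() {
        return initialized;
    }
}
```

If thread 1 is inside `resetAndInvalidate` waiting on the cache while thread 2 is inside the `load` callback waiting on the catalog monitor, neither can make progress.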

What is changed?

  1. In ExternalCatalog, split the reset flow into a lock-protected state reset (resetToUninitializedInLock) and a cache refresh that runs outside the catalog monitor.
  2. In JdbcExternalCatalog, keep the identifier-mapping reset under the lock, but call onRefreshCache outside the lock.
  3. Add an FE unit test, ExternalCatalogResetToUninitializedTest, that asserts onRefreshCache runs without holding the catalog monitor (for both the base external catalog and the JDBC path).
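The shape of the change can be sketched as follows. The method names `resetToUninitializedInLock` and `onRefreshCache` come from this PR, but the class `SplitResetSketch`, the fields, and the method bodies are assumptions made for illustration and do not reproduce the actual diff. The `Thread.holdsLock` check shows one possible way a test could assert that the refresh runs outside the monitor.

```java
// Hypothetical sketch of the split reset flow; not the actual ExternalCatalog code.
class SplitResetSketch {
    private boolean initialized = true;
    // Recorded by onRefreshCache so a test can inspect it (illustrative field).
    boolean refreshRanWithoutMonitor = false;

    // Only the state change runs under the catalog monitor.
    private synchronized void resetToUninitializedInLock() {
        initialized = false;
    }

    // Deliberately not synchronized: the monitor is released before the
    // cache refresh, so the cache's internal locks are never taken under it.
    void resetToUninitialized() {
        resetToUninitializedInLock();
        onRefreshCache();
    }

    // Stand-in for cache invalidation; records whether the caller holds the monitor.
    void onRefreshCache() {
        refreshRanWithoutMonitor = !Thread.holdsLock(this);
    }

    synchronized boolean isInitialized() {
        return initialized;
    }
}
```

With this ordering a thread never holds the catalog monitor while waiting on a cache-internal lock, which removes one side of the inversion.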

Testing

  • Tests were not rerun to completion after the branch switch.

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?
