[hive] Fix hive catalog lock may encounter deadlock. #6783

zhuangchong · 2025-12-09T12:56:37Z

Purpose

Linked issue: close #6782

There are many possible causes for “hive lock may encounter deadlock”, for example:
1. The first table task is still running, and subsequent tasks cannot acquire the Hive table lock, leading to a timeout.
2. Delays in acquiring the Hive metastore lock, which also cause timeouts.
And so on.

In my latest changes:

1. Added detailed logs to show whether a lock acquisition failure was caused by a timeout or by another lock state.

paimon/paimon-hive/paimon-hive-catalog/src/main/java/org/apache/paimon/hive/HiveCatalogLock.java

Lines 116 to 120 in 3342ef9

    
    String msg =
            String.format(
                    "for table %s.%s (lockId=%d) after %dms. Final lock state: %s",
                    database, table, lockId, duration, lockState);
    LOG.info("Acquire lock {}", msg);
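As a rough illustration of how the final lock state can separate a timeout from other failures, here is a minimal sketch. The class name, the describe helper, and the message prefixes are hypothetical and not taken from HiveCatalogLock; the nested State enum is only a stand-in for the metastore's LockState. It assumes that a lock still in WAITING when polling stops means the acquire timeout was reached.

```java
final class LockOutcomeMessageSketch {

    /** Stand-in for the metastore lock states; not the real Hive enum. */
    enum State { ACQUIRED, WAITING, ABORT, NOT_ACQUIRED }

    /** Builds a message whose prefix depends on the final lock state. */
    static String describe(
            String database, String table, long lockId, long durationMs, State finalState) {
        String detail =
                String.format(
                        "for table %s.%s (lockId=%d) after %dms. Final lock state: %s",
                        database, table, lockId, durationMs, finalState);
        switch (finalState) {
            case ACQUIRED:
                return "Acquired lock " + detail;
            case WAITING:
                // Assumption: still WAITING when polling stopped means a timeout.
                return "Timed out acquiring lock " + detail;
            default:
                // ABORT, NOT_ACQUIRED, or any other non-success state.
                return "Failed to acquire lock " + detail;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("db", "t", 7L, 1500L, State.WAITING));
        System.out.println(describe("db", "t", 8L, 20L, State.NOT_ACQUIRED));
    }
}
```

With the inputs in main, the first line printed is "Timed out acquiring lock for table db.t (lockId=7) after 1500ms. Final lock state: WAITING".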

2. Fixed an issue where, if lockResponse = clients.run(client -> client.checkLock(lockId)); threw an exception, the lock was not released, which prevented subsequent tasks from acquiring it.

paimon/paimon-hive/paimon-hive-catalog/src/main/java/org/apache/paimon/hive/HiveCatalogLock.java

Lines 95 to 112 in 3342ef9

    
    try {
        while (lockResponse.getState() == LockState.WAITING) {
            long elapsed = System.currentTimeMillis() - startMs;
            if (elapsed >= acquireTimeout) {
                break;
            }

            nextSleep = Math.min(nextSleep * 2, checkMaxSleep);
            Thread.sleep(nextSleep);

            lockResponse = clients.run(client -> client.checkLock(lockId));
        }
    } finally {
        if (lockResponse.getState() != LockState.ACQUIRED) {
            // unlock if not acquired
            unlock(lockId);
        }
    }
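To make the effect of the try/finally concrete, the following is a self-contained sketch of the same polling pattern with the metastore replaced by test doubles. Everything here is an assumption for illustration: the LockRetrySketch class, its nested State enum, the Supplier standing in for clients.run(client -> client.checkLock(lockId)), the 1 ms starting backoff, and the unlock method that only records the lock id. It is not the Paimon implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class LockRetrySketch {

    /** Stand-in for the metastore lock states; not the real Hive enum. */
    enum State { WAITING, ACQUIRED, NOT_ACQUIRED }

    /** Records released lock ids so callers can observe when cleanup ran. */
    static final List<Long> unlocked = new ArrayList<>();

    /** Hypothetical unlock: only remembers which lock id was released. */
    static void unlock(long lockId) {
        unlocked.add(lockId);
    }

    /**
     * Polls checkLock with exponential backoff until the state leaves WAITING
     * or the timeout elapses. The finally block releases the lock whenever it
     * was not acquired, including when checkLock throws.
     */
    static State acquire(
            long lockId, Supplier<State> checkLock, long acquireTimeoutMs, long checkMaxSleepMs) {
        long startMs = System.currentTimeMillis();
        long nextSleep = 1; // assumed starting backoff
        State state = State.WAITING;
        try {
            while (state == State.WAITING) {
                if (System.currentTimeMillis() - startMs >= acquireTimeoutMs) {
                    break; // timed out while still WAITING
                }
                nextSleep = Math.min(nextSleep * 2, checkMaxSleepMs);
                try {
                    Thread.sleep(nextSleep);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("Interrupted while waiting for lock", e);
                }
                state = checkLock.get(); // may throw, like a failing metastore call
            }
        } finally {
            if (state != State.ACQUIRED) {
                unlock(lockId); // release the pending lock request
            }
        }
        return state;
    }

    public static void main(String[] args) {
        // Success: nothing is released.
        System.out.println(acquire(1L, () -> State.ACQUIRED, 1000L, 4L) + " " + unlocked);

        // checkLock throws: the exception propagates, but the lock is released first.
        try {
            acquire(2L, () -> { throw new IllegalStateException("metastore error"); }, 1000L, 4L);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage() + " " + unlocked);
        }

        // Never granted: the loop stops at the timeout and the lock is released.
        System.out.println(acquire(3L, () -> State.WAITING, 20L, 4L) + " " + unlocked);
    }
}
```

Without the finally block, the second case would leave lock id 2 pending, which is the situation the description says blocks subsequent tasks.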

Tests

API and Format

Documentation

JingsongLi · 2025-12-10T03:12:21Z

paimon-hive/paimon-hive-catalog/src/main/java/org/apache/paimon/hive/HiveCatalogLock.java

-        long lockId = lock(database, table);
+        Long lockId = null;
        try {
+            lockId = lock(database, table);


Just for the runWithLock method:
I don't understand why this was modified. What is the difference?
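For readers comparing the two shapes in the diff above, here is a small sketch of both. It assumes the finally block in the second shape guards unlock with a null check; the full method is not shown in this thread, so that guard, the LockScopeSketch class, and the LongSupplier standing in for lock(database, table) are all hypothetical. Under that assumption the two shapes behave the same when lock throws: no lock id ever reaches the caller, so neither can release anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

final class LockScopeSketch {

    /** Records released lock ids so the two variants can be compared. */
    static final List<Long> released = new ArrayList<>();

    /** Hypothetical unlock: only remembers which lock id was released. */
    static void unlock(long lockId) {
        released.add(lockId);
    }

    /** Shape 1: acquire before the try block. */
    static void acquireOutsideTry(LongSupplier lock, Runnable action) {
        long lockId = lock.getAsLong(); // if this throws, the finally block never runs
        try {
            action.run();
        } finally {
            unlock(lockId);
        }
    }

    /** Shape 2: acquire inside the try block, with an assumed null guard. */
    static void acquireInsideTry(LongSupplier lock, Runnable action) {
        Long lockId = null;
        try {
            lockId = lock.getAsLong(); // if this throws, lockId stays null
            action.run();
        } finally {
            if (lockId != null) {
                unlock(lockId);
            }
        }
    }

    public static void main(String[] args) {
        LongSupplier failing = () -> { throw new IllegalStateException("lock failed"); };
        try {
            acquireOutsideTry(failing, () -> {});
        } catch (IllegalStateException e) {
            System.out.println("outside try, released: " + released);
        }
        try {
            acquireInsideTry(failing, () -> {});
        } catch (IllegalStateException e) {
            System.out.println("inside try, released: " + released);
        }
    }
}
```

If this reading is right, a lock request left pending by a failed acquisition can only be released inside the lock method itself, where the lock id is known.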

There are many possible causes for “hive lock may encounter deadlock”, for example:

1. The first table task is still running, and subsequent tasks cannot acquire the Hive table lock, leading to a timeout.
2. Delays in acquiring the Hive metastore lock, which also cause timeouts.

And so on.

In my latest changes:

1. Added detailed logs to show whether a lock acquisition failure was caused by a timeout or by another lock state.
2. Fixed an issue where, if lockResponse = clients.run(client -> client.checkLock(lockId)); threw an exception, the lock was not released, which prevented subsequent tasks from acquiring it.

JingsongLi · 2025-12-11T02:15:44Z

Can you update the Purpose in the description? I cannot find any information about the changes to the try lock method.

zhuangchong · 2025-12-11T04:28:02Z

Can you update the Purpose in the description? I cannot find any information about the changes to the try lock method.

done.

JingsongLi · 2025-12-11T10:09:33Z

Thanks @zhuangchong +1

zhuangchong added 2 commits December 9, 2025 20:51

Fix hive catalog lock may encounter deadlock.

6a91059

fix

ff43f07

JingsongLi reviewed Dec 10, 2025

View reviewed changes

fix

3342ef9

JingsongLi merged commit f3f7bd3 into apache:master Dec 11, 2025
22 checks passed

zhuangchong deleted the hive-catalog-lock branch December 11, 2025 11:13


