Hive: make sure to unlock table when the finally block is not executed #3606
smallx wants to merge 3 commits into apache:master
```java
 * Also, to avoid memory leaks caused by addShutdownHook, we use a class level hook.
 */
private static synchronized void initUnlockTableHook(ClientPool metaClients) {
  globalMetaClients = metaClients;
```
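The comment in this excerpt refers to registering one shutdown hook per class rather than one per lock. Every `Runtime.addShutdownHook` call keeps its `Thread` reachable until the JVM exits, so a hook per lock would pile up threads in a long-running streaming job. A minimal sketch of once-only registration follows; `UnlockHookRegistry` is a hypothetical name, and it uses an `AtomicBoolean` instead of the PR's `synchronized` method.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class, not from the PR: installs at most one shutdown hook per JVM.
final class UnlockHookRegistry {
  // Flips from false to true exactly once, even with concurrent callers.
  private static final AtomicBoolean HOOK_REGISTERED = new AtomicBoolean(false);
  // Counts real registrations so the once-only behavior can be observed.
  static final AtomicInteger REGISTRATIONS = new AtomicInteger();

  private UnlockHookRegistry() {}

  static void ensureHookRegistered(Runnable releaseAllLocks) {
    if (HOOK_REGISTERED.compareAndSet(false, true)) {
      // The first caller's Runnable is the only one ever installed.
      Runtime.getRuntime().addShutdownHook(new Thread(releaseAllLocks, "unlock-table-hook"));
      REGISTRATIONS.incrementAndGet();
    }
  }
}
```

Because only the first `Runnable` is installed, it has to read from a shared registry of held locks instead of capturing per-lock state; that shared state is where a single global client becomes a problem once several catalogs are involved.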
What happens if you have multiple catalogs configured with different URIs?
Having a global variable that stores the clients could be a problem.
+1. This is a use case that I know exists at some companies (multiple catalogs configured with different URIs).
Thanks. I've fixed it.
```java
LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
long lockId = lockResponse.getLockid();
tableLocksById.put(lockId, fullName);
```
Maybe store the triplet, so we can unlock it:
- lockId
- fullName
- metaClients
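A sketch of the suggested triplet, assuming a hypothetical `LockClient` interface in place of Iceberg's `ClientPool` (whose real API differs): each held lock records the client that acquired it, so one shutdown hook can release locks from several catalogs without a global client variable. Entries are tracked by handle rather than by `lockId` alone, since lock IDs are issued by each metastore and two catalogs could in principle hand out the same ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-catalog metastore client; NOT Iceberg's ClientPool API.
interface LockClient {
  void unlock(long lockId) throws Exception;
}

// The triplet suggested in review: lock id, table name, and the client that took the lock.
final class HeldLock {
  final long lockId;
  final String fullName;
  final LockClient client;

  HeldLock(long lockId, String fullName, LockClient client) {
    this.lockId = lockId;
    this.fullName = fullName;
    this.client = client;
  }
}

final class HeldLocks {
  // Identity-based set: HeldLock does not override equals, so equal lock ids
  // coming from different catalogs stay separate entries.
  private static final Set<HeldLock> LOCKS = ConcurrentHashMap.newKeySet();

  private HeldLocks() {}

  static HeldLock track(long lockId, String fullName, LockClient client) {
    HeldLock lock = new HeldLock(lockId, fullName, client);
    LOCKS.add(lock);
    return lock;
  }

  static void untrack(HeldLock lock) {
    LOCKS.remove(lock);
  }

  // Intended to run from the shutdown hook: release each lock through its own client.
  static List<String> releaseAll() {
    List<String> failed = new ArrayList<>();
    for (HeldLock lock : LOCKS) {
      try {
        lock.client.unlock(lock.lockId);
      } catch (Exception e) {
        failed.add(lock.fullName); // keep going so one bad catalog does not block the rest
      } finally {
        LOCKS.remove(lock); // never leave an entry behind, even on failure
      }
    }
    return failed;
  }

  static int size() {
    return LOCKS.size();
  }
}
```

Removing entries while iterating is safe here because `ConcurrentHashMap` key-set iterators are weakly consistent rather than fail-fast.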
```java
if (lockId.isPresent()) {
  try {
    doUnlock(lockId.get());
    tableLocksById.remove(lockId.get());
```
If unlocking fails, the entry is never removed from `tableLocksById`, which could cause a memory leak.
Thanks. I've fixed it.
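One way to close that leak, sketched with a hypothetical `TableLockTracker` and an `Unlocker` interface standing in for the PR's `doUnlock`, is to move the `remove` into a `finally` block so the map entry is dropped whether or not the metastore call succeeds:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class TableLockTracker {
  // Mirrors tableLocksById from the PR: lock id -> table name.
  final Map<Long, String> tableLocksById = new ConcurrentHashMap<>();

  // Hypothetical unlock action; in the PR this is doUnlock(lockId) against the metastore.
  interface Unlocker {
    void doUnlock(long lockId) throws Exception;
  }

  void unlock(Optional<Long> lockId, Unlocker unlocker) {
    if (lockId.isPresent()) {
      try {
        unlocker.doUnlock(lockId.get());
      } catch (Exception e) {
        // Report and continue; the metastore-side lock may still exist.
        System.err.println("Failed to unlock " + lockId.get() + ": " + e.getMessage());
      } finally {
        // Always drop the local entry so a failed unlock cannot leak map entries.
        tableLocksById.remove(lockId.get());
      }
    }
  }
}
```

The trade-off is that a lock whose unlock failed is no longer tracked locally; as discussed in this thread, cleaning it up on the metastore side then relies on the Hive housekeeper.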
I am still wary of this change, as the issue seems quite rare and could be fixed by running the Hive housekeeper thread, which removes stale locks that no longer receive heartbeats. On the other hand, this change introduces additional complexity and possible leaks and other problems. Could you explain the use case that causes this issue, and help me understand why the other solutions are not sufficient? Thanks,
baseline.gradle (outdated)
```groovy
'-Xep:EqualsGetClass:OFF',
// patterns that are allowed
'-Xep:MissingCasesInEnumSwitch:OFF',
'-Xep:ShutdownHook:OFF',
```
Instead of adding this, is it possible to add an annotation suppressing the error in just that one part of the code?
If not, then that's ok.
Thanks. I've fixed it.
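Scoping the suppression to a single method, as the reviewer asks, might look like the sketch below. It assumes the `ShutdownHook` check configured in `baseline.gradle` honors the standard `@SuppressWarnings("CheckName")` mechanism that Error Prone checks generally support; `ScopedSuppression` and `registerUnlockHook` are hypothetical names.

```java
// Hypothetical holder class; only the annotated method is exempt from the check.
final class ScopedSuppression {
  private ScopedSuppression() {}

  // Suppresses the "ShutdownHook" check for this method only,
  // instead of turning it OFF for the whole build in baseline.gradle.
  @SuppressWarnings("ShutdownHook")
  static Thread registerUnlockHook(Runnable releaseAllLocks) {
    Thread hook = new Thread(releaseAllLocks, "unlock-table-hook");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook; // returned so callers (or tests) can deregister it
  }
}
```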
Our HMS does not enable transactions (txn) or the housekeeper service, which is Hive's default behavior. When our streaming task hits a fatal error, it exits from another thread, so there is a small probability that the lock is not released. At the same time, other users may also use
May I know why the housekeeper threads are not enabled? This seems brittle to me, and forcing every user to clean up everything every time seems counterproductive. Thanks, Peter
Thanks @pvary. We haven't encountered similar lock problems before, and we will also try the housekeeper service. If we can proactively release the locks when the program terminates abnormally, it may be better for users who do not run the housekeeper threads. When the program exits normally,
Closing this because it is not a general problem and can be solved by starting the Hive housekeeper services. Thanks, everyone.
Avoid the case where the `finally` block used to unlock the Hive table is not executed, due to the following conditions:

- `System.exit(N)`

The old PR is #3000.