Hive: make sure to unlock table when the finally block is not executed #3606

Closed
smallx wants to merge 3 commits into apache:master from smallx:hive-lock-fix

Conversation

@smallx (Contributor) commented Nov 25, 2021

Ensure that the Hive table lock is released even when the finally block that normally unlocks it is not executed, which can happen when:

  • System.exit(N) is called
  • all non-daemon threads exit

The old PR is #3000.
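
To illustrate the failure mode, here is a minimal, self-contained Java demo (not from the PR; FinallySkippedDemo is a hypothetical name): a second thread calls System.exit while the first is inside the try block, and the JVM halts without running finally.

public class FinallySkippedDemo {
  public static void main(String[] args) throws InterruptedException {
    try {
      System.out.println("lock acquired");
      // e.g. a fatal-error handler running in another thread
      new Thread(() -> System.exit(1)).start();
      Thread.sleep(1000); // still "holding the lock" when exit fires
    } finally {
      System.out.println("unlock"); // never printed: exit skips finally blocks
    }
  }
}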

@smallx (Contributor, Author) commented Nov 25, 2021

@kbendick @pvary Can you help me run the GitHub workflows? Thanks.

 * Also, to avoid memory leaks caused by addShutdownHook, we use a class level hook.
 */
private static synchronized void initUnlockTableHook(ClientPool metaClients) {
  globalMetaClients = metaClients;
Contributor (review comment):

What happens if you have multiple catalogs configured with different URIs? Having a global variable storing the clients could be a problem.

Contributor (review comment):

+1. This is a use case that I know exists at some companies (multiple catalogs configured with different URIs).

Contributor (Author) reply:

Thanks. I've fixed it.

LockResponse lockResponse = metaClients.run(client -> client.lock(lockRequest));
AtomicReference<LockState> state = new AtomicReference<>(lockResponse.getState());
long lockId = lockResponse.getLockid();
tableLocksById.put(lockId, fullName);
@pvary (Contributor) commented Nov 30, 2021:

Maybe store the triplet, so we can unlock it (see the sketch after this list):

  • lockId
  • fullName
  • metaClients
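
A minimal sketch of what storing that triplet could look like, assuming hypothetical names (HeldLock, LockRegistry, TABLE_LOCKS_BY_ID) and a simplified stand-in for Iceberg's ClientPool; this illustrates the suggestion rather than the PR's actual code.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Simplified stand-in for Iceberg's client pool abstraction (assumption).
interface ClientPool<C> {
  <R> R run(Function<C, R> action);
}

// Everything needed to release a lock later, even from a shutdown hook.
final class HeldLock<C> {
  final long lockId;               // HMS lock id to release
  final String fullName;           // fully qualified table name, for logging
  final ClientPool<C> metaClients; // the pool that issued the lock (per catalog)

  HeldLock(long lockId, String fullName, ClientPool<C> metaClients) {
    this.lockId = lockId;
    this.fullName = fullName;
    this.metaClients = metaClients;
  }
}

final class LockRegistry {
  // One entry per outstanding lock, keyed by lock id.
  static final Map<Long, HeldLock<?>> TABLE_LOCKS_BY_ID = new ConcurrentHashMap<>();
}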

if (lockId.isPresent()) {
  try {
    doUnlock(lockId.get());
    tableLocksById.remove(lockId.get());
Contributor (review comment):

If there is a problem with the unlock, this could cause a memory leak: when doUnlock throws, the remove call is skipped and the entry stays in the map.

Contributor (Author) reply:

Thanks. I've fixed it.
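
One way such a leak can be avoided is to move the map cleanup into a finally block so it runs whether or not the unlock succeeds; this is a sketch based on the excerpt above, not necessarily the PR's exact fix.

if (lockId.isPresent()) {
  try {
    doUnlock(lockId.get());
  } finally {
    // Always drop the bookkeeping entry so a failed unlock cannot
    // keep the map growing; the HMS lock reaper is the backstop.
    tableLocksById.remove(lockId.get());
  }
}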

@pvary (Contributor) commented Nov 30, 2021

I am still wary of this change: the issue seems quite rare and could be fixed by running the Hive housekeeper thread, which removes old locks without a heartbeat. On the other hand, this introduces additional complexity and possible leaks and problems.

Could you explain the use case that causes this issue, and help me understand why the other solutions are not sufficient?

Thanks,
Peter

baseline.gradle Outdated
'-Xep:EqualsGetClass:OFF',
// patterns that are allowed
'-Xep:MissingCasesInEnumSwitch:OFF',
'-Xep:ShutdownHook:OFF',
Contributor (review comment):

Instead of adding this, is it possible to add an annotation suppressing the error in just that one part of the code?

If not, then that's ok.

Contributor (Author) reply:

Thanks. I've fixed it.
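
For illustration, a local suppression might look like the following. Error Prone checks (which baseline.gradle's -Xep flags configure) honor @SuppressWarnings with the check name; the placement on initUnlockTableHook is an assumption, not the PR's final code.

// Suppress the ShutdownHook check only here, instead of disabling it
// project-wide in baseline.gradle.
@SuppressWarnings("ShutdownHook") // intentional: the hook releases leaked HMS locks
private static synchronized void initUnlockTableHook(ClientPool metaClients) {
  // hook registration as in the diff above
}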

@smallx (Contributor, Author) commented Dec 5, 2021

Thanks @pvary @kbendick

Our HMS does not enable the txn and housekeeper services, which is Hive's default behavior. When our streaming task hits a fatal error, it exits from another thread, which with small probability can leave the lock unreleased.

At the same time, other users may also call System.exit(N) to exit a Spark application, although this is not recommended.

@pvary (Contributor) commented Dec 6, 2021

Our HMS does not enable the txn and housekeeper services, which is Hive's default behavior.

May I know why the housekeeper threads are not enabled? This seems brittle to me, and forcing every user to clean up everything, every time, seems counterproductive.

Thanks, Peter

@smallx (Contributor, Author) commented Dec 6, 2021

Thanks @pvary

We haven't encountered similar lock problems before, and we will also try the housekeeper service.

If we can proactively release the locks when the program exits abnormally, that may be better for users who do not start the housekeeper threads.

When the program exits normally, tableLocksById should be empty, so the shutdown hook does nothing; the lock-cleaning action runs only when the program exits abnormally (see the sketch below). If needed, I can also add a property to control whether the shutdown hook is added.
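
A self-contained sketch of that behavior, with hypothetical names (UnlockOnExit, PENDING_UNLOCKS) and a Runnable per lock standing in for the real unlock call through the catalog's client pool.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class UnlockOnExit {
  // lockId -> action that releases that lock against the right metastore
  private static final Map<Long, Runnable> PENDING_UNLOCKS = new ConcurrentHashMap<>();

  static {
    // Registered once per class load, so repeated catalog creation cannot
    // accumulate hooks (the addShutdownHook memory-leak concern above).
    Runtime.getRuntime().addShutdownHook(
        new Thread(UnlockOnExit::unlockAll, "hive-unlock-hook"));
  }

  static void register(long lockId, Runnable unlockAction) {
    PENDING_UNLOCKS.put(lockId, unlockAction);
  }

  static void unregister(long lockId) {
    PENDING_UNLOCKS.remove(lockId);
  }

  private static void unlockAll() {
    // Empty on a normal exit; only an abnormal exit leaves entries behind.
    PENDING_UNLOCKS.forEach((id, unlock) -> {
      try {
        unlock.run(); // best effort while the JVM shuts down
      } catch (RuntimeException e) {
        // nothing more to do; the HMS lock reaper is the ultimate backstop
      }
    });
  }
}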

@smallx (Contributor, Author) commented Jan 13, 2022

Closing this because it is not a general problem and can be solved by starting the Hive housekeeper services. Thanks, everyone.

@smallx closed this on Jan 13, 2022