-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hive: Fix concurrent transactions overwriting commits by adding hive lock heartbeats. #5036
Conversation
@szehon-ho and @RussellSpitzer, you may be interested in this one, too. |
@SinghAsDev: Quick question: Did you face a specific issue around this, or we are just fixing a long outstanding I am asking, because I have a feeling that the need for heartbeat is more like a theoretical issue than a concrete one. The code is like this:
The I was trying to find situations where we need heartbeat, and I was able to come up with these:
For the JVM pause situations the problem is still unlikely to happen (even without the heartbeat), and the heartbeating thread is also suspended in this case, so there is no guarantees that it will help. Since Hive 4 is around the release, we might try to add a All that said: using the default configuration on the HMS, the timeout is set to 300s. If we can not finish the commit in 5 mins, then we already are in serious trouble. |
Hi @pvary we did hit the issue that led to data corruption. I think a lot of platforms are still going to remain on older version of HMS, and may not want to increase the txn timeout. Tbh, we should have rolled back the transaction or at least fail the job when we failed to unlock the hive lock from the beginning. Many users may be hitting this issue without even realizing. Btw, there can be many more reasons for the transactions to take way longer than txn timeout. For instance, HMS does a fast scan for alter table operations even when stats collection is turned off. This fast stats lists all files from S3, and can take hours. Another example, would be a HMS pre commit hook taking long for whatever reason. I would argue that we can not safely make the assumption that transactions would finish within txn timeout. |
That's seriously bad and unnecessary. This would cause a seriously long commit times. |
We are on hive 1.2. But I think the issue still exists in later versions. I agree this is more of a bug. We had fixed this for our hms deployment. Btw, forgot to mention this issue is limited to unpartitioned tables. All iceberg tables are stored as unpartitioned tables on HMS, so iceberg tables are affected by this. |
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
After fixing this issue, you still see long |
Yea, though likely it won't be as frequent anymore. I think the point I am trying to make is that we can not guarantee that Iceberg commit would be able finish within hive.txn.timeout. You and I both listed a few reasons on why that can happen. I think heartbeats on locks is a must have, or we should fail the commit/ rollback the commit if unlock fails with lock not found. I prefer the former as it would really be a waste of compute to simply fail or rollback. |
return null; | ||
}); | ||
} catch (TException | InterruptedException e) { | ||
LOG.error("Fail to heartbeat for lock: {}", lockId, e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this, but I am thinking about the situation where we find that the heartbeat is failed.
Shall we make sure that the persistTable
is not called, and return a CommitFailedException
?
Similarly - as you have already mentioned - we might want to throw a CommitStateUnknownException
if the unlock has failed. This is a little bit different, because the metastore_location might be set as the current transaction expects, but some concurrent transactions change might be lost in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I also think we should fail the commit on heart beat failure. However, to do so we would have to add retry on heart beats as well, so that transient failures don't fail the entire job. I will add that.
For the unlock failure handling, I think that needs a bit more discussion and if it is OK with you let's do that discussion in a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RetryingMetastoreClient does the retry for us. No need to do the retries by hand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yea, forgot about the niceties of RetryingMetastoreClient
. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @pvary , I was thinking about this more and I am wondering if throwing CommitFailedException would really be of any benefit. Within the acquired hive lock, the only expensive operation we do is alter table operation, which is a blocking call. Once issued, we won't be able to abort it, even in case heartbeat fails. Please correct me if I am missing something here. I think it would be make more sense to handle the lock not found during unlock operation and try to revert the commit or at least throw CommitStateUnknownException
. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SinghAsDev: We can do some elaborate stuff to throw CommitFailedException
if we are before the alter
operation, and we can throw CommitStateUnknownException
if we called the alter
command already, but I hope this would be a rare occurrence, so it might not worth the extra complexity. The question is how we will be able to communicate back to the original thread that the heartbeat is failed. Based on this the decision above could be an easy one, or a more complex one, and then we can decide which one to chose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pvary I agree that this added complexity does not really buy us much. All the lengthy operations I have seen that needed heartbeats are within the alter table operation, so very unlikely we will see heartbeat failure before alter operation. After alter table is done, any issue with lock, we will anyway see in the unlock operation. As such, I still think we likely don't need the added complexity. If we agree on this reasoning, I can add this info in the code as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. So we would likely to need a test where the failing heartbeat will fail the commit with the CommitStateUnknownException
. So It is fine if we have CommitStateUnknownException
for all of these cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test. Can you help take a look @pvary @szehon-ho , thanks!
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
Outdated
Show resolved
Hide resolved
You have convinced me here, I just trying to collect the things that we have to fix in Hive to make this work as fast as possible 😄 |
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommitLocks.java
Show resolved
Hide resolved
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommitLocks.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
@@ -43,8 +43,12 @@ public abstract class HiveMetastoreTest { | |||
@BeforeClass | |||
public static void startMetastore() throws Exception { | |||
HiveMetastoreTest.metastore = new TestHiveMetastore(); | |||
metastore.start(); | |||
HiveMetastoreTest.hiveConf = metastore.hiveConf(); | |||
if (HiveMetastoreTest.hiveConf != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand, how would this happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is happening before class is actually instantiated HiveMetastoreTest.hiveConf
can be null. I actually saw this in the unit test I added as part of this PR.
I think that would be awesome. |
The 0.14.0 release is coming soon. I'll try to include this if it is done in time. |
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Show resolved
Hide resolved
hive-metastore/src/test/java/org/apache/iceberg/hive/HiveMetastoreTest.java
Show resolved
Hide resolved
hive-metastore/src/test/java/org/apache/iceberg/hive/HiveMetastoreTest.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommitLocks.java
Outdated
Show resolved
Hide resolved
@SinghAsDev: Could you please check why the test is failing? Also left some minor comments. Thanks, |
07895a4
to
f6356ef
Compare
@pvary mind looking again, thanks! |
hive-metastore/src/test/java/org/apache/iceberg/hive/HiveMetastoreTest.java
Outdated
Show resolved
Hide resolved
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
@pvary @szehon-ho @rdblue I think this is ready now? The one test failure is not related to this diff. I will force push to re-trigger the checks. |
ab7c26c
to
1360677
Compare
Thanks @SinghAsDev for the PR and @szehon-ho for the review! |
When using Hive Metastore catalog, it is possible that longer transactions can take longer than transaction timeout of the Hive Metastore (hive.txn.timeout or metastore.txn.timeout in the newer versions). In those cases, concurrent transactions can overwrite each other leading to data corruption. This change fixes concurrent transactions overwriting commits by adding hive lock heartbeats.