make load hive table more robust #1687
Conversation
protected void refreshFromMetadataLocation(String newLocation, Predicate<Exception> shouldRetry,
    int numRetries) {
protected boolean refreshFromMetadataLocation(
Iceberg style is to put the new line only before the arguments that don't fit within the line width. So

foo(x, y,
    z)

not

foo(x,
    y,
    z)
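Applied to Java, the convention above would look roughly like the sketch below; the method and class names are made up purely for illustration, not taken from the Iceberg codebase.

```java
// Illustrative only: Iceberg-style wrapping fills the line first and breaks
// only before the argument that would exceed the line width.
public class WrapStyleExample {

    static int sum(int first, int second,
            int third) { // only the overflowing argument wraps
        return first + second + third;
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 2, 3)); // prints 6
    }
}
```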
ok
I have some general concerns before looking more closely. One thing this PR would do is double the time to failure on misconfiguration, which is already quite long. I'm also a little confused about the use case for this PR: are the two locations both equally valid? It seems like we wouldn't normally want to fall back to a different location.
hi~ @RussellSpitzer thanks for your concerns. About the use case for this PR: iceberg/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java, lines 201 to 204 in 4ba48be.
I want to make an improvement that tries the previous_metadata_location only if getting the metadata_location throws NotFoundException, and if the metadata_location is not found, I will set the previous_metadata_location value to it in case both metadata locations are not found.
2. The improvement of trying the previous_location takes quite a long time.
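The fallback idea described in this comment can be sketched as follows. This is a hedged, self-contained simulation, not the real `HiveTableOperations` code: the `Map` stands in for object storage / `FileIO`, and all names are illustrative. Note that the later review comments argue this kind of automatic fallback is unsafe in the read path.

```java
import java.io.FileNotFoundException;
import java.util.Map;

public class FallbackRefreshSketch {

    // Stand-in for FileIO/metastore reads: the map plays the role of object
    // storage; a missing key simulates a missing metadata file.
    static String readMetadata(Map<String, String> store, String location)
            throws FileNotFoundException {
        String content = store.get(location);
        if (content == null) {
            throw new FileNotFoundException(location);
        }
        return content;
    }

    /**
     * Try the current metadata location first; fall back to the previous one
     * only when the current file is missing. If both are missing, the
     * not-found failure still surfaces to the caller.
     */
    static String refreshWithFallback(Map<String, String> store,
            String metadataLocation, String previousMetadataLocation)
            throws FileNotFoundException {
        try {
            return readMetadata(store, metadataLocation);
        } catch (FileNotFoundException e) {
            if (previousMetadataLocation == null) {
                throw e; // nothing to fall back to
            }
            return readMetadata(store, previousMetadataLocation);
        }
    }

    public static void main(String[] args) throws Exception {
        // Current file "v3" is missing, so the previous file "v2" is used.
        Map<String, String> store = Map.of("s3://bucket/metadata/v2.json", "metadata-v2");
        String loaded = refreshWithFallback(store,
                "s3://bucket/metadata/v3.json", "s3://bucket/metadata/v2.json");
        System.out.println(loaded); // prints metadata-v2
    }
}
```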
if (refreshFromMetadataLocation(metadataLocation, previousMetadataLocation) && table != null) {
  Map<String, String> parameters = table.getParameters();
  parameters.put(METADATA_LOCATION_PROP, previousMetadataLocation);
  persistTable(table, true);
This introduces a write operation in the Iceberg read path of a table. I feel that it is not a good idea for readers to modify table metadata in the metastore. Would it be okay to not try to fix the table during reads? The readers can still probably use the previousMetadataLocation for loads. To solve the issue you see in #1688, would it be better if the writer itself fixes the issue?
Committing this change introduces a correctness problem. I don't think this should be committed. It is not safe to automatically roll back if there is a problem loading the current metadata file because the changes in the current metadata file are lost.
There are cases where the current metadata file may not be found. The S3A file system may throw NotFoundException because of negative caching in S3, even though the metadata file exists. If that happens, then this would silently roll back anything committed in the current metadata. There are other cases where this may happen as well, and we simply don't know why the current metadata file was deleted. Assuming a specific cause is not a viable solution.

The problem in #1688 is that the Hive table was updated, but the operation appeared to fail on the client. The client deletes the metadata file from the commit attempt so that the commit cannot be accidentally successful (the case you hit) because the client will throw an exception that will very likely result in a retry. For example, Flink appending files will keep trying to append the same files.
I think the right solution to the problem is to improve the handling in the commit path, not the read path. The commit path can load the table again and check whether the metadata file is the one it attempted to commit. If it is, then it can return success instead of deleting the metadata file.
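The commit-path check suggested here could look roughly like the sketch below. This is a hedged simulation, not the actual Iceberg API: the `Map` stands in for the metastore's table-to-metadata-location pointer, and `reconcile` plays the role of the post-failure check.

```java
import java.util.Map;

public class CommitReconcileSketch {

    enum CommitStatus { SUCCESS, FAILURE, UNKNOWN }

    /**
     * After a commit whose metastore call failed on the client, reload the
     * table pointer and check whether it already points at the metadata file
     * we tried to commit. SUCCESS means the commit actually went through and
     * the file must NOT be deleted; FAILURE means cleanup is safe.
     */
    static CommitStatus reconcile(Map<String, String> metastore, String table,
            String attemptedLocation) {
        String committed = metastore.get(table);
        if (committed == null) {
            // Could not determine the current pointer: leave the metadata
            // file in place rather than risk deleting a committed file.
            return CommitStatus.UNKNOWN;
        }
        return attemptedLocation.equals(committed)
                ? CommitStatus.SUCCESS   // our commit actually went through
                : CommitStatus.FAILURE;  // safe to delete our metadata file
    }

    public static void main(String[] args) {
        Map<String, String> metastore = Map.of("db.tbl", "s3://bucket/metadata/v4.json");
        System.out.println(reconcile(metastore, "db.tbl", "s3://bucket/metadata/v4.json")); // SUCCESS
        System.out.println(reconcile(metastore, "db.tbl", "s3://bucket/metadata/v5.json")); // FAILURE
    }
}
```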
@shardulm94 thanks for your suggestions. I'll try some improvements on the write path.
thanks @rdblue, I will add some retries on commit and try checking whether the metadata file is the one it attempted to commit before deleting the metadata file.
Fixes #1688
2. If we used the previous_metadata_location during a refresh operation, we will set it as the metadata_location so that the next refresh still works in case both metadata locations are not found.