S3OutputStream - failure to close should persist on subsequent close calls #5311
Conversation
Is it feasible for us to fix this in BaseTaskWriter? It sounds like a failed "close" call should just stop the commit process. It feels like a bit of a workaround to have subsequent "close" calls throw an exception when we probably shouldn't be making subsequent "close" calls in the first place.
I could be misunderstanding this though.
I agree with @RussellSpitzer, I think we should avoid the double close, since that is what is causing the problem (at least as far as I understand).
@rdblue @RussellSpitzer I agree that close should only be called once; we rely on that behavior quite strongly when adding the data files.
Let me know your thoughts.
Reverted changes to S3OutputStream to keep the close API consistent.
After further discussion with @RussellSpitzer, brought back my changes to S3OutputStream. Since a failure to close an S3 stream leaves it in a bad state that cannot be recovered, any future calls to that stream should continue to fail. The changes are now simple and confined to S3OutputStream. cc @rdblue
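For illustration, a minimal sketch of that "sticky failure" pattern (the class and field names such as StickyCloseOutputStream and closeFailure are hypothetical, not the actual Iceberg implementation): once a close attempt has failed, the stream remembers the failure and keeps throwing on later close calls instead of silently reporting success.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hedged sketch of a "sticky" close failure; not the real S3OutputStream.
class StickyCloseOutputStream extends OutputStream {
  private boolean closed = false;
  private IOException closeFailure = null;

  @Override
  public void write(int b) throws IOException {
    // buffering/upload logic would live here
  }

  @Override
  public void close() throws IOException {
    if (closeFailure != null) {
      // A previous close attempt failed and left the stream in an
      // unrecoverable state; keep failing so callers cannot treat the
      // file as successfully written.
      throw new IOException("Stream failed on a previous close attempt", closeFailure);
    }
    if (closed) {
      return; // close already succeeded; subsequent calls are no-ops
    }
    try {
      completeUploads(); // placeholder for the final upload step
      closed = true;
    } catch (IOException e) {
      closeFailure = e;
      throw e;
    }
  }

  private void completeUploads() throws IOException {
    // placeholder: in S3OutputStream this performs putObject or
    // completes a multipart upload
  }
}
```

The trade-off discussed above is visible here: once the upload step has failed, the stream can never reach a good state, so letting a second close return quietly would hand callers a data file that does not exist.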
Add comments based on review suggestion.
S3OutputStream - failure to close should persist on subsequent close calls (apache#5311) (cherry picked from commit d44565b)
Fix for #5310 and #4168
Issue
When S3OutputStream fails to upload a file during close, IcebergStreamWriter in Flink still adds the file to completedDataFiles in BaseTaskWriter, leaving the table metadata pointing to an S3 data file that was never actually uploaded to S3.
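A minimal, self-contained sketch of the failing sequence (the stream below only simulates the pre-fix close behavior; the newStream() helper and the swallowed exception are illustrative, not the real write path):

```java
import java.io.IOException;
import java.io.OutputStream;

class DoubleCloseDemo {
  public static void main(String[] args) throws IOException {
    OutputStream stream = newStream(); // stand-in for an S3OutputStream
    try {
      stream.close(); // the upload fails here and close() throws
    } catch (IOException e) {
      // the failure is swallowed somewhere in the write path
    }
    // Before the fix, this second close() returned silently because the
    // stream was already flagged as closed, so the writer assumed the
    // file was uploaded and added it to completedDataFiles.
    stream.close();
  }

  private static OutputStream newStream() {
    return new OutputStream() {
      private boolean closed = false;

      @Override
      public void write(int b) {}

      @Override
      public void close() throws IOException {
        if (closed) {
          return; // pre-fix behavior: silently succeed on the second call
        }
        closed = true;
        throw new IOException("simulated S3 upload failure");
      }
    };
  }
}
```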
Steps to Reproduce
Flink 1.14 pipeline with Iceberg 0.13
A user-implemented ProcessFunction<FlinkRecord, Row> that catches all exceptions in processElement (see the sketch after these steps)
Configure the pipeline to use S3FileIO, with the target file size chosen for your test data so that the writer rolls to a new file
Trigger an S3 failure on the putObject call (should be reproducible for multipart upload as well) during shouldRollToNewFile, which calls close --> completeUploads
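An illustrative stand-in for the catch-all function from step 2 (FlinkRecord is the application's own type from the report, stubbed here; the conversion logic is hypothetical):

```java
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

// Stub for the application's own record type mentioned in the report.
class FlinkRecord {}

// Illustrative catch-all user function, not the reporter's actual code.
class CatchAllProcessFunction extends ProcessFunction<FlinkRecord, Row> {
  @Override
  public void processElement(FlinkRecord record, Context ctx, Collector<Row> out) {
    try {
      out.collect(convert(record));
    } catch (Exception e) {
      // Swallowing every exception keeps the pipeline running, which can
      // mask a failure that should have stopped the job before commit.
    }
  }

  private Row convert(FlinkRecord record) {
    return Row.of(record.toString()); // application-specific conversion
  }
}
```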
Stack trace from failure
Testing