
S3OutputStream - failure to close should persist on subsequent close calls #5311

Merged
merged 59 commits into apache:master on Aug 2, 2022

Conversation

abmo-x
Contributor

@abmo-x abmo-x commented Jul 20, 2022

Fix for #5310 and #4168

Issue

When S3OutputStream fails to upload a file on close() due to some failure, IcebergStreamWriter in Flink still ends up adding the file to completedDataFiles in BaseTaskWriter, resulting in table metadata that points to an S3 data file which was never uploaded.

Steps to Reproduce

  • Flink 1.14 pipeline with Iceberg 0.13

  • A user-defined ProcessFunction<FlinkRecord, Row> that catches all exceptions in processElement (see the sketch after these steps)

  • Configure the pipeline to use S3FileIO and choose a target file size that, for your test data, causes the writer to roll to a new file

  • Trigger an S3 failure on putObject (should be reproducible for MultiPartUpload as well) during shouldRollToNewFile, which calls close --> completeUploads

Stack trace from the failure

org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:101)
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:80)
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
	at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
	...
	at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:101)
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:80)
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.collect(ChainingOutput.java:39)
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
	at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:38)
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:99)
	... 21 more
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from service endpoint.
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98)
	at software.amazon.awssdk.auth.credentials.HttpCredentialsProvider.refreshCredentials(HttpCredentialsProvider.java:110)
	at software.amazon.awssdk.utils.cache.CachedSupplier.refreshCache(CachedSupplier.java:132)
	at software.amazon.awssdk.utils.cache.OneCallerBlocks.prefetch(OneCallerBlocks.java:38)
	at software.amazon.awssdk.utils.cache.CachedSupplier.prefetchCache(CachedSupplier.java:116)
	at software.amazon.awssdk.utils.cache.CachedSupplier.get(CachedSupplier.java:91)
	at java.base/java.util.Optional.map(Optional.java:265)
	at software.amazon.awssdk.auth.credentials.HttpCredentialsProvider.resolveCredentials(HttpCredentialsProvider.java:146)
	at software.amazon.awssdk.auth.credentials.AwsCredentialsProviderChain.resolveCredentials(AwsCredentialsProviderChain.java:85)
	at software.amazon.awssdk.auth.credentials.internal.LazyAwsCredentialsProvider.resolveCredentials(LazyAwsCredentialsProvider.java:45)
	at software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.resolveCredentials(DefaultCredentialsProvider.java:104)
	at software.amazon.awssdk.awscore.client.handler.AwsClientHandlerUtils.createExecutionContext(AwsClientHandlerUtils.java:76)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.createExecutionContext(AwsSyncClientHandler.java:68)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:97)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:167)
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:94)
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
	at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:8350)
	at org.apache.iceberg.aws.s3.S3OutputStream.completeUploads(S3OutputStream.java:396)
	at org.apache.iceberg.aws.s3.S3OutputStream.close(S3OutputStream.java:256)
	at org.apache.parquet.io.DelegatingPositionOutputStream.close(DelegatingPositionOutputStream.java:38)
	at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:1106)
	at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:239)
	at org.apache.iceberg.io.DataWriter.close(DataWriter.java:82)
	at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.closeCurrent(BaseTaskWriter.java:288)
	at org.apache.iceberg.io.BaseTaskWriter$BaseRollingWriter.write(BaseTaskWriter.java:254)
	at org.apache.iceberg.io.PartitionedFanoutWriter.write(PartitionedFanoutWriter.java:58)
	at org.apache.iceberg.flink.sink.IcebergStreamWriter.processElement(IcebergStreamWriter.java:74)
	at org.apache.flink.streaming.runtime.tasks.ChainingOutput.pushToOperator(ChainingOutput.java:99)
	... 27 more
Caused by: software.amazon.awssdk.core.exception.SdkServiceException: Unauthorized

  • The pipeline keeps running even after the above failure; eventually the snapshot barrier is triggered
    • The barrier triggers another close, which ends up adding the data file that was never uploaded to S3
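
The catch-all function in step 2 is the piece that keeps the pipeline alive after the upload fails. A minimal sketch of such a function, assuming Flink 1.14's ProcessFunction API (FlinkRecord and the conversion logic are hypothetical stand-ins for the customer's code):

```java
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

// Hypothetical record type standing in for the customer's input class.
class FlinkRecord {}

// A user function that swallows every exception in processElement.
// Because operators are chained, the downstream write can roll to a new
// file (and thus close the S3 stream) synchronously inside collect(),
// as the stack trace above shows; the S3 failure is silently dropped
// here and the pipeline keeps running.
class CatchAllProcessFunction extends ProcessFunction<FlinkRecord, Row> {
  @Override
  public void processElement(FlinkRecord record, Context ctx, Collector<Row> out) {
    try {
      out.collect(toRow(record));
    } catch (Exception e) {
      // catch-all: the write/close failure is ignored (step 2 above)
    }
  }

  private Row toRow(FlinkRecord record) {
    return Row.of(record); // illustrative conversion only
  }
}
```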

Testing

  • Unit tests added
  • Testing on our dev pipeline; will update with results after the pipeline has run for a while

Member

@RussellSpitzer RussellSpitzer left a comment

Is it feasible for us to fix this in BaseTaskWriter? It sounds like a failed "close" call should just stop the commit process. It feels like a bit of a workaround to have subsequent "close" calls throw an exception when we probably shouldn't be making subsequent "close" calls at all.

I could be misunderstanding this, though.

@rdblue
Contributor

rdblue commented Jul 20, 2022

I agree with @RussellSpitzer; I think we should avoid the double close, since that is what is causing the problem (at least as far as I understand it).

@github-actions github-actions bot added the data label Jul 20, 2022
@abmo-x
Contributor Author

abmo-x commented Jul 20, 2022

@rdblue @RussellSpitzer
Added a commit to clear currentWriter on close in BaseTaskWriter, plus two test cases around failures to close and complete (a sketch of the idea follows).
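
A minimal sketch of that intermediate approach, assuming a rolling-writer shape similar to BaseTaskWriter's (names are illustrative, and note this change was later reverted in favor of the S3OutputStream fix):

```java
import java.io.Closeable;
import java.io.IOException;

// Illustrative rolling writer that drops its current writer once close
// has been attempted, so a repeated close() cannot touch a stale,
// half-closed instance.
abstract class RollingWriter implements Closeable {
  private Closeable currentWriter; // the in-flight file writer, if any

  @Override
  public void close() throws IOException {
    if (currentWriter != null) {
      try {
        currentWriter.close(); // may throw if the S3 upload fails
      } finally {
        currentWriter = null;  // never reuse a writer after a close attempt
      }
    }
  }
}
```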

I agree close should only be called once; we rely quite strongly on that behavior when adding the data files.
However, I found that writers are held and closed more than once in various scenarios, which causes this issue: a close fails, yet the writers end up in a bad state that later calls silently paper over (a toy reproduction follows at the end of this comment):

  1. When user-defined functions catch all exceptions and ignore failures on write, as seen in Flink's processElement, which internally triggers a roll to a new file.
  2. This behavior was also observed outside BaseTaskWriter, and a fix was made to not close an already-closed stream in AWS: fix bugs around using S3FileIO for table operations #1749

Let me know your thoughts.
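
To make the failure mode above concrete, here is a self-contained toy reproduction; FlakyWriter and the file name are invented for illustration and do not reflect Iceberg's actual classes:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DoubleCloseDemo {
  // A writer whose first close() fails (simulating a failed putObject)
  // and whose second close() silently succeeds, as an idempotent
  // close() implementation would.
  static class FlakyWriter {
    private boolean failNextClose = true;

    void close() throws IOException {
      if (failNextClose) {
        failNextClose = false;
        throw new IOException("putObject failed: Unauthorized");
      }
      // already "closed": do nothing
    }
  }

  public static void main(String[] args) throws IOException {
    FlakyWriter writer = new FlakyWriter();
    List<String> completedDataFiles = new ArrayList<>();

    try {
      writer.close(); // first close fails: the file never reaches S3
    } catch (IOException swallowed) {
      // user code catches and ignores the failure (scenario 1 above)
    }

    writer.close(); // second close succeeds silently
    completedDataFiles.add("s3://bucket/data/file.parquet"); // dangling reference
    System.out.println("committed: " + completedDataFiles);
  }
}
```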

@github-actions github-actions bot removed the AWS label Jul 25, 2022
@abmo-x
Contributor Author

abmo-x commented Jul 25, 2022

Reverted the changes to S3OutputStream to keep the close API consistent.

@github-actions github-actions bot added the AWS label Jul 26, 2022
@abmo-x abmo-x changed the title DataWriter - failure to close should not add file to completedDataFiles S3OutputStream - failure to close should persist on subsequent close calls Jul 26, 2022
@abmo-x
Contributor Author

abmo-x commented Jul 26, 2022

After further discussion with @RussellSpitzer, I brought back my changes to S3OutputStream.

Because a failure to close an S3 stream leaves it in a bad state that cannot be recovered, any future close calls on that stream should continue to fail (sketched below). The changes are now simple and confined to S3OutputStream.

cc @rdblue
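
A minimal sketch of the behavior this PR gives S3OutputStream, assuming a stream that uploads its buffered contents on close(); the class and method names here are illustrative, not the actual implementation:

```java
import java.io.IOException;
import java.io.OutputStream;

// Once close() has failed, the stream is unrecoverable: remember the
// failure and rethrow it on every subsequent close() instead of
// silently succeeding.
class UploadOnCloseStream extends OutputStream {
  private boolean closed = false;
  private IOException closeFailure = null; // persisted across close() calls

  @Override
  public void write(int b) throws IOException {
    if (closed) {
      throw new IOException("Stream is closed");
    }
    // buffer the byte to a local staging file (omitted)
  }

  @Override
  public void close() throws IOException {
    if (closeFailure != null) {
      throw closeFailure; // keep failing: the upload never completed
    }
    if (closed) {
      return; // a previously successful close stays idempotent
    }
    try {
      completeUploads(); // e.g. putObject or complete multipart upload
      closed = true;
    } catch (IOException e) {
      closeFailure = e;
      throw e;
    }
  }

  private void completeUploads() throws IOException {
    // upload staged bytes to S3 (omitted)
  }
}
```

With this shape, a caller that swallows the first failure still cannot commit the file, because every later close() on the same stream raises the original exception.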

@abmo-x abmo-x changed the base branch from master to 0.14.x August 2, 2022 06:08
@abmo-x abmo-x changed the base branch from 0.14.x to master August 2, 2022 06:08
@RussellSpitzer RussellSpitzer merged commit d44565b into apache:master Aug 2, 2022
@rdblue rdblue added this to the Iceberg 0.14.1 Release milestone Aug 9, 2022
rdblue pushed a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
rdblue pushed a commit that referenced this pull request Sep 3, 2022
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023