
SDK repeatedly complaining "Not all bytes were read from the S3ObjectInputStream" #1211

Closed
steveloughran opened this issue Jun 27, 2017 · 34 comments

@steveloughran

With a recent upgrade to the 1.11.134 SDK, tests that seek around a large CSV file are triggering a large set of repeated warnings about closing the stream early.

2017-06-27 15:47:05,121 [ScalaTest-main-running-S3ACSVReadSuite] WARN  internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2017-06-27 15:47:06,730 [ScalaTest-main-running-S3ACSVReadSuite] WARN  internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.

full log

I've seen other people complaining about this, with the issue being closed as "user gets to fix their code" or similar. However, here's why I believe the system is overreacting: it is making invalid assumptions about the remaining amount of data and the cost of reading vs. abort-and-reconnect, and it is also failing to note that it has previously warned about this.

Hadoop supports reading files of tens to hundreds of GB, with the client code assuming it's a POSIX input stream where seek() is inefficient, but less inefficient than reading the intervening data.

The first S3A release did call close(), rather than abort(), leading to HADOOP-11570: seeking being pathologically bad on a long input stream. What we now do, HADOOP-13047, is provide a tunable threshold for forward seeks below which we skip bytes rather than seek. The default is 64K; for long-haul links a value of 512K works better. But above 512K, even over a long-haul connection, it is better to set up a new HTTPS connection than to try to reuse an existing HTTP/1.1 connection. Which is what we do.
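
Roughly, the policy looks like this (a hedged sketch with illustrative names, not the actual S3AInputStream code):

    // Hedged sketch of the forward-seek policy described above (illustrative names only,
    // not the actual S3AInputStream code).
    final class ForwardSeekPolicy {
        enum Action { SKIP_FORWARD, ABORT_AND_REOPEN }

        static Action decide(long currentPos, long targetPos, long skipThreshold) {
            long gap = targetPos - currentPos;
            if (gap > 0 && gap <= skipThreshold) {
                // small forward seek (64K default, 512K for long-haul links): reading and
                // discarding the gap is cheaper than setting up a new HTTPS connection
                return Action.SKIP_FORWARD;
            }
            // backwards seek, or too far forward: a fresh ranged GET on a new connection wins
            return Action.ABORT_AND_REOPEN;
        }
    }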

Only now, every time it happens, a message appears in the log: "This is likely an error". It's not; it's exactly what we want to do based on our benchmarking of IO performance. We do have a faster IO mechanism for when users explicitly want random access, but as that's pathological on non-seeking file reads, it's not on by default.

I'm covering this in HADOOP-14596; I think we'll end up configuring log4j so that, even in production clusters, warning messages from S3AbortableInputStream are not logged. This is a bit dangerous.

Here are some ways in which the logging could be improved without having to be so drastic.

  1. Look at the amount of remaining data before warning of suboptimal performance. If it's more than some threshold (64K? 128K?), don't complain; instead, assume the caller has made the optimal choice.
  2. Remember that the warning has already been issued once and don't repeat it on every single seek() call.

thanks.

@steveloughran
Author

related to #1111, #1203. Also surfacing in IMPALA-5387.

@shorea
Contributor

shorea commented Jun 27, 2017

Is it possible to explicitly call abort if it's the optimal choice for the circumstances? If you call abort then the warning message is not logged, and for cases where you do want to reuse the connection you can just drain the stream.

By the way, close() when there are bytes left in the stream does abort the underlying connection. This was a change we made in 1.11.x.
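
For illustration, the explicit-abort option looks roughly like this with the v1 SDK (the class, bucket, and key names here are placeholders, not anything from the SDK or Hadoop):

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3ObjectInputStream;
    import java.io.IOException;

    final class ExplicitAbortExample {
        // Read only part of an object, then deliberately abort the connection.
        static void readPrefix(AmazonS3 s3, String bucket, String key, byte[] dest)
                throws IOException {
            S3ObjectInputStream in =
                    s3.getObject(new GetObjectRequest(bucket, key)).getObjectContent();
            try {
                in.read(dest);   // partial read; the rest of the object is not wanted
            } finally {
                in.abort();      // explicit abort instead of close(); if you want the pooled
                                 // connection back, drain the remaining bytes and close() instead
            }
        }
    }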

@steveloughran
Author

OK, so that's the big change: it used to stream through, but now it aborts. That's a major change. We can fix that. Will a skip(EOF - pos) do it?

@dagnir
Contributor

dagnir commented Jun 27, 2017

@steveloughran Yes, that should be sufficient to drain the stream.
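
Something along these lines would do it (a minimal sketch, assuming the caller tracks the content length and current position; names are illustrative):

    import java.io.IOException;
    import java.io.InputStream;

    final class DrainBeforeClose {
        // Skip the unread tail so close() finds an empty stream and keeps the connection alive.
        static void drainAndClose(InputStream in, long contentLength, long position)
                throws IOException {
            long remaining = contentLength - position;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    break;        // cannot skip any further; close() may still have to abort
                }
                remaining -= skipped;
            }
            in.close();
        }
    }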

@steveloughran
Author

OK, we'll do that, plus also:

  • add a count of bytes remaining at final stream close to the instrumentation we track per stream (and coalesce it into the whole FS instance afterwards)
  • add to our troubleshooting docs what to do if you see the log message.
    It'd still be good if it only got logged once, though.

@steveloughran
Author

Current status: if we drain the stream, a lot of the messages go away, but we still seem to get told off on an abort() call, even though that's an explicit request to break the connection.

testRandomIONormalPolicy: Random IO with policy "normal"

2017-06-28 15:24:21,013 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:reopen(148)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset range[2097152-45603307], length=131072, streamPosition=0, nextReadPosition=2097152
2017-06-28 15:24:21,895 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(456)) - Closing stream seekInStream(): soft
2017-06-28 15:24:21,896 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(486)) - Aborting stream
2017-06-28 15:24:21,896 [JUnit-testRandomIONormalPolicy] WARN  internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2017-06-28 15:24:21,897 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(490)) - Stream s3a://landsat-pds/scene_list.gz aborted: seekInStream(); remaining=43375083 streamPos=2228224, nextReadPos=131072, request range 2097152-45603307 length=45603307
2017-06-28 15:24:21,897 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:reopen(148)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset range[131072-45603307], length=131072, streamPosition=131072, nextReadPosition=131072
2017-06-28 15:24:23,227 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(456)) - Closing stream seekInStream(): soft
2017-06-28 15:24:23,227 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(486)) - Aborting stream
2017-06-28 15:24:23,228 [JUnit-testRandomIONormalPolicy] WARN  internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2017-06-28 15:24:23,229 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(490)) - Stream s3a://landsat-pds/scene_list.gz aborted: seekInStream(); remaining=45341163 streamPos=262144, nextReadPos=5242880, request range 131072-45603307 length=45603307
2017-06-28 15:24:23,229 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:reopen(148)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset range[5242880-45603307], length=65536, streamPosition=5242880, nextReadPosition=5242880
2017-06-28 15:24:24,226 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(456)) - Closing stream seekInStream(): soft
2017-06-28 15:24:24,228 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(486)) - Aborting stream
2017-06-28 15:24:24,228 [JUnit-testRandomIONormalPolicy] WARN  internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2017-06-28 15:24:24,229 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(490)) - Stream s3a://landsat-pds/scene_list.gz aborted: seekInStream(); remaining=40294891 streamPos=5308416, nextReadPos=1048576, request range 5242880-45603307 length=45603307
2017-06-28 15:24:24,229 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:reopen(148)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset range[1048576-45603307], length=1048576, streamPosition=1048576, nextReadPosition=1048576
2017-06-28 15:24:26,310 [JUnit-testRandomIONormalPolicy] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1511)) - Duration of Time to execute 4 reads of total size 1376256 bytes: 5,293,705,537 nS
2017-06-28 15:24:26,311 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(456)) - Closing stream close() operation: soft
2017-06-28 15:24:26,311 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(486)) - Aborting stream
2017-06-28 15:24:26,311 [JUnit-testRandomIONormalPolicy] WARN  internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
2017-06-28 15:24:26,311 [JUnit-testRandomIONormalPolicy] DEBUG s3a.S3AInputStream (S3AInputStream.java:closeStream(490)) - Stream s3a://landsat-pds/scene_list.gz aborted: close() operation; remaining=43506155 streamPos=2097152, nextReadPos=0, request range 1048576-45603307 length=45603307
2017-06-28 15:24:26,312 [JUnit-testRandomIONormalPolicy] INFO  scale.ITestS3AInputStreamPerformance (ITestS3AInputStreamPerformance.java:logTimePerIOP(203)) - Time per byte read: 3,846 nS
2017-06-28 15:24:26,312 [JUnit-testRandomIONormalPolicy] INFO  scale.ITestS3AInputStreamPerformance (ITestS3AInputStreamPerformance.java:executeRandomIO(464)) - Effective bandwidth 0.247936 MB/S
2017-06-28 15:24:26,313 [JUnit-testRandomIONormalPolicy] INFO  scale.ITestS3AInputStreamPerformance (ITestS3AInputStreamPerformance.java:logStreamStatistics(365)) - Stream Statistics
StreamStatistics{OpenOperations=4, CloseOperations=4, Closed=0, Aborted=4, SeekOperations=2, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=2, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=6356992, BytesRead=1376256, BytesRead excluding skipped=1376256, ReadOperations=159, ReadFullyOperations=4, ReadsIncomplete=155, BytesReadInClose=0, BytesDiscardedInAbort=172517292}
2017-06-28 15:24:26,313 [JUnit-testRandomIONormalPolicy] INFO  contract.AbstractFSContractTestBase (AbstractFSContractTestBase.java:describe(255)) - cleanup
2017-06-28 15:24:26,545 [JUnit-testRandomIONormalPolicy] INFO  contract.AbstractFSContractTestBase (AbstractFSContractTestBase.java:describe(255)) - closing file system

@steveloughran
Author

Here's what I suspect is happening: we've invoked abort(), but a stream close() is still being called (somewhere), and that's triggering the warning, even though the stream is now aborted and shouldn't be considered live.

I think you should check whether the operations in `abort()` are complete (set the HTTP channel to null? the input stream?) and make sure they are in sync with the expectations of the warning code in `close()`.

There's nothing else we can do in our code. I'm draining the stream on close when the remaining byte count is within our configured skip range.

(There's one flaw in my hypothesis: after we call `abort()`, we *don't* call `close()`.)

@steveloughran
Author

It's happening in your abort() call:

    @Override
    public void abort() {
        if (httpRequest != null) {
            httpRequest.abort();
        }
        IOUtils.closeQuietly(in, null);    // **HERE**: this close() path is what logs the warning
    }

I think you should catch the special case "this is an abort, so don't worry about it" and not log.
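
Something like this sketch is what I mean (a hypothetical wrapper with made-up field and method names, not the SDK's actual S3AbortableInputStream):

    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Hypothetical sketch only: record the abort intent before closing, so close() can
    // tell a deliberate abort apart from an accidental early close.
    class AbortAwareStream extends FilterInputStream {
        private volatile boolean aborted = false;
        private final long contentLength;
        private long bytesRead = 0;

        AbortAwareStream(InputStream in, long contentLength) {
            super(in);
            this.contentLength = contentLength;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                bytesRead++;        // bulk read() overloads omitted for brevity
            }
            return b;
        }

        public void abort() throws IOException {
            aborted = true;         // set the flag *before* tearing the stream down
            in.close();             // a real client would also abort the underlying HTTP request here
        }

        @Override
        public void close() throws IOException {
            if (!aborted && bytesRead < contentLength) {
                // only warn when the caller closed early without explicitly asking for an abort
                System.err.println("WARN: not all bytes were read; drain the stream or call abort()");
            }
            super.close();
        }
    }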

@shorea
Contributor

shorea commented Jun 28, 2017

Ah I see, yeah we can fix that. Thanks for reporting.

@dagnir
Contributor

dagnir commented Jul 20, 2017

This was fixed in 1.11.163. Going to go ahead and close this.

@dagnir dagnir closed this as completed Jul 20, 2017
@dado0583

I'm still getting this error in 1.11.182. I'm using the AWS SDK as the mechanism to access S3 within Apache Spark. Has anyone else seen this issue? Spark is pulling down a 640 KB test file and throwing up this error.

17/08/25 23:11:32 WARN S3AbortableInputStream: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.

@steveloughran
Author

You will see this at the tail of reads in S3A if the version of Hadoop you are using doesn't have HADOOP-14596 in to drain the stream at the end of any non-aborted closeStream() call.

Which version of the Hadoop JARs is on your classpath? There have been no ASF releases with this patch (indeed, no ASF releases built and tested against 1.11.x, AFAIK).

@mirosval

I'm still seeing this issue: I call s3Object.getObjectContent.abort(), but I get the warning on "com.amazonaws" % "aws-java-sdk-s3" % "1.11.200". It appears to be coming from exactly the place @steveloughran pointed out above.

@shorea
Contributor

shorea commented Sep 22, 2017

Is the call to abort throwing an exception? It looks like we set the flag after we try to abort, so if it fails we may still log the warning.

@steveloughran
Author

If I apply the HADOOP-14890 patch to update the AWS SDK for Hadoop, the message goes away on hadoop-3. That means the abort() path is silent; we're making sure we drain the input stream when we want to stop reading before the end of the stream but don't actually want to force an abort. I'm happy... just need to get the SDK update in.

@tudor

tudor commented Sep 25, 2017

@shorea: The error might be here:

// The default abort() implementation calls abort on the wrapped stream

The comment says "The default abort() implementation calls abort on the wrapped stream if it's an SdkFilterInputStream; otherwise we'll need to close the stream.", but the code doesn't actually do that -- it calls close() if the wrapped stream is not an SdkFilterInputStream, and does nothing if the wrapped stream is an SdkFilterInputStream.

cc: @dhruba

@tnorthrup

The S3ObjectInputStream class extends SdkFilterInputStream, and abort() in S3ObjectInputStream calls super.abort(), which is the default abort() implementation that aborts the wrapped stream as described.

@davidvandebunte

I'm on 1.11.221, and I'm seeing the IOUtils.closeQuietly at the end of S3ObjectInputStream.abort printing this warning. The "aborted" flag is just not getting set correctly; it's hard to track what the intention for setting it was, though.

@shorea
Contributor

shorea commented Nov 2, 2017

@davidvandebunte would you happen to have a reproducible code sample for this problem?

@davidvandebunte

davidvandebunte commented Nov 3, 2017

@shorea Try HelloWorld.java.txt

@jhovell

jhovell commented Dec 7, 2017

Also experiencing this with version 1.11.215:

// try-with-resources auto-close adds a 2nd WARN to the log
try (final S3ObjectInputStream is = amazonS3.getObject(
        new GetObjectRequest(bucketName, keyName)).getObjectContent()) {
    // read most of the input stream, but not all: it's a ZIP archive and I read
    // one file inside it (which is ~99% of the file in my case)
    myService.process(is);
    is.abort(); // adds a 3rd WARN to the log
}

So by closing the input stream I get a 2nd warning in the log, and then by explicitly calling abort I get a 3rd WARN message.

My workaround will be to consume the rest of the input stream using another library, but there ought to be a way to suppress this warning. It would also be very helpful if the WARN message included a stack trace or some other metadata, as it took a while to figure out what code was generating it.

@steveloughran
Author

@jhovell if you look at what we do in Hadoop's S3A connector: if we aren't that far from the EOF, we just skip() to the end. Further away (which means a few hundred KB or more), we call abort(), accept that the error text can get printed, and don't worry about it. I think we may have some coverage in our troubleshooting docs; if not, something should go in.

FWIW, libraries trying to be "helpful" by downgrading close() to read-to-EOF, and abort() to close(), in order to improve HTTP connection pooling are a recurrent problem with Hadoop object store connectors. When you have a 16 GB file and want to stop reading 32 MB in, abort() is absolutely the correct strategy, and we know exactly what we are doing, at least until HTTP/2 becomes the transport.

@okulkarni-weekendr

@ALL has there been any progress on this ticket yet?

@innobead

innobead commented May 2, 2018

Any update? Still encountering this issue.

@tekumara

tekumara commented Jun 5, 2018

Fixed for me in hadoop-aws 2.9.0 and later.

@steveloughran
Author

Yes, fixed on 2.9 and 3.0 with the AWS SDK update of HADOOP-14890.

@atorstling

atorstling commented Jun 28, 2018

I'm getting this warning when aborting on 1.11.355. I think it's a regression or incomplete fix. I filed a new bug: #1657

@metabrain

metabrain commented Jul 19, 2018

Still happening on 1.11.368

@shorea
Contributor

shorea commented Jul 19, 2018

@metabrain we have a small fix per #1657. This should be released today.

@PowerToThePeople111

PowerToThePeople111 commented Jul 31, 2018

Hi,

I still got the following message:

18/07/31 18:27:09 WARN S3AbortableInputStream: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
18/07/31 18:27:14 WARN S3AbortableInputStream: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.

I only checked a sample, but the tasks producing the logs containing those messages were also the ones that only completed after an unreasonable amount of time.

@echokk11

echokk11 commented Dec 3, 2018

This happens in 1.11.435 too.

@steveloughran
Author

For people like @PowerToThePeople111 who are seeing this with S3A: download cloudstore and run storediag with the -j option to see what JARs are actually on your classpath. We shouldn't see this on recent Hadoop releases, e.g. 3.1+.

@naveen09

naveen09 commented Oct 7, 2019

Experiencing the same in 1.11.595

WARN S3AbortableInputStream:178 - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.

@steveloughran
Author

@naveen09 you need to explicitly drain the bytes in the stream, call abort(), or do a ranged GET. It should not be showing this message on an explicit call to abort().
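
For the ranged-GET option, something roughly like this works with the v1 SDK (the helper class and its parameters are placeholders):

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3ObjectInputStream;
    import com.amazonaws.util.IOUtils;
    import java.io.IOException;

    final class RangedGetExample {
        // Request only the bytes that are needed; the short stream is then read fully
        // and closed, so there is nothing left for the SDK to warn about.
        static byte[] readRange(AmazonS3 s3, String bucket, String key,
                                long offset, int length) throws IOException {
            GetObjectRequest req = new GetObjectRequest(bucket, key)
                    .withRange(offset, offset + length - 1);   // inclusive byte range
            try (S3ObjectInputStream in = s3.getObject(req).getObjectContent()) {
                return IOUtils.toByteArray(in);
            }
        }
    }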
