-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17887. Remove the wrapper class GzipOutputStream #3377
Conversation
if (currentBufLen <= 0) { | ||
return compressedBytesWritten; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, found this bug. If we set an empty input to the compress stream, it will cause endless loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a test case for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without removing the condition, current test will timeout after removing GzipOutputStream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need another test for it? We currently have test coverage for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which test is failing before removing the condition? so it passes with the wrapper class but fails after?
Also, we can remove the currentBufLen
variable now since it is no longer used anywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testGzipCodec
will cause timeout after removing GzipOutputStream
.
Because of this line:
codecTest(conf, seed, 0, "org.apache.hadoop.io.compress.GzipCodec");
It writes an empty input to the compress stream. Due to this currentBufLen
check, compress
will return 0 endlessly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still good to have a dedicated test for this edge case. We can use @Test(timeout=<value>)
to check the timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see. let me add one then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one test for empty input case.
cc @sunchao |
@Test(timeout=20000) | ||
public void testGzipCompressorWithEmptyInput() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In current trunk, this test will cause:
org.junit.runners.model.TestTimedOutException: test timed out after 20000 milliseconds
at org.apache.hadoop.io.compress.TestCodec.testGzipCompressorWithEmptyInput(TestCodec.java:1076)
I did a microbenchmark by running 10 times of compressing/decompresing random data. The average time: After: 12.93s It is pretty close. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merged to trunk. Thanks @viirya for the contribution! |
As we provide built-in gzip compressor, we can use it in compressor stream. The wrapper
GzipOutputStream
can be removed now.BTW, I did a microbenchmark by running 10 times of compressing/decompresing random data.
The average time:
After: 12.93s
Before: 13.52s
It is pretty close.