HADOOP-17887. Remove the wrapper class GzipOutputStream #3377

Merged: 3 commits, Sep 9, 2021

@viirya viirya commented Sep 3, 2021

Since we now provide a built-in gzip compressor, we can use it in the compressor stream. The wrapper class GzipOutputStream can be removed.

BTW, I ran a microbenchmark that compresses and decompresses random data 10 times.

The average time:

After: 12.93s
Before: 13.52s

It is pretty close.
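
The benchmark code itself is not included in this thread. As a rough sketch of how such a measurement might be set up, the following times gzip round trips over random data. It uses `java.util.zip` as a stand-in for Hadoop's codec classes, and the class name, method names, and parameters are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: java.util.zip stands in for the Hadoop gzip codec.
final class GzipRoundTripBenchSketch {

  /** Compresses the data with gzip, decompresses it again, and returns the result. */
  static byte[] roundTrip(byte[] data) {
    try {
      ByteArrayOutputStream compressed = new ByteArrayOutputStream();
      try (GZIPOutputStream out = new GZIPOutputStream(compressed)) {
        out.write(data, 0, data.length);
      }
      ByteArrayOutputStream restored = new ByteArrayOutputStream();
      try (GZIPInputStream in =
          new GZIPInputStream(new ByteArrayInputStream(compressed.toByteArray()))) {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
          restored.write(buf, 0, n);
        }
      }
      return restored.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the average wall-clock time in seconds of one round trip over `runs` runs. */
  static double averageSeconds(int runs, int sizeBytes, long seed) {
    byte[] data = new byte[sizeBytes];
    new Random(seed).nextBytes(data);
    long totalNanos = 0;
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      byte[] restored = roundTrip(data);
      totalNanos += System.nanoTime() - start;
      // Sanity check outside the timed region: the data must survive the round trip.
      if (!Arrays.equals(data, restored)) {
        throw new IllegalStateException("round trip mismatch");
      }
    }
    return totalNanos / 1e9 / runs;
  }

  public static void main(String[] args) {
    // 10 runs as in the description above; the 16 MiB size and seed are arbitrary choices.
    System.out.printf("average: %.3fs%n", averageSeconds(10, 16 * 1024 * 1024, 42L));
  }
}
```

A loop like this includes JIT warm-up in the average, so small differences between two runs should be read with caution.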

Comment on lines -89 to -92
if (currentBufLen <= 0) {
  return compressedBytesWritten;
}

Member Author

Also, I found this bug: if the input passed to the compress stream is empty, this check causes an endless loop.

Member

Can we add a test case for this?

Member Author

@viirya viirya Sep 3, 2021

If this condition is not removed, the current test times out once GzipOutputStream is removed.

Member Author

@viirya viirya Sep 3, 2021

Do we still need another test for it? We currently have test coverage for it.

Member

Which test fails if the condition is not removed? So it passes with the wrapper class but fails once the wrapper is removed?

Also, we can remove the currentBufLen variable now since it is no longer used anywhere else.

Member Author

@viirya viirya Sep 3, 2021

testGzipCodec times out once GzipOutputStream is removed.

Because of this line:

codecTest(conf, seed, 0, "org.apache.hadoop.io.compress.GzipCodec");

It writes empty input to the compress stream. Because of this currentBufLen check, compress returns 0 on every call, so the loop never ends.
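
To illustrate the loop described above, here is a minimal, simplified model. It is not the Hadoop code: `ToyCompressor`, `drains`, and the `earlyReturnOnEmptyInput` flag are hypothetical names, and `java.util.zip.Deflater` stands in for the real compressor internals. The caller keeps calling `compress` until `finished()` is true, which never happens when the early return skips the deflate step.

```java
import java.util.zip.Deflater;

// Hypothetical, simplified model of the problem; none of these names are Hadoop's.
final class EmptyInputLoopSketch {

  /** Toy compressor that can reproduce the early return on empty input. */
  static final class ToyCompressor {
    private final Deflater deflater = new Deflater();
    private final boolean earlyReturnOnEmptyInput;
    private int currentBufLen = 0;

    ToyCompressor(boolean earlyReturnOnEmptyInput) {
      this.earlyReturnOnEmptyInput = earlyReturnOnEmptyInput;
    }

    void setInput(byte[] b, int off, int len) {
      deflater.setInput(b, off, len);
      currentBufLen = len;
    }

    void finish() {
      deflater.finish();
    }

    boolean finished() {
      return deflater.finished();
    }

    int compress(byte[] out) {
      if (earlyReturnOnEmptyInput && currentBufLen <= 0) {
        // Returns before deflate() runs, so finished() can never become true.
        return 0;
      }
      return deflater.deflate(out, 0, out.length);
    }

    void end() {
      deflater.end();
    }
  }

  /**
   * Feeds empty input and drains the compressor the way a stream's finish step would.
   * Gives up after maxCalls so the sketch terminates even in the buggy case.
   */
  static boolean drains(boolean earlyReturnOnEmptyInput, int maxCalls) {
    ToyCompressor c = new ToyCompressor(earlyReturnOnEmptyInput);
    try {
      c.setInput(new byte[0], 0, 0);
      c.finish();
      byte[] out = new byte[64];
      for (int i = 0; i < maxCalls && !c.finished(); i++) {
        c.compress(out);
      }
      return c.finished();
    } finally {
      c.end();
    }
  }

  public static void main(String[] args) {
    System.out.println("with early return:    finished = " + drains(true, 1000));
    System.out.println("without early return: finished = " + drains(false, 1000));
  }
}
```

With the early return enabled, the sketch reports `finished = false` after 1000 calls; without it, the compressor finishes on the first call. A caller with no call limit would spin forever in the first case.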

Member

I think it's still good to have a dedicated test for this edge case. We can use @Test(timeout=<value>) to check the timeout.

Member Author

I see. Let me add one then.

Member

Thanks!

Member Author

Added a test for the empty-input case.

viirya commented Sep 3, 2021

cc @sunchao

Comment on lines +1056 to +1057
@Test(timeout=20000)
public void testGzipCompressorWithEmptyInput() throws IOException {
Member Author

On current trunk, this test fails with:

org.junit.runners.model.TestTimedOutException: test timed out after 20000 milliseconds
        at org.apache.hadoop.io.compress.TestCodec.testGzipCompressorWithEmptyInput(TestCodec.java:1076)
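
The body of the new test is not shown in this thread. As a sketch of the same idea outside JUnit, the following runs an empty-input gzip round trip under a time limit, so that a hang surfaces as a failure rather than blocking forever. It is an illustration under assumptions: `java.util.zip` stands in for Hadoop's compressor, and `EmptyGzipRoundTripSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: java.util.zip stands in for the Hadoop gzip compressor.
final class EmptyGzipRoundTripSketch {

  static byte[] gzip(byte[] input) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
      out.write(input, 0, input.length);
    }
    return sink.toByteArray();
  }

  static byte[] gunzip(byte[] compressed) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buf = new byte[1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        sink.write(buf, 0, n);
      }
    }
    return sink.toByteArray();
  }

  /** Runs the round trip on a worker thread and fails if it exceeds the time limit. */
  static byte[] roundTripWithTimeout(byte[] input, long timeoutMillis) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Callable<byte[]> task = () -> gunzip(gzip(input));
      return pool.submit(task).get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      throw new IllegalStateException(
          "round trip did not finish within " + timeoutMillis + " ms", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the round trip", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("round trip failed", e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // Same 20000 ms limit as the @Test(timeout=20000) annotation above.
    byte[] restored = roundTripWithTimeout(new byte[0], 20000L);
    System.out.println("restored length = " + restored.length);
  }
}
```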

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 58s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 33m 39s | | trunk passed |
| +1 💚 | compile | 22m 46s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | compile | 19m 24s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | checkstyle | 1m 1s | | trunk passed |
| +1 💚 | mvnsite | 1m 32s | | trunk passed |
| +1 💚 | javadoc | 1m 1s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 1m 36s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | spotbugs | 2m 27s | | trunk passed |
| +1 💚 | shadedclient | 18m 19s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 56s | | the patch passed |
| +1 💚 | compile | 21m 58s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javac | 21m 58s | | the patch passed |
| +1 💚 | compile | 19m 25s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | javac | 19m 25s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 1m 0s | /results-checkstyle-hadoop-common-project_hadoop-common.txt | hadoop-common-project/hadoop-common: The patch generated 1 new + 61 unchanged - 2 fixed = 62 total (was 63) |
| +1 💚 | mvnsite | 1m 29s | | the patch passed |
| +1 💚 | javadoc | 1m 0s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 💚 | javadoc | 1m 37s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 💚 | spotbugs | 2m 34s | | the patch passed |
| +1 💚 | shadedclient | 18m 45s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 17m 6s | | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 0m 49s | | The patch does not generate ASF License warnings. |
| | | 189m 16s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3377/3/artifact/out/Dockerfile |
| GITHUB PR | #3377 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux de8f4df59f6d 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 3442e67 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3377/3/testReport/ |
| Max. process+thread count | 2993 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3377/3/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Sep 5, 2021
@apache apache deleted a comment from hadoop-yetus Sep 5, 2021
@sunchao sunchao left a comment

LGTM

@sunchao sunchao merged commit e708836 into apache:trunk Sep 9, 2021
sunchao commented Sep 9, 2021

Merged to trunk. Thanks @viirya for the contribution!

kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021
3 participants