
HADOOP-17125. Using snappy-java in SnappyCodec #2297

Merged: 24 commits into apache:trunk on Oct 6, 2020

Conversation

@viirya (Member) commented Sep 10, 2020

See https://issues.apache.org/jira/browse/HADOOP-17125 for details.

Discussed this offline with @dbtsai and submitted it based on #2201.

Review comment on a deleted native source file (diff hunk @@ -1,166 +0,0 @@):

Member:

Per #2201 (comment): is this native code used in hadoop-mapreduce-client-nativetask? If so, we probably need to keep it for now.

@viirya (Member, Author):

Hmm, because we removed the native methods from the Java files, I think we no longer generate the .h files needed for compilation: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2297/1/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt

[WARNING] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2297/src/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/snappy/SnappyDecompressor.c:32:10: fatal error: org_apache_hadoop_io_compress_snappy_SnappyDecompressor.h: No such file or directory
[WARNING]  #include "org_apache_hadoop_io_compress_snappy_SnappyDecompressor.h"
[WARNING]           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[WARNING] compilation terminated.

@viirya (Member, Author):

Btw, I don't see them used in hadoop-mapreduce-client-nativetask, unless I'm missing something. Let's wait for the build and tests.

@dbtsai (Member) commented Sep 10, 2020

Thanks @viirya for taking over my #2201 and continuing to work on it.

@dbtsai (Member) commented Sep 11, 2020

The only test failure is in TestSnappyCompressorDecompressor.testSnappyDirectBlockCompression. I guess it's because in SnappyDirectBlockCompression the compressedByteBuffer is already in read mode, so we don't need to switch it to read mode again in decompressBytesDirect().
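
For context, a minimal java.nio sketch (my illustration, not the Hadoop source) of why a second flip makes the buffer look empty: flip() sets the limit to the current position, so flipping a buffer that is already in read mode leaves nothing to read.

```java
import java.nio.ByteBuffer;

public class DoubleFlipDemo {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocateDirect(64);
    buf.put(new byte[] {1, 2, 3, 4});       // buffer in write mode

    buf.flip();                              // write -> read: position=0, limit=4
    System.out.println(buf.remaining());     // 4: the bytes are readable

    buf.flip();                              // flipping again: limit=position=0
    System.out.println(buf.remaining());     // 0: the data appears gone
  }
}
```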

@viirya (Member, Author) commented Sep 11, 2020

@dbtsai Yeah, let me look at it today. Hopefully all tests will pass soon.

@viirya (Member, Author) commented Sep 11, 2020

@sunchao I think all tests passed, but there are two -1s. Do you know what they mean?

@sunchao (Member) commented Sep 11, 2020

> @sunchao I think all tests passed, but there are two -1s. Do you know what they mean?

@viirya, it looks like the gcc compilation or the checkstyle check failed; you can check the test results for cc.

@viirya (Member, Author) commented Sep 11, 2020

@sunchao Thanks. I saw a checkstyle failure:

./hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/snappy/TestSnappyCompressorDecompressor.java:355:    int[] size = { 4 * 1024, 64 * 1024, 128 * 1024, 1024 * 1024 };:18: '{' is followed by whitespace. [NoWhitespaceAfter]

But I didn't change that line's style in this diff.

@sunchao (Member) commented Sep 11, 2020

Yes. I also don't see any error in the log files. I think we can check the Yetus repo to see how it decides between a -1 and a -0.

cc @jojochuang @aajisaka @steveloughran do you have any idea what caused the CI failure here?

@sunchao (Member) commented Sep 11, 2020

BTW I think we no longer need the -Drequire.snappy flag and REQUIRE_SNAPPY with this, right?

@viirya (Member, Author) commented Sep 11, 2020

> BTW I think we no longer need the -Drequire.snappy flag and REQUIRE_SNAPPY with this, right?

Yes, I plan to remove them once we get rid of the -1s.

@viirya (Member, Author) commented Sep 13, 2020

I checked the cc warnings and the related code. That code was committed long ago (e.g. in 2014) and is not touched here. Many of the cc warnings are warning: dynamic exception specifications are deprecated in C++11 [-Wdeprecated]. I guess either we didn't check for such warnings when the code was committed, or the compilation toolchain has been upgraded since; I don't think they are caused by this change. Because we removed some .c and .h files, the CI rebuilt the related native code.

So I am not sure which is better: fixing these compilation warnings, or ignoring them?

@sunchao (Member) commented Sep 13, 2020

> So I am not sure which is better: fixing these compilation warnings, or ignoring them?

Yeah. It looks to me like we can just ignore these for now and proceed with the other things in this PR.

@viirya (Member, Author) commented Sep 14, 2020

Looks like CI failed to fetch and install Yetus. @sunchao do you know how we can re-trigger the CI build and tests?

@sunchao (Member) commented Sep 14, 2020

Just re-triggered the job; let's see what happens.

@viirya (Member, Author) commented Sep 14, 2020

It seems it still failed to fetch and install Yetus, and not just for this PR; other PRs encountered it too...

@viirya (Member, Author) commented Sep 14, 2020

@sunchao Whom should we let know about the CI issue?

@sunchao (Member) commented Sep 14, 2020

@viirya Interesting... I think you can send an email to the Hadoop dev list (common-dev@hadoop.apache.org; you may need to subscribe first).

@viirya (Member, Author) commented Sep 15, 2020

OK, seems the CI is working now.

@viirya (Member, Author) commented Sep 15, 2020

I ran a benchmark and a compatibility test locally. I used SnappyCodec to write and read a ~200MB SequenceFile. Before and after this change, the performance is nearly the same.

For the compatibility test, I wrote a SequenceFile using each of the two SnappyCodec implementations and read it back with the other. The file can be read without problems, and the file sizes are identical.
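
For reference, a minimal sketch of such a round-trip check (my reconstruction of the idea, not the author's actual test code; the path and record contents are made up):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class SnappyRoundTrip {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path file = new Path("/tmp/snappy-roundtrip.seq");   // hypothetical path
    SnappyCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, conf);

    // Write a block-compressed SequenceFile through SnappyCodec.
    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(file),
        SequenceFile.Writer.keyClass(IntWritable.class),
        SequenceFile.Writer.valueClass(Text.class),
        SequenceFile.Writer.compression(CompressionType.BLOCK, codec))) {
      for (int i = 0; i < 1_000_000; i++) {
        writer.append(new IntWritable(i), new Text("record-" + i));
      }
    }

    // Read it back. For the compatibility test, the write and the read
    // would run on builds with the old (native) and new (snappy-java)
    // codec respectively, in both directions.
    try (SequenceFile.Reader reader =
             new SequenceFile.Reader(conf, SequenceFile.Reader.file(file))) {
      IntWritable key = new IntWritable();
      Text value = new Text();
      while (reader.next(key, value)) {
        // verify key/value contents here
      }
    }
  }
}
```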

@viirya (Member, Author) commented Oct 1, 2020

Hmm, for CompressDecompressTester.java, it seems to me that this comes from the original code?

    else if (compressor.getClass().isAssignableFrom(ZlibCompressor.class)) {
      return ZlibFactory.isNativeZlibLoaded(new Configuration());
-    }              
-    else if (compressor.getClass().isAssignableFrom(SnappyCompressor.class)
-            && isNativeSnappyLoadable())
+    }
+    else if (compressor.getClass().isAssignableFrom(SnappyCompressor.class))

Anyway, I can fix it here if you think it is ok.

@sunchao (Member) commented Oct 2, 2020

The style issue was fixed in the last run. The CI failed because of unit tests and the ASF license check (I don't actually see the file jobTokenPassword). Neither seems related to this PR.

@viirya (Member, Author) commented Oct 2, 2020

Fixed another (and the last) style issue. Checked with mvn checkstyle:check locally.

@steveloughran (Contributor) left a comment

The Yetus failures are all unrelated. One minor tweak suggested to the change on test reporting: I don't like ever losing the stacks of nested exceptions, so if you are changing that code, just throw the AssertionError that fail() would normally throw, with the caught exception as the cause. Not your fault, I know, but since you are there...

```java
if (ex.getMessage() != null) {
  fail(joiner.join(name, ex.getMessage()));
} else {
  fail(joiner.join(name, ExceptionUtils.getStackTrace(ex)));
}
```

Review comment (Contributor):

The NPE risk is why new code should use toString(). Why don't we just throw new AssertionError(name + ex, ex)? That way the stack trace doesn't get lost, which is something we never want to happen.
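
A sketch of the suggested replacement, assuming the same name and ex variables as in the snippet above:

```java
// The suggestion: throw the AssertionError directly, with the caught
// exception attached as the cause, so the nested stack trace is never lost.
throw new AssertionError(name + ex, ex);
```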

@saintstack (Contributor) commented:

If making a new PR, the 'compile' is redundant given it's the Maven default?

The license failure is:

Lines that start with ????? in the ASF License  report indicate files that do not have an Apache license header:
 !????? /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2297/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/jobTokenPassword

No harm fixing it as part of this patch... add the 'jobTokenPassword' exclude from below to ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/pom.xml:

       <plugin>
         <groupId>org.apache.rat</groupId>
         <artifactId>apache-rat-plugin</artifactId>
         <configuration>
           <excludes>
             <exclude>src/test/java/org/apache/hadoop/cli/data60bytes</exclude>
             <exclude>src/test/resources/job_1329348432655_0001-10.jhist</exclude>
             <exclude>**/jobTokenPassword</exclude>
           </excludes>
         </configuration>
       </plugin>

Otherwise patch is looking good to me.

@saintstack (Contributor) commented:

The native compile complaints seem unrelated...

@viirya (Member, Author) commented Oct 5, 2020

Thanks @steveloughran and @saintstack. Updated the diff based on your suggestions.

@hadoop-yetus commented:

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 🆗 | reexec | 0m 29s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| | _ trunk Compile Tests _ | | | |
| +0 🆗 | mvndep | 5m 36s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 24m 2s | | trunk passed |
| +1 💚 | compile | 19m 49s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | compile | 17m 5s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +1 💚 | checkstyle | 2m 56s | | trunk passed |
| +1 💚 | mvnsite | 20m 56s | | trunk passed |
| +1 💚 | shadedclient | 14m 21s | | branch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 6m 29s | | trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javadoc | 7m 8s | | trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +0 🆗 | spotbugs | 0m 45s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +0 🆗 | findbugs | 0m 25s | | branch/hadoop-project no findbugs output file (findbugsXml.xml) |
| +0 🆗 | findbugs | 0m 23s | | branch/hadoop-project-dist no findbugs output file (findbugsXml.xml) |
| -0 ⚠️ | patch | 1m 7s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | _ Patch Compile Tests _ | | | |
| +0 🆗 | mvndep | 0m 35s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 21m 15s | | the patch passed |
| +1 💚 | compile | 19m 18s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| -1 ❌ | cc | 19m 18s | /diff-compile-cc-root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1.txt | root-jdkUbuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 generated 40 new + 123 unchanged - 40 fixed = 163 total (was 163) |
| +1 💚 | golang | 19m 18s | | the patch passed |
| +1 💚 | javac | 19m 18s | | the patch passed |
| +1 💚 | compile | 17m 9s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| -1 ❌ | cc | 17m 9s | /diff-compile-cc-root-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01.txt | root-jdkPrivateBuild-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 generated 36 new + 127 unchanged - 36 fixed = 163 total (was 163) |
| +1 💚 | golang | 17m 9s | | the patch passed |
| +1 💚 | javac | 17m 9s | | the patch passed |
| +1 💚 | checkstyle | 2m 50s | | root: The patch generated 0 new + 140 unchanged - 3 fixed = 140 total (was 143) |
| +1 💚 | mvnsite | 17m 36s | | the patch passed |
| +1 💚 | shellcheck | 0m 0s | | There were no new shellcheck issues. |
| +1 💚 | shelldocs | 0m 18s | | There were no new shelldocs issues. |
| +1 💚 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 💚 | xml | 0m 5s | | The patch has no ill-formed XML file. |
| +1 💚 | shadedclient | 14m 11s | | patch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 6m 25s | | the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 |
| +1 💚 | javadoc | 7m 9s | | the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| +0 🆗 | findbugs | 0m 23s | | hadoop-project has no data from findbugs |
| +0 🆗 | findbugs | 0m 24s | | hadoop-project-dist has no data from findbugs |
| | _ Other Tests _ | | | |
| -1 ❌ | unit | 587m 25s | /patch-unit-root.txt | root in the patch passed. |
| +1 💚 | asflicense | 1m 51s | | The patch does not generate ASF License warnings. |
| | | 891m 20s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell |
| | hadoop.crypto.key.kms.server.TestKMS |
| | hadoop.hdfs.TestFileChecksumCompositeCrc |
| | hadoop.hdfs.server.balancer.TestBalancer |
| | hadoop.hdfs.server.namenode.TestFileTruncate |
| | hadoop.hdfs.TestDFSShell |
| | hadoop.hdfs.TestFileChecksum |
| | hadoop.tools.TestDistCpSystem |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2297/22/artifact/out/Dockerfile |
| GITHUB PR | #2297 |
| Optional Tests | dupname asflicense shellcheck shelldocs compile javac javadoc mvninstall mvnsite unit shadedclient xml cc findbugs checkstyle golang |
| uname | Linux 928304952c17 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 6ece640 |
| Default Java | Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2297/22/testReport/ |
| Max. process+thread count | 4090 (vs. ulimit of 5500) |
| modules | C: hadoop-project hadoop-project-dist hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient . U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2297/22/console |
| versions | git=2.17.1 maven=3.6.0 shellcheck=0.4.6 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

@steveloughran (Contributor) commented:

> No harm fixing it as part of this patch... add the 'jobTokenPassword' exclude from below to ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/pom.xml

I think it's actually some test-runner bug; really that file should be cleaned up. But we can pull in the patch to silence the warning.

@steveloughran (Contributor) commented:

OK, I'm happy too.

+1, merging to trunk and branch-3.3

@steveloughran merged commit c9ea344 into apache:trunk on Oct 6, 2020
asfgit pushed a commit that referenced this pull request Oct 6, 2020
This switches the SnappyCodec to use the snappy-java library, rather than the native one.

To use the codec, snappy-java.jar (from org.xerial.snappy) needs to be on the classpath.

This comes in as an avro dependency, so it is already on the hadoop-common classpath,
as well as in hadoop-common/lib.
The version used is now managed in the hadoop-project POM; initially 1.1.7.7

Contributed by DB Tsai and Liang-Chi Hsieh

Change-Id: Id52a404a0005480e68917cd17f0a27b7744aea4e
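
As a usage note (my sketch of standard Hadoop codec lookup, not part of this patch; the file path is made up), nothing changes for callers: the codec is resolved the same way, it just no longer needs libsnappy.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecLookup {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);

    // Resolve the codec from the file suffix; ".snappy" maps to SnappyCodec.
    CompressionCodec codec = factory.getCodec(new Path("/data/part-00000.snappy"));
    if (codec != null) {
      System.out.println(codec.getClass().getName());
      // With this change the codec compresses via snappy-java (org.xerial.snappy)
      // rather than libsnappy over JNI, so no native library install is needed.
    }
  }
}
```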
@saintstack (Contributor) commented:

Thanks for pushing this through @steveloughran. +1 on master and branch-3.3.

@dbtsai (Member) commented Oct 6, 2020

Thanks all for helping push this through! This will greatly simplify how people deploy the snappy native lib.

@steveloughran (Contributor) commented:

The JIRA on apache.org is offline; once it's back we need to remember to update the ticket, including adding something to the release notes.

@viirya (Member, Author) commented Oct 6, 2020

OK, got it. I will update the release notes once it is back. It also seems I cannot update the Hadoop JIRA myself.

@viirya (Member, Author) commented Oct 6, 2020

Looks like the JIRA is back now? https://issues.apache.org/jira/browse/HADOOP-17125

@steveloughran (Contributor) commented:

JIRA closed, added a release note.

@sunchao (Member) commented Oct 7, 2020

Thanks @steveloughran. Could you assign the JIRA to @viirya?

@steveloughran (Contributor) commented:

@viirya ...what's your JIRA username?

@viirya (Member, Author) commented Oct 9, 2020

@steveloughran Username is viirya too. Thanks.

@steveloughran (Contributor) commented:

@viirya assigned the JIRA to you. You are also free to assign any other Hadoop JIRAs to yourself...

@viirya (Member, Author) commented Oct 12, 2020

@steveloughran Thank you! I tried to assign this ticket to myself, but it seems I cannot do it.

@steveloughran (Contributor) commented:

You needed to be listed in the project settings as someone with the right permissions. It's done now.
