Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-6509: [Java][CI] Upgrade maven-surefire-plugin to version 3.0.0-M3, disable Gandiva JNI unit tests temporarily #5370

Closed
wants to merge 11 commits into from

Conversation

wesm
Copy link
Member

@wesm wesm commented Sep 12, 2019

To see if the CI issues go away.

@wesm
Copy link
Member Author

wesm commented Sep 12, 2019

Here's what the surefire docs says:

Corrupted STDOUT by directly writing to native stream in forked JVM
If your tests use native library which prints to STDOUT this warning message appears because the library corrupted the channel used by the plugin in order to transmit events with test status back to Maven process. It would be even worse if you override the Java stream by System.setOut because the stream is also supposed to be corrupted but the Maven will never see the tests finished and build may hang. 

This warning message appears if you use FileDescriptor.out or JVM prints GC summary. 

In that case the warning is printed "Corrupted STDOUT by directly writing to native stream in forked JVM", and a dump file can be found in Reports directory. 

If debug level is enabled then messages of corrupted stream appear in the console.

(not actually sure if this warning is related...)

Are any of the Gandiva unit tests writing to stdout from native code? @pprudhvi @praveenbingo

@wesm
Copy link
Member Author

wesm commented Sep 12, 2019

This SO thread suggests that surefire 3.0.0-M3 may fix the issue

https://stackoverflow.com/questions/49578847/maven-error-occurred-in-starting-fork-check-output-in-log

@wesm wesm changed the title ARROW-6509: [Java][CI] Upgrade maven-surefire-plugin to 2.22.1 ARROW-6509: [Java][CI] Upgrade maven-surefire-plugin to version 3.0.0-M3 Sep 12, 2019
@wesm
Copy link
Member Author

wesm commented Sep 12, 2019

Turning off the JVM forking exposed the errors in the logs

[ERROR] Tests run: 3, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.652 s <<< FAILURE! - in org.apache.arrow.vector.TestOversizedAllocationForValueVector
[ERROR] testBitVectorReallocation  Time elapsed: 0.647 s  <<< ERROR!
java.lang.Exception: Unexpected exception, expected<org.apache.arrow.vector.util.OversizedAllocationException> but was<java.lang.IllegalArgumentException>
	at org.apache.arrow.vector.TestOversizedAllocationForValueVector.testBitVectorReallocation(TestOversizedAllocationForValueVector.java:93)
[ERROR] testFixedVectorReallocation  Time elapsed: 0.004 s  <<< ERROR!
java.lang.Exception: Unexpected exception, expected<org.apache.arrow.vector.util.OversizedAllocationException> but was<java.lang.AssertionError>
	at org.apache.arrow.vector.TestOversizedAllocationForValueVector.testFixedVectorReallocation(TestOversizedAllocationForValueVector.java:58)
[ERROR] testVariableVectorReallocation  Time elapsed: 0.001 s  <<< ERROR!
java.lang.Exception: Unexpected exception, expected<org.apache.arrow.vector.util.OversizedAllocationException> but was<org.apache.arrow.memory.OutOfMemoryException>
	at org.apache.arrow.vector.TestOversizedAllocationForValueVector.testVariableVectorReallocation(TestOversizedAllocationForValueVector.java:112)
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 2147483647 byte(s) of direct memory (used: 16777216, max: 1743257600)
	at org.apache.arrow.vector.TestOversizedAllocationForValueVector.testVariableVectorReallocation(TestOversizedAllocationForValueVector.java:112)
[INFO] Running org.apache.arrow.vector.types.pojo.TestSchema
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.252 s - in org.apache.arrow.vector.types.pojo.TestSchema
[INFO] Running org.apache.arrow.vector.types.pojo.TestExtensionType
22:18:34.086 [main] INFO  o.a.arrow.vector.types.pojo.Field - Unrecognized extension type: uuid
22:18:34.111 [main] INFO  o.a.arrow.vector.types.pojo.Field - Unrecognized extension type: uuid
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.063 s - in org.apache.arrow.vector.types.pojo.TestExtensionType
[INFO] Running org.apache.arrow.vector.TestUnionVector
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.099 s - in org.apache.arrow.vector.TestUnionVector
[INFO] Running org.apache.arrow.vector.TestDurationVector
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.021 s - in org.apache.arrow.vector.TestDurationVector
[INFO] Running org.apache.arrow.vector.TestIntervalYearVector
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 s - in org.apache.arrow.vector.TestIntervalYearVector
[INFO] Running org.apache.arrow.vector.TestValueVector
[ERROR] Tests run: 51, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 0.584 s <<< FAILURE! - in org.apache.arrow.vector.TestValueVector
[ERROR] testFixedType2  Time elapsed: 0.001 s  <<< ERROR!
org.apache.arrow.vector.util.OversizedAllocationException: Memory required for vector capacity 306783377 is (2147483648), which is more than max allowed (2147483647)
	at org.apache.arrow.vector.TestValueVector.testFixedType2(TestValueVector.java:195)
[ERROR] testFixedType3  Time elapsed: 0.001 s  <<< ERROR!
org.apache.arrow.vector.util.OversizedAllocationException: Memory required for vector capacity 306783377 is (2147483648), which is more than max allowed (2147483647)
	at org.apache.arrow.vector.TestValueVector.testFixedType3(TestValueVector.java:300)
[ERROR] testFixedType4  Time elapsed: 0.001 s  <<< ERROR!
org.apache.arrow.vector.util.OversizedAllocationException: Memory required for vector capacity 153391688 is (2147483648), which is more than max allowed (2147483647)
	at org.apache.arrow.vector.TestValueVector.testFixedType4(TestValueVector.java:400)

@wesm
Copy link
Member Author

wesm commented Sep 12, 2019

These memory allocations look weird

cc also @tianchen92 @liyafan82

@emkornfield
Copy link
Contributor

@wesm I think the failure you were seeing might be because:
<argLine>-Darrow.vector.max_allocation_bytes=1048576</argLine> doesn't get applied when the force is removed. When I add <arrow.vector.max_allocation_bytes>1048576</arrow.vector.max_allocation_bytes> to the system property variables the tests start passing

@emkornfield
Copy link
Contributor

@ursabot build

@emkornfield
Copy link
Contributor

@wesm I pushed some commits that I think will work around the brittleness in java tests when forking=0, did the upgrade of the plugin by itself also fail? I can't reproduce locally on my mac. I might try resurrecting a linux VM to see if I can reproduce the error there.

java/pom.xml Outdated
@@ -39,7 +39,7 @@
<dep.fbs.version>1.9.0</dep.fbs.version>
<dep.flatc.version>1.9.0</dep.flatc.version>
<arrow.vector.classifier></arrow.vector.classifier>
<forkCount>2</forkCount>
<forkCount>0</forkCount>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this problem.

IMO, reducing the value of forkCount seems to be the correct direction of solving the problem.
However, it also reduces the parallelism of running tests.
The original value (2) is already small enough. I am not sure if we should further reduce it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bug still hasn't been fixed -- reducing the forkCount to 0 simply is allowing us to see the segfault / core dump that's happening...

@wesm
Copy link
Member Author

wesm commented Sep 13, 2019

OK, now we are getting somewhere

[INFO] ------------------------------------------------------------------------
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f5641e16e26, pid=28036, tid=0x00007f5684efe700
#
# JRE version: OpenJDK Runtime Environment (8.0_191-b12) (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libgandiva_jni.sodbddc505-1c64-4e65-89f1-5ad5bd499246+0xa5ce26]
#
# Core dump written. Default location: /home/travis/build/apache/arrow/java/core or core.28036
#
# An error report file with more information is saved as:
# /home/travis/build/apache/arrow/java/hs_err_pid28036.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
/home/travis/build/apache/arrow/ci/travis_script_gandiva_java.sh: line 33: 28036 Aborted                 (core dumped) $TRAVIS_MVN test -P arrow-jni -pl gandiva -Darrow.cpp.build.dir=$CPP_BUILD_DIR/debug

@wesm
Copy link
Member Author

wesm commented Sep 13, 2019

I restored the forkCount=2 and turned off the Gandiva JNI tests so we can get passing CI builds again

@codecov-io
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (master@c2f7264). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #5370   +/-   ##
=========================================
  Coverage          ?   88.61%           
=========================================
  Files             ?      948           
  Lines             ?   125732           
  Branches          ?     1437           
=========================================
  Hits              ?   111419           
  Misses            ?    13951           
  Partials          ?      362

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c2f7264...dcd6980. Read the comment docs.

@wesm wesm changed the title ARROW-6509: [Java][CI] Upgrade maven-surefire-plugin to version 3.0.0-M3 ARROW-6509: [Java][CI] Upgrade maven-surefire-plugin to version 3.0.0-M3, disable Gandiva JNI unit tests temporarily Sep 13, 2019
@wesm wesm closed this in 5b783fe Sep 13, 2019
@bkietz
Copy link
Member

bkietz commented Sep 13, 2019

If you filed a bug report at http://bugreport.java.com/bugreport/crash.jsp or anywhere else, could you drop a link to that here and/or on jira?

@wesm
Copy link
Member Author

wesm commented Sep 13, 2019

I don't think it's a bug in the JVM, it seems likely it's caused by one of the last few Gandiva-related patches

https://github.com/apache/arrow/commits/master/cpp/src/gandiva

pprudhvi pushed a commit to pprudhvi/arrow that referenced this pull request Sep 16, 2019
…-M3, disable Gandiva JNI unit tests temporarily

To see if the CI issues go away.

Closes apache#5370 from wesm/ARROW-6509 and squashes the following commits:

dcd6980 <Wes McKinney> Disable Gandiva Java tests
92b1bee <emkornfield> use canonical file
1e3ae38 <emkornfield> use canonical file in read normalized.
9c63f96 <emkornfield> set user timezone.
d805d2a <emkornfield> fix indentation.
997cea3 <emkornfield> Update TestIntegration.java
3a83fd5 <emkornfield> add path fallback for json
09a7bf9 <emkornfield> Add vector.max_allocation_bytes to config
7431585 <Wes McKinney> Do not fork JVM
fb2b647 <Wes McKinney> try 3.0.0-M3
4cbeaad <Wes McKinney> Try upgrading surefire to 2.22.1

Lead-authored-by: Wes McKinney <wesm+git@apache.org>
Co-authored-by: emkornfield <emkornfield@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants