[SPARK-28723][SQL] Upgrade to Hive 2.3.6 for HiveMetastore Client and Hadoop-3.2 profile #25443

Closed
wants to merge 24 commits into from

@wangyum
Member

commented Aug 14, 2019

What changes were proposed in this pull request?

This PR upgrades the built-in Hive to 2.3.6 for the hadoop-3.2 profile.

Hive 2.3.6 release notes:

  • HIVE-22096: Backport HIVE-21584 (Java 11 preparation: system class loader is not URLClassLoader)
  • HIVE-21859: Backport HIVE-17466 (Metastore API to list unique partition-key-value combinations)
  • HIVE-21786: Update repo URLs in poms branch 2.3 version

Why are the changes needed?

Make Spark support JDK 11.

Does this PR introduce any user-facing change?

Yes. Please see SPARK-28684 and SPARK-24417 for more details.

How was this patch tested?

Existing unit tests and manual tests.

@wangyum wangyum changed the title [WIP][test-hadoop3.2] Test on JDK 11 with Hive 2.3.6 on jenkins [WIP][test-hadoop3.2] Test JDK 11 with Hive 2.3.6 on jenkins Aug 14, 2019

@dongjoon-hyun

Member

commented Aug 14, 2019

cc @dbtsai

@wangyum wangyum changed the title [WIP][test-hadoop3.2] Test JDK 11 with Hive 2.3.6 on jenkins [WIP][test-hadoop3.2][test-maven] Test JDK 11 with Hive 2.3.6 on jenkins Aug 14, 2019

@wangyum

Member Author

commented Aug 14, 2019

retest this please

@dongjoon-hyun dongjoon-hyun added the SQL label Aug 14, 2019

@dongjoon-hyun dongjoon-hyun changed the title [WIP][test-hadoop3.2][test-maven] Test JDK 11 with Hive 2.3.6 on jenkins [WIP][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins Aug 14, 2019

@SparkQA

commented Aug 14, 2019

Test build #109073 has finished for PR 25443 at commit 540d6da.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.
@dongjoon-hyun

Member

commented Aug 14, 2019

Since the test is parallel, could you add the following, too?

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
-    case "2.3" | "2.3.0" | "2.3.1" | "2.3.2" | "2.3.3" | "2.3.4" | "2.3.5" => hive.v2_3
+    case "2.3" | "2.3.0" | "2.3.1" | "2.3.2" | "2.3.3" | "2.3.4" | "2.3.5" | "2.3.6" => hive.v2_3
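
To illustrate why the extra literal matters, here is a minimal Java sketch of a lookup over an explicit list of version strings. It is not Spark's Scala code: the class `HiveVersionLookup`, the method `family`, and the choice to reject unlisted strings with an exception are all assumptions made for illustration. The point is only that, with an explicit list, "2.3.6" is not recognized until it is added.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HiveVersionLookup {
    // Hypothetical stand-in for the version strings matched in the diff above.
    private static final Set<String> V2_3 = new HashSet<>(Arrays.asList(
        "2.3", "2.3.0", "2.3.1", "2.3.2", "2.3.3", "2.3.4", "2.3.5", "2.3.6"));

    // Maps a user-supplied version string to a client family name.
    // Rejecting unlisted strings is an assumption of this sketch.
    static String family(String version) {
        if (V2_3.contains(version)) {
            return "v2_3";
        }
        throw new IllegalArgumentException("Unsupported Hive version: " + version);
    }

    public static void main(String[] args) {
        System.out.println(family("2.3.6"));
    }
}
```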

@dongjoon-hyun dongjoon-hyun changed the title [WIP][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins Aug 14, 2019

@wangyum

Member Author

commented Aug 14, 2019

Will do it later.

@dongjoon-hyun

Member

commented Aug 14, 2019

@wangyum, I believe we should do #25443 (comment) in this PR to be complete.

cc @gatorsmile

@SparkQA

commented Aug 14, 2019

Test build #109074 has finished for PR 25443 at commit 540d6da.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@wangyum

Member Author

commented Aug 14, 2019

Failed with these errors:

ExternalSorterSuite:
- empty data stream with kryo ser
- empty data stream with java ser
- few elements per partition with kryo ser
- few elements per partition with java ser
- empty partitions with spilling with kryo ser
- empty partitions with spilling with java ser
- spilling in local cluster with kryo ser *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;
java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;
	at org.apache.spark.util.io.ChunkedByteBufferOutputStream.toChunkedByteBuffer(ChunkedByteBufferOutputStream.scala:115)
	at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:307)
	at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:137)
	at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:91)
	at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
	at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:74)
	at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1470)
	at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1182)
	at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1086)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5(DAGScheduler.scala:1089)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$submitStage$5$adapted(DAGScheduler.scala:1088)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1088)
	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1030)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2129)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2121)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2110)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
@dongjoon-hyun

Member

commented Aug 14, 2019

If we both build and test with JDK11, it will pass. The current Jenkins seems to build with JDK8 and run on JDK11, and so hits this known issue.

$ build/sbt "core/testOnly *.ExternalSorterSuite"
[info] ExternalSorterSuite:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/dhyun/PRS/SPARK-HIVE-2.3.6/common/unsafe/target/scala-2.12/spark-unsafe_2.12-3.0.0-SNAPSHOT.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[info] - empty data stream with kryo ser (1 second, 457 milliseconds)
[info] - empty data stream with java ser (89 milliseconds)
[info] - few elements per partition with kryo ser (89 milliseconds)
[info] - few elements per partition with java ser (74 milliseconds)
[info] - empty partitions with spilling with kryo ser (329 milliseconds)
[info] - empty partitions with spilling with java ser (156 milliseconds)
[info] - spilling in local cluster with kryo ser (4 seconds, 296 milliseconds)
[info] - spilling in local cluster with java ser (4 seconds, 372 milliseconds)
[info] - spilling in local cluster with many reduce tasks with kryo ser (5 seconds, 718 milliseconds)
[info] - spilling in local cluster with many reduce tasks with java ser (6 seconds, 51 milliseconds)
[info] - cleanup of intermediate files in sorter (113 milliseconds)
[info] - cleanup of intermediate files in sorter with failures (111 milliseconds)
[info] - cleanup of intermediate files in shuffle (297 milliseconds)
[info] - cleanup of intermediate files in shuffle with failures (121 milliseconds)
[info] - no sorting or partial aggregation with kryo ser (58 milliseconds)
[info] - no sorting or partial aggregation with java ser (53 milliseconds)
[info] - no sorting or partial aggregation with spilling with kryo ser (62 milliseconds)
[info] - no sorting or partial aggregation with spilling with java ser (68 milliseconds)
[info] - sorting, no partial aggregation with kryo ser (63 milliseconds)
[info] - sorting, no partial aggregation with java ser (54 milliseconds)
[info] - sorting, no partial aggregation with spilling with kryo ser (58 milliseconds)
[info] - sorting, no partial aggregation with spilling with java ser (61 milliseconds)
[info] - partial aggregation, no sorting with kryo ser (52 milliseconds)
[info] - partial aggregation, no sorting with java ser (51 milliseconds)
[info] - partial aggregation, no sorting with spilling with kryo ser (55 milliseconds)
[info] - partial aggregation, no sorting with spilling with java ser (49 milliseconds)
[info] - partial aggregation and sorting with kryo ser (44 milliseconds)
[info] - partial aggregation and sorting with java ser (44 milliseconds)
[info] - partial aggregation and sorting with spilling with kryo ser (48 milliseconds)
[info] - partial aggregation and sorting with spilling with java ser (49 milliseconds)
[info] - sort without breaking sorting contracts with kryo ser (1 second, 904 milliseconds)
[info] - sort without breaking sorting contracts with java ser (1 second, 860 milliseconds)
[info] - sort without breaking timsort contracts for large arrays !!! IGNORED !!!
[info] - spilling with hash collisions (208 milliseconds)
[info] - spilling with many hash collisions (589 milliseconds)
[info] - spilling with hash collisions using the Int.MaxValue key (168 milliseconds)
[info] - spilling with null keys and values (226 milliseconds)
[info] - sorting updates peak execution memory (1 second, 347 milliseconds)
[info] - force to spill for external sorter (800 milliseconds)
[info] ScalaTest
[info] Run completed in 34 seconds, 99 milliseconds.
[info] Total number of tests run: 38
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 38, failed 0, canceled 0, ignored 1, pending 0
[info] All tests passed.

cc @srowen
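
For context on the `NoSuchMethodError` above: since Java 9, `java.nio.ByteBuffer` declares its own `flip()` returning `ByteBuffer`, whereas Java 8 only has `Buffer.flip()` returning `Buffer`. Bytecode compiled against the newer class library can therefore reference the descriptor `flip()Ljava/nio/ByteBuffer;`, which does not exist on Java 8. A minimal Java sketch of a commonly used workaround, calling `flip()` through the `Buffer` supertype, is shown below. The class `FlipCompat` and its helper are hypothetical names for illustration and are not code from this PR.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;

public class FlipCompat {
    // Casting to Buffer makes the compiler emit a call to Buffer.flip(),
    // which is present on Java 8 as well as on later releases.
    static ByteBuffer flip(ByteBuffer buf) {
        ((Buffer) buf).flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put((byte) 1).put((byte) 2);
        flip(buf);
        // After flipping, position is reset to 0 and limit is the number of bytes written.
        System.out.println(buf.position() + " " + buf.limit()); // prints "0 2"
    }
}
```

Building and running with the same JDK, as suggested in the comment above, avoids the mismatch without any source change.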

@dongjoon-hyun

Member

commented Aug 14, 2019

HyukjinKwon added 2 commits Aug 14, 2019
@SparkQA

commented Aug 14, 2019

Test build #109079 has finished for PR 25443 at commit 2ebdcb9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@HyukjinKwon

Member

commented Aug 14, 2019

Hm, it doesn't seem to be working. Let me check.

@dongjoon-hyun

Member

commented Aug 14, 2019

6821aa5 is still running, isn't it?

@HyukjinKwon

Member

commented Aug 14, 2019

Yeah, but my approach in e6508c0 and 2ebdcb9 doesn't seem to be working.

@dongjoon-hyun

Member

commented Aug 14, 2019

Oh, got it.

@SparkQA

commented Aug 19, 2019

Test build #109299 has finished for PR 25443 at commit 9defec2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@dongjoon-hyun

Member

commented Aug 19, 2019

The last two failures are due to #25373, which breaks all Maven profiles (Hadoop-2.7/Hadoop-3.2). I reverted it from the master branch.

@dongjoon-hyun

Member

commented Aug 19, 2019

Retest this please.

@SparkQA

commented Aug 19, 2019

Test build #109303 has finished for PR 25443 at commit 9defec2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@HyukjinKwon

Member

commented Aug 19, 2019

retest this please

@SparkQA

commented Aug 19, 2019

Test build #109305 has finished for PR 25443 at commit 9defec2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.
@HyukjinKwon

Member

commented Aug 19, 2019

retest this please

@SparkQA

commented Aug 19, 2019

Test build #109324 has finished for PR 25443 at commit 9defec2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@@ -183,6 +183,8 @@ def main():
        os.environ["AMPLAB_JENKINS_BUILD_PROFILE"] = "hadoop2.7"
    if "test-hadoop3.2" in ghprb_pull_title:
        os.environ["AMPLAB_JENKINS_BUILD_PROFILE"] = "hadoop3.2"
        # os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"

@srowen

srowen Aug 19, 2019

Member

Aside from removing this, is this ready to merge? @HyukjinKwon @dongjoon-hyun

pom.xml Outdated
<repository>
<id>staged</id>
<name>staged-releases</name>
<url>https://repository.apache.org/content/repositories/staging/</url>

@HyukjinKwon

HyukjinKwon Aug 19, 2019

Member

@srowen, yeah, almost ready per https://github.com/apache/spark/pull/25443/files#r315437547. Plus, the Hive 2.3.6 vote looks like it is about to open. After the release, we can switch this snapshot to the official Hive 2.3.6, and I believe we're good to go.

@SparkQA

commented Aug 23, 2019

Test build #109609 has finished for PR 25443 at commit ff4783c.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@wangyum

Member Author

commented Aug 23, 2019

We need to retest this once Hive 2.3.6 is pushed to the maven repository.

@dongjoon-hyun

Member

commented Aug 23, 2019

Yep. Maven publishing takes some time after binary artifact release.

@dongjoon-hyun

Member

commented Aug 23, 2019

@dongjoon-hyun

Member

commented Aug 23, 2019

Retest this please.

@dongjoon-hyun dongjoon-hyun changed the title [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins [SPARK-28723][SQL] Upgrade to Hive 2.3.6 for HiveMetastore Client and Hadoop-3.2 profile Aug 23, 2019

@dongjoon-hyun

Member

commented Aug 23, 2019

Retest this please.

@SparkQA

commented Aug 24, 2019

Test build #109662 has finished for PR 25443 at commit ff4783c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@@ -17,6 +17,11 @@

package org.apache.spark.sql.hive.thriftserver

@dongjoon-hyun

dongjoon-hyun Aug 24, 2019

Member

During JDK11 testing and review, we skipped the renaming in order to focus on JDK11-related changes and keep the PR diff small. We may need to rename this source directory from v2.3.5 to v2.3.6 later for consistency. If the tests pass, I'd like to merge this PR as-is first.

cc @gatorsmile , @srowen

@SparkQA

commented Aug 24, 2019

Test build #109658 has finished for PR 25443 at commit ff4783c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@dongjoon-hyun

Member

commented Aug 24, 2019

+1, LGTM. Merged to master.
Thank you so much, @wangyum, @srowen, @HyukjinKwon, @shaneknapp!

@wangyum wangyum deleted the wangyum:test-on-jenkins branch Aug 24, 2019

@HyukjinKwon

Member

commented Aug 24, 2019

+1!

@HyukjinKwon
Member

left a comment

Late LGTM too!

@dongjoon-hyun

Member

commented Aug 24, 2019

FYI, after this, we have one successful Jenkins result on JDK11.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [  3.603 s]
[INFO] Spark Project Tags ................................. SUCCESS [  8.820 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 23.616 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  6.317 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 58.109 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 12.534 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  9.939 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  9.372 s]
[INFO] Spark Project Core ................................. SUCCESS [23:13 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [  8.764 s]
[INFO] Spark Project GraphX ............................... SUCCESS [01:17 min]
[INFO] Spark Project Streaming ............................ SUCCESS [05:38 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [10:23 min]
[INFO] Spark Project SQL .................................. SUCCESS [  01:44 h]
[INFO] Spark Project ML Library ........................... SUCCESS [33:00 min]
[INFO] Spark Project Tools ................................ SUCCESS [  1.508 s]
[INFO] Spark Project Hive ................................. SUCCESS [  01:09 h]
[INFO] Spark Project Graph API ............................ SUCCESS [  3.619 s]
[INFO] Spark Project Cypher ............................... SUCCESS [  3.860 s]
[INFO] Spark Project Graph ................................ SUCCESS [  2.397 s]
[INFO] Spark Project REPL ................................. SUCCESS [01:26 min]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [  3.692 s]
[INFO] Spark Project YARN ................................. SUCCESS [07:36 min]
[INFO] Spark Project Mesos ................................ SUCCESS [ 37.176 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [09:03 min]
[INFO] Spark Project Assembly ............................. SUCCESS [  3.331 s]
[INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [  7.260 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:16 min]
[INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [07:36 min]
[INFO] Spark Kinesis Integration .......................... SUCCESS [ 26.717 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 27.544 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [  2.694 s]
[INFO] Spark Avro ......................................... SUCCESS [01:54 min]
[INFO] Spark Project Kinesis Assembly ..................... SUCCESS [  2.481 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  04:40 h
[INFO] Finished at: 2019-08-24T03:35:36-07:00
[INFO] ------------------------------------------------------------------------

cc @gatorsmile , @dbtsai
