[SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362 #39671

Closed
wants to merge 2 commits into from


wangyum
Member

@wangyum wangyum commented Jan 20, 2023

What changes were proposed in this pull request?

This PR aims to deprecate old Java 8 versions prior to 8u362.

Why are the changes needed?

8u362 fixed a performance issue: openjdk/jdk8u-dev#161

Benchmark code:

// Run inside spark-shell, where `spark` is the predefined SparkSession.
val dir = "/tmp/spark/benchmark"
val N = 2000000
// 100 derived columns, id0 through id99.
val columns = Range(0, 100).map(i => s"id % $i AS id$i")

// Write the test data set as Parquet.
spark.range(N).selectExpr(columns: _*).write.mode("Overwrite").parquet(dir)

// Time a query that runs count(distinct ...) over the first 60 columns.
val cnt = 60
val selectExps = columns.take(cnt).map(_.split(" ").last).map(c => s"count(distinct $c)")
val start = System.currentTimeMillis()
spark.read.parquet(dir).selectExpr(selectExps: _*).collect()
println(s"Benchmark result. Java version: ${System.getProperty("java.version")}. Time(ms): ${System.currentTimeMillis() - start}")

8u352 benchmark result:

export JAVA_HOME=/Users/yumwang/Downloads/zulu8.66.0.15-ca-jdk8.0.352-macosx_x64
export PATH=${JAVA_HOME}/bin:${PATH}

bin/spark-shell  --master "local[2]" -i benchmark.scala
Benchmark result. Java version: 1.8.0_352. Time(ms): 641155

8u362 benchmark result:

export JAVA_HOME=/Users/yumwang/Downloads/zulu8.68.0.19-ca-jdk8.0.362-macosx_x64
export PATH=${JAVA_HOME}/bin:${PATH}

bin/spark-shell  --master "local[2]" -i benchmark.scala
Benchmark result. Java version: 1.8.0_362. Time(ms): 79360
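
For reference, the two timings above work out to roughly an 8x speedup. The following is a minimal sketch of that arithmetic, using only the two Time(ms) values reported above; the variable names are illustrative and not part of the PR.

```shell
# Timings copied from the two benchmark runs above (milliseconds).
old_ms=641155   # Java 1.8.0_352
new_ms=79360    # Java 1.8.0_362

# awk does the floating-point division that plain shell arithmetic lacks.
speedup=$(awk -v o="$old_ms" -v n="$new_ms" 'BEGIN { printf "%.2f", o / n }')
reduction=$(awk -v o="$old_ms" -v n="$new_ms" 'BEGIN { printf "%.1f", (o - n) * 100 / o }')

echo "Speedup: ${speedup}x, run time reduced by ${reduction}%"
```

Both numbers come from a single run each on one machine with `local[2]`, so they are best read as an indication of the size of the effect rather than a precise measurement.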

Does this PR introduce any user-facing change?

No.

How was this patch tested?

N/A.

@github-actions github-actions bot added the DOCS label Jan 20, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment


Shall we simply deprecate, @wangyum ?

- Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0.
+ Java 8 prior to version 8u362 support is deprecated as of Spark 3.4.0.
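
As an illustration of what the proposed wording covers, here is a minimal sketch of a shell check that classifies a `java.version` string. The function name `is_deprecated_java8` is hypothetical; this PR only changes documentation, and nothing here is claimed to be how Spark itself checks the version. It assumes the legacy Java 8 version scheme seen in the benchmark output above (`1.8.0_<update>`).

```shell
# Hypothetical helper (not part of Spark): succeeds (exit status 0) when the
# given java.version string denotes Java 8 with an update number below 362.
is_deprecated_java8() {
  version=$1
  case "$version" in
    1.8.0) return 0 ;;          # Java 8 GA, no update number at all
    1.8.0_*) ;;                 # legacy Java 8 scheme, e.g. 1.8.0_352
    *) return 1 ;;              # not a Java 8 version string
  esac
  update=${version#1.8.0_}      # drop the "1.8.0_" prefix
  update=${update%%[!0-9]*}     # keep leading digits, e.g. "352-b08" -> "352"
  [ -n "$update" ] && [ "$update" -lt 362 ]
}

# Sample version strings; the first two are taken from the benchmark output.
for v in 1.8.0_352 1.8.0_362 17.0.6; do
  if is_deprecated_java8 "$v"; then
    echo "$v: deprecated as of Spark 3.4.0"
  else
    echo "$v: not affected by this deprecation"
  fi
done
```

In practice the string would come from the JVM that Spark runs on, for example the `java.version` system property printed by the benchmark script.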

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 20, 2023

Oh, is Zulu the only distribution that has released that version, @wangyum ?

I cannot find a Docker image or an Adoptium (Temurin) Java release yet.

@wangyum
Member Author

wangyum commented Jan 20, 2023

> Oh, is Zulu the only distribution that has released that version, @wangyum ?
>
> I cannot find a Docker image or an Adoptium (Temurin) Java release yet.

It is in progress:
image

@wangyum wangyum changed the title from "[SPARK-40303][DOCS] Recommends users to use JDK 8u362 and later versions" to "[SPARK-40303][DOCS] Deprecate old Java 8 versions prior to 8u362" Jan 20, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM for Apache Spark 3.4.0.

cc @srowen , @HyukjinKwon , @xinrong-meng , @LuciferYang

Contributor

@LuciferYang LuciferYang left a comment


+1, LGTM, thanks @wangyum

@LuciferYang
Contributor

One problem is that GA (GitHub Actions) is still using Temurin 8u352 for build and test. We need to wait a while before GA tasks can run on 8u362.

@LuciferYang
Contributor

Could you use 8u362 to run full UTs offline to check compatibility? Thanks ~ @wangyum

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 20, 2023

To @LuciferYang , I don't think this is a compatibility issue or any failure.

@LuciferYang
Contributor

@dongjoon-hyun Hmm... do you remember SPARK-40846? When we upgraded from 8u345 to 8u352 for GA testing, there were some time zone issues that had to be solved by changing the code, so I am not sure whether this is the right time to recommend a Java version in the documentation without GA verification. But maybe I'm being too conservative.

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 20, 2023

Time zone issues are inevitable, and we need to adjust the code for them on a regular basis, @LuciferYang .

@dongjoon-hyun
Member

BTW, we haven't cut the branch yet, and we still have one month before the Apache Spark 3.4.0 release. I'm taking that time period into account for this decision, @LuciferYang . You are also correct: being conservative would be better if we didn't have that much room.

@LuciferYang
Contributor

OK, that's plenty of time. I am fine with making this change.

@dongjoon-hyun
Member

Now, both Zulu and Adoptium (Temurin) are available. Thank you all. Merged to master.
Screenshot 2023-01-20 at 9 02 42 AM

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 20, 2023

FYI, GitHub Actions is currently on the old version, but jdk8u362 will be applied automatically in one or two weeks. There is no need for us to do anything. I'll keep monitoring the version change.

Screenshot 2023-01-20 at 9 28 23 AM

@wangyum wangyum deleted the SPARK-40303 branch January 30, 2023 04:51