[SPARK-33288][YARN][FOLLOW-UP][test-hadoop2.7] Fix type mismatch error #30375

wangyum · 2020-11-14T11:59:18Z

What changes were proposed in this pull request?

This PR fixes a type mismatch error:

[error] /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7-hive-2.3/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala:320:52: type mismatch;
[error]  found   : Long
[error]  required: Int
[error]         Resource.newInstance(resourcesWithDefaults.totalMemMiB, resourcesWithDefaults.cores)
[error]                                                    ^
[error] one error found

Why are the changes needed?

To fix a compilation failure in the Hadoop 2.7 build.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

wangyum · 2020-11-14T11:59:54Z

resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala

@@ -317,7 +317,7 @@ private[yarn] class YarnAllocator(
        customSparkResources
      }
      val resource =
-        Resource.newInstance(resourcesWithDefaults.totalMemMiB, resourcesWithDefaults.cores)
+        Resource.newInstance(resourcesWithDefaults.totalMemMiB.toInt, resourcesWithDefaults.cores)


Hadoop 2.7 only supports the Int type here:
https://github.com/apache/hadoop/blob/release-2.7.4-RC0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Resource.java#L56
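For readers without a Hadoop 2.7 checkout at hand, the mismatch can be reproduced in isolation. This is a minimal sketch, not the Spark code: `ResourceStub` and `Hadoop27Api` are hypothetical stand-ins for the `Resource.newInstance(int, int)` signature linked above, and the values are made up.

```scala
// Hypothetical stand-in for the record returned by Hadoop's Resource.newInstance.
final case class ResourceStub(memoryMiB: Int, vCores: Int)

// Hypothetical stand-in for the Hadoop 2.7 API, which takes Int for both arguments.
object Hadoop27Api {
  def newInstance(memory: Int, vCores: Int): ResourceStub =
    ResourceStub(memory, vCores)
}

object NarrowingExample {
  // Assumed shape of the caller's values: memory held as Long, cores as Int.
  val totalMemMiB: Long = 4096L
  val cores: Int = 2

  // Passing the Long directly is rejected by the compiler
  // ("type mismatch; found: Long, required: Int"):
  //   Hadoop27Api.newInstance(totalMemMiB, cores)

  // An explicit narrowing conversion, as in this patch, compiles.
  val resource: ResourceStub = Hadoop27Api.newInstance(totalMemMiB.toInt, cores)
}
```

Scala widens `Int` to `Long` implicitly but never narrows `Long` to `Int`, which is why the call compiles against an API that accepts `long` and fails against one that only accepts `int`.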

SparkQA · 2020-11-14T12:19:47Z

Test build #131090 has finished for PR 30375 at commit 6002504.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2020-11-14T12:22:13Z

cc @tgravescs

SparkQA · 2020-11-14T12:42:23Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35693/

SparkQA · 2020-11-14T13:08:58Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35693/

wangyum · 2020-11-14T14:31:14Z

retest this please.

SparkQA · 2020-11-14T14:55:35Z

Test build #131093 has finished for PR 30375 at commit 6002504.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-11-14T15:18:07Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35696/

mridulm · 2020-11-14T15:32:38Z

Given the value is in MiB, perhaps we should keep it as an Int rather than calling .toInt on a Long. Thoughts, @tgravescs?
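To illustrate the concern with narrowing a Long, here is a small sketch with made-up values (the object and field names are hypothetical). `Long.toInt` silently keeps the low 32 bits, whereas `Math.toIntExact` from the Java standard library throws on overflow. A value of 2^31 MiB is 2 PiB, so the practical risk for a container size is small, but holding the value as an Int end to end avoids the question entirely.

```scala
import scala.util.Try

object ToIntPitfall {
  val fits: Long = 4096L
  val tooBig: Long = Int.MaxValue.toLong + 1L // 2^31, one past the Int range

  // .toInt does not check the range: it keeps the low 32 bits and wraps around.
  val wrapped: Int = tooBig.toInt

  // Math.toIntExact throws ArithmeticException when the value does not fit.
  val checked: Try[Int] = Try(Math.toIntExact(tooBig))

  // In-range values convert identically either way.
  val ok: Int = Math.toIntExact(fits)
}
```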

SparkQA · 2020-11-14T15:47:29Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35696/

dongjoon-hyun

Thank you, @wangyum .

Yes, the Hadoop 2.7 profile easily falls out of date these days. Given that, hopefully Apache Spark 3.1 will offer enough benefits to help users migrate to Hadoop 3.2+, so that we can eventually remove the Hadoop 2.7 maintenance burden.

BTW, @wangyum and @mridulm and @tgravescs . Do you think it's possible for us to start discussion for dropping Hadoop 2.7 at Apache Spark 3.2?

mridulm · 2020-11-14T18:12:20Z

> BTW, @wangyum and @mridulm and @tgravescs . Do you think it's possible for us to start discussion for dropping Hadoop 2.7 at Apache Spark 3.2?

Is the proposal to drop 2.7 and move to 2.10, or to drop 2.x entirely and move to Hadoop 3.x?

Hadoop 2.7.4, which we use, was released 3 years ago, while 2.10 was released 1 year ago.
Given that Hadoop 3.x is a major version change, IMO we need to support the 2.x branch for longer, until a larger set of users has migrated.

Assuming we keep supporting 2.10, will the issues we are facing be addressed?

dongjoon-hyun · 2020-11-14T18:56:54Z

My initial question was about dropping the hadoop-2.x profile completely, like we did for hive-1.2, @mridulm. I was only asking whether it is possible.

If you think so, I will focus on supporting Hadoop 2.x LTS officially in Apache Spark 3 too.

dongjoon-hyun · 2020-11-14T19:08:31Z

I made a PR to protect the Hadoop 2 profile. Could you review it, @mridulm?

[SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2 #30378

wangyum · 2020-11-15T15:04:13Z

+1 for supporting Hadoop 2.x LTS officially in Apache Spark 3.

wangyum · 2020-11-16T03:29:26Z

Merged to master.

dongjoon-hyun · 2020-11-16T03:29:40Z

Thank you, @wangyum .

### What changes were proposed in this pull request?

This PR aims to protect `Hadoop 2.x` profile compilation in Apache Spark 3.1+.

### Why are the changes needed?

Since Apache Spark 3.1+ switch our default profile to Hadoop 3, we had better prevent at least compilation error with `Hadoop 2.x` profile at the PR review phase. Although this is an additional workload, it will finish quickly because it's compilation only.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

- This should be merged after #30375 .

Closes #30378 from dongjoon-hyun/SPARK-33454.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

tgravescs · 2020-11-16T14:14:21Z

Thanks for fixing. I don't think we should drop Hadoop 2.x; I think too many people are still using it.

mridulm · 2020-11-16T18:51:34Z

@tgravescs, any thoughts on this comment?
This PR probably got merged before you got to it.

tgravescs · 2020-11-16T19:19:31Z

Sorry, I missed your comment, @mridulm. Yeah, we can just use Int since these are all MiB. I'll file a follow-up and put up a PR.

Fix type mismatch error

6002504

github-actions bot added the YARN label Nov 14, 2020

dongjoon-hyun reviewed Nov 14, 2020

dongjoon-hyun mentioned this pull request Nov 14, 2020

[SPARK-33454][INFRA] Add GitHub Action job for Hadoop 2 #30378

Closed

srowen approved these changes Nov 15, 2020

HyukjinKwon approved these changes Nov 16, 2020

wangyum closed this in f660946 Nov 16, 2020

wangyum deleted the SPARK-33288 branch November 16, 2020 03:29
