
[HUDI-1869] Upgrading Spark3 To 3.1 #3580

Closed

pengzhiwei2018 wants to merge 1 commit into apache:master from pengzhiwei2018:dev_spark3.1


Conversation


@pengzhiwei2018 commented Sep 1, 2021

What is the purpose of the pull request

  • Upgrade the Spark 3 version from 3.0.x to 3.1.x.

  • Support both Spark 3.0.x and Spark 3.1.x.

  • To build a package for Spark 3.0.x, use:

    mvn clean install -DskipTests -Pspark3 -PisSpark3.0

    For other Spark 3 versions:

    mvn clean install -DskipTests -Pspark3

Brief change log

  • Upgrade the spark3.version property in the pom from 3.0.x to 3.1.x.
  • Add an isSpark3.0 Maven profile so Spark 3.0.x can still be built and tested.

Verify this pull request

This pull request is already covered by existing tests: the test suite is built and run under both the spark3 profile and the new isSpark3.0 profile.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@hudi-bot (Collaborator) commented Sep 1, 2021

CI report:

  • cb5cfb39017db2f09ce5c5fc8d0af5e2a35b92eb UNKNOWN
  • b418e2b Azure: SUCCESS
Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run travis: re-run the last Travis build
  • @hudi-bot run azure: re-run the last Azure build

@vinothchandar (Member)

Spark 3.0.x will not be supported, as Spark 3.1 has made many changes to the class interfaces.

It is just crazy how badly it breaks across minor versions.

@pengzhiwei2018 (Author)

Spark 3.0.x will not be supported, as Spark 3.1 has made many changes to the class interfaces.

It is just crazy how badly it breaks across minor versions.

If we want to support both Spark 3.0 and Spark 3.1, maybe we need another build config to specify the Spark version.

@pengzhiwei2018 (Author)

Hi @vinothchandar, I have added support for both Spark 3.0 and Spark 3.1. Please take another look~

  <flink.version>1.13.1</flink.version>
  <spark2.version>2.4.4</spark2.version>
- <spark3.version>3.0.0</spark3.version>
+ <spark3.version>3.1.0</spark3.version>
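For context, a spark3.version property like the one changed above is typically consumed by the Spark dependency declarations elsewhere in the pom, so a single bump re-versions every Spark artifact at once. A minimal, hypothetical sketch of such a consumer (the artifact and scope here are illustrative, not copied from Hudi's actual pom):

```xml
<!-- Hypothetical dependency that picks up the spark3.version property above -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>${spark3.version}</version>
  <!-- provided: the Spark runtime supplies these jars on the cluster -->
  <scope>provided</scope>
</dependency>
```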
Member

Spark 3.1.0 is broken; the minimal 3.1.x version is 3.1.1. See https://spark.apache.org/news/next-official-release-spark-3.1.1.html

Member

BTW, any reason not to use the latest patch version?

Member

There is a 3.1.2 now as well?

Member

I would suggest always using the latest patch version; in this case, Spark 2.4.8, 3.0.3, and 3.1.2.

Author

Makes sense to me.

@vinothchandar self-assigned this on Sep 7, 2021
@vinothchandar (Member)

Will review this

Comment on lines +1490 to +1498
<profile>
<id>isSpark3.0</id>
<!-- for Spark 3.0.x we need to override the following properties to package and run tests -->
<properties>
<spark3.version>3.0.0</spark3.version>
<spark.version>${spark3.version}</spark.version>
<scalatest.version>3.0.1</scalatest.version>
</properties>
</profile>
@xushiyan (Member) commented Sep 12, 2021

Instead of a boolean flag, it would be better to be explicit about the Spark version and have a Maven profile matched to it: name the profile spark3.0.x (activated with -Pspark3.0.x) just for 3.0.x, and keep -Pspark3 pointing to the latest supported Spark 3 release.
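The suggestion above could look roughly like this in the root pom. This is only a sketch: the profile ids and the pinned version values are illustrative, not taken from the final patch.

```xml
<!-- Sketch of explicit per-version profiles, per the suggestion above -->
<profile>
  <id>spark3.0.x</id>
  <!-- activate with -Pspark3.0.x to stay on the Spark 3.0 line -->
  <properties>
    <spark3.version>3.0.3</spark3.version>
    <spark.version>${spark3.version}</spark.version>
  </properties>
</profile>
<profile>
  <id>spark3</id>
  <!-- -Pspark3 keeps pointing at the latest supported Spark 3 release -->
  <properties>
    <spark3.version>3.1.2</spark3.version>
    <spark.version>${spark3.version}</spark.version>
  </properties>
</profile>
```

With this layout, `mvn clean install -DskipTests -Pspark3.0.x` would pin the 3.0 line, while `-Pspark3` alone would track the newest supported Spark 3.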

@vinothchandar added the priority:blocker (Production down; release blocker) label on Oct 20, 2021
@novakov-alexey-zz

@pengzhiwei2018 do you want to upgrade us to Spark 3.2 with Scala 2.13?

@xushiyan (Member)

This change continues in #3844

@xushiyan xushiyan closed this Oct 24, 2021
@novakov-alexey-zz

@xushiyan Does it make sense to upgrade Hudi straight to Spark 3.2 and extend #3844 to cover 3.2?

Labels

priority:blocker Production down; release blocker

6 participants