Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34762][BUILD] Fix the build failure with Scala 2.13 which is related to commons-* with better solution #31880

Closed
wants to merge 5 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 5 additions & 6 deletions dev/change-scala-version.sh
Original file line number Diff line number Diff line change
Expand Up @@ -60,14 +60,13 @@ BASEDIR=$(dirname $0)/..
find "$BASEDIR" -name 'pom.xml' -not -path '*target*' -print \
-exec bash -c "sed_i 's/\(artifactId.*\)_'$FROM_VERSION'/\1_'$TO_VERSION'/g' {}" \;

# dependency:get is workaround for SPARK-34762 to download the JAR file of commons-cli.
# Without this, build with Scala 2.13 using SBT will fail because the help plugin used below downloads only the POM file.
COMMONS_CLI_VERSION=`build/mvn help:evaluate -Dexpression=commons-cli.version -q -DforceStdout`
build/mvn dependency:get -Dartifact=commons-cli:commons-cli:${COMMONS_CLI_VERSION} -q

# Update <scala.version> in parent POM
# First find the right full version from the profile's build
SCALA_VERSION=`build/mvn help:evaluate -Pscala-${TO_VERSION} -Dexpression=scala.version -q -DforceStdout`
# NOTE: We used mvn help:evaluate to fetch the value of scala.version before but sed is used now.
# This is a workaround for SPARK-34762.
ESCAPED_TO_VERSION=$(echo $TO_VERSION | sed -n "s/\./\\\\./gp")
SCALA_VERSION=$(sed -n "/<id>scala-$ESCAPED_TO_VERSION<\/id>/,/<\/profile>/ \
s;^.*<scala\.version>\(.*\)</scala\.version>.*$;\1;p" pom.xml)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine now(?) we're avoiding maven help that caused side effect. It makes the script more robust anyway ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean we can go with this change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't see why this part is necessary. Can we fix this without breaking other parts of the build script that depend on scala.version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sbt (or courier?) seems to fail to resolve dependency if the pom file for a dependency is in ~/.m2 but jar file is not. I don't know the reason.

For master branch, before #31862, change-scala-version.sh run mvn help:evaluate and it downloads commons-cli-1.2.pom but doesn't commons-cli-1.2.jar.

$ ls -R ~/.m2/repository/commons-cli/commons-cli/
/home/kou/.m2/repository/commons-cli/commons-cli/:
1.2

/home/kou/.m2/repository/commons-cli/commons-cli/1.2:
_remote.repositories  commons-cli-1.2.pom  commons-cli-1.2.pom.sha1

You can also confirm with mvn -X help:evaluate.

So, I resolved by getting commons-cli-1.2.jarusing mvn dependency:get in #31862.

For branch-3.1, mvn help:evaluate also downloads commons-cli-1.2.jar but it's resolved this part by #31862.
But mvn dependency:get downloads commons-io-2.6.pom though it doesn't download commons-io-2.6.jar.

ls -R ~/.m2/repository/commons-io/commons-io/
/home/kou/.m2/repository/commons-io/commons-io/:
2.4  2.5  2.6

/home/kou/.m2/repository/commons-io/commons-io/2.4:
_remote.repositories  commons-io-2.4.pom  commons-io-2.4.pom.sha1

/home/kou/.m2/repository/commons-io/commons-io/2.5:
_remote.repositories  commons-io-2.5.jar  commons-io-2.5.jar.sha1  commons-io-2.5.pom  commons-io-2.5.pom.sha1

/home/kou/.m2/repository/commons-io/commons-io/2.6:
_remote.repositories  commons-io-2.6.pom  commons-io-2.6.pom.sha1

I understand branch-3.1 depends on commons-io-2.5 but, in fact, if we manually delete commons-io/comons-io-2.6 before sbt, build successfully finishes.

It's also true for master that commons-io-2.6.pom is present but commons-io-2.6.jar is absent.
But there is one difference between master and branch-3.1.
master depends on commons-io-2.8 which is newer version than commons-io-2.6 while branch-3.1 depends on commons-io-2.5 which is older than commons-io-2.6.
So I guess this affects build failure for branch-3.1 while it succeeds for master.

Anyway, if we don't use maven plugins in change-scala-version.sh, this problem can be resolved easily.
Or, do you have a better solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we solve this by updating commons-io in older branches? that would be fine too IMHO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought what you think too and it can resolve this issue for the time being.
But I'm afraid that this build failure happens again in the future.

In this case, only commons-cli and comons-io matters but, actually, help and dependency downloads not only them.
I confirmed that help downloads pom files but not jar files for 300+ dependencies.

If we use newer maven or upgraded plugins and Spark and those plugins have a comondependency but plugins use newer version, this problem can happen again.

My worry might be unnecessary or you think we just just fix this problem when it happens again, I'll close this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear can we fix this by using newer Maven versions or newer plugins, or newer versions of the dependencies? I think that's fine, even if it means it pulls in a lot of stuff.

Copy link
Member Author

@sarutak sarutak Mar 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can fix this by using newer Maven or newer plugins.
Spark needs to use newer version of dependencies than what plugins use.

This problem can happen if all the following condition is true.

  • Spark and a plugin have a common direct/indirect dependcy.
  • A plugin uses newer or the same version of the dependency.
  • The plugin downloads pom but not jar for the dependency.
  • Build with sbt (or may be the case courier is used) under the condition that the pom is present but the jar is absent..

One example is dependency-plugin and commons-io. Both Spark and dependency-plugin depends on commons-io (dependency-plugin seems to depend on it indirectly).
And branch-3.1 depends on commons-io:2.4, while dependency-plugin depends on newer commons-io:2.6.
When mvn dependency:get runs, pom is downloaded but doesn't jar for commons-io:2.6.
Under this condition, if we build with sbt, sbt or courier doesn't download the dependent jar, leading this issue.

Newer Maven and newer plugins can depends on newer version of the common dependency than what Spark depends on. So I don't think we can't fix this issue using newer Maven or newer plugins.

sed_i '1,/<scala\.version>[0-9]*\.[0-9]*\.[0-9]*</s/<scala\.version>[0-9]*\.[0-9]*\.[0-9]*</<scala.version>'$SCALA_VERSION'</' \
"$BASEDIR/pom.xml"

Expand Down