[SPARK-34762][BUILD] Fix the build failure with Scala 2.13 which is related to commons-* with better solution #31880

Closed

sarutak
Member

@sarutak sarutak commented Mar 18, 2021

What changes were proposed in this pull request?

This PR fixes the build failure that occurs with Scala 2.13 and sbt, using a more robust solution.
The issue was resolved in #31862 for the master branch, but it still occurs in branch-3.1.
On branch-3.1, the cause is that mvn help in change-scala-version.sh downloads the POM file of commons-io but not the JAR file.
On the master branch, the affected library is commons-cli rather than commons-io.

Which library is affected seems to depend on various factors, but one of them is the Maven help plugin.
So, I modified change-scala-version.sh to fetch scala.version with sed rather than mvn.

Why are the changes needed?

To fix the issue in a more robust way.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I confirmed that the build succeeds with the following procedure.

find ~/.m2 -name commons-cli -exec rm -rf {} \;
find ~/.ivy2 -name commons-cli -exec rm -rf {} \;
find ~/.cache/ -name commons-cli -exec rm -rf {} \; # For Linux
find ~/Library/Caches -name commons-cli -exec rm -rf {} \; # For macOS

dev/change-scala-version.sh 2.13
./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 clean compile test:compile

pom.xml Outdated
@@ -162,7 +162,9 @@
<commons.math3.version>3.4.1</commons.math3.version>
<!-- managed up from 3.2.1 for SPARK-11652 -->
<commons.collections.version>3.2.2</commons.collections.version>
<scala.version>2.12.10</scala.version>
<scala-2.12.version>2.12.10</scala-2.12.version>
<scala-2.13.version>2.13.5</scala-2.13.version>
Member Author

For reviewers:
I confirmed that this solution works for branch-3.1 as well, but branch-3.1 uses 2.13.4 rather than 2.13.5, so I'll open a backport PR after this PR is merged.

Member

Hm, why do we need both versions?

Member Author

I think it's better to have both versions to avoid issues like SPARK-34774.

Member

I don't quite see why it is necessary for this change?

Member Author

@sarutak sarutak Mar 18, 2021

It's to preserve the version in pom.xml even after <scala.version> is overwritten by change-scala-version.sh.
But now that #31865 introduces <scala.version> in <profile>, we can take a different approach (3c66069) that doesn't need both <scala-2.12.version> and <scala-2.13.version>.

Member

I don't see the advantage or need - surely the build has one scala version? otherwise you have to flip even which property is referenced in the whole build?

Member Author

@sarutak sarutak Mar 18, 2021

When I first opened this PR, <scala.version>2.12.10</scala.version> was absent from the profile <id>scala-2.12</id>, so changing the version with change-scala-version.sh 2.13 would leave no trace of 2.12.10: <scala.version>2.12.10</scala.version> is overwritten with <scala.version>2.13.5</scala.version>.
Then, change-scala-version.sh 2.12 will fail to modify pom.xml.

If we have both versions in properties (<scala-2.12.version> and <scala-2.13.version>) that change-scala-version.sh does not overwrite, it's easy to choose which scala.version to switch to.

But, as I mentioned above, now we don't need such properties.
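
A minimal sketch of the overwrite problem described above, for readers who want to reproduce it outside the Spark tree. The one-property POM fragment and the single sed call are simplified stand-ins (assumptions for illustration), not the actual contents of pom.xml or change-scala-version.sh.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical, stripped-down POM fragment holding only one Scala version.
pom=$(mktemp)
cat > "$pom" <<'EOF'
<properties>
  <scala.version>2.12.10</scala.version>
</properties>
EOF

# Simplified stand-in for switching to 2.13: overwrite the property in place.
sed -i.bak 's;<scala\.version>.*</scala\.version>;<scala.version>2.13.5</scala.version>;' "$pom"

# After the overwrite, 2.12.10 is not recorded anywhere in the file, so a
# later switch back to 2.12 has nothing to read the full version from.
if grep -q '2\.12\.10' "$pom"; then
  lost_212=no
else
  lost_212=yes
fi
echo "2.12.10 lost after switching: $lost_212"

rm -f "$pom" "$pom.bak"
```

Keeping each full version in a place the script never rewrites, whether in dedicated properties or inside each profile, avoids this loss.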

@sarutak
Member Author

sarutak commented Mar 18, 2021

cc: @HyukjinKwon

@sarutak
Member Author

sarutak commented Mar 18, 2021

After I confirm that GA and Jenkins pass, I'll merge this.

@github-actions github-actions bot added the BUILD label Mar 18, 2021
@SparkQA

SparkQA commented Mar 18, 2021

Test build #136209 has finished for PR 31880 at commit 3c66069.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Member Author

sarutak commented Mar 18, 2021

retest this please.

@SparkQA

SparkQA commented Mar 18, 2021

Test build #136218 has finished for PR 31880 at commit 9660460.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# This is a workaround for SPARK-34762.
ESCAPED_TO_VERSION=$(echo $TO_VERSION | sed -n "s/\./\\\\./gp")
SCALA_VERSION=$(sed -n "/<id>scala-$ESCAPED_TO_VERSION<\/id>/,/<\/profile>/ \
s;^.*<scala\.version>\(.*\)</scala\.version>.*$;\1;p" pom.xml)
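
The extraction in the diff above can be tried in isolation. The following is a sketch under stated assumptions: the function name scala_version_for_profile and the sample POM are hypothetical, and the sample only mimics the profile layout that the sed range expects.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the <scala.version> declared inside the profile
# whose id is scala-<binary version>, reading the given POM file.
scala_version_for_profile() {
  local to_version="$1" pom="$2" escaped
  # Escape dots so that "2.13" is matched literally in the sed address.
  escaped=$(printf '%s\n' "$to_version" | sed 's/\./\\./g')
  # Restrict the substitution to the lines between the profile id and the
  # closing </profile>, and print only the captured version.
  sed -n "/<id>scala-${escaped}<\/id>/,/<\/profile>/ \
s;^.*<scala\.version>\(.*\)</scala\.version>.*\$;\1;p" "$pom"
}

# Sample POM (an assumption for illustration, not Spark's real pom.xml).
pom=$(mktemp)
cat > "$pom" <<'EOF'
<project>
  <profiles>
    <profile>
      <id>scala-2.12</id>
      <properties>
        <scala.version>2.12.10</scala.version>
      </properties>
    </profile>
    <profile>
      <id>scala-2.13</id>
      <properties>
        <scala.version>2.13.5</scala.version>
      </properties>
    </profile>
  </profiles>
</project>
EOF

v212=$(scala_version_for_profile 2.12 "$pom")
v213=$(scala_version_for_profile 2.13 "$pom")
echo "$v212"
echo "$v213"

rm -f "$pom"
```

Because the lookup only reads the file, it needs no Maven plugin and therefore leaves nothing behind in ~/.m2.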
Member

I think this is fine now(?) we're avoiding maven help that caused side effect. It makes the script more robust anyway ...

Member Author

You mean we can go with this change?

Member

I still don't see why this part is necessary. Can we fix this without breaking other parts of the build script that depend on scala.version?

Member Author

sbt (or Coursier?) seems to fail to resolve a dependency if its POM file is in ~/.m2 but its JAR file is not. I don't know the reason.

For the master branch, before #31862, change-scala-version.sh ran mvn help:evaluate, which downloads commons-cli-1.2.pom but not commons-cli-1.2.jar.

$ ls -R ~/.m2/repository/commons-cli/commons-cli/
/home/kou/.m2/repository/commons-cli/commons-cli/:
1.2

/home/kou/.m2/repository/commons-cli/commons-cli/1.2:
_remote.repositories  commons-cli-1.2.pom  commons-cli-1.2.pom.sha1

You can also confirm with mvn -X help:evaluate.

So, I resolved it by fetching commons-cli-1.2.jar using mvn dependency:get in #31862.

For branch-3.1, mvn help:evaluate also downloads commons-cli-1.2.pom without the JAR, but that part is resolved by #31862.
However, mvn dependency:get downloads commons-io-2.6.pom without downloading commons-io-2.6.jar.

$ ls -R ~/.m2/repository/commons-io/commons-io/
/home/kou/.m2/repository/commons-io/commons-io/:
2.4  2.5  2.6

/home/kou/.m2/repository/commons-io/commons-io/2.4:
_remote.repositories  commons-io-2.4.pom  commons-io-2.4.pom.sha1

/home/kou/.m2/repository/commons-io/commons-io/2.5:
_remote.repositories  commons-io-2.5.jar  commons-io-2.5.jar.sha1  commons-io-2.5.pom  commons-io-2.5.pom.sha1

/home/kou/.m2/repository/commons-io/commons-io/2.6:
_remote.repositories  commons-io-2.6.pom  commons-io-2.6.pom.sha1
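
A listing like the one above can be generalized into a quick scan. This is only a sketch: the helper name list_pom_only_dirs is hypothetical, and the temporary directory below merely imitates the commons-io layout shown above so the example runs without touching a real ~/.m2.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: list version directories under a Maven-style
# repository root that contain a .pom file but no .jar file.
list_pom_only_dirs() {
  local repo="$1" pom dir
  find "$repo" -type f -name '*.pom' | sort | while IFS= read -r pom; do
    dir=$(dirname "$pom")
    # Report the directory when no JAR sits next to the POM.
    if ! ls "$dir"/*.jar >/dev/null 2>&1; then
      printf '%s\n' "$dir"
    fi
  done
}

# Fake repository imitating the listing: 2.4 and 2.6 have only a POM,
# 2.5 has both a POM and a JAR.
repo=$(mktemp -d)
for v in 2.4 2.5 2.6; do
  mkdir -p "$repo/commons-io/commons-io/$v"
  : > "$repo/commons-io/commons-io/$v/commons-io-$v.pom"
done
: > "$repo/commons-io/commons-io/2.5/commons-io-2.5.jar"

pom_only=$(list_pom_only_dirs "$repo")
printf '%s\n' "$pom_only"

rm -rf "$repo"
```

Pointing the helper at ~/.m2/repository would list candidates only, since parent POMs legitimately have no JAR next to them.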

I understand branch-3.1 depends on commons-io-2.5 but, in fact, if we manually delete commons-io/commons-io/2.6 before running sbt, the build finishes successfully.

It's also true for master that commons-io-2.6.pom is present but commons-io-2.6.jar is absent.
But there is one difference between master and branch-3.1:
master depends on commons-io-2.8, which is newer than commons-io-2.6, while branch-3.1 depends on commons-io-2.5, which is older than commons-io-2.6.
So I guess this is why the build fails for branch-3.1 while it succeeds for master.

Anyway, if we don't use maven plugins in change-scala-version.sh, this problem can be resolved easily.
Or, do you have a better solution?

Member

Can we solve this by updating commons-io in older branches? that would be fine too IMHO.

Member Author

I considered that too, and it would resolve this issue for the time being.
But I'm afraid that this build failure will happen again in the future.

In this case, only commons-cli and commons-io matter but, actually, the help and dependency plugins download more than just those.
I confirmed that help downloads POM files but not JAR files for 300+ dependencies.

If we use a newer Maven or upgraded plugins, and Spark and those plugins have a common dependency for which the plugins use a newer version, this problem can happen again.

If my worry is unnecessary, or you think we should just fix this problem when it happens again, I'll close this PR.

Member

To be clear can we fix this by using newer Maven versions or newer plugins, or newer versions of the dependencies? I think that's fine, even if it means it pulls in a lot of stuff.

Member Author

@sarutak sarutak Mar 20, 2021

I don't think we can fix this by using newer Maven or newer plugins.
Spark needs to use newer version of dependencies than what plugins use.

This problem can happen if all the following conditions are true.

  • Spark and a plugin have a common direct/indirect dependency.
  • The plugin uses a newer or the same version of the dependency.
  • The plugin downloads the POM but not the JAR for the dependency.
  • The build runs with sbt (or possibly whenever Coursier is used) while the POM is present but the JAR is absent.

One example is dependency-plugin and commons-io. Both Spark and dependency-plugin depend on commons-io (dependency-plugin seems to depend on it indirectly).
And branch-3.1 depends on commons-io:2.4, while dependency-plugin depends on the newer commons-io:2.6.
When mvn dependency:get runs, the POM for commons-io:2.6 is downloaded but the JAR is not.
Under this condition, if we build with sbt, sbt or Coursier doesn't download the dependent JAR, leading to this issue.

Newer Maven and newer plugins can depend on a newer version of the common dependency than what Spark depends on. So I don't think we can fix this issue using newer Maven or newer plugins.

@dongjoon-hyun
Member

Hi, All.

To isolate the release branch (branch-3.1) from Scala 2.13, I removed Scala-2.13 Build GitHub Action job from branch-3.1 completely.

From now on, we can focus on the master branch for Apache Spark 3.2.0 in terms of Scala 2.13.

@sarutak
Member Author

sarutak commented Mar 24, 2021

Now that GA for master with Scala 2.13 is healthy and d1de69f removed the Scala 2.13 build from GA for branch-3.1, I'll close this PR.
If this issue happens again, let's revisit this.

@sarutak sarutak closed this Mar 24, 2021