
[SPARK-6363][BUILD] Make Scala 2.11 the default Scala version #10608

Closed
wants to merge 10 commits

Conversation

JoshRosen
Contributor

This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).

The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).

After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.

@JoshRosen
Contributor Author

Note: I'm happy to defer merging this pull request for a while. I just happened to have the changes ready locally and figured that it would be nice to test them on Jenkins.

@SparkQA

SparkQA commented Jan 6, 2016

Test build #48796 has finished for PR 10608 at commit 623a929.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen JoshRosen closed this Jan 6, 2016
@JoshRosen JoshRosen reopened this Jan 27, 2016
@SparkQA

SparkQA commented Jan 27, 2016

Test build #50218 has finished for PR 10608 at commit 18c5223.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

JoshRosen referenced this pull request Jan 27, 2016
The high level idea is that instead of having the executors send both accumulator updates and TaskMetrics, we should have them send only accumulator updates. This eliminates the need to maintain both code paths since one can be implemented in terms of the other. This effort is split into two parts:

**SPARK-12895: Implement TaskMetrics using accumulators.** TaskMetrics is basically just a bunch of accumulable fields. This patch makes TaskMetrics a syntactic wrapper around a collection of accumulators so we don't need to send TaskMetrics from the executors to the driver.

**SPARK-12896: Send only accumulator updates to the driver.** Now that TaskMetrics are expressed in terms of accumulators, we can capture all TaskMetrics values if we just send accumulator updates from the executors to the driver. This completes the parent issue SPARK-10620.

While an effort has been made to preserve as much of the public API as possible, there were a few known breaking DeveloperApi changes that would be very awkward to maintain. I will gather the full list shortly and post it here.

Note: This was once part of #10717. This patch is split out into its own patch from there to make it easier for others to review. Other smaller pieces have already been merged into master.

Author: Andrew Or <andrew@databricks.com>

Closes #10835 from andrewor14/task-metrics-use-accums.
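
As a rough illustration of the "TaskMetrics as a wrapper around accumulators" idea described in the referenced patch above, here is a toy Scala sketch. It is hypothetical (the class and metric names are invented, and it is not Spark's actual TaskMetrics code); it only shows why, once every metric is backed by a named accumulator, shipping name/value accumulator updates from the executors is enough for the driver to reconstruct the metrics.

```scala
// Toy sketch only -- hypothetical names, not Spark's actual TaskMetrics code.
final class LongAccum(val name: String) {
  private var _value = 0L
  def add(v: Long): Unit = _value += v
  def value: Long = _value
}

final class ToyTaskMetrics {
  val bytesRead   = new LongAccum("internal.metrics.input.bytesRead")
  val recordsRead = new LongAccum("internal.metrics.input.recordsRead")

  // The metrics object is just a thin view over its accumulators, so only
  // name -> value pairs need to travel from the executors to the driver.
  def accumulatorUpdates: Map[String, Long] =
    Seq(bytesRead, recordsRead).map(a => a.name -> a.value).toMap
}
```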
@JoshRosen
Contributor Author

/cc @srowen @pwendell @yhuai for review

@srowen
Member

srowen commented Jan 27, 2016

That looks like what I'd expect -- roughly what you get from running the change-version script and updating the list of dependencies?

@@ -127,13 +127,4 @@
</plugin>
</plugins>
</build>
<profiles>
<!-- Quasiquotes are merged into scala reflect from scala 2.11 onwards. -->
<profile>
Member


Is this profile still necessary when 2.10 is enabled?

Contributor Author


Nah, I think it's no longer needed. This was from the days when Spark SQL depended on the Quasiquote compiler for its code generation, but I think this profile became redundant once we started using Janino and removed the old codegen.
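
(For context, quasiquotes are the q"..." tree-construction interpolators. Under Scala 2.10 they lived in a separate org.scalamacros artifact, which is presumably what this profile pulled in; from 2.11 onwards they ship inside scala-reflect, so an illustrative snippet like the following, which is not from Spark, compiles with no extra dependency or Maven profile.)

```scala
import scala.reflect.runtime.universe._

object QuasiquoteDemo extends App {
  // In Scala 2.11+ the q"..." interpolator comes with scala-reflect itself,
  // so no separate quasiquotes dependency (and no build profile) is required.
  val tree: Tree = q"def add(a: Int, b: Int): Int = a + b"
  println(showCode(tree))
}
```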

@JoshRosen
Contributor Author

Jenkins, retest this please.

@JoshRosen
Contributor Author

@srowen, yep, this is pretty much just the result of running the version-change and dependency-update scripts, followed by a grep through the documentation and build files to invert all of the 2.10/2.11 directions and build scripts.

@SparkQA

SparkQA commented Jan 27, 2016

Test build #50236 has finished for PR 10608 at commit 18c5223.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jan 28, 2016

Want to fix the mima issue?

@SparkQA

SparkQA commented Jan 29, 2016

Test build #50321 has finished for PR 10608 at commit 8dd36ab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 29, 2016

Test build #50343 has finished for PR 10608 at commit 8dd36ab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

These look like legitimate test failures. It's a little tricky to reason about which ones are caused by this patch versus longstanding 2.11 compatibility issues, because it turns out that we don't actually run the entire test suite against Scala 2.11 today; we only test compilation.

A lot of the failures seem to occur in HiveThriftBinaryServerSuite; it seems that we time out while waiting for the server to start. I wonder whether there's a transitive dependency change in that subproject which might have caused this.

@JoshRosen
Contributor Author

Looks like the problem might be related to #9816: I think that the REPL log level is being set to WARN, preventing the suite from finding the log messages which indicate that the ThriftServer started up.

@JoshRosen
Contributor Author

I think I found the problem: looks like Utils.isInterp() was returning true even in non-interpreter code because of a small difference in how the 2.11 REPL works.
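
A heavily simplified sketch of the kind of check involved is below. It is hypothetical (the class name and the lookup strategy are assumptions, not the real Utils code); the point is only that "am I running inside the REPL?" tends to be answered by probing for the REPL's entry point, and small differences in how the 2.10 and 2.11 REPLs expose that entry point can make such a check return the wrong answer.

```scala
// Hypothetical illustration only -- not the real Utils.isInterp implementation.
object ReplDetect {
  def isInInterpreter: Boolean =
    try {
      // Naive check: assume the REPL entry point lives at this class name and
      // treat "the class is loadable" as "we are running inside the REPL".
      // That conflation is exactly the kind of thing that can break when the
      // REPL's internals change between Scala 2.10 and 2.11.
      Class.forName("org.apache.spark.repl.Main")
      true
    } catch {
      case _: ClassNotFoundException => false
    }
}
```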

@SparkQA

SparkQA commented Jan 29, 2016

Test build #50367 has finished for PR 10608 at commit 2c901e5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 29, 2016

Test build #50401 has finished for PR 10608 at commit d24c31d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Okay, I think this should now be ready for final review and sign-off.

@rxin
Contributor

rxin commented Jan 30, 2016

It's a bit hard to know whether the repl changes make sense or not, but I think we just need to try it out and see if problems come up.

LGTM.

@rxin
Contributor

rxin commented Jan 30, 2016

Merging this in master. Hopefully compilation will be faster.

@asfgit asfgit closed this in 289373b Jan 30, 2016
@JoshRosen JoshRosen deleted the SPARK-6363 branch January 30, 2016 08:25
@@ -165,7 +165,7 @@
<!-- managed up from 3.2.1 for SPARK-11652 -->
<commons.collections.version>3.2.2</commons.collections.version>
<scala.version>2.10.5</scala.version>
Contributor


@JoshRosen, why is this version still 2.10.5? Does it need to be modified?
