
[SPARK-4397][Core] Reorganize 'implicit's to improve the API convenience #3262

Closed · wants to merge 11 commits into apache:master from zsxwing:SPARK-4397

Conversation

@zsxwing (Member) commented Nov 14, 2014

This PR moves `implicit`s to the package object and companion object so that the Scala compiler can find them automatically without explicit imports.

It should not break any API. A test project for backward compatibility is here. It shows that code compiled against Spark 1.1.0 can run with this PR.

To summarize, the changes are:

  • Deprecated the old implicit conversion functions: this preserves binary compatibility for code compiled against earlier versions of Spark.
  • Removed "implicit" from them so they are just normal functions: this ensures the compiler doesn't get confused or warn about ambiguous implicits in scope.
  • Created new implicit functions in the `rdd` package object, which is part of the scope that scalac searches when looking for implicit conversions on the various RDD types.

The disadvantage is that there is duplicated code in SparkContext for backward compatibility. A minimal sketch of the overall pattern follows.
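
As a rough illustration of the technique (a minimal, self-contained sketch using toy types such as MyRDD and PairFunctions, not Spark's actual classes): the new conversion lives in the companion object, where scalac's implicit-scope lookup finds it without any import, while the old entry point survives only as a deprecated, non-implicit forwarder.

import scala.language.implicitConversions

// Toy stand-ins for RDD / PairRDDFunctions; all names here are hypothetical.
class MyRDD[T](val data: Seq[T])

class PairFunctions[K, V](self: MyRDD[(K, V)]) {
  def groupByKey(): Map[K, Seq[V]] =
    self.data.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
}

object MyRDD {
  // New home of the conversion: the companion object is part of MyRDD's
  // implicit scope, so call sites need no import.
  implicit def toPairFunctions[K, V](rdd: MyRDD[(K, V)]): PairFunctions[K, V] =
    new PairFunctions(rdd)
}

object OldConversions {
  // Old entry point kept only for binary compatibility: deprecated and no
  // longer implicit, it simply forwards to the new implicit.
  @deprecated("Use the implicit in object MyRDD instead", "1.3.0")
  def toPairFunctions[K, V](rdd: MyRDD[(K, V)]): PairFunctions[K, V] =
    MyRDD.toPairFunctions(rdd)
}

object Demo extends App {
  val rdd = new MyRDD(Seq(1 -> "a", 1 -> "b", 2 -> "c"))
  println(rdd.groupByKey()) // compiles without importing any conversion
}

Code compiled against the old forwarder keeps linking because the method still exists with the same signature, while freshly compiled code silently picks up the implicit from the companion object instead.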

@zsxwing (Member, Author) commented Nov 14, 2014

/cc @rxin

@SparkQA commented Nov 14, 2014

Test build #23354 has started for PR 3262 at commit 1eda9e4.

  • This patch merges cleanly.

@SparkQA commented Nov 14, 2014

Test build #23354 has finished for PR 3262 at commit 1eda9e4.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23354/

@SparkQA commented Nov 14, 2014

Test build #23356 has started for PR 3262 at commit 3ac4f07.

  • This patch merges cleanly.


def testRddToPairRDDFunctions(): Unit = {
  val rdd: org.apache.spark.rdd.RDD[(Int, Int)] = mockRDD
  rdd.groupByKey
Contributor commented on this diff:

Can you add parentheses to groupByKey?

@SparkQA commented Nov 14, 2014

Test build #23358 has started for PR 3262 at commit 9b73188.

  • This patch merges cleanly.

@aarondav (Contributor) commented:
What's the distinction for intToIntWritable/writableConverters?

@SparkQA commented Nov 14, 2014

Test build #23356 has finished for PR 3262 at commit 3ac4f07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23356/

@SparkQA commented Nov 14, 2014

Test build #23358 has finished for PR 3262 at commit 9b73188.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23358/

@SparkQA commented Nov 14, 2014

Test build #23367 has started for PR 3262 at commit 3bdcae2.

  • This patch merges cleanly.

@zsxwing (Member, Author) commented Nov 14, 2014

What's the distinction for intToIntWritable/writableConverters?

writableConverters works; that part is already done. Here is the code used to test binary compatibility:

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.SparkContext._

object ImplicitBackforwardCompatibilityApp {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ImplicitBackforwardCompatibilityApp")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 100).map(i => (i, i))
    val rdd2 = rdd.groupByKey() // rddToPairRDDFunctions
    val rdd3 = rdd2.sortByKey() // rddToOrderedRDDFunctions
    val s1 = rdd3.map(_._1).stats() // numericRDDToDoubleRDDFunctions
    println(s1)
    val s2 = rdd3.map(_._1.toDouble).stats() // doubleRDDToDoubleRDDFunctions
    println(s2)
    val f = rdd2.countAsync() // rddToAsyncRDDActions
    println(f.get())
    rdd2.map { case (k, v) => (k, v.size)} saveAsSequenceFile ("/tmp/implicit_test_path") // rddToSequenceFileRDDFunctions

    val a1 = sc.accumulator(123.4) // DoubleAccumulatorParam
    a1.add(1.0)
    println(a1.value)
    val a2 = sc.accumulator(123) // IntAccumulatorParam
    a2.add(3)
    println(a2.value)
    val a3 = sc.accumulator(123L) // LongAccumulatorParam
    a3.add(11L)
    println(a3.value)
    val a4 = sc.accumulator(123F) // FloatAccumulatorParam
    a4.add(1.1F)
    println(a4.value)

    {
      sc.parallelize(1 to 10).map(i => (i, i)).saveAsSequenceFile("/tmp/implicit_test_int")
      val r = sc.sequenceFile[Int, Int]("/tmp/implicit_test_int")
      r.map { case (k, v) => (k.toString, v.toString)} foreach (println)
    }

    {
      sc.parallelize(1 to 10).map(i => (i.toLong, i.toLong)).saveAsSequenceFile("/tmp/implicit_test_long")
      val r = sc.sequenceFile[Long, Long]("/tmp/implicit_test_long")
      r.map { case (k, v) => (k.toString, v.toString)} foreach (println)
    }

    {
      sc.parallelize(1 to 10).map(i => (i.toDouble, i.toDouble)).saveAsSequenceFile("/tmp/implicit_test_double")
      val r = sc.sequenceFile[Double, Double]("/tmp/implicit_test_double")
      r.map { case (k, v) => (k.toString, v.toString)} foreach (println)
    }

    {
      sc.parallelize(1 to 10).map(i => (i.toFloat, i.toFloat)).saveAsSequenceFile("/tmp/implicit_test_float")
      val r = sc.sequenceFile[Float, Float]("/tmp/implicit_test_float")
      r.map { case (k, v) => (k.toString, v.toString)} foreach (println)
    }

    {
      sc.parallelize(1 to 10).map(i => (i.toString, i.toString)).saveAsSequenceFile("/tmp/implicit_test_string")
      val r = sc.sequenceFile[String, String]("/tmp/implicit_test_string")
      r.map { case (k, v) => (k.toString, v.toString)} foreach (println)
    }

    {
      sc.parallelize(1 to 10).map(i => (true, false)).saveAsSequenceFile("/tmp/implicit_test_boolean")
      val r = sc.sequenceFile[Boolean, Boolean]("/tmp/implicit_test_boolean")
      r.map { case (k, v) => (k.toString, v.toString)} foreach (println)
    }

    {
      sc.parallelize(1 to 10).map(i => (Array(i.toByte), Array(i.toByte))).saveAsSequenceFile("/tmp/implicit_test_bytes")
      val r = sc.sequenceFile[Array[Byte], Array[Byte]]("/tmp/implicit_test_bytes")
      r.map { case (k, v) => (k.toString, v.toString)} foreach (println)
    }

    {
      sc.parallelize(1 to 10).map(i => (i.toString, i.toString)).saveAsSequenceFile("/tmp/implicit_test_writable")
      val r = sc.sequenceFile[org.apache.hadoop.io.Text, org.apache.hadoop.io.Text]("/tmp/implicit_test_writable")
      r.map { case (k, v) => (k.toString, v.toString)} foreach (println)
    }

    sc.stop()
  }
}

I compiled the above code with Spark 1.1.0 and ran it against a Spark build from this PR, and it works correctly.

For intToIntWritable, the problem is that the implicit value for SequenceFileRDDFunctions is a function T => Writable[T]. However, we cannot add these xxxToXXXWritable methods to the implicit scope of T => Writable[T], because that type lies outside Spark's own code. The definition of implicit scope is:

implicit scope, which contains all sort of companion objects and package object that bear some relation to the implicit's type which we search for (i.e. package object of the type, companion object of the type itself, of its type constructor if any, of its parameters if any, and also of its supertype and supertraits).

Ref: http://eed3si9n.com/revisiting-implicits-without-import-tax

A possible solution is to create a new class for T => Writable[T], similar to WritableConverter, and change the implicit parameter type of SequenceFileRDDFunctions to this class, e.g.:

class SequenceFileRDDFunctions[K, V](
    self: RDD[(K, V)])(implicit keyConverter: NewWritableConverter[K], valueConverter: NewWritableConverter[V])

However, since it's a breaking change (of course, we could also add a new SequenceFileRDDFunctions class to avoid breaking old code), I don't think it's worth changing.
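
For illustration only (NewWritableConverter and the other names below are hypothetical, not part of Spark's API): wrapping the T => Writable function in a Spark-owned class would give scalac a companion object to search, which is exactly what a bare function type cannot provide.

import org.apache.hadoop.io.{IntWritable, Writable}

// Hypothetical wrapper: because Spark would own this type, its companion
// object becomes part of the implicit scope searched for NewWritableConverter[T].
class NewWritableConverter[T](val convert: T => Writable)

object NewWritableConverter {
  // Found automatically at call sites, with no import of SparkContext._.
  implicit val intWritableConverter: NewWritableConverter[Int] =
    new NewWritableConverter[Int](i => new IntWritable(i))
}

object ConverterDemo {
  // Stand-in for a SequenceFileRDDFunctions-style method that needs the converter.
  def toWritable[T](value: T)(implicit c: NewWritableConverter[T]): Writable =
    c.convert(value)

  def main(args: Array[String]): Unit =
    println(toWritable(42)) // resolves via NewWritableConverter's companion object
}

The trade-off is extra API surface for a wrapper type, which is why the comment above concludes that the breaking change is not worth making.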

@SparkQA commented Nov 14, 2014

Test build #23367 has finished for PR 3262 at commit 3bdcae2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23367/

  def addInPlace(t1: Float, t2: Float) = t1 + t2
  def zero(initialValue: Float) = 0f
}

// TODO: Add AccumulatorParams for other types, e.g. lists and strings

implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
@deprecated("An API for backforward compatibility", "1.2.0")
Contributor commented on this diff:

update these accordingly too

@mateiz (Contributor) commented Nov 15, 2014

@zsxwing just curious: with the old conversions being deprecated, is there any chance they'll create compiler warnings in common uses of the code? In any case, this seems pretty cool if it doesn't actually break binary compatibility. I guess one risk is that it adds new implicits that cause something to compile differently, but that seems unlikely at first glance.
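
For intuition on the warning question (a standalone, hypothetical sketch rather than Spark code): because the old conversions are deprecated but no longer implicit, the compiler can only reach them through an explicit call, so deprecation warnings should surface only in code that names them directly.

import scala.language.implicitConversions

object DeprecationSketch {
  class Rich(val n: Int) { def doubled: Int = n * 2 }

  // New conversion; in scope here as a plain implicit def for the sketch.
  implicit def enrich(n: Int): Rich = new Rich(n)

  // Old conversion: deprecated and no longer implicit, so implicit search
  // never selects it; it only warns when called by name.
  @deprecated("Use the implicit conversion instead", "1.3.0")
  def oldEnrich(n: Int): Rich = new Rich(n)

  val viaImplicit = 3.doubled            // no deprecation warning
  val viaExplicit = oldEnrich(3).doubled // warns under -deprecation
}

So existing user code that merely relies on the conversions implicitly should not pick up new warnings; only code that calls the old functions by name would.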

@rxin (Contributor) commented Nov 15, 2014

OK, I finally went through the code. I like the change, and it is pretty clever. I believe it should preserve both source and binary compatibility.

To summarize, the changes are:

  1. Deprecated the old implicit conversion functions: this preserves binary compatibility for code compiled against earlier versions of Spark.
  2. Removed "implicit" from them so they are just normal functions: this ensures the compiler doesn't get confused or warn about ambiguous implicits in scope.
  3. Created new implicit functions in the `rdd` package object, which is part of the scope that scalac searches when looking for implicit conversions on the various RDD types.

It is still a tricky change so it'd be great to get more eyes.

@SparkQA commented Nov 15, 2014

Test build #23425 has started for PR 3262 at commit 7266218.

  • This patch merges cleanly.

@zsxwing (Member, Author) commented Nov 15, 2014

@rxin Thank you for the great summary and review. I've updated the description accordingly.

@rxin (Contributor) commented Nov 21, 2014

@heathermiller @gzm0 - do you think this pr is good for merge now?


implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
@deprecated("Replaced by implicit functions in org.apache.spark.rdd package object. This is " +
Contributor commented on this diff:

All these comments are outdated (they still refer to the package object, but should refer to the RDD companion object).

Member (Author) replied:

Thank you. Fixed it.

@gzm0 (Contributor) commented Nov 21, 2014

Otherwise LGTM

@gzm0 (Contributor) commented Nov 21, 2014

LGTM

@SparkQA commented Nov 21, 2014

Test build #23716 has started for PR 3262 at commit fc30314.

  • This patch merges cleanly.

@SparkQA commented Nov 21, 2014

Test build #23716 has finished for PR 3262 at commit fc30314.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23716/

@rxin (Contributor) commented Nov 21, 2014

I'm merging this into master. Thanks for working on this, @zsxwing, and thanks to everybody else for reviewing.

@rxin (Contributor) commented Nov 21, 2014

cc @mateiz @pwendell: I'm leaving this out of branch-1.2, as it seems too last-minute to merge something like this. Let me know if you want to cherry-pick it into branch-1.2.

@mateiz (Contributor) commented Nov 21, 2014

Yeah merging to master sounds fine; it's too late to put it in 1.2.

@mateiz (Contributor) commented Nov 21, 2014

Thanks for the patch @zsxwing, this is very cool.

@asfgit asfgit closed this in 65b987c Nov 24, 2014
@zsxwing zsxwing deleted the SPARK-4397 branch November 25, 2014 02:14
asfgit pushed a commit that referenced this pull request Nov 26, 2014
We reverted #3459 in branch-1.2 due to missing `import o.a.s.SparkContext._`, which is no longer needed in master (#3262). This PR adds #3459 back to branch-1.2 with correct imports.

Github is out-of-sync now. The real changes are the last two commits.

Author: Xiangrui Meng <meng@databricks.com>

Closes #3473 from mengxr/SPARK-4604-1.2 and squashes the following commits:

a7638a5 [Xiangrui Meng] add import o.a.s.SparkContext._ for v1.2
b749000 [Xiangrui Meng] [SPARK-4604][MLLIB] make MatrixFactorizationModel public
asfgit pushed a commit that referenced this pull request Nov 26, 2014
…+ doc updates

We reverted #3439 in branch-1.2 due to missing `import o.a.s.SparkContext._`, which is no longer needed in master (#3262). This PR adds #3439 back to branch-1.2 with correct imports.

Github is out-of-sync now. The real changes are the last two commits.

Author: Joseph K. Bradley <joseph@databricks.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #3474 from mengxr/SPARK-4583-1.2 and squashes the following commits:

aca2abb [Xiangrui Meng] add import o.a.s.SparkContext._ for v1.2
6b5564a [Joseph K. Bradley] [SPARK-4583] [mllib] LogLoss for GradientBoostedTrees fix + doc updates
asfgit pushed a commit that referenced this pull request Dec 2, 2014
This PR cleans up `import SparkContext._` in core for SPARK-4397 (#3262) to prove it really works well.

Author: zsxwing <zsxwing@gmail.com>

Closes #3530 from zsxwing/SPARK-4397-cleanup and squashes the following commits:

04e2273 [zsxwing] Cleanup 'import SparkContext._' in core
asfgit pushed a commit that referenced this pull request Dec 3, 2014
As #3262 wasn't merged to branch 1.2, the `since` value of `deprecated` should be '1.3.0'.

Author: zsxwing <zsxwing@gmail.com>

Closes #3573 from zsxwing/SPARK-4397-version and squashes the following commits:

1daa03c [zsxwing] Change the 'since' value to '1.3.0'
tianyi pushed a commit to asiainfo/spark that referenced this pull request Dec 4, 2014
This PR moved `implicit`s to `package object` and `companion object` to enable the Scala compiler to find them automatically without explicit importing.

It should not break any API. A test project for backward compatibility is [here](https://github.com/zsxwing/SPARK-4397-Backforward-Compatibility). It shows that code compiled with Spark 1.1.0 can run with this PR.

To summarize, the changes are:

* Deprecated the old implicit conversion functions: this preserves binary compatibility for code compiled against earlier versions of Spark.
* Removed "implicit" from them so they are just normal functions: this ensures the compiler doesn't get confused or warn about ambiguous implicits in scope.
* Created new implicit functions in the `rdd` package object, which is part of the scope that scalac searches when looking for implicit conversions on the various RDD types.

The disadvantage is that there is duplicated code in SparkContext for backward compatibility.

Author: zsxwing <zsxwing@gmail.com>

Closes apache#3262 from zsxwing/SPARK-4397 and squashes the following commits:

fc30314 [zsxwing] Update the comments
9c27aff [zsxwing] Move implicit functions to object RDD and forward old functions to new implicit ones directly
2b5f5a4 [zsxwing] Comments for the deprecated functions
52353de [zsxwing] Remove private[spark] from object WritableConverter
34641d4 [zsxwing] Move ImplicitSuite to org.apache.sparktest
7266218 [zsxwing] Add comments to warn the duplicate codes in SparkContext
185c12f [zsxwing] Remove simpleWritableConverter from SparkContext
3bdcae2 [zsxwing] Move WritableConverter implicits to object WritableConverter
9b73188 [zsxwing] Fix the code style issue
3ac4f07 [zsxwing] Add license header
1eda9e4 [zsxwing] Reorganize 'implicit's to improve the API convenience
Labels: none · Projects: none · 9 participants