
[SPARK-4923][REPL] Add Developer API to REPL to allow re-publishing the REPL jar

As requested in [SPARK-4923](https://issues.apache.org/jira/browse/SPARK-4923), I've provided a rough DeveloperApi for the repl. I've only done this for Scala 2.10 because Spark's Scala 2.11 REPL does not appear to be fully implemented: it still has the old `scala.tools.nsc` package, and its SparkIMain does not appear to have the class server needed for shipping code to executors (unless this functionality has been moved elsewhere?). I also left `ExecutorClassLoader` and `ConstructorCleaner` alone, as I have no experience working with those classes.

This marks the majority of methods in `SparkIMain` as _private_, with a few special cases marked _private[repl]_ because other classes within the same package access them. Every public method has been marked with `DeveloperApi`, as suggested by pwendell, and I took the liberty of writing a Scaladoc for each one to further explain its usage.
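The visibility scheme described above can be sketched in plain Scala. This is a minimal, self-contained illustration under stated assumptions: `DeveloperApi` here is a local stand-in for `org.apache.spark.annotation.DeveloperApi`, `repl` is an object standing in for the `org.apache.spark.repl` package, and `Interpreter`, `ImportHelper`, and their methods are hypothetical names, not part of Spark.

```scala
// Local stand-in for org.apache.spark.annotation.DeveloperApi so this
// sketch compiles without Spark on the classpath.
class DeveloperApi extends scala.annotation.StaticAnnotation

// `repl` is an object standing in for the org.apache.spark.repl package;
// a qualified private[repl] modifier works the same way for both.
object repl {
  @DeveloperApi
  class Interpreter {
    // private: visible only inside Interpreter itself.
    private def compileInternal(code: String): Boolean = code.trim.nonEmpty

    // private[repl]: visible to everything nested inside `repl`.
    private[repl] def lastVarName: String = "res0"

    // Public and annotated, mirroring the DeveloperApi-marked methods.
    @DeveloperApi
    def compile(code: String): Boolean = compileInternal(code)
  }

  // Hypothetical sibling that needs package-level access.
  object ImportHelper {
    def describe(i: Interpreter): String = s"last result: ${i.lastVarName}"
  }
}
```

Because `repl` encloses both definitions, `ImportHelper` can read `lastVarName`, while code outside `repl` cannot, and `compileInternal` stays hidden from everything except `Interpreter` itself.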

As the Scala 2.11 REPL conforms (scala/scala#2206) to [JSR-223](http://docs.oracle.com/javase/8/docs/technotes/guides/scripting/), the [Spark Kernel](https://github.com/ibm-et/spark-kernel) uses the Scala 2.10 SparkIMain in the same manner. So I've taken care to expose the methods predominantly needed to implement a JSR-223 scripting engine:

1. The ability to _get_ variables from the interpreter (and other information like class/symbol/type)
2. The ability to _put_ variables into the interpreter
3. The ability to _compile_ code
4. The ability to _execute_ code
5. The ability to get contextual information regarding the scripting environment
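As a rough illustration of how these capabilities line up with the standard `javax.script` API, the sketch below builds a toy engine on `AbstractScriptEngine` that delegates to an interpreter. The `ReplBackend` trait and its `bindValue`, `valueOf`, and `run` methods are hypothetical stand-ins for the role `SparkIMain` plays; they are not its actual method names, and compiling code (capability 3) is left out for brevity.

```scala
import java.io.Reader
import javax.script.{AbstractScriptEngine, Bindings, ScriptContext, ScriptEngineFactory, SimpleBindings}

// Hypothetical interpreter interface standing in for the role SparkIMain
// plays in a kernel; these method names are illustrative, not SparkIMain's.
trait ReplBackend {
  def bindValue(name: String, value: AnyRef): Unit // capability 2: put
  def valueOf(name: String): Option[AnyRef]        // capability 1: get
  def run(code: String): AnyRef                    // capability 4: execute
}

// Toy JSR-223 engine that delegates to a ReplBackend. AbstractScriptEngine
// supplies the default ScriptContext (capability 5) and put/get helpers.
class DelegatingEngine(backend: ReplBackend) extends AbstractScriptEngine {

  override def eval(script: String, ctx: ScriptContext): AnyRef = {
    // Push engine-scope bindings (set via put) into the interpreter first.
    val bindings = ctx.getBindings(ScriptContext.ENGINE_SCOPE)
    bindings.forEach((name: String, value: AnyRef) => backend.bindValue(name, value))
    backend.run(script)
  }

  override def eval(reader: Reader, ctx: ScriptContext): AnyRef = {
    // Read the whole script, then reuse the String overload.
    val sb = new java.lang.StringBuilder
    val buf = new Array[Char](4096)
    var n = reader.read(buf)
    while (n != -1) {
      sb.append(buf, 0, n)
      n = reader.read(buf)
    }
    eval(sb.toString, ctx)
  }

  // Prefer values the interpreter knows about (e.g. defined by a script).
  override def get(key: String): AnyRef =
    backend.valueOf(key).getOrElse(super.get(key))

  override def createBindings(): Bindings = new SimpleBindings()

  // A real engine returns its ScriptEngineFactory; omitted in this sketch.
  override def getFactory(): ScriptEngineFactory = null
}
```

Engine-scope values set with `put` are pushed into the interpreter before every `eval`, and `get` consults the interpreter first so that values defined by a script are visible to the caller. A production engine would also implement `javax.script.Compilable` and return a real `ScriptEngineFactory`.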

I also exposed the following additional functionality:

1. The blocking initialization method (needed to actually start SparkIMain instance)
2. The class server URI (needed to set the _spark.repl.class.uri_ property after initialization), exposed instead of the entire class server
3. The class output directory (beneficial for tools like ours that need to inspect and use the directory where class files are served)
4. Suppression (quiet/silence) mechanics for output
5. Ability to add a jar to the compile/runtime classpath
6. The reset/close functionality
7. Metadata: the most recent variable assignment ("needed" for extracting results from the last execution) and the real variable name (for better debugging)
8. Execution wrapper (useful to have, but debatable)
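The quiet/silence idea from item 4 amounts to running a block of code with its output redirected. Below is a minimal standard-library sketch, assuming output goes through `scala.Console` (as `println` does); the `Suppression` object and its methods are hypothetical and are not how `SparkIMain` implements suppression.

```scala
import java.io.{ByteArrayOutputStream, OutputStream, PrintStream}

// Hypothetical helpers (not part of Spark) that suppress or capture output
// written through scala.Console, which is where Predef.println writes.
// Output written directly to System.out is NOT affected.
object Suppression {
  // A PrintStream that discards every byte written to it.
  private val discard = new PrintStream(new OutputStream {
    override def write(b: Int): Unit = ()
  })

  // Runs `body` with Console output discarded and returns its result.
  def silently[T](body: => T): T = Console.withOut(discard)(body)

  // Runs `body` and returns its result together with the captured output.
  def captured[T](body: => T): (T, String) = {
    val buffer = new ByteArrayOutputStream()
    val stream = new PrintStream(buffer, true, "UTF-8")
    val result = Console.withOut(stream)(body)
    stream.flush()
    (result, buffer.toString("UTF-8"))
  }
}
```

Taking the block by name (`body: => T`) lets the caller wrap arbitrary code, and the block's result is passed through unchanged, so suppression does not interfere with extracting values from an execution.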

Aside from `SparkIMain`, I updated other classes/traits and their methods in the _repl_ package to be private/package-protected where possible. A few odd cases (like `SparkHelper` living in the `scala.tools.nsc` package to expose a private member) still exist, but I did my best to label them.

`SparkCommandLine` has proven useful for extracting settings, and `SparkJLineCompletion` has proven useful for implementing auto-completion in the [Spark Kernel](https://github.com/ibm-et/spark-kernel) project. Apart from those and `SparkIMain`, in my experience no other classes or methods are needed by interactive applications built on the REPL API.

Tested via the following:

    $ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
    $ mvn -Phadoop-2.3 -DskipTests clean package && mvn -Phadoop-2.3 test

I also did a quick check that I could start the shell and execute some code:

    $ ./bin/spark-shell
    ...

    scala> val x = 3
    x: Int = 3

    scala> sc.parallelize(1 to 10).reduce(_+_)
    ...
    res1: Int = 55

Author: Chip Senkbeil <rcsenkbe@us.ibm.com>
Author: Chip Senkbeil <chip.senkbeil@gmail.com>

Closes #4034 from rcsenkbeil/AddDeveloperApiToRepl and squashes the following commits:

053ca75 [Chip Senkbeil] Fixed failed build by adding missing DeveloperApi import
c1b88aa [Chip Senkbeil] Added DeveloperApi to public classes in repl
6dc1ee2 [Chip Senkbeil] Added missing method to expose error reporting flag
26fd286 [Chip Senkbeil] Refactored other Scala 2.10 classes and methods to be private/package protected where possible
925c112 [Chip Senkbeil] Added DeveloperApi and Scaladocs to SparkIMain for Scala 2.10
Chip Senkbeil authored and pwendell committed Jan 16, 2015
1 parent ecf943d commit d05c9ee6e8441e54732e40de45d1d2311307908f
@@ -19,14 +19,21 @@ package org.apache.spark.repl

import scala.tools.nsc.{Settings, CompilerCommand}
import scala.Predef._
import org.apache.spark.annotation.DeveloperApi

/**
 * Command class enabling Spark-specific command line options (provided by
 * <i>org.apache.spark.repl.SparkRunnerSettings</i>).
 *
 * @example new SparkCommandLine(Nil).settings
 *
 * @param args The list of command line arguments
 * @param settings The underlying settings to associate with this set of
 *                 command-line options
 */
@DeveloperApi
class SparkCommandLine(args: List[String], override val settings: Settings)
    extends CompilerCommand(args, settings) {

  def this(args: List[String], error: String => Unit) {
    this(args, new SparkRunnerSettings(error))
  }
@@ -15,7 +15,7 @@ import scala.tools.nsc.ast.parser.Tokens.EOF

import org.apache.spark.Logging

-trait SparkExprTyper extends Logging {
+private[repl] trait SparkExprTyper extends Logging {
   val repl: SparkIMain

   import repl._
@@ -17,6 +17,23 @@

package scala.tools.nsc

import org.apache.spark.annotation.DeveloperApi

// NOTE: Forced to be public (and in scala.tools.nsc package) to access the
// settings "explicitParentLoader" method

/**
 * Provides exposure for the explicitParentLoader method on settings instances.
 */
@DeveloperApi
object SparkHelper {
  /**
   * Retrieves the explicit parent loader for the provided settings.
   *
   * @param settings The settings whose explicit parent loader to retrieve
   *
   * @return The Optional classloader representing the explicit parent loader
   */
  @DeveloperApi
  def explicitParentLoader(settings: Settings) = settings.explicitParentLoader
}