The examples here are basic and intended for newcomers to Scala and Spark.
Many developers who work with Spark want custom APIs on top of it: for example, an integration between Spark and other software, or simple customized APIs that can be modularized and reused as a third-party library. This library collects some examples of such custom APIs; it can be built as a JAR and imported into Spark as a third-party dependency. A sketch of how these extensions can be implemented follows the list below.
- `customPipe(..)`, equivalent to `pipe(..)` in `RDD`:

  ```scala
  import com.company.spark.custom._

  val rdd = sc.parallelize(Seq(1, 2, 3))
  rdd.customPipe("cat").collect()
  ```
- `customCount()`, equivalent to `count()` in `DataFrame`:

  ```scala
  import com.company.spark.custom._
  import sqlContext.implicits._ // needed for toDF outside spark-shell

  val data = Seq(1, 2, 3, 4, 5)
  val rdd = sc.parallelize(data)
  val df = rdd.toDF
  df.customCount()
  ```
- `customTextFile(..)`, equivalent to `textFile(..)` in `SparkContext`:

  ```scala
  import com.company.spark.custom._

  val path = "path-to-file"
  sc.customTextFile(path)
  ```
- `customLoadJsonRDD(..)`, equivalent to `jsonRDD(..)` in `SQLContext`:

  ```scala
  import com.company.spark.custom._

  val jsonRDD = sc.parallelize(
    """{"a": 1}""" :: """{"a": 2}""" :: Nil)
  sqlContext.customLoadJsonRDD(jsonRDD)
  ```
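The library's actual source is not shown here, but extension methods like the ones above are commonly added with Scala's implicit-class (enrichment) pattern. The following is a minimal sketch, not the library's real implementation; the package-object layout and the name `CustomRDDOps` are assumptions chosen to match the `import com.company.spark.custom._` line used in the examples:

```scala
package com.company.spark

import org.apache.spark.rdd.RDD

// Extension methods become available to callers via
// `import com.company.spark.custom._`.
package object custom {

  // Hypothetical sketch: enriches RDD with customPipe(..). Here it simply
  // delegates to the built-in pipe(..); a real implementation could add
  // logging, metrics, or validation around the call.
  implicit class CustomRDDOps[T](rdd: RDD[T]) {
    def customPipe(command: String): RDD[String] = rdd.pipe(command)
  }
}
```

Once the implicit class is in scope, `rdd.customPipe("cat")` compiles as if `customPipe` were defined on `RDD` itself.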
To run the tests locally, simply run:

```
./dev/run-tests
```
This library is built with SBT, which is automatically downloaded by the included shell script. To build a JAR file, run the following from the project root:

```
sbt/sbt package
```

The build configuration includes support for both Scala 2.10 and 2.11.
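Once packaged, the JAR can be attached to a Spark shell session with the standard `--jars` flag. The artifact name below is hypothetical; the actual name depends on the project name, Scala version, and version set in the build:

```
# Hypothetical artifact name; adjust to match the actual `sbt/sbt package` output.
spark-shell --jars target/scala-2.10/spark-custom_2.10-0.1.0.jar
```

After that, `import com.company.spark.custom._` works in the shell just as in the examples above.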