
Jiaweihu08/example-spark-session-extension

Example Session Extension For Spark Session

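This project implements a dummy optimizer rule for Spark SQL and registers it through a session extension. The rule rewrites the upperBound of a Sample operator from 0.1 to 0.2, so a query that asks for a 10% sample is planned as a 20% sample.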
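The repository's source is not reproduced in this README, so the following is only a minimal sketch of what such an extension could look like, not the author's actual code. The class name `extensions.MySparkSessionExtension` comes from the launch command in "How to use"; the rule name `ChangeSampleUpperBound` is hypothetical. It assumes Spark 3.x built for Scala 2.12, matching the jar path used in this README.

```scala
package extensions

import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Sample}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule name, used for illustration only.
// Rewrites every Sample node whose upperBound is exactly 0.1 to use 0.2.
case class ChangeSampleUpperBound(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case s @ Sample(_, upperBound, _, _, _) if upperBound == 0.1 =>
      s.copy(upperBound = 0.2)
  }
}

// Entry point referenced by spark.sql.extensions.
// Spark instantiates this class and calls apply() while building the session.
class MySparkSessionExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectOptimizerRule(session => ChangeSampleUpperBound(session))
  }
}
```

Because the rewritten node no longer has an upperBound of 0.1, the rule does not match again on later optimizer passes, so it cannot loop.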

How to use

  • Install sbt
  • Clone the repo
    git clone git@github.com:Jiaweihu08/example-spark-session-extension.git
  • Build the assembly jar
    cd example-spark-session-extension
    sbt assembly
  • Launch spark-shell
    $SPARK_HOME/bin/spark-shell \
    --jars ./target/scala-2.12/example-spark-session-extension-assembly-0.1.jar \
    --conf spark.sql.extensions=extensions.MySparkSessionExtension
  • See changes in action
    // Path to the sample CSV; adjust it to where the file lives on your machine
    val source = "/src/test/scala/resources/ecommerce300k_2019_Nov.csv"

    val df = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv(source))
    
    df.sample(0.1).explain(true)
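If the CSV file is not at hand, the effect can also be checked on a generated dataset. The snippet below is a sketch meant to be pasted into the same spark-shell session (where `spark` is already defined); it assumes the extension was loaded with the `--conf` flag shown above. It collects the upper bounds of all Sample nodes in the optimized logical plan, which should show 0.2 rather than 0.1 if the rule behaves as described.

```scala
import org.apache.spark.sql.catalyst.plans.logical.Sample

// Any dataset works; spark.range avoids the dependency on the CSV file.
val sampled = spark.range(1000).sample(0.1)

// Inspect the plan after the optimizer (and therefore the injected rule) has run.
val optimized = sampled.queryExecution.optimizedPlan

// Gather the upperBound of every Sample node found in the optimized plan.
val upperBounds = optimized.collect { case s: Sample => s.upperBound }
println(upperBounds)
```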
