Groovy Spark

Groovy REPL support for Spark programs

Use Case Example

Write a normal Spark program in Groovy:

	import scala.Tuple2
	import repl.PairF

	static void main(String[] args) {
		// sc is assumed to be a Spark context created elsewhere (setup not shown)
		def textFile = sc.textFile("hdfs://namenode.local:8020/kmeans_data.txt")
		def result = textFile.map(new PairF({ row ->
			// arbitrary map function
			return new Tuple2(row, Arrays.asList([1, 2, 3]))
		}))
		println(result.count())

		// insert a call to repl() to drop into a Groovy shell with context
		// (access to all in-scope variables)
		repl()
	}

When execution reaches the repl() call, you get a Groovy prompt where you can explore variables and launch new Spark jobs:

	groovy:000>
	import scala.Tuple2
	import repl.F
	import repl.PairF
	filtered_result = result.filter(new F({ row ->
	    // arbitrary filter function
	    return true
	}))

	filtered_result.take(2)
	===> [[0.1, 0.1, 0.1], [9.2, 9.2, 9.2]]
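The session above wraps closures in repl.F and repl.PairF because Spark's Java API expects function objects (such as org.apache.spark.api.java.function.Function) rather than Groovy closures. Their source is not shown in this README, so the following is only a minimal sketch, assuming F simply delegates to a closure. To keep it runnable with plain Groovy, a hypothetical SparkFunction interface stands in for Spark's Function, and ClosureFunction is a hypothetical name, not the real class.

```groovy
// Hypothetical stand-in for org.apache.spark.api.java.function.Function,
// so this sketch runs without Spark on the classpath.
interface SparkFunction<T, R> extends Serializable {
    R call(T v1) throws Exception
}

// Hypothetical closure adapter in the spirit of repl.F (assumed, not the real source).
class ClosureFunction<T, R> implements SparkFunction<T, R> {
    private final Closure<R> body

    ClosureFunction(Closure<R> body) {
        // dehydrate() returns a copy with owner, delegate and thisObject set to null,
        // so serializing the closure does not drag the enclosing script along
        this.body = body.dehydrate()
    }

    R call(T v1) throws Exception {
        // delegate the single-method call to the wrapped closure
        body.call(v1)
    }
}

// Usage: wrap an arbitrary filter closure, as the REPL session does with new F({ ... })
def keepShort = new ClosureFunction<String, Boolean>({ String row -> row.length() < 5 })
println keepShort.call("abc")
println keepShort.call("abcdefg")
```

Spark serializes task functions and ships them to executors, which is why the sketch makes the adapter Serializable and dehydrates the closure; whether repl.F does exactly this is not stated here.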

Building

groovy-spark currently requires a slightly modified version of Groovy. The hope is to remove this requirement someday; for now, build groovy-spark as follows:

			 git clone https://github.com/bunions1/groovy-core.git
			 cd groovy-core
			 git checkout spark_shell_support
			 git submodule init
			 git submodule update

			 # this takes a while because it has to build all of Groovy
			 ./gradlew dist

			 # run the example
			 ./gradlew :groovy-spark-example:run
