
Using Scala to create a Spark UDF designed to be callable from PySpark.

ONSBigData/scala_udf_example

Scala UDF Example

Example of a UDF defined in Scala, callable from PySpark.

The UDF simply wraps a call to JaroWinklerDistance from Apache Commons: it compares two strings and returns a Double.
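Because the UDF only delegates to Apache Commons, a local reference for the expected scores can help when checking its output. The sketch below is a plain-Python implementation of the standard Jaro-Winkler similarity. It uses a prefix scale of 0.1, caps the common prefix at 4 characters, and applies the prefix boost only when the Jaro score is at least 0.7. It is not the Commons code. Commons releases have differed on whether JaroWinklerDistance returns this similarity or 1 minus it, so compare against the version your jar is built with. The function names are hypothetical.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters only count as a match if they lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(i + window + 1, len2)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters taken in order from each string; positions that
    # disagree are out of order, and half their count is the transposition term.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(
    s1: str, s2: str, prefix_scale: float = 0.1, boost_threshold: float = 0.7
) -> float:
    """Jaro score boosted by the length of the common prefix (at most 4 chars)."""
    jaro = jaro_similarity(s1, s2)
    if jaro < boost_threshold:
        return jaro
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1.0 - jaro)


def jaro_winkler_distance(s1: str, s2: str) -> float:
    """Complement of the similarity, for libraries that report a distance."""
    return 1.0 - jaro_winkler_similarity(s1, s2)
```

For example, Winkler's classic pair jaro_winkler_similarity("MARTHA", "MARHTA") gives about 0.961.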

Usage

To build the jar with Maven:

mvn package

By default, Maven writes the packaged jar to the target/ directory.

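To add the jar to PySpark, set the following configuration (for example in spark-defaults.conf). spark.driver.extraClassPath prepends the jar to the driver's classpath, and spark.jars also makes it available on the executors: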

spark.driver.extraClassPath /path/to/jarfile.jar
spark.jars /path/to/jarfile.jar
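The same two settings can also be applied when the session is created from Python. This sketch assumes PySpark is installed and the script runs under plain python, so that creating the session launches the driver JVM. Spark's documentation notes that in client mode, spark.driver.extraClassPath cannot take effect through SparkConf once the driver JVM is already running. Under spark-submit or the pyspark shell, use spark-defaults.conf or the --driver-class-path option instead. The names jar_conf and build_session and the app name are hypothetical.

```python
from typing import Dict


def jar_conf(jar_path: str) -> Dict[str, str]:
    """Return the two settings from this README for the given jar path."""
    return {
        # Prepend the jar to the driver JVM's classpath.
        "spark.driver.extraClassPath": jar_path,
        # Include the jar on the driver and executor classpaths.
        "spark.jars": jar_path,
    }


def build_session(jar_path: str, app_name: str = "scala-udf-example"):
    """Create (or reuse) a SparkSession with the Scala UDF jar configured.

    The driver classpath entry only applies if this call launches the JVM,
    i.e. no SparkContext exists yet in this Python process.
    """
    # Imported here so jar_conf() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in jar_conf(jar_path).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

For example, spark = build_session("/path/to/jarfile.jar").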

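To register the function with PySpark, where spark is an existing SparkSession:

from pyspark.sql import SQLContext
from pyspark.sql.types import DoubleType
sqlContext = SQLContext(spark.sparkContext)
sqlContext.registerJavaFunction('jaro_winkler', 'uk.gov.ons.mdr.examples.JaroWinklerDistance', DoubleType())

This makes the Scala class callable from Spark SQL as jaro_winkler. From Spark 2.3 onwards, SQLContext.registerJavaFunction is deprecated, and spark.udf.registerJavaFunction takes the same arguments.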
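Once the function is registered, jaro_winkler can be called from Spark SQL like a built-in function. This sketch assumes that spark is the SparkSession the function was registered on and that the jar is configured as described above. It scores pairs of strings through a temporary view. The helper names, the view name name_pairs, and the column names name_a and name_b are hypothetical.

```python
def jaro_winkler_query(table: str, left_col: str, right_col: str) -> str:
    """Build a Spark SQL query that scores two string columns with jaro_winkler."""
    return (
        f"SELECT {left_col}, {right_col}, "
        f"jaro_winkler({left_col}, {right_col}) AS score FROM {table}"
    )


def score_pairs(spark, pairs, view_name: str = "name_pairs"):
    """Score (name_a, name_b) string pairs with the registered Scala UDF.

    Assumes 'jaro_winkler' was registered on this same SparkSession and that
    the jar is on the driver and executor classpaths.
    """
    df = spark.createDataFrame(pairs, ["name_a", "name_b"])
    df.createOrReplaceTempView(view_name)
    return spark.sql(jaro_winkler_query(view_name, "name_a", "name_b"))
```

For example, score_pairs(spark, [("MARTHA", "MARHTA"), ("DWAYNE", "DUANE")]).show(). In the DataFrame API, pyspark.sql.functions.expr("jaro_winkler(name_a, name_b)") reaches the same function.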
