- Title: UDF in Spark
- Slug: spark-scala-udf
- Date: 2019-11-26
- Category: Computer Science
- Tags: programming, Scala, Spark, UDF, user-defined function
- Author: Ben Du

## Map vs UDF

https://stackoverflow.com/questions/38860808/performance-impact-of-rdd-api-vs-udfs-mixed-with-dataframe-api

https://stackoverflow.com/questions/39039081/difference-between-a-map-and-udf

https://stackoverflow.com/questions/43411234/spark-sql-whether-to-use-row-transformation-or-udf

## UDF

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-udfs.html

https://blog.cloudera.com/blog/2017/02/working-with-udfs-in-apache-spark/

Prefer the higher-level, standard Column-based functions and Dataset operators 
whenever possible, and resort to a custom UDF only when no built-in function does the job. 
A UDF is a black box to Spark's optimizer, so Spark does not try to optimize it.

In [1]:
%%classpath add mvn
org.apache.spark spark-core_2.11 2.3.1
org.apache.spark spark-sql_2.11 2.3.1

In [2]:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
    .builder()
    .master("local[2]")
    .appName("Spark UDF Examples")
    .getOrCreate()
import spark.implicits._

org.apache.spark.sql.SparkSession$implicits$@2269a270

In [3]:
val df = Seq(
    (0, "hello"), 
    (1, "world")
).toDF("id", "text")
df.show

+---+-----+
| id| text|
+---+-----+
|  0|hello|
|  1|world|
+---+-----+




In [4]:
import org.apache.spark.sql.functions.udf

val upper: String => String = _.toUpperCase
val upperUDF = udf(upper)

UserDefinedFunction(<function1>,StringType,Some(List(StringType)))

In [5]:
df.withColumn("upper", upperUDF($"text")).show

+---+-----+-----+
| id| text|upper|
+---+-----+-----+
|  0|hello|HELLO|
|  1|world|WORLD|
+---+-----+-----+
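
For comparison with the advice to prefer built-in functions, here is a minimal sketch of the same transformation without a UDF. It uses `upper` from `org.apache.spark.sql.functions`. The alias `F` and the name `builtin` are chosen here only for illustration. The block rebuilds the session and the DataFrame so that it can run on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.{functions => F}

// getOrCreate() reuses the session created earlier if one is already running.
val spark = SparkSession
    .builder()
    .master("local[2]")
    .appName("Spark UDF Examples")
    .getOrCreate()
import spark.implicits._

val df = Seq(
    (0, "hello"),
    (1, "world")
).toDF("id", "text")

// F.upper is a built-in Column function, so the optimizer sees an ordinary
// expression instead of an opaque Scala closure.
val builtin = df.withColumn("upper", F.upper($"text"))
builtin.show
```

This should produce the same table as the UDF version. Comparing `builtin.explain` with the `explain` output of the UDF version is one way to see how differently Spark represents the two.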



In [6]:
val someUDF = udf((arg1: Long, arg2: Long) => {
    arg1 + arg2
})

UserDefinedFunction(<function2>,LongType,Some(List(LongType, LongType)))
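
The two-argument UDF above is defined but not applied. The following sketch shows one way it might be used. It assumes the same two-row DataFrame as earlier and rebuilds it so that the block can run on its own. The column name `sum`, the name `summed`, and the constant `10` are illustrative choices. Because `id` is an `Int` column and the UDF takes `Long` arguments, the sketch casts the column explicitly instead of relying on implicit coercion.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, udf}

// getOrCreate() reuses the session created earlier if one is already running.
val spark = SparkSession
    .builder()
    .master("local[2]")
    .appName("Spark UDF Examples")
    .getOrCreate()
import spark.implicits._

val df = Seq(
    (0, "hello"),
    (1, "world")
).toDF("id", "text")

val someUDF = udf((arg1: Long, arg2: Long) => {
    arg1 + arg2
})

// Cast the Int column to Long so it matches the UDF's parameter types.
// lit(10L) wraps a constant in a Column; the value 10 is arbitrary.
val summed = df.withColumn("sum", someUDF($"id".cast("long"), lit(10L)))
summed.show
```

A UDF accepts `Column` arguments, so constants have to be wrapped with `lit` before they can be passed in.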

## References

https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/Dataset.html

https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/functions.html

https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Row.html