- Title: Column Functions and Operators in Spark
- Slug: spark-col-functions-operators
- Date: 2019-12-18 11:08:55
- Category: Computer Science
- Tags: programming, Scala, Spark, DataFrame, column, functions, operators
- Author: Ben Du

```python
from pathlib import Path
import findspark
# Point findspark at the first Spark 3 installation found under /opt.
findspark.init(str(next(Path("/opt").glob("spark-3*"))))

from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import *
from pyspark.sql.types import StructType
spark = SparkSession.builder.appName("PySpark_Str_Func") \
    .enableHiveSupport().getOrCreate()
```

## [Rounding Functions](http://www.legendu.net/misc/blog/spark-dataframe-func-rounding)

Please refer to 
[Rounding Functions in Spark](http://www.legendu.net/misc/blog/spark-dataframe-func-rounding)
for details.

## [String Functions](http://www.legendu.net/misc/blog/spark-dataframe-func-string)

Please refer to 
[String Functions in Spark](http://www.legendu.net/misc/blog/spark-dataframe-func-string)
for details.

## [Statistical Functions](http://www.legendu.net/misc/blog/spark-stat-functions)

Please refer to
[Statistical Functions in Spark](http://www.legendu.net/misc/blog/spark-stat-functions)
for details.

## [Date Functions](http://www.legendu.net/misc/blog/spark-dataframe-func-date)

Please refer to 
[Date Functions in Spark](http://www.legendu.net/misc/blog/spark-dataframe-func-date)
for details.

## [Window Functions](http://www.legendu.net/misc/blog/window-functions-in-spark)

Please refer to 
[Window Functions in Spark](http://www.legendu.net/misc/blog/window-functions-in-spark)
for details.

## lit

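`lit` wraps a literal value in a `Column`, so a constant can be used wherever a column expression is expected.

```scala
import org.apache.spark.sql.functions._

val x = lit(1)  // a Column holding the literal 1
x
```

```
1
```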

## when

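1. Conditions are checked in order, and the value of the first condition that is true is returned.
2. A `null` condition is treated as `false`.
3. If no condition is true, the value of `otherwise` is returned, or `null` if `otherwise` is omitted.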
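
As a mental model only (plain Python, not Spark's implementation), a `when` chain evaluated on a single row behaves like the hypothetical `case_when` function below: branches are tried in order, only a condition that is exactly true selects its value, a `null` condition (modeled as `None`) falls through just like `false`, and if nothing matches, the `otherwise` value is returned, defaulting to `null`.

```python
from typing import Any, Optional, Sequence, Tuple


def case_when(
    branches: Sequence[Tuple[Optional[bool], Any]], otherwise: Any = None
) -> Any:
    """Hypothetical model of when(...).when(...).otherwise(...) for one row.

    Each branch is a (condition, value) pair; None stands for SQL null.
    """
    for condition, value in branches:
        # Only a condition that is exactly True selects its branch;
        # False and None (null) both fall through to the next one.
        if condition is True:
            return value
    # No branch matched: return the otherwise value (None, i.e. null, if omitted).
    return otherwise


print(case_when([(None, 1)], otherwise=0))            # 0
print(case_when([(False, 1), (True, 2), (True, 3)]))  # 2
print(case_when([(False, 1)]))                        # None
```

The three calls correspond to a `null` condition falling through to `otherwise`, the first true branch winning in a chained `when`, and a missing `otherwise` producing `null`.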

```scala
import org.apache.spark.sql.functions._
import spark.implicits._  // enables the $"colName" column syntax

val df = spark.read.json("../data/people.json")
df.show
```

```
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

df = [age: bigint, name: string]
```

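Because a `null` condition counts as `false`, Michael (whose `age` is `null`) satisfies neither `$"age" > 20` nor `$"age" <= 20`, so he gets the `otherwise` value `0` in both of the following queries.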
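
Comparisons follow SQL's three-valued logic: comparing `null` with any value gives `null`, not `false`. The plain-Python sketch below models this with hypothetical helpers `gt` and `le` (illustrations, not Spark APIs) applied to the ages from `people.json`, and reproduces the `gt20` and `le20` columns computed by the Scala queries in this section.

```python
from typing import List, Optional


# Hypothetical helpers modeling SQL comparisons: a null (None) operand yields null.
def gt(a: Optional[int], b: int) -> Optional[bool]:
    return None if a is None else a > b


def le(a: Optional[int], b: int) -> Optional[bool]:
    return None if a is None else a <= b


# Ages from people.json; None stands for Michael's null age.
ages: List[Optional[int]] = [None, 30, 19]

# when(cond, 1).otherwise(0): only a condition that is exactly True yields 1.
gt20 = [1 if gt(age, 20) is True else 0 for age in ages]
le20 = [1 if le(age, 20) is True else 0 for age in ages]

print(gt20)  # [0, 1, 0]
print(le20)  # [0, 0, 1]
```

Michael gets `0` in both lists, so `gt20 + le20` is `0` rather than `1` for a row whose `age` is `null`.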

```scala
df.select(when($"age" > 20, 1).otherwise(0).alias("gt20")).show
```

```
+----+
|gt20|
+----+
|   0|
|   1|
|   0|
+----+
```

```scala
df.select(when($"age" <= 20, 1).otherwise(0).alias("le20")).show
```

```
+----+
|le20|
+----+
|   0|
|   0|
|   1|
+----+
```

```scala
df.select(when($"age".isNull, 0).when($"age" > 20, 100).otherwise(10).alias("age")).show
```

```
+---+
|age|
+---+
|  0|
|100|
| 10|
+---+
```

```scala
df.select(when($"age".isNull, 0).alias("age")).show
```

```
+----+
| age|
+----+
|   0|
|null|
|null|
+----+
```

## References

- [Spark API: `Dataset`](https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/Dataset.html)
- [Spark API: `functions`](https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/functions.html)
- [Spark API: `Row`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Row.html)