## *EXPLODE FUNCTION*

In [0]:
# Create the data for the array DataFrame
df_array = [
    ("Parker", ["Tv", "Refrigerator", "Oven", "AC"]),
    ("Marcus", None),
    ("Daniel", ["AC", "Tv", "Mixer", None]),
    ("James", ["Refrigerator", None]),
    ("Antony", ["AC", "Mixer", "Washing Machine", "Tv"]),
]

schema = ["Name", "Appliances"]

In [0]:
# Create the data for the map DataFrame
df_map_values = [ 
    ("Sterling", {"TV": "Sony", "Refrigerator": "LG", "Mixer": "Butterfly"}),
    ("Andreas", {"AC": "Bluestar", "TV": ""}),
    ("Ramos", {"Refrigerator": "LG", "AC": "Voltas"}),
    ("Brad", {"Mixer": "Preethi", "Grinder": "Butterfly", "TV": "Samsung"}),
    ("Shelby", None),
]

Schema = ["Name", "Appliances"]

In [0]:
df_arr = spark.createDataFrame(df_array, schema)
df_map = spark.createDataFrame(df_map_values, Schema)

In [0]:
df_arr.display()
df_map.display()

Name,Appliances
Parker,"List(Tv, Refrigerator, Oven, AC)"
Marcus,
Daniel,"List(AC, Tv, Mixer, null)"
James,"List(Refrigerator, null)"
Antony,"List(AC, Mixer, Washing Machine, Tv)"


Name,Appliances
Sterling,"Map(Refrigerator -> LG, TV -> Sony, Mixer -> Butterfly)"
Andreas,"Map(TV -> , AC -> Bluestar)"
Ramos,"Map(Refrigerator -> LG, AC -> Voltas)"
Brad,"Map(TV -> Samsung, Mixer -> Preethi, Grinder -> Butterfly)"
Shelby,


In [0]:
df_arr.printSchema()
df_map.printSchema()

root
 |-- Name: string (nullable = true)
 |-- Appliances: array (nullable = true)
 |    |-- element: string (containsNull = true)

root
 |-- Name: string (nullable = true)
 |-- Appliances: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)



### *explode()*
The PySpark function explode(Column) turns an array or map column into rows. When an array is passed, it creates a new column, named "col" by default, that holds one array element per row. When a map is passed, it creates two new columns, "key" and "value", and each map entry becomes its own row. Rows whose array or map is null or empty produce no output rows, while null elements inside an array are kept.

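The row-generation rule can be modelled in plain Python. This is only a sketch of the semantics described above, not how Spark implements explode; the helper name `explode_rows` and the sample rows are hypothetical.

```python
def explode_rows(rows):
    """Model explode(): one output row per array element or map entry."""
    out = []
    for name, coll in rows:
        if not coll:
            # None, [] and {} all produce zero output rows
            continue
        if isinstance(coll, dict):
            # Map input: one row per entry, with key and value columns
            out.extend((name, key, value) for key, value in coll.items())
        else:
            # Array input: one row per element; None elements are kept
            out.extend((name, item) for item in coll)
    return out


arr_rows = [("Marcus", None), ("James", ["Refrigerator", None])]
map_rows = [("Ramos", {"Refrigerator": "LG", "AC": "Voltas"}), ("Shelby", None)]

print(explode_rows(arr_rows))  # Marcus is dropped, James keeps his None element
print(explode_rows(map_rows))  # Shelby is dropped
```

Note that a Python dict preserves insertion order, whereas the entry order Spark shows for a map column may differ.
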
In [0]:
from pyspark.sql.functions import explode, explode_outer, posexplode, posexplode_outer

In [0]:
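# Array column: one row per element, with the generated column aliased as "List"
df_arr.select("Name", explode("Appliances").alias("List")).display()
# Map column: one row per entry, in the default "key" and "value" columns
df_map.select("name", explode("appliances")).display()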

Name,List
Parker,Tv
Parker,Refrigerator
Parker,Oven
Parker,AC
Daniel,AC
Daniel,Tv
Daniel,Mixer
Daniel,
James,Refrigerator
James,


name,key,value
Sterling,Refrigerator,LG
Sterling,TV,Sony
Sterling,Mixer,Butterfly
Andreas,TV,
Andreas,AC,Bluestar
Ramos,Refrigerator,LG
Ramos,AC,Voltas
Brad,TV,Samsung
Brad,Mixer,Preethi
Brad,Grinder,Butterfly


### *explode_outer()*
The PySpark SQL function explode_outer(e: Column) also creates a row for each element in an array or map column. Unlike explode, if the array or map is null or empty, explode_outer keeps the row and returns null in the generated column(s) instead of dropping it.

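The difference from explode can be modelled in a few lines of plain Python. This is a sketch of the semantics for array input only, not Spark's implementation; `explode_outer_rows` is a hypothetical helper name.

```python
def explode_outer_rows(rows):
    """Model explode_outer() for arrays: never lose an input row."""
    out = []
    for name, items in rows:
        if not items:
            # Null or empty array: keep the row with a None placeholder
            out.append((name, None))
        else:
            # Otherwise behave like explode: one row per element
            out.extend((name, item) for item in items)
    return out


sample = [("Parker", ["Tv", "Oven"]), ("Marcus", None), ("Empty", [])]
result = explode_outer_rows(sample)
print(result)
```

Every input name appears in the result at least once, which is what makes the outer variant useful when input rows must not be lost.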

In [0]:
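# Array column: rows with a null or empty array are kept, with null in "col"
df_arr.select("Name", explode_outer("Appliances")).display()
# Map column: rows with a null or empty map are kept, with null in "key" and "value"
df_map.select("name", explode_outer("appliances")).display()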

Name,col
Parker,Tv
Parker,Refrigerator
Parker,Oven
Parker,AC
Marcus,
Daniel,AC
Daniel,Tv
Daniel,Mixer
Daniel,
James,Refrigerator


name,key,value
Sterling,Refrigerator,LG
Sterling,TV,Sony
Sterling,Mixer,Butterfly
Andreas,TV,
Andreas,AC,Bluestar
Ramos,Refrigerator,LG
Ramos,AC,Voltas
Brad,TV,Samsung
Brad,Mixer,Preethi
Brad,Grinder,Butterfly


### *posexplode()*
posexplode(e: Column) creates a row for each element in the array and adds two columns: "pos", which holds the zero-based position of the element, and "col", which holds the element itself. When the input column is a map, posexplode creates three columns: "pos" for the position of the map entry, plus "key" and "value". Like explode, it skips rows whose array or map is null or empty; null elements inside an array are kept.

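The position column behaves much like Python's `enumerate`. The sketch below models that behaviour in plain Python for both arrays and maps; it is not Spark's implementation, and `posexplode_model` is a hypothetical name.

```python
def posexplode_model(rows):
    """Model posexplode(): like explode, plus a zero-based position column."""
    out = []
    for name, coll in rows:
        if not coll:
            # Null or empty collections produce no rows
            continue
        if isinstance(coll, dict):
            # Map input: (name, pos, key, value)
            out.extend((name, pos, k, v) for pos, (k, v) in enumerate(coll.items()))
        else:
            # Array input: (name, pos, col); None elements keep their position
            out.extend((name, pos, item) for pos, item in enumerate(coll))
    return out


arr_out = posexplode_model([("James", ["Refrigerator", None]), ("Marcus", None)])
map_out = posexplode_model([("Andreas", {"AC": "Bluestar", "TV": ""})])
print(arr_out)
print(map_out)
```
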
In [0]:
# Array column: adds "pos" and "col"
df_arr.select("Name", posexplode("appliances")).display()
# Map column: adds "pos", "key" and "value"
df_map.select("name", posexplode("appliances")).display()

Name,pos,col
Parker,0,Tv
Parker,1,Refrigerator
Parker,2,Oven
Parker,3,AC
Daniel,0,AC
Daniel,1,Tv
Daniel,2,Mixer
Daniel,3,
James,0,Refrigerator
James,1,


name,pos,key,value
Sterling,0,Refrigerator,LG
Sterling,1,TV,Sony
Sterling,2,Mixer,Butterfly
Andreas,0,TV,
Andreas,1,AC,Bluestar
Ramos,0,Refrigerator,LG
Ramos,1,AC,Voltas
Brad,0,TV,Samsung
Brad,1,Mixer,Preethi
Brad,2,Grinder,Butterfly


### *posexplode_outer()*
posexplode_outer(e: Column) creates a row for each element in the array and adds two columns: "pos" for the position of the element and "col" for the element itself. Unlike posexplode, if the array or map is null or empty, posexplode_outer keeps the row and returns null for both "pos" and "col". For a null or empty map, it likewise returns a single row with null in "pos", "key", and "value".

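The inner and outer variants can be compared side by side with a small plain-Python model. This is a sketch for array input only, not Spark's implementation; the function name and the `outer` flag are hypothetical.

```python
def posexplode_rows(rows, outer=False):
    """Model posexplode (outer=False) and posexplode_outer (outer=True)."""
    out = []
    for name, items in rows:
        if items:
            # One row per element, with its zero-based position
            out.extend((name, pos, item) for pos, item in enumerate(items))
        elif outer:
            # Null or empty array: keep the row with None for pos and col
            out.append((name, None, None))
    return out


data = [("Marcus", None), ("James", ["Refrigerator", None])]
inner_rows = posexplode_rows(data)
outer_rows = posexplode_rows(data, outer=True)
print(inner_rows)  # Marcus is missing
print(outer_rows)  # Marcus appears once, with None in both generated columns
```
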
In [0]:
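# Array column: rows with a null or empty array are kept, with null in "pos" and "col"
df_arr.select("Name", posexplode_outer("Appliances")).display()
# Map column: rows with a null or empty map are kept, with null in "pos", "key" and "value"
df_map.select("name", posexplode_outer("appliances")).display()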

Name,pos,col
Parker,0.0,Tv
Parker,1.0,Refrigerator
Parker,2.0,Oven
Parker,3.0,AC
Marcus,,
Daniel,0.0,AC
Daniel,1.0,Tv
Daniel,2.0,Mixer
Daniel,3.0,
James,0.0,Refrigerator


name,pos,key,value
Sterling,0.0,Refrigerator,LG
Sterling,1.0,TV,Sony
Sterling,2.0,Mixer,Butterfly
Andreas,0.0,TV,
Andreas,1.0,AC,Bluestar
Ramos,0.0,Refrigerator,LG
Ramos,1.0,AC,Voltas
Brad,0.0,TV,Samsung
Brad,1.0,Mixer,Preethi
Brad,2.0,Grinder,Butterfly
