<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 400px">
</div>

# Complex Types

##### Methods
- DataFrame (<a href="https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframe#pyspark.sql.DataFrame" target="_blank">Python</a>/<a href="http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html" target="_blank">Scala</a>): `unionByName`
- Built-In Functions (<a href="https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=functions#module-pyspark.sql.functions" target="_blank">Python</a>/<a href="http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html" target="_blank">Scala</a>):
  - Collection: `explode`, `array_contains`, `element_at`, `collect_set`
  - String: `split`

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) User Purchases
List all size and quality options purchased by each buyer.
1. Extract item details from purchases
2. Extract size and quality options from mattress purchases
3. Extract size and quality options from pillow purchases
4. Combine data for mattresses and pillows
5. List all size and quality options bought by each user

In [0]:
%run ./Includes/Classroom-Setup

In [0]:
df = spark.read.parquet(salesPath)
display(df)

### 1. Extract item details from purchases
- Explode the **`items`** field in **`df`**, keeping the column name **`items`**
- Select the **`email`** and **`items.item_name`** fields
- Split **`item_name`** on spaces into an array of words and add it as a column named **`details`**

Assign the resulting DataFrame to **`detailsDF`**.

In [0]:
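from pyspark.sql.functions import array_contains, col, collect_set, element_at, explode, split

# One row per purchased item, then split each item name into an array of words.
detailsDF = (df.withColumn("items", explode("items"))
  .select("email", "items.item_name")
  .withColumn("details", split(col("item_name"), " "))
)
display(detailsDF)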
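
As a rough illustration of what the cell above does to each row, here is a minimal plain-Python sketch. It does not use Spark. The sample rows and the helper `explode_items` are hypothetical, since the actual contents of **`df`** are not shown here.

```python
# Hypothetical sample rows; the real schema and values of `df` may differ.
rows = [
    {"email": "a@example.com",
     "items": [{"item_name": "Premium Queen Mattress"},
               {"item_name": "King Foam Pillow"}]},
    {"email": "b@example.com",
     "items": [{"item_name": "Standard Twin Mattress"}]},
]

def explode_items(rows):
    """Yield one output row per element of `items`, mimicking explode."""
    for row in rows:
        for item in row["items"]:
            yield {"email": row["email"], "item_name": item["item_name"]}

# Mimic split(col("item_name"), " "): add a `details` list of words.
details = [
    {**row, "details": row["item_name"].split(" ")}
    for row in explode_items(rows)
]

for row in details:
    print(row)
```

A row whose `items` list is empty produces no output rows in this sketch, which matches how `explode` drops rows with empty arrays.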

### 2. Extract size and quality options from mattress purchases
- Filter **`detailsDF`** for records where **`details`** contains "Mattress"
- Add a **`size`** column by extracting the element at position 2 of **`details`** (positions are 1-based)
- Add a **`quality`** column by extracting the element at position 1 of **`details`**

Save result as **`mattressDF`**.

In [0]:
# Mattress names put quality first and size second (element_at is 1-based).
mattressDF = (detailsDF.filter(array_contains(col("details"), "Mattress"))
  .withColumn("size", element_at(col("details"), 2))
  .withColumn("quality", element_at(col("details"), 1))
)
display(mattressDF)
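
The positions passed to `element_at` are 1-based, which is easy to confuse with Python's 0-based list indexing. The sketch below is a plain-Python analogue of the array behavior, not Spark's implementation. It assumes the non-ANSI behavior where an out-of-range position yields null (with `spark.sql.ansi.enabled` set to true, Spark raises an error instead), and the sample array is hypothetical.

```python
def element_at(arr, index):
    """Plain-Python analogue of Spark's element_at for arrays (non-ANSI mode)."""
    if index == 0:
        # Spark also rejects position 0, because positions start at 1.
        raise ValueError("index 0 is invalid; positions start at 1")
    if abs(index) > len(arr):
        # Assumption: non-ANSI mode, where out-of-range positions give null.
        return None
    # Positive positions count from the start, negative ones from the end.
    return arr[index - 1] if index > 0 else arr[index]

details = ["Premium", "Queen", "Mattress"]  # hypothetical item name, split on spaces
quality = element_at(details, 1)
size = element_at(details, 2)
print(quality, size)
```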

### 3. Extract size and quality options from pillow purchases
- Filter **`detailsDF`** for records where **`details`** contains "Pillow"
- Add a **`size`** column by extracting the element at position 1 of **`details`**
- Add a **`quality`** column by extracting the element at position 2 of **`details`**

Note that the positions of **`size`** and **`quality`** are switched between mattresses and pillows.

Save result as **`pillowDF`**.

In [0]:
# Pillow names put size first and quality second, the reverse of mattresses.
pillowDF = (detailsDF.filter(array_contains(col("details"), "Pillow"))
  .withColumn("size", element_at(col("details"), 1))
  .withColumn("quality", element_at(col("details"), 2))
)
display(pillowDF)
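
To make the switched positions concrete, here is a small plain-Python sketch that handles both product types in one place. The function `extract_options` and the sample word lists are hypothetical, and the indexes are Python's 0-based ones, so position 1 in `element_at` corresponds to index 0 here.

```python
def extract_options(details):
    """Return (size, quality) for a split item name, or None if unrecognized."""
    if "Mattress" in details:
        # Mattress: quality is the first word, size the second.
        return details[1], details[0]
    if "Pillow" in details:
        # Pillow: size is the first word, quality the second.
        return details[0], details[1]
    return None

mattress = extract_options(["Premium", "Queen", "Mattress"])
pillow = extract_options(["King", "Foam", "Pillow"])
print(mattress, pillow)
```

Like `array_contains`, the `in` test on a list matches whole elements only, so a word such as "Pillowcase" would not count as "Pillow".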

### 4. Combine data for mattresses and pillows
- Perform a union on **`mattressDF`** and **`pillowDF`** by column names
- Drop **`details`** column

Save result as **`unionDF`**.

In [0]:
unionDF = (mattressDF.unionByName(pillowDF)
  .drop("details"))
display(unionDF)
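
In the cells above, both DataFrames add **`size`** before **`quality`**, so their columns already line up. Matching by name still guards against the case where the column order differs. The sketch below illustrates that difference in plain Python. The helper functions and sample rows are hypothetical and only mimic the idea of positional versus by-name unions.

```python
def union_by_position(cols_a, rows_a, cols_b, rows_b):
    """Append rows as-is and keep the left side's column names."""
    return cols_a, rows_a + rows_b

def union_by_name(cols_a, rows_a, cols_b, rows_b):
    """Reorder the right side's values to the left side's column order first."""
    if sorted(cols_a) != sorted(cols_b):
        raise ValueError("both sides must have the same column names")
    order = [cols_b.index(name) for name in cols_a]
    reordered = [tuple(row[i] for i in order) for row in rows_b]
    return cols_a, rows_a + reordered

# Hypothetical data where the right side lists quality before size.
cols_a = ["email", "size", "quality"]
rows_a = [("a@example.com", "Queen", "Premium")]
cols_b = ["email", "quality", "size"]
rows_b = [("b@example.com", "Foam", "King")]

_, by_position = union_by_position(cols_a, rows_a, cols_b, rows_b)
_, by_name = union_by_name(cols_a, rows_a, cols_b, rows_b)
print(by_position[1])  # quality value lands in the size column
print(by_name[1])      # values follow the left side's column order
```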

### 5. List all size and quality options bought by each user
- Group rows in **`unionDF`** by **`email`**
  - Collect the set of distinct **`size`** values for each user, aliased as "size options"
  - Collect the set of distinct **`quality`** values for each user, aliased as "quality options"

Save result as **`optionsDF`**.

In [0]:
optionsDF = (unionDF.groupBy("email")
  .agg(collect_set("size").alias("size options"),
       collect_set("quality").alias("quality options"))
)
display(optionsDF)
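
The grouping step can be pictured as building one set per user for each column. This is a minimal plain-Python sketch of that idea, with hypothetical sample rows. As with `collect_set`, duplicates collapse and the order of elements within each set is not guaranteed.

```python
from collections import defaultdict

# Hypothetical (email, size, quality) rows standing in for unionDF.
union_rows = [
    ("a@example.com", "Queen", "Premium"),
    ("a@example.com", "King", "Foam"),
    ("a@example.com", "Queen", "Standard"),
    ("b@example.com", "Twin", "Standard"),
]

size_options = defaultdict(set)
quality_options = defaultdict(set)

# Group by email and collect distinct values, mimicking groupBy + collect_set.
for email, size, quality in union_rows:
    size_options[email].add(size)
    quality_options[email].add(quality)

for email in size_options:
    print(email, sorted(size_options[email]), sorted(quality_options[email]))
```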

### Clean up classroom

In [0]:
%run ./Includes/Classroom-Cleanup
