### Details

In my experience, data stored in Spark makes heavy use of map and array columns.  This short script walks through examples of handling columns in those formats.

#### Import libraries and data

In [0]:
import pandas as pd
import pyspark.sql.functions as F

from pyspark.sql.window import Window
from plotnine import * 

In [0]:
# Keep only the Wyoming rows and the columns used below
wy_patterns = spark.table("monthly_all.monthly_patterns")\
  .filter(F.col("region") == "wy")\
  .select('placekey', 'location_name', 'city', 'date_range_start',
          'raw_visitor_counts', 'visits_by_day', 'related_same_day_brand')

### Exploding data

#### Exploding an array

`explode()` returns one row for each element of the array, in a column named `col` by default.

In [0]:
display(wy_patterns.select('placekey','date_range_start', F.explode('visits_by_day')))

placekey,date_range_start,col
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,2
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,1
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,1
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,1
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0
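
As a mental model for the output above, here is a minimal plain-Python sketch of what `explode()` does to each row. It is not Spark code: `explode_rows` and the toy rows are hypothetical names and data made up for illustration, and the sketch assumes that rows with an empty or missing array produce no output, which is how `explode()` behaves (as opposed to `explode_outer()`).

```python
def explode_rows(rows, array_field):
    """Yield one output row per array element, mimicking explode()."""
    for row in rows:
        # Copy every field except the array itself.
        base = {k: v for k, v in row.items() if k != array_field}
        # Rows whose array is missing or empty contribute nothing.
        for value in row.get(array_field) or []:
            yield {**base, "col": value}


# Hypothetical toy rows, not taken from the real table.
rows = [
    {"placekey": "a", "visits_by_day": [2, 0, 1]},
    {"placekey": "b", "visits_by_day": []},
]
exploded = list(explode_rows(rows, "visits_by_day"))
for out in exploded:
    print(out)
```

The non-array fields are repeated on every output row, which is why `placekey` and `date_range_start` appear once per day in the table above.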


#### Using `posexplode()`

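With `posexplode()`, two columns are returned: `pos` (the zero-based position of the element in the array) and `col` (the element itself).  Notice that I add 1 to `pos` to get the day of the month and rename the columns so that they are more descriptive of the exploded values.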

In [0]:
bydays = wy_patterns.select('placekey','date_range_start', F.posexplode('visits_by_day'))\
  .withColumn('dayofmonth', F.col('pos') + 1 )\
  .drop('pos')\
  .withColumnRenamed('col', 'visits_day')
display(bydays)

placekey,date_range_start,visits_day,dayofmonth
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,2,1
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0,2
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,1,3
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,1,4
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0,5
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0,6
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0,7
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0,8
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,1,9
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,0,10
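
The position can also be turned into a full calendar date rather than a day number. The plain-Python sketch below illustrates the arithmetic only; `visits_by_date` is a hypothetical helper, and it assumes that `date_range_start` marks the first day covered by the array and that each array entry covers one consecutive day.

```python
from datetime import datetime, timedelta


def visits_by_date(date_range_start, visits_by_day):
    """Pair each daily count with its calendar date."""
    # fromisoformat() accepts offsets such as -07:00; keep only the local date.
    start = datetime.fromisoformat(date_range_start).date()
    # enumerate() plays the role of posexplode(): pos is zero-based.
    return [
        (start + timedelta(days=pos), visits)
        for pos, visits in enumerate(visits_by_day)
    ]


# Start value copied from the output above; the counts are its first four days.
daily = visits_by_date("2020-02-01T00:00:00-07:00", [2, 0, 1, 1])
for day, visits in daily:
    print(day.isoformat(), visits)
```

Because `pos` starts at 0, adding it directly to the start date gives the right day without the `+ 1` used for `dayofmonth`.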


#### Exploding a map column

The previous columns were arrays.  A few of our columns are maps (dictionaries) of key/value pairs.  When we use `explode()` on a map, we get one row per entry, with the columns `key` and `value`.

In [0]:
display(wy_patterns.select('placekey','date_range_start', F.explode('related_same_day_brand')))

placekey,date_range_start,key,value
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Smith's Food & Drug Stores,12
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Chevron,7
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Maverik,5
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Orvis,2
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Loaf 'N Jug,2
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,NAPA Auto Parts,2
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,T.J. Maxx,2
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Starbucks,2
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Wendy's,2
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Staples,2


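We could use `posexplode()` to get a rank column.  On a map it returns three columns (`pos`, `key`, and `value`), all of which are worth fixing and renaming.  Note that `pos` is only the position of the entry within the map, so it works as a rank only when the map is stored in descending order of count, as it appears to be here.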

In [0]:
wy_patterns.select('placekey','date_range_start', F.posexplode('related_same_day_brand'))\
  .withColumnRenamed('key', 'same_day_brands')\
  .withColumnRenamed('value', 'count_same_day_brands')\
  .withColumn('rank', F.col('pos') + 1)\
  .drop('pos')\
  .display()

placekey,date_range_start,same_day_brands,count_same_day_brands,rank
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Smith's Food & Drug Stores,12,1
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Chevron,7,2
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Maverik,5,3
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Orvis,2,4
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Loaf 'N Jug,2,5
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,NAPA Auto Parts,2,6
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,T.J. Maxx,2,7
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Starbucks,2,8
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Wendy's,2,9
zzy-222@5qp-t5g-99f,2020-02-01T00:00:00-07:00,Staples,2,10
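
If the map's stored order cannot be trusted, the rank can be computed from the counts instead of from the position. The sketch below shows the idea in plain Python; `rank_brands` is a hypothetical helper and the input is a deliberately scrambled subset of the brands above. In Spark, the same idea could be expressed with a window function such as `F.row_number()` over a `Window` partitioned by place and month and ordered by the count, though that variant is not shown or tested here.

```python
def rank_brands(brand_counts):
    """Return (brand, count, rank) tuples, highest count first."""
    # sorted() is stable, so brands with equal counts keep their map order.
    ordered = sorted(brand_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(brand, count, pos + 1) for pos, (brand, count) in enumerate(ordered)]


# Hypothetical input: a few brands from the output above, in scrambled order.
scrambled = {
    "Orvis": 2,
    "Chevron": 7,
    "Smith's Food & Drug Stores": 12,
    "Maverik": 5,
}
ranked = rank_brands(scrambled)
for brand, count, rank in ranked:
    print(rank, brand, count)
```

Ties get consecutive ranks here, matching what `posexplode()` produced above for the brands with a count of 2.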
