### 1. **Initializing Spark Session**:

In [2]:
!pip install pyspark

Collecting pyspark
  Downloading pyspark-3.5.1.tar.gz (317.0 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.5.1-py2.py3-none-any.whl size=317488491 sha256=f8e38230cf9f05f1df008d437f729115fe9e4bf1d6b7a55edc40bba4d6522493
  Stored in directory: /root/.cache/pip/wheels/80/1d/60/2c256ed38dddce2fdd93be545214a63e02fbd8d74fb0b7f3a6
Successfully built pyspark
Installing collected packages: pyspark
Successfully installed pyspark-3.5.1


In [10]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode,map_keys,col
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

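   - Installs PySpark and imports `SparkSession` along with the column functions used in later steps (`explode`, `map_keys`, `col`).
   - Initializes a Spark session with the application name 'SparkByExamples.com', reusing an existing session if one is already running.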


### 2. **Defining Sample Data**:


In [4]:
dataDictionary = [
    ('James', {'hair': 'black', 'eye': 'brown'}),
    ('Michael', {'hair': 'brown', 'eye': None}),
    ('Robert', {'hair': 'red', 'eye': 'black'}),
    ('Washington', {'hair': 'grey', 'eye': 'grey'}),
    ('Jefferson', {'hair': 'brown', 'eye': ''})
]

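   - Defines sample data as a list of tuples, each holding a name and a Python dictionary of properties (`hair`, `eye`).
   - Includes a `None` value and an empty string so that both cases are visible in the later output.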


### 3. **Creating DataFrame**:


In [5]:
df = spark.createDataFrame(data=dataDictionary, schema=['name', 'properties'])
df.printSchema()
df.show(truncate=False)

root
 |-- name: string (nullable = true)
 |-- properties: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+----------+-----------------------------+
|name      |properties                   |
+----------+-----------------------------+
|James     |{eye -> brown, hair -> black}|
|Michael   |{eye -> NULL, hair -> brown} |
|Robert    |{eye -> black, hair -> red}  |
|Washington|{eye -> grey, hair -> grey}  |
|Jefferson |{eye -> , hair -> brown}     |
+----------+-----------------------------+



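   - Creates a DataFrame from the sample data. Only the column names are supplied, so Spark infers the types; the Python dictionaries become a `map` column with string keys and string values.
   - Prints the schema and shows the DataFrame content, with `truncate=False` so the map values are displayed in full.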


### 4. **Converting RDD to DataFrame with Separate Columns**:


In [6]:
df3 = df.rdd.map(lambda x: (x.name, x.properties["hair"], x.properties["eye"])) \
            .toDF(["name", "hair", "eye"])
df3.printSchema()
df3.show()

root
 |-- name: string (nullable = true)
 |-- hair: string (nullable = true)
 |-- eye: string (nullable = true)

+----------+-----+-----+
|      name| hair|  eye|
+----------+-----+-----+
|     James|black|brown|
|   Michael|brown| NULL|
|    Robert|  red|black|
|Washington| grey| grey|
| Jefferson|brown|     |
+----------+-----+-----+



- Converts the DataFrame to an RDD and then maps each row to extract `name`, `hair`, and `eye`.
- Converts the RDD back to a DataFrame with separate columns for `name`, `hair`, and `eye`.
- Prints the schema and shows the new DataFrame content.
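
Inside the lambda, `x.properties` is an ordinary Python dictionary, so `x.properties["hair"]` raises a `KeyError` for any row whose map lacks that key. A minimal sketch of a more tolerant mapping function, assuming missing keys should become `None`; `flatten_row` and `FakeRow` are hypothetical names, and the `Adams` record is made up for illustration:

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark Row so the sketch runs without a cluster.
FakeRow = namedtuple("FakeRow", ["name", "properties"])


def flatten_row(row):
    """Return (name, hair, eye); keys absent from the map become None."""
    props = row.properties or {}  # tolerate a null map as well
    return (row.name, props.get("hair"), props.get("eye"))


print(flatten_row(FakeRow("Michael", {"hair": "brown", "eye": None})))
print(flatten_row(FakeRow("Adams", {"hair": "blond"})))  # no "eye" key
```

With the DataFrame above, the same function could be passed as `df.rdd.map(flatten_row).toDF(["name", "hair", "eye"])`.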


### 5. **Extracting Columns from Dictionary Using withColumn and getItem**:


In [7]:
df.withColumn("hair", df.properties.getItem("hair")) \
  .withColumn("eye", df.properties.getItem("eye")) \
  .drop("properties") \
  .show()

+----------+-----+-----+
|      name| hair|  eye|
+----------+-----+-----+
|     James|black|brown|
|   Michael|brown| NULL|
|    Robert|  red|black|
|Washington| grey| grey|
| Jefferson|brown|     |
+----------+-----+-----+



- Uses `withColumn` and `getItem` to create new columns `hair` and `eye` from the `properties` dictionary.
- Drops the original `properties` column.
- Shows the updated DataFrame content.


### 6. **Extracting Columns from Dictionary Using withColumn and Direct Indexing**:


In [8]:
df.withColumn("hair", df.properties["hair"]) \
  .withColumn("eye", df.properties["eye"]) \
  .drop("properties") \
  .show()

+----------+-----+-----+
|      name| hair|  eye|
+----------+-----+-----+
|     James|black|brown|
|   Michael|brown| NULL|
|    Robert|  red|black|
|Washington| grey| grey|
| Jefferson|brown|     |
+----------+-----+-----+



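   - Produces the same result as the previous step, but uses bracket indexing (`df.properties["hair"]`) instead of `getItem` to extract `hair` and `eye` from the `properties` map.
   - Drops the original `properties` column and shows the updated DataFrame content.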


### 7. **Extracting Keys from the Map and Creating Separate Columns**:


In [11]:
keysDF = df.select(explode(map_keys(df.properties))).distinct()
keysList = keysDF.rdd.map(lambda x: x[0]).collect()
keyCols = list(map(lambda x: col("properties").getItem(x).alias(str(x)), keysList))
df.select(df.name, *keyCols).show()

+----------+-----+-----+
|      name|  eye| hair|
+----------+-----+-----+
|     James|brown|black|
|   Michael| NULL|brown|
|    Robert|black|  red|
|Washington| grey| grey|
| Jefferson|     |brown|
+----------+-----+-----+



- Extracts the keys from the `properties` map using `map_keys` and `explode`.
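- Collects the distinct keys into a Python list on the driver. `distinct()` does not guarantee an order, which is why the columns appear here as `eye`, then `hair`; sort the list if a stable column order matters.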
- Creates new columns for each key in the `properties` map using `getItem` and `alias`.
- Selects the `name` column and the newly created columns, and shows the updated DataFrame content.
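
What this step computes can be modelled in plain Python, which may help when reasoning about the result. This is only an illustrative model, not Spark code: `rows`, `keys`, and `flattened` are names introduced here, and the keys are sorted (an added assumption) so the column order is deterministic.

```python
rows = [
    ("James", {"hair": "black", "eye": "brown"}),
    ("Michael", {"hair": "brown", "eye": None}),
    ("Jefferson", {"hair": "brown", "eye": ""}),
]

# Model of explode(map_keys(...)).distinct(): every key seen in any row.
keys = sorted({key for _, props in rows for key in props})

# Model of one getItem(key) column per discovered key; absent keys give None.
flattened = [(name, *(props.get(key) for key in keys)) for name, props in rows]

print(["name", *keys])
for record in flattened:
    print(record)
```

In the Spark version, wrapping the collected list in `sorted(keysList)` before building `keyCols` would give the same stable ordering.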


### Key Points

- **Creating DataFrame**: Builds a DataFrame from Python tuples; the dictionary in each tuple becomes a map column.
- **Converting RDD to DataFrame**: Maps over the DataFrame's underlying RDD to pull map values into a tuple, then converts the result back to a DataFrame with separate columns.
- **Extracting Columns from Dictionary**: Uses `withColumn` with either `getItem` or bracket indexing to extract individual values from the map column when the keys are known in advance.
- **Exploding Map Keys**: Discovers the keys at runtime with `explode` and `map_keys`, then creates one column per key.
