### 1. **Initializing Spark Session**:


In [1]:
!pip install pyspark

Collecting pyspark
  Downloading pyspark-3.5.1.tar.gz (317.0 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.5.1-py2.py3-none-any.whl size=317488491 sha256=2d8859aa10edd1fb6bf41d0691c76d30c505bc21cb2ab3e0fa02166724cd69e2
  Stored in directory: /root/.cache/pip/wheels/80/1d/60/2c256ed38dddce2fdd93be545214a63e02fbd8d74fb0b7f3a6
Successfully built pyspark
Installing collected packages: pyspark
Successfully installed pyspark-3.5.1


In [2]:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.functions import col, lit, create_map

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

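- Imports `SparkSession`, the schema types, and the column functions used later (`col`, `lit`, `create_map`).
- Initializes a Spark session with the application name 'SparkByExamples.com'; `getOrCreate()` reuses an existing session if one is already running.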


### 2. **Defining Sample Data and Schema**:


In [3]:
data = [
    ("36636", "Finance", 3000, "USA"),
    ("40288", "Finance", 5000, "IND"),
    ("42114", "Sales", 3900, "USA"),
    ("39192", "Marketing", 2500, "CAN"),
    ("34534", "Sales", 6500, "USA")
]

schema = StructType([
    StructField('id', StringType(), True),
    StructField('dept', StringType(), True),
    StructField('salary', IntegerType(), True),
    StructField('location', StringType(), True)
])

- Defines sample data as a list of tuples, where each tuple represents a row in the DataFrame.
- Defines a schema with four nullable fields: `id`, `dept`, and `location` as strings, and `salary` as an integer.


### 3. **Creating DataFrame**:


In [4]:
df = spark.createDataFrame(data=data, schema=schema)
df.printSchema()
df.show(truncate=False)

root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- salary: integer (nullable = true)
 |-- location: string (nullable = true)

+-----+---------+------+--------+
|id   |dept     |salary|location|
+-----+---------+------+--------+
|36636|Finance  |3000  |USA     |
|40288|Finance  |5000  |IND     |
|42114|Sales    |3900  |USA     |
|39192|Marketing|2500  |CAN     |
|34534|Sales    |6500  |USA     |
+-----+---------+------+--------+



- Creates a DataFrame from the sample data and schema.
- Prints the schema of the DataFrame.
- Displays the content of the DataFrame without truncating the output.


### 4. **Converting Columns to a Map and Dropping Original Columns**:


In [5]:
df = df.withColumn("propertiesMap", create_map(
    lit("salary"), col("salary"),
    lit("location"), col("location")
)).drop("salary", "location")

- Uses `withColumn` and `create_map` to add a `propertiesMap` column whose keys are `"salary"` and `"location"` and whose values come from the corresponding columns.
- Uses `lit` to create literal expressions for the map keys.
- Uses `col` to reference the `salary` and `location` columns as the map values. Because all values in a map share one type, Spark casts the integer `salary` to a string.
- Drops the original `salary` and `location` columns using `drop`.


### 5. **Displaying the Updated DataFrame**:


In [6]:
df.printSchema()
df.show(truncate=False)

root
 |-- id: string (nullable = true)
 |-- dept: string (nullable = true)
 |-- propertiesMap: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

+-----+---------+---------------------------------+
|id   |dept     |propertiesMap                    |
+-----+---------+---------------------------------+
|36636|Finance  |{salary -> 3000, location -> USA}|
|40288|Finance  |{salary -> 5000, location -> IND}|
|42114|Sales    |{salary -> 3900, location -> USA}|
|39192|Marketing|{salary -> 2500, location -> CAN}|
|34534|Sales    |{salary -> 6500, location -> USA}|
+-----+---------+---------------------------------+



- Prints the schema of the updated DataFrame: `propertiesMap` is a map with string keys and string values, and `salary` and `location` are no longer top-level columns.
- Displays the content of the updated DataFrame without truncating the output.


### Key Points

- **Creating DataFrame**: Demonstrates how to create a DataFrame with a given schema and data.
- **Converting Columns to Map**: Shows how to convert specific columns to a map type and drop the original columns; mixed value types are stored as a single common type (here, string).
- **Using Functions**: Utilizes `col`, `lit`, and `create_map` functions from PySpark to manipulate the DataFrame.
