In [0]:
=> Use PySpark withColumnRenamed() to rename a DataFrame column. We often need to rename one, several, or all columns of a PySpark DataFrame, and there are several ways to do this.
=> Renaming becomes more complicated when columns are nested.
=> Since DataFrames are an **immutable collection**, you can't rename or update a column in place.
=> Instead, withColumnRenamed() creates a new DataFrame with the updated column names.

In [0]:
# UseCase1. PySpark withColumnRenamed – To rename a DataFrame column

PySpark has a withColumnRenamed() function on DataFrame to change a column name. This is the most straightforward approach: the function takes two parameters, the existing column name and the new column name you want.

# PySpark withColumnRenamed() Syntax:

**withColumnRenamed(existingName, newName)**

#output/result:-

Returns a new DataFrame with a column renamed.

#note:-

The withColumnRenamed() function returns a new DataFrame and doesn't modify the current DataFrame.

In [0]:

# UseCase2. PySpark withColumnRenamed – To rename multiple columns
To change multiple column names, chain withColumnRenamed() calls.
#code:- Renaming columns
df2 = df.withColumnRenamed("dob", "DateOfBirth") \
        .withColumnRenamed("salary", "salary_amount")

In [0]:
# 3. Using PySpark StructType – To rename a nested column in a DataFrame
Changing a column name in nested data is not straightforward. One way is to define a new StructType schema with the new field names and apply it to the struct column with the cast function.



In [0]:
# 4. Using Select – To rename nested elements
Let's see another way to change nested columns: flattening the structure.
Here, use the col function to select each field of the struct column, give it a new name with the alias method, then print the schema.

In [0]:
# UseCase5. Using PySpark DataFrame withColumn – To rename nested columns
When you have nested columns in a PySpark DataFrame and want to rename them, use withColumn on the DataFrame to create a new column from the existing nested field, then drop the existing column. The example below creates an “fname” column from “name.firstname” and drops the “name” column.

In [0]:
# 6. Using col() function – To Dynamically rename all or multiple columns

In [0]:
# Note
# In use case 5 (renaming nested columns)

Create a new DataFrame with df.withColumn("new_column_name", col("existing_col_name")), then drop the existing column.

# In use case 4 (using select to rename nested elements)
df.select(col("name.firstname").alias("fname")).printSchema()
This flattens the structure: use the col function to select a field of the struct column, rename it with the alias method, then print the schema.

# In use case 6 (using the col() function to dynamically rename all or multiple columns)
Create newColumns as a list of col("existing_col_name").alias("new_col_name") expressions,
then create the new DataFrame as df6 = df.select(*newColumns),
then print its schema with df6.printSchema().
Each element of the newColumns list is a column expression created with the col() function to reference a specific column of the DataFrame, with a new name assigned by the alias() method. This lets you rename columns while selecting them or creating new ones.

In [0]:
# 7. Using toDF() – To change all columns in a PySpark DataFrame

In [0]:
# Example Data
dataDF = [
    (('James', '', 'Smith'), '1991-04-01', 'M', 3000),
    (('Michael', 'Rose', ''), '2000-05-19', 'M', 4000),
    (('Robert', '', 'Williams'), '1978-09-05', 'M', 4000),
    (('Maria', 'Anne', 'Jones'), '1967-12-01', 'F', 4000),
    (('Jen', 'Mary', 'Brown'), '1980-02-17', 'F', -1)
]

# Define the schema with nested structure
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
schema = StructType([
    StructField('name', StructType([
        StructField('firstname', StringType(), True),
        StructField('middlename', StringType(), True),
        StructField('lastname', StringType(), True)
    ])),
    StructField('dob', StringType(), True),
    StructField('gender', StringType(), True),
    StructField('salary', IntegerType(), True)
])

# Create a Spark session and DataFrame
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
df = spark.createDataFrame(data=dataDF, schema=schema)
df.printSchema()

# 1. PySpark withColumnRenamed – To rename a DataFrame column name
df.withColumnRenamed("dob", "DateOfBirth").printSchema()

# 2. PySpark withColumnRenamed – To rename multiple columns
df2 = df.withColumnRenamed("dob", "DateOfBirth") \
    .withColumnRenamed("salary", "salary_amount")
df2.printSchema()

# 3. Using PySpark StructType – To rename a nested column in DataFrame
from pyspark.sql.functions import col
schema2 = StructType([
    StructField("fname", StringType(), True),
    StructField("middlename", StringType(), True),
    StructField("lname", StringType(), True)
])

df.select(col("name").cast(schema2), col("dob"), col("gender"), col("salary")).printSchema()

# 4. Using Select – To rename nested elements
df.select(col("name.firstname").alias("fname"), 
          col("name.middlename").alias("mname"), 
          col("name.lastname").alias("lname"), 
          col("dob"), col("gender"), col("salary")).printSchema()

# 5. Using PySpark DataFrame withColumn – To rename nested columns
df4 = df.withColumn("fname", col("name.firstname")) \
        .withColumn("mname", col("name.middlename")) \
        .withColumn("lname", col("name.lastname")) \
        .drop("name")
df4.printSchema()

# 6. Using col() function – To Dynamically rename all or multiple columns
newColumns = [col("name.firstname").alias("fname"),
              col("name.middlename").alias("mname"),
              col("name.lastname").alias("lname"),
              col("dob").alias("DateOfBirth"),
              col("gender").alias("sex"),
              col("salary").alias("income")]
df6 = df.select(*newColumns)
df6.printSchema()

# 7. Using toDF() – To change all columns in a PySpark DataFrame
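newColumns = ["newCol1", "newCol2", "newCol3", "newCol4"]
# toDF() takes each name as a separate argument, so unpack the list with *;
# passing the list itself raises an error
df.toDF(*newColumns).printSchema()
# df itself is unchanged: show() still displays the original column names
df.show()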

root
 |-- name: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- middlename: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |-- dob: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- salary: integer (nullable = true)

root
 |-- name: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- middlename: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |-- DateOfBirth: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- salary: integer (nullable = true)

root
 |-- name: struct (nullable = true)
 |    |-- firstname: string (nullable = true)
 |    |-- middlename: string (nullable = true)
 |    |-- lastname: string (nullable = true)
 |-- DateOfBirth: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- salary_amount: integer (nullable = true)

root
 |-- name: struct (nullable = true)
 |    |-- fname: string (nullable = true)
 |    |-- middlena