In [0]:
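from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import col, to_date
from pyspark.sql.types import StringType, StructField, StructType

# On Databricks `spark` already exists; getOrCreate() simply returns that session.
spark = SparkSession.builder.getOrCreate()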


### Function Explanation: `to_date_df`

The `to_date_df` function converts a string column in a PySpark DataFrame to `DateType`. It takes three arguments:

- `df`: The input DataFrame.
- `fmt`: The Spark datetime pattern (for example `"MM/dd/yyyy"`) that the strings in the column follow.
- `fld`: The name of the column to convert.

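The function uses PySpark's `to_date` function to parse the string column according to the specified pattern and returns a new DataFrame with the column converted to `DateType`. Because `withColumn` reuses the existing column name, the string column is replaced rather than added alongside it. Strings that do not match the pattern generally come back as `null`; with ANSI mode enabled, Spark raises an error instead.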

**Function Definition:**
<pre>
def to_date_df(df, fmt, fld):
    return df.withColumn(fld, to_date(col(fld), fmt))
</pre>
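Spark's pattern letters (`MM`, `dd`, `yyyy`) are not the same syntax as Python's `strptime` directives, but for this format they correspond one to one (`%m/%d/%Y`). The pure-Python sketch below is only an analogy, not Spark's implementation: it shows that `"MM/dd/yyyy"` reads `"04/05/2020"` as 5 April rather than 4 May, and that a string that does not fit the pattern yields no date, much like `to_date` returning `null` with ANSI mode off. The helper `parse_like_to_date` and its small pattern table are hypothetical and cover only the three letters used here.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical mapping from a few Spark pattern letters to strptime directives.
# Covers only the letters used in this notebook; it is not a general translator.
SPARK_TO_STRPTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}


def parse_like_to_date(value: str, spark_fmt: str) -> Optional[date]:
    """Rough, hypothetical analogy of to_date(col, fmt) applied to one string."""
    py_fmt = spark_fmt
    for spark_token, py_token in SPARK_TO_STRPTIME.items():
        py_fmt = py_fmt.replace(spark_token, py_token)
    try:
        return datetime.strptime(value, py_fmt).date()
    except ValueError:
        # Input does not fit the pattern: return None, loosely like Spark's null.
        return None


print(parse_like_to_date("04/05/2020", "MM/dd/yyyy"))  # 2020-04-05
print(parse_like_to_date("2020-04-05", "MM/dd/yyyy"))  # None
```

Python's parser and Spark's parser can disagree on edge cases, so treat this only as a way to read the pattern, not as a substitute for testing in Spark.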

**Usage Example:**
For a column `"date_str"` holding strings in the format `"yyyy/MM/dd"`, the call would be `to_date_df(df, "yyyy/MM/dd", "date_str")`. The cells below apply the same function to an `EventDate` column whose strings follow `"MM/dd/yyyy"`.

In [0]:
def to_date_df(df, fmt, fld):
    """Return df with string column `fld` parsed to DateType using the Spark pattern `fmt`."""
    return df.withColumn(fld, to_date(col(fld), fmt))

In [0]:
# Both columns start as strings; EventDate holds dates in MM/dd/yyyy form.
my_schema = StructType([
    StructField("ID", StringType()),
    StructField("EventDate", StringType())
])

my_rows = [Row("122", "04/05/2020"), Row("123", "04/06/2020"),
           Row("124", "04/07/2020"), Row("125", "04/08/2020")]
my_df = spark.createDataFrame(my_rows, my_schema)


In [0]:
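# Before: EventDate is a string column.
my_df.printSchema()
my_df.show()

# Parse EventDate with the MM/dd/yyyy pattern; the new DataFrame replaces that column.
new_df = to_date_df(my_df, "MM/dd/yyyy", "EventDate")
# After: EventDate is DateType and displays in ISO yyyy-MM-dd form.
new_df.printSchema()
new_df.show()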


root
 |-- ID: string (nullable = true)
 |-- EventDate: string (nullable = true)

+---+----------+
| ID| EventDate|
+---+----------+
|122|04/05/2020|
|123|04/06/2020|
|124|04/07/2020|
|125|04/08/2020|
+---+----------+

root
 |-- ID: string (nullable = true)
 |-- EventDate: date (nullable = true)

+---+----------+
| ID| EventDate|
+---+----------+
|122|2020-04-05|
|123|2020-04-06|
|124|2020-04-07|
|125|2020-04-08|
+---+----------+

