## Using to_date and to_timestamp

Let us understand how to convert non-standard dates and timestamps to standard dates and timestamps.

* `yyyy-MM-dd` is the standard date format
* `yyyy-MM-dd HH:mm:ss.SSS` is the standard timestamp format
* Most date manipulation functions expect dates and timestamps in the standard format. However, our data might not be in that format.
* In those scenarios, we can use `to_date` and `to_timestamp` to convert non-standard dates and timestamps to standard ones, respectively.

### Tasks

Let us perform a few tasks to convert dates and timestamps in non-standard formats to standard ones.

* Create a DataFrame named `datetimesDF` with columns `date` and `time`.

In [0]:
datetimes = [
    (20140228, "28-Feb-2014 10:00:00.123"),
    (20160229, "20-Feb-2016 08:08:08.999"),
    (20171031, "31-Dec-2017 11:59:59.123"),
    (20191130, "31-Aug-2019 00:00:00.000")
]

In [0]:
datetimesDF = spark.createDataFrame(datetimes, schema="date BIGINT, time STRING")

In [0]:
datetimesDF.show(truncate=False)

In [0]:
from pyspark.sql.functions import lit, to_date

In [0]:
# Single-row dummy data so that each select on literals below returns exactly one row
l = [("X", )]

In [0]:
df = spark.createDataFrame(l).toDF("dummy")

In [0]:
df.show()

In [0]:
df.select(to_date(lit('20210302'), 'yyyyMMdd').alias('to_date')).show()

In [0]:
# year and day of year to standard date
df.select(to_date(lit('2021061'), 'yyyyDDD').alias('to_date')).show()
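
The day-of-year pattern can be less familiar than the others. As a quick cross-check outside Spark, the following sketch uses Python's standard `datetime` module, where `%j` plays roughly the same role as `DDD`. It only sanity-checks the arithmetic (day 61 of 2021) and does not show how Spark itself parses the value.

```python
from datetime import datetime

# %Y = four-digit year, %j = three-digit day of year (001-366).
parsed = datetime.strptime("2021061", "%Y%j").date()

# 2021 is not a leap year: January (31) + February (28) = 59 days,
# so day 61 falls on the 2nd of March.
print(parsed.isoformat())  # 2021-03-02
```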

In [0]:
df.select(to_date(lit('02/03/2021'), 'dd/MM/yyyy').alias('to_date')).show()

In [0]:
df.select(to_date(lit('02-03-2021'), 'dd-MM-yyyy').alias('to_date')).show()

In [0]:
df.select(to_date(lit('02-Mar-2021'), 'dd-MMM-yyyy').alias('to_date')).show()

In [0]:
df.select(to_date(lit('02-March-2021'), 'dd-MMMM-yyyy').alias('to_date')).show()

In [0]:
df.select(to_date(lit('March 2, 2021'), 'MMMM d, yyyy').alias('to_date')).show()

In [0]:
from pyspark.sql.functions import to_timestamp

In [0]:
df.select(to_timestamp(lit('02-Mar-2021'), 'dd-MMM-yyyy').alias('to_timestamp')).show()

In [0]:
df.select(to_timestamp(lit('02-Mar-2021 17:30:15'), 'dd-MMM-yyyy HH:mm:ss').alias('to_timestamp')).show()

* Let us convert the data in `datetimesDF` to standard dates and timestamps.

In [0]:
datetimesDF.printSchema()

In [0]:
datetimesDF.show(truncate=False)

In [0]:
from pyspark.sql.functions import col, to_date, to_timestamp

In [0]:
# date is stored as BIGINT, so cast it to string before parsing it with to_date
datetimesDF. \
    withColumn('to_date', to_date(col('date').cast('string'), 'yyyyMMdd')). \
    withColumn('to_timestamp', to_timestamp(col('time'), 'dd-MMM-yyyy HH:mm:ss.SSS')). \
    show(truncate=False)