## Date and Time Manipulation Functions

* We can use `current_date` to get today’s server date.
  * The date is returned in **yyyy-MM-dd** format.
* We can use `current_timestamp` to get the current server time.
  * The timestamp is returned in **yyyy-MM-dd HH:mm:ss.SSS** format.
  * Hours are in 24-hour format by default.

In [0]:
l = [('X',)]

In [0]:
df = spark.createDataFrame(l).toDF('dummy')

In [0]:
from pyspark.sql.functions import current_date, current_timestamp

In [0]:
df.select(current_date()).show()

+--------------+
|current_date()|
+--------------+
|    2022-04-20|
+--------------+



In [0]:
df.select(current_timestamp()).show(truncate=False)

+-----------------------+
|current_timestamp()    |
+-----------------------+
|2022-04-20 13:37:45.324|
+-----------------------+



* We can convert a string that contains a date or timestamp in a non-standard format to a standard date or timestamp using the `to_date` or `to_timestamp` function respectively.

In [0]:
from pyspark.sql.functions import lit,to_date,to_timestamp

In [0]:
df.select(to_date(lit('20210228'),'yyyyMMdd').alias('to_date')).show()

+----------+
|   to_date|
+----------+
|2021-02-28|
+----------+



In [0]:
df.select(to_timestamp(lit('20210228 1725'),'yyyyMMdd HHmm').alias('to_timestamp')).show()

+-------------------+
|       to_timestamp|
+-------------------+
|2021-02-28 17:25:00|
+-------------------+



## Date and Time Arithmetic

* Adding days to a date or timestamp - `date_add`
* Subtracting days from a date or timestamp - `date_sub`
* Getting the difference in days between 2 dates or timestamps - `datediff`
* Getting the number of months between 2 dates or timestamps - `months_between`
* Adding months to a date or timestamp - `add_months`
* Getting the next occurrence of a given day of the week after a date - `next_day`
* We can apply these functions to dates or timestamps in the standard format. `date_add`, `date_sub`, `add_months` and `next_day` return a date even when applied to a timestamp field, while `datediff` returns a whole number of days and `months_between` returns a decimal number of months.
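To make the day-based arithmetic concrete, here is a minimal plain-Python sketch using only the standard library. The helpers below are hypothetical stand-ins that merely share their names with the Spark functions; they are not Spark's implementation, and they work on `datetime.date` values rather than DataFrame columns.

```python
from datetime import date, timedelta

# Hypothetical plain-Python stand-ins, named after the Spark functions
def date_add(start, days):
    # Date that falls `days` days after `start`
    return start + timedelta(days=days)

def date_sub(start, days):
    # Date that falls `days` days before `start`
    return start - timedelta(days=days)

def datediff(end, start):
    # Whole number of days from `start` to `end`
    return (end - start).days

WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

def next_day(start, day_of_week):
    # First date strictly after `start` that falls on `day_of_week`
    target = WEEKDAYS.index(day_of_week)
    days_ahead = (target - start.weekday() - 1) % 7 + 1
    return start + timedelta(days=days_ahead)

d = date(2014, 2, 28)  # a Friday
print(date_add(d, 10))                 # 2014-03-10
print(date_sub(d, 10))                 # 2014-02-18
print(datediff(date(2022, 4, 20), d))  # 2973
print(next_day(d, 'Mon'))              # 2014-03-03
```

The first two results line up with the `date_add` and `date_sub` output shown further down for the same input date.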

In [0]:
datetimes = [("2014-02-28", "2014-02-28 10:00:00.123"),
                     ("2016-02-29", "2016-02-29 08:08:08.999"),
                     ("2017-10-31", "2017-12-31 11:59:59.123"),
                     ("2019-11-30", "2019-08-31 00:00:00.000")
                ]

In [0]:
datetimesDF = spark.createDataFrame(datetimes, schema="date STRING, time STRING")


In [0]:
datetimesDF.show(truncate=False)

+----------+-----------------------+
|date      |time                   |
+----------+-----------------------+
|2014-02-28|2014-02-28 10:00:00.123|
|2016-02-29|2016-02-29 08:08:08.999|
|2017-10-31|2017-12-31 11:59:59.123|
|2019-11-30|2019-08-31 00:00:00.000|
+----------+-----------------------+



* Add 10 days to both date and time values.
* Subtract 10 days from both date and time values.

In [0]:
from pyspark.sql.functions import date_add, date_sub

In [0]:
help(date_add)

Help on function date_add in module pyspark.sql.functions:

date_add(start, days)
    Returns the date that is `days` days after `start`
    
    .. versionadded:: 1.5.0
    
    Examples
    --------
    >>> df = spark.createDataFrame([('2015-04-08',)], ['dt'])
    >>> df.select(date_add(df.dt, 1).alias('next_date')).collect()
    [Row(next_date=datetime.date(2015, 4, 9))]



In [0]:
datetimesDF.\
  withColumn('date_add_date', date_add('date',10)).\
  withColumn('time_add_date',date_add('time',10)).\
  withColumn('date_sub_date',date_sub('date',10)).\
  withColumn('time_sub_date',date_sub('time',10)).show()

+----------+--------------------+-------------+-------------+-------------+-------------+
|      date|                time|date_add_date|time_add_date|date_sub_date|time_sub_date|
+----------+--------------------+-------------+-------------+-------------+-------------+
|2014-02-28|2014-02-28 10:00:...|   2014-03-10|   2014-03-10|   2014-02-18|   2014-02-18|
|2016-02-29|2016-02-29 08:08:...|   2016-03-10|   2016-03-10|   2016-02-19|   2016-02-19|
|2017-10-31|2017-12-31 11:59:...|   2017-11-10|   2018-01-10|   2017-10-21|   2017-12-21|
|2019-11-30|2019-08-31 00:00:...|   2019-12-10|   2019-09-10|   2019-11-20|   2019-08-21|
+----------+--------------------+-------------+-------------+-------------+-------------+



* Get the difference between current_date and date values as well as current_timestamp and time values.

In [0]:
from pyspark.sql.functions import current_date, current_timestamp, datediff

In [0]:
datetimesDF.\
  withColumn('datediff_date',datediff(current_date(),'date')).\
  withColumn('datediff_time',datediff(current_timestamp(),'time')).show()

+----------+--------------------+-------------+-------------+
|      date|                time|datediff_date|datediff_time|
+----------+--------------------+-------------+-------------+
|2014-02-28|2014-02-28 10:00:...|         2973|         2973|
|2016-02-29|2016-02-29 08:08:...|         2242|         2242|
|2017-10-31|2017-12-31 11:59:...|         1632|         1571|
|2019-11-30|2019-08-31 00:00:...|          872|          963|
+----------+--------------------+-------------+-------------+



* Get the number of months between current_date and date values as well as current_timestamp and time values.
* Add 3 months to both date values as well as time values.
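The month-based functions are less obvious than the day-based ones, so a rough plain-Python sketch may help. It assumes the convention described in the Spark documentation for `months_between` (a whole number when both dates share the same day of month or are both month ends, otherwise a fraction based on 31-day months) and ignores any time-of-day component. The helper names are hypothetical stand-ins, not Spark's implementation, and `2022-04-20` is simply the current date seen in the sample output.

```python
import calendar
from datetime import date

def last_day_of_month(d):
    # Number of days in the month that contains `d`
    return calendar.monthrange(d.year, d.month)[1]

def add_months(start, months):
    # Shift by whole months, clamping the day to the target month's length
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def months_between(end, start):
    # Whole months plus a day fraction that assumes 31 days per month
    whole = (end.year - start.year) * 12 + (end.month - start.month)
    both_month_end = (end.day == last_day_of_month(end)
                      and start.day == last_day_of_month(start))
    if end.day == start.day or both_month_end:
        return float(whole)
    return whole + (end.day - start.day) / 31

today = date(2022, 4, 20)  # assumed reference date, taken from the sample output
print(round(months_between(today, date(2014, 2, 28)), 2))  # 97.74
print(add_months(date(2019, 11, 30), 3))                   # 2020-02-29
```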

In [0]:
from pyspark.sql.functions import months_between, add_months , round

In [0]:
datetimesDF.\
  withColumn('months_between_date',round(months_between(current_date(),'date'),2)).\
 withColumn('months_between_time',round(months_between(current_timestamp(),'time'),2)).\
 withColumn('add_months_date',add_months('date',3)).\
 withColumn('add_months_time',add_months('time',3)).show(truncate=False)

+----------+-----------------------+-------------------+-------------------+---------------+---------------+
|date      |time                   |months_between_date|months_between_time|add_months_date|add_months_time|
+----------+-----------------------+-------------------+-------------------+---------------+---------------+
|2014-02-28|2014-02-28 10:00:00.123|97.74              |97.75              |2014-05-28     |2014-05-28     |
|2016-02-29|2016-02-29 08:08:08.999|73.71              |73.72              |2016-05-29     |2016-05-29     |
|2017-10-31|2017-12-31 11:59:59.123|53.65              |51.65              |2018-01-31     |2018-03-31     |
|2019-11-30|2019-08-31 00:00:00.000|28.68              |31.66              |2020-02-29     |2019-11-30     |
+----------+-----------------------+-------------------+-------------------+---------------+---------------+



## Using Date and Time Trunc Functions
In data warehousing we quite often run to-date reports such as week to date, month to date and year to date.

* We can use `trunc` or `date_trunc` for this purpose: passing a date or timestamp returns the beginning of the week, month, year etc. that contains it.
* We can use `trunc` to get the beginning date of the month or year by passing a date or timestamp to it - for example, `trunc(current_date(), "MM")` gives the first day of the current month.
* We can use `date_trunc` to get the beginning date of the month or year, as well as the beginning time of the day or hour, by passing a timestamp to it.
  * Get beginning date based on month - `date_trunc("MM", current_timestamp())`
  * Get beginning time based on day - `date_trunc("DAY", current_timestamp())`
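The idea behind truncation can be sketched in plain Python by resetting every field smaller than the chosen unit. The helper names `trunc_date` and `trunc_timestamp` are hypothetical and cover only a few units. They illustrate the concept and are not how Spark implements `trunc` or `date_trunc`.

```python
from datetime import date, datetime

def trunc_date(d, unit):
    # Reset the fields below `unit` to their first value
    if unit == 'year':
        return d.replace(month=1, day=1)
    if unit == 'month':
        return d.replace(day=1)
    raise ValueError(f'unsupported unit: {unit}')

def trunc_timestamp(ts, unit):
    # Same idea for timestamps, with finer-grained units
    if unit == 'year':
        return ts.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == 'month':
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == 'day':
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == 'hour':
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f'unsupported unit: {unit}')

print(trunc_date(date(2017, 10, 31), 'month'))  # 2017-10-01
print(trunc_timestamp(datetime(2017, 12, 31, 11, 59, 59, 123000), 'hour'))  # 2017-12-31 11:00:00
```

A month-to-date report would then filter rows between `trunc_date(today, 'month')` and today.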

In [0]:
from pyspark.sql.functions import trunc, date_trunc

In [0]:
help(trunc)

Help on function trunc in module pyspark.sql.functions:

trunc(date, format)
    Returns date truncated to the unit specified by the format.
    
    .. versionadded:: 1.5.0
    
    Parameters
    ----------
    date : :class:`~pyspark.sql.Column` or str
    format : str
        'year', 'yyyy', 'yy' to truncate by year,
        or 'month', 'mon', 'mm' to truncate by month
        Other options are: 'week', 'quarter'
    
    Examples
    --------
    >>> df = spark.createDataFrame([('1997-02-28',)], ['d'])
    >>> df.select(trunc(df.d, 'year').alias('year')).collect()
    [Row(year=datetime.date(1997, 1, 1))]
    >>> df.select(trunc(df.d, 'mon').alias('month')).collect()
    [Row(month=datetime.date(1997, 2, 1))]



In [0]:
help(date_trunc)

Help on function date_trunc in module pyspark.sql.functions:

date_trunc(format, timestamp)
    Returns timestamp truncated to the unit specified by the format.
    
    .. versionadded:: 2.3.0
    
    Parameters
    ----------
    format : str
        'year', 'yyyy', 'yy' to truncate by year,
        'month', 'mon', 'mm' to truncate by month,
        'day', 'dd' to truncate by day,
        Other options are:
        'microsecond', 'millisecond', 'second', 'minute', 'hour', 'week', 'quarter'
    timestamp : :class:`~pyspark.sql.Column` or str
    
    Examples
    --------
    >>> df = spark.createDataFrame([('1997-02-28 05:02:11',)], ['t'])
    >>> df.select(date_trunc('year', df.t).alias('year')).collect()
    [Row(year=datetime.datetime(1997, 1, 1, 0, 0))]
    >>> df.select(date_trunc('mon', df.t).alias('month')).collect()
    [Row(month=datetime.datetime(1997, 2, 1, 0, 0))]



In [0]:
datetimesDF.show(truncate=False)

+----------+-----------------------+
|date      |time                   |
+----------+-----------------------+
|2014-02-28|2014-02-28 10:00:00.123|
|2016-02-29|2016-02-29 08:08:08.999|
|2017-10-31|2017-12-31 11:59:59.123|
|2019-11-30|2019-08-31 00:00:00.000|
+----------+-----------------------+



In [0]:
from pyspark.sql.functions import trunc

In [0]:
datetimesDF.\
  withColumn('date_trunc',trunc('date','MM')).\
  withColumn('time_trunc',trunc('time','yy')).show(truncate=False)

+----------+-----------------------+----------+----------+
|date      |time                   |date_trunc|time_trunc|
+----------+-----------------------+----------+----------+
|2014-02-28|2014-02-28 10:00:00.123|2014-02-01|2014-01-01|
|2016-02-29|2016-02-29 08:08:08.999|2016-02-01|2016-01-01|
|2017-10-31|2017-12-31 11:59:59.123|2017-10-01|2017-01-01|
|2019-11-30|2019-08-31 00:00:00.000|2019-11-01|2019-01-01|
+----------+-----------------------+----------+----------+



Get the beginning of the month, year, hour or day using `date_trunc` on the date and time fields.

In [0]:
from pyspark.sql.functions import date_trunc

In [0]:
datetimesDF.\
 withColumn('date_trunc',date_trunc('MM','date')).\
 withColumn('time_trunc',date_trunc('yy','time')).show(truncate=False)

+----------+-----------------------+-------------------+-------------------+
|date      |time                   |date_trunc         |time_trunc         |
+----------+-----------------------+-------------------+-------------------+
|2014-02-28|2014-02-28 10:00:00.123|2014-02-01 00:00:00|2014-01-01 00:00:00|
|2016-02-29|2016-02-29 08:08:08.999|2016-02-01 00:00:00|2016-01-01 00:00:00|
|2017-10-31|2017-12-31 11:59:59.123|2017-10-01 00:00:00|2017-01-01 00:00:00|
|2019-11-30|2019-08-31 00:00:00.000|2019-11-01 00:00:00|2019-01-01 00:00:00|
+----------+-----------------------+-------------------+-------------------+



In [0]:
datetimesDF.\
  withColumn('date_dt',date_trunc('HOUR','date')).\
  withColumn('time_dt',date_trunc('HOUR','time')).\
  withColumn('time_dt1', date_trunc('dd','time')).show()

+----------+--------------------+-------------------+-------------------+-------------------+
|      date|                time|            date_dt|            time_dt|           time_dt1|
+----------+--------------------+-------------------+-------------------+-------------------+
|2014-02-28|2014-02-28 10:00:...|2014-02-28 00:00:00|2014-02-28 10:00:00|2014-02-28 00:00:00|
|2016-02-29|2016-02-29 08:08:...|2016-02-29 00:00:00|2016-02-29 08:00:00|2016-02-29 00:00:00|
|2017-10-31|2017-12-31 11:59:...|2017-10-31 00:00:00|2017-12-31 11:00:00|2017-12-31 00:00:00|
|2019-11-30|2019-08-31 00:00:...|2019-11-30 00:00:00|2019-08-31 00:00:00|2019-08-31 00:00:00|
+----------+--------------------+-------------------+-------------------+-------------------+



* `year`
* `month`
* `weekofyear`
* `dayofyear`
* `dayofmonth`
* `dayofweek`
* `hour`
* `minute`
* `second`

The functions listed above extract individual parts of a date or timestamp. There may be a few more such functions; you can review them based on your requirements.
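The numbering conventions are easy to get wrong, so here is a plain-Python sketch of the same fields computed with the standard library. It assumes, per the Spark documentation, that `weekofyear` follows ISO 8601 week numbering and that `dayofweek` counts from 1 for Sunday through 7 for Saturday. The sample timestamp is taken from the output shown below, and the `parts` dictionary is just an illustrative container.

```python
from datetime import datetime

ts = datetime(2022, 4, 20, 13, 38, 5)  # sample value from the output below

parts = {
    'year': ts.year,
    'month': ts.month,
    'weekofyear': ts.isocalendar()[1],     # ISO 8601 week number
    'dayofyear': ts.timetuple().tm_yday,   # 1 through 366
    'dayofmonth': ts.day,
    'dayofweek': ts.isoweekday() % 7 + 1,  # 1 = Sunday ... 7 = Saturday
    'hour': ts.hour,
    'minute': ts.minute,
    'second': ts.second,
}

for name, value in parts.items():
    print(name, value)
```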

In [0]:
df = spark.createDataFrame([('X',)],['dummy'])

In [0]:
df.show()

+-----+
|dummy|
+-----+
|    X|
+-----+



In [0]:
from pyspark.sql.functions import year,month,weekofyear,dayofmonth,dayofyear,dayofweek,current_date

In [0]:
df.select(
    current_date(),
    year(current_date()).alias('year'),
    month(current_date()).alias('month'),
    weekofyear(current_date()).alias('weekofyear'),
    dayofmonth(current_date()).alias('dayofmonth'),
    dayofyear(current_date()).alias('dayofyear'),
    dayofweek(current_date()).alias('dayofweek')
).show()

+--------------+----+-----+----------+----------+---------+---------+
|current_date()|year|month|weekofyear|dayofmonth|dayofyear|dayofweek|
+--------------+----+-----+----------+----------+---------+---------+
|    2022-04-20|2022|    4|        16|        20|      110|        4|
+--------------+----+-----+----------+----------+---------+---------+



In [0]:
help(weekofyear)

Help on function weekofyear in module pyspark.sql.functions:

weekofyear(col)
    Extract the week number of a given date as integer.
    
    .. versionadded:: 1.5.0
    
    Examples
    --------
    >>> df = spark.createDataFrame([('2015-04-08',)], ['dt'])
    >>> df.select(weekofyear(df.dt).alias('week')).collect()
    [Row(week=15)]



In [0]:
from pyspark.sql.functions import current_timestamp,hour,minute,second

In [0]:
df.select(
    current_timestamp().alias('current_timestamp'), 
    year(current_timestamp()).alias('year'),
    month(current_timestamp()).alias('month'),
    dayofmonth(current_timestamp()).alias('dayofmonth'),
    hour(current_timestamp()).alias('hour'),
    minute(current_timestamp()).alias('minute'),
    second(current_timestamp()).alias('second')
).show(truncate=False) #yyyy-MM-dd HH:mm:ss.SSS

+-----------------------+----+-----+----------+----+------+------+
|current_timestamp      |year|month|dayofmonth|hour|minute|second|
+-----------------------+----+-----+----------+----+------+------+
|2022-04-20 13:38:05.065|2022|4    |20        |13  |38    |5     |
+-----------------------+----+-----+----------+----+------+------+



## Using to_date and to_timestamp

* `yyyy-MM-dd` is the standard date format
* `yyyy-MM-dd HH:mm:ss.SSS` is the standard timestamp format
* Most of the date manipulation functions expect dates and timestamps in the standard format. However, our data might not be in that format.
* In those scenarios we can use `to_date` and `to_timestamp` to convert non-standard dates and timestamps to standard ones respectively.
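For readers more familiar with Python's own parsing, the sketch below parses similar non-standard strings with `datetime.strptime`. The point is only to show how each Spark-style pattern roughly maps to a `strptime` directive (noted in the comments); it is a plain-Python analogue, not Spark code, and month names are parsed under the default English locale.

```python
from datetime import datetime

# (input text, strptime format) pairs; the comment gives the Spark-style pattern
samples = [
    ('20210809', '%Y%m%d'),                         # yyyyMMdd
    ('2021061', '%Y%j'),                            # yyyyDDD (year and day of year)
    ('02/03/2022', '%d/%m/%Y'),                     # dd/MM/yyyy
    ('02-Mar-2021', '%d-%b-%Y'),                    # dd-MMM-yyyy
    ('March 2,2022', '%B %d,%Y'),                   # MMMM d,yyyy
    ('02-Mar-2021 17:30:15', '%d-%b-%Y %H:%M:%S'),  # dd-MMM-yyyy HH:mm:ss
]

parsed = [datetime.strptime(text, fmt) for text, fmt in samples]

for (text, _), value in zip(samples, parsed):
    print(text, '->', value)
```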

In [0]:
datetimes = [(20140228, "28-Feb-2014 10:00:00.123"),
                     (20160229, "20-Feb-2016 08:08:08.999"),
                     (20171031, "31-Dec-2017 11:59:59.123"),
                     (20191130, "31-Aug-2019 00:00:00.000")
                ]

In [0]:
datetimesDF = spark.createDataFrame(datetimes, schema="date BIGINT, time STRING")

In [0]:
datetimesDF.show(truncate=False)

+--------+------------------------+
|date    |time                    |
+--------+------------------------+
|20140228|28-Feb-2014 10:00:00.123|
|20160229|20-Feb-2016 08:08:08.999|
|20171031|31-Dec-2017 11:59:59.123|
|20191130|31-Aug-2019 00:00:00.000|
+--------+------------------------+



In [0]:
from pyspark.sql.functions import lit, to_date

In [0]:
df.show()

+-----+
|dummy|
+-----+
|    X|
+-----+



In [0]:
df.select(to_date(lit('20210809'),'yyyyMMdd').alias('to_date')).show()

+----------+
|   to_date|
+----------+
|2021-08-09|
+----------+



In [0]:
# year and day of year to standard date
df.select(to_date(lit('2021061'),'yyyyDDD').alias('to_date')).show()

+----------+
|   to_date|
+----------+
|2021-03-02|
+----------+



In [0]:
df.select(to_date(lit('02/03/2022'),'dd/MM/yyyy').alias('to_date')).show()

+----------+
|   to_date|
+----------+
|2022-03-02|
+----------+



In [0]:
df.select(to_date(lit('02-03-2021'),'dd-MM-yyyy').alias('to_date')).show()

+----------+
|   to_date|
+----------+
|2021-03-02|
+----------+



In [0]:
df.select(to_date(lit('02-Mar-2021'),'dd-MMM-yyyy').alias('to_date')).show()

+----------+
|   to_date|
+----------+
|2021-03-02|
+----------+



In [0]:
df.select(to_date(lit('02-March-2021'),'dd-MMMM-yyyy').alias('to_date')).show()

+----------+
|   to_date|
+----------+
|2021-03-02|
+----------+



In [0]:
df.select(to_date(lit('March 2,2022'),'MMMM d,yyyy').alias('to_date')).show()

+----------+
|   to_date|
+----------+
|2022-03-02|
+----------+



In [0]:
from pyspark.sql.functions import to_timestamp

In [0]:
df.select(to_timestamp(lit('02-Apr-2022'),'dd-MMM-yyyy').alias('to_date')).show()

+-------------------+
|            to_date|
+-------------------+
|2022-04-02 00:00:00|
+-------------------+



In [0]:
df.select(to_timestamp(lit('02-Mar-2021 17:30:15'),'dd-MMM-yyyy HH:mm:ss').alias('to_date')).show()

+-------------------+
|            to_date|
+-------------------+
|2021-03-02 17:30:15|
+-------------------+



In [0]:
from pyspark.sql.functions import col

In [0]:
datetimesDF.\
  withColumn('to_date',to_date(col('date').cast('string'),'yyyyMMdd')).\
  withColumn('to_timestamp',to_timestamp(col('time'),'dd-MMM-yyyy HH:mm:ss.SSS')).show(truncate=False)

+--------+------------------------+----------+-----------------------+
|date    |time                    |to_date   |to_timestamp           |
+--------+------------------------+----------+-----------------------+
|20140228|28-Feb-2014 10:00:00.123|2014-02-28|2014-02-28 10:00:00.123|
|20160229|20-Feb-2016 08:08:08.999|2016-02-29|2016-02-20 08:08:08.999|
|20171031|31-Dec-2017 11:59:59.123|2017-10-31|2017-12-31 11:59:59.123|
|20191130|31-Aug-2019 00:00:00.000|2019-11-30|2019-08-31 00:00:00    |
+--------+------------------------+----------+-----------------------+



* We can use `date_format` to extract the required information in the desired format from a standard date or timestamp. Earlier we explored `to_date` and `to_timestamp`, which convert non-standard dates or timestamps to standard ones.
* There are also specific functions to extract the year, month, day within a week, day within a month, day within a year etc. These were covered earlier in this section.
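As a rough plain-Python analogue of `date_format`, the sketch below formats one sample timestamp with `strftime`. The dictionary keys are the Spark-style patterns used in this section and the values use the closest `strftime` directives. This mapping is an illustration under the default English locale, not a complete or exact translation of Spark's pattern letters.

```python
from datetime import datetime

ts = datetime(2014, 2, 28, 10, 0, 0, 123000)  # first sample row of the data

# Spark-style pattern -> result of the closest strftime format
formatted = {
    'yyyyMM': ts.strftime('%Y%m'),
    'yyyyMMddHHmmss': ts.strftime('%Y%m%d%H%M%S'),
    'yyyyDDD': ts.strftime('%Y%j'),
    'MMMM dd, yyyy': ts.strftime('%B %d, %Y'),
    'EE': ts.strftime('%a'),
    'EEEE': ts.strftime('%A'),
}

for pattern, text in formatted.items():
    print(pattern, '->', text)
```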

In [0]:
datetimes = [("2014-02-28", "2014-02-28 10:00:00.123"),
                     ("2016-02-29", "2016-02-29 08:08:08.999"),
                     ("2017-10-31", "2017-12-31 11:59:59.123"),
                     ("2019-11-30", "2019-08-31 00:00:00.000")
                ]

In [0]:
datetimesDF = spark.createDataFrame(datetimes, schema="date STRING, time STRING")

In [0]:
datetimesDF.show(truncate=False)

+----------+-----------------------+
|date      |time                   |
+----------+-----------------------+
|2014-02-28|2014-02-28 10:00:00.123|
|2016-02-29|2016-02-29 08:08:08.999|
|2017-10-31|2017-12-31 11:59:59.123|
|2019-11-30|2019-08-31 00:00:00.000|
+----------+-----------------------+



In [0]:
from pyspark.sql.functions import date_format

In [0]:
datetimesDF.\
  withColumn('date_ym',date_format('date','yyyyMM')).\
  withColumn('time_ym',date_format('time','yyyyMM')).\
show(truncate=False)

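# Common pattern letters:
# yyyy - four-digit year
# MM   - two-digit month
# dd   - two-digit day of month
# DDD  - three-digit day of year
# HH   - hour in 24-hour format
# hh   - hour in 12-hour format
# mm   - minutes
# ss   - seconds
# SSS  - milliseconds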

+----------+-----------------------+-------+-------+
|date      |time                   |date_ym|time_ym|
+----------+-----------------------+-------+-------+
|2014-02-28|2014-02-28 10:00:00.123|201402 |201402 |
|2016-02-29|2016-02-29 08:08:08.999|201602 |201602 |
|2017-10-31|2017-12-31 11:59:59.123|201710 |201712 |
|2019-11-30|2019-08-31 00:00:00.000|201911 |201908 |
+----------+-----------------------+-------+-------+



In [0]:
datetimesDF.\
  withColumn('date_dt',date_format('date','yyyyMMddHHmmss')).\
  withColumn('date_ts',date_format('time','yyyyMMddHHmmss')).\
show(truncate=False)

+----------+-----------------------+--------------+--------------+
|date      |time                   |date_dt       |date_ts       |
+----------+-----------------------+--------------+--------------+
|2014-02-28|2014-02-28 10:00:00.123|20140228000000|20140228100000|
|2016-02-29|2016-02-29 08:08:08.999|20160229000000|20160229080808|
|2017-10-31|2017-12-31 11:59:59.123|20171031000000|20171231115959|
|2019-11-30|2019-08-31 00:00:00.000|20191130000000|20190831000000|
+----------+-----------------------+--------------+--------------+



In [0]:
datetimesDF. \
    withColumn("date_dt", date_format("date", "yyyyMMddHHmmss").cast('long')). \
    withColumn("date_ts", date_format("time", "yyyyMMddHHmmss").cast('long')). \
    show(truncate=False)

+----------+-----------------------+--------------+--------------+
|date      |time                   |date_dt       |date_ts       |
+----------+-----------------------+--------------+--------------+
|2014-02-28|2014-02-28 10:00:00.123|20140228000000|20140228100000|
|2016-02-29|2016-02-29 08:08:08.999|20160229000000|20160229080808|
|2017-10-31|2017-12-31 11:59:59.123|20171031000000|20171231115959|
|2019-11-30|2019-08-31 00:00:00.000|20191130000000|20190831000000|
+----------+-----------------------+--------------+--------------+



In [0]:
datetimesDF.\
  withColumn('date_yd',date_format('date','yyyyDDD').cast('int')).\
  withColumn('time_yd',date_format('time','yyyyDDD').cast('int')).\
show(truncate=False)

+----------+-----------------------+-------+-------+
|date      |time                   |date_yd|time_yd|
+----------+-----------------------+-------+-------+
|2014-02-28|2014-02-28 10:00:00.123|2014059|2014059|
|2016-02-29|2016-02-29 08:08:08.999|2016060|2016060|
|2017-10-31|2017-12-31 11:59:59.123|2017304|2017365|
|2019-11-30|2019-08-31 00:00:00.000|2019334|2019243|
+----------+-----------------------+-------+-------+



Get the complete description of the date.

In [0]:
datetimesDF.\
  withColumn('date_desc',date_format('date','MMMM dd, yyyy')).\
show(truncate=False)

+----------+-----------------------+-----------------+
|date      |time                   |date_desc        |
+----------+-----------------------+-----------------+
|2014-02-28|2014-02-28 10:00:00.123|February 28, 2014|
|2016-02-29|2016-02-29 08:08:08.999|February 29, 2016|
|2017-10-31|2017-12-31 11:59:59.123|October 31, 2017 |
|2019-11-30|2019-08-31 00:00:00.000|November 30, 2019|
+----------+-----------------------+-----------------+



In [0]:
# name of the week day

datetimesDF.\
  withColumn('day_name_abbr',date_format('date','EE')).show()

+----------+--------------------+-------------+
|      date|                time|day_name_abbr|
+----------+--------------------+-------------+
|2014-02-28|2014-02-28 10:00:...|          Fri|
|2016-02-29|2016-02-29 08:08:...|          Mon|
|2017-10-31|2017-12-31 11:59:...|          Tue|
|2019-11-30|2019-08-31 00:00:...|          Sat|
+----------+--------------------+-------------+



In [0]:
datetimesDF.\
  withColumn('day_full_name',date_format('date','EEEE')).show()

+----------+--------------------+-------------+
|      date|                time|day_full_name|
+----------+--------------------+-------------+
|2014-02-28|2014-02-28 10:00:...|       Friday|
|2016-02-29|2016-02-29 08:08:...|       Monday|
|2017-10-31|2017-12-31 11:59:...|      Tuesday|
|2019-11-30|2019-08-31 00:00:...|     Saturday|
+----------+--------------------+-------------+



## Dealing with Unix Timestamp

* A Unix timestamp is an integer that counts the seconds elapsed since midnight UTC on January 1st 1970.
* That starting point is also known as the epoch, and the value is incremented by 1 every second.
* We can convert a Unix timestamp to a regular date or timestamp and vice versa.
* We can use `unix_timestamp` to convert a regular date or timestamp to a Unix timestamp value. For example, `unix_timestamp(lit("2019-11-19 00:00:00"))`.
* We can use `from_unixtime` to convert a Unix timestamp to a regular date or timestamp. For example, `from_unixtime(lit(1574101800))`.
* We can also pass a format to both functions.
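The relationship between epoch seconds and calendar values can be checked with a short plain-Python sketch. The helpers `to_unix` and `from_unix` are hypothetical names, and the sketch fixes the time zone to UTC. In Spark the conversion depends on the session time zone, and the sample outputs in this section appear consistent with UTC.

```python
from datetime import datetime, timezone

def to_unix(text, fmt='%Y-%m-%d %H:%M:%S'):
    # Parse `text`, treat it as UTC, and return seconds since the epoch
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())

def from_unix(seconds, fmt='%Y-%m-%d %H:%M:%S'):
    # Convert epoch seconds back to a formatted UTC string
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)

print(to_unix('2014-02-28', '%Y-%m-%d'))  # 1393545600
print(from_unix(1393561800))              # 2014-02-28 04:30:00
print(from_unix(1393561800, '%Y%m%d'))    # 20140228
```

Because the result depends on the time zone, the same integer can map to different wall-clock times in sessions configured with different zones.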

In [0]:
datetimes = [(20140228, "2014-02-28", "2014-02-28 10:00:00.123"),
                     (20160229, "2016-02-29", "2016-02-29 08:08:08.999"),
                     (20171031, "2017-10-31", "2017-12-31 11:59:59.123"),
                     (20191130, "2019-11-30", "2019-08-31 00:00:00.000")
                ]

In [0]:
datetimesDF = spark.createDataFrame(datetimes).toDF("dateid", "date", "time")

In [0]:
datetimesDF.show(truncate=False)

+--------+----------+-----------------------+
|dateid  |date      |time                   |
+--------+----------+-----------------------+
|20140228|2014-02-28|2014-02-28 10:00:00.123|
|20160229|2016-02-29|2016-02-29 08:08:08.999|
|20171031|2017-10-31|2017-12-31 11:59:59.123|
|20191130|2019-11-30|2019-08-31 00:00:00.000|
+--------+----------+-----------------------+



In [0]:
from pyspark.sql.functions import unix_timestamp, col

In [0]:
datetimesDF.\
  withColumn('unix_date_id',unix_timestamp(col('dateid').cast('string'),'yyyyMMdd')).\
  withColumn('unix_date',unix_timestamp('date','yyyy-MM-dd')).\
  withColumn('unix_time',unix_timestamp('time','yyyy-MM-dd HH:mm:ss.SSS')).show()

+--------+----------+--------------------+------------+----------+----------+
|  dateid|      date|                time|unix_date_id| unix_date| unix_time|
+--------+----------+--------------------+------------+----------+----------+
|20140228|2014-02-28|2014-02-28 10:00:...|  1393545600|1393545600|1393581600|
|20160229|2016-02-29|2016-02-29 08:08:...|  1456704000|1456704000|1456733288|
|20171031|2017-10-31|2017-12-31 11:59:...|  1509408000|1509408000|1514721599|
|20191130|2019-11-30|2019-08-31 00:00:...|  1575072000|1575072000|1567209600|
+--------+----------+--------------------+------------+----------+----------+



In [0]:
unixtimes = [(1393561800, ),
             (1456713488, ),
             (1514701799, ),
             (1567189800, )
            ]

In [0]:
unixtimesDF = spark.createDataFrame(unixtimes).toDF('unixtime')

In [0]:
unixtimesDF.show()

+----------+
|  unixtime|
+----------+
|1393561800|
|1456713488|
|1514701799|
|1567189800|
+----------+



In [0]:
from pyspark.sql.functions import from_unixtime

In [0]:
unixtimesDF.\
  withColumn('date',from_unixtime('unixtime','yyyyMMdd')).\
  withColumn('time',from_unixtime('unixtime')).show()

+----------+--------+-------------------+
|  unixtime|    date|               time|
+----------+--------+-------------------+
|1393561800|20140228|2014-02-28 04:30:00|
|1456713488|20160229|2016-02-29 02:38:08|
|1514701799|20171231|2017-12-31 06:29:59|
|1567189800|20190830|2019-08-30 18:30:00|
+----------+--------+-------------------+

