## Date Manipulation Functions

Let us go through some of the important date manipulation functions.

Let us start the Spark context for this notebook so that we can execute the code provided. You can sign up for our [10 node state of the art cluster/labs](https://labs.itversity.com/plans) to learn Spark SQL using our unique integrated LMS.

In [1]:
val username = System.getProperty("user.name")

username = itv002461


itv002461

In [2]:
import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Predefined Functions").
    master("yarn").
    getOrCreate

username = itv002461
spark = org.apache.spark.sql.SparkSession@2806407c


org.apache.spark.sql.SparkSession@2806407c

If you are going to use CLIs, you can run Spark SQL using one of the following 3 approaches.

**Using Spark SQL**

```
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Scala**

```
spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark**

```
pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

This section covers the following topics:

* Getting Current Date and Timestamp
* Date Arithmetic such as `date_add`
* Getting beginning date or time using `trunc` or `date_trunc`
* Extracting information using `date_format` as well as calendar functions
* Dealing with Unix timestamp using `from_unixtime`, `to_unix_timestamp`

### Getting Current Date and Timestamp

Let us understand how to get the details about current or today's date as well as current timestamp.

* `current_date` is the function or operator which returns today's date.
* `current_timestamp` is the function or operator which returns the current time; the output below shows it up to milliseconds.
* Unlike most other functions, these can be invoked without **()** at the end. Both `current_date` and `current_date()` return the same result.
* These are not listed as part of `SHOW functions` and we can get help using `DESCRIBE`.
* There is a default display format associated with date and timestamp.
  * Date - `yyyy-MM-dd`
  * Timestamp - `yyyy-MM-dd HH:mm:ss.SSS`
* Keep in mind that dates and timestamps in Spark SQL are proper `DATE` and `TIMESTAMP` types which are displayed as strings in the formats specified above. Spark implicitly casts them to strings where needed, so we can also apply string manipulation functions on a date or timestamp.

In [3]:
%%sql

SELECT current_date AS current_date

Waiting for a Spark session to start...

+------------+
|current_date|
+------------+
|  2022-05-31|
+------------+



In [4]:
%%sql

SELECT current_date() AS current_date

+------------+
|current_date|
+------------+
|  2022-05-31|
+------------+



In [5]:
%%sql

SELECT current_timestamp AS current_timestamp

+--------------------+
|   current_timestamp|
+--------------------+
|2022-05-31 10:23:...|
+--------------------+



In [6]:
spark.sql("SELECT current_timestamp AS current_timestamp").show(false)

+-----------------------+
|current_timestamp      |
+-----------------------+
|2022-05-31 10:23:18.031|
+-----------------------+



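### Date Arithmetic

Let us understand how to perform arithmetic on dates or timestamps.

* `date_add` can be used to add days; passing a negative number subtracts days.
* `date_sub` can be used to subtract days; passing a negative number adds days.
* `datediff` can be used to get the difference between 2 dates in days. The result is negative when the first date is earlier than the second.
* `add_months` can be used to add months to a date.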

In [7]:
%%sql

SELECT date_add(current_date, 32) AS result

+----------+
|    result|
+----------+
|2022-07-02|
+----------+



In [8]:
%%sql

SELECT date_add('2018-04-15', 730) AS result

+----------+
|    result|
+----------+
|2020-04-14|
+----------+



In [9]:
%%sql

SELECT date_add('2018-04-15', -730) AS result

+----------+
|    result|
+----------+
|2016-04-15|
+----------+



In [10]:
%%sql

SELECT date_sub(current_date, 30) AS result

+----------+
|    result|
+----------+
|2022-05-01|
+----------+



In [11]:
%%sql

SELECT datediff('2019-03-30', '2017-12-31') AS result

+------+
|result|
+------+
|   454|
+------+



In [12]:
%%sql

SELECT datediff('2017-12-31', '2019-03-30') AS result

+------+
|result|
+------+
|  -454|
+------+



In [13]:
%%sql

SELECT add_months(current_date, 3) AS result

+----------+
|    result|
+----------+
|2022-08-31|
+----------+



In [14]:
%%sql

SELECT add_months('2019-01-31', 1) AS result

+----------+
|    result|
+----------+
|2019-02-28|
+----------+



In [15]:
%%sql

SELECT add_months('2019-05-31', 1) AS result

+----------+
|    result|
+----------+
|2019-06-30|
+----------+



In [16]:
%%sql

SELECT add_months(current_timestamp, 3) AS result

+----------+
|    result|
+----------+
|2022-08-31|
+----------+



In [17]:
%%sql

SELECT date_add(current_timestamp, -730) AS result

+----------+
|    result|
+----------+
|2020-05-31|
+----------+
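
The arithmetic above can be cross-checked outside Spark. The following is a minimal sketch in plain Scala using `java.time`, not Spark itself; the object and method names (`DateArithmeticSketch`, `dateAdd`, `dateDiff`, `addMonths`) are hypothetical and only mirror the Spark SQL function names for readability.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Plain java.time stand-ins for the Spark SQL functions used above (hypothetical names, not Spark APIs).
object DateArithmeticSketch {
  // Like date_add(start, days): a negative count moves backwards, like date_sub.
  def dateAdd(start: String, days: Int): LocalDate =
    LocalDate.parse(start).plusDays(days.toLong)

  // Like datediff(end, start): number of days from start to end, negative if end is earlier.
  def dateDiff(end: String, start: String): Long =
    ChronoUnit.DAYS.between(LocalDate.parse(start), LocalDate.parse(end))

  // Like add_months(start, months): the day is clamped to the last valid day of the target month.
  def addMonths(start: String, months: Int): LocalDate =
    LocalDate.parse(start).plusMonths(months.toLong)
}
```

For instance, `DateArithmeticSketch.addMonths("2019-01-31", 1)` evaluates to `2019-02-28`, matching the `add_months` result above. When the input is already the last day of a shorter month (such as `2019-02-28`), Spark's result may differ between Spark versions, so it is worth verifying on the cluster in use.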



### Beginning Date or Time - trunc and date_trunc

Let us understand how to use `trunc` and `date_trunc` on dates or timestamps to get the beginning date or time of a period.

* We can use **MM** to get the beginning date of the month.
* **YY** can be used to get the beginning date of the year.
* We can apply `trunc` on either a date or a timestamp; however, it only truncates to units such as month or year and not to finer units such as hour or day.

In [18]:
%%sql

DESCRIBE FUNCTION trunc

+--------------------+
|       function_desc|
+--------------------+
|     Function: trunc|
|Class: org.apache...|
|Usage: 
    trunc...|
+--------------------+



In [19]:
spark.sql("DESCRIBE FUNCTION trunc").show(false)

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|function_desc                                                                                                                                                                                                       |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Function: trunc                                                                                                                                                                                                     |
|Class: org.apache.spark.sql.catalyst.expressions.TruncDate                                                                                 

In [20]:
%%sql

SELECT trunc(current_date, 'MM') AS beginning_date_month

+--------------------+
|beginning_date_month|
+--------------------+
|          2022-05-01|
+--------------------+



In [21]:
%%sql

SELECT trunc('2019-01-23', 'MM') AS beginning_date_month

+--------------------+
|beginning_date_month|
+--------------------+
|          2019-01-01|
+--------------------+



In [22]:
%%sql

SELECT trunc(current_date, 'YY') AS beginning_date_year 

+-------------------+
|beginning_date_year|
+-------------------+
|         2022-01-01|
+-------------------+



* Truncating to the hour using `trunc` does not work. It returns `null` instead of raising an error.

In [23]:
%%sql

SELECT trunc(current_timestamp, 'HH') AS doesnt_work

+-----------+
|doesnt_work|
+-----------+
|       null|
+-----------+



* While `trunc` can be used to get the beginning date of a given month or year, `date_trunc` can also truncate a timestamp to finer units, down to the second.

In [24]:
spark.sql("DESCRIBE FUNCTION date_trunc").show(false)

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|function_desc                                                                                                                                                                                                                                             |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Function: date_trunc                                                                                                                                                                                                                            

In [25]:
%%sql

SELECT date_trunc('HOUR', current_timestamp) AS hour_beginning

+-------------------+
|     hour_beginning|
+-------------------+
|2022-05-31 10:00:00|
+-------------------+
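
As a rough illustration of what truncation means, here is a small sketch in plain Scala with `java.time` rather than Spark. The names `TruncSketch`, `beginningOfMonth`, `beginningOfYear` and `beginningOfHour` are hypothetical helpers, and timestamps are passed in ISO form (`2022-05-31T10:23:18.031`) because that is what `LocalDateTime.parse` expects.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.temporal.ChronoUnit

// Hypothetical helpers that mimic trunc / date_trunc with java.time (not Spark APIs).
object TruncSketch {
  // Like trunc(date, 'MM'): first day of the month.
  def beginningOfMonth(date: String): LocalDate =
    LocalDate.parse(date).withDayOfMonth(1)

  // Like trunc(date, 'YY'): first day of the year.
  def beginningOfYear(date: String): LocalDate =
    LocalDate.parse(date).withDayOfYear(1)

  // Like date_trunc('HOUR', ts): minutes, seconds and fractions are reset to zero.
  def beginningOfHour(ts: String): LocalDateTime =
    LocalDateTime.parse(ts).truncatedTo(ChronoUnit.HOURS)
}
```

For example, `TruncSketch.beginningOfMonth("2019-01-23")` gives `2019-01-01`, the same date as `trunc('2019-01-23', 'MM')` above.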



### Extracting information using date_format

Let us understand how to use `date_format` to extract information from date or timestamp.

Here is how we can get date related information such as year, month, day etc from date or timestamp.

In [26]:
spark.sql("DESCRIBE FUNCTION date_format").show(false)

+--------------------------------------------------------------------------------------------------------------------------------+
|function_desc                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------+
|Function: date_format                                                                                                           |
|Class: org.apache.spark.sql.catalyst.expressions.DateFormatClass                                                                |
|Usage: date_format(timestamp, fmt) - Converts `timestamp` to a value of string in the format specified by the date format `fmt`.|
+--------------------------------------------------------------------------------------------------------------------------------+



In [27]:
%%sql

SELECT current_timestamp AS current_timestamp

+--------------------+
|   current_timestamp|
+--------------------+
|2022-05-31 10:35:...|
+--------------------+



In [28]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'yyyy') AS year

+--------------------+----+
|   current_timestamp|year|
+--------------------+----+
|2022-05-31 10:36:...|2022|
+--------------------+----+



In [29]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'yy') AS year

+--------------------+----+
|   current_timestamp|year|
+--------------------+----+
|2022-05-31 10:36:...|  22|
+--------------------+----+



In [30]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'MM') AS month

+--------------------+-----+
|   current_timestamp|month|
+--------------------+-----+
|2022-05-31 10:36:...|   05|
+--------------------+-----+



In [31]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'dd') AS day_of_month

+--------------------+------------+
|   current_timestamp|day_of_month|
+--------------------+------------+
|2022-05-31 10:37:...|          31|
+--------------------+------------+



In [32]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'DDD') AS day_of_year

+--------------------+-----------+
|   current_timestamp|day_of_year|
+--------------------+-----------+
|2022-05-31 10:37:...|        151|
+--------------------+-----------+



In [33]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'MMM') AS month_name

+--------------------+----------+
|   current_timestamp|month_name|
+--------------------+----------+
|2022-05-31 10:37:...|       May|
+--------------------+----------+



In [34]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'MMMM') AS month_name

+--------------------+----------+
|   current_timestamp|month_name|
+--------------------+----------+
|2022-05-31 10:37:...|       May|
+--------------------+----------+



In [35]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'EE') AS dayname

+--------------------+-------+
|   current_timestamp|dayname|
+--------------------+-------+
|2022-05-31 10:37:...|    Tue|
+--------------------+-------+



In [36]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'EEEE') AS dayname

+--------------------+-------+
|   current_timestamp|dayname|
+--------------------+-------+
|2022-05-31 10:37:...|Tuesday|
+--------------------+-------+



* Here is how we can get time related information such as hour, minute, seconds, milliseconds etc from timestamp.

In [37]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'HH') AS hour24

+--------------------+------+
|   current_timestamp|hour24|
+--------------------+------+
|2022-05-31 10:37:...|    10|
+--------------------+------+



In [38]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'hh') AS hour12

+--------------------+------+
|   current_timestamp|hour12|
+--------------------+------+
|2022-05-31 10:37:...|    10|
+--------------------+------+



In [39]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'mm') AS minutes

+--------------------+-------+
|   current_timestamp|minutes|
+--------------------+-------+
|2022-05-31 10:37:...|     37|
+--------------------+-------+



In [40]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'ss') AS seconds

+--------------------+-------+
|   current_timestamp|seconds|
+--------------------+-------+
|2022-05-31 10:38:...|     01|
+--------------------+-------+



In [41]:
%%sql

SELECT current_timestamp AS current_timestamp, 
    date_format(current_timestamp, 'SSS') AS millis

+--------------------+------+
|   current_timestamp|millis|
+--------------------+------+
|2022-05-31 10:38:...|   950|
+--------------------+------+



* Here is how we can get the information from date or timestamp in the format we require.

In [42]:
%%sql

SELECT date_format(current_timestamp, 'yyyyMM') AS current_month

+-------------+
|current_month|
+-------------+
|       202205|
+-------------+



In [43]:
%%sql

SELECT date_format(current_timestamp, 'yyyyMMdd') AS current_date

+------------+
|current_date|
+------------+
|    20220531|
+------------+



In [44]:
%%sql

SELECT date_format(current_timestamp, 'yyyy/MM/dd') AS current_date

+------------+
|current_date|
+------------+
|  2022/05/31|
+------------+
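
The pattern letters used with `date_format` can be tried out without a Spark session. The sketch below uses `java.time.format.DateTimeFormatter` in plain Scala. This is an assumption made for illustration: which Java formatter Spark uses internally depends on the Spark version, although the patterns shown here are expected to behave the same way. `DateFormatSketch` and `format` are hypothetical names, and `Locale.ENGLISH` is fixed so that day names do not depend on the machine's locale.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical helper: formats a timestamp with a pattern, similar in spirit to date_format(ts, fmt).
object DateFormatSketch {
  def format(ts: LocalDateTime, pattern: String): String =
    ts.format(DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH))
}
```

With `val ts = LocalDateTime.of(2022, 5, 31, 18, 18, 51)`, the call `DateFormatSketch.format(ts, "HH")` returns `18` while `DateFormatSketch.format(ts, "hh")` returns `06`, which shows the difference between the 24 hour and 12 hour patterns more clearly than a morning timestamp does.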



### Extracting information - Calendar functions

We can get the year, month, day, etc. from a date or timestamp using calendar functions such as `day`, `dayofmonth`, `month`, `weekofyear` and `year`.

In [45]:
spark.sql("DESCRIBE FUNCTION day").show(false)

+------------------------------------------------------------------+
|function_desc                                                     |
+------------------------------------------------------------------+
|Function: day                                                     |
|Class: org.apache.spark.sql.catalyst.expressions.DayOfMonth       |
|Usage: day(date) - Returns the day of month of the date/timestamp.|
+------------------------------------------------------------------+



In [46]:
spark.sql("DESCRIBE FUNCTION dayofmonth").show(false)

+-------------------------------------------------------------------------+
|function_desc                                                            |
+-------------------------------------------------------------------------+
|Function: dayofmonth                                                     |
|Class: org.apache.spark.sql.catalyst.expressions.DayOfMonth              |
|Usage: dayofmonth(date) - Returns the day of month of the date/timestamp.|
+-------------------------------------------------------------------------+



In [47]:
spark.sql("DESCRIBE FUNCTION month").show(false)

+-----------------------------------------------------------------------+
|function_desc                                                          |
+-----------------------------------------------------------------------+
|Function: month                                                        |
|Class: org.apache.spark.sql.catalyst.expressions.Month                 |
|Usage: month(date) - Returns the month component of the date/timestamp.|
+-----------------------------------------------------------------------+



In [48]:
spark.sql("DESCRIBE FUNCTION weekofyear").show(false)

+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|function_desc                                                                                                                                                 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Function: weekofyear                                                                                                                                          |
|Class: org.apache.spark.sql.catalyst.expressions.WeekOfYear                                                                                                   |
|Usage: weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.|
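+--------------------------------------------------------------------------------------------------------------------------------------------------------------+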

In [49]:
spark.sql("DESCRIBE FUNCTION year").show(false)

+---------------------------------------------------------------------+
|function_desc                                                        |
+---------------------------------------------------------------------+
|Function: year                                                       |
|Class: org.apache.spark.sql.catalyst.expressions.Year                |
|Usage: year(date) - Returns the year component of the date/timestamp.|
+---------------------------------------------------------------------+



* Let us see the usage of the functions such as day, dayofmonth, month, weekofyear, year etc.

In [50]:
%%sql

SELECT year(current_date) AS year

+----+
|year|
+----+
|2022|
+----+



In [51]:
%%sql

SELECT month(current_date) AS month

+-----+
|month|
+-----+
|    5|
+-----+



In [52]:
%%sql

SELECT weekofyear(current_date) AS weekofyear

+----------+
|weekofyear|
+----------+
|        22|
+----------+



In [53]:
%%sql

SELECT day(current_date) AS day

+---+
|day|
+---+
| 31|
+---+



In [54]:
%%sql

SELECT dayofmonth(current_date) AS dayofmonth

+----------+
|dayofmonth|
+----------+
|        31|
+----------+
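
The week numbering described by `DESCRIBE FUNCTION weekofyear` (weeks start on Monday and week 1 is the first week with more than 3 days) corresponds to the ISO week definition. The following plain Scala sketch uses `java.time` to show the same calendar fields; `CalendarSketch` and `parts` are hypothetical names and this is not how Spark computes the values internally.

```scala
import java.time.LocalDate
import java.time.temporal.IsoFields

// Hypothetical helper returning (year, month, dayOfMonth, isoWeekOfYear) for a yyyy-MM-dd date.
object CalendarSketch {
  def parts(date: String): (Int, Int, Int, Int) = {
    val d = LocalDate.parse(date)
    // ISO weeks start on Monday; week 1 is the first week with at least 4 days in the year.
    (d.getYear, d.getMonthValue, d.getDayOfMonth, d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR))
  }
}
```

`CalendarSketch.parts("2022-05-31")` returns `(2022, 5, 31, 22)`, in line with the outputs above. Under the same rule, `2022-01-01` (a Saturday) falls in week 52, because it still belongs to the last week of the previous year.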



### Dealing with Unix Timestamp

Let us go through the functions that can be used to deal with Unix Timestamp.

* `from_unixtime` can be used to convert Unix epoch to regular timestamp.
* `unix_timestamp` or `to_unix_timestamp` can be used to convert timestamp to Unix epoch.
* We can get Unix epoch or Unix timestamp by running `date '+%s'` in Unix/Linux terminal.
* We can DESCRIBE on the above functions to get details about them.

Let us see how we can use functions such as `from_unixtime`, `unix_timestamp` or `to_unix_timestamp` to convert between timestamp and Unix timestamp or epoch.

In [55]:
%%sql

SELECT from_unixtime(1556662731) AS timestamp

+-------------------+
|          timestamp|
+-------------------+
|2019-04-30 18:18:51|
+-------------------+



In [56]:
%%sql

SELECT to_unix_timestamp('2019-04-30 18:18:51') AS unixtime

+----------+
|  unixtime|
+----------+
|1556662731|
+----------+



In [57]:
%%sql

SELECT from_unixtime(1556662731, 'yyyyMM') AS month

+------+
| month|
+------+
|201904|
+------+



In [58]:
%%sql

SELECT from_unixtime(1556662731, 'yyyy-MM-dd') AS date

+----------+
|      date|
+----------+
|2019-04-30|
+----------+



In [59]:
%%sql

SELECT from_unixtime(1556662731, 'yyyy-MM-dd HH:mm') AS timestamp

+----------------+
|       timestamp|
+----------------+
|2019-04-30 18:18|
+----------------+



In [60]:
%%sql

SELECT from_unixtime(1556662731, 'yyyy-MM-dd hh:mm') AS timestamp

+----------------+
|       timestamp|
+----------------+
|2019-04-30 06:18|
+----------------+



In [61]:
%%sql

SELECT to_unix_timestamp('20190430 18:18:51', 'yyyyMMdd') AS date

+----------+
|      date|
+----------+
|1556596800|
+----------+



In [62]:
%%sql

SELECT to_unix_timestamp('20190430 18:18:51', 'yyyyMMdd HH:mm:ss') AS timestamp

+----------+
| timestamp|
+----------+
|1556662731|
+----------+



In [65]:
%%sql

SELECT from_unixtime(1654008236, 'yyyy-MM-dd') AS date

+----------+
|      date|
+----------+
|2022-05-31|
+----------+
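
A Unix epoch value is the same instant everywhere, but the timestamp it is rendered as depends on the time zone. The outputs above are consistent with a session time zone 4 hours behind UTC in April 2019, so the sketch below assumes `America/New_York` purely for illustration; the actual session time zone of the cluster is not stated in this notebook. `UnixTimeSketch`, `fromUnixtime` and `toUnixTimestamp` are hypothetical names written in plain Scala with `java.time`, not Spark APIs.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

// Hypothetical helpers that mimic from_unixtime / to_unix_timestamp with java.time.
object UnixTimeSketch {
  // Assumption: time zone inferred from the outputs above, not taken from the cluster configuration.
  val zone: ZoneId = ZoneId.of("America/New_York")

  // Epoch seconds -> local date and time in the given zone.
  def fromUnixtime(epochSeconds: Long, zoneId: ZoneId = zone): LocalDateTime =
    Instant.ofEpochSecond(epochSeconds).atZone(zoneId).toLocalDateTime

  // Local date and time in the given zone -> epoch seconds.
  def toUnixTimestamp(ts: LocalDateTime, zoneId: ZoneId = zone): Long =
    ts.atZone(zoneId).toEpochSecond
}
```

Under this assumption, `UnixTimeSketch.fromUnixtime(1556662731L)` gives `2019-04-30T18:18:51`, while passing `ZoneId.of("UTC")` as the second argument gives `2019-04-30T22:18:51` for the same epoch value.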

