# PySpark – Difference between two dates (days, months, years)


---


**The PySpark SQL functions datediff() and months_between() calculate the difference between two dates in days and in months; dividing the month difference by 12 gives the difference in years. Let’s see how they work using DataFrame examples. You can also use them to calculate age.**


---


## datediff() Function


**First, let’s get the difference between two dates in days using the PySpark datediff() function. datediff(end, start) returns the number of days from start to end.**

In [0]:
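# Import only the functions this notebook uses (round here is Spark's column function, not Python's built-in)
from pyspark.sql.functions import col, current_date, datediff, lit, months_between, round, to_date

# `spark` is the SparkSession provided by the notebook environment
data = [("1","2019-07-01"),("2","2019-06-24"),("3","2019-08-24")]
df = spark.createDataFrame(data=data, schema=["id", "date"])

# datediff(end, start): number of days from start to end
df.select(
    col("date"),
    current_date().alias("current_date"),
    datediff(current_date(), col("date")).alias("datediff")
).show(truncate=False)

+----------+------------+--------+
|date      |current_date|datediff|
+----------+------------+--------+
|2019-07-01|2023-02-05  |1315    |
|2019-06-24|2023-02-05  |1322    |
|2019-08-24|2023-02-05  |1261    |
+----------+------------+--------+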
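
As a sanity check, the day counts above can be reproduced in plain Python without Spark. This is only a sketch: the helper `day_diff` is a hypothetical name, and "today" is pinned to 2023-02-05 (the date on which the output above was generated) so that the numbers are reproducible.

```python
from datetime import date

# Assumption: "today" is pinned to the date shown in the output above.
today = date(2023, 2, 5)
starts = ["2019-07-01", "2019-06-24", "2019-08-24"]

def day_diff(end: date, start: date) -> int:
    """Whole days from start to end; negative when end is earlier than start."""
    return (end - start).days

diffs = [day_diff(today, date.fromisoformat(s)) for s in starts]
print(diffs)  # [1315, 1322, 1261]

# Swapping the arguments flips the sign, just as it does for datediff(end, start).
print(day_diff(date.fromisoformat(starts[0]), today))  # -1315
```

The argument order matters: passing the earlier date first gives a negative difference.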



## months_between() Function

**Now let’s get the month and year differences between two dates using the months_between() function. It returns a fractional number of months; dividing that value by 12 gives the difference in years.**

In [0]:
df.withColumn("dateDiff", datediff(current_date(), col("date")))\
    .withColumn("monthsDiff", months_between(current_date(), col("date")))\
    .withColumn("monthDiff", round(months_between(current_date(), col("date")), 2))\
    .withColumn("yearsDiff", months_between(current_date(), col("date"))/lit(12))\
    .withColumn("yearsDiff_round", round(months_between(current_date(), col("date"))/lit(12),2))\
    .show(truncate=False)

+---+----------+--------+-----------+---------+------------------+---------------+
|id |date      |dateDiff|monthsDiff |monthDiff|yearsDiff         |yearsDiff_round|
+---+----------+--------+-----------+---------+------------------+---------------+
|1  |2019-07-01|1315    |43.12903226|43.13    |3.594086021666667 |3.59           |
|2  |2019-06-24|1322    |43.38709677|43.39    |3.6155913975      |3.62           |
|3  |2019-08-24|1261    |41.38709677|41.39    |3.4489247308333333|3.45           |
+---+----------+--------+-----------+---------+------------------+---------------+
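
The fractional part of monthsDiff follows the rule described in the Spark documentation for months_between(). The result is a whole number when both dates fall on the same day of the month or both are the last day of their months. Otherwise the leftover days are divided by 31, and the result is rounded to 8 digits by default. The plain-Python sketch below mirrors that rule for dates only. It is an illustration rather than Spark's actual code; `months_between_sketch` and `is_last_day` are hypothetical names, and the end date is pinned to 2023-02-05 to match the output above.

```python
import calendar
from datetime import date

def is_last_day(d: date) -> bool:
    """True when d is the last day of its month."""
    return d.day == calendar.monthrange(d.year, d.month)[1]

def months_between_sketch(end: date, start: date) -> float:
    """Approximate months from start to end using the 31-day-month rule."""
    whole = (end.year - start.year) * 12 + (end.month - start.month)
    # Same day of month, or both month ends: a whole number of months.
    if end.day == start.day or (is_last_day(end) and is_last_day(start)):
        return float(whole)
    # Otherwise the day difference counts as a fraction of a 31-day month.
    return round(whole + (end.day - start.day) / 31, 8)

today = date(2023, 2, 5)  # assumption: pinned to the date in the output above
for s in ("2019-07-01", "2019-06-24", "2019-08-24"):
    months = months_between_sketch(today, date.fromisoformat(s))
    print(s, months, round(months / 12, 2))
```

For the first row, the whole-month difference is 43 and the day difference is 5 - 1 = 4, so the result is 43 + 4/31 ≈ 43.12903226, which matches the monthsDiff column.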



**Let’s see another example, where the dates are strings that are not in the default yyyy-MM-dd format that Spark expects for DateType. Spark cannot cast such strings to dates implicitly, so date functions such as datediff() and months_between() return null for them. You therefore need to convert the input to DateType first, using the to_date() function with a matching format pattern.**

In [0]:
data2 = [("1","07-01-2019"),("2","06-24-2019"),("3","08-24-2019")]

df2 = spark.createDataFrame(data=data2, schema=["id", "date"])

df2.select(to_date(col("date"), "MM-dd-yyyy").alias("date"),
          current_date().alias("endDate")).show(truncate=False)

+----------+----------+
|date      |endDate   |
+----------+----------+
|2019-07-01|2023-02-05|
|2019-06-24|2023-02-05|
|2019-08-24|2023-02-05|
+----------+----------+
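
The same idea can be illustrated in plain Python as a loose analogy (this is not Spark's parser, and Python raises an error where Spark returns null). An ISO-only parser rejects the MM-dd-yyyy strings, while a parser given an explicit pattern handles them. The helper `parse_iso_or_none` is a hypothetical name, and `%m-%d-%Y` is Python's counterpart of the Spark pattern MM-dd-yyyy.

```python
from datetime import date, datetime

raw = ["07-01-2019", "06-24-2019", "08-24-2019"]

def parse_iso_or_none(text: str):
    """Parse a yyyy-MM-dd string; return None when the text is in another format."""
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None

# Without an explicit pattern, none of the strings parse.
print([parse_iso_or_none(s) for s in raw])  # [None, None, None]

# With an explicit pattern, parsing succeeds and date arithmetic works again.
parsed = [datetime.strptime(s, "%m-%d-%Y").date() for s in raw]
end = date(2023, 2, 5)  # assumption: pinned to the endDate shown above
print([(end - d).days for d in parsed])  # [1315, 1322, 1261]
```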



## SQL Example


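**Let’s see how to calculate the difference between two dates in years using a PySpark SQL query. You can calculate the days and months between two dates in the same way. Note that months_between() is positive only when its first argument is the later date; the query below passes the earlier date first, so the result is negative.**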

In [0]:
spark.sql(" select round(months_between('2019-07-01', current_date())/12, 2) as years_diff ").show(truncate=False)

+----------+
|years_diff|
+----------+
|-3.59     |
+----------+

