#PySpark to_timestamp() – Convert String to Timestamp type

---

Use the <em>to_timestamp</em>() function to convert a String to a Timestamp (TimestampType) in PySpark. The converted value is displayed in the default layout yyyy-MM-dd HH:mm:ss, with fractional seconds shown only when they are non-zero. The examples below show how to use this function.


---

**Syntax – to_timestamp()**

###Syntax: to_timestamp(timestampString:Column) 
###Syntax: to_timestamp(timestampString:Column,format:String) 
 
 
 ---
 
 
**This function has the two signatures above, defined in PySpark SQL Date & Timestamp Functions. The first takes a single argument, which should be a string in the default timestamp format 'yyyy-MM-dd HH:mm:ss.SSS'; when the input is not in this format, the function returns null (or raises an error if ANSI mode is enabled).**


---

**The second signature takes an additional String argument that specifies the format of the input timestamp. It supports pattern letters in the style of Java's SimpleDateFormat (Spark 3.0 and later define their own set of datetime patterns). With this argument, you can convert a String in any supported format to Timestamp type in PySpark.**


---


##Convert String to PySpark Timestamp type


**The example below converts a string that is already in the PySpark default timestamp format to Timestamp type. Because the input DataFrame column uses the default format, the first signature is enough. The second step uses the cast function to convert the new timestamp column back to a string.**

In [0]:
#Import only the functions used in this notebook
from pyspark.sql.functions import col, lit, to_timestamp

In [0]:
df = spark.createDataFrame(data=[("1","2019-06-24 12:01:19.000")],
                          schema=["id", "input_timestamp"])

df.printSchema()


#Timestamp String to TimestampType

df = df.withColumn("timestamp", to_timestamp("input_timestamp"))
df.show(truncate=False)


#Using cast to convert the TimestampType column back to a string

df = df.withColumn("timestamp_string",
                   col("timestamp").cast("string"))
df.show(truncate=False)

df.printSchema()

root
 |-- id: string (nullable = true)
 |-- input_timestamp: string (nullable = true)

+---+-----------------------+-------------------+
|id |input_timestamp        |timestamp          |
+---+-----------------------+-------------------+
|1  |2019-06-24 12:01:19.000|2019-06-24 12:01:19|
+---+-----------------------+-------------------+

+---+-----------------------+-------------------+-------------------+
|id |input_timestamp        |timestamp          |timestamp_string   |
+---+-----------------------+-------------------+-------------------+
|1  |2019-06-24 12:01:19.000|2019-06-24 12:01:19|2019-06-24 12:01:19|
+---+-----------------------+-------------------+-------------------+

root
 |-- id: string (nullable = true)
 |-- input_timestamp: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestamp_string: string (nullable = true)



**In this snippet, we just add a new column timestamp by converting the input column from string to Timestamp type.**


---


##Custom string format to Timestamp type


**This example converts an input timestamp string from a custom format to PySpark Timestamp type. To do this, we use the second syntax, which takes an additional argument that specifies a user-defined pattern for date-time formatting.**

In [0]:
#When the input is not in the Spark default timestamp format 'yyyy-MM-dd HH:mm:ss.SSS',
#to_timestamp without a format returns null.
#Hence, pass the input pattern to to_timestamp to convert it to TimestampType
df.select(to_timestamp(lit('06-24-2019 12:01:19.000'),'MM-dd-yyyy HH:mm:ss.SSSS')) \
  .show(truncate=False)

#Displays

+---------------------------------------------------------------+
|to_timestamp(06-24-2019 12:01:19.000, MM-dd-yyyy HH:mm:ss.SSSS)|
+---------------------------------------------------------------+
|2019-06-24 12:01:19                                            |
+---------------------------------------------------------------+



**To convert a string to a date (DateType) instead, use the to_date() function.**


---


##SQL Example

In [0]:
#SQL string to TimestampType
df2 = spark.sql(" select to_timestamp('2019-06-24 12:01:19.000') as timestamp ")
df2.printSchema()
df2.show(truncate=False)

root
 |-- timestamp: timestamp (nullable = true)

+-------------------+
|timestamp          |
+-------------------+
|2019-06-24 12:01:19|
+-------------------+



In [0]:
#SQL CAST timestamp string to TimestampType
df3 = spark.sql(" select timestamp('2019-06-24 12:01:19.000') as timestamp ")
df3.printSchema()
df3.show(truncate=False)

root
 |-- timestamp: timestamp (nullable = true)

+-------------------+
|timestamp          |
+-------------------+
|2019-06-24 12:01:19|
+-------------------+



In [0]:
#SQL Custom string to TimestampType
df4 = spark.sql(" select to_timestamp('06-24-2019 12:01:19.000','MM-dd-yyyy HH:mm:ss.SSSS') as timestamp ")
df4.printSchema()
df4.show(truncate=False)

root
 |-- timestamp: timestamp (nullable = true)

+-------------------+
|timestamp          |
+-------------------+
|2019-06-24 12:01:19|
+-------------------+

