Use each of these functions:
* `unix_timestamp()` 
* `date_format()`
* `to_unix_timestamp()`
* `from_unixtime()`
* `to_date()` 
* `to_timestamp()` 
* `from_utc_timestamp()` 
* `to_utc_timestamp()`

In [0]:
%scala
import org.apache.spark.sql.functions._

val kolumny = Seq("timestamp","unix", "Date")
val dane = Seq(("2015-03-22T14:13:34", 1646641525847L,"May, 2021"),
               ("2015-03-22T15:03:18", 1646641557555L,"Mar, 2021"),
               ("2015-03-22T14:38:39", 1646641578622L,"Jan, 2021"))

val dataFrame = spark.createDataFrame(dane).toDF(kolumny:_*)
  .withColumn("current_date", current_date())
  .withColumn("current_timestamp", current_timestamp())
display(dataFrame)

dataFrame.createOrReplaceTempView("data")

timestamp,unix,Date,current_date,current_timestamp
2015-03-22T14:13:34,1646641525847,"May, 2021",2025-03-29,2025-03-29T11:59:51.501+0000
2015-03-22T15:03:18,1646641557555,"Mar, 2021",2025-03-29,2025-03-29T11:59:51.501+0000
2015-03-22T14:38:39,1646641578622,"Jan, 2021",2025-03-29,2025-03-29T11:59:51.501+0000


In [0]:
%scala
dataFrame.printSchema()

## unix_timestamp(..) & cast(..)

Convert a **string** to a **timestamp**.

Where the functions live:
* `pyspark.sql.functions` in the case of Python
* `org.apache.spark.sql.functions` in the case of Scala & Java

## 1. Convert timestamp values in the format yyyy-MM-dd'T'HH:mm:ss
`unix_timestamp(..)`

API documentation for `unix_timestamp(..)`:
> Convert time string with given pattern (see <a href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html" target="_blank">SimpleDateFormat</a>) to Unix time stamp (in seconds), return null if fail.

`SimpleDateFormat` is part of the Java API and provides support for parsing and formatting date and time values.
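
As a rough illustration of what `unix_timestamp(..)` does with a pattern, here is a plain-Python sketch that needs no Spark session. The helper name `to_unix_seconds` is made up for this example, Python's `%`-style directives stand in for the Java pattern letters, and the input is assumed to be in UTC (Spark would use the session time zone instead).

```python
from datetime import datetime, timezone

def to_unix_seconds(text, pattern="%Y-%m-%dT%H:%M:%S"):
    """Hypothetical helper: parse `text` with `pattern`, return epoch seconds or None."""
    try:
        parsed = datetime.strptime(text, pattern)
    except ValueError:
        # mirror the "return null if fail" behaviour quoted above
        return None
    # assumption: the string represents a UTC wall-clock time
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())

print(to_unix_seconds("2015-03-22T14:13:34"))  # 1427033614
print(to_unix_seconds("22/03/2015"))           # None
```

The first value equals the `unix_timestamp` shown for the first row in the notebook output, which suggests the cluster was running with a UTC session time zone.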

In [0]:
%python
from pyspark.sql.functions import unix_timestamp, col

# load the data from scala cell
dataFrame = spark.sql("SELECT * FROM data")
dataFrame.show()

# convert timestamp column to unix_timestamp
converted_df = dataFrame.withColumn("unix_timestamp", unix_timestamp(col("timestamp"), "yyyy-MM-dd'T'HH:mm:ss"))
converted_df.printSchema()

converted_df.show()

+-------------------+-------------+---------+------------+--------------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|
+-------------------+-------------+---------+------------+--------------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-29|2025-03-29 12:03:...|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-29|2025-03-29 12:03:...|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-29|2025-03-29 12:03:...|
+-------------------+-------------+---------+------------+--------------------+

root
 |-- timestamp: string (nullable = true)
 |-- unix: long (nullable = false)
 |-- Date: string (nullable = true)
 |-- current_date: date (nullable = false)
 |-- current_timestamp: timestamp (nullable = false)
 |-- unix_timestamp: long (nullable = true)

+-------------------+-------------+---------+------------+--------------------+--------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_ti

## 2. Change the format according to the `SimpleDateFormat` class: **yyyy-MM-dd HH:mm:ss**
  * a. Display the schema and the data to check whether the values changed

In [0]:
%python
from pyspark.sql.functions import col, date_format, to_timestamp

# parse the ISO-8601 string into a proper timestamp column
converted_ts_format = dataFrame.withColumn("formatted_timestamp", to_timestamp(col("timestamp"), "yyyy-MM-dd'T'HH:mm:ss"))

# format the parsed timestamp (not the raw string) as "yyyy-MM-dd HH:mm:ss"
converted_ts_format = converted_ts_format.withColumn("SimpleDateFormat", date_format(col("formatted_timestamp"), "yyyy-MM-dd HH:mm:ss"))

converted_ts_format.show()
converted_ts_format.printSchema()

+-------------------+-------------+---------+------------+--------------------+-------------------+-------------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|formatted_timestamp|   SimpleDateFormat|
+-------------------+-------------+---------+------------+--------------------+-------------------+-------------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-29|2025-03-29 12:01:...|2015-03-22 14:13:34|2015-03-22 14:13:34|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-29|2025-03-29 12:01:...|2015-03-22 15:03:18|2015-03-22 15:03:18|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-29|2025-03-29 12:01:...|2015-03-22 14:38:39|2015-03-22 14:38:39|
+-------------------+-------------+---------+------------+--------------------+-------------------+-------------------+

root
 |-- timestamp: string (nullable = true)
 |-- unix: long (nullable = false)
 |-- Date: string (nullable = true)
 |-- current_date: date (nullable = false)
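
The step above rests on the difference between parsing and formatting: `to_timestamp` yields a timestamp-typed column, while `date_format` yields a string. A minimal plain-Python sketch of the same distinction, using `datetime` as a stand-in for the Spark functions (an analogy, not the Spark implementation):

```python
from datetime import datetime

raw = "2015-03-22T14:13:34"

# parsing produces a real datetime value (the analogue of to_timestamp)
parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S")

# formatting produces a string again (the analogue of date_format)
formatted = parsed.strftime("%Y-%m-%d %H:%M:%S")

print(type(parsed).__name__, parsed)        # datetime 2015-03-22 14:13:34
print(type(formatted).__name__, formatted)  # str 2015-03-22 14:13:34
```

Both values print the same text, so checking the schema is the only reliable way to tell the two columns apart.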

## Add new columns to the DataFrame with the values of year(..), month(..), dayofyear(..)

In [0]:
%python
from pyspark.sql.functions import date_format

yearDate = converted_ts_format.withColumn("Year", date_format(col("SimpleDateFormat"), "yyyy"))
display(yearDate)

timestamp,unix,Date,current_date,current_timestamp,formatted_timestamp,SimpleDateFormat,Year
2015-03-22T14:13:34,1646641525847,"May, 2021",2025-03-29,2025-03-29T12:01:30.863+0000,2015-03-22T14:13:34.000+0000,2015-03-22 14:13:34,2015
2015-03-22T15:03:18,1646641557555,"Mar, 2021",2025-03-29,2025-03-29T12:01:30.863+0000,2015-03-22T15:03:18.000+0000,2015-03-22 15:03:18,2015
2015-03-22T14:38:39,1646641578622,"Jan, 2021",2025-03-29,2025-03-29T12:01:30.863+0000,2015-03-22T14:38:39.000+0000,2015-03-22 14:38:39,2015
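
The cell above derives only the year, and does so as a string through `date_format`. The heading also asks for the month and the day of the year; `pyspark.sql.functions` provides `year`, `month` and `dayofyear` for that. The plain-Python sketch below shows the values one would expect for the first row. The helper `date_parts` is hypothetical and exists only for this illustration.

```python
from datetime import datetime

def date_parts(text, pattern="%Y-%m-%d %H:%M:%S"):
    """Hypothetical helper: return (year, month, day-of-year) for a timestamp string."""
    parsed = datetime.strptime(text, pattern)
    return parsed.year, parsed.month, parsed.timetuple().tm_yday

print(date_parts("2015-03-22 14:13:34"))  # (2015, 3, 81)
```

Day 81 follows from 31 days in January, 28 in February (2015 is not a leap year) and 22 in March.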


In [0]:
%python

toDate = converted_ts_format.withColumn("Year", date_format(col("SimpleDateFormat"), "yyyy"))
display(toDate)

timestamp,unix,Date,current_date,current_timestamp,formatted_timestamp,SimpleDateFormat,Year
2015-03-22T14:13:34,1646641525847,"May, 2021",2025-03-29,2025-03-29T12:01:35.003+0000,2015-03-22T14:13:34.000+0000,2015-03-22 14:13:34,2015
2015-03-22T15:03:18,1646641557555,"Mar, 2021",2025-03-29,2025-03-29T12:01:35.003+0000,2015-03-22T15:03:18.000+0000,2015-03-22 15:03:18,2015
2015-03-22T14:38:39,1646641578622,"Jan, 2021",2025-03-29,2025-03-29T12:01:35.003+0000,2015-03-22T14:38:39.000+0000,2015-03-22 14:38:39,2015
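
The function list at the top includes `to_date()`, which none of the cells exercises. As a hedged, Spark-free sketch of the idea, the values of the `Date` column (for example `May, 2021`) can be parsed into a date. The helper `parse_month_year` is hypothetical, the `%b` directive assumes an English or C locale, and the corresponding Spark pattern would presumably be `MMM, yyyy`, which is not verified here.

```python
from datetime import date, datetime

def parse_month_year(text):
    """Hypothetical helper: parse strings like 'May, 2021' into a date.

    The day is absent from the input, so strptime defaults it to the 1st.
    """
    return datetime.strptime(text, "%b, %Y").date()

for value in ["May, 2021", "Mar, 2021", "Jan, 2021"]:
    print(value, "->", parse_month_year(value))
```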


In [0]:
%python
from pyspark.sql.functions import from_unixtime, col

# from_unixtime(): turn epoch seconds back into a "yyyy-MM-dd HH:mm:ss" string
fromUnix = converted_df.withColumn("from_unixtime", from_unixtime(col("unix_timestamp")))
display(fromUnix)

timestamp,unix,Date,current_date,current_timestamp,unix_timestamp,from_unixtime
2015-03-22T14:13:34,1646641525847,"May, 2021",2025-03-29,2025-03-29T12:03:50.292+0000,1427033614,2015-03-22 14:13:34
2015-03-22T15:03:18,1646641557555,"Mar, 2021",2025-03-29,2025-03-29T12:03:50.292+0000,1427036598,2015-03-22 15:03:18
2015-03-22T14:38:39,1646641578622,"Jan, 2021",2025-03-29,2025-03-29T12:03:50.292+0000,1427035119,2015-03-22 14:38:39
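
`from_unixtime` expects seconds, whereas the `unix` column appears to hold milliseconds, judging by its 13 digits. The sketch below is a plain-Python analogue with a hypothetical helper `from_unix_seconds`; it assumes UTC and shows why such a value has to be divided by 1000 first.

```python
from datetime import datetime, timezone

def from_unix_seconds(seconds):
    """Hypothetical helper: format epoch seconds as 'yyyy-MM-dd HH:mm:ss' in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# epoch seconds, as produced by unix_timestamp(..)
print(from_unix_seconds(1427033614))             # 2015-03-22 14:13:34

# assumption: the `unix` column is in milliseconds, so drop the last three digits
print(from_unix_seconds(1646641525847 // 1000))  # 2022-03-07 08:25:25
```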


In [0]:
%python
# to_timestamp(): parse the ISO-8601 string into a timestamp column
toTimestamp = dataFrame.withColumn("formatted_timestamp", to_timestamp(col("timestamp"), "yyyy-MM-dd'T'HH:mm:ss"))
display(toTimestamp)


timestamp,unix,Date,current_date,current_timestamp,formatted_timestamp
2015-03-22T14:13:34,1646641525847,"May, 2021",2025-03-29,2025-03-29T12:05:42.258+0000,2015-03-22T14:13:34.000+0000
2015-03-22T15:03:18,1646641557555,"Mar, 2021",2025-03-29,2025-03-29T12:05:42.258+0000,2015-03-22T15:03:18.000+0000
2015-03-22T14:38:39,1646641578622,"Jan, 2021",2025-03-29,2025-03-29T12:05:42.258+0000,2015-03-22T14:38:39.000+0000


In [0]:
%python
from pyspark.sql.functions import to_utc_timestamp
converted_df.show()
# to_utc_timestamp(): read the value as local time in +02:00 and convert it to UTC
toUtcTimestamp = converted_df.withColumn("to_utc_timestamp", to_utc_timestamp(col("timestamp"), '+02:00'))
display(toUtcTimestamp)



+-------------------+-------------+---------+------------+--------------------+--------------+
|          timestamp|         unix|     Date|current_date|   current_timestamp|unix_timestamp|
+-------------------+-------------+---------+------------+--------------------+--------------+
|2015-03-22T14:13:34|1646641525847|May, 2021|  2025-03-29|2025-03-29 12:08:...|    1427033614|
|2015-03-22T15:03:18|1646641557555|Mar, 2021|  2025-03-29|2025-03-29 12:08:...|    1427036598|
|2015-03-22T14:38:39|1646641578622|Jan, 2021|  2025-03-29|2025-03-29 12:08:...|    1427035119|
+-------------------+-------------+---------+------------+--------------------+--------------+



timestamp,unix,Date,current_date,current_timestamp,unix_timestamp,to_utc_timestamp
2015-03-22T14:13:34,1646641525847,"May, 2021",2025-03-29,2025-03-29T12:08:48.402+0000,1427033614,2015-03-22T12:13:34.000+0000
2015-03-22T15:03:18,1646641557555,"Mar, 2021",2025-03-29,2025-03-29T12:08:48.402+0000,1427036598,2015-03-22T13:03:18.000+0000
2015-03-22T14:38:39,1646641578622,"Jan, 2021",2025-03-29,2025-03-29T12:08:48.402+0000,1427035119,2015-03-22T12:38:39.000+0000


In [0]:
%python
from pyspark.sql.functions import from_utc_timestamp

# from_utc_timestamp(): read the value as UTC and convert it to local time in +02:00
fromUtcTimestamp = toUtcTimestamp.withColumn("from_utc_timestamp", from_utc_timestamp(col("to_utc_timestamp"), '+02:00'))
display(fromUtcTimestamp)

timestamp,unix,Date,current_date,current_timestamp,unix_timestamp,to_utc_timestamp,from_utc_timestamp
2015-03-22T14:13:34,1646641525847,"May, 2021",2025-03-29,2025-03-29T12:10:11.163+0000,1427033614,2015-03-22T12:13:34.000+0000,2015-03-22T14:13:34.000+0000
2015-03-22T15:03:18,1646641557555,"Mar, 2021",2025-03-29,2025-03-29T12:10:11.163+0000,1427036598,2015-03-22T13:03:18.000+0000,2015-03-22T15:03:18.000+0000
2015-03-22T14:38:39,1646641578622,"Jan, 2021",2025-03-29,2025-03-29T12:10:11.163+0000,1427035119,2015-03-22T12:38:39.000+0000,2015-03-22T14:38:39.000+0000
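
The last two cells are inverses of each other: `to_utc_timestamp` subtracts the zone offset and `from_utc_timestamp` adds it back, which is why `from_utc_timestamp` reproduces the original `timestamp` values. A minimal plain-Python sketch of that round trip, assuming the same fixed `+02:00` offset (the helper names `to_utc` and `from_utc` are invented for this example):

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%S"
PLUS_TWO = timezone(timedelta(hours=2))  # fixed offset, matching '+02:00' above

def to_utc(text, zone=PLUS_TWO):
    """Read `text` as wall-clock time in `zone`, return the UTC wall-clock time."""
    local = datetime.strptime(text, FMT).replace(tzinfo=zone)
    return local.astimezone(timezone.utc).strftime(FMT)

def from_utc(text, zone=PLUS_TWO):
    """Read `text` as UTC wall-clock time, return the wall-clock time in `zone`."""
    utc = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    return utc.astimezone(zone).strftime(FMT)

print(to_utc("2015-03-22T14:13:34"))    # 2015-03-22T12:13:34
print(from_utc("2015-03-22T12:13:34"))  # 2015-03-22T14:13:34
```

A fixed offset never changes with daylight saving time; a region name such as `Europe/Warsaw` would, so the two forms are not interchangeable for every date.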
