In [40]:
import datetime
from pyspark.sql import SparkSession
# Only window and sum are used below; note that this sum shadows Python's built-in sum.
from pyspark.sql.functions import window, sum

In [41]:
spark = SparkSession.builder.getOrCreate()

In [42]:
df = spark.sql(
    """
    select current_timestamp() as date, 1 as value union select current_timestamp() as date, 2 as value;
    """
)

In [43]:
df.show(truncate=False)

+-----------------------+-----+
|date                   |value|
+-----------------------+-----+
|2024-01-19 12:56:09.791|1    |
|2024-01-19 12:56:09.791|2    |
+-----------------------+-----+



In [44]:
# w = df.groupBy(window("current_date", "5 seconds")).agg(sum("val").alias("sum"))
# w.select(
#     w.window.start.cast("string").alias("start"),
#     w.window.end.cast("string").alias("end"),
#     "sum",
# ).collect()


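# Sliding window: 5-second windows that advance every 1 second, with window
# starts shifted by 0.1 seconds (the startTime argument).
windowed = df.groupBy(
    window("date", windowDuration="5 seconds", slideDuration="1 second", startTime="0.1 seconds")
).agg(sum("value").alias("sum"))

# Both rows share one timestamp, so each of the 5 overlapping windows sums to 3.
windowed.select(
    windowed.window.start.cast("string").alias("start"),
    windowed.window.end.cast("string").alias("end"),
    "sum",
).show(truncate=False)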

+---------------------+---------------------+---+
|start                |end                  |sum|
+---------------------+---------------------+---+
|2024-01-19 12:56:07.1|2024-01-19 12:56:12.1|3  |
|2024-01-19 12:56:09.1|2024-01-19 12:56:14.1|3  |
|2024-01-19 12:56:06.1|2024-01-19 12:56:11.1|3  |
|2024-01-19 12:56:08.1|2024-01-19 12:56:13.1|3  |
|2024-01-19 12:56:05.1|2024-01-19 12:56:10.1|3  |
+---------------------+---------------------+---+



## Types of Time Windows in Spark Streaming

Spark supports three types of time windows for windowed computations over event time:

1. **Tumbling Window**: Tumbling windows are a series of fixed-sized, non-overlapping, and contiguous time intervals. An input can only be bound to a single window[3][4].

2. **Sliding Window**: Sliding windows are also fixed-sized, but they can overlap when the slide interval is shorter than the window duration, so an input can be bound to multiple windows. For example, with a 5-second window that slides every 1 second, each input falls into five windows[3].

3. **Session Window**: Session windows, introduced in Apache Spark 3.2, work for both streaming and batch queries. Unlike tumbling and sliding windows, a session window has no fixed length: it starts with an input and keeps extending as long as further inputs arrive within a gap duration, then closes once no input arrives within that gap[4][5].
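
The assignment rules behind the three window types can be sketched in plain Python. This is an illustration of the semantics only, not Spark's implementation: the helper names `sliding_windows`, `tumbling_window`, and `session_windows` are hypothetical, and the sketch assumes naive timestamps with windows aligned to 1970-01-01 00:00:00. In PySpark, the corresponding functions are `window` (tumbling and sliding) and `session_window` (Spark 3.2+).

```python
from datetime import datetime, timedelta

# Assumption: window boundaries are aligned to this origin (plus an optional offset).
EPOCH = datetime(1970, 1, 1)


def sliding_windows(ts, size, slide, offset=timedelta(0)):
    """Return every [start, end) window of length `size` that contains `ts`.

    Window starts lie on the grid `offset + k * slide`.
    """
    # Latest grid-aligned start that is not after ts.
    start = ts - ((ts - EPOCH - offset) % slide)
    windows = []
    # Walk backwards one slide at a time while the window still covers ts.
    while start + size > ts:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def tumbling_window(ts, size, offset=timedelta(0)):
    """A tumbling window is a sliding window whose slide equals its size."""
    return sliding_windows(ts, size, size, offset)[0]


def session_windows(timestamps, gap):
    """Group events into sessions; a session ends `gap` after its last event."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Event arrived before the open session closed: extend that session.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            # No open session covers this event: start a new one.
            sessions.append((ts, ts + gap))
    return sessions


# Same timestamp and parameters as the notebook cell above.
event = datetime(2024, 1, 19, 12, 56, 9, 791000)
for start, end in sliding_windows(
    event, timedelta(seconds=5), timedelta(seconds=1), timedelta(milliseconds=100)
):
    print(start, end)
```

With the notebook's parameters the loop prints five windows whose starts run from 12:56:05.1 to 12:56:09.1, matching the five rows in the `show()` output above. A tumbling window binds the same event to exactly one window, while `session_windows` produces windows whose length depends on how closely the events follow each other.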
