### Sliding windows example
First, we create the session as in the previous examples:


In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder \
    .master("spark://spark-master:7077") \
    .appName("ejemplo_ventanas_deslizantes") \
    .config("spark.sql.legacy.timeParserPolicy", "LEGACY") \
    .config("spark.eventLog.enabled", "true") \
    .config("spark.eventLog.dir", "hdfs:///spark/logs/history") \
    .config("spark.history.fs.logDirectory", "hdfs:///spark/logs/history") \
    .getOrCreate()




Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In this example we again read the data from a socket. Before anything else, start the socket in a terminal with the following command:
- nc -lk 9999

Next, we start the read *stream*, exactly as in the previous example.

In [2]:
df_lineas = spark.readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", "9999") \
    .option('includeTimestamp', 'true')\
    .load()

df_lineas.printSchema()

25/05/02 14:22:08 WARN TextSocketSourceProvider: The socket source should not be used for production applications! It does not support recovery.


root
 |-- value: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)



The creation of *df_palabras* is also exactly the same as in the previous example.

In [3]:
from pyspark.sql.functions import explode, split
df_palabras = df_lineas.select(
    explode(split(df_lineas.value, ' ')).alias('palabra'),
    df_lineas.timestamp)
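
To make the transformation concrete, the following pure-Python sketch mimics what `explode(split(value, ' '))` does to each incoming row: every line yields one output row per token, and each row keeps the timestamp of the line it came from. The function name `explode_words` and the sample timestamp are illustrative assumptions, and this is not Spark code.

```python
from datetime import datetime

def explode_words(rows):
    """Mimic explode(split(value, ' ')): one (word, timestamp) row per token."""
    out = []
    for value, ts in rows:
        # Splitting on a single space keeps empty tokens for repeated spaces
        for word in value.split(" "):
            out.append((word, ts))
    return out

ts = datetime(2025, 5, 2, 14, 24, 30)  # hypothetical arrival time
for row in explode_words([("hola caracola", ts)]):
    print(row)
```

Note that splitting on a single space means that two consecutive spaces produce an empty-string token, which would then be counted as a word of its own.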

Since in this case we use sliding windows, in addition to the window duration (2 minutes) we also specify the slide interval (1 minute).

In [4]:
from pyspark.sql.functions import window
windowed_counts = df_palabras.groupBy(
    window(df_palabras.timestamp, "2 minutes", "1 minute"), df_palabras.palabra
).count().orderBy('window')
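
With a 2-minute window that slides every minute, each timestamp falls into two overlapping windows. The sketch below computes those windows in plain Python. It assumes that window boundaries are aligned to the Unix epoch with no start offset (which matches the minute-aligned boundaries in the console output of this notebook); the function name `sliding_windows` and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta

def sliding_windows(ts, window=timedelta(minutes=2), slide=timedelta(minutes=1)):
    """Return every [start, end) window, aligned to the epoch, that contains ts."""
    epoch = datetime(1970, 1, 1)
    # Start of the latest window that begins at or before ts
    start = ts - (ts - epoch) % slide
    windows = []
    # Walk backwards while the window still covers ts (the end is exclusive)
    while start > ts - window:
        windows.append((start, start + window))
        start -= slide
    return sorted(windows)

ts = datetime(2025, 5, 2, 14, 24, 30)  # hypothetical event time
for start, end in sliding_windows(ts):
    print(start, "->", end)
```

In general a record belongs to `window / slide` windows when the duration is a multiple of the slide, so here every word is counted twice: once per overlapping window.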

Finally, we start the query with the console as the sink. As you will see, when we enter a new line its words are added to two windows: the one that starts at the current minute and the one that started at the previous minute.

In [5]:
query = windowed_counts \
          .writeStream \
          .outputMode("complete") \
          .format("console") \
          .queryName("consulta1") \
          .option("truncate","false") \
          .start()

25/05/02 14:24:43 WARN ResolveWriteToStream: Temporary checkpoint location created which is deleted normally when the query didn't fail: /tmp/temporary-3d0ecbdd-41ee-4c45-9bb6-f1aa01c7f003. If it's required to delete it under any circumstances, please set spark.sql.streaming.forceDeleteTempCheckpointLocation to true. Important to know deleting temp checkpoint folder is best effort.
25/05/02 14:24:43 WARN ResolveWriteToStream: spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets and will be disabled.
                                                                                

-------------------------------------------
Batch: 0
-------------------------------------------
+------+-------+-----+
|window|palabra|count|
+------+-------+-----+
+------+-------+-----+



                                                                                

-------------------------------------------
Batch: 1
-------------------------------------------
+------------------------------------------+--------+-----+
|window                                    |palabra |count|
+------------------------------------------+--------+-----+
|{2025-05-02 14:23:00, 2025-05-02 14:25:00}|caracola|1    |
|{2025-05-02 14:23:00, 2025-05-02 14:25:00}|hola    |1    |
|{2025-05-02 14:24:00, 2025-05-02 14:26:00}|caracola|1    |
|{2025-05-02 14:24:00, 2025-05-02 14:26:00}|hola    |1    |
+------------------------------------------+--------+-----+



                                                                                

-------------------------------------------
Batch: 2
-------------------------------------------
+------------------------------------------+--------+-----+
|window                                    |palabra |count|
+------------------------------------------+--------+-----+
|{2025-05-02 14:23:00, 2025-05-02 14:25:00}|caracola|1    |
|{2025-05-02 14:23:00, 2025-05-02 14:25:00}|hola    |2    |
|{2025-05-02 14:23:00, 2025-05-02 14:25:00}|mundo   |1    |
|{2025-05-02 14:24:00, 2025-05-02 14:26:00}|caracola|1    |
|{2025-05-02 14:24:00, 2025-05-02 14:26:00}|hola    |2    |
|{2025-05-02 14:24:00, 2025-05-02 14:26:00}|mundo   |1    |
+------------------------------------------+--------+-----+



                                                                                

-------------------------------------------
Batch: 3
-------------------------------------------
+------------------------------------------+--------+-----+
|window                                    |palabra |count|
+------------------------------------------+--------+-----+
|{2025-05-02 14:23:00, 2025-05-02 14:25:00}|caracola|1    |
|{2025-05-02 14:23:00, 2025-05-02 14:25:00}|hola    |2    |
|{2025-05-02 14:23:00, 2025-05-02 14:25:00}|mundo   |1    |
|{2025-05-02 14:24:00, 2025-05-02 14:26:00}|caracola|1    |
|{2025-05-02 14:24:00, 2025-05-02 14:26:00}|hola    |2    |
|{2025-05-02 14:24:00, 2025-05-02 14:26:00}|mundo   |2    |
|{2025-05-02 14:24:00, 2025-05-02 14:26:00}|cruel   |1    |
|{2025-05-02 14:25:00, 2025-05-02 14:27:00}|mundo   |1    |
|{2025-05-02 14:25:00, 2025-05-02 14:27:00}|cruel   |1    |
+------------------------------------------+--------+-----+
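
The batches above can be reproduced with a small plain-Python simulation of the `complete` output mode, in which the running counts are kept across batches and the whole result table is emitted again after each one. This is only a sketch under assumptions: the three input lines and their word order are inferred from the output, the seconds in the timestamps are invented, and `windows_for` is a hypothetical helper that aligns windows to the Unix epoch.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=2)
SLIDE = timedelta(minutes=1)
EPOCH = datetime(1970, 1, 1)

def windows_for(ts):
    """All [start, end) windows of length WINDOW, sliding by SLIDE, that contain ts."""
    start = ts - (ts - EPOCH) % SLIDE
    found = []
    while start > ts - WINDOW:
        found.append((start, start + WINDOW))
        start -= SLIDE
    return found

# Assumed input: one line per micro-batch; the seconds are invented
events = [
    ("hola caracola", datetime(2025, 5, 2, 14, 24, 50)),
    ("hola mundo", datetime(2025, 5, 2, 14, 24, 55)),
    ("mundo cruel", datetime(2025, 5, 2, 14, 25, 10)),
]

counts = Counter()  # running state, kept across batches
snapshots = []      # full result table emitted after each batch
for line, ts in events:
    for word in line.split(" "):
        for win in windows_for(ts):
            counts[(win, word)] += 1
    snapshots.append(dict(counts))

for batch, table in enumerate(snapshots, start=1):
    print(f"Batch {batch}: {len(table)} rows")
```

The third line arrives after 14:25:00, so it opens the window that starts at 14:25 while still contributing to the one that started at 14:24, which is why `mundo` reaches a count of 2 there.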

