# Sliding windows example
First, we start the session as in the previous cases:


In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

# Create (or reuse) the Spark session for this example.
spark = SparkSession.builder \
    .appName("SlidingWindows") \
    .config("spark.sql.legacy.timeParserPolicy", "LEGACY") \
    .getOrCreate()

print("Version:", spark.version)



Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In this example we will again read the data from a *socket*. Before anything else, open the socket with the following command:
```bash
nc -lk 9999
```
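If `nc` is not available, or if we want repeatable input, a small Python server can play the same role. The following is only a sketch: `start_line_server` and its parameters are hypothetical names introduced here, and it serves a single client and then closes, whereas `nc -lk` keeps listening.

```python
import socket
import threading
import time


def start_line_server(lines, host="localhost", port=9999, delay_seconds=0.0):
    """Send each line to the first client that connects, then close.

    Hypothetical helper for illustration; not part of Spark or netcat.
    Returns the bound port and the background thread.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))  # port=0 lets the OS pick a free port
    server.listen(1)
    bound_port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()  # blocks until a reader connects
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
                time.sleep(delay_seconds)  # space the lines out in time
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return bound_port, thread
```

For instance, calling `start_line_server(["ola mundo", "ola caracola"], delay_seconds=5)` before running the cells that read from the socket would send those two lines five seconds apart.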

The next step is to start the read *stream*, exactly as in the previous case.

In [2]:
df_lineas = spark.readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", "9999") \
    .option('includeTimestamp', 'true')\
    .load()

df_lineas.printSchema()


Creating *df_palabras* is also exactly the same as in the previous case.

In [3]:
from pyspark.sql.functions import explode, split
df_palabras = df_lineas.select(
    explode(split(df_lineas.value, ' ')).alias('palabra'),
    df_lineas.timestamp)
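To see what this transformation does to each row, here is a plain-Python sketch of the same idea, with no Spark involved. The function name `explode_words` and the sample timestamp are assumptions made for illustration; like the cell above, it splits on a single space, so consecutive spaces produce empty strings.

```python
from datetime import datetime


def explode_words(rows):
    """Turn (value, timestamp) rows into one (palabra, timestamp) row per word."""
    return [
        (palabra, timestamp)
        for value, timestamp in rows
        for palabra in value.split(" ")
    ]


# Example arrival time, chosen only for illustration.
ts = datetime(2025, 5, 5, 14, 22, 12)
for row in explode_words([("ola mundo", ts)]):
    print(row)
```

Every word keeps the timestamp of the line it came from, which is what later allows the words to be grouped by time window.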

Because this example uses sliding windows, in addition to the window duration (2 minutes) we also specify the slide interval (1 minute).

In [4]:
from pyspark.sql.functions import window
windowed_counts = df_palabras.groupBy(
    window(df_palabras.timestamp, "2 minutes", "1 minute"), df_palabras.palabra
).count().orderBy('window')
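The way a timestamp maps to several overlapping windows can be sketched in plain Python. This is an illustration of the idea, not Spark's implementation: the helper `sliding_windows` is a hypothetical name, and it assumes naive timestamps, windows aligned to the Unix epoch with no start offset, and intervals that include the start and exclude the end.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def sliding_windows(ts, window=timedelta(minutes=2), slide=timedelta(minutes=1)):
    """Return every (start, end) window that contains ts, oldest first."""
    # Start of the most recent window that begins at or before ts.
    start = ts - (ts - EPOCH) % slide
    windows = []
    # Walk backwards one slide at a time while the window still covers ts.
    while start + window > ts:
        windows.append((start, start + window))
        start -= slide
    return sorted(windows)


for start, end in sliding_windows(datetime(2025, 5, 5, 14, 22, 12)):
    print(start.strftime("%H:%M:%S"), end.strftime("%H:%M:%S"))
# 14:21:00 14:23:00
# 14:22:00 14:24:00
```

With a 2-minute window and a 1-minute slide, every timestamp falls into two windows. If the slide equals the window duration, the windows no longer overlap and each timestamp falls into exactly one.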

Finally, we start the query and send its output to the console. As you will see, when we enter a new line its words are added to two windows: the one that starts in the current minute and the one that started in the previous minute.

In [5]:
query = windowed_counts \
          .writeStream \
          .outputMode("complete") \
          .format("console") \
          .queryName("consulta1") \
          .option("truncate","false") \
          .start()

25/05/05 14:22:12 WARN ResolveWriteToStream: Temporary checkpoint location created which is deleted normally when the query didn't fail: /tmp/temporary-efbe4242-b407-4637-8210-ac1c21f73524. If it's required to delete it under any circumstances, please set spark.sql.streaming.forceDeleteTempCheckpointLocation to true. Important to know deleting temp checkpoint folder is best effort.
25/05/05 14:22:12 WARN ResolveWriteToStream: spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets and will be disabled.
                                                                                

-------------------------------------------
Batch: 0
-------------------------------------------
+------+-------+-----+
|window|palabra|count|
+------+-------+-----+
+------+-------+-----+



                                                                                

-------------------------------------------
Batch: 1
-------------------------------------------
+------------------------------------------+-------+-----+
|window                                    |palabra|count|
+------------------------------------------+-------+-----+
|{2025-05-05 14:21:00, 2025-05-05 14:23:00}|mundo  |1    |
|{2025-05-05 14:21:00, 2025-05-05 14:23:00}|ola    |1    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|mundo  |1    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|ola    |1    |
+------------------------------------------+-------+-----+



                                                                                

-------------------------------------------
Batch: 2
-------------------------------------------
+------------------------------------------+--------+-----+
|window                                    |palabra |count|
+------------------------------------------+--------+-----+
|{2025-05-05 14:21:00, 2025-05-05 14:23:00}|caracola|1    |
|{2025-05-05 14:21:00, 2025-05-05 14:23:00}|mundo   |1    |
|{2025-05-05 14:21:00, 2025-05-05 14:23:00}|ola     |2    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|mundo   |1    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|caracola|1    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|ola     |2    |
+------------------------------------------+--------+-----+



                                                                                

-------------------------------------------
Batch: 3
-------------------------------------------
+------------------------------------------+---------+-----+
|window                                    |palabra  |count|
+------------------------------------------+---------+-----+
|{2025-05-05 14:21:00, 2025-05-05 14:23:00}|caracola |1    |
|{2025-05-05 14:21:00, 2025-05-05 14:23:00}|mundo    |1    |
|{2025-05-05 14:21:00, 2025-05-05 14:23:00}|ola      |2    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|mundo    |1    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|caracola |1    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|ola      |3    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|dende    |1    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|instituto|1    |
|{2025-05-05 14:22:00, 2025-05-05 14:24:00}|o        |1    |
|{2025-05-05 14:23:00, 2025-05-05 14:25:00}|dende    |1    |
|{2025-05-05 14:23:00, 2025-05-05 14:25:00}|instituto|1    |
|{2025-05-05 14:23:00, 2025-05-05 14:25:00}|o    