# Lab15

## P1. (7 pts) Medir la temperatura y la humedad:
### Consultas requeridas
1. Obtener todas las mediciones de humedad del sensor `SENS001` del último día. La consulta debe incluir el tiempo restante de vida (TTL) de cada registro.

2. Detectar valores anómalos fuera del rango permitido en la última hora. Implementar una consulta que identifique mediciones de temperatura o humedad fuera del rango normal. La consulta debe permitir filtrar por sensor específico.

3. Verificar el tiempo restante de vida de los datos usando la función TTL. Implementar una consulta que muestre el TTL en diferentes unidades (segundos, horas, días). Crear una consulta para identificar datos que están próximos a expirar (ej: en las próximas 24 horas).

### Estructura de tablas propuestas
La tabla `sensor_readings` tendrá un propósito general que permite filtrar por tipo de medicion, sensor y día. Esta tabla se usará para la consulta 1.

La tabla `sensor_anomalies` será destinada a optimizar la consulta 2, esta nos permite filtrar por hora ya que se incluye este dato en el partition key, además de tener clustering por el valor de la medición lo que facilita hallar los valores anómalos.

La tabla `sensor_by_date` fue pensada para usarse con la consulta 3, porque particiona solo por la fecha y con esto poder filtrar los datos con más de 6 días de antigüedad.

Se respeta el tiempo de vida de los datos de 7 días de acuerdo a lo solicitado, y se considera adicionalmente un tiempo de vida de solo 2 horas para el seguimiento de datos anómalos ya que se espera que siempre se consulten los insertados en la hora pasada.

In [None]:
create table if not exists sensor_readings
(
    measurement_type text,
    sensor_id        text,
    date             text,
    event_time       timestamp,
    measurement      double,
    primary key ( (measurement_type, sensor_id, date), event_time )
) with clustering order by (event_time desc) and
        default_time_to_live = 604800; -- 7 days

create table if not exists sensor_anomalies
(
    measurement_type text,
    sensor_id        text,
    hour             text,
    event_time       timestamp,
    measurement      double,
    primary key ( (measurement_type, sensor_id, hour), measurement, event_time)
) with clustering order by (measurement asc, event_time desc) and
        default_time_to_live = 7200; -- 2 hour

create table if not exists sensor_by_date
(
    measurement_type text,
    sensor_id        text,
    date             text,
    event_time       timestamp,
    measurement      double,
    primary key ( date, event_time )
) with clustering order by (event_time asc) and
        default_time_to_live = 604800; -- 7 days

### Configuración del entorno de trabajo

Instalación de driver de cassandra para python, se requiere haber configurado un entorno local de conda previamente.

In [22]:
%conda update -n base -c defaults conda
%conda install -c anaconda libev
%conda install -c msys2 m2-make
%conda install -c conda-forge pkg-config
%pip install cassandra-driver

Channels:
 - defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.




    current version: 25.3.1
    latest version: 25.5.1

Please update conda by running

    $ conda update -n base -c defaults conda




Channels:
 - anaconda
 - defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: failed

Note: you may need to restart the kernel to use updated packages.



PackagesNotFoundError: The following packages are not available from current channels:

  - libev

Current channels:

  - https://conda.anaconda.org/anaconda
  - defaults

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.




Channels:
 - msys2
 - defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.




    current version: 25.3.1
    latest version: 25.5.1

Please update conda by running

    $ conda update -n base -c defaults conda




Channels:
 - conda-forge
 - defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.




    current version: 25.3.1
    latest version: 25.5.1

Please update conda by running

    $ conda update -n base -c defaults conda




Note: you may need to restart the kernel to use updated packages.


### Generación de datos y pruebas

In [None]:
import random
import time
from datetime import datetime, timedelta
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

# Parametros de simulación
N_SENSORS = 5
SAMPLING_RATE = 60 # 1 minuto en segundos
SAMPLING_TIME = 604800  # 7 días en segundos
ID_PREFIX = 'SENS'
MESUREMENTS_TYPES = ('temperatura', 'humedad')
NORMAL_RANGE = {'temperatura': (15, 35),
                'humedad': (30, 80)}

# Precomputar rango extendido de valores para simulación de anomalías
EXTENDED_RANGE = {
    type: (low - (high - low) * 0.2,
        high + (high - low) * 0.2
    )
    for type, (low, high) in NORMAL_RANGE.items()
}

# Conexión a Cassandra en Docker
cluster = Cluster(
  ['localhost'], port=9042,
  protocol_version=4,
  connect_timeout=5,
  idle_heartbeat_interval=30,
  control_connection_timeout=10
)
cassandra = cluster.connect('my_keyspace')

def create_tables() -> None:
    # Crear tabla para lecturas de sensores
    cassandra.execute("""
        create table if not exists sensor_readings
        (
            measurement_type text,
            sensor_id        text,
            date             text,
            event_time       timestamp,
            measurement      double,
            primary key ( (measurement_type, sensor_id, date), event_time )
        ) with clustering order by (event_time desc) and
                default_time_to_live = 604800
    """)
    cassandra.execute("""
        create table if not exists sensor_anomalies
        (
            measurement_type text,
            sensor_id        text,
            hour             text,
            event_time       timestamp,
            measurement      double,
            primary key ( (measurement_type, sensor_id, hour), measurement, event_time)
        ) with clustering order by (measurement asc, event_time desc) and
                default_time_to_live = 7200
    """)
    cassandra.execute("""
        create table if not exists sensor_by_date
        (
            measurement_type text,
            sensor_id        text,
            date             text,
            event_time       timestamp,
            measurement      double,
            primary key ( date, event_time )
        ) with clustering order by (event_time asc) and
                default_time_to_live = 604800;
    """)

def drop_tables() -> None:
    # Eliminar tablas si existen
    cassandra.execute("drop table if exists sensor_readings")
    cassandra.execute("drop table if exists sensor_anomalies")
    cassandra.execute("drop table if exists sensor_by_date")
    
def generate_sensor_data() -> None:
    # Generar indetificadores únicos para cada sensor
    ids: list[str] = [f"{ID_PREFIX}{str(i).zfill(3)}" for i in range(1, N_SENSORS + 1)]

    # Preparar queries
    insert_reading = cassandra.prepare("""
        INSERT INTO sensor_readings (measurement_type, sensor_id, date, event_time, measurement)
        VALUES (?, ?, ?, ?, ?)
    """)
    insert_anomaly = cassandra.prepare("""
        INSERT INTO sensor_anomalies (measurement_type, sensor_id, hour, event_time, measurement)
        VALUES (?, ?, ?, ?, ?)
    """)
    insert_by_date = cassandra.prepare("""
        INSERT INTO sensor_by_date (measurement_type, sensor_id, date, event_time, measurement)
        VALUES (?, ?, ?, ?, ?)
    """)

    # Definir el tiempo de inicio y fin para la generación de datos
    start_time: datetime = datetime.now() - timedelta(seconds=SAMPLING_TIME)
    end_time: datetime = datetime.now()
    current_time: datetime = end_time   # Insertar en orden descendente

    # Usar BatchStatement para agrupar inserciones
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)

    # Lista para almacenar las futuras ejecuciones asíncronas
    futures = []

    # Limite de inflight para evitar sobrecargar Cassandra
    max_inflight = 16

    while current_time >= start_time:
        date = current_time.strftime('%Y-%m-%d')
        hour = current_time.strftime('%Y-%m-%dT%H')

        for id in ids:
            for type in MESUREMENTS_TYPES:
                ext_low, ext_high = EXTENDED_RANGE[type]
                measurement = random.uniform(ext_low, ext_high)

                batch.add(insert_reading, (type, id, date, current_time, measurement))
                batch.add(insert_anomaly, (type, id, hour, current_time, measurement))
                batch.add(insert_by_date, (type, id, date, current_time, measurement))

        # Ejecutar el batch de forma asíncrona
        futures.append(cassandra.execute_async(batch))
        batch.clear()  # Limpiar el batch para la siguiente iteración

        # Si el batch alcanza el límite de inflight, esperar a que se completen
        if len(futures) >= max_inflight:
            for f in futures:
                f.result()
            futures.clear()

        # Retroceder el tiempo para la siguiente iteración
        current_time -= timedelta(seconds=SAMPLING_RATE)
    
    # Esperar batchs restantes
    for f in futures:
        f.result()

    result = cassandra.execute("select count(*) from sensor_readings")
    count = list(result)[0][0]
    print("Inserción de datos de prueba completada.\nTotal de registros insertados:", count)

def query_1(type: str, id: str) -> None:
    print("\nQuery 1:")
    # Calcular el bucket del día anterior
    target_date: str = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
    # Ejecutar consulta y medir el tiempo de ejecución
    start_time = time.time()
    rows = cassandra.execute("""
        select event_time, measurement, ttl(measurement) as ttl
        from sensor_readings
        where measurement_type = %s
            and sensor_id = %s
            and date = %s
    """, (type, id, target_date))
    end_time = time.time()
    query_time = end_time - start_time
    rows: list = list(rows)  # Convertir a lista para poder usar len() y slicing
    print(f"Lecturas de {type} para el sensor {id} del día de ayer {target_date}:")
    print(f"{len(rows)} resultados encontrados en {query_time:.4f} segundos")
    for row in rows[:5]:  # Limitar a las primeras 5 lecturas
        event_time = row.event_time.strftime('%Y-%m-%d %H:%M:%S')
        measurement = row.measurement
        ttl_seconds = row.ttl
        print(f"  - {event_time}: {measurement} (TTL: {ttl_seconds} segundos)")

def query_2(type: str, id: str) -> None:
    print("\nQuery 2:")
    # Calcular el bucket de la hora anterior
    target_hour: str = (datetime.now() - timedelta(hours=1)).strftime('%Y-%m-%dT%H')
    # Obtener el rango normal para el tipo de medición
    low, high = NORMAL_RANGE[type]
    # Ejecutar consulta y medir el tiempo de ejecución
    start_time = time.time()
    rows_low = cassandra.execute("""
        select event_time, measurement
        from sensor_anomalies
        where measurement_type = %s
            and sensor_id = %s
            and hour = %s
            and measurement < %s
    """, (type, id, target_hour, low))

    rows_high = cassandra.execute("""
        select event_time, measurement
        from sensor_anomalies
        where measurement_type = %s
            and sensor_id = %s
            and hour = %s
            and measurement > %s
    """, (type, id, target_hour, high))
    end_time = time.time()
    query_time = end_time - start_time
    rows: list = list(rows_low) + list(rows_high)
    rows.sort(key=lambda row: row.event_time, reverse=True)
    print(f"Anomalías de {type} para el sensor {id} de la hora pasada {target_hour}:")
    print(f"{len(rows)} resultados encontrados en {query_time:.4f} segundos")
    for row in rows[:5]:  # Limitar a las primeras 5 anomalías
        event_time = row.event_time.strftime('%Y-%m-%d %H:%M:%S')
        measurement = row.measurement
        print(f"  - {event_time}: {measurement}")

def query_3() -> None:
    print("\nQuery 3:")
    # Calcular el bucket de la fecha de hace 6 días
    target_date: str = (datetime.now() - timedelta(days=6)).strftime('%Y-%m-%d')
    # Ejecutar consulta y medir el tiempo de ejecución
    start_time = time.time()
    rows = cassandra.execute("""
        select measurement_type, sensor_id, date, event_time, ttl(measurement) as ttl
        from sensor_by_date
        where date = %s
    """, (target_date,))
    end_time = time.time()
    query_time = end_time - start_time
    rows: list = list(rows)  # Convertir a lista para poder usar len() y slicing
    print(f"Datos de más de 6 días de antigüedad ({target_date}):")
    print(f"{len(rows)} resultados encontrados en {query_time:.4f} segundos")
    for row in rows[:5]:  # Limitar a las primeras 5 lecturas
        measurement_type = row.measurement_type
        sensor_id = row.sensor_id
        date = row.date
        ttl_seconds = row.ttl
        ttl_minutes = ttl_seconds // 60
        ttl_hours = ttl_minutes // 60
        ttl_days = ttl_hours // 24
        print(f"  - {measurement_type} del sensor {sensor_id} del día {date} (TTL: {ttl_seconds} segundos, {ttl_minutes} minutos, {ttl_hours} horas, {ttl_days} días)")

def test() -> None:
    # Crear tablas
    create_tables()
    
    # Generar datos de prueba
    generate_sensor_data()
    
    # Ejecutar consultas de prueba
    query_1('temperatura', 'SENS001')
    query_2('humedad', 'SENS002')
    query_3()
    
    # Limpiar tablas
    drop_tables()

test()
# Cerrar conexión a Cassandra
cluster.shutdown()

Inserción de datos de prueba completada.
Total de registros insertados: 100810

Query 1:
Lecturas de temperatura para el sensor SENS001 del día de ayer 2025-07-07:
1440 resultados encontrados en 0.0445 segundos
  - 2025-07-07 23:59:23: 29.201208291475858 (TTL: 604789 segundos)
  - 2025-07-07 23:58:23: 26.325772840598976 (TTL: 604789 segundos)
  - 2025-07-07 23:57:23: 16.36373289016412 (TTL: 604789 segundos)
  - 2025-07-07 23:56:23: 15.755248981596546 (TTL: 604789 segundos)
  - 2025-07-07 23:55:23: 29.982972045719457 (TTL: 604789 segundos)

Query 2:
Anomalías de humedad para el sensor SENS002 de la hora pasada 2025-07-08T15:
20 resultados encontrados en 0.0318 segundos
  - 2025-07-08 15:59:23: 26.105229674183267
  - 2025-07-08 15:57:23: 84.11479523132168
  - 2025-07-08 15:52:23: 29.90490526548601
  - 2025-07-08 15:50:23: 29.105740757183067
  - 2025-07-08 15:49:23: 26.49023812956321

Query 3:
Datos de más de 6 días de antigüedad (2025-07-02):
1440 resultados encontrados en 0.0457 segundo

## P2. (13 pts) Evaluación Experimental

### Cluster de Cassandra con Docker Compose:

In [10]:
with open("docker-compose.yml") as f:
    print(f.read())

version: '3.8'

services:
  cassandra1:
    image: cassandra:4.1
    container_name: cassandra1
    hostname: cassandra1
    networks:
      - cassandra-net
    ports:
      - "9042:9042"
    environment:
      CASSANDRA_CLUSTER_NAME: "CassandraCluster"
      CASSANDRA_DC: DC1
      CASSANDRA_RACK: RAC1
      CASSANDRA_SEEDS: "cassandra1,cassandra2,cassandra3"
      MAX_HEAP_SIZE: 1024M
      HEAP_NEWSIZE: 256M
    mem_limit: 1536m

  cassandra2:
    image: cassandra:4.1
    container_name: cassandra2
    hostname: cassandra2
    networks:
      - cassandra-net
    depends_on:
      - cassandra1
    environment:
      CASSANDRA_CLUSTER_NAME: "CassandraCluster"
      CASSANDRA_DC: DC1
      CASSANDRA_RACK: RAC1
      CASSANDRA_SEEDS: "cassandra1,cassandra2,cassandra3"
      MAX_HEAP_SIZE: 1024M
      HEAP_NEWSIZE: 256M
    mem_limit: 1536m

  cassandra3:
    image: cassandra:4.1
    container_name: cassandra3
    hostname: cassandra3
    networks:
      - cassandra-net
    depends_on:
 

In [15]:
!docker compose down -v
!docker compose up -d

 Container cassandra2  Stopping
 Container cassandra3  Stopping
 Container cassandra3  Stopped
 Container cassandra3  Removing
 Container cassandra3  Removed
 Container cassandra2  Stopped
 Container cassandra2  Removing
 Container cassandra2  Removed
 Container cassandra1  Stopping
 Container cassandra1  Stopped
 Container cassandra1  Removing
 Container cassandra1  Removed
 Network lab15_cassandra-net  Removing
 Network lab15_cassandra-net  Removed
 Network lab15_cassandra-net  Creating
 Network lab15_cassandra-net  Created
 Container cassandra1  Creating
 Container cassandra1  Created
 Container cassandra3  Creating
 Container cassandra2  Creating
 Container cassandra3  Created
 Container cassandra2  Created
 Container cassandra1  Starting
 Container cassandra1  Started
 Container cassandra2  Starting
 Container cassandra3  Starting
 Container cassandra3  Started
 Container cassandra2  Started


In [None]:
CREATE KEYSPACE IF NOT EXISTS my_keyspace
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'datacenter1': '3'
    }

In [None]:
from cassandra.cluster import Cluster

cluster = Cluster(['localhost'], port=9042)
session = cluster.connect()

# Crear keyspace (si no existe)
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS my_keyspace
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'datacenter1': '3'
    }
""")

# Verificar la creación del keyspace
rows = session.execute("""
    SELECT keyspace_name, replication
    FROM system_schema.keyspaces
    WHERE keyspace_name = 'my_keyspace'
""")

for row in rows:
    print("Keyspace:", row.keyspace_name)
    print("Replicación:", row.replication)


Keyspace: my_keyspace
Replicación: {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '3'}


In [26]:
!docker ps

CONTAINER ID   IMAGE           COMMAND                  CREATED          STATUS          PORTS                                                       NAMES
5674d14e8152   cassandra:4.1   "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes   7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp                 cassandra2
3ba274a98a65   cassandra:4.1   "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes   7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp                 cassandra3
c62be9070c46   cassandra:4.1   "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes   7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9042->9042/tcp   cassandra1


In [27]:
!docker exec cassandra1 nodetool status

Datacenter: datacenter1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack 
UN  172.21.0.4  75.44 KiB  16      100.0%            91278d84-a6f5-4c4a-a15d-4d130f69f9c3  rack1
UN  172.21.0.2  75.45 KiB  16      100.0%            911df817-5bbb-4120-8416-5a526e0645ae  rack1
UN  172.21.0.3  75.44 KiB  16      100.0%            7d2e0357-0bdc-4cfe-be43-3236e4121186  rack1



### PostgreSQL
Se usa una instancia local en el puerto 5432.

In [24]:
%conda install -c conda-forge psycopg2
%conda install -c conda-forge pandas
%conda install -c conda-forge asyncpg

Retrieving notices: done
Channels:
 - conda-forge
 - defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.




    current version: 25.3.1
    latest version: 25.5.1

Please update conda by running

    $ conda update -n base -c defaults conda




Channels:
 - conda-forge
 - defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.




    current version: 25.3.1
    latest version: 25.5.1

Please update conda by running

    $ conda update -n base -c defaults conda




Channels:
 - conda-forge
 - defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: c:\Users\Jvnc\Documents\BD2\BD2-Labs\Lab15\.conda

  added / updated specs:
    - asyncpg


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    asyncpg-0.27.0             |  py311ha68e1ae_1         570 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         570 KB

The following NEW packages will be INSTALLED:

  asyncpg            conda-forge/win-64::asyncpg-0.27.0-py311ha68e1ae_1 



Downloading and Extracting Packages: ...working...
asyncpg-0.27.0       | 570 KB    |            |   0% 
asyncpg-0.27.0       | 570 KB    | 2          |   3% 
asyncpg-0.27.0       | 570 KB    | #6         |  17% 
asyncpg-0.27.0       | 570 KB    | ###6     



    current version: 25.3.1
    latest version: 25.5.1

Please update conda by running

    $ conda update -n base -c defaults conda




### Conexión desde Jupyter Notebook

In [14]:
import psycopg2
import random
import time
import pandas as pd
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement
from datetime import datetime, timedelta
from psycopg2.extras import execute_values
from collections import defaultdict

cluster = Cluster(
  ['localhost'], port=9042,
  protocol_version=4,
  connect_timeout=5,
  idle_heartbeat_interval=30,
  control_connection_timeout=10
)
cassandra = cluster.connect('my_keyspace')

connect = psycopg2.connect(dbname="lab15",user="postgres",password="postgres",host="localhost",port=5432)
connect.autocommit = True
postgres = connect.cursor()

def create_table_cassandra() -> None:
    cassandra.execute("""
        create table if not exists temperature_measurements
        (
            sensor_id   text,
            date        text,
            event_time  timestamp,
            temperature double,
            humidity    double,
            primary key ((sensor_id, date), event_time)
        )
    """)

def create_table_postgres() -> None:
    postgres.execute("""
        create table if not exists temperature_measurements
        (
            sensor_id   varchar(20),
            date        varchar(10),
            event_time  timestamp,
            temperature double precision,
            humidity    double precision,
            primary key (sensor_id, date, event_time)
        )
    """)

def create_tables() -> None:
    create_table_cassandra()
    create_table_postgres()

def drop_table_cassandra() -> None:
    cassandra.execute("drop table if exists temperature_measurements")

def drop_table_postgres() -> None:
    postgres.execute("drop table if exists temperature_measurements")

def drop_tables() -> None:
    drop_table_cassandra()
    drop_table_postgres()

NORMAL_RANGE = [(15, 35), (30, 80)]  # Rango normal para temperatura y humedad
EXTENDED_RANGE = [(low - (high - low) * 0.2, high + (high - low) * 0.2) for low, high in NORMAL_RANGE]

def generate_data(sensors: int, days: int) -> list[tuple]:
    now = datetime.now()
    ids = [f"SENS{str(i).zfill(3)}" for i in range(1, sensors + 1)]
    data: list[tuple] = []
    for id in ids:
        for day in range(days):
            date_obj = now - timedelta(days=day)
            date_str = date_obj.strftime('%Y-%m-%d')
            for minute in range(24 * 60):
                timestamp = date_obj.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(minutes=minute)
                temperature = random.uniform(*EXTENDED_RANGE[0])
                humidity = random.uniform(*EXTENDED_RANGE[1])
                data.append((id, date_str, timestamp, temperature, humidity))
    return data

def insert_postgres(data: list[tuple]) -> float:
    drop_table_postgres()
    create_table_postgres()
    query = """
        INSERT INTO temperature_measurements (sensor_id, date, event_time, temperature, humidity)
        VALUES %s
    """
    start = time.time()
    execute_values(postgres, query, data)
    end = time.time()
    return end - start

def insert_cassandra(data: list[tuple], batch_size: int) -> float:
    drop_table_cassandra()
    create_table_cassandra()
    partitioned = defaultdict(list)
    for row in data:
        partitioned[(row[0], row[1])].append(row)
    
    prepared = cassandra.prepare("""
        INSERT INTO temperature_measurements (sensor_id, date, event_time, temperature, humidity)
        VALUES (?, ?, ?, ?, ?)
    """)
    
    batch = BatchStatement()
    start = time.time()
    for rows in partitioned.values():
        for i in range(0, len(rows), batch_size):
            for row in rows[i:i + batch_size]:
                batch.add(prepared, row)
            cassandra.execute(batch)
            batch.clear()
    end = time.time()
    return end - start

def insert_test() -> None:
    print("Prueba de inserción de datos:")
    volumes: list = [7, 15, 30, 60] # Volúmenes de datos en días
    batch_sizes: list = [100, 200, 500, 1000]
    columns = ['dias', 'postgres'] + [f'cassandra({size})' for size in batch_sizes]
    results = pd.DataFrame(columns=columns)
    for volume in volumes:
        data: list[tuple] = generate_data(5, volume)
        row = {'dias': volume}
        row['postgres'] = insert_postgres(data)
        for size in batch_sizes:
            row[f'cassandra({size})'] = insert_cassandra(data, size)
        results = pd.concat([results, pd.DataFrame([row])], ignore_index=True)
    display(results)

### a) Pruebas de Escritura (INSERT):
Se realizaron pruebas preliminares de inserción individual en Cassandra utilizando un volumen reducido de 5 sensores durante 7 días. El tiempo requerido para completar la inserción fue excesivo, por lo que se descartó esta estrategia y se optó por evaluar únicamente la inserción por lotes (batch).

Para los experimentos se emplearon los siguientes parámetros:
- Sensores: 5
- Volúmenes de datos en días: 7, 15, 30, 60

En el caso de PostgreSQL, el tamaño del batch corresponde siempre al total de datos generados para cada volumen, es decir, toda la inserción se realiza en una sola operación masiva. Para Cassandra, se evaluaron diferentes tamaños de batch (100, 200, 500 y 1000) para analizar su impacto en el rendimiento.

La tabla de resultados presenta una columna para PostgreSQL y una columna para cada tamaño de batch en Cassandra, agrupando los resultados por volumen de datos (días).

Este enfoque permite comparar de manera clara el efecto del tamaño de batch en Cassandra y la diferencia de desempeño respecto a PostgreSQL.

In [70]:
insert_test()

Prueba de inserción de datos:


Prueba de inserción de datos:


Unnamed: 0,dias,postgres,cassandra(100),cassandra(200),cassandra(500),cassandra(1000)
0,7,2.109822,30.822227,15.861103,4.538761,4.344583
1,15,3.271147,66.552932,34.268575,15.353234,8.453861
2,30,143.133282,130.770741,5650.858164,30.523972,17.589561
3,60,13.444289,999.740117,509.043554,61.017116,36.38892


#### Propuesta de Optimización: Inserción Asíncrona y Concurrente

Para mejorar el rendimiento en la inserción masiva de datos, se propone el uso de estrategias asíncronas y concurrentes tanto en Cassandra como en PostgreSQL:

- **Cassandra:** Utilizar `execute_async` para enviar múltiples lotes (batches) en paralelo, controlando la cantidad de operaciones simultáneas con un parámetro de workers. Esto permite aprovechar mejor los recursos del clúster y reducir el tiempo total de inserción.

- **PostgreSQL:** Implementar inserción concurrente usando `ThreadPoolExecutor` junto con `execute_values` (psycopg2), donde cada thread maneja su propio batch y conexión.

In [7]:
from concurrent.futures import ThreadPoolExecutor, as_completed

def insert_cassandra_async(data: list[tuple], batch_size: int, max_workers: int) -> float:
    drop_table_cassandra()
    create_table_cassandra()
    partitioned = defaultdict(list)
    for row in data:
        partitioned[(row[0], row[1])].append(row)
    prepared = cassandra.prepare("""
        INSERT INTO temperature_measurements (sensor_id, date, event_time, temperature, humidity)
        VALUES (?, ?, ?, ?, ?)
    """)
    batches: list[BatchStatement] = []
    for rows in partitioned.values():
        for i in range(0, len(rows), batch_size):
            batch = BatchStatement()
            for row in rows[i:i + batch_size]:
                batch.add(prepared, row)
            batches.append(batch)
    start = time.time()
    futures = []
    for batch in batches:
        futures.append(cassandra.execute_async(batch))
        if len(futures) >= max_workers:
            for future in futures:
                future.result()
            futures.clear()
    for future in futures:
        future.result()
    end = time.time()
    return end - start

def insert_postgres_async(data: list[tuple], batch_size: int, max_workers: int) -> float:
    drop_table_postgres()
    create_table_postgres()
    query = """
        insert into temperature_measurements (sensor_id, date, event_time, temperature, humidity)
        values %s
    """
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    start = time.time()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(execute_values, postgres, query, batch) for batch in batches]
        for future in as_completed(futures):
            future.result()
    end = time.time()
    return end - start

def insert_test_async() -> None:
    print("Prueba de inserción de datos batch + async:")
    volumes: list[int] = [7, 15, 30, 60] # Volúmenes de datos en días
    workers_amounts: list[int] = [2, 4, 8, 16]   # Cantidad de workers para la inserción asíncrona
    columns = ['dias'] + [f'cassandra({workers})' for workers in workers_amounts] + [f'postgres({workers})' for workers in workers_amounts]
    results = pd.DataFrame(columns=columns, dtype=float)
    for volume in volumes:
        data: list[tuple] = generate_data(5, volume)
        row = {'dias': float(volume)}
        for workers in workers_amounts:
            row[f'cassandra({workers})'] = insert_cassandra_async(data, 1000, workers)
            row[f'postgres({workers})'] = insert_postgres_async(data, 1000, workers)
        results = pd.concat([results, pd.DataFrame([row])], ignore_index=True)
    display(results)

Se realizaron pruebas variando la cantidad de workers para analizar cómo escala el rendimiento de inserción en ambos motores.

Los resultados muestran que **Cassandra** aprovecha mucho mejor el aumento de concurrencia: a mayor número de workers y mayor volumen de datos, el tiempo de inserción disminuye significativamente, llegando incluso a superar el rendimiento de PostgreSQL en escenarios de alta concurrencia y grandes volúmenes.

Por el contrario, **PostgreSQL** no presenta mejoras notables al incrementar la cantidad de workers bajo este enfoque, ya que su modelo de concurrencia y manejo de conexiones limita el beneficio de la paralelización en la inserción por lotes. Esto resalta la arquitectura distribuida y orientada a la escalabilidad de Cassandra frente al enfoque tradicional de PostgreSQL para cargas masivas de escritura.

In [28]:
insert_test_async()

Prueba de inserción de datos batch + async:


Unnamed: 0,dias,cassandra(2),cassandra(4),cassandra(8),cassandra(16),postgres(2),postgres(4),postgres(8),postgres(16)
0,7.0,1.350659,0.528152,0.374644,0.442715,1.152745,1.220498,1.146912,1.151179
1,15.0,3.424133,0.980803,0.751128,0.654749,2.667092,2.705638,2.666091,2.746336
2,30.0,6.270462,1.993381,1.56474,1.475055,5.539926,5.50582,5.416455,5.349141
3,60.0,12.91506,3.903924,2.884975,2.543653,11.948052,11.177784,12.02829,13.132751


### b) Pruebas de Lectura (SELECT):
1. Consulta por sensor y rango temporal: Obtener datos de un sensor específico en un día
2. Consulta agregada: Calcular temperatura promedio de la última hora para múltiples sensores
3. Consulta de rango de valores: Encontrar lecturas anómalas (fuera de rango normal)

In [40]:
def query_1() -> None:
    sensor_id = 'SENS003'
    date = (datetime.now() - timedelta(days=3)).strftime('%Y-%m-%d')
    print(f"Query 1: Lecturas del sensor {sensor_id} del día {date}:")

    prepared = cassandra.prepare("""
        select event_time, temperature, humidity
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        order by event_time desc
    """)
    start = time.time()
    rows = cassandra.execute(prepared, (sensor_id, date))
    end = time.time()
    cassandra_time = end - start
    print(f"Cassandra ({cassandra_time:.4f} segundos):")
    display(pd.DataFrame(rows, columns=['event_time', 'temperature', 'humidity']))

    start = time.time()
    postgres.execute("""
        select event_time, temperature, humidity
        from temperature_measurements
        where sensor_id = %s
        and date = %s
        order by event_time desc
    """, (sensor_id, date))
    rows = postgres.fetchall()
    end = time.time()
    postgres_time = end - start
    print(f"PostgreSQL ({postgres_time:.4f} segundos):")
    display(pd.DataFrame(rows, columns=['event_time', 'temperature', 'humidity']))

query_1()

Query 1: Lecturas del sensor SENS003 del día 2025-07-06:
Cassandra (0.0912 segundos):


Unnamed: 0,event_time,temperature,humidity
0,2025-07-06 23:59:00,20.432070,50.138416
1,2025-07-06 23:58:00,36.385529,65.201529
2,2025-07-06 23:57:00,26.675042,73.799254
3,2025-07-06 23:56:00,20.527720,53.530202
4,2025-07-06 23:55:00,15.134275,20.709181
...,...,...,...
1435,2025-07-06 00:04:00,32.254224,38.521825
1436,2025-07-06 00:03:00,38.323663,72.440591
1437,2025-07-06 00:02:00,38.082472,59.049028
1438,2025-07-06 00:01:00,34.118799,77.032230


PostgreSQL (0.0055 segundos):


Unnamed: 0,event_time,temperature,humidity
0,2025-07-06 23:59:00,20.432070,50.138416
1,2025-07-06 23:58:00,36.385529,65.201529
2,2025-07-06 23:57:00,26.675042,73.799254
3,2025-07-06 23:56:00,20.527720,53.530202
4,2025-07-06 23:55:00,15.134275,20.709181
...,...,...,...
1435,2025-07-06 00:04:00,32.254224,38.521825
1436,2025-07-06 00:03:00,38.323663,72.440591
1437,2025-07-06 00:02:00,38.082472,59.049028
1438,2025-07-06 00:01:00,34.118799,77.032230


In [None]:
def query_2() -> None:
    print("Query 2: Temperatura promedio de la última hora para múltiples sensores:")
    sensor_ids = [f'SENS{str(i).zfill(3)}' for i in range(1, 6)]
    date = datetime.now().strftime('%Y-%m-%d')
    hour = datetime.now() - timedelta(hours=1)

    prepared = cassandra.prepare("""
        select temperature
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        and event_time >= ?
    """)
    results: list[dict] = []
    start = time.time()
    for sensor_id in sensor_ids:
        rows = list(cassandra.execute(prepared, (sensor_id, date, hour)))
        avg_temp = sum(row.temperature for row in rows) / len(rows) if rows else None
        row = {'sensor_id': sensor_id, 'avg_temp': avg_temp}
        results.append(row)
    end = time.time()
    cassandra_time = end - start
    print(f"Cassandra ({cassandra_time:.4f} segundos):")
    display(pd.DataFrame(results))

    query = """
        select sensor_id, avg(temperature) as avg_temp
        from temperature_measurements
        where date = %s
        and event_time >= %s
        group by sensor_id
    """
    start = time.time()
    postgres.execute(query, (date, hour))
    rows = postgres.fetchall()
    end = time.time()
    postgres_time = end - start
    print(f"PostgreSQL ({postgres_time:.4f} segundos):")
    display(pd.DataFrame(rows, columns=['sensor_id', 'avg_temp']))

query_2()

Query 2: Temperatura promedio de la última hora para múltiples sensores:
Cassandra (0.0929 segundos):


Unnamed: 0,sensor_id,avg_temp
0,SENS001,25.702443
1,SENS002,24.236884
2,SENS003,23.855362
3,SENS004,25.725693
4,SENS005,26.446482


PostgreSQL (0.0767 segundos):


Unnamed: 0,sensor_id,avg_temp
0,SENS001,25.702443
1,SENS002,24.236884
2,SENS003,23.855362
3,SENS004,25.725693
4,SENS005,26.446482


In [None]:
def query_3() -> None:
    print("Query 3: Anomalías de temperatura del último día para el sensor SENS003:")
    id = 'SENS003'
    date = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
    low = NORMAL_RANGE[0][0]
    high = NORMAL_RANGE[0][1]
    prepared1 = cassandra.prepare("""
        select event_time, temperature
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        and temperature < ?
        allow filtering
    """)
    prepared2 = cassandra.prepare("""
        select event_time, temperature
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        and temperature > ?
        allow filtering
    """)
    start = time.time()
    rows_low = cassandra.execute(prepared1, (id, date, low))
    rows_high = cassandra.execute(prepared2, (id, date, high))
    end = time.time()
    cassandra_time = end - start
    print(f"Cassandra ({cassandra_time:.4f} segundos):")
    results: list = list(rows_low) + list(rows_high)
    display(pd.DataFrame(results))

    query = """
        select event_time, temperature
        from temperature_measurements
        where sensor_id = %s
        and date = %s
        and (temperature < %s or temperature > %s)
    """
    start = time.time()
    postgres.execute(query, (id, date, low, high))
    rows = postgres.fetchall()
    end = time.time()
    postgres_time = end - start
    print(f"PostgreSQL ({postgres_time:.4f} segundos):")
    display(pd.DataFrame(rows, columns=['event_time', 'temperature']))

query_3()

Query 3: Anomalías de temperatura del último día para el sensor SENS003:
Cassandra (0.0852 segundos):


Unnamed: 0,event_time,temperature
0,2025-07-08 00:07:00,11.719097
1,2025-07-08 00:24:00,11.483746
2,2025-07-08 00:26:00,13.354506
3,2025-07-08 00:33:00,11.203284
4,2025-07-08 00:34:00,12.737963
...,...,...
424,2025-07-08 22:55:00,36.212633
425,2025-07-08 23:22:00,38.680030
426,2025-07-08 23:31:00,37.853090
427,2025-07-08 23:47:00,36.550552


PostgreSQL (0.0030 segundos):


Unnamed: 0,event_time,temperature
0,2025-07-08 00:00:00,36.981614
1,2025-07-08 00:05:00,38.235439
2,2025-07-08 00:07:00,11.719097
3,2025-07-08 00:08:00,37.144994
4,2025-07-08 00:14:00,36.326375
...,...,...
424,2025-07-08 23:24:00,12.891641
425,2025-07-08 23:31:00,37.853090
426,2025-07-08 23:47:00,36.550552
427,2025-07-08 23:50:00,37.191001


### c) Pruebas de Escalabilidad y Distribución:

#### Evaluar el rendimiento con diferentes tamaños de dataset

A continuación se presenta una código que genera un tabla.


In [None]:
def query_1_test() -> tuple[float, float]:
    sensor_id = 'SENS003'
    date = (datetime.now() - timedelta(days=3)).strftime('%Y-%m-%d')
    prepared = cassandra.prepare("""
        select event_time, temperature, humidity
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        order by event_time desc
    """)
    start = time.time()
    rows = cassandra.execute(prepared, (sensor_id, date))
    end = time.time()
    cassandra_time = end - start
    start = time.time()
    postgres.execute("""
        select event_time, temperature, humidity
        from temperature_measurements
        where sensor_id = %s
        and date = %s
        order by event_time desc
    """, (sensor_id, date))
    rows = postgres.fetchall()
    end = time.time()
    postgres_time = end - start
    return postgres_time, cassandra_time

def query_2_test() -> tuple[float, float]:
    date = datetime.now().strftime('%Y-%m-%d')
    hour = datetime.now() - timedelta(hours=1)
    prepared = cassandra.prepare("""
        select temperature
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        and event_time >= ?
    """)
    sensor_ids = [f'SENS{str(i).zfill(3)}' for i in range(1, 6)]
    results: list[dict] = []
    start = time.time()
    for sensor_id in sensor_ids:
        rows = list(cassandra.execute(prepared, (sensor_id, date, hour)))
        avg_temp = sum(row.temperature for row in rows) / len(rows) if rows else None
        row = {'sensor_id': sensor_id, 'avg_temp': avg_temp}
        results.append(row)
    end = time.time()
    cassandra_time = end - start
    query = """
        select sensor_id, avg(temperature) as avg_temp
        from temperature_measurements
        where date = %s
        and event_time >= %s
        group by sensor_id
    """
    start = time.time()
    postgres.execute(query, (date, hour))
    rows = postgres.fetchall()
    end = time.time()
    postgres_time = end - start
    return postgres_time, cassandra_time

def query_3_test() -> tuple[float, float]:
    id = 'SENS003'
    date = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
    low = NORMAL_RANGE[0][0]
    high = NORMAL_RANGE[0][1]
    prepared1 = cassandra.prepare("""
        select event_time, temperature
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        and temperature < ?
        allow filtering
    """)
    prepared2 = cassandra.prepare("""
        select event_time, temperature
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        and temperature > ?
        allow filtering
    """)
    start = time.time()
    rows_low = cassandra.execute(prepared1, (id, date, low))
    rows_high = cassandra.execute(prepared2, (id, date, high))
    end = time.time()
    cassandra_time = end - start
    query = """
        select event_time, temperature
        from temperature_measurements
        where sensor_id = %s
        and date = %s
        and (temperature < %s or temperature > %s)
    """
    start = time.time()
    postgres.execute(query, (id, date, low, high))
    rows = postgres.fetchall()
    end = time.time()
    postgres_time = end - start
    return postgres_time, cassandra_time

def test_queries() -> None:
    volumes: list[int] = [7, 15, 30, 60]
    columns = ['dias'] + [f'query_{i}_{db}' for i in range(1, 4) for db in ['postgres', 'cassandra']]
    results = pd.DataFrame(columns=columns, dtype=float)
    for volume in volumes:
        row = {'dias': float(volume)}
        drop_tables()
        create_tables()
        data: list[tuple] = generate_data(5, volume)
        insert_cassandra_async(data, 1000, 16)
        insert_postgres_async(data, 1000, 16)
        queries = [query_1_test, query_2_test, query_3_test]
        for i, query in enumerate(queries, start=1):
            row[f'query_{i}_postgres'], row[f'query_{i}_cassandra'] = query()
        results = pd.concat([results, pd.DataFrame([row])], ignore_index=True)
    display(results)

test_queries()

Unnamed: 0,dias,query_1_postgres,query_1_cassandra,query_2_postgres,query_2_cassandra,query_3_postgres,query_3_cassandra
0,7.0,0.0,0.046451,0.007188,0.080746,0.002201,0.052265
1,15.0,0.004999,0.062288,0.012226,0.080369,0.002729,0.038299
2,30.0,0.002019,0.032457,0.023674,0.090534,0.002949,0.037638
3,60.0,0.003553,0.057512,0.096373,0.121468,0.0,0.058627


#### Distribución de datos Cassandra

In [7]:
!docker exec cassandra1 nodetool status
!docker exec cassandra1 nodetool cfstats my_keyspace

Datacenter: datacenter1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack 
UN  172.21.0.4  10.46 MiB  16      100.0%            91278d84-a6f5-4c4a-a15d-4d130f69f9c3  rack1
UN  172.21.0.2  10.48 MiB  16      100.0%            911df817-5bbb-4120-8416-5a526e0645ae  rack1
UN  172.21.0.3  10.45 MiB  16      100.0%            7d2e0357-0bdc-4cfe-be43-3236e4121186  rack1

Total number of tables: 45
----------------
Keyspace : my_keyspace
	Read Count: 0
	Read Latency: NaN ms
	Write Count: 280
	Write Latency: 4.262975 ms
	Pending Flushes: 0
		Table: temperature_measurements
		SSTable count: 2
		Old SSTable count: 0
		Space used (live): 10677017
		Space used (total): 10677017
		Space used by snapshots (total): 0
		Off heap memory used (total): 7508
		SSTable Compression Ratio: 0.7470015852751786
		Number of partitions (estimate): 300
		Memtable cell count: 0
		Memtable data size: 0
		Memtable off h

In [10]:
!docker exec cassandra2 nodetool status
!docker exec cassandra2 nodetool cfstats my_keyspace

Datacenter: datacenter1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack 
UN  172.21.0.4  10.46 MiB  16      100.0%            91278d84-a6f5-4c4a-a15d-4d130f69f9c3  rack1
UN  172.21.0.2  10.48 MiB  16      100.0%            911df817-5bbb-4120-8416-5a526e0645ae  rack1
UN  172.21.0.3  10.45 MiB  16      100.0%            7d2e0357-0bdc-4cfe-be43-3236e4121186  rack1

Total number of tables: 45
----------------
Keyspace : my_keyspace
	Read Count: 0
	Read Latency: NaN ms
	Write Count: 283
	Write Latency: 4.909388692579505 ms
	Pending Flushes: 0
		Table: temperature_measurements
		SSTable count: 2
		Old SSTable count: 0
		Space used (live): 10677102
		Space used (total): 10677102
		Space used by snapshots (total): 0
		Off heap memory used (total): 7508
		SSTable Compression Ratio: 0.7469277255384783
		Number of partitions (estimate): 300
		Memtable cell count: 0
		Memtable data size: 0
		Memta

In [9]:
!docker exec cassandra3 nodetool status
!docker exec cassandra3 nodetool cfstats my_keyspace

Datacenter: datacenter1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack 
UN  172.21.0.4  10.46 MiB  16      100.0%            91278d84-a6f5-4c4a-a15d-4d130f69f9c3  rack1
UN  172.21.0.2  10.48 MiB  16      100.0%            911df817-5bbb-4120-8416-5a526e0645ae  rack1
UN  172.21.0.3  10.45 MiB  16      100.0%            7d2e0357-0bdc-4cfe-be43-3236e4121186  rack1

Total number of tables: 45
----------------
Keyspace : my_keyspace
	Read Count: 0
	Read Latency: NaN ms
	Write Count: 280
	Write Latency: 4.197903571428571 ms
	Pending Flushes: 0
		Table: temperature_measurements
		SSTable count: 2
		Old SSTable count: 0
		Space used (live): 10677017
		Space used (total): 10677017
		Space used by snapshots (total): 0
		Off heap memory used (total): 7508
		SSTable Compression Ratio: 0.7470015852751786
		Number of partitions (estimate): 300
		Memtable cell count: 0
		Memtable data size: 0
		Memta

#### Tolerancia a fallos en Cassandra

In [17]:
def test_failures() -> None:
    print("Probando tolerancia a fallos:")
    # Simular fallo en cassandra1
    print("\nFallo en cassandra2:")
    !docker stop cassandra2
    time.sleep(5)  # Esperar a que el nodo se considere caído

    sensor_id = 'SENS003'
    date = (datetime.now() - timedelta(days=3)).strftime('%Y-%m-%d')
    prepared = cassandra.prepare("""
        select event_time, temperature, humidity
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        order by event_time desc
    """)
    rows_failure = cassandra.execute(prepared, (sensor_id, date))
    print(f"Lecturas de {sensor_id} del día {date} tras el fallo:")
    display(pd.DataFrame(rows_failure, columns=['event_time', 'temperature', 'humidity']))

    # Simular recuperación de cassandra1
    print("\nRecuperando cassandra1:")
    !docker start cassandra2
    time.sleep(10)  # Esperar a que el nodo se una al clúster

    sensor_id = 'SENS003'
    date = (datetime.now() - timedelta(days=3)).strftime('%Y-%m-%d')
    prepared = cassandra.prepare("""
        select event_time, temperature, humidity
        from temperature_measurements
        where sensor_id = ?
        and date = ?
        order by event_time desc
    """)
    rows_recovery = cassandra.execute(prepared, (sensor_id, date))
    print(f"Lecturas de {sensor_id} del día {date} tras la recuperación:")
    display(pd.DataFrame(rows_recovery, columns=['event_time', 'temperature', 'humidity']))

    if len(list(rows_recovery)) == len(list(rows_failure)):
        print("La recuperación fue exitosa: se mantuvieron los datos tras el fallo.")
    else:
        print("La recuperación falló: los datos no se mantuvieron tras el fallo.")

test_failures()

Probando tolerancia a fallos:

Fallo en cassandra2:
cassandra2
Lecturas de SENS003 del día 2025-07-09 tras el fallo:


Unnamed: 0,event_time,temperature,humidity
0,2025-07-09 23:59:00,14.489894,88.452036
1,2025-07-09 23:58:00,27.333203,50.283353
2,2025-07-09 23:57:00,36.729118,61.678925
3,2025-07-09 23:56:00,32.053702,72.793595
4,2025-07-09 23:55:00,22.453724,22.655668
...,...,...,...
1435,2025-07-09 00:04:00,29.528217,69.688122
1436,2025-07-09 00:03:00,20.903408,44.615700
1437,2025-07-09 00:02:00,28.781048,20.147594
1438,2025-07-09 00:01:00,22.554355,84.396352



Recuperando cassandra1:
cassandra2
Lecturas de SENS003 del día 2025-07-09 tras la recuperación:


Unnamed: 0,event_time,temperature,humidity
0,2025-07-09 23:59:00,14.489894,88.452036
1,2025-07-09 23:58:00,27.333203,50.283353
2,2025-07-09 23:57:00,36.729118,61.678925
3,2025-07-09 23:56:00,32.053702,72.793595
4,2025-07-09 23:55:00,22.453724,22.655668
...,...,...,...
1435,2025-07-09 00:04:00,29.528217,69.688122
1436,2025-07-09 00:03:00,20.903408,44.615700
1437,2025-07-09 00:02:00,28.781048,20.147594
1438,2025-07-09 00:01:00,22.554355,84.396352


La recuperación fue exitosa: se mantuvieron los datos tras el fallo.
