<a href="https://colab.research.google.com/github/cam2149/MachineLearningV/blob/main/NN-RNN-CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Team**

- Nicolás Colmenares

- Carlos Martinez

1. Implementation of a convolutional neural network (CNN) adapted to time series (1 pt).

  -  .

  - .

  - .

**Situation:**
A city is facing a significant increase in dengue cases, with an incidence rate above the national average.
Anticipating outbreaks is crucial for implementing preventive measures and reducing the spread of the disease.

**Objective:**
- Develop a predictive model based on neural networks to forecast future dengue outbreaks in each neighborhood (barrio) of the city.
- Train the model on a historical database of dengue cases from 2015 to 2022.
- Anticipate outbreaks at least 3 weeks in advance.

**Purpose:**
Enable public health authorities to take timely action, such as:
- Preparing health care provider institutions (IPS).
- Managing resources (fumigation trucks, storm drain cleaning).
- Training the community.

*   Convolutional network (CNN) adapted to time series.
*   .
*   .

## Data dictionary

- train.parquet: the training set
- test.parquet: the test set
- sample_submission.csv: an example of a file to submit to the competition

| **Variable**         | **Description**                                                                                      |
|----------------------|------------------------------------------------------------------------------------------------------|
| id_bar               | Unique neighborhood (barrio) identifier                                                              |
| anio                 | Year of occurrence                                                                                   |
| semana               | Week of occurrence                                                                                   |
| Estrato              | Socioeconomic stratum of the neighborhood                                                            |
| area_barrio          | Neighborhood area in km²                                                                             |
| dengue               | Count of dengue cases                                                                                |
| concentraciones      | Number of visits and interventions at places where people gather (institutions)                      |
| vivienda             | Count of household visits to inspect and control mosquito breeding sites                             |
| equipesado           | Count of fumigations with heavy machinery                                                            |
| sumideros            | Count of storm drain interventions                                                                   |
| maquina              | Count of fumigations with motorized backpack sprayers (motomochila)                                  |
| lluvia_mean          | Mean rainfall in week i                                                                              |
| lluvia_var           | Variance of rainfall in week i                                                                       |
| lluvia_max           | Maximum rainfall in week i                                                                           |
| lluvia_min           | Minimum rainfall in week i                                                                           |
| temperatura_mean     | Mean temperature in week i                                                                           |
| temperatura_var      | Variance of temperature in week i                                                                    |
| temperatura_max      | Maximum temperature in week i                                                                        |
| temperatura_min      | Minimum temperature in week i                                                                        |


# 0. Colab setup

After uploading `kaggle.json` to the session, move it to the location the Kaggle CLI expects (`~/.kaggle/`).

In [None]:
# These shell commands run inside the notebook. They install the Kaggle API credentials,
# which are required to download competition datasets from Kaggle.

!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [None]:
!rm -rf /content/kaggle/output
!rm -rf /content/kaggle/input

Download the competition dataset

In [None]:
!kaggle competitions download -c aa-v-2025-i-pronosticos-nn-rnn-cnn

In [None]:
!mkdir -p /content/kaggle/output
!mkdir -p /content/kaggle/input

In [None]:
!mv aa-v-2025-i-pronosticos-nn-rnn-cnn.zip /content/kaggle/input

In [None]:
!unzip /content/kaggle/input/aa-v-2025-i-pronosticos-nn-rnn-cnn.zip -d /content/kaggle/input/

In [None]:
# List the files extracted to /content/kaggle/input
import os
for dirname, _, filenames in os.walk('/content/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


# 1. Imports

In [None]:
import os
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, accuracy_score
from tqdm import tqdm

In [None]:
#Printing library versions
print('Pandas:', pd.__version__)
print('Numpy:', np.__version__)
print('PyTorch:', torch.__version__)

In [None]:
import warnings
warnings.filterwarnings("ignore")

# 2. Initial configuration and data loading

In [None]:
config = {
    "TRAIN_DIR": '/content/kaggle/input/df_train.parquet',
    "TEST_DIR": '/content/kaggle/input/df_test.parquet',
    "SUBMISSION_DIR": '/content/sample_submission.csv',
    "batch_size" : 32
}

In [None]:
# Device configuration
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
# Load data
train_df = pd.read_parquet(config["TRAIN_DIR"])
test_df = pd.read_parquet(config["TEST_DIR"])

In [None]:
train_df.head(2)

# 3. Data preprocessing

## 3.1 Build the `fecha` column
We build a `fecha` (date) column from `anio` and `semana`, dating each week by its Saturday (weekday `6` in the `%Y-W%W-%w` format), and use it as the index.

In [None]:
def get_week_end_date(row):
    return pd.to_datetime(f'{row["anio"]}-W{row["semana"]}-6', format='%Y-W%W-%w')

train_df['fecha'] = train_df.apply(get_week_end_date, axis=1)
test_df['fecha'] = test_df.apply(get_week_end_date, axis=1)

# Establecer 'fecha' como índice
train_df.set_index('fecha', inplace=True)
test_df.set_index('fecha', inplace=True)
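Row-wise `apply` calls the parser once per row, which gets slow on large frames. The sketch below, assuming `anio` and `semana` are integer columns (a float column would stringify as `2015.0` and break the format), builds the same strings for the whole column and parses them in a single `pd.to_datetime` call. The toy values are made up for illustration.

```python
import pandas as pd

# Toy frame with the same 'anio'/'semana' columns as the competition data (values are made up)
df = pd.DataFrame({'anio': [2015, 2015, 2016], 'semana': [1, 2, 0]})

# Vectorized equivalent of get_week_end_date.
# %W: week number with Monday as the first day; %w: weekday with 0 = Sunday, so '6' is Saturday.
week_str = df['anio'].astype(str) + '-W' + df['semana'].astype(str) + '-6'
df['fecha'] = pd.to_datetime(week_str, format='%Y-W%W-%w')
print(df)
```

Week 0 covers the days before the first Monday of the year, so `2016-W0-6` resolves to 2 January 2016.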

## 3.2 Feature selection
Several input features are strongly correlated (e.g., `lluvia_mean` and `lluvia_var`, with a correlation of 0.82). For simplicity, we keep all available features and let the model learn how they relate.

In [None]:
features = ['ESTRATO', 'area_barrio', 'concentraciones', 'vivienda', 'equipesado', 'sumideros', 'maquina',
            'lluvia_mean', 'lluvia_var', 'lluvia_max', 'lluvia_min', 'temperatura_mean', 'temperatura_var',
            'temperatura_max', 'temperatura_min']
target = 'dengue'
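One way to list which feature pairs are strongly correlated before deciding to keep them all is to scan the upper triangle of the Pearson correlation matrix. The helper `high_corr_pairs` and the 0.8 threshold are illustrative assumptions, and the data below is synthetic (built so that `lluvia_var` tracks `lluvia_mean`), not the competition data.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(df, threshold=0.8):
    """Hypothetical helper: return (col_a, col_b, r) for every pair with |Pearson r| >= threshold."""
    corr = df.corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):      # upper triangle only, so each pair appears once
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], round(float(r), 2)))
    return pairs

# Synthetic example: lluvia_var is built from lluvia_mean, temperatura_mean is independent
rng = np.random.default_rng(0)
lluvia_mean = rng.gamma(2.0, 5.0, size=500)
demo = pd.DataFrame({
    'lluvia_mean': lluvia_mean,
    'lluvia_var': lluvia_mean * 3 + rng.normal(0, 2, size=500),
    'temperatura_mean': rng.normal(27, 1, size=500),
})
pairs = high_corr_pairs(demo)
print(pairs)
```

On the real data the same call would take `train_df[features]`.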

## 3.3 Normalization
We scale the features and the target to the [0, 1] range with `MinMaxScaler`, fitting each scaler on the training data only.

In [None]:
scaler = MinMaxScaler()
train_df[features] = scaler.fit_transform(train_df[features])
test_df[features] = scaler.transform(test_df[features])

target_scaler = MinMaxScaler()
train_df[target] = target_scaler.fit_transform(train_df[[target]])
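The transformation `MinMaxScaler` applies with its default `feature_range=(0, 1)` is $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$, with the minimum and maximum taken from the data passed to `fit`. The hand-written NumPy sketch below (made-up counts, not the library implementation) shows the round trip used later by `inverse_transform`, and that values beyond the training range, such as a larger outbreak in the test period, land outside [0, 1] because the scaler does not clip by default.

```python
import numpy as np

# Made-up weekly dengue counts from a training period
train_counts = np.array([0.0, 3.0, 12.0, 30.0])
x_min, x_max = train_counts.min(), train_counts.max()

def scale(x):
    # x' = (x - x_min) / (x_max - x_min), using training statistics only
    return (x - x_min) / (x_max - x_min)

def inverse(x_scaled):
    # Undo the scaling to get back to case counts
    return x_scaled * (x_max - x_min) + x_min

scaled = scale(train_counts)
unseen = scale(np.array([45.0]))   # larger than any training value, so it maps above 1
print(scaled, unseen)
```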

## 3.4 Create time-series sequences
To forecast 3 weeks ahead, each sample uses the previous 5 weeks of features as input (`window_size=5`) and the dengue count 3 weeks after the last week of that window as target (`horizon=3`).

In [None]:
window_size = 5
horizon = 3

train_df = train_df.sort_values(by=['id_bar', 'fecha'])

sequences = []
targets = []

for id_bar, group in train_df.groupby('id_bar'):
    group = group.sort_values('fecha')
    for i in range(window_size, len(group) - horizon + 1):
        seq = group[features].iloc[i - window_size:i].values
        target_val = group[target].iloc[i + horizon - 1]
        sequences.append(seq)
        targets.append(target_val)

sequences = np.array(sequences)
targets = np.array(targets)
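To check the indexing of the loop above, the sketch below applies the same `range` bounds to a single toy series whose values equal their week index, so the offset between the last input week and the target is directly visible. `make_windows` is a hypothetical helper written for this illustration.

```python
import numpy as np

def make_windows(values, window_size=5, horizon=3):
    """Hypothetical helper: same indexing as the sequence loop, for one 1D series."""
    seqs, tgts = [], []
    for i in range(window_size, len(values) - horizon + 1):
        seqs.append(values[i - window_size:i])   # weeks i - window_size .. i - 1
        tgts.append(values[i + horizon - 1])     # week (i - 1) + horizon
    return np.array(seqs), np.array(tgts)

series = np.arange(10)      # the value of each week is its index
X, y = make_windows(series)
print(X.shape, y)           # (3, 5) [7 8 9]
```

Each target is exactly `horizon` weeks after the last week in its window, and a series of length `n` yields `n - window_size - horizon + 1` samples.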

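## 3.5 Dataset and DataLoader
We wrap the sequences in a custom `Dataset` and split them 80/20 into training and validation. Because `sequences` is ordered by `id_bar` and then by date, the validation set is the last 20% of sequences in that order, i.e. (mostly) the barrios with the highest `id_bar` values over all years, rather than the most recent weeks.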

In [None]:
class TimeSeriesDataset(Dataset):
    def __init__(self, sequences, targets):
        self.sequences = sequences
        self.targets = targets

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
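        # Return the target with shape (1,) so batches are (batch, 1), matching the model output
        return (torch.tensor(self.sequences[idx], dtype=torch.float32),
                torch.tensor(self.targets[idx], dtype=torch.float32).unsqueeze(0))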

num_sequences = len(sequences)
train_size = int(0.8 * num_sequences)
train_sequences, val_sequences = sequences[:train_size], sequences[train_size:]
train_targets, val_targets = targets[:train_size], targets[train_size:]

train_dataset = TimeSeriesDataset(train_sequences, train_targets)
val_dataset = TimeSeriesDataset(val_sequences, val_targets)

batch_size = config["batch_size"]
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
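If the goal is to validate on the most recent weeks, as a forecasting model will be used, a chronological split is an alternative to slicing by position. The sketch below assumes the target date of each sequence has been collected alongside it (for example, by also appending `group.index[i + horizon - 1]` inside the sequence loop, which the notebook does not currently do); `chronological_split` is a hypothetical helper and the toy data uses week numbers in place of dates.

```python
import numpy as np

def chronological_split(sequences, targets, target_dates, val_fraction=0.2):
    """Hypothetical helper: hold out the most recent target weeks (for all barrios) as validation."""
    unique_dates = np.unique(target_dates)                 # sorted unique dates
    n_val_dates = max(1, int(round(len(unique_dates) * val_fraction)))
    cutoff = unique_dates[-n_val_dates]                    # first date assigned to validation
    val_mask = target_dates >= cutoff
    return (sequences[~val_mask], targets[~val_mask],
            sequences[val_mask], targets[val_mask])

# Toy data: 2 barrios x 10 target weeks; week numbers stand in for dates
target_dates = np.tile(np.arange(10), 2)
targets = target_dates.astype(float)
sequences = np.zeros((20, 5, 15), dtype=np.float32)        # (n_sequences, window_size, n_features)

X_tr, y_tr, X_val, y_val = chronological_split(sequences, targets, target_dates)
print(len(X_tr), len(X_val))                               # 16 4
```

Every barrio then contributes to both sets, and no validation week precedes a training week.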

# 4. Model implementation

## 4.1 MLP model
A multilayer perceptron that flattens each window of `window_size × len(features)` values into a single input vector.

In [None]:
class MLPModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(MLPModel, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

input_dim = window_size * len(features)
hidden_dim = 64
output_dim = 1
mlp_model = MLPModel(input_dim, hidden_dim, output_dim).to(DEVICE)

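## 4.2 CNN model for time series
A 1D CNN adapted to time series: each window is permuted to (batch, features, time), so every feature is an input channel and the kernels slide along the time axis.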

In [None]:
class CNNModel(nn.Module):
    def __init__(self, num_features, hidden_dim, output_dim):
        super(CNNModel, self).__init__()
        self.conv1 = nn.Conv1d(num_features, hidden_dim, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool1d(2)
        self.conv2 = nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1)
        self.fc = nn.Linear(hidden_dim * (window_size // 2), output_dim)

    def forward(self, x):
        x = x.permute(0, 2, 1)
        out = self.conv1(x)
        out = self.relu(out)
        out = self.pool(out)
        out = self.conv2(out)
        out = self.relu(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

cnn_model = CNNModel(len(features), hidden_dim, output_dim).to(DEVICE)
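The size of `self.fc` depends on how the layers change the time dimension. For `Conv1d` and `MaxPool1d` the output length is $L_{out} = \lfloor (L_{in} + 2p - d(k - 1) - 1)/s \rfloor + 1$. With $L_{in} = 5$, the first convolution ($k=3$, $p=1$) keeps 5 steps, `MaxPool1d(2)` (stride 2) reduces them to 2, and the second convolution keeps 2, hence `hidden_dim * (window_size // 2)`. The sketch below checks this arithmetic against the actual layers on random input; `conv1d_out_len` is a helper written for this illustration.

```python
import torch
import torch.nn as nn

def conv1d_out_len(length, kernel_size, padding=0, stride=1, dilation=1):
    # Output-length formula shared by Conv1d and MaxPool1d
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

window_size, num_features, hidden_dim = 5, 15, 64
after_conv1 = conv1d_out_len(window_size, kernel_size=3, padding=1)   # 5
after_pool = conv1d_out_len(after_conv1, kernel_size=2, stride=2)      # 2
after_conv2 = conv1d_out_len(after_pool, kernel_size=3, padding=1)     # 2

layers = nn.Sequential(
    nn.Conv1d(num_features, hidden_dim, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool1d(2),
    nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1), nn.ReLU(),
)
x = torch.randn(4, window_size, num_features).permute(0, 2, 1)  # (batch, channels, time)
out = layers(x)
print(out.shape)   # torch.Size([4, 64, 2])
```

Because `CNNModel` reads the global `window_size`, changing the window length without recreating the model would break the `fc` input size.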

## 4.3 Basic RNN model
The provided implementation, with an explicitly defined (zero) initial hidden state.

In [None]:
class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(RNNModel, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.output_dim = output_dim
        self.rnn = nn.RNN(input_dim, hidden_dim, layer_dim, batch_first=True, nonlinearity='relu')
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(DEVICE)
        out, hn = self.rnn(x, h0)
        out = self.fc(out[:, -1, :])
        return out

rnn_model = RNNModel(len(features), hidden_dim, 1, output_dim).to(DEVICE)
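Two PyTorch behaviors explain why this model works as written. When `h0` is omitted, `nn.RNN` starts from a zero state, so the explicit `torch.zeros` gives the same result. For a single-layer, unidirectional RNN, `out[:, -1, :]` (what `RNNModel` feeds to `self.fc`) equals the final hidden state `hn[-1]`. The sketch below checks both facts on random input with the notebook's sizes (15 features, 64 hidden units, window of 5).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=15, hidden_size=64, num_layers=1, batch_first=True, nonlinearity='relu')
x = torch.randn(4, 5, 15)                          # (batch, window_size, num_features)

h0 = torch.zeros(1, x.size(0), 64)                 # explicit zero state, as in RNNModel
out_explicit, hn_explicit = rnn(x, h0)
out_default, hn_default = rnn(x)                   # PyTorch uses a zero state when h0 is omitted

same_output = torch.allclose(out_explicit, out_default)
# Last time step of `out` is the final hidden state of the (only) layer
last_step_is_hn = torch.allclose(out_default[:, -1, :], hn_default[-1])
print(same_output, last_step_is_hn)
```

The same holds for `nn.LSTM` (with `h0` and `c0`) and `nn.GRU`.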

## 4.4 LSTM model

In [None]:
class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.output_dim = output_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True).to(DEVICE)
        self.fc = nn.Linear(hidden_dim, output_dim).to(DEVICE)

    def forward(self, x):
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(DEVICE)
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(DEVICE)
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
        out = self.fc(out[:, -1, :])
        return out

lstm_model = LSTMModel(len(features), hidden_dim, 1, output_dim).to(DEVICE)

## 4.5 GRU model

In [None]:
class GRUModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(GRUModel, self).__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.output_dim = output_dim
        self.gru = nn.GRU(input_dim, hidden_dim, layer_dim, batch_first=True).to(DEVICE)
        self.fc = nn.Linear(hidden_dim, output_dim).to(DEVICE)

    def forward(self, x):
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_().to(DEVICE)
        out, hn = self.gru(x, h0.detach())
        out = self.fc(out[:, -1, :])
        return out

gru_model = GRUModel(len(features), hidden_dim, 1, output_dim).to(DEVICE)
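The three recurrent models share the same interface but differ in gating, which shows up in their size: each gate block has an input-to-hidden matrix ($H \times I$), a hidden-to-hidden matrix ($H \times H$) and two bias vectors ($2H$). A plain RNN has one block, a GRU three and an LSTM four. The sketch below counts the parameters with the notebook's sizes ($I = 15$ features, $H = 64$) to make the cost difference concrete.

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

input_dim, hidden_dim = 15, 64
counts = {
    'RNN': n_params(nn.RNN(input_dim, hidden_dim, 1, batch_first=True)),
    'LSTM': n_params(nn.LSTM(input_dim, hidden_dim, 1, batch_first=True)),
    'GRU': n_params(nn.GRU(input_dim, hidden_dim, 1, batch_first=True)),
}
# One gate block: W_ih (H x I) + W_hh (H x H) + b_ih (H) + b_hh (H)
block = hidden_dim * (input_dim + hidden_dim) + 2 * hidden_dim
print(counts, block)   # {'RNN': 5184, 'LSTM': 20736, 'GRU': 15552} 5184
```

These counts exclude the final `nn.Linear(hidden_dim, output_dim)` layer, which adds 65 parameters to every model.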

# 5. Training and evaluation

## 5.1 Training function

In [None]:
def train_model(model, train_loader, val_loader, epochs, optimizer, criterion):
    train_losses = []
    val_losses = []

    for epoch in range(epochs):
        model.train()
        train_loss = 0

        # Create a progress bar for the training loop
        train_pbar = tqdm(train_loader, desc=f'Epoch {epoch+1}/{epochs} [Train]', leave=False)

        for x, y in train_pbar:
            x, y = x.to(DEVICE), y.to(DEVICE)
            optimizer.zero_grad()
            output = model(x)
            loss = criterion(output, y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

            # Update progress bar
            train_pbar.set_postfix({'loss': f'{loss.item():.4f}'})

        train_loss /= len(train_loader)
        train_losses.append(train_loss)

        model.eval()
        val_loss = 0

        # Create a progress bar for the validation loop
        val_pbar = tqdm(val_loader, desc=f'Epoch {epoch+1}/{epochs} [Val]', leave=False)

        with torch.no_grad():
            for x, y in val_pbar:
                x, y = x.to(DEVICE), y.to(DEVICE)
                output = model(x)
                batch_loss = criterion(output, y).item()
                val_loss += batch_loss

                # Update progress bar
                val_pbar.set_postfix({'loss': f'{batch_loss:.4f}'})

        val_loss /= len(val_loader)
        val_losses.append(val_loss)

        if (epoch + 1) % 10 == 0:  # Print only every 10 epochs
            print(f'Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}')

    return train_losses, val_losses
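The loss call `criterion(output, y)` relies on `output` and `y` having the same shape. The models return `(batch, 1)`; if the `Dataset` returns each target as a scalar, the batched target is `(batch,)`, and `nn.MSELoss` broadcasts the two into a `(batch, batch)` matrix, comparing every prediction with every target. PyTorch emits a warning in that case, but `warnings.filterwarnings("ignore")` hides it. The small example below shows the two results side by side.

```python
import warnings
import torch
import torch.nn as nn

criterion = nn.MSELoss()
output = torch.tensor([[1.0], [2.0]])   # model output: (batch, 1)
y = torch.tensor([1.0, 2.0])            # scalar targets batched: (batch,)

with warnings.catch_warnings():
    warnings.simplefilter('ignore')     # PyTorch warns about the size mismatch here
    broadcast_loss = criterion(output, y).item()          # compares all outputs with all targets
aligned_loss = criterion(output, y.unsqueeze(1)).item()   # (batch, 1) vs (batch, 1)
print(broadcast_loss, aligned_loss)     # 0.5 0.0
```

Even with perfect predictions, the broadcast loss is nonzero, so the model is trained towards the batch mean.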

## 5.2 Evaluation function

In [None]:
def evaluate_model(model, val_loader):
    model.eval()
    preds, actuals = [], []
    with torch.no_grad():
        for x, y in val_loader:
            x = x.to(DEVICE)
            output = model(x)
            preds.extend(output.cpu().numpy())
            actuals.extend(y.numpy())
    preds = target_scaler.inverse_transform(np.array(preds).reshape(-1, 1))
    actuals = target_scaler.inverse_transform(np.array(actuals).reshape(-1, 1))
    mae = mean_absolute_error(actuals, preds)
    mse = mean_squared_error(actuals, preds)
    rmse = np.sqrt(mse)
    print(f'MAE: {mae:.4f}, MSE: {mse:.4f}, RMSE: {rmse:.4f}')
    return mae, mse, rmse, preds, actuals
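Because `evaluate_model` applies `target_scaler.inverse_transform` first, the metrics are expressed in dengue cases rather than in the [0, 1] scale. The NumPy sketch below spells out the three formulas on a tiny hand-checkable example; `regression_metrics` is a hypothetical helper that mirrors what the `sklearn.metrics` calls compute.

```python
import numpy as np

def regression_metrics(actuals, preds):
    # MAE = mean|y - y_hat|, MSE = mean (y - y_hat)^2, RMSE = sqrt(MSE)
    err = np.asarray(actuals, dtype=float) - np.asarray(preds, dtype=float)
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    return mae, mse, np.sqrt(mse)

# Errors are [-1, 0, 2]: MAE = 3/3 = 1, MSE = (1 + 0 + 4)/3 = 5/3
mae, mse, rmse = regression_metrics([0, 2, 4], [1, 2, 2])
print(mae, mse, rmse)
```

RMSE weights the single large error more heavily than MAE does, which is why it is used to rank models when missing a large outbreak is costly.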

## 5.3 Train the models
Each model is trained with the same fixed hyperparameters (Adam, lr = 0.001, 100 epochs) for an initial comparison.

In [None]:
criterion = nn.MSELoss()
models = {'MLP': mlp_model, 'CNN': cnn_model, 'RNN': rnn_model, 'LSTM': lstm_model, 'GRU': gru_model}
results = {}

for name, model in models.items():
    print(f'\nTraining {name}...')
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    train_losses, val_losses = train_model(model, train_loader, val_loader, epochs=100, optimizer=optimizer, criterion=criterion)
    mae, mse, rmse, preds, actuals = evaluate_model(model, val_loader)
    results[name] = {'train_losses': train_losses, 'val_losses': val_losses, 'mae': mae, 'mse': mse, 'rmse': rmse}

## 5.4 Plots
We plot the training and validation loss curves of each model, and its predictions against the actual values for the first 100 validation samples.

In [None]:
def plot_losses(train_losses, val_losses, title):
    plt.plot(train_losses, label='Train Loss')
    plt.plot(val_losses, label='Val Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title(title)
    plt.legend()
    plt.show()

def plot_predictions(actuals, preds, title):
    plt.plot(actuals[:100], label='Actual')
    plt.plot(preds[:100], label='Predicted')
    plt.xlabel('Sample')
    plt.ylabel('Dengue Cases')
    plt.title(title)
    plt.legend()
    plt.show()

for name in models.keys():
    plot_losses(results[name]['train_losses'], results[name]['val_losses'], f'{name} Losses')
    _, _, _, preds, actuals = evaluate_model(models[name], val_loader)
    plot_predictions(actuals, preds, f'{name} Predictions vs Actual')

# 6. Best model selection
We run a grid search over the number of epochs, learning rate, optimizer and batch size for each architecture, and keep the configuration with the lowest validation RMSE.

In [None]:
epochs_list = [100, 300, 500]
learning_rates = [0.01, 0.001]
optimizers = [optim.Adam, optim.AdamW]
batch_sizes = [16, 32, 48]

best_model_name = None
best_rmse = float('inf')
best_config = {}

for name, model_class in {'MLP': MLPModel, 'CNN': CNNModel, 'RNN': RNNModel, 'LSTM': LSTMModel, 'GRU': GRUModel}.items():
    print(f'\nTraining {name}...')
    for epochs in epochs_list:
        for lr in learning_rates:
            for opt_class in optimizers:
                for bs in batch_sizes:
                    # Each architecture has its own constructor signature
                    if name == 'MLP':
                        model = model_class(input_dim, hidden_dim, output_dim).to(DEVICE)
                    elif name == 'CNN':
                        model = model_class(len(features), hidden_dim, output_dim).to(DEVICE)
                    else:
                        model = model_class(len(features), hidden_dim, 1, output_dim).to(DEVICE)
                    # Local loaders, so the global train_loader/val_loader are not overwritten
                    grid_train_loader = DataLoader(train_dataset, batch_size=bs, shuffle=True)
                    grid_val_loader = DataLoader(val_dataset, batch_size=bs, shuffle=False)
                    optimizer = opt_class(model.parameters(), lr=lr)
                    train_losses, val_losses = train_model(model, grid_train_loader, grid_val_loader, epochs, optimizer, criterion)
                    _, _, rmse, _, _ = evaluate_model(model, grid_val_loader)
                    if rmse < best_rmse:
                        best_rmse = rmse
                        best_model_name = name
                        best_config = {'epochs': epochs, 'lr': lr, 'optimizer': opt_class.__name__, 'batch_size': bs}

print(f'Best model: {best_model_name}, RMSE: {best_rmse:.4f}, Config: {best_config}')
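The grid is the Cartesian product of the four hyperparameter lists, so its cost grows multiplicatively. The sketch below uses `itertools.product` to count the training runs and total epochs implied by the lists above (optimizer classes are replaced by their names so it runs without PyTorch).

```python
from itertools import product

epochs_list = [100, 300, 500]
learning_rates = [0.01, 0.001]
optimizer_names = ['Adam', 'AdamW']          # stand-ins for optim.Adam / optim.AdamW
batch_sizes = [16, 32, 48]
model_names = ['MLP', 'CNN', 'RNN', 'LSTM', 'GRU']

grid = list(product(epochs_list, learning_rates, optimizer_names, batch_sizes))
runs_per_model = len(grid)                         # 3 * 2 * 2 * 3
epochs_per_model = sum(cfg[0] for cfg in grid)     # each epochs value appears 12 times
total_runs = runs_per_model * len(model_names)
total_epochs = epochs_per_model * len(model_names)
print(runs_per_model, epochs_per_model, total_runs, total_epochs)   # 36 10800 180 54000
```

At 54,000 epochs in total, reducing the epoch list or adding early stopping is the most direct way to shorten the search.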

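# 7. Test set prediction with the MLP
The final submission is produced with the MLP model, retrained with fixed hyperparameters (Adam, lr = 0.001, 100 epochs).

## 7.1 Build test sequences
We concatenate train and test so that each test week can take its input window from the preceding weeks, including weeks that belong to the training period.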

In [None]:
combined_df = pd.concat([train_df.drop(columns=[target]), test_df], sort=False)
combined_df = combined_df.sort_values(by=['id_bar', 'fecha'])

test_sequences = []
test_ids = []

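# Training targets lie `horizon` rows after the last week of their input window,
# so each test window must end `horizon` weeks before the week being predicted.
for idx, row in test_df.iterrows():
    id_bar = row['id_bar']
    fecha = row.name
    cutoff = fecha - pd.Timedelta(weeks=horizon)
    prev_dates = combined_df[(combined_df['id_bar'] == id_bar) & (combined_df.index <= cutoff)].tail(window_size)
    # Rows without a full window are skipped and will be missing from the submission
    if len(prev_dates) == window_size:
        seq = prev_dates[features].values
        test_sequences.append(seq)
        test_ids.append(row['id'])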

test_sequences = np.array(test_sequences)
test_tensor = torch.tensor(test_sequences, dtype=torch.float32).to(DEVICE)
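During training, each target is `horizon` weeks after the last week of its input window. For test predictions to match that setup, the window for a target week `t` should end at `t - horizon` weeks. The sketch below illustrates this alignment for one barrio with a hypothetical helper `window_for_target`, on a toy weekly series whose value equals its week index.

```python
import numpy as np
import pandas as pd

def window_for_target(bar_df, target_date, window_size=5, horizon=3):
    # Hypothetical helper: bar_df holds one barrio, indexed by weekly dates
    cutoff = target_date - pd.Timedelta(weeks=horizon)
    prev = bar_df[bar_df.index <= cutoff].tail(window_size)
    return prev if len(prev) == window_size else None   # None: not enough history

dates = pd.date_range('2022-01-01', periods=12, freq='7D')
bar_df = pd.DataFrame({'x': np.arange(12)}, index=dates)

win = window_for_target(bar_df, dates[10])
print(list(win['x']))   # [3, 4, 5, 6, 7]: the window ends 3 weeks before week 10
```

A window that ends the week immediately before the target would instead reproduce a 1-week horizon, which the model was not trained for.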

## 7.2 Train the final MLP and predict

In [None]:
mlp_model = MLPModel(input_dim, hidden_dim, output_dim).to(DEVICE)
optimizer = optim.Adam(mlp_model.parameters(), lr=0.001)
train_losses, val_losses = train_model(mlp_model, train_loader, val_loader, epochs=100, optimizer=optimizer, criterion=criterion)

mlp_model.eval()
with torch.no_grad():
    preds = mlp_model(test_tensor).cpu().numpy()
preds = target_scaler.inverse_transform(preds)

submission = pd.DataFrame({'id': test_ids, 'dengue': preds.flatten()})
submission.to_csv(config["SUBMISSION_DIR"], index=False)
print(f"Submission file written to {config['SUBMISSION_DIR']}")
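Test rows with fewer than `window_size` earlier weeks are skipped when the sequences are built, and competitions typically expect one prediction per test `id`. A quick check before uploading is to compare the submission ids against the full list of test ids. `missing_submission_ids` is a hypothetical helper and the data below is a toy example; on the notebook data it would be called as `missing_submission_ids(test_df['id'], submission)`.

```python
import pandas as pd

def missing_submission_ids(test_ids_all, submission):
    """Hypothetical helper: ids present in the test set but absent from the submission, in test order."""
    submitted = set(submission['id'])
    return [i for i in test_ids_all if i not in submitted]

# Toy data: the submission dropped id 3 (e.g. a barrio without a full input window)
test_ids_all = [1, 2, 3, 4]
submission = pd.DataFrame({'id': [1, 2, 4], 'dengue': [0.0, 1.5, 2.0]})
missing = missing_submission_ids(test_ids_all, submission)
print(missing)   # [3]
```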

# 8. Model summary

In [None]:
for name in results.keys():
    print(f"\nSummary for model {name}:")
    print(f"MAE: {results[name]['mae']:.4f}")
    print(f"MSE: {results[name]['mse']:.4f}")
    print(f"RMSE: {results[name]['rmse']:.4f}")
print("Note: accuracy does not apply directly to regression; MAE, MSE and RMSE are reported instead.")