<a href="https://colab.research.google.com/github/c-cadona/gama/blob/main/Gama.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Readme

**Paper Reproduction - Irradiance Forecasting**

This notebook aims to reproduce the experiments described in the paper Lorenzo sent me, using the Folsom dataset. The focus is on obtaining results comparable to the paper's under different sky conditions:

- All: every image in the dataset
- Clear: clear-sky images only --> **Only this one for now.**
- Cloudy: cloudy images only

Evaluation will use the following metrics and protocol:
- RMSE (Root Mean Squared Error)
- MAE (Mean Absolute Error)
- R² (Coefficient of Determination)
- Cross-Validation (KFold)
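
As a quick sanity check of the three metrics listed above, here is a minimal NumPy sketch on made-up values. The numbers are purely illustrative and are not results from the Folsom dataset.

```python
import numpy as np

# Toy values in W/m², purely illustrative
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

errors = y_pred - y_true
rmse = np.sqrt(np.mean(errors ** 2))             # root of the mean squared error
mae = np.mean(np.abs(errors))                    # mean absolute error
ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination

print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```

Every error here has magnitude 10, so RMSE and MAE coincide; they diverge as soon as the errors vary in size, because RMSE weights large errors more heavily.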

## 1. Setting up the environment

- Installing libraries
- CUDA

### 1.1 Libraries

In [None]:
!pip install -U numpy scipy pandas matplotlib scikit-learn torch torchvision --quiet


In [None]:
# Import PyTorch
import torch
from torch import nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms, models
from torch.utils.data import Dataset, DataLoader

# Import additional libraries
import os
import json
import numpy as np
import pandas as pd
import pickle
import matplotlib.pyplot as plt

### 1.2 CUDA

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Seed unconditionally so CPU runs are reproducible too
# (torch.manual_seed also seeds the CUDA generators)
torch.manual_seed(42)

print(device)

cuda


## 2. Loading the data

- Access the data on Drive
- Load the data
- Organize the data

### 2.1 Accessing the data on Drive

In [None]:
import os
from google.colab import drive

drive.mount('/content/drive')

base_path = "/content/drive/MyDrive/Task - GAMMA"

# Remove a stale symbolic link before recreating it.
# os.unlink only works on links and files, so a real directory is left untouched.
if os.path.islink("folsom_dataset"):
  os.unlink("folsom_dataset")

# Create a symbolic link for easier access (skipped if the path already exists)
if not os.path.exists("folsom_dataset"):
  os.symlink(os.path.join(base_path, "folsom_dataset"), "folsom_dataset")
  print("Symbolic link created.")

print("Files in folsom_dataset:", os.listdir("folsom_dataset"))

ValueError: mount failed

### 2.2 Extracting the data

In [None]:
# unzip the images inside the Colab VM so they do not take up Drive storage
!unzip folsom_dataset/folsom_images.zip

### 2.3 Organizing the data

In [None]:
# "date modified" timestamps of the images
df_date_modif = pd.read_csv("folsom_dataset/df_date_modif.csv", index_col=0, parse_dates=True)
df_date_modif

In [None]:
# irradiance data
df_irradiance = pd.read_csv("folsom_dataset/Folsom_irradiance.csv", index_col=0, parse_dates=True)
df_irradiance

In [None]:
import pickle

# timestamps of the test set used, TAKING THE DATE MODIFIED AS THE ACTUAL DATE OF THE IMAGES
with open("folsom_dataset/test_timestamps.pkl", "rb") as f:
  test_timestamps = pd.to_datetime(pickle.load(f))
test_timestamps

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.imshow(np.load(df_date_modif["path"].iloc[12345]))
ax.set_axis_off()

In [None]:
# training hparams
import json

with open("folsom_dataset/hparams.json", "r") as f:
  hparams = json.load(f)

batch_size = hparams["ResNet50"]["batch_size"]
learning_rate = hparams["ResNet50"]["learning_rate"]
dropout = hparams["ResNet50"]["dropout"]
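
The cells in this notebook read five keys under `"ResNet50"`: `batch_size`, `learning_rate`, `dropout`, `weight_decay`, and `epochs`. A minimal sketch of a check that fails early when one is missing could look like this; `missing_hparams` is a hypothetical helper and the example values are placeholders, not the contents of the real `hparams.json`.

```python
import json

# Keys this notebook reads from hparams["ResNet50"]
REQUIRED_KEYS = {"batch_size", "learning_rate", "dropout", "weight_decay", "epochs"}

def missing_hparams(hparams, model_name="ResNet50"):
    """Return the sorted list of required keys absent for the given model."""
    return sorted(REQUIRED_KEYS - set(hparams.get(model_name, {})))

# Placeholder content: these values are made up, not the real hparams.json
example = json.loads("""
{"ResNet50": {"batch_size": 32, "learning_rate": 0.001, "dropout": 0.2, "epochs": 20}}
""")

print(missing_hparams(example))  # ['weight_decay']
```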

## 3. Categorization - Sunmask

I will use pvlib (https://pvlib-python.readthedocs.io/en/stable/) to categorize the data and split it into Clear, Cloudy, and All.

**NOTE**: For now I will use the CSV Lorenzo gave me with the clear-sky timestamps.

In [None]:
import pandas as pd

df_clear_sky = pd.read_csv("folsom_dataset/df_clear_sky.csv", index_col=0, parse_dates=True)

# Filter the dataframes to keep only clear-sky samples
df_date_modif_clear = df_date_modif[df_date_modif.index.isin(df_clear_sky.index)]
df_irradiance_clear = df_irradiance[df_irradiance.index.isin(df_clear_sky.index)]
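
The Cloudy subset is not built yet. Assuming "cloudy" simply means every timestamp that is not in the clear-sky list (an assumption; the plan stated above is to classify with pvlib), the three views could be derived as in the sketch below. `split_by_sky_condition` is a hypothetical helper and the data is a toy stand-in for `df_irradiance` and `df_clear_sky.index`.

```python
import pandas as pd

def split_by_sky_condition(df, clear_index):
    """Split a timestamp-indexed frame into All / Clear / Cloudy views."""
    is_clear = df.index.isin(clear_index)
    return {"All": df, "Clear": df[is_clear], "Cloudy": df[~is_clear]}

# Toy data standing in for df_irradiance and df_clear_sky.index
times = pd.date_range("2014-01-01 10:00", periods=5, freq="min")
toy = pd.DataFrame({"ghi": [500.0, 510.0, 120.0, 130.0, 520.0]}, index=times)
clear_times = times[[0, 1, 4]]

subsets = split_by_sky_condition(toy, clear_times)
print({name: len(part) for name, part in subsets.items()})
```

Applying the same function to both `df_date_modif` and `df_irradiance` with the same `clear_index` would keep the image paths and the targets filtered consistently.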

In [None]:
# Installing pvlib
# !pip uninstall -y numpy pvlib
# !pip install numpy
# !pip install --no-cache-dir pvlib

## 4. Dataset

### 4.1 Transform

In [None]:
from torchvision import transforms

transform = transforms.Compose([
  # transforms.Resize((64, 64)),
  transforms.ToTensor(),
  transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
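
To make the arithmetic of the transform above concrete, the following NumPy sketch mimics what `ToTensor` followed by `Normalize` computes for an 8-bit RGB image. It illustrates the math only and is not torchvision's implementation; `to_tensor_and_normalize` is a hypothetical name.

```python
import numpy as np

# ImageNet channel statistics, the same values passed to Normalize above
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_tensor_and_normalize(image_hwc_uint8):
    """Mimic ToTensor + Normalize on an (H, W, 3) uint8 array."""
    scaled = image_hwc_uint8.astype(np.float32) / 255.0   # ToTensor: [0, 255] -> [0, 1]
    chw = np.transpose(scaled, (2, 0, 1))                 # ToTensor: HWC -> CHW
    # Normalize: per-channel (x - mean) / std
    return (chw - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

white = np.full((4, 4, 3), 255, dtype=np.uint8)
out = to_tensor_and_normalize(white)
print(out.shape)      # (3, 4, 4)
print(out[:, 0, 0])   # per-channel value of a white pixel after normalization
```

These statistics come from ImageNet, which matches the pretrained ResNet50 weights; sky images may have a different channel distribution, so this is a convention rather than a fit to the Folsom data.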

### 4.2 Dataset class

In [None]:
from torch.utils.data import Dataset
from PIL import Image

class FolsomDataset(Dataset):
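  def __init__(self, df_date_modif, df_irradiance, transform=None):
    """
    Args:
      df_date_modif (DataFrame): Contains the image paths.
      df_irradiance (DataFrame): Contains the irradiance values indexed by timestamp.
      transform (callable, optional): Transformations to apply to the images.
    """
    # zip pairs rows by position and silently truncates to the shorter frame,
    # so both frames must have the same length and the same timestamp order
    if len(df_date_modif) != len(df_irradiance):
      raise ValueError("df_date_modif and df_irradiance must have the same length")
    self.df_date_modif = df_date_modif
    self.df_irradiance = df_irradiance
    self.transform = transform
    self.data = list(zip(df_date_modif["path"], df_irradiance["ghi"]))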

  def __len__(self):
    return len(self.data)

  def __getitem__(self, idx):
    image_path, irradiance_value = self.data[idx]
    image_data = np.load(image_path)
    image = Image.fromarray(image_data.astype(np.uint8))

    if self.transform:
      image = self.transform(image)

    irradiance_value = torch.tensor(irradiance_value, dtype=torch.float32)
    return image, irradiance_value
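
The class above pairs image paths and GHI values by row position. If the two frames ever differ in which timestamps they contain, pairing by timestamp is safer. Here is a minimal pandas sketch of that alternative, with a hypothetical helper `align_by_timestamp` and toy frames:

```python
import pandas as pd

def align_by_timestamp(df_paths, df_irradiance):
    """Inner-join paths and GHI on the shared timestamp index."""
    joined = df_paths[["path"]].join(df_irradiance[["ghi"]], how="inner")
    return list(zip(joined["path"], joined["ghi"]))

# Toy frames: the irradiance frame lacks 10:01 and has an extra 10:03
t = pd.to_datetime(["2014-01-01 10:00", "2014-01-01 10:01", "2014-01-01 10:02"])
paths = pd.DataFrame({"path": ["a.npy", "b.npy", "c.npy"]}, index=t)
u = pd.to_datetime(["2014-01-01 10:00", "2014-01-01 10:02", "2014-01-01 10:03"])
ghi = pd.DataFrame({"ghi": [500.0, 520.0, 530.0]}, index=u)

pairs = align_by_timestamp(paths, ghi)
print(pairs)  # [('a.npy', 500.0), ('c.npy', 520.0)]
```

A positional `zip` on the same toy frames would pair `b.npy` with 520.0, a value measured at a different time.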

### 4.3 DataLoader

In [None]:
from torch.utils.data import Subset, DataLoader
from sklearn.model_selection import train_test_split

# Create the datasets (train and test); both wrap the same clear-sky frames
train_dataset = FolsomDataset(df_date_modif=df_date_modif_clear, df_irradiance=df_irradiance_clear, transform=transform)
test_dataset = FolsomDataset(df_date_modif=df_date_modif_clear, df_irradiance=df_irradiance_clear, transform=transform)

# train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
# test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

"""
Creating the subsets:
- Train: odd days of the year, minus the validation hold-out
- Validation: 20% of the odd-day samples, with no overlap with train
- Test: even days of the year
"""

# Day of year taken from the timestamp index of the clear-sky frame the datasets wrap
day_of_year = df_date_modif_clear.index.dayofyear

# Subset expects positional indices, so work with row positions instead of index labels
odd_day_indices = np.flatnonzero(day_of_year % 2 == 1)
test_indices = np.flatnonzero(day_of_year % 2 == 0)

train_indices, val_indices = train_test_split(
  odd_day_indices,
  test_size=0.20,
  random_state=42,
  shuffle=True
)

# Create the subsets
train_subset = Subset(train_dataset, train_indices.tolist())
val_subset = Subset(train_dataset, val_indices.tolist())
test_subset = Subset(test_dataset, test_indices.tolist())

# Create the DataLoaders
train_loader = DataLoader(train_subset, batch_size=batch_size, shuffle=True, num_workers=2)
val_loader = DataLoader(val_subset, batch_size=batch_size, shuffle=False, num_workers=2)
test_loader = DataLoader(test_subset, batch_size=batch_size, shuffle=False, num_workers=2)
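
The Readme lists K-fold cross-validation, which this notebook does not implement yet. Since consecutive sky images from the same day are highly similar, one option is to keep whole days together in each fold. The sketch below is a NumPy/pandas-only version of that idea under stated assumptions: `day_grouped_kfold` is a hypothetical helper, and grouping by day is a design choice not taken from the paper.

```python
import numpy as np
import pandas as pd

def day_grouped_kfold(timestamps, n_splits=5, seed=42):
    """Yield (train_positions, val_positions) with whole days kept together."""
    days = pd.DatetimeIndex(timestamps).normalize()
    day_codes, unique_days = pd.factorize(days)      # one integer code per calendar day
    rng = np.random.default_rng(seed)
    shuffled_codes = rng.permutation(len(unique_days))
    for fold_codes in np.array_split(shuffled_codes, n_splits):
        in_val = np.isin(day_codes, fold_codes)
        yield np.flatnonzero(~in_val), np.flatnonzero(in_val)

# Toy index: 6 days with 4 samples each
ts = pd.date_range("2014-01-01 09:00", periods=6, freq="D").repeat(4)
for fold, (train_pos, val_pos) in enumerate(day_grouped_kfold(ts, n_splits=3)):
    print(fold, len(train_pos), len(val_pos))
```

The yielded positions can be passed to `Subset` in the same way as the train and validation indices, because they are row positions rather than index labels.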


## 5. Model

In [None]:
resnet50 = models.resnet50(weights="DEFAULT")

resnet50.fc = nn.Sequential(
  nn.Dropout(hparams["ResNet50"]["dropout"]),
  nn.Linear(resnet50.fc.in_features, 1)
)

loss_fn = nn.MSELoss()

optimizer = optim.AdamW(resnet50.parameters(), lr=hparams["ResNet50"]["learning_rate"], weight_decay=hparams["ResNet50"]["weight_decay"])

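# Per-epoch decay factor chosen so the learning rate reaches 1/10 of its
# initial value after 75% of the epochs
gamma = (1/10) ** (1/(0.75 * hparams["ResNet50"]["epochs"]))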
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

# init_lr = hyperparameters["learning_rate"] #e-3
# last_lr = hyperparameters["learning_rate"] / 10 #e-4
# n_epochs = int(hyperparameters["epochs"] * hyperparameters["init_decay_epochs"])
# gamma = np.exp(np.log(last_lr / init_lr) / n_epochs)

model = resnet50.to(device)
print(model)
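
The `gamma` expression used for `ExponentialLR` can be checked with plain arithmetic: multiplying the learning rate by `gamma` once per epoch for 75% of the epochs should shrink it by a factor of 10. The sketch below uses hypothetical values for the learning rate and epoch count, not the ones stored in `hparams.json`.

```python
def exponential_gamma(total_epochs, decay_fraction=0.75, final_ratio=0.1):
    """Per-epoch factor such that gamma ** (decay_fraction * total_epochs) == final_ratio."""
    return final_ratio ** (1.0 / (decay_fraction * total_epochs))

# Hypothetical values, not the ones stored in hparams.json
initial_lr, total_epochs = 1e-3, 20
gamma = exponential_gamma(total_epochs)

decay_steps = int(0.75 * total_epochs)  # 15 scheduler steps
lr_after_decay = initial_lr * gamma ** decay_steps
print(f"gamma={gamma:.4f}, lr after {decay_steps} epochs={lr_after_decay:.6f}")
```

Note that `ExponentialLR` keeps multiplying by `gamma` after that point, so the learning rate continues to fall during the last 25% of the epochs unless the schedule is stopped explicitly.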

## 6. Training

### 6.1 Training loop

In [None]:
epochs = hparams["ResNet50"]["epochs"]

for epoch in range(epochs):
  model.train()
  running_loss = 0.0

  for images, targets in train_loader:
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad()
    outputs = model(images).squeeze(1)  # squeeze only the output dim so a batch of size 1 keeps shape [1]
    loss = loss_fn(outputs, targets)
    loss.backward()
    optimizer.step()

    running_loss += loss.item()

  scheduler.step()

  print(f"Epoch {epoch+1}/{epochs}, Loss: {(running_loss/len(train_loader)):.4f}, LR: {optimizer.param_groups[0]['lr']:.15f}")
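
The loop above never touches `val_loader`. A minimal sketch of a per-epoch validation pass is shown below; `validate` is a hypothetical helper, and the toy model and batches exist only to make the example self-contained.

```python
import torch
from torch import nn

def validate(model, dataloader, loss_fn, device):
    """Return the sample-weighted mean loss over a dataloader without updating weights."""
    model.eval()
    total_loss, n_samples = 0.0, 0
    with torch.inference_mode():
        for images, targets in dataloader:
            images, targets = images.to(device), targets.to(device)
            outputs = model(images).squeeze(1)
            loss = loss_fn(outputs, targets)
            # loss is a batch mean, so weight it by the batch size
            total_loss += loss.item() * targets.size(0)
            n_samples += targets.size(0)
    return total_loss / max(n_samples, 1)

# Tiny synthetic check: a linear "model" and two hand-made batches
toy_model = nn.Linear(3, 1)
toy_batches = [(torch.ones(2, 3), torch.zeros(2)), (torch.ones(1, 3), torch.zeros(1))]
print(validate(toy_model, toy_batches, nn.MSELoss(), torch.device("cpu")))
```

In the training loop this would be called once per epoch, for example `val_loss = validate(model, val_loader, loss_fn, device)`, remembering that `model.train()` must be called again before the next epoch (the loop above already does this at the top of each epoch).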


### 6.2 Making predictions

In [None]:
import numpy as np
import torch

torch.manual_seed(42)
torch.cuda.manual_seed_all(42)

def predict(model, dataloader, device):
  model.eval()
  predictions, ground_truths = [], []

  with torch.inference_mode():
    for images, targets in dataloader:
      images, targets = images.to(device), targets.to(device)
      outputs = model(images).squeeze(1)  # keep the batch dim so a final batch of size 1 still extends the lists

      predictions.extend(outputs.cpu().numpy())
      ground_truths.extend(targets.cpu().numpy())

  return np.array(predictions), np.array(ground_truths)

## 7. Evaluating the model

In [None]:
!pip install --upgrade numpy scipy torchmetrics

In [None]:
import torch

def evaluate_model(predictions, ground_truths):
    predictions = torch.tensor(predictions, dtype=torch.float32)
    ground_truths = torch.tensor(ground_truths, dtype=torch.float32)

    # Mean squared error (MSE) and RMSE
    mse = torch.mean((predictions - ground_truths) ** 2)
    rmse = torch.sqrt(mse)

    # Mean absolute error (MAE)
    mae = torch.mean(torch.abs(predictions - ground_truths))

    # Spread of the per-sample errors within this single run
    # (not a standard deviation across repeated runs or folds)
    std_rmse = torch.std((predictions - ground_truths) ** 2).sqrt()
    std_mae = torch.std(torch.abs(predictions - ground_truths))

    # R² score (coefficient of determination)
    ss_res = torch.sum((ground_truths - predictions) ** 2)
    ss_tot = torch.sum((ground_truths - torch.mean(ground_truths)) ** 2)
    r2 = 1 - ss_res / ss_tot if ss_tot != 0 else torch.tensor(0.0)

    return rmse.item(), std_rmse.item(), mae.item(), std_mae.item(), r2.item()

In [None]:
predictions, ground_truths = predict(model, test_loader, device)
rmse, std_rmse, mae, std_mae, r2 = evaluate_model(predictions, ground_truths)

print(f"RMSE: {rmse:.2f} ± {std_rmse:.2f} W/m²")
print(f"MAE: {mae:.2f} ± {std_mae:.2f} W/m²")
print(f"R²: {r2}")

Reference values (answer key):

- RMSE: 12.89 ± 0.39 W/m²
- MAE: 9.80 ± 0.23 W/m²
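
The source does not state what the ± in the reference values denotes. If it is the standard deviation across repeated runs or folds (an assumption), a comparable figure would come from aggregating one metric value per run, as in this sketch. `summarize_runs` is a hypothetical helper and the RMSE values are made up.

```python
import numpy as np

def summarize_runs(values):
    """Return (mean, sample standard deviation) of one metric across runs."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1)

# Made-up RMSE values from five hypothetical runs, in W/m²
rmse_per_run = [12.5, 13.1, 12.9, 13.4, 12.6]
rmse_mean, rmse_std = summarize_runs(rmse_per_run)
print(f"RMSE: {rmse_mean:.2f} ± {rmse_std:.2f} W/m²")
```

This differs from the spread of per-sample errors inside a single run, which is usually much larger than the run-to-run variation.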

In [None]:
import matplotlib.pyplot as plt
import numpy as np

def plot_predictions_vs_truth(predictions, ground_truths):
    plt.figure(figsize=(9, 9))
    plt.scatter(ground_truths, predictions, alpha=0.5, color='royalblue', edgecolors='k')
    plt.plot([min(ground_truths), max(ground_truths)],
             [min(ground_truths), max(ground_truths)],
             color='red', linestyle='--', label='Ideal (y = x)')

    plt.xlabel("Ground Truth (W/m²)")
    plt.ylabel("Model Prediction (W/m²)")
    plt.title("Predictions vs Ground Truth")
    plt.legend()
    plt.grid(True)
    plt.axis('equal')
    plt.tight_layout()
    plt.show()


In [None]:
plot_predictions_vs_truth(predictions, ground_truths)

## 8. Saving results

In [None]:
import os
import pandas as pd
from datetime import datetime

def salvar_resultados(predictions, ground_truths, rmse, std_rmse, mae, std_mae, r2, caminho="resultados.csv"):
    # Organize the data
    data = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    pred_str = ";".join([f"{p:.4f}" for p in predictions])
    gt_str = ";".join([f"{g:.4f}" for g in ground_truths])

    nova_linha = {
        "Data": data,
        "Predictions": pred_str,
        "GroundTruths": gt_str,
        "RMSE": rmse,
        "STD_RMSE": std_rmse,
        "MAE": mae,
        "STD_MAE": std_mae,
        "R2": r2
    }

    # Check whether the file already exists
    if os.path.exists(caminho):
        df_existente = pd.read_csv(caminho)
        df_novo = pd.concat([df_existente, pd.DataFrame([nova_linha])], ignore_index=True)
    else:
        df_novo = pd.DataFrame([nova_linha])

    # Save
    df_novo.to_csv(caminho, index=False)
    print(f"Results successfully saved to '{caminho}'.")

In [None]:
predictions, ground_truths = predict(model, test_loader, device)

salvar_resultados(predictions, ground_truths, rmse, std_rmse, mae, std_mae, r2)
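
`salvar_resultados` stores the predictions and ground truths as `;`-joined strings with four decimals. To analyse a saved run later, each cell has to be parsed back into numbers. A minimal sketch follows, using a hypothetical helper `parse_series`:

```python
import numpy as np

def parse_series(cell):
    """Turn a ';'-joined string such as '1.0000;2.5000' back into a float array."""
    if not cell:
        return np.array([], dtype=float)
    return np.array([float(v) for v in cell.split(";")])

# Same encoding that salvar_resultados applies to each value
encoded = ";".join(f"{v:.4f}" for v in [101.25, 99.5, 100.0])
decoded = parse_series(encoded)
print(encoded)
print(decoded)
```

Because the values are rounded to four decimals when written, metrics recomputed from the CSV can differ slightly from the ones stored in the `RMSE` and `MAE` columns.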