<a href="https://colab.research.google.com/github/drdosan/cap1-despertar-da-rede-neural/blob/main/document/Entrega2_CNN_do_zero.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📘 Deliverable 2: CNN from Scratch, Classifying *blusas* vs *sapatos*

This notebook implements a **simple CNN from scratch** in PyTorch to classify images into two classes: **blusas** (blouses) and **sapatos** (shoes).

It is meant to complement the comparison with:
- **Optimized YOLO** (Deliverable 1)
- **Traditional YOLO** (defaults: `--img 640`, no custom `--hyp`)

🔎 **Teaching goal**: a classifier does **not** perform detection (it produces no *bounding boxes*). The focus here is **per-class accuracy** and **inference time**.


## 1) Imports and Configuration
This section defines the paths, the random *seeds*, and the device (CPU/GPU). Adjust `BASE` if needed.


In [None]:
import os, time, numpy as np, torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from sklearn.metrics import classification_report, confusion_matrix
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

BASE = "/content/drive/MyDrive/roupas"
TRAIN_DIR = f"{BASE}/images/train"
VAL_DIR   = f"{BASE}/images/val"
TEST_DIR  = f"{BASE}/images/test"

torch.manual_seed(42)
np.random.seed(42)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

## 2) Data Loaders
We use **ImageFolder** with one folder per class and the standard ImageNet normalization. For training, we apply light *augmentations*.

Expected structure:
```
/images/train/blusas, /images/train/sapatos
/images/val/blusas,   /images/val/sapatos
/images/test/blusas,  /images/test/sapatos
```
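
Before building the datasets, it can help to confirm that the layout above is actually in place. The sketch below is a hypothetical helper (`count_images` is not part of the notebook) that counts image files per split and class. It assumes the `blusas`/`sapatos` folder names shown above and a handful of common image extensions:

```python
from pathlib import Path

# Assumed set of extensions; adjust to match the files in your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

def count_images(base, splits=("train", "val", "test"), classes=("blusas", "sapatos")):
    """Return {split: {class: n_images}} for <base>/images/<split>/<class>."""
    counts = {}
    for split in splits:
        counts[split] = {}
        for cls in classes:
            folder = Path(base) / "images" / split / cls
            if not folder.is_dir():
                counts[split][cls] = 0  # a missing folder is reported as zero images
                continue
            counts[split][cls] = sum(
                1 for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTS
            )
    return counts
```

Calling `count_images(BASE)` returns a nested dictionary; a zero for any class points to a missing or misnamed folder.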


In [None]:
mean, std = (0.485, 0.456, 0.406), (0.229, 0.224, 0.225)
img_size = 224

tf_train = transforms.Compose([
    transforms.Resize((img_size, img_size)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.1, 0.1, 0.1, 0.05),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])
tf_eval = transforms.Compose([
    transforms.Resize((img_size, img_size)),
    transforms.ToTensor(),
    transforms.Normalize(mean, std),
])

train_ds = datasets.ImageFolder(TRAIN_DIR, transform=tf_train)
val_ds   = datasets.ImageFolder(VAL_DIR,   transform=tf_eval)
test_ds  = datasets.ImageFolder(TEST_DIR,  transform=tf_eval)

train_loader = DataLoader(train_ds, batch_size=16, shuffle=True,  num_workers=2, pin_memory=True)
val_loader   = DataLoader(val_ds,   batch_size=16, shuffle=False, num_workers=2, pin_memory=True)
test_loader  = DataLoader(test_ds,  batch_size=16, shuffle=False, num_workers=2, pin_memory=True)

class_names = train_ds.classes
len(train_ds), len(val_ds), len(test_ds), class_names
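
To make the normalization step concrete: `ToTensor` scales 8-bit pixel values to [0, 1], and `Normalize` then applies `(x - mean) / std` per channel. The plain-Python sketch below reproduces that arithmetic for a single RGB pixel; `normalize_pixel` is an illustrative helper, not a torchvision function:

```python
MEAN = (0.485, 0.456, 0.406)  # ImageNet channel means used in the notebook
STD = (0.229, 0.224, 0.225)   # ImageNet channel standard deviations

def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to the values the model sees."""
    scaled = [v / 255.0 for v in rgb]  # what ToTensor does to uint8 images
    return tuple((x - m) / s for x, m, s in zip(scaled, MEAN, STD))

black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(v, 3) for v in black])  # [-2.118, -2.036, -1.804]
print([round(v, 3) for v in white])  # [2.249, 2.429, 2.64]
```

The model inputs therefore span roughly -2.1 to 2.6 rather than 0 to 1.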

## 3) Model: TinyCNN (simple and didactic)
A small architecture meant to **converge quickly** on this small dataset: 4 conv blocks + *global average pooling* + MLP.


In [None]:
class TinyCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1,1))
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, num_classes)
        )
    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

model = TinyCNN(num_classes=2).to(device)
sum(p.numel() for p in model.parameters())/1e6, device
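
The parameter count displayed by the cell above can be cross-checked by hand. A convolution with a k×k kernel has `k*k*in_ch*out_ch + out_ch` parameters, and a linear layer has `n_in*n_out + n_out`. The sketch below applies those formulas to the layer sizes in `TinyCNN` and also tracks the spatial size for a 224×224 input (the helper names are illustrative):

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * in_ch * out_ch + out_ch

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

conv_channels = [(3, 16), (16, 32), (32, 64), (64, 128)]
total = sum(conv_params(i, o) for i, o in conv_channels)
total += linear_params(128, 64) + linear_params(64, 2)

# 3x3 convs with padding=1 keep the spatial size; each MaxPool2d(2) halves it.
size = 224
for _ in range(3):  # only the first three conv blocks are followed by pooling
    size //= 2

print(total)  # 105826 parameters, i.e. about 0.106 M
print(size)   # 28: the feature map is 28x28 before global average pooling
```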

## 4) Training + Validation
We train for **20 epochs** (a balance between training time and the risk of *overfitting* on a small dataset). `ReduceLROnPlateau` halves the learning rate (`factor=0.5`) when *val_loss* fails to improve for more than `patience=2` consecutive epochs. We save the **best state** based on *val_acc*.


In [None]:
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

epochs = 20
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = ReduceLROnPlateau(optimizer, factor=0.5, patience=2)

def run_epoch(loader, train=True):
    model.train() if train else model.eval()
    total, correct, loss_sum = 0, 0, 0.0
    with torch.set_grad_enabled(train):
        for imgs, labels in loader:
            imgs, labels = imgs.to(device), labels.to(device)
            if train:
                optimizer.zero_grad(set_to_none=True)
            logits = model(imgs)
            loss = criterion(logits, labels)
            if train:
                loss.backward()
                optimizer.step()
            loss_sum += loss.item()*imgs.size(0)
            preds = logits.argmax(1)
            correct += (preds == labels).sum().item()
            total += imgs.size(0)
    return loss_sum/total, correct/total

history = {"tr_loss": [], "tr_acc": [], "va_loss": [], "va_acc": []}
best_val_acc, best_state = 0.0, None

for ep in range(1, epochs+1):
    tr_loss, tr_acc = run_epoch(train_loader, train=True)
    va_loss, va_acc = run_epoch(val_loader,   train=False)
    scheduler.step(va_loss)
    history["tr_loss"].append(tr_loss); history["tr_acc"].append(tr_acc)
    history["va_loss"].append(va_loss); history["va_acc"].append(va_acc)
    if va_acc > best_val_acc:
        best_val_acc, best_state = va_acc, {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
    print(f"Epoch {ep:02d} | train_loss={tr_loss:.4f} acc={tr_acc:.3f} | val_loss={va_loss:.4f} acc={va_acc:.3f}")

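# Restore the best validation checkpoint before saving.
if best_state is not None:
    model.load_state_dict(best_state)
save_path = f"{BASE}/runs/cnn_from_scratch.pt"
os.makedirs(os.path.dirname(save_path), exist_ok=True)  # torch.save does not create missing folders
torch.save(model.state_dict(), save_path)
print("Model saved to:", save_path)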
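
To see how the scheduler settings interact with the validation loss, here is a simplified, framework-free simulation of the plateau rule. It is an approximation for intuition only: it ignores the `threshold` and `cooldown` options of PyTorch's `ReduceLROnPlateau`, and the loss values are made up for illustration:

```python
def simulate_plateau(val_losses, lr=1e-3, factor=0.5, patience=2):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    bad_epochs = 0
    lrs = []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:  # more than `patience` epochs without improvement
            lr *= factor
            bad_epochs = 0
        lrs.append(lr)
    return lrs

# Hypothetical validation losses: improvement stops after the third epoch.
losses = [0.70, 0.55, 0.50, 0.51, 0.52, 0.50, 0.49]
print(simulate_plateau(losses))
# [0.001, 0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005]
```

With `patience=2`, the rate is halved on the third consecutive epoch without improvement (epoch 6 in this example).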

## 5) (Optional) Training Curves
Plots of *loss* and *accuracy* to visualize convergence and possible *overfitting*.


In [None]:
import matplotlib.pyplot as plt

plt.figure(); plt.plot(history['tr_loss']); plt.plot(history['va_loss']); plt.title('Loss'); plt.legend(['train','val']); plt.xlabel('epoch'); plt.ylabel('loss'); plt.show()
plt.figure(); plt.plot(history['tr_acc']);  plt.plot(history['va_acc']);  plt.title('Accuracy'); plt.legend(['train','val']); plt.xlabel('epoch'); plt.ylabel('acc');  plt.show()

## 6) Evaluation on the Test Set
We report the **average inference time (s/img)**, the **classification_report** (precision/recall/F1 per class), and the **confusion matrix**.


In [None]:
model.eval()
all_preds, all_labels = [], []
t0 = time.perf_counter()
with torch.no_grad():
    for imgs, labels in test_loader:
        imgs = imgs.to(device)
        logits = model(imgs)
        preds = logits.argmax(1).cpu().numpy()
        all_preds.append(preds)
        all_labels.append(labels.numpy())
if torch.cuda.is_available():
    torch.cuda.synchronize()
t1 = time.perf_counter()

all_preds = np.concatenate(all_preds)
all_labels = np.concatenate(all_labels)

# The timing includes data loading and preprocessing, hence "approx.".
print(f"⏱️ Average inference time (approx.): {(t1 - t0)/len(test_ds):.4f} s/img")
print("\n== Classification Report ==")
print(classification_report(all_labels, all_preds, target_names=class_names, digits=4))

cm = confusion_matrix(all_labels, all_preds)
print("\n== Confusion Matrix ==\n", cm)
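
The timing above includes data loading, so it is an end-to-end estimate. To time only the forward pass, one common approach is to run a warm-up call first and then time batches that are already in memory. The sketch below is framework-agnostic: `predict_fn` stands for any callable (for example `lambda x: model(x)`), and `seconds_per_item` is a hypothetical helper. On a GPU you would also need to call `torch.cuda.synchronize()` before reading the clock, which this sketch does not do:

```python
import time

def seconds_per_item(predict_fn, batches, warmup=1):
    """Average wall-clock seconds per item over pre-loaded batches."""
    batches = list(batches)
    if not batches:
        raise ValueError("need at least one batch")
    for batch in batches[:warmup]:
        predict_fn(batch)  # warm-up calls are not timed
    n_items = 0
    start = time.perf_counter()
    for batch in batches:
        predict_fn(batch)
        n_items += len(batch)  # for tensors, len() is the batch dimension
    elapsed = time.perf_counter() - start
    return elapsed / n_items
```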

## 7) Load and Use the Saved Model (optional)
Example of how to restore the saved state and run inference later.


In [None]:
loaded = TinyCNN(num_classes=2).to(device)
loaded.load_state_dict(torch.load(f"{BASE}/runs/cnn_from_scratch.pt", map_location=device))
loaded.eval()
loaded
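
The cell above only restores the weights. A minimal sketch of single-image inference could look like the following, assuming Pillow is available and reusing the notebook's `tf_eval` transform and `class_names` list; `predict_image` is a hypothetical helper, not part of the original notebook:

```python
import torch
import torch.nn.functional as F
from PIL import Image

def predict_image(model, image_path, transform, class_names, device="cpu"):
    """Return (class_name, probability) for one image file."""
    img = Image.open(image_path).convert("RGB")     # force 3 channels
    batch = transform(img).unsqueeze(0).to(device)  # shape: (1, 3, H, W)
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(batch), dim=1)[0]   # logits -> probabilities
    idx = int(probs.argmax())
    return class_names[idx], float(probs[idx])
```

A call would look like `predict_image(loaded, some_path, tf_eval, class_names, device)`, where `some_path` is a placeholder for an image file of your choice.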

## 8) What to Report for Deliverable 2
- **Overall accuracy** and **per-class F1** (the *classification_report* table).
- **Confusion matrix** (errors per class).
- **Average inference time** (s/img) of this CNN.
- **Comparison with YOLO** (mAP/precision/recall and inference time from your `val.py`/`detect.py`).
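
The accuracy and per-class figures in the list above all derive from the confusion matrix. As a sketch of that arithmetic, the plain-Python helper below computes overall accuracy and per-class precision, recall, and F1 from a matrix laid out like scikit-learn's (rows are true classes, columns are predictions). The helper name is illustrative, and the counts are invented for the example; they are not results from this notebook:

```python
def metrics_from_confusion(cm, class_names):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(len(cm))) / total
    per_class = {}
    for i, name in enumerate(class_names):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum: predicted as class i
        actual = sum(cm[i])                    # row sum: truly class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        per_class[name] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class

# Made-up counts: 18 of 20 blusas and 19 of 20 sapatos classified correctly.
acc, report = metrics_from_confusion([[18, 2], [1, 19]], ["blusas", "sapatos"])
print(round(acc, 3))  # 0.925
```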

🧭 **Final takeaway**: the CNN is simple, fast, and works well when the image has **one dominant object** and there is no need to localize or count. YOLO is preferable when you need to **detect and localize** objects with *bounding boxes* (inventory, security, etc.).
