<a href="https://colab.research.google.com/github/MPelliccione/CV_project_Industrial_Anomaly_Detection/blob/main/notebooks/planner_and_tests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 1. Initial setup (days 1–3)

*  Study Anomalib and test a predefined ViT-based pipeline.

*  Download and try an industrial dataset (e.g. MVTec AD).

# TODO: by 09/05/2025
- create the GitHub repository with a README, etc.
- download the dataset and split it into training, validation and test sets (we will adjust the split later); classes: bottle, cable, capsule
- review and implement part of the data preparation (see the "schiavi" Colab), trying a couple of techniques
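
The train/validation/test split from the TODO could be sketched roughly as below. This is only an illustration: `split_indices`, the fractions and the seed are hypothetical choices, and it assumes the images of one class have already been collected into an indexable list. Note that MVTec AD ships with its own train/test folders, where the training folders contain only defect-free images, so the final split will likely need to respect that layout.

```python
import random

def split_indices(n_items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle range(n_items) and cut it into train/val/test index lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)   # seeded, so the split is reproducible
    n_val = int(n_items * val_frac)
    n_test = int(n_items * test_frac)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test

# Hypothetical usage: one split per class listed in the TODO
for cls in ("bottle", "cable", "capsule"):
    train_idx, val_idx, test_idx = split_indices(100)
    print(cls, len(train_idx), len(val_idx), len(test_idx))
```

The returned index lists can be used to pick file paths, or wrapped with `torch.utils.data.Subset` once a dataset object exists.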





### 2. DyT implementation (days 4–7)

  * Write the DynamicTanh layer as a PyTorch module.

  * Replace the normalization layers in the ViT with DyT.

  * Ensure the model stays compatible with the existing framework (e.g. Anomalib).
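
A minimal sketch of the first two steps, assuming DyT is the element-wise operation `weight * tanh(alpha * x) + bias` with a learnable scalar `alpha`. The starting value `alpha_init=0.5` and the helper `replace_layernorm_with_dyt` are assumptions made for illustration, not a fixed part of the plan.

```python
import torch
import torch.nn as nn

class DynamicTanh(nn.Module):
    """DyT(x) = weight * tanh(alpha * x) + bias, an element-wise stand-in for LayerNorm."""
    def __init__(self, dim, alpha_init=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))  # learnable scalar
        self.weight = nn.Parameter(torch.ones(dim))              # per-channel scale
        self.bias = nn.Parameter(torch.zeros(dim))               # per-channel shift

    def forward(self, x):
        return self.weight * torch.tanh(self.alpha * x) + self.bias

def replace_layernorm_with_dyt(module):
    """Recursively swap every nn.LayerNorm child for a DynamicTanh of the same width."""
    for name, child in list(module.named_children()):
        if isinstance(child, nn.LayerNorm):
            setattr(module, name, DynamicTanh(child.normalized_shape[-1]))
        else:
            replace_layernorm_with_dyt(child)
    return module

# Hypothetical usage on a small stand-in model
block = nn.Sequential(nn.LayerNorm(64), nn.Linear(64, 10))
replace_layernorm_with_dyt(block)
print(block(torch.randn(2, 17, 64)).shape)
```

Because the swap works on any `nn.Module` tree, the same helper could in principle be applied to a ViT backbone loaded through another framework, provided its normalization layers are plain `nn.LayerNorm` instances.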

### 3. Experiments and comparison (days 8–11)

  * Train the modified model (ViT + DyT) on the dataset.

  * Evaluate:

        Training/inference time

        Performance

   * Also train the baseline version (ViT + standard normalization), or a SOTA model if time is short (?)
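
For the timing comparison, a small helper along these lines might be enough. It is a sketch: `time_call` and its parameters are hypothetical names, and the optional `sync` callback is there because GPU kernels run asynchronously, so on CUDA one would pass `torch.cuda.synchronize` to get meaningful wall-clock numbers.

```python
import time

def time_call(fn, repeats=10, warmup=2, sync=None):
    """Return the mean wall-clock seconds per call of fn() over `repeats` runs."""
    for _ in range(warmup):          # warm-up runs are not timed
        fn()
    if sync is not None:
        sync()                       # e.g. torch.cuda.synchronize on GPU
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()                       # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / repeats

# Hypothetical usage with a plain Python workload
mean_s = time_call(lambda: sum(range(100_000)))
print(f"{mean_s * 1e3:.3f} ms per call")
```

Timing both the DyT model and the baseline with the same `repeats`, batch size and device keeps the comparison fair.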



### 4. Report and presentation (days 12–14)

   * Write the report (aka the GitHub README), covering:

        Introduction, method, experiments

        Plots of runtime and performance

        Critical discussion of the results

   * Prepare slides/demo for the presentation



# Import libraries

In [None]:
import numpy as np


# Util functions and global vars

# Data preparation

# The Network

# Training

# Evaluation

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Hyperparameters
IMG_SIZE = 32         # matches CIFAR-10 images (32x32)
PATCH_SIZE = 4
NUM_CLASSES = 10
DIM = 64
DEPTH = 4
HEADS = 4
MLP_DIM = 128
BATCH_SIZE = 64
EPOCHES = 10

# Preprocessing (4x4 patches on 32x32 images = 64 patches)
# Resize to IMG_SIZE x IMG_SIZE, then convert to a tensor
transform = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
])

train_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)

# DyT


# Patch Embedding
# Splits the input image into non-overlapping patches and projects
# each patch to a dim-dimensional vector
class PatchEmbedding(nn.Module):
    def __init__(self, img_size=32, patch_size=4, dim=64):
        super().__init__()
        self.patch_dim = (patch_size ** 2) * 3
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)          # [B, dim, H', W']
        x = x.flatten(2)          # [B, dim, N]
        x = x.transpose(1, 2)     # [B, N, dim]
        return x

# 👇 Single Transformer block
# Normalization is applied here, before attention and before the MLP (pre-norm)
class TransformerBlock(nn.Module):
    def __init__(self, dim, heads, mlp_dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim),
            nn.GELU(),
            nn.Linear(mlp_dim, dim)
        )

    def forward(self, x):
        h = self.norm1(x)                 # normalize once, reuse as query/key/value
        x = x + self.attn(h, h, h)[0]     # self-attention with residual connection
        x = x + self.mlp(self.norm2(x))   # MLP with residual connection
        return x

# Full ViT
# Assembles the patch embedding, the Transformer blocks and the classification head
class ViT(nn.Module):
    def __init__(self, img_size=32, patch_size=4, dim=64, depth=4, heads=4, mlp_dim=128, num_classes=10):
        super().__init__()
        self.patch_embed = PatchEmbedding(img_size, patch_size, dim)
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.randn(1, (img_size // patch_size) ** 2 + 1, dim))

        self.transformer = nn.Sequential(*[
            TransformerBlock(dim, heads, mlp_dim) for _ in range(depth)
        ])

        self.mlp_head = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, num_classes)
        )

    def forward(self, x):
        x = self.patch_embed(x)
        b, n, _ = x.shape

        cls_tokens = self.cls_token.expand(b, -1, -1)  # [B, 1, dim]
        x = torch.cat([cls_tokens, x], dim=1)          # [B, N+1, dim]
        x = x + self.pos_embed[:, :x.size(1), :]

        x = self.transformer(x)
        return self.mlp_head(x[:, 0])  # CLS token only

# Initialize model and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ViT().to(device)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# Mini training loop (runs EPOCHES epochs)
# NOTE: total_loss is the training loss summed over all batches of the epoch;
# it is not a validation loss, despite the label in the print below.
model.train()
for epoch in range(EPOCHES):
    total_loss = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images)
        loss = loss_fn(preds, labels)

        opt.zero_grad()
        loss.backward()
        opt.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1} Validation Loss: {total_loss:.4f}")
    # break  # uncomment to stop early for brevity


100%|██████████| 170M/170M [00:04<00:00, 39.7MB/s]


Epoch 1 Validation Loss: 1546.3523
Epoch 2 Validation Loss: 1331.2383
Epoch 3 Validation Loss: 1218.9838
Epoch 4 Validation Loss: 1154.3372
Epoch 5 Validation Loss: 1108.3489
Epoch 6 Validation Loss: 1068.5549
Epoch 7 Validation Loss: 1036.0100
Epoch 8 Validation Loss: 1005.7231
Epoch 9 Validation Loss: 974.8113
Epoch 10 Validation Loss: 952.4681
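
The values logged above are sums of the per-batch loss over the training set, so they are not a held-out validation metric. A separate evaluation pass could be sketched as follows; `evaluate` is a hypothetical helper, and the random tensors and linear stand-in model exist only to make the example self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()
def evaluate(model, loader, loss_fn, device):
    """Return (mean loss per sample, accuracy) over a data loader."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += loss_fn(logits, labels).item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen

# Hypothetical smoke test on random data with a stand-in linear model
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
toy_data = TensorDataset(torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,)))
val_loss, val_acc = evaluate(toy_model, DataLoader(toy_data, batch_size=8),
                             nn.CrossEntropyLoss(), torch.device("cpu"))
print(f"loss={val_loss:.4f} acc={val_acc:.2%}")
```

With the notebook's own `model`, the loader would instead wrap `torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)`. Since `evaluate` switches the model to eval mode, `model.train()` should be called again before training resumes.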
