<div align="center">

#### Lab 3

# National Tsing Hua University

#### Spring 2025

#### 11320IEEM 513600

#### Deep Learning and Industrial Applications
    
## Lab 3: Anomaly Detection in Industrial Applications

</div>

### Introduction

In today's industrial landscape, the ability to detect anomalies in manufacturing processes and products is critical for maintaining quality, efficiency, and safety. This lab focuses on leveraging deep learning techniques for anomaly detection in various industrial applications, using the MVTec Anomaly Detection Dataset. By employing ImageNet-pretrained models available in torchvision, students will gain hands-on experience in classifying defects and irregularities across different types of industrial products.

Throughout this lab, you'll be involved in the following key activities:
- Explore and process the MVTec Anomaly Detection Dataset.
- Apply ImageNet-pretrained models from [Torchvision](https://pytorch.org/vision/stable/models.html) to detect anomalies in industrial products.
- Evaluate the performance of the models to understand their effectiveness in real-world industrial applications.

### Objectives

- Understand the principles of anomaly detection in the context of industrial applications.
- Learn how to implement and utilize ImageNet-pretrained models for detecting anomalies.
- Analyze and interpret the results of the anomaly detection models to assess their practicality in industrial settings.

### Dataset

The MVTec AD Dataset is a comprehensive collection of high-resolution images across different categories of industrial products, such as bottles, cables, and metal nuts, each with various types of defects. This dataset is pivotal for developing and benchmarking anomaly detection algorithms. You can download our lab's dataset [here](https://drive.google.com/file/d/19600hUOpx0hl78TdpdH0oyy-gGTk_F_o/view?usp=share_link). You can either upload the downloaded data directly to Colab or place it in your Google Drive.

### References
- [MVTec AD Dataset](https://www.kaggle.com/datasets/ipythonx/mvtec-ad/data) for the dataset used in this lab.
- [Torchvision Models](https://pytorch.org/vision/stable/models.html) for accessing ImageNet-pretrained models to be used in anomaly detection tasks.
- [State-of-the-Art Anomaly Detection on MVTec AD](https://paperswithcode.com/sota/anomaly-detection-on-mvtec-ad) for insights into the latest benchmarks and methodologies in anomaly detection applied to the MVTec AD dataset.
- [CVPR 2019: MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection] for the original paper of MVTec AD dataset.

In [None]:
import glob
import matplotlib.pyplot as plt
import random
from tqdm.auto import tqdm
import cv2
import numpy as np
import os

In [None]:
file_paths = glob.glob('bottle/train/good/*.png')
file_paths = sorted([
    path for path in file_paths 
    if os.path.basename(path) in [f'{i:03}.png' for i in range(10)]
])

In [10]:
all_data = []

for img in tqdm(file_paths):
    img = cv2.imread(img)
    img = img[..., ::-1]
    all_data.append(img)

all_data = np.stack(all_data)
print(all_data.shape)

100%|██████████| 10/10 [00:00<00:00, 79.49it/s]

(10, 900, 900, 3)





In [12]:
root_dir = "bottle"
train_dir = os.path.join(root_dir, "train")
test_dir  = os.path.join(root_dir, "test")

def count_images_in_subfolders(folder):
    # Return a dict mapping each subfolder name to the number of images it contains
    result = {}
    # List all subfolders under `folder` (e.g. good, broken_large)
    subfolders = [d for d in os.listdir(folder) if os.path.isdir(os.path.join(folder, d))]
    for subf in subfolders:
        subf_path = os.path.join(folder, subf)
        # Collect all PNG files in this subfolder (change the pattern for *.jpg, *.bmp, etc.)
        images = glob.glob(os.path.join(subf_path, "*.png"))
        result[subf] = len(images)
    return result

# Count images in the train folder
train_stats = count_images_in_subfolders(train_dir)
print("Train folder stats:", train_stats)

# Count images in the test folder
test_stats = count_images_in_subfolders(test_dir)
print("Test folder stats:", test_stats)

Train folder stats: {'good': 209}
Test folder stats: {'broken_large': 20, 'broken_small': 22, 'contamination': 21, 'good': 20}
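
The test split is imbalanced toward defective images, which matters when reading the accuracy figures later in this notebook. The short sketch below works through the arithmetic using the counts printed above; the "always anomalous" baseline is an illustrative reference point added here, not part of the original lab.

```python
# Counts copied from the "Test folder stats" output above.
test_stats = {"broken_large": 20, "broken_small": 22, "contamination": 21, "good": 20}

n_total = sum(test_stats.values())
n_good = test_stats["good"]
n_defect = n_total - n_good

# Accuracy of a trivial detector that flags every test image as anomalous.
baseline_acc = n_defect / n_total

print(f"total={n_total}, good={n_good}, defective={n_defect}")
print(f"always-anomalous baseline accuracy: {baseline_acc:.4f}")
```

Any detector evaluated on this split should therefore be compared against roughly 0.76 accuracy rather than 0.5.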


## Data Loading and Preprocessing

In [None]:
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms, models

class MVTecDataset(Dataset):
    """
    Custom Dataset for the bottle category of MVTec AD.
    The training phase loads only normal (good) images; the test phase loads
    all images, labeling good images as 0 and defective images as 1.
    """
    def __init__(self, root_dir, phase='train', transform=None):
        """
        Args:
            root_dir (str): Dataset root directory, e.g. 'bottle'
            phase (str): 'train' or 'test'
            transform: Image preprocessing (torchvision transforms)
        """
        self.root_dir = root_dir
        self.phase = phase
        self.transform = transform
        
        self.image_paths = []
        self.labels = []  # 0 for normal, 1 for anomalous (any defect type)
        
        if phase == 'train':
            # Read only the images in train/good
            good_dir = os.path.join(root_dir, 'train', 'good')
            self.image_paths = glob.glob(os.path.join(good_dir, '*.png'))
            self.labels = [0] * len(self.image_paths)
        elif phase == 'test':
            # In the test phase, read images from every subfolder of test/
            test_dir = os.path.join(root_dir, 'test')
            # List all subfolders, e.g. 'good', 'broken_large', 'contamination'
            subfolders = [d for d in os.listdir(test_dir) if os.path.isdir(os.path.join(test_dir, d))]
            for folder in subfolders:
                folder_path = os.path.join(test_dir, folder)
                # Use glob to collect all PNG files in this folder
                images = glob.glob(os.path.join(folder_path, '*.png'))
                self.image_paths.extend(images)
                # Label the 'good' folder as 0 and every other folder as 1 (anomalous)
                if folder == 'good':
                    self.labels.extend([0] * len(images))
                else:
                    self.labels.extend([1] * len(images))
        else:
            raise ValueError("phase must be 'train' or 'test'")

    def __len__(self):
        return len(self.image_paths)
    
    def __getitem__(self, idx):
        # Read the image (OpenCV loads images as BGR by default)
        image_path = self.image_paths[idx]
        image = cv2.imread(image_path)
        if image is None:
            raise RuntimeError(f"Failed to read image: {image_path}")
        # Convert to RGB
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        
        # Apply the transform if one was provided
        if self.transform:
            image = self.transform(image)
        else:
            # Otherwise convert to a CHW float tensor scaled to [0, 1]
            image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
        
        label = self.labels[idx]
        return image, label

## Attempt 1

In [59]:
# Define the image preprocessing (resize to 224x224, convert to tensor); no data augmentation is applied
train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)), 
    transforms.ToTensor(),
])

test_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_dataset = MVTecDataset(root_dir, phase='train', transform=train_transform)
test_dataset  = MVTecDataset(root_dir, phase='test', transform=test_transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [60]:
# Load the pretrained ResNet18 and remove the final layer (fc)
# (`pretrained=True` is deprecated in torchvision >= 0.13, where
# `weights=models.ResNet18_Weights.IMAGENET1K_V1` is the equivalent)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet = models.resnet18(pretrained=True)
resnet.fc = nn.Identity()  # Replace fc with identity so the model outputs feature vectors
resnet = resnet.to(device)
resnet.eval()

# Extract features from the "good" training images
train_features = []
with torch.no_grad():
    for images, _ in train_loader:
        images = images.to(device)
        feats = resnet(images)  # Output shape: (batch, feature_dim)
        train_features.append(feats.cpu())
train_features = torch.cat(train_features, dim=0)  # shape: (N, feature_dim)

# Compute the mean training feature vector; the Euclidean distance to it is the anomaly score
mean_feat = train_features.mean(dim=0)
train_dists = torch.norm(train_features - mean_feat, dim=1)
# Set the threshold to the mean of the training distances plus 3 standard deviations
threshold = train_dists.mean().item() + 3 * train_dists.std().item()
print(f"Anomaly threshold: {threshold:.6f}")

Anomaly threshold: 4.221568
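
The thresholding rule used above (mean of the training distances plus three standard deviations) can be isolated into a small NumPy helper. This is a minimal sketch on synthetic distances: the function name `three_sigma_threshold`, the `*_demo` arrays, and the random values are hypothetical and are not taken from the lab data.

```python
import numpy as np

def three_sigma_threshold(dists):
    """Return mean + 3 * std of a 1-D array of distances."""
    dists = np.asarray(dists, dtype=np.float64)
    # ddof=1 matches the unbiased default of torch.Tensor.std().
    return dists.mean() + 3.0 * dists.std(ddof=1)

# Synthetic stand-in for the distances of normal training images (illustrative only).
rng = np.random.default_rng(0)
train_dists_demo = rng.normal(loc=3.0, scale=0.4, size=209).clip(min=0)
tau = three_sigma_threshold(train_dists_demo)

# Score a few hypothetical test distances against the threshold.
test_dists_demo = np.array([2.8, 3.5, 6.0])
preds_demo = (test_dists_demo > tau).astype(int)  # 1 = flagged as anomalous
print(f"threshold={tau:.3f}, predictions={preds_demo.tolist()}")
```

Because the threshold is derived only from normal images, it controls the false-alarm rate on normal data but says nothing directly about how many defects are missed.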


In [61]:
# Test phase: flag anomalies based on feature distances
all_preds = []
all_labels = []
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        feats = resnet(images)
        dists = torch.norm(feats.cpu() - mean_feat, dim=1)
        preds = (dists > threshold).int()  # Distances above the threshold are flagged as anomalies (1)
        all_preds.extend(preds.numpy())
        all_labels.extend(labels.numpy())
all_preds = np.array(all_preds)
all_labels = np.array(all_labels)
accuracy = (all_preds == all_labels).mean()
print(f"Test Accuracy: {accuracy:.4f}")

Test Accuracy: 0.9157
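
Accuracy alone does not show whether the errors are missed defects or false alarms. As a rough sketch, the helper below breaks binary predictions into a confusion matrix with precision and recall; `binary_report` and the demo arrays are hypothetical names and values, and in the notebook one would pass `all_labels` and `all_preds` instead.

```python
import numpy as np

def binary_report(y_true, y_pred):
    """Confusion-matrix counts and derived metrics, treating 1 (anomaly) as positive."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(((y_pred == 1) & (y_true == 1)).sum())  # defects correctly flagged
    tn = int(((y_pred == 0) & (y_true == 0)).sum())  # good images correctly passed
    fp = int(((y_pred == 1) & (y_true == 0)).sum())  # false alarms
    fn = int(((y_pred == 0) & (y_true == 1)).sum())  # missed defects
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "accuracy": accuracy}

# Tiny illustrative example (not lab results).
demo_labels = [0, 0, 1, 1, 1, 1]
demo_preds = [0, 1, 1, 1, 0, 1]
report = binary_report(demo_labels, demo_preds)
print(report)
```

In an inspection setting, recall on the anomaly class (how many defects are caught) is often the number that matters most.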


## Attempt 2

In [82]:
# Define the image preprocessing (resize to 224x224, convert to tensor); no data augmentation is applied
train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

test_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

In [83]:
train_dataset = MVTecDataset(root_dir, phase='train', transform=train_transform)
test_dataset  = MVTecDataset(root_dir, phase='test', transform=test_transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [84]:
# Load the pretrained ResNet18 and remove the final layer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet18(pretrained=True)
model.fc = nn.Identity()  # Output feature vectors
model = model.to(device)

# Compute the initial center vector c (mean feature of all "good" training images)
model.eval()
train_features = []
with torch.no_grad():
    for images, _ in train_loader:
        images = images.to(device)
        feats = model(images)  # shape: (batch, feature_dim)
        train_features.append(feats.cpu())
train_features = torch.cat(train_features, dim=0)
center = train_features.mean(dim=0).to(device)
print("Initial center vector computed.")

# Fine-tune (Deep SVDD objective: minimize the squared distance between each image's features and the center)
model.train()
optimizer = optim.Adam(model.parameters(), lr=5e-5)
num_epochs = 10

for epoch in range(num_epochs):
    running_loss = 0.0
    for images, _ in train_loader:
        images = images.to(device)
        optimizer.zero_grad()
        feats = model(images)
        # Deep SVDD loss: mean over samples of the squared distance to the center
        loss = ((feats - center) ** 2).sum(dim=1).mean()
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * images.size(0)
    epoch_loss = running_loss / len(train_dataset)
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.6f}")

# Compute training-set feature distances with the fine-tuned model and set the anomaly threshold
model.eval()
finetune_train_features = []
with torch.no_grad():
    for images, _ in train_loader:
        images = images.to(device)
        feats = model(images)
        finetune_train_features.append(feats.cpu())
finetune_train_features = torch.cat(finetune_train_features, dim=0)
train_dists = torch.norm(finetune_train_features - center.cpu(), dim=1)
threshold = train_dists.mean().item() + 3 * train_dists.std().item()
print(f"Anomaly threshold: {threshold:.6f}")

Initial center vector computed.
Epoch 1/10, Loss: 482.657778
Epoch 2/10, Loss: 395.066864
Epoch 3/10, Loss: 349.092155
Epoch 4/10, Loss: 318.232231
Epoch 5/10, Loss: 295.867916
Epoch 6/10, Loss: 279.359445
Epoch 7/10, Loss: 266.921362
Epoch 8/10, Loss: 257.392767
Epoch 9/10, Loss: 249.888794
Epoch 10/10, Loss: 243.871802
Anomaly threshold: 15.765369


In [85]:
# Test phase: flag anomalies based on feature distances after fine-tuning
all_preds = []
all_labels = []
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        feats = model(images)
        dists = torch.norm(feats.cpu() - center.cpu(), dim=1)
        preds = (dists > threshold).int()  # Distances above the threshold are flagged as anomalies (1)
        all_preds.extend(preds.numpy())
        all_labels.extend(labels.numpy())
all_preds = np.array(all_preds)
all_labels = np.array(all_labels)
accuracy = (all_preds == all_labels).mean()
print(f"Test Accuracy: {accuracy:.4f}")

Test Accuracy: 0.9759
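
The fine-tuning objective and decision rule used in Attempt 2 can be written compactly. The block below is a transcription of what the code computes, not the full Deep SVDD formulation from the literature (which, for example, also includes a weight-decay regularizer). The symbols are notation introduced here: $\phi_\theta$ is the network with `fc` replaced by the identity, $\theta_0$ the pretrained weights, $c$ the fixed center, $N$ the number of training images, $B$ the batch size, and $\tau$ the threshold.

```latex
\begin{aligned}
c &= \frac{1}{N}\sum_{i=1}^{N} \phi_{\theta_0}(x_i)
  && \text{(center from pretrained features, kept fixed)} \\
\mathcal{L}(\theta) &= \frac{1}{B}\sum_{i=1}^{B} \lVert \phi_\theta(x_i) - c \rVert_2^2
  && \text{(per-batch training loss)} \\
s(x) &= \lVert \phi_\theta(x) - c \rVert_2
  && \text{(anomaly score)} \\
\tau &= \operatorname{mean}_i\, s(x_i) + 3\,\operatorname{std}_i\, s(x_i),
\qquad \hat{y}(x) = \mathbb{1}\left[\, s(x) > \tau \,\right]
\end{aligned}
```

Note that the loss uses the squared distance while the score and threshold use the unsquared distance; since both the training and test scores are computed the same way, the comparison against $\tau$ stays consistent.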


## Attempt 3

In [None]:
# Define the image preprocessing (resize to 224x224, convert to tensor); no data augmentation is applied
train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),   
    transforms.ToTensor(),
])

test_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

In [35]:
train_dataset = MVTecDataset(root_dir, phase='train', transform=train_transform)
test_dataset  = MVTecDataset(root_dir, phase='test', transform=test_transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [36]:
# Load the pretrained VGG16 and remove the classifier so the model outputs feature vectors
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg16 = models.vgg16(pretrained=True)
vgg16.classifier = nn.Identity()  # The model now outputs the flattened features (shape is typically [N, 25088])
vgg16 = vgg16.to(device)
vgg16.eval()

# Extract features from the "good" training images
train_features = []
with torch.no_grad():
    for images, _ in train_loader:
        images = images.to(device)
        feats = vgg16(images)  # Output shape: (batch, feature_dim)
        train_features.append(feats.cpu())
train_features = torch.cat(train_features, dim=0)  # shape: (N, feature_dim)

# Compute the mean training feature vector; the Euclidean distance to it is the anomaly score
mean_feat = train_features.mean(dim=0)
train_dists = torch.norm(train_features - mean_feat, dim=1)
# Set the threshold to the mean of the training distances plus 3 standard deviations
threshold = train_dists.mean().item() + 3 * train_dists.std().item()
print(f"Anomaly threshold: {threshold:.6f}")

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to C:\Users\Tiffany/.cache\torch\hub\checkpoints\vgg16-397923af.pth
100%|██████████| 528M/528M [07:04<00:00, 1.30MB/s] 

Anomaly threshold: 57.693165


In [37]:
# Test phase: flag anomalies based on feature distances
all_preds = []
all_labels = []
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        feats = vgg16(images)
        dists = torch.norm(feats.cpu() - mean_feat, dim=1)
        preds = (dists > threshold).int()  # Distances above the threshold are flagged as anomalies (1)
        all_preds.extend(preds.numpy())
        all_labels.extend(labels.numpy())
all_preds = np.array(all_preds)
all_labels = np.array(all_labels)
accuracy = (all_preds == all_labels).mean()
print(f"Test Accuracy: {accuracy:.4f}")

Test Accuracy: 0.9277
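
Every attempt in this notebook depends on one particular threshold. A threshold-free alternative is the area under the ROC curve (AUROC) computed directly from the distances. The sketch below uses the pairwise (Mann-Whitney) formulation in plain NumPy; `roc_auc` and the demo values are hypothetical, and in the notebook one would need to collect the raw `dists` for each test image rather than the thresholded `preds`.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUROC as the fraction of (anomalous, normal) pairs ranked correctly; ties count 0.5."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=np.float64)
    pos = scores[labels]    # scores of anomalous samples
    neg = scores[~labels]   # scores of normal samples
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one normal and one anomalous sample")
    # Compare every anomalous score with every normal score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Tiny illustrative example (not lab results).
demo_labels = [0, 0, 1, 1]
demo_scores = [0.1, 0.4, 0.35, 0.8]
demo_auc = roc_auc(demo_labels, demo_scores)
print(f"AUROC: {demo_auc:.2f}")
```

An AUROC of 1.0 would mean some threshold separates the two classes perfectly, which helps distinguish a weak feature extractor from a poorly chosen threshold.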


## Attempt 4

In [73]:
# Define the image preprocessing (resize to 224x224, convert to tensor); no data augmentation is applied
train_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),  
    transforms.ToTensor(),
])

test_transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

In [74]:
train_dataset = MVTecDataset(root_dir, phase='train', transform=train_transform)
test_dataset  = MVTecDataset(root_dir, phase='test', transform=test_transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [75]:
# Load the pretrained VGG16 and remove the classifier so the model outputs flattened feature vectors
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg16 = models.vgg16(pretrained=True)
vgg16.classifier = nn.Identity()  # Output feature vectors (shape is typically [N, 25088])

# Freeze the first 15 modules of the convolutional feature extractor
for param in vgg16.features[:15].parameters():
    param.requires_grad = False

vgg16 = vgg16.to(device)

# Compute the initial center vector c (using all "good" training images)
vgg16.eval()
train_features = []
with torch.no_grad():
    for images, _ in train_loader:
        images = images.to(device)
        feats = vgg16(images)
        train_features.append(feats.cpu())
train_features = torch.cat(train_features, dim=0)
center = train_features.mean(dim=0).to(device)
print("Initial center vector computed.")

# Fine-tuning phase: Deep SVDD loss (minimize the squared distance between each image's features and the center)
vgg16.train()
optimizer = optim.Adam(filter(lambda p: p.requires_grad, vgg16.parameters()), lr=1e-5)
num_epochs = 10

for epoch in range(num_epochs):
    running_loss = 0.0
    for images, _ in train_loader:
        images = images.to(device)
        optimizer.zero_grad()
        feats = vgg16(images)
        loss = ((feats - center) ** 2).sum(dim=1).mean()  # squared Euclidean distance
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * images.size(0)
    epoch_loss = running_loss / len(train_dataset)
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.6f}")

# Compute training-set feature distances after fine-tuning and set the anomaly threshold
vgg16.eval()
finetune_train_features = []
with torch.no_grad():
    for images, _ in train_loader:
        images = images.to(device)
        feats = vgg16(images)
        finetune_train_features.append(feats.cpu())
finetune_train_features = torch.cat(finetune_train_features, dim=0)
train_dists = torch.norm(finetune_train_features - center.cpu(), dim=1)
threshold = train_dists.mean().item() + 3 * train_dists.std().item()
print(f"Anomaly threshold: {threshold:.6f}")

Initial center vector computed.
Epoch 1/10, Loss: 440.068024
Epoch 2/10, Loss: 358.512120
Epoch 3/10, Loss: 305.928372
Epoch 4/10, Loss: 267.261707
Epoch 5/10, Loss: 237.807198
Epoch 6/10, Loss: 214.617036
Epoch 7/10, Loss: 195.469256
Epoch 8/10, Loss: 179.644788
Epoch 9/10, Loss: 166.266373
Epoch 10/10, Loss: 154.868164
Anomaly threshold: 16.041825


In [76]:
# Test phase: flag anomalies based on feature distances after fine-tuning
all_preds = []
all_labels = []
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        feats = vgg16(images)
        dists = torch.norm(feats.cpu() - center.cpu(), dim=1)
        preds = (dists > threshold).int()  # Distances above the threshold are flagged as anomalies (1)
        all_preds.extend(preds.numpy())
        all_labels.extend(labels.numpy())
all_preds = np.array(all_preds)
all_labels = np.array(all_labels)
accuracy = (all_preds == all_labels).mean()
print(f"Test Accuracy: {accuracy:.4f}")

Test Accuracy: 0.9639
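
With only 83 test images, each reported accuracy corresponds to a small whole number of errors. The sketch below converts the four accuracies printed in this notebook back into counts; the attempt descriptions in the dictionary keys are shorthand added here, and the test-set size comes from the test folder stats (20 + 22 + 21 + 20).

```python
# Test-set size from the "Test folder stats" output: 20 + 22 + 21 + 20.
n_test = 83

# Accuracies copied from the outputs of the four attempts.
reported = {
    "Attempt 1 (ResNet18, no fine-tuning)": 0.9157,
    "Attempt 2 (ResNet18, fine-tuned)": 0.9759,
    "Attempt 3 (VGG16, no fine-tuning)": 0.9277,
    "Attempt 4 (VGG16, partially frozen, fine-tuned)": 0.9639,
}

errors = {}
for name, acc in reported.items():
    correct = round(acc * n_test)       # nearest whole number of correct predictions
    errors[name] = n_test - correct
    print(f"{name}: {correct}/{n_test} correct, {errors[name]} errors")
```

The gap between the best and worst attempt is five images, so differences of this size should be interpreted cautiously.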
