
# FiftyOne Workshop: Custom Embeddings for Anomaly Detection (March 12th 2025)

In this notebook, we will explore how to generate **custom embeddings** for **anomaly detection** using the **PaDiM model** from Anomalib.
Unlike general-purpose embeddings from models like CLIP or ResNet, anomaly detection requires **task-specific embeddings** that can distinguish between normal and abnormal samples.

![Image](https://github.com/user-attachments/assets/f62e79c2-e031-4320-8a2d-98dd03161d98)

## 🏆 Learning Objectives:
- Understand the difference between standard embeddings and anomaly-specific embeddings.
- Explore how to compute embeddings using **Padim from Anomalib**.
- Integrate these embeddings into a FiftyOne dataset.
- Leverage FiftyOne for visualization and analysis.

---



## 📌 Why Use Custom Embeddings for Anomaly Detection?

Pre-trained models like **CLIP or ResNet** generate **general-purpose embeddings** that capture overall visual similarity. Detecting **abnormalities**, however, depends on **subtle deviations** from normal patterns, which these models are not trained to isolate.

Instead, we use a dedicated anomaly detection model like **PaDiM from Anomalib**, which:
- Extracts feature maps from a pre-trained encoder (e.g., ResNet).
- Models the distribution of those features using **normal samples only**.
- Scores new samples by how far their features fall from the normal distribution.
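
The comparison against a normal feature distribution can be sketched with a Mahalanobis distance. This is a minimal illustration, not Anomalib's implementation: it fits a single Gaussian to synthetic 4-dimensional features, whereas PaDiM fits one Gaussian per patch position on real encoder features. The data, the regularization value, and the `mahalanobis` helper are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" feature vectors: 500 samples, 4 dimensions
normal_features = rng.normal(loc=0.0, scale=1.0, size=(500, 4))

# Fit a Gaussian to the normal features (mean + regularized covariance)
mean = normal_features.mean(axis=0)
cov = np.cov(normal_features, rowvar=False) + 0.01 * np.eye(4)
cov_inv = np.linalg.inv(cov)


def mahalanobis(x, mean, cov_inv):
    """Distance of feature vector `x` from the fitted normal distribution."""
    delta = x - mean
    return float(np.sqrt(delta @ cov_inv @ delta))


typical = np.zeros(4)        # close to the centre of the normal data
outlier = np.full(4, 6.0)    # far from anything seen during fitting

score_typical = mahalanobis(typical, mean, cov_inv)
score_outlier = mahalanobis(outlier, mean, cov_inv)
print(f"typical: {score_typical:.2f}, outlier: {score_outlier:.2f}")
```

A larger distance means the sample is less consistent with the normal data, which is the signal an anomaly score is built on.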

### 🔗 Further Reading:
- [Anomalib Documentation](https://github.com/openvinotoolkit/anomalib)
- [PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization](https://arxiv.org/pdf/2011.08785)


## Load the MVTec Dataset as usual

In [1]:
import fiftyone as fo
import fiftyone.utils.huggingface as fouh # Hugging Face integration

# Load the MVTec AD dataset from the Hugging Face Hub
dataset_ = fouh.load_from_hub("Voxel51/mvtec-ad", persistent=True, overwrite=True)
# dataset_ = fo.load_dataset("Voxel51/mvtec-ad")  # Use this line instead if the dataset is already
#                                                 # on disk from a previous run of this notebook

# Define the new dataset name
dataset_name = "mvtec-ad_4"

# Check if the dataset exists
if dataset_name in fo.list_datasets():
    print(f"Dataset '{dataset_name}' exists. Loading...")
    dataset = fo.load_dataset(dataset_name)
else:
    print(f"Dataset '{dataset_name}' does not exist. Creating a new one...")
    # Clone the dataset with a new name and make it persistent
    dataset = dataset_.clone(dataset_name, persistent=True)

Downloading config file fiftyone.yml from Voxel51/mvtec-ad
Loading dataset
Importing samples...
 100% |███████████████| 5354/5354 [68.5ms elapsed, 0s remaining, 78.2K samples/s]   
Migrating dataset 'Voxel51/mvtec-ad' to v1.3.0
Dataset 'mvtec-ad_4' does not exist. Creating a new one...



## 🚀 Extracting Custom Embeddings from Padim (Anomalib)

Instead of using a general embedding model, we will:
1. **Load a Padim anomaly detection model** using Anomalib.
2. **Run inference on a dataset** to extract anomaly embeddings.
3. **Store the embeddings in FiftyOne** for further visualization.

🔗 **Relevant Documentation:**  
- [Anomalib Models](https://anomalib.readthedocs.io/en/latest/markdown/guides/reference/models/image/index.html)  
- [Remotely-sourced Zoo Models](https://docs.voxel51.com/model_zoo/remote.html)


In [2]:
import torch
from anomalib.models.image.padim.torch_model import PadimModel

# Create a PaDiM model
model = PadimModel(
    backbone="resnet18",           # or "wide_resnet50_2", etc.
    layers=["layer1", "layer2"],   # choose the layers you want
    pre_trained=True,
    n_features=100                 # optional dimension reduction
)
model.eval()  # set to eval mode

INFO:timm.models._builder:Loading pretrained weights from Hugging Face hub (timm/resnet18.a1_in1k)
INFO:timm.models._hub:[timm/resnet18.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
INFO:timm.models._builder:Missing keys (fc.weight, fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.


PadimModel(
  (feature_extractor): TimmFeatureExtractor(
    (feature_extractor): FeatureListNet(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): BasicBlock(
          (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (drop_block): Identity()
          (act1): ReLU(inplace=True)
          (aa): Identity()
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (act2): ReLU(inplace=True)
        )
     

In [4]:
from anomalib.models.image.padim.lightning_model import Padim
import torch
from PIL import Image
import torchvision.transforms as T

# 1) Create the Lightning-based PaDiM
padim = Padim(
    backbone="resnet18",
    layers=["layer1", "layer2"],
    pre_trained=True
)
padim.train()  # so forward(...) returns embeddings

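# 2) Load an image (the path below is machine-specific; point it at an image on
#    your machine, e.g. `dataset.first().filepath`)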
transform = T.Compose([T.Resize(224), T.ToTensor()])
pil_image = Image.open("/Users/paularamos/fiftyone/huggingface/hub/Voxel51/mvtec-ad/data/data_50/018-57.png").convert("RGB")
tensor = transform(pil_image).unsqueeze(0)  # (1, C, H, W)

# 3) Pass it through the model in train mode
with torch.no_grad():
    embeddings = padim.model(tensor)  # shape (1, embed_dim, H', W')
print(embeddings.shape)

INFO:anomalib.models.components.base.anomalib_module:Initializing Padim model.
INFO:timm.models._builder:Loading pretrained weights from Hugging Face hub (timm/resnet18.a1_in1k)
INFO:timm.models._hub:[timm/resnet18.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
INFO:timm.models._builder:Missing keys (fc.weight, fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.


torch.Size([1, 100, 56, 56])
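
The printed shape can be checked by hand. In the standard ResNet-18 layout, `layer1` outputs 64 channels at 1/4 of the input resolution and `layer2` outputs 128 channels at 1/8. PaDiM upsamples the deeper map to the size of the shallower one, concatenates them (192 channels), and keeps a subset of channels. The sketch below reproduces that arithmetic, assuming 100 channels are kept, as the output above suggests. `RESNET18_STAGES` and `expected_embedding_shape` are illustrative names, not part of Anomalib.

```python
# Channel count and downsampling stride of each ResNet-18 stage
RESNET18_STAGES = {
    "layer1": {"channels": 64, "stride": 4},
    "layer2": {"channels": 128, "stride": 8},
    "layer3": {"channels": 256, "stride": 16},
    "layer4": {"channels": 512, "stride": 32},
}


def expected_embedding_shape(image_size, layers, n_features, batch_size=1):
    """Shape of the concatenated, channel-reduced feature map for a square input."""
    total_channels = sum(RESNET18_STAGES[name]["channels"] for name in layers)
    # All maps are upsampled to the resolution of the shallowest selected layer
    min_stride = min(RESNET18_STAGES[name]["stride"] for name in layers)
    spatial = image_size // min_stride
    depth = min(n_features, total_channels)
    return (batch_size, depth, spatial, spatial)


shape = expected_embedding_shape(224, ["layer1", "layer2"], n_features=100)
print(shape)  # (1, 100, 56, 56)
```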



## 🔍 Integrating Anomaly Embeddings into FiftyOne

Once we obtain embeddings from PaDiM, we add them to our FiftyOne dataset.
This allows us to:
- Perform **similarity searches** over the anomaly embeddings.
- Compare the distributions of normal and abnormal samples.
- Use the **FiftyOne App** to inspect anomalies.

```python
import fiftyone as fo

dataset = fo.Dataset("object_from_mvtec_ad")

# Add embeddings to each sample
for sample in dataset:
    ...
    # Convert to CPU NumPy for storage
    embedding_1d = patch_embedding.squeeze(0).cpu().numpy()  # shape (D,)

    # Store as a list in a new field
    sample["embedding"] = embedding_1d.tolist()
    sample.save()
    ...
```
🔗 **Relevant Documentation:** [Adding Custom Fields to FiftyOne Datasets](https://docs.voxel51.com/user_guide/using_datasets.html)
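
Once one vector is stored per sample, the similarity search mentioned above comes down to a nearest-neighbour lookup in embedding space. The following sketch shows the idea with cosine similarity on a tiny hand-made matrix. It is only an illustration of the concept, not FiftyOne's similarity index, and `top_k_similar` is a hypothetical helper.

```python
import numpy as np


def top_k_similar(query, embeddings, k=3):
    """Indices of the k rows of `embeddings` most cosine-similar to `query`."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarities = e @ q
    # Sort by descending similarity and keep the first k indices
    return np.argsort(-similarities)[:k]


# Toy embeddings: rows 0 and 1 point in almost the same direction
embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

nearest = top_k_similar(embeddings[0], embeddings, k=2)
print(nearest)  # [0 1]
```

The query itself is returned first because it is identical to row 0; in practice you would usually drop it from the results.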


## Selecting an Object Category from the MVTec AD Dataset

In [7]:
from fiftyone import ViewField as F # helper for defining views

# Select every sample whose category is "bottle" (all splits, not only test)
bottle_view = dataset.match(F("category.label") == "bottle")

# Clone the view into a new persistent dataset named "mvtec-bottle"
mvtec_bottle = bottle_view.clone("mvtec-bottle", persistent=True)

print(mvtec_bottle)


Name:        mvtec-bottle
Media type:  image
Num samples: 292
Persistent:  True
Tags:        []
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField
    category:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    defect:           fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    split:            fiftyone.core.fields.StringField
    defect_mask:      fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Segmentation)


In [8]:
print(dataset)
print(mvtec_bottle)

Name:        mvtec-ad_4
Media type:  image
Num samples: 5354
Persistent:  True
Tags:        []
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField
    category:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    defect:           fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    split:            fiftyone.core.fields.StringField
    defect_mask:      fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Segmentation)
Name:        mvtec-bottle
Media type:  image
Num samples: 292
Persistent:  True
Tags:        []
Sample fields:
    

## Computing Embeddings via Inference with the PaDiM Model

In [9]:
import numpy as np
from PIL import Image

for sample in mvtec_bottle:
    # Load the image via PIL
    pil_image = Image.open(sample.filepath).convert("RGB")

    # Apply your transform
    input_tensor = transform(pil_image).unsqueeze(0)  # shape (1, C, H, W)

    # Compute patch embeddings in train mode
    with torch.no_grad():
        patch_embedding = padim.model(input_tensor)  # shape (1, D, H', W')

    # Optional: flatten or pool across spatial dims
    # Here we use mean pooling to get a (1, D) vector
    patch_embedding = patch_embedding.mean(dim=[2, 3])  # shape (1, D)

    # Convert to CPU NumPy for storage
    embedding_1d = patch_embedding.squeeze(0).cpu().numpy()  # shape (D,)

    # Store as a list in a new field
    sample["embedding"] = embedding_1d.tolist()
    sample.save()
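
Mean pooling, as used in the loop above, gives one compact vector per image but discards where in the image a feature fired. The NumPy sketch below uses a made-up feature map with a single strongly activated patch to show the trade-off against max pooling. The array values and shapes are assumptions chosen for illustration and do not come from the model.

```python
import numpy as np

# Hypothetical patch embedding of shape (1, D, H', W') with one "defective" patch
embedding = np.zeros((1, 4, 56, 56), dtype=np.float32)
embedding[0, :, 10, 20] = 50.0

# Mean pooling: average over the spatial dimensions -> shape (D,)
mean_pooled = embedding.mean(axis=(2, 3)).squeeze(0)

# Max pooling: strongest response per channel -> shape (D,)
max_pooled = embedding.max(axis=(2, 3)).squeeze(0)

print(mean_pooled.shape, max_pooled.shape)
print(mean_pooled[0], max_pooled[0])
```

A single anomalous patch barely moves the mean (50 spread over 56 × 56 positions) yet dominates the max, so small defects may be easier to separate with max pooling or with the unpooled patch embeddings. Which option works best on real data is an empirical question.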


## Visualizing Embeddings in FiftyOne

In [10]:
from fiftyone.brain import compute_visualization

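# Project embeddings to 2D with PCA.
# NOTE: the PaDiM embeddings computed above are stored in the "embedding" field.
# "padin_emb" names a field that does not exist yet, so the Brain computes
# embeddings with its default model and stores them there (see the
# "Computing embeddings..." log below). Pass embeddings="embedding" to
# visualize the PaDiM embeddings instead.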
compute_visualization(
    mvtec_bottle,
    embeddings="padin_emb",
    brain_key="embedding_pca",
    method="pca",
)

Computing embeddings...
 100% |█████████████████| 292/292 [46.6s elapsed, 0s remaining, 6.2 samples/s]
Generating visualization...

<fiftyone.brain.visualization.VisualizationResults at 0x34e26fd90>
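
Conceptually, `method="pca"` projects each high-dimensional embedding onto the two directions of greatest variance. The sketch below shows that projection with a plain SVD on random stand-in data of the same size (292 samples, 100 dimensions). It is meant to explain the idea and makes no claim about how FiftyOne Brain implements it. `pca_project` is a hypothetical helper.

```python
import numpy as np


def pca_project(embeddings, num_dims=2):
    """Project row vectors onto their top `num_dims` principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of `vt` are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:num_dims].T


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(292, 100))  # stand-in for per-image embeddings

points = pca_project(embeddings)
print(points.shape)  # (292, 2)
```

Each row of `points` is the 2D position of one sample, which is the kind of scatter plot shown in the Embeddings panel of the App.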

In [11]:
mvtec_bottle.reload()
print(mvtec_bottle)
print(mvtec_bottle.last())

Name:        mvtec-bottle
Media type:  image
Num samples: 292
Persistent:  True
Tags:        []
Sample fields:
    id:               fiftyone.core.fields.ObjectIdField
    filepath:         fiftyone.core.fields.StringField
    tags:             fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:       fiftyone.core.fields.DateTimeField
    last_modified_at: fiftyone.core.fields.DateTimeField
    category:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    defect:           fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    split:            fiftyone.core.fields.StringField
    defect_mask:      fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Segmentation)
    embedding:        fiftyone.core.fields.ListField(fiftyone.core.fields.FloatField)
    padin_emb:        fiftyo

In [12]:
session = fo.launch_app(mvtec_bottle, port=5154, auto=False)

Session launched. Run `session.show()` to open the App in a cell output.

![Image](https://github.com/user-attachments/assets/09338015-ed1d-49ae-820a-9065b8965bae)

### Next Steps:
Try using different anomaly detection models from Anomalib and compare their embeddings with FiftyOne's visualization tools! 🚀

🔗 **More Resources:**  
- [FiftyOne Docs](https://voxel51.com/docs/fiftyone/)  
- [Anomalib GitHub Repository](https://github.com/openvinotoolkit/anomalib)
