# 🔎 Recognizability and 🎆 Diversity

The concepts of recognizability and diversity of a data set, introduced by Boutin et al. (2022), are used to evaluate how well a generative model creates useful data. Recognizability measures how easily the data can be classified; for the drone data, it measures how easily the drone objects can be identified within the images. Diversity is best thought of as the variance of the data set in feature space.

`Boutin, V., Singhal, L., Thomas, X., & Serre, T. (2022). Diversity vs. Recognizability: Human-like generalization in one-shot generative models. Advances in Neural Information Processing Systems, 35, 20933-20946.`
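
To make the two definitions above concrete, here is a minimal sketch of how they could be turned into numbers. It is not the procedure from Boutin et al. (2022): the function names, the use of detection confidences as a stand-in for "ease of classification", and the `threshold` parameter are all assumptions made for illustration.

```python
import numpy as np


def diversity(features) -> float:
    """Mean per-dimension variance of an (n_samples, n_features) array.

    Assumption: each row is a feature vector for one image, produced by
    some feature extractor that is not shown here.
    """
    features = np.asarray(features, dtype=float)
    return float(features.var(axis=0).mean())


def recognizability(confidences, threshold: float = 0.5) -> float:
    """Fraction of objects detected with confidence >= threshold.

    Assumption: `confidences` holds one detection confidence per labelled
    object (0.0 for an object that was missed entirely).
    """
    confidences = np.asarray(confidences, dtype=float)
    if confidences.size == 0:
        return 0.0
    return float((confidences >= threshold).mean())


# Two 2-D feature vectors: each dimension has variance 1.0
example_features = [[0.0, 0.0], [2.0, 2.0]]
# Four objects, two of which clear the 0.5 confidence threshold
example_confidences = [0.9, 0.4, 0.6, 0.1]

print(diversity(example_features))
print(recognizability(example_confidences))
```

A data set with tightly clustered features scores low on diversity, while a data set whose objects are rarely detected scores low on recognizability.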

In [None]:
from ultralytics import YOLO
import pandas as pd
import os

from src import utils

## 1. File Prep

In [None]:
# Create the staging directory if it doesn't exist
os.makedirs(os.path.join("data", "staging"), exist_ok=True)
# Create the test folder in the staging directory if it doesn't exist
os.makedirs(os.path.join("data", "staging", "test"), exist_ok=True)
# Create train and val folders (they won't be used but need to exist for the YOLO model functions)
os.makedirs(os.path.join("data", "staging", "train"), exist_ok=True)
os.makedirs(os.path.join("data", "staging", "val"), exist_ok=True)

In [None]:
# Create the YOLO yaml file
yaml_path = os.path.join("data", "staging", "evaluation.yaml")

# YAML content
yaml_content = """
path: data/staging  # dataset root dir (leave empty for HUB)
train: train  # train images (relative to 'path')
val:   val    # val images (relative to 'path')
test:  test   # test images (relative to 'path')

names:
  0: drone
"""

# Write the YAML content to the file
with open(yaml_path, 'w') as yaml_file:
    yaml_file.write(yaml_content)

| **Model** | **Baseline Data (%)** | **3D Model Data (%)** | **Clipart Data (%)** | **Gen AI Data (%)** |
| --- | --- | --- | --- | --- |
| 0 | 100 | 0   | 0   | 0   |
| A | 0   | 0   | 100 | 0   |
| B | 0   | 100 | 0   | 0   |
| C | 0   | 0   | 50  | 50  |
| D | 0   | 50  | 0   | 50  |
| E | 0   | 50  | 50  | 0   |
| F | 0   | 0   | 0   | 100 |
| G | 0   | 33  | 33  | 33  |

In [None]:
# Model suffixes
model_0 = "baseline"

model_a = "0-0-100-0"
model_b = "0-100-0-0"
model_c = "0-0-50-50"
model_d = "0-50-0-50"
model_e = "0-50-50-0"
model_f = "0-0-0-100"
model_g = "0-33-33-33"

# Combine all model suffixes into a list
model_suffixes = [model_0, model_a, model_b, model_c, model_d, model_e, model_f, model_g]
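
The suffixes appear to encode the data mix from the table in the order baseline, 3D model, clipart, Gen AI. As a sketch under that assumption, a small helper can decode a suffix back into percentages; `SOURCES` and `parse_mix` are hypothetical names and are not part of `src.utils`.

```python
# Assumed column order, taken from the table of models
SOURCES = ("baseline", "3d_model", "clipart", "gen_ai")


def parse_mix(suffix: str) -> dict:
    """Decode a model suffix such as "0-50-0-50" into a percentage per source."""
    # The baseline model uses a plain name instead of a numeric suffix
    if suffix == "baseline":
        return dict(zip(SOURCES, (100, 0, 0, 0)))
    parts = suffix.split("-")
    if len(parts) != len(SOURCES):
        raise ValueError(f"Expected {len(SOURCES)} values in suffix, got: {suffix!r}")
    return dict(zip(SOURCES, (int(part) for part in parts)))


print(parse_mix("0-50-0-50"))
```

Decoding the suffix this way avoids maintaining a second, hand-written copy of the table in code.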

In [None]:
# Create directories if they don't exist
for suffix in model_suffixes:
    dir_path = os.path.join("model", f"model-{suffix}")
    os.makedirs(dir_path, exist_ok=True)

In [None]:
# Create a dictionary of important file paths for each model
model_paths = {
    suffix: {
        "weights": os.path.join("model", f"model-{suffix}", "weights", "best.pt"),
        "train_df": os.path.join("model", f"model-{suffix}", f"train_data_{suffix}.csv"),
        "val_df": os.path.join("model", f"model-{suffix}", f"val_data_{suffix}.csv"),
        "results_df": os.path.join("model", f"model-{suffix}", "results", f"results_{suffix}.csv"),
        }
    for suffix in model_suffixes
}
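
Because the later cells read these files directly, it can help to check up front which of them exist. The following is a minimal sketch; `find_missing_files` is a hypothetical helper, and the small `example_paths` dictionary only mimics the shape of `model_paths`.

```python
import os


def find_missing_files(paths_by_model: dict) -> list:
    """Return (model, key, path) tuples for every path that does not exist."""
    missing = []
    for model_name, paths in paths_by_model.items():
        for key, path in paths.items():
            if not os.path.exists(path):
                missing.append((model_name, key, path))
    return missing


# Stand-in for model_paths, using a directory name that should not exist
example_paths = {
    "demo": {
        "weights": os.path.join("no-such-dir-for-demo", "weights", "best.pt"),
        "train_df": os.path.join("no-such-dir-for-demo", "train_data_demo.csv"),
    }
}

for model_name, key, path in find_missing_files(example_paths):
    print(f"{model_name}: missing {key} at {path}")
```

Running a check like this before the data collection loop surfaces a missing CSV or weights file immediately rather than partway through the loop.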

In [None]:
# Weights file for the baseline model for comparison purposes
baseline_model = model_paths[model_0]["weights"]

## 2. Data Collection

In [None]:
# Loop through the different models
for suffix in model_suffixes:
    # Copy this model's training and validation data to the staging "test" folder
    train_df = pd.read_csv(model_paths[suffix]["train_df"])
    val_df = pd.read_csv(model_paths[suffix]["val_df"])
    utils.files.copy_to_staging(train_df, stage="test")
    utils.files.copy_to_staging(val_df, stage="test")

    # Load the baseline model
    yolo_model = YOLO(baseline_model)

    # TODO: evaluate the staged data with the baseline model

    # Clean up the staging directory
    utils.files.cleanup_staging()