# Overview

This assignment builds a complete evaluation pipeline for a pretrained text-to-image diffusion model. First, images of flowers are generated using classifier-free guided DDPM sampling conditioned on CLIP text embeddings, while intermediate U-Net bottleneck features are extracted via forward hooks.

The generated samples are then evaluated quantitatively using CLIP Score (semantic alignment) and Fréchet Inception Distance (distribution realism).

To analyze internal representations, the extracted U-Net embeddings are organized in a FiftyOne dataset, enabling computation of uniqueness and representativeness scores and interactive visualization.  

**Note: Generation Cardinalities in the Sampling Pipeline**

Each of the three text prompts is evaluated under multiple guidance strengths w, producing several variations per prompt. With seven guidance values, this results in 21 final images. Internally, classifier-free guidance doubles the batch size during sampling to combine conditioned and unconditioned predictions, but this duplication is only used to compute the guidance signal and does not increase the number of output images. All downstream analyses therefore operate on exactly 21 images and 21 corresponding embeddings.
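The bookkeeping can be illustrated with a small NumPy sketch. It is not the notebook's sampler: the guidance formula `(1 + w) * eps_cond - w * eps_uncond` is assumed here as a common classifier-free guidance formulation, the random array stands in for the U-Net's noise prediction, and the feature size of 4 is arbitrary. The guidance values are the ones listed in the results table further down, and the ordering (prompts cycle fastest) matches the `i % P` / `i // P` indexing used when the images are saved.

```python
import numpy as np

prompts = [
    "A photo of a red rose",
    "A photo of a white daisy",
    "A photo of a yellow sunflower",
]
w_tests = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]  # values from the results table

P, W = len(prompts), len(w_tests)
n_samples = P * W  # one output image per (prompt, guidance) pair

# One guidance weight per output row; prompts cycle fastest within each weight
w = np.repeat(w_tests, P).reshape(-1, 1)  # shape (21, 1)

# Toy stand-in for the noise predicted on the doubled batch (2 * 21 rows)
rng = np.random.default_rng(0)
eps_doubled = rng.normal(size=(2 * n_samples, 4))

# First half: conditioned on the prompt; second half: condition dropped
eps_cond = eps_doubled[:n_samples]
eps_uncond = eps_doubled[n_samples:]

# Assumed guidance combination: the doubled batch collapses back to 21 rows
eps_guided = (1 + w) * eps_cond - w * eps_uncond

print(eps_doubled.shape, "->", eps_guided.shape)
```

Under this assumed formula, w = 0 reproduces the conditioned prediction, w = -1 reproduces the unconditioned one, and w = -2 pushes the sample away from the prompt.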


# Setup

The project repository is mounted from Google Drive and added to the Python path to allow clean imports from the src module. The dataset is copied to the local Colab filesystem to improve I/O performance during evaluation. All global settings (random seed, device selection, paths, batch sizes) are defined once and reused across the notebook to ensure consistency and reproducibility.

In [None]:
import sys
from pathlib import Path

from google.colab import drive
drive.mount('/content/drive')

%cd "/content/drive/MyDrive/Applied-Computer-Vision-Projects/Diffusion_Model_03"

PROJECT_ROOT = Path.cwd()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))



In [None]:
%%capture
# Install dependencies (%%capture must be the first line of the cell)
%pip install --no-cache-dir -r requirements.txt

In [None]:
import os
from google.colab import userdata

import numpy as np
from scipy.linalg import sqrtm
from PIL import Image
from collections import defaultdict
from tqdm import tqdm

import torch
import torchvision.transforms.v2 as transforms
from torchvision.transforms import ToPILImage
from torch.utils.data import DataLoader
from torchvision.models import inception_v3, Inception_V3_Weights

import clip
import open_clip

import wandb
import fiftyone as fo
import fiftyone.brain as fob
from huggingface_hub import HfApi

In [None]:
from utils import UNet_utils, ddpm_utils, other_utils, config

In [None]:
!rm -rf /content/data
!cp -r "$config.DRIVE_ROOT/data"* /content

In [None]:
other_utils.set_seeds(config.SEED)

# Part 1: Image Generation and Embedding Extraction

In this section, the pretrained U-Net model from notebook 05_CLIP.ipynb of the corresponding NVIDIA course is loaded, images of flowers are generated, and embeddings are extracted from the model's bottleneck.


## Recreate CLIP + DDPM + sampling

In this section, the CLIP text encoder and the DDPM sampling process are reinitialized to match the training setup. The noise schedule and diffusion parameters are defined to ensure compatibility with the pretrained U-Net. This reconstruction is necessary to generate images that are consistent with the original training regime.
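As a rough illustration of what the noise schedule implies, the sketch below derives the usual DDPM quantities from a linear beta schedule with the same endpoints as `B_start` and `B_end`. The value `T = 400` is an assumption (the notebook reads it from `config.TIMESTEPS`), and whether `ddpm_utils.DDPM` stores exactly these arrays is also assumed rather than taken from the source.

```python
import numpy as np

T = 400  # assumed; the notebook uses config.TIMESTEPS
beta = np.linspace(1e-4, 0.02, T)  # linear schedule, same endpoints as B_start / B_end

alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)  # cumulative share of signal variance kept up to step t

# Coefficients of the closed-form forward process:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
sqrt_a_bar = np.sqrt(alpha_bar)
sqrt_one_minus_a_bar = np.sqrt(1.0 - alpha_bar)

print(f"signal coefficient at t=0:   {sqrt_a_bar[0]:.4f}")
print(f"signal coefficient at t=T-1: {sqrt_a_bar[-1]:.4f}")
```

If the schedule used at sampling time differed from the one used during training, these coefficients would no longer match what the U-Net learned to invert, which is why the endpoints are reproduced exactly.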

In [None]:
# Load CLIP for encoding the text prompts
clip_model, clip_preprocess = clip.load("ViT-B/32", device=config.DEVICE)
clip_model.eval()

In [None]:
# Re-initialize DDPM wrapper
B_start = 0.0001
B_end = 0.02
B = torch.linspace(B_start, B_end, config.TIMESTEPS).to(config.DEVICE)

ddpm = ddpm_utils.DDPM(B, config.DEVICE)

## Load the pre-trained U-Net

Here, the U-Net architecture is instantiated and pretrained weights are loaded from disk. The model is switched to evaluation mode to disable training-specific behavior. This step restores the trained generative model used for all subsequent image synthesis.

In [None]:
# Define the uNet Architecture
uNet_model = UNet_utils.UNet(
    T=config.TIMESTEPS,
    img_ch=config.IMG_CH,
    img_size=config.IMG_SIZE,
    down_chs=(256, 256, 512),
    t_embed_dim=8,
    c_embed_dim=config.CLIP_FEATURES
).to(config.DEVICE)

print("Num params: ", sum(p.numel() for p in uNet_model.parameters()))

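# Load the model weights
try:
    uNet_model.load_state_dict(
        torch.load(config.UNET_MODEL_PATH, map_location=config.DEVICE)
    )
    print("Model weights loaded successfully.")
except FileNotFoundError:
    # Stop here: continuing would evaluate a randomly initialized model
    print("Error: Model weights not found.")
    raise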

uNet_model.eval()


In [2]:
# Define prompts
text_prompts = [
    "A photo of a red rose",
    "A photo of a white daisy",
    "A photo of a yellow sunflower"
]

In [None]:
# Sanity check: calculate how many images will be generated
P = len(text_prompts)           # number of prompts
W = len(config.W_TESTS)         # number of guidance strengths (classifier-free guidance)
n_samples = P * W               # total images generated: one per (prompt, guidance) pair
                                # 7 guidance weights => 7 images per prompt

print("Expected n_samples:", n_samples)

In [None]:
# Store intermediate feature maps extracted via forward hooks
embeddings_storage = {}

def get_embedding_hook(name):
    """
    Creates a forward hook that stores the output of a given layer.

    The output is detached from the computation graph to avoid
    gradient tracking and reduce memory usage.
    """
    def hook(model, input, output):
        # We use .detach() to disconnect from the gradient graph (saves memory)
        embeddings_storage[name] = output.detach()
    return hook

# Register a forward hook on the U-Net bottleneck layer
uNet_model.down2.register_forward_hook(get_embedding_hook('down2'))
print("Hook registered on model.down2")

In [None]:
def sample_flowers_with_hook(text_list, model, ddpm, input_size, T, device, w_tests):
    """
    Generates images from text prompts using classifier-free guided diffusion
    while capturing intermediate U-Net embeddings via a forward hook.

    Args:
        text_list (list[str]): Text prompts used for conditioning.
        model (nn.Module): Pretrained U-Net diffusion model.
        ddpm: Diffusion process wrapper.
        input_size (tuple): Spatial size of generated images.
        T (int): Number of diffusion timesteps.
        device (torch.device): Computation device.
        w_tests (list[float]): Guidance strengths evaluated for each prompt.

    Returns:
        torch.Tensor: Final generated images.
        torch.Tensor: Stored intermediate diffusion states (for visualization).
    """
    # Encode text prompts using CLIP (no gradients are needed at inference time)
    text_tokens = clip.tokenize(text_list).to(device)
    with torch.no_grad():
        c = clip_model.encode_text(text_tokens).float()

    # Run diffusion sampling with classifier-free guidance
    x_gen, x_gen_store = ddpm_utils.sample_w(model, ddpm, input_size, T, c, device, w_tests)

    return x_gen, x_gen_store

# Run the generation
other_utils.set_seeds(config.SEED)

print("Generating images...")
generated_images, _ = sample_flowers_with_hook(
    text_list=text_prompts,
    model=uNet_model,
    ddpm=ddpm,
    input_size=config.INPUT_SIZE,
    T=config.TIMESTEPS,
    device=config.DEVICE,
    w_tests=config.W_TESTS
)

# Retrieve the embedding captured by the bottleneck hook
extracted_embedding_all = embeddings_storage['down2']

# Forward hooks capture both conditioned and unconditioned batches (CFG);
# retain only the conditioned samples corresponding to the final images
assert extracted_embedding_all.shape[0] >= n_samples, (
    f"embedding batch {extracted_embedding_all.shape[0]} < {n_samples}"
)
extracted_embeddings = extracted_embedding_all[:n_samples]

print("Using embeddings:", extracted_embeddings.shape)
print(f"Generated {len(generated_images)} images.")
print(f"Extracted embedding shape: {extracted_embedding_all.shape}")

In [None]:
# Visualize the generated samples in a grid: rows correspond to text prompts,
# and columns show the effect of varying guidance strengths (w).
other_utils.show_generated_images_grid(
    generated_images,
    prompts=text_prompts,
    w_tests=config.W_TESTS
)

In [None]:
# Save generated images to disk for downstream evaluation
to_pil = ToPILImage()

# Track saved image paths together with their prompts and guidance values
saved_samples = []

print("Saving images to disk...")
assert len(generated_images) == n_samples, (
    f"generated_images={len(generated_images)} != {n_samples}"
)

for i, img_tensor in enumerate(generated_images):
    # Recover prompt and guidance value from the sampling order
    prompt = text_prompts[i % P]
    w_val = config.W_TESTS[i // P]

    # Map model output from [-1, 1] to [0, 1] for image saving and clip any artifacts that fell outside the valid range
    img_norm = ((img_tensor + 1) / 2).clamp(0, 1).detach().cpu()
    pil_img = to_pil(img_norm)

    filename = os.path.join(
        config.SAVE_DIR, f"flower_w{w_val:+.1f}_p{i % P}_{i}.png"
    )
    pil_img.save(filename)

    saved_samples.append((filename, prompt, float(w_val)))

print("All images saved.")

# Part 2: Evaluation with CLIP Score and FID
In this section, the quality of the generated images is evaluated using CLIP Score and Fréchet Inception Distance (FID), following the definitions provided in the assignment task.

## CLIP Score

The CLIP score is computed as the cosine similarity between image and text embeddings produced by a pretrained CLIP model. It answers the question: "How accurately does the generated image depict the content described in the text prompt?"

A higher score indicates stronger semantic alignment. Scores are computed for all generated images, allowing comparison across different guidance strengths and prompts.
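The core of the metric can be shown without a CLIP model. The sketch below uses made-up 3-dimensional vectors in place of real CLIP embeddings (which are much larger) to show that the score depends only on the angle between the two vectors, not on their lengths; the helper name `cosine_similarity` is chosen for this illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product of the two L2-normalized vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up stand-ins for an image embedding and two text embeddings
image_vec = [0.2, 0.9, 0.1]
text_aligned = [0.4, 1.8, 0.2]     # same direction as image_vec, twice the length
text_unrelated = [0.9, -0.2, 0.0]  # orthogonal to image_vec

print("aligned:  ", round(cosine_similarity(image_vec, text_aligned), 4))
print("unrelated:", round(cosine_similarity(image_vec, text_unrelated), 4))
```

Because the score is bounded by 1 and real image and text embeddings are never perfectly aligned, scores should be compared relative to each other (for example across guidance strengths) rather than read as absolute quality grades.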

In [None]:
# Initialize OpenCLIP model for CLIP-score evaluation
clip_scorer, _, clip_preprocess_val = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
clip_scorer.to(config.DEVICE).eval()
tokenizer = open_clip.get_tokenizer("ViT-B-32")

In [None]:
def calculate_clip_score(image_path, text_prompt, device=None):
    """
    Computes a CLIP similarity score between an image and a text prompt.

    The image and text are embedded using a pretrained OpenCLIP model, L2-normalized,
    and compared via cosine similarity (dot product of normalized embeddings).

    Args:
        image_path (str | Path): Path to the image file on disk.
        text_prompt (str): Text prompt to compare against.
        device (torch.device): Device on which the CLIP scorer runs.

    Returns:
        float: Cosine similarity score (higher means stronger semantic alignment).
    """
    # Preprocess and move to the same device as the CLIP model
    image = clip_preprocess_val(Image.open(image_path)).unsqueeze(0).to(device)
    text = tokenizer([text_prompt]).to(device)

    with torch.no_grad():
        image_features = clip_scorer.encode_image(image)
        text_features = clip_scorer.encode_text(text)

        # Normalize to turn dot product into cosine similarity
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)

        score = (image_features @ text_features.T).item()

    return score


# Compute CLIP scores for all generated samples
clip_scores = []

print("Calculating scores...")

for i, (filepath, prompt, w_val) in enumerate(saved_samples):
    score = calculate_clip_score(
        image_path=filepath,
        text_prompt=prompt,
        device=config.DEVICE,
    )
    clip_scores.append(score)

avg_clip_score = float(np.mean(clip_scores))
print(f"Average CLIP Score: {avg_clip_score:.4f}")

In [None]:
# Aggregate CLIP scores by guidance strength w and compute mean score per w
by_w = defaultdict(list)
for (filepath, prompt, w_val), score in zip(saved_samples, clip_scores):
    by_w[w_val].append(score)

print("Show aggregated CLIP scores by guidance strength w")
print(f"{'w':>6} | {'Mean CLIP':>10} | {'n':>3}")
print("-" * 26)

for w in sorted(by_w):
    mean_score = np.mean(by_w[w])
    n = len(by_w[w])
    print(f"{w:>6.1f} | {mean_score:>10.4f} | {n:>3}")

In [None]:
# example image of a real red rose
real_rose_path = (
    Path(config.TMP_ROOT)
    / "data/cropped_flowers/roses/15032112248_30c5284e54_n.jpg"
)

other_utils.compare_generated_vs_real_roses(
    generated_images,
    prompt_idx=0,  # "A photo of a red rose"
    prompts=text_prompts,
    w_tests=config.W_TESTS,
    real_rose_path=real_rose_path,
)

## FID Score

To measure how realistic our generated images are, we calculate the Fréchet Inception Distance (FID) score. We use a powerful pre-trained InceptionV3 model as a feature "judge." Both our generated images and the real flower images are prepared identically (resized to 299x299 and normalized) to ensure a fair comparison.

FID is computed by comparing feature statistics extracted from the model for real and generated images. It statistically compares these two collections; a lower score indicates that the generated images are more similar to the real data.
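Written out, the comparison is the Fréchet distance between two Gaussians fitted to the Inception features. The symbols are chosen for this note: $\mu_r, \Sigma_r$ denote the mean and covariance of the real-image features and $\mu_g, \Sigma_g$ those of the generated-image features.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
$$

The first term measures how far the average feature vector has shifted, and the trace term measures how much the spread and correlation structure of the features differ. Both terms are zero only when the two fitted Gaussians coincide.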

In [None]:
# Load Pretrained InceptionV3;
inception = inception_v3(
    weights=Inception_V3_Weights.DEFAULT,
    transform_input=False,
).to(config.DEVICE)

# To return features (2048) - not classes as the standard Inception model does - the final
# "Fully Connected" layer needs to be replaced with a "Pass Through" (Identity)
inception.fc = torch.nn.Identity()

inception.eval()

image_net_mean = [0.485, 0.456, 0.406]
image_net_std  = [0.229, 0.224, 0.225]

# Inception expects 299x299 inputs with ImageNet normalization, so the 32x32 images are upsampled
inception_transform = transforms.Compose([
    transforms.Resize((config.INCEPTION_IMG_SIZE, config.INCEPTION_IMG_SIZE)), # Up-sample from 32x32
    transforms.ToImage(),
    transforms.ToDtype(torch.float32, scale=True),
    transforms.Normalize(mean=image_net_mean, std=image_net_std)   # from original pytorch docs
])

In [None]:
def get_inception_features_from_raw(dataset_path, batch_size, model, device=None, num_workers=0):
    """
    Extracts 2048-dimensional InceptionV3 feature embeddings for a dataset of images.

    Args:
        dataset_path (str | Path): Root directory of the original (real) images.
        batch_size (int): Number of images per forward pass.
        model (nn.Module): Pretrained InceptionV3 feature extractor (fc = Identity).
        device (torch.device): Device on which the model runs.
        num_workers (int): Number of DataLoader worker processes.

    Returns:
        np.ndarray: Array of shape (N, 2048) containing feature embeddings for all images.
    """
    raw_dataset = other_utils.MyDataset(dataset_path, inception_transform, config.CLASSES)
    raw_dataloader = DataLoader(raw_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    features = []                             # Stores feature batches
    with torch.no_grad():
        for img, _ in tqdm(raw_dataloader):
            img = img.to(device)
            f = model(img)                    # Runs Inception forward pass
            features.append(f.cpu().numpy())  # Convert to NumPy for the FID computation
    return np.concatenate(features, axis=0)   # Concatenate batches into one array

In [None]:
def get_inception_features_from_files(saved_samples, batch_size, model, transform, device=None, num_workers=0):
    """
    Loads images from disk and extracts InceptionV3 feature embeddings.

    Args:
        saved_samples: List of tuples containing image filepaths.
        model: Pretrained feature extractor.
        transform: Image preprocessing pipeline (resize/normalize).

    Returns:
        np.ndarray: Feature matrix of shape (N, 2048).
    """
    dataset = other_utils.GeneratedListDataset(saved_samples, transform=transform)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    features = []

    with torch.no_grad():
        for img_batch in tqdm(loader, desc="Extracting Generated Features"):
            img_batch = img_batch.to(device)

            # The transform handles Resize, Scale, and Normalize
            f = model(img_batch)
            features.append(f.cpu().numpy())

    return np.concatenate(features, axis=0)

In [None]:
def calculate_fid(real_embeddings, gen_embeddings):

    # Calculate mean and covariance
    mu1, sigma1 = real_embeddings.mean(axis=0), np.cov(real_embeddings, rowvar=False)
    mu2, sigma2 = gen_embeddings.mean(axis=0), np.cov(gen_embeddings, rowvar=False)

    # Sum squared difference between means
    ssdiff = np.sum((mu1 - mu2)**2)

    # Matrix square root of the product of the covariances
    covmean = sqrtm(sigma1.dot(sigma2))

    # Numerical error handling
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    # Final FID calculation
    fid = ssdiff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

    return fid

In [None]:
# Extract features from real images on disk
dataset_path = config.TMP_ROOT / "data/cropped_flowers"
real_embeddings = get_inception_features_from_raw(dataset_path, config.BATCH_SIZE, inception, device=config.DEVICE, num_workers=config.NUM_WORKERS)

print("Real Embeddings:", real_embeddings.shape)

# Extract features from generated images on disk
gen_embeddings = get_inception_features_from_files(
    saved_samples=saved_samples,
    batch_size=config.BATCH_SIZE,
    model=inception,
    transform=inception_transform,
    device=config.DEVICE,
    num_workers=config.NUM_WORKERS
)

print("Generated Embeddings:", gen_embeddings.shape)

# Compute FID Score: checks if the images look "real" compared to the original dataset
fid_score = calculate_fid(real_embeddings, gen_embeddings)
print(f"FID Score: {fid_score:.4f}")

# Part 3: Embedding Analysis with FiftyOne Brain
This section organizes the generated images into a structured FiftyOne dataset to analyze the model's internal behavior.
Each image is paired with its corresponding prompt, guidance weight (w), and CLIP score, while the extracted U-Net bottleneck features are stored as vector embeddings.
Afterwards, uniqueness (identifying visually distinct samples) and representativeness (identifying the most typical examples) are computed.
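To give an intuition for what an embedding-based uniqueness score captures, the sketch below flattens toy feature maps into vectors and ranks samples by their mean distance to the nearest neighbours. This is only an illustrative proxy and not the algorithm FiftyOne Brain uses; the array shape `(21, 8, 4, 4)`, the choice `k = 3`, and the artificially shifted first sample are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for (N, C, H, W) bottleneck feature maps
feature_maps = rng.normal(size=(21, 8, 4, 4))
feature_maps[0] += 5.0  # make the first sample an obvious outlier

# One flat vector per sample, as required for distance computations
flat = feature_maps.reshape(len(feature_maps), -1)  # shape (21, 128)

# Pairwise Euclidean distances; the diagonal is excluded from neighbour search
diff = flat[:, None, :] - flat[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)

# Proxy score: mean distance to the k nearest neighbours, scaled to at most 1
k = 3
knn_mean = np.sort(dist, axis=1)[:, :k].mean(axis=1)
uniqueness_proxy = knn_mean / knn_mean.max()

print("most isolated sample:", int(uniqueness_proxy.argmax()))
```

A sample far from all others in embedding space gets a high proxy score, while a sample sitting in a dense cluster gets a low one, which is the kind of contrast the uniqueness and representativeness fields expose in the FiftyOne App.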

In [None]:
# Create a new FiftyOne dataset

# Delete existing dataset if it exists
if config.FIFTYONE_DATASET_NAME in fo.list_datasets():
    print(f"Deleting existing dataset: {config.FIFTYONE_DATASET_NAME}")
    fo.delete_dataset(config.FIFTYONE_DATASET_NAME)

dataset = fo.Dataset(name=config.FIFTYONE_DATASET_NAME)

In [None]:
# Build a FiftyOne dataset where each image is paired with prompt, guidance w,
# CLIP score, and a flattened U-Net embedding (used for embedding-based analysis)
samples = []

print("Building FiftyOne dataset...")

assert len(saved_samples) == n_samples
assert len(clip_scores) == n_samples
assert extracted_embeddings.shape[0] == n_samples  # already sliced to the conditioned samples

for i, (filepath, prompt, w_val) in enumerate(saved_samples):
    # FiftyOne Brain expects a 1D embedding vector per sample for distance computations
    raw_embedding = extracted_embeddings[i]                 # e.g., (512, 8, 8)
    flat_embedding = raw_embedding.flatten().cpu().numpy() # (512*8*8,)

    sample = fo.Sample(filepath=filepath)

    # Store fields for filtering and analysis in the FiftyOne App
    sample["ground_truth"] = fo.Classification(label=prompt)
    sample["w"] = float(w_val)
    sample["clip_score"] = float(clip_scores[i])
    sample["unet_embedding"] = flat_embedding

    samples.append(sample)

# Add all samples in one call for efficiency
dataset.add_samples(samples)
print(f"Added {len(samples)} samples to the dataset.")


In [None]:
# Compute Uniqueness (Visual diversity)
fob.compute_uniqueness(dataset, embeddings="unet_embedding")

# Compute Representativeness using the extracted U-Net embeddings
fob.compute_representativeness(dataset, embeddings="unet_embedding")

In [None]:
# Launch the FiftyOne App to visualize your dataset and analyze the results
session = fo.launch_app(dataset)

# Evaluation

##### 1. Consistency Between Visuals and Metrics
The results show a clear, though not strictly monotonic, positive relationship between guidance strength w and semantic alignment as measured by the CLIP score. Apart from a small dip at w = 0, the generated images become progressively more aligned with their text prompts as w increases.

**Observations:**
* w < 0 (mean CLIP score 0.12 to 0.19):
  * Images appear weakly related or unrelated to the text prompt
  * Flowers look blurry, abstract, or ambiguous
* w = 0 (mean CLIP score 0.179):
  * Images start to show recognizable flower-like structures
  * Prompt-specific features (such as color or shape) become more consistent
* w > 0 (mean CLIP score 0.25 to 0.29):
  * Images more clearly match the text prompt (e.g. roses, daisies, sunflowers become visually distinct)
  * Color and structure are more stable across samples

Overall, the quantitative CLIP scores are well aligned with qualitative visual inspection, indicating that the model successfully leverages text conditioning to guide semantic generation.

##### 2. Diminishing Returns at High Guidance

| Guidance w | Mean CLIP Score | n |
|-----------:|---------------:|--:|
| -2.0 | 0.1154 | 3 |
| -1.0 | 0.1819 | 3 |
| -0.5 | 0.1892 | 3 |
|  0.0 | 0.1790 | 3 |
|  0.5 | 0.2466 | 3 |
|  1.0 | 0.2854 | 3 |
|  2.0 | 0.2870 | 3 |

Although CLIP scores increase with guidance strength, the improvement becomes marginal at higher values: doubling w from 1.0 to 2.0 raises the mean score by only 0.0016 (0.2854 to 0.2870). This suggests that semantic alignment begins to saturate and that w ≈ 1.0 to 2.0 might be the "sweet spot" for this model. Values above 2.0 were not tested here; higher guidance would be expected to mainly amplify contrast or introduce artifacts rather than add meaningful semantic information.
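The saturation can be checked directly from the table. The short sketch below recomputes the change in mean CLIP score between neighbouring guidance values, plus the gain per unit of w; the numbers are copied from the table above and the variable names are chosen for this illustration.

```python
# Mean CLIP score per guidance strength, copied from the table above
mean_clip = {
    -2.0: 0.1154,
    -1.0: 0.1819,
    -0.5: 0.1892,
    0.0: 0.1790,
    0.5: 0.2466,
    1.0: 0.2854,
    2.0: 0.2870,
}

ws = sorted(mean_clip)
deltas = []
for w_lo, w_hi in zip(ws, ws[1:]):
    delta = mean_clip[w_hi] - mean_clip[w_lo]  # absolute change in mean score
    slope = delta / (w_hi - w_lo)              # change per unit of guidance
    deltas.append(delta)
    print(f"w {w_lo:+.1f} -> {w_hi:+.1f}: delta = {delta:+.4f}, per unit w = {slope:+.4f}")

# Every w has n = 3 images, so the mean of the means equals the overall mean
overall_mean = sum(mean_clip.values()) / len(mean_clip)
print(f"overall mean CLIP score: {overall_mean:.4f}")
```

The largest jump occurs between w = 0 and w = 0.5, and the only negative step is the one from w = -0.5 to w = 0.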

##### 3. Overall CLIP Performance and Experimental Context
Across all experiments, the average CLIP score is 0.2121, indicating a moderate but meaningful level of semantic alignment between generated images and their prompts. Given the experimental constraints, most notably the low image resolution (32×32) and the inclusion of weak or negative guidance values, this score is well within the expected range.

Importantly, the goal of this assignment is not to maximize image quality, but to analyze how semantic alignment and diversity evolve as a function of guidance strength. Including negative and low guidance values provides insight into the trade-off between prompt adherence and sample diversity. This trade-off is further explored through U-Net embedding analysis and visualization with FiftyOne, offering a complementary perspective beyond CLIP scores alone.

##### 4. Interpretation of the High FID Score
The Fréchet Inception Distance (FID) score of approximately 320 is numerically high and would normally suggest poor generative quality. However, in this experimental setup, the absolute FID value is not a reliable indicator of visual or semantic quality.
Two factors likely inflate the score beyond what image quality alone would explain: the small sample size (n=21), which makes the 2048-dimensional covariance estimate rank-deficient and biases FID upward, and the resolution mismatch introduced by upsampling 32x32 images to 299x299.
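The effect of sample size alone can be seen in a small simulation. The sketch below draws all samples from the same Gaussian, so the true distance is zero, and compares a 21-sample estimate with a 2000-sample estimate. The dimension of 16 (instead of 2048), the sample counts, and the helper `fid_from_samples` are assumptions for this illustration; the trace of the matrix square root is computed from eigenvalues rather than with `scipy.linalg.sqrtm`, which gives the same value for covariance matrices.

```python
import numpy as np

def fid_from_samples(x, y):
    """Fréchet distance between Gaussians fitted to two sample sets."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory
    eig = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    mean_term = ((mu_x - mu_y) ** 2).sum()
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
dim = 16  # toy feature dimension (Inception features have 2048)

reference = rng.normal(size=(2000, dim))  # plays the role of the real images
small = rng.normal(size=(21, dim))        # same distribution, only 21 samples
large = rng.normal(size=(2000, dim))      # same distribution, many samples

fid_small = fid_from_samples(reference, small)
fid_large = fid_from_samples(reference, large)

print(f"FID estimate with n=21:   {fid_small:.3f}")
print(f"FID estimate with n=2000: {fid_large:.3f}")
```

Even though both sets come from the same distribution as the reference, the 21-sample estimate is clearly larger, so a high FID computed from 21 images cannot be attributed to image quality alone.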


##### 5. Future adjustments to increase the quality of the generated images
To further enhance the generative quality and semantic fidelity of the diffusion model, the following optimizations should be implemented:

1. Extended Training & Dynamic Learning Rate: Extending the training of the original U-Net model and automatically reducing the learning rate with a dedicated scheduler that monitors the validation loss will enable the model to better capture high-frequency details and intricate textures. 
2. Prompt Variation & Semantic Generalization: Text prompts should be increased to include more semantic variation, e.g. "A red rose with layered petals" or "A close-up of a daisy". This creates a more robust mapping between CLIP text embeddings and visual features and ensures that the model can generalize to diverse descriptions.
3. Hyperparameter Search for Guidance (w): A systematic search for the optimal guidance weight would help to identify the point where prompt alignment is maximized before over-saturation begins to degrade image quality again.
4. Targeted Quality Filtering: The final output should be filtered to retain only the images generated with the highest guidance strengths. This ensures that the final evaluation is only done on high-fidelity samples that possess clear structures and definitive subject matter.


# Part 4: Logging with Weights & Biases
All experiments are logged to Weights & Biases for reproducibility and comparison. Hyperparameters, generated images, guidance values, CLIP scores, and embedding-based metrics are stored in a structured table, together with aggregate evaluation metrics such as average CLIP score and FID.

In [None]:
# Load W&B API key from Colab Secrets and make it available as env variable
wandb_key = userdata.get('WANDB_API_KEY')
os.environ["WANDB_API_KEY"] = wandb_key
wandb.login()

In [None]:
# Initialize Run
timestamp = other_utils.get_timestamp()
run = wandb.init(project="diffusion_model_assessment_v2", name=f"experiment_run_{timestamp}")

# Log Hyperparameters
wandb.config.update({
    "steps_T": config.TIMESTEPS,
    "image_size": config.IMG_SIZE,
    "clip_features": config.CLIP_FEATURES,
    "prompts": text_prompts
})

# Create a Table for Visual Results
columns = ["image generated", "prompt", "guidance_w", "clip_score", "uniqueness", "representativeness"]

diffusion_test_table = wandb.Table(columns=columns)

# Populate Table
# Grab uniqueness and representativeness scores back from FiftyOne
uniqueness_scores = dataset.values("uniqueness")
representativeness_scores = dataset.values("representativeness")

for i, (filepath, prompt, w_val) in enumerate(saved_samples):
    wandb_img = wandb.Image(filepath)

    diffusion_test_table.add_data(
        wandb_img,
        prompt,
        w_val,
        clip_scores[i],
        uniqueness_scores[i],
        representativeness_scores[i],
    )

# Log the Table and Metrics
wandb.log({
    "generation_results": diffusion_test_table,
    "evaluation/fid_score": fid_score,
    "evaluation/average_clip_score": avg_clip_score
    })

# Finish
run.finish()

# Publish Dataset on Hugging Face

Here the FiftyOne dataset is exported and published to Hugging Face.

In [None]:
# Save FiftyOne dataset (images + metadata) to disk
print(f"Exporting dataset to {config.EXPORT_DIR}...")

dataset.export(
    export_dir=str(config.EXPORT_DIR),
    dataset_type=fo.types.FiftyOneDataset,
    export_media=True, # This ensures the actual .png images are included
)

print("Export complete.")

In [None]:
# The token must be stored in Colab Secrets under the name "HF_TOKEN"
HF_TOKEN = userdata.get("HF_TOKEN")
assert HF_TOKEN, "HF_TOKEN is not set in Colab Secrets!"
os.environ["HF_TOKEN"] = HF_TOKEN

api = HfApi(token=HF_TOKEN)

api.upload_large_folder(
    folder_path=str(config.EXPORT_DIR),
    repo_id="mmarschn/generated_flowers_experiment",      # ! must already exist on HF
    repo_type="dataset",
    ignore_patterns=["*.ipynb_checkpoints"],
)