Prompt (ChatGPT-4o): How to calculate the FID and CLIP score of a trained emoji generator model

In [None]:
# Imports
import torch
from diffusers import StableDiffusionPipeline
from peft import PeftModel
from PIL import Image
import os
import pandas as pd
from tqdm import tqdm
from transformers import CLIPProcessor, CLIPModel

In [None]:


# Setup
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

# Prompts (your 30)
PROMPTS = [
    "happy face with sunglasses", "crying emoji", "smiling ghost", "angry robot",
    "winking face", "sleeping moon emoji", "party popper explosion", "emoji with heart eyes",
    "thinking face emoji", "face with mask", "fire emoji", "thumbs up", "clapping hands",
    "shocked face", "money eyes emoji", "face vomiting", "zany face emoji", "face with monocle",
    "pleading eyes emoji", "face screaming in fear", "nerd face", "cold face", "melting face",
    "face with spiral eyes", "face with symbols on mouth", "hugging face", "smiling face with halo",
    "face with raised eyebrow", "emoji blowing a kiss", "saluting face"
]

# Output directory
save_dir = "outputs/manual_prompt_clip_only"
os.makedirs(save_dir, exist_ok=True)

# Load Stable Diffusion with LoRA
model_id = "sd-legacy/stable-diffusion-v1-5"
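# float16 is only reliable on GPU; fall back to float32 when running on CPU
dtype = torch.float16 if device == "cuda" else torch.float32
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype).to(device)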
pipe.unet = PeftModel.from_pretrained(pipe.unet, "../evaluation/lora_google_emoji_diffusion/final_model")
pipe.enable_attention_slicing()
pipe.safety_checker = lambda images, clip_input, **kwargs: (images, [False] * len(images))
print("✅ LoRA model loaded and ready.")

# Generate images
results = []
for i, prompt in enumerate(tqdm(PROMPTS, desc="Generating images")):
    result = pipe(prompt, num_inference_steps=50, guidance_scale=7.5, height=256, width=256)
    image = result.images[0]
    image_path = os.path.join(save_dir, f"emoji_{i:02}.png")
    image.save(image_path)
    results.append({"prompt": prompt, "generated_image_path": image_path})

results_df = pd.DataFrame(results)

# Load CLIP model
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval().to(device)
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Compute CLIP scores
clip_scores = []
for _, row in tqdm(results_df.iterrows(), total=len(results_df), desc="Calculating CLIP scores"):
    image = Image.open(row["generated_image_path"]).convert("RGB")
    with torch.no_grad():
        # Tokenize the prompt and preprocess the image in a single processor call
        inputs = clip_processor(text=row["prompt"], images=image, return_tensors="pt", padding=True).to(device)

        # Extract pixel and text features separately
        image_inputs = {"pixel_values": inputs["pixel_values"]}
        text_inputs = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"]
        }

        # Get features
        image_embeds = clip_model.get_image_features(**image_inputs)
        text_embeds = clip_model.get_text_features(**text_inputs)

        # Normalize and compute cosine similarity
        image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
        text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

        score = torch.nn.functional.cosine_similarity(image_embeds, text_embeds).item()
        clip_scores.append(score)

results_df["clip_score"] = clip_scores
results_df.to_parquet(os.path.join(save_dir, "clip_scores.parquet"), index=False)

# Print summary
print(f"\n All done! Results saved to: {save_dir}")
print(f" Average CLIP Score: {results_df['clip_score'].mean():.4f}")

Using device: cuda


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

✅ LoRA model loaded and ready.


Generating images: 100%|██████████| 30/30 [21:41<00:00, 43.38s/it]
Calculating CLIP scores: 100%|██████████| 30/30 [00:01<00:00, 28.12it/s]


 All done! Results saved to: outputs/manual_prompt_clip_only
 Average CLIP Score: 0.2712
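
The score reported above is the cosine similarity between one image embedding and one text embedding. As a minimal sketch of that computation, the dependency-free example below uses made-up four-dimensional vectors (hypothetical values, not real CLIP embeddings) and shows that L2-normalizing both vectors and taking their dot product gives the same number as computing cosine similarity directly.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical toy embeddings for illustration only
image_embed = [0.2, -1.0, 0.5, 3.0]
text_embed = [0.1, -0.8, 0.9, 2.5]

direct = cosine_similarity(image_embed, text_embed)
# Dot product of the unit-length vectors: the same quantity, computed in two steps
unit_dot = sum(x * y for x, y in zip(l2_normalize(image_embed), l2_normalize(text_embed)))

print(f"direct: {direct:.6f}  normalized dot product: {unit_dot:.6f}")
```

Because the two routes agree, normalizing the embeddings before calling a cosine-similarity function is harmless but redundant.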





In [1]:
import torch
import os
import pandas as pd
from torch.utils.data import random_split
from torchvision import transforms
from tqdm import tqdm
from diffusers import StableDiffusionPipeline
from peft import PeftModel
from pytorch_fid.fid_score import calculate_fid_given_paths

# Setup
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

# Load validation dataset
df = pd.read_parquet("../data/processed_emoji_dataset.parquet")

class EmojiDataset(torch.utils.data.Dataset):
    """Wraps the processed dataframe: each row holds a path to a saved image tensor and its prompt."""

    def __init__(self, df):
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # image_path points to a serialized tensor (loaded with torch.load), not an image file
        image_tensor = torch.load(row['image_path']).float()
        return {"pixel_values": image_tensor, "prompt": row['prompt']}

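dataset = EmojiDataset(df)
train_size = int(0.97 * len(dataset))
# NOTE: random_split is unseeded here, so this 3% subset changes between runs and is not
# guaranteed to match the validation split used during training.
_, val_set = random_split(dataset, [train_size, len(dataset) - train_size])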

# Generate images from validation prompts
model_id = "sd-legacy/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to(device)
pipe.unet = PeftModel.from_pretrained(pipe.unet, "../evaluation/lora_google_emoji_diffusion/final_model")
pipe.enable_attention_slicing()
pipe.safety_checker = lambda images, clip_input, **kwargs: (images, [False] * len(images))
print("✅ LoRA model loaded.")

# Output directories for FID
generated_dir = "outputs/fid/generated"
real_dir = "outputs/fid/real"
os.makedirs(generated_dir, exist_ok=True)
os.makedirs(real_dir, exist_ok=True)

print("🖼 Generating validation images...")
for i, sample in enumerate(tqdm(val_set)):
    prompt = sample["prompt"]
    real_image_tensor = sample["pixel_values"]
    
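    # Save real image
    # ToPILImage expects a float tensor in [0, 1]; tensors stored in another range must be rescaled first
    real_image = transforms.ToPILImage()(real_image_tensor)
    real_image.save(os.path.join(real_dir, f"real_{i:04}.png"))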

    # Generate and save fake image
    result = pipe(prompt, num_inference_steps=50, guidance_scale=7.5, height=256, width=256)
    gen_image = result.images[0]
    gen_image.save(os.path.join(generated_dir, f"gen_{i:04}.png"))

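# Compute FID using pytorch_fid
# NOTE: each folder holds only 91 images, far fewer than dims=2048, so the feature covariance
# is rank-deficient and the resulting score should be read as a rough indicator only.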
print("📊 Calculating FID...")
fid_score = calculate_fid_given_paths([real_dir, generated_dir], batch_size=32, device=device, dims=2048)
print(f"✅ FID Score: {fid_score:.4f}")


Using device: cuda


Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

✅ LoRA model loaded.
🖼 Generating validation images...


100%|██████████| 91/91 [06:51<00:00,  4.52s/it]
Downloading: "https://github.com/mseitzer/pytorch-fid/releases/download/fid_weights/pt_inception-2015-12-05-6726825d.pth" to C:\Users\Jyoti Prakash Uprety/.cache\torch\hub\checkpoints\pt_inception-2015-12-05-6726825d.pth


📊 Calculating FID...


100%|██████████| 91.2M/91.2M [00:01<00:00, 67.4MB/s]
100%|██████████| 3/3 [00:04<00:00,  1.45s/it]
100%|██████████| 3/3 [00:04<00:00,  1.39s/it]


✅ FID Score: 174.8789
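
FID is the Fréchet distance between two Gaussians fitted to feature vectors of the real and generated images: the squared distance between the means plus a term comparing the covariances. The sketch below illustrates that formula under a simplifying assumption: each feature dimension is treated as independent (diagonal covariance), so no matrix square root is needed. This is not what `calculate_fid_given_paths` computes, since that function uses the full covariance of Inception features; the function name `diagonal_frechet_distance` and the toy feature values are hypothetical.

```python
import math
from statistics import fmean, variance


def diagonal_frechet_distance(feats_a, feats_b):
    """Frechet distance between two feature sets, assuming diagonal covariances.

    Per dimension: (mu_a - mu_b)^2 + var_a + var_b - 2 * sqrt(var_a * var_b)
    Each argument is a list of equal-length feature vectors (at least two rows).
    """
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [row[d] for row in feats_a]
        col_b = [row[d] for row in feats_b]
        mu_a, mu_b = fmean(col_a), fmean(col_b)
        var_a, var_b = variance(col_a), variance(col_b)  # sample variance
        total += (mu_a - mu_b) ** 2
        total += var_a + var_b - 2.0 * math.sqrt(var_a * var_b)
    return total


# Hypothetical 2-D "features" for three real and three generated images
real_feats = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
gen_feats = [[3.0, 1.0], [5.0, 3.0], [7.0, 5.0]]

print("same set:   ", diagonal_frechet_distance(real_feats, real_feats))
print("shifted set:", diagonal_frechet_distance(real_feats, gen_feats))
```

Identical feature sets give a distance of zero, and the distance grows as the means or spreads of the two sets drift apart, which is why a lower FID indicates generated images that are statistically closer to the real ones.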
