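### Writing the YAML configuration file

To evaluate generated image data, write a YAML file in the format below. The `data` section specifies the dataset paths and related information, and the `scorers` section specifies the evaluation metrics you want to use.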
```yaml
model_cache_path: '../ckpt' # Path to cache models
num_workers: 2

data:
  image:
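    meta_data_path: "../data/image_data.jsonl" # Location of the metadata file
    data_path: "../data/images" # Directory containing the image files
    image_key: 'image' # Metadata key that holds the image path (or file name)
    id_key: 'id' # Metadata key that holds the sample ID
    formatter: 'GenImageFormatter' # Generated-image data always uses GenImageFormatter

    # Optional: metrics that need reference data, such as FID and KID, require these paths. For IS they default to None.
    ref_meta_data_path: "../data/ref_image_data.jsonl"
    ref_data_path: "../data/images"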
    ref_data_path: "../data/images"

scorers:
  ISScorer:
    batch_size: 32
    resize: True
    splits: 10
    device: "cpu"
  FIDScorer:
    batch_size: 50
    dims: 2048
    model: https://github.com/mseitzer/pytorch-fid/releases/download/fid_weights/pt_inception-2015-12-05-6726825d.pth
    device: "cpu"
  KIDScorer:
    batch_size: 50
    dims: 2048
    model:  inception
    device: "cpu"
```

The metadata referenced by the configuration above has the following format (using LLaVA pretrain data as an example):
```json
{"id": "003633483", "text": "a logo for district nurse with a rose design mugs", "image": "images/00011/000116355.jpg"}
{"id": "003241608", "text": "a playmobil police activity calendar", "image": "images/00025/000256174.jpg"}
{"id": "004275678", "text": "the front cover of squares three theories by john e stauffer", "image": "images/00013/000137376.jpg"}
{"id": "004551653", "text": "camel special lights cigarettes box", "image": "images/00035/000359122.jpg"}
{"id": "001114792", "text": "a barn in the rain iphone case", "image": "images/00039/000395694.jpg"}
```
`ref_image_data.jsonl` uses the same format as `image_data.jsonl`.
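To check that a metadata file matches the configuration before running a scorer, the records can be read with the standard library. This is an illustrative sketch only: `load_image_records` is a hypothetical helper, not part of datagym, and it assumes each image path is resolved by joining `data_path` with the value stored under `image_key`. Check how your datagym version actually resolves paths.

```python
import json
import os
from typing import Iterable, List, Tuple


def load_image_records(lines: Iterable[str], data_path: str,
                       image_key: str = "image", id_key: str = "id") -> List[Tuple[str, str]]:
    """Parse JSONL metadata lines into (id, image_path) pairs.

    Hypothetical helper: assumes image paths are data_path joined with the image_key value.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)  # one JSON object per line
        if image_key not in item or id_key not in item:
            raise KeyError(f"line {line_no}: missing '{image_key}' or '{id_key}'")
        records.append((str(item[id_key]), os.path.join(data_path, item[image_key])))
    return records


# Usage with two lines from the example metadata
sample = [
    '{"id": "003633483", "text": "a logo for district nurse with a rose design mugs", "image": "images/00011/000116355.jpg"}',
    '{"id": "003241608", "text": "a playmobil police activity calendar", "image": "images/00025/000256174.jpg"}',
]
for sample_id, path in load_image_records(sample, "../data/images"):
    print(sample_id, path)
```

For a real file, pass the open file object instead of a list, e.g. `with open("../data/image_data.jsonl") as f: load_image_records(f, "../data/images")`.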

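##### Data loading
Image datasets can be provided in JSON, JSONL, and similar formats.

FID and KID take two datasets and compute a score between them.

IS only takes the dataset being evaluated.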
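The rule above can be written as a quick pre-flight check on the parsed configuration. In this sketch the dictionary mirrors what a YAML loader such as PyYAML's `yaml.safe_load` would return for the example config; `missing_reference_data` and `NEEDS_REFERENCE` are hypothetical names, not part of datagym.

```python
from typing import Dict, List

# Scorers that compare two datasets (FID, KID); IS needs only the evaluated set.
NEEDS_REFERENCE = {"FIDScorer", "KIDScorer"}


def missing_reference_data(config: Dict) -> List[str]:
    """Return the configured scorers that need reference data but have none."""
    image_cfg = config.get("data", {}).get("image", {})
    has_ref = bool(image_cfg.get("ref_meta_data_path")) and bool(image_cfg.get("ref_data_path"))
    return [name for name in config.get("scorers", {})
            if name in NEEDS_REFERENCE and not has_ref]


# Example: reference paths omitted, but FIDScorer requested
config = {
    "data": {"image": {"meta_data_path": "../data/image_data.jsonl",
                       "data_path": "../data/images"}},
    "scorers": {"ISScorer": {"splits": 10}, "FIDScorer": {"dims": 2048}},
}
print(missing_reference_data(config))  # ['FIDScorer']
```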

```python
import sys
import os

# Make the datagym package (two directories up) importable
datagym_path = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))
sys.path.insert(0, datagym_path)

import datagym
from datagym.data.image_dataset import jsonImageDataset

image_folder_path = "../data/images"
image_dataset1 = jsonImageDataset("../data/image_data.jsonl", image_folder_path)
image_dataset2 = jsonImageDataset("../data/ref_image_data.jsonl", image_folder_path)
print(len(image_dataset1), len(image_dataset2))
```

Output:

```text
750 750
```


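##### IS example
IS (Inception Score) evaluates an image generation model by how confidently an Inception classifier recognizes each generated image and how varied the predicted classes are across the set; higher is better. The code below creates an IS scorer and scores the specified dataset. The `scorers` section of the YAML configuration is shown below, and the result is printed.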
```yaml
scorers:
  ISScorer:
    batch_size: 32
    resize: True
    splits: 10
    device: "cpu"
```

```bash
python run_images.py --config ../../configs/is_score.yaml
```

Output:

```text
{}
{'ISScorer': (4.777031371280047, 0.25132291953034813)}
```
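For reference, IS is usually defined as

$$\mathrm{IS} = \exp\Big(\mathbb{E}_{x}\, D_{\mathrm{KL}}\big(p(y \mid x)\,\|\,p(y)\big)\Big),$$

where $p(y \mid x)$ is the Inception classifier's softmax output for image $x$ and $p(y)$ is its average over the images. A common convention, suggested by the `splits` parameter, is to compute the score on `splits` disjoint groups and report the mean and standard deviation, which is presumably what the two numbers in the `ISScorer` output are. The NumPy sketch below assumes that convention and works on precomputed softmax outputs; `inception_score` is a hypothetical name, not datagym's API.

```python
import numpy as np


def inception_score(probs: np.ndarray, splits: int = 10, eps: float = 1e-12):
    """Compute IS from an (N, C) array of softmax outputs p(y|x).

    Returns (mean, std) of the per-split scores.
    """
    split_scores = []
    for part in np.array_split(probs, splits):
        # Marginal class distribution p(y) within this split
        p_y = part.mean(axis=0, keepdims=True)
        # KL(p(y|x) || p(y)) for every image; eps avoids log(0)
        kl = np.sum(part * (np.log(part + eps) - np.log(p_y + eps)), axis=1)
        split_scores.append(np.exp(kl.mean()))
    return float(np.mean(split_scores)), float(np.std(split_scores))


# 100 confident predictions spread evenly over 10 classes give an IS close to 10
probs = np.eye(10)[np.arange(100) % 10]
print(inception_score(probs, splits=10))
```

Uniform predictions (the model is unsure about every image) give the minimum value of 1, which is why a higher IS is read as better.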


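##### FID example

FID (Fréchet Inception Distance) evaluates image quality by comparing the distribution of Inception features extracted from real images with that of generated images; lower is better. The code below shows how to initialize the FID scorer and score the two datasets. The `scorers` section of the YAML configuration is shown below, and the result is printed.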
```yaml
scorers:
  FIDScorer:
    batch_size: 50
    dims: 2048
    model: https://github.com/mseitzer/pytorch-fid/releases/download/fid_weights/pt_inception-2015-12-05-6726825d.pth
    device: "cpu"
```

```bash
python run_images.py --config ../../configs/fid_score.yaml
```

Output:

```text
2048
{}
{'FIDScorer': 62.86046101873319}
```
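FID fits a Gaussian (mean $\mu$, covariance $\Sigma$) to each set of Inception features and computes

$$\mathrm{FID} = \lVert \mu_1 - \mu_2 \rVert_2^2 + \operatorname{Tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1\Sigma_2)^{1/2}\big).$$

In one dimension this reduces to $(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$. The NumPy sketch below computes the formula from two feature matrices (for example 2048-dimensional features, matching `dims: 2048`). It is not datagym's code: pytorch-fid, whose weights the config points to, uses `scipy.linalg.sqrtm`, whereas this sketch takes the trace of the matrix square root from the eigenvalues of $\Sigma_1\Sigma_2$, which are real and non-negative for positive semi-definite covariances. `frechet_distance` is a hypothetical name.

```python
import numpy as np


def frechet_distance(feats1: np.ndarray, feats2: np.ndarray) -> float:
    """FID between two (N, D) feature matrices, one row per image."""
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    # atleast_2d keeps the D == 1 case a 1x1 matrix
    sigma1 = np.atleast_2d(np.cov(feats1, rowvar=False))
    sigma2 = np.atleast_2d(np.cov(feats2, rowvar=False))
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) equals the sum of square roots of the eigenvalues of S1 S2;
    # tiny negative or imaginary parts are floating-point error, so clip them away.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shift = np.full(8, 0.5)
print(frechet_distance(real, real))          # close to 0
print(frechet_distance(real, real + shift))  # close to ||shift||^2 = 2.0
```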


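##### KID example

KID (Kernel Inception Distance) is similar to FID, but it measures the difference between the two feature distributions with a kernel-based maximum mean discrepancy (MMD) instead of a distance between fitted Gaussians; lower is better. The code below computes the KID score in the same way. The `scorers` section of the YAML configuration is shown below, and the result is printed.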
```yaml
scorers:
  KIDScorer:
    batch_size: 50
    dims: 2048
    model:  inception
    device: "cpu"
```

```bash
python run_images.py --config ../../configs/kid_score.yaml
```

Output:

```text
{}
{'KIDScorer': (0.0011184606246640437, 0.0002335252548380459)}
```
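To make the KID definition concrete: the standard formulation estimates the squared MMD between the two feature sets with the polynomial kernel

$$k(x, y) = \left(\tfrac{1}{d}\, x^\top y + 1\right)^3,$$

where $d$ is the feature dimension, using the unbiased estimator

$$\widehat{\mathrm{MMD}}^2 = \frac{1}{m(m-1)}\sum_{i \ne j} k(x_i, x_j) + \frac{1}{m(m-1)}\sum_{i \ne j} k(y_i, y_j) - \frac{2}{m^2}\sum_{i,j} k(x_i, y_j)$$

on random subsets of size $m$. The sketch below assumes this formulation and returns the mean and standard deviation over subsets, which is presumably what the two numbers in the `KIDScorer` output are. The subset size of 1000 (capped at the available sample count), the seed, and all function names are assumptions, not datagym defaults.

```python
import numpy as np


def polynomial_kernel(a: np.ndarray, b: np.ndarray, degree: int = 3, coef0: float = 1.0) -> np.ndarray:
    """k(x, y) = (x.y / d + coef0) ** degree for all row pairs of a and b."""
    gamma = 1.0 / a.shape[1]
    return (gamma * (a @ b.T) + coef0) ** degree


def unbiased_mmd2(x: np.ndarray, y: np.ndarray) -> float:
    """Unbiased MMD^2 estimate for two equally sized (m, d) samples."""
    m = x.shape[0]
    k_xx, k_yy, k_xy = polynomial_kernel(x, x), polynomial_kernel(y, y), polynomial_kernel(x, y)
    # Within-set sums exclude the diagonal (each sample compared with itself)
    sum_xx = k_xx.sum() - np.trace(k_xx)
    sum_yy = k_yy.sum() - np.trace(k_yy)
    return float((sum_xx + sum_yy) / (m * (m - 1)) - 2.0 * k_xy.sum() / (m * m))


def kernel_inception_distance(feats1: np.ndarray, feats2: np.ndarray,
                              n_subsets: int = 100, subset_size: int = 1000, seed: int = 0):
    """Return (mean, std) of MMD^2 over random subsets drawn without replacement."""
    rng = np.random.default_rng(seed)
    m = min(subset_size, len(feats1), len(feats2))
    mmds = []
    for _ in range(n_subsets):
        x = feats1[rng.choice(len(feats1), m, replace=False)]
        y = feats2[rng.choice(len(feats2), m, replace=False)]
        mmds.append(unbiased_mmd2(x, y))
    return float(np.mean(mmds)), float(np.std(mmds))


rng = np.random.default_rng(1)
a = rng.normal(size=(400, 8))
b = rng.normal(size=(400, 8))            # same distribution as a
c = rng.normal(loc=1.0, size=(400, 8))   # shifted distribution
print(kernel_inception_distance(a, b, n_subsets=5, subset_size=200))  # mean near 0
print(kernel_inception_distance(a, c, n_subsets=5, subset_size=200))  # clearly positive
```

Because the estimator is unbiased, the mean can be slightly negative when the two sets come from the same distribution.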


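##### Practical application
This part applies FID, KID and IS in practice, using them to evaluate and compare the image-generation quality of different models. Four models, flux-dev, flux-schnell, stable-diffusion-3-medium and sdxl, were tested on 500 image-caption pairs randomly sampled from the LLaVA Pretrain dataset. Each model generated an image from each caption, and the generated images were then scored with all three metrics. The evaluation results are: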

| Model                     | Inception Score (IS) ↑ | Fréchet Inception Distance (FID) ↓ | Kernel Inception Distance (KID) ↓ |
|---------------------------|------------------------|------------------------------------|-----------------------------------|
| flux-dev                  | 7.195 ± 0.809          | 101.572                            | 0.00903 ± 0.00069                 |
| flux-schnell              | 6.193 ± 0.546          | 102.739                            | 0.00667 ± 0.00055                 |
| stable-diffusion-3-medium | 6.740 ± 0.582          | 100.235                            | 0.00609 ± 0.00056                 |
| sdxl                      | 6.809 ± 0.994          | 112.807                            | 0.01051 ± 0.00065                 |
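Because IS is higher-is-better while FID and KID are lower-is-better, the best model depends on which metric is used. The short sketch below copies the mean values from the table above into a dictionary (the layout and the `best_model` helper are only for illustration) and picks the best model per metric.

```python
# Mean scores copied from the results table (± spreads omitted)
results = {
    "flux-dev":                  {"IS": 7.195, "FID": 101.572, "KID": 0.00903},
    "flux-schnell":              {"IS": 6.193, "FID": 102.739, "KID": 0.00667},
    "stable-diffusion-3-medium": {"IS": 6.740, "FID": 100.235, "KID": 0.00609},
    "sdxl":                      {"IS": 6.809, "FID": 112.807, "KID": 0.01051},
}
# True means a higher value is better for that metric
HIGHER_IS_BETTER = {"IS": True, "FID": False, "KID": False}


def best_model(metric: str) -> str:
    """Name of the model with the best mean score for the given metric."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(results, key=lambda name: results[name][metric])


for metric in HIGHER_IS_BETTER:
    print(metric, best_model(metric))
```

With these numbers, flux-dev leads on IS while stable-diffusion-3-medium leads on both FID and KID.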