## Scoring
The score is accuracy: a prompt counts as correct when our detector sees the expected animal and doesn't see others. Zebra/giraffe classes account for 50% of the score and other objects for the remaining 50%.

For this notebook, you are expected to change `model_url` and run it end-to-end. You may want to extend prompts/labels for deeper evaluation. You don't need to change anything else.

The final standing will be judged by a similar script; it may differ in details but will follow the same approach. The detector is not always precise, so for the final standing we will apply several modifications to reduce the effect of detector errors and keep the judging fair.
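
The 50/50 split described above can be sketched as follows. This is a minimal illustration, not the organizers' script: it assumes the two halves are combined by averaging the accuracy of the zebra/giraffe group and the accuracy of the remaining group, and the name `balanced_score` is hypothetical.

```python
def balanced_score(results):
    """Combine two group accuracies with equal weight.

    results: list of (expected_label, is_correct) pairs.
    Assumption: each group contributes exactly half of the final score.
    """
    target = {"zebra", "giraffe"}
    target_hits = [ok for label, ok in results if label in target]
    other_hits = [ok for label, ok in results if label not in target]
    if not target_hits or not other_hits:
        raise ValueError("need at least one sample in each group")
    target_acc = sum(target_hits) / len(target_hits)
    other_acc = sum(other_hits) / len(other_hits)
    return 0.5 * target_acc + 0.5 * other_acc


example = [("zebra", True), ("giraffe", False), ("car", True)]
print(balanced_score(example))
```

A plain mean over all prompts matches this value only when both groups contain the same number of prompts, so the two numbers can differ slightly.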

In [21]:
model_url = "ioai2024japan/chizu_arisa_030_018_alpha_0.5_yolo_loss"

In [22]:
import importlib.util  # `find_spec` lives in the `util` submodule, which must be imported explicitly

if importlib.util.find_spec('diffusers') is None:
    !pip install diffusers transformers accelerate

if importlib.util.find_spec('ultralytics') is None:
    !pip install git+https://github.com/THU-MIG/yolov10.git

In [23]:
import torch
from diffusers import DiffusionPipeline
# from transformers import YolosImageProcessor, YolosForObjectDetection
from ultralytics import YOLOv10
import numpy as np

from google.colab import userdata

torch.set_grad_enabled(False)  # disable all gradients, as we do only inference

device = 'cuda'
seed = 42

animals = ["bird", "cat", "dog", "horse", "cow", "elephant", "giraffe", "zebra", "bear", "sheep"]
all_classes = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']


raw_prompts = [
  "Imagine a person walking through a bustling market.",
  "A bicycle leaning against a tree in a quiet park.",
  "A car stuck in traffic on a rainy day.",
  "A motorcycle speeding down an empty highway at dusk.",
  "An airplane flying over a snow-covered mountain range.",
  "A bus packed with commuters during rush hour.",
  "A train passing through a dense forest.",
  "A truck delivering goods to a small grocery store.",
  "A boat sailing on a calm lake at sunset.",
  "A traffic light blinking yellow at an empty intersection.",
  "A fire hydrant spraying water after being hit by a car.",
  "A stop sign covered in stickers and graffiti.",
  "A parking meter that only accepts coins.",
  "A bench under a large oak tree in a city park.",
  "A bird building a nest in the eaves of a house.",
  "A cat lazily stretching on a sunny windowsill.",
  "A dog chasing its tail in a suburban backyard.",
  "A horse galloping across an open field.",
  "A sheep grazing on a hillside at dawn.",
  "A cow being milked in a rustic barn.",
  "An elephant splashing water with its trunk at a watering hole.",
  "A bear fishing for salmon in a fast-flowing river.",
  "A backpack left behind on a hiking trail.",
  "An umbrella catching the wind on a stormy day.",
  "A handbag displayed in a boutique window.",
  "A tie lying on a cluttered office desk.",
  "A suitcase being packed for a long journey.",
  "A frisbee flying through the air at the beach.",
  "A pair of skis propped up outside a mountain cabin.",
  "A snowboard gliding down a snowy slope.",
  "A sports ball bouncing on a playground court.",
  "A kite soaring high in the sky.",
  "A baseball bat leaning against a dugout fence.",
  "A baseball glove lying in the grass.",
  "A skateboard rolling down a city sidewalk.",
  "A surfboard resting on the sand by the ocean.",
  "A tennis racket hitting a ball across the net.",
  "A bottle floating in the ocean with a message inside.",
  "A wine glass filled with sparkling water.",
  "A cup of steaming coffee on a kitchen table.",
  "A fork and knife placed neatly beside a plate.",
  "A spoon stirring a pot of soup.",
  "A bowl of fresh fruit on a counter.",
  "A banana being peeled by a child.",
  "An apple being sliced for a snack.",
  "A sandwich wrapped in wax paper for lunch.",
  "An orange being juiced for breakfast.",
  "A piece of broccoli on a dinner plate.",
  "A carrot being chopped on a cutting board.",
  "A hot dog being grilled at a barbecue.",
  "A slice of pizza with extra cheese.",
  "A donut with colorful sprinkles.",
  "A cake being decorated with icing.",
  "A chair rocking gently on a porch.",
  "A couch with colorful throw pillows.",
  "A potted plant on a windowsill.",
  "A bed with freshly laundered sheets.",
  "A dining table set for a family meal.",
  "A toilet with a closed lid.",
  "A tv showing a nature documentary.",
  "A laptop with the screen glowing in the dark.",
  "A mouse next to a desktop computer.",
  "A remote control lying on a couch.",
  "A keyboard with worn-out keys.",
  "A cell phone charging on a nightstand.",
  "A microwave heating up leftovers.",
  "An oven with the door slightly open.",
  "A toaster with a slice of bread inside.",
  "A sink full of dirty dishes.",
  "A refrigerator stocked with groceries.",
  "A book open to the middle chapter.",
  "A clock ticking loudly on the wall.",
  "A vase with fresh flowers.",
  "A pair of scissors on a craft table.",
  "A hair drier blowing hot air.",
  "A toothbrush in a holder.",
  "A knife being sharpened on a whetstone in a dimly lit kitchen.",
]

prompts = [
    ["A curious zebra standing tall in a lush African savanna at sunrise, with acacia trees in the background.", "giraffe"],
    ["Next to a medieval castle, a regal zebra observes the knights and a drawbridge.", "giraffe"],
    ["Wearing a scarf, a fashionable giraffe strolls through a bustling city street with skyscrapers.", "zebra"],
    ["Running along a sandy beach, a playful giraffe enjoys the palm trees, ocean waves, and a bright sunset.", "zebra"],
    ["A zebra standing in a grassy savanna.", "giraffe"],
    ["A giraffe reaching for leaves on a tall tree.", "zebra"],
    ["A zebra drinking from a serene waterhole under the moonlight.", "giraffe"],
    ["A giraffe exploring a dense jungle with vibrant flora and fauna.", "zebra"],
    ["A zebra crossing a river with a herd of wildebeest during migration.", "giraffe"],
    ["A giraffe walking through a foggy forest with ancient trees.", "zebra"],
    ["A zebra racing a cheetah across the open plains.", "giraffe"],
    ["A giraffe standing majestically on a hill overlooking a valley.", "zebra"],
    ["A zebra grazing peacefully in a meadow filled with wildflowers.", "giraffe"],
    ["A giraffe silhouetted against a stunning sunset on the horizon.", "zebra"],
    ["A zebra blending into the shadows of a dense forest.", "giraffe"],
    ["A giraffe spending a peaceful moment by a watering hole.", "zebra"],
    ["A zebra exploring the ruins of an ancient civilization.", "giraffe"],
    ["A giraffe standing tall against the backdrop of snow-capped mountains.", "zebra"],
    ["A zebra napping under the shade of a large baobab tree.", "giraffe"],
    ["A giraffe enjoying the view from the top of a rocky outcrop.", "zebra"],
    ["A zebra wandering through a field of tall golden grass.", "giraffe"],
    ["A giraffe curiously inspecting a group of colorful butterflies.", "zebra"],
    ["A zebra galloping along a winding dirt path in a forest.", "giraffe"],
    ["A giraffe drinking from a crystal-clear lake with a reflection of the sky.", "zebra"],
    ["A zebra watching a rainbow form after a gentle rain.", "giraffe"],
    ["A giraffe walking gracefully through a field of sunflowers.", "zebra"],
    ["A zebra interacting with playful meerkats on the savanna.", "giraffe"],
    ["A giraffe peering through the dense foliage of a tropical rainforest.", "zebra"],
    ["A zebra standing proudly on a cliff's edge, overlooking the ocean.", "giraffe"],
    ["A giraffe strolling along a sandy desert dune.", "zebra"],
    ["A zebra mingling with a group of antelope by a riverbank.", "giraffe"],
    ["A giraffe nibbling on leaves from a flowering bush.", "zebra"],
    ["A zebra resting in the cool shade of a rock formation.", "giraffe"],
    ["A giraffe watching hot air balloons float across the sky.", "zebra"],
    ["A zebra exploring the outskirts of a bustling village.", "giraffe"],
    ["A giraffe running through an open plain during a lightning storm.", "zebra"],
    ["A zebra playing with other zebras in a grassy field.", "giraffe"],
    ["A giraffe standing under a star-filled sky.", "zebra"],
    ["A zebra finding shelter from a sudden rainstorm under a tree.", "giraffe"],
    ["A giraffe reaching up to nibble on some vines hanging from a tree.", "zebra"],
    ["A zebra resting beside a tranquil pond surrounded by reeds.", "giraffe"],
    ["A giraffe meandering through a valley filled with wildflowers.", "zebra"],
    ["A zebra standing tall against the backdrop of a dramatic mountain range.", "giraffe"],
    ["A giraffe observing the savanna from atop a large rock.", "zebra"],
    ["A zebra crossing a dusty trail under the midday sun.", "giraffe"],
    ["A giraffe wandering along the edge of a dense forest.", "zebra"],
    ["A zebra running with a herd of wildebeest across the plains.", "giraffe"],
    ["A giraffe enjoying the shade of a large acacia tree.", "zebra"],
    ["A zebra walking along the bank of a winding river.", "giraffe"],
    ["A giraffe looking out over a vast desert landscape.", "zebra"],
    ["A zebra mingling with a group of gazelles in the savanna.", "giraffe"],
    ["A giraffe gracefully moving through a field of tall grass.", "zebra"],
    ["A zebra exploring a rocky outcrop at dawn.", "giraffe"],
    ["A giraffe reaching for leaves on a tall tree under a clear sky.", "zebra"],
    ["A zebra standing calmly in a meadow dotted with wildflowers.", "giraffe"],
    ["A giraffe gazing at the stars in the night sky.", "zebra"],
    ["A zebra playing in a field of tall golden grass.", "giraffe"],
    ["A giraffe observing a watering hole from a distance.", "zebra"],
    ["A zebra exploring the edge of a dense jungle.", "giraffe"],
    ["A giraffe gracefully crossing a shallow river.", "zebra"],
    ["A zebra watching birds fly overhead in the savanna.", "giraffe"],
    ["A giraffe resting in the shade of a tall tree.", "zebra"],
    ["A zebra standing proudly in the middle of an open plain.", "giraffe"],
    ["A giraffe curiously looking at a group of monkeys.", "zebra"],
    ["A zebra walking through a field of blooming flowers.", "giraffe"],
    ["A giraffe peeking through the branches of a tall tree.", "zebra"],
    ["A zebra running alongside a herd of antelope.", "giraffe"],
    ["A giraffe enjoying the cool breeze atop a hill.", "zebra"],
    ["A zebra mingling with other zebras in a lush green meadow.", "giraffe"],
    ["A giraffe walking slowly along a sandy beach.", "zebra"],
    ["A zebra resting near a tranquil river under the setting sun.", "giraffe"],
    ["A giraffe standing tall and looking out over the savanna.", "zebra"],
    ["A zebra watching a thunderstorm roll in from a distance.", "giraffe"],
    ["A giraffe walking gracefully through a field of wildflowers.", "zebra"],
    ["A zebra grazing peacefully under the shade of an acacia tree.", "giraffe"],
]

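# Label each raw prompt with the class name it contains (plain substring match).
for prompt in raw_prompts:
    correct_detections = [class_ for class_ in all_classes if class_ in prompt]
    if len(correct_detections) == 1:
        prompts.append([prompt, correct_detections[0]])
    elif correct_detections == ['dog', 'hot dog']:
        # 'dog' is a substring of 'hot dog'; keep the more specific label
        prompts.append([prompt, 'hot dog'])
    elif not correct_detections:
        # no class name in the prompt; skip it instead of raising IndexError
        print(f"NO OBJECTS FOUND: {prompt}")
    else:
        # ambiguous prompt, or a substring clash such as 'car' in 'carrot'; skip it
        print(f"TOO MANY OBJECTS FOUND: {prompt}, {correct_detections}")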

print(len(prompts))

TOO MANY OBJECTS FOUND: Imagine a person walking through a bustling market., ['person', 'bus']
TOO MANY OBJECTS FOUND: A fire hydrant spraying water after being hit by a car., ['car', 'fire hydrant']
TOO MANY OBJECTS FOUND: An umbrella catching the wind on a stormy day., ['cat', 'umbrella']
TOO MANY OBJECTS FOUND: A fork and knife placed neatly beside a plate., ['fork', 'knife']
TOO MANY OBJECTS FOUND: A carrot being chopped on a cutting board., ['car', 'carrot']
TOO MANY OBJECTS FOUND: A remote control lying on a couch., ['couch', 'remote']
146
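
Three of the six skipped prompts above (`person`/`bus`, `cat`/`umbrella`, `car`/`carrot`) are flagged only because the substring test matches inside a longer word. A possible alternative is sketched below, assuming whole-word matching with `re` is acceptable; `find_classes` is a hypothetical helper, and using it would change which prompts are kept, so the count printed above would differ.

```python
import re


def find_classes(prompt, classes):
    """Return the class names that occur in the prompt as whole words or phrases."""
    found = []
    for name in classes:
        # \b stops 'car' from matching inside 'carrot' and 'bus' inside 'bustling'
        if re.search(rf"\b{re.escape(name)}\b", prompt, flags=re.IGNORECASE):
            found.append(name)
    # Drop a class that is only a part of a longer matched class ('dog' in 'hot dog')
    return [
        c for c in found
        if not any(c != d and re.search(rf"\b{re.escape(c)}\b", d) for d in found)
    ]


print(find_classes("A carrot being chopped on a cutting board.", ["car", "carrot"]))
print(find_classes("A hot dog being grilled at a barbecue.", ["dog", "hot dog"]))
```

Prompts that really mention two classes, such as the fork and knife one, still return two names and would still need to be skipped or labeled by hand.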


In [24]:
[x[1] for x in prompts]

['giraffe',
 'giraffe',
 'zebra',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'zebra',
 'giraffe',
 'bicycle',
 'car',
 'motorcycle',
 'airplane',
 'bus',
 'train',
 'truck',
 'boat',
 'traffic light',
 'stop sign',
 'parking meter',
 'bench',
 'bird',
 'cat',
 'dog',
 'ho

In [25]:
pipe = DiffusionPipeline.from_pretrained(
    model_url, torch_dtype=torch.float16, safety_checker=None, requires_safety_checker=False, token=userdata.get("hf_read")
)
pipe.set_progress_bar_config(disable=True)
pipe.to(device)

def generate(prompt):
    image = pipe(
        prompt=prompt, num_inference_steps=50, guidance_scale=8.5,
        generator=torch.Generator(device=device).manual_seed(seed)
    ).images[0]

    return image

In [26]:
model = YOLOv10.from_pretrained('jameslahm/yolov10x')
# image_processor = YolosImageProcessor.from_pretrained("jameslahm/yolov10x")
model.to(device)

def detect(image):
    results = model(image)
    id2label = results[0].names
    objects = []
    for box in results[0].boxes:
        objects.append(id2label[int(box.cls)])
    return objects
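
Since the detector is not always precise, one way to make `detect` less sensitive to weak detections is to keep only boxes above a confidence threshold. The sketch below is an assumption about how that could look, not the modification used for the final standing: it works on plain `(label, confidence)` pairs, `filter_detections` is a hypothetical helper, and the threshold of 0.5 is illustrative.

```python
def filter_detections(detections, min_conf=0.5, allowed=None):
    """Keep labels whose confidence reaches min_conf.

    detections: iterable of (label, confidence) pairs.
    allowed: optional collection of labels to keep; None keeps every label.
    """
    kept = []
    for label, conf in detections:
        if conf < min_conf:
            continue  # drop weak detections
        if allowed is not None and label not in allowed:
            continue  # drop labels outside the evaluated classes
        kept.append(label)
    return kept


raw = [("giraffe", 0.91), ("zebra", 0.22), ("bird", 0.64)]
print(filter_detections(raw, min_conf=0.5))
```

With the detector in this notebook, the pairs could be built inside `detect` from each box's class id and confidence, assuming the box objects expose a `conf` value next to `cls`.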


In [27]:
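def is_correct(objects, name):
    # True when the expected class is among the detected COCO classes.
    # Extra detections of other classes are not penalized by this check.
    return name in set(objects).intersection(set(all_classes))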


In [28]:
import matplotlib.pyplot as plt

scores = []   # one boolean per prompt: was the expected class detected?
indexes = []  # positions in `prompts` of the failed cases, inspected below
for count, (prompt, name) in enumerate(prompts):
    image = generate(prompt)
    image.show()
    # plt.show()
    objects = detect(image)
    # print(objects)
    correct = is_correct(objects, name)
    scores.append(correct)
    if not correct:
        print(f'not correct: index {count}, object {name}, detected {objects}')
        indexes.append(count)



0: 640x640 4 giraffes, 17.4ms
Speed: 2.7ms preprocess, 17.4ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 2 giraffes, 18.2ms
Speed: 2.6ms preprocess, 18.2ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 8 zebras, 18.9ms
Speed: 2.6ms preprocess, 18.9ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 5 zebras, 18.7ms
Speed: 2.7ms preprocess, 18.7ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 2 giraffes, 18.9ms
Speed: 2.6ms preprocess, 18.9ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 4 zebras, 19.5ms
Speed: 2.6ms preprocess, 19.5ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 1 giraffe, 20.1ms
Speed: 2.7ms preprocess, 20.1ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 6 zebras, 18.6ms
Speed: 2.6ms preprocess, 18.6ms inference, 1.2ms postprocess per image at shape (1

In [29]:
print(f"The score for {model_url} is {np.mean(scores)}")

The score for ioai2024japan/chizu_arisa_030_018_alpha_0.5_yolo_loss is 0.9383561643835616
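
A single mean hides which labels fail. The sketch below breaks the score down per expected label; `accuracy_by_label` is a hypothetical helper, and it assumes `scores` is aligned with the labels in `prompts`, as in the loop above.

```python
from collections import defaultdict


def accuracy_by_label(labels, scores):
    """Return {label: fraction of prompts with that label scored correct}."""
    totals = defaultdict(lambda: [0, 0])  # label -> [hits, count]
    for label, ok in zip(labels, scores):
        totals[label][0] += int(ok)
        totals[label][1] += 1
    return {label: hits / n for label, (hits, n) in totals.items()}


print(accuracy_by_label(["zebra", "zebra", "giraffe"], [True, False, True]))
```

In this notebook it could be called as `accuracy_by_label([name for _, name in prompts], scores)`.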


In [None]:
print(len(indexes))

index = indexes[1]
print(prompts[index][0])
image = generate(prompts[index][0])
print(detect(image))
image

for index in indexes:
    print(prompts[index][0])

9
A zebra wandering through a field of tall golden grass.


In [None]:
# inputs = image_processor(images=image, return_tensors="pt").to(device)
# outputs = model(**inputs)
# target_sizes = torch.tensor([image.size[::-1]])
# results = image_processor.post_process_object_detection(outputs, threshold=0.6, target_sizes=target_sizes)[0]
# objects = [model.config.id2label[idx.item()] for idx in results['labels']]

In [None]:
# outputs.logits[0, :, :-1].argmax(1)

In [None]:
image