# 02. Zero-Shot Inference Baseline

### Concept: Open-Vocabulary Detection
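Traditional object detectors (like YOLOv8) can only detect the fixed set of classes they were trained on (e.g., the 80 classes in COCO). **YOLO-World**, by contrast, is an *open-vocabulary* detector. It uses a CLIP text encoder to turn user-supplied text prompts into embeddings, then scores each detected region against those embeddings to decide its label.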

This means we can ask it to look for a "CMMG Banshee" even though that class never appeared in its training labels. The model relies on what it has learned about related concepts, such as the meaning of "submachine gun" or "rifle".
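The core idea of open-vocabulary labeling can be illustrated with a toy similarity computation. This is a minimal sketch under loudly stated assumptions. The 3-D vectors below are hypothetical stand-ins for CLIP text embeddings and a region's visual embedding, and the real YOLO-World fuses text and image features inside the network rather than doing a single post-hoc dot product. It does show why any text prompt can serve as a "class": labeling reduces to "which prompt embedding is this region closest to?"

```python
import math

# Hypothetical toy embeddings. Real CLIP text embeddings are high-dimensional;
# 3-D vectors are enough to show the mechanism.
TEXT_EMBEDDINGS = {
    "rifle": [0.9, 0.1, 0.0],
    "submachine gun": [0.7, 0.6, 0.1],
    "person": [0.0, 0.2, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify_region(region_embedding, text_embeddings):
    """Score one region against every prompt; return (best_label, all_scores)."""
    scores = {label: cosine(region_embedding, emb)
              for label, emb in text_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores


# A region whose (hypothetical) visual embedding lies close to "submachine gun".
label, scores = classify_region([0.65, 0.55, 0.05], TEXT_EMBEDDINGS)
print(label)  # → submachine gun
```

In this toy example "rifle" also scores fairly high (about 0.83 versus about 0.999), which hints at why semantically close weapon categories are easy to confuse without domain-specific training.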

### Objective
This notebook establishes a **baseline**: how well the model performs *before* we fine-tune it on our synthetic data. Weak or confused predictions at this stage expose the "domain gap" that our project aims to close.
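To make "how well the model performs" measurable, detections are usually matched against ground-truth boxes using Intersection-over-Union (IoU). The sketch below is one common way to do that, not the project's own evaluation code. It assumes boxes in `(x1, y1, x2, y2)` pixel format and a 0.5 IoU threshold, a widely used convention. The helper names `iou` and `is_true_positive` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_label, gt_box, gt_label, iou_thresh=0.5):
    """Correct only if the label matches AND the boxes overlap enough (assumed rule)."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= iou_thresh


# Two 10x10 boxes overlapping by half their width: 50 / 150 = 1/3.
print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # → 0.333
```

Requiring the label to match means a well-placed box with the wrong weapon type still counts as an error, which is exactly the kind of failure a zero-shot baseline is expected to show.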

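```python
from ultralytics import YOLOWorld
import cv2
import matplotlib.pyplot as plt
import config  # project-local settings (model name, prompts, data paths)

# Load the base pre-trained model (no fine-tuning yet)
model = YOLOWorld(config.MODEL_NAME)

# Define the text prompts.
# set_classes() encodes each prompt with the CLIP text encoder and uses the
# resulting embeddings as the detector's class vocabulary. This is where
# "language knowledge" enters the vision model.
prompts = config.TEXT_PROMPTS
model.set_classes(prompts)

print(f"Model loaded with prompts: {prompts}")
```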
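The notebook depends on a project-local `config` module whose contents are not shown here. The sketch below is a hypothetical `config.py` that would satisfy both code cells; every value is an assumption. The one hard requirement implied by the code is that `RAW_DATA_DIR` must be a `pathlib.Path`, since the later cell joins it with `/` and calls `.exists()`.

```python
# config.py: hypothetical contents. The notebook only shows that these three
# names exist, not their actual values.
from pathlib import Path

# Ultralytics YOLO-World checkpoint name (assumed; other sizes exist).
MODEL_NAME = "yolov8s-worldv2.pt"

# Free-text class vocabulary passed to model.set_classes() (assumed examples).
TEXT_PROMPTS = ["submachine gun", "rifle", "sniper rifle", "handgun"]

# Root folder for raw images (assumed); the notebook appends
# "Main_Dataset/Real/..." to this path.
RAW_DATA_DIR = Path("data") / "raw"
```

Keeping the prompts in one config file makes it easy to rerun this baseline with different wordings and compare the results.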

### Inference on Real Data
We will test the model on a real-world image containing the target weapon. Pay attention to the confidence scores: without domain-specific training, the model may confuse fine-grained weapon types (e.g., labeling a sniper rifle as a generic gun).

```python
# Select a sample real-world image
test_image = config.RAW_DATA_DIR / "Main_Dataset/Real/101-102.png"

if test_image.exists():
    # Run prediction with a deliberately low confidence threshold (0.10)
    # so that weak, uncertain detections stay visible in the baseline
    results = model.predict(str(test_image), conf=0.10)

    # Visualization: plot() returns an annotated BGR image; matplotlib expects RGB
    res_plotted = results[0].plot()
    plt.figure(figsize=(12, 10))
    plt.imshow(cv2.cvtColor(res_plotted, cv2.COLOR_BGR2RGB))
    plt.axis('off')
    plt.title("Zero-Shot Baseline (No Specific Training)")
    plt.show()
else:
    print(f"Test image not found: {test_image}. Check RAW_DATA_DIR in config.py")
```
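Reading confidence scores off the plot is error-prone. A small helper can list them per class instead. This is a sketch that works on plain Python lists so it runs without a model. With Ultralytics, the inputs would come from `r = results[0]` as `r.names`, `r.boxes.cls.tolist()` and `r.boxes.conf.tolist()`. The helper name `summarize_detections` and the sample values are hypothetical.

```python
from collections import defaultdict


def summarize_detections(names, class_ids, confidences, min_conf=0.0):
    """Group detection confidences by class name, highest first.

    names: dict mapping class index -> prompt string (e.g. r.names)
    class_ids, confidences: parallel lists, one entry per detected box
    """
    by_class = defaultdict(list)
    for cls_id, conf in zip(class_ids, confidences):
        if conf >= min_conf:
            by_class[names[int(cls_id)]].append(round(float(conf), 2))
    return {label: sorted(confs, reverse=True) for label, confs in by_class.items()}


# Hypothetical values shaped like Ultralytics output (class ids arrive as floats).
names = {0: "submachine gun", 1: "rifle"}
class_ids = [1.0, 0.0, 1.0]
confidences = [0.42, 0.18, 0.12]

print(summarize_detections(names, class_ids, confidences))
# → {'rifle': [0.42, 0.12], 'submachine gun': [0.18]}
print(summarize_detections(names, class_ids, confidences, min_conf=0.25))
# → {'rifle': [0.42]}
```

The second call mimics Ultralytics' default `conf=0.25`: most of these low-confidence zero-shot detections would simply disappear, which is why the baseline cell lowers the threshold to 0.10.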