## Imports:

In [1]:
import cv2
import numpy as np
from pathlib import Path
from PIL import Image
from transformers import pipeline
import time

## Previous code, reorganized into a single cell

In [3]:
# =============================================================================
# CLIP-BASED IMAGE QUALITY FILTER WITH BLUR DETECTION
# =============================================================================
# 
# PROBLEM:
# The cats vs dogs dataset contains problematic images that need to be filtered:
# - Digital artwork/cartoons (not real photos)
# - Images without animals (logos, text, flowers, fences)
# - Blank/solid color images
# - Images with text overlays
#
# SOLUTION:
# Use CLIP (Contrastive Language-Image Pre-training) for zero-shot classification
# to ask three independent questions about each image:
# 1. Is this a camera photograph or digital artwork?
# 2. Does this contain an animal or not?
# 3. Does this contain text or not?
#
# CHALLENGE:
# CLIP struggles with blurry/small images:
# - Blurry photos get misclassified as "digital artwork" (low photo score)
# - Obscured/partial animals don't get recognized (low animal score)
# - Random patterns get detected as "text" (false text detection)
#
# FIX:
# Use Laplacian variance to detect blur. Blurry images have LOW variance
# (soft edges), while sharp images and artwork have HIGH variance (crisp edges).
# Key insight: Blurry artwork is rare, but blurry photos are common.
# So if an image is blurry, we apply more lenient thresholds.
#
# =============================================================================


# -----------------------------------------------------------------------------
# SETUP: Load CLIP model (only run once)
# -----------------------------------------------------------------------------
# clip-vit-large-patch14-336 is a larger CLIP model with 336x336 input resolution
# It uses ~1.7GB VRAM and gives better accuracy than the base model
# device=0 selects the first GPU (use device=-1 to run on CPU)

classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-large-patch14-336",
    device=0
)

# -----------------------------------------------------------------------------
# THRESHOLDS (tuned through experimentation)
# -----------------------------------------------------------------------------

# Photo detection thresholds
PHOTO_THRESHOLD = 0.60          # Standard: must score >= 0.60 to be considered a real photo
PHOTO_THRESHOLD_LENIENT = 0.40  # Lenient: used for blurry images where CLIP struggles

# Animal detection thresholds  
ANIMAL_THRESHOLD = 0.50         # Standard: must score >= 0.50 to be considered containing an animal
ANIMAL_THRESHOLD_LENIENT = 0.15 # Lenient: used for blurry images (CLIP often misses obscured animals)

# Text detection threshold
TEXT_THRESHOLD = 0.35           # If text score >= 0.35, consider image as containing text

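# Blur detection threshold (Laplacian variance)
# LOW variance = blurry (soft edges), HIGH variance = sharp (crisp edges)
# Blurry real photos: typically 43 - 5141
# Sharp artwork/drawings: typically 2187 - 26156
# The two ranges overlap (2187 - 5141), so some artwork also falls below the
# threshold; the lenient photo threshold still has to reject it.
BLUR_THRESHOLD = 5500           # Below this = likely a blurry photo, not artwork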

# -----------------------------------------------------------------------------
# HELPER FUNCTION: Blur Detection
# -----------------------------------------------------------------------------
def get_blur_score(filepath):
    """
    Calculate blur score using Laplacian variance.
    
    The Laplacian operator detects edges in an image. Sharp images have many
    strong edges (high variance), while blurry images have soft edges (low variance).
    
    Returns:
        float: Variance of the Laplacian. Low = blurry, High = sharp
        
    Why this works:
    - Real photos that are blurry have soft gradients -> low Laplacian variance
    - Artwork/drawings have crisp lines and edges -> high Laplacian variance
    - This helps us distinguish "blurry real photo" from "digital artwork"
    """
    img = cv2.imread(str(filepath), cv2.IMREAD_GRAYSCALE)
    if img is None:
        # Unreadable file: 0.0 falls below BLUR_THRESHOLD, so callers treat it as blurry
        return 0.0
    return float(cv2.Laplacian(img, cv2.CV_64F).var())

# -----------------------------------------------------------------------------
# HELPER FUNCTION: CLIP Multi-Check
# -----------------------------------------------------------------------------
def check_image_multi(filepath):
    """
    Run three independent CLIP classifications on an image.
    
    Instead of one multi-class classification, we run three binary classifications.
    This gives us independent scores for each attribute:
    
    1. Photo vs Artwork: "camera photograph" vs "digital artwork"
       - High photo_score = likely a real photo taken with a camera
       - Low photo_score = likely digital art, cartoon, or illustration
       
    2. Animal vs No Animal: "an animal" vs "not an animal"  
       - High animal_score = CLIP detects an animal in the image
       - Low animal_score = no animal detected (could be obscured or wrong subject)
       
    3. Text vs No Text: "text and words" vs "no text"
       - High text_score = CLIP detects text, letters, numbers, signs
       - Low text_score = no text detected
    
    Returns:
        tuple: (photo_score, animal_score, text_score) - all floats 0.0 to 1.0
    """
    img = Image.open(filepath)
    
    # Check 1: Real photo or artwork?
    result1 = classifier(img, candidate_labels=["camera photograph", "digital artwork"])
    photo_score = result1[0]['score'] if result1[0]['label'] == "camera photograph" else 1 - result1[0]['score']
    
    # Check 2: Contains an animal?
    result2 = classifier(img, candidate_labels=["an animal", "not an animal"])
    animal_score = result2[0]['score'] if result2[0]['label'] == "an animal" else 1 - result2[0]['score']
    
    # Check 3: Contains text?
    result3 = classifier(img, candidate_labels=["text and words", "no text"])
    text_score = result3[0]['score'] if result3[0]['label'] == "text and words" else 1 - result3[0]['score']
    
    return photo_score, animal_score, text_score

# -----------------------------------------------------------------------------
# MAIN FUNCTION: Decision Logic
# -----------------------------------------------------------------------------
def should_keep_v3(photo_score, animal_score, text_score, blur_score):
    """
    Determine if an image should be kept or rejected based on CLIP scores and blur.
    
    Decision Logic:
    
    1. CLEAR KEEP: Real photo (>=0.60) with animal (>=0.50)
       -> Standard case, both scores are confident
       
    2. BLURRY PHOTO HANDLING: If blur_score < 5500 (indicates blurry image)
       -> CLIP struggles with blurry images, so we apply lenient thresholds
       -> Photo threshold: 0.40 (down from 0.60)
       -> Animal threshold: 0.15 (down from 0.50)
       -> Rationale: Blurry artwork is rare; if it's blurry, it's probably a real photo
       
    3. NOT A REAL PHOTO: Photo score < 0.40
       -> Even with lenient threshold, this is likely digital artwork
       -> REJECT
       
    4. NO ANIMAL DETECTED: Animal score < 0.50 (or < 0.15 for blurry)
       -> Check if there's text: if yes, probably a logo/sign -> REJECT
       -> If no text, still wrong subject (fence, flower, etc.) -> REJECT
    
    Args:
        photo_score: float, 0-1, how likely this is a real camera photograph
        animal_score: float, 0-1, how likely this contains an animal
        text_score: float, 0-1, how likely this contains text
        blur_score: float, Laplacian variance (low = blurry, high = sharp)
    
    Returns:
        tuple: (keep: bool, reason: str)
    """
    is_blurry = blur_score < BLUR_THRESHOLD
    
    # Rule 1: Clear keep - real photo with animal (standard thresholds)
    if photo_score >= PHOTO_THRESHOLD and animal_score >= ANIMAL_THRESHOLD:
        return True, "real photo with animal"
    
    # Rule 2: Blurry photo - be lenient on both photo and animal detection
    # CLIP struggles with blurry images, but blurry artwork is rare
    if is_blurry and photo_score >= PHOTO_THRESHOLD_LENIENT:
        # Lower animal threshold for blurry images
        if animal_score >= ANIMAL_THRESHOLD_LENIENT:
            return True, "blurry photo with animal (lenient)"
        # Below even the lenient animal threshold: reject
        if text_score >= TEXT_THRESHOLD:
            return False, "blurry photo, no animal, has text"
        return False, "blurry photo but no animal"
    
    # Rule 3: Below the lenient photo threshold (blurry or not) -> REJECT
    if photo_score < PHOTO_THRESHOLD_LENIENT:
        return False, "not a real photo"
    
    # Rule 4: Real photo but no animal detected (not blurry, so trust CLIP)
    if animal_score < ANIMAL_THRESHOLD:
        if text_score >= TEXT_THRESHOLD:
            return False, "real photo, no animal, has text"
        return False, "real photo but no animal"
    
    return False, "did not meet criteria"

# -----------------------------------------------------------------------------
# TEST DATA: Known bad images and false positives for validation
# -----------------------------------------------------------------------------

# Images that SHOULD be rejected (confirmed manually)
known_bad = [
    "50x39_10747.jpg",   # fence - real photo but no animal
    "4x4_5673.jpg",      # 4x4 pixels - digital art / garbage
    "60x60_835.jpg",     # white only - digital art / blank
    "82x159_9517.jpg",   # SAVE ALIVE label - real photo, text, no animal
    "75x80_8470.jpg",    # cat drawing - not a real photo
    "85x95_4833.jpg",    # cat drawing - not a real photo  
    "90x162_1259.jpg",   # dog drawing with text
    "99x125_9188.jpg",   # dog drawing
    "100x93_7968.jpg",   # cat mail drawing with text
    "145x39_9171.jpg",   # rose drawing - wrong subject
    "194x83_2663.jpg",   # cat with stripes drawing
    "196x33_4367.jpg",   # yahoo mail logo - text, no animal
    "88x131_11184.jpg",  # husky dog drawing - FOUND BY THIS ALGORITHM
]

# Images that SHOULD be kept (were incorrectly rejected by the old algorithm)
# Each tuple: (filename, old algorithm's reject reason, true state, notes)
false_positives = [
    ("120x90_7630.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("140x93_9589.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("142x93_7610.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("144x86_10807.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("150x97_9703.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("183x92_11263.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("200x94_3250.jpg", "no animal, has text", "real, animal, no text", "cats not showing faces"),
    ("300x94_5773.jpg", "no animal, has text", "real, animal, no text", "cats with eyes closed"),
    ("95x76_4134.jpg", "no animal, has text", "real, animal, no text", "fence in the way of dog"),
    ("96x65_3074.jpg", "no animal", "real, animal, no text", "very small and blurry dog"),
    ("96x71_8087.jpg", "no animal, has text", "real, animal, no text", "cat with human hands, blurred"),
    ("96x72_9456.jpg", "no animal, has text", "real, animal, no text", "dog face behind chain link fence"),
    ("60x33_6402.jpg", "no animal", "real, animal, no text", "very small cat head looking away"),
    ("60x70_7314.jpg", "no animal, has text", "real, animal, no text", "blurry small image with two cats"),
    ("60x39_9705.jpg", "no animal, has text", "real, animal, no text", "very blurry image"),
    ("50x50_10392.jpg", "not a real photo", "real, animal, no text", "blurry small cat head only"),
    ("60x36_5534.jpg", "not a real photo", "real, animal, no text", "very extra blurry"),
    ("60x40_4821.jpg", "not a real photo", "real, animal, no text", "very blurry"),
    ("60x41_2433.jpg", "not a real photo", "real, animal, no text", "very blurry and small"),
]

# -----------------------------------------------------------------------------
# TEST OPTIONS
# -----------------------------------------------------------------------------

# Set which tests to run
TEST_FALSE_POSITIVES = True   # Test images that should be KEPT
TEST_KNOWN_BAD = True         # Test images that should be REJECTED  
TEST_FULL_DATASET = True      # Test ALL images in the tiny_images folder

# Folder paths
tiny_folder = Path("../outputs/01_tiny_images")

# -----------------------------------------------------------------------------
# RUN TESTS
# -----------------------------------------------------------------------------

if TEST_FALSE_POSITIVES:
    print("=" * 110)
    print("TEST 1: FALSE POSITIVES (should be KEEP)")
    print("These are real photos of animals that were incorrectly rejected before blur detection")
    print("=" * 110)
    print("photo  animal  text   blur      RESULT    reason                              file")
    print("-" * 110)
    
    fp_kept = 0
    fp_rejected = 0
    for filename, algo_reason, true_state, notes in false_positives:
        f = tiny_folder / filename
        if f.exists():
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            keep, reason = should_keep_v3(photo, animal, text, blur)
            status = "KEEP" if keep else "REJECT"
            flag = "" if keep else "<- WRONG"
            if keep:
                fp_kept += 1
            else:
                fp_rejected += 1
            print(f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {status:6}    {reason:35} {filename} {flag}")
    
    print(f"\nResult: {fp_kept}/{len(false_positives)} correctly KEPT, {fp_rejected}/{len(false_positives)} still wrong")

if TEST_KNOWN_BAD:
    print("\n" + "=" * 110)
    print("TEST 2: KNOWN BAD IMAGES (should be REJECT)")
    print("These are confirmed bad images: drawings, wrong subjects, logos, blank images")
    print("=" * 110)
    print("photo  animal  text   blur      RESULT    reason                              file")
    print("-" * 110)
    
    bad_rejected = 0
    bad_kept = 0
    for filename in known_bad:
        f = tiny_folder / filename
        if f.exists():
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            keep, reason = should_keep_v3(photo, animal, text, blur)
            status = "KEEP" if keep else "REJECT"
            flag = "<- BROKEN!" if keep else ""
            if keep:
                bad_kept += 1
            else:
                bad_rejected += 1
            print(f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {status:6}    {reason:35} {filename} {flag}")
    
    print(f"\nResult: {bad_rejected}/{len(known_bad)} correctly REJECTED, {bad_kept}/{len(known_bad)} incorrectly kept")

if TEST_FULL_DATASET:
    print("\n" + "=" * 110)
    print("TEST 3: FULL TINY IMAGES DATASET")
    print("Testing all images in the tiny_images folder")
    print("=" * 110)
    
    all_tiny = list(tiny_folder.glob("*.jpg"))
    
    kept_images = []
    rejected_images = []
    
    print("photo  animal  text   blur      RESULT    reason                              file")
    print("-" * 110)
    
    for f in sorted(all_tiny):
        try:
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            keep, reason = should_keep_v3(photo, animal, text, blur)
            status = "KEEP" if keep else "REJECT"
            
            is_known_bad = f.name in known_bad
            flag = "KNOWN_BAD" if is_known_bad else ""
            
            print(f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {status:6}    {reason:35} {f.name} {flag}")
            
            if keep:
                kept_images.append((f.name, photo, animal, text, blur, reason))
            else:
                rejected_images.append((f.name, photo, animal, text, blur, reason))
        except Exception as e:
            print(f"Error processing {f.name}: {e}")
    
    print("\n" + "=" * 110)
    print("FULL DATASET SUMMARY")
    print("=" * 110)
    print(f"Total images:  {len(all_tiny)}")
    print(f"KEPT:          {len(kept_images)}")
    print(f"REJECTED:      {len(rejected_images)}")
    
    # Check if any known bad images slipped through
    kept_but_bad = [img for img in kept_images if img[0] in known_bad]
    if kept_but_bad:
        print(f"\n⚠ WARNING: {len(kept_but_bad)} known bad images were incorrectly KEPT:")
        for img in kept_but_bad:
            print(f"   {img[0]}")
    else:
        print(f"\n✓ All {len(known_bad)} known bad images were correctly rejected")

print("\n" + "=" * 110)
print("FINAL SUMMARY")
print("=" * 110)
if TEST_FALSE_POSITIVES:
    print(f"False positives (should KEEP):  {fp_kept}/{len(false_positives)} correct")
if TEST_KNOWN_BAD:
    print(f"Known bad (should REJECT):      {bad_rejected}/{len(known_bad)} correct")
if TEST_FULL_DATASET:
    print(f"Full dataset: {len(kept_images)} kept, {len(rejected_images)} rejected")

CLIPModel LOAD REPORT from: openai/clip-vit-large-patch14-336
Key                                  | Status     |  | 
-------------------------------------+------------+--+-
text_model.embeddings.position_ids   | UNEXPECTED |  | 
vision_model.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
The image processor of type `CLIPImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. 


TEST 1: FALSE POSITIVES (should be KEEP)
These are real photos of animals that were incorrectly rejected before blur detection
photo  animal  text   blur      RESULT    reason                              file
--------------------------------------------------------------------------------------------------------------
1.00   0.29    0.76     630.1   KEEP      blurry photo with animal (lenient)  120x90_7630.jpg 
0.99   0.35    0.71    1078.9   KEEP      blurry photo with animal (lenient)  140x93_9589.jpg 
0.98   0.46    0.45     632.2   KEEP      blurry photo with animal (lenient)  142x93_7610.jpg 
0.97   0.44    0.70     340.3   KEEP      blurry photo with animal (lenient)  144x86_10807.jpg 
0.97   0.43    0.71     185.8   KEEP      blurry photo with animal (lenient)  150x97_9703.jpg 
0.97   0.33    0.58    1636.1   KEEP      blurry photo with animal (lenient)  183x92_11263.jpg 
0.99   0.39    0.79      43.6   KEEP      blurry photo with animal (lenient)  200x94_3250.jpg 
0.98   0.30    0.41     222.3   KEEP      blurry photo with animal (lenient)  300x94_5773.jpg 
0.98   0.44    0.80    2083.6   KEEP      blurry photo with animal (lenient)  95x76_4134.jpg 
0.93   0.33    0.09    2705.5   KEEP      blurry photo with animal (lenient)  96x65_3074.jpg 
0.98   0.47    0.44     570.8   KEEP      blurry photo with animal (lenient)  96x71_8087.jpg 
0.91   0.41    0.69    1169.6   KEEP      blurry photo with animal (lenient)  96x72_9456.jpg 
0.73   0.16    0.22    3912.1   KEEP      blurry pho
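
Every row shown in the TEST 1 output above is kept through the lenient branch. The sketch below replays three of those rows through a condensed version of the two KEEP branches of `should_keep_v3`, then repeats the check with a hypothetical sharp blur value to show that the blur leniency is what rescues them. The helper name `keeps` and the value `10_000` are illustrative assumptions, not part of the notebook.

```python
# Thresholds copied from the cell above.
PHOTO_THRESHOLD = 0.60
PHOTO_THRESHOLD_LENIENT = 0.40
ANIMAL_THRESHOLD = 0.50
ANIMAL_THRESHOLD_LENIENT = 0.15
BLUR_THRESHOLD = 5500


def keeps(photo, animal, blur):
    """Return which KEEP branch fires ("standard" or "lenient"), or None for a reject."""
    if photo >= PHOTO_THRESHOLD and animal >= ANIMAL_THRESHOLD:
        return "standard"
    is_blurry = blur < BLUR_THRESHOLD
    if is_blurry and photo >= PHOTO_THRESHOLD_LENIENT and animal >= ANIMAL_THRESHOLD_LENIENT:
        return "lenient"
    return None


# (photo, animal, blur) values taken from three rows of the TEST 1 output.
rows = {
    "120x90_7630.jpg": (1.00, 0.29, 630.1),
    "200x94_3250.jpg": (0.99, 0.39, 43.6),
    "96x65_3074.jpg": (0.93, 0.33, 2705.5),
}

SHARP = 10_000  # hypothetical blur score above BLUR_THRESHOLD

for name, (photo, animal, blur) in rows.items():
    as_measured = keeps(photo, animal, blur)
    if_sharp = keeps(photo, animal, SHARP)
    print(f"{name}: measured blur -> {as_measured}, sharp blur -> {if_sharp}")
```

With the measured blur scores each row takes the lenient branch; with the sharp value none of them passes, because every animal score is below `ANIMAL_THRESHOLD`.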

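The blur score used above is the variance of the Laplacian response. As a rough illustration of why hard edges raise that variance, here is a pure-Python stand-in that applies the 4-neighbour Laplacian to small synthetic grids. It is only a sketch: it skips border pixels and does not call OpenCV, so its values will not match `cv2.Laplacian` exactly. The function name `laplacian_variance` and both test patterns are assumptions made for this example.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a 2-D list."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            neighbours = gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
            responses.append(neighbours - 4 * gray[y][x])
    return pvariance(responses)


size = 8
# Hard-edged checkerboard: every pixel differs from all four neighbours.
checkerboard = [[255 * ((x + y) % 2) for x in range(size)] for y in range(size)]
# Smooth horizontal ramp: intensity changes linearly, so the Laplacian is zero.
ramp = [[32 * x for x in range(size)] for _ in range(size)]

sharp_score = laplacian_variance(checkerboard)
soft_score = laplacian_variance(ramp)
print("checkerboard:", sharp_score)
print("ramp:", soft_score)
```

The checkerboard produces large responses of alternating sign, so its variance is high, while the linear ramp produces a zero response everywhere. Real blurry photos sit between these extremes, which is why the notebook tunes `BLUR_THRESHOLD` empirically.
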
## Next version: adds the new false positives and new bad examples found during the first attempt to sort the entire dataset, plus a 4th test

In [2]:
# =============================================================================
# CLIP-BASED IMAGE QUALITY FILTER WITH BLUR DETECTION
# =============================================================================
# 
# PROBLEM:
# The cats vs dogs dataset contains problematic images that need to be filtered:
# - Digital artwork/cartoons (not real photos)
# - Images without animals (logos, text, flowers, fences)
# - Blank/solid color images
# - Images with text overlays
#
# SOLUTION:
# Use CLIP (Contrastive Language-Image Pre-training) for zero-shot classification
# to ask three independent questions about each image:
# 1. Is this a camera photograph or digital artwork?
# 2. Does this contain an animal or not?
# 3. Does this contain text or not?
#
# CHALLENGE:
# CLIP struggles with blurry/small images:
# - Blurry photos get misclassified as "digital artwork" (low photo score)
# - Obscured/partial animals don't get recognized (low animal score)
# - Random patterns get detected as "text" (false text detection)
#
# FIX:
# Use Laplacian variance to detect blur. Blurry images have LOW variance
# (soft edges), while sharp images and artwork have HIGH variance (crisp edges).
# Key insight: Blurry artwork is rare, but blurry photos are common.
# So if an image is blurry, we apply more lenient thresholds.
#
# =============================================================================

import cv2
import numpy as np
from pathlib import Path
from PIL import Image
from transformers import pipeline
import time

# -----------------------------------------------------------------------------
# SETUP: Load CLIP model (only run once)
# -----------------------------------------------------------------------------
# clip-vit-large-patch14-336 is a larger CLIP model with 336x336 input resolution
# It uses ~1.7GB VRAM and gives better accuracy than the base model
# device=0 selects the first GPU (use device=-1 to run on CPU)

classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-large-patch14-336",
    device=0
)

# -----------------------------------------------------------------------------
# THRESHOLDS (tuned through experimentation)
# -----------------------------------------------------------------------------

# Photo detection thresholds
PHOTO_THRESHOLD = 0.60          # Standard: must score >= 0.60 to be considered a real photo
PHOTO_THRESHOLD_LENIENT = 0.40  # Lenient: used for blurry images where CLIP struggles

# Animal detection thresholds  
ANIMAL_THRESHOLD = 0.50         # Standard: must score >= 0.50 to be considered containing an animal
ANIMAL_THRESHOLD_LENIENT = 0.15 # Lenient: used for blurry images (CLIP often misses obscured animals)

# Text detection threshold
TEXT_THRESHOLD = 0.35           # If text score >= 0.35, consider image as containing text

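# Blur detection threshold (Laplacian variance)
# LOW variance = blurry (soft edges), HIGH variance = sharp (crisp edges)
# Blurry real photos: typically 43 - 5141
# Sharp artwork/drawings: typically 2187 - 26156
# The two ranges overlap (2187 - 5141), so some artwork also falls below the
# threshold; the lenient photo threshold still has to reject it.
BLUR_THRESHOLD = 5500           # Below this = likely a blurry photo, not artwork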

# -----------------------------------------------------------------------------
# NEW THRESHOLDS FOR V4 (added to handle multiple animals and high-confidence cases)
# -----------------------------------------------------------------------------

# High-confidence thresholds for special cases
PHOTO_VERY_HIGH = 0.95          # Very confident it's a real photo - trust even with low animal
ANIMAL_VERY_HIGH = 0.80         # Very confident there's an animal - trust even with lower photo score
ANIMAL_MINIMAL = 0.05           # Absolute minimum - just needs SOMETHING detected for very high photo

# -----------------------------------------------------------------------------
# HELPER FUNCTION: Blur Detection
# -----------------------------------------------------------------------------
def get_blur_score(filepath):
    """
    Calculate blur score using Laplacian variance.
    
    The Laplacian operator detects edges in an image. Sharp images have many
    strong edges (high variance), while blurry images have soft edges (low variance).
    
    Returns:
        float: Variance of the Laplacian. Low = blurry, High = sharp
        
    Why this works:
    - Real photos that are blurry have soft gradients -> low Laplacian variance
    - Artwork/drawings have crisp lines and edges -> high Laplacian variance
    - This helps us distinguish "blurry real photo" from "digital artwork"
    """
    img = cv2.imread(str(filepath), cv2.IMREAD_GRAYSCALE)
    if img is None:
        # Unreadable file: 0.0 falls below BLUR_THRESHOLD, so callers treat it as blurry
        return 0.0
    return float(cv2.Laplacian(img, cv2.CV_64F).var())

# -----------------------------------------------------------------------------
# HELPER FUNCTION: CLIP Multi-Check
# -----------------------------------------------------------------------------
def check_image_multi(filepath):
    """
    Run three independent CLIP classifications on an image.
    
    Instead of one multi-class classification, we run three binary classifications.
    This gives us independent scores for each attribute:
    
    1. Photo vs Artwork: "camera photograph" vs "digital artwork"
       - High photo_score = likely a real photo taken with a camera
       - Low photo_score = likely digital art, cartoon, or illustration
       
    2. Animal vs No Animal: "an animal" vs "not an animal"  
       - High animal_score = CLIP detects an animal in the image
       - Low animal_score = no animal detected (could be obscured or wrong subject)
       
    3. Text vs No Text: "text and words" vs "no text"
       - High text_score = CLIP detects text, letters, numbers, signs
       - Low text_score = no text detected
    
    Returns:
        tuple: (photo_score, animal_score, text_score) - all floats 0.0 to 1.0
    """
    img = Image.open(filepath)
    
    # Check 1: Real photo or artwork?
    result1 = classifier(img, candidate_labels=["camera photograph", "digital artwork"])
    photo_score = result1[0]['score'] if result1[0]['label'] == "camera photograph" else 1 - result1[0]['score']
    
    # Check 2: Contains an animal?
    result2 = classifier(img, candidate_labels=["an animal", "not an animal"])
    animal_score = result2[0]['score'] if result2[0]['label'] == "an animal" else 1 - result2[0]['score']
    
    # Check 3: Contains text?
    result3 = classifier(img, candidate_labels=["text and words", "no text"])
    text_score = result3[0]['score'] if result3[0]['label'] == "text and words" else 1 - result3[0]['score']
    
    return photo_score, animal_score, text_score

# -----------------------------------------------------------------------------
# NEW HELPER FUNCTION: Check for multiple animals (V4)
# -----------------------------------------------------------------------------
def check_multiple_animals(filepath):
    """
    Secondary CLIP check specifically for multiple animals.
    
    Use when standard "an animal" check fails but photo looks real.
    The singular "an animal" prompt often fails when there are multiple
    cats/dogs in the image, cats behind bars, or cats held by humans.
    
    Returns:
        tuple: (multiple_score, single_score) - scores for multiple vs single animal
    """
    img = Image.open(filepath)
    result = classifier(img, candidate_labels=["multiple cats or dogs", "one cat or dog", "no animals"])
    
    # Find scores for each label
    scores = {r['label']: r['score'] for r in result}
    multiple_score = scores.get("multiple cats or dogs", 0)
    single_score = scores.get("one cat or dog", 0)
    
    return multiple_score, single_score

# -----------------------------------------------------------------------------
# MAIN FUNCTION: Decision Logic (V3 - Original)
# -----------------------------------------------------------------------------
def should_keep_v3(photo_score, animal_score, text_score, blur_score):
    """
    Determine if an image should be kept or rejected based on CLIP scores and blur.
    
    Decision Logic:
    
    1. CLEAR KEEP: Real photo (>=0.60) with animal (>=0.50)
       -> Standard case, both scores are confident
       
    2. BLURRY PHOTO HANDLING: If blur_score < 5500 (indicates blurry image)
       -> CLIP struggles with blurry images, so we apply lenient thresholds
       -> Photo threshold: 0.40 (down from 0.60)
       -> Animal threshold: 0.15 (down from 0.50)
       -> Rationale: Blurry artwork is rare; if it's blurry, it's probably a real photo
       
    3. NOT A REAL PHOTO: Photo score < 0.40
       -> Even with lenient threshold, this is likely digital artwork
       -> REJECT
       
    4. NO ANIMAL DETECTED: Animal score < 0.50 (or < 0.15 for blurry)
       -> Check if there's text: if yes, probably a logo/sign -> REJECT
       -> If no text, still wrong subject (fence, flower, etc.) -> REJECT
    
    Args:
        photo_score: float, 0-1, how likely this is a real camera photograph
        animal_score: float, 0-1, how likely this contains an animal
        text_score: float, 0-1, how likely this contains text
        blur_score: float, Laplacian variance (low = blurry, high = sharp)
    
    Returns:
        tuple: (keep: bool, reason: str)
    """
    is_blurry = blur_score < BLUR_THRESHOLD
    
    # Rule 1: Clear keep - real photo with animal (standard thresholds)
    if photo_score >= PHOTO_THRESHOLD and animal_score >= ANIMAL_THRESHOLD:
        return True, "real photo with animal"
    
    # Rule 2: Blurry photo - be lenient on both photo and animal detection
    # CLIP struggles with blurry images, but blurry artwork is rare
    if is_blurry and photo_score >= PHOTO_THRESHOLD_LENIENT:
        # Lower animal threshold for blurry images
        if animal_score >= ANIMAL_THRESHOLD_LENIENT:
            return True, "blurry photo with animal (lenient)"
        # Below even the lenient animal threshold: reject
        if text_score >= TEXT_THRESHOLD:
            return False, "blurry photo, no animal, has text"
        return False, "blurry photo but no animal"
    
    # Rule 3: Below the lenient photo threshold (blurry or not) -> REJECT
    if photo_score < PHOTO_THRESHOLD_LENIENT:
        return False, "not a real photo"
    
    # Rule 4: Real photo but no animal detected (not blurry, so trust CLIP)
    if animal_score < ANIMAL_THRESHOLD:
        if text_score >= TEXT_THRESHOLD:
            return False, "real photo, no animal, has text"
        return False, "real photo but no animal"
    
    return False, "did not meet criteria"

# -----------------------------------------------------------------------------
# MAIN FUNCTION: Decision Logic (V4 - Enhanced with multiple animal detection)
# -----------------------------------------------------------------------------
def should_keep_v4(photo_score, animal_score, text_score, blur_score, filepath=None):
    """
    Enhanced decision logic with three new rules for handling edge cases.
    
    NEW RULES (V4 additions):
    
    1. HIGH ANIMAL OVERRIDE: If animal_score >= 0.80, keep it even with lower photo scores
       -> Catches real photos with unusual poses/backgrounds (e.g., yawning cat, solid background)
       -> Rationale: CLIP is very confident there's an animal, so trust it
       
    2. VERY HIGH PHOTO + LOW ANIMAL: If photo >= 0.95 but animal < 0.50,
       -> Use minimal animal threshold (0.05) - if there's ANY animal signal, keep it
       -> Optionally do secondary check for "multiple cats or dogs"
       -> Catches group photos that fail singular "an animal" prompt
       
    3. MINIMAL ANIMAL FOR PERFECT PHOTOS: If photo >= 0.98 and animal > 0.05,
       -> Keep it (CLIP is absolutely certain it's a real photo)
       -> Only reject if there's high text AND very low animal (likely a sign/logo)
    
    ORIGINAL RULES (from V3):
    - Standard keep: photo >= 0.60 AND animal >= 0.50
    - Blurry handling: Lower thresholds for blurry images
    - Reject artwork: photo < 0.40
    - Reject no animal: animal < threshold without special conditions
    
    Args:
        photo_score: float, 0-1, how likely this is a real camera photograph
        animal_score: float, 0-1, how likely this contains an animal
        text_score: float, 0-1, how likely this contains text
        blur_score: float, Laplacian variance (low = blurry, high = sharp)
        filepath: Path, optional - needed for secondary multiple animal check
    
    Returns:
        tuple: (keep: bool, reason: str)
    """
    is_blurry = blur_score < BLUR_THRESHOLD
    
    # ===========================================
    # NEW RULE 1: High animal score override
    # If CLIP is very confident there's an animal (>=0.80), trust it
    # even if photo score is borderline (catches unusual poses/backgrounds)
    # ===========================================
    if animal_score >= ANIMAL_VERY_HIGH:
        if photo_score >= PHOTO_THRESHOLD_LENIENT:  # At least 0.40
            return True, "high animal confidence"
        # Even below 0.40, if it's blurry, still trust high animal score
        if is_blurry and photo_score >= 0.25:
            return True, "blurry but high animal confidence"
    
    # ===========================================
    # ORIGINAL RULE 1: Clear keep - real photo with animal (standard thresholds)
    # ===========================================
    if photo_score >= PHOTO_THRESHOLD and animal_score >= ANIMAL_THRESHOLD:
        return True, "real photo with animal"
    
    # ===========================================
    # NEW RULE 2: Very high photo confidence (>=0.95)
    # When CLIP is >95% sure it's a real photo, use very lenient animal thresholds
    # This catches multiple animals, cats behind bars, cats held by humans
    # ===========================================
    if photo_score >= PHOTO_VERY_HIGH:
        # If there's SOME animal detection (>0.05), likely keep it
        if animal_score >= ANIMAL_MINIMAL:
            # Exception: high text AND very low animal = probably a sign/logo with incidental pattern
            if text_score >= 0.70 and animal_score < 0.20:
                return False, "very high photo but mostly text, minimal animal"
            return True, "very high photo confidence with some animal"
        
        # If animal score is near zero, try secondary check for multiple animals
        if filepath and animal_score < ANIMAL_MINIMAL:
            multiple_score, single_score = check_multiple_animals(filepath)
            if multiple_score > 0.30 or single_score > 0.30:
                return True, "multiple animals detected"
    
    # ===========================================
    # ORIGINAL RULE 2: Blurry photo - be lenient on both photo and animal detection
    # CLIP struggles with blurry images, but blurry artwork is rare
    # ===========================================
    if is_blurry:
        # If photo score is borderline but it's blurry, trust it's a real photo
        if photo_score >= PHOTO_THRESHOLD_LENIENT:
            # Lower animal threshold for blurry images
            if animal_score >= ANIMAL_THRESHOLD_LENIENT:
                return True, "blurry photo with animal (lenient)"
            # Still reject if the animal score is below even the lenient threshold
            if text_score >= TEXT_THRESHOLD:
                return False, "blurry photo, no animal, has text"
            return False, "blurry photo but no animal"
    
    # ===========================================
    # ORIGINAL RULE 3: Not blurry, not a real photo -> REJECT
    # ===========================================
    if photo_score < PHOTO_THRESHOLD_LENIENT:
        return False, "not a real photo"
    
    # ===========================================
    # ORIGINAL RULE 4: Real photo but no animal detected (not blurry, so trust CLIP)
    # ===========================================
    if animal_score < ANIMAL_THRESHOLD:
        if text_score >= TEXT_THRESHOLD:
            return False, "real photo, no animal, has text"
        return False, "real photo but no animal"
    
    return False, "did not meet criteria"

# -----------------------------------------------------------------------------
# TEST DATA: Known bad images and false positives for validation
# -----------------------------------------------------------------------------

# Images that SHOULD be rejected (confirmed manually)
known_bad = [
    "50x39_10747.jpg",   # fence - real photo but no animal
    "4x4_5673.jpg",      # 4x4 pixels - digital art / garbage
    "60x60_835.jpg",     # white only - digital art / blank
    "82x159_9517.jpg",   # SAVE ALIVE label - real photo, text, no animal
    "75x80_8470.jpg",    # cat drawing - not a real photo
    "85x95_4833.jpg",    # cat drawing - not a real photo  
    "90x162_1259.jpg",   # dog drawing with text
    "99x125_9188.jpg",   # dog drawing
    "100x93_7968.jpg",   # cat mail drawing with text
    "145x39_9171.jpg",   # rose drawing - wrong subject
    "194x83_2663.jpg",   # cat with stripes drawing
    "196x33_4367.jpg",   # yahoo mail logo - text, no animal
    "88x131_11184.jpg",  # husky dog drawing - FOUND BY THIS ALGORITHM
]

# Images that SHOULD be kept (were incorrectly rejected by old algorithm)
false_positives = [
    ("120x90_7630.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("140x93_9589.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("142x93_7610.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("144x86_10807.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("150x97_9703.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("183x92_11263.jpg", "no animal, has text", "real, animal, no text", "animal not recognized"),
    ("200x94_3250.jpg", "no animal, has text", "real, animal, no text", "cats not showing faces"),
    ("300x94_5773.jpg", "no animal, has text", "real, animal, no text", "cats with eyes closed"),
    ("95x76_4134.jpg", "no animal, has text", "real, animal, no text", "fence in the way of dog"),
    ("96x65_3074.jpg", "no animal", "real, animal, no text", "very small and blurry dog"),
    ("96x71_8087.jpg", "no animal, has text", "real, animal, no text", "cat with human hands, blurred"),
    ("96x72_9456.jpg", "no animal, has text", "real, animal, no text", "dog face behind chain link fence"),
    ("60x33_6402.jpg", "no animal", "real, animal, no text", "very small cat head looking away"),
    ("60x70_7314.jpg", "no animal, has text", "real, animal, no text", "blurry small image with two cats"),
    ("60x39_9705.jpg", "no animal, has text", "real, animal, no text", "very blurry image"),
    ("50x50_10392.jpg", "not a real photo", "real, animal, no text", "blurry small cat head only"),
    ("60x36_5534.jpg", "not a real photo", "real, animal, no text", "very extra blurry"),
    ("60x40_4821.jpg", "not a real photo", "real, animal, no text", "very blurry"),
    ("60x41_2433.jpg", "not a real photo", "real, animal, no text", "very blurry and small"),
]

# -----------------------------------------------------------------------------
# NEW TEST DATA (V4): Images from full dataset scan
# -----------------------------------------------------------------------------

# Images from full Cat folder that SHOULD be KEPT (false positives to fix)
# These are real photos with animals that V3 incorrectly rejected
# Pattern: mostly multiple animals, cats behind bars, cats held by humans
new_false_positives = [
    # filename, v3_reason, true_state, notes
    ("10607.jpg", "not a real photo", "real, animal", "yawning cat unusual pose"),
    ("11197.jpg", "blurry photo, no animal, has text", "real, animal", "4 kittens held by human"),
    ("11282.jpg", "blurry photo, no animal, has text", "real, animal", "2 cats in cage"),
    ("11345.jpg", "blurry photo, no animal, has text", "real, animal", "3 kittens"),
    ("11786.jpg", "blurry photo, no animal, has text", "real, animal", "3 black cats"),
    ("1210.jpg", "blurry photo but no animal", "real, animal", "extremely blurry 2 kittens"),
    ("1429.jpg", "blurry photo, no animal, has text", "real, animal", "kittens in basket"),
    ("1885.jpg", "real photo, no animal, has text", "real, animal", "cat in cage"),
    ("2337.jpg", "blurry photo but no animal", "real, animal", "people holding cats"),
    ("3182.jpg", "blurry photo, no animal, has text", "real, animal", "many kittens"),
    ("3220.jpg", "blurry photo, no animal, has text", "real, animal", "many kittens"),
    ("3469.jpg", "real photo, no animal, has text", "real, animal", "3 cats together"),
    ("3500.jpg", "not a real photo", "real, animal", "cat with solid background"),
    ("3504.jpg", "blurry photo but no animal", "real, animal", "2 cats"),
    ("3632.jpg", "blurry photo, no animal, has text", "real, animal", "2 cats"),
]

# Images from full Cat folder that SHOULD be REJECTED (true negatives - correctly rejected)
# These are genuinely bad images: drawings, photoshopped, edited, or ambiguous
new_true_negatives = [
    # filename, notes
    ("10029.jpg", "bad quality photo - accurately rejected"),
    ("10035.jpg", "cat with glasses - fake/edited"),
    ("10827.jpg", "only paws showing - ambiguous, can't tell animal"),
    ("11484.jpg", "drawing of a cat"),
    ("1600.jpg", "photoshopped cat on unrealistic background"),
    ("2013.jpg", "thought bubble above cat head - edited"),
    ("2663.jpg", "cat with lines through it - drawing"),
    ("2939.jpg", "drawing"),
    ("3016.jpg", "photoshopped cat on background"),
    ("354.jpg", "photo realistic but likely fake"),
]

# -----------------------------------------------------------------------------
# TEST OPTIONS
# -----------------------------------------------------------------------------

# Set which tests to run
TEST_FALSE_POSITIVES = True   # Test images that should be KEPT (tiny_images)
TEST_KNOWN_BAD = True         # Test images that should be REJECTED (tiny_images)
TEST_FULL_DATASET = True      # Test ALL images in the tiny_images folder
TEST_NEW_DATASET = True       # Test NEW images from full Cat folder (V4 test)

# Folder paths
tiny_folder = Path("../outputs/01_tiny_images")
cat_folder = Path("../data/PetImages/Cat")  # Full dataset Cat folder

# -----------------------------------------------------------------------------
# RUN TESTS
# -----------------------------------------------------------------------------

if TEST_FALSE_POSITIVES:
    print("=" * 120)
    print("TEST 1: FALSE POSITIVES - TINY IMAGES (should be KEEP)")
    print("These are real photos of animals that were incorrectly rejected before blur detection")
    print("Testing with V4 logic")
    print("=" * 120)
    print("photo  animal  text   blur      RESULT    reason                                       file")
    print("-" * 120)
    
    fp_kept = 0
    fp_rejected = 0
    for filename, algo_reason, true_state, notes in false_positives:
        f = tiny_folder / filename
        if f.exists():
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            keep, reason = should_keep_v4(photo, animal, text, blur, f)
            status = "KEEP" if keep else "REJECT"
            flag = "" if keep else "<- WRONG"
            if keep:
                fp_kept += 1
            else:
                fp_rejected += 1
            print(f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {status:6}    {reason:40} {filename} {flag}")
        else:
            # Report missing files so the totals below are not silently skewed
            print(f"NOT FOUND: {f}")
    
    print(f"\nResult: {fp_kept}/{len(false_positives)} correctly KEPT, {fp_rejected}/{len(false_positives)} still wrong")

if TEST_KNOWN_BAD:
    print("\n" + "=" * 120)
    print("TEST 2: KNOWN BAD IMAGES - TINY IMAGES (should be REJECT)")
    print("These are confirmed bad images: drawings, wrong subjects, logos, blank images")
    print("Testing with V4 logic")
    print("=" * 120)
    print("photo  animal  text   blur      RESULT    reason                                       file")
    print("-" * 120)
    
    bad_rejected = 0
    bad_kept = 0
    for filename in known_bad:
        f = tiny_folder / filename
        if f.exists():
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            keep, reason = should_keep_v4(photo, animal, text, blur, f)
            status = "KEEP" if keep else "REJECT"
            flag = "<- BROKEN!" if keep else ""
            if keep:
                bad_kept += 1
            else:
                bad_rejected += 1
            print(f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {status:6}    {reason:40} {filename} {flag}")
        else:
            # Report missing files so the totals below are not silently skewed
            print(f"NOT FOUND: {f}")
    
    print(f"\nResult: {bad_rejected}/{len(known_bad)} correctly REJECTED, {bad_kept}/{len(known_bad)} incorrectly kept")

if TEST_FULL_DATASET:
    print("\n" + "=" * 120)
    print("TEST 3: FULL TINY IMAGES DATASET")
    print("Testing all images in the tiny_images folder with V4 logic")
    print("=" * 120)
    
    all_tiny = list(tiny_folder.glob("*.jpg"))
    
    kept_images = []
    rejected_images = []
    
    print("photo  animal  text   blur      RESULT    reason                                       file")
    print("-" * 120)
    
    for f in sorted(all_tiny):
        try:
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            keep, reason = should_keep_v4(photo, animal, text, blur, f)
            status = "KEEP" if keep else "REJECT"
            
            is_known_bad = f.name in known_bad
            flag = "KNOWN_BAD" if is_known_bad else ""
            
            print(f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {status:6}    {reason:40} {f.name} {flag}")
            
            if keep:
                kept_images.append((f.name, photo, animal, text, blur, reason))
            else:
                rejected_images.append((f.name, photo, animal, text, blur, reason))
        except Exception as e:
            print(f"Error processing {f.name}: {e}")
    
    print("\n" + "=" * 120)
    print("FULL DATASET SUMMARY")
    print("=" * 120)
    print(f"Total images:  {len(all_tiny)}")
    print(f"KEPT:          {len(kept_images)}")
    print(f"REJECTED:      {len(rejected_images)}")
    
    # Check if any known bad images slipped through
    kept_but_bad = [img for img in kept_images if img[0] in known_bad]
    if kept_but_bad:
        print(f"\n⚠ WARNING: {len(kept_but_bad)} known bad images were incorrectly KEPT:")
        for img in kept_but_bad:
            print(f"   {img[0]}")
    else:
        print(f"\n✓ All {len(known_bad)} known bad images were correctly rejected")

# -----------------------------------------------------------------------------
# NEW TEST 4: Full Cat folder images (V4 specific test)
# -----------------------------------------------------------------------------
if TEST_NEW_DATASET:
    print("\n" + "=" * 120)
    print("TEST 4A: NEW FALSE POSITIVES - FULL CAT FOLDER (should be KEEP)")
    print("These are real photos with multiple animals, cats behind bars, unusual poses")
    print("Testing with V4 logic - these failed in V3")
    print("=" * 120)
    print("photo  animal  text   blur      RESULT    reason                                       file                 notes")
    print("-" * 120)
    
    new_fp_kept = 0
    new_fp_rejected = 0
    for filename, v3_reason, true_state, notes in new_false_positives:
        f = cat_folder / filename
        if f.exists():
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            keep, reason = should_keep_v4(photo, animal, text, blur, f)
            status = "KEEP" if keep else "REJECT"
            flag = "" if keep else "<- WRONG"
            if keep:
                new_fp_kept += 1
            else:
                new_fp_rejected += 1
            print(f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {status:6}    {reason:40} {filename:20} {notes} {flag}")
        else:
            print(f"NOT FOUND: {f}")
    
    print(f"\nResult: {new_fp_kept}/{len(new_false_positives)} correctly KEPT, {new_fp_rejected}/{len(new_false_positives)} still wrong")

    print("\n" + "=" * 120)
    print("TEST 4B: NEW TRUE NEGATIVES - FULL CAT FOLDER (should be REJECT)")
    print("These are confirmed bad images: drawings, photoshopped, edited")
    print("Testing with V4 logic - these should STILL be rejected")
    print("=" * 120)
    print("photo  animal  text   blur      RESULT    reason                                       file                 notes")
    print("-" * 120)
    
    new_tn_rejected = 0
    new_tn_kept = 0
    for filename, notes in new_true_negatives:
        f = cat_folder / filename
        if f.exists():
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            keep, reason = should_keep_v4(photo, animal, text, blur, f)
            status = "KEEP" if keep else "REJECT"
            flag = "<- BROKEN!" if keep else ""
            if keep:
                new_tn_kept += 1
            else:
                new_tn_rejected += 1
            print(f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {status:6}    {reason:40} {filename:20} {notes} {flag}")
        else:
            print(f"NOT FOUND: {f}")
    
    print(f"\nResult: {new_tn_rejected}/{len(new_true_negatives)} correctly REJECTED, {new_tn_kept}/{len(new_true_negatives)} incorrectly kept")

# -----------------------------------------------------------------------------
# FINAL SUMMARY
# -----------------------------------------------------------------------------
print("\n" + "=" * 120)
print("FINAL SUMMARY (V4 Algorithm)")
print("=" * 120)
if TEST_FALSE_POSITIVES:
    print(f"Test 1 - Tiny false positives (should KEEP):     {fp_kept}/{len(false_positives)} correct")
if TEST_KNOWN_BAD:
    print(f"Test 2 - Tiny known bad (should REJECT):         {bad_rejected}/{len(known_bad)} correct")
if TEST_FULL_DATASET:
    print(f"Test 3 - Full tiny dataset:                      {len(kept_images)} kept, {len(rejected_images)} rejected")
if TEST_NEW_DATASET:
    print(f"Test 4A - New false positives (should KEEP):     {new_fp_kept}/{len(new_false_positives)} correct")
    print(f"Test 4B - New true negatives (should REJECT):    {new_tn_rejected}/{len(new_true_negatives)} correct")

print("\n" + "-" * 120)
print("V4 CHANGES SUMMARY:")
print("-" * 120)
print("1. HIGH ANIMAL OVERRIDE: animal >= 0.80 with photo >= 0.40 -> KEEP")
print("   (catches unusual poses like yawning cat, solid backgrounds)")
print("2. VERY HIGH PHOTO: photo >= 0.95 with animal >= 0.05 -> KEEP")
print("   (catches multiple animals, cats behind bars, cats held by humans)")
print("3. MULTIPLE ANIMAL FALLBACK: secondary CLIP check for 'multiple cats or dogs'")
print("   (when photo is very high but animal detection fails)")
print("=" * 120)

CLIPModel LOAD REPORT from: openai/clip-vit-large-patch14-336
Key                                  | Status     |  | 
-------------------------------------+------------+--+-
text_model.embeddings.position_ids   | UNEXPECTED |  | 
vision_model.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
The image processor of type `CLIPImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. 


TEST 1: FALSE POSITIVES - TINY IMAGES (should be KEEP)
These are real photos of animals that were incorrectly rejected before blur detection
Testing with V4 logic
photo  animal  text   blur      RESULT    reason                                       file
------------------------------------------------------------------------------------------------------------------------
1.00   0.29    0.76     630.1   KEEP      very high photo confidence with some animal 120x90_7630.jpg 
0.99   0.35    0.71    1078.9   KEEP      very high photo confidence with some animal 140x93_9589.jpg 
0.98   0.46    0.45     632.2   KEEP      very high photo confidence with some animal 142x93_7610.jpg 
0.97   0.44    0.70     340.3   KEEP      very high photo confidence with some animal 144x86_10807.jpg 
0.97   0.43    0.71     185.8   KEEP      very high photo confidence with some animal 150x97_9703.jpg 
0.97   0.33    0.58    1636.1   KEEP      very high photo confidence with some animal 183x92_11263.jpg 
0.99   0.39    0.79      43.6   KEEP      very high photo confidence with some animal 200x94_3250.jpg 
0.98   0.30    0.41     222.3   KEEP      very high photo confidence with some animal 300x94_5773.jpg 
0.98   0.44    0.80    2083.6   KEEP      very high photo confidence with some animal 95x76_4134.jpg 
0.93   0.33    0.09    2705.5   KEEP      blurry photo with animal (lenient)       96x65_3074.jpg 
0.98   0.47    0.44     570.8   KEEP      very high photo confidence with some animal 96x71_8087.jpg 
0.91   0.41    0.69    1169.6   KEEP      blurry photo with animal (lenient) 

## As can be seen above, this was unsuccessful: I need to update the logic so that some of the false positives and truly bad examples are labelled correctly

1. High Animal Override Now Requires a Very Low Blur Score (i.e., a Very Blurry Image)
Old: If animal ≥ 0.80 and photo ≥ 0.40, keep it regardless of blur
New: If animal ≥ 0.80 AND blur < 500 (with photo ≥ 0.25), then keep it
The critical insight is that drawings of animals also score high on animal detection but are NOT blurry. In the images checked so far, real blurry photos have a blur score (Laplacian variance) below 500, while drawings and edited images score above 1500. The old version incorrectly kept sharp drawings because they had high animal scores.
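
As a rough illustration of the change described above, the sketch below compares the old and new override on two made-up score sets. The function names and the example scores are hypothetical and are not measurements from the dataset. The thresholds (0.80, 0.40, 500) come from the note above, and the 0.25 photo floor is assumed from the rescue rule thresholds defined in the next cell.

```python
# Thresholds from the note above; names are illustrative only.
ANIMAL_MIN = 0.80        # "high animal confidence"
OLD_PHOTO_MIN = 0.40     # old rule: borderline photo score was enough
NEW_PHOTO_MIN = 0.25     # new rule: only some photo signal is needed (assumed)
VERY_BLURRY_MAX = 500    # Laplacian variance below this counts as very blurry


def old_override(photo, animal, blur):
    """Old rule: high animal score plus borderline photo score, blur ignored."""
    return animal >= ANIMAL_MIN and photo >= OLD_PHOTO_MIN


def new_override(photo, animal, blur):
    """New rule: same animal bar, but only applied to very blurry images."""
    return (
        animal >= ANIMAL_MIN
        and photo >= NEW_PHOTO_MIN
        and blur < VERY_BLURRY_MAX
    )


# Hypothetical scores chosen to show the difference, not real measurements.
sharp_drawing = {"photo": 0.45, "animal": 0.90, "blur": 2000.0}
blurry_photo = {"photo": 0.30, "animal": 0.85, "blur": 300.0}

for name, scores in [("sharp drawing", sharp_drawing), ("blurry photo", blurry_photo)]:
    print(name, "| old:", old_override(**scores), "| new:", new_override(**scores))
```

With these example values, the old rule keeps the sharp drawing and the new rule does not, while the blurry photo is only kept by the new rule. Whether the 500 cutoff separates the two groups on the full dataset still has to be checked against the test lists.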


In [2]:
# =============================================================================
# CLIP-BASED IMAGE QUALITY FILTER V3+ (V3 with Rescue Rules)
# =============================================================================
# 
# DESIGN PRINCIPLE:
# If V3 keeps it → V3+ keeps it. Always.
# Rescue rules can ONLY save images that V3 would reject.
# This ensures we never break working cases while improving recall.
#
# RESCUE RULES:
# 1. Very blurry + high animal: blur < 500, animal >= 0.80, photo >= 0.25
#    - Catches: yawning cats, unusual poses where CLIP says "not a real photo"
#    - Why blur < 500: Drawings have sharp edges (blur > 1500), only trust on VERY blurry
#
# 2. Multiple animals fallback: When animal score is low but photo is high
#    - Catches: Group photos where "an animal" prompt fails
#    - Secondary CLIP check for "multiple animals"
#
# =============================================================================

import cv2
import numpy as np
from pathlib import Path
from PIL import Image
from transformers import pipeline
import time

# -----------------------------------------------------------------------------
# SETUP: Load CLIP model (only run once)
# -----------------------------------------------------------------------------
classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-large-patch14-336",
    device=0
)

# -----------------------------------------------------------------------------
# V3 THRESHOLDS (unchanged - these are the proven values)
# -----------------------------------------------------------------------------
PHOTO_THRESHOLD = 0.60          # Standard: must score >= 0.60 to be considered a real photo
PHOTO_THRESHOLD_LENIENT = 0.40  # Lenient: used for blurry images where CLIP struggles
ANIMAL_THRESHOLD = 0.50         # Standard: must score >= 0.50 to be considered containing an animal
ANIMAL_THRESHOLD_LENIENT = 0.15 # Lenient: used for blurry images
TEXT_THRESHOLD = 0.35           # If text score >= 0.35, consider image as containing text
BLUR_THRESHOLD = 5500           # Below this = likely a blurry photo, not artwork

# -----------------------------------------------------------------------------
# RESCUE RULE THRESHOLDS (new for V3+)
# -----------------------------------------------------------------------------
# Rescue Rule 1: Very blurry + high animal
RESCUE1_BLUR_MAX = 500          # Must be VERY blurry (real photos only)
RESCUE1_ANIMAL_MIN = 0.80       # Must have high animal confidence
RESCUE1_PHOTO_MIN = 0.25        # Must have SOME photo signal (not pure garbage)

# Rescue Rule 2: Multiple animals fallback
RESCUE2_PHOTO_MIN = 0.90        # Only try multiple check when photo is high
RESCUE2_ANIMAL_MAX = 0.50       # Only when standard animal check failed
RESCUE2_MULTIPLE_MIN = 0.30     # Threshold for multiple animals detection

# -----------------------------------------------------------------------------
# HELPER FUNCTIONS
# -----------------------------------------------------------------------------
def get_blur_score(filepath):
    """Calculate blur score using Laplacian variance. Low = blurry, High = sharp."""
    img = cv2.imread(str(filepath), cv2.IMREAD_GRAYSCALE)
    if img is None:
        # cv2 could not decode the file; note that 0 is treated as "very blurry" downstream
        return 0.0
    return cv2.Laplacian(img, cv2.CV_64F).var()


def check_image_multi(filepath):
    """Run three independent CLIP classifications: photo, animal, text."""
    img = Image.open(filepath)
    
    # Check 1: Real photo or artwork?
    result1 = classifier(img, candidate_labels=["camera photograph", "digital artwork"])
    photo_score = result1[0]['score'] if result1[0]['label'] == "camera photograph" else 1 - result1[0]['score']
    
    # Check 2: Contains an animal?
    result2 = classifier(img, candidate_labels=["an animal", "not an animal"])
    animal_score = result2[0]['score'] if result2[0]['label'] == "an animal" else 1 - result2[0]['score']
    
    # Check 3: Contains text?
    result3 = classifier(img, candidate_labels=["text and words", "no text"])
    text_score = result3[0]['score'] if result3[0]['label'] == "text and words" else 1 - result3[0]['score']
    
    return photo_score, animal_score, text_score


def check_multiple_animals(filepath):
    """
    Secondary CLIP check for multiple animals.
    
    Use when standard "an animal" check fails but image looks like a real photo.
    The singular "an animal" prompt often fails when there are multiple cats/dogs.
    
    Returns:
        tuple: (multiple_score, single_score) - scores for detecting animals
    """
    img = Image.open(filepath)
    result = classifier(img, candidate_labels=["multiple animals", "one animal", "no animals"])
    
    scores = {r['label']: r['score'] for r in result}
    multiple_score = scores.get("multiple animals", 0)
    single_score = scores.get("one animal", 0)
    
    return multiple_score, single_score

# -----------------------------------------------------------------------------
# V3 DECISION LOGIC (unchanged - this is the baseline)
# -----------------------------------------------------------------------------
def should_keep_v3(photo_score, animal_score, text_score, blur_score):
    """
    Original V3 decision logic. Returns (keep, reason).
    This function is NEVER modified - it's our stable baseline.
    """
    is_blurry = blur_score < BLUR_THRESHOLD
    
    # Rule 1: Clear keep - real photo with animal (standard thresholds)
    if photo_score >= PHOTO_THRESHOLD and animal_score >= ANIMAL_THRESHOLD:
        return True, "real photo with animal"
    
    # Rule 2: Blurry photo - be lenient on both photo and animal detection
    if is_blurry:
        if photo_score >= PHOTO_THRESHOLD_LENIENT:
            if animal_score >= ANIMAL_THRESHOLD_LENIENT:
                return True, "blurry photo with animal (lenient)"
            if animal_score < ANIMAL_THRESHOLD_LENIENT:
                if text_score >= TEXT_THRESHOLD:
                    return False, "blurry photo, no animal, has text"
                return False, "blurry photo but no animal"
    
    # Rule 3: Not blurry, not a real photo -> REJECT
    if photo_score < PHOTO_THRESHOLD_LENIENT:
        return False, "not a real photo"
    
    # Rule 4: Real photo but no animal detected
    if animal_score < ANIMAL_THRESHOLD:
        if text_score >= TEXT_THRESHOLD:
            return False, "real photo, no animal, has text"
        return False, "real photo but no animal"
    
    return False, "did not meet criteria"

# -----------------------------------------------------------------------------
# V3+ DECISION LOGIC (V3 + rescue rules)
# -----------------------------------------------------------------------------
def should_keep_v3plus(photo_score, animal_score, text_score, blur_score, filepath=None):
    """
    V3+ decision logic: V3 baseline + rescue rules for rejected images.
    
    PRINCIPLE: If V3 keeps it → V3+ keeps it. Always.
    Rescue rules can ONLY save images that V3 would reject.
    
    RESCUE RULES:
    1. Very blurry + high animal: blur < 500, animal >= 0.80, photo >= 0.25
    2. Multiple animals fallback: photo >= 0.90, animal < 0.50 → check for multiple
    
    Args:
        photo_score, animal_score, text_score: CLIP scores (0-1)
        blur_score: Laplacian variance (low = blurry)
        filepath: Path to image (needed for multiple animals check)
    
    Returns:
        tuple: (keep: bool, reason: str)
    """
    # ===========================================
    # STEP 1: Run V3 first
    # If V3 keeps it, we keep it. Period.
    # ===========================================
    v3_keep, v3_reason = should_keep_v3(photo_score, animal_score, text_score, blur_score)
    
    if v3_keep:
        return True, v3_reason
    
    # ===========================================
    # STEP 2: V3 rejected it. Try rescue rules.
    # ===========================================
    
    # -----------------------------------------
    # RESCUE RULE 1: Very blurry + high animal
    # -----------------------------------------
    # Catches: yawning cats, unusual poses, solid backgrounds
    # These get low photo scores but are real photos
    # 
    # Key insight: Drawings have sharp edges (blur > 1500)
    # Only trust "high animal + low photo" when VERY blurry (blur < 500)
    # -----------------------------------------
    if blur_score < RESCUE1_BLUR_MAX:
        if animal_score >= RESCUE1_ANIMAL_MIN:
            if photo_score >= RESCUE1_PHOTO_MIN:
                return True, "RESCUED: very blurry with high animal confidence"
    
    # -----------------------------------------
    # RESCUE RULE 2: Multiple animals fallback
    # -----------------------------------------
    # Catches: Group photos (multiple kittens, cats in cage, etc.)
    # CLIP's "an animal" prompt fails with multiple animals
    #
    # Only try when:
    # - Photo score is high (likely a real photo)
    # - Animal score is low (standard check failed)
    # - We have a filepath to do the secondary check
    # -----------------------------------------
    if filepath is not None:
        if photo_score >= RESCUE2_PHOTO_MIN and animal_score < RESCUE2_ANIMAL_MAX:
            multiple_score, single_score = check_multiple_animals(filepath)
            
            # If multiple animals check finds something, rescue the image
            if multiple_score >= RESCUE2_MULTIPLE_MIN or single_score >= RESCUE2_MULTIPLE_MIN:
                return True, f"RESCUED: multiple animals detected (multi={multiple_score:.2f}, single={single_score:.2f})"
    
    # ===========================================
    # STEP 3: No rescue applied, return V3 rejection
    # ===========================================
    return False, v3_reason

# -----------------------------------------------------------------------------
# TEST DATA
# -----------------------------------------------------------------------------

# Original tiny_images known bad (should be REJECTED)
known_bad = [
    "50x39_10747.jpg",   # fence - real photo but no animal
    "4x4_5673.jpg",      # 4x4 pixels - garbage
    "60x60_835.jpg",     # white only - blank
    "82x159_9517.jpg",   # SAVE ALIVE label - text, no animal
    "75x80_8470.jpg",    # cat drawing
    "85x95_4833.jpg",    # cat drawing
    "90x162_1259.jpg",   # dog drawing with text
    "99x125_9188.jpg",   # dog drawing
    "100x93_7968.jpg",   # cat mail drawing with text
    "145x39_9171.jpg",   # rose drawing - wrong subject
    "194x83_2663.jpg",   # cat with stripes drawing
    "196x33_4367.jpg",   # yahoo mail logo
    "88x131_11184.jpg",  # husky dog drawing
]

# Original tiny_images false positives (should be KEPT)
false_positives = [
    ("120x90_7630.jpg", "animal not recognized"),
    ("140x93_9589.jpg", "animal not recognized"),
    ("142x93_7610.jpg", "animal not recognized"),
    ("144x86_10807.jpg", "animal not recognized"),
    ("150x97_9703.jpg", "animal not recognized"),
    ("183x92_11263.jpg", "animal not recognized"),
    ("200x94_3250.jpg", "cats not showing faces"),
    ("300x94_5773.jpg", "cats with eyes closed"),
    ("95x76_4134.jpg", "fence in the way of dog"),
    ("96x65_3074.jpg", "very small and blurry dog"),
    ("96x71_8087.jpg", "cat with human hands, blurred"),
    ("96x72_9456.jpg", "dog face behind chain link fence"),
    ("60x33_6402.jpg", "very small cat head looking away"),
    ("60x70_7314.jpg", "blurry small image with two cats"),
    ("60x39_9705.jpg", "very blurry image"),
    ("50x50_10392.jpg", "blurry small cat head only"),
    ("60x36_5534.jpg", "very extra blurry"),
    ("60x40_4821.jpg", "very blurry"),
    ("60x41_2433.jpg", "very blurry and small"),
]

# NEW: Images from full Cat folder that should be KEPT
# These are the images V3 incorrectly rejected
new_false_positives = [
    ("10607.jpg", "yawning cat unusual pose"),
    ("11197.jpg", "4 kittens held by human"),
    ("11282.jpg", "2 cats in cage"),
    ("11345.jpg", "3 kittens"),
    ("11786.jpg", "3 black cats"),
    ("1210.jpg", "extremely blurry 2 kittens"),
    ("1429.jpg", "kittens in basket"),
    ("1885.jpg", "cat in cage"),
    ("2337.jpg", "people holding cats"),
    ("3182.jpg", "many kittens"),
    ("3220.jpg", "many kittens"),
    ("3469.jpg", "3 cats together"),
    ("3500.jpg", "cat with solid background"),
    ("3504.jpg", "2 cats"),
    ("3632.jpg", "2 cats"),
]

# NEW: Images from full Cat folder that should remain REJECTED
new_true_negatives = [
    ("10029.jpg", "bad quality photo"),
    ("10035.jpg", "cat with glasses - fake"),
    ("10827.jpg", "only paws - ambiguous"),
    ("11484.jpg", "drawing"),
    ("1600.jpg", "photoshopped"),
    ("2013.jpg", "thought bubble - edited"),
    ("2663.jpg", "drawing with lines"),
    ("2939.jpg", "drawing"),
    ("3016.jpg", "photoshopped"),
    ("354.jpg", "fake photo realistic"),
]

# -----------------------------------------------------------------------------
# TEST OPTIONS
# -----------------------------------------------------------------------------
TEST_TINY_FP = True       # Test 1: tiny_images false positives (should KEEP)
TEST_TINY_BAD = True      # Test 2: tiny_images known bad (should REJECT)
TEST_TINY_ALL = True      # Test 3: all tiny_images
TEST_NEW_FP = True        # Test 4A: new false positives from Cat folder (should KEEP)
TEST_NEW_TN = True        # Test 4B: new true negatives from Cat folder (should REJECT)

# Folder paths
tiny_folder = Path("../outputs/01_tiny_images")
cat_folder = Path("../data/PetImages/Cat")

# -----------------------------------------------------------------------------
# RUN TESTS
# -----------------------------------------------------------------------------

def run_test(title, test_id, images, folder, should_keep_result, show_notes=False):
    """
    Run a test on a list of images.
    
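    Args:
        title: Test description
        test_id: Test number/ID
        images: List of (filename, notes) tuples or plain filename strings
        folder: Path to folder containing images
        should_keep_result: True if images should be KEPT, False if they should be REJECTED
        show_notes: Whether to append each image's notes to its output row

    Returns:
        (correct, total): counts over the images that were found and processed
        without error; missing or failing files are skipped.
    """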
    print("\n" + "=" * 130)
    print(f"TEST {test_id}: {title}")
    print("=" * 130)
    
    header = "photo  animal  text   blur      V3        V3+       reason"
    if show_notes:
        header += "                                  notes"
    print(header)
    print("-" * 130)
    
    correct = 0
    wrong = 0
    
    for item in images:
        if isinstance(item, tuple):
            filename, notes = item[0], item[-1]
        else:
            filename, notes = item, ""
        
        f = folder / filename
        if not f.exists():
            print(f"NOT FOUND: {f}")
            continue
        
        try:
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            
            # Run both V3 and V3+
            v3_keep, v3_reason = should_keep_v3(photo, animal, text, blur)
            v3plus_keep, v3plus_reason = should_keep_v3plus(photo, animal, text, blur, f)
            
            v3_status = "KEEP" if v3_keep else "REJECT"
            v3plus_status = "KEEP" if v3plus_keep else "REJECT"
            
            # Check if result is correct
            is_correct = (v3plus_keep == should_keep_result)
            if is_correct:
                correct += 1
                flag = ""
            else:
                wrong += 1
                flag = "<- WRONG" if should_keep_result else "<- BROKEN!"
            
            # Show if rescued
            rescued = ""
            if not v3_keep and v3plus_keep:
                rescued = "[RESCUED]"
            
            # Build the row once so the [RESCUED] marker is also shown when notes are printed
            name_col = f"{filename:20} {notes}" if show_notes and notes else filename
            line = f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {v3_status:6}    {v3plus_status:6}    {v3plus_reason:35} {name_col} {rescued} {flag}"
            
            print(line)
            
        except Exception as e:
            print(f"Error processing {filename}: {e}")
    
    total = correct + wrong
    print(f"\nResult: {correct}/{total} correct")
    
    return correct, total


# Run all tests
results = {}

if TEST_TINY_FP:
    c, t = run_test(
        "TINY IMAGES FALSE POSITIVES (should be KEEP)",
        "1", 
        false_positives, 
        tiny_folder, 
        should_keep_result=True
    )
    results["Test 1 - Tiny FP (KEEP)"] = (c, t)

if TEST_TINY_BAD:
    c, t = run_test(
        "TINY IMAGES KNOWN BAD (should be REJECT)",
        "2",
        known_bad,  # run_test also accepts plain filenames
        tiny_folder,
        should_keep_result=False
    )
    results["Test 2 - Tiny Bad (REJECT)"] = (c, t)

if TEST_TINY_ALL:
    print("\n" + "=" * 130)
    print("TEST 3: FULL TINY IMAGES DATASET")
    print("=" * 130)
    
    all_tiny = list(tiny_folder.glob("*.jpg"))
    print("photo  animal  text   blur      V3        V3+       reason                              file")
    print("-" * 130)
    
    kept_v3 = 0
    kept_v3plus = 0
    rescued = 0
    
    for f in sorted(all_tiny):
        try:
            photo, animal, text = check_image_multi(f)
            blur = get_blur_score(f)
            
            v3_keep, v3_reason = should_keep_v3(photo, animal, text, blur)
            v3plus_keep, v3plus_reason = should_keep_v3plus(photo, animal, text, blur, f)
            
            v3_status = "KEEP" if v3_keep else "REJECT"
            v3plus_status = "KEEP" if v3plus_keep else "REJECT"
            
            if v3_keep:
                kept_v3 += 1
            if v3plus_keep:
                kept_v3plus += 1
            
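            # Count every rescue, including known-bad images, so the summary matches kept_v3plus - kept_v3
            is_rescued = not v3_keep and v3plus_keep
            if is_rescued:
                rescued += 1
            
            flag = ""
            if f.name in known_bad:
                flag = "KNOWN_BAD"
                if v3plus_keep:
                    flag += " <- BROKEN!"
            elif is_rescued:
                flag = "[RESCUED]"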
            
            print(f"{photo:.2f}   {animal:.2f}    {text:.2f}   {blur:7.1f}   {v3_status:6}    {v3plus_status:6}    {v3plus_reason:35} {f.name} {flag}")
            
        except Exception as e:
            print(f"Error: {f.name}: {e}")
    
    print(f"\nSummary: V3 kept {kept_v3}, V3+ kept {kept_v3plus} ({rescued} rescued)")
    results["Test 3 - Tiny All"] = (kept_v3plus, len(all_tiny))

if TEST_NEW_FP:
    c, t = run_test(
        "NEW FALSE POSITIVES - CAT FOLDER (should be KEEP)",
        "4A",
        new_false_positives,
        cat_folder,
        should_keep_result=True,
        show_notes=True
    )
    results["Test 4A - New FP (KEEP)"] = (c, t)

if TEST_NEW_TN:
    c, t = run_test(
        "NEW TRUE NEGATIVES - CAT FOLDER (should be REJECT)",
        "4B",
        new_true_negatives,
        cat_folder,
        should_keep_result=False,
        show_notes=True
    )
    results["Test 4B - New TN (REJECT)"] = (c, t)

# -----------------------------------------------------------------------------
# FINAL SUMMARY
# -----------------------------------------------------------------------------
print("\n" + "=" * 130)
print("FINAL SUMMARY - V3+ Algorithm")
print("=" * 130)

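for test_name, (correct, total) in results.items():
    pct = (correct / total * 100) if total > 0 else 0
    if test_name.startswith("Test 3"):
        # Test 3 stores (kept, total), not (correct, total), so it has no pass/fail mark
        print(f"- {test_name}: kept {correct}/{total} ({pct:.0f}%)")
        continue
    status = "✓" if correct == total else "✗"
    print(f"{status} {test_name}: {correct}/{total} ({pct:.0f}%)")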

print("\n" + "-" * 130)
print("V3+ DESIGN PRINCIPLE:")
print("-" * 130)
print("If V3 keeps it → V3+ keeps it. Always.")
print("Rescue rules can ONLY save images that V3 would reject.")
print("")
print("RESCUE RULES:")
print(f"1. Very blurry + high animal: blur < {RESCUE1_BLUR_MAX}, animal >= {RESCUE1_ANIMAL_MIN}, photo >= {RESCUE1_PHOTO_MIN}")
print("   Catches: yawning cats, unusual poses, solid backgrounds")
print("")
print(f"2. Multiple animals fallback: photo >= {RESCUE2_PHOTO_MIN}, animal < {RESCUE2_ANIMAL_MAX}")
print(f"   Secondary CLIP check, threshold >= {RESCUE2_MULTIPLE_MIN}")
print("   Catches: group photos, cats in cages, multiple kittens")
print("=" * 130)

TEST 1: TINY IMAGES FALSE POSITIVES (should be KEEP)
photo  animal  text   blur      V3        V3+       reason
----------------------------------------------------------------------------------------------------------------------------------
1.00   0.29    0.76     630.1   KEEP      KEEP      blurry photo with animal (lenient)  120x90_7630.jpg  
0.99   0.35    0.71    1078.9   KEEP      KEEP      blurry photo with animal (lenient)  140x93_9589.jpg  
0.98   0.46    0.45     632.2   KEEP      KEEP      blurry photo with animal (lenient)  142x93_7610.jpg  
0.97   0.44    0.70     340.3   KEEP      KEEP      blurry photo with animal (lenient)  144x86_10807.jpg  
0.97   0.43    0.71     185.8   KEEP      KEEP      blurry photo with animal (lenient)  150x97_9703.jpg  
0.97   0.33    0.58    1636.1   KEEP      KEEP      blurry photo with animal (lenient)  183x92_11263.jpg  
0.99   0.39    0.79      43.6   KEEP      KEEP      blurry photo with animal (lenient)  200x94_3250.jpg  
0.98   0.30    0.41     222.3   KEEP      KEEP      blurry photo with animal (lenient)  300x94_5773.jpg  
0.98   0.44    0.80    2083.6   KEEP      KEEP      blurry photo with animal (lenient)  95x76_4134.jpg  
0.93   0.33    0.09    2705.5   KEEP      KEEP      blurry photo with animal (lenient)  96x65_3074.jpg  
0.98   0.47    0.44     570.8   KEEP      KEEP      blurry photo with animal (lenient)  96x71_8087.jpg  
0.91   0.41    0.69    1169.6   KEEP      KEEP 

### Results:

Fewer of the known false positives (real photos that should be kept) are rejected than before, which is a positive sign. Everything that should be rejected is still rejected, so we can be more confident that we are training on cleaner data.
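
One way to make these two claims checkable is to tally, for each filter version, how many should-keep images were rejected and how many should-reject images slipped through. The sketch below is only an illustration: `Outcome`, `tally`, `rescue_only`, and the toy rows are hypothetical names and made-up values, not results from the run above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One labelled image with the decision of each filter version (hypothetical)."""
    filename: str
    should_keep: bool   # ground-truth label from manual review
    v3_keep: bool       # decision of the baseline filter
    v3plus_keep: bool   # decision after the rescue rules


def tally(outcomes, version):
    """Return (wrongly_rejected, wrongly_kept) for version 'v3' or 'v3plus'."""
    if version not in ("v3", "v3plus"):
        raise ValueError(f"unknown version: {version}")
    wrongly_rejected = 0
    wrongly_kept = 0
    for o in outcomes:
        kept = o.v3_keep if version == "v3" else o.v3plus_keep
        if o.should_keep and not kept:
            wrongly_rejected += 1
        elif not o.should_keep and kept:
            wrongly_kept += 1
    return wrongly_rejected, wrongly_kept


def rescue_only(outcomes):
    """True if V3+ never rejects an image that V3 kept."""
    return all(o.v3plus_keep for o in outcomes if o.v3_keep)


# Toy data, NOT taken from the notebook's output
toy = [
    Outcome("a.jpg", True, True, True),      # kept by both versions
    Outcome("b.jpg", True, False, True),     # rescued by V3+
    Outcome("c.jpg", True, False, False),    # still wrongly rejected
    Outcome("d.jpg", False, False, False),   # correctly rejected by both
]

print("V3  (wrongly rejected, wrongly kept):", tally(toy, "v3"))
print("V3+ (wrongly rejected, wrongly kept):", tally(toy, "v3plus"))
print("V3+ only adds rescues:", rescue_only(toy))
```

On real data, the first number should drop from V3 to V3+ while the second stays at zero, and `rescue_only` should remain true if the rescue rules only ever save images that V3 rejected.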