# CLIP-Based Image-Text Matching Pipeline

This notebook demonstrates a complete pipeline for processing, analyzing and matching images with relevant text descriptions using the CLIP neural network. The pipeline includes:

1. **Data Acquisition**: Downloading JSON, image, and XML files from a Label Studio project
2. **Data Analysis**: Counting and analyzing label distribution in the dataset
3. **Data Filtering**: Selecting images and text with specific labels
4. **Image Processing**: Cropping images based on annotations
5. **Text Processing**: Filtering text regions to remove captions
6. **CLIP Processing**: Using CLIP to find semantic matches between images and text

Each section below implements one step in the pipeline.

# Requirements

## 1. Dataset Requirements: OCR XML Files

- **OCR XML files** must be present in the `texts/` directory.
- Each XML file should correspond to an image in your dataset.
- The directory structure should look like:
    ```
    /texts/
            <uuid1>.xml
            <uuid2>.xml
            ...
    ```

## 2. Script Requirements

The following Python scripts must be available in the working directory:

- `download.py`
- `count_json.py`
- `filter_by_label.py`
- `trim_images.py`
- `find_label_description.py`
- `filter_picture_descriptions.py`
- `process_descriptions.py` (requires `cut_text.py`)

> **Note:**  
> - `process_descriptions.py` depends on `cut_text.py`, so both must be present.
> - All scripts should be accessible from the notebook's working directory.
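
A quick pre-flight check can confirm the scripts are in place before any cell tries to import them. The sketch below is only illustrative: `missing_scripts` and `REQUIRED_SCRIPTS` are hypothetical names, not part of the pipeline scripts.

```python
from pathlib import Path

# Script names taken from the requirements list above
REQUIRED_SCRIPTS = [
    "download.py",
    "count_json.py",
    "filter_by_label.py",
    "trim_images.py",
    "find_label_description.py",
    "filter_picture_descriptions.py",
    "process_descriptions.py",
    "cut_text.py",
]

def missing_scripts(directory=".", required=REQUIRED_SCRIPTS):
    """Return the required script names that are not files in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]
```

Calling `missing_scripts()` in the first cell and stopping when the list is non-empty avoids a late `ImportError` halfway through the pipeline.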


Loads necessary modules and configures authentication for Label Studio API access.

- Sets up a global debug flag to control logging verbosity
- Loads an authentication token from a configuration file
- Configures the logging system for a notebook environment

In [1]:
# Standard library imports
import os
import sys
import json
import glob
import time
from pathlib import Path

# Third-party library imports
import pandas as pd
import torch  # For CLIP model
import numpy as np
import ipywidgets
from tqdm import tqdm
from tqdm.notebook import tqdm as notebook_tqdm
from IPython.display import display, Markdown, Image

# Global configuration
DEBUG = False

# Define directories for the complete pipeline
DIRS = {
    # Source directories
    "images_dir": "downloaded_images",  # Where downloaded images will go
    "texts_dir": "texts",               # Where XML files are stored 
    "jsons_dir": "splitted_jsons",      # Where split JSON files will go
    
    # Filtered directories
    "filtered_jsons_dir": "filtered_jsons",      # JSON files with target labels
    "filtered_images_dir": "filtered_images",    # Images with target labels
    "filtered_texts_dir": "filtered_texts",      # XML files with target labels
    
    # Processed directories
    "filtered_texts_no_desc_dir": "filtered_texts_no_desc",  # XML files without picture descriptions
    "cropped_images_dir": "cropped_images",      # Images cropped to annotation boundaries
    
    # Output directory
    "output_dir": "output_context",               # Where final output images will go
    "output_jsons_dir": "output_jsons"
}

# Create directories if they don't exist
for dir_name, dir_path in DIRS.items():
    os.makedirs(dir_path, exist_ok=True)
    display(Markdown(f"✓ Directory `{dir_path}` ready"))

# Load token from a config file (not in version control)
try:
    config_path = Path.home() / ".config" / "label_studio_config.json"
    with open(config_path) as f:
        config = json.load(f)
        label_studio_token = config.get("token")
except FileNotFoundError:
    display(Markdown("## ⚠️ Config file not found"))
    display(Markdown(f'Create a file at `{config_path}` with the content: `{{"token": "your_token_here"}}`'))
    label_studio_token = None


✓ Directory `downloaded_images` ready

✓ Directory `texts` ready

✓ Directory `splitted_jsons` ready

✓ Directory `filtered_jsons` ready

✓ Directory `filtered_images` ready

✓ Directory `filtered_texts` ready

✓ Directory `filtered_texts_no_desc` ready

✓ Directory `cropped_images` ready

✓ Directory `output_context` ready

✓ Directory `output_jsons` ready

Downloads the full JSON export from Label Studio containing all task annotations.

- Uses the API token for authentication
- Saves the export as "label_studio_export.json"
- Displays the number of tasks downloaded
- This export file contains annotation data for all images in the project

In [None]:
# Import the download module
import download

# Setup logging for notebook environment
download.setup_logging(debug_mode=DEBUG, use_notebook=True)

# Download the export.json file
result = download.download_export_json(
    token=label_studio_token,
    output_file="label_studio_export.json"
)

if "error" in result:
    display(Markdown(f"## ❌ Error\n{result['error']}"))
else:
    labels = result["labels"]
    display(Markdown(f"## Export Successful\n- Source: {result['source']}\n- Tasks: {len(labels)}"))

Downloads all annotated images from the Label Studio project:

- Only downloads images that have a corresponding XML file in the texts directory
- Skips images that already exist locally
- Shows a progress bar via `tqdm.notebook`
- Shows detailed statistics about the download process

This step ensures we have all the necessary image data for processing.
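
The skip rules above can be sketched as a small decision function. This is a minimal illustration, not the logic inside `download.py`: the function name `classify_task`, the `.jpg` extension, and the assumption that images and XML files share a UUID stem are all hypothetical.

```python
from pathlib import Path

def classify_task(uuid, texts_dir, images_dir, image_ext=".jpg"):
    """Return 'skip_no_xml', 'skip_exists' or 'download' for one task UUID."""
    # An image is only wanted when its OCR XML is available
    if not (Path(texts_dir) / f"{uuid}.xml").is_file():
        return "skip_no_xml"
    # Avoid downloading an image that is already on disk
    if (Path(images_dir) / f"{uuid}{image_ext}").is_file():
        return "skip_exists"
    return "download"
```

The three return values mirror the `skipped_no_xml`, `skipped_exists` and `downloaded` counters reported by the download cell.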

In [None]:
# Run download function with custom directories
if label_studio_token:
    result = download.run_download(
        show_progress=True,
        token=label_studio_token,
        texts_dir=DIRS["texts_dir"],
        images_dir=DIRS["images_dir"]
    )
    
    # Display results as markdown
    if "error" in result:
        display(Markdown(f"## ❌ Error\n{result['error']}"))
    else:
        # Build the Markdown without leading indentation so it is not rendered as a code block
        display(Markdown(
            "## Download Results\n"
            f"- Total tasks: {result['total_tasks']}\n"
            f"- Downloaded: {result['downloaded']}\n"
            f"- Skipped (already exist): {result['skipped_exists']}\n"
            f"- Skipped (no XML): {result['skipped_no_xml']}\n"
            f"- Failed: {result['failed']}"
        ))
else:
    display(Markdown("## ❌ Cannot proceed without token"))

Divides the master export file into individual JSON files, one per task:

- Extracts each task from the export.json file
- Creates a separate JSON file named with the UUID of the task
- Verifies that the images and XML files match by comparing UUIDs
- Shows a progress bar via `tqdm.notebook`

This step prepares the data for efficient parallel processing in later stages.
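
The splitting step can be sketched as follows, assuming the export is a JSON list of task objects. How a task's UUID is derived is not stated here, so the sketch takes a caller-supplied `uuid_of` function; both `split_export` and `uuid_of` are hypothetical names and this is not the code in `download.py`.

```python
import json
from pathlib import Path

def split_export(export_path, jsons_dir, uuid_of):
    """Write each task of a list-style export to `<uuid>.json` and return counts."""
    tasks = json.loads(Path(export_path).read_text(encoding="utf-8"))
    out = Path(jsons_dir)
    out.mkdir(parents=True, exist_ok=True)
    stats = {"total_tasks": len(tasks), "json_created": 0, "skipped_no_uuid": 0}
    for task in tasks:
        uuid = uuid_of(task)  # caller decides how a task maps to its UUID
        if not uuid:
            stats["skipped_no_uuid"] += 1
            continue
        target = out / f"{uuid}.json"
        target.write_text(json.dumps(task, ensure_ascii=False), encoding="utf-8")
        stats["json_created"] += 1
    return stats
```

Keeping one file per task means later stages can filter, copy, or process tasks by filename alone.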

In [None]:
# Split the export.json into individual JSON files
if label_studio_token:
    split_result = download.run_split_json(
        show_progress=True,
        labels_file="label_studio_export.json",
        jsons_dir=DIRS["jsons_dir"]
    )
    
    # Display results as markdown
    if "error" in split_result:
        display(Markdown(f"## ❌ Error\n{split_result['error']}"))
    else:
        # Build the Markdown without leading indentation so it is not rendered as a code block
        display(Markdown(
            "## JSON Split Results\n"
            f"- Total tasks: {split_result['total_tasks']}\n"
            f"- JSON files created: {split_result['json_created']}\n"
            f"- Skipped (no UUID): {split_result['skipped_no_uuid']}\n"
            f"- Failed writes: {split_result['failed_writes']}\n"
            f"- Processing time: {split_result['elapsed_time']:.2f} seconds"
        ))
        
        # Also run directory comparison to verify we have matching files
        compare_result = download.run_compare(
            texts_dir=DIRS["texts_dir"],
            images_dir=DIRS["images_dir"]
        )
        if compare_result["match"]:
            display(Markdown("## ✅ Images and XMLs match!"))
        else:
            display(Markdown(
                "## ⚠️ Mismatch between images and XMLs\n"
                f"- Files in both directories: {compare_result['matching_count']}\n"
                f"- Files only in texts directory: {compare_result['texts_only_count']}\n"
                f"- Files only in images directory: {compare_result['images_only_count']}"
            ))
else:
    display(Markdown("## ❌ Cannot proceed without token"))

Analyzes the distribution of rectangle labels in the dataset:

- Counts occurrences of each rectangle label across all JSON files
- Displays results as a sorted DataFrame for easy analysis
- Provides summary statistics about the dataset composition
- Shows a progress bar via `tqdm.notebook`

This analysis helps identify which labels are most common and can inform filtering decisions.
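
Counting labels for a single task can be sketched as below. The sketch assumes each per-task JSON follows the usual Label Studio export layout (`annotations` → `result` → `value.rectanglelabels`); `count_rectangle_labels` is a hypothetical helper, not the function in `count_json.py`.

```python
from collections import Counter

def count_rectangle_labels(task):
    """Count rectangle labels in one Label Studio task dictionary."""
    counts = Counter()
    for annotation in task.get("annotations", []):
        for region in annotation.get("result", []):
            # Regions without rectangle labels contribute nothing
            counts.update(region.get("value", {}).get("rectanglelabels", []))
    return counts
```

Summing the per-task counters (`Counter` supports `+`) gives dataset-wide totals that can be loaded into a DataFrame like the one in the cell below.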

In [None]:
# Import the module
import count_json

# Setup logging for notebook environment
count_json.setup_logging(debug_mode=DEBUG, use_notebook=True)

# Count rectangle labels across the split JSON files
result = count_json.run_count_labels(jsons_dir=DIRS["jsons_dir"], print_table=False)  # Don't print tables in notebook

# Display results as a DataFrame
label_df = pd.DataFrame(
    list(result["label_counts"].items()), 
    columns=["Label", "Count"]
).sort_values("Count", ascending=False)

display(Markdown("## Rectangle Label Counts"))
display(label_df)

# Show summary stats
display(Markdown(f"""
## Summary Statistics
- Total files processed: {result["total_files"]}
- Files with rectangle labels: {result["processed_files"] - result["no_labels_files"]}
- Files without rectangle labels: {result["no_labels_files"]}
- Files with errors: {result["error_files"]}
- Total label instances: {result["total_labels"]}
- Unique labels found: {result["unique_labels"]}
"""))

Filters JSON, image, and XML files to keep only those with the target label:

- Identifies all JSONs containing the label "Obrázek" (Picture)
- Copies matching JSONs to the filtered_jsons directory
- Copies corresponding images to the filtered_images directory
- Copies corresponding XML files to the filtered_texts directory
- Shows multiple `tqdm` progress bars
- Updates the directory dictionary with new filtered paths

This filtering step ensures we focus only on tasks with images/illustrations.

In [None]:
# Import the enhanced filtering module
import filter_by_label

# Setup logging for notebook environment
filter_by_label.setup_logging(debug_mode=DEBUG, use_notebook=True)

# Run the filter process with our directory structure for all three file types
result = filter_by_label.run_filter_by_label(
    jsons_dir=DIRS["jsons_dir"],                # Source JSONs  
    images_dir=DIRS["images_dir"],              # Source images
    texts_dir=DIRS["texts_dir"],                # Source XML texts
    filtered_jsons_dir=DIRS["filtered_jsons_dir"],    # Output filtered JSONs
    filtered_images_dir=DIRS["filtered_images_dir"],  # Output filtered images
    filtered_texts_dir=DIRS["filtered_texts_dir"],    # Output filtered XMLs
    label="Obrázek",                           # Filter by this label
    copy_files=True,                            # Copy instead of move
    case_sensitive=False                       # Case-insensitive matching
)
# Display results as markdown
display(Markdown(f"""
## Filter Results for Label: '{result["label_filtered"]}'

### JSON Files:
- With label: {result["json_matches"]}
- Without label: {result["json_non_matches"]}
- **Total processed: {result["json_matches"] + result["json_non_matches"]}**

### Image Files:
- Matching filtered JSONs: {result["image_matches"]}
- Not matching: {result["image_non_matches"]}
- **Total processed: {result["image_matches"] + result["image_non_matches"]}**

### Text Files (XML):
- Matching filtered JSONs: {result["text_matches"]}
- Not matching: {result["text_non_matches"]}
- **Total processed: {result["text_matches"] + result["text_non_matches"]}**

### Final Dataset:
- Total matching triplets (JSON+image+XML): {result["total_matching_pairs"]}
- Files were {result["file_operation"].lower()} to filtered directories
"""))


Processes the filtered images to crop them to their annotated regions:

- For each JSON file, finds rectangles with the "Obrázek" label
- Extracts the coordinates of these rectangles
- Crops the corresponding image to these boundaries
- Saves the cropped images to a new directory
- Shows a progress bar via `tqdm.notebook`
- Updates the directory dictionary with the cropped images path

Cropping lets us focus only on the annotated image content and removes unnecessary background.
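
Label Studio stores rectangle coordinates as percentages of the image size, so they have to be converted to pixels before cropping. The sketch below shows that conversion under this assumption; `percent_box_to_pixels` is a hypothetical helper and not necessarily how `trim_images.py` does it.

```python
def percent_box_to_pixels(value, image_width, image_height):
    """Convert a percent-based rectangle to a (left, upper, right, lower) pixel box."""
    left = round(value["x"] / 100 * image_width)
    upper = round(value["y"] / 100 * image_height)
    right = round((value["x"] + value["width"]) / 100 * image_width)
    lower = round((value["y"] + value["height"]) / 100 * image_height)
    # Clamp to the image bounds in case an annotation overshoots the edge
    return (
        max(0, left),
        max(0, upper),
        min(image_width, right),
        min(image_height, lower),
    )
```

The returned tuple is in the order Pillow's `Image.crop` expects, so it can be passed straight to `image.crop(box)`.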

In [None]:
# Import the image cropping module
import trim_images

# Setup logging for notebook environment
trim_images.setup_logging(debug_mode=DEBUG, use_notebook=True)

# Run the cropping process with our directory structure
result = trim_images.run_crop_images(
    jsons_dir=DIRS["filtered_jsons_dir"],      # Use the filtered JSONs
    images_dir=DIRS["filtered_images_dir"],    # Use the filtered images 
    output_dir=DIRS["cropped_images_dir"],     # Where cropped images will go
    target_label="Obrázek",                    # Label to look for
    show_progress=True                         # Show progress updates
)

# Display results as markdown
if "error" in result:
    display(Markdown(f"## ❌ Error\n{result['error']}"))
else:
    # Build the Markdown without leading indentation so it is not rendered as a code block
    display(Markdown(
        "## Image Cropping Results\n\n"
        f"- **Processed Files:** {result['files_processed']} of {result['total_files_found']} JSON files\n"
        f"- **Files with Errors:** {result['files_with_errors']}\n"
        f"- **Crops Created:** {result['crops_created']} images\n"
        f'- **Label Used:** "{result["target_label"]}"\n'
        f"- **Processing Time:** {result['elapsed_time']:.2f} seconds\n\n"
        f"All cropped images were saved to `{result['output_dir']}`"
    ))

Analyzes the filtered JSONs to identify images that have text descriptions:

- Looks for files with the "Popis v textu" (Text Description) label
- Also checks for co-occurrence of "Obrázek" and "Popis v textu" labels
- Provides detailed statistics about label distribution
- Shows a sample of files that contain text descriptions
- Shows a progress bar via `tqdm.notebook`

This information helps us understand how many images have associated text descriptions.

In [None]:
# Import the description finding module
import find_label_description

# Setup logging for notebook environment
find_label_description.setup_logging(debug_mode=DEBUG, use_notebook=True)

# Run the analysis function with the filtered JSONs directory
result = find_label_description.run_find_descriptions(
    jsons_dir=DIRS["filtered_jsons_dir"],
    pair_to_check=["Obrázek", "Popis v textu"], 
    print_table=False  # Don't print tables in notebook
)

# Display label counts as a DataFrame
label_df = pd.DataFrame(
    list(result["label_counts"].items()), 
    columns=["Label", "Count"]
).sort_values("Count", ascending=False)

display(Markdown("## Rectangle Label Counts"))
display(label_df)

# Show summary stats
display(Markdown(f"""
## Description Analysis Results
- Total files processed: {result["total_files"]}
- Files with rectangle labels: {result["processed_files"] - result["no_labels_files"]}
- Files without rectangle labels: {result["no_labels_files"]}
- Files with errors: {result["error_files"]}
- **Files with "Popis v textu" label: {result["description_count"]}**
- Total label instances: {result["total_labels"]}
- Unique labels found: {result["unique_labels"]}
"""))

# Show pair co-occurrence if requested
if "pair_checked" in result:
    display(Markdown(
        "## Label Co-occurrence\n"
        f'Files containing BOTH "{result["pair_checked"][0]}" AND "{result["pair_checked"][1]}": '
        f'**{result["pair_count"]}**'
    ))

# Show sample of files with descriptions
if result["files_with_description"]:
    sample_files = result["files_with_description"][:10]  # Show first 10
    sample_list = "\n".join([f"- {file}" for file in sample_files])
    
    # Build the Markdown without leading indentation so the list renders as a list
    more = "\n\n...(and more)" if len(result["files_with_description"]) > 10 else ""
    display(Markdown(
        "## Files with Descriptions\n"
        f'Sample of files containing "Popis v textu" label ({len(result["files_with_description"])} total):\n\n'
        f"{sample_list}{more}"
    ))
else:
    display(Markdown("## No files with descriptions were found"))

Removes text regions that are explicitly labeled as picture descriptions:

- Matches "Popis u obrázku" (Picture Description) regions in JSON files
- Identifies the corresponding text regions in XML files using IoU matching
- Removes those text regions from the XML files
- Creates new filtered XML files without description text
- Shows a progress bar via `tqdm.notebook`
- Updates the directory dictionary with the filtered text path

This step ensures CLIP doesn't match images with their existing captions in the text.
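
IoU (Intersection over Union) measures how much two boxes overlap, as the area of their intersection divided by the area of their union. A minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)` in the same coordinate system; the name `box_iou` is hypothetical and the matching code in `filter_picture_descriptions.py` may differ.

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (empty if the boxes are disjoint)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

With a threshold as low as the `0.00005` used in the cell below, almost any overlap between a JSON region and an XML text region counts as a match.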

In [None]:
# Import the picture description filtering module
import filter_picture_descriptions

# Setup logging for notebook environment
filter_picture_descriptions.setup_logging(debug_mode=DEBUG, use_notebook=True)

# Run the filtering process with our directory structure
result = filter_picture_descriptions.run_filter_descriptions(
    json_dir=DIRS["filtered_jsons_dir"],             # JSON files from filtered directory 
    xml_dir=DIRS["filtered_texts_dir"],              # XML files from filtered directory
    output_dir=DIRS["filtered_texts_no_desc_dir"],   # Output directory for filtered XMLs
    iou_threshold=0.00005,                           # IoU threshold for matching
    show_progress=True                               # Show progress updates (ensure this is True)
)

# Display results as markdown
if "error" in result:
    display(Markdown(f"## ❌ Error\n{result['error']}"))
else:
    # Optional line, present only when a match percentage was computed
    match_pct = (
        f"- Match percentage: {result['match_percentage']:.2f}%\n"
        if "match_percentage" in result else ""
    )
    # Build the Markdown without leading indentation so it is not rendered as a code block
    display(Markdown(
        "## Picture Description Filtering Results\n\n"
        f"- **Processed Files:** {result['processed_files']} file pairs\n"
        f"- **Files with Matches:** {result['files_with_matches']}\n"
        f"- **Files Copied Without Filtering:** {result['copied_without_filtering']}\n\n"
        "### Matching Statistics:\n"
        f"- Total picture description regions: {result['total_json_regions']}\n"
        f"- Matched regions: {result['total_matches']}\n"
        f"{match_pct}"
        f"- Text regions removed from XML: {result['regions_removed']}\n\n"
        "### Processing Details:\n"
        f"- IoU threshold used: {result['iou_threshold']}\n"
        f"- Processing time: {result['elapsed_time']:.2f} seconds\n\n"
        f"Filtered XML files are available in: `{result['output_dir']}`"
    ))

Uses OpenAI's CLIP model to find semantic matches between images and text:

- Loads the CLIP model selected by `model_name` (`ViT-B/32` or `M-CLIP`), which embeds both images and text
- For each cropped image, computes CLIP embeddings
- For each text region in the XML files, computes CLIP embeddings
- Calculates cosine similarity between image and text embeddings
- Identifies text that semantically matches each image
- Creates visual context images showing the image with matched text
- Shows nested `tqdm` progress bars
- Displays a sample result with the matched image and text

This is the core of the pipeline that creates the final image-text matches.
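
The similarity and selection steps can be sketched independently of the model. The example below uses plain Python lists in place of real CLIP embeddings; `cosine_similarity` and `top_matches` are hypothetical helpers, and whether `process_descriptions.py` compares with `>=` or `>` is an assumption.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_matches(image_vec, text_vecs, threshold=0.25, top_k=5):
    """Return (index, score) pairs scoring >= threshold, best first, at most top_k."""
    scored = [(i, cosine_similarity(image_vec, t)) for i, t in enumerate(text_vecs)]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]
```

The defaults mirror the `similarity_threshold=0.25` and `top_k=5` values passed to the processing cell; an image with no text block above the threshold simply yields an empty list.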

# Filter JSONs by Text Description IDs (Test Only)

This section creates a subset of the filtered JSONs containing only those that have text descriptions.
It reads IDs from a list file and copies the matching JSONs to a separate directory.

In [3]:
DIRS["filtered_jsons_dir"] = "filtered_jsons"

def filter_jsons_by_id_list(
    ids_file: str,
    source_dir: str,
    output_dir: str,
    show_progress: bool = True
) -> dict:
    """
    Filter JSON files by IDs listed in a text file.
    
    Args:
        ids_file: Path to text file containing IDs (one per line, with or without .json extension)
        source_dir: Directory containing source JSON files
        output_dir: Directory where filtered JSONs will be saved
        show_progress: Whether to show a progress bar
        
    Returns:
        Dictionary with summary statistics
    """
    import os
    import shutil
    import glob
    import logging
    
    # Set up logging
    logger = logging.getLogger("filter_jsons")
    logger.setLevel(logging.INFO)
    
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    
    # Read IDs from file
    try:
        with open(ids_file, 'r', encoding='utf-8') as f:
            ids = [line.strip() for line in f if line.strip()]
        print(f"Read {len(ids)} IDs from {ids_file}")
        # Debug: Show first 5 IDs 
        print(f"Sample IDs from file: {ids[:5]}")
    except Exception as e:
        print(f"Error reading IDs file: {e}")
        return {"error": str(e)}
    
    # Get source files
    source_files = glob.glob(os.path.join(source_dir, "*.json"))
    
    # Debug: Print source directory and file count
    print(f"Source directory: {source_dir}")
    print(f"Source files found: {len(source_files)}")
    # Debug: Show first few source files
    print(f"Sample source files: {[os.path.basename(f) for f in source_files[:5]]}")
    
    # Prepare for progress bar
    if show_progress:
        try:
            file_iterator = notebook_tqdm(ids, desc="Filtering JSONs", unit="file")
        except Exception:
            file_iterator = ids
    else:
        file_iterator = ids
    
    # Track statistics
    stats = {
        "total_ids": len(ids),
        "found_files": 0,
        "missing_files": 0,
        "copied_files": 0,
        "errors": 0
    }
    
    # First check if the directory even exists
    if not os.path.exists(source_dir):
        print(f"❌ ERROR: Source directory '{source_dir}' does not exist!")
        stats["error"] = f"Source directory '{source_dir}' not found"
        return stats
    
    # Store missed filenames for debugging
    missed_files = []
    
    # Process each ID
    for filename in file_iterator:
        # Debug: Print current ID being processed
        if stats["missing_files"] < 5 or stats["found_files"] < 5:
            print(f"Processing ID: {filename}")
        
        # Try to find the source file - test multiple path formats
        source_paths_to_try = [
            os.path.join(source_dir, filename),                    # Original filename
            os.path.join(source_dir, filename.replace('.json', '')),  # Without .json
            os.path.join(source_dir, f"{filename}.json")           # With .json
        ]
        
        source_path = None
        for path_to_try in source_paths_to_try:
            if os.path.exists(path_to_try):
                source_path = path_to_try
                if stats["found_files"] < 5:
                    print(f"✓ Found file: {source_path}")
                break
        
        if not source_path:
            stats["missing_files"] += 1
            if stats["missing_files"] <= 10:
                print(f"❌ Missing file: {filename}")
                missed_files.append(filename)
            continue
                
        stats["found_files"] += 1
        
        try:
            # Copy file to destination (always use the same filename format as the source)
            dest_path = os.path.join(output_dir, os.path.basename(source_path))
            shutil.copy2(source_path, dest_path)
            stats["copied_files"] += 1
            if stats["copied_files"] <= 5:
                print(f"✓ Copied: {source_path} → {dest_path}")
        except Exception as e:
            stats["errors"] += 1
            print(f"❌ Error copying {source_path}: {e}")
    
    # Print summary of missed files
    if missed_files:
        print(f"\nFirst 10 missing files: {missed_files[:10]}")
        
        # Try to check if files without .json extension exist
        source_files_no_ext = [os.path.splitext(os.path.basename(f))[0] for f in source_files]
        found_without_ext = [f for f in missed_files if f.replace('.json', '') in source_files_no_ext]
        if found_without_ext:
            print(f"\n✓ Found {len(found_without_ext)} files when ignoring extension!")
            print(f"Sample: {found_without_ext[:5]}")
    
    return stats

# Create a new directory for filtered JSONs with descriptions
filtered_description_jsons_dir = "filtered_description_jsons"
os.makedirs(filtered_description_jsons_dir, exist_ok=True)

# Debug: Print directory contents before running filter
print("\n--- Current Directory Structure ---")
print(f"Working directory: {os.getcwd()}")
dirs_to_check = ['filtered_jsons', 'filtered_description_jsons']
for d in dirs_to_check:
    if os.path.exists(d):
        files = os.listdir(d)
        print(f"Directory '{d}' exists with {len(files)} files")
        if files:
            print(f"Sample files: {files[:3]}")
    else:
        print(f"Directory '{d}' does not exist")

# Run the filtering function with the modified function
print("\n--- Running Filter Operation ---")
results = filter_jsons_by_id_list(
    ids_file="test.txt",             # TEST
    
    #ids_file="popis_v_textu_files.txt",             # File containing IDs
    source_dir=DIRS["filtered_jsons_dir"],          # Source directory (filtered JSONs)
    output_dir=filtered_description_jsons_dir,      # Output directory
    show_progress=True                              # Show progress bar
)

# Display results
display(Markdown(f"""
## Filter Results for JSONs with Text Descriptions

- Total IDs in list: {results.get("total_ids", 0)}
- Files found in source directory: {results.get("found_files", 0)}
- Files missing from source: {results.get("missing_files", 0)}
- Files successfully copied: {results.get("copied_files", 0)}
- Errors during copy: {results.get("errors", 0)}

Filtered JSONs with text descriptions are available in: `{filtered_description_jsons_dir}`
"""))

# Debug: Check output directory contents after filtering
print("\n--- After Filter Operation ---")
if os.path.exists(filtered_description_jsons_dir):
    files = os.listdir(filtered_description_jsons_dir)
    print(f"Output directory '{filtered_description_jsons_dir}' contains {len(files)} files")
    if files:
        print(f"Sample files: {files[:3]}")

# Only update the directory reference if files were found and copied
if results.get("copied_files", 0) > 0:
    DIRS["filtered_description_jsons_dir"] = filtered_description_jsons_dir
    DIRS["filtered_jsons_dir"] = filtered_description_jsons_dir
    print(f"✓ Updated DIRS dictionary with new filtered path: {filtered_description_jsons_dir}")
else:
    # Keep using the original directory
    display(Markdown("⚠️ No files were copied. Keeping original filtered_jsons_dir."))
    print(f"⚠️ Keeping original filtered_jsons_dir: {DIRS['filtered_jsons_dir']}")


--- Current Directory Structure ---
Working directory: c:\Users\Bahno\Desktop\skola\8sem\KNN\new\KNN\CLIP
Directory 'filtered_jsons' exists with 3949 files
Sample files: ['001fb740-cc2b-11ea-b34d-5ef3fc9bb22f.json', '003d9392-3e40-11e1-bdd3-005056a60003.json', '0045b280-d504-11e3-893a-0030487be43a.json']
Directory 'filtered_description_jsons' exists with 0 files

--- Running Filter Operation ---
Error reading IDs file: [Errno 2] No such file or directory: 'test.txt'



## Filter Results for JSONs with Text Descriptions

- Total IDs in list: 0
- Files found in source directory: 0
- Files missing from source: 0
- Files successfully copied: 0
- Errors during copy: 0

Filtered JSONs with text descriptions are available in: `filtered_description_jsons`



--- After Filter Operation ---
Output directory 'filtered_description_jsons' contains 0 files


⚠️ No files were copied. Keeping original filtered_jsons_dir.

⚠️ Keeping original filtered_jsons_dir: filtered_jsons


# Reset `filtered_jsons_dir` Before Running CLIP (Skip When Testing)

In [15]:

DIRS["filtered_jsons_dir"] = "filtered_jsons"

In [None]:
# Import the process descriptions module
import glob
import process_descriptions
import shutil

# Set up logging for notebook environment
process_descriptions.setup_logging(debug_mode=DEBUG, use_notebook=True, log_to_file=False)

# Check if required directories have data
required_dirs = {
    "filtered_jsons_dir": DIRS["filtered_jsons_dir"],
    "cropped_images_dir": DIRS["cropped_images_dir"],
    "filtered_texts_no_desc_dir": DIRS["filtered_texts_no_desc_dir"]
}

for name, path in required_dirs.items():
    file_count = len(glob.glob(os.path.join(path, "*")))
    display(Markdown(f"✓ Found {file_count} files in `{path}`"))

# Clear output directories before processing
output_dirs = [DIRS["output_dir"], DIRS["output_jsons_dir"]]
for dir_path in output_dirs:
    # Check if the directory exists
    if os.path.exists(dir_path):
        # Get a list of all files in the directory
        files = glob.glob(os.path.join(dir_path, "*"))
        if files:
            # Warn before deleting (no confirmation is requested)
            display(Markdown(f"⚠️ Found {len(files)} files in `{dir_path}`. Clearing directory..."))
            
            # Delete all files in the directory
            for file_path in files:
                try:
                    if os.path.isfile(file_path):
                        os.unlink(file_path)
                    elif os.path.isdir(file_path):
                        shutil.rmtree(file_path)
                except Exception as e:
                    display(Markdown(f"❌ Error deleting {file_path}: {e}"))
            
            display(Markdown(f"✓ Cleared directory `{dir_path}`"))
        else:
            display(Markdown(f"✓ Directory `{dir_path}` is already empty"))

# Let's add a notification that this might take a while
display(Markdown("## ⚙️ Running CLIP model - this may take several minutes..."))

# Run the processing function with our directory structure
result = process_descriptions.run_process_descriptions(
    json_dir=DIRS["filtered_jsons_dir"],      # Use filtered JSONs
    images_dir=DIRS["cropped_images_dir"],    # Use cropped images 
    texts_dir=DIRS["filtered_texts_no_desc_dir"],  # Use filtered texts without descriptions
    output_dir=DIRS["output_dir"],            # Where output files will go
    original_images_dir=DIRS["images_dir"],   # Original images directory
    similarity_threshold=0.25,                # Threshold for text matching
    max_lines_context=3,                      # Include 3 lines above/below matches
    max_ids=100,                              # Process at most 100 IDs
    model_name="M-CLIP",
    # model_name="ViT-B/32",                    # CLIP model to use
    best_only=False,                          # Use all matching blocks above threshold (not just best)
    top_k=5,                                  # Find top 5 matching blocks for each image
    show_progress=True,                       # Use tqdm.notebook for progress bars
    verbose=False,                            # Disable verbose output for cleaner logs
    output_jsons_dir=DIRS["output_jsons_dir"] # Where output JSONs will go
)

# Display results as markdown
if "error" in result:
    display(Markdown(f"## ❌ Error\n{result['error']}"))
else:
    # Build the Markdown without leading indentation so it is not rendered as a code block
    display(Markdown(
        "## CLIP Text-Image Matching Results\n\n"
        "### Processing Summary:\n"
        f"- Total IDs processed: {result['summary']['total_ids']}\n"
        f"- Successful matches: {result['summary']['successful_ids']}\n"
        f"- Success rate: {result['summary']['success_rate']:.1f}%\n"
        f"- Total images processed: {result['summary']['total_images_processed']}\n"
        f"- Images below threshold: {result['summary']['images_below_threshold']}\n\n"
        "### Configuration:\n"
        f"- Model used: {result['config']['model']} on {result['config']['device']}\n"
        f"- Similarity threshold: {result['config']['similarity_threshold']}\n"
        f"- Max lines context: {result['config']['max_lines_context']}\n"
        "- Top-k matches: 5 (finds top 5 matching text blocks per image)\n\n"
        "### Performance:\n"
        f"- Total processing time: {result['summary']['elapsed_time']:.2f} seconds\n"
        f"- Average time per ID: {result['summary']['average_time_per_id']:.2f} seconds"
    ))
    
    # Show a sample image if any were successful
    successful_results = [r for r in result['details'] if r.get('success', False)]
    if successful_results:
        sample = successful_results[0]
        display(Markdown(f"### Sample Result: ID {sample['id']}"))
        sample_path = os.path.join(DIRS["output_dir"], sample['output_file'])
        if os.path.exists(sample_path):
            display(Image(filename=sample_path, width=800))
            display(Markdown(f"- Context blocks: {sample['context_blocks']}"))
            display(Markdown(f"- Processing time: {sample['time']:.2f} seconds"))
        else:
            display(Markdown(f"Image file not found: {sample_path}"))
    else:
        display(Markdown("No successful results to display"))

INFO: INFO logging enabled - use debug=True for more details


✓ Found 3949 files in `filtered_jsons`

✓ Found 7715 files in `cropped_images`

✓ Found 3949 files in `filtered_texts_no_desc`

⚠️ Found 50 files in `output_context`. Clearing directory...

✓ Cleared directory `output_context`

⚠️ Found 100 files in `output_jsons`. Clearing directory...

✓ Cleared directory `output_jsons`

## ⚙️ Running CLIP model - this may take several minutes...

INFO: Using device: cpu
INFO: Loading CLIP model: ViT-B/32
INFO: Model loaded in 3.13 seconds
INFO: Starting processing with ViT-B/32 model
INFO: Scanning for JSONs with 'Obrázek' label...


Checking JSONs for Obrázek label:   0%|          | 0/3949 [00:00<?, ?it/s]

INFO: Limiting to 100 IDs out of 3949 total
INFO: Found 100 IDs to process
INFO: Config: threshold=0.25, model=ViT-B/32


Processing IDs:   0%|          | 0/100 [00:00<?, ?ID/s]

KeyboardInterrupt: 
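
The selection logic itself lives in `process_descriptions.py`, which is not shown in this notebook. As a rough illustration of how a similarity threshold, `top_k` and `best_only` could interact, here is a minimal sketch that assumes cosine similarity between one image embedding and several text-block embeddings. The function name `select_blocks`, the inclusive threshold comparison, and the toy vectors are assumptions made for this example, not the script's actual behavior.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def select_blocks(image_vec, block_vecs, threshold=0.25, top_k=5, best_only=False):
    """Return (block_index, similarity) pairs at or above the threshold, best first.

    Hypothetical helper: the real pipeline may rank or filter differently.
    """
    scored = [(i, cosine(image_vec, v)) for i, v in enumerate(block_vecs)]
    kept = sorted((s for s in scored if s[1] >= threshold),
                  key=lambda s: s[1], reverse=True)
    # best_only keeps a single block; otherwise keep at most top_k blocks
    return kept[:1] if best_only else kept[:top_k]

# Toy 2-D vectors standing in for real CLIP embeddings (made-up data)
image_vec = [1.0, 0.0]
block_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [-1.0, 0.0]]

matches = select_blocks(image_vec, block_vecs, threshold=0.25, top_k=5)
for index, score in matches:
    print(f"block {index}: similarity {score:.2f}")
```

With these toy vectors only the first two blocks pass the 0.25 threshold, so `top_k=5` acts as an upper bound rather than a guaranteed count.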

# Intersection Analysis

This section analyzes whether the CLIP-matched text regions intersect with manually labeled "Popis v textu" (text description) regions.

- Compares the output JSONs from CLIP with the filtered JSONs containing annotations
- Identifies where context blocks from CLIP match with human-annotated text descriptions
- Calculates statistics on overlap/intersection between AI and human annotations
- Shows examples of matches where they occur
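
To make the difference between "the boxes intersect" and "how much they overlap" concrete, the following standalone sketch computes Intersection over Union for boxes given as `(x, y, width, height)` fractions of the page. The helper name `iou_xywh` and the three sample boxes are made up for illustration; the notebook's own functions are defined in the next cell.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, width, height) fractions."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Clamp at zero so disjoint or merely touching boxes give no overlap
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

clip_box = (0.0, 0.0, 0.5, 0.5)       # hypothetical CLIP context box
manual_box = (0.25, 0.25, 0.5, 0.5)   # hypothetical manual annotation
far_box = (0.75, 0.75, 0.25, 0.25)    # box that only touches manual_box at a corner

print(f"{iou_xywh(clip_box, manual_box):.3f}")  # 0.143 (overlap 0.0625 / union 0.4375)
print(f"{iou_xywh(clip_box, far_box):.3f}")     # 0.000 (disjoint)
print(f"{iou_xywh(manual_box, far_box):.3f}")   # 0.000 (touching, zero-area overlap)
```

A pair can count as an intersection even when its IoU is low, so the intersection counts reported later say that the regions overlap, not that they agree closely.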

In [3]:
# Enhanced debugging for CLIP bounding boxes with intersection detection for all examples
import os
import json
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image as PILImage
from tqdm.notebook import tqdm as notebook_tqdm

def calculate_iou(box1, box2):
    """
    Calculate Intersection over Union (IoU) between two bounding boxes.
    Boxes are in format (x, y, width, height) as fractions.
    Returns IoU value between 0.0 and 1.0
    """
    # Convert to coordinates format [x1, y1, x2, y2]
    box1_coords = [box1[0], box1[1], box1[0] + box1[2], box1[1] + box1[3]]
    box2_coords = [box2[0], box2[1], box2[0] + box2[2], box2[1] + box2[3]]
    
    # Calculate intersection area
    x_left = max(box1_coords[0], box2_coords[0])
    y_top = max(box1_coords[1], box2_coords[1])
    x_right = min(box1_coords[2], box2_coords[2])
    y_bottom = min(box1_coords[3], box2_coords[3])
    
    # No intersection
    if x_right < x_left or y_bottom < y_top:
        return 0.0
    
    intersection_area = (x_right - x_left) * (y_bottom - y_top)
    
    # Calculate union area
    box1_area = box1[2] * box1[3]
    box2_area = box2[2] * box2[3]
    union_area = box1_area + box2_area - intersection_area
    
    # Calculate IoU
    iou = intersection_area / union_area if union_area > 0 else 0.0
    return iou

def check_intersection(box1, box2):
    """
    Check if two boxes intersect.
    Boxes are in format (x, y, width, height) as fractions.
    Returns True if boxes intersect, False otherwise.
    """
    # Convert to coordinates format [x1, y1, x2, y2]
    box1_coords = [box1[0], box1[1], box1[0] + box1[2], box1[1] + box1[3]]
    box2_coords = [box2[0], box2[1], box2[0] + box2[2], box2[1] + box2[3]]
    
    # Check if boxes intersect
    if (box1_coords[2] <= box2_coords[0] or  # box1 is to the left of box2
        box1_coords[0] >= box2_coords[2] or  # box1 is to the right of box2
        box1_coords[3] <= box2_coords[1] or  # box1 is above box2
        box1_coords[1] >= box2_coords[3]):   # box1 is below box2
        return False
    
    return True

def process_bounding_box(bbox, img_width, img_height):
    """
    Process a bounding box and convert to normalized format.
    
    Args:
        bbox: Bounding box in format [x1, y1, x2, y2] in pixels
        img_width: Width of the image
        img_height: Height of the image
        
    Returns:
        Tuple (x, y, width, height) as fractions or None if failed
    """
    try:
        # Ensure we have numeric values
        bbox = [float(coord) for coord in bbox]
        
        # Convert from pixels [x1, y1, x2, y2] to fractions [x, y, width, height]
        x = bbox[0] / img_width
        y = bbox[1] / img_height
        width = (bbox[2] - bbox[0]) / img_width
        height = (bbox[3] - bbox[1]) / img_height
        return (x, y, width, height)
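    except (ValueError, TypeError, IndexError, ZeroDivisionError):
        # Non-numeric values, fewer than four coordinates, or a zero-sized image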
        return None

def merge_blocks_in_context(block_boxes):
    """
    Merge multiple bounding boxes into a single encompassing box.
    
    Args:
        block_boxes: List of bounding boxes in (x, y, width, height) format
        
    Returns:
        A single bounding box that encompasses all input boxes
    """
    if not block_boxes:
        return None
    
    if len(block_boxes) == 1:
        return block_boxes[0]
    
    # Find the min and max coordinates
    min_x = min(box[0] for box in block_boxes)
    min_y = min(box[1] for box in block_boxes)
    max_x = max(box[0] + box[2] for box in block_boxes)
    max_y = max(box[1] + box[3] for box in block_boxes)
    
    # Create a new bounding box that encompasses all the boxes
    merged_box = (min_x, min_y, max_x - min_x, max_y - min_y)
    return merged_box

def extract_clip_bboxes(output_data, sample_id, img_width, img_height, verbose=False, merge_context=True):
    """
    Extract all clip bounding boxes from the output data, handling different formats.
    
    Args:
        output_data: The loaded JSON data
        sample_id: ID of the sample to extract boxes for
        img_width: Width of the image
        img_height: Height of the image
        verbose: Whether to print detailed info
        merge_context: Whether to merge context blocks within a main block into one
        
    Returns:
        List of bounding boxes in format [(x, y, width, height), ...]
    """
    clip_bboxes = []
    
    # Check which format we're dealing with
    if 'images' in output_data and sample_id in output_data['images']:
        # New format with nested structure
        image_data = output_data['images'][sample_id]
        if verbose:
            print(f"Found image data for {sample_id} in nested format")
            
        if 'blocks' in image_data:
            for block_idx, block in enumerate(image_data['blocks']):
                # Process main block
                if 'bounding_box' in block:
                    main_bbox = process_bounding_box(block['bounding_box'], img_width, img_height)
                    
                    # Process context blocks nested within this block
                    context_boxes = []
                    if 'context_blocks' in block and block['context_blocks']:
                        for ctx_idx, ctx_block in enumerate(block['context_blocks']):
                            if 'bounding_box' in ctx_block:
                                ctx_bbox = process_bounding_box(ctx_block['bounding_box'], img_width, img_height)
                                if ctx_bbox:
                                    context_boxes.append(ctx_bbox)
                                    if verbose and not merge_context:
                                        print(f"Added context block {block_idx}.{ctx_idx} bounding box")
                    
                    # If we have both main block and context blocks and merging is enabled
                    if main_bbox and context_boxes and merge_context:
                        # Include the main block in the merge if it's valid
                        all_boxes = [main_bbox] + context_boxes
                        merged_box = merge_blocks_in_context(all_boxes)
                        clip_bboxes.append(merged_box)
                        if verbose:
                            print(f"Added merged block {block_idx} with {len(context_boxes)} context blocks")
                    else:
                        # Add main block if it exists
                        if main_bbox:
                            clip_bboxes.append(main_bbox)
                            if verbose:
                                print(f"Added main block {block_idx} bounding box")
                        
                        # Add individual context blocks if merging is disabled
                        if not merge_context:
                            clip_bboxes.extend(context_boxes)
                    
                # If no main bounding box but has context blocks
                elif 'context_blocks' in block and block['context_blocks']:
                    context_boxes = []
                    for ctx_idx, ctx_block in enumerate(block['context_blocks']):
                        if 'bounding_box' in ctx_block:
                            ctx_bbox = process_bounding_box(ctx_block['bounding_box'], img_width, img_height)
                            if ctx_bbox:
                                context_boxes.append(ctx_bbox)
                                if verbose and not merge_context:
                                    print(f"Added context block {block_idx}.{ctx_idx} bounding box")
                    
                    # Merge context blocks if enabled and we have any
                    if context_boxes and merge_context:
                        merged_box = merge_blocks_in_context(context_boxes)
                        clip_bboxes.append(merged_box)
                        if verbose:
                            print(f"Added merged context blocks for block {block_idx} ({len(context_boxes)} boxes)")
                    elif not merge_context:
                        clip_bboxes.extend(context_boxes)
    
    # Handle the older formats
    elif 'context_blocks' in output_data:
        if verbose:
            print("Using old format with direct context_blocks")
            
        # Old format with direct context_blocks
        context_boxes = []
        for block in output_data['context_blocks']:
            if 'bbox' in block:
                # Bbox in the format [x1, y1, x2, y2] as fractions
                bbox = block['bbox']
                
                # Convert from [x1, y1, x2, y2] to [x, y, width, height]
                x = bbox[0]
                y = bbox[1]
                width = bbox[2] - bbox[0]
                height = bbox[3] - bbox[1]
                context_boxes.append((x, y, width, height))
                
            elif 'bounding_box' in block:
                # Bbox in the format [x1, y1, x2, y2] in pixels
                bbox = process_bounding_box(block['bounding_box'], img_width, img_height)
                if bbox:
                    context_boxes.append(bbox)
        
        # Merge all context blocks if enabled
        if context_boxes and merge_context:
            merged_box = merge_blocks_in_context(context_boxes)
            clip_bboxes.append(merged_box)
            if verbose:
                print(f"Merged all {len(context_boxes)} context blocks into one")
        else:
            clip_bboxes.extend(context_boxes)
    
    # Check for alternative structure
    elif 'blocks' in output_data:
        if verbose:
            print("Using direct blocks format")
            
        # Direct blocks array
        block_boxes = []
        for block in output_data['blocks']:
            if 'bounding_box' in block:
                bbox = process_bounding_box(block['bounding_box'], img_width, img_height)
                if bbox:
                    block_boxes.append(bbox)
        
        # Merge all blocks if enabled
        if block_boxes and merge_context:
            merged_box = merge_blocks_in_context(block_boxes)
            clip_bboxes.append(merged_box)
            if verbose:
                print(f"Merged all {len(block_boxes)} blocks into one")
        else:
            clip_bboxes.extend(block_boxes)
    
    if verbose:
        print(f"Extracted {len(clip_bboxes)} CLIP bounding boxes" + 
              (" (after merging)" if merge_context else ""))
        
    return clip_bboxes

def visualize_comparison(output_json_dir, filtered_json_dir, output_dir, sample_id=None, verbose=False, merge_context=True):
    """
    Visualize an image with its CLIP context bboxes (red) and manual annotation bboxes (green).
    Intersecting boxes are drawn with thicker orange borders, and intersection counts are reported.
    
    Args:
        output_json_dir: Directory containing CLIP output JSONs
        filtered_json_dir: Directory containing filtered JSONs with annotations
        output_dir: Directory to save visualizations
        sample_id: Specific ID to process (if None, processes first available)
        verbose: Whether to print detailed debug info
        merge_context: Whether to merge context blocks within a main block
    
    Returns:
        Dictionary with result information or None if failed
    """
    # Get available IDs
    output_jsons = [f for f in os.listdir(output_json_dir) if f.endswith('_results.json')]
    
    if not output_jsons:
        if verbose:
            print(f"❌ No output JSONs with '_results.json' suffix found in directory: {output_json_dir}")
        # Try without suffix as fallback
        output_jsons = [f for f in os.listdir(output_json_dir) if f.endswith('.json')]
        if not output_jsons:
            if verbose:
                print(f"❌ No JSON files found in directory: {output_json_dir}")
            return None
    
    # If no specific ID is provided, use the first one
    if sample_id is None:
        sample_json = output_jsons[0]
        sample_id = os.path.splitext(sample_json)[0]
        if sample_id.endswith('_results'):
            sample_id = sample_id[:-8]  # Remove "_results" suffix
    else:
        # Ensure sample_id has no extension
        sample_id = os.path.splitext(sample_id)[0]
        if sample_id.endswith('_results'):
            sample_id = sample_id[:-8]  # Remove "_results" suffix
    
    # Check if both JSONs exist
    output_json_path = os.path.join(output_json_dir, f"{sample_id}_results.json")
    if not os.path.exists(output_json_path):
        output_json_path = os.path.join(output_json_dir, f"{sample_id}.json")  # Try without suffix
        if not os.path.exists(output_json_path):
            if verbose:
                print(f"❌ Output JSON not found: {output_json_path}")
            return None
    
    filtered_json_path = os.path.join(filtered_json_dir, f"{sample_id}.json")
    if not os.path.exists(filtered_json_path):
        if verbose:
            print(f"❌ Filtered JSON not found: {filtered_json_path}")
        return None
    
    # Load the JSONs
    try:
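        # Read as UTF-8 explicitly: labels such as 'Obrázek' contain non-ASCII characters
        with open(output_json_path, 'r', encoding='utf-8') as f:
            output_data = json.load(f)
        with open(filtered_json_path, 'r', encoding='utf-8') as f:
            filtered_data = json.load(f)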
    except Exception as e:
        if verbose:
            print(f"❌ Error loading JSONs: {e}")
        return None
    
    # Get image path
    image_path = os.path.join(DIRS["images_dir"], f"{sample_id}.jpg")
    if not os.path.exists(image_path):
        # Try other extensions
        for ext in ['.png', '.jpeg', '.gif']:
            alt_path = os.path.join(DIRS["images_dir"], f"{sample_id}{ext}")
            if os.path.exists(alt_path):
                image_path = alt_path
                break
        
        if not os.path.exists(image_path):
            if verbose:
                print(f"❌ Image not found for ID: {sample_id}")
            return None
    
    # Load the image
    img = PILImage.open(image_path)
    img_width, img_height = img.size
    
    # Get bounding boxes from CLIP output using the enhanced extraction function with merging
    clip_bboxes = extract_clip_bboxes(output_data, sample_id, img_width, img_height, verbose, merge_context)
    
    # Manual bounding boxes from filtered JSON
    manual_bboxes = []
    
    # Helper function to extract annotations from result
    def extract_manual_annotations(result_list):
        boxes = []
        for result in result_list:
            if 'value' in result and 'rectanglelabels' in result['value']:
                label = result['value']['rectanglelabels']
                if isinstance(label, list) and ('Popis v textu' in label):
                    try:
                        # Get coordinates (already in percentages)
                        x = float(result['value']['x']) / 100.0  # Convert from percentage to fraction
                        y = float(result['value']['y']) / 100.0
                        width = float(result['value']['width']) / 100.0
                        height = float(result['value']['height']) / 100.0
                        
                        boxes.append((x, y, width, height))
                    except (KeyError, TypeError, ValueError):
                        # Skip annotations with missing or non-numeric coordinates
                        continue
        return boxes
    
    # Extract boxes from result array (different JSON structures)
    if 'result' in filtered_data:
        manual_bboxes.extend(extract_manual_annotations(filtered_data['result']))
    elif 'annotations' in filtered_data and len(filtered_data['annotations']) > 0:
        for annotation in filtered_data['annotations']:
            if 'result' in annotation:
                manual_bboxes.extend(extract_manual_annotations(annotation['result']))
    
    # Check for intersections between CLIP and manual boxes
    intersections = []
    for i, clip_box in enumerate(clip_bboxes):
        for j, manual_box in enumerate(manual_bboxes):
            if check_intersection(clip_box, manual_box):
                iou = calculate_iou(clip_box, manual_box)
                intersections.append((i, j, iou))
    
    # Create figure and axis
    fig, ax = plt.subplots(figsize=(12, 10))
    ax.imshow(np.array(img))
    
    # Plot CLIP bboxes in red
    for i, bbox in enumerate(clip_bboxes):
        x, y, width, height = bbox
        # Scale to pixel values
        x_px = x * img_width
        y_px = y * img_height
        width_px = width * img_width
        height_px = height * img_height
        
        # Check if this box intersects with any manual box
        is_intersecting = any(intersection[0] == i for intersection in intersections)
        
        # Use thicker border for intersecting boxes
        linewidth = 3 if is_intersecting else 2
        # Use different color for intersecting boxes
        edgecolor = 'orange' if is_intersecting else 'r'
        
        rect = patches.Rectangle(
            (x_px, y_px), width_px, height_px,
            linewidth=linewidth, edgecolor=edgecolor, facecolor='none',
            label='CLIP Context (Intersecting)' if (i == 0 and is_intersecting) else 
                  ('CLIP Context' if i == 0 else "")
        )
        ax.add_patch(rect)
        
        # Add text label inside the box
        ax.text(x_px + 5, y_px + 15, f"C{i+1}", color=edgecolor, fontweight='bold',
                bbox=dict(facecolor='white', alpha=0.7, edgecolor='none', pad=0))
    
    # Plot manual bboxes in green
    for i, bbox in enumerate(manual_bboxes):
        x, y, width, height = bbox
        # Scale to pixel values
        x_px = x * img_width
        y_px = y * img_height
        width_px = width * img_width
        height_px = height * img_height
        
        # Check if this box intersects with any CLIP box
        is_intersecting = any(intersection[1] == i for intersection in intersections)
        
        # Use thicker border for intersecting boxes
        linewidth = 3 if is_intersecting else 2
        # Use different color for intersecting boxes
        edgecolor = 'orange' if is_intersecting else 'g'
        
        rect = patches.Rectangle(
            (x_px, y_px), width_px, height_px,
            linewidth=linewidth, edgecolor=edgecolor, facecolor='none',
            label='Manual Annotation (Intersecting)' if (i == 0 and is_intersecting) else 
                  ('Manual Annotation' if i == 0 else "")
        )
        ax.add_patch(rect)
        
        # Add text label inside the box
        ax.text(x_px + 5, y_px + 15, f"M{i+1}", color=edgecolor, fontweight='bold',
                bbox=dict(facecolor='white', alpha=0.7, edgecolor='none', pad=0))
    
    # Add legend (only once for each color)
    handles = []
    if clip_bboxes:
        handles.append(patches.Patch(linewidth=2, edgecolor='r', facecolor='none', label='CLIP Context (C#)'))
    if manual_bboxes:
        handles.append(patches.Patch(linewidth=2, edgecolor='g', facecolor='none', label='Manual Annotation (M#)'))
    if intersections:
        handles.append(patches.Patch(linewidth=3, edgecolor='orange', facecolor='none', label='Intersection'))
    
    if handles:
        ax.legend(handles=handles, loc='upper right')
    
    # Set title with intersection stats and merging info
    merge_info = " (merged)" if merge_context else ""
    intersection_info = f" - {len(intersections)} intersections" if intersections else ""
    ax.set_title(f"ID: {sample_id} - CLIP ({len(clip_bboxes)}){merge_info} vs Manual ({len(manual_bboxes)}){intersection_info}")
    
    # Remove axes
    ax.set_axis_off()
    
    # Save the figure
    output_path = os.path.join(output_dir, f"{sample_id}_comparison{'_merged' if merge_context else ''}.jpg")
    fig.tight_layout()
    fig.savefig(output_path, bbox_inches='tight')
    plt.close(fig)  # Close this figure explicitly so batch runs do not accumulate open figures
    
    return {
        "output_path": output_path,
        "clip_boxes": len(clip_bboxes),
        "manual_boxes": len(manual_bboxes),
        "intersections": len(intersections),
        "id": sample_id,
        "merged": merge_context
    }

def run_batch_visualizations(output_json_dir, filtered_json_dir, output_dir, max_samples=None, verbose=False, merge_context=True):
    """
    Run visualizations on multiple samples.
    
    Args:
        output_json_dir: Directory containing CLIP output JSONs
        filtered_json_dir: Directory containing filtered JSONs with annotations
        output_dir: Directory to save visualizations
        max_samples: Maximum number of samples to process (None for all)
        verbose: Whether to print detailed logs
        merge_context: Whether to merge context blocks within a main block
    
    Returns:
        Dictionary with summary statistics
    """
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    
    # Get all available IDs
    output_jsons = [f for f in os.listdir(output_json_dir) if f.endswith('_results.json')]
    if not output_jsons:
        output_jsons = [f for f in os.listdir(output_json_dir) if f.endswith('.json')]
    
    sample_ids = []
    for filename in output_jsons:
        sample_id = os.path.splitext(filename)[0]
        if sample_id.endswith('_results'):
            sample_id = sample_id[:-8]  # Remove "_results" suffix
        sample_ids.append(sample_id)
    
    # Limit sample size if requested
    if max_samples is not None and max_samples < len(sample_ids):
        sample_ids = sample_ids[:max_samples]
    
    # Process all samples with progress bar
    merge_status = "merged" if merge_context else "individual"
    print(f"Processing {len(sample_ids)} samples with {merge_status} context blocks...")
    
    results = []
    total_clip_boxes = 0
    total_manual_boxes = 0
    total_intersections = 0
    successful_samples = 0
    
    try:
        # Use tqdm.notebook for progress bar in Jupyter
        sample_iterator = notebook_tqdm(sample_ids, desc="Generating visualizations")
    except Exception:
        # Fall back to plain iteration if the notebook progress widget is unavailable
        sample_iterator = sample_ids
        
    for sample_id in sample_iterator:
        result = visualize_comparison(
            output_json_dir=output_json_dir,
            filtered_json_dir=filtered_json_dir,
            output_dir=output_dir,
            sample_id=sample_id,
            verbose=verbose,
            merge_context=merge_context
        )
        
        if result:
            results.append(result)
            total_clip_boxes += result["clip_boxes"]
            total_manual_boxes += result["manual_boxes"]
            total_intersections += result["intersections"]
            successful_samples += 1
    
    # Generate summary statistics
    summary = {
        "total_samples": len(sample_ids),
        "successful_samples": successful_samples,
        "failed_samples": len(sample_ids) - successful_samples,
        "total_clip_boxes": total_clip_boxes,
        "total_manual_boxes": total_manual_boxes,
        "total_intersections": total_intersections,
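        # Intersecting (CLIP, manual) pairs per CLIP box; a box can appear in several pairs,
        # so this value is not bounded by 100%
        "intersection_rate": (total_intersections / total_clip_boxes * 100) if total_clip_boxes > 0 else 0,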
        "results": results,
        "merge_context": merge_context
    }
    
    # Calculate samples with intersections
    samples_with_intersections = len([r for r in results if r["intersections"] > 0])
    summary["samples_with_intersections"] = samples_with_intersections
    summary["intersection_sample_rate"] = (samples_with_intersections / successful_samples * 100) if successful_samples > 0 else 0
    
    # Find the samples with the most intersections (descending). The key is always set,
    # as an empty list when nothing was processed, so callers can read it safely.
    summary["best_samples"] = sorted(results, key=lambda x: x["intersections"], reverse=True)[:5]
    
    return summary

# Set output directory for visualizations
comparison_dir = "bbox_comparisons"
os.makedirs(comparison_dir, exist_ok=True)

# Run batch visualization for all samples with merged context blocks
summary = run_batch_visualizations(
    output_json_dir=DIRS["output_jsons_dir"],
    filtered_json_dir=DIRS["filtered_jsons_dir"],
    output_dir=comparison_dir,
    max_samples=None,  # Process all available samples
    verbose=False,     # Don't print debug info
    merge_context=True # Merge context blocks within each main block
)

# Display only the summary statistics
from IPython.display import Markdown, display

merge_info = "(with merged context blocks)" if summary.get("merge_context", False) else ""

display(Markdown(f"""
## Bounding Box Comparison Summary {merge_info}

### Processing Statistics:
- Total samples: {summary['total_samples']}
- Successfully processed: {summary['successful_samples']}
- Failed to process: {summary['failed_samples']}

### Box Detection:
- Total CLIP context boxes: {summary['total_clip_boxes']}
- Total manual annotation boxes: {summary['total_manual_boxes']}
- Total intersections detected: {summary['total_intersections']}
- Overall intersection rate: {summary['intersection_rate']:.2f}%

### Sample Intersections:
- Samples with at least one intersection: {summary['samples_with_intersections']}
- Percentage of samples with intersections: {summary['intersection_sample_rate']:.2f}%

### Top Samples by Intersection Count:
{', '.join([f"{s['id']} ({s['intersections']} intersections)" for s in summary['best_samples'][:5]])}

All visualizations have been saved to the 'bbox_comparisons' directory.
"""))

Processing 100 samples with merged context blocks...


Generating visualizations:   0%|          | 0/100 [00:00<?, ?it/s]


## Bounding Box Comparison Summary (with merged context blocks)

### Processing Statistics:
- Total samples: 100
- Successfully processed: 100
- Failed to process: 0

### Box Detection:
- Total CLIP context boxes: 109
- Total manual annotation boxes: 49
- Total intersections detected: 23
- Overall intersection rate: 21.10%

### Sample Intersections:
- Samples with at least one intersection: 17
- Percentage of samples with intersections: 17.00%

### Top Samples by Intersection Count:
064aef66-735d-4b34-93b3-770b48c037d3 (3 intersections), 02928146-e6ab-11e5-bc5e-001b21d0d3a4 (2 intersections), 04aa82b9-1cbb-4ae0-b65c-1cbae3c6a9cd (2 intersections), 04c338cf-0061-11e7-9c30-92f789bb8157 (2 intersections), 08abce05-4802-40a4-a48b-5dce1a0bf24e (2 intersections)

All visualizations have been saved to the 'bbox_comparisons' directory.
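
The two percentages in this summary use different denominators: the overall rate divides intersecting box pairs by CLIP boxes, while the per-sample rate divides samples by samples. The sketch below recomputes both from the totals printed above, then adds a made-up edge case (the `pairs` list is hypothetical) to show that the pair-based rate can exceed 100% when one CLIP box overlaps several manual boxes.

```python
# Totals copied from the summary above
total_clip_boxes = 109
total_intersections = 23
successful_samples = 100
samples_with_intersections = 17

intersection_rate = total_intersections / total_clip_boxes * 100
intersection_sample_rate = samples_with_intersections / successful_samples * 100

print(f"Overall intersection rate: {intersection_rate:.2f}%")          # 21.10%
print(f"Samples with intersections: {intersection_sample_rate:.2f}%")  # 17.00%

# Hypothetical edge case: a single CLIP box overlapping two manual boxes
pairs = [(0, 0, 0.4), (0, 1, 0.2)]  # (clip_index, manual_index, iou), made-up values
n_clip_boxes = 1

pair_rate = len(pairs) / n_clip_boxes * 100                    # counts pairs
box_rate = len({c for c, _, _ in pairs}) / n_clip_boxes * 100  # counts distinct CLIP boxes

print(f"Pair-based rate: {pair_rate:.0f}%")  # 200%
print(f"Box-based rate: {box_rate:.0f}%")    # 100%
```

If the goal is "what share of CLIP boxes hit at least one manual annotation", counting distinct CLIP boxes as in `box_rate` keeps the figure between 0% and 100%.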
