# Layout Error Detection - Feature Testing Notebook

This notebook provides an interactive way to test all the features of the layout error detection system.

## Features Tested:
1. **XML Parser** - Parse PPTX files to extract slides, text elements, and images
2. **Hierarchy Detector** - Detect font size hierarchy inconsistencies
3. **Margin Detector** - Detect elements too close to slide edges
4. **Contrast Detector** - Detect text with poor contrast against background
5. **Aspect Ratio Detector** - Detect distorted images
6. **Alignment Detector** - Detect nearly-aligned elements that should snap
7. **VLM Validator** - Use vision models to validate text legibility (requires Azure OpenAI)

## Setup and Imports

In [None]:
import sys
import os
import json
from pathlib import Path

# Add the project root to Python path
project_root = Path(os.getcwd()).parent
sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")

In [None]:
# Import all modules
from src.parsers.xml_parser import parse_presentation
from src.detectors.hierarchy import detect_hierarchy_violations
from src.detectors.margin import detect_margin_violations
from src.detectors.contrast import detect_contrast_violations
from src.detectors.aspect_ratio import detect_aspect_ratio_violations
from src.detectors.alignment import detect_alignment_violations
from src.reporter import generate_report, report_to_json
from src.main import analyze
from src.models import SlideNode, TextElement, ImageElement, TextStyle, BoundingBox

print("✅ All modules imported successfully!")

## 1. Set Test Presentation Path

Update the path below to point to your test PPTX file.

In [None]:
# Set your test PPTX file path here
TEST_PPTX_PATH = "path/to/your/test.pptx"

# Check if file exists
if os.path.exists(TEST_PPTX_PATH):
    print(f"✅ Test file found: {TEST_PPTX_PATH}")
else:
    print(f"❌ Test file not found: {TEST_PPTX_PATH}")
    print("Please update TEST_PPTX_PATH to point to a valid PPTX file.")

## 2. Test XML Parser

Parse the presentation and explore its structure.

In [None]:
# Parse the presentation
slides = parse_presentation(TEST_PPTX_PATH)

print(f"📊 Parsed {len(slides)} slides")
print(f"\nSlide dimensions: {slides[0].width:.1f}px × {slides[0].height:.1f}px" if slides else "No slides found")

In [None]:
# Explore slide contents
for slide in slides:
    print(f"\n{'='*50}")
    print(f"📄 Slide {slide.index + 1}")
    print(f"   Background: {slide.background_color or 'Not set'}")
    print(f"   Text elements: {len(slide.text_elements)}")
    print(f"   Image elements: {len(slide.image_elements)}")
    
    if slide.text_elements:
        print("\n   📝 Text Elements:")
        for elem in slide.text_elements[:5]:  # Show first 5
            text_preview = elem.text[:50] + "..." if len(elem.text) > 50 else elem.text
            print(f"      - [{elem.id}] '{text_preview}'")
            print(f"        Font: {elem.style.font_name or 'Unknown'}, Size: {elem.style.font_size or 'Unknown'}pt")
            print(f"        Position: ({elem.bbox.x:.1f}, {elem.bbox.y:.1f})")
    
    if slide.image_elements:
        print("\n   🖼️ Image Elements:")
        for elem in slide.image_elements[:5]:  # Show first 5
            print(f"      - [{elem.id}] {elem.rendered_width:.1f}x{elem.rendered_height:.1f}px")

## 3. Test Individual Detectors

### 3.1 Hierarchy Detector

Detects font sizes that are close to, but do not match, the sizes most commonly used in the presentation.
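
To make the idea concrete, here is a minimal sketch of one way such a check could work. It is not the code in `src.detectors.hierarchy`; the function name `near_miss_sizes` and the `tolerance` and `min_count` parameters are hypothetical and chosen only for illustration.

```python
from collections import Counter

def near_miss_sizes(font_sizes, tolerance=1.0, min_count=2):
    """Return (size, nearest_common_size) pairs for sizes that sit close to,
    but are not equal to, a commonly used size.

    Assumptions (not taken from the project): a size is "common" when it
    appears at least `min_count` times, and "close" means within
    `tolerance` points.
    """
    counts = Counter(font_sizes)
    common = [size for size, n in counts.items() if n >= min_count]
    flagged = []
    for size in sorted(counts):
        if size in common:
            continue
        close = [c for c in common if abs(size - c) <= tolerance]
        if close:
            nearest = min(close, key=lambda c: abs(size - c))
            flagged.append((size, nearest))
    return flagged

# Sizes mirroring the mock slide used later in this notebook
print(near_miss_sizes([36, 24, 14.5, 14, 14]))  # [(14.5, 14)]
```

Only 14.5 is flagged: 36 and 24 are each used once, but neither is within the tolerance of a common size.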

In [None]:
hierarchy_errors = detect_hierarchy_violations(slides)

print(f"🔍 Hierarchy Detector: Found {len(hierarchy_errors)} error(s)\n")

for error in hierarchy_errors:
    print(f"  [{error.severity.value}] {error.message}")
    print(f"    Elements: {error.elements}\n")

### 3.2 Margin Detector

Detects elements that are positioned too close to the slide edges.
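
As a rough illustration of what "too close" could mean, the sketch below measures the distance from each side of a bounding box to the matching slide edge. The function name `margin_violations` and the 20 px `min_margin` default are assumptions for this example, not values taken from `src.detectors.margin`.

```python
def margin_violations(x, y, width, height, slide_width, slide_height, min_margin=20.0):
    """Return the names of the slide edges that the box sits too close to.

    `min_margin` is a hypothetical threshold in pixels.
    """
    distances = {
        "left": x,
        "top": y,
        "right": slide_width - (x + width),
        "bottom": slide_height - (y + height),
    }
    return [edge for edge, distance in distances.items() if distance < min_margin]

# Bounding box of the mock title used later in this notebook, on a 960x540 slide
print(margin_violations(5, 10, 200, 50, 960, 540))  # ['left', 'top']
```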

In [None]:
margin_errors = detect_margin_violations(slides)

print(f"🔍 Margin Detector: Found {len(margin_errors)} error(s)\n")

for error in margin_errors:
    print(f"  [{error.severity.value}] {error.message}")
    print(f"    Elements: {error.elements}\n")

### 3.3 Contrast Detector

Detects text elements with poor contrast against the slide background.
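
One widely used way to quantify contrast is the WCAG contrast ratio, sketched below. Whether `detect_contrast_violations` uses this formula or the 4.5:1 threshold is an assumption; the helper names `relative_luminance` and `contrast_ratio` are hypothetical.

```python
def relative_luminance(hex_color):
    """WCAG relative luminance of an sRGB color given as '#RRGGBB'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding before weighting the channels
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

# Assumed threshold: WCAG AA asks for at least 4.5:1 for normal-size text
for color in ("#000000", "#EEEEEE"):
    ratio = contrast_ratio(color, "#FFFFFF")
    print(f"{color} on white: {ratio:.2f} ({'ok' if ratio >= 4.5 else 'poor'})")
```

Under this measure, the light gray `#EEEEEE` text used in the mock slide later in this notebook falls well below 4.5:1 on a white background.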

In [None]:
contrast_errors = detect_contrast_violations(slides)

print(f"🔍 Contrast Detector: Found {len(contrast_errors)} error(s)\n")

for error in contrast_errors:
    print(f"  [{error.severity.value}] {error.message}")
    print(f"    Elements: {error.elements}\n")

### 3.4 Aspect Ratio Detector

Detects images that have been distorted from their original aspect ratio.
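
The detector takes the PPTX path as well as the parsed slides, presumably so it can read each image's original pixel dimensions. A minimal sketch of the comparison itself follows; the function name `is_distorted` and the 5% `tolerance` are assumptions for illustration, not the project's actual logic.

```python
def is_distorted(original_size, rendered_size, tolerance=0.05):
    """Return True when the rendered aspect ratio deviates from the original
    by more than `tolerance` (a hypothetical relative threshold).

    Both sizes are (width, height) tuples with positive values.
    """
    orig_w, orig_h = original_size
    rend_w, rend_h = rendered_size
    original_ratio = orig_w / orig_h
    rendered_ratio = rend_w / rend_h
    relative_change = abs(rendered_ratio - original_ratio) / original_ratio
    return relative_change > tolerance

print(is_distorted((1600, 900), (800, 450)))  # uniformly scaled
print(is_distorted((1600, 900), (800, 600)))  # stretched vertically
```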

In [None]:
aspect_ratio_errors = detect_aspect_ratio_violations(slides, TEST_PPTX_PATH)

print(f"🔍 Aspect Ratio Detector: Found {len(aspect_ratio_errors)} error(s)\n")

for error in aspect_ratio_errors:
    print(f"  [{error.severity.value}] {error.message}")
    print(f"    Elements: {error.elements}\n")

### 3.5 Alignment Detector

Detects elements that are nearly aligned and could benefit from snapping to a common grid.
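
The sketch below shows one way "nearly aligned" could be defined for left edges: two elements whose x positions differ by a small, non-zero amount. The function name `near_aligned_pairs` and the 5 px `tolerance` are hypothetical; the real detector may also compare centers, right edges, or vertical positions.

```python
def near_aligned_pairs(left_edges, tolerance=5.0):
    """Return (id_a, id_b, gap) for pairs whose left edges differ by more
    than 0 and at most `tolerance` pixels (a hypothetical threshold).

    `left_edges` maps an element id to the x coordinate of its left edge.
    """
    items = sorted(left_edges.items(), key=lambda item: item[1])
    pairs = []
    for i, (id_a, x_a) in enumerate(items):
        for id_b, x_b in items[i + 1:]:
            gap = x_b - x_a
            if gap > tolerance:
                break  # later items are even further away
            if gap > 0:  # a gap of exactly 0 is already aligned
                pairs.append((id_a, id_b, gap))
    return pairs

# Left edges mirroring the mock slide used later in this notebook
edges = {"subtitle_1": 100, "body_1": 102, "body_2": 100, "title_1": 5}
print(near_aligned_pairs(edges))
```

Here `body_1` at x=102 pairs with both elements at x=100, while the two elements that share x=100 are not reported.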

In [None]:
alignment_errors = detect_alignment_violations(slides)

print(f"🔍 Alignment Detector: Found {len(alignment_errors)} error(s)\n")

# Show first 10 to avoid overwhelming output
for error in alignment_errors[:10]:
    print(f"  [{error.severity.value}] {error.message}")
    print(f"    Elements: {error.elements}\n")

if len(alignment_errors) > 10:
    print(f"  ... and {len(alignment_errors) - 10} more alignment issues.")

## 4. Full Analysis Pipeline

Run the complete analysis and generate a JSON report.

In [None]:
# Run the full analysis
result_json = analyze(TEST_PPTX_PATH)

# Parse and display the result
result = json.loads(result_json)

print("📋 Full Analysis Report:")
print("="*60)
print(json.dumps(result, indent=2))

In [None]:
# Summary statistics
total_errors = sum(len(slide_data.get("errors", [])) for slide_data in result.values())
errors_by_type = {}
errors_by_severity = {}

for slide_key, slide_data in result.items():
    for error in slide_data.get("errors", []):
        error_type = error.get("type", "UNKNOWN")
        severity = error.get("severity", "UNKNOWN")
        errors_by_type[error_type] = errors_by_type.get(error_type, 0) + 1
        errors_by_severity[severity] = errors_by_severity.get(severity, 0) + 1

print("\n📊 Summary Statistics:")
print(f"   Total slides analyzed: {len(slides)}")
print(f"   Total errors found: {total_errors}")

print("\n   Errors by type:")
for error_type, count in sorted(errors_by_type.items()):
    print(f"      {error_type}: {count}")

print("\n   Errors by severity:")
for severity, count in sorted(errors_by_severity.items()):
    print(f"      {severity}: {count}")

## 5. Test VLM Validator (Optional)

⚠️ **Note:** This requires Azure OpenAI credentials. Make sure to set the following environment variables:
- `AZURE_OPENAI_ENDPOINT`
- `AZURE_OPENAI_DEPLOYMENT` (optional, defaults to "gpt-4o")

You must also be authenticated via `az login` so that `DefaultAzureCredential` can obtain a token.

In [None]:
# Load environment variables from .env file if available
from dotenv import load_dotenv
load_dotenv(project_root / ".env")

print(f"AZURE_OPENAI_ENDPOINT: {'✅ Set' if os.getenv('AZURE_OPENAI_ENDPOINT') else '❌ Not set'}")
print(f"AZURE_OPENAI_DEPLOYMENT: {os.getenv('AZURE_OPENAI_DEPLOYMENT', 'gpt-4o (default)')}")

In [None]:
# Test VLM Validator with a sample image
from PIL import Image
from src.validators.vlm_validator import validate_text_legibility

# Create a test image with text
test_img = Image.new('RGB', (400, 200), color='white')

try:
    from PIL import ImageDraw
    draw = ImageDraw.Draw(test_img)
    # Uses Pillow's default bitmap font
    draw.text((50, 80), "Hello, World!", fill='black')
except Exception as e:
    print(f"Could not draw text on image: {e}")

# Display the test image
display(test_img)

In [None]:
# Run VLM validation (requires Azure OpenAI)
try:
    is_legible = validate_text_legibility(test_img)
    print(f"VLM Validation Result: {'✅ Legible' if is_legible else '❌ Not Legible'}")
except Exception as e:
    print(f"❌ VLM Validation failed: {e}")
    print("Make sure Azure OpenAI credentials are configured correctly.")

## 6. Custom Test: Create Mock Slides

Test detectors with programmatically created mock data.

In [None]:
# Create mock slides with intentional errors
mock_slides = [
    SlideNode(
        index=0,
        width=960,
        height=540,
        background_color="#FFFFFF",
        text_elements=[
            TextElement(
                id="title_1",
                slide_index=0,
                text="Main Title",
                style=TextStyle(font_size=36, font_name="Arial", bold=True, color="#000000"),
                bbox=BoundingBox(x=5, y=10, width=200, height=50)  # Too close to left edge!
            ),
            TextElement(
                id="subtitle_1",
                slide_index=0,
                text="Subtitle Here",
                style=TextStyle(font_size=24, font_name="Arial", bold=False, color="#666666"),
                bbox=BoundingBox(x=100, y=70, width=200, height=30)
            ),
            TextElement(
                id="body_1",
                slide_index=0, 
                text="Body text with size 14.5",
                style=TextStyle(font_size=14.5, font_name="Arial", bold=False, color="#EEEEEE"),  # Poor contrast!
                bbox=BoundingBox(x=102, y=150, width=300, height=25)  # Nearly aligned with subtitle!
            ),
            TextElement(
                id="body_2",
                slide_index=0,
                text="Body text with size 14",
                style=TextStyle(font_size=14, font_name="Arial", bold=False, color="#000000"),
                bbox=BoundingBox(x=100, y=180, width=300, height=25)
            ),
            TextElement(
                id="body_3",
                slide_index=0,
                text="Body text with size 14",
                style=TextStyle(font_size=14, font_name="Arial", bold=False, color="#000000"),
                bbox=BoundingBox(x=100, y=210, width=300, height=25)
            ),
        ],
        image_elements=[]
    )
]

print("✅ Mock slides created with intentional errors:")
print("   - Title too close to left edge (margin)")
print("   - Body text with poor contrast against white background")
print("   - Body text with font size 14.5 instead of 14 (hierarchy)")
print("   - Elements nearly aligned (alignment)")

In [None]:
# Test detectors on mock slides
print("Testing detectors on mock slides:\n")

mock_hierarchy = detect_hierarchy_violations(mock_slides)
print(f"Hierarchy: {len(mock_hierarchy)} error(s)")
for e in mock_hierarchy:
    print(f"  - {e.message}")

mock_margin = detect_margin_violations(mock_slides)
print(f"\nMargin: {len(mock_margin)} error(s)")
for e in mock_margin:
    print(f"  - {e.message}")

mock_contrast = detect_contrast_violations(mock_slides)
print(f"\nContrast: {len(mock_contrast)} error(s)")
for e in mock_contrast:
    print(f"  - {e.message}")

mock_alignment = detect_alignment_violations(mock_slides)
print(f"\nAlignment: {len(mock_alignment)} error(s)")
for e in mock_alignment[:5]:
    print(f"  - {e.message}")

## 7. Export Results

Save the analysis results to a file.

In [None]:
# Save full report to JSON file
output_path = project_root / "output" / "analysis_report.json"
# Create the output directory (and any missing parents) if needed
output_path.parent.mkdir(parents=True, exist_ok=True)

with open(output_path, "w", encoding="utf-8") as f:
    f.write(result_json)

print(f"✅ Report saved to: {output_path}")