# SME Review Interface

This notebook provides an interface for Subject Matter Experts (SMEs) to review documents that were escalated by the V5 Arbiter.

## Workflow
1. View pending review packets
2. Load a document for review
3. Review classification and issues
4. Validate or correct the classification
5. Save ground truth

In [33]:
# Setup - Add parent directory to Python path
import sys
from pathlib import Path

# Add parent directory to path so we can import from src
parent_dir = Path.cwd().parent
if str(parent_dir) not in sys.path:
    sys.path.insert(0, str(parent_dir))

import json
from IPython.display import display, HTML, IFrame
import ipywidgets as widgets

# Import review helper
from src.evaluation.review_helper import SMEReviewHelper
from src.evaluation.ground_truth_schemas import SMEPacket

# Initialize helper
helper = SMEReviewHelper()
SMEReviewHelper.inject_styles() 

print("✅ SME Review Interface Ready")

✅ SME Review Interface Ready


## 1. View Pending Reviews

In [34]:
# Get review statistics
stats = helper.get_review_stats()

print("="*60)
print("REVIEW QUEUE STATUS")
print("="*60)
print(f"Total Packets: {stats['total_packets']}")
print(f"Pending: {stats['pending']}")
print(f"Completed: {stats['completed']}")
print(f"Completion Rate: {stats['completion_rate']:.1%}")
print()

# List pending reviews
pending = helper.list_pending_reviews()

if pending:
    print("Pending Reviews:")
    for i, p in enumerate(pending, 1):
        print(f"  {i}. {p['doc_id']} - {p['total_issues']} issue(s)")
else:
    print("No pending reviews! 🎉")

REVIEW QUEUE STATUS
Total Packets: 1
Pending: 0
Completed: 1
Completion Rate: 100.0%

No pending reviews! 🎉
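
The statistics printed above come from `helper.get_review_stats()`, whose implementation is not shown in this notebook. As a rough sketch of how such figures could be derived, the snippet below assumes each packet is a dict with a `status` field. That shape and the `summarize_queue` name are hypothetical, not the helper's actual API.

```python
def summarize_queue(packets):
    """Compute queue statistics from a list of packet dicts (hypothetical shape)."""
    total = len(packets)
    completed = sum(1 for p in packets if p.get("status") == "completed")
    pending = total - completed
    # Guard against division by zero when the queue is empty
    rate = completed / total if total else 0.0
    return {
        "total_packets": total,
        "pending": pending,
        "completed": completed,
        "completion_rate": rate,
    }


example_stats = summarize_queue([{"doc_id": "doc2_6", "status": "completed"}])
print(f"Completion Rate: {example_stats['completion_rate']:.1%}")  # prints: Completion Rate: 100.0%
```

Returning `0.0` for an empty queue is a design choice made here so that the `:.1%` format in the cell above never fails. The real helper may behave differently.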


## 2. Load Document for Review

Enter the document ID to review:

In [35]:
# Load packet
DOC_ID = "doc2_6"  # Change this to the document you want to review

try:
    packet = helper.load_packet(DOC_ID)
    print(f"✅ Loaded packet for: {DOC_ID}")
    print(f"   PDF: {packet.pdf_filename}")
    print(f"   Total Issues: {packet.total_issues}")
    print(f"   V5 Decision: {packet.v5_decision}")
except FileNotFoundError as e:
    print(f"❌ Error: {e}")
    packet = None

✅ Loaded packet for: doc2_6
   PDF: doc2_6.pdf
   Total Issues: 1
   V5 Decision: ESCALATE_TO_SME


## 3. Review Primary Agent Classification

In [36]:
if packet:
    classif = packet.primary_agent_classification
    
    print("="*60)
    print("PRIMARY AGENT CLASSIFICATION")
    print("="*60)
    print(f"Dominant Type: {classif.dominant_type_overall}")
    print(f"Number of Segments: {classif.number_of_segments}")
    print()
    
    print("Document Mixture:")
    for mix in classif.document_mixture:
        print(f"  • {mix.document_type.value}")
        print(f"    Presence: {mix.presence_level.value}")
        print(f"    Share: {mix.overall_share:.1%}")
        print(f"    Confidence: {mix.confidence:.2f}")
        print()

PRIMARY AGENT CLASSIFICATION
Dominant Type: DocumentType.GENOMIC_REPORT
Number of Segments: 2

Document Mixture:
  • Clinical Note
    Presence: NO_EVIDENCE
    Share: 0.0%
    Confidence: 0.00

  • Radiology Report
    Presence: NO_EVIDENCE
    Share: 0.0%
    Confidence: 0.00

  • Pathology Report
    Presence: EMBEDDED_RAW
    Share: 16.0%
    Confidence: 0.75

  • Genomic Report
    Presence: PRIMARY
    Share: 64.0%
    Confidence: 0.90

  • Other
    Presence: PRIMARY
    Share: 20.0%
    Confidence: 0.90
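
In the output above, the overall shares add up to 100% (16% + 64% + 20%). A reviewer may want to confirm this before trusting the mixture. The sketch below assumes that shares are meant to sum to 1.0, which the notebook does not state explicitly. `check_mixture_shares` is a hypothetical helper, not part of `SMEReviewHelper`.

```python
import math


def check_mixture_shares(shares, tolerance=1e-6):
    """Return (is_consistent, total) for a mapping of document type -> share."""
    total = sum(shares.values())
    # Compare with a tolerance because the shares are floats
    return math.isclose(total, 1.0, abs_tol=tolerance), total


# Values copied from the classification output above
example_shares = {
    "Clinical Note": 0.0,
    "Radiology Report": 0.0,
    "Pathology Report": 0.16,
    "Genomic Report": 0.64,
    "Other": 0.20,
}
shares_ok, shares_total = check_mixture_shares(example_shares)
print(f"Shares consistent: {shares_ok}")  # prints: Shares consistent: True
```

With a loaded packet, the same check could be fed from `{m.document_type.value: m.overall_share for m in classif.document_mixture}`, using only the attributes already printed in the cell above.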



## 4. Review Issues Detected

In [37]:
import html as html_lib
if packet:
    print("="*60)
    print(f"ISSUES DETECTED ({packet.total_issues})")
    print("="*60)
    
    for i, issue in enumerate(packet.issues_summary, 1):
        print(f"\n{i}. [{issue['severity']}] {issue['agent']}")
        print(f"   ID: {issue['id']}")
        print(f"   Message: {issue['message']}")
        print(f"   Location: {issue['location']}")
        print(f"   Suggested Fix: {issue['suggested_fix']}")
        
        # Get and display context
        context = helper.get_issue_context(packet, issue)
        
        if context['segment_info']:
            seg = context['segment_info']
            classif = context['classification_reasoning']
            
            # Build HTML for context display
            html = f"""
            <div style="border: 2px solid #ddd; padding: 15px; margin: 15px 0; border-radius: 8px; background: #fafafa;">
                <h4 style="margin-top: 0; color: #333;">📍 Context: Segment {seg['segment_index']} (Pages {seg['start_page']}-{seg['end_page']}) - {seg['dominant_type']}</h4>
                
                <div style="margin: 10px 0; padding: 10px; background: white; border-radius: 5px;">
                    <strong>Document Type:</strong> {classif['document_type']}<br>
                    <strong>Presence Level:</strong> {classif['presence_level']} (Confidence: {classif['confidence']:.2f})<br>
                    <strong>Segment Share:</strong> {classif['segment_share']:.1%}
                </div>
            """
            
            # NEW: Display actual PDF text if available
            if context.get('pdf_text_chunks'):
                html += '<div style="background: #f0f8ff; padding: 12px; margin: 10px 0; border-left: 4px solid #1976d2; border-radius: 5px;">'
                html += '<strong style="color: #0d47a1;">📄 Original Document Text:</strong><br>'
                
                for chunk in context['pdf_text_chunks']:
                    html += f'<div style="margin-top: 10px;">'
                    html += f'<strong>Page {chunk["page"]}:</strong><br>'
                    
                    # Display paragraphs (includes surrounding context from review_helper)
                    snippet_escaped = html_lib.escape(chunk['snippet'])
                    snippet_mark = f'<mark style="background: #ffeb3b; padding: 2px 4px; border-radius: 3px; font-weight: bold;">{snippet_escaped}</mark>'

                    for para in chunk['paragraphs']:
                        para_escaped = html_lib.escape(para)

                        # Highlight only paragraphs that contain the snippet. The match is
                        # exact-case, consistent with str.replace; an empty snippet is skipped
                        # because it would otherwise match at every position.
                        if snippet_escaped and snippet_escaped in para_escaped:
                            para_highlighted = para_escaped.replace(snippet_escaped, snippet_mark)
                        else:
                            # This is surrounding context - show without highlighting
                            para_highlighted = para_escaped

                        html += f'<div style="margin: 8px 0; padding: 8px; background: white; border-radius: 3px; font-family: monospace; font-size: 0.9em;">{para_highlighted}</div>'

                    html += '</div>'
                
                html += '</div>'
            
            # Classification reasoning (agent's interpretation)
            html += """
                <div style="background: #fff3cd; padding: 12px; margin: 10px 0; border-left: 4px solid #ff6b6b; border-radius: 5px;">
                    <strong style="color: #856404;">🔍 Classification Reasoning:</strong><br>
                    <div style="margin-top: 8px; color: #333;">
            """
            
            # Highlight problematic text in reasoning if found
            reasoning = classif['reasoning']
            if context['problematic_text']:
                # Escape HTML and highlight (html_lib is imported at the top of this cell)
                reasoning_escaped = html_lib.escape(reasoning)
                problematic_escaped = html_lib.escape(context['problematic_text'])
                reasoning_highlighted = reasoning_escaped.replace(
                    problematic_escaped,
                    f'<mark style="background: #ffeb3b; padding: 2px 4px; border-radius: 3px;">{problematic_escaped}</mark>'
                )
                html += reasoning_highlighted
            else:
                html += html_lib.escape(reasoning)
            
            html += """
                    </div>
                </div>
            """
            
            # Add evidence if available (and not already shown in PDF text)
            if context['evidence'] and not context.get('pdf_text_chunks'):
                html += '<div style="background: #e3f2fd; padding: 12px; margin: 10px 0; border-left: 4px solid #2196f3; border-radius: 5px;">'
                html += '<strong style="color: #0d47a1;">📄 Evidence References:</strong><br>'
                html += '<div style="margin-top: 8px;">'
                for ev in context['evidence']:
                    html += f'<div style="margin: 5px 0;">'
                    html += f'<strong>Page {ev["page"]}:</strong> "{html_lib.escape(ev["snippet"])}"'
                    if ev.get('anchors'):
                        html += f'<br><small style="color: #666;">Anchors: {html_lib.escape(", ".join(ev["anchors"]))}</small>'
                    html += '</div>'
                html += '</div></div>'
            
            html += '</div>'
            
            # Display the HTML
            display(HTML(html))
        else:
            print("   (No detailed context available for this issue)")

ISSUES DETECTED (1)

1. [MAJOR] V3
   ID: V3-0001
   Message: Pathology Report classified as EMBEDDED_RAW in segment 1 based on 'Immunohistochemistry Results' but the reasoning explicitly states it 'lacks the full 'Classic Triad' for a primary pathology report'. This suggests it's a result mention rather than a structured pathology report document.
   Location: {'segment_index': 1, 'document_type': 'Pathology Report', 'field': 'presence_level'}
   Suggested Fix: Re-evaluate if 'Immunohistochemistry Results' alone, without the 'Classic Triad' (Gross Description, Microscopic Description, Final Diagnosis, pathologist signature), warrants EMBEDDED_RAW classification, or if it should be downgraded to MENTION_ONLY or considered part of the Genomic Report's findings.
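
The cell above escapes both the paragraph and the snippet before wrapping matches in `<mark>`. If snippets may differ from the PDF text in letter case, a case-insensitive variant is one option. The sketch below is a standalone illustration under that assumption; `highlight_snippet` is a hypothetical name, not a method of `SMEReviewHelper`.

```python
import html as html_lib
import re


def highlight_snippet(paragraph, snippet):
    """Escape a paragraph for HTML and wrap case-insensitive snippet matches in <mark>."""
    para_escaped = html_lib.escape(paragraph)
    snippet_escaped = html_lib.escape(snippet)
    if not snippet_escaped:
        # An empty pattern would match at every position
        return para_escaped
    pattern = re.compile(re.escape(snippet_escaped), re.IGNORECASE)
    # Re-insert the matched text itself so the paragraph keeps its original casing
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", para_escaped)


print(highlight_snippet(
    "IMMUNOHISTOCHEMISTRY RESULTS: ER <positive>",
    "Immunohistochemistry Results",
))
# prints: <mark>IMMUNOHISTOCHEMISTRY RESULTS</mark>: ER &lt;positive&gt;
```

Because matching runs on the escaped text, a very short snippet such as `amp` could match inside an entity like `&amp;`. For the phrase-length evidence snippets shown in this notebook that is unlikely, but it is worth keeping in mind.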


## 5. Production Classifier Comparison (Optional)

In [38]:
if packet and packet.production_classification:
    print("="*60)
    print("PRODUCTION CLASSIFIER COMPARISON")
    print("="*60)
    
    prod = packet.production_classification
    print(f"Production Dominant Type: {prod['dominant_type']}")
    print(f"Primary Agent Dominant Type: {packet.primary_agent_classification.dominant_type_overall}")
    
    if packet.production_differs:
        print("\n⚠️  MISMATCH - Production differs from primary agent")
    else:
        print("\n✅ MATCH - Production agrees with primary agent")

## 6. Manual Review Submission

Submit your review using the code below:

In [39]:
# Manual review submission
REVIEWER_NAME = "Dr. Smith"  # Change this
AGREES_WITH_PRIMARY = True   # Change to False if corrections needed
CONFIDENCE = 1.0             # 0.0 to 1.0
REVIEW_NOTES = "Classification looks correct. All issues are minor."

# If AGREES_WITH_PRIMARY is False, fill in corrections:
CORRECTIONS = None  # Or dict like: {'dominant_type': 'Corrected Type', 'notes': '...'}

# Submit
helper.save_review(
    doc_id=DOC_ID,
    reviewer_name=REVIEWER_NAME,
    agrees_with_primary=AGREES_WITH_PRIMARY,
    corrections=CORRECTIONS,
    review_notes=REVIEW_NOTES,
    confidence=CONFIDENCE
)

✅ Review saved for doc2_6
   Agrees with primary: True
   Ground truth created: output/ground_truth/gt_doc2_6.json
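
Since the submission cell is edited by hand, a quick sanity check on the inputs before calling `helper.save_review` can catch slips such as a confidence outside 0.0 to 1.0 or a disagreement submitted without corrections. The sketch below is a hypothetical pre-check. Whether `save_review` already enforces these rules is not shown in this notebook.

```python
def validate_review_inputs(agrees_with_primary, confidence, corrections):
    """Return a list of problems found in the manual review inputs (empty if none)."""
    errors = []
    if not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be between 0.0 and 1.0")
    if not agrees_with_primary and not corrections:
        errors.append("corrections are required when disagreeing with the primary agent")
    if agrees_with_primary and corrections:
        errors.append("corrections should be None when agreeing with the primary agent")
    return errors


# Same values as the submission cell above
print(validate_review_inputs(agrees_with_primary=True, confidence=1.0, corrections=None))
# prints: []
```

The three rules here are assumptions drawn from the comments in the submission cell (`0.0 to 1.0`, and corrections filled in only when `AGREES_WITH_PRIMARY` is `False`).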


## 7. View Ground Truth Record

In [40]:
# Load ground truth record
gt_file = Path(f"../output/ground_truth/gt_{DOC_ID}.json")

if gt_file.exists():
    with open(gt_file) as f:
        gt_data = json.load(f)
    
    print("="*60)
    print("GROUND TRUTH RECORD")
    print("="*60)
    print(f"Document: {gt_data['doc_id']}")
    print(f"Source: {gt_data['ground_truth_source']}")
    print(f"Reviewer: {gt_data['sme_review']['reviewer_name']}")
    print(f"Agrees with Primary: {gt_data['sme_review']['agrees_with_primary_agent']}")
    print(f"Confidence: {gt_data['sme_review']['confidence_in_review']}")
    print(f"\nNotes: {gt_data['sme_review']['review_notes']}")
    print()
    print(f"✅ Ground truth saved to: {gt_file}")
else:
    print(f"No ground truth record found for {DOC_ID}")
    print("Submit a review first!")

GROUND TRUTH RECORD
Document: doc2_6
Source: sme_validated
Reviewer: Dr. Smith
Agrees with Primary: True
Confidence: 1.0

Notes: Classification looks correct. All issues are minor.

✅ Ground truth saved to: ../output/ground_truth/gt_doc2_6.json
