# Module 3: Conformance Checking
## NTH-ED Triage Lead Decision Support Tool

**Purpose:** Compare actual patient journeys against the standard ED protocol to identify deviations.

**Why This Matters for Triage Lead:**
- **Patient Safety:** Skipped triage = missed critical symptoms
- **Compliance:** Protocol violations may have legal implications
- **Quality Improvement:** Identify systemic issues causing deviations
- **Resource Planning:** Understand why patients bypass steps

---

## Cell 1: Imports

In [22]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from IPython.display import display, HTML
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

## Cell 2: Load Dependencies

In [23]:
# Run helper notebook for constants
%run ../utils/helpers.ipynb

# Run data loader notebook
%run data_loader.ipynb

NTH-ED DATA LOADING PIPELINE
üì• Loading event log...
   ‚úì Loaded 90,965 events
   ‚úì 16,011 unique patient visits
   ‚úì Columns: ['visit_id', 'patient_id', 'initial_zone', 'age', 'month', 'day', 'gender', 'triage_code', 'triage_desc', 'disposition_code', 'disposition_desc', 'consult_desc', 'cdu_flag', 'consult_req_flag', 'consult_arrival_flag', 'event', 'timestamp']

‚è∞ Parsing timestamps...
   ‚úì All timestamps parsed successfully
   ‚úì Date range: 2021-03-31 23:59:00 to 2021-06-01 17:16:00

üîß Handling missing data...
   ‚Ä¢ initial_zone: 1,950 missing (2.1%)
   ‚Ä¢ triage_code: 3 missing (0.0%)
   ‚Ä¢ consult_desc: 69,698 missing (76.6%)
   ‚Ä¢ age: 0 missing (0.0%)
   ‚úì Missing zones filled with 'Unknown'
   ‚úì Missing consults marked as 'No Consult'

üìã Standardizing columns for process mining...
   ‚úì Created process mining columns: case_id, activity, resource
   ‚úì Created outcome flags: is_admitted, is_lwbs

üîÄ Sorting events with logical ordering...
   ‚úì 

## Cell 3: Load Data

In [24]:
# Load the data
filepath = "/Users/ishaandawra/Desktop/Machine Learning Notes/Machine Learning Projects/Analytics_Colloquia_Project/data/event_log_ED_MMA_2026.csv"
event_log, visits = load_and_prepare_data(filepath)

print(f"\n‚úì Data loaded: {len(visits):,} patient visits")

NTH-ED DATA LOADING PIPELINE
üì• Loading event log...
   ‚úì Loaded 90,965 events
   ‚úì 16,011 unique patient visits
   ‚úì Columns: ['visit_id', 'patient_id', 'initial_zone', 'age', 'month', 'day', 'gender', 'triage_code', 'triage_desc', 'disposition_code', 'disposition_desc', 'consult_desc', 'cdu_flag', 'consult_req_flag', 'consult_arrival_flag', 'event', 'timestamp']

‚è∞ Parsing timestamps...
   ‚úì All timestamps parsed successfully
   ‚úì Date range: 2021-03-31 23:59:00 to 2021-06-01 17:16:00

üîß Handling missing data...
   ‚Ä¢ initial_zone: 1,950 missing (2.1%)
   ‚Ä¢ triage_code: 3 missing (0.0%)
   ‚Ä¢ consult_desc: 69,698 missing (76.6%)
   ‚Ä¢ age: 0 missing (0.0%)
   ‚úì Missing zones filled with 'Unknown'
   ‚úì Missing consults marked as 'No Consult'

üìã Standardizing columns for process mining...
   ‚úì Created process mining columns: case_id, activity, resource
   ‚úì Created outcome flags: is_admitted, is_lwbs

üîÄ Sorting events with logical ordering...
   ‚úì 

---
## Section 1: Define Standard Protocols

### Standard ED Pathway (Expected)
```
Walk-in Patient:    Triage ‚Üí Registration ‚Üí Assessment ‚Üí Discharge ‚Üí Left ED
Ambulance Patient:  Ambulance Arrival ‚Üí Triage ‚Üí Registration ‚Üí Assessment ‚Üí Discharge ‚Üí Left ED
Consult Patient:    ... ‚Üí Assessment ‚Üí Consult Request ‚Üí Consult Arrival ‚Üí Discharge ‚Üí Left ED
```

### Required Steps (Must Have)
1. **Triage** - Assign acuity level
2. **Registration** - Capture patient info
3. **Assessment** - Physician sees patient (PIA)
4. **Discharge** - Medical clearance

### Deviation Types
| Type | Description | Risk Level |
|------|-------------|------------|
| Missing Triage | Patient never triaged | HIGH - Unknown acuity |
| Missing Registration | No patient record | MEDIUM - Billing/legal issues |
| Missing Assessment | Never saw physician | HIGH - LWBS, untreated condition |
| Wrong Order | Steps out of sequence | LOW-MEDIUM - Process issue |

---

## Cell 4: Define Protocol Rules

In [25]:
# =============================================================================
# PROTOCOL DEFINITIONS
# =============================================================================

# Standard protocol steps (in expected order)
STANDARD_PROTOCOL = ['Triage', 'Registration', 'Assessment', 'Discharge']

# Required steps (must be present in every visit)
REQUIRED_STEPS = ['Triage', 'Registration', 'Assessment']

# Expected ordering rules (A must come before B)
ORDERING_RULES = [
    ('Triage', 'Registration'),
    ('Registration', 'Assessment'),
    ('Assessment', 'Discharge'),
    ('Triage', 'Assessment'),
    ('Consult Request', 'Consult Arrival'),
]

# Deviation severity levels
SEVERITY = {
    'missing_triage': 'HIGH',
    'missing_registration': 'MEDIUM', 
    'missing_assessment': 'HIGH',
    'wrong_order': 'LOW',
    'discharge_before_assessment': 'HIGH',
    'multiple_deviations': 'CRITICAL'
}

print("‚úì Protocol rules defined")
print(f"  ‚Ä¢ Standard protocol: {' ‚Üí '.join(STANDARD_PROTOCOL)}")
print(f"  ‚Ä¢ Required steps: {REQUIRED_STEPS}")
print(f"  ‚Ä¢ Ordering rules: {len(ORDERING_RULES)} rules")

‚úì Protocol rules defined
  ‚Ä¢ Standard protocol: Triage ‚Üí Registration ‚Üí Assessment ‚Üí Discharge
  ‚Ä¢ Required steps: ['Triage', 'Registration', 'Assessment']
  ‚Ä¢ Ordering rules: 5 rules


---
## Section 2: Build Conformance Checker

---

## Cell 5: Extract Case Traces

In [26]:
def extract_case_trace(event_log: pd.DataFrame, case_id: int) -> list:
    """
    Extract ordered list of activities for a single case.
    """
    case_events = event_log[event_log['case_id'] == case_id].sort_values('timestamp')
    return case_events['activity'].tolist()


def extract_all_traces(event_log: pd.DataFrame) -> pd.DataFrame:
    """
    Extract traces for all cases with additional metadata.
    
    Returns DataFrame with:
    - case_id
    - trace (list of activities)
    - trace_str (string representation)
    - num_events
    """
    traces = []
    
    for case_id, group in event_log.groupby('case_id'):
        events = group.sort_values('timestamp')
        trace = events['activity'].tolist()
        
        traces.append({
            'case_id': case_id,
            'trace': trace,
            'trace_str': ' ‚Üí '.join(trace),
            'num_events': len(trace)
        })
    
    return pd.DataFrame(traces)

# Extract all traces
print("Extracting patient traces...")
traces_df = extract_all_traces(event_log)
print(f"‚úì Extracted {len(traces_df):,} patient traces")

Extracting patient traces...
‚úì Extracted 16,011 patient traces


## Cell 6: Core Conformance Check Function

In [27]:
def check_conformance(trace: list) -> dict:
    """
    Check a single trace against protocol rules.
    
    WHY THIS MATTERS:
    - Each deviation type has different implications
    - Severity helps Triage Lead prioritize which cases to review
    - Patterns in deviations reveal systemic issues
    
    Parameters:
    -----------
    trace : list - Ordered list of activities for one patient
    
    Returns:
    --------
    dict with:
        - is_conformant: bool
        - deviations: list of deviation descriptions
        - deviation_types: list of deviation type codes
        - severity: overall severity level
    """
    
    deviations = []
    deviation_types = []
    
    # Check 1: Missing required steps
    for step in REQUIRED_STEPS:
        if step not in trace:
            deviations.append(f"Missing {step}")
            deviation_types.append(f"missing_{step.lower()}")
    
    # Check 2: Ordering violations
    for (step_a, step_b) in ORDERING_RULES:
        if step_a in trace and step_b in trace:
            idx_a = trace.index(step_a)
            idx_b = trace.index(step_b)
            if idx_a > idx_b:  # A should come before B
                deviations.append(f"{step_b} before {step_a}")
                deviation_types.append('wrong_order')
    
    # Check 3: Specific high-risk patterns
    if 'Discharge' in trace and 'Assessment' not in trace:
        if 'discharge_without_assessment' not in deviation_types:
            deviations.append("Discharged without Assessment")
            deviation_types.append('discharge_without_assessment')
    
    # Determine overall severity
    if len(deviation_types) == 0:
        severity = 'NONE'
    elif len(deviation_types) >= 3:
        severity = 'CRITICAL'
    elif any(dt in ['missing_triage', 'missing_assessment', 'discharge_without_assessment'] for dt in deviation_types):
        severity = 'HIGH'
    elif 'missing_registration' in deviation_types:
        severity = 'MEDIUM'
    else:
        severity = 'LOW'
    
    return {
        'is_conformant': len(deviations) == 0,
        'deviations': deviations,
        'deviation_types': deviation_types,
        'num_deviations': len(deviations),
        'severity': severity
    }

print("‚úì check_conformance() function defined")

‚úì check_conformance() function defined


## Cell 7: Run Conformance Check on All Cases

In [28]:
def run_conformance_analysis(traces_df: pd.DataFrame, visits: pd.DataFrame) -> pd.DataFrame:
    """
    Run conformance checking on all cases and merge with visit data.
    
    Returns DataFrame with conformance results + patient attributes.
    """
    
    print("Running conformance analysis on all cases...")
    
    results = []
    
    for _, row in traces_df.iterrows():
        conf = check_conformance(row['trace'])
        
        results.append({
            'case_id': row['case_id'],
            'trace_str': row['trace_str'],
            'num_events': row['num_events'],
            'is_conformant': conf['is_conformant'],
            'num_deviations': conf['num_deviations'],
            'deviations': ', '.join(conf['deviations']) if conf['deviations'] else 'None',
            'deviation_types': conf['deviation_types'],
            'severity': conf['severity']
        })
    
    conformance_df = pd.DataFrame(results)
    
    # Merge with visit attributes for context
    visit_cols = ['case_id', 'initial_zone', 'triage_level', 'age', 'gender', 
                  'disposition_desc', 'is_admitted', 'is_lwbs', 'pia_minutes', 'los_minutes']
    
    conformance_df = conformance_df.merge(
        visits[visit_cols], 
        on='case_id', 
        how='left'
    )
    
    print(f"‚úì Conformance analysis complete")
    
    return conformance_df

# Run the analysis
conformance_results = run_conformance_analysis(traces_df, visits)

Running conformance analysis on all cases...
‚úì Conformance analysis complete


---
## Section 3: Conformance Summary Statistics

---

## Cell 8: Overall Conformance Summary

In [29]:
def print_conformance_summary(conformance_df: pd.DataFrame):
    """
    Print a summary of conformance results.
    """
    
    total = len(conformance_df)
    conformant = conformance_df['is_conformant'].sum()
    non_conformant = total - conformant
    
    print("=" * 70)
    print("CONFORMANCE ANALYSIS SUMMARY")
    print("=" * 70)
    
    print(f"\nüìä OVERALL RESULTS:")
    print(f"   Total Cases Analyzed:    {total:,}")
    print(f"   ‚úì Conformant:            {conformant:,} ({conformant/total*100:.1f}%)")
    print(f"   ‚úó Non-Conformant:        {non_conformant:,} ({non_conformant/total*100:.1f}%)")
    
    print(f"\n‚ö†Ô∏è  SEVERITY BREAKDOWN:")
    severity_counts = conformance_df['severity'].value_counts()
    for sev in ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW', 'NONE']:
        count = severity_counts.get(sev, 0)
        pct = count / total * 100
        icon = {'CRITICAL': 'üî¥', 'HIGH': 'üü†', 'MEDIUM': 'üü°', 'LOW': 'üü¢', 'NONE': '‚úì'}[sev]
        print(f"   {icon} {sev:12} {count:,} ({pct:.1f}%)")
    
    return {
        'total': total,
        'conformant': conformant,
        'non_conformant': non_conformant,
        'conformance_rate': conformant / total
    }

summary_stats = print_conformance_summary(conformance_results)

CONFORMANCE ANALYSIS SUMMARY

üìä OVERALL RESULTS:
   Total Cases Analyzed:    16,011
   ‚úì Conformant:            15,733 (98.3%)
   ‚úó Non-Conformant:        278 (1.7%)

‚ö†Ô∏è  SEVERITY BREAKDOWN:
   üî¥ CRITICAL     2 (0.0%)
   üü† HIGH         161 (1.0%)
   üü° MEDIUM       0 (0.0%)
   üü¢ LOW          115 (0.7%)
   ‚úì NONE         15,733 (98.3%)


## Cell 9: Deviation Type Breakdown

In [30]:
def analyze_deviation_types(conformance_df: pd.DataFrame) -> pd.DataFrame:
    """
    Count occurrences of each deviation type.
    """
    
    # Flatten all deviation types
    all_deviations = []
    for dev_list in conformance_df['deviation_types']:
        all_deviations.extend(dev_list)
    
    # Count each type
    from collections import Counter
    dev_counts = Counter(all_deviations)
    
    # Create summary DataFrame
    dev_summary = pd.DataFrame([
        {'Deviation Type': k, 'Count': v, 'Percentage': v/len(conformance_df)*100}
        for k, v in dev_counts.items()
    ])
    
    if len(dev_summary) > 0:
        dev_summary = dev_summary.sort_values('Count', ascending=False).reset_index(drop=True)
        
        # Add severity
        dev_summary['Severity'] = dev_summary['Deviation Type'].map(
            lambda x: SEVERITY.get(x, 'UNKNOWN')
        )
    
    return dev_summary

deviation_breakdown = analyze_deviation_types(conformance_results)

print("\n" + "=" * 70)
print("DEVIATION TYPE BREAKDOWN")
print("=" * 70)
print("\n")
display(deviation_breakdown)


DEVIATION TYPE BREAKDOWN




Unnamed: 0,Deviation Type,Count,Percentage,Severity
0,missing_assessment,163,1.01805,HIGH
1,discharge_without_assessment,163,1.01805,UNKNOWN
2,wrong_order,135,0.84317,LOW
3,missing_triage,1,0.006246,HIGH


## Cell 10: Visualize Deviation Types

In [31]:
# Create deviation type visualization
if len(deviation_breakdown) > 0:
    # Color by severity
    color_map = {'HIGH': '#DC2626', 'MEDIUM': '#F97316', 'LOW': '#22C55E', 'UNKNOWN': '#6B7280'}
    deviation_breakdown['Color'] = deviation_breakdown['Severity'].map(color_map)
    
    fig_dev = px.bar(
        deviation_breakdown.sort_values('Count'),
        x='Count',
        y='Deviation Type',
        orientation='h',
        color='Severity',
        color_discrete_map=color_map,
        title='Protocol Deviations by Type and Severity',
        text='Count'
    )
    
    fig_dev.update_traces(textposition='outside')
    fig_dev.update_layout(
        height=400,
        xaxis_title='Number of Cases',
        yaxis_title=''
    )
    
    fig_dev.show()
else:
    print("No deviations found - all cases conform to protocol!")

---
## Section 4: Analyze Non-Conformant Cases

**Key Questions for Triage Lead:**
1. Which zones have the most deviations?
2. Are certain triage levels more likely to deviate?
3. What happens to patients who skip steps?

---

## Cell 11: Deviations by Zone

In [32]:
def analyze_deviations_by_group(conformance_df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """
    Analyze conformance rates by a grouping variable (zone, triage level, etc.)
    """
    
    summary = conformance_df.groupby(group_col).agg({
        'case_id': 'count',
        'is_conformant': 'sum',
        'num_deviations': 'mean'
    }).reset_index()
    
    summary.columns = [group_col, 'Total Cases', 'Conformant', 'Avg Deviations']
    summary['Non-Conformant'] = summary['Total Cases'] - summary['Conformant']
    summary['Conformance Rate'] = (summary['Conformant'] / summary['Total Cases'] * 100).round(1)
    summary['Deviation Rate'] = (100 - summary['Conformance Rate']).round(1)
    
    return summary.sort_values('Deviation Rate', ascending=False)

# Analyze by Zone
zone_conformance = analyze_deviations_by_group(conformance_results, 'initial_zone')

print("=" * 70)
print("CONFORMANCE BY ZONE")
print("=" * 70)
print("\n(Sorted by Deviation Rate - highest first)\n")
display(zone_conformance[['initial_zone', 'Total Cases', 'Non-Conformant', 'Deviation Rate', 'Conformance Rate']])

CONFORMANCE BY ZONE

(Sorted by Deviation Rate - highest first)



Unnamed: 0,initial_zone,Total Cases,Non-Conformant,Deviation Rate,Conformance Rate
8,Unknown,358,80,22.3,77.7
6,Resus,421,27,6.4,93.6
7,SA,1078,28,2.6,97.4
2,EPZ,2503,45,1.8,98.2
9,YZ,4232,43,1.0,99.0
0,A,3193,28,0.9,99.1
3,GZ,4112,27,0.7,99.3
1,Checkout,1,0,0.0,100.0
4,HH,14,0,0.0,100.0
5,Red,99,0,0.0,100.0


## Cell 12: Deviations by Triage Level

In [33]:
# Analyze by Triage Level
triage_conformance = analyze_deviations_by_group(conformance_results, 'triage_level')

print("=" * 70)
print("CONFORMANCE BY TRIAGE LEVEL")
print("=" * 70)
print("\n")
display(triage_conformance[['triage_level', 'Total Cases', 'Non-Conformant', 'Deviation Rate', 'Conformance Rate']])

CONFORMANCE BY TRIAGE LEVEL




Unnamed: 0,triage_level,Total Cases,Non-Conformant,Deviation Rate,Conformance Rate
0,1-RESUSCITATION,134,12,9.0,91.0
3,4-LESS URGENT,1947,41,2.1,97.9
1,2-EMERGENCY,4822,91,1.9,98.1
2,3-URGENT,8737,125,1.4,98.6
4,5-NON-URGENT,367,5,1.4,98.6


## Cell 13: Visualize Conformance by Zone

In [34]:
# Create zone conformance visualization
fig_zone = px.bar(
    zone_conformance.sort_values('Conformance Rate'),
    x='Conformance Rate',
    y='initial_zone',
    orientation='h',
    title='Protocol Conformance Rate by Zone',
    color='Conformance Rate',
    color_continuous_scale='RdYlGn',
    text='Conformance Rate'
)

fig_zone.update_traces(texttemplate='%{text:.1f}%', textposition='outside')
fig_zone.update_layout(
    height=400,
    xaxis_title='Conformance Rate (%)',
    yaxis_title='Zone',
    xaxis_range=[0, 105]
)

# Add reference line at 95%
fig_zone.add_vline(x=95, line_dash="dash", line_color="red", 
                   annotation_text="95% Target", annotation_position="top")

fig_zone.show()

---
## Section 5: Detailed Case Tables for Review

**Purpose:** Give Triage Lead actionable lists of cases to investigate.

---

## Cell 14: High Severity Cases Table

In [35]:
def get_cases_by_severity(conformance_df: pd.DataFrame, severity: str, max_rows: int = 20) -> pd.DataFrame:
    """
    Get cases filtered by severity level.
    
    WHY: Triage Lead should focus on HIGH/CRITICAL cases first.
    """
    
    filtered = conformance_df[conformance_df['severity'] == severity].copy()
    
    # Select columns relevant for review
    cols = ['case_id', 'initial_zone', 'triage_level', 'deviations', 
            'disposition_desc', 'is_lwbs', 'los_minutes']
    
    return filtered[cols].head(max_rows)

# Get HIGH severity cases
high_severity_cases = get_cases_by_severity(conformance_results, 'HIGH', max_rows=15)

print("=" * 70)
print("üî¥ HIGH SEVERITY DEVIATIONS - REVIEW REQUIRED")
print("=" * 70)
print(f"\nShowing 15 of {len(conformance_results[conformance_results['severity'] == 'HIGH']):,} high severity cases\n")
display(high_severity_cases)

üî¥ HIGH SEVERITY DEVIATIONS - REVIEW REQUIRED

Showing 15 of 161 high severity cases



Unnamed: 0,case_id,initial_zone,triage_level,deviations,disposition_desc,is_lwbs,los_minutes
0,5240985,Unknown,2-EMERGENCY,"Missing Assessment, Discharged without Assessment",Left After Triage,1,
1,7214779,YZ,3-URGENT,"Missing Assessment, Discharged without Assessment",Left After Triage,1,52.0
16,7384149,YZ,3-URGENT,"Missing Assessment, Discharged without Assessment",Left After Triage,1,82.0
18,7384151,YZ,3-URGENT,"Missing Assessment, Discharged without Assessment",Left After Triage,1,109.0
286,7386188,GZ,3-URGENT,"Missing Assessment, Discharged without Assessment",Left After Triage,1,230.0
312,7386421,EPZ,4-LESS URGENT,"Missing Assessment, Discharged without Assessment",Left After Triage,1,40.0
511,7387301,YZ,3-URGENT,"Missing Assessment, Discharged without Assessment",Left After Triage,1,95.0
740,7388515,GZ,3-URGENT,"Missing Assessment, Discharged without Assessment",Left After Triage,1,92.0
742,7388517,GZ,3-URGENT,"Missing Assessment, Discharged without Assessment",Left After Triage,1,65.0
912,7389515,SA,3-URGENT,"Missing Assessment, Discharged without Assessment",Left After Triage,1,278.0


## Cell 15: Critical Cases (Multiple Deviations)

In [36]:
# Get CRITICAL cases (multiple deviations)
critical_cases = get_cases_by_severity(conformance_results, 'CRITICAL', max_rows=15)

print("=" * 70)
print("üî¥üî¥ CRITICAL CASES - MULTIPLE DEVIATIONS")
print("=" * 70)
print(f"\nShowing cases with 3+ protocol violations\n")

if len(critical_cases) > 0:
    display(critical_cases)
else:
    print("No critical cases found.")

üî¥üî¥ CRITICAL CASES - MULTIPLE DEVIATIONS

Showing cases with 3+ protocol violations



Unnamed: 0,case_id,initial_zone,triage_level,deviations,disposition_desc,is_lwbs,los_minutes
3359,7404882,Unknown,,"Missing Triage, Missing Assessment, Discharged...",Left at his/her own risk following registration,1,
5169,7417164,GZ,4-LESS URGENT,"Missing Assessment, Registration before Triage...",Left After Triage,1,127.0


## Cell 16: Cases Missing Assessment (Potential LWBS)

In [37]:
def get_cases_by_deviation_type(conformance_df: pd.DataFrame, deviation_type: str) -> pd.DataFrame:
    """
    Get cases with a specific deviation type.
    """
    
    mask = conformance_df['deviation_types'].apply(lambda x: deviation_type in x)
    filtered = conformance_df[mask].copy()
    
    cols = ['case_id', 'initial_zone', 'triage_level', 'trace_str',
            'disposition_desc', 'is_lwbs']
    
    return filtered[cols]

# Get cases missing assessment
missing_assessment = get_cases_by_deviation_type(conformance_results, 'missing_assessment')

print("=" * 70)
print("‚ö†Ô∏è  CASES MISSING PHYSICIAN ASSESSMENT")
print("=" * 70)
print(f"\nTotal: {len(missing_assessment):,} cases never saw a physician\n")

# Check LWBS correlation
lwbs_of_missing = missing_assessment['is_lwbs'].sum()
print(f"Of these, {lwbs_of_missing:,} ({lwbs_of_missing/len(missing_assessment)*100:.1f}%) are marked as LWBS")
print("\nSample cases:")
display(missing_assessment.head(10))

‚ö†Ô∏è  CASES MISSING PHYSICIAN ASSESSMENT

Total: 163 cases never saw a physician

Of these, 163 (100.0%) are marked as LWBS

Sample cases:


Unnamed: 0,case_id,initial_zone,triage_level,trace_str,disposition_desc,is_lwbs
0,5240985,Unknown,2-EMERGENCY,Discharge ‚Üí Left ED ‚Üí Triage ‚Üí Registration,Left After Triage,1
1,7214779,YZ,3-URGENT,Triage ‚Üí Registration ‚Üí Discharge ‚Üí Left ED,Left After Triage,1
16,7384149,YZ,3-URGENT,Triage ‚Üí Registration ‚Üí Discharge ‚Üí Left ED,Left After Triage,1
18,7384151,YZ,3-URGENT,Triage ‚Üí Registration ‚Üí Discharge ‚Üí Left ED,Left After Triage,1
286,7386188,GZ,3-URGENT,Triage ‚Üí Registration ‚Üí Discharge ‚Üí Left ED,Left After Triage,1
312,7386421,EPZ,4-LESS URGENT,Triage ‚Üí Registration ‚Üí Discharge ‚Üí Left ED,Left After Triage,1
511,7387301,YZ,3-URGENT,Triage ‚Üí Registration ‚Üí Discharge ‚Üí Left ED,Left After Triage,1
740,7388515,GZ,3-URGENT,Triage ‚Üí Registration ‚Üí Discharge ‚Üí Left ED,Left After Triage,1
742,7388517,GZ,3-URGENT,Ambulance Arrival ‚Üí Triage ‚Üí Registration ‚Üí Am...,Left After Triage,1
912,7389515,SA,3-URGENT,Triage ‚Üí Registration ‚Üí Discharge ‚Üí Left ED,Left After Triage,1


## Cell 17: Cases Missing Triage

In [38]:
# Get cases missing triage
missing_triage = get_cases_by_deviation_type(conformance_results, 'missing_triage')

print("=" * 70)
print("‚ö†Ô∏è  CASES MISSING TRIAGE")
print("=" * 70)
print(f"\nTotal: {len(missing_triage):,} cases were never triaged\n")
print("This is a SAFETY CONCERN - unknown acuity level.\n")

if len(missing_triage) > 0:
    # Analyze which zones
    print("Zone distribution of missing triage cases:")
    print(missing_triage['initial_zone'].value_counts())
    print("\nSample cases:")
    display(missing_triage.head(10))
else:
    print("All cases have triage recorded - good!")

‚ö†Ô∏è  CASES MISSING TRIAGE

Total: 1 cases were never triaged

This is a SAFETY CONCERN - unknown acuity level.

Zone distribution of missing triage cases:
initial_zone
Unknown    1
Name: count, dtype: int64

Sample cases:


Unnamed: 0,case_id,initial_zone,triage_level,trace_str,disposition_desc,is_lwbs
3359,7404882,Unknown,,Registration ‚Üí Discharge ‚Üí Left ED,Left at his/her own risk following registration,1


---
## Section 6: Conformance Dashboard Summary

---

## Cell 18: Create Conformance Dashboard View

In [39]:
# Create a summary dashboard with multiple charts
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        '',  # We'll add titles manually
        '',
        'Conformance by Zone',
        'Conformance by Triage Level'
    ),
    specs=[
        [{"type": "domain"}, {"type": "domain"}],  # Use domain for pie charts
        [{"type": "bar"}, {"type": "bar"}]
    ],
    vertical_spacing=0.18,
    horizontal_spacing=0.15
)

# 1. Overall conformance pie
conformant_count = conformance_results['is_conformant'].sum()
non_conformant_count = len(conformance_results) - conformant_count

fig.add_trace(
    go.Pie(
        labels=['Conformant', 'Non-Conformant'],
        values=[conformant_count, non_conformant_count],
        marker_colors=['#22C55E', '#DC2626'],
        hole=0.4,
        textinfo='percent',
        textposition='inside',
        name='Conformance'
    ),
    row=1, col=1
)

# 2. Severity distribution pie
severity_counts = conformance_results['severity'].value_counts()
severity_colors = {'NONE': '#22C55E', 'LOW': '#84CC16', 'MEDIUM': '#F97316', 
                   'HIGH': '#DC2626', 'CRITICAL': '#7F1D1D'}

fig.add_trace(
    go.Pie(
        labels=severity_counts.index.tolist(),
        values=severity_counts.values.tolist(),
        marker_colors=[severity_colors.get(s, '#6B7280') for s in severity_counts.index],
        hole=0.4,
        textinfo='percent',
        textposition='inside',
        name='Severity'
    ),
    row=1, col=2
)

# 3. Conformance by zone (bar)
zone_conf_sorted = zone_conformance.sort_values('Conformance Rate')
fig.add_trace(
    go.Bar(
        x=zone_conf_sorted['Conformance Rate'],
        y=zone_conf_sorted['initial_zone'],
        orientation='h',
        marker_color='#3B82F6',
        text=zone_conf_sorted['Conformance Rate'].apply(lambda x: f'{x:.1f}%'),
        textposition='outside',
        name='Zone'
    ),
    row=2, col=1
)

# 4. Conformance by triage level (bar)
triage_conf_sorted = triage_conformance.sort_values('triage_level')
fig.add_trace(
    go.Bar(
        x=triage_conf_sorted['triage_level'],
        y=triage_conf_sorted['Conformance Rate'],
        marker_color='#8B5CF6',
        text=triage_conf_sorted['Conformance Rate'].apply(lambda x: f'{x:.1f}%'),
        textposition='outside',
        name='Triage'
    ),
    row=2, col=2
)

# Add manual annotations for pie chart titles (positioned correctly)
fig.add_annotation(
    text="<b>Overall Conformance</b>",
    x=0.18, y=1.05,
    xref="paper", yref="paper",
    showarrow=False,
    font=dict(size=14)
)

fig.add_annotation(
    text="<b>Severity Distribution</b>",
    x=0.82, y=1.05,
    xref="paper", yref="paper",
    showarrow=False,
    font=dict(size=14)
)

fig.update_layout(
    height=850,
    width=1000,
    showlegend=False,
    title_text="<b>Conformance Checking Dashboard</b>",
    title_x=0.5,
    title_y=0.98,
    title_font_size=18,
    margin=dict(t=120, b=80, l=80, r=50)
)

# Update axes for bar charts
fig.update_xaxes(range=[0, 110], row=2, col=1, title_text="Conformance Rate (%)")
fig.update_yaxes(range=[0, 105], row=2, col=2, title_text="Conformance Rate (%)")

# Rotate x-axis labels for triage level chart
fig.update_xaxes(tickangle=45, row=2, col=2)

fig.show()

---
## Section 7: Export Functions for Streamlit Integration

---

## Cell 19: Function Reference

In [40]:
CONFORMANCE_FUNCTIONS = """
=================================================================
MODULE 3: CONFORMANCE CHECKING - FUNCTION REFERENCE
=================================================================

CORE FUNCTIONS:
---------------
1. check_conformance(trace)
   ‚Üí Input: list of activities
   ‚Üí Output: dict with is_conformant, deviations, severity
   ‚Üí Use for: Single case analysis

2. run_conformance_analysis(traces_df, visits)
   ‚Üí Input: all traces + visit data
   ‚Üí Output: DataFrame with conformance results per case
   ‚Üí Use for: Batch analysis, dashboard

3. analyze_deviation_types(conformance_df)
   ‚Üí Output: Summary of deviation type counts
   ‚Üí Use for: Understanding systemic issues

4. analyze_deviations_by_group(conformance_df, group_col)
   ‚Üí Output: Conformance rates by zone/triage/etc.
   ‚Üí Use for: Identifying problem areas

5. get_cases_by_severity(conformance_df, severity)
   ‚Üí Output: Filtered DataFrame of cases
   ‚Üí Use for: Case review tables

6. get_cases_by_deviation_type(conformance_df, deviation_type)
   ‚Üí Output: Cases with specific deviation
   ‚Üí Use for: Deep-dive investigation

CONSTANTS:
----------
- STANDARD_PROTOCOL: Expected activity sequence
- REQUIRED_STEPS: Must-have activities
- ORDERING_RULES: A-before-B rules
- SEVERITY: Risk levels for each deviation type

=================================================================
"""

print(CONFORMANCE_FUNCTIONS)


MODULE 3: CONFORMANCE CHECKING - FUNCTION REFERENCE

CORE FUNCTIONS:
---------------
1. check_conformance(trace)
   ‚Üí Input: list of activities
   ‚Üí Output: dict with is_conformant, deviations, severity
   ‚Üí Use for: Single case analysis

2. run_conformance_analysis(traces_df, visits)
   ‚Üí Input: all traces + visit data
   ‚Üí Output: DataFrame with conformance results per case
   ‚Üí Use for: Batch analysis, dashboard

3. analyze_deviation_types(conformance_df)
   ‚Üí Output: Summary of deviation type counts
   ‚Üí Use for: Understanding systemic issues

4. analyze_deviations_by_group(conformance_df, group_col)
   ‚Üí Output: Conformance rates by zone/triage/etc.
   ‚Üí Use for: Identifying problem areas

5. get_cases_by_severity(conformance_df, severity)
   ‚Üí Output: Filtered DataFrame of cases
   ‚Üí Use for: Case review tables

6. get_cases_by_deviation_type(conformance_df, deviation_type)
   ‚Üí Output: Cases with specific deviation
   ‚Üí Use for: Deep-dive investigati

## Cell 20: Export Non-Conformant Cases to CSV

In [41]:
# Export non-conformant cases for external review
non_conformant_cases = conformance_results[~conformance_results['is_conformant']].copy()

# Select relevant columns
export_cols = ['case_id', 'initial_zone', 'triage_level', 'severity', 'deviations',
               'disposition_desc', 'is_lwbs', 'pia_minutes', 'los_minutes', 'trace_str']

export_df = non_conformant_cases[export_cols].sort_values(
    by='severity', 
    key=lambda x: x.map({'CRITICAL': 0, 'HIGH': 1, 'MEDIUM': 2, 'LOW': 3})
)

# Save to CSV (uncomment to save)
# export_df.to_csv('non_conformant_cases.csv', index=False)

print(f"\n‚úì {len(export_df):,} non-conformant cases ready for export")
print("\nTo save: uncomment the to_csv line above")


‚úì 278 non-conformant cases ready for export

To save: uncomment the to_csv line above


In [42]:
# Cell 21: Key Insights Summary

print("=" * 70)
print("üîç KEY INSIGHTS FOR TRIAGE LEAD")
print("=" * 70)

# Calculate key metrics
total_cases = len(conformance_results)
conformant_pct = conformance_results['is_conformant'].mean() * 100
non_conformant_count = (~conformance_results['is_conformant']).sum()

# Get worst zone
worst_zone = zone_conformance.sort_values('Conformance Rate').iloc[0]

# Get missing assessment count
missing_assessment = conformance_results['deviation_types'].apply(lambda x: 'missing_assessment' in x).sum()
missing_assessment_lwbs = conformance_results[
    conformance_results['deviation_types'].apply(lambda x: 'missing_assessment' in x)
]['is_lwbs'].sum()

# Get missing triage count  
missing_triage = conformance_results['deviation_types'].apply(lambda x: 'missing_triage' in x).sum()

print(f"""
1. OVERALL COMPLIANCE
   ‚Ä¢ {conformant_pct:.1f}% of patients follow the standard protocol ({non_conformant_count:,} deviations out of {total_cases:,} cases)

2. PROBLEM ZONES
   ‚Ä¢ '{worst_zone['initial_zone']}' zone has the lowest conformance at {worst_zone['Conformance Rate']:.1f}% ‚Äî prioritize process review here
   ‚Ä¢ 'Resus' zone at 93.6% ‚Äî expected due to emergency bypasses for critical patients

3. SAFETY GAPS
   ‚Ä¢ {missing_triage:,} cases missing Triage ‚Äî patients treated without acuity assessment (HIGH RISK)
   ‚Ä¢ {missing_assessment:,} cases missing Physician Assessment ‚Äî never seen by a doctor

4. LWBS CORRELATION
   ‚Ä¢ {missing_assessment_lwbs:,} of {missing_assessment:,} missing-assessment cases ({missing_assessment_lwbs/missing_assessment*100:.0f}%) are LWBS
   ‚Ä¢ Patients who don't see a physician are highly likely to leave ‚Äî reduce wait times to prevent
""")

print("=" * 70)
print("üí° RECOMMENDED ACTIONS")
print("=" * 70)
print("""
   ‚Ä¢ Investigate 'Unknown' zone data quality ‚Äî 22% deviation rate suggests data entry issues
   ‚Ä¢ Review Resus bypass protocols ‚Äî ensure documentation even in emergencies  
   ‚Ä¢ Target LWBS reduction by monitoring patients waiting >30 min without assessment
""")

üîç KEY INSIGHTS FOR TRIAGE LEAD

1. OVERALL COMPLIANCE
   ‚Ä¢ 98.3% of patients follow the standard protocol (278 deviations out of 16,011 cases)

2. PROBLEM ZONES
   ‚Ä¢ 'Unknown' zone has the lowest conformance at 77.7% ‚Äî prioritize process review here
   ‚Ä¢ 'Resus' zone at 93.6% ‚Äî expected due to emergency bypasses for critical patients

3. SAFETY GAPS
   ‚Ä¢ 1 cases missing Triage ‚Äî patients treated without acuity assessment (HIGH RISK)
   ‚Ä¢ 163 cases missing Physician Assessment ‚Äî never seen by a doctor

4. LWBS CORRELATION
   ‚Ä¢ 163 of 163 missing-assessment cases (100%) are LWBS
   ‚Ä¢ Patients who don't see a physician are highly likely to leave ‚Äî reduce wait times to prevent

üí° RECOMMENDED ACTIONS

   ‚Ä¢ Investigate 'Unknown' zone data quality ‚Äî 22% deviation rate suggests data entry issues
   ‚Ä¢ Review Resus bypass protocols ‚Äî ensure documentation even in emergencies  
   ‚Ä¢ Target LWBS reduction by monitoring patients waiting >30 min without assessm