# 3.03 All Province Alberta Crosswalk

**Consolidated notebook for mapping Alberta billing codes to BC, MB, ON, and SK equivalents.**

## Workflow
1. Upload all files (PDFs, Reference CSVs, Taxonomy)
2. Configure Alberta code to match
3. Run each province individually
4. Combine results

## Province-Specific Features
| Province | Chunking | Special Features |
|----------|----------|------------------|
| BC | Level 1 | Code prefixes (P, G, PG) |
| MB | Level 1 | Specialty-based fees |
| ON | Level 2 | H/P settings, Surg/Asst/Anae fees |
| SK | Level 1 | Referred/Not Referred dual-fees, Age premiums |

---
# STEP 1: Setup
---

## Cell 1: Install Dependencies

In [None]:
!pip install openai pandas pdfplumber openpyxl tqdm PyMuPDF -q

import pandas as pd
import pdfplumber
import fitz  # PyMuPDF
import json
import re
from tqdm.notebook import tqdm
from google.colab import files

print("All dependencies loaded.")
print("Ready to proceed.")

---
# STEP 2: Upload Files
---

## Cell 2a: Upload Province PDFs

Upload all 4 province schedule PDFs. Files will be auto-detected by name.

In [None]:
print("="*70)
print("STEP 2a: Upload Province Schedule PDFs")
print("="*70)
print("\nExpected files:")
print("  - BC Payment Schedule - March 31, 2024.pdf")
print("  - MB Payment Schedule - April 1, 2024.pdf")
print("  - ON - February 20, 2024 (effective April 1, 2024).pdf")
print("  - SK Payment Schedule - April 1, 2024.pdf")
print()

uploaded_pdfs = files.upload()

# Auto-detect province from filename
PDF_FILES = {'BC': None, 'MB': None, 'ON': None, 'SK': None}

for filename in uploaded_pdfs.keys():
    # Match the leading token only. A bare substring test such as
    # 'ON' in filename_upper also fires on words like "VERSION".
    match = re.match(r'(BC|MB|ON|SK)(?![A-Z])', filename.strip().upper())
    if match:
        PDF_FILES[match.group(1)] = filename
    else:
        print(f"  Unrecognized filename, skipped: {filename}")

print("\n" + "="*70)
print("Detected PDFs:")
print("="*70)
for prov, f in PDF_FILES.items():
    status = "✓" if f else "✗ MISSING"
    print(f"  {prov}: {status} {f if f else ''}")

# Warn if any missing
missing = [p for p, f in PDF_FILES.items() if f is None]
if missing:
    print(f"\n⚠️  WARNING: Missing PDFs for: {', '.join(missing)}")
    print("    You can still run the provinces that have PDFs.")
else:
    print("\n✓ All 4 province PDFs loaded successfully.")

## Cell 2b: Upload Section Reference CSVs

Upload all 4 section reference CSVs. Files will be auto-detected by name.
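
Auto-detection by name is most reliable when only the leading token of the filename is compared. The following is a minimal sketch, assuming the reference files keep the prefixes `bc`, `mb` or `manitoba`, `on`, and `sk`; the names `PREFIXES` and `detect_province` are hypothetical and used only for illustration.

```python
import re
from typing import Optional

# Hypothetical mapping from a filename's leading token to a province key.
PREFIXES = {"bc": "BC", "mb": "MB", "manitoba": "MB", "on": "ON", "sk": "SK"}

# The lookahead rejects names that merely start with the same letters
# (for example, "only_notes.csv" is not an Ontario file).
_PATTERN = re.compile(r"(manitoba|bc|mb|on|sk)(?![a-z])")


def detect_province(filename: str) -> Optional[str]:
    """Return the province key for a filename, or None if unrecognized."""
    match = _PATTERN.match(filename.strip().lower())
    return PREFIXES[match.group(1)] if match else None


for name in [
    "bc_section_reference_simple.csv",
    "manitoba_section_reference_final.csv",
    "on_section_reference_full.csv",
    "sk_section_reference_simple.csv",
]:
    print(name, "->", detect_province(name))
```

A plain substring test such as `'on' in name` would match all four files, because every expected name contains the word "section".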

In [None]:
print("="*70)
print("STEP 2b: Upload Section Reference CSVs")
print("="*70)
print("\nExpected files:")
print("  - bc_section_reference_simple.csv")
print("  - manitoba_section_reference_final.csv")
print("  - on_section_reference_full.csv")
print("  - sk_section_reference_simple.csv")
print()

uploaded_refs = files.upload()

# Auto-detect province from filename
REF_FILES = {'BC': None, 'MB': None, 'ON': None, 'SK': None}

for filename in uploaded_refs.keys():
    # Match the leading token only. A substring test is unsafe here: every
    # expected filename contains "section", so 'on' would match all of them
    # and the SK file would be filed under ON.
    match = re.match(r'(manitoba|bc|mb|on|sk)(?![a-z])', filename.strip().lower())
    if match:
        prefix = match.group(1)
        REF_FILES['MB' if prefix == 'manitoba' else prefix.upper()] = filename
    else:
        print(f"  Unrecognized filename, skipped: {filename}")

print("\n" + "="*70)
print("Detected Reference CSVs:")
print("="*70)
for prov, f in REF_FILES.items():
    status = "✓" if f else "✗ MISSING"
    print(f"  {prov}: {status} {f if f else ''}")

# Warn if any missing
missing = [p for p, f in REF_FILES.items() if f is None]
if missing:
    print(f"\n⚠️  WARNING: Missing reference CSVs for: {', '.join(missing)}")
    print("    You can still run the provinces that have reference files.")
else:
    print("\n✓ All 4 province reference CSVs loaded successfully.")

## Cell 2c: Upload Extraction Taxonomy

Upload the extraction taxonomy Excel file for Phase 2 attribute extraction.
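
The taxonomy sheet is later read through four columns: `attribute`, `data_type`, `definition`, and `taxonomy`. The sketch below shows one way to check for those columns before building the prompt reference string. It works on plain dictionaries so it runs without pandas; `REQUIRED_COLUMNS`, `build_taxonomy_reference`, and the example row are assumptions made for illustration.

```python
REQUIRED_COLUMNS = ("attribute", "data_type", "definition", "taxonomy")


def build_taxonomy_reference(rows: list[dict]) -> str:
    """Format taxonomy rows as one bullet line per attribute.

    Raises ValueError if a row lacks a required column, so a malformed
    sheet fails here rather than while a prompt is being assembled.
    """
    for index, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"Row {index} is missing columns: {missing}")
    return "\n".join(
        f"- {row['attribute']} ({row['data_type']}): "
        f"{row['definition']} Taxonomy: {row['taxonomy']}"
        for row in rows
    )


# Placeholder row, invented for illustration only.
example_rows = [{
    "attribute": "modality",
    "data_type": "categorical",
    "definition": "How the service is delivered.",
    "taxonomy": "telephone | video | both",
}]
print(build_taxonomy_reference(example_rows))
```

With a pandas DataFrame, the same rows could be obtained with `df_taxonomy.to_dict("records")`.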

In [None]:
print("="*70)
print("STEP 2c: Upload Extraction Taxonomy")
print("="*70)
print("\nExpected file:")
print("  - extraction_taxonomy.xlsx")
print()

uploaded_tax = files.upload()

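if not uploaded_tax:
    raise ValueError("No taxonomy file was uploaded; re-run this cell.")
TAXONOMY_FILE = next(iter(uploaded_tax))  # first (and normally only) uploaded file
df_taxonomy = pd.read_excel(TAXONOMY_FILE)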

print("\n" + "="*70)
print(f"Loaded Taxonomy: {TAXONOMY_FILE}")
print("="*70)
print(f"\n{len(df_taxonomy)} attributes:")
for _, row in df_taxonomy.iterrows():
    print(f"  - {row['attribute']}: {row['data_type']}")

# Build taxonomy reference string for prompts
taxonomy_reference = "\n".join([
    f"- {row['attribute']} ({row['data_type']}): {row['definition']} Taxonomy: {row['taxonomy']}"
    for _, row in df_taxonomy.iterrows()
])

print("\n✓ Taxonomy loaded and ready for Phase 2.")

## Cell 3: API Key

Enter your OpenAI API key.
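
An alternative to pasting the key into the notebook is to read it from an environment variable and prompt only when it is unset. This is a minimal sketch; the helper name `resolve_api_key` and the use of `OPENAI_API_KEY` as the variable name are assumptions, and the demonstration uses a stand-in prompt so it runs without user input.

```python
import os
from getpass import getpass
from typing import Callable


def resolve_api_key(env_var: str = "OPENAI_API_KEY",
                    prompt: Callable[[str], str] = getpass) -> str:
    """Return the key from the environment, prompting only if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = prompt("Enter OpenAI API Key: ").strip()
    if not key:
        raise ValueError("No API key provided.")
    return key


# Demonstration with a stand-in prompt and a throwaway variable name.
os.environ.pop("DEMO_KEY_VAR", None)
print(resolve_api_key("DEMO_KEY_VAR", prompt=lambda _: "placeholder-key"))
```

Keeping the key out of the cell source avoids saving it by accident when the notebook is shared.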

In [None]:
print("="*70)
print("STEP 2d: API Key")
print("="*70)

OPENAI_API_KEY = ""  # <-- Paste your key here, or leave blank to use getpass

if not OPENAI_API_KEY:
    from getpass import getpass
    OPENAI_API_KEY = getpass("Enter OpenAI API Key: ")

from openai import OpenAI
client = OpenAI(api_key=OPENAI_API_KEY)

print("\n✓ API client initialized.")
print("\n" + "="*70)
print("SETUP COMPLETE - Ready to configure Alberta code and run provinces")
print("="*70)

---
# STEP 3: Alberta Code Configuration
---

**Edit this cell to change the Alberta code being mapped.**

## Cell 4: Alberta Code Config

⚠️ **EDIT THIS CELL** to map a different Alberta code.
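
The fields in this configuration are intended to be combined with a province's extraction rules and a chunk of schedule text. Since the shared functions are not yet written, the following is only a sketch of how that assembly might look. `build_search_prompt` and the prompt layout are hypothetical, and the config shown is a shortened stand-in for the full one.

```python
def build_search_prompt(ab_config: dict, extraction_rules: str, section_text: str) -> str:
    """Assemble one prompt from the Alberta config, province rules, and section text.

    Assumed layout: the section text comes first, so that rules which refer
    to "the text above" point at it.
    """
    parts = [
        section_text.strip(),
        f"ALBERTA CODE: {ab_config['code']} - {ab_config['description']} "
        f"(${ab_config['fee']:.2f})",
        ab_config["clinical_definition"].strip(),
        ab_config["service_context"].strip(),
        ab_config["search_criteria"].strip(),
        ab_config["exclusion_criteria"].strip(),
        extraction_rules.strip(),
    ]
    return "\n\n".join(parts)


# Shortened stand-in for ALBERTA_CODE_CONFIG.
demo_config = {
    "code": "03.03CV",
    "description": "Telehealth consultation",
    "fee": 25.09,
    "clinical_definition": "Assessment of a patient's condition via telephone or secure videoconference.",
    "service_context": "This is a BASIC PATIENT-FACING virtual visit by any physician.",
    "search_criteria": "WHAT TO LOOK FOR:\n- Virtual visits / virtual care",
    "exclusion_criteria": "DO NOT INCLUDE:\n- In-person only codes",
}
prompt = build_search_prompt(
    demo_config,
    "1. **ONLY REAL CODES**: Return ONLY codes that LITERALLY appear in the text above.",
    "=== PAGE 1 ===\nPlaceholder schedule text.",
)
print(prompt)
```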

In [None]:
# ============================================================================
# ALBERTA CODE CONFIGURATION
# ============================================================================
# Edit this section to map a different Alberta billing code.
# The province configs (Cell 5) should NOT need to change.
# ============================================================================

ALBERTA_CODE_CONFIG = {
    # Basic code info
    'code': '03.03CV',
    'description': 'Telehealth consultation',
    'fee': 25.09,
    
    # Clinical definition - describes the service in detail
    'clinical_definition': """Assessment of a patient's condition via telephone or secure videoconference.

NOTE:
- At minimum: limited assessment requiring history related to presenting problems, appropriate records review, and advice to the patient
- Total physician time spent providing patient care must be MINIMUM 10 MINUTES
- If less than 10 minutes same day, must use HSC 03.01AD instead
- May only be claimed if service was initiated by the patient or their agent
- May only be claimed if service is personally rendered by the physician
- Benefit includes ordering appropriate diagnostic tests and discussion with patient
- Patient record must include detailed summary of all services including start/stop times
- Time spent on administrative tasks cannot be claimed
- May NOT be claimed same day as: 03.01AD, 03.01S, 03.01T, 03.03FV, 03.05JR, 03.08CV, 08.19CV, 08.19CW, or 08.19CX by same physician for same patient
- May NOT be claimed same day as in-person visit or consultation by same physician for same patient

Category: V Visit (Virtual)
Base rate: $25.09""",
    
    # Service type context - helps LLM understand what we're looking for
    'service_context': """This is a BASIC PATIENT-FACING virtual visit by any physician (not specialist-specific, not physician-to-physician).""",
    
    # What to search for (specific to this AB code type)
    'search_criteria': """
WHAT TO LOOK FOR:
- Virtual visits / virtual care
- Telephone consultations / assessments
- Video consultations / assessments
- Telehealth codes
- Any code that can be billed for a patient-facing virtual encounter
""",
    
    # What to exclude (specific to this AB code type)
    'exclusion_criteria': """
DO NOT INCLUDE:
- Physician-to-physician consultations (e-consults between doctors)
- E-assessments / e-consults (specialist-to-PCP) - not patient-facing
- In-person only codes
- Diagnostic procedures (ECG, imaging, labs)
- Codes you cannot find literally in the text
""",
}

# Display config
print("="*70)
print("ALBERTA CODE CONFIGURATION")
print("="*70)
print(f"\nCode: {ALBERTA_CODE_CONFIG['code']}")
print(f"Description: {ALBERTA_CODE_CONFIG['description']}")
print(f"Fee: ${ALBERTA_CODE_CONFIG['fee']:.2f}")
print("\n✓ Alberta code configured.")

---
# STEP 4: Province Configurations
---

**DO NOT EDIT** unless province schedule structure changes.

## Cell 5: Province Configs

Static configurations for each province's schedule structure.
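
To illustrate how `skip_sections` and `min_clinical_page` could be applied, here is a sketch of a section filter. The function name `should_process_section`, the case-insensitive exact match on titles, and the page numbers in the demonstration are assumptions; the actual reference CSV layout is not shown in this notebook.

```python
def should_process_section(title: str, start_page: int, config: dict) -> bool:
    """Return True if a section should be sent for extraction.

    A section is skipped when its title is listed in skip_sections or when
    it starts before min_clinical_page (if the province defines one).
    """
    skip = {name.strip().lower() for name in config.get("skip_sections", [])}
    if title.strip().lower() in skip:
        return False
    min_page = config.get("min_clinical_page")
    if min_page is not None and start_page < min_page:
        return False
    return True


# Subset of the MB configuration; section titles and pages are placeholders.
mb_config = {"skip_sections": ["APPENDICES"], "min_clinical_page": 83}
print(should_process_section("Appendices", 900, mb_config))
print(should_process_section("Placeholder clinical section", 120, mb_config))
print(should_process_section("Placeholder rules section", 10, mb_config))
```

Provinces without `min_clinical_page` (BC and ON in the configuration) fall back to the title check alone.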

In [None]:
# ============================================================================
# PROVINCE CONFIGURATIONS
# ============================================================================
# Static configurations for each province's schedule structure.
# These should NOT change when mapping different Alberta codes.
# ============================================================================

PROVINCE_CONFIGS = {
    'BC': {
        'name': 'British Columbia',
        'chunking_level': 1,  # Level 1 sections
        'rules_pages': (1, 52),  # General Preamble
        'skip_sections': [
            "1. GENERAL PREAMBLE TO THE PAYMENT SCHEDULE",
            "2. OUT-OF-OFFICE HOURS PREMIUMS",
        ],
        'special_fields': [],  # No special fields
        'extraction_rules': """
BC-SPECIFIC EXTRACTION RULES:

1. **CODE PREFIXES** (indicate payment type, NOT setting):
   - P = Professional fee
   - G = Group fee
   - PG = Professional + Group

2. **FEE EXTRACTION**: Copy the exact fee value as shown

ACCURACY RULES - YOU MUST FOLLOW:

1. **ONLY REAL CODES**: Return ONLY codes that LITERALLY appear in the text above.
   - Copy the EXACT code as shown (e.g., 00100, 14051, 97017)
   - If you cannot find the exact code string in the text, DO NOT include it
   - NEVER invent, fabricate, or guess codes

2. **EXACT VALUES**: Copy fee EXACTLY as shown in the document
   - Use exact decimal values (e.g., "25.43" not "25.00")
   - If fee is percentage-based premium, use "-" and explain in condition

3. **FULL DESCRIPTIONS - CLIENT READY FORMAT**:
   - Copy the COMPLETE service description as written in the schedule
   - Do NOT abbreviate (write "Telephone/video consultation" not "Tel consult")
   - Do NOT truncate (include the full description text)
   - Use sentence case for consistency (capitalize first word and proper nouns)
   - Include qualifying details (e.g., "minimum 10 minutes")
   - Format: Clear, professional, ready for client delivery

4. **MODALITY**: Only include modalities explicitly stated
   - "telephone" = text says telephone/phone only
   - "video" = text says video/videoconference only
   - "both" = text explicitly allows BOTH, or doesn't restrict

5. **PAGE NUMBERS**: page_found must match the "=== PAGE X ===" marker where code appears

6. **SECTION HEADING**: Extract the subsection heading the code appears under
   - Look for bold/uppercase headings
   - This becomes level_2_subsection
""",
        'json_schema': {
            'primary_codes': ['code', 'description', 'fee', 'modality', 'page_found', 'section_heading', 'reasoning'],
            'add_on_codes': ['code', 'description', 'fee', 'modality', 'page_found', 'section_heading', 'links_to', 'condition']
        },
        'output_columns': [
            'AB_Code', 'AB_Description', 'AB_Fee', 'Target_Province',
            'Code', 'Description', 'Fee', 'Type', 'Modality', 'Specialty',
            'Links_To', 'Condition', 'Reasoning',
            'Level_1_Section', 'Level_2_Subsection', 'Page_Found'
        ]
    },

    'MB': {
        'name': 'Manitoba',
        'chunking_level': 1,  # Level 1 sections
        'rules_pages': (1, 82),  # Rules of Application
        'skip_sections': [
            "APPENDICES",
        ],
        'min_clinical_page': 83,  # Clinical sections start at page 83
        'special_fields': [],  # No special fields
        'extraction_rules': """
MB-SPECIFIC EXTRACTION RULES:

1. **SPECIALTY-BASED FEES**: Each specialty section may have its own fee schedules

ACCURACY RULES - YOU MUST FOLLOW:

1. **ONLY REAL CODES**: Return ONLY codes that LITERALLY appear in the text above.
   - Copy the EXACT code as shown (e.g., 8321, 8340, 8447)
   - If you cannot find the exact code string in the text, DO NOT include it
   - NEVER invent, fabricate, or guess codes

2. **EXACT VALUES**: Copy fee EXACTLY as shown in the document
   - Use exact decimal values (e.g., "59.05" not "59.00")
   - If fee is percentage-based premium, use "-" and explain in condition

3. **FULL DESCRIPTIONS - CLIENT READY FORMAT**:
   - Copy the COMPLETE service description as written in the schedule
   - Do NOT abbreviate (write "Virtual visit by telephone or video" not "Virtual visit")
   - Do NOT truncate (include the full description text)
   - Use sentence case for consistency (capitalize first word and proper nouns)
   - Include qualifying details (e.g., "minimum 10 minutes")
   - Format: Clear, professional, ready for client delivery

4. **MODALITY**: Only include modalities explicitly stated
   - "telephone" = text says telephone/phone only
   - "video" = text says video/videoconference only
   - "both" = text explicitly allows BOTH, or doesn't restrict

5. **PAGE NUMBERS**: page_found must match the "=== PAGE X ===" marker where code appears

6. **SECTION HEADING**: Extract the subsection heading the code appears under
   - Look for bold/uppercase headings like "VIRTUAL VISITS", "HOSPITAL CARE", etc.
   - This becomes level_2_subsection
""",
        'json_schema': {
            'primary_codes': ['code', 'description', 'fee', 'modality', 'page_found', 'section_heading', 'reasoning'],
            'add_on_codes': ['code', 'description', 'fee', 'modality', 'page_found', 'section_heading', 'links_to', 'condition']
        },
        'output_columns': [
            'AB_Code', 'AB_Description', 'AB_Fee', 'Target_Province',
            'Code', 'Description', 'Fee', 'Type', 'Modality', 'Specialty',
            'Links_To', 'Condition', 'Reasoning',
            'Level_1_Section', 'Level_2_Subsection', 'Page_Found'
        ]
    },

    'ON': {
        'name': 'Ontario',
        'chunking_level': 2,  # Level 2 sections (more granular)
        'rules_pages': (1, 126),  # General Preamble
        'skip_sections': [
            "General Preamble",
            "Appendix A",
            "Appendix B",
            "Appendix C",
            "Appendix D",
            "Appendix F",
            "Appendix G",
            "Appendix H",
            "Appendix J",
            "Appendix Q",
            "Numeric Index",
        ],
        'special_fields': ['Fee_Type', 'Setting', 'Level_3_Heading'],
        'extraction_rules': """
ONTARIO-SPECIFIC EXTRACTION RULES:

1. **H/P COLUMNS (Setting)**:
   - If a code has BOTH H (Hospital) and P (Professional/Office) fees, create SEPARATE entries for each
   - H = Hospital setting, P = Professional/Office setting
   - If only one fee exists, use that setting

2. **SURGICAL FEE COLUMNS**:
   - Surg = Surgeon fee -> create entry with fee_type "Surgeon"
   - Asst = Assistant fee -> create entry with fee_type "Assistant" (skip if "nil")
   - Anae = Anaesthesia units -> create entry with fee_type "Anaesthesia" (these are TIME UNITS, not dollars)

3. **CODE PREFIXES** (indicate service type, NOT setting):
   - A = Assessments/consultations
   - E = Diagnostic/therapeutic procedures
   - G = General listings
   - K = Special visit premiums
   - Z = Surgical procedures

ACCURACY RULES:

1. **ONLY REAL CODES**: Return ONLY codes that LITERALLY appear in the text above.
   - Copy the EXACT code as shown (e.g., A003, K017, Z101)
   - NEVER invent, fabricate, or guess codes

2. **EXACT VALUES**: Copy fee EXACTLY as shown in the document
   - Use exact decimal values (e.g., "87.35" not "87.00")
   - For Anae column, these are UNITS not dollars

3. **FULL DESCRIPTIONS - CLIENT READY FORMAT**:
   - Copy the COMPLETE service description as written in the schedule
   - Do NOT abbreviate or truncate
   - Use sentence case for consistency
   - Include qualifying details (e.g., "minimum 50 minutes")

4. **LEVEL 3 EXTRACTION**:
   - Extract the subsection heading the code appears under (e.g., "INCISION", "EXCISION", "GENERAL LISTINGS")
   - This becomes level_3_heading

5. **MODALITY**: Only include modalities explicitly stated
   - "telephone" = text says telephone/phone only
   - "video" = text says video/videoconference only
   - "both" = text explicitly allows BOTH, or doesn't restrict

IMPORTANT: For codes with multiple fee types (Surg/Asst/Anae) or settings (H/P), create SEPARATE entries for each combination.
""",
        'json_schema': {
            'codes': ['code', 'description', 'fee', 'fee_type', 'setting', 'modality', 'page_found',
                     'level_3_heading', 'is_addon', 'links_to', 'condition', 'reasoning']
        },
        'output_columns': [
            'AB_Code', 'AB_Description', 'AB_Fee', 'Target_Province',
            'Code', 'Description', 'Fee', 'Fee_Type', 'Setting', 'Type', 'Modality',
            'Links_To', 'Condition', 'Reasoning',
            'Level_1_Section', 'Level_2_Subsection', 'Level_3_Heading', 'Page_Found'
        ]
    },

    'SK': {
        'name': 'Saskatchewan',
        'chunking_level': 1,  # Level 1 sections
        'rules_pages': (1, 70),  # Preamble/Rules
        'skip_sections': [
            "Introduction",
            "To Request a Change to the Payment Schedule",
            "Services Provided Outside Saskatchewan",
            "Billing For Services Provided To Out-Of-Province Beneficiaries",
            "Definitions",
            "Documentation Requirements",
            "Services Billable by Entitlement or by Approval",
            "Assessment Rules",
            "General Information",
            "Services Not Insured by the Ministry of Health",
            "Assessment of Accounts",
            "Verification Program",
            "Information Sources",
            "Reciprocal Billing",
            "Explanatory Codes for Physicians",
        ],
        'min_clinical_page': 71,  # Clinical sections start at page 71
        'special_fields': ['Fee_Type', 'Age_Premium_Applies'],
        'extraction_rules': """
SASKATCHEWAN-SPECIFIC EXTRACTION RULES:

1. **DUAL-FEE STRUCTURE (Referred vs Not Referred)**:
   - Many SK codes have TWO fees: "Referred" and "Not Referred"
   - If a code has BOTH fees, create SEPARATE entries for each:
     - One entry with fee_type="Referred" and the referred fee
     - One entry with fee_type="Not Referred" and the not-referred fee
   - If only one fee exists, use fee_type="Standard"

2. **AGE PREMIUMS (Section-Wide)**:
   - SK has age-based premiums for patients 0-5 years and 65+ years
   - If the section header or preamble states age premiums apply to ALL codes in the section, note this for EVERY code
   - Do NOT skip age premiums just because they're not repeated per-code

ACCURACY RULES - YOU MUST FOLLOW:

1. **ONLY REAL CODES**: Return ONLY codes that LITERALLY appear in the text above.
   - Copy the EXACT code as shown
   - If you cannot find the exact code string in the text, DO NOT include it
   - NEVER invent, fabricate, or guess codes

2. **EXACT VALUES**: Copy fee EXACTLY as shown in the document
   - Use exact decimal values (e.g., "45.50" not "45.00")
   - If fee is percentage-based premium, use "-" and explain in condition

3. **FULL DESCRIPTIONS - CLIENT READY FORMAT**:
   - Copy the COMPLETE service description as written in the schedule
   - Do NOT abbreviate (write "Telephone/video consultation" not "Tel consult")
   - Do NOT truncate (include the full description text)
   - Use sentence case for consistency (capitalize first word and proper nouns)
   - Include qualifying details (e.g., "minimum 10 minutes")
   - Format: Clear, professional, ready for client delivery

4. **MODALITY**: Only include modalities explicitly stated
   - "telephone" = text says telephone/phone only
   - "video" = text says video/videoconference only
   - "both" = text explicitly allows BOTH, or doesn't restrict

5. **PAGE NUMBERS**: page_found must match the "=== PAGE X ===" marker where code appears

6. **SECTION HEADING**: Extract the subsection heading the code appears under
   - Look for bold/uppercase headings
   - This becomes level_2_subsection

IMPORTANT: For codes with BOTH Referred and Not Referred fees, create SEPARATE entries for each fee type.
""",
        'json_schema': {
            'primary_codes': ['code', 'description', 'fee', 'fee_type', 'modality', 'page_found',
                             'section_heading', 'age_premium_applies', 'reasoning'],
            'add_on_codes': ['code', 'description', 'fee', 'fee_type', 'modality', 'page_found',
                            'section_heading', 'age_premium_applies', 'links_to', 'condition']
        },
        'output_columns': [
            'AB_Code', 'AB_Description', 'AB_Fee', 'Target_Province',
            'Code', 'Description', 'Fee', 'Fee_Type', 'Type', 'Modality', 'Specialty',
            'Links_To', 'Condition', 'Reasoning',
            'Level_1_Section', 'Level_2_Subsection', 'Page_Found', 'Age_Premium_Applies'
        ]
    }
}

# Display loaded configs
print("="*70)
print("PROVINCE CONFIGURATIONS LOADED")
print("="*70)
for prov, config in PROVINCE_CONFIGS.items():
    print(f"\n{prov} ({config['name']}):")
    print(f"  - Chunking: Level {config['chunking_level']}")
    print(f"  - Rules pages: {config['rules_pages'][0]}-{config['rules_pages'][1]}")
    print(f"  - Skip sections: {len(config['skip_sections'])}")
    print(f"  - Special fields: {config['special_fields'] or 'None'}")

print("\n" + "="*70)
print("All province configurations loaded.")

---
# STEP 5: Shared Functions
---

## Cell 6: Shared Functions

Core processing functions used by all provinces.
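
One shared function the accuracy rules suggest is a post-check that every returned code literally appears in the chunk text, on the page the model reported. The sketch below assumes the chunk text carries `=== PAGE X ===` markers as described in the extraction rules; `split_pages`, `verify_entry`, and the sample text are hypothetical.

```python
import re

PAGE_MARKER = re.compile(r"=== PAGE (\d+) ===")


def split_pages(chunk_text: str) -> dict[int, str]:
    """Map page number -> text that follows its "=== PAGE X ===" marker."""
    pieces = PAGE_MARKER.split(chunk_text)
    # With one capture group, re.split yields [before, num, text, num, text, ...].
    return {int(pieces[i]): pieces[i + 1] for i in range(1, len(pieces) - 1, 2)}


def verify_entry(entry: dict, chunk_text: str) -> list[str]:
    """Return a list of problems; an empty list means the entry checks out."""
    code = str(entry["code"])
    # Require non-alphanumeric boundaries so "832" does not match inside "8321".
    pattern = re.compile(rf"(?<![A-Za-z0-9]){re.escape(code)}(?![A-Za-z0-9])")
    if not pattern.search(chunk_text):
        return [f"code {code} not found in text"]
    pages_with_code = [n for n, text in split_pages(chunk_text).items()
                       if pattern.search(text)]
    if int(entry["page_found"]) not in pages_with_code:
        return [f"code {code} not on page {entry['page_found']}"]
    return []


# Sample text with placeholder descriptions and fees.
sample = (
    "=== PAGE 12 ===\n8321 Placeholder visit 10.00\n"
    "=== PAGE 13 ===\n8340 Another placeholder visit 20.00\n"
)
print(verify_entry({"code": "8321", "page_found": 12}, sample))
print(verify_entry({"code": "8321", "page_found": 13}, sample))
print(verify_entry({"code": "9999", "page_found": 12}, sample))
```

A check like this does not replace the prompt rules; it only flags entries that need a manual look.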

In [None]:
# Placeholder: shared functions are not implemented yet.
print("Shared functions are not implemented yet.")

---
# STEP 6: Run Provinces
---

Run each province individually. Results are saved and downloaded after each province completes.
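
Each province run ultimately has to turn the model's JSON into rows that match `output_columns`. The following sketch covers the `primary_codes` / `add_on_codes` schema used by BC and MB. The function name, the `Type` labels "Primary" and "Add-on", and the sample response are assumptions; Ontario's single `codes` list with `is_addon` would need its own variant.

```python
def rows_from_response(response: dict, ab_config: dict, province: str) -> list[dict]:
    """Flatten a primary_codes/add_on_codes response into output rows."""
    rows = []
    for key, row_type in (("primary_codes", "Primary"), ("add_on_codes", "Add-on")):
        for entry in response.get(key, []):
            rows.append({
                "AB_Code": ab_config["code"],
                "AB_Description": ab_config["description"],
                "AB_Fee": ab_config["fee"],
                "Target_Province": province,
                "Code": entry.get("code", ""),
                "Description": entry.get("description", ""),
                "Fee": entry.get("fee", ""),  # kept as text to preserve the printed value
                "Type": row_type,
                "Modality": entry.get("modality", ""),
                "Links_To": entry.get("links_to", ""),
                "Condition": entry.get("condition", ""),
                "Reasoning": entry.get("reasoning", ""),
                # The extraction rules state that section_heading becomes level_2_subsection.
                "Level_2_Subsection": entry.get("section_heading", ""),
                "Page_Found": entry.get("page_found", ""),
            })
    return rows


ab = {"code": "03.03CV", "description": "Telehealth consultation", "fee": 25.09}
# Placeholder response, invented for illustration only.
response = {
    "primary_codes": [{
        "code": "00000", "description": "Placeholder visit", "fee": "10.00",
        "modality": "both", "page_found": 1, "section_heading": "PLACEHOLDER",
        "reasoning": "Illustrative only.",
    }],
    "add_on_codes": [],
}
rows = rows_from_response(response, ab, "BC")
print(rows[0]["Type"], rows[0]["Level_2_Subsection"])
```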

## Cell 7a: Run British Columbia

In [None]:
# Placeholder: BC processing is not implemented yet.
print("BC processing is not implemented yet.")

## Cell 7b: Run Manitoba

In [None]:
# Placeholder: MB processing is not implemented yet.
print("MB processing is not implemented yet.")

## Cell 7c: Run Ontario

In [None]:
# Placeholder: ON processing is not implemented yet.
print("ON processing is not implemented yet.")

## Cell 7d: Run Saskatchewan

In [None]:
# Placeholder: SK processing is not implemented yet.
print("SK processing is not implemented yet.")

---
# STEP 7: Combine & Summary
---

## Cell 8: Combine All Provinces & Final Summary
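
Because the provinces do not share one column set (ON adds `Fee_Type`, `Setting`, and `Level_3_Heading`; SK adds `Fee_Type` and `Age_Premium_Applies`), combining results means taking the union of columns and filling the gaps. A minimal sketch of that idea on plain dictionaries follows; `combine_rows` and the sample rows are hypothetical.

```python
def combine_rows(per_province: dict[str, list[dict]]) -> tuple[list[str], list[dict]]:
    """Merge per-province rows into one table with a shared column list.

    Columns keep first-seen order; values a province does not supply
    are filled with an empty string.
    """
    columns: list[str] = []
    for rows in per_province.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    combined = [
        {name: row.get(name, "") for name in columns}
        for rows in per_province.values()
        for row in rows
    ]
    return columns, combined


# Placeholder rows, invented for illustration only.
per_province = {
    "BC": [{"Target_Province": "BC", "Code": "00000", "Specialty": ""}],
    "ON": [{"Target_Province": "ON", "Code": "A000", "Fee_Type": "", "Setting": "P"}],
}
columns, combined = combine_rows(per_province)
print(columns)
for row in combined:
    print(row)
```

With DataFrames, `pd.concat(frames, ignore_index=True)` gives a comparable union of columns, with missing values left as NaN rather than empty strings.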

In [None]:
# Placeholder: combine logic is not implemented yet.
print("Combine logic is not implemented yet.")