# Manitoba Crosswalk Telehealth - Phase 1 & Phase 2

Finds the Manitoba billing codes equivalent to Alberta **03.03CV - Telehealth consultation**.

**Phase 1:** Identify matching Manitoba codes (same logic as the all-provinces crosswalk).

**Phase 2:** Enrich each code with detailed attributes from the extraction taxonomy.

**Output:** Combined master Excel workbook with all Phase 1 + Phase 2 data.

## Cell 1: Setup

In [1]:
!pip install openai pandas pdfplumber openpyxl tqdm PyMuPDF -q
print('Ready')

Ready


## Cell 2: Upload Manitoba PDF

In [2]:
from google.colab import files

print("Upload Manitoba Payment Schedule PDF:")
uploaded = files.upload()

# First uploaded filename, or None if nothing was uploaded
MB_PDF = next(iter(uploaded), None)

if MB_PDF:
    print(f"\nLoaded: {MB_PDF}")
else:
    print("ERROR: No file uploaded")

Upload Manitoba Payment Schedule PDF:


Saving MB Payment Schedule - April 1, 2024.pdf to MB Payment Schedule - April 1, 2024.pdf

Loaded: MB Payment Schedule - April 1, 2024.pdf


## Cell 2b: Upload Section Reference CSV

In [3]:
# Upload section reference CSV (generated by Manitoba_Section_Extractor.ipynb)
import pandas as pd
from google.colab import files

print("Upload manitoba_section_reference.csv:")
uploaded_ref = files.upload()

section_ref_file = list(uploaded_ref.keys())[0]
df_section_ref = pd.read_csv(section_ref_file)

# Sort by page_start for proper lookup
df_section_ref = df_section_ref.sort_values('page_start').reset_index(drop=True)

print(f"\nLoaded section reference: {len(df_section_ref)} sections")
print(f"Page range: {df_section_ref['page_start'].min()} - {df_section_ref['page_start'].max()}")
print(f"\nUnique Level 1 sections: {df_section_ref['level_1'].nunique()}")
print(f"Unique Level 2 sections (Specialties): {df_section_ref['level_2'].dropna().nunique()}")

# Define lookup function - returns level_1 (specialty) based on page range
# Level_2 (subsection like "Virtual Visits") comes from LLM's section_heading
def lookup_section(page_num):
    """Look up the specialty (level_1) for a given page number.

    Only returns level_1 based on page range. Level_2 should come from LLM's
    section_heading since multiple subsections can exist on the same page.
    """
    matched_level_1 = ''

    # Get unique level_1 values with their minimum page_start
    level_1_pages = df_section_ref.groupby('level_1')['page_start'].min().sort_values()

    # Find the level_1 where page_start <= page_num < next_level_1_page_start
    for level_1, start_page in level_1_pages.items():
        if start_page <= page_num:
            matched_level_1 = level_1
        else:
            break

    return {
        'level_1': matched_level_1 if pd.notna(matched_level_1) else ''
    }

# Test lookup
print("\nTest lookups (level_1 only - level_2 comes from LLM):")
for test_page in [1, 50, 85, 90, 100, 200, 300]:
    section = lookup_section(test_page)
    print(f"  Page {test_page}: {section['level_1'][:50] if section['level_1'] else 'N/A'}")

print("\nSection reference loaded and lookup function ready")

Upload manitoba_section_reference.csv:


Saving manitoba_section_reference_final.csv to manitoba_section_reference_final.csv

Loaded section reference: 783 sections
Page range: 83 - 537

Unique Level 1 sections: 57
Unique Level 2 sections (Specialties): 448

Test lookups (level_1 only - level_2 comes from LLM):
  Page 1: N/A
  Page 50: N/A
  Page 85: VISITS/EXAMINATIONS—INTERNAL MEDICINE (01)
  Page 90: NEUROLOGY (01-1)
  Page 100: RHEUMATOLOGY MEDICINE (01-3)
  Page 200: ANESTHESIOLOGY (10)
  Page 300: APPENDICES

Section reference loaded and lookup function ready
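
The lookup above regroups the section table on every call. As a minimal sketch, the same rule ("the last section that starts at or before this page") can be expressed with `bisect` over start pages that are sorted once. The start pages and shortened section names below are illustrative placeholders, not values taken from the reference CSV, and `build_lookup` is a hypothetical helper.

```python
from bisect import bisect_right

# Hypothetical (start_page, level_1) pairs; real values come from the section reference CSV.
SECTION_STARTS = [(83, "INTERNAL MEDICINE"), (88, "NEUROLOGY"), (95, "RHEUMATOLOGY")]

def build_lookup(section_starts):
    """Return a function mapping a page number to its level_1 section name."""
    ordered = sorted(section_starts)
    pages = [start for start, _ in ordered]
    names = [name for _, name in ordered]

    def lookup(page_num):
        # Index of the last section whose start page is <= page_num
        idx = bisect_right(pages, page_num) - 1
        return names[idx] if idx >= 0 else ""

    return lookup

lookup_level_1 = build_lookup(SECTION_STARTS)
for page in (1, 83, 90, 500):
    print(page, lookup_level_1(page) or "N/A")
```

Pages before the first section start return an empty string, which matches the `N/A` rows printed for pages 1 and 50 above.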


## Cell 3: API Key

In [4]:
OPENAI_API_KEY = ""  # <-- Paste your key here

if not OPENAI_API_KEY:
    from getpass import getpass
    OPENAI_API_KEY = getpass("API Key: ")

from openai import OpenAI
client = OpenAI(api_key=OPENAI_API_KEY)
print("API ready")

API Key: ··········
API ready


## Cell 4: Alberta Code + Core Functions

In [5]:
import pandas as pd
import pdfplumber
import json
import re
from tqdm.notebook import tqdm

# Alberta code definition
AB_CODE = "03.03CV"
AB_DESC = "Telehealth consultation"
AB_FEE = 25.09

AB_CLINICAL_DEFINITION = """Assessment of a patient's condition via telephone or secure videoconference.

NOTE:
- At minimum: limited assessment requiring history related to presenting problems, appropriate records review, and advice to the patient
- Total physician time spent providing patient care must be MINIMUM 10 MINUTES
- If less than 10 minutes same day, must use HSC 03.01AD instead
- May only be claimed if service was initiated by the patient or their agent
- May only be claimed if service is personally rendered by the physician
- Benefit includes ordering appropriate diagnostic tests and discussion with patient
- Patient record must include detailed summary of all services including start/stop times
- Time spent on administrative tasks cannot be claimed
- May NOT be claimed same day as: 03.01AD, 03.01S, 03.01T, 03.03FV, 03.05JR, 03.08CV, 08.19CV, 08.19CW, or 08.19CX by same physician for same patient
- May NOT be claimed same day as in-person visit or consultation by same physician for same patient

Category: V Visit (Virtual)
Base rate: $25.09"""

# Tracking
PAGES_PER_CALL = 10
total_cost = 0.0
total_calls = 0

# Store results and chunk text for Phase 2
# Key format: code_fee_modality_specialty
all_results = []
code_chunks = {}

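def track_cost(inp, out):
    """Accumulate a rough running cost estimate from token counts."""
    global total_cost, total_calls
    # Hardcoded rates: $3.00 per 1M input tokens, $15.00 per 1M output tokens.
    # Adjust these to the pricing of the model actually used.
    total_cost += (inp/1e6)*3.0 + (out/1e6)*15.0
    total_calls += 1

def get_first_page_from_range(pages_str):
    """Extract first page number from a range string like '83-92'."""
    try:
        return int(str(pages_str).split('-')[0])
    except (ValueError, TypeError):
        # Fall back to page 1 when the value is missing or not a page range
        return 1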

def build_prompt(province_name, batch_pages, context):
    """Build the search prompt."""
    return f"""You are a senior physician billing specialist mapping Alberta fee codes to {province_name} equivalents.

ALBERTA CODE TO MATCH:
- Code: {AB_CODE}
- Description: {AB_DESC}
- Fee: ${AB_FEE}

CLINICAL SERVICE DEFINITION:
{AB_CLINICAL_DEFINITION}

This is a BASIC PATIENT-FACING virtual visit by any physician (not specialist-specific, not physician-to-physician).

{province_name.upper()} SCHEDULE EXCERPT (pages {batch_pages[0]}-{batch_pages[-1]}):

{context}

CRITICAL INSTRUCTION - SEARCH ALL PAGES:
These {len(batch_pages)} pages may contain MULTIPLE different specialty sections (e.g., Internal Medicine, Neurology, Cardiology, etc.).
You MUST search EVERY page from {batch_pages[0]} to {batch_pages[-1]} and return ALL matching codes found.
DO NOT stop after finding the first match. Each specialty section may have its own Virtual Visits codes with DIFFERENT FEES.

TASK:
Find ALL {province_name} codes across ALL pages that bill for THIS SAME CLINICAL ENCOUNTER - a basic virtual care assessment between a physician and patient.

STEP 1 - FIND ALL PRIMARY CODE(S) ON ALL PAGES:
Search EVERY page ({batch_pages[0]} through {batch_pages[-1]}) for codes a physician would bill for a 10+ minute patient-facing virtual assessment.
- Look for: Virtual visits, telephone assessments, video assessments in EACH specialty section
- The SAME code number (e.g., 8321) may appear on MULTIPLE pages with DIFFERENT FEES for different specialties - return EACH occurrence
- Separate codes if {province_name} splits by modality (phone vs video)

STEP 2 - FIND ALL ADD-ON CODES ON ALL PAGES:
Search EVERY page for codes that can be billed IN ADDITION TO the primary codes.
- Each add-on must link to specific primary code(s)
- Only include add-ons specifically eligible for virtual care visits

DO NOT INCLUDE:
- Physician-to-physician consultations - wrong service type
- E-assessments / e-consults (specialist-to-PCP) - not patient-facing
- Specialist-only consultations - wrong provider scope
- Ambulance/transport/detention codes - completely different services
- Diagnostic procedure codes (ECG, imaging, etc.) - not consultations
- In-person visit codes (unless no virtual equivalent exists)
- Appendix reference codes that are just claim submission references

JSON only:
{{
  "found": true/false,
  "primary_codes": [
    {{
      "code": "...",
      "description": "full description from schedule",
      "fee": "00.00 or '-' if percentage-based premium",
      "modality": "telephone|video|both",
      "page_found": <integer - the exact page number where this code appears>,
      "section_heading": "the section heading this code appears under (e.g., 'Virtual Visits', 'Hospital Care', 'Office, Home Visits')",
      "reasoning": "why this matches"
    }}
  ],
  "add_on_codes": [
    {{
      "code": "...",
      "description": "...",
      "fee": "00.00 or '-' if percentage-based premium",
      "modality": "telephone|video|both",
      "page_found": <integer - the exact page number where this code appears>,
      "section_heading": "the section heading this code appears under",
      "links_to": ["primary_code1", "primary_code2"],
      "condition": "when this add-on applies (include percentage if applicable, e.g. '20% premium for evening hours')"
    }}
  ]
}}

IMPORTANT REMINDERS:
- SEARCH ALL {len(batch_pages)} PAGES - do not stop after finding codes on the first page
- The same code (e.g., 8321) with DIFFERENT fees on different pages = SEPARATE entries (different specialties)
- page_found MUST be the exact page number (e.g., 85, 90, 92) from the "=== PAGE X ===" markers
- section_heading MUST be the exact heading text the code appears under (look for bold/uppercase headings like "VIRTUAL VISITS", "HOSPITAL CARE", etc.)
- If a code has different fees for telephone vs video, create SEPARATE entries
- If a code is a PERCENTAGE-BASED premium, set fee to "-" and include percentage in condition

If no relevant codes on these pages: {{"found": false, "primary_codes": [], "add_on_codes": []}}"""

print(f"Alberta Code: {AB_CODE} - {AB_DESC} (${AB_FEE})")
print("Core functions ready")

Alberta Code: 03.03CV - Telehealth consultation ($25.09)
Core functions ready
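
`track_cost` converts token counts into dollars at fixed per-million-token rates. The sketch below isolates that arithmetic. `estimate_cost` is a hypothetical helper, and its default rates simply mirror the constants hardcoded in the cell above, which may not match the actual pricing of the model in use.

```python
def estimate_cost(input_tokens, output_tokens, input_rate=3.0, output_rate=15.0):
    """Estimate dollars spent, given token counts and $ per 1M token rates.

    The default rates are assumptions copied from the notebook's constants.
    """
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# Example: one call with 10,000 prompt tokens and 1,000 completion tokens
cost = estimate_cost(10_000, 1_000)
print(f"Estimated cost: ${cost:.3f}")
```

With these rates the example call comes to roughly 4.5 cents: 3 cents for input plus 1.5 cents for output.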


## Cell 5: Phase 1 - Process Manitoba

In [6]:
# Phase 1: Process Manitoba - find matching codes and store chunk text

prov_code = "MB"
prov_name = "Manitoba"
pdf_file = MB_PDF

print(f"{'='*70}")
print(f"PHASE 1: PROCESSING {prov_name} ({prov_code})")
print(f"File: {pdf_file}")
print("="*70)

# ===== LOAD PDF (text only, sections come from CSV reference) =====
print(f"\nLoading {prov_name} PDF...")
pdf_pages = {}

with pdfplumber.open(pdf_file) as pdf:
    for i, page in enumerate(tqdm(pdf.pages, desc="Loading pages")):
        page_num = i + 1
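        try:
            text = page.extract_text()
            if text:
                pdf_pages[page_num] = text
        except Exception as e:
            # Report pages that fail extraction instead of skipping them silently
            print(f"  Skipping page {page_num}: {e}")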

print(f"Loaded {len(pdf_pages)} pages")

# ===== SEARCH ALL PAGES =====
print(f"\nSearching for matches...")

all_primary = []
all_addons = []

page_nums = sorted(pdf_pages.keys())
batches = [page_nums[i:i+PAGES_PER_CALL] for i in range(0, len(page_nums), PAGES_PER_CALL)]

print(f"Searching {len(page_nums)} pages in {len(batches)} batches...")

for batch_pages in tqdm(batches, desc=f"Searching {prov_code}"):
    context = "\n".join([f"=== PAGE {p} ===\n{pdf_pages[p]}" for p in batch_pages if p in pdf_pages])

    prompt = build_prompt(prov_name, batch_pages, context)

    try:
        resp = client.chat.completions.create(
            model="gpt-5.1-2025-11-13",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,
            max_completion_tokens=4000  # Increased to handle multiple specialties per batch
        )
        track_cost(resp.usage.prompt_tokens, resp.usage.completion_tokens)

        content = resp.choices[0].message.content
        match = re.search(r'\{[\s\S]*\}', content)
        if match:
            result = json.loads(match.group())

            if result.get('found'):
                n_primary = len(result.get('primary_codes', []))
                n_addon = len(result.get('add_on_codes', []))
                print(f"  Pages {batch_pages[0]}-{batch_pages[-1]}: {n_primary} primary, {n_addon} add-ons")

                for p in result.get('primary_codes', []):
                    p['pages'] = f"{batch_pages[0]}-{batch_pages[-1]}"
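                    # Use page_found for level_1 (specialty) lookup; fall back to the
                    # batch's first page if the model returned a missing or non-integer value
                    try:
                        page_found = int(p.get('page_found'))
                    except (TypeError, ValueError):
                        page_found = batch_pages[0]
                    section_info = lookup_section(page_found)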
                    # Level_2 comes from LLM's section_heading
                    section_info['level_2'] = p.get('section_heading', '')
                    section_info['level_3'] = ''  # Not used currently
                    p['section_info'] = section_info
                    p['chunk_text'] = context
                    all_primary.append(p)

                for a in result.get('add_on_codes', []):
                    a['pages'] = f"{batch_pages[0]}-{batch_pages[-1]}"
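                    # Use page_found for level_1 (specialty) lookup; fall back to the
                    # batch's first page if the model returned a missing or non-integer value
                    try:
                        page_found = int(a.get('page_found'))
                    except (TypeError, ValueError):
                        page_found = batch_pages[0]
                    section_info = lookup_section(page_found)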
                    # Level_2 comes from LLM's section_heading
                    section_info['level_2'] = a.get('section_heading', '')
                    section_info['level_3'] = ''  # Not used currently
                    a['section_info'] = section_info
                    a['chunk_text'] = context
                    all_addons.append(a)

    except Exception as e:
        print(f"Error on pages {batch_pages[0]}-{batch_pages[-1]}: {e}")

# ===== DEDUPLICATE BY CODE + FEE + MODALITY + SPECIALTY =====
# Manitoba: Specialty = Level 1 (e.g., "VISITS/EXAMINATIONS—INTERNAL MEDICINE")
# Different fee or different specialty = different row
print(f"\nDeduplicating by code + fee + modality + specialty (Level 1)...")

seen_primary = {}
for p in all_primary:
    code = p.get('code', '')
    fee = str(p.get('fee', ''))
    modality = p.get('modality', '')
    # Manitoba-specific: use level_1 for specialty
    specialty = p.get('section_info', {}).get('level_1', '')

    key = f"{code}_{fee}_{modality}_{specialty}"
    if key and key not in seen_primary:
        seen_primary[key] = p

seen_addon = {}
for a in all_addons:
    code = a.get('code', '')
    fee = str(a.get('fee', ''))
    modality = a.get('modality', '')
    # Manitoba-specific: use level_1 for specialty
    specialty = a.get('section_info', {}).get('level_1', '')

    key = f"{code}_{fee}_{modality}_{specialty}"
    if key and key not in seen_addon:
        seen_addon[key] = a

primary_codes = list(seen_primary.values())
addon_codes = list(seen_addon.values())

print(f"After dedup: {len(primary_codes)} primary, {len(addon_codes)} add-ons")

# ===== DISPLAY RESULTS =====
print(f"\n--- {prov_code} PRIMARY CODES ({len(primary_codes)}) ---")
for p in primary_codes:
    fee_display = p.get('fee', '?')
    # Manitoba: Specialty from Level 1, Subsection from LLM's section_heading
    specialty = p.get('section_info', {}).get('level_1', '-')
    level_2 = p.get('section_info', {}).get('level_2', '-')
    print(f"  {p.get('code', ''):8} | {str(fee_display):>7} | {p.get('modality', '?'):10} | pg {p.get('page_found', '?'):3} | {level_2[:20]:20} | {specialty[:25]}")

print(f"\n--- {prov_code} ADD-ON CODES ({len(addon_codes)}) ---")
for a in addon_codes:
    links = ', '.join(a.get('links_to', [])) if a.get('links_to') else 'unspecified'
    fee_display = a.get('fee', '?')
    # Manitoba: Specialty from Level 1, Subsection from LLM's section_heading
    specialty = a.get('section_info', {}).get('level_1', '-')
    level_2 = a.get('section_info', {}).get('level_2', '-')
    print(f"  {a.get('code', ''):8} | {str(fee_display):>7} | pg {a.get('page_found', '?'):3} | {level_2[:20]:20} | Links: {links[:15]}")

# ===== BUILD RESULTS WITH ALL DIFFERENTIATING FACTORS =====
for p in primary_codes:
    code = p.get('code', '')
    fee = str(p.get('fee', ''))
    modality = p.get('modality', '')
    section_info = p.get('section_info', {})
    # Manitoba-specific: Specialty = Level 1
    specialty = section_info.get('level_1', '')

    # Unique key for chunk storage and Phase 2 merge
    unique_key = f"{code}_{fee}_{modality}_{specialty}"
    code_chunks[unique_key] = p.get('chunk_text', '')

    all_results.append({
        'AB_Code': AB_CODE,
        'AB_Description': AB_DESC,
        'AB_Fee': AB_FEE,
        'Target_Province': prov_code,
        'Code': code,
        'Description': p.get('description', ''),
        'Fee': p.get('fee', ''),
        'Type': 'PRIMARY',
        'Modality': modality,
        'Specialty': specialty,
        'Links_To': '',
        'Condition': '',
        'Reasoning': p.get('reasoning', ''),
        'Level_1_Section': section_info.get('level_1', ''),
        'Level_2_Subsection': section_info.get('level_2', ''),  # Now from LLM's section_heading
        'Level_3_Subsection': section_info.get('level_3', ''),
        'Pages': p.get('pages', ''),
        'Page_Found': p.get('page_found', ''),
        '_unique_key': unique_key  # For Phase 2 merge
    })

for a in addon_codes:
    code = a.get('code', '')
    fee = str(a.get('fee', ''))
    modality = a.get('modality', '')
    section_info = a.get('section_info', {})
    # Manitoba-specific: Specialty = Level 1
    specialty = section_info.get('level_1', '')

    # Unique key for chunk storage and Phase 2 merge
    unique_key = f"{code}_{fee}_{modality}_{specialty}"
    code_chunks[unique_key] = a.get('chunk_text', '')

    all_results.append({
        'AB_Code': AB_CODE,
        'AB_Description': AB_DESC,
        'AB_Fee': AB_FEE,
        'Target_Province': prov_code,
        'Code': code,
        'Description': a.get('description', ''),
        'Fee': a.get('fee', ''),
        'Type': 'ADD-ON',
        'Modality': modality,
        'Specialty': specialty,
        'Links_To': ', '.join(a.get('links_to', [])) if a.get('links_to') else '',
        'Condition': a.get('condition', ''),
        'Reasoning': '',
        'Level_1_Section': section_info.get('level_1', ''),
        'Level_2_Subsection': section_info.get('level_2', ''),  # Now from LLM's section_heading
        'Level_3_Subsection': section_info.get('level_3', ''),
        'Pages': a.get('pages', ''),
        'Page_Found': a.get('page_found', ''),
        '_unique_key': unique_key  # For Phase 2 merge
    })

print(f"\nPhase 1 complete: {len(primary_codes)} primary + {len(addon_codes)} add-ons")
print(f"Total: {len(all_results)} results | ${total_cost:.2f} spent")
print(f"Stored {len(code_chunks)} unique chunk texts for Phase 2")

PHASE 1: PROCESSING Manitoba (MB)
File: MB Payment Schedule - April 1, 2024.pdf

Loading Manitoba PDF...


Loading pages:   0%|          | 0/537 [00:00<?, ?it/s]

Loaded 525 pages

Searching for matches...
Searching 525 pages in 53 batches...


Searching MB:   0%|          | 0/53 [00:00<?, ?it/s]

  Pages 83-92: 6 primary, 2 add-ons
  Pages 93-102: 4 primary, 0 add-ons
  Pages 103-112: 4 primary, 0 add-ons
  Pages 113-122: 4 primary, 0 add-ons
  Pages 123-132: 4 primary, 2 add-ons
  Pages 133-142: 4 primary, 0 add-ons
  Pages 143-152: 5 primary, 2 add-ons
  Pages 153-162: 3 primary, 2 add-ons
  Pages 163-172: 3 primary, 0 add-ons
  Pages 173-182: 12 primary, 0 add-ons
  Pages 183-192: 6 primary, 0 add-ons
  Pages 193-202: 7 primary, 1 add-ons
  Pages 203-212: 6 primary, 2 add-ons
  Pages 213-222: 4 primary, 0 add-ons
  Pages 223-232: 8 primary, 0 add-ons
  Pages 243-252: 1 primary, 0 add-ons
  Pages 253-262: 3 primary, 1 add-ons

Deduplicating by code + fee + modality + specialty (Level 1)...
After dedup: 84 primary, 12 add-ons

--- MB PRIMARY CODES (84) ---
  8340     |   20.40 | telephone  | pg  85 | VIRTUAL VISITS       | VISITS/EXAMINATIONS—INTER
  8321     |   59.05 | both       | pg  85 | VIRTUAL VISITS       | VISITS/EXAMINATIONS—INTER
  8447     |  112.42 | both       | 
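
The search loop pulls JSON out of each response with a greedy regular expression, which spans from the first `{` to the last `}` and can fail to parse when the model adds text containing braces after the object. One alternative, sketched here on the assumption that the response holds a single top-level JSON object, is to let `json.JSONDecoder.raw_decode` try each candidate `{` until one parses. `extract_first_json_object` is a hypothetical helper, not part of the notebook.

```python
import json

def extract_first_json_object(text):
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores trailing text
            obj, _end = decoder.raw_decode(text, pos)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        pos = text.find("{", pos + 1)
    return None

reply = 'Result: {"found": true, "primary_codes": []} (see {notes})'
print(extract_first_json_object(reply))
```

A `None` return can then be logged per batch, so that a response without usable JSON is visible rather than silently skipped.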

## Cell 6: Phase 1 - Save Results

In [7]:
# Save Phase 1 results
df_phase1 = pd.DataFrame(all_results)
phase1_file = 'phase1_manitoba_codes.xlsx'
df_phase1.to_excel(phase1_file, index=False)
print(f"Phase 1 saved: {phase1_file} ({len(df_phase1)} rows)")
df_phase1

Phase 1 saved: phase1_manitoba_codes.xlsx (96 rows)


Unnamed: 0,AB_Code,AB_Description,AB_Fee,Target_Province,Code,Description,Fee,Type,Modality,Specialty,Links_To,Condition,Reasoning,Level_1_Section,Level_2_Subsection,Level_3_Subsection,Pages,Page_Found,_unique_key
0,03.03CV,Telehealth consultation,25.09,MB,8340,Episodic virtual visit by phone,20.40,PRIMARY,telephone,VISITS/EXAMINATIONS—INTERNAL MEDICINE (01),,,"Explicitly a basic, episodic patient-facing vi...",VISITS/EXAMINATIONS—INTERNAL MEDICINE (01),VIRTUAL VISITS,,83-92,85,8340_20.40_telephone_VISITS/EXAMINATIONS—INTER...
1,03.03CV,Telehealth consultation,25.09,MB,8321,Virtual visit by telephone or video,59.05,PRIMARY,both,VISITS/EXAMINATIONS—INTERNAL MEDICINE (01),,,General virtual visit code for telephone or vi...,VISITS/EXAMINATIONS—INTERNAL MEDICINE (01),VIRTUAL VISITS,,83-92,85,8321_59.05_both_VISITS/EXAMINATIONS—INTERNAL M...
2,03.03CV,Telehealth consultation,25.09,MB,8447,Comprehensive Virtual Assessment by telephone ...,112.42,PRIMARY,both,VISITS/EXAMINATIONS—INTERNAL MEDICINE (01),,,More extensive virtual assessment by telephone...,VISITS/EXAMINATIONS—INTERNAL MEDICINE (01),VIRTUAL VISITS,,83-92,85,8447_112.42_both_VISITS/EXAMINATIONS—INTERNAL ...
3,03.03CV,Telehealth consultation,25.09,MB,8340,Episodic virtual visit by phone,20.40,PRIMARY,telephone,NEUROLOGY (01-1),,,Same code number but listed under Neurology wi...,NEUROLOGY (01-1),VIRTUAL VISITS,,83-92,90,8340_20.40_telephone_NEUROLOGY (01-1)
4,03.03CV,Telehealth consultation,25.09,MB,8321,Virtual visit by telephone or video,58.37,PRIMARY,both,NEUROLOGY (01-1),,,Neurology-specific virtual visit by telephone ...,NEUROLOGY (01-1),VIRTUAL VISITS,,83-92,90,8321_58.37_both_NEUROLOGY (01-1)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,03.03CV,Telehealth consultation,25.09,MB,8418,"15 to 183 days post discharge, add",-,ADD-ON,both,PSYCHIATRY (03),"8340, 8321, 8447, 8535, 8521, 8533, 8786, 8668...",10% premium. Payable in addition to all servic...,,PSYCHIATRY (03),COMMUNITY PSYCHIATRIC CARE FOR ACUTE MENTAL HE...,,153-162,158,8418_-_both_PSYCHIATRY (03)
92,03.03CV,Telehealth consultation,25.09,MB,8465,"Extended Visit, time based premium, add",-,ADD-ON,both,OBSTETRICS AND GYNAECOLOGY (09),"8540, 8505, 8550",20% premium added to the base tariff when mini...,,OBSTETRICS AND GYNAECOLOGY (09),"OFFICE, HOME VISITS",,193-202,197,8465_-_both_OBSTETRICS AND GYNAECOLOGY (09)
93,03.03CV,Telehealth consultation,25.09,MB,5530,Extended clinic hours 0600 to 0800 (6:00 a.m. ...,-,ADD-ON,both,GENERAL PRACTICE (11),"8340, 8345, 8321, 8350, 8442, 8535",Adds a 20% premium to the payable fee for elig...,,GENERAL PRACTICE (11),EXTENDED CLINIC HOURS PREMIUM,,203-212,207,5530_-_both_GENERAL PRACTICE (11)
94,03.03CV,Telehealth consultation,25.09,MB,5531,Extended clinic hours 1700 to 2359 (5:00 p.m. ...,-,ADD-ON,both,GENERAL PRACTICE (11),"8340, 8345, 8321, 8350, 8442, 8535",Adds a 30% premium to the payable fee for elig...,,GENERAL PRACTICE (11),EXTENDED CLINIC HOURS PREMIUM,,203-212,207,5531_-_both_GENERAL PRACTICE (11)
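
The table shows why deduplication uses four fields: tariff 8340 appears under both Internal Medicine and Neurology, and tariff 8321 carries a different fee in each. A minimal sketch of that rule follows, using a tuple as the key instead of an underscore-joined string so that field values containing underscores cannot blur together. `dedupe_codes` is a hypothetical helper, and the specialty names are shortened for readability.

```python
def dedupe_codes(entries):
    """Keep the first entry for each (code, fee, modality, specialty) combination."""
    seen = {}
    for entry in entries:
        key = (
            entry.get("code", ""),
            str(entry.get("fee", "")),
            entry.get("modality", ""),
            entry.get("specialty", ""),
        )
        seen.setdefault(key, entry)  # first occurrence wins
    return list(seen.values())

entries = [
    {"code": "8340", "fee": "20.40", "modality": "telephone", "specialty": "INTERNAL MEDICINE (01)"},
    {"code": "8340", "fee": "20.40", "modality": "telephone", "specialty": "NEUROLOGY (01-1)"},
    {"code": "8321", "fee": "59.05", "modality": "both", "specialty": "INTERNAL MEDICINE (01)"},
    {"code": "8340", "fee": "20.40", "modality": "telephone", "specialty": "INTERNAL MEDICINE (01)"},
]
unique = dedupe_codes(entries)
print(len(unique), "unique rows")
```

Only the exact repeat of the first row is dropped. The Neurology copy of 8340 survives because its specialty differs.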


---
# Phase 2: Attribute Extraction
---

## Cell 7: Phase 2 Step 1 - Load Extraction Taxonomy

In [8]:
# Load extraction taxonomy
print("Upload extraction_taxonomy.xlsx:")
uploaded_tax = files.upload()

taxonomy_file = list(uploaded_tax.keys())[0]
df_taxonomy = pd.read_excel(taxonomy_file)

print(f"\nLoaded {len(df_taxonomy)} attributes:")
for _, row in df_taxonomy.iterrows():
    print(f"  - {row['attribute']}: {row['data_type']}")

# Build taxonomy reference string for prompts
taxonomy_reference = "\n".join([
    f"- {row['attribute']} ({row['data_type']}): {row['definition']} Taxonomy: {row['taxonomy']}"
    for _, row in df_taxonomy.iterrows()
])

print("\nTaxonomy loaded and ready for Phase 2")

Upload extraction_taxonomy.xlsx:


Saving extraction_taxonomy_final.xlsx to extraction_taxonomy_final.xlsx

Loaded 11 attributes:
  - modality: enum | null
  - minimum_time_minutes: integer | null
  - frequency_per_day: integer | null
  - frequency_per_year: integer | null
  - frequency_per_year_period: enum | null
  - same_day_exclusions: array | null
  - premium_extended_hours: string | null
  - premium_location: string | null
  - premium_age: string | null
  - premium_other: string | null
  - additional_notes: string | null

Taxonomy loaded and ready for Phase 2
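
Because every attribute is declared as `<type> | null`, extracted values can be checked against the taxonomy before they reach the master sheet. The sketch below is one hedged way to do that. The type mapping and the `type_violations` helper are assumptions for illustration, and only a few of the eleven attributes are listed.

```python
# Attribute names and declared types are from the taxonomy listing above;
# the mapping to Python types is an assumption for this sketch.
TAXONOMY_TYPES = {
    "minimum_time_minutes": "integer",
    "frequency_per_day": "integer",
    "same_day_exclusions": "array",
    "premium_age": "string",
}
PY_TYPES = {"integer": int, "array": list, "string": str, "enum": str}

def type_violations(attrs, taxonomy_types=TAXONOMY_TYPES):
    """Return the names of attributes whose non-null value has the wrong type."""
    problems = []
    for name, kind in taxonomy_types.items():
        value = attrs.get(name)
        if value is None:
            continue  # null is always allowed
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, PY_TYPES[kind]):
            problems.append(name)
    return problems

print(type_violations({"minimum_time_minutes": "10", "same_day_exclusions": ["8321"]}))
```

A check like this would flag a model reply that returns `"10"` or the string `"null"` where an integer is expected.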


## Cell 8: Phase 2 Step 2 - Extract & Save Rules of Application (Pages 1-82)

In [9]:
import fitz  # PyMuPDF

# Extract pages 1-82 as Rules of Application
RULES_START_PAGE = 1
RULES_END_PAGE = 82

print(f"Extracting Rules of Application (pages {RULES_START_PAGE}-{RULES_END_PAGE})...")

# Open source PDF
src_pdf = fitz.open(MB_PDF)

# Create new PDF with just pages 1-82
rules_pdf = fitz.open()
rules_pdf.insert_pdf(src_pdf, from_page=RULES_START_PAGE-1, to_page=RULES_END_PAGE-1)

# Save Rules of Application PDF
rules_pdf_file = 'rules_of_application.pdf'
rules_pdf.save(rules_pdf_file)
print(f"Saved: {rules_pdf_file} ({RULES_END_PAGE - RULES_START_PAGE + 1} pages)")

# Extract text from Rules of Application for use in prompts
rules_of_application_text = ""
for page_num in range(RULES_START_PAGE - 1, RULES_END_PAGE):
    page = src_pdf[page_num]
    text = page.get_text()
    if text:
        rules_of_application_text += f"\n=== RULES PAGE {page_num + 1} ===\n{text}"

src_pdf.close()
rules_pdf.close()

print(f"Loaded Rules of Application text: {len(rules_of_application_text):,} characters")

# Download the Rules PDF
files.download(rules_pdf_file)

Extracting Rules of Application (pages 1-82)...
Saved: rules_of_application.pdf (82 pages)
Loaded Rules of Application text: 366,405 characters
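
The cell mixes two page conventions: the schedule is described with 1-based page numbers, while the PDF library indexes pages from zero. The sketch below spells out the conversion used above, on the assumption that both bounds of the copied range are inclusive. `to_zero_based_inclusive` is a hypothetical helper.

```python
def to_zero_based_inclusive(start_page, end_page):
    """Convert a 1-based inclusive page range to 0-based inclusive indices."""
    if start_page < 1 or end_page < start_page:
        raise ValueError(f"invalid page range: {start_page}-{end_page}")
    return start_page - 1, end_page - 1

first, last = to_zero_based_inclusive(1, 82)
# range() excludes its stop value, so add 1 to visit the last page as well
page_indices = range(first, last + 1)
print(first, last, len(page_indices))
```

This is why the text-extraction loop runs over `range(RULES_START_PAGE - 1, RULES_END_PAGE)` while the page copy passes `RULES_END_PAGE - 1` as its upper bound.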



## Cell 9: Phase 2 Step 3 - Extract Attributes for Each Code

In [10]:
# Phase 2: Extract attributes for each code using Rules of Application + stored chunk text

def build_attribute_prompt(code_info, chunk_text, rules_text, taxonomy_ref):
    """Build prompt to extract attributes for a single code."""
    return f"""You are a senior physician billing specialist extracting detailed attributes for a Manitoba billing code.

CODE TO ANALYZE:
- Code: {code_info['Code']}
- Description: {code_info['Description']}
- Fee: {code_info['Fee']}
- Type: {code_info['Type']}
- Specialty/Section: {code_info.get('Specialty', 'N/A')}
- Condition (from Phase 1): {code_info.get('Condition', 'N/A')}

ATTRIBUTES TO EXTRACT:
{taxonomy_ref}

RULES OF APPLICATION (general billing rules; first 50,000 characters of pages 1-82):
{rules_text[:50000]}

CODE-SPECIFIC SECTION (where this code was found):
{chunk_text[:30000]}

TASK:
Using ALL available information above, extract values for each attribute.

INSTRUCTIONS:
1. Use information from BOTH the Rules of Application AND the code-specific section
2. For each attribute, extract the value if found, or null if not stated
3. For same_day_exclusions: return as array of code strings. If the text mentions codes that cannot be billed together or on the same day, put them here
4. For additional_notes: ONLY include important billing information that is:
   - Explicitly mentioned in the text
   - NOT already captured in the Condition field above
   - NOT already in the premium/modifier columns
   - Write complete sentences. If nothing additional, use null.

Return JSON only:
{{
  "modality": "telephone|video|both|in_person|asynchronous|null",
  "minimum_time_minutes": integer or null,
  "frequency_per_day": integer or null,
  "frequency_per_year": integer or null,
  "frequency_per_year_period": "annual|quarterly|90_days|monthly|null",
  "same_day_exclusions": ["code1", "code2"] or [] or null,
  "premium_extended_hours": "rate% code conditions" or null,
  "premium_location": "rate% code conditions" or null,
  "premium_age": "rate% conditions" or null,
  "premium_other": "rate% code conditions" or null,
  "additional_notes": "other important billing info not in Condition or premium columns, complete sentences only" or null
}}"""

# Process each code
phase2_results = []

print(f"Extracting attributes for {len(all_results)} codes...")
print("="*70)

for idx, code_info in enumerate(tqdm(all_results, desc="Extracting attributes")):
    # Use _unique_key for chunk lookup
    unique_key = code_info.get('_unique_key', '')
    chunk_text = code_chunks.get(unique_key, '')

    prompt = build_attribute_prompt(code_info, chunk_text, rules_of_application_text, taxonomy_reference)

    try:
        resp = client.chat.completions.create(
            model="gpt-5.1-2025-11-13",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,
            max_completion_tokens=1500
        )
        track_cost(resp.usage.prompt_tokens, resp.usage.completion_tokens)

        content = resp.choices[0].message.content
        match = re.search(r'\{[\s\S]*\}', content)

        if match:
            attrs = json.loads(match.group())

            # Convert same_day_exclusions array to string for Excel
            # (an empty list becomes None so no list object reaches the spreadsheet writer)
            if isinstance(attrs.get('same_day_exclusions'), list):
                attrs['same_day_exclusions'] = ', '.join(map(str, attrs['same_day_exclusions'])) or None

            phase2_results.append({
                '_unique_key': unique_key,
                **attrs
            })

            print(f"  {code_info['Code']} ({code_info.get('Specialty', '')[:20]}): extracted {sum(1 for v in attrs.values() if v is not None and v != 'null')} attributes")
        else:
            print(f"  {code_info['Code']}: No JSON found in response")
            phase2_results.append({'_unique_key': unique_key})

    except Exception as e:
        print(f"  {code_info['Code']}: Error - {e}")
        phase2_results.append({'_unique_key': unique_key})

print(f"\nPhase 2 complete: {len(phase2_results)} codes processed")
print(f"Total API cost: ${total_cost:.2f}")

Extracting attributes for 96 codes...


Extracting attributes:   0%|          | 0/96 [00:00<?, ?it/s]

  8340 (VISITS/EXAMINATIONS—): extracted 2 attributes
  8321 (VISITS/EXAMINATIONS—): extracted 2 attributes
  8447 (VISITS/EXAMINATIONS—): extracted 3 attributes
  8340 (NEUROLOGY (01-1)): extracted 1 attributes
  8321 (NEUROLOGY (01-1)): extracted 2 attributes
  8447 (NEUROLOGY (01-1)): extracted 2 attributes
  8340 (GERIATRIC MEDICINE (): extracted 1 attributes
  8321 (GERIATRIC MEDICINE (): extracted 2 attributes
  8340 (RHEUMATOLOGY MEDICIN): extracted 1 attributes
  8321 (RHEUMATOLOGY MEDICIN): extracted 2 attributes
  8340 (CARDIOLOGY (01-4)): extracted 2 attributes
  8321 (CARDIOLOGY (01-4)): extracted 2 attributes
  8340 (GASTROENTEROLOGY (01): extracted 2 attributes
  8321 (GASTROENTEROLOGY (01): extracted 2 attributes
  8340 (NEPHROLOGY (01-6)): extracted 2 attributes
  8321 (NEPHROLOGY (01-6)): extracted 2 attributes
  8340 (ALLERGY & CLINICAL I): extracted 2 attributes
  8321 (ALLERGY & CLINICAL I): extracted 2 attributes
  8340 (MEDICAL GENETICS (01): extracted 1 attribute
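
The attribute prompt slices the Rules of Application to its first 50,000 characters, while the extracted text was reported as 366,405 characters. The short calculation below estimates how much of the rules text each prompt sees. `truncation_coverage` is a hypothetical helper, and the two constants are copied from the cells above.

```python
def truncation_coverage(total_chars, limit):
    """Fraction of a text that survives a [:limit] slice."""
    if total_chars <= 0:
        return 1.0
    return min(limit, total_chars) / total_chars

RULES_CHARS = 366_405  # length reported when the rules text was loaded
RULES_LIMIT = 50_000   # slice applied to rules_text in the prompt

coverage = truncation_coverage(RULES_CHARS, RULES_LIMIT)
print(f"{coverage:.1%} of the rules text reaches the model")
```

Under these numbers only about 14% of the rules text is included, so rules stated later in pages 1-82 may never be seen during extraction. Selecting rule passages by keyword or section would be one way to use the same character budget more deliberately.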

## Cell 10: Combine Phase 1 + Phase 2 into Master Sheet

In [11]:
# Combine Phase 1 and Phase 2 results

df_phase1 = pd.DataFrame(all_results)
df_phase2 = pd.DataFrame(phase2_results)

# Merge on _unique_key
df_combined = df_phase1.merge(df_phase2, on='_unique_key', how='left')

# Drop internal column
df_combined = df_combined.drop(columns=['_unique_key'])

# Reorder columns for clarity
column_order = [
    'AB_Code', 'AB_Description', 'AB_Fee', 'Target_Province',
    'Code', 'Description', 'Fee', 'Type', 'Modality', 'Specialty',
    'Links_To', 'Condition', 'Reasoning',
    'Level_1_Section', 'Level_2_Subsection', 'Level_3_Subsection', 'Pages', 'Page_Found',
    'modality', 'minimum_time_minutes', 'frequency_per_day', 'frequency_per_year',
    'frequency_per_year_period', 'same_day_exclusions', 'premium_extended_hours',
    'premium_location', 'premium_age', 'premium_other', 'additional_notes'
]

# Only include columns that exist
final_columns = [c for c in column_order if c in df_combined.columns]
df_combined = df_combined[final_columns]

print(f"Combined DataFrame: {len(df_combined)} rows, {len(df_combined.columns)} columns")
print(f"\nColumns:")
for col in df_combined.columns:
    print(f"  - {col}")

df_combined

Combined DataFrame: 96 rows, 29 columns

Columns:
  - AB_Code
  - AB_Description
  - AB_Fee
  - Target_Province
  - Code
  - Description
  - Fee
  - Type
  - Modality
  - Specialty
  - Links_To
  - Condition
  - Reasoning
  - Level_1_Section
  - Level_2_Subsection
  - Level_3_Subsection
  - Pages
  - Page_Found
  - modality
  - minimum_time_minutes
  - frequency_per_day
  - frequency_per_year
  - frequency_per_year_period
  - same_day_exclusions
  - premium_extended_hours
  - premium_location
  - premium_age
  - premium_other
  - additional_notes


Unnamed: 0,AB_Code,AB_Description,AB_Fee,Target_Province,Code,Description,Fee,Type,Modality,Specialty,...,minimum_time_minutes,frequency_per_day,frequency_per_year,frequency_per_year_period,same_day_exclusions,premium_extended_hours,premium_location,premium_age,premium_other,additional_notes
0,03.03CV,Telehealth consultation,25.09,MB,8340,Episodic virtual visit by phone,20.40,PRIMARY,telephone,VISITS/EXAMINATIONS—INTERNAL MEDICINE (01),...,,,,,,,,,,Tariff 8340 is an episodic virtual visit that ...
1,03.03CV,Telehealth consultation,25.09,MB,8321,Virtual visit by telephone or video,59.05,PRIMARY,both,VISITS/EXAMINATIONS—INTERNAL MEDICINE (01),...,,,,,,,,,,Tariff 8321 is a virtual visit that may be pro...
2,03.03CV,Telehealth consultation,25.09,MB,8447,Comprehensive Virtual Assessment by telephone ...,112.42,PRIMARY,both,VISITS/EXAMINATIONS—INTERNAL MEDICINE (01),...,,,,,,,15% hospital in‑patient or Emergency Departmen...,,,Tariff 8447 may only be provided as part of a ...
3,03.03CV,Telehealth consultation,25.09,MB,8340,Episodic virtual visit by phone,20.40,PRIMARY,telephone,NEUROLOGY (01-1),...,,,,,,,,,,
4,03.03CV,Telehealth consultation,25.09,MB,8321,Virtual visit by telephone or video,58.37,PRIMARY,both,NEUROLOGY (01-1),...,,,,,,,,,,Tariff 8321 is a virtual visit that may be pro...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,03.03CV,Telehealth consultation,25.09,MB,8418,"15 to 183 days post discharge, add",-,ADD-ON,both,PSYCHIATRY (03),...,,,,,[],,,,10% 8418 Payable in addition to all services p...,Code 8418 is an add-on premium that must be bi...
92,03.03CV,Telehealth consultation,25.09,MB,8465,"Extended Visit, time based premium, add",-,ADD-ON,both,OBSTETRICS AND GYNAECOLOGY (09),...,30.0,,,,,,,20% under 18 years when billed with 8550 and m...,20% 8465 added to 8540 with minimum 45 minutes...,Patient/physician contact time must be documen...
93,03.03CV,Telehealth consultation,25.09,MB,5530,Extended clinic hours 0600 to 0800 (6:00 a.m. ...,-,ADD-ON,both,GENERAL PRACTICE (11),...,,,,,"5555, 5553, 5550, 5556, 5557, 5558, 8000, 8001...","20% 5530 weekday early-morning, weekday evenin...",,,,The service is payable as a 20% premium on the...
94,03.03CV,Telehealth consultation,25.09,MB,5531,Extended clinic hours 1700 to 2359 (5:00 p.m. ...,-,ADD-ON,both,GENERAL PRACTICE (11),...,,,,,"5530, 5555, 5553, 5550, 5556, 5557, 5558, 8000...",30% 5531 Friday-Sunday and designated holidays...,,,,The time the service commences must be entered...
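
The left merge preserves the Phase 1 row count only when `_unique_key` is unique on both sides; a duplicated key in `phase2_results` would silently multiply rows. A minimal sketch of a guarded merge using pandas' `validate` and `indicator` options. `merge_phases` and the toy keys are illustrative assumptions; the real key format is built earlier in the notebook:

```python
import pandas as pd


def merge_phases(df_phase1, df_phase2, key='_unique_key'):
    """Left-merge Phase 2 onto Phase 1 and report keys that found no Phase 2 row."""
    merged = df_phase1.merge(
        df_phase2,
        on=key,
        how='left',
        validate='one_to_one',  # raises pandas.errors.MergeError on duplicated keys
        indicator=True,         # adds a '_merge' column: 'both' or 'left_only'
    )
    unmatched = merged.loc[merged['_merge'] == 'left_only', key].tolist()
    return merged.drop(columns=['_merge', key]), unmatched


# Toy frames with made-up keys, for illustration only
p1 = pd.DataFrame({'_unique_key': ['k1', 'k2', 'k3'], 'Code': ['8340', '8321', '8447']})
p2 = pd.DataFrame({'_unique_key': ['k1', 'k2'], 'modality': ['telephone', 'both']})

combined, unmatched = merge_phases(p1, p2)
print(len(combined), unmatched)
```

In this notebook every processed code appends at least `{'_unique_key': unique_key}`, so a non-empty `unmatched` list would point to a code that was skipped entirely.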


## Cell 11: Save Final Master Sheet + Print Results

In [12]:
# Save final combined results
print("="*70)
print("FINAL OUTPUT")
print("="*70)

output_file = 'manitoba_crosswalk_complete.xlsx'
df_combined.to_excel(output_file, index=False)
print(f"\nSaved: {output_file}")
print(f"  - Rows: {len(df_combined)}")
print(f"  - Columns: {len(df_combined.columns)}")

# Summary statistics
print(f"\n--- SUMMARY ---")
print(f"Total codes found: {len(df_combined)}")
print(f"  - PRIMARY: {len(df_combined[df_combined['Type'] == 'PRIMARY'])}")
print(f"  - ADD-ON: {len(df_combined[df_combined['Type'] == 'ADD-ON'])}")

# Unique codes vs rows (shows specialty differentiation)
unique_codes = df_combined['Code'].nunique()
print(f"\nUnique code numbers: {unique_codes}")
print(f"Total rows (code+fee+specialty variations): {len(df_combined)}")

# Specialty breakdown
if 'Specialty' in df_combined.columns:
    print(f"\n--- BY SPECIALTY ---")
    specialty_counts = df_combined['Specialty'].value_counts()
    for specialty, count in specialty_counts.items():
        print(f"  {specialty or 'N/A'}: {count}")

print(f"\n--- CODES ---")
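def _txt(value, default):
    """Return value as a string, or default when it is missing (None, NaN, or empty)."""
    return default if pd.isna(value) or value == '' else str(value)

for _, row in df_combined.iterrows():
    # NaN is a truthy float, so `or '-'` alone would not catch it before slicing
    specialty = _txt(row.get('Specialty'), '-')
    modality = _txt(row.get('Modality'), '?')
    print(f"  {str(row['Code']):8} | {str(row['Fee']):>7} | {str(row['Type']):8} | {modality:10} | {specialty[:15]:15} | {str(row['Description'])[:25]}")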

print(f"\n--- PHASE 2 ATTRIBUTES FILLED ---")
attr_cols = ['modality', 'minimum_time_minutes', 'frequency_per_day', 'frequency_per_year',
             'frequency_per_year_period', 'same_day_exclusions', 'premium_extended_hours',
             'premium_location', 'premium_age', 'premium_other', 'additional_notes']

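def _is_filled(v):
    """True when a cell holds a real value; lists are judged by length, not by pd.notna."""
    if isinstance(v, (list, tuple)):
        # pd.notna() on a list returns an array, whose truth value is ambiguous
        return len(v) > 0
    return pd.notna(v) and v != 'null' and v != ''

for col in attr_cols:
    if col in df_combined.columns:
        filled_real = sum(1 for v in df_combined[col] if _is_filled(v))
        print(f"  {col}: {filled_real}/{len(df_combined)} filled")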

print(f"\n--- COST ---")
print(f"Total API calls: {total_calls}")
print(f"Total cost: ${total_cost:.2f}")

# Download
files.download(output_file)
print(f"\nDownload started: {output_file}")

FINAL OUTPUT

Saved: manitoba_crosswalk_complete.xlsx
  - Rows: 96
  - Columns: 29

--- SUMMARY ---
Total codes found: 96
  - PRIMARY: 84
  - ADD-ON: 12

Unique code numbers: 18
Total rows (code+fee+specialty variations): 96

--- BY SPECIALTY ---
  GENERAL PRACTICE (11): 9
  PAEDIATRICS (02): 7
  GENERAL SCHEDULE: 5
  PSYCHIATRY (03): 5
  VISITS/EXAMINATIONS—INTERNAL MEDICINE (01): 4
  NEUROLOGY (01-1): 4
  GENERAL SURGERY (04-1): 3
  ENDOCRINOLOGY (13-1): 3
  MEDICAL GENETICS (01-8): 3
  ORTHOPAEDIC SURGERY (04-5): 3
  UROLOGY (04-4): 3
  PLASTIC & RECONSTRUCTIVE SURGERY (04-3): 3
  OBSTETRICS AND GYNAECOLOGY (09): 3
  CARDIAC SURGERY (04-2): 3
  ALLERGY & CLINICAL IMMUNOLOGY (01-7): 2
  GASTROENTEROLOGY (01-5): 2
  NEPHROLOGY (01-6): 2
  INFECTIOUS DISEASE (13-3): 2
  RESPIROLOGY (13-4): 2
  RHEUMATOLOGY MEDICINE (01-3): 2
  GERIATRIC MEDICINE (01-2): 2
  CARDIOLOGY (01-4): 2
  OTORHINOLARYNGOLOGY (05-2): 2
  OPHTHALMOLOGY (05-1): 2
  NEUROLOGICAL SURGERY (04-6): 2
  ANESTHESIOLOGY (


Download started: manitoba_crosswalk_complete.xlsx
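
The gap between unique code numbers and total rows comes from the same tariff appearing under several specialties, sometimes at a different fee (8321 is 59.05 under Internal Medicine and 58.37 under Neurology in the preview above). A minimal sketch of a per-code summary, assuming `Fee` holds numeric strings with `-` for percentage add-ons. `summarize_by_code` is a hypothetical helper, and the sample rows are copied from the preview:

```python
import pandas as pd


def summarize_by_code(df):
    """Per-code row count, number of specialties, and fee range."""
    # Non-numeric fees such as '-' become NaN and are ignored by min/max
    fees = pd.to_numeric(df['Fee'], errors='coerce')
    return (
        df.assign(Fee_num=fees)
          .groupby('Code')
          .agg(
              rows=('Fee', 'size'),
              specialties=('Specialty', 'nunique'),
              min_fee=('Fee_num', 'min'),
              max_fee=('Fee_num', 'max'),
          )
          .sort_values('rows', ascending=False, kind='stable')
    )


sample = pd.DataFrame({
    'Code': ['8340', '8321', '8340', '8321', '8418'],
    'Fee': ['20.40', '59.05', '20.40', '58.37', '-'],
    'Specialty': [
        'VISITS/EXAMINATIONS—INTERNAL MEDICINE (01)',
        'VISITS/EXAMINATIONS—INTERNAL MEDICINE (01)',
        'NEUROLOGY (01-1)',
        'NEUROLOGY (01-1)',
        'PSYCHIATRY (03)',
    ],
})

summary = summarize_by_code(sample)
print(summary)
```

Applied to `df_combined`, codes where `min_fee` and `max_fee` differ are the ones whose fee depends on the billing specialty.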
