# ICD‑10 Mapper Evaluation (R/I/J chapters)
*Generated 2025-06-13T17:18*

This notebook:
1. Loads the Kaggle `icd10_symptoms.csv` mapping.
2. Loads the gold symptoms from the 914‑utterance dataset.
3. Filters ICD‑10 codes to chapters **R, I, J** (the filter is commented out in the recorded run, so the outputs shown cover all chapters).
4. Computes coverage and ambiguity, and prepares stubs for accuracy evaluation once gold codes are available.


In [1]:
import ast
import collections
import pandas as pd
print('pandas', pd.__version__)

pandas 2.2.2


## Load mapping and gold symptoms

In [2]:
MAP_PATH = '/content/icd10_symptoms.csv'
GOLD_PATH = '/content/symptom_extraction_results.csv'
map_df = pd.read_csv(MAP_PATH)
gold_df = pd.read_csv(GOLD_PATH)
gold_df['gold_list'] = gold_df['actual_output'].apply(ast.literal_eval)
gold_symptoms = {s.lower().strip() for sub in gold_df['gold_list'] for s in sub}
print('Unique gold symptoms:', len(gold_symptoms))

Unique gold symptoms: 160
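The recorded baseline predictions include an empty-string key (`''`), which suggests that at least one gold list contains a blank entry. The following is a minimal sketch of a more defensive parser for the stringified lists; the helper name `parse_symptom_list` and the sample cells are hypothetical and are not taken from the dataset.

```python
import ast

def parse_symptom_list(cell):
    """Parse one stringified list into cleaned, lower-cased symptom strings.

    Hypothetical helper: returns [] for missing or malformed cells instead
    of raising, and drops blank entries.
    """
    if not isinstance(cell, str):
        return []  # e.g. NaN read from an empty CSV field
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return []  # not a valid Python literal
    if not isinstance(value, (list, tuple, set)):
        return []
    cleaned = []
    for item in value:
        text = str(item).strip().lower()
        if text:  # skip blank entries such as ''
            cleaned.append(text)
    return cleaned

# Made-up cells, standing in for the `actual_output` column.
sample_cells = ["['Fever', ' cough ']", "['', 'Rash']", float("nan"), "not a list"]
parsed = [parse_symptom_list(cell) for cell in sample_cells]
unique_symptoms = {s for sub in parsed for s in sub}
print(sorted(unique_symptoms))
```

Under these assumptions, the helper could replace the direct `ast.literal_eval` call via `gold_df['actual_output'].apply(parse_symptom_list)`. The unique-symptom count may then differ from the recorded 160.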


## Build symptom → codes dictionary (R/I/J filter currently disabled)

In [7]:
symptom_to_codes = collections.defaultdict(set)
for _, row in map_df.iterrows():
    code = row.icd10code.strip()
    # NOTE: the R/I/J chapter filter is commented out, so this run keeps
    # codes from every chapter. Uncomment the two lines below to restrict it.
    # if not code or code[0] not in 'RIJ':
    #     continue
    for symptom in [s.strip().lower() for s in row.symptoms.split(',')]:
        symptom_to_codes[symptom].add(code)
print('Symptoms with ≥1 code (all chapters):', len(symptom_to_codes))

Symptoms with ≥1 code (all chapters): 7198
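The cell above has the chapter check commented out, and the baseline predictions further down contain codes such as `A00.0` and `C44.6`, so the recorded dictionary is not actually restricted to R/I/J. The following is a rough sketch of the build step with the filter enabled and blank symptoms skipped. The function name `build_symptom_to_codes` and the toy rows are assumptions made for illustration; only the column names `icd10code` and `symptoms` come from the notebook.

```python
import collections

ALLOWED_LETTERS = ("R", "I", "J")  # leading letters of the target chapters

def build_symptom_to_codes(rows, allowed_letters=ALLOWED_LETTERS):
    """Map each symptom to the set of allowed codes that list it.

    `rows` is any iterable of (code, comma-separated symptoms) pairs.
    """
    mapping = collections.defaultdict(set)
    for code, symptoms in rows:
        if not isinstance(code, str) or not isinstance(symptoms, str):
            continue  # skip rows with missing values
        code = code.strip().upper()
        if not code.startswith(allowed_letters):
            continue  # outside the target chapters (also drops empty codes)
        for symptom in symptoms.split(","):
            symptom = symptom.strip().lower()
            if symptom:  # ignore blanks left by trailing commas
                mapping[symptom].add(code)
    return dict(mapping)

# Toy rows for illustration only; not taken from icd10_symptoms.csv.
toy_rows = [
    ("R05", "cough, chronic cough"),
    ("J20.9", "Cough, fever"),
    ("A00.0", "fever, diarrhea"),
    ("I20.0", "chest pain, "),
]
toy_mapping = build_symptom_to_codes(toy_rows)
print(toy_mapping)
```

With the notebook's dataframe this could be called as `build_symptom_to_codes(zip(map_df.icd10code, map_df.symptoms))`. Coverage and ambiguity figures would need to be recomputed afterwards, since the recorded numbers come from the unfiltered run.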


## Coverage & ambiguity for gold symptoms

In [8]:
covered = {s for s in gold_symptoms if s in symptom_to_codes}
coverage = len(covered)/len(gold_symptoms)
unambig = sum(1 for s in covered if len(symptom_to_codes[s])==1)
ambig = len(covered) - unambig
print(f'Coverage: {len(covered)}/{len(gold_symptoms)} = {coverage:.1%}')
print(f'  Unambiguous: {unambig}')
print(f'  Ambiguous (≥2 codes): {ambig}')

Coverage: 58/160 = 36.2%
  Unambiguous: 8
  Ambiguous (≥2 codes): 50
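The recorded split is consistent: 8 unambiguous plus 50 ambiguous symptoms give the 58 covered ones, and 58/160 is the reported coverage. To see which gold symptoms are missed, the same computation can be wrapped in a helper that also returns the uncovered list. This is a sketch only; `coverage_report` and the toy inputs are hypothetical.

```python
def coverage_report(gold, mapping):
    """Summarise coverage and ambiguity of `gold` symptoms against `mapping`.

    Blank symptoms are dropped before counting.
    """
    gold = {s for s in gold if s}
    covered = {s for s in gold if mapping.get(s)}
    unambiguous = {s for s in covered if len(mapping[s]) == 1}
    return {
        "n_gold": len(gold),
        "covered": len(covered),
        "coverage": len(covered) / len(gold) if gold else 0.0,
        "unambiguous": len(unambiguous),
        "ambiguous": len(covered) - len(unambiguous),
        "uncovered": sorted(gold - covered),
    }

# Toy inputs for illustration only.
toy_gold = {"cough", "fever", "dizzy spells", ""}
toy_map = {"cough": {"R05", "J20.9"}, "fever": {"R50.9"}}
report = coverage_report(toy_gold, toy_map)
print(f"Coverage: {report['covered']}/{report['n_gold']} = {report['coverage']:.1%}")
print("Uncovered:", report["uncovered"])
```

Using `mapping.get(s)` rather than indexing avoids inserting empty entries when `mapping` is a `defaultdict`.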


## Baseline top‑1 picker (alphabetical first code)

In [9]:
# Stub for future gold code annotation
gold_code_dict = {}  # symptom -> gold ICD‑10 code (R/I/J only)
predictions = {s: sorted(symptom_to_codes[s])[0] for s in covered}
print('Baseline predictions prepared.')
print(predictions)

Baseline predictions prepared.
{'': 'C44.6', 'constipation': 'A00.0', 'swelling': 'A25.0', 'high blood sugar levels': 'E24.9', 'rash': 'A25.1', 'nausea': 'A02.1', 'calf pain': 'O22.3', 'trouble breathing': 'I30.1', 'fatigue': 'A06.3', 'pain in the legs': 'I70.2', 'high cholesterol': 'R94.6', 'redness': 'A46', 'pain': 'A46', 'pain in the shoulder': 'G54.2', 'frequent urination': 'A56.2', 'discomfort': 'A63.0', 'pain in the wrist': 'S62.0', 'bruising': 'L90.4', 'sinusitis': 'D83.1', 'difficulty breathing': 'A16.2', 'stomach pain': 'A09.0', 'confusion': 'A24.4', 'difficulty walking': 'A81.1', 'frequent urination at night': 'D40.7', 'neck pain': 'C75.0', 'memory loss': 'A17.0', 'lower back pain': 'C62.9', 'fever': 'A00.0', 'chest discomfort': 'D15.2', 'cough': 'A15.3', 'back pain': 'C54.8', 'diarrhea': 'A01.0', 'stiffness': 'G24.8', 'numbness': 'A30.0', 'joint pain': 'A23.0', 'heart palpitations': 'C74.1', 'runny nose': 'A37.8', 'muscle weakness': 'A30.0', 'redness and swelling': 'T31.9', 
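Because the baseline takes the alphabetically first code, most picks above land in the `A` range whenever a symptom maps to codes from several chapters. One possible variation, shown here as a sketch and not as the notebook's method, still picks alphabetically but considers R/I/J codes first and falls back to the full set only when none exist. The function name `pick_top1` is hypothetical.

```python
def pick_top1(codes, preferred_letters=("R", "I", "J")):
    """Return the alphabetically first code, preferring the given chapters.

    Returns None when `codes` is empty.
    """
    if not codes:
        return None
    preferred = sorted(c for c in codes if c.startswith(preferred_letters))
    if preferred:
        return preferred[0]
    return sorted(codes)[0]  # fallback: plain alphabetical baseline

# Toy code sets for illustration only.
print(pick_top1({"A00.0", "R19.7", "K59.1"}))
print(pick_top1({"A00.0", "K59.0"}))
print(pick_top1(set()))
```

Applied to the notebook's data, this would read `{s: pick_top1(symptom_to_codes[s]) for s in covered}`.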

### TODO
1. Fill `gold_code_dict` with reference codes.
2. Compute accuracy at three granularities: exact code, block, and chapter.
3. Build a confusion analysis of predicted vs. gold codes.
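Once `gold_code_dict` is populated, the three accuracy levels and a simple confusion count could be computed along the following lines. This is a sketch under stated assumptions: "block" is approximated by the three-character category prefix (for example `R50` for `R50.9`), and "chapter" by the leading letter, which is only an approximation because some ICD‑10 chapters span several letters and some letters are split across chapters. The function names and toy codes are hypothetical, not reference annotations.

```python
import collections

def category(code):
    """Three-character prefix, used here as a stand-in for the block."""
    return code.strip().upper()[:3]

def chapter_letter(code):
    """Leading letter, used here as a stand-in for the chapter."""
    return code.strip().upper()[:1]

def evaluate(predictions, gold_codes):
    """Score symptoms that have both a prediction and a gold code."""
    scored = sorted(s for s in gold_codes if s in predictions)
    hits = collections.Counter()
    confusion = collections.Counter()  # (gold letter, predicted letter) -> count
    for symptom in scored:
        pred = predictions[symptom].strip().upper()
        gold = gold_codes[symptom].strip().upper()
        hits["exact"] += pred == gold
        hits["block"] += category(pred) == category(gold)
        hits["chapter"] += chapter_letter(pred) == chapter_letter(gold)
        confusion[(chapter_letter(gold), chapter_letter(pred))] += 1
    n = len(scored)
    accuracy = {
        level: (hits[level] / n if n else None)
        for level in ("exact", "block", "chapter")
    }
    return n, accuracy, confusion

# Toy predictions and gold codes for illustration only.
toy_predictions = {"cough": "R05", "fever": "R50.9", "chest pain": "I20.0", "rash": "A25.1"}
toy_gold_codes = {"cough": "R05", "fever": "R50.2", "chest pain": "I25.1", "rash": "R21"}
n, accuracy, confusion = evaluate(toy_predictions, toy_gold_codes)
print(n, accuracy)
print(confusion.most_common())
```

Symptoms without a gold code are left out of the denominator, so the reported `n` should be shown alongside each accuracy figure.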