# Modular Agent-Based TNM Staging System for Non-Small Cell Lung Cancer

## Overview

This notebook provides a comprehensive guide to using the Modular Agent-Based TNM Staging System for Non-Small Cell Lung Cancer (NSCLC). This system implements both **AJCC 8th and 9th edition** TNM staging criteria using a modular agent architecture with Large Language Models (LLMs).

### Key Features

- **Modular Agent Architecture**: Separate specialized agents for T, N, M, and Stage classification
- **Histology Classification**: WHO-based histology classification system
- **AJCC Edition Support**: Supports both AJCC 8th and 9th edition staging criteria (selectable)
- **Comprehensive Reporting**: Detailed justifications and confidence levels (high/medium/low) for each classification
- **Reproducible Workflow**: State-based workflow using LangGraph for reliable execution

### System Architecture

The system uses a sequential workflow:
1. **Histology Classification** → Identifies tumor histology type
2. **T Classification** → Determines tumor size and local extent
3. **N Classification** → Assesses lymph node involvement (with N2a/N2b subcategories in 9th edition)
4. **M Classification** → Evaluates distant metastasis (with M1c1/M1c2 subcategories in 9th edition)
5. **Stage Classification** → Combines T/N/M to determine overall stage

### AJCC Edition Differences

**AJCC 8th Edition**:
- Standard N classification (N0, N1, N2, N3)
- Standard M classification (M0, M1a, M1b, M1c)

**AJCC 9th Edition**:
- **Enhanced N Classification**: Introduces N2a (single mediastinal station) and N2b (multiple mediastinal stations) subcategories
- **Enhanced M Classification**: M1c subdivided into M1c1 (single organ system) and M1c2 (multiple organ systems)
- Updated stage grouping rules

**Note**: You can select the edition in the Configuration section below.
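The 9th-edition subcategories described above reduce to counting rules once N2 or M1c disease has already been established. The sketch below is illustrative only: the function names are hypothetical and are not part of this system, which delegates these decisions to the LLM agents and the edition-specific criteria files.

```python
def n2_subcategory(mediastinal_stations_involved: int) -> str:
    """Return the 9th-edition N2 subcategory for a case already classified as N2."""
    if mediastinal_stations_involved < 1:
        raise ValueError("N2 requires at least one involved mediastinal station")
    # Single station -> N2a, multiple stations -> N2b
    return "N2a" if mediastinal_stations_involved == 1 else "N2b"


def m1c_subcategory(organ_systems_involved: int) -> str:
    """Return the 9th-edition M1c subcategory for a case already classified as M1c."""
    if organ_systems_involved < 1:
        raise ValueError("M1c requires at least one involved organ system")
    # Single organ system -> M1c1, multiple organ systems -> M1c2
    return "M1c1" if organ_systems_involved == 1 else "M1c2"


print(n2_subcategory(1), n2_subcategory(3))
print(m1c_subcategory(1), m1c_subcategory(2))
```

Under the 8th edition, both inputs would simply stay at N2 and M1c.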

---

## Citation

If you use this system in your research, please cite:

**Paper**: [Citation will be added upon publication]

**Code**: [GitHub repository and DOI will be added]


## Table of Contents

**Execute cells in order from top to bottom:**

1. [Installation and Setup](#installation) - Install required Python packages
2. [System Requirements](#requirements) - Load and verify environment variables
3. [Configuration](#configuration) - **Select AJCC 8th or 9th edition** → Load configuration files
4. [Data Preparation](#data-prep) - Understand input data format
5. [Running the Workflow](#workflow) - Execute TNM classification
6. [Understanding Results](#results) - Interpret output files
7. [Example Usage](#examples) - Complete workflow examples
8. [Troubleshooting](#troubleshooting) - Common issues and solutions
9. [Additional Resources](#resources) - References and limitations


## 1. Installation and Setup {#installation}

First, let's install the required dependencies.


In [1]:
# Install required packages
%pip install -U pandas openpyxl pyyaml python-dotenv
%pip install -U langchain langchain-core langchain-openai langchain-community langchain-experimental langgraph
%pip install -U openai pydantic

# Install Jupyter Notebook support (if not already installed)
%pip install -U ipykernel jupyter notebook ipywidgets


Note: you may need to restart the kernel to use updated packages.


## 2. System Requirements {#requirements}

### Environment Variables

The system requires Azure OpenAI API credentials. Create a `.env` file in the project root.

**Option 1: Copy from sample file**
```bash
cp sample.env .env
# Then edit .env with your actual credentials
```

**Option 2: Create manually**

Create a `.env` file with the following variables:

```
AZURE_OPENAI_API_KEY=your_api_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-05-01-preview
AZURE_OPENAI_API_BASE=https://your-resource.openai.azure.com/
```

**Note**: 
- See `sample.env` file for a template
- For security reasons, never commit your `.env` file to version control
- The system will automatically use `AZURE_OPENAI_ENDPOINT` if `AZURE_OPENAI_API_BASE` is not set


In [2]:
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Verify environment variables are loaded
# Note: The system uses AZURE_OPENAI_API_BASE, but sample.env uses AZURE_OPENAI_ENDPOINT
# Both are checked for compatibility
required_vars = ['AZURE_OPENAI_API_KEY']
optional_vars = {
    'AZURE_OPENAI_API_BASE': os.getenv('AZURE_OPENAI_ENDPOINT') or os.getenv('AZURE_OPENAI_API_BASE'),
    'AZURE_OPENAI_API_VERSION': os.getenv('AZURE_OPENAI_API_VERSION')
}

missing_vars = [var for var in required_vars if not os.getenv(var)]

if missing_vars:
    print(f"Warning: Missing required environment variables: {missing_vars}")
    print("Please create a .env file with the required Azure OpenAI credentials.")
    print("\nExample .env file (see sample.env for reference):")
    print("AZURE_OPENAI_API_KEY=your_api_key_here")
    print("AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/")
    print("AZURE_OPENAI_API_VERSION=2024-05-01-preview")
    print("AZURE_OPENAI_API_BASE=https://your-resource.openai.azure.com/")
else:
    print("✓ Environment variables loaded successfully")
    # Set AZURE_OPENAI_API_BASE from AZURE_OPENAI_ENDPOINT if needed
    if os.getenv('AZURE_OPENAI_ENDPOINT') and not os.getenv('AZURE_OPENAI_API_BASE'):
        os.environ['AZURE_OPENAI_API_BASE'] = os.getenv('AZURE_OPENAI_ENDPOINT')
        print("✓ Set AZURE_OPENAI_API_BASE from AZURE_OPENAI_ENDPOINT")


✓ Environment variables loaded successfully
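The check above only enforces the API key. A stricter variant could also report missing optional values and resolve the endpoint fallback in one place. The following is a minimal sketch, assuming `AZURE_OPENAI_API_BASE` takes precedence when both endpoint variables are set; the helper name `check_azure_env` is hypothetical.

```python
from typing import Dict, List, Mapping, Optional, Union

REQUIRED = ("AZURE_OPENAI_API_KEY",)
OPTIONAL = ("AZURE_OPENAI_API_VERSION",)


def check_azure_env(env: Mapping[str, str]) -> Dict[str, Union[List[str], Optional[str]]]:
    """Report missing variables and the resolved API base for a given environment mapping."""
    missing_required = [name for name in REQUIRED if not env.get(name)]
    missing_optional = [name for name in OPTIONAL if not env.get(name)]
    # Assumption: API_BASE wins if set; ENDPOINT is used only as a fallback.
    api_base = env.get("AZURE_OPENAI_API_BASE") or env.get("AZURE_OPENAI_ENDPOINT")
    if not api_base:
        missing_optional.append("AZURE_OPENAI_API_BASE or AZURE_OPENAI_ENDPOINT")
    return {
        "missing_required": missing_required,
        "missing_optional": missing_optional,
        "api_base": api_base,
    }


# Illustrative values only; in the notebook you would pass os.environ instead.
report = check_azure_env({
    "AZURE_OPENAI_API_KEY": "your_api_key_here",
    "AZURE_OPENAI_ENDPOINT": "https://your-resource.openai.azure.com/",
})
print(report)
```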


## 3. Configuration {#configuration}

### AJCC Edition Selection

The system supports both AJCC 8th and 9th editions. Select your preferred edition:


In [3]:
# Select AJCC Edition: '8th' or '9th'
AJCC_EDITION = '8th'  # Change to '9th' for AJCC 9th edition

# Set paths based on selected edition
# Note: There is only one config file (config/tnm_config.yaml)
# The edition is selected by specifying the tnm_json_path
CONFIG_PATH = 'config/tnm_config.yaml'

if AJCC_EDITION == '8th':
    TNM_JSON_PATH = 'config/ajcc8th/tnm_classification.json'
    edition_name = 'AJCC 8th edition'
elif AJCC_EDITION == '9th':
    TNM_JSON_PATH = 'config/ajcc9th/tnm_classification.json'
    edition_name = 'AJCC 9th edition'
else:
    raise ValueError(f"Invalid edition: {AJCC_EDITION}. Must be '8th' or '9th'")

print(f"Selected Edition: {edition_name}")
print(f"Config Path: {CONFIG_PATH}")
print(f"TNM JSON Path: {TNM_JSON_PATH}")


Selected Edition: AJCC 8th edition
Config Path: config/tnm_config.yaml
TNM JSON Path: config/ajcc8th/tnm_classification.json
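The `if`/`elif` chain in the selection cell can also be expressed as a lookup table, which keeps the valid editions and their paths in one place. This is a sketch of an alternative, not the project's own code; `EDITION_TO_TNM_JSON` and `resolve_tnm_json_path` are hypothetical names, while the paths are the ones used above.

```python
EDITION_TO_TNM_JSON = {
    "8th": "config/ajcc8th/tnm_classification.json",
    "9th": "config/ajcc9th/tnm_classification.json",
}


def resolve_tnm_json_path(edition: str) -> str:
    """Map an edition label to its TNM criteria file, rejecting unknown labels."""
    try:
        return EDITION_TO_TNM_JSON[edition]
    except KeyError:
        valid = ", ".join(sorted(EDITION_TO_TNM_JSON))
        raise ValueError(f"Invalid edition: {edition!r}. Must be one of: {valid}") from None


print(resolve_tnm_json_path("9th"))
```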


### Loading Configuration Files

**Important**: Make sure you have run the previous cell (AJCC Edition Selection) before running this cell.

Now let's load the configuration files based on the selected edition:


In [4]:
import sys
from pathlib import Path

# Add src to path for imports
sys.path.insert(0, str(Path.cwd()))

from src.models import Config

# Initialize Config with selected edition
# The Config class will automatically detect the AJCC edition from tnm_json_path
config_obj = Config(
    config_path=CONFIG_PATH,
    tnm_json_path=TNM_JSON_PATH
)

# Access configuration dictionary
config = config_obj.config

# Get LLM settings
llm_choice = config.get('model_settings', {}).get('llm_choice', 'azure')
llm_settings = config.get('model_settings', {}).get(llm_choice, {})
model_name = llm_settings.get('name', 'N/A')
temperature_low = llm_settings.get('temperature_low', 0.0)
temperature_high = llm_settings.get('temperature_high', 0.0)

# Get detected AJCC edition
ajcc_edition = config.get('ajcc_edition', 'N/A')
ajcc_version = config.get('ajcc_version', 'N/A')

print("✓ Configuration loaded successfully")
print(f"\nLLM Provider: {llm_choice}")
print(f"LLM Model: {model_name}")
print(f"Temperature Range: {temperature_low} - {temperature_high}")
print(f"\nDetected AJCC Edition: {ajcc_edition} (AJCC {ajcc_version} edition)")

# Store configuration for later use
CONFIG_OBJ = config_obj
CONFIG = config
TNM_DATA = config_obj.tnm_data


No user ID provided, using default paths


✓ Configuration loaded successfully

LLM Provider: azure
LLM Model: gpt-4o
Temperature Range: 0.0 - 0.0

Detected AJCC Edition: ajcc8th (AJCC 8th edition)


## 4. Data Preparation {#data-prep}

### Input Data Format

The system expects an Excel file with the following columns:

**Required Columns:**
- `hospital_id`: Unique identifier for the patient/hospital record

**Medical Report Columns (at least one required):**
- `Pathology`: Pathology report text
- `Chest CT`: Chest CT scan report
- `Brain MR`: Brain MRI report
- `PET`: PET scan report
- `EBUS`: Endobronchial ultrasound report
- `Neck Biopsy`: Neck biopsy report
- `Bone Scan`: Bone scan report
- `Abdomen&Pelvis CT`: Abdominal/pelvic CT report
- `Adrenal CT`: Adrenal CT report

**Optional Ground Truth Columns (for evaluation):**
- `cT`: True T classification
- `cN`: True N classification
- `cM`: True M classification
- `cStage`: True stage classification
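The column requirements above can be checked before any LLM call is made. The sketch below works on a plain list of column names so it runs without pandas; the helper name `validate_columns` is hypothetical and is not part of the project's `data_utils` module.

```python
REQUIRED_COLUMNS = ["hospital_id"]
REPORT_COLUMNS = [
    "Pathology", "Chest CT", "Brain MR", "PET", "EBUS",
    "Neck Biopsy", "Bone Scan", "Abdomen&Pelvis CT", "Adrenal CT",
]


def validate_columns(columns):
    """Return a list of human-readable problems; an empty list means the header is usable."""
    present = set(columns)
    problems = []
    for name in REQUIRED_COLUMNS:
        if name not in present:
            problems.append(f"missing required column: {name}")
    # At least one medical report column must be present
    if not present.intersection(REPORT_COLUMNS):
        problems.append("no medical report column found")
    return problems


print(validate_columns(["hospital_id", "Pathology", "cT"]))
print(validate_columns(["cT", "cN"]))
```

With a DataFrame, the same check could be called as `validate_columns(df.columns)`.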

### Example Data Structure

Let's create a sample input structure:


In [5]:
import pandas as pd

# Example input data structure
example_data = {
    'hospital_id': ['PAT001'],
    'Pathology': ['Adenocarcinoma, acinar pattern. Tumor size: 2.5 cm.'],
    'Chest CT': ['Right upper lobe mass measuring 2.5 cm. No pleural invasion.'],
    'PET': ['FDG-avid right upper lobe mass. No distant metastasis.'],
    'Brain MR': ['No evidence of brain metastasis.'],
    'EBUS': ['No mediastinal lymph node involvement.'],
    'Neck Biopsy': [None],
    'Bone Scan': ['No bone metastasis.'],
    'Abdomen&Pelvis CT': ['No abdominal metastasis.'],
    'Adrenal CT': [None],
    'cT': ['T1c'],
    'cN': ['N0'],
    'cM': ['M0'],
    'cStage': ['IA3']
}

df_example = pd.DataFrame(example_data)
print("Example input data structure:")
print(df_example.to_string())


Example input data structure:
  hospital_id                                            Pathology                                                      Chest CT                                                     PET                          Brain MR                                    EBUS Neck Biopsy            Bone Scan         Abdomen&Pelvis CT Adrenal CT   cT  cN  cM cStage
0      PAT001  Adenocarcinoma, acinar pattern. Tumor size: 2.5 cm.  Right upper lobe mass measuring 2.5 cm. No pleural invasion.  FDG-avid right upper lobe mass. No distant metastasis.  No evidence of brain metastasis.  No mediastinal lymph node involvement.        None  No bone metastasis.  No abdominal metastasis.       None  T1c  N0  M0    IA3
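The workflow cells later in this notebook use lowercase keys such as `chest_ct` and `abdomen_pelvis_ct` rather than the Excel column names. The following sketch shows one way a row could be converted into that shape. The mapping is inferred from the column list and the single-case example in this notebook; the actual conversion performed by `process_excel_data` is not shown here and may differ, and `row_to_case` is a hypothetical helper.

```python
COLUMN_TO_KEY = {
    "Pathology": "pathology",
    "Chest CT": "chest_ct",
    "Brain MR": "brain_mr",
    "PET": "pet",
    "EBUS": "ebus",
    "Neck Biopsy": "neck_biopsy",
    "Bone Scan": "bone_scan",
    "Abdomen&Pelvis CT": "abdomen_pelvis_ct",
    "Adrenal CT": "adrenal_ct",
}
TRUTH_TO_KEY = {"cT": "true_T", "cN": "true_N", "cM": "true_M", "cStage": "true_Stage"}


def row_to_case(row: dict, case_number: int) -> dict:
    """Build an initial workflow state from one spreadsheet row given as a dict."""
    input_data = {"case_number": case_number, "hospital_id": row["hospital_id"]}
    for column, key in COLUMN_TO_KEY.items():
        input_data[key] = row.get(column)  # absent reports stay None
    # Ground truth is optional; include only the values that are present
    true_tnm = {key: row[col] for col, key in TRUTH_TO_KEY.items() if row.get(col) is not None}
    return {"input": input_data, "true_tnm": true_tnm, "iteration_count": 0}


example_row = {"hospital_id": "PAT001", "Chest CT": "Right upper lobe mass measuring 2.5 cm.", "cT": "T1c"}
state = row_to_case(example_row, case_number=1)
print(state["input"]["chest_ct"])
print(state["true_tnm"])
```

Rows of this shape can be obtained from the DataFrame above with `df_example.to_dict("records")`. Empty Excel cells typically load as NaN rather than `None` and would need normalizing first.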


## 5. Running the Workflow {#workflow}

Before running a classification example, configure logging for the session:


In [6]:
# Note: The workflow code should be in a separate Python file (tnm_workflow.py)
# For this notebook, we'll demonstrate the key concepts

import sys
import logging
from pathlib import Path

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

print("Logging configured")


Logging configured


### Loading the Workflow

The complete workflow implementation lives in the `tnm_workflow` module (`src/tnm_workflow.py`) rather than in this notebook. The cells below import it and demonstrate how to run it on example data.

### Initializing the Workflow

Let's set up the configuration and initialize the workflow:


In [7]:
# Import workflow module
from src.tnm_workflow import TNMClassificationWorkflow
from src.utils.logging_utils import setup_logging

# Setup logging
logger = setup_logging(CONFIG_OBJ)

# Initialize workflow with the config object
workflow = TNMClassificationWorkflow(CONFIG_OBJ)

print("✓ Workflow initialized successfully")
print(f"Using {edition_name} configuration.")
print(f"AJCC Edition: {CONFIG_OBJ.config.get('ajcc_edition')} (AJCC {CONFIG_OBJ.config.get('ajcc_version')} edition)")


2025-11-22 08:51:08,005 - INFO - Using Azure OpenAI API with model: gpt-4o (temperature: 0.0)
2025-11-22 08:51:08,006 - INFO - HistologyClassificationWorkflow initialized successfully
2025-11-22 08:51:08,022 - INFO - Using Azure OpenAI API with model: gpt-4o (temperature: 0.0)
2025-11-22 08:51:08,023 - INFO - Loading classification prompts (AJCC 8th edition):
2025-11-22 08:51:08,023 - INFO - Using prompts from: ajcc8th_prompts
2025-11-22 08:51:08,023 - INFO - Detected AJCC edition: ajcc8th (from config.get('ajcc_edition'))
2025-11-22 08:51:08,023 - INFO - Prompt verification - N prompt has N2a/N2b: False, M prompt has M1c1/M1c2: False
2025-11-22 08:51:08,024 - INFO - Global consensus setting: True
2025-11-22 08:51:08,024 - INFO - T Classifier: CONSENSUS MODE
2025-11-22 08:51:08,024 - INFO - N Classifier: CONSENSUS MODE
2025-11-22 08:51:08,025 - INFO - M Classifier: CONSENSUS MODE


✓ Workflow initialized successfully
Using AJCC 8th edition configuration.
AJCC Edition: ajcc8th (AJCC 8th edition)


### Processing a Single Case

Here's how to process a single patient case:


In [8]:
# Example: Prepare input data for a single case
example_input = {
    "case_number": 1,
    "hospital_id": "PAT001",
    "pathology": "Adenocarcinoma, acinar pattern. Tumor size: 2.5 cm. No lymphovascular invasion.",
    "chest_ct": "Right upper lobe mass measuring 2.5 cm. No pleural invasion. No mediastinal lymphadenopathy.",
    "pet": "FDG-avid right upper lobe mass. No distant metastasis identified.",
    "brain_mr": "No evidence of brain metastasis.",
    "ebus": "No mediastinal lymph node involvement.",
    "neck_biopsy": None,
    "bone_scan": "No bone metastasis.",
    "abdomen_pelvis_ct": "No abdominal metastasis.",
    "adrenal_ct": None
}

# Optional: Include true TNM values for accuracy evaluation
true_tnm = {
    "true_T": "T1c",
    "true_N": "N0",
    "true_M": "M0",
    "true_Stage": "IA3"
}

# Run workflow
initial_state = {
    "input": example_input,
    "true_tnm": true_tnm,  # Optional: for accuracy evaluation
    "iteration_count": 0
}

print("Running workflow...")
final_state = workflow.run(initial_state)

if final_state:
    print("\n✓ Workflow completed successfully!")
    print(f"\nResults:")
    print(f"  Histology: {final_state.get('histology_type', 'N/A')}")
    print(f"  T Classification: {final_state.get('t_classification', 'N/A')}")
    print(f"  N Classification: {final_state.get('n_classification', 'N/A')}")
    print(f"  M Classification: {final_state.get('m_classification', 'N/A')}")
    print(f"  Stage Classification: {final_state.get('stage_classification', 'N/A')}")
else:
    print("⚠ Workflow did not complete successfully")


2025-11-22 08:51:08,052 - INFO - Using reports from: ['pathology']
2025-11-22 08:51:08,052 - INFO - Using pathology report for histology classification


Running workflow...


2025-11-22 08:51:10,647 - INFO - Valid histology classification found: Invasive Nonmucinous Adenocarcinoma
2025-11-22 08:51:10,648 - INFO - Histology classification completed from pathology: epithelial_tumors - Invasive Nonmucinous Adenocarcinoma
2025-11-22 08:51:10,649 - INFO - 
2025-11-22 08:51:10,650 - INFO - Starting CONSENSUS MODE for T
2025-11-22 08:51:10,651 - INFO - Required responses: 1
2025-11-22 08:51:10,652 - INFO - 
Attempt 1/50
2025-11-22 08:51:10,652 - INFO - Input data for T classifier (first 2000 chars):
Case Number: 1
Hospital ID: PAT001

CHEST CT:
Right upper lobe mass measuring 2.5 cm. No pleural invasion. No mediastinal lymphadenopathy.

PATHOLOGY:
Adenocarcinoma, acinar pattern. Tumor size: 2.5 cm. No lymphovascular invasion.

PET:
FDG-avid right upper lobe mass. No distant metastasis identified.

BRAIN MR:
No evidence of brain metastasis.

EBUS:
No mediastinal lymph node involvement.

BONE SCAN:
No bone metastasis.

ABDOMEN PELVIS CT:
No abdominal metastasis.
202


✓ Workflow completed successfully!

Results:
  Histology: N/A
  T Classification: N/A
  N Classification: N/A
  M Classification: N/A
  Stage Classification: N/A
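The `N/A` values in the output above suggest that the top-level keys queried in the cell are not present in `final_state`. The summary in Section 7 reads results from `final_state['input']['final_classification']` instead. Below is a minimal sketch of a tolerant reader, assuming that nested layout; the helper name `extract_classifications` is hypothetical and the mock state is illustrative data, not real workflow output.

```python
def extract_classifications(final_state, default="N/A"):
    """Read T/N/M/Stage values from the nested layout, falling back to a default."""
    inputs = (final_state or {}).get("input") or {}
    final = inputs.get("final_classification") or {}
    result = {}
    for category in ("T", "N", "M", "Stage"):
        entry = final.get(category) or {}
        result[category] = entry.get("classification", default)
    return result


# Mock state for illustration; M and Stage are deliberately missing.
mock_state = {
    "input": {
        "final_classification": {
            "T": {"classification": "T1c"},
            "N": {"classification": "N0"},
        }
    }
}
print(extract_classifications(mock_state))
```

Inspecting `final_state.keys()` after a run is the most direct way to confirm which layout the installed version actually produces.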


### Processing Multiple Cases from Excel

To process multiple cases from an Excel file:


In [9]:
# Example: Process Excel file
from src.utils.data_utils import process_excel_data
from src.utils.metrics_utils import calculate_metrics

# Option 1: Use command-line interface (recommended)
# Run: python run_workflow.py -i input/sample.xlsx -o output/results

# Option 2: Process cases programmatically
# Set input file in config
CONFIG_OBJ.set_input_file("input/single_excel_ajcc_8th.xlsx")  # Update with your file path

# Process all cases from Excel file
processed_data = process_excel_data(CONFIG_OBJ)

print(f"Processed {len(processed_data)} cases from Excel file")

# Process each case
success_count = 0
for idx, case in enumerate(processed_data, 1):
    case_number = case['case_number']
    print(f"\nProcessing case {idx}/{len(processed_data)}: {case_number}")
    
    initial_state = {
        "input": case['input'],
        "true_tnm": case.get('true_tnm', {}),  # Include if available
        "iteration_count": 0
    }
    
    try:
        final_state = workflow.run(initial_state)
        if final_state and final_state.get("next") is None:
            success_count += 1
            print(f"  ✓ Case {case_number} completed successfully")
        else:
            print(f"  ⚠ Case {case_number} ended at unexpected state")
    except Exception as e:
        print(f"  ✗ Error processing case {case_number}: {e}")

# Finalize CSV file
workflow.finalize_csv()

# Calculate metrics if true TNM values were provided
accuracy_metrics, confusion_metrics, detailed_results = calculate_metrics(workflow)

if accuracy_metrics:
    print("\n" + "="*70)
    print("Classification Accuracy Summary")
    print("="*70)
    for category in ['T', 'N', 'M', 'Stage']:
        accuracy = accuracy_metrics.get(category, {})
        if accuracy:
            print(f"{category:<6} Accuracy: {accuracy.get('accuracy', 0):.2f}% "
                  f"({accuracy.get('correct', 0)}/{accuracy.get('total', 0)})")

print(f"\n✓ Processing complete: {success_count}/{len(processed_data)} cases successful")
print(f"Results saved to: {CONFIG_OBJ.config.get('output_file')}")


2025-11-22 08:51:20,003 - INFO - 1 records read from Excel file: input/single_excel_ajcc_8th.xlsx (sheet: 'data', available sheets: ['data'])
2025-11-22 08:51:20,003 - INFO - Case 1 input data preparation completed
2025-11-22 08:51:20,015 - INFO - Using reports from: ['pathology']
2025-11-22 08:51:20,015 - INFO - Using pathology report for histology classification


Processed 1 cases from Excel file

Processing case 1/1: 1


2025-11-22 08:51:23,210 - INFO - Valid histology classification found: Invasive Nonmucinous Adenocarcinoma
2025-11-22 08:51:23,222 - INFO - Histology classification completed from pathology: epithelial_tumors - Invasive Nonmucinous Adenocarcinoma
2025-11-22 08:51:23,225 - INFO - 
2025-11-22 08:51:23,236 - INFO - Starting CONSENSUS MODE for T
2025-11-22 08:51:23,237 - INFO - Required responses: 1
2025-11-22 08:51:23,238 - INFO - 
Attempt 1/50
2025-11-22 08:51:23,239 - INFO - Input data for T classifier (first 2000 chars):
Case Number: 1
Hospital ID: PAT005

CHEST CT:
Left lower lobe nodule measuring 1.8 cm. Well-defined margins. No lymphadenopathy.

PATHOLOGY:
Lung, left lower lobe, aspiration:
ADENOCARCINOMA.
Tumor size: 1.8 cm.
No invasion identified.

PET:
FDG-avid left lower lobe nodule (max SUV 3.5). No abnormal lymph node uptake.

BRAIN MR:
No evidence of brain metastasis.

EBUS:
No mediastinal or hilar lymph node involvement. Stations 7, 11L sampled - all negative.

BONE SCAN:
No

  ✓ Case 1 completed successfully

Classification Accuracy Summary
T      Accuracy: 100.00% (1/1)
N      Accuracy: 100.00% (1/1)
M      Accuracy: 100.00% (1/1)
Stage  Accuracy: 100.00% (1/1)

✓ Processing complete: 1/1 cases successful
Results saved to: /Users/hwon/Documents/Github/NSCLC-ModularStageLLM/output/tnm_classification.csv


## 6. Understanding Results {#results}

The workflow generates two types of output files:

1. **CSV File** (`output/tnm_classification.csv`): Tabular results with all classifications
2. **JSON File** (`output/patient_data.json`): Detailed structured results with justifications

### Output Structure

Each classification includes:
- **Classification**: The determined T/N/M/Stage value
- **Justification**: Detailed reasoning for the classification
- **Confidence**: Confidence level (high/medium/low)
- **Additional Tests Needed**: Recommendations for further testing if needed

### Example Output Format

Let's examine the expected output structure:


In [11]:
import json

# Example output structure
example_output = {
    "id": 1,
    "hospitalNumber": "PAT001",
    "histology": {
        "category": "epithelial_tumors",
        "subcategory": "adenocarcinomas",
        "type": "Acinar adenocarcinoma",
        "confidence": "high",
        "reasoning": "Clear acinar pattern identified in pathology report"
    },
    "aiTnm": {
        "T": {
            "value": "T1c",
            "reason": "Tumor size is 2.5 cm, which falls within the T1c category (2-3 cm). No invasion of surrounding structures."
        },
        "N": {
            "value": "N0",
            "reason": "No evidence of lymph node metastasis in EBUS, PET, or CT scans."
        },
        "M": {
            "value": "M0",
            "reason": "No distant metastasis identified in brain MR, PET, bone scan, or abdominal imaging."
        },
        "Stage": "IA3"
    }
}

print("Example output structure:")
print(json.dumps(example_output, indent=2))


Example output structure:
{
  "id": 1,
  "hospitalNumber": "PAT001",
  "histology": {
    "category": "epithelial_tumors",
    "subcategory": "adenocarcinomas",
    "type": "Acinar adenocarcinoma",
    "confidence": "high",
    "reasoning": "Clear acinar pattern identified in pathology report"
  },
  "aiTnm": {
    "T": {
      "value": "T1c",
      "reason": "Tumor size is 2.5 cm, which falls within the T1c category (2-3 cm). No invasion of surrounding structures."
    },
    "N": {
      "value": "N0",
      "reason": "No evidence of lymph node metastasis in EBUS, PET, or CT scans."
    },
    "M": {
      "value": "M0",
      "reason": "No distant metastasis identified in brain MR, PET, bone scan, or abdominal imaging."
    },
    "Stage": "IA3"
  }
}


### Reading Results

Let's see how to read and analyze the results:


In [12]:
# Example: Read CSV results
# import pandas as pd
# results_df = pd.read_csv('output/tnm_classification.csv')
# print(results_df.head())

# Example: Read JSON results
# with open('output/patient_data.json', 'r', encoding='utf-8') as f:
#     results_json = json.load(f)
# 
# for case in results_json:
#     print(f"\nCase {case['id']}:")
#     print(f"  Histology: {case['histology']['type']}")
#     print(f"  T: {case['aiTnm']['T']['value']}")
#     print(f"  N: {case['aiTnm']['N']['value']}")
#     print(f"  M: {case['aiTnm']['M']['value']}")
#     print(f"  Stage: {case['aiTnm']['Stage']}")

print("Results reading example shown above.")


Results reading example shown above.
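As a runnable counterpart to the commented example, the sketch below summarizes stage counts from a JSON results file. It assumes the file holds a list of cases shaped like `example_output` above, writes a small illustrative file to a temporary directory so that it runs on its own, and uses a hypothetical helper name, `summarize_stages`.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_stages(json_path):
    """Count how many cases fall into each overall stage."""
    with open(json_path, "r", encoding="utf-8") as f:
        cases = json.load(f)
    return Counter(case["aiTnm"]["Stage"] for case in cases)


# Illustrative data only; point summarize_stages at output/patient_data.json for real results.
demo_cases = [
    {"id": 1, "aiTnm": {"Stage": "IA3"}},
    {"id": 2, "aiTnm": {"Stage": "IA3"}},
    {"id": 3, "aiTnm": {"Stage": "IIB"}},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "patient_data.json"
    demo_path.write_text(json.dumps(demo_cases), encoding="utf-8")
    stage_counts = summarize_stages(demo_path)

print(dict(stage_counts))
```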


### Calculating Accuracy (if ground truth available)

If your input data includes ground truth values (cT, cN, cM, cStage), you can calculate accuracy:


In [13]:
# Example: Calculate accuracy after processing cases
# (same call as in "Processing Multiple Cases from Excel")
# from src.utils.metrics_utils import calculate_metrics
#
# accuracy_metrics, confusion_metrics, detailed_results = calculate_metrics(workflow)
# if accuracy_metrics:
#     print("Classification Accuracy:")
#     for category in ['T', 'N', 'M', 'Stage']:
#         accuracy = accuracy_metrics.get(category, {})
#         if accuracy:
#             print(f"  {category}: {accuracy.get('accuracy', 0):.2f}% "
#                   f"({accuracy.get('correct', 0)}/{accuracy.get('total', 0)})")

print("Accuracy calculation example shown above.")


Accuracy calculation example shown above.
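To make the reported numbers concrete, the sketch below computes per-category accuracy as the percentage of exact string matches between predicted and ground-truth values. It uses the CSV column names listed in the appendix and returns the same `accuracy`/`correct`/`total` fields read in the batch-processing cell. The internals of `calculate_metrics` are not shown in this notebook, so treat this as an approximation; `accuracy_by_category` is a hypothetical name.

```python
def accuracy_by_category(records, categories=("T", "N", "M", "Stage")):
    """Compute exact-match accuracy (in percent) per category from result rows."""
    metrics = {}
    for category in categories:
        pairs = [
            (row.get(f"{category}_classification"), row.get(f"true_{category}"))
            for row in records
        ]
        # Only cases with a ground-truth value count toward the total
        pairs = [(pred, true) for pred, true in pairs if true is not None]
        if not pairs:
            continue
        correct = sum(1 for pred, true in pairs if pred == true)
        metrics[category] = {
            "accuracy": 100.0 * correct / len(pairs),
            "correct": correct,
            "total": len(pairs),
        }
    return metrics


# Illustrative rows, not real results
rows = [
    {"T_classification": "T1c", "true_T": "T1c", "N_classification": "N0", "true_N": "N0"},
    {"T_classification": "T2a", "true_T": "T1c", "N_classification": "N1", "true_N": "N1"},
]
print(accuracy_by_category(rows))
```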


## 7. Example Usage {#examples}

### Complete Example: Single Case Processing

Here's a complete example of processing a single case:


In [14]:
"""
Complete example workflow:

1. Setup
   - Install dependencies (already done)
   - Configure environment variables (.env file)
   - Load configuration files

2. Initialize workflow
   from src.models import Config
   from src.tnm_workflow import TNMClassificationWorkflow
   config = Config(config_path='config/tnm_config.yaml',
                   tnm_json_path='config/ajcc8th/tnm_classification.json')
   workflow = TNMClassificationWorkflow(config)

3. Prepare input data
   input_data = {
       "case_number": 1,
       "hospital_id": "PAT001",
       "pathology": "Your pathology report text...",
       "chest_ct": "Your CT report text...",
       # ... other reports
   }

4. Run workflow
   initial_state = {"input": input_data, "iteration_count": 0}
   final_state = workflow.run(initial_state)

5. Access results
   classification = final_state['input']['final_classification']
   print(f"T: {classification['T']['classification']}")
   print(f"N: {classification['N']['classification']}")
   print(f"M: {classification['M']['classification']}")
   print(f"Stage: {classification['Stage']['classification']}")

6. Results are also saved to:
   - CSV: output/tnm_classification.csv
   - JSON: output/patient_data.json
"""

print("Complete workflow example documented above.")


Complete workflow example documented above.


### Workflow Architecture Details

The system uses a **state-based workflow** implemented with LangGraph:

```
Histology Classifier → T Classifier → N Classifier → M Classifier → Stage Classifier → Final Save
```

Each node:
1. Receives the current state
2. Processes the relevant medical reports
3. Uses LLM agents to classify based on AJCC criteria
4. Updates the state with classification results
5. Passes state to the next node
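The node contract above can be illustrated without LangGraph or an LLM. The sketch below is plain Python with stub nodes that return fixed values, so it shows only how state flows from node to node; it is not the system's graph definition, and all names in it are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


def make_stub_node(key: str, value: str) -> Node:
    """Create a node that records a fixed classification under the given key."""
    def node(state: State) -> State:
        updated = dict(state)  # copy so earlier states are not mutated
        updated[key] = value
        return updated
    return node


def stage_node(state: State) -> State:
    """Combine T/N/M into a stage using a tiny illustrative lookup table."""
    lookup = {("T1c", "N0", "M0"): "IA3"}
    updated = dict(state)
    updated["stage"] = lookup.get((state["t"], state["n"], state["m"]), "unknown")
    return updated


def run_pipeline(state: State, nodes: List[Node]) -> State:
    """Pass the state through each node in order."""
    for node in nodes:
        state = node(state)
    return state


pipeline = [
    make_stub_node("histology", "adenocarcinoma"),
    make_stub_node("t", "T1c"),
    make_stub_node("n", "N0"),
    make_stub_node("m", "M0"),
    stage_node,
]
final = run_pipeline({"input": {"hospital_id": "PAT001"}}, pipeline)
print(final["stage"])
```

Keeping each node a pure function of the state is one design choice that makes individual steps easy to test in isolation.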

### Key Components

1. **Config Class**: Manages configuration and file paths
2. **TNMClassificationWorkflow**: Main workflow orchestrator
3. **HistologyClassificationWorkflow**: Handles histology classification
4. **Output Parsers**: Extract structured JSON from LLM responses
5. **Validation**: Ensures classifications follow AJCC criteria

### Classification Agents

Each agent (T, N, M, Stage) is a specialized LLM agent that:
- Receives relevant medical reports
- Applies selected AJCC edition criteria (8th or 9th)
- Provides detailed justifications
- Assigns confidence levels (high/medium/low)
- Suggests additional tests if needed

**Note**: The 9th edition includes updated N classification (N2a/N2b subcategories) and M classification (M1c1/M1c2 subcategories).


## 8. Troubleshooting {#troubleshooting}

### Common Issues and Solutions

#### Issue 1: Environment Variables Not Loaded

**Symptoms**: `KeyError` or `None` values for API credentials

**Solution**:
```python
# Verify .env file exists and contains:
# AZURE_OPENAI_API_KEY=your_key
# AZURE_OPENAI_ENDPOINT=your_endpoint (or AZURE_OPENAI_API_BASE)
# AZURE_OPENAI_API_VERSION=your_version

from dotenv import load_dotenv
load_dotenv()  # Make sure this is called before importing workflow

# The system can use either AZURE_OPENAI_ENDPOINT or AZURE_OPENAI_API_BASE
# If only ENDPOINT is set, it will be used for API_BASE automatically
```

#### Issue 2: Missing Input Files

**Symptoms**: `FileNotFoundError` for config or input files

**Solution**:
- Ensure the shared configuration file `config/tnm_config.yaml` exists, along with the selected edition's classification file:
  - For 8th edition: `config/ajcc8th/tnm_classification.json`
  - For 9th edition: `config/ajcc9th/tnm_classification.json`
- Check that input Excel file path in config is correct (default: `input/evaluation_2_samples.xlsx`)
- Verify file permissions
- Ensure `AJCC_EDITION` variable is set correctly ('8th' or '9th')
- Verify the project directory structure matches the expected layout

#### Issue 3: LLM API Errors

**Symptoms**: API timeout or rate limit errors

**Solution**:
- Check API key validity
- Verify endpoint URL is correct
- Check rate limits and quotas
- Note that retry logic is already implemented in the workflow; if errors persist, recheck the items above
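The workflow's own retry implementation is not shown in this notebook. For calls made outside the workflow, a generic retry with exponential backoff might look like the sketch below; `call_with_retry` and its parameters are hypothetical, and the simulated failure stands in for a real API timeout.

```python
import time


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


attempts = {"count": 0}


def flaky_call():
    """Fail twice, then succeed, to simulate transient API errors."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


# The sleep function is replaced with a no-op so the demo finishes immediately.
result = call_with_retry(flaky_call, max_attempts=5, sleep=lambda seconds: None)
print(result, attempts["count"])
```

In practice it is usually better to catch only the specific timeout and rate-limit exceptions raised by the client library rather than every `Exception`.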

#### Issue 4: Classification Parsing Errors

**Symptoms**: JSON parsing errors in output

**Solution**:
- The workflow includes robust parsing with multiple fallback strategies
- Check logs for specific parsing issues
- Verify LLM is returning properly formatted JSON
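To illustrate what "multiple fallback strategies" can mean, the sketch below tries three increasingly lenient ways of recovering JSON from a model response. It is a generic example under that assumption, not the project's parser, and `extract_json` is a hypothetical name.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to avoid a literal fence inside this example
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text):
    """Return parsed JSON from an LLM response, or None if nothing parses."""
    # Strategy 1: the whole response is valid JSON
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Strategy 2: JSON wrapped in a fenced code block
    match = FENCED_BLOCK.search(text)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass
    # Strategy 3: the span from the first "{" to the last "}"
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass
    return None


response = "Here is the result:\n" + FENCE + 'json\n{"classification": "T1c"}\n' + FENCE
print(extract_json(response))
```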

#### Issue 5: Missing Medical Reports

**Symptoms**: Low confidence classifications or "Insufficient Data" errors

**Solution**:
- Ensure at least one relevant medical report is provided
- For T classification: Pathology or Chest CT is essential
- For N classification: EBUS or PET is recommended
- For M classification: Brain MR, PET, or Bone Scan is recommended
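The report recommendations above can be turned into a quick pre-flight check on a case's input dictionary. This sketch uses the lowercase input keys from the single-case example; `categories_lacking_reports` is a hypothetical helper and the workflow may apply different rules internally.

```python
RECOMMENDED_REPORTS = {
    "T": ("pathology", "chest_ct"),
    "N": ("ebus", "pet"),
    "M": ("brain_mr", "pet", "bone_scan"),
}


def categories_lacking_reports(input_data):
    """Return the categories for which none of the recommended reports has text."""
    lacking = []
    for category, keys in RECOMMENDED_REPORTS.items():
        has_report = any(
            isinstance(input_data.get(key), str) and input_data[key].strip()
            for key in keys
        )
        if not has_report:
            lacking.append(category)
    return lacking


case = {"pathology": "Adenocarcinoma, acinar pattern.", "ebus": None, "pet": "  "}
print(categories_lacking_reports(case))
```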

### Logging

The system provides comprehensive logging:

```python
# Logs are written to:
# - Console (INFO level)
# - File: output/tnm_classification.log (DEBUG level)

# To increase verbosity:
import logging
logging.getLogger('tnm_workflow').setLevel(logging.DEBUG)
```

### Performance Considerations

- **Processing Time**: Each case takes approximately 30-60 seconds depending on:
  - Number of medical reports
  - LLM response time
  - Network latency

- **Batch Processing**: For large datasets, process cases sequentially to avoid API rate limits

- **Error Handling**: The workflow includes retry logic and error recovery mechanisms


## 9. Additional Resources {#resources}

### AJCC Edition References

- **AJCC 8th Edition**: AJCC Cancer Staging Manual, 8th Edition
- **AJCC 9th Edition**: AJCC Cancer Staging Manual, 9th Edition (includes updated N and M classifications)
- **Lung Cancer Staging**: Chapter on Non-Small Cell Lung Cancer staging

**Key Differences Between 8th and 9th Editions:**
- **N Classification (9th edition)**: Introduces N2a (single mediastinal station) and N2b (multiple mediastinal stations) subcategories
- **M Classification (9th edition)**: M1c is subdivided into M1c1 (single organ system) and M1c2 (multiple organ systems)
- **Stage Grouping**: Updated stage grouping rules in 9th edition

### System Limitations

1. **Language Support**: Currently optimized for English and Korean medical reports
2. **Report Quality**: Classification accuracy depends on the quality and completeness of input reports
3. **Edge Cases**: Complex or ambiguous cases may require manual review
4. **LLM Dependency**: Requires Azure OpenAI API access

### Future Enhancements

- Additional histology classification options
- Enhanced error handling and validation
- Performance optimizations

### Contact and Support

For questions or issues:
- **GitHub Issues**: [Repository URL]
- **Documentation**: See README.md in the repository
- **Paper**: [Citation will be added upon publication]

---

## Conclusion

This notebook has provided an overview of the Modular Agent-Based TNM Staging System. The system provides:

- **Accurate Classification**: Based on selected AJCC edition criteria (8th or 9th)
- **Transparent Reasoning**: Detailed justifications for each classification
- **Reproducible Results**: State-based workflow ensures consistency
- **Comprehensive Reporting**: Both CSV and JSON output formats

For detailed implementation, please refer to the source code in `src/tnm_workflow.py` and the configuration files.

**Important**: This system is intended for research purposes. Clinical decisions should always involve qualified medical professionals.


---

## Appendix: Quick Reference

### Required Files

```
project_root/
├── config/
│   ├── tnm_config.yaml          # Main configuration file (shared for both editions)
│   ├── ajcc8th/                 # AJCC 8th edition TNM classification
│   │   └── tnm_classification.json
│   └── ajcc9th/                 # AJCC 9th edition TNM classification
│       └── tnm_classification.json
├── src/                         # Source code
│   ├── models/
│   ├── parsers/
│   ├── agents/
│   ├── workflow/
│   └── ...
├── input/                       # Input data files
│   └── sample.xlsx              # Input Excel/CSV file
├── .env                         # API credentials (not in repo)
├── sample.env                   # Template for .env file
└── output/                      # Generated automatically
    ├── tnm_classification.csv
    ├── patient_data.json
    └── tnm_classification.log
```

**Note**: 
- Select the edition using the `AJCC_EDITION` variable in the configuration section above
- Copy `sample.env` to `.env` and fill in your Azure OpenAI credentials
- Input file path can be set using `CONFIG_OBJ.set_input_file()` or specified in `config/tnm_config.yaml`

### Key Functions

- `AJCC_EDITION`: Select edition ('8th' or '9th'); set in the AJCC Edition Selection cell (`In [3]`)
- `Config(config_path, tnm_json_path)`: Load configuration and setup paths
- `TNMClassificationWorkflow(config)`: Initialize workflow
- `workflow.run(initial_state)`: Execute classification for a single case
- `process_excel_data(config)`: Process Excel/CSV file and return list of cases
- `calculate_metrics(workflow)`: Calculate accuracy metrics (returns accuracy_metrics, confusion_metrics, detailed_results)

**Edition Selection**: Change the `AJCC_EDITION` variable in the AJCC Edition Selection cell (`In [3]`) to switch between 8th and 9th editions, then rerun the cells that follow it.

### Output Fields

**CSV Output Columns**:
- `pid`: Patient ID
- `hospital_number`: Hospital identifier
- `histology_category`, `histology_subcategory`, `histology_type`
- `histology_confidence`, `histology_reason`
- `T_classification`, `N_classification`, `M_classification`, `Stage_classification`
- `T_reasoning`, `N_reasoning`, `M_reasoning`, `Stage_reasoning`
- `true_T`, `true_N`, `true_M`, `true_Stage` (if provided in input)

**JSON Output Structure**:
- Patient metadata (`id`, `hospitalNumber`)
- Histology classification with reasoning and confidence
- TNM classifications with detailed reasoning
- Original medical reports
- Ground truth values (if provided)
