# Medical NLP Pipeline - Data Exploration

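This notebook walks through the complete medical NLP pipeline, from a raw doctor-patient transcript to a medical summary, a sentiment and intent analysis, and a SOAP note.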

## ⚠️ Note to Reviewers

This notebook covers the full Physician Notetaker pipeline. Keep the following in mind when reviewing it:

- The **recommended review path** is to inspect the **pre-generated outputs** stored in:
  `data/output/`

- Live LLM-based execution (Groq) is **optional** and requires an API key.
  If no API key is provided, the pipeline falls back to a local-only mode
  which may produce less polished summaries.

- To ensure reproducibility and ease of review, the final outputs generated
  using the LLM-assisted pipeline are included as JSON artifacts.


## 1. Setup

In [None]:
import sys
import os
import json
import numpy as np
from dotenv import load_dotenv

# Add parent directory to path
sys.path.append('..')

# Load environment variables from the project root .env first
load_dotenv(dotenv_path=os.path.join('..', '.env'))
# Fallback: default .env lookup; variables that are already set are not overridden
load_dotenv()

# Import pipeline
from src.pipeline import MedicalNLPPipeline


def convert_to_json_serializable(obj):
    """
    Convert NumPy types and other non-serializable types to native Python types.
    
    Args:
        obj: Object to convert (can be dict, list, NumPy type, etc.)
        
    Returns:
        JSON-serializable version of the object
    """
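    # NumPy scalars: map each kind to its native Python counterpart
    if isinstance(obj, np.bool_):
        return bool(obj)
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, np.floating):
        return float(obj)
    # Arrays become (nested) lists of native Python values
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    # Containers: convert values recursively (tuples become lists)
    if isinstance(obj, dict):
        return {key: convert_to_json_serializable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [convert_to_json_serializable(item) for item in obj]
    return obj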
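
As an alternative to converting the whole structure up front, `json.dumps` accepts a `default` callable that is invoked only for objects it cannot serialize on its own. The following is a minimal sketch, not part of the pipeline: `numpy_default` is a hypothetical helper name and the payload values are purely illustrative.

```python
import json

import numpy as np


def numpy_default(obj):
    """Fallback for json.dumps: convert NumPy scalars and arrays."""
    if isinstance(obj, np.generic):
        # Any NumPy scalar (int64, float32, bool_, ...) -> native Python value
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Illustrative payload only; not real pipeline output
payload = {"score": np.float32(0.5), "count": np.int64(3), "vec": np.array([1, 2])}
encoded = json.dumps(payload, default=numpy_default)
print(encoded)
```

`default` is not applied to dictionary keys, so keys that are NumPy scalars still need to be converted separately.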

## 2. Sample Transcript

In [None]:
sample_transcript = """
Doctor: Good morning, Mr. Carter. How are you feeling today?

Patient: Good morning, doctor. I'm feeling better than before, but I still get some discomfort every now and then.

Doctor: I understand you slipped and fell at work last October. Can you walk me through what happened?

Patient: Yes, it was on October 14th, around 9:00 in the morning. I was working in the warehouse in Stockport, and the floor had a wet patch I didn’t notice. My foot slipped, and I fell sideways onto my right shoulder.

Doctor: That sounds like a painful fall. Were you able to break the fall with your hands?

Patient: Not really. It happened too fast.

Doctor: What did you feel immediately after the incident?

Patient: At first, I was just startled. Then I felt a sharp pain in my right shoulder and upper arm. I couldn’t lift my arm properly right after the fall.

Doctor: Did you seek medical attention at that time?

Patient: Yes, I went to Stepping Hill Hospital’s Urgent Care. They examined me. They did an X-ray to rule out fractures, which came back clear. They gave me a sling and some painkillers.

Doctor: How did things progress after that?

Patient: The first few weeks were difficult. I had limited movement, and the pain made simple tasks hard. I started physiotherapy about two weeks after the injury and had eight sessions, which really helped with mobility and strength.

Doctor: That makes sense. Are you still experiencing pain now?

Patient: It’s not constant, but I do get mild aches if I lift something heavy or sleep on my right side. Definitely not as bad as before.

Doctor: That’s good to hear. Have you noticed any other effects, like anxiety about returning to work or difficulty focusing?

Patient: No, nothing like that. I feel normal otherwise, and I’m not nervous about being in the warehouse again.

Doctor: And how has this impacted your daily life? Work, hobbies, anything like that?

Patient: I took about ten days off work, but after that, I was able to go back gradually. I avoided lifting for a bit, but overall it hasn’t stopped me from doing my usual activities.

Doctor: That’s encouraging. Let’s go ahead and do a physical examination to check your mobility and any lingering pain.

Doctor: Everything looks good. Your shoulder has a full range of movement, and there’s no tenderness or signs of ongoing injury. Your strength and joint stability appear normal.

Patient: That’s a relief!

Doctor: Yes, your recovery so far has been very positive. Given your progress, I’d expect you to make a full recovery within twenty-three months of the injury. There are no signs of long-term damage or instability.

Patient: That’s great to hear. So, I don’t need to worry about this affecting me in the future?

Doctor: That’s right. I don’t anticipate any long-term impact on your work or daily life. If anything changes or you start experiencing increased pain, you can always come back for a follow-up. But at this point, you’re on track for a complete recovery.

Patient: Thank you, doctor. I appreciate it.

Doctor: You’re very welcome, Mr. Carter. Take care, and don’t hesitate to reach out if you need anything.
"""

print("Sample Transcript:")
print(sample_transcript)

## 3. Initialize Pipeline

In [None]:
# Initialize with LLM (if API key available)
pipeline = MedicalNLPPipeline(config_path='../config/config.yaml', use_llm=True)

# Or without LLM (local models only)
# pipeline = MedicalNLPPipeline(config_path='../config/config.yaml', use_llm=False)
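
Instead of switching between the two lines by hand, the `use_llm` flag can be derived from the environment. The sketch below assumes the key is exposed as `GROQ_API_KEY`; that variable name is an assumption (the notebook only states that Groq requires an API key), and `llm_key_available` is a hypothetical helper.

```python
import os


def llm_key_available(env=None, key_name="GROQ_API_KEY"):
    """Return True when a non-empty API key is present.

    `key_name` is an assumed variable name; adjust it to match your .env file.
    """
    env = os.environ if env is None else env
    return bool(str(env.get(key_name, "")).strip())


use_llm = llm_key_available()
print("LLM mode" if use_llm else "Local-only mode")
```

The resulting flag can then be passed to the constructor shown above, for example `MedicalNLPPipeline(config_path='../config/config.yaml', use_llm=use_llm)`.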

## 4. Process Transcript

In [None]:
results = pipeline.process(sample_transcript)

## 5. View Results (Pre-generated + Live)

#### 5.1 Load Pre-generated Output (No API Key Required)

In [None]:
# ============================================================
# RECOMMENDED: Load pre-generated output (no API keys required)
# ============================================================

import json

# Pick one representative output file
saved_output_path = "../data/output/analysis_20251212_055503.json"

with open(saved_output_path, "r", encoding="utf-8") as f:
    saved_results = json.load(f)

print("\n" + "="*80)
print("PRE-GENERATED MEDICAL SUMMARY (LLM-ASSISTED)")
print("="*80)
print(json.dumps(saved_results.get("medical_summary", {}), indent=2))

print("\n" + "="*80)
print("PRE-GENERATED SENTIMENT & INTENT")
print("="*80)
print(json.dumps(saved_results.get("sentiment_analysis", {}), indent=2))

print("\n" + "="*80)
print("PRE-GENERATED SOAP NOTE")
print("="*80)
print(json.dumps(saved_results.get("soap_note", {}), indent=2))
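
The path above is hard-coded to one file. If several runs have been saved, the newest one could be picked automatically. This is a minimal sketch under the assumption that all output files follow the `analysis_YYYYMMDD_HHMMSS.json` naming seen above, so that sorting by name is the same as sorting by time; `latest_output` is a hypothetical helper, not part of the pipeline.

```python
from pathlib import Path


def latest_output(output_dir, pattern="analysis_*.json"):
    """Return the newest matching file as a Path, or None if there is none.

    Assumes file names embed a sortable timestamp, so lexicographic
    order equals chronological order.
    """
    directory = Path(output_dir)
    if not directory.is_dir():
        return None
    candidates = sorted(directory.glob(pattern), key=lambda p: p.name)
    return candidates[-1] if candidates else None


print(latest_output("../data/output"))
```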


#### 5.2 Live Execution (Optional)

The following cells display the `results` produced by the live run in Section 4.
If no LLM API key is configured, the pipeline runs in local-only mode and the
results may differ from the pre-generated outputs shown above.


In [None]:
print("\n" + "="*80)
print("MEDICAL SUMMARY")
print("="*80)

# Convert NumPy types to native Python types for JSON serialization
medical_summary_serializable = convert_to_json_serializable(results['medical_summary'])
print(json.dumps(medical_summary_serializable, indent=2))

In [None]:
print("\n" + "="*80)
print("SENTIMENT & INTENT ANALYSIS")
print("="*80)

# Convert NumPy types to native Python types for JSON serialization
sentiment_serializable = convert_to_json_serializable(results['sentiment_analysis'])
print(json.dumps(sentiment_serializable, indent=2))

In [None]:
print("\n" + "="*80)
print("SOAP NOTE")
print("="*80)

# Convert NumPy types to native Python types for JSON serialization
soap_note_serializable = convert_to_json_serializable(results['soap_note'])
print(json.dumps(soap_note_serializable, indent=2))
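
To see where a live run departs from the pre-generated output, the two result dictionaries can be compared section by section. The sketch below is one simple way to do that; `differing_keys` is a hypothetical helper, and the two small dictionaries are placeholders rather than real pipeline output. Live results should be passed through `convert_to_json_serializable` first so that NumPy values compare cleanly.

```python
_MISSING = object()  # sentinel so a missing key differs from a key set to None


def differing_keys(saved, live):
    """List top-level keys that are absent on one side or hold different values."""
    keys = sorted(set(saved) | set(live))
    return [k for k in keys if saved.get(k, _MISSING) != live.get(k, _MISSING)]


# Placeholder dictionaries for illustration only
saved_demo = {"a": 1, "b": 2}
live_demo = {"a": 1, "b": 3, "c": 4}
print(differing_keys(saved_demo, live_demo))
```

In the notebook this would be called as `differing_keys(saved_results, convert_to_json_serializable(results))`.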

## 6. Process Full Transcript from File

In [None]:
# Process the full transcript
output_path = pipeline.process_file(
    '../data/input/sample_transcript.txt',
    '../data/output'
)

print(f"Results saved to: {output_path}")

In [None]:
# Load and view full results
with open(output_path, 'r', encoding='utf-8') as f:
    full_results = json.load(f)

print(json.dumps(full_results, indent=2))

## 7. Explore Individual Components

In [None]:
# Text preprocessing
from src.preprocessing.text_processor import TextProcessor

processor = TextProcessor(pipeline.config)
processed = processor.process(sample_transcript)

print("Doctor utterances:", len(processed['doctor_utterances']))
print("Patient utterances:", len(processed['patient_utterances']))
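
For intuition about what the utterance counts represent, the sketch below splits a transcript on its `Doctor:` and `Patient:` prefixes. It is an illustration only and makes no claim about how `TextProcessor` works internally; `split_by_speaker` is a hypothetical function, and the demo text is a few shortened lines from the sample transcript.

```python
import re

# One speaker label per line, followed by the spoken text
SPEAKER_LINE = re.compile(r"^(Doctor|Patient):\s*(.+)$")


def split_by_speaker(transcript):
    """Group utterances by speaker label, skipping blank or unlabeled lines."""
    turns = {"Doctor": [], "Patient": []}
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line.strip())
        if match:
            speaker, text = match.groups()
            turns[speaker].append(text)
    return turns


demo = """
Doctor: Good morning, Mr. Carter. How are you feeling today?

Patient: Good morning, doctor. I'm feeling better than before.

Doctor: Are you still experiencing pain now?
"""

turns = split_by_speaker(demo)
print("Doctor utterances:", len(turns["Doctor"]))
print("Patient utterances:", len(turns["Patient"]))
```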

In [None]:
# Entity extraction
from src.medical_nlp.entity_extractor import EntityExtractor

extractor = EntityExtractor(pipeline.config)
entities = extractor.extract_entities(sample_transcript)

print("Symptoms:", entities['symptoms'])
print("Treatments:", entities['treatments'])
print("Body parts:", entities['body_parts'])