# Physician Notetaker - Medical NLP Pipeline

This notebook demonstrates the complete NLP pipeline for analyzing physician-patient conversations.

## Features:
1. **Named Entity Recognition (NER)** - Extract symptoms, diagnosis, treatment, prognosis
2. **Text Summarization** - Generate structured medical reports
3. **Keyword Extraction** - Identify important medical phrases
4. **Sentiment Analysis** - Classify patient sentiment (Anxious/Neutral/Reassured)
5. **Intent Detection** - Identify patient intent
6. **SOAP Note Generation** - Create clinical documentation

In [None]:
# Import required libraries
import sys
import json
from pathlib import Path

# Add project root to path
sys.path.insert(0, str(Path('.').absolute()))

# Import our NLP modules
from src.ner_extractor import MedicalNERExtractor
from src.summarizer import MedicalSummarizer
from src.keyword_extractor import MedicalKeywordExtractor
from src.sentiment_analyzer import SentimentAnalyzer
from src.intent_detector import IntentDetector
from src.soap_generator import SOAPNoteGenerator

print("✓ All modules imported successfully!")

## Sample Transcript

Let's load our sample physician-patient conversation:

In [None]:
# Full sample transcript
SAMPLE_TRANSCRIPT = """
Physician: Good morning, Ms. Jones. How are you feeling today?

Patient: Good morning, doctor. I'm doing better, but I still have some discomfort now and then.

Physician: I understand you were in a car accident last September. Can you walk me through what happened?

Patient: Yes, it was on September 1st, around 12:30 in the afternoon. I was driving from Cheadle Hulme to Manchester when I had to stop in traffic. Out of nowhere, another car hit me from behind, which pushed my car into the one in front.

Physician: That sounds like a strong impact. Were you wearing your seatbelt?

Patient: Yes, I always do.

Physician: What did you feel immediately after the accident?

Patient: At first, I was just shocked. But then I realized I had hit my head on the steering wheel, and I could feel pain in my neck and back almost right away.

Physician: Did you seek medical attention at that time?

Patient: Yes, I went to Moss Bank Accident and Emergency. They checked me over and said it was a whiplash injury, but they didn't do any X-rays. They just gave me some advice and sent me home.

Physician: How did things progress after that?

Patient: The first four weeks were rough. My neck and back pain were really bad—I had trouble sleeping and had to take painkillers regularly. It started improving after that, but I had to go through ten sessions of physiotherapy to help with the stiffness and discomfort.

Physician: That makes sense. Are you still experiencing pain now?

Patient: It's not constant, but I do get occasional backaches. It's nothing like before, though.

Physician: That's good to hear. Have you noticed any other effects, like anxiety while driving or difficulty concentrating?

Patient: No, nothing like that. I don't feel nervous driving, and I haven't had any emotional issues from the accident.

Physician: And how has this impacted your daily life? Work, hobbies, anything like that?

Patient: I had to take a week off work, but after that, I was back to my usual routine. It hasn't really stopped me from doing anything.

Physician: That's encouraging. Let's go ahead and do a physical examination to check your mobility and any lingering pain.

[Physical Examination Conducted]

Physician: Everything looks good. Your neck and back have a full range of movement, and there's no tenderness or signs of lasting damage. Your muscles and spine seem to be in good condition.

Patient: That's a relief!

Physician: Yes, your recovery so far has been quite positive. Given your progress, I'd expect you to make a full recovery within six months of the accident. There are no signs of long-term damage or degeneration.

Patient: That's great to hear. So, I don't need to worry about this affecting me in the future?

Physician: That's right. I don't foresee any long-term impact on your work or daily life. If anything changes or you experience worsening symptoms, you can always come back for a follow-up. But at this point, you're on track for a full recovery.

Patient: Thank you, doctor. I appreciate it.

Physician: You're very welcome, Ms. Jones. Take care, and don't hesitate to reach out if you need anything.
"""

print("Sample transcript loaded!")
print(f"Length: {len(SAMPLE_TRANSCRIPT)} characters")
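
Several later steps (per-statement sentiment, SOAP generation) depend on separating physician turns from patient turns. How the `src` modules do this is not shown in this notebook; the following is only a minimal sketch, assuming every turn sits on its own line in `Speaker: utterance` form. `TURN_PATTERN` and `split_turns` are hypothetical names introduced for illustration.

```python
import re
from typing import List, Tuple

# Assumed turn format: "Speaker: utterance" on a single line.
TURN_PATTERN = re.compile(r"^(Physician|Doctor|Patient):\s*(.+)$")


def split_turns(transcript: str) -> List[Tuple[str, str]]:
    """Return (speaker, utterance) pairs; lines without a speaker label are skipped."""
    turns = []
    for line in transcript.splitlines():
        match = TURN_PATTERN.match(line.strip())
        if match:
            turns.append((match.group(1), match.group(2).strip()))
    return turns


demo = """
Physician: How are you feeling today?
Patient: Better, but I still get occasional backaches.
[Physical Examination Conducted]
Patient: That's a relief!
"""

turns = split_turns(demo)
patient_statements = [text for speaker, text in turns if speaker == "Patient"]
print(patient_statements)
```

Stage directions such as `[Physical Examination Conducted]` carry no speaker label and are dropped by this sketch; a real pipeline may want to keep them as section markers.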

---
# 1. Named Entity Recognition (NER)

Extract medical entities: Symptoms, Diagnosis, Treatment, Prognosis

In [None]:
# Initialize NER extractor
ner_extractor = MedicalNERExtractor(use_spacy=False)

# Extract entities
entities = ner_extractor.extract_entities(SAMPLE_TRANSCRIPT)

print("=" * 50)
print("EXTRACTED MEDICAL ENTITIES")
print("=" * 50)
print(json.dumps(entities.to_dict(), indent=2))

---
# 2. Text Summarization

Generate a structured medical report from the transcript.

In [None]:
# Initialize summarizer
summarizer = MedicalSummarizer(use_transformers=False)

# Generate structured summary
summary = summarizer.summarize(SAMPLE_TRANSCRIPT, entities.to_dict())

print("=" * 50)
print("STRUCTURED MEDICAL SUMMARY")
print("=" * 50)
print(json.dumps(summary.to_dict(), indent=2))

---
# 3. Keyword Extraction

Identify important medical phrases from the transcript.

In [None]:
# Initialize keyword extractor
keyword_extractor = MedicalKeywordExtractor(use_yake=False)

# Extract keywords
keywords = keyword_extractor.extract_keywords(SAMPLE_TRANSCRIPT)

print("=" * 50)
print("EXTRACTED KEYWORDS")
print("=" * 50)
print("\nMedical Phrases Found:")
for phrase in keywords['medical_phrases']:
    print(f"  • {phrase}")

print("\nTop Keywords:")
for kw in keywords['keywords'][:10]:
    print(f"  • {kw}")
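
The summary at the end of this notebook lists TF-IDF as the keyword method. The internals of `MedicalKeywordExtractor` are not shown here, so the following is only a sketch of plain TF-IDF using the standard library, assuming each utterance is treated as one document and using the unsmoothed `log(N / df)` variant of IDF. `tfidf_scores` and `top_terms` are hypothetical helpers.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tfidf_scores(documents: List[str]) -> List[Dict[str, float]]:
    """Compute tf * idf per term and document, with idf = log(N / document frequency)."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores: Dict[str, float], k: int) -> List[str]:
    """Highest-scoring terms first; ties are broken alphabetically."""
    return sorted(doc_scores, key=lambda term: (-doc_scores[term], term))[:k]


docs = [
    "neck pain and back pain",
    "back pain after physiotherapy",
    "occasional back pain",
]
scores = tfidf_scores(docs)
print(top_terms(scores[2], 1))
```

In this toy corpus "pain" and "back" appear in every document, so their IDF is zero and they never rank. That is one plausible reason to pair TF-IDF with a curated medical vocabulary, as the summary table describes.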

---
# 4. Sentiment Analysis

Classify patient sentiment as Anxious, Neutral, or Reassured.

In [None]:
# Initialize sentiment analyzer
sentiment_analyzer = SentimentAnalyzer(use_transformers=False)

# Analyze overall sentiment
overall_sentiment = sentiment_analyzer.get_overall_sentiment(SAMPLE_TRANSCRIPT)

print("=" * 50)
print("OVERALL SENTIMENT ANALYSIS")
print("=" * 50)
print(json.dumps(overall_sentiment.to_dict(), indent=2))

In [None]:
# Analyze individual patient statements
print("\nPER-STATEMENT SENTIMENT ANALYSIS")
print("=" * 50)

statement_sentiments = sentiment_analyzer.analyze_transcript(SAMPLE_TRANSCRIPT)

for statement, result in list(statement_sentiments.items())[:5]:
    print(f"\n\"{statement}\"")
    print(f"  → Sentiment: {result.sentiment} (confidence: {result.confidence:.2f})")
    print(f"  → Indicators: {result.indicators}")

### Sample Input Analysis

Analyze the specific example from the assignment:

In [None]:
# Sample input from assignment
sample_input = "I'm a bit worried about my back pain, but I hope it gets better soon."

print(f"Input: \"{sample_input}\"")
print()

# Analyze sentiment
result = sentiment_analyzer.analyze(sample_input)

print("Output:")
print(json.dumps({
    "Sentiment": result.sentiment,
    "Confidence": round(result.confidence, 2),
    "Indicators": result.indicators
}, indent=2))
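
For readers without access to `src.sentiment_analyzer`, the lexicon idea can be sketched as follows. This is not the module's implementation: the cue lists and `classify_sentiment` are hypothetical, and the rule (compare cue counts) is the simplest one that illustrates the approach.

```python
import re
from typing import List, Tuple

# Hypothetical cue lists, for illustration only.
ANXIOUS_CUES = {"worried", "worry", "nervous", "afraid", "concerned"}
REASSURED_CUES = {"relief", "relieved", "great", "appreciate", "thank"}


def classify_sentiment(text: str) -> Tuple[str, List[str]]:
    """Return (label, matched cue words) by comparing cue counts."""
    words = re.findall(r"[a-z']+", text.lower())
    anxious = [w for w in words if w in ANXIOUS_CUES]
    reassured = [w for w in words if w in REASSURED_CUES]
    if len(anxious) > len(reassured):
        return "Anxious", anxious
    if len(reassured) > len(anxious):
        return "Reassured", reassured
    return "Neutral", anxious + reassured


for statement in [
    "I'm a bit worried about my back pain, but I hope it gets better soon.",
    "That's a relief!",
    "I had to take a week off work.",
]:
    label, indicators = classify_sentiment(statement)
    print(f"{label:<10} {indicators}  <- {statement}")
```

The sketch ignores negation, so a statement like "I don't feel nervous driving" would be labelled Anxious. Handling that case is one reason to move beyond a bare lexicon.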

---
# 5. Intent Detection

Identify patient intent: Seeking reassurance, Reporting symptoms, Expressing concern, etc.

In [None]:
# Initialize intent detector
intent_detector = IntentDetector()

# Analyze the sample input
sample_input = "I'm a bit worried about my back pain, but I hope it gets better soon."

print(f"Input: \"{sample_input}\"")
print()

# Detect intent
intent_result = intent_detector.detect(sample_input)

print("Output:")
print(json.dumps({
    "Intent": intent_result.primary_intent,
    "Secondary_Intents": intent_result.secondary_intents,
    "Confidence": round(intent_result.confidence, 2)
}, indent=2))

In [None]:
# Combined Sentiment + Intent Analysis (as per assignment requirements)
print("=" * 50)
print("COMBINED SENTIMENT & INTENT ANALYSIS")
print("=" * 50)

sample_statements = [
    "I'm a bit worried about my back pain, but I hope it gets better soon.",
    "So, I don't need to worry about this affecting me in the future?",
    "That's a relief!",
    "My neck and back pain were really bad—I had trouble sleeping.",
    "Thank you, doctor. I appreciate it."
]

for statement in sample_statements:
    sentiment = sentiment_analyzer.analyze(statement)
    intent = intent_detector.detect(statement)
    
    print(f"\nInput: \"{statement}\"")
    print(json.dumps({
        "Sentiment": sentiment.sentiment,
        "Intent": intent.primary_intent
    }, indent=2))

---
# 6. SOAP Note Generation (Bonus)

Generate a structured SOAP note from the transcript:
- **S**ubjective: Patient-reported information
- **O**bjective: Physical examination findings
- **A**ssessment: Diagnosis and severity
- **P**lan: Treatment and follow-up

In [None]:
# Initialize SOAP generator
soap_generator = SOAPNoteGenerator()

# Generate SOAP note
soap_note = soap_generator.generate(SAMPLE_TRANSCRIPT)

print("=" * 50)
print("GENERATED SOAP NOTE")
print("=" * 50)
print(json.dumps(soap_note.to_dict(), indent=2))

In [None]:
# Formatted SOAP note
print("\nFORMATTED SOAP NOTE")
print(soap_generator.generate_formatted(SAMPLE_TRANSCRIPT))

---
# Complete Pipeline Demo

Run the complete analysis pipeline on a shorter sample transcript:

In [None]:
# Sample input from assignment
SHORT_TRANSCRIPT = """
Doctor: How are you feeling today?
Patient: I had a car accident. My neck and back hurt a lot for four weeks.
Doctor: Did you receive treatment?
Patient: Yes, I had ten physiotherapy sessions, and now I only have occasional back pain.
"""

print("INPUT TRANSCRIPT:")
print(SHORT_TRANSCRIPT)
print("\n" + "=" * 60)
print("ANALYSIS RESULTS")
print("=" * 60)

# Run all analyses
entities = ner_extractor.extract_entities(SHORT_TRANSCRIPT)
keywords = keyword_extractor.extract_keywords(SHORT_TRANSCRIPT)
sentiment = sentiment_analyzer.get_overall_sentiment(SHORT_TRANSCRIPT)
soap = soap_generator.generate(SHORT_TRANSCRIPT)

print("\n1. STRUCTURED SUMMARY:")
print(json.dumps(entities.to_dict(), indent=2))

print("\n2. KEYWORDS:")
print(json.dumps(keywords['medical_phrases'], indent=2))

print("\n3. SENTIMENT:")
print(json.dumps(sentiment.to_dict(), indent=2))

print("\n4. SOAP NOTE:")
print(json.dumps(soap.to_dict(), indent=2))

---
# Theory Questions & Answers

### Q1: How would you handle ambiguous or missing medical data in the transcript?

**Answer:**
1. **Confidence scores** - Flag uncertain extractions with low confidence
2. **Default values** - Use "Unknown" or "Not mentioned" for missing fields
3. **Contextual inference** - Use surrounding text to disambiguate
4. **Fallback models** - Use general NLP when medical terms aren't recognized
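
The first two points can be sketched together. This assumes each extraction arrives as a `(value, confidence)` pair, which is an assumption about the data shape, not the API of the `src` modules. The threshold, the example confidences, and `finalize_fields` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MISSING = "Not mentioned"   # default value for fields with no extraction
REVIEW_THRESHOLD = 0.6      # hypothetical cutoff; would be tuned on validation data


def finalize_fields(
    raw: Dict[str, Optional[Tuple[str, float]]],
    required: List[str],
) -> Dict[str, Dict[str, object]]:
    """Fill missing fields with a default and flag low-confidence values for review."""
    report: Dict[str, Dict[str, object]] = {}
    for field in required:
        extraction = raw.get(field)
        if extraction is None:
            report[field] = {"value": MISSING, "confidence": 0.0, "needs_review": True}
            continue
        value, confidence = extraction
        report[field] = {
            "value": value,
            "confidence": confidence,
            "needs_review": confidence < REVIEW_THRESHOLD,
        }
    return report


# Illustrative inputs; the confidence numbers are made up.
raw = {"Diagnosis": ("Whiplash injury", 0.92), "Treatment": ("Physiotherapy", 0.45)}
report = finalize_fields(raw, ["Diagnosis", "Treatment", "Prognosis"])
for field, entry in report.items():
    print(field, entry)
```

Keeping the flag separate from the value means downstream consumers can still render a complete report while a clinician reviews only the flagged fields.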

### Q2: What pre-trained NLP models would you use for medical summarization?

**Answer:**
- **scispaCy** (`en_core_sci_lg`, `en_core_sci_scibert`) - Biomedical text processing and entity recognition
- **BioBERT** - Biomedical domain pre-training
- **ClinicalBERT** - Clinical notes pre-training
- **PubMedBERT** - PubMed corpus pre-training
- **BART/T5** fine-tuned on clinical datasets

### Q3: How would you fine-tune BERT for medical sentiment detection?

**Answer:**
1. Collect medical conversation datasets with sentiment labels
2. Add classification head on BERT pooled output
3. Use domain-adaptive pre-training on medical text
4. Apply focal loss for class imbalance
5. Augment the training data (back-translation, synonym replacement)
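
To make the focal-loss step concrete, here is a single-example version in plain Python, assuming the standard form FL = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. In a real fine-tuning run this would be written against the training framework's tensors; the function names here are illustrative.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(
    logits: List[float],
    target: int,
    gamma: float = 2.0,
    alpha: Optional[List[float]] = None,
) -> float:
    """Focal loss for one example; gamma=0 and alpha=None reduce it to cross-entropy."""
    p_t = softmax(logits)[target]
    weight = alpha[target] if alpha is not None else 1.0
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Classes: 0 = Anxious, 1 = Neutral, 2 = Reassured (true class is 0 in both cases).
easy = focal_loss([4.0, 0.0, 0.0], target=0)   # confident and correct
hard = focal_loss([0.0, 4.0, 0.0], target=0)   # confident and wrong
print(f"easy example: {easy:.6f}, hard example: {hard:.6f}")
```

The `(1 - p_t)^gamma` factor shrinks the contribution of examples the model already classifies well, so training signal concentrates on hard, often minority-class, examples.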

### Q4: Which datasets would you use for healthcare sentiment models?

**Answer:**
- **MIMIC-III** - De-identified clinical notes; it ships without sentiment labels, so these would need to be annotated
- **i2b2 datasets** - Annotated clinical text
- **MedQuAD** - Medical Q&A pairs
- **HealthCareMagic** - Doctor-patient conversations

### Q5: How would you train a model to map transcripts to SOAP format?

**Answer:**
1. Create training data with manual SOAP annotations
2. Use sequence labeling for section boundaries
3. Train section classifier for sentences
4. Slot-filling for each section
5. Fine-tune encoder-decoder for generation

### Q6: Which rule-based and deep-learning techniques would you use for SOAP note generation?

**Answer:**
- **Rule-based**: Speaker identification, keyword triggers, templates
- **Deep-learning**: BERT for classification, T5 for generation
- **Best approach**: Hybrid - rules for structure, ML for content
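
The rule-based half of the hybrid can be sketched as a speaker check followed by keyword triggers. This is an illustration, not the logic of `SOAPNoteGenerator`: the trigger phrases and `assign_section` are hypothetical, and a learned classifier would replace or refine the physician-side rules.

```python
from typing import Dict, List, Tuple

# Hypothetical trigger phrases, checked in this priority order.
PHYSICIAN_TRIGGERS: List[Tuple[str, Tuple[str, ...]]] = [
    ("Objective", ("range of movement", "tenderness")),
    ("Assessment", ("recovery", "long-term damage")),
    ("Plan", ("follow-up", "come back")),
]


def assign_section(speaker: str, utterance: str) -> str:
    """Patient turns go to Subjective; physician turns are routed by the first matching trigger."""
    if speaker == "Patient":
        return "Subjective"
    lowered = utterance.lower()
    for section, triggers in PHYSICIAN_TRIGGERS:
        if any(trigger in lowered for trigger in triggers):
            return section
    return "Unassigned"


turns = [
    ("Patient", "I do get occasional backaches."),
    ("Physician", "Your neck and back have a full range of movement, and there's no tenderness."),
    ("Physician", "I'd expect you to make a full recovery within six months of the accident."),
    ("Physician", "If anything changes, you can always come back for a follow-up."),
]

note: Dict[str, List[str]] = {}
for speaker, utterance in turns:
    note.setdefault(assign_section(speaker, utterance), []).append(utterance)

for section, sentences in note.items():
    print(f"{section}: {sentences}")
```

Turns that match no trigger fall into `Unassigned`, which is where a trained sentence classifier would add the most value.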

---
## Summary

This notebook demonstrated a complete NLP pipeline for medical transcript analysis:

| Component | Method | Output |
|-----------|--------|--------|
| NER | Pattern matching + optional spaCy | Symptoms, Diagnosis, Treatment, Prognosis |
| Summarization | Template-based + optional transformers | Structured JSON report |
| Keywords | TF-IDF + medical vocabulary + optional YAKE | Medical phrases list |
| Sentiment | Lexicon + optional DistilBERT | Anxious/Neutral/Reassured |
| Intent | Pattern + keyword matching | Patient intent categories |
| SOAP Notes | Speaker detection + templating | Clinical documentation |

All components work in rule-based mode by default, with optional transformer/spaCy integration for enhanced accuracy.