# Physician Notetaker - Medical Transcription & NLP Analysis

This notebook walks through the complete medical transcription analysis system, which includes:
1. Medical NLP Summarization (NER, Text Summarization, Keyword Extraction)
2. Sentiment & Intent Analysis
3. SOAP Note Generation (Bonus)

## Setup


In [None]:
import sys
import os
import json
from pathlib import Path

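# Add the src directory to the import path.
# This assumes the notebook runs from a folder directly under the project root.
project_root = Path().resolve().parent
sys.path.insert(0, str(project_root / 'src'))

# Load environment variables from a .env file, if one exists
from dotenv import load_dotenv
load_dotenv()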

# Import modules
from pipeline import PhysicianNotetakerPipeline

print("Imports successful!")
print(f"Project root: {project_root}")
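
The pipeline initialized later in this notebook reads its API key from the `GEMINI_API_KEY` environment variable, so it can help to confirm the variable is set right after `load_dotenv()` runs. The helper below is a minimal sketch; `missing_env_vars` is a hypothetical name and is not part of the project's modules.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Report the state of the key without printing its value.
missing = missing_env_vars(["GEMINI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("GEMINI_API_KEY is set.")
```

Passing `env` explicitly makes the check easy to test without touching the real environment.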


## Load Sample Transcript


In [None]:
# Load sample transcript
transcript_path = project_root / 'tests' / 'test_sample_transcript.txt'

with open(transcript_path, 'r', encoding='utf-8') as f:
    sample_transcript = f.read()

print("Sample Transcript:")
print("=" * 60)
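# Show a 500-character preview; add "..." only if the transcript was truncated
print(sample_transcript[:500] + ("..." if len(sample_transcript) > 500 else ""))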
print("=" * 60)
print(f"\nTotal length: {len(sample_transcript)} characters")


## Initialize Pipeline


In [None]:
# Initialize the pipeline
# The API key is read from the GEMINI_API_KEY environment variable
pipeline = PhysicianNotetakerPipeline()

print("Pipeline initialized successfully!")


## 1. Medical NLP Summarization

### 1.1 Named Entity Recognition (NER)


In [None]:
from medical_ner import MedicalNER

# Initialize NER module
ner = MedicalNER(gemini_client=pipeline.client)

# Extract medical entities
print("Extracting medical entities...")
entities = ner.extract_entities(sample_transcript)

print("\n" + "=" * 60)
print("MEDICAL ENTITIES EXTRACTED")
print("=" * 60)
print(json.dumps(entities, indent=2))
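
Because the entities come back from a language model, some fields may be missing or empty in a given run. The sketch below checks a result for the four categories named in this notebook's Summary (Symptoms, Diagnosis, Treatment, Prognosis). The exact key names returned by `extract_entities` are an assumption here, `missing_fields` is a hypothetical helper, and the example dictionary holds placeholder values rather than real output.

```python
# Assumed key names, based on the categories listed in the Summary section.
EXPECTED_FIELDS = ("Symptoms", "Diagnosis", "Treatment", "Prognosis")

def missing_fields(entities, expected=EXPECTED_FIELDS):
    """Return the expected fields that are absent or empty in the entities dict."""
    return [field for field in expected if not entities.get(field)]

# Placeholder values for illustration only, not output from the pipeline.
example_entities = {
    "Symptoms": ["placeholder symptom"],
    "Diagnosis": "placeholder diagnosis",
    "Treatment": [],
}
print(missing_fields(example_entities))
```

In the notebook, the same call would be `missing_fields(entities)`, provided the real keys match the assumed names.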


### 1.2 Keyword Extraction


In [None]:
# Extract important medical keywords
print("Extracting medical keywords...")
keywords = ner.extract_keywords(sample_transcript, top_n=15)

print("\n" + "=" * 60)
print("MEDICAL KEYWORDS")
print("=" * 60)
for i, keyword in enumerate(keywords, 1):
    print(f"{i}. {keyword}")


### 1.3 Structured Summary


In [None]:
# Get complete structured summary
print("Generating structured medical summary...")
structured_summary = ner.extract_structured_summary(sample_transcript)

print("\n" + "=" * 60)
print("STRUCTURED MEDICAL SUMMARY")
print("=" * 60)
print(json.dumps(structured_summary, indent=2))


### 1.4 Text Summarization


In [None]:
from summarization import MedicalSummarizer

# Initialize summarizer
summarizer = MedicalSummarizer(gemini_client=pipeline.client)

# Generate comprehensive summary
print("Generating medical text summary...")
medical_summary = summarizer.summarize(sample_transcript)

print("\n" + "=" * 60)
print("MEDICAL TEXT SUMMARY")
print("=" * 60)
print(json.dumps(medical_summary, indent=2))


## 2. Sentiment & Intent Analysis


In [None]:
from sentiment_analysis import SentimentAnalyzer

# Initialize sentiment analyzer
sentiment_analyzer = SentimentAnalyzer(gemini_client=pipeline.client)

# Analyze full transcript
print("Analyzing patient sentiment and intent...")
sentiment_analysis = sentiment_analyzer.analyze_full_transcript(sample_transcript)

print("\n" + "=" * 60)
print("SENTIMENT & INTENT ANALYSIS")
print("=" * 60)
print(json.dumps(sentiment_analysis, indent=2))


### Example: Analyzing Individual Patient Statements


In [None]:
# Example patient statements
example_statements = [
    "I'm doing better, but I still have some discomfort now and then.",
    "That's a relief!",
    "That's great to hear. So, I don't need to worry about this affecting me in the future?"
]

print("Analyzing individual patient statements:")
print("=" * 60)

for i, statement in enumerate(example_statements, 1):
    result = sentiment_analyzer.analyze_sentiment(statement)
    print(f"\nStatement {i}: {statement}")
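    # Use .get() so a missing key in the model's response does not raise KeyError
    print(f"  Sentiment: {result.get('Sentiment', 'n/a')}")
    print(f"  Intent: {result.get('Intent', 'n/a')}")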
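
The statements above are typed in by hand. To pull patient statements out of a transcript instead, something like the following sketch could work. It assumes each utterance sits on its own line in the form `Patient: ...`, which may not match the layout of the sample transcript file. `patient_statements` is a hypothetical helper, and the doctor's line in the demo text is made up for illustration.

```python
def patient_statements(transcript, speaker="Patient"):
    """Collect utterances from lines of the form '<speaker>: text'."""
    statements = []
    for line in transcript.splitlines():
        # Split on the first colon only, so colons inside the utterance are kept.
        label, sep, text = line.partition(":")
        if sep and label.strip().lower() == speaker.lower() and text.strip():
            statements.append(text.strip())
    return statements

# Small demo transcript in the assumed "Speaker: text" layout.
demo_transcript = (
    "Doctor: How are you feeling today?\n"
    "Patient: I'm doing better, but I still have some discomfort now and then.\n"
    "\n"
    "Patient: That's a relief!\n"
)
for statement in patient_statements(demo_transcript):
    print(statement)
```

Each returned string can then be passed to `sentiment_analyzer.analyze_sentiment(...)` in the same loop as the cell above.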


## 3. SOAP Note Generation (Bonus)


In [None]:
from soap_generator import SOAPGenerator

# Initialize SOAP generator
soap_generator = SOAPGenerator(gemini_client=pipeline.client)

# Generate SOAP note
print("Generating SOAP note...")
soap_note = soap_generator.generate_soap_note(sample_transcript)

print("\n" + "=" * 60)
print("SOAP NOTE (JSON Format)")
print("=" * 60)
print(json.dumps(soap_note, indent=2))


### SOAP Note in Text Format


In [None]:
# Display SOAP note in readable text format
soap_text = soap_generator.format_soap_note(soap_note, format_type="text")
print(soap_text)
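
For readers curious what a text rendering of a SOAP note might involve, here is a minimal sketch. It is not the implementation of `format_soap_note`. It assumes the note is a dictionary keyed by the four SOAP sections (Subjective, Objective, Assessment, Plan), and `render_soap_text` is a hypothetical name.

```python
SOAP_SECTIONS = ("Subjective", "Objective", "Assessment", "Plan")

def render_soap_text(note, sections=SOAP_SECTIONS):
    """Render a SOAP dict as indented plain text, one block per section."""
    lines = []
    for section in sections:
        lines.append(section.upper())
        content = note.get(section)
        if content is None:
            lines.append("  (not provided)")
        elif isinstance(content, dict):
            # Nested fields are printed one per line under the section header.
            for key, value in content.items():
                lines.append(f"  {key}: {value}")
        else:
            lines.append(f"  {content}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip()

# Placeholder note for illustration only.
demo_note = {"Subjective": {"Chief complaint": "placeholder"}, "Plan": "placeholder plan"}
print(render_soap_text(demo_note))
```

Sections missing from the dictionary are marked explicitly rather than skipped, so a gap in the model's output stays visible in the text version.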


## Complete Pipeline Processing

Process the entire transcript through all modules at once:


In [None]:
# Process complete transcript
print("Processing complete transcript through pipeline...")
print("This may take a few moments...\n")

results = pipeline.process_transcript(sample_transcript, include_soap=True)

print("=" * 60)
print("COMPLETE ANALYSIS RESULTS")
print("=" * 60)
print(json.dumps(results, indent=2))
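
Since the full run makes several API calls, it can be convenient to save the results to disk so they can be reloaded without running the pipeline again. The sketch below uses only the standard library. `save_results` and the output path are hypothetical, and the demo writes a placeholder dictionary to a temporary directory rather than real pipeline output.

```python
import json
import tempfile
from pathlib import Path

def save_results(results, path):
    """Write a JSON-serializable results dict to path, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(results, f, indent=2, ensure_ascii=False)
    return path

# Round-trip a placeholder dict through a temporary directory.
demo_results = {"summary": "placeholder", "keywords": ["placeholder"]}
with tempfile.TemporaryDirectory() as tmp:
    out_path = save_results(demo_results, Path(tmp) / "outputs" / "results.json")
    loaded = json.loads(out_path.read_text(encoding="utf-8"))
print(loaded == demo_results)
```

In the notebook, a call such as `save_results(results, project_root / "outputs" / "results.json")` would persist the analysis above, assuming `results` contains only JSON-serializable values.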


### Export Results in Text Format


In [None]:
# Export results in human-readable text format
text_output = pipeline.export_results(results, format_type="text")
print(text_output)


## Summary

This notebook demonstrates:
- Medical named entity recognition (Symptoms, Diagnosis, Treatment, Prognosis)
- Text summarization with structured output
- Keyword extraction from medical transcripts
- Sentiment analysis (Anxious/Neutral/Reassured)
- Intent detection (Seeking reassurance, Reporting symptoms, etc.)
- SOAP note generation (Subjective, Objective, Assessment, Plan)

All modules use Google's Gemini 2.5 Flash model, accessed through the Gemini API, for medical NLP processing.
