# Physician Notetaker - Medical Transcription & NLP Analysis

This notebook demonstrates the complete medical transcription analysis system, which includes:
1. Medical NLP Summarization (NER, Text Summarization, Keyword Extraction)
2. Sentiment & Intent Analysis
3. SOAP Note Generation (Bonus)

## Setup


In [1]:
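import json
import os
import sys
from pathlib import Path

from dotenv import load_dotenv

# Add the project root (not src/) to sys.path so that `src` is importable as a package.
# Path().resolve() is the current working directory, so this assumes the notebook
# runs from a direct subdirectory of the project root.
project_root = Path().resolve().parent
sys.path.insert(0, str(project_root))

# Load environment variables (e.g. GEMINI_API_KEY) from a .env file, if one exists
load_dotenv()

# Import the pipeline from the src package (this needs the sys.path change above)
from src.pipeline import PhysicianNotetakerPipeline

print("Imports successful!")
print(f"Project root: {project_root}")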


Imports successful!
Project root: /home/imtiyaz/Desktop/PhysicianNotetaker
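
Since the pipeline reads its key from the `GEMINI_API_KEY` environment variable, it can help to check for it right after `load_dotenv()` so that a missing key produces a clear message. The following is a minimal sketch; `require_env` is a hypothetical helper and not part of the project.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value


# Report the configuration state without stopping the notebook.
try:
    require_env("GEMINI_API_KEY")
    print("GEMINI_API_KEY found.")
except RuntimeError as err:
    print(f"Configuration problem: {err}")
```

The helper raises rather than returning `None`, so a misconfiguration surfaces here rather than later inside an API call.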


## Load Sample Transcript


In [2]:
# Load sample transcript
transcript_path = project_root / 'tests' / 'test_sample_transcript.txt'

with open(transcript_path, 'r', encoding='utf-8') as f:
    sample_transcript = f.read()

print("Sample Transcript:")
print("=" * 60)
print(sample_transcript[:500] + "...")
print("=" * 60)
print(f"\nTotal length: {len(sample_transcript)} characters")


Sample Transcript:
Physician: Good morning, Ms. Jones. How are you feeling today?

Patient: Good morning, doctor. I'm doing better, but I still have some discomfort now and then.

Physician: I understand you were in a car accident last September. Can you walk me through what happened?

Patient: Yes, it was on September 1st, around 12:30 in the afternoon. I was driving from Cheadle Hulme to Manchester when I had to stop in traffic. Out of nowhere, another car hit me from behind, which pushed my car into the one in ...

Total length: 3139 characters
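
The transcript marks each turn with a `Physician:` or `Patient:` prefix. A rough sketch of splitting it into speaker turns is shown below, which is useful when only the patient's statements are of interest. `split_turns` is a hypothetical helper, and it assumes every turn starts on its own line with one of those two prefixes.

```python
import re
from typing import List, Tuple

# Matches lines such as "Patient: I'm doing better."
SPEAKER_LINE = re.compile(r"^(Physician|Patient):\s*(.*)$")


def split_turns(transcript: str) -> List[Tuple[str, str]]:
    """Split a transcript into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            # A line without a prefix continues the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}")
    return turns


# The first two turns of the sample transcript shown above.
demo = (
    "Physician: Good morning, Ms. Jones. How are you feeling today?\n"
    "\n"
    "Patient: Good morning, doctor. I'm doing better, but I still have "
    "some discomfort now and then.\n"
)
patient_statements = [text for speaker, text in split_turns(demo) if speaker == "Patient"]
print(patient_statements)
```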


## Initialize Pipeline


In [3]:
# Initialize the pipeline.
# The API key is read from the GEMINI_API_KEY environment variable.
pipeline = PhysicianNotetakerPipeline()

print("Pipeline initialized successfully!")


Pipeline initialized successfully!


## 1. Medical NLP Summarization

### 1.1 Named Entity Recognition (NER)


In [4]:
from src.medical_ner import MedicalNER

# Initialize NER module
ner = MedicalNER(gemini_client=pipeline.client)

# Extract medical entities
print("Extracting medical entities...")
entities = ner.extract_entities(sample_transcript)

print("\n" + "=" * 60)
print("MEDICAL ENTITIES EXTRACTED")
print("=" * 60)
print(json.dumps(entities, indent=2))


Extracting medical entities...

MEDICAL ENTITIES EXTRACTED
{
  "Patient_Name": "Ms. Jones",
  "Symptoms": [
    "discomfort",
    "pain in neck",
    "pain in back",
    "trouble sleeping",
    "stiffness",
    "occasional backaches"
  ],
  "Diagnosis": "Whiplash injury",
  "Treatment": [
    "medical advice",
    "painkillers",
    "physiotherapy"
  ],
  "Current_Status": "Patient is doing better with occasional discomfort and backaches. Neck and back have full range of movement, no tenderness or signs of lasting damage. Muscles and spine are in good condition. No anxiety while driving, difficulty concentrating, or emotional issues. Back to usual routine with no significant impact on daily life or work.",
  "Prognosis": "Expected to make a full recovery within six months of the accident. No signs of long-term damage or degeneration, and no foreseen long-term impact on work or daily life."
}
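
Because the entities come from a language model, the returned dictionary may not always have the same shape. The sketch below checks a result against the fields and types seen in the output above. `EXPECTED_FIELDS` and `find_schema_problems` are hypothetical names, and the field list is inferred from this one run rather than from a documented schema.

```python
# Field names and types as observed in the output above (an assumption,
# not a documented schema of MedicalNER).
EXPECTED_FIELDS = {
    "Patient_Name": str,
    "Symptoms": list,
    "Diagnosis": str,
    "Treatment": list,
    "Current_Status": str,
    "Prognosis": str,
}


def find_schema_problems(entities: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in entities:
            problems.append(f"missing field: {field}")
        elif not isinstance(entities[field], expected_type):
            actual = type(entities[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


# A deliberately incomplete example: Symptoms has the wrong type and
# several fields are absent.
partial = {"Patient_Name": "Ms. Jones", "Symptoms": "discomfort"}
for problem in find_schema_problems(partial):
    print(problem)
```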


### 1.2 Keyword Extraction


In [5]:
# Extract important medical keywords
print("Extracting medical keywords...")
keywords = ner.extract_keywords(sample_transcript, top_n=15)

print("\n" + "=" * 60)
print("MEDICAL KEYWORDS")
print("=" * 60)
for i, keyword in enumerate(keywords, 1):
    print(f"{i}. {keyword}")


Extracting medical keywords...

MEDICAL KEYWORDS
1. Physiotherapy
2. Back pain
3. injury
4. discomfort
5. Long-term damage
6. pain
7. mobility
8. movement
9. Stiffness
10. Whiplash injury
11. follow-up
12. Neck pain
13. physiotherapy
14. Full recovery
15. examination
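
The list above contains both "Physiotherapy" and "physiotherapy", so the 15 slots hold only 14 distinct terms. A minimal sketch of case-insensitive deduplication that keeps the first spelling seen follows; `dedupe_keywords` is a hypothetical helper, not part of `MedicalNER`.

```python
def dedupe_keywords(keywords):
    """Drop case-insensitive duplicates, keeping first occurrences in order."""
    seen = set()
    unique = []
    for keyword in keywords:
        key = keyword.strip().casefold()
        if key not in seen:
            seen.add(key)
            unique.append(keyword)
    return unique


sample = ["Physiotherapy", "Back pain", "injury", "physiotherapy"]
print(dedupe_keywords(sample))  # ['Physiotherapy', 'Back pain', 'injury']
```

This only merges exact matches after case folding; overlapping terms such as "pain" and "Back pain" are kept as separate keywords.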


### 1.3 Structured Summary


In [6]:
# Get complete structured summary
print("Generating structured medical summary...")
structured_summary = ner.extract_structured_summary(sample_transcript)

print("\n" + "=" * 60)
print("STRUCTURED MEDICAL SUMMARY")
print("=" * 60)
print(json.dumps(structured_summary, indent=2))


Generating structured medical summary...

STRUCTURED MEDICAL SUMMARY
{
  "Patient_Name": "Ms. Jones",
  "Symptoms": [
    "discomfort",
    "pain in neck",
    "pain in back",
    "trouble sleeping",
    "stiffness",
    "occasional backaches"
  ],
  "Diagnosis": "whiplash injury",
  "Treatment": [
    "painkillers",
    "physiotherapy",
    "physical examination",
    "advice"
  ],
  "Current_Status": "Patient is doing better, experiencing occasional backaches and some discomfort, but not constantly. She reports no anxiety while driving, difficulty concentrating, or emotional issues. Her neck and back have a full range of movement, with no tenderness or signs of lasting damage. Muscles and spine are in good condition, and she is back to her usual routine.",
  "Prognosis": "Expected full recovery within six months of the accident, with no signs of long-term damage, degeneration, or impact on work or daily life.",
  "Keywords": [
    "injury",
    "discomfort",
    "pain",
    "mobility

### 1.4 Text Summarization


In [7]:
from src.summarization import MedicalSummarizer

# Initialize summarizer
summarizer = MedicalSummarizer(gemini_client=pipeline.client)

# Generate comprehensive summary
print("Generating medical text summary...")
medical_summary = summarizer.summarize(sample_transcript)

print("\n" + "=" * 60)
print("MEDICAL TEXT SUMMARY")
print("=" * 60)
print(json.dumps(medical_summary, indent=2))


Generating medical text summary...

MEDICAL TEXT SUMMARY
{
  "Patient_Demographics": {
    "Name": "Ms. Jones",
    "Age": null,
    "Gender": "Female"
  },
  "Chief_Complaint": "Persistent, occasional discomfort in back following a car accident.",
  "History_of_Present_Illness": "Ms. Jones was involved in a car accident on September 1st, around 12:30 PM, while driving from Cheadle Hulme to Manchester. She had stopped in traffic when her car was rear-ended, pushing it into the vehicle in front. She was wearing her seatbelt. Immediately after the impact, she experienced shock, hit her head on the steering wheel, and felt pain in her neck and back.",
  "Symptoms": {
    "Primary": [
      "neck pain",
      "back pain",
      "discomfort",
      "stiffness"
    ],
    "Secondary": [
      "trouble sleeping",
      "occasional backaches"
    ],
    "Timeline": "Immediately post-accident: shock, head impact, acute neck and back pain. First four weeks: severe neck and back pain, trouble sle

## 2. Sentiment & Intent Analysis


In [None]:
from src.sentiment_analysis import SentimentAnalyzer

# Initialize sentiment analyzer
sentiment_analyzer = SentimentAnalyzer(gemini_client=pipeline.client)

# Analyze full transcript
print("Analyzing patient sentiment and intent...")
sentiment_analysis = sentiment_analyzer.analyze_full_transcript(sample_transcript)

print("\n" + "=" * 60)
print("SENTIMENT & INTENT ANALYSIS")
print("=" * 60)
print(json.dumps(sentiment_analysis, indent=2))


Analyzing patient sentiment and intent...


### Example: Analyzing Individual Patient Statements


In [None]:
# Example patient statements
example_statements = [
    "I'm doing better, but I still have some discomfort now and then.",
    "That's a relief!",
    "That's great to hear. So, I don't need to worry about this affecting me in the future?"
]

print("Analyzing individual patient statements:")
print("=" * 60)

for i, statement in enumerate(example_statements, 1):
    result = sentiment_analyzer.analyze_sentiment(statement)
    print(f"\nStatement {i}: {statement}")
    print(f"  Sentiment: {result['Sentiment']}")
    print(f"  Intent: {result['Intent']}")
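
When many statements are analyzed, a tally of the labels gives a quick overview. The sketch below assumes each result is a dict with a `Sentiment` key, as in the loop above. `tally_sentiments` is a hypothetical helper, and the placeholder results are hand-written for illustration, not output from the model.

```python
from collections import Counter


def tally_sentiments(results):
    """Count how often each sentiment label appears in a list of result dicts."""
    # Results without a "Sentiment" key are counted under "Unknown".
    return Counter(result.get("Sentiment", "Unknown") for result in results)


# Hand-written placeholder results for illustration only (not model output).
placeholder_results = [
    {"Sentiment": "Anxious", "Intent": "Seeking reassurance"},
    {"Sentiment": "Reassured", "Intent": "Reporting symptoms"},
    {"Sentiment": "Reassured"},
    {"Intent": "Reporting symptoms"},
]

counts = tally_sentiments(placeholder_results)
for label, count in counts.most_common():
    print(f"{label}: {count}")
```

Using `.get` with a default keeps the tally from raising `KeyError` if a response is missing the expected key.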


## 3. SOAP Note Generation (Bonus)


In [None]:
from src.soap_generator import SOAPGenerator

# Initialize SOAP generator
soap_generator = SOAPGenerator(gemini_client=pipeline.client)

# Generate SOAP note
print("Generating SOAP note...")
soap_note = soap_generator.generate_soap_note(sample_transcript)

print("\n" + "=" * 60)
print("SOAP NOTE (JSON Format)")
print("=" * 60)
print(json.dumps(soap_note, indent=2))


### SOAP Note in Text Format


In [None]:
# Display SOAP note in readable text format
soap_text = soap_generator.format_soap_note(soap_note, format_type="text")
print(soap_text)
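
To illustrate what a text rendering of a SOAP note can look like, here is a rough sketch of a formatter. It is not the project's `format_soap_note` implementation. It assumes the note is a dict keyed by the four SOAP section names, and `render_soap_text` and the field names inside each section are hypothetical.

```python
SOAP_SECTIONS = ("Subjective", "Objective", "Assessment", "Plan")


def render_soap_text(note: dict) -> str:
    """Render a SOAP note dict as indented plain text, one block per section."""
    lines = []
    for section in SOAP_SECTIONS:
        lines.append(section.upper())
        content = note.get(section, {})
        if isinstance(content, dict):
            # One "Field name: value" line per entry in the section.
            for key, value in content.items():
                lines.append(f"  {key.replace('_', ' ')}: {value}")
        else:
            lines.append(f"  {content}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip()


# Placeholder note; the nested field names are assumptions for illustration.
example_note = {
    "Subjective": {"Chief_Complaint": "Occasional backaches"},
    "Assessment": {"Diagnosis": "Whiplash injury"},
    "Plan": "Continue physiotherapy as needed",
}
print(render_soap_text(example_note))
```

Sections missing from the input still get a heading, which makes gaps in the generated note easy to spot.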


## Complete Pipeline Processing

Process the entire transcript through all modules at once:


In [None]:
# Process complete transcript
print("Processing complete transcript through pipeline...")
print("This may take a few moments...\n")

results = pipeline.process_transcript(sample_transcript, include_soap=True)

print("=" * 60)
print("COMPLETE ANALYSIS RESULTS")
print("=" * 60)
print(json.dumps(results, indent=2))


### Export Results in Text Format


In [None]:
# Export results in human-readable text format
text_output = pipeline.export_results(results, format_type="text")
print(text_output)
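
The results can also be written to disk for later review. The sketch below uses only the standard library and assumes the results are JSON-serializable, which the `json.dumps` call above already requires. `save_results` and the output path are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_results(results: dict, path: Path) -> Path:
    """Write results as pretty-printed UTF-8 JSON, creating parent folders."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(results, f, indent=2, ensure_ascii=False)
    return path


# Round-trip a small placeholder dict through a temporary directory.
demo_results = {"Patient_Name": "Ms. Jones", "Diagnosis": "Whiplash injury"}
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_results(demo_results, Path(tmp_dir) / "output" / "results.json")
    loaded = json.loads(saved_path.read_text(encoding="utf-8"))

print(loaded == demo_results)
```

In the notebook, the same helper could be called with `results` and a path under `project_root`.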


## Summary

This notebook demonstrates:
- Medical named entity recognition (Symptoms, Diagnosis, Treatment, Prognosis)
- Text summarization with structured output
- Keyword extraction from medical transcripts
- Sentiment analysis (Anxious/Neutral/Reassured)
- Intent detection (Seeking reassurance, Reporting symptoms, etc.)
- SOAP note generation (Subjective, Objective, Assessment, Plan)

All modules use Google's Gemini 2.5 Flash model, accessed through the Gemini API, for medical NLP processing.
