# **Physician Notetaker: Medical NLP Pipeline**



An AI-powered system for medical transcription, summarization, sentiment analysis, and SOAP note generation. It processes physician-patient conversations to extract key details, analyze sentiment, and generate structured reports.


## Features  

### Medical NLP Summarization  
- Extracts **Symptoms, Diagnosis, Treatment, Prognosis**.  
- Generates structured **medical reports in JSON**.  

### Sentiment & Intent Analysis  
- Classifies sentiment (**Anxious, Neutral, Reassured**).  
- Detects intent (e.g., **"Seeking reassurance"**).  

### SOAP Note Generation  
- Converts transcripts into structured **SOAP notes**.  



## 1. Medical NLP Summarization

```python
import spacy
from spacy.matcher import PhraseMatcher
from collections import defaultdict

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Define medical terms for rule-based matching.
# "backaches" is listed explicitly: the matcher compares lowercased tokens
# exactly, so the singular "backache" does not match the plural in the transcript.
symptom_terms = ["pain", "discomfort", "backache", "backaches", "neck pain", "back pain", "headache"]
diagnosis_terms = ["whiplash injury"]
treatment_terms = ["physiotherapy", "painkillers", "X-rays"]
prognosis_terms = ["full recovery", "no long-term damage", "no signs of degeneration"]

# Create a PhraseMatcher object (case-insensitive via the LOWER attribute)
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

# Add patterns to the matcher
matcher.add("SYMPTOM", [nlp.make_doc(term) for term in symptom_terms])
matcher.add("DIAGNOSIS", [nlp.make_doc(term) for term in diagnosis_terms])
matcher.add("TREATMENT", [nlp.make_doc(term) for term in treatment_terms])
matcher.add("PROGNOSIS", [nlp.make_doc(term) for term in prognosis_terms])

# Sample transcript
transcript = """
Physician: Good morning, Ms. Jones. How are you feeling today?
Patient: Good morning, doctor. I’m doing better, but I still have some discomfort now and then.
Physician: I understand you were in a car accident last September. Can you walk me through what happened?
Patient: Yes, it was on September 1st, around 12:30 in the afternoon. I was driving from Cheadle Hulme to Manchester when I had to stop in traffic. Out of nowhere, another car hit me from behind, which pushed my car into the one in front.
Physician: That sounds like a strong impact. Were you wearing your seatbelt?
Patient: Yes, I always do.
Physician: What did you feel immediately after the accident?
Patient: At first, I was just shocked. But then I realized I had hit my head on the steering wheel, and I could feel pain in my neck and back almost right away.
Physician: Did you seek medical attention at that time?
Patient: Yes, I went to Moss Bank Accident and Emergency. They checked me over and said it was a whiplash injury, but they didn’t do any X-rays. They just gave me some advice and sent me home.
Physician: How did things progress after that?
Patient: The first four weeks were rough. My neck and back pain were really bad—I had trouble sleeping and had to take painkillers regularly. It started improving after that, but I had to go through ten sessions of physiotherapy to help with the stiffness and discomfort.
Physician: That makes sense. Are you still experiencing pain now?
Patient: It’s not constant, but I do get occasional backaches. It’s nothing like before, though.
Physician: That’s good to hear. Have you noticed any other effects, like anxiety while driving or difficulty concentrating?
Patient: No, nothing like that. I don’t feel nervous driving, and I haven’t had any emotional issues from the accident.
Physician: And how has this impacted your daily life? Work, hobbies, anything like that?
Patient: I had to take a week off work, but after that, I was back to my usual routine. It hasn’t really stopped me from doing anything.
Physician: That’s encouraging. Let’s go ahead and do a physical examination to check your mobility and any lingering pain.
[Physical Examination Conducted]
Physician: Everything looks good. Your neck and back have a full range of movement, and there’s no tenderness or signs of lasting damage. Your muscles and spine seem to be in good condition.
Patient: That’s a relief!
Physician: Yes, your recovery so far has been quite positive. Given your progress, I’d expect you to make a full recovery within six months of the accident. There are no signs of long-term damage or degeneration.
Patient: That’s great to hear. So, I don’t need to worry about this affecting me in the future?
Physician: That’s right. I don’t foresee any long-term impact on your work or daily life. If anything changes or you experience worsening symptoms, you can always come back for a follow-up. But at this point, you’re on track for a full recovery.
Patient: Thank you, doctor. I appreciate it.
Physician: You’re very welcome, Ms. Jones. Take care, and don’t hesitate to reach out if you need anything.
"""

# Process the transcript with spaCy
doc = nlp(transcript)

# Extract entities using the PhraseMatcher
entity_dict = defaultdict(set)

for match_id, start, end in matcher(doc):
    label = nlp.vocab.strings[match_id]  # Get the matched label (SYMPTOM, DIAGNOSIS, etc.)
    entity_dict[label].add(doc[start:end].text)

# True if any matched symptom is "backache" or "backaches"
has_backache = any(s.lower().startswith("backache") for s in entity_dict["SYMPTOM"])

# Convert sets to sorted lists for JSON compatibility and a stable order.
# Note: phrase matching ignores negation, so "X-rays" is listed as a treatment
# even though the transcript says none were taken.
medical_report = {
    "Patient_Name": "Janet Jones",  # hard-coded; the transcript only gives the surname
    "Symptoms": sorted(entity_dict["SYMPTOM"]),
    "Diagnosis": sorted(entity_dict["DIAGNOSIS"]),
    "Treatment": sorted(entity_dict["TREATMENT"]),
    "Current_Status": "Occasional backache" if has_backache else "Improved condition",
    "Prognosis": sorted(entity_dict["PROGNOSIS"])
}

print("Medical Report:")
print(medical_report)
```

```text
Medical Report:
{'Patient_Name': 'Janet Jones', 'Symptoms': ['back pain', 'backaches', 'discomfort', 'pain'], 'Diagnosis': ['whiplash injury'], 'Treatment': ['X-rays', 'painkillers', 'physiotherapy'], 'Current_Status': 'Occasional backache', 'Prognosis': ['full recovery']}
```


**📍 Questions:**

- How would you handle **ambiguous or missing medical data** in the transcript?
- What **pre-trained NLP models** would you use for medical summarization?

### **Handling ambiguous or missing medical data:**  
- Use Named Entity Recognition (NER) models trained on medical datasets (like scispaCy or Med7).  
- Implement rule-based logic to infer missing details based on context (e.g., if symptoms are mentioned without a diagnosis, flag for manual review).  
- Use GPT-based models to summarize conversations and suggest likely missing details, marking each suggestion as unverified so a clinician can confirm it.  
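
The rule-based check in the second bullet could look roughly like the sketch below. It assumes a report dictionary shaped like `medical_report` from the summarization step; the names `REQUIRED_FIELDS` and `flag_missing_fields` are hypothetical and not part of the pipeline.

```python
# Hypothetical helper: the field names mirror the medical_report dict.
REQUIRED_FIELDS = ("Symptoms", "Diagnosis", "Treatment", "Prognosis")


def flag_missing_fields(report):
    """List empty or absent fields and say whether a clinician should review."""
    missing = [field for field in REQUIRED_FIELDS if not report.get(field)]
    reasons = [f"{field} not found in transcript" for field in missing]
    # Rule from the text: symptoms mentioned without a diagnosis.
    if report.get("Symptoms") and not report.get("Diagnosis"):
        reasons.append("Symptoms reported without a diagnosis")
    return {
        "missing_fields": missing,
        "needs_manual_review": bool(reasons),
        "reasons": reasons,
    }


example = {"Symptoms": ["back pain"], "Diagnosis": [], "Treatment": ["physiotherapy"]}
result = flag_missing_fields(example)
print(result["missing_fields"])       # ['Diagnosis', 'Prognosis']
print(result["needs_manual_review"])  # True
```

Returning reasons instead of filling the gaps keeps the extraction step honest: nothing is inferred silently, and the reviewer sees why a report was flagged.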

### **Pre-trained NLP models for medical summarization:**  
- **scispaCy** – for biomedical NER.  
- **BioBERT** – for medical text processing.  
- **ClinicalBERT** – BERT further pre-trained on clinical notes from MIMIC-III.  
- **MedGPT** or a similar GPT-style model adapted to medical text – for generating structured medical summaries.

## 2. Sentiment & Intent Analysis

```python
from transformers import pipeline

# Load a DistilBERT model fine-tuned on SST-2 (binary labels: POSITIVE / NEGATIVE)
sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Define intent classification rules (checked in order; the first match wins)
intent_keywords = {
    "Seeking reassurance": ["worried", "hope", "relief", "concerned"],
    "Reporting symptoms": ["pain", "discomfort", "ache", "injury"],
    "Expressing concern": ["afraid", "anxious", "worried"]
}

# Keywords that signal anxiety regardless of the model's label
anxiety_keywords = ["worried", "concerned", "afraid", "anxious"]

# Custom sentiment mapping. The model has no neutral label, so low-confidence
# predictions are treated as "Neutral". The 0.75 cut-off is an assumed starting
# point, not a tuned value.
def map_sentiment(label, score, text, neutral_threshold=0.75):
    if any(word in text.lower() for word in anxiety_keywords):
        return "Anxious"
    if score < neutral_threshold:
        return "Neutral"
    return "Anxious" if label == "NEGATIVE" else "Reassured"

# Classify intent based on keywords. This is substring matching, so "ache"
# also matches "backache" (and "pain" matches "painkillers").
def classify_intent(text):
    lowered = text.lower()
    for intent, keywords in intent_keywords.items():
        if any(word in lowered for word in keywords):
            return intent
    return "General statement"

# Final function to process patient dialogue
def analyze_patient_dialogue(text):
    # Get sentiment
    sentiment_result = sentiment_analyzer(text)[0]
    sentiment_label = map_sentiment(sentiment_result["label"], sentiment_result["score"], text)

    # Get intent classification
    intent = classify_intent(text)

    return {
        "Sentiment": sentiment_label,
        "Intent": intent
    }

# Sample input
text = "I'm a bit worried about my back pain, but I hope it gets better soon."

# Output result
result = analyze_patient_dialogue(text)
print(result)
```

```text
{'Sentiment': 'Anxious', 'Intent': 'Seeking reassurance'}
```


**📍 Questions:**

- How would you fine-tune **BERT** for medical sentiment detection?
- What datasets would you use for training a **healthcare-specific** sentiment model?


### **Fine-tuning BERT for medical sentiment detection:**  
   - Use BioBERT or ClinicalBERT as a base model.  
   - Fine-tune on a labeled dataset of medical conversations (e.g., MedDialog with sentiment labels added; MIMIC-III contains clinical notes, not patient dialogues).  
   - Augment data using synthetic examples of anxious vs. reassured patient statements.  
   - Apply transfer learning with domain-specific vocabulary expansion.  
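
The augmentation step could start from simple templates, as in the sketch below. It is only an illustration: the templates, the symptom list, and the `generate_examples` name are invented for this example, and synthetic statements would still need clinician review before being used for training.

```python
import itertools
import random

# Hypothetical templates and slot values, invented for illustration only.
TEMPLATES = {
    "Anxious": [
        "I'm worried my {symptom} is getting worse.",
        "Should I be concerned about this {symptom}?",
    ],
    "Reassured": [
        "I'm relieved the {symptom} is improving.",
        "It's good to know my {symptom} should clear up.",
    ],
}
SYMPTOMS = ["back pain", "neck stiffness", "headache"]


def generate_examples(seed=0):
    """Build (text, label) pairs by filling every template with every symptom."""
    examples = [
        (template.format(symptom=symptom), label)
        for label, templates in TEMPLATES.items()
        for template, symptom in itertools.product(templates, SYMPTOMS)
    ]
    random.Random(seed).shuffle(examples)  # fixed seed keeps runs reproducible
    return examples


dataset = generate_examples()
print(len(dataset))  # 2 labels x 2 templates x 3 symptoms = 12
```

The resulting `(text, label)` pairs can be mixed with real labeled utterances; keeping the two sources distinguishable makes it possible to check that the model is not just learning the templates.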

### **Datasets for training a healthcare-specific sentiment model:**  
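   - **MIMIC-III** (de-identified ICU patient records, including free-text clinical notes; it has no conversations or sentiment labels, so labels must be added).  
   - **i2b2** (now n2c2; shared-task datasets for clinical NLP, including de-identification and sentiment classification).  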
   - **MedDialog** (medical dialogues dataset).  
   - **PubMed articles** for extracting common patient-reported concerns.  



## 3. SOAP Note Generation (Bonus)

```python
# Hand-written SOAP note for the sample transcript (a static example, not model output)
soap_note = {
    "Subjective": {
        "Chief_Complaint": "Neck and back pain",
        "History_of_Present_Illness": "Patient had a car accident, experienced pain for four weeks, now occasional back pain."
    },
    "Objective": {
        "Physical_Exam": "Full range of motion in cervical and lumbar spine, no tenderness.",
        "Observations": "Patient appears in normal health, normal gait."
    },
    "Assessment": {
        "Diagnosis": "Whiplash injury and lower back strain",
        "Severity": "Mild, improving"
    },
    "Plan": {
        "Treatment": "Continue physiotherapy as needed, use analgesics for pain relief.",
        "Follow-Up": "Patient to return if pain worsens or persists beyond six months."
    }
}

print(soap_note)
```

```text
{'Subjective': {'Chief_Complaint': 'Neck and back pain', 'History_of_Present_Illness': 'Patient had a car accident, experienced pain for four weeks, now occasional back pain.'}, 'Objective': {'Physical_Exam': 'Full range of motion in cervical and lumbar spine, no tenderness.', 'Observations': 'Patient appears in normal health, normal gait.'}, 'Assessment': {'Diagnosis': 'Whiplash injury and lower back strain', 'Severity': 'Mild, improving'}, 'Plan': {'Treatment': 'Continue physiotherapy as needed, use analgesics for pain relief.', 'Follow-Up': 'Patient to return if pain worsens or persists beyond six months.'}}
```


**📍 Questions:**

- How would you train an NLP model to **map medical transcripts into SOAP format**?
- What **rule-based or deep-learning** techniques would improve the accuracy of SOAP note generation?


### **Training an NLP model for SOAP note generation:**  
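   - Fine-tune a generative model such as T5 (or GPT-4, where fine-tuning access is available) on paired transcripts and SOAP notes.  
   - Draw training text from clinical corpora such as MIMIC-III and i2b2; these contain clinical notes rather than dialogues, so pair them with a dialogue dataset such as MedDialog.  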
   - Use prompt engineering for zero-shot or few-shot learning with LLMs.  
   - Employ rule-based methods for structuring sections when AI predictions are uncertain.  
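
For the prompt-engineering route, a few-shot prompt can be assembled with plain string handling. The sketch below is an assumption-laden illustration: the example transcript, the instruction wording, and the `build_soap_prompt` name are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
import json

# Hypothetical one-shot example; a real prompt would use clinician-approved notes.
FEW_SHOT_EXAMPLES = [
    {
        "transcript": (
            "Patient: My knee has hurt since I fell last week.\n"
            "Physician: There is mild swelling. Rest and ice it."
        ),
        "soap": {
            "Subjective": "Knee pain since a fall one week ago.",
            "Objective": "Mild swelling on examination.",
            "Assessment": "Not stated in transcript.",
            "Plan": "Rest and ice.",
        },
    }
]


def build_soap_prompt(transcript, examples=None):
    """Assemble a few-shot prompt that asks for a SOAP note as JSON."""
    examples = FEW_SHOT_EXAMPLES if examples is None else examples
    parts = [
        "Convert the transcript into a SOAP note as JSON with the keys "
        "Subjective, Objective, Assessment, Plan. "
        "Write 'Not stated in transcript.' for anything that is missing."
    ]
    for example in examples:
        parts.append("Transcript:\n" + example["transcript"])
        parts.append("SOAP note:\n" + json.dumps(example["soap"], indent=2))
    # The new transcript goes last, with the answer slot left open for the model.
    parts.append("Transcript:\n" + transcript.strip())
    parts.append("SOAP note:")
    return "\n\n".join(parts)


prompt = build_soap_prompt("Patient: I still get occasional backaches.")
print(prompt)
```

Asking for an explicit "Not stated in transcript." value gives the model a way to leave a section empty instead of inventing content, which also makes uncertain sections easy to route to the rule-based fallback.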

### **Techniques for improving SOAP note accuracy:**  
   - **Hybrid approach:** Combine rule-based extraction (e.g., regex for medical terms) with deep learning models (BioBERT or T5).  
   - **Context-aware summarization:** Use transformers with attention mechanisms to ensure details are grouped logically.  
   - **Reinforcement learning with human feedback:** Iterate model outputs by incorporating clinician feedback to refine structuring.  
   - **Domain adaptation:** Fine-tune models with real-world SOAP notes from datasets like MIMIC-III.  
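
The rule-based half of the hybrid approach can be sketched as a router that assigns each transcript line to a SOAP section. The patterns and the `route_utterance` name below are illustrative assumptions; lines the rules cannot place return `None` so that a learned model or a reviewer can handle them.

```python
import re

# Illustrative patterns only; a production rule set would be far larger.
SECTION_RULES = [
    ("Plan", re.compile(r"\b(follow-up|come back|continue|prescribe)\b", re.I)),
    ("Assessment", re.compile(r"\b(diagnos\w*|recovery|prognosis)\b", re.I)),
    ("Objective", re.compile(r"\b(range of movement|tenderness|examination)\b", re.I)),
]


def route_utterance(line):
    """Assign one 'Speaker: text' line to a SOAP section, or None if unsure."""
    speaker, _, text = line.partition(":")
    if speaker.strip().lower() == "patient":
        return "Subjective"  # patient-reported content goes to Subjective
    for section, pattern in SECTION_RULES:
        if pattern.search(text):
            return section
    return None  # leave for a learned model or manual review


lines = [
    "Patient: I do get occasional backaches.",
    "Physician: Your neck and back have a full range of movement.",
    "Physician: I'd expect you to make a full recovery.",
    "Physician: You can always come back for a follow-up.",
    "Physician: Good morning, Ms. Jones.",
]
for line in lines:
    print(route_utterance(line), "<-", line)
```

Rules are checked in list order, so a physician line that mentions both a follow-up and the recovery lands in Plan; reordering `SECTION_RULES` changes that priority.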