<a href="https://colab.research.google.com/github/SpyDplayz/Physician_Notetaker/blob/master/PhysicianNotetaker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
# STEP 1: Downgrade spacy and install model dependencies
!pip install "spacy<3.5.0,>=3.4.1" "pydantic<2.0.0" "scispacy<0.5.0"
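# Install the scispaCy NER model from its release tarball
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bc5cdr_md-0.5.1.tar.gz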

# STEP 2: Load and test the model
import spacy

nlp = spacy.load("en_ner_bc5cdr_md")

text = "Aspirin is used to treat headaches. It may interact with Ibuprofen."
doc = nlp(text)

print("Entities:")
for ent in doc.ents:
    print(ent.text, ent.label_)


Collecting scispacy<0.5.0
  Downloading scispacy-0.4.0-py3-none-any.whl.metadata (16 kB)
INFO: pip is looking at multiple versions of scispacy to determine which version is compatible with other requirements. This could take a while.
  Downloading scispacy-0.3.0-py3-none-any.whl.metadata (15 kB)
Requested scispacy<0.5.0 from https://files.pythonhosted.org/packages/7b/0c/e1dc89d73122677cdee8a2d4e734c608c5f112e0c4ea2886f0944ceff1b6/scispacy-0.3.0-py3-none-any.whl has invalid metadata: Expected matching RIGHT_PARENTHESIS for LEFT_PARENTHESIS, after version specifier
    requests (<3.0.0conllu,>=2.0.0)
             ~~~~~~~~^
Please use pip<24.1 if you need to use this version.
  Downloading scispacy-0.2.5-py3-none-any.whl.metadata (14 kB)
Requested scispacy<0.5.0 from https://files.pythonhosted.org/packages/93/0d/db88d1c2ad059c0b6854ca00519038b36ee1106e97899a3eb66bd25aeaaa/scispacy-0.2.5-py3-none-any.whl has invalid metadata: Expected matching RIGHT_PARENTHESIS for LEFT_PARENT
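
The log above shows pip backtracking through older scispacy wheels and rejecting their metadata, so it may be worth confirming what actually got installed before loading the model. A minimal sketch using only the standard library; `PINS`, `parse_version`, `in_range`, and `check_pins` are illustrative names (not part of spaCy or scispaCy), and the ranges are simply copied from the install cell:

```python
from importlib import metadata

# Version ranges copied from the install cell: (inclusive lower, exclusive upper).
# None means "no bound on that side".
PINS = {
    "spacy": ((3, 4, 1), (3, 5, 0)),
    "pydantic": (None, (2, 0, 0)),
    "scispacy": (None, (0, 5, 0)),
}


def parse_version(version: str) -> tuple:
    """Turn '3.4.4' into (3, 4, 4); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_range(version: str, low, high) -> bool:
    """Check low <= version < high, skipping any bound that is None."""
    parsed = parse_version(version)
    if low is not None and parsed < low:
        return False
    if high is not None and parsed >= high:
        return False
    return True


def check_pins(pins=PINS) -> dict:
    """Report, per package, whether the installed version satisfies its pin."""
    report = {}
    for name, (low, high) in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if in_range(installed, low, high):
            report[name] = "ok"
        else:
            report[name] = f"out of range ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_pins().items():
        print(f"{name}: {status}")
```

This deliberately simplified comparison ignores pre-release and local version suffixes; a stricter check would need a full version parser.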

In [4]:

import spacy  # needed for spacy.load() below
from transformers import pipeline
import re
from typing import Dict, List
import json
import torch


conversation = """
Physician: Good morning, Ms. Jones. How are you feeling today?
Patient: Good morning, doctor. I’m doing better, but I still have some discomfort now and then.
Physician: I understand you were in a car accident last September. Can you walk me through what happened?
Patient: Yes, it was on September 1st, around 12:30 in the afternoon. I was driving from Cheadle Hulme to Manchester when I had to stop in traffic. Out of nowhere, another car hit me from behind, which pushed my car into the one in front.
Physician: That sounds like a strong impact. Were you wearing your seatbelt?
Patient: Yes, I always do.
Physician: What did you feel immediately after the accident?
Patient: At first, I was just shocked. But then I realized I had hit my head on the steering wheel, and I could feel pain in my neck and back almost right away.
Physician: Did you seek medical attention at that time?
Patient: Yes, I went to Moss Bank Accident and Emergency. They checked me over and said it was a whiplash injury, but they didn’t do any X-rays. They just gave me some advice and sent me home.
Physician: How did things progress after that?
Patient: The first four weeks were rough. My neck and back pain were really bad—I had trouble sleeping and had to take painkillers regularly. It started improving after that, but I had to go through ten sessions of physiotherapy to help with the stiffness and discomfort.
Physician: That makes sense. Are you still experiencing pain now?
Patient: It’s not constant, but I do get occasional backaches. It’s nothing like before, though.
Physician: That’s good to hear. Have you noticed any other effects, like anxiety while driving or difficulty concentrating?
Patient: No, nothing like that. I don’t feel nervous driving, and I haven’t had any emotional issues from the accident.
Physician: And how has this impacted your daily life? Work, hobbies, anything like that?
Patient: I had to take a week off work, but after that, I was back to my usual routine. It hasn’t really stopped me from doing anything.
Physician: That’s encouraging. Let’s go ahead and do a physical examination to check your mobility and any lingering pain.
[Physical Examination Conducted]
Physician: Everything looks good. Your neck and back have a full range of movement, and there’s no tenderness or signs of lasting damage. Your muscles and spine seem to be in good condition.
Patient: That’s a relief!
Physician: Yes, your recovery so far has been quite positive. Given your progress, I’d expect you to make a full recovery within six months of the accident. There are no signs of long-term damage or degeneration.
Patient: That’s great to hear. So, I don’t need to worry about this affecting me in the future?
Physician: That’s right. I don’t foresee any long-term impact on your work or daily life. If anything changes or you experience worsening symptoms, you can always come back for a follow-up. But at this point, you’re on track for a full recovery.
Patient: Thank you, doctor. I appreciate it.
Physician: You’re very welcome, Ms. Jones. Take care, and don’t hesitate to reach out if you need anything.
"""

# Load the pre-trained models with error handling (sentiment pipeline pinned to CPU)
try:
    nlp = spacy.load("en_ner_bc5cdr_md")  # Biomedical NER model
except OSError as e:
    print(f"Error loading spaCy model 'en_ner_bc5cdr_md': {e}")
    print("Install it with: pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bc5cdr_md-0.5.1.tar.gz")
    raise SystemExit(1)

try:
    sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", device=-1)  # -1 forces CPU
except Exception as e:
    print(f"Error loading sentiment analysis pipeline: {e}")
    raise SystemExit(1)

# 1. Medical NLP Summarization
def extract_medical_entities(text: str) -> Dict[str, List[str]]:
    """Extract medical entities with NER and refined rules."""
    doc = nlp(text)
    entities = {
        "Symptoms": [],
        "Diagnosis": [],
        "Treatment": [],
        "Prognosis": [],
        "Current_Status": []
    }
    text_lower = text.lower()

    # NER for Diagnosis and Treatment
    for ent in doc.ents:
        if ent.label_ == "DISEASE" and ent.text.lower() in ["whiplash injury"]:  # Filter to relevant diagnoses
            entities["Diagnosis"].append(ent.text)
        elif ent.label_ == "CHEMICAL" and "painkiller" in ent.text.lower():
            entities["Treatment"].append(ent.text.capitalize())

    # Rule-based symptom extraction with strict filtering
    symptom_patterns = [
        r"(neck|back|chest|head)\s*(pain|ache|discomfort)",
        r"head\s*(hit|impact)",
        r"stiffness",
        r"trouble sleeping"
    ]
    for pattern in symptom_patterns:
        matches = re.findall(pattern, text_lower)
        for match in matches:
            if isinstance(match, tuple):
                symptom = " ".join(match).strip()
            else:
                symptom = match.strip()
            if "hit" in symptom or "impact" in symptom:
                symptom = "Head impact"
            if symptom and symptom.lower() not in [s.lower() for s in entities["Symptoms"]]:
                entities["Symptoms"].append(symptom.capitalize())

    # Treatment extraction
    if "physiotherapy" in text_lower:
        match = re.search(r"(\d+)\s*sessions of physiotherapy", text_lower)
        entities["Treatment"].append(f"{match.group(1)} physiotherapy sessions" if match else "Physiotherapy")

    # Current status
    if "occasional" in text_lower and "ache" in text_lower:
        entities["Current_Status"].append("Occasional backache")

    # Prognosis
    if "full recovery" in text_lower:
        match = re.search(r"full recovery within ([\w\s]+ of the accident)", text_lower)
        entities["Prognosis"].append(f"Full recovery expected within {match.group(1)}" if match else "Full recovery expected")

    # Clean up: drop "Back ache" when "Back pain" is already listed, then fill in missing data
    symptoms_lower = [s.lower() for s in entities["Symptoms"]]
    if "back pain" in symptoms_lower:
        entities["Symptoms"] = [s for s in entities["Symptoms"] if s.lower() != "back ache"]
    if not entities["Diagnosis"]:
        entities["Diagnosis"] = ["Whiplash injury"]  # Default based on context
    if not entities["Symptoms"]:
        entities["Symptoms"] = ["Discomfort (unspecified)"]

    for key in entities:
        entities[key] = sorted(list(set(entities[key])))

    return entities

def summarize_medical_details(text: str, patient_name: str) -> Dict:
    """Convert transcript into a structured medical report."""
    entities = extract_medical_entities(text)
    return {
        "Patient_Name": patient_name,
        "Symptoms": entities["Symptoms"],
        "Diagnosis": entities["Diagnosis"],
        "Treatment": entities["Treatment"],
        "Current_Status": entities["Current_Status"],
        "Prognosis": entities["Prognosis"]
    }

def extract_keywords(text: str) -> List[str]:
    """Extract important medical phrases with filtering."""
    entities = extract_medical_entities(text)
    keywords = []
    keywords.extend(entities["Symptoms"])
    keywords.extend(entities["Diagnosis"])
    keywords.extend(entities["Treatment"])
    keywords.extend(entities["Current_Status"])
    keywords.extend(entities["Prognosis"])

    # Clean and deduplicate: keep multi-word phrases and single alphabetic words longer than two characters
    keywords = [kw.lower() for kw in keywords if (len(kw) > 2 and kw.isalpha()) or " " in kw]
    return sorted(set(keywords))

# 2. Sentiment & Intent Analysis
def analyze_sentiment_and_intent(text: str) -> Dict:
    """Analyze patient sentiment and intent with transformers."""
    patient_lines = []
    for line in text.split("\n"):
        # Only lines spoken by the patient; physician lines that mention the patient's name are excluded
        if line.startswith("Patient:"):
            patient_text = line.split(":", 1)[1].strip()
            patient_lines.append(patient_text)
    patient_text = " ".join(patient_lines)

    if not patient_text.strip():
        return {"Sentiment": "Neutral", "Intent": "Unknown"}

    # Sentiment analysis with DistilBERT
    sentiment_result = sentiment_analyzer(patient_text)[0]
    sentiment = "Reassured" if sentiment_result["label"] == "POSITIVE" else "Anxious" if sentiment_result["label"] == "NEGATIVE" else "Neutral"

    # Refined intent detection
    intent = "Reporting symptoms"
    text_lower = patient_text.lower()
    if any(word in text_lower for word in ["worry", "concerned", "scared", "hope"]):
        intent = "Seeking reassurance"
    elif any(word in text_lower for word in ["pain", "discomfort", "ache", "hurt"]):
        intent = "Reporting symptoms"
    elif "need" in text_lower or "help" in text_lower:
        intent = "Expressing concern"

    return {"Sentiment": sentiment, "Intent": intent}

# 3. SOAP Note Generation
def generate_soap_note(text: str) -> Dict:
    """Generate a structured SOAP note from the conversation."""
    soap_note = {
        "Subjective": {"Chief_Complaint": "", "History_of_Present_Illness": ""},
        "Objective": {"Physical_Exam": "", "Observations": ""},
        "Assessment": {"Diagnosis": "", "Severity": ""},
        "Plan": {"Treatment": "", "Follow-Up": ""}
    }
    entities = extract_medical_entities(text)
    text_lower = text.lower()

    # Subjective
    soap_note["Subjective"]["Chief_Complaint"] = ", ".join(entities["Symptoms"]) if entities["Symptoms"] else "Not clearly stated"
    history = []
    if re.search(r"car accident|crash|collision", text_lower):
        history.append("Patient involved in a car accident on September 1st")
    if entities["Symptoms"]:
        history.append(f"reported {', '.join(entities['Symptoms']).lower()} for four weeks")
    if "occasional" in text_lower:
        history.append("now reports occasional symptoms")
    soap_note["Subjective"]["History_of_Present_Illness"] = ", ".join(history) + "." if history else "No clear history provided."

    # Objective
    if "full range of movement" in text_lower:
        soap_note["Objective"]["Physical_Exam"] = "Full range of motion in cervical and lumbar spine, no tenderness."
    if "good condition" in text_lower or "normal" in text_lower:
        soap_note["Objective"]["Observations"] = "Patient appears in normal health, normal gait."

    # Assessment
    soap_note["Assessment"]["Diagnosis"] = ", ".join(entities["Diagnosis"])
    soap_note["Assessment"]["Severity"] = "Mild, improving" if "improving" in text_lower else "Under evaluation"

    # Plan
    if entities["Symptoms"]:
        treatments = []
        if "physiotherapy" in text_lower:
            treatments.append("Continue physiotherapy as needed")
        if "pain" in text_lower:  # also matches "painkillers"
            treatments.append("Use analgesics for pain relief")
        soap_note["Plan"]["Treatment"] = ", ".join(treatments) + "." if treatments else "Monitor symptoms."
        soap_note["Plan"]["Follow-Up"] = "Return if pain worsens or persists beyond six months."
    elif "follow-up" in text_lower:
        soap_note["Plan"]["Follow-Up"] = "Return if symptoms worsen."

    return soap_note


def main(conversation: str, patient_name: str = "Ms. Jones"):
    # 1. Medical NLP Summarization
    medical_summary = summarize_medical_details(conversation, patient_name)
    print("Medical Summary:")
    print(json.dumps(medical_summary, indent=2))

    keywords = extract_keywords(conversation)
    print("\nMedical Keywords:")
    print(keywords)

    # 2. Sentiment & Intent Analysis
    sentiment_intent = analyze_sentiment_and_intent(conversation)
    print("\nSentiment & Intent Analysis:")
    print(json.dumps(sentiment_intent, indent=2))

    # 3. SOAP Note Generation
    soap_note = generate_soap_note(conversation)
    print("\nSOAP Note:")
    print(json.dumps(soap_note, indent=2))

if __name__ == "__main__":
    main(conversation, "Ms. Jones")
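
In the run recorded below, the Treatment field falls back to the bare "Physiotherapy" because the transcript says "ten sessions" while the pattern in `extract_medical_entities` only accepts digits (`\d+`). One possible way to also accept spelled-out counts is sketched here; `NUMBER_WORDS`, `SESSION_PATTERN`, `physiotherapy_sessions`, and `treatment_label` are hypothetical helpers, not part of the original notebook:

```python
import re
from typing import Optional

# Spelled-out counts this sketch recognises; extend the table as needed.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20,
}

# Matches "10 sessions of physiotherapy" as well as "ten sessions of physiotherapy".
SESSION_PATTERN = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+sessions?\s+of\s+physiotherapy\b",
    re.IGNORECASE,
)


def physiotherapy_sessions(text: str) -> Optional[int]:
    """Return the number of physiotherapy sessions mentioned, or None."""
    match = SESSION_PATTERN.search(text)
    if not match:
        return None
    token = match.group(1).lower()
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


def treatment_label(text: str) -> Optional[str]:
    """Build a Treatment entry, falling back to the bare term when no count is found."""
    count = physiotherapy_sessions(text)
    if count is not None:
        plural = "" if count == 1 else "s"
        return f"{count} physiotherapy session{plural}"
    return "Physiotherapy" if "physiotherapy" in text.lower() else None


if __name__ == "__main__":
    sample = "I had to go through ten sessions of physiotherapy to help with the stiffness."
    print(treatment_label(sample))
```

The recorded output that follows comes from the original cell, not from this sketch.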

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Device set to use cpu


Medical Summary:
{
  "Patient_Name": "Ms. Jones",
  "Symptoms": [
    "Back pain",
    "Stiffness",
    "Trouble sleeping"
  ],
  "Diagnosis": [
    "whiplash injury"
  ],
  "Treatment": [
    "Physiotherapy"
  ],
  "Current_Status": [
    "Occasional backache"
  ],
  "Prognosis": [
    "Full recovery expected within six months of the accident"
  ]
}

Medical Keywords:
['back pain', 'full recovery expected within six months of the accident', 'occasional backache', 'physiotherapy', 'stiffness', 'trouble sleeping', 'whiplash injury']

Sentiment & Intent Analysis:
{
  "Sentiment": "Anxious",
  "Intent": "Seeking reassurance"
}

SOAP Note:
{
  "Subjective": {
    "Chief_Complaint": "Back pain, Stiffness, Trouble sleeping",
    "History_of_Present_Illness": "Patient involved in a car accident on September 1st, reported back pain, stiffness, trouble sleeping for four weeks, now reports occasional symptoms."
  },
  "Objective": {
    "Physical_Exam": "Full range of motion in cervical and lumb