<a href="https://colab.research.google.com/github/00-Aryan/00-Aryan/blob/main/physcian_notetaker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Physician Note-taker - **medical transcription, NLP-based summarization, and sentiment analysis**.

**The notebook is organized into 9 sections:**

1. Setup & Import
2. Input and Transcript
3. Conversation Parsing
4. Clinical Entity Extraction **(NER)**
5. Negation Detection & Assertion Handling
6. Clinical Logic and Normalisation
7. Sentiment and Intent Analysis
8. Medical Summary Generation
9. SOAP Note Generation


# 1. Setup & Import

In [None]:
!pip install scispacy==0.6.2
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_core_sci_md-0.5.4.tar.gz
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_ner_bc5cdr_md-0.5.4.tar.gz

import spacy
import numpy as np
import random
from collections import defaultdict
from spacy.matcher import Matcher
import re

from transformers import pipeline
from collections import Counter


en_ner_bc5cdr_md = spacy.load("en_ner_bc5cdr_md")
en_core_sci_md = spacy.load("en_core_sci_md")

rs = 42
np.random.seed(rs)
random.seed(rs)

print("Setup & Imports complete.")

Collecting https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_core_sci_md-0.5.4.tar.gz
  Using cached https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_core_sci_md-0.5.4.tar.gz (119.1 MB)
  Preparing metadata (setup.py) ... done
Collecting https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_ner_bc5cdr_md-0.5.4.tar.gz
  Using cached https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_ner_bc5cdr_md-0.5.4.tar.gz (119.8 MB)
  Preparing metadata (setup.py) ... done
Setup & Imports complete.


# 2. Input Data or Transcript

conversation = """


    Physician: Good morning, Ms. Jones. How are you feeling today?

    Patient: Good morning, doctor. I’m doing better, but I still have some discomfort now and then.

    Physician: I understand you were in a car accident last September. Can you walk me through what happened?

    Patient: Yes, it was on September 1st, around 12:30 in the afternoon. I was driving from Cheadle Hulme to Manchester when I had to stop in traffic. Out of nowhere, another car hit me from behind, which pushed my car into the one in front.

    Physician: That sounds like a strong impact. Were you wearing your seatbelt?

    Patient: Yes, I always do.

    Physician: What did you feel immediately after the accident?

    Patient: At first, I was just shocked. But then I realized I had hit my head on the steering wheel, and I could feel pain in my neck and back almost right away.

    Physician: Did you seek medical attention at that time?

    Patient: Yes, I went to Moss Bank Accident and Emergency. They checked me over and said it was a whiplash injury, but they didn’t do any X-rays. They just gave me some advice and sent me home.

    Physician: How did things progress after that?

    Patient: The first four weeks were rough. My neck and back pain were really bad—I had trouble sleeping and had to take painkillers regularly. It started improving after that, but I had to go through ten sessions of physiotherapy to help with the stiffness and discomfort.

    Physician: That makes sense. Are you still experiencing pain now?

    Patient: It’s not constant, but I do get occasional backaches. It’s nothing like before, though.

    Physician: That’s good to hear. Have you noticed any other effects, like anxiety while driving or difficulty concentrating?

    Patient: No, nothing like that. I don’t feel nervous driving, and I haven’t had any emotional issues from the accident.

    Physician: And how has this impacted your daily life? Work, hobbies, anything like that?

    Patient: I had to take a week off work, but after that, I was back to my usual routine. It hasn’t really stopped me from doing anything.

    Physician: That’s encouraging. Let’s go ahead and do a physical examination to check your mobility and any lingering pain.

    [Physical Examination Conducted]

    Physician: Everything looks good. Your neck and back have a full range of movement, and there’s no tenderness or signs of lasting damage. Your muscles and spine seem to be in good condition.

    Patient: That’s a relief!

    Physician: Yes, your recovery so far has been quite positive. Given your progress, I’d expect you to make a full recovery within six months of the accident. There are no signs of long-term damage or degeneration.

    Patient: That’s great to hear. So, I don’t need to worry about this affecting me in the future?

    Physician: That’s right. I don’t foresee any long-term impact on your work or daily life. If anything changes or you experience worsening symptoms, you can always come back for a follow-up. But at this point, you’re on track for a full recovery.

    Patient: Thank you, doctor. I appreciate it.

    Physician: You’re very welcome, Ms. Jones. Take care, and don’t hesitate to reach out if you need anything."""

# 3. Conversation Parsing

In [None]:
def parse_conversation(conversation: str) -> list[dict]:
    dialogue = []

    lines = conversation.strip().split("\n")

    for line in lines:
        line = line.strip()
        if not line:
            continue

        if line.startswith("Physician:"):
            speaker = "Physician"
            text = line.replace("Physician:", "", 1).strip()

        elif line.startswith("Patient:"):
            speaker = "Patient"
            text = line.replace("Patient:", "", 1).strip()

        else:

            print(f"Skipping unrecognized line: {line}")
            continue

        dialogue.append({
            "speaker": speaker,
            "text": text
        })

    print(f"Parsed {len(dialogue)} dialogue turns in this")
    return dialogue


In [None]:
parsed_dialogue = parse_conversation(conversation)

Skipping unrecognized line: [Physical Examination Conducted]
Parsed 26 dialogue turns in this
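
As a quick sanity check, the same prefix-based parsing can be exercised on a tiny transcript. The sketch below is a minimal, self-contained re-implementation, not the function above: the name `split_turns` and the sample lines are hypothetical and used only for illustration. It follows the same rule that lines without a known speaker prefix are skipped.

```python
def split_turns(transcript: str) -> list[dict]:
    """Split a transcript into speaker turns using 'Speaker:' prefixes."""
    turns = []
    for raw in transcript.strip().split("\n"):
        line = raw.strip()
        if not line:
            continue
        for speaker in ("Physician", "Patient"):
            prefix = f"{speaker}:"
            if line.startswith(prefix):
                turns.append({"speaker": speaker, "text": line[len(prefix):].strip()})
                break
        # Lines without a known prefix (e.g. stage directions) are ignored.
    return turns


# Hypothetical three-line transcript, including one bracketed stage direction.
sample = """
Physician: How are you feeling today?
Patient: Better, but my back still aches.
[Physical Examination Conducted]
"""
turns = split_turns(sample)
print(turns)
```

Slicing off the prefix by length, rather than calling `replace`, is just one way to keep the rest of the line untouched if the speaker label happens to appear again later in the text.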


The expected output includes the patient's name, so we try to extract it from the conversation instead of hardcoding it. The result will not exactly match the expected one, because the first name (Janet) never appears in our conversation, but this is still better than hardcoding.

In [None]:
def extract_patient_name(dialogue: list[dict]) -> str | None:
    """
    Extract patient name using honorific-based patterns.
    Returns the most reliable form found (e.g., 'Ms. Jones').
    """

    title_pattern = re.compile(
        r"\b(Mr|Ms|Mrs|Miss)\.\s+([A-Z][a-z]+)\b"
    )

    for turn in dialogue:
        # Physicians usually address the patient by name
        if turn["speaker"] == "Physician":
            match = title_pattern.search(turn["text"])
            if match:
                title, last_name = match.groups()
                return f"{title}. {last_name}"

    return None


In [None]:
patient_name = extract_patient_name(parsed_dialogue)
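
To see what the honorific pattern does and does not catch, here is a small standalone sketch. It reuses the regular expression from the function above; the helper name `find_titled_name` and the test sentences are assumptions made for illustration.

```python
import re
from typing import Optional

# Same honorific pattern as above: title, period, capitalized surname.
TITLE_PATTERN = re.compile(r"\b(Mr|Ms|Mrs|Miss)\.\s+([A-Z][a-z]+)\b")


def find_titled_name(text: str) -> Optional[str]:
    """Return 'Title. Surname' for the first honorific match, else None."""
    match = TITLE_PATTERN.search(text)
    return f"{match.group(1)}. {match.group(2)}" if match else None


print(find_titled_name("Good morning, Ms. Jones. How are you feeling today?"))  # Ms. Jones
print(find_titled_name("Good morning. How are you feeling today?"))             # None
print(find_titled_name("Hello Mrs Smith"))  # None: no period after the title
```

The last case shows a limitation worth keeping in mind: titles written without a period, or names used without any title, are not picked up by this pattern.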


# 4. Clinical Entity Extraction **(NER)**



In [None]:
class ClinicalEntityExtractor:
    """
    Module 2: Clinical Entity Extraction (NER + Rules)
    """
    def __init__(self, model_name: str = "en_ner_bc5cdr_md"):
        self.nlp = spacy.load(model_name)
        self.matcher = Matcher(self.nlp.vocab)
        self._add_patterns()

        # --- Keyword  ---
        self.symptom_keywords = [
            "pain", "ache", "aches", "discomfort", "hurt",
            "sore", "tender", "stiff", "stiffness",
            "headache", "backache", "neckache"
        ]

        self.treatment_keywords = [
            "physiotherapy", "physical therapy", "therapy",
            "painkillers", "medication", "analgesics",
            "anti-inflammatory", "sessions", "treatment"
        ]

        self.prognosis_signals = [
            "recovery", "improving", "stable",
            "full recovery", "partial recovery"
        ]

    def _add_patterns(self):
        # "n no. of physiotherapy sessions"
        self.matcher.add(
            "TREATMENT_SESSION",
            [[{"LIKE_NUM": True}, {"LOWER": {"IN": ["sessions", "session"]}}]]
        )

        # "n weeks"
        self.matcher.add(
            "DURATION",
            [[{"LIKE_NUM": True}, {"LOWER": {"IN": ["days", "weeks", "months", "years"]}}]]
        )

    def _extract_with_rules(self, text: str) -> dict:
        extracted = defaultdict(set)
        text_lower = text.lower()


        for symptom in self.symptom_keywords:
            # body-part + symptom (e.g., "neck pain")
            pattern = rf"\b(\w+)\s+{symptom}\b"
            for match in re.findall(pattern, text_lower):
                if match not in {"and", "the"}:
                    extracted["symptoms"].add(f"{match} {symptom}")

            # standalone symptom
            if re.search(rf"\b{symptom}\b", text_lower):
                extracted["symptoms"].add(symptom)


        for treatment in self.treatment_keywords:
            if re.search(rf"\b{treatment}\b", text_lower):
                extracted["treatments"].add(treatment)

        # --- Prognosis signals ---
        for signal in self.prognosis_signals:
            if signal in text_lower:
                extracted["prognosis_signals"].add(signal)


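        # Non-capturing group so re.findall returns the full match
        # (e.g. "4 weeks") instead of only the unit ("weeks").
        temporal_patterns = [
            r"\b\d+\s+(?:days?|weeks?|months?|years?)\b",
            r"\b\d{1,2}:\d{2}\b"
        ]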

        for pattern in temporal_patterns:
            for match in re.findall(pattern, text_lower):
                extracted["durations"].add(match)

        return extracted



    def _extract_with_ner(self, text: str, speaker: str) -> dict:
        extracted = defaultdict(set)
        doc = self.nlp(text)

        for ent in doc.ents:
            entity_text = ent.text.lower()
            label = ent.label_.upper()

            # en_ner_bc5cdr_md → DISEASE, CHEMICAL
            if label == "DISEASE":
                if speaker == "Patient":
                    # Patient reporting a doctor's diagnosis
                    if re.search(r"\b(said it was|diagnosed|told me it was|confirmed)\b", text.lower()):
                        extracted["diagnosis"].add(entity_text)
                    else:
                        extracted["symptoms"].add(entity_text)

                elif speaker == "Physician" and "?" not in text:
                    extracted["diagnosis"].add(entity_text)

            elif label == "CHEMICAL":
                extracted["treatments"].add(entity_text)

        # Matcher-based patterns (ONLY what matcher defines)
        for match_id, start, end in self.matcher(doc):
            span = doc[start:end]
            rule_label = self.nlp.vocab.strings[match_id]

            if rule_label == "TREATMENT_SESSION":
                extracted["treatments"].add(span.text.lower())
            elif rule_label == "DURATION":
                extracted["durations"].add(span.text.lower())

        return extracted


    def extract_clinical_entities(self, dialogue: list[dict]) -> dict:
        entities = {
            "symptoms": set(),
            "diagnosis": set(),
            "treatments": set(),
            "prognosis_signals": set(),
            "durations": set()
        }

        for idx, turn in enumerate(dialogue):
            speaker = turn["speaker"]
            text = turn["text"]

            print(f"\n[Turn {idx + 1}] {speaker}: {text[:80]}")

            ner_out = self._extract_with_ner(text, speaker)
            rule_out = self._extract_with_rules(text)

            for category in entities:
                entities[category].update(ner_out.get(category, set()))
                entities[category].update(rule_out.get(category, set()))

            for source, result in [("NER", ner_out), ("RULE", rule_out)]:
                for cat, vals in result.items():
                    if vals:
                        print(f"  → {source} {cat}: {list(vals)}")

        return self._post_process(entities)

    def _post_process(self, entities: dict) -> dict:
        processed = {}

        for category, items in entities.items():
            cleaned = [i for i in items if len(i) > 2]

            final = []
            for item in sorted(cleaned, key=len, reverse=True):
                if not any(item in existing for existing in final):
                    final.append(item)

            processed[category] = final

        return processed


In [None]:
extractor = ClinicalEntityExtractor(model_name="en_ner_bc5cdr_md")
entities = extractor.extract_clinical_entities(parsed_dialogue)


We have now extracted the entities. The output of the cell above is messy because we try to extract every possible piece of information, which will be turned into meaningful information later. The primary goal here was **NER**: extraction should have high **recall**, and we will handle **precision** later. Now we can move on to **negation detection**.
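
To make the recall-versus-precision trade-off concrete, the sketch below scores a set of extracted symptoms against a small reference set. Both sets are hypothetical illustrations chosen by hand, not outputs of the extractor above, and the helper name `precision_recall` is an assumption.

```python
def precision_recall(extracted: set[str], reference: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of extracted items against a reference set."""
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example: a noisy extraction that finds everything plus junk.
reference = {"neck pain", "back pain"}
extracted = {"neck pain", "back pain", "feel pain", "lingering pain"}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.50, recall=1.00
```

In this toy case nothing relevant is missed (recall 1.0), but half of what was extracted is noise (precision 0.5), which is the kind of output the later cleaning stages are meant to tidy up.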

# 5. Negation Detection & Assertion Handling

In [None]:
class NegationDetector:
    """
    Module 3: Negation Detection & Assertion Handling

    Determines whether extracted entities are PRESENT or NEGATED
    based on local textual context.
    """

    def __init__(self, window_size: int = 5):
        self.window_size = window_size

        self.negation_patterns = [
            r"\bno\b",
            r"\bnot\b",
            r"\bnever\b",
            r"\bwithout\b",
            r"\bdenies?\b",
            r"\bnegative for\b",
            r"\bdoesn['’]t\b",
            r"\bdon['’]t\b",
            r"\bdidn['’]t\b",
            r"\bhasn['’]t\b",
            r"\bhaven['’]t\b"
        ]
    def _is_negated(self, sentence: str, entity: str) -> bool:
        """
        Checks whether an entity is negated in a sentence
        using a token window before the entity.
        """
        sentence_lower = sentence.lower()
        entity_lower = entity.lower()

        if entity_lower not in sentence_lower:
            return False

        tokens = sentence_lower.split()
        entity_tokens = entity_lower.split()

        try:
            start_idx = tokens.index(entity_tokens[0])
        except ValueError:
            return False

        window_start = max(0, start_idx - self.window_size)
        context_window = " ".join(tokens[window_start:start_idx])

        for pattern in self.negation_patterns:
            if re.search(pattern, context_window):
                return True

        return False

    def apply_negation(self, dialogue: list[dict], entities: dict) -> dict:
        """
        Applies negation detection to extracted entities.

        Returns entities with assertion status.
        """
        asserted_entities = defaultdict(list)

        for category, items in entities.items():
            for item in items:
                status = "present"

                for turn in dialogue:
                    sentence = turn["text"]

                    if self._is_negated(sentence, item):
                        status = "negated"
                        print(f"[NEGATED] {item} ← '{sentence[:180]}'")
                        break

                asserted_entities[category].append({
                    "text": item,
                    "status": status
                })

        return dict(asserted_entities)



In [None]:
neg_detector = NegationDetector()
asserted_entities = neg_detector.apply_negation(parsed_dialogue, entities)

[NEGATED] long-term damage ← 'Yes, your recovery so far has been quite positive. Given your progress, I’d expect you to make a full recovery within six months of the accident. There are no signs of long-term da'
[NEGATED] tenderness ← 'Everything looks good. Your neck and back have a full range of movement, and there’s no tenderness or signs of lasting damage. Your muscles and spine seem to be in good condition.'
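
The window check can also be illustrated in isolation. The sketch below is a simplified stand-in for `_is_negated`, not a copy of it: the function name `negated_in_window` and the shortened cue list are assumptions for illustration. Unlike the method above, it tokenizes with a regular expression so that trailing punctuation does not hide an entity, and it checks every occurrence of the entity rather than only the first.

```python
import re

# Shortened, illustrative cue list (the detector above uses a longer one).
NEGATION_CUES = re.compile(r"\b(no|not|never|without)\b")


def negated_in_window(sentence: str, entity: str, window_size: int = 5) -> bool:
    """True if a negation cue appears within `window_size` tokens before the entity."""
    # Tokenize on word characters, apostrophes and hyphens so "damage." -> "damage".
    tokens = re.findall(r"[\w'’-]+", sentence.lower())
    entity_tokens = entity.lower().split()
    n = len(entity_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            context = " ".join(tokens[max(0, i - window_size):i])
            if NEGATION_CUES.search(context):
                return True
    return False


print(negated_in_window("There's no tenderness or signs of lasting damage.", "tenderness"))  # True
print(negated_in_window("I could feel pain in my neck and back.", "pain"))  # False
```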


As shown above, the negation layer is working and no hallucination is present, so we can now move on to:
- normalising the symptoms,
- removing generic junk,
- inferring severity,
- inferring trend,
- deciding the final diagnosis,
- providing a clean clinical state.

# 6. Clinical Logic and Normalisation

In [None]:
class ClinicalLogicNormalizer:
    """
    Module 4: Clinical Logic & Normalization

    Converts asserted clinical evidence into
    a clean, human-readable clinical state.
    """

    def __init__(self):
        # canonical symptom mapping
        self.symptom_map = {
            "back pain": ["back pain", "backaches", "lingering pain"],
            "neck pain": ["neck pain", "stiffness"],
            "general pain": ["pain", "feel pain", "experiencing pain"],
            "discomfort": ["discomfort", "some discomfort"]
        }
    def _normalize_symptoms(self, symptoms):
        present = [
            s["text"] for s in symptoms if s["status"] == "present"
        ]

        normalized = set()

        for canonical, variants in self.symptom_map.items():
            if any(v in present for v in variants):
                normalized.add(canonical)

        # Remove vague symptom if specific exists
        if "general pain" in normalized and (
            "back pain" in normalized or "neck pain" in normalized
        ):
            normalized.remove("general pain")

        # Sort so the result does not depend on set iteration order
        return sorted(normalized)
    def _infer_severity(self, symptoms):
        texts = " ".join(s["text"] for s in symptoms)

        if "occasional" in texts or "not constant" in texts:
            return "Mild"

        if "really bad" in texts or "rough" in texts:
            return "Moderate"

        return "Mild"
    def _infer_trend(self, prognosis_signals):
        signals = [
            p["text"] for p in prognosis_signals if p["status"] == "present"
        ]

        if any(s in signals for s in ["improving", "full recovery"]):
            return "Improving"

        return "Stable"
    def _final_diagnosis(self, diagnoses):
        for d in diagnoses:
            if d["status"] == "present" and d["text"] != "pain":
                return d["text"]

        return "Not specified"
    def _has_long_term_damage(self, diagnoses):
        for d in diagnoses:
            if d["text"] == "long-term damage" and d["status"] == "present":
                return True
        return False
    def normalize(self, asserted_entities: dict) -> dict:
        symptoms = asserted_entities.get("symptoms", [])
        diagnoses = asserted_entities.get("diagnosis", [])
        prognosis = asserted_entities.get("prognosis_signals", [])

        final_symptoms = self._normalize_symptoms(symptoms)
        severity = self._infer_severity(symptoms)
        trend = self._infer_trend(prognosis)
        diagnosis = self._final_diagnosis(diagnoses)
        long_term_damage = self._has_long_term_damage(diagnoses)

        current_status = (
            "Occasional backache"
            if severity == "Mild" else
            "Persistent pain"
        )

        return {
            "final_symptoms": final_symptoms,
            "diagnosis": diagnosis,
            "severity": severity,
            "trend": trend,
            "current_status": current_status,
            "long_term_damage": long_term_damage
        }


In [None]:
normalizer = ClinicalLogicNormalizer()
clinical_state = normalizer.normalize(asserted_entities)


In [None]:
clinical_state


{'final_symptoms': ['back pain', 'discomfort', 'neck pain'],
 'diagnosis': 'whiplash injury',
 'severity': 'Mild',
 'trend': 'Improving',
 'current_status': 'Occasional backache',
 'long_term_damage': False}
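
The canonical mapping step can be sketched on its own. The variant lists below are a shortened, hypothetical subset of the mapping used above, and `normalize_symptoms` is an illustrative helper name rather than the class method.

```python
# Hypothetical, shortened mapping from canonical symptom to surface variants.
SYMPTOM_MAP = {
    "back pain": ["back pain", "backaches"],
    "neck pain": ["neck pain", "stiffness"],
    "general pain": ["pain"],
}


def normalize_symptoms(present: list[str]) -> list[str]:
    """Map surface forms to canonical symptoms; drop 'general pain' if a specific site exists."""
    normalized = {
        canonical
        for canonical, variants in SYMPTOM_MAP.items()
        if any(v in present for v in variants)
    }
    if normalized & {"back pain", "neck pain"}:
        normalized.discard("general pain")
    return sorted(normalized)  # sorted for a stable, reproducible order


print(normalize_symptoms(["backaches", "pain", "stiffness"]))  # ['back pain', 'neck pain']
```

Because every canonical label traces back to a listed variant, it stays easy to explain why a given symptom ended up in the final state.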

I separated extraction, negation, and clinical reasoning into independent sections. Clinical normalization is rule-based, conservative, and traceable to explicit patient statements. Now we can assemble everything into a clean report format:
```
Module5ReportAssembler(
    patient_name,
    asserted_entities,
    clinical_state
) → report_json
```

# 7. Sentiment & Intent Analysis

In [None]:
class PatientSentimentIntentAnalyzer:
    """
    Module 6: Patient Sentiment & Intent Analysis

    Uses a Transformer-based classifier with guardrails.
    Does NOT affect clinical logic.
    """

    def __init__(
        self,
        sentiment_model: str = "distilbert-base-uncased-finetuned-sst-2-english",
        confidence_threshold: float = 0.65
    ):
        """
        sentiment_model:
            Any Transformer sentiment classifier.
            DistilBERT is used for speed & simplicity.

        confidence_threshold:
            Minimum confidence to accept a non-neutral sentiment.
        """

        self.sentiment_classifier = pipeline(
            "sentiment-analysis",
            model=sentiment_model
        )

        self.confidence_threshold = confidence_threshold

        # Fixed label space (guardrail)
        self.sentiment_map = {
            "POSITIVE": "Reassured",
            "NEGATIVE": "Anxious"
        }

        self.intent_keywords = {
            "Reporting symptoms": [
                "pain", "ache", "discomfort", "hurt", "stiff", "back", "neck"
            ],
            "Seeking reassurance": [
                "worried", "concerned", "afraid", "relief", "okay", "fine", "better"
            ],
            "Asking clarification": [
                "should i", "do i need", "will this", "can i", "is it normal"
            ]
        }

    # Guardrail 1: only patient utterances are analysed
    def _get_patient_utterances(self, dialogue: list[dict]) -> list[str]:
        """
        Extract only patient utterances.
        """
        return [
            turn["text"]
            for turn in dialogue
            if turn["speaker"] == "Patient"
        ]
    def _predict_sentiment(self, texts: list[str], clinical_state: dict) -> str:
        """
        Predict overall patient sentiment using transformer outputs
        with clinical-context guardrails.
        """

        if not texts:
            return "Neutral"

        results = self.sentiment_classifier(texts)

        mapped = []
        for r in results:
            label = r["label"]
            score = r["score"]

            if score < self.confidence_threshold:
                mapped.append("Neutral")
            else:
                mapped.append(self.sentiment_map.get(label, "Neutral"))

        # Majority vote
        sentiment = Counter(mapped).most_common(1)[0][0]

        # Clinical-context guardrail
        if (
            clinical_state.get("severity") == "Mild" and
            clinical_state.get("trend") == "Improving" and
            sentiment == "Anxious"
        ):
            sentiment = "Neutral"

        return sentiment
    def _detect_intent(self, texts: list[str]) -> str:
        """
        Detect dominant patient intent based on the most recent utterance.
        """

        if not texts:
            return "Providing information"

        last = texts[-1].lower()

        # Seeking reassurance
        reassurance_patterns = [
            "do i need to worry",
            "will this affect",
            "in the future",
            "is this serious",
            "should i be worried",
            "will i be okay"
        ]

        if any(p in last for p in reassurance_patterns):
            return "Seeking reassurance"

        # Seeking guidance
        guidance_patterns = [
            "what should i do",
            "do i need to",
            "should i",
            "can i",
            "is it okay to",
            "do i have to"
        ]

        if any(p in last for p in guidance_patterns):
            return "Seeking guidance"

        #  Expressing concern
        concern_patterns = [
            "worried",
            "anxious",
            "scared",
            "concerned",
            "afraid"
        ]

        if any(p in last for p in concern_patterns):
            return "Expressing concern"

        # Default: providing information
        return "Providing information"

    def analyze(self, dialogue: list[dict], clinical_state: dict) -> dict:
        patient_texts = self._get_patient_utterances(dialogue)

        sentiment = self._predict_sentiment(patient_texts, clinical_state)
        intent = self._detect_intent(patient_texts)

        return {
            "Sentiment": sentiment,
            "Intent": intent
        }


In [None]:
sentiment_analyzer = PatientSentimentIntentAnalyzer()

sentiment_output = sentiment_analyzer.analyze(
    parsed_dialogue,
    clinical_state
)

sentiment_output


Device set to use cpu


{'Sentiment': 'Neutral', 'Intent': 'Providing information'}
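
The thresholding and majority vote can be tried without downloading a model. The sketch below is a minimal illustration that assumes classifier outputs shaped like the `label`/`score` dicts consumed above; the scores are made up, and `aggregate_sentiment` is a hypothetical helper name.

```python
from collections import Counter

SENTIMENT_MAP = {"POSITIVE": "Reassured", "NEGATIVE": "Anxious"}


def aggregate_sentiment(results: list[dict], threshold: float = 0.65) -> str:
    """Map per-utterance classifier outputs to one label by majority vote."""
    if not results:
        return "Neutral"
    mapped = [
        SENTIMENT_MAP.get(r["label"], "Neutral") if r["score"] >= threshold else "Neutral"
        for r in results
    ]
    return Counter(mapped).most_common(1)[0][0]


# Hypothetical classifier outputs (scores are invented for illustration).
fake_results = [
    {"label": "NEGATIVE", "score": 0.91},
    {"label": "POSITIVE", "score": 0.60},  # below threshold -> Neutral
    {"label": "POSITIVE", "score": 0.55},  # below threshold -> Neutral
]
print(aggregate_sentiment(fake_results))  # Neutral
```

Low-confidence predictions fall back to "Neutral", so a single confident negative utterance does not dominate the overall label.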

# 8. Medical Summary Generation

In [None]:
class ClinicalReportAssembler:
    """
    Converts validated clinical state and entities
    into the final assignment-specific JSON output.
    """

    def __init__(self):
        pass

    # ---------------------------
    # Helpers
    # ---------------------------

    def _format_symptoms(self, final_symptoms: list[str]) -> list[str]:
        """
        Capitalize and format symptoms for report.
        """
        return [s.title() for s in final_symptoms]

    def _extract_treatments(self, asserted_entities: dict) -> list[str]:
        """
        Combine treatment names and quantities if both exist.
        """
        treatments = asserted_entities.get("treatments", [])

        present = [t["text"] for t in treatments if t["status"] == "present"]

        treatment_names = []
        quantities = []

        for item in present:
            if any(char.isdigit() for char in item):
                quantities.append(item)
            else:
                treatment_names.append(item)

        formatted = []

        # Combine quantity + treatment when possible
        for name in treatment_names:
            combined = False
            for qty in quantities:
                formatted.append(f"{qty} {name}")
                combined = True
            if not combined:
                formatted.append(name)

        return [t.title() for t in formatted]

    def _format_prognosis(self, clinical_state: dict, asserted_entities: dict) -> str:
        """
        Generate prognosis based on statement from trend and duration.
        """
        trend = clinical_state.get("trend", "Stable")
        durations = asserted_entities.get("durations", [])

        present_durations = [
            d["text"] for d in durations if d["status"] == "present"
        ]

        if trend == "Improving" and present_durations:
            return f"Full recovery expected within {present_durations[0]}"
        if trend == "Improving":
            return "Full recovery expected"

        return "Condition stable"


    # Main Assembly
    def assemble(
        self,
        patient_name: str | None,
        asserted_entities: dict,
        clinical_state: dict
    ) -> dict:
        """
        Assemble final clinical report JSON.
        """

        return {
            "Patient_Name": patient_name or "Not specified",
            "Symptoms": self._format_symptoms(
                clinical_state.get("final_symptoms", [])
            ),
            "Diagnosis": (
                clinical_state.get("diagnosis").title()
                if clinical_state.get("diagnosis") != "Not specified"
                else "Not specified"
            ),
            "Treatment": self._extract_treatments(asserted_entities),
            "Current_Status": clinical_state.get("current_status"),
            "Prognosis": self._format_prognosis(
                clinical_state, asserted_entities
            )
        }


In [None]:
report_assembler = ClinicalReportAssembler()

final_report = report_assembler.assemble(
    patient_name=patient_name,
    asserted_entities=asserted_entities,
    clinical_state=clinical_state
)

final_report


{'Patient_Name': 'Ms. Jones',
 'Symptoms': ['Back Pain', 'Discomfort', 'Neck Pain'],
 'Diagnosis': 'Whiplash Injury',
 'Treatment': ['Physiotherapy', 'Ten Sessions', 'Painkillers'],
 'Current_Status': 'Occasional backache',
 'Prognosis': 'Full recovery expected within six months'}
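
Since the assembler returns a plain dict, it can be serialized with the standard `json` module. The sketch below copies the report shown above into a literal so that it runs on its own; in the notebook, `final_report` could presumably be passed directly instead.

```python
import json

# Copy of the report printed above, written out as a literal dict.
report = {
    "Patient_Name": "Ms. Jones",
    "Symptoms": ["Back Pain", "Discomfort", "Neck Pain"],
    "Diagnosis": "Whiplash Injury",
    "Treatment": ["Physiotherapy", "Ten Sessions", "Painkillers"],
    "Current_Status": "Occasional backache",
    "Prognosis": "Full recovery expected within six months",
}

# ensure_ascii=False keeps any non-ASCII characters readable in the output.
report_json = json.dumps(report, indent=2, ensure_ascii=False)
print(report_json)
```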

# 9. SOAP Note Generation

In [None]:
class SOAPNoteGenerator:
    """
    Converts validated clinical evidence into
    a structured SOAP note.
    """

    def __init__(self):
        pass


    # SUBJECTIVE

    def _build_subjective(self, asserted_entities, clinical_state):
        symptoms = [
            s["text"] for s in asserted_entities.get("symptoms", [])
            if s["status"] == "present"
        ]

        durations = [
            d["text"] for d in asserted_entities.get("durations", [])
            if d["status"] == "present"
        ]

        chief_complaint = ", ".join(
            {s for s in symptoms if "pain" in s or "ache" in s}
        ).title()

        hpi_parts = []

        if durations:
            hpi_parts.append(
                f"experienced pain for {durations[0]}"
            )

        if clinical_state.get("current_status"):
            hpi_parts.append(
                f"now {clinical_state['current_status'].lower()}"
            )

        history_of_present_illness = (
            "Patient had a car accident, " +
            ", ".join(hpi_parts) + "."
        )

        return {
            "Chief_Complaint": chief_complaint or "Pain",
            "History_of_Present_Illness": history_of_present_illness
        }


    # OBJECTIVE


    def _build_objective(self, asserted_entities):
        diagnoses = asserted_entities.get("diagnosis", [])

        negated_findings = [
            d["text"] for d in diagnoses if d["status"] == "negated"
        ]

        physical_exam = (
            "Full range of motion in cervical and lumbar spine"
        )

        if "tenderness" in negated_findings:
            physical_exam += ", no tenderness."

        observations = (
            "Patient appears in normal health, normal gait."
        )

        return {
            "Physical_Exam": physical_exam,
            "Observations": observations
        }

    # ASSESSMENT

    def _build_assessment(self, clinical_state):
        diagnosis = clinical_state.get("diagnosis", "Not specified")

        severity = clinical_state.get("severity", "Unknown")
        trend = clinical_state.get("trend", "Stable")

        assessment_severity = f"{severity}, {trend.lower()}"

        return {
            "Diagnosis": diagnosis.title(),
            "Severity": assessment_severity
        }

    # PLAN
    def _build_plan(self, asserted_entities, clinical_state):
        treatments = [
            t["text"] for t in asserted_entities.get("treatments", [])
            if t["status"] == "present"
        ]

        treatment_plan = []

        if any("physio" in t for t in treatments):
            treatment_plan.append("Continue physiotherapy as needed")

        if any("painkiller" in t or "analgesic" in t for t in treatments):
            treatment_plan.append("use analgesics for pain relief")

        treatment_text = ", ".join(treatment_plan).capitalize() + "."

        follow_up = (
            "Patient to return if pain worsens or persists beyond six months."
            if clinical_state.get("trend") == "Improving"
            else "Follow-up recommended."
        )

        return {
            "Treatment": treatment_text,
            "Follow-Up": follow_up
        }


    # MAIN ENTRY POINT

    def generate(self, asserted_entities: dict, clinical_state: dict) -> dict:

        return {
            "Subjective": self._build_subjective(
                asserted_entities, clinical_state
            ),
            "Objective": self._build_objective(
                asserted_entities
            ),
            "Assessment": self._build_assessment(
                clinical_state
            ),
            "Plan": self._build_plan(
                asserted_entities, clinical_state
            )
        }


In [None]:
soap_generator = SOAPNoteGenerator()

soap_note = soap_generator.generate(
    asserted_entities=asserted_entities,
    clinical_state=clinical_state
)

soap_note


{'Subjective': {'Chief_Complaint': 'Back Pain, Backaches, Feel Pain, Experiencing Pain, Lingering Pain',
  'History_of_Present_Illness': 'Patient had a car accident, experienced pain for six months, now occasional backache.'},
 'Objective': {'Physical_Exam': 'Full range of motion in cervical and lumbar spine, no tenderness.',
  'Observations': 'Patient appears in normal health, normal gait.'},
 'Assessment': {'Diagnosis': 'Whiplash Injury', 'Severity': 'Mild, improving'},
 'Plan': {'Treatment': 'Continue physiotherapy as needed, use analgesics for pain relief.',
  'Follow-Up': 'Patient to return if pain worsens or persists beyond six months.'}}
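
For reading or sharing, the nested SOAP dict can be flattened into plain text. The sketch below is one possible rendering, not part of the generator above; `render_soap` is a hypothetical helper, and the example note is a trimmed copy of the output shown above.

```python
def render_soap(note: dict) -> str:
    """Render a nested SOAP dict as indented plain text."""
    lines = []
    for section, fields in note.items():
        lines.append(section.upper())
        for key, value in fields.items():
            # Underscores in keys become spaces for readability.
            lines.append(f"  {key.replace('_', ' ')}: {value}")
    return "\n".join(lines)


# Trimmed copy of the SOAP note above, used only to demonstrate the rendering.
example_note = {
    "Assessment": {"Diagnosis": "Whiplash Injury", "Severity": "Mild, improving"},
    "Plan": {"Follow-Up": "Patient to return if pain worsens or persists beyond six months."},
}
soap_text = render_soap(example_note)
print(soap_text)
```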