# Physician Notetaker

Link to assessment: https://sunset-parrot-38b.notion.site/Physician-Notetaker-1ac7fca3d41680849078deb25228b34f

## Task 1: Medical NLP Summarization

**Task:** Implement an NLP pipeline to **extract medical details** from the transcribed conversation.

### **📍 Deliverables:**

1. **Named Entity Recognition (NER):** Extract **Symptoms, Treatment, Diagnosis, Prognosis** using `spaCy` or `transformers`.
2. **Text Summarization:** Convert the transcript into a **structured medical report**.
3. **Keyword Extraction:** Identify **important medical phrases** (e.g., "whiplash injury," "physiotherapy sessions").
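
For the keyword-extraction deliverable, one simple baseline is to collect the runs of words that sit between stopwords and punctuation, in the spirit of RAKE. The sketch below is only an illustration under stated assumptions: the `STOPWORDS` set and the `candidate_phrases` helper are hypothetical names introduced here, not part of the assessment or of any library, and the stopword list is deliberately tiny.

```python
import re

# Illustrative stopword list (an assumption); a real pipeline would use a fuller one.
# Speaker labels are included so that "Doctor" / "Patient" never start a phrase.
STOPWORDS = {
    "a", "an", "and", "are", "did", "for", "had", "have", "how", "i", "lot",
    "my", "now", "only", "the", "today", "yes", "you", "feeling", "receive",
    "doctor", "patient",
}

def candidate_phrases(text):
    """Return multi-word phrases found between stopwords and punctuation."""
    phrases = []
    # Split on punctuation first so a phrase never crosses a clause boundary.
    for fragment in re.split(r"[.,:;?!\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z]+", fragment):
            if word in STOPWORDS:
                # A stopword ends the current run; keep runs of two or more words.
                if len(current) > 1:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if len(current) > 1:
            phrases.append(" ".join(current))
    return phrases

print(candidate_phrases(
    "Patient: Yes, I had ten physiotherapy sessions, "
    "and now I only have occasional back pain."
))
```

A heuristic like this over-generates on longer transcripts, so its output would normally be filtered against a medical vocabulary before it is reported.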

**📍 Sample Input (Raw Transcript):**

```
Doctor: How are you feeling today?
Patient: I had a car accident. My neck and back hurt a lot for four weeks.
Doctor: Did you receive treatment?
Patient: Yes, I had ten physiotherapy sessions, and now I only have occasional back pain.

```

**📍 Expected Output (Structured Summary in JSON Format):**

```json
{
  "Patient_Name": "Janet Jones",
  "Symptoms": ["Neck pain", "Back pain", "Head impact"],
  "Diagnosis": "Whiplash injury",
  "Treatment": ["10 physiotherapy sessions", "Painkillers"],
  "Current_Status": "Occasional backache",
  "Prognosis": "Full recovery expected within six months"
}

```

**📍 Questions:**

- How would you handle **ambiguous or missing medical data** in the transcript?
- What **pre-trained NLP models** would you use for medical summarization?
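
One possible answer to the first question is to never guess: fields the transcript does not support are filled with an explicit placeholder and listed separately so a clinician can review them. The sketch below assumes the field names from the expected JSON output above; the `MISSING` placeholder, the `Missing_Fields` key, and the `fill_missing` helper are hypothetical additions for illustration.

```python
import json

# Field names taken from the expected output in the task description.
REQUIRED_FIELDS = [
    "Patient_Name", "Symptoms", "Diagnosis",
    "Treatment", "Current_Status", "Prognosis",
]
MISSING = "Not mentioned"  # hypothetical placeholder value

def fill_missing(extracted):
    """Return a complete report, flagging every field that had no value."""
    report = {}
    missing = []
    for field in REQUIRED_FIELDS:
        value = extracted.get(field)
        if value:  # treats None, "" and [] as missing
            report[field] = value
        else:
            report[field] = MISSING
            missing.append(field)
    report["Missing_Fields"] = missing
    return report

partial = {"Symptoms": ["Neck pain", "Back pain"], "Diagnosis": ""}
print(json.dumps(fill_missing(partial), indent=2))
```

Keeping the list of missing fields alongside the report makes the gaps visible instead of silently emitting empty strings.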



## Task 1 Using spaCy

In [5]:
!pip install spacy scispacy

Defaulting to user installation because normal site-packages is not writeable
Collecting scispacy
  Using cached scispacy-0.5.5-py3-none-any.whl.metadata (18 kB)
Collecting typer<0.10.0,>=0.3.0 (from spacy)
  Downloading typer-0.9.4-py3-none-any.whl.metadata (14 kB)
Collecting conllu (from scispacy)
  Downloading conllu-6.0.0-py3-none-any.whl.metadata (21 kB)
Collecting nmslib-metabrainz==2.1.3 (from scispacy)
  Downloading nmslib_metabrainz-2.1.3-cp311-cp311-win_amd64.whl.metadata (975 bytes)
Collecting pybind11>=2.2.3 (from nmslib-metabrainz==2.1.3->scispacy)
  Downloading pybind11-2.13.6-py3-none-any.whl.metadata (9.5 kB)
Downloading scispacy-0.5.5-py3-none-any.whl (46 kB)
Downloading nmslib_metabrainz-2.1.3-cp311-cp311-win_amd64.whl (468 kB)
Downloading typer-0.9.4-py3-none-any.whl (45 kB)
Downloading conllu-6.0.0-py3-none-any.whl (16 kB)
Downloading pybind11-2.13.6-py3-none-any.whl (243 kB)
Installing collected packages: pybind11, conllu, typer, nmslib-metabrainz, scispacy
  Attem

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
prefect 2.10.8 requires coolname>=1.0.4, which is not installed.
prefect 2.10.8 requires pathspec>=0.8.0, which is not installed.
prefect 2.10.8 requires python-slugify>=5.0, which is not installed.
prefect 2.10.8 requires readchar>=4.0.0, which is not installed.
embedchain 0.1.103 requires alembic<2.0.0,>=1.13.1, but you have alembic 1.8.1 which is incompatible.
embedchain 0.1.103 requires langchain<0.2.0,>=0.1.4, but you have langchain 0.3.3 which is incompatible.
embedchain 0.1.103 requires tiktoken<0.6.0,>=0.5.2, but you have tiktoken 0.8.0 which is incompatible.
gradio 5.16.1 requires typer<1.0,>=0.12; sys_platform != "emscripten", but you have typer 0.9.4 which is incompatible.
prefect 2.10.8 requires sqlalchemy[asyncio]!=1.4.33,<2.0,>=1.4.22, but you have sqlalchemy 2.0.36 which is incompatible.

[notice] A

In [6]:
import spacy
from scispacy.linking import EntityLinker  # Links entities to medical knowledge bases
import json

In [23]:
!pip install -U spacy==3.7.4 scispacy==0.5.5
# ^^^ These pinned versions are needed for the model to work better.

Defaulting to user installation because normal site-packages is not writeable


[notice] A new release of pip is available: 24.2 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip



Collecting thinc<8.3.0,>=8.2.2 (from spacy==3.7.4)
  Using cached thinc-8.2.5-cp311-cp311-win_amd64.whl.metadata (15 kB)
Using cached thinc-8.2.5-cp311-cp311-win_amd64.whl (1.5 MB)
Installing collected packages: thinc
  Attempting uninstall: thinc
    Found existing installation: thinc 8.1.12
    Uninstalling thinc-8.1.12:
      Successfully uninstalled thinc-8.1.12
Successfully installed thinc-8.2.5


In [24]:
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_md-0.5.3.tar.gz
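# ^^^ Installs the medical-specific spaCy model (mandatory). As the output below shows,
# this v0.5.3 model requires spacy<3.7.0, so pip downgrades spaCy to 3.6.1 and reports conflicts.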

Defaulting to user installation because normal site-packages is not writeable
Collecting https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_md-0.5.3.tar.gz
  Using cached https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_md-0.5.3.tar.gz (119.1 MB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting spacy<3.7.0,>=3.6.1 (from en-core-sci-md==0.5.3)
  Using cached spacy-3.6.1-cp311-cp311-win_amd64.whl.metadata (26 kB)
Collecting thinc<8.2.0,>=8.1.8 (from spacy<3.7.0,>=3.6.1->en-core-sci-md==0.5.3)
  Using cached thinc-8.1.12-cp311-cp311-win_amd64.whl.metadata (15 kB)
Using cached spacy-3.6.1-cp311-cp311-win_amd64.whl (12.0 MB)
Using cached thinc-8.1.12-cp311-cp311-win_amd64.whl (1.5 MB)
Installing collected packages: thinc, spacy
  Attempting uninstall: thinc
    Found existing installation: thinc 8.2.5
    Uninstalling thinc-8.2.5:
      Successfully uninstalled thinc-8.2

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
en-core-web-sm 3.7.1 requires spacy<3.8.0,>=3.7.2, but you have spacy 3.6.1 which is incompatible.
scispacy 0.5.5 requires spacy<3.8.0,>=3.7.0, but you have spacy 3.6.1 which is incompatible.

[notice] A new release of pip is available: 24.2 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [20]:
!pip show en-core-sci-md

Name: en-core-sci-md
Version: 0.5.3
Summary: Spacy Models for Biomedical Text.
Home-page: https://allenai.github.io/SciSpaCy/
Author: Allen Institute for Artificial Intelligence
Author-email: ai2-info@allenai.org
License: CC BY-SA 3.0
Location: C:\Users\Rizen3\AppData\Roaming\Python\Python311\site-packages
Requires: spacy
Required-by: 


In [30]:
nlp = spacy.load("en_core_sci_md") # Medical NLP Model from SciSpacy

In [31]:
CONVERSATION = """
Doctor: How are you feeling today?
Patient: I had a car accident. My neck and back hurt a lot for four weeks.
Doctor: Did you receive treatment?
Patient: Yes, I had ten physiotherapy sessions, and now I only have occasional back pain.
"""

doc = nlp(CONVERSATION)
print(doc)


Doctor: How are you feeling today?
Patient: I had a car accident. My neck and back hurt a lot for four weeks.
Doctor: Did you receive treatment?
Patient: Yes, I had ten physiotherapy sessions, and now I only have occasional back pain.
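
Because the transcript is a sequence of `Speaker: utterance` lines, it can help to split it into turns before extraction, for example to look for symptoms only in what the patient says. This is a minimal sketch assuming every turn fits on one line; `split_turns` is a hypothetical helper, not part of spaCy or scispaCy.

```python
def split_turns(transcript):
    """Split a 'Speaker: utterance' transcript into (speaker, utterance) pairs."""
    turns = []
    for line in transcript.splitlines():
        speaker, sep, utterance = line.partition(":")
        # Skip blank lines and lines without a speaker label.
        if sep and utterance.strip():
            turns.append((speaker.strip(), utterance.strip()))
    return turns

sample = """
Doctor: How are you feeling today?
Patient: I had a car accident. My neck and back hurt a lot for four weeks.
"""
patient_text = " ".join(text for speaker, text in split_turns(sample) if speaker == "Patient")
print(patient_text)
```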



In [32]:
# Print the label of every entity the model found.
for ent in doc.ents:
    print(ent.label_)

ENTITY
ENTITY
ENTITY
ENTITY
ENTITY
ENTITY
ENTITY
ENTITY
ENTITY
ENTITY
ENTITY
ENTITY
ENTITY


In [33]:
symptoms, treatments, diagnosis, prognosis = [], [], "", ""

# Labels we would like to map to each report field.
# Note: en_core_sci_md tags every span with the generic label "ENTITY"
# (see the output above), so none of these labels match.
symptom_labels = ["SYMPTOM", "DISEASE", "CONDITION"]
treatment_labels = ["TREATMENT", "PROCEDURE"]
diagnosis_labels = ["DIAGNOSIS"]
prognosis_labels = ["PROGNOSIS"]

for entity in doc.ents:
    if entity.label_ in symptom_labels:
        symptoms.append(entity.text)    # fixed: was the undefined name `ent`
    elif entity.label_ in treatment_labels:
        treatments.append(entity.text)  # fixed: was misspelled `treaments`
    elif entity.label_ in diagnosis_labels:
        diagnosis += entity.text
    elif entity.label_ in prognosis_labels:
        prognosis += entity.text

print(f"Symptoms : {list(set(symptoms))},Diagnosis: {diagnosis},Treatment: {list(set(treatments))},Prognosis: {prognosis}")

Symptoms : [],Diagnosis: ,Treatment: [],Prognosis: 
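
Every field is empty because the model only emits the generic `ENTITY` label, so the label filters never match. One possible fallback, shown here as a sketch and not as the approach this notebook goes on to use, is to bucket phrases with a small hand-written lexicon. The `LEXICON` contents and the `extract_by_lexicon` helper are hypothetical and cover only this sample conversation.

```python
import re

# Hypothetical hand-written lexicon; a real system would draw on a clinical vocabulary.
LEXICON = {
    "Symptoms": [r"neck and back hurt", r"neck pain", r"back pain"],
    "Treatment": [r"physiotherapy sessions?", r"painkillers?"],
}

def extract_by_lexicon(text):
    """Return the lexicon phrases found in text, grouped by category."""
    found = {category: [] for category in LEXICON}
    for category, patterns in LEXICON.items():
        for pattern in patterns:
            # Word boundaries keep a pattern from matching inside a longer word.
            for match in re.finditer(rf"\b{pattern}\b", text, flags=re.IGNORECASE):
                phrase = match.group(0).lower()
                if phrase not in found[category]:
                    found[category].append(phrase)
    return found

conversation = """
Doctor: How are you feeling today?
Patient: I had a car accident. My neck and back hurt a lot for four weeks.
Doctor: Did you receive treatment?
Patient: Yes, I had ten physiotherapy sessions, and now I only have occasional back pain.
"""
print(extract_by_lexicon(conversation))
```

A lexicon like this is brittle and needs maintenance, but it gives a deterministic baseline against which a trained or fine-tuned NER model can be compared.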
