![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/MENOPAUSE.ipynb)

## **Detect Menopause Entities**

To run this yourself, you will need to upload your license keys to the notebook. Run the cell below to do that, or open the file explorer on the left side of the screen and upload `license_keys.json` to the folder that opens. Otherwise, you can read the example outputs saved below each cell.

## **Colab Setup**

In [None]:
import json
import os

from google.colab import files

license_keys = files.upload()

with open(list(license_keys.keys())[0]) as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)

# Adding license key-value pairs to environment variables
os.environ.update(license_keys)
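Outside Colab, `files.upload()` is not available. The sketch below assumes `license_keys.json` is already in the working directory and is a flat JSON object of string keys and values. It loads the keys from disk and checks that the three names this notebook reads later (`SECRET`, `JSL_VERSION`, `PUBLIC_VERSION`) are present. The helper name `load_license_keys` is hypothetical.

```python
import json
import os

# Keys read later in this notebook: the pip cells use PUBLIC_VERSION,
# JSL_VERSION and SECRET, and sparknlp_jsl.start() uses SECRET.
REQUIRED_KEYS = ("SECRET", "JSL_VERSION", "PUBLIC_VERSION")


def load_license_keys(path="license_keys.json", export=True):
    """Hypothetical helper: read license keys from a local JSON file.

    Assumes a flat JSON object of string keys and values.
    Raises KeyError if any key this notebook needs is missing.
    """
    with open(path) as f:
        keys = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in keys]
    if missing:
        raise KeyError(f"license file is missing: {', '.join(missing)}")
    if export:
        # Same effect as os.environ.update(license_keys) in the Colab cell
        os.environ.update({k: str(v) for k, v in keys.items()})
    return keys
```

After `license_keys = load_license_keys()`, calling `globals().update(license_keys)` exposes `SECRET` and the version strings as notebook variables, as the Colab cell does.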

## **Install dependencies**

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.4.1 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display
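Because `pip install -q` hides most of its output, it can be worth confirming that the pinned versions were actually installed before starting Spark. A minimal sketch using the standard-library `importlib.metadata`; the helper names are hypothetical, and the distribution names are taken from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(expected):
    """Compare {package: expected version} against what is installed.

    Returns only the packages whose installed version differs or is missing,
    as {package: (expected, installed)}.
    """
    actual = installed_versions(expected)
    return {
        name: (want, actual[name])
        for name, want in expected.items()
        if actual[name] != want
    }
```

For example, `version_mismatches({"pyspark": "3.4.1", "spark-nlp": PUBLIC_VERSION, "spark-nlp-jsl": JSL_VERSION})` should return an empty dict when all three installs succeeded.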

## **Import dependencies into Python and start the Spark session**

In [3]:
import json
import os

from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql import SparkSession

import sparknlp
import sparknlp_jsl

from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
from sparknlp.util import *
from sparknlp.pretrained import ResourceDownloader
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

from sparknlp_display import EntityResolverVisualizer

import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)

import string
import numpy as np

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

spark = sparknlp_jsl.start(secret = SECRET, params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 5.4.0
Spark NLP_JSL Version : 5.4.0


# **🔎 NER**

## ner_menopause_core

In [4]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_menopause_core", "en", "clinical/models")\
    .setInputCols(["sentence", "token","embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
    ])



text_list = [
    """The patient is a 52-year-old female, G3P2, who presents with complaints of irregular menstruation and symptoms suggestive of perimenopause. She reports experiencing hot flashes, night sweats, and vaginal dryness. Her medical history includes polycystic ovary syndrome (PCOS), fatigue, mood swings, hypertension diagnosed 5 years ago and currently managed with medication, and osteoporosis diagnosed 2 years ago with ongoing treatment.
Current medications include estradiol for hormone replacement therapy, alendronate for osteoporosis therapy, and fluoxetine for depressive symptoms related to menopause. Recent tests and procedures include a bone density scan to monitor osteoporosis, blood tests for estradiol and follicle-stimulating hormone (FSH) levels, and a vaginal swab collected for routine infection screening. Test results showed elevated FSH levels indicating menopause.
The patient's family history includes breast cancer in her mother and a hip fracture in her mother at the age of 60. The plan is to continue current hormone replacement therapy and osteoporosis therapy, with follow-up appointments every 6 months to monitor symptoms and treatment efficacy.""",
"""A 55-year-old female, G1P1, reports severe mood swings and a decrease in libido, which she attributes to menopausal changes. Her medical history is notable for diabetes mellitus type 2 and hypercholesterolemia, both controlled with medications. She is currently on hormone replacement therapy, which includes progesterone and a low dose of estradiol, to manage her menopausal symptoms. A recent cardiovascular risk assessment was conducted due to her age and underlying conditions, suggesting adjustments in her treatment plan to reduce potential risks.""",
"""The patient is a 53-year-old female, G4P3, who is experiencing frequent urinary tract infections and urinary incontinence, which she believes are exacerbated by her postmenopausal state. She has a past medical history of recurrent urinary infections and cervical dysplasia, treated with surgical intervention five years ago. Her current treatment includes topical estrogen therapy to improve urogenital health and a prophylactic antibiotic regimen. A urodynamic test is planned to further investigate her urinary symptoms.""",
"""A 50-year-old female, G0, visited the clinic with concerns about persistent joint pains and muscle stiffness, which she associates with menopause. Her past health records show a history of rheumatoid arthritis and depression, managed with methotrexate and sertraline respectively. Given her symptoms and age, a dual-energy X-ray absorptiometry (DEXA) scan was performed, revealing early signs of osteopenia. The patient is advised to start calcium and vitamin D supplements along with her ongoing treatments, and to engage in regular weight-bearing exercise to strengthen bone density.""",
"""The patient, a 49-year-old female, G2P2, came to the clinic complaining of increased irritability, sleep disturbances, and memory lapses, alongside a cessation of menstrual periods for the last 12 months. She describes experiencing frequent migraines which have recently worsened. Her medical background is significant for hypothyroidism, managed with levothyroxine, and a recent diagnosis of early-stage breast cancer, currently in remission. The patient's treatment regimen includes low-dose hormone therapy to alleviate menopausal symptoms, with careful monitoring due to her cancer history."""
]

data = spark.createDataFrame(text_list, StringType()).toDF("text")

result = pipeline.fit(data).transform(data)

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_menopause_core download started this may take some time.
[OK!]
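`MedicalNerModel` produces one tag per token, and `NerConverterInternal` merges those token tags into the multi-token chunks listed in the next cell. The plain-Python sketch below illustrates that grouping, assuming the tags follow the IOB scheme (`B-Label`, `I-Label`, `O`). It is an illustration only, not the library's implementation, which works on character offsets and also carries metadata such as the entity label.

```python
def iob_to_chunks(tokens, tags):
    """Group IOB-tagged tokens into (chunk_text, entity_label) pairs.

    Assumes tags look like "B-Label", "I-Label" or "O". An "I-" tag whose
    label differs from the open chunk starts a new chunk.
    """
    chunks = []
    current, label = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            prefix, tag_label = "O", None
        else:
            prefix, tag_label = tag.split("-", 1)
        # Close the open chunk on "O", on a new "B-", or on a label change
        if current and (prefix != "I" or tag_label != label):
            chunks.append((" ".join(current), label))
            current, label = [], None
        if prefix in ("B", "I"):
            if not current:
                label = tag_label
            current.append(token)
    if current:
        chunks.append((" ".join(current), label))
    return chunks
```

Joining tokens with single spaces is a simplification; the real converter slices the original text, so punctuation and spacing are preserved exactly.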


In [5]:
# One row per chunk: chunk text, character offsets, and entity label
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                     result.ner_chunk.begin,
                                     result.ner_chunk.end,
                                     result.ner_chunk.metadata)).alias("cols")) \
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("ner_label")) \
      .filter("ner_label != 'O'") \
      .show(1000, truncate=False)

+----------------------------+-----+----+---------------------------+
|chunk                       |begin|end |ner_label                  |
+----------------------------+-----+----+---------------------------+
|irregular menstruation      |75   |96  |Irregular_Menstruation     |
|perimenopause               |125  |137 |Perimenopause              |
|hot flashes                 |165  |175 |Other_Symptom              |
|night sweats                |178  |189 |Other_Symptom              |
|vaginal dryness             |196  |210 |Gynecological_Symptom      |
|polycystic ovary syndrome   |242  |266 |Gynecological_Disease      |
|PCOS                        |269  |272 |Gynecological_Disease      |
|fatigue                     |276  |282 |Other_Symptom              |
|hypertension                |298  |309 |Hypertension               |
|osteoporosis                |376  |387 |Osteoporosis               |
|estradiol                   |464  |472 |Hormone_Replacement_Therapy|
|hormone replacement
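The `begin` and `end` columns are character offsets into the original `text`, and `end` is inclusive: `irregular menstruation` spans 75 to 96, which is 22 characters. The sketch below uses hypothetical helper names. It recovers each chunk by slicing and rebuilds the same row layout as the `arrays_zip`/`explode` query in plain Python, using offsets copied from the table above.

```python
def chunk_text(text, begin, end):
    """Return the substring a chunk covers; `end` is inclusive."""
    return text[begin:end + 1]


def zip_chunk_columns(results, begins, ends, labels):
    """Plain-Python analogue of F.explode(F.arrays_zip(...)): one dict per chunk."""
    return [
        {"chunk": res, "begin": b, "end": e, "ner_label": lab}
        for res, b, e, lab in zip(results, begins, ends, labels)
    ]


# Opening sentences of the first note in text_list
sample = ("The patient is a 52-year-old female, G3P2, who presents with complaints of "
          "irregular menstruation and symptoms suggestive of perimenopause. She reports "
          "experiencing hot flashes, night sweats, and vaginal dryness.")

rows = zip_chunk_columns(
    ["irregular menstruation", "perimenopause", "hot flashes"],
    [75, 125, 165],
    [96, 137, 175],
    ["Irregular_Menstruation", "Perimenopause", "Other_Symptom"],
)
for row in rows:
    # Each chunk should equal the slice of the source text it points at
    assert chunk_text(sample, row["begin"], row["end"]) == row["chunk"]
```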

In [6]:
results = result.collect()

In [7]:
from sparknlp_display import NerVisualizer
visualiser = NerVisualizer()

In [8]:
# Render one highlighted view per clinical note
for row in results:
    visualiser.display(row, label_col='ner_chunk')
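`NerVisualizer` renders HTML, which only appears in a notebook front end. For a terminal or a log file, a hypothetical text-only stand-in (not part of `sparknlp_display`) can mark each chunk inline using the same inclusive offsets. It assumes the chunks do not overlap.

```python
def highlight(text, chunks):
    """Wrap each chunk as [chunk](LABEL).

    `chunks` is an iterable of (begin, end, label) with inclusive `end`
    offsets; chunks are assumed not to overlap.
    """
    out, cursor = [], 0
    for begin, end, label in sorted(chunks):
        out.append(text[cursor:begin])           # plain text before the chunk
        out.append(f"[{text[begin:end + 1]}]({label})")
        cursor = end + 1
    out.append(text[cursor:])                    # trailing plain text
    return "".join(out)
```

Applied to the collected rows, something like `highlight(row.text, [(c.begin, c.end, c.metadata['entity']) for c in row.ner_chunk])` should print each note with its chunks marked.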

# **🔎 ASSERTION**

## assertion_menopause_wip

In [11]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_menopause_core", "en", "clinical/models")\
    .setInputCols(["sentence", "token","embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

assertion = AssertionDLModel.pretrained("assertion_menopause_wip", "en", "clinical/models")\
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter,
    assertion
    ])


text =["""A 50-year-old woman, G2P1, presents with symptoms of perimenopause including night sweats, irregular menstruation, and fatigue.She has previously been diagnosed with hypertension. She is taking hormone replacement therapy with estradiol and norethindrone acetate. Recent tests included a bone density scan, which confirmed osteoporosis and showed elevated FSH levels. She also underwent a vaginal swab test for routine screening. Her mother has a history of breast cancer. Her menarche age was 11."""]


empty_data = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty_data)

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_menopause_core download started this may take some time.
[OK!]
assertion_menopause_wip download started this may take some time.
[OK!]
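Each stage reads the columns named in `setInputCols` and writes the one named in `setOutputCol`, so a stage only works if earlier stages have already produced its inputs. The hypothetical checker below encodes the column wiring of the assertion pipeline as plain tuples copied from the cell above (no Spark needed) and reports any input column that no earlier stage produces.

```python
# (stage, input columns, output column), copied from the pipeline cell above
ASSERTION_STAGES = [
    ("DocumentAssembler", ["text"], "document"),
    ("SentenceDetectorDLModel", ["document"], "sentence"),
    ("Tokenizer", ["sentence"], "token"),
    ("WordEmbeddingsModel", ["sentence", "token"], "embeddings"),
    ("MedicalNerModel", ["sentence", "token", "embeddings"], "ner"),
    ("NerConverterInternal", ["sentence", "token", "ner"], "ner_chunk"),
    ("AssertionDLModel", ["sentence", "ner_chunk", "embeddings"], "assertion"),
]


def missing_inputs(stages, source_columns=("text",)):
    """Return (stage, column) pairs whose input column is not yet available."""
    available = set(source_columns)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)  # later stages can consume this column
    return problems
```

With the order used in the notebook, `missing_inputs(ASSERTION_STAGES)` returns an empty list; moving `AssertionDLModel` ahead of `NerConverterInternal` would flag its `ner_chunk` input.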


In [12]:
light_model = LightPipeline(model)

light_result = light_model.fullAnnotate(text)[0]

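chunks = []
entities = []
status = []
confidence = []

# This loop assumes the assertion stage emits one annotation per NER chunk, in the same order
for chunk_ann, assertion_ann in zip(light_result['ner_chunk'], light_result['assertion']):
    chunks.append(chunk_ann.result)
    entities.append(chunk_ann.metadata['entity'])
    status.append(assertion_ann.result)
    confidence.append(assertion_ann.metadata['confidence'])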

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status, 'confidence':confidence})

df

|   | chunks | entities | assertion | confidence |
|---|--------|----------|-----------|------------|
| 0 | G2P1 | G_P | Present | 0.9999 |
| 1 | perimenopause | Perimenopause | Present | 0.9999 |
| 2 | night sweats | Other_Symptom | Present | 0.9997 |
| 3 | irregular menstruation | Irregular_Menstruation | Present | 0.9997 |
| 4 | fatigue | Other_Symptom | Present | 0.9954 |
| 5 | hypertension | Hypertension | Past | 0.9916 |
| 6 | hormone replacement therapy | Hormone_Replacement_Therapy | Present | 0.9988 |
| 7 | estradiol | Hormone_Replacement_Therapy | Present | 0.9696 |
| 8 | norethindrone acetate | Hormone_Replacement_Therapy | Present | 0.9984 |
| 9 | osteoporosis | Osteoporosis | Present | 1.0 |
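The `confidence` column comes from annotation `metadata`, which stores string values, so it needs a `float()` before any numeric comparison. The loop above also pairs chunks and assertions with `zip`, which silently stops at the shorter list. A minimal sketch with hypothetical helper names checks that the lengths match, converts confidences, and keeps only assertions at or above a chosen threshold. The sample tuples are rows from the table above.

```python
from collections import Counter


def pair_assertions(chunks, assertions):
    """Pair NER chunks with assertion labels, refusing to silently truncate."""
    if len(chunks) != len(assertions):
        raise ValueError(f"{len(chunks)} chunks but {len(assertions)} assertions")
    return list(zip(chunks, assertions))


def confident(rows, threshold=0.99):
    """Keep (chunk, entity, status, confidence) rows at or above `threshold`.

    Confidence arrives as a string in annotation metadata, so convert it first.
    """
    return [row for row in rows if float(row[3]) >= threshold]


# A few rows from the assertion table above (confidence kept as strings)
rows = [
    ("G2P1", "G_P", "Present", "0.9999"),
    ("fatigue", "Other_Symptom", "Present", "0.9954"),
    ("hypertension", "Hypertension", "Past", "0.9916"),
    ("estradiol", "Hormone_Replacement_Therapy", "Present", "0.9696"),
]

kept = confident(rows)
status_counts = Counter(status for _, _, status, _ in kept)
```

On Python 3.10 and later, `zip(..., strict=True)` gives the same length check as `pair_assertions` without a helper.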


In [13]:
from sparknlp_display import AssertionVisualizer

vis = AssertionVisualizer()

vis.display(light_result, 'ner_chunk', 'assertion')