

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/NER_SIGN_SYMP.ipynb)




# **Detect signs and symptoms**

To run this notebook yourself, you need to upload your license keys. Run the cells below to do that. Alternatively, open the file explorer on the left side of the screen and upload `license_keys.json` to the folder that opens.
Otherwise, you can look at the example outputs at the bottom of the notebook.



## 1. Colab Setup

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print("Please Upload your John Snow Labs License using the button below")
license_keys = files.upload()
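
In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes. The sketch below shows one way to confirm that an upload parses as JSON and to list its field names without printing any secret values. The helper name `describe_license` and the stand-in payload (including its field names) are hypothetical, used only so the example runs outside Colab.

```python
import json


def describe_license(uploaded):
    """Return the sorted field names found in each uploaded JSON file.

    `uploaded` has the shape returned by google.colab's files.upload():
    a dict mapping file name -> file content as bytes.
    """
    summary = {}
    for filename, content in uploaded.items():
        # Raises json.JSONDecodeError (a ValueError) if the file is not valid JSON.
        parsed = json.loads(content.decode("utf-8"))
        if not isinstance(parsed, dict):
            raise ValueError(f"{filename}: expected a JSON object at the top level")
        # Report field names only, never the secret values.
        summary[filename] = sorted(parsed)
    return summary


# Stand-in for a real upload; the field names here are hypothetical.
fake_upload = {"license_keys.json": b'{"EXAMPLE_SECRET": "xxx", "EXAMPLE_LICENSE": "yyy"}'}
print(describe_license(fake_upload))
```

In the notebook itself, the equivalent call would be `describe_license(license_keys)`.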

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the jars for the Spark session JVM.
# Make sure to restart your notebook afterwards for the changes to take effect.

jsl.install()

## Start Session

In [None]:
from johnsnowlabs import *
# Automatically load the license data and start a session with all jars the user has access to
spark = jsl.start()

## 2. Select the NER model and construct the pipeline

Select the NER model - Sign/symptom models: **ner_clinical, ner_jsl**

For more details: https://github.com/JohnSnowLabs/spark-nlp-models#pretrained-models---spark-nlp-for-healthcare

In [None]:
# You can change this to the model you want to use and re-run cells below.
# Sign / symptom models: ner_clinical, ner_jsl
# All these models use the same clinical embeddings.
MODEL_NAME = "ner_clinical"

Create the pipeline

In [None]:
document_assembler = nlp.DocumentAssembler() \
    .setInputCol('text')\
    .setOutputCol('document')

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentence')

tokenizer = nlp.Tokenizer()\
    .setInputCols(['sentence']) \
    .setOutputCol('token')

word_embeddings = nlp.WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models') \
    .setInputCols(['sentence', 'token']) \
    .setOutputCol('embeddings')

clinical_ner = medical.NerModel.pretrained(MODEL_NAME, "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(['sentence', 'token', 'ner']) \
    .setOutputCol('ner_chunk')

# Chain the stages in order; each stage reads columns produced by earlier ones.
nlp_pipeline = nlp.Pipeline(
    stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        clinical_ner,
        ner_converter
    ])

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_clinical download started this may take some time.
[OK!]
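
Each stage in the pipeline reads columns that an earlier stage wrote, so the order of the stages matters. The following plain-Python sketch mirrors the column names used above and checks that every input column is available by the time a stage runs. It does not touch Spark. `STAGES` and `check_wiring` are illustrative names, not part of the library.

```python
# Each stage is listed as (name, input columns, output column),
# copied from the pipeline definition above.
STAGES = [
    ("document_assembler", ["text"], "document"),
    ("sentence_detector", ["document"], "sentence"),
    ("tokenizer", ["sentence"], "token"),
    ("word_embeddings", ["sentence", "token"], "embeddings"),
    ("clinical_ner", ["sentence", "token", "embeddings"], "ner"),
    ("ner_converter", ["sentence", "token", "ner"], "ner_chunk"),
]


def check_wiring(stages, initial_columns):
    """Return (stage, missing_column) pairs; an empty list means the chain is consistent."""
    available = set(initial_columns)
    problems = []
    for name, inputs, output in stages:
        for column in inputs:
            if column not in available:
                problems.append((name, column))
        # After a stage runs, its output column is available to later stages.
        available.add(output)
    return problems


print(check_wiring(STAGES, ["text"]))
```

Dropping or reordering a stage in `STAGES` makes the missing column show up in the returned list, which is a quick way to see why, for example, the NER model cannot run before the embeddings stage.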


## 3. Create example inputs

In [None]:
# Enter examples as strings in this array
input_list = [
    """The patient is a 21-day-old Caucasian male here for 2 days of congestion - mom has been suctioning yellow discharge from the patient's nares, plus she has noticed some mild problems with his breathing while feeding (but negative for any perioral cyanosis or retractions). One day ago, mom also noticed a tactile temperature and gave the patient Tylenol. Baby also has had some decreased p.o. intake. His normal breast-feeding is down from 20 minutes q.2h. to 5 to 10 minutes secondary to his respiratory congestion. He sleeps well, but has been more tired and has been fussy over the past 2 days. The parents noticed no improvement with albuterol treatments given in the ER. His urine output has also decreased; normally he has 8 to 10 wet and 5 dirty diapers per 24 hours, now he has down to 4 wet diapers per 24 hours. Mom denies any diarrhea. His bowel movements are yellow colored and soft in nature."""
]

## 4. Use the pipeline to create outputs

In [None]:
from pyspark.sql.types import StringType

df = spark.createDataFrame(input_list, StringType()).toDF("text")
result = nlp_pipeline.fit(df).transform(df)
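
To inspect the detected entities as plain tuples, the `ner_chunk` annotations can be flattened. The sketch below assumes each annotation exposes `result`, `begin`, `end`, and a `metadata` mapping with an `entity` key, which is how Spark NLP chunk annotations are laid out. The helper name `chunk_rows` is hypothetical, and the stand-in objects carry illustrative values rather than actual model output, so the example runs without a Spark session.

```python
from types import SimpleNamespace


def chunk_rows(annotations):
    """Flatten NER chunk annotations into (text, label, begin, end) tuples."""
    return [
        (a.result, a.metadata.get("entity", ""), a.begin, a.end)
        for a in annotations
    ]


# Stand-in annotations with illustrative labels and offsets (not model output).
fake_chunks = [
    SimpleNamespace(result="congestion", begin=62, end=71, metadata={"entity": "PROBLEM"}),
    SimpleNamespace(result="Tylenol", begin=0, end=6, metadata={"entity": "TREATMENT"}),
]

for row in chunk_rows(fake_chunks):
    print(row)
```

With the real output, the call would look like `chunk_rows(result.collect()[0]["ner_chunk"])`.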

## 5. Visualize results

In [None]:
from sparknlp_display import NerVisualizer

NerVisualizer().display(
    result = result.collect()[0],
    label_col = 'ner_chunk',
    document_col = 'document'
)