![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/NER_SUPPLEMENT_CLINICAL.ipynb)

# `NER_SUPPLEMENT_CLINICAL` **Models**

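The models in this notebook are trained to extract the benefits of using drugs or supplements and the conditions they are used for. In the outputs below, these appear as the `BENEFIT` and `CONDITION` entities.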

## 1. Colab Setup

**Import license keys**

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install johnsnowlabs

In [None]:
from google.colab import files
print("Please Upload your John Snow Labs License using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM.
# Restart your notebook afterwards for the changes to take effect.

jsl.install()

## 2. Start Session

**Import dependencies into Python and start the Spark session**

In [None]:
from johnsnowlabs import *
# Automatically load the license data and start a session with all JARs the user has access to
spark = jsl.start()

In [None]:
spark

## 3. Select the model and construct the pipeline

In [None]:
MODEL_LIST = ["ner_supplement_clinical",
              "bert_token_classifier_ner_supplement"]

**Create the pipeline**

In [None]:
document_assembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
      .setInputCols(["document"])\
      .setOutputCol("token")


word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
      .setInputCols(["document", "token"])\
      .setOutputCol("word_embeddings")

clinical_ner = medical.NerModel.pretrained("ner_supplement_clinical", "en", "clinical/models") \
      .setInputCols(["document", "token", "word_embeddings"]) \
      .setOutputCol("ner")

tokenClassifier = nlp.BertForTokenClassification.pretrained("bert_token_classifier_ner_supplement","en", "clinical/models")\
      .setInputCols(["token", "document"])\
      .setOutputCol("ner")\
      .setCaseSensitive(True)

ner_converter = medical.NerConverterInternal() \
      .setInputCols(["document", "token", "ner"]) \
      .setOutputCol("ner_chunk")



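def run_pipeline(MODEL_NAME, sample_text):
    # Build the pipeline that matches the requested model.
    if MODEL_NAME == "ner_supplement_clinical":
        # The clinical NER model takes word embeddings as an extra input.
        ner_pipeline = Pipeline(stages = [document_assembler,
                                          tokenizer,
                                          word_embeddings,
                                          clinical_ner,
                                          ner_converter])

    else:
        # The BERT token classifier works directly on documents and tokens.
        ner_pipeline = Pipeline(stages = [document_assembler,
                                          tokenizer,
                                          tokenClassifier,
                                          ner_converter])

    # Wrap the input strings in a single-column DataFrame named "text".
    text = spark.createDataFrame(sample_text, StringType()).toDF('text')

    # Fit on the input (the pretrained models are not retrained) and annotate it.
    result = ner_pipeline.fit(text).transform(text)
    return result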

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_supplement_clinical download started this may take some time.
[OK!]
bert_token_classifier_ner_supplement download started this may take some time.
Approximate size to download 385.5 MB
[OK!]
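
The two branches in `run_pipeline` differ only in their NER stages: `ner_supplement_clinical` needs `word_embeddings` followed by `clinical_ner`, while `bert_token_classifier_ner_supplement` needs only `tokenClassifier`. The sketch below shows one possible way to express that choice as a lookup table. It uses plain strings in place of the annotator objects so it runs without Spark, and the names `COMMON_HEAD`, `COMMON_TAIL`, `STAGES_BY_MODEL`, and `stages_for` are hypothetical, introduced only for illustration.

```python
# Stage names are plain strings here; in the notebook they are annotator objects.
# All names defined in this block are hypothetical and used only for illustration.
COMMON_HEAD = ["document_assembler", "tokenizer"]
COMMON_TAIL = ["ner_converter"]

# Model-specific stages that sit between the tokenizer and the NER converter.
STAGES_BY_MODEL = {
    "ner_supplement_clinical": ["word_embeddings", "clinical_ner"],
    "bert_token_classifier_ner_supplement": ["tokenClassifier"],
}


def stages_for(model_name):
    """Return the ordered stage names for a model, rejecting unknown names."""
    if model_name not in STAGES_BY_MODEL:
        raise ValueError(f"Unknown model: {model_name!r}")
    return COMMON_HEAD + STAGES_BY_MODEL[model_name] + COMMON_TAIL


for name in STAGES_BY_MODEL:
    print(name, "->", stages_for(name))
```

Unlike the `if`/`else` in `run_pipeline`, this lookup raises an error for a misspelled model name instead of silently falling back to the BERT pipeline.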


## 4. Create example inputs

In [None]:
sample_text = [
    """Excellent!. The state of health improves, nervousness disappears, and night sleep improves. It also promotes hair and nail growth.""",

    """This is perfect for energy. Helps with sleep. Supports the body when having an eye inflammation. Perfect protection against virus infections. Good for hypertension.""",

    """I take this for adrenal exhaustion, two caps three times a day and contribute it to my hair growth. """,

    """This product is great. I have asthma and this really helped me to breath better. I would recommend.""",
]

In [None]:
from pyspark.sql.types import StringType, IntegerType

text = spark.createDataFrame(sample_text,StringType()).toDF('text')

text.show(truncate = 100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|Excellent!. The state of health improves, nervousness disappears, and night sleep improves. It al...|
|This is perfect for energy. Helps with sleep. Supports the body when having an eye inflammation. ...|
|I take this for adrenal exhaustion, two caps three times a day and contribute it to my hair growth. |
| This product is great. I have asthma and this really helped me to breath better. I would recommend.|
+----------------------------------------------------------------------------------------------------+
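
The first two rows above end in `...` because `show(truncate = 100)` shortens any cell longer than 100 characters. As a rough pure-Python sketch of that display rule (an approximation of Spark's behavior, with `truncate_cell` being a hypothetical helper name, not part of any library), a long string is cut to 97 characters and `...` is appended:

```python
def truncate_cell(value, width=100):
    """Approximate how show(truncate=width) shortens a long string cell."""
    if width > 0 and len(value) > width:
        if width < 4:
            # Too narrow for an ellipsis: keep only the first `width` characters.
            return value[:width]
        # Reserve three characters for the trailing "...".
        return value[:width - 3] + "..."
    return value


first_review = ("Excellent!. The state of health improves, nervousness disappears, "
                "and night sleep improves. It also promotes hair and nail growth.")

shown = truncate_cell(first_review, 100)
print(len(first_review), len(shown))
print(shown)
```

Only the display is shortened. The DataFrame still holds the full text, which is what the pipeline processes.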



## 5. Use the pipeline to create outputs

In [None]:
for model_name in MODEL_LIST:

    result = run_pipeline(model_name, sample_text)

    print(f"\n*******{model_name}********")

    # Zip the parallel chunk arrays, then explode them into one row per chunk.
    result.select(F.explode(F.arrays_zip(result.ner_chunk.result,
                                         result.ner_chunk.begin,
                                         result.ner_chunk.end,
                                         result.ner_chunk.metadata)).alias("cols"))\
            .select(F.expr("cols['0']").alias("chunk"),
                    F.expr("cols['1']").alias("begin"),
                    F.expr("cols['2']").alias("end"),
                    F.expr("cols['3']['entity']").alias("entity")).show()


*******ner_supplement_clinical********
+------------------+-----+---+---------+
|             chunk|begin|end|   entity|
+------------------+-----+---+---------+
|       nervousness|   42| 52|CONDITION|
|       night sleep|   70| 80|  BENEFIT|
|              hair|  109|112|  BENEFIT|
|       nail growth|  118|128|  BENEFIT|
|            energy|   20| 25|  BENEFIT|
|             sleep|   39| 43|  BENEFIT|
|  eye inflammation|   79| 94|CONDITION|
|  virus infections|  124|139|CONDITION|
|      hypertension|  151|162|CONDITION|
|adrenal exhaustion|   16| 33|CONDITION|
|       hair growth|   87| 97|  BENEFIT|
|            asthma|   30| 35|CONDITION|
|            breath|   66| 71|  BENEFIT|
+------------------+-----+---+---------+
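
The `begin` and `end` columns in the table above are character offsets into the input text, and `end` is inclusive. The sketch below checks this against the first sample review, using the first four rows of the table. `chunk_from_offsets` is a hypothetical helper written for this illustration.

```python
first_review = ("Excellent!. The state of health improves, nervousness disappears, "
                "and night sleep improves. It also promotes hair and nail growth.")

# (chunk, begin, end, entity) rows copied from the ner_supplement_clinical output.
rows = [
    ("nervousness", 42, 52, "CONDITION"),
    ("night sleep", 70, 80, "BENEFIT"),
    ("hair", 109, 112, "BENEFIT"),
    ("nail growth", 118, 128, "BENEFIT"),
]


def chunk_from_offsets(text, begin, end):
    """Recover a chunk from its offsets; `end` is inclusive, hence the +1."""
    return text[begin:end + 1]


for chunk, begin, end, entity in rows:
    recovered = chunk_from_offsets(first_review, begin, end)
    print(f"{entity:<9} [{begin}, {end}] -> {recovered!r} (matches: {recovered == chunk})")
```

Because the offsets restart at 0 for each input row, they must be applied to the review the chunk came from, not to the concatenation of all reviews.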


*******bert_token_classifier_ner_supplement********
+------------------+-----+---+---------+
|             chunk|begin|end|   entity|
+------------------+-----+---+---------+
|       nervousness|   42| 52|CONDITION|
|       night sleep|   70| 80|  BENEFIT|
|   

## 6. Visualize results

In [None]:
from sparknlp_display import NerVisualizer

ner_viz = NerVisualizer()

for model_name in MODEL_LIST:

    result = run_pipeline(model_name, sample_text)
    print(f"\n\n******************{model_name}************************\n")

    # Collect once per model; calling collect() inside the loop would re-run the pipeline for every row.
    rows = result.collect()
    for row in rows:
        ner_viz.display(result = row, label_col = "ner_chunk")



******************ner_supplement_clinical************************





******************bert_token_classifier_ner_supplement************************

