
![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/RE_NIHSS.ipynb)

# **Relate scale items and their measurements according to NIHSS guidelines.**

To run this yourself, you will need to upload your license keys to the notebook. Run the cell below to do that, or open the file explorer on the left side of the screen and upload `license_keys.json` to the folder that opens. Otherwise, you can read through the example outputs included in the notebook.

# **Colab Setup**

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print("Please upload your John Snow Labs license using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the jars for the Spark session JVM.
# Make sure to restart your notebook afterwards for the changes to take effect.

jsl.install()

## **Start Session**

In [None]:
from johnsnowlabs import *
# Automatically load the license data and start a session with all the jars the user has access to
spark = jsl.start()

# **🔎 About the model**


📌 **"redl_nihss_biobert"**--> *Relate scale items and their measurements according to NIHSS guidelines.*

*   Predicted relation labels => **Has_Value : the measurement is related to the entity, 0 : the measurement is not related to the entity**



🔎**You can find this model and many more on the [NLP Models Hub](https://nlp.johnsnowlabs.com/models?task=Named+Entity+Recognition&edition=Spark+NLP+for+Healthcare)**

### **🔎Define Spark NLP pipeline**

In [None]:
from pyspark.ml import Pipeline
from sparknlp.base import LightPipeline

documenter = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencer = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

words_embedder = nlp.WordEmbeddingsModel()\
    .pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = nlp.PerceptronModel()\
    .pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

ner_tagger = medical.NerModel()\
    .pretrained("ner_nihss", "en", "clinical/models")\
    .setInputCols("sentences", "tokens", "embeddings")\
    .setOutputCol("ner_tags")    

ner_chunker = medical.NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")

dependency_parser = nlp.DependencyParserModel()\
    .pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

re_ner_chunk_filter = medical.RENerChunksFilter() \
    .setInputCols(["ner_chunks", "dependencies"])\
    .setMaxSyntacticDistance(10)\
    .setOutputCol("re_ner_chunks")
    # Optional: .setRelationPairs(pair_list) restricts candidates to the entity-type pairs in pair_list (built in a later cell)

re_model = medical.RelationExtractionDLModel()\
    .pretrained('redl_nihss_biobert', 'en', "clinical/models") \
    .setPredictionThreshold(0.5)\
    .setInputCols(["re_ner_chunks", "sentences"]) \
    .setOutputCol("relations")

pipeline = Pipeline(
    stages=[
        documenter,
        sentencer,
        tokenizer, 
        words_embedder, 
        pos_tagger, 
        ner_tagger,
        ner_chunker,
        dependency_parser,
        re_ner_chunk_filter,
        re_model
        ])

empty_data = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty_data)
light_model = LightPipeline(model)
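Each stage reads columns produced by earlier stages, so the order of `stages` matters. As an illustration only, the hypothetical helper below (not part of Spark NLP) walks the stages in order and reports any input column that no earlier stage has produced, using the column names from the pipeline above.

```python
# Hypothetical helper, not a Spark NLP API: checks the column wiring of the
# pipeline above, using plain (name, input_cols, output_col) tuples.
def missing_inputs(stages, source_cols=("text",)):
    """Return (stage, column) pairs whose input column is not yet available."""
    available = set(source_cols)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)  # later stages may consume this column
    return problems

# Column names copied from the pipeline definition above.
stages = [
    ("documenter", ["text"], "document"),
    ("sentencer", ["document"], "sentences"),
    ("tokenizer", ["sentences"], "tokens"),
    ("words_embedder", ["sentences", "tokens"], "embeddings"),
    ("pos_tagger", ["sentences", "tokens"], "pos_tags"),
    ("ner_tagger", ["sentences", "tokens", "embeddings"], "ner_tags"),
    ("ner_chunker", ["sentences", "tokens", "ner_tags"], "ner_chunks"),
    ("dependency_parser", ["sentences", "pos_tags", "tokens"], "dependencies"),
    ("re_ner_chunk_filter", ["ner_chunks", "dependencies"], "re_ner_chunks"),
    ("re_model", ["re_ner_chunks", "sentences"], "relations"),
]

print(missing_inputs(stages))  # []
```

Moving `dependency_parser` after `re_ner_chunk_filter` would make the helper report `("re_ner_chunk_filter", "dependencies")`, since the filter needs the dependency parse to measure syntactic distance.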

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
pos_clinical download started this may take some time.
Approximate size to download 1.5 MB
[OK!]
ner_nihss download started this may take some time.
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]
redl_nihss_biobert download started this may take some time.
[OK!]


In [None]:
NIHSS_list = [x[2:].lower() for x in ner_tagger.getClasses() if x[0]=="B"]
NIHSS_list.remove("measurement")
NIHSS_list

['3_visual',
 '9_bestlanguage',
 '6_motor',
 '1b_locquestions',
 'nihss',
 '7_limbataxia',
 '8_sensory',
 '1c_loccommands',
 '1a_loc',
 '5b_rightarm',
 '11_extinctioninattention',
 '6b_rightleg',
 '6a_leftleg',
 '2_bestgaze',
 '10_dysarthria',
 '5_motor',
 '4_facialpalsy',
 '5a_leftarm']
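The NER model uses IOB tags, so each entity type appears as `B-<type>` (beginning of a chunk) and usually `I-<type>` (inside a chunk). Keeping only the `B-` tags and stripping the two-character prefix therefore yields one entry per entity type. A minimal sketch of the same step on a hand-written tag list; the tags are an illustrative subset, not the model's full label set (the real list comes from `ner_tagger.getClasses()`). Checking `startswith("B-")` is slightly stricter than testing the first character, and `sorted` makes the order reproducible.

```python
def entity_types(classes, exclude=("measurement",)):
    """Derive lower-cased entity types from IOB class names."""
    types = {c[2:].lower() for c in classes if c.startswith("B-")}
    # Set difference does not raise if an excluded label is absent,
    # unlike list.remove().
    return sorted(types - set(exclude))

# Illustrative subset of IOB tags.
classes = ["O", "B-Measurement", "I-Measurement", "B-NIHSS", "I-NIHSS",
           "B-6a_LeftLeg", "I-6a_LeftLeg", "B-5a_LeftArm"]
print(entity_types(classes))  # ['5a_leftarm', '6a_leftleg', 'nihss']
```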

In [None]:
# Both orderings, since the measurement may appear before or after the scale item
pair_list = [x + "-measurement" for x in NIHSS_list] + ["measurement-" + x for x in NIHSS_list]
pair_list

['3_visual-measurement',
 '9_bestlanguage-measurement',
 '6_motor-measurement',
 '1b_locquestions-measurement',
 'nihss-measurement',
 '7_limbataxia-measurement',
 '8_sensory-measurement',
 '1c_loccommands-measurement',
 '1a_loc-measurement',
 '5b_rightarm-measurement',
 '11_extinctioninattention-measurement',
 '6b_rightleg-measurement',
 '6a_leftleg-measurement',
 '2_bestgaze-measurement',
 '10_dysarthria-measurement',
 '5_motor-measurement',
 '4_facialpalsy-measurement',
 '5a_leftarm-measurement',
 'measurement-3_visual',
 'measurement-9_bestlanguage',
 'measurement-6_motor',
 'measurement-1b_locquestions',
 'measurement-nihss',
 'measurement-7_limbataxia',
 'measurement-8_sensory',
 'measurement-1c_loccommands',
 'measurement-1a_loc',
 'measurement-5b_rightarm',
 'measurement-11_extinctioninattention',
 'measurement-6b_rightleg',
 'measurement-6a_leftleg',
 'measurement-2_bestgaze',
 'measurement-10_dysarthria',
 'measurement-5_motor',
 'measurement-4_facialpalsy',
 'measurement-5a_
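`pair_list` holds dash-separated `type1-type2` pairs in both directions. In the pipeline above it only appears in the commented-out `.setRelationPairs(pair_list)` call, so the filter appears to pass every chunk pair within the syntactic-distance limit, which is why the results further below include pairs of two `Measurement` chunks. The sketch below is a plain-Python approximation of type-pair filtering, not the library's implementation: it assumes chunks are `(begin, label)` tuples from one sentence, matches labels case-insensitively (an assumption), and ignores the dependency-distance check.

```python
from itertools import combinations

def candidate_pairs(chunks, allowed_pairs):
    """Return (begin1, begin2) for chunk pairs whose 'type1-type2' key is allowed.

    chunks: (begin, label) tuples in text order, all from one sentence.
    Labels are lower-cased before matching (an assumption of this sketch).
    The dependency-distance check of RENerChunksFilter is not modelled.
    """
    allowed = {p.lower() for p in allowed_pairs}
    return [
        (b1, b2)
        for (b1, l1), (b2, l2) in combinations(chunks, 2)
        if f"{l1}-{l2}".lower() in allowed
    ]

# Same construction as pair_list, on a subset of the NIHSS item types.
items = ["nihss", "6a_leftleg", "5a_leftarm", "7_limbataxia"]
allowed_pairs = [x + "-measurement" for x in items] + ["measurement-" + x for x in items]

# (begin, label) of the chunks in the second sample sentence, from the results table.
chunks = [(89, "Measurement"), (114, "6a_LeftLeg"), (128, "Measurement"),
          (134, "6a_LeftLeg"), (139, "5a_LeftArm"), (170, "7_LimbAtaxia")]

kept = candidate_pairs(chunks, allowed_pairs)
print(len(kept))  # 8
```

Of the 15 possible pairs, the `Measurement`/`Measurement` pair `(89, 128)` and the six pairs between two scale items are dropped.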

In [None]:
import pandas as pd

def get_relations_df(results, rel_col='relations'):
    """Flatten the relation annotations of one fullAnnotate() result into a DataFrame."""
    rel_pairs = []
    for rel in results[rel_col]:
        rel_pairs.append((
            rel.result,
            rel.metadata['entity1'],
            rel.metadata['entity1_begin'],
            rel.metadata['entity1_end'],
            rel.metadata['chunk1'],
            rel.metadata['entity2'],
            rel.metadata['entity2_begin'],
            rel.metadata['entity2_end'],
            rel.metadata['chunk2'],
            rel.metadata['confidence']
        ))

    rel_df = pd.DataFrame(rel_pairs, columns=['relation', 'entity1', 'entity1_begin', 'entity1_end', 'chunk1',
                                              'entity2', 'entity2_begin', 'entity2_end', 'chunk2', 'confidence'])

    # Drops rows labelled 'O'. redl_nihss_biobert labels unrelated pairs '0' (zero),
    # so those rows are kept; use rel_df.relation != '0' to hide them as well.
    return rel_df[rel_df.relation != 'O']
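`get_relations_df` reads only each annotation's `result` and `metadata`. The sketch below reproduces the row-building and filtering step without Spark or pandas, using a hypothetical `Ann` namedtuple as a stand-in for the annotation objects returned by `fullAnnotate`. It treats both `'O'` and `'0'` as "no relation" (an assumption covering both label conventions) and applies an optional confidence cut-off. Metadata values in Spark NLP annotations are strings, hence the `float()` conversion.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark NLP annotation: only the two attributes
# that get_relations_df reads.
Ann = namedtuple("Ann", ["result", "metadata"])

NEGATIVE_LABELS = {"O", "0"}  # assumption: labels meaning "no relation"

def positive_relations(annotations, min_confidence=0.0):
    """Keep (label, chunk1, chunk2, confidence) for related pairs only."""
    rows = []
    for ann in annotations:
        md = ann.metadata
        confidence = float(md["confidence"])  # metadata values are strings
        if ann.result in NEGATIVE_LABELS or confidence < min_confidence:
            continue
        rows.append((ann.result, md["chunk1"], md["chunk2"], confidence))
    return rows

# Values copied from rows 0 and 2 of the output table further below.
anns = [
    Ann("Has_Value", {"chunk1": "NIHSS score", "chunk2": "4", "confidence": "0.999323"}),
    Ann("0", {"chunk1": "2", "chunk2": "2", "confidence": "0.76675594"}),
]
print(positive_relations(anns))  # [('Has_Value', 'NIHSS score', '4', 0.999323)]
```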

### **🔎Sample Text**

In [None]:
text = "There , her initial NIHSS score was 4 , as recorded by the ED physicians . This included 2 for weakness  in  her  left  leg and 2 for left arm because of felt was subtle ataxia ."

### **🔎Run the pipeline**

In [None]:
import pandas as pd

light_result = light_model.fullAnnotate(text)
get_relations_df(light_result[0])

| | relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Has_Value | NIHSS | 20 | 30 | NIHSS score | Measurement | 36 | 36 | 4 | 0.999323 |
| 1 | Has_Value | Measurement | 89 | 89 | 2 | 6a_LeftLeg | 114 | 122 | left leg | 0.99987483 |
| 2 | 0 | Measurement | 89 | 89 | 2 | Measurement | 128 | 128 | 2 | 0.76675594 |
| 3 | 0 | Measurement | 89 | 89 | 2 | 6a_LeftLeg | 134 | 137 | left | 0.99994004 |
| 4 | 0 | Measurement | 89 | 89 | 2 | 5a_LeftArm | 139 | 141 | arm | 0.999456 |
| 5 | 0 | Measurement | 89 | 89 | 2 | 7_LimbAtaxia | 170 | 177 | ataxia . | 0.9949862 |
| 6 | 0 | 6a_LeftLeg | 114 | 122 | left leg | Measurement | 128 | 128 | 2 | 0.9941993 |
| 7 | 0 | 6a_LeftLeg | 114 | 122 | left leg | 6a_LeftLeg | 134 | 137 | left | 0.9999933 |
| 8 | 0 | 6a_LeftLeg | 114 | 122 | left leg | 5a_LeftArm | 139 | 141 | arm | 0.99955374 |
| 9 | 0 | 6a_LeftLeg | 114 | 122 | left leg | 7_LimbAtaxia | 170 | 177 | ataxia . | 0.9999769 |
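The `*_begin`/`*_end` columns are character offsets into the input string, and the end offset is inclusive, so `text[begin:end + 1]` recovers a chunk (row 0: `text[20:31]` is `"NIHSS score"`). Building on that, a minimal sketch of turning the `Has_Value` rows into an item-to-score mapping. The helpers `chunk_at` and `scores_by_item` are hypothetical; `scores_by_item` takes plain `(relation, entity1, chunk1, entity2, chunk2)` tuples copied from the table and uses whichever side of the pair is the `Measurement` as the value.

```python
# Sample text copied verbatim from the cell above (including its double spaces).
text = "There , her initial NIHSS score was 4 , as recorded by the ED physicians . This included 2 for weakness  in  her  left  leg and 2 for left arm because of felt was subtle ataxia ."

def chunk_at(text, begin, end):
    """Slice a chunk out of the text; the end offset is inclusive, hence end + 1."""
    return text[begin:end + 1]

def scores_by_item(rows):
    """Map each scale item to its measurement for Has_Value rows."""
    scores = {}
    for relation, ent1, chunk1, ent2, chunk2 in rows:
        if relation != "Has_Value":
            continue
        # The measurement can be on either side of the pair.
        if ent1 == "Measurement":
            scores[ent2] = chunk1
        elif ent2 == "Measurement":
            scores[ent1] = chunk2
    return scores

# Rows 0, 1 and 4 of the table above.
rows = [
    ("Has_Value", "NIHSS", "NIHSS score", "Measurement", "4"),
    ("Has_Value", "Measurement", "2", "6a_LeftLeg", "left leg"),
    ("0", "Measurement", "2", "5a_LeftArm", "arm"),
]
print(chunk_at(text, 20, 30))  # NIHSS score
print(scores_by_item(rows))    # {'NIHSS': '4', '6a_LeftLeg': '2'}
```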


### **🔎Visualize results**

In [None]:
from sparknlp_display import RelationExtractionVisualizer

re_vis = RelationExtractionVisualizer()

re_vis.display(light_result[0],
               relation_col = 'relations',
               document_col = 'document',
               show_relations=True
               )