
![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare/RE_NIHSS.ipynb)

# **Relate scale items and their measurements according to NIHSS guidelines.**

To run this notebook yourself, you need to upload your license keys. Run the cell below to do that. Alternatively, open the file explorer on the left side of the screen and upload your license file as `spark_jsl.json` (the file name the cell below looks for) to the folder that opens. Otherwise, you can look at the example outputs at the bottom of the notebook.

# **Colab Setup**

In [None]:
import json, os
from google.colab import files

if 'spark_jsl.json' not in os.listdir():
  license_keys = files.upload()
  os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)
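
Before running the install cells, it can help to confirm that the loaded file contains the keys the rest of the notebook relies on. The following is a minimal sketch, assuming the required keys are `SECRET`, `PUBLIC_VERSION` and `JSL_VERSION` (the names referenced in the cells of this notebook). The helper name `missing_license_keys` and the placeholder values are hypothetical.

```python
import json

# Key names referenced by the install and session cells of this notebook.
REQUIRED_KEYS = ("SECRET", "PUBLIC_VERSION", "JSL_VERSION")

def missing_license_keys(keys, required=REQUIRED_KEYS):
    """Return the required key names that are absent or empty in `keys`."""
    return [name for name in required if not keys.get(name)]

# Placeholder content standing in for a license file (not real credentials).
example = json.loads('{"SECRET": "xxx", "PUBLIC_VERSION": "4.2.8"}')
print(missing_license_keys(example))
```

In the notebook itself, the same check could be applied to `license_keys` right after `json.load(f)`.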

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [3]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel
from pyspark.sql.types import StringType, IntegerType

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G", 
          "spark.kryoserializer.buffer.max":"2000M", 
          "spark.driver.maxResultSize":"2000M"} 

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 4.2.8
Spark NLP_JSL Version : 4.2.8


# **🔎 About the model**


📌 **"redl_nihss_biobert"**--> *Relate scale items and their measurements according to NIHSS guidelines.*

*   Predicted Entities => **Has_Value : Measurement is related to the entity, 0 : Measurement is not related to the entity**



🔎**You can find this model and more in the [NLP Models Hub](https://nlp.johnsnowlabs.com/models?task=Named+Entity+Recognition&edition=Spark+NLP+for+Healthcare)**

### **🔎Define Spark NLP pipeline**

In [4]:
documenter = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencer = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

words_embedder = WordEmbeddingsModel()\
    .pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = PerceptronModel()\
    .pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

ner_tagger = MedicalNerModel()\
    .pretrained("ner_nihss", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")

ner_chunker = NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")

dependency_parser = DependencyParserModel()\
    .pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

re_ner_chunk_filter = RENerChunksFilter() \
    .setInputCols(["ner_chunks", "dependencies"])\
    .setMaxSyntacticDistance(10)\
    .setOutputCol("re_ner_chunks")
    # .setRelationPairs(pair_list)

re_model = RelationExtractionDLModel()\
    .pretrained('redl_nihss_biobert', 'en', "clinical/models") \
    .setPredictionThreshold(0.5)\
    .setInputCols(["re_ner_chunks", "sentences"]) \
    .setOutputCol("relations")

pipeline = Pipeline(
    stages=[
        documenter,
        sentencer,
        tokenizer, 
        words_embedder, 
        pos_tagger, 
        ner_tagger,
        ner_chunker,
        dependency_parser,
        re_ner_chunk_filter,
        re_model
        ])

empty_data = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty_data)
light_model = LightPipeline(model)

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
pos_clinical download started this may take some time.
Approximate size to download 1.5 MB
[OK!]
ner_nihss download started this may take some time.
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]
redl_nihss_biobert download started this may take some time.
[OK!]


In [5]:
NIHSS_list = [x[2:].lower() for x in ner_tagger.getClasses() if x[0]=="B"]
NIHSS_list.remove("measurement")
NIHSS_list

['3_visual',
 '9_bestlanguage',
 '6_motor',
 '1b_locquestions',
 'nihss',
 '7_limbataxia',
 '8_sensory',
 '1c_loccommands',
 '1a_loc',
 '5b_rightarm',
 '11_extinctioninattention',
 '6b_rightleg',
 '6a_leftleg',
 '2_bestgaze',
 '10_dysarthria',
 '5_motor',
 '4_facialpalsy',
 '5a_leftarm']

In [6]:
# Build both orderings of every (scale item, measurement) pair
pair_list = [x + "-measurement" for x in NIHSS_list] \
          + ["measurement-" + x for x in NIHSS_list]
pair_list

['3_visual-measurement',
 '9_bestlanguage-measurement',
 '6_motor-measurement',
 '1b_locquestions-measurement',
 'nihss-measurement',
 '7_limbataxia-measurement',
 '8_sensory-measurement',
 '1c_loccommands-measurement',
 '1a_loc-measurement',
 '5b_rightarm-measurement',
 '11_extinctioninattention-measurement',
 '6b_rightleg-measurement',
 '6a_leftleg-measurement',
 '2_bestgaze-measurement',
 '10_dysarthria-measurement',
 '5_motor-measurement',
 '4_facialpalsy-measurement',
 '5a_leftarm-measurement',
 'measurement-3_visual',
 'measurement-9_bestlanguage',
 'measurement-6_motor',
 'measurement-1b_locquestions',
 'measurement-nihss',
 'measurement-7_limbataxia',
 'measurement-8_sensory',
 'measurement-1c_loccommands',
 'measurement-1a_loc',
 'measurement-5b_rightarm',
 'measurement-11_extinctioninattention',
 'measurement-6b_rightleg',
 'measurement-6a_leftleg',
 'measurement-2_bestgaze',
 'measurement-10_dysarthria',
 'measurement-5_motor',
 'measurement-4_facialpalsy',
 'measurement-5a_
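
The two cells above derive entity labels from the NER classes and then build label pairs in both orders. The steps can be sketched as standalone functions that run without Spark. This is an illustration only: `sample_classes` is a hypothetical subset of IOB tags (the real list comes from `ner_tagger.getClasses()`), and the function names are made up for this example.

```python
def entity_labels_from_iob(classes, exclude=("measurement",)):
    """Lower-cased entity labels taken from the B- tags, minus excluded ones."""
    labels = []
    for tag in classes:
        if tag.startswith("B-"):
            label = tag[2:].lower()
            if label not in exclude and label not in labels:
                labels.append(label)
    return labels

def relation_pairs(labels, anchor="measurement"):
    """Both orderings of every (label, anchor) pair, joined with a hyphen."""
    return [f"{label}-{anchor}" for label in labels] + \
           [f"{anchor}-{label}" for label in labels]

# Hypothetical subset of NER classes in IOB format.
sample_classes = ["O", "B-Measurement", "I-Measurement",
                  "B-NIHSS", "I-NIHSS", "B-6a_LeftLeg", "I-6a_LeftLeg"]

labels = entity_labels_from_iob(sample_classes)
pairs = relation_pairs(labels)
print(labels)
print(pairs)
```

A list in this form is what the commented-out `.setRelationPairs(pair_list)` line in the pipeline cell would receive if it were enabled.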

In [7]:
def get_relations_df(results, col='relations'):
    # `col` names the output column of the relation extraction stage
    rel_pairs = []
    for rel in results[col]:
        rel_pairs.append((
          rel.result, 
          rel.metadata['entity1'],
          rel.metadata['entity1_begin'],
          rel.metadata['entity1_end'],
          rel.metadata['chunk1'], 
          rel.metadata['entity2'],
          rel.metadata['entity2_begin'],
          rel.metadata['entity2_end'],
          rel.metadata['chunk2'], 
          rel.metadata['confidence']
        ))

    rel_df = pd.DataFrame(rel_pairs, columns=['relation','entity1','entity1_begin','entity1_end','chunk1','entity2','entity2_begin','entity2_end','chunk2', 'confidence'])

    return rel_df[rel_df.relation!='O']
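
To see which fields the helper reads without running the pipeline, the sketch below uses `types.SimpleNamespace` as a stand-in for a relation annotation. This is an assumption made for illustration: real annotations come from `fullAnnotate`, and `flatten_relation` is a hypothetical name. The values are taken from the first row of the example output in this notebook.

```python
from types import SimpleNamespace

# Metadata fields read from each relation annotation.
FIELDS = ["entity1", "entity1_begin", "entity1_end", "chunk1",
          "entity2", "entity2_begin", "entity2_end", "chunk2", "confidence"]

def flatten_relation(annotation):
    """Turn one annotation (with .result and .metadata) into a flat dict."""
    row = {"relation": annotation.result}
    row.update({name: annotation.metadata[name] for name in FIELDS})
    return row

# Stand-in object; metadata values are kept as strings in this sketch.
fake_annotation = SimpleNamespace(
    result="Has_Value",
    metadata={
        "entity1": "NIHSS", "entity1_begin": "20", "entity1_end": "30",
        "chunk1": "NIHSS score",
        "entity2": "Measurement", "entity2_begin": "36", "entity2_end": "36",
        "chunk2": "4", "confidence": "0.9998851",
    },
)

row = flatten_relation(fake_annotation)
print(row)
```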

### **🔎Sample Text**

In [8]:
text = "There , her initial NIHSS score was 4 , as recorded by the ED physicians . This included 2 for weakness  in  her  left  leg and 2 for left arm because of felt was subtle ataxia ."

### **🔎Run the pipeline**

In [9]:
import pandas as pd

light_result = light_model.fullAnnotate(text)
get_relations_df(light_result[0])

| | relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Has_Value | NIHSS | 20 | 30 | NIHSS score | Measurement | 36 | 36 | 4 | 0.9998851 |
| 1 | Has_Value | Measurement | 89 | 89 | 2 | 6a_LeftLeg | 114 | 122 | left leg | 0.9992316 |
| 2 | 0 | Measurement | 89 | 89 | 2 | Measurement | 128 | 128 | 2 | 0.94987947 |
| 3 | 0 | Measurement | 89 | 89 | 2 | 6a_LeftLeg | 134 | 137 | left | 0.99996936 |
| 4 | 0 | Measurement | 89 | 89 | 2 | 5a_LeftArm | 139 | 141 | arm | 0.9999573 |
| 5 | 0 | Measurement | 89 | 89 | 2 | 7_LimbAtaxia | 170 | 177 | ataxia . | 0.9991615 |
| 6 | 0 | 6a_LeftLeg | 114 | 122 | left leg | Measurement | 128 | 128 | 2 | 0.9998616 |
| 7 | 0 | 6a_LeftLeg | 114 | 122 | left leg | 6a_LeftLeg | 134 | 137 | left | 0.99985945 |
| 8 | 0 | 6a_LeftLeg | 114 | 122 | left leg | 5a_LeftArm | 139 | 141 | arm | 0.9997907 |
| 9 | 0 | 6a_LeftLeg | 114 | 122 | left leg | 7_LimbAtaxia | 170 | 177 | ataxia . | 0.9999622 |
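
Most rows in the output above carry the label `0` (not related). To keep only the scale items that were linked to a measurement, the rows can be filtered on `Has_Value`. The following is a minimal sketch in plain Python; the rows are a subset copied from the output above, the confidence is written as a float for simplicity, and `scored_items` is a hypothetical helper name.

```python
# (relation, entity1, chunk1, entity2, chunk2, confidence), copied from the output above
rows = [
    ("Has_Value", "NIHSS", "NIHSS score", "Measurement", "4", 0.9998851),
    ("Has_Value", "Measurement", "2", "6a_LeftLeg", "left leg", 0.9992316),
    ("0", "Measurement", "2", "Measurement", "2", 0.94987947),
    ("0", "Measurement", "2", "5a_LeftArm", "arm", 0.9999573),
]

def scored_items(rows, min_confidence=0.5):
    """Return (scale item, mention, measurement) for every Has_Value relation."""
    items = []
    for relation, ent1, chunk1, ent2, chunk2, confidence in rows:
        if relation != "Has_Value" or confidence < min_confidence:
            continue
        # The measurement can be on either side of the relation.
        if ent1 == "Measurement":
            items.append((ent2, chunk2, chunk1))
        else:
            items.append((ent1, chunk1, chunk2))
    return items

for item, mention, score in scored_items(rows):
    print(f"{item}: {mention!r} -> {score}")
```

With the DataFrame returned by `get_relations_df`, the equivalent filter would be `rel_df[rel_df.relation == "Has_Value"]`.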


### **🔎Visualize results**

In [10]:
from sparknlp_display import RelationExtractionVisualizer

re_vis = RelationExtractionVisualizer()

re_vis.display(light_result[0],
               relation_col = 'relations',
               document_col = 'document',
               show_relations=True
               )