![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/10.1.Clinical_Relation_Extraction_BodyParts_Models.ipynb)

# 10.1 Clinical Relation Extraction BodyPart Models

(requires Spark NLP 2.7.1 and Spark NLP Healthcare 2.7.2 and above)
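
The version requirement above can be checked before any models are downloaded. This is a minimal sketch, assuming plain dotted numeric version strings like the ones `sparknlp.version()` prints in this notebook; `parse_version` and `meets_minimum` are hypothetical helpers, not part of either library.

```python
def parse_version(version):
    """Turn '3.1.2' into (3, 1, 2); non-numeric suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, required):
    """Return True when `installed` is at least `required`."""
    return parse_version(installed) >= parse_version(required)


# The versions printed in this notebook, checked against the stated minimums.
print(meets_minimum("3.1.2", "2.7.1"))  # Spark NLP
print(meets_minimum("3.1.2", "2.7.2"))  # Spark NLP Healthcare
```

Comparing integer tuples rather than raw strings keeps a version such as `2.10.0` ordered after `2.7.2`.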

In [2]:
import os

jsl_secret = os.getenv('SECRET')

import sparknlp
sparknlp_version = sparknlp.version()
import sparknlp_jsl
jsl_version = sparknlp_jsl.version()

print (jsl_secret)

In [3]:
import json
import os
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession

from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
import sparknlp_jsl
import sparknlp

params = {"spark.driver.memory":"16G",
"spark.kryoserializer.buffer.max":"2000M",
"spark.driver.maxResultSize":"2000M"}

spark = sparknlp_jsl.start(jsl_secret, params=params)

print (sparknlp.version())
print (sparknlp_jsl.version())

3.1.2
3.1.2


In [4]:
spark

## 1. Prediction Pipeline for Clinical Binary Relation Models

The basic pipeline stages, without any RE model. Define them once; different RE models can then be appended to the same stages.

In [5]:
documenter = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencer = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

words_embedder = WordEmbeddingsModel()\
    .pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = PerceptronModel()\
    .pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")
    
dependency_parser = DependencyParserModel()\
    .pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

# get pretrained ner model 
clinical_ner_tagger = MedicalNerModel()\
    .pretrained('jsl_ner_wip_greedy_clinical','en','clinical/models')\
    .setInputCols("sentences", "tokens", "embeddings")\
    .setOutputCol("ner_tags")    

ner_chunker = NerConverter()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")
    

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
pos_clinical download started this may take some time.
Approximate size to download 1.5 MB
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]
jsl_ner_wip_greedy_clinical download started this may take some time.
Approximate size to download 14.5 MB
[OK!]


In [6]:
import pandas as pd

# This function will be utilized to show prediction results in a dataframe
def get_relations_df (results, col='relations'):
    rel_pairs=[]
    for rel in results[0][col]:
        rel_pairs.append((
          rel.result, 
          rel.metadata['entity1'], 
          rel.metadata['entity1_begin'],
          rel.metadata['entity1_end'],
          rel.metadata['chunk1'], 
          rel.metadata['entity2'],
          rel.metadata['entity2_begin'],
          rel.metadata['entity2_end'],
          rel.metadata['chunk2'], 
          rel.metadata['confidence']
        ))

    rel_df = pd.DataFrame(rel_pairs, columns=['relations',
                                              'entity1','entity1_begin','entity1_end','chunk1',
                                              'entity2','entity2_begin','entity2_end','chunk2', 
                                              'confidence'])
    # limit df columns to get entity and chunks with results only
    rel_df = rel_df.iloc[:,[0,1,4,5,8,9]]
    
    return rel_df
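
The function above reads only two fields from each relation annotation: `result` and `metadata`. The sketch below mimics that shape with `SimpleNamespace` stand-ins so the filtering logic can be tried without a Spark session. It assumes metadata values arrive as strings; `fake_relation` and `positive_pairs` are hypothetical names, and the second relation is an invented negative example.

```python
from types import SimpleNamespace


def fake_relation(result, entity1, chunk1, entity2, chunk2, confidence):
    """Stand-in for one relation annotation; real ones come from fullAnnotate()."""
    return SimpleNamespace(
        result=result,
        metadata={
            "entity1": entity1, "chunk1": chunk1,
            "entity2": entity2, "chunk2": chunk2,
            "confidence": confidence,
        },
    )


results = [{"relations": [
    # Values taken from the directions example shown later in this notebook.
    fake_relation("1", "Direction", "upper",
                  "Internal_organ_or_component", "brain stem", "0.9999989"),
    # Invented negative pair, included only to exercise the filter.
    fake_relation("0", "Direction", "upper",
                  "Internal_organ_or_component", "cerebellum", "0.98"),
]}]


def positive_pairs(results, col="relations"):
    """Return (chunk1, chunk2, confidence) for every relation not labelled '0'."""
    return [
        (rel.metadata["chunk1"], rel.metadata["chunk2"],
         float(rel.metadata["confidence"]))
        for rel in results[0][col]
        if rel.result != "0"
    ]


print(positive_pairs(results))
```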

### Example pairs for relation entities

In [7]:

# bodypart entities >> ['external_body_part_or_region', 'internal_organ_or_component']

# 1. bodypart vs problem
pair1 = ['symptom-external_body_part_or_region', 'external_body_part_or_region-symptom']

# 2. bodypart vs procedure and test
pair2 = ['internal_organ_or_component-imagingtest',
 'imagingtest-internal_organ_or_component',
 'internal_organ_or_component-procedure',
 'procedure-internal_organ_or_component',
 'internal_organ_or_component-test',
 'test-internal_organ_or_component',
 'external_body_part_or_region-imagingtest',
 'imagingtest-external_body_part_or_region',
 'external_body_part_or_region-procedure',
 'procedure-external_body_part_or_region',
 'external_body_part_or_region-test',
 'test-external_body_part_or_region']

# 3. bodypart vs direction
pair3 = ['direction-external_body_part_or_region', 'external_body_part_or_region-direction',
        'internal_organ_or_component-direction','direction-internal_organ_or_component']

# 4. date vs other clinical entities
# date entities >> ['Date', 'RelativeDate', 'Duration', 'RelativeTime', 'Time']
pair4 = ['symptom-date', 'date-procedure', 'relativedate-test', 'test-date']
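
Each pair label above is a lower-cased `entity1-entity2` string, and most lists contain both orders of every combination. A small sketch of how such a list could be generated rather than typed by hand; `both_orders` is a hypothetical helper, not a Spark NLP function.

```python
from itertools import product


def both_orders(left_entities, right_entities):
    """Build lower-cased 'a-b' and 'b-a' labels for every entity combination."""
    pairs = []
    for left, right in product(left_entities, right_entities):
        pairs.append(f"{left}-{right}".lower())
        pairs.append(f"{right}-{left}".lower())
    return pairs


bodyparts = ["external_body_part_or_region", "internal_organ_or_component"]

# Reproduces the four labels of pair3 (bodypart vs direction).
direction_pairs = both_orders(["direction"], bodyparts)
print(direction_pairs)
```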

 **Pretrained relation model names**; pass one of these names to `RelationExtractionModel().pretrained()`:  
 
 + `re_bodypart_problem`  
 
 + `re_bodypart_directions`  
 
 + `re_bodypart_proceduretest`  
 
 + `re_date_clinical`  
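
For reference, the models listed above can be kept in one lookup table together with the BioBERT-based counterpart and the pair list that this notebook uses with each. The `MODELS` dictionary and `dl_counterpart` helper are illustrative names only.

```python
# classic model name -> (DL counterpart, pair list used with it in this notebook)
MODELS = {
    "re_bodypart_problem": ("redl_bodypart_problem_biobert", "pair1"),
    "re_bodypart_proceduretest": ("redl_bodypart_procedure_test_biobert", "pair2"),
    "re_bodypart_directions": ("redl_bodypart_direction_biobert", "pair3"),
    "re_date_clinical": ("redl_date_clinical_biobert", "pair4"),
}


def dl_counterpart(model_name):
    """Return the DL model name paired with a classic RE model name."""
    return MODELS[model_name][0]


for name, (dl_name, pair_list) in MODELS.items():
    print(f"{name:28s} {dl_name:40s} {pair_list}")
```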

## 2. Example of how custom RE models can be added to the same pipeline

### 2.1 Relation Extraction Model

In [8]:
re_model = RelationExtractionModel()\
    .pretrained("re_bodypart_directions", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setRelationPairs(['direction-external_body_part_or_region', 
                       'external_body_part_or_region-direction',
                       'direction-internal_organ_or_component',
                       'internal_organ_or_component-direction'
                      ])\
    .setMaxSyntacticDistance(4)\
    .setPredictionThreshold(0.9)

trained_pipeline = Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer, 
    words_embedder, 
    pos_tagger, 
    clinical_ner_tagger,
    ner_chunker,
    dependency_parser,
    re_model
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

loaded_re_model = trained_pipeline.fit(empty_data)

re_bodypart_directions download started this may take some time.
Approximate size to download 9.2 MB
[OK!]
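
`setMaxSyntacticDistance(4)` in the cell above limits how far apart two chunks may be in the dependency parse. The toy sketch below shows one plausible reading, assuming the distance is the number of edges between two tokens in the dependency tree; the library's exact definition may differ. The `heads` list is hand-written for illustration and is not output from `dependency_conllu`.

```python
tokens = ["infarction", "in", "the", "upper", "brain", "stem"]
# heads[i] is the index of token i's parent; -1 marks the root (hand-made tree).
heads = [-1, 5, 5, 5, 5, 0]


def path_to_root(index, heads):
    """List the token indices from `index` up to the root, inclusive."""
    path = [index]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def tree_distance(a, b, heads):
    """Count the edges on the path between tokens a and b."""
    path_b = path_to_root(b, heads)
    depth_in_b = {node: depth for depth, node in enumerate(path_b)}
    for depth_a, node in enumerate(path_to_root(a, heads)):
        if node in depth_in_b:  # first shared ancestor
            return depth_a + depth_in_b[node]
    raise ValueError("tokens are not in the same tree")


upper, stem, infarction = 3, 5, 0
print(tree_distance(upper, stem, heads))
print(tree_distance(upper, infarction, heads))
```

Under this reading, a lower maximum distance keeps only chunk pairs that are closely attached in the parse and drops candidates that merely share a sentence.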


### 2.2 ReDL Model - based on an end-to-end trained BERT model

In [9]:
re_ner_chunk_filter = RENerChunksFilter() \
    .setInputCols(["ner_chunks", "dependencies"])\
    .setOutputCol("re_ner_chunks")\
    .setMaxSyntacticDistance(4)\
    .setRelationPairs(['direction-external_body_part_or_region', 
                       'external_body_part_or_region-direction',
                       'direction-internal_organ_or_component',
                       'internal_organ_or_component-direction'
                      ])
    
re_model = RelationExtractionDLModel() \
    .pretrained('redl_bodypart_direction_biobert', "en", "clinical/models")\
    .setPredictionThreshold(0.5)\
    .setInputCols(["re_ner_chunks", "sentences"]) \
    .setOutputCol("relations")

trained_pipeline = Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer, 
    words_embedder, 
    pos_tagger, 
    clinical_ner_tagger,
    ner_chunker,
    dependency_parser,
    re_ner_chunk_filter,
    re_model
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

loaded_redl_model = trained_pipeline.fit(empty_data)

redl_bodypart_direction_biobert download started this may take some time.
Approximate size to download 383.4 MB
[OK!]


## 3. Sample clinical texts

In [10]:
# bodypart vs problem 
text1 = '''No neurologic deficits other than some numbness in his left hand.'''

# bodypart  vs procedure and test
#text2 = 'Common bile duct was noted to be 10 mm in size on that ultrasound.'
#text2 = 'Biopsies of the distal duodenum, gastric antrum, distalesophagus were taken and sent for pathological evaluation.'
text2 = 'TECHNIQUE IN DETAIL: After informed consent was obtained from the patient and his mother, the chest was scanned with portable ultrasound.'

# bodypart direction
text3 = '''MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia'''

# date vs other clinical entities
text4 = ''' This 73 y/o patient had Brain CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94.'''

**Get Single Prediction** with `LightPipeline()`

### 3.1 Using Relation Extraction Model

In [11]:
# choose one of the sample texts depending on the pretrained relation model you want to use
text = text3

loaded_re_model_light = LightPipeline(loaded_re_model)
annotations = loaded_re_model_light.fullAnnotate(text)


rel_df = get_relations_df(annotations) # << get_relations_df() is the helper function defined above

print('\n',text)

rel_df[rel_df.relations!="0"]
#rel_df



 MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia


| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | Direction | upper | Internal_organ_or_component | brain stem | 0.9999989 |
| 5 | 1 | Direction | left | Internal_organ_or_component | cerebellum | 1.0 |
| 8 | 1 | Direction | right | Internal_organ_or_component | basil ganglia | 1.0 |


### 3.2 Using Relation Extraction DL Model

In [12]:
# choose one of the sample texts depending on the pretrained relation model you want to use
text = text3

loaded_re_model_light = LightPipeline(loaded_redl_model)
annotations = loaded_re_model_light.fullAnnotate(text)


rel_df = get_relations_df(annotations) # << get_relations_df() is the helper function defined above

print('\n',text)

rel_df[rel_df.relations!="0"]
#rel_df



 MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia


| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | Direction | upper | Internal_organ_or_component | brain stem | 0.9997383 |
| 5 | 1 | Direction | left | Internal_organ_or_component | cerebellum | 0.9990734 |
| 8 | 1 | Direction | right | Internal_organ_or_component | basil ganglia | 0.9998061 |


## Custom Function

In [22]:
# The previous cells are merged into these helper functions for quick predictions; for custom cases, check the parameters of RelationExtractionModel()
def relation_exraction(model_name, pairs, text):
    
    re_model = RelationExtractionModel()\
        .pretrained(model_name, "en", 'clinical/models')\
        .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
        .setOutputCol("relations")\
        .setRelationPairs(pairs)\
        .setMaxSyntacticDistance(3)\
        .setPredictionThreshold(0.9)

    trained_pipeline = Pipeline(stages=[
        documenter,
        sentencer,
        tokenizer, 
        words_embedder, 
        pos_tagger, 
        clinical_ner_tagger,
        ner_chunker,
        dependency_parser,
        re_model
    ])

    empty_data = spark.createDataFrame([[""]]).toDF("text")

    loaded_re_model = trained_pipeline.fit(empty_data)
    
    loaded_re_model_light = LightPipeline(loaded_re_model)
    annotations = loaded_re_model_light.fullAnnotate(text)

    rel_df = get_relations_df(annotations) # << get_relations_df() is the helper function defined above

    print('\n','Target Text : ',text, '\n')

    #rel_df
    return rel_df[rel_df.relations!="0"]

def relation_exraction_dl(model_name, pairs, text):
    
    re_ner_chunk_filter = RENerChunksFilter() \
        .setInputCols(["ner_chunks", "dependencies"])\
        .setOutputCol("re_ner_chunks")\
        .setRelationPairs(pairs)
    
    re_model = RelationExtractionDLModel() \
        .pretrained(model_name, "en", "clinical/models")\
        .setPredictionThreshold(0.0)\
        .setInputCols(["re_ner_chunks", "sentences"]) \
        .setOutputCol("relations")

    trained_pipeline = Pipeline(stages=[
        documenter,
        sentencer,
        tokenizer, 
        words_embedder, 
        pos_tagger, 
        clinical_ner_tagger,
        ner_chunker,
        dependency_parser,
        re_ner_chunk_filter,
        re_model
    ])

    empty_data = spark.createDataFrame([[""]]).toDF("text")

    loaded_re_model = trained_pipeline.fit(empty_data)
    
    loaded_re_model_light = LightPipeline(loaded_re_model)
    annotations = loaded_re_model_light.fullAnnotate(text)

    rel_df = get_relations_df(annotations) # << get_relations_df() is the helper function defined above

    print('\n','Target Text : ',text, '\n')

    #rel_df
    return rel_df[rel_df.relations!="0"]
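
Both helpers above rebuild and refit the whole pipeline on every call, which is why the download messages reappear in the outputs that follow. One way to avoid the repeated work is to cache the fitted pipeline per model and pair list. The sketch below shows the idea with `functools.lru_cache`; `build_pipeline` is a hypothetical stand-in for the `pretrained(...)` and `Pipeline(...).fit(...)` steps, so nothing here touches Spark.

```python
from functools import lru_cache

build_calls = []  # records how often the expensive step actually runs


def build_pipeline(model_name, pairs):
    """Hypothetical stand-in for downloading the model and fitting the pipeline."""
    build_calls.append(model_name)
    return {"model": model_name, "pairs": pairs}


@lru_cache(maxsize=None)
def cached_pipeline(model_name, pairs):
    # `pairs` must be hashable, so callers pass a tuple rather than a list.
    return build_pipeline(model_name, pairs)


pairs = ("direction-internal_organ_or_component",
         "internal_organ_or_component-direction")
first = cached_pipeline("re_bodypart_directions", pairs)
second = cached_pipeline("re_bodypart_directions", pairs)

# The second call reuses the cached object, so the build step ran once.
print(first is second, len(build_calls))
```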
    



## Predictions with Custom Function

### 4.1 Bodypart vs Problem - RelationExtractionModel

In [23]:
# bodypart vs problem 
model_name =  're_bodypart_problem'
pairs =  ['symptom-external_body_part_or_region', 'external_body_part_or_region-symptom']

text = "Some numbness in his left hand noted, no other neurologic deficts."

relation_exraction(model_name, pairs, text)


re_bodypart_problem download started this may take some time.
Approximate size to download 9.2 MB
[OK!]

 Target Text :  Some numbness in his left hand noted, no other neurologic deficts. 



| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | Symptom | numbness | External_body_part_or_region | hand | 1.0 |


### 4.2 Bodypart vs Problem - RelationExtractionDLModel

In [24]:
# bodypart vs problem 
model_name =  'redl_bodypart_problem_biobert'
pairs =  ['symptom-external_body_part_or_region', 'external_body_part_or_region-symptom']

text = "Some numbness in his left hand noted, no other neurologic deficts."

relation_exraction_dl(model_name, pairs, text)


redl_bodypart_problem_biobert download started this may take some time.
Approximate size to download 383.4 MB
[OK!]

 Target Text :  Some numbness in his left hand noted, no other neurologic deficts. 



| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | Symptom | numbness | External_body_part_or_region | hand | 0.9982468 |


### 5.1 Bodypart vs Procedure & Test - RelationExtractionModel

In [25]:
# bodypart vs procedure and test 
model_name =  're_bodypart_proceduretest'
pairs = pair2
text = text2

relation_exraction(model_name, pairs, text)


re_bodypart_proceduretest download started this may take some time.
Approximate size to download 9.2 MB
[OK!]

 Target Text :  TECHNIQUE IN DETAIL: After informed consent was obtained from the patient and his mother, the chest was scanned with portable ultrasound. 



| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | External_body_part_or_region | chest | Test | portable ultrasound | 1.0 |


### 5.2 Bodypart vs Procedure & Test - RelationExtractionDLModel

In [26]:
# bodypart vs procedure and test 
model_name =  'redl_bodypart_procedure_test_biobert'
pairs = pair2
text = text2

relation_exraction_dl(model_name, pairs, text)


redl_bodypart_procedure_test_biobert download started this may take some time.
Approximate size to download 383.4 MB
[OK!]

 Target Text :  TECHNIQUE IN DETAIL: After informed consent was obtained from the patient and his mother, the chest was scanned with portable ultrasound. 



| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | External_body_part_or_region | chest | Test | portable ultrasound | 0.99958295 |


### 6.1 Bodypart vs Directions -  RelationExtractionModel

In [27]:
# bodypart vs directions
model_name =  're_bodypart_directions'
pairs = pair3
text = text3

relation_exraction(model_name, pairs, text)


re_bodypart_directions download started this may take some time.
Approximate size to download 9.2 MB
[OK!]

 Target Text :  MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia 



| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | Direction | upper | Internal_organ_or_component | brain stem | 0.9999989 |
| 4 | 1 | Direction | left | Internal_organ_or_component | cerebellum | 1.0 |
| 7 | 1 | Direction | right | Internal_organ_or_component | basil ganglia | 1.0 |


### 6.2 Bodypart vs Directions - RelationExtractionDLModel

In [28]:
# bodypart vs directions
model_name =  'redl_bodypart_direction_biobert'
pairs = pair3
text = text3

relation_exraction_dl(model_name, pairs, text)


redl_bodypart_direction_biobert download started this may take some time.
Approximate size to download 383.4 MB
[OK!]

 Target Text :  MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia 



| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | Direction | upper | Internal_organ_or_component | brain stem | 0.99991536 |
| 5 | 1 | Direction | left | Internal_organ_or_component | cerebellum | 0.99987006 |
| 8 | 1 | Direction | right | Internal_organ_or_component | basil ganglia | 0.99921966 |


### 7.1 Date vs Clinical Entities - RelationExtractionModel

In [29]:
# date vs other clinical entities
model_name =  're_date_clinical'
pairs = pair4
text = text4

relation_exraction(model_name, pairs, text)


re_date_clinical download started this may take some time.
Approximate size to download 9.2 MB
[OK!]

 Target Text :   This 73 y/o patient had Brain CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94. 



| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | Test | Brain CT | Date | 1/12/95 | 1.0 |
| 1 | 1 | Symptom | progressive memory and cognitive decline | Date | 8/11/94 | 1.0 |


### 7.2 Date vs Clinical Entities - RelationExtractionDLModel

In [30]:
# date vs other clinical entities
model_name =  'redl_date_clinical_biobert'
pairs = pair4
text = text4

relation_exraction_dl(model_name, pairs, text)


redl_date_clinical_biobert download started this may take some time.
Approximate size to download 383.4 MB
[OK!]

 Target Text :   This 73 y/o patient had Brain CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94. 



| | relations | entity1 | chunk1 | entity2 | chunk2 | confidence |
|---|---|---|---|---|---|---|
| 0 | 1 | Test | Brain CT | Date | 1/12/95 | 0.9999094 |
| 1 | 1 | Test | Brain CT | Date | 8/11/94 | 0.9999577 |
| 2 | 1 | Date | 1/12/95 | Symptom | progressive memory and cognitive decline | 0.9999186 |
| 3 | 1 | Symptom | progressive memory and cognitive decline | Date | 8/11/94 | 0.9995466 |
