

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/RE_POSOLOGY.ipynb)




# **Detect posology relations**

To run this notebook yourself, you need to provide your license keys. Run the cells below to upload them, or open the file explorer on the left side of the screen and upload `license_keys.json` to the folder that opens.
Otherwise, you can look at the example outputs at the bottom of the notebook.



## 1. Colab Setup

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print("Please upload your John Snow Labs license using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM.
# Restart your notebook afterwards for the changes to take effect.

jsl.install()

## Start Session

In [None]:
from johnsnowlabs import *
# Automatically load the license data and start a session with all JARs the user has access to
spark = jsl.start()

## 2. Select the Relation Extraction model and construct the pipeline

Select the models:

* Posology Relation Extraction models: **posology_re**

For more details: https://github.com/JohnSnowLabs/spark-nlp-models#pretrained-models---spark-nlp-for-healthcare

In [None]:
# Change this to the model you want to use and re-run the cells below.
RE_MODEL_NAME = "posology_re"
NER_MODEL_NAME = "ner_posology_large"

Create the pipeline

In [None]:
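# Explicit imports, in case the wildcard import above does not expose these names
from pyspark.ml import Pipeline
from sparknlp.base import LightPipeline

document_assembler = nlp.DocumentAssembler() \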
    .setInputCol('text')\
    .setOutputCol('document')

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(['document'])\
    .setOutputCol('sentences')

tokenizer = nlp.Tokenizer()\
    .setInputCols(['sentences']) \
    .setOutputCol('tokens')

pos_tagger = nlp.PerceptronModel()\
    .pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

dependency_parser = nlp.DependencyParserModel()\
    .pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

embeddings = nlp.WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models')\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

clinical_ner_model = medical.NerModel.pretrained(NER_MODEL_NAME, "en", "clinical/models") \
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("clinical_ner_tags")

clinical_ner_chunker = nlp.NerConverter()\
    .setInputCols(["sentences", "tokens", "clinical_ner_tags"])\
    .setOutputCol("clinical_ner_chunks")

clinical_re_Model = medical.RelationExtractionModel()\
    .pretrained(RE_MODEL_NAME, 'en', 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "clinical_ner_chunks", "dependencies"])\
    .setOutputCol("clinical_relations")\
    .setMaxSyntacticDistance(4)
    # Optionally chain .setRelationPairs([...]) above to restrict the candidate relation pairs,
    # e.g. ["problem-test", "problem-treatment"]; if not set, all relations are calculated.

pipeline = Pipeline(
    stages=[
        document_assembler, 
        sentence_detector,
        tokenizer,
        pos_tagger,
        dependency_parser,
        embeddings,
        clinical_ner_model,
        clinical_ner_chunker,
        clinical_re_Model
        ])

empty_df = spark.createDataFrame([['']]).toDF("text")
pipeline_model = pipeline.fit(empty_df)
light_pipeline = LightPipeline(pipeline_model)

pos_clinical download started this may take some time.
Approximate size to download 1.5 MB
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_posology_large download started this may take some time.
[OK!]
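The stages above are connected only through column names: each annotator reads the columns named in `setInputCols` and writes the one named in `setOutputCol`. A quick way to catch a mistyped column name before fitting is to replay that wiring in plain Python. The sketch below is a hypothetical helper (`find_unresolved_inputs` is not part of Spark NLP); the column names are copied from the cell above.

```python
# Column wiring copied from the pipeline cell: (stage name, input columns, output column).
STAGES = [
    ("document_assembler", ["text"], "document"),
    ("sentence_detector", ["document"], "sentences"),
    ("tokenizer", ["sentences"], "tokens"),
    ("pos_tagger", ["sentences", "tokens"], "pos_tags"),
    ("dependency_parser", ["sentences", "pos_tags", "tokens"], "dependencies"),
    ("embeddings", ["sentences", "tokens"], "embeddings"),
    ("clinical_ner_model", ["sentences", "tokens", "embeddings"], "clinical_ner_tags"),
    ("clinical_ner_chunker", ["sentences", "tokens", "clinical_ner_tags"], "clinical_ner_chunks"),
    ("clinical_re_Model",
     ["embeddings", "pos_tags", "clinical_ner_chunks", "dependencies"],
     "clinical_relations"),
]


def find_unresolved_inputs(stages, initial_columns=("text",)):
    """Return (stage, missing columns) for every stage that reads a column
    no earlier stage has produced. Hypothetical helper, not a Spark NLP API."""
    available = set(initial_columns)
    problems = []
    for name, inputs, output in stages:
        missing = [col for col in inputs if col not in available]
        if missing:
            problems.append((name, missing))
        available.add(output)
    return problems


unresolved = find_unresolved_inputs(STAGES)
print(unresolved)  # an empty list means every input column has a producer
```

The check only looks at names and order; it does not verify that a column carries the annotation type a stage expects.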


## 3. Create example inputs

In [None]:
# Enter examples as strings in this list
input_list = [
"""The patient is a 40-year-old white male who presents with a chief complaint of "chest pain". The patient is diabetic and has a prior history of coronary artery disease. The patient presents today stating that his chest pain started yesterday evening and has been somewhat intermittent. He has been advised Aspirin 81 milligrams QDay. Humulin N. insulin 50 units in a.m. HCTZ 50 mg QDay. Nitroglycerin 1/150 sublingually PRN chest pain.""",
]

## 4. Run the pipeline

In [None]:
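# Explicit import, in case the wildcard import above does not expose StringType
from pyspark.sql.types import StringType

df = spark.createDataFrame(input_list, StringType()).toDF("text")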
result = pipeline_model.transform(df)
light_result = light_pipeline.fullAnnotate(input_list[0])
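To inspect the extracted relations as plain rows rather than through the visualizer, the annotations can be flattened into dictionaries. The sketch below assumes each relation annotation exposes a `result` label and a `metadata` dict with the keys `entity1`, `chunk1`, `entity2`, `chunk2` and `confidence`; check those keys against your installed version. `relations_to_rows` is a hypothetical helper, and the sample objects are made-up stand-ins so the block runs without a Spark session. They are not model output.

```python
from types import SimpleNamespace


def relations_to_rows(annotations, min_confidence=0.0):
    """Flatten relation annotations into dicts, dropping low-confidence ones.

    Assumes each annotation has `.result` and a `.metadata` dict with the
    keys used below (an assumption about the annotation layout).
    """
    rows = []
    for rel in annotations:
        meta = rel.metadata
        confidence = float(meta.get("confidence", 0.0))
        if confidence < min_confidence:
            continue
        rows.append({
            "relation": rel.result,
            "entity1": meta.get("entity1"),
            "chunk1": meta.get("chunk1"),
            "entity2": meta.get("entity2"),
            "chunk2": meta.get("chunk2"),
            "confidence": confidence,
        })
    return rows


# Made-up stand-ins for illustration only; real values come from the pipeline.
sample = [
    SimpleNamespace(result="DRUG-STRENGTH", metadata={
        "entity1": "DRUG", "chunk1": "Aspirin",
        "entity2": "STRENGTH", "chunk2": "81 milligrams", "confidence": "0.99"}),
    SimpleNamespace(result="DRUG-FREQUENCY", metadata={
        "entity1": "DRUG", "chunk1": "Aspirin",
        "entity2": "FREQUENCY", "chunk2": "QDay", "confidence": "0.97"}),
    SimpleNamespace(result="DRUG-FREQUENCY", metadata={
        "entity1": "DRUG", "chunk1": "HCTZ",
        "entity2": "FREQUENCY", "chunk2": "a.m.", "confidence": "0.20"}),
]

rows = relations_to_rows(sample, min_confidence=0.5)
for row in rows:
    print(row["chunk1"], "->", row["chunk2"], f'({row["relation"]})')
```

In the notebook, the same helper could be called as `relations_to_rows(light_result[0]["clinical_relations"])`, provided the annotations carry the assumed metadata keys.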

## 5. Visualize

In [None]:
from sparknlp_display import RelationExtractionVisualizer


vis = RelationExtractionVisualizer()
vis.display(light_result[0], 'clinical_relations', show_relations=True) # default show_relations: True