

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/healthcare_jsl/RE_CHEM_PROT.ipynb)




# **Detect Relations Between Chemicals and Proteins**

Detect interactions between chemicals and proteins using a BERT-based model that classifies whether a specified semantic relation holds between chemical and protein entities within a sentence or document.




## 1. Colab Setup

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print("Please Upload your John Snow Labs License using the button below")
license_keys = files.upload()

In [None]:
from johnsnowlabs import *

# After uploading your license, run this to install all licensed Python wheels and pre-download the jars for the Spark session JVM.
# Restart your notebook afterwards for the changes to take effect.

jsl.install()

## Start Session

In [None]:
from johnsnowlabs import *
# Automatically load the license data and start a session with all jars the user has access to
spark = jsl.start()

## 2. Select the Relation Extraction model and construct the pipeline

Select the models:


* Chemical-Protein Relation Extraction models: **redl_chemprot_biobert**




For more details: https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/10.Clinical_Relation_Extraction.ipynb

In [None]:
# Change this to the model you want to use and re-run the cells below.
RE_MODEL_NAME = "redl_chemprot_biobert"
NER_MODEL_NAME = "ner_chemprot_clinical"

Create the pipeline

In [None]:
documenter = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencer = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

words_embedder = nlp.WordEmbeddingsModel()\
    .pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = nlp.PerceptronModel()\
    .pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

clinical_ner_tagger = medical.NerModel()\
    .pretrained(NER_MODEL_NAME, "en", "clinical/models")\
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")

ner_chunker = medical.NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")\
    .setThreshold(0.7)

dependency_parser = nlp.DependencyParserModel()\
    .pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

drugprot_re_ner_chunk_filter = medical.RENerChunksFilter()\
    .setInputCols(["ner_chunks", "dependencies"])\
    .setOutputCol("re_ner_chunks")\
    .setMaxSyntacticDistance(5)


clinical_re_Model = medical.RelationExtractionDLModel()\
    .pretrained(RE_MODEL_NAME, "en", 'clinical/models')\
    .setInputCols(["re_ner_chunks", "sentences"])\
    .setOutputCol("relations")\
    .setPredictionThreshold(0.2)

pipeline = nlp.Pipeline(
    stages=[
        documenter,
        sentencer,
        tokenizer, 
        words_embedder, 
        pos_tagger, 
        clinical_ner_tagger,
        ner_chunker,
        dependency_parser,
        drugprot_re_ner_chunk_filter,
        clinical_re_Model
        ])

empty_df = spark.createDataFrame([['']]).toDF('text')
pipeline_model = pipeline.fit(empty_df)

embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
pos_clinical download started this may take some time.
Approximate size to download 1.5 MB
[OK!]
ner_chemprot_clinical download started this may take some time.
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]
redl_chemprot_biobert download started this may take some time.
[OK!]
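
The chunk filter stage hands the relation model candidate entity pairs rather than single entities. The following is a toy sketch of that idea in plain Python, not the library's implementation: it pairs every chemical mention with every gene mention found in the same sentence. The `Chunk` class, the `candidate_pairs` function, and the label names `CHEMICAL` and `GENE` are hypothetical and chosen only for illustration. The notebook's filter also limits pairs by syntactic distance, which this sketch ignores.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for an NER chunk (not a Spark NLP class)."""
    text: str
    label: str
    sentence: int  # index of the sentence the chunk was found in


def candidate_pairs(chunks):
    """Return (chemical, gene) text pairs that occur in the same sentence."""
    pairs = []
    for a, b in combinations(chunks, 2):
        if a.sentence != b.sentence:
            continue  # only pair mentions within one sentence
        if {a.label, b.label} == {"CHEMICAL", "GENE"}:
            # Put the chemical first so the pair order is predictable
            chem, gene = (a, b) if a.label == "CHEMICAL" else (b, a)
            pairs.append((chem.text, gene.text))
    return pairs


# Illustrative chunks; labels and sentence indices are assumed, not model output
chunks = [
    Chunk("mitiglinide", "CHEMICAL", 0),
    Chunk("Kir6.2/SUR1", "GENE", 0),
    Chunk("Kir6.2/SUR2A", "GENE", 0),
    Chunk("rimonabant", "CHEMICAL", 1),
]
for pair in candidate_pairs(chunks):
    print(pair)
```

Each candidate pair would then be scored by the relation model, and only pairs whose score clears the prediction threshold are kept as relations.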


## 3. Create example inputs

In [None]:
import pandas as pd

from pyspark.sql.types import StringType

# Enter examples as strings in this list


input_list = [
"""In this study, we examined the effects of mitiglinide on various cloned K(ATP) channels (Kir6.2/SUR1, Kir6.2/SUR2A, and Kir6.2/SUR2B) reconstituted in COS-1 cells, and compared them to another meglitinide-related compound, nateglinide.""",

"""Nateglinide inhibits Kir6.2/SUR1 and Kir6.2/SUR2B channels at 100 nM, and inhibits Kir6.2/SUR2A channels at high concentrations (1 microM). These results indicate that, similar to the sulfonylureas, mitiglinide is highly specific to the Kir6.2/SUR1 complex, i.e., the pancreatic beta-cell K(ATP) channel, and suggest that mitiglinide may be a clinically useful anti-diabetic drug.""",   
            
"""Patch-clamp analysis using inside-out recording configuration showed that mitiglinide inhibits the Kir6.2/SUR1 channel currents in a dose-dependent manner (IC50 value, 100 nM) but does not significantly inhibit either Kir6.2/SUR2A or Kir6.2/SUR2B channel currents even at high doses (more than 10 microM). These results indicate that, similar to the sulfonylureas, mitiglinide is highly specific to the Kir6.2/SUR1 complex, i.e., the pancreatic beta-cell K(ATP) channel, and suggest that mitiglinide may be a clinically useful anti-diabetic drug.""",

"""The highest doses of cannabinoid drugs yielded, on average, 26-32 g/kg urine; comparable effects were obtained with 10 mg/kg furosemide and 3.0 mg/kg. Methanandamide (10.0 mg/kg) had lesser effect than other CB agonists, and the CB2 agonist AM1241, and the CB antagonist rimonabant did not have diuretic effects.""",

"""In further studies, the diuretic effects of the CB1 agonist AM4054 were similar in male and female rats, displayed a relatively rapid onset to action, and were dose-dependently antagonized by 30 minutes pretreatment with rimonabant, but not by the vanilloid receptor-type-I antagonist capsazepine. These data indicate that cannabinoids have robust diuretic effects in rats that are mediated via CB1 receptor mechanisms.""",
]

df = spark.createDataFrame(input_list, StringType()).toDF("text")
df.show(truncate=100)

+----------------------------------------------------------------------------------------------------+
|                                                                                                text|
+----------------------------------------------------------------------------------------------------+
|In this study, we examined the effects of mitiglinide on various cloned K(ATP) channels (Kir6.2/S...|
|Nateglinide inhibits Kir6.2/SUR1 and Kir6.2/SUR2B channels at 100 nM, and inhibits Kir6.2/SUR2A c...|
|Patch-clamp analysis using inside-out recording configuration showed that mitiglinide inhibits th...|
|The highest doses of cannabinoid drugs yielded, on average, 26-32 g/kg urine; comparable effects ...|
|In further studies, the diuretic effects of the CB1 agonist AM4054 were similar in male and femal...|
+----------------------------------------------------------------------------------------------------+



## 4. Run the pipeline

In [None]:
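# Run the fitted pipeline on the Spark DataFrame
result = pipeline_model.transform(df)
# LightPipeline annotates plain strings directly, which is convenient for visualization
light_pipeline = nlp.LightPipeline(pipeline_model)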
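
To inspect the predictions as rows rather than as a diagram, the relation annotations can be flattened. This is a minimal sketch, assuming each relation annotation exposes a `result` label and a `metadata` mapping with the keys `chunk1`, `entity1`, `chunk2`, `entity2`, and `confidence`, as in the relation extraction notebook linked in section 2. The `relations_to_rows` helper is hypothetical, and a stand-in object with invented values replaces real model output so that the block runs without Spark.

```python
from types import SimpleNamespace


def relations_to_rows(relations, min_confidence=0.0):
    """Flatten relation annotations into plain dicts, one per relation."""
    rows = []
    for rel in relations:
        # float() accepts both string and numeric confidence values
        confidence = float(rel.metadata["confidence"])
        if confidence < min_confidence:
            continue  # drop low-confidence predictions
        rows.append({
            "relation": rel.result,
            "chunk1": rel.metadata["chunk1"],
            "entity1": rel.metadata["entity1"],
            "chunk2": rel.metadata["chunk2"],
            "entity2": rel.metadata["entity2"],
            "confidence": confidence,
        })
    return rows


# Stand-in annotation with invented values, for illustration only.
# With the notebook's objects, pass
# light_pipeline.fullAnnotate(text)[0]["relations"] instead.
fake_relation = SimpleNamespace(
    result="CPR:4",
    metadata={
        "chunk1": "mitiglinide",
        "entity1": "CHEMICAL",
        "chunk2": "Kir6.2/SUR1",
        "entity2": "GENE-Y",
        "confidence": "0.93",
    },
)

for row in relations_to_rows([fake_relation], min_confidence=0.5):
    print(row)
```

Passing the rows to `pd.DataFrame(rows)` gives a table view; pandas is already imported in section 3.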

## 5. Visualize

In [None]:
from sparknlp_display import RelationExtractionVisualizer

vis = RelationExtractionVisualizer()

for text in input_list:
    light_result = light_pipeline.fullAnnotate(text)[0]
    vis.display(
        light_result,  # results for a single example, not the complete DataFrame
        relation_col='relations',  # relations column
        document_col='document',  # document column
        show_relations=True,  # default: True
    )