![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/)

# RelationExtractionModel

In this notebook, we will examine the `RelationExtractionModel` annotator.

This Relation Extraction annotator extracts and classifies instances of relations between named entities.

**📖 Learning Objectives:**

1. Understand how to extract and classify the relations between named entities by using pre-trained models.

2. Become comfortable using the different parameters of the annotator.

**🔗 Helpful Links:**

For extended examples of usage, see the [Spark NLP Workshop](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/03.0.Clinical_Relation_Extraction.ipynb)

Python Documentation: [RelationExtractionModel](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/re/relation_extraction/index.html#sparknlp_jsl.annotator.re.relation_extraction.RelationExtractionModel.name)

Scala Documentation: [RelationExtractionModel](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/nlp/annotators/re/RelationExtractionModel.html)

Relation Extraction Models and Relation Pairs Table: [this table](https://nlp.johnsnowlabs.com/docs/en/best_practices_pretrained_models#relation-extraction-models-and-relation-pairs-table) lists the available Relation Extraction models, their labels, the optimal NER model for each, and the meaningful relation pairs.

## **📜 Background**


This annotator extracts and classifies instances of relations between named entities. For this, relation pairs need to be defined with `setRelationPairs`, to specify between which entities the extraction should be done.
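
To make the idea of relation pairs concrete, here is a minimal plain-Python sketch of how a list of dash-separated pairs could narrow down which chunk combinations are considered. It is only an illustration: the `candidate_pairs` helper is hypothetical, it treats each pair as ordered and ignores case, and it is not how the annotator is implemented internally.

```python
from itertools import permutations


def candidate_pairs(chunks, relation_pairs):
    """Return ordered chunk pairs whose entity labels match an allowed pair.

    chunks: list of (text, entity_label) tuples, e.g. the NER chunks of a sentence.
    relation_pairs: dash-separated entity pairs, e.g. ["dosage-drug"].
    Matching ignores case here (an assumption made for this sketch).
    """
    allowed = {tuple(pair.lower().split("-", 1)) for pair in relation_pairs}
    return [
        (first, second)
        for first, second in permutations(chunks, 2)
        if (first[1].lower(), second[1].lower()) in allowed
    ]


chunks = [("1 unit", "DOSAGE"), ("Advil", "DRUG"), ("for 5 days", "DURATION")]
pairs = candidate_pairs(chunks, ["dosage-drug", "drug-duration"])

for first, second in pairs:
    print(f"{first[1]}-{second[1]}: {first[0]} -> {second[0]}")
```

Only the combinations named in the pair list survive; every other combination of chunks is never passed on for classification in this sketch.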

## **🎬 Colab Setup**

In [None]:
# Install the johnsnowlabs library to access Spark-NLP for Healthcare
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, medical

# After uploading your license, run this to install all licensed Python wheels and pre-download the JARs for the Spark session JVM
nlp.install()

In [None]:
from johnsnowlabs import nlp, medical
import pandas as pd

# Automatically load license data and start a session with all jars user has access to
spark = nlp.start()

In [6]:
spark

In [7]:
from pyspark.sql import DataFrame
import pyspark.sql.functions as F
import pyspark.sql.types as T

## **🖨️ Input/Output Annotation Types**
- Input: `WORD_EMBEDDINGS, POS, CHUNK, DEPENDENCY`
- Output: `CATEGORY`

## **🔎 Parameters**


- `predictionThreshold` *(Float)*: Sets minimal activation of the target unit to encode a new relation instance.

- `relationPairs` *(List[Str])*: List of dash-separated pairs of named entities. For example, [“Biomarker-RelativeDay”] will process all relations between entities of type “Biomarker” and “RelativeDay”.

- `relationPairsCaseSensitive` *(Bool)*: Determines whether relation pairs are case sensitive.

- `relationTypePerPair` *(dict[str, list[str]])*: Entity pairs allowed for each relation type, which limits the entities that can form a given relation. For example, {“CAUSE”: [“PROBLEM”, “SYMPTOM”]} only lets a “CAUSE” relation hold between a problem (“PROBLEM”) and a symptom (“SYMPTOM”).

- `maxSyntacticDistance` *(Int)*: Maximal syntactic distance, used as a threshold (Default: 0). Determines how far the “from entity” can be from the “to entity” in the text. Increasing this value will increase recall, but also increase the number of false positives.

- `customLabels` *(dict[str, str])*: Custom relation labels.

- `multiClass` *(Bool)*: If multiClass is set, the model will return all the labels with corresponding scores (Default: False)



### `setMaxSyntacticDistance`
This parameter is used for setting the maximal syntactic distance, as threshold.

**Build a pipeline using Spark NLP pretrained models and the relation extraction model.**

The precision of the RE model is controlled by `setMaxSyntacticDistance(4)`, which sets the maximum syntactic distance between named entities to 4. A larger value improves recall at the expense of lower precision. On the example text below, a value of 4 produces no false positives while keeping reasonably good recall.
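
The notebook does not spell out how the annotator measures syntactic distance. One common definition is the number of edges on the path between two tokens in the dependency tree, and the sketch below uses that definition as an assumption. The toy tree (`tokens`, `heads`) is hand-made for illustration and is not parser output; `syntactic_distance` is a hypothetical helper, not a library function.

```python
def path_to_root(heads, index):
    """Follow head pointers from a token up to the root (head == -1)."""
    path = [index]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def syntactic_distance(heads, a, b):
    """Number of dependency edges on the path between tokens a and b."""
    depth_from_a = {node: depth for depth, node in enumerate(path_to_root(heads, a))}
    for depth_b, node in enumerate(path_to_root(heads, b)):
        if node in depth_from_a:  # first shared ancestor
            return depth_from_a[node] + depth_b
    raise ValueError("tokens are not in the same tree")


# Hypothetical dependency tree: heads[i] is the index of token i's head.
tokens = ["prescribed", "unit", "of", "Advil", "daily"]
heads = [-1, 0, 3, 1, 0]

candidate_pairs = [(1, 3), (3, 4)]  # (unit, Advil) and (Advil, daily)
max_distance = 2

kept = [
    (tokens[a], tokens[b])
    for a, b in candidate_pairs
    if syntactic_distance(heads, a, b) <= max_distance
]
print(kept)
```

Under this reading, raising `max_distance` lets more distant pairs through, which matches the recall and precision trade-off described above.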

In our example pipeline, we will use the pretrained `posology_re` Relation Extraction model for posology.

In [8]:
documenter = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencer = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

words_embedder = nlp.WordEmbeddingsModel()\
    .pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = nlp.PerceptronModel()\
    .pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

ner_tagger = medical.NerModel()\
    .pretrained("ner_posology", "en", "clinical/models")\
    .setInputCols("sentences", "tokens", "embeddings")\
    .setOutputCol("ner_tags")

ner_chunker = medical.NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")

dependency_parser = nlp.DependencyParserModel()\
    .pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

reModel = medical.RelationExtractionModel()\
    .pretrained("posology_re")\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(4)

pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)


embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
pos_clinical download started this may take some time.
Approximate size to download 1.5 MB
[OK!]
ner_posology download started this may take some time.
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]


**Create a light pipeline for annotating free text**

In [None]:
text = """
The patient was prescribed 1 unit of Advil for 5 days after meals. The patient was also
given 1 unit of Metformin daily.
He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night ,
12 units of insulin lispro with meals , and metformin 1000 mg two times a day.
"""

lmodel = nlp.LightPipeline(model)

results = lmodel.fullAnnotate(text)

In [None]:
results[0].keys()

dict_keys(['sentences', 'document', 'ner_chunks', 'ner_tags', 'relations', 'tokens', 'embeddings', 'pos_tags', 'dependencies'])

In [None]:
results[0]['ner_chunks']

[Annotation(chunk, 28, 33, 1 unit, {'chunk': '0', 'confidence': '0.71675', 'ner_source': 'ner_chunks', 'entity': 'DOSAGE', 'sentence': '0'}, []),
 Annotation(chunk, 38, 42, Advil, {'chunk': '1', 'confidence': '0.9984', 'ner_source': 'ner_chunks', 'entity': 'DRUG', 'sentence': '0'}, []),
 Annotation(chunk, 44, 53, for 5 days, {'chunk': '2', 'confidence': '0.7455', 'ner_source': 'ner_chunks', 'entity': 'DURATION', 'sentence': '0'}, []),
 Annotation(chunk, 95, 100, 1 unit, {'chunk': '3', 'confidence': '0.72360003', 'ner_source': 'ner_chunks', 'entity': 'DOSAGE', 'sentence': '1'}, []),
 Annotation(chunk, 105, 113, Metformin, {'chunk': '4', 'confidence': '0.9998', 'ner_source': 'ner_chunks', 'entity': 'DRUG', 'sentence': '1'}, []),
 Annotation(chunk, 115, 119, daily, {'chunk': '5', 'confidence': '0.9997', 'ner_source': 'ner_chunks', 'entity': 'FREQUENCY', 'sentence': '1'}, []),
 Annotation(chunk, 189, 196, 40 units, {'chunk': '6', 'confidence': '0.84085', 'ner_source': 'ner_chunks', 'entity

In [None]:
results[0]['relations']

[Annotation(category, 28, 42, DOSAGE-DRUG, {'chunk2': 'Advil', 'confidence': '1.0', 'entity2_end': '42', 'chunk1': '1 unit', 'entity2_begin': '38', 'entity1': 'DOSAGE', 'chunk2_confidence': '0.9984', 'entity1_begin': '28', 'direction': 'both', 'entity1_end': '33', 'chunk1_confidence': '0.71675', 'entity2': 'DRUG'}, []),
 Annotation(category, 38, 53, DRUG-DURATION, {'chunk2': 'for 5 days', 'confidence': '1.0', 'entity2_end': '53', 'chunk1': 'Advil', 'entity2_begin': '44', 'entity1': 'DRUG', 'chunk2_confidence': '0.7455', 'entity1_begin': '38', 'direction': 'both', 'entity1_end': '42', 'chunk1_confidence': '0.9984', 'entity2': 'DURATION'}, []),
 Annotation(category, 95, 113, DOSAGE-DRUG, {'chunk2': 'Metformin', 'confidence': '1.0', 'entity2_end': '113', 'chunk1': '1 unit', 'entity2_begin': '105', 'entity1': 'DOSAGE', 'chunk2_confidence': '0.9998', 'entity1_begin': '95', 'direction': 'both', 'entity1_end': '100', 'chunk1_confidence': '0.72360003', 'entity2': 'DRUG'}, []),
 Annotation(cate

**Show extracted relations**

In [None]:
for rel in results[0]["relations"]:
    print("{}({}={} - {}={})".format(
        rel.result,
        rel.metadata['entity1'],
        rel.metadata['chunk1'],
        rel.metadata['entity2'],
        rel.metadata['chunk2']
    ))

DOSAGE-DRUG(DOSAGE=1 unit - DRUG=Advil)
DRUG-DURATION(DRUG=Advil - DURATION=for 5 days)
DOSAGE-DRUG(DOSAGE=1 unit - DRUG=Metformin)
DRUG-FREQUENCY(DRUG=Metformin - FREQUENCY=daily)
DOSAGE-DRUG(DOSAGE=40 units - DRUG=insulin glargine)
DRUG-FREQUENCY(DRUG=insulin glargine - FREQUENCY=at night)
DOSAGE-DRUG(DOSAGE=12 units - DRUG=insulin lispro)
DRUG-FREQUENCY(DRUG=insulin lispro - FREQUENCY=with meals)
DRUG-STRENGTH(DRUG=metformin - STRENGTH=1000 mg)
DRUG-FREQUENCY(DRUG=metformin - FREQUENCY=two times a day)
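
The flat list of relations can be easier to read when regrouped per drug. The sketch below works on plain dictionaries that mirror the `entity1`/`chunk1`/`entity2`/`chunk2` metadata keys shown above; the four records are copied by hand from the printed output, and `group_by_drug` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

# Metadata records copied from the first four relations printed above.
relations = [
    {"entity1": "DOSAGE", "chunk1": "1 unit", "entity2": "DRUG", "chunk2": "Advil"},
    {"entity1": "DRUG", "chunk1": "Advil", "entity2": "DURATION", "chunk2": "for 5 days"},
    {"entity1": "DOSAGE", "chunk1": "1 unit", "entity2": "DRUG", "chunk2": "Metformin"},
    {"entity1": "DRUG", "chunk1": "Metformin", "entity2": "FREQUENCY", "chunk2": "daily"},
]


def group_by_drug(relations):
    """Collect the attributes (dosage, duration, ...) linked to each drug chunk."""
    grouped = defaultdict(dict)
    for meta in relations:
        if meta["entity1"] == "DRUG":
            drug, attribute, value = meta["chunk1"], meta["entity2"], meta["chunk2"]
        elif meta["entity2"] == "DRUG":
            drug, attribute, value = meta["chunk2"], meta["entity1"], meta["chunk1"]
        else:
            continue  # relation does not involve a DRUG entity
        grouped[drug][attribute] = value
    return dict(grouped)


summary = group_by_drug(relations)
for drug, attributes in summary.items():
    print(drug, attributes)
```

With the real output, something like `[rel.metadata for rel in results[0]["relations"]]` could be passed in instead of the hand-copied list. Note that the grouping key is the surface string, so `Metformin` and `metformin` would stay separate.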


Get relations in a pandas dataframe


In [9]:
def get_relations_df(results, rel_col='relations', chunk_col='ner_chunks'):
    rel_pairs=[]
    chunks = []

    for rel in results[0][rel_col]:
        rel_pairs.append((
            rel.metadata['entity1_begin'],
            rel.metadata['entity1_end'],
            rel.metadata['chunk1'],
            rel.metadata['entity1'],
            rel.metadata['entity2_begin'],
            rel.metadata['entity2_end'],
            rel.metadata['chunk2'],
            rel.metadata['entity2'],
            rel.result,
            rel.metadata['confidence'],
        ))

    for chunk in results[0][chunk_col]:
        chunks.append((
            chunk.metadata["sentence"],
            chunk.begin,
            chunk.end,
            chunk.result,
        ))

    rel_df = pd.DataFrame(rel_pairs, columns=['entity1_begin', 'entity1_end', 'chunk1', 'entity1', 'entity2_begin', 'entity2_end', 'chunk2', 'entity2', 'relation', 'confidence'])

    chunks_df = pd.DataFrame(chunks, columns = ["sentence", "begin", "end", "chunk"])
    chunks_df.begin = chunks_df.begin.astype(str)
    chunks_df.end = chunks_df.end.astype(str)

    result_df = pd.merge(rel_df,chunks_df, left_on=["entity1_begin", "entity1_end", "chunk1"], right_on=["begin", "end", "chunk"])[["sentence"] + list(rel_df.columns)]


    return result_df

In [None]:
print(text, "\n")

rel_df = get_relations_df(results)
rel_df


The patient was prescribed 1 unit of Advil for 5 days after meals. The patient was also
given 1 unit of Metformin daily.
He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night ,
12 units of insulin lispro with meals , and metformin 1000 mg two times a day.
 



Unnamed: 0,sentence,entity1_begin,entity1_end,chunk1,entity1,entity2_begin,entity2_end,chunk2,entity2,relation,confidence
0,0,28,33,1 unit,DOSAGE,38,42,Advil,DRUG,DOSAGE-DRUG,1.0
1,0,38,42,Advil,DRUG,44,53,for 5 days,DURATION,DRUG-DURATION,1.0
2,1,95,100,1 unit,DOSAGE,105,113,Metformin,DRUG,DOSAGE-DRUG,1.0
3,1,105,113,Metformin,DRUG,115,119,daily,FREQUENCY,DRUG-FREQUENCY,1.0
4,2,189,196,40 units,DOSAGE,201,216,insulin glargine,DRUG,DOSAGE-DRUG,1.0
5,2,201,216,insulin glargine,DRUG,218,225,at night,FREQUENCY,DRUG-FREQUENCY,1.0
6,2,229,236,12 units,DOSAGE,241,254,insulin lispro,DRUG,DOSAGE-DRUG,1.0
7,2,241,254,insulin lispro,DRUG,256,265,with meals,FREQUENCY,DRUG-FREQUENCY,1.0
8,2,273,281,metformin,DRUG,283,289,1000 mg,STRENGTH,DRUG-STRENGTH,1.0
9,2,273,281,metformin,DRUG,291,305,two times a day,FREQUENCY,DRUG-FREQUENCY,1.0


In [None]:
text ="""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ),
one prior episode of HTG-induced pancreatitis three years prior to presentation,  associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting .
Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection .
She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG .
She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity .
Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 . Serum lipase was normal at 43 U/L .
Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia . The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission . However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L .
The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again .
The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours . Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use .
The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day . It was determined that all SGLT2 inhibitors should be discontinued indefinitely .
She had close follow-up with endocrinology post discharge .
"""

annotations = lmodel.fullAnnotate(text)

rel_df = get_relations_df(annotations)

rel_df



Unnamed: 0,sentence,entity1_begin,entity1_end,chunk1,entity1,entity2_begin,entity2_end,chunk2,entity2,relation,confidence
0,1,492,499,five-day,DURATION,511,521,amoxicillin,DRUG,DURATION-DRUG,1.0
1,3,680,692,dapagliflozin,DRUG,694,707,for six months,DURATION,DRUG-DURATION,1.0
2,12,1939,1945,insulin,DRUG,1947,1950,drip,ROUTE,DRUG-ROUTE,1.0
3,14,2254,2261,40 units,DOSAGE,2266,2281,insulin glargine,DRUG,DOSAGE-DRUG,1.0
4,14,2266,2281,insulin glargine,DRUG,2283,2290,at night,FREQUENCY,DRUG-FREQUENCY,1.0
5,14,2294,2301,12 units,DOSAGE,2306,2319,insulin lispro,DRUG,DOSAGE-DRUG,1.0
6,14,2306,2319,insulin lispro,DRUG,2321,2330,with meals,FREQUENCY,DRUG-FREQUENCY,1.0
7,14,2338,2346,metformin,DRUG,2348,2354,1000 mg,STRENGTH,DRUG-STRENGTH,1.0
8,14,2338,2346,metformin,DRUG,2356,2370,two times a day,FREQUENCY,DRUG-FREQUENCY,1.0


**Visualization of Extracted Relations**

In [None]:
text = """
The patient was prescribed 1 unit of Advil for 5 days after meals. The patient was also
given 1 unit of Metformin daily.
He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine ,
12 units of insulin lispro with meals , and metformin two times a day.
"""

lmodel = nlp.LightPipeline(model)

results = lmodel.fullAnnotate(text)

In [None]:
vis = nlp.viz.RelationExtractionVisualizer()
vis.display(results[0], 'relations', show_relations=True) # default show_relations: True


### `setRelationPairs`
This parameter is used for setting the list of dash-separated pairs of named entities. For example, [“Biomarker-RelativeDay”] will process all relations between entities of type “Biomarker” and “RelativeDay”.

We will first use the `ner_ade_clinical` NER model to detect `DRUG` and `ADE` entities, then find the relations between them with the `re_ade_clinical` Relation Extraction model. The remaining pipeline stages are reused from the pipeline created above.

In [None]:
ner_tagger = medical.NerModel.pretrained("ner_ade_clinical", "en", "clinical/models")\
    .setInputCols("sentences", "tokens", "embeddings")\
    .setOutputCol("ner_tags")

reModel = medical.RelationExtractionModel()\
    .pretrained("re_ade_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(10)\
    .setRelationPairs(["drug-ade, ade-drug"])

ade_pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_model = ade_pipeline.fit(empty_data)

ner_ade_clinical download started this may take some time.
[OK!]
re_ade_clinical download started this may take some time.
[OK!]


Show the classes

In [None]:
reModel.getClasses()

['0', '1']

We will create a LightPipeline for annotation.

In [None]:
ade_lmodel = nlp.LightPipeline(ade_model)

In [None]:
text = "I experienced fatigue, muscle cramps, anxiety, agression and sadness after taking Lipitor but no more adverse after passing Zocor."

ade_results = ade_lmodel.fullAnnotate(text)

**Let's show the ADE-DRUG relations in a pandas dataframe.**

In [None]:
ade_results = ade_lmodel.fullAnnotate(text)

rel_df = get_relations_df(ade_results)

rel_df

Unnamed: 0,sentence,entity1_begin,entity1_end,chunk1,entity1,entity2_begin,entity2_end,chunk2,entity2,relation,confidence
0,0,14,20,fatigue,ADE,82,88,Lipitor,DRUG,1,0.9998622
1,0,14,20,fatigue,ADE,124,128,Zocor,DRUG,0,0.99888533
2,0,23,35,muscle cramps,ADE,82,88,Lipitor,DRUG,1,0.99999154
3,0,23,35,muscle cramps,ADE,124,128,Zocor,DRUG,0,0.9780189
4,0,38,44,anxiety,ADE,82,88,Lipitor,DRUG,1,0.86541504
5,0,38,44,anxiety,ADE,124,128,Zocor,DRUG,0,0.99998915
6,0,47,55,agression,ADE,82,88,Lipitor,DRUG,1,0.99999785
7,0,47,55,agression,ADE,124,128,Zocor,DRUG,0,0.999469
8,0,61,67,sadness,ADE,82,88,Lipitor,DRUG,1,1.0
9,0,61,67,sadness,ADE,124,128,Zocor,DRUG,0,0.99980325


### `setRelationPairsCaseSensitive`
This parameter is set to determine whether relation pairs are case sensitive (Default: False).

We will use the same `ADE` Relation Extraction pipeline as above, this time with `setRelationPairsCaseSensitive(True)`, and see the difference.

The NER model returns the `ADE` and `DRUG` entity labels in uppercase, but we pass lowercase pairs in `setRelationPairs(["drug-ade, ade-drug"])` together with `setRelationPairsCaseSensitive(True)`. Therefore, we do not expect any relations, since the case of the NER entity labels does not match the case of the relation pairs.
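
The effect of the case-sensitivity switch can be pictured with a few lines of plain Python. This is a conceptual sketch only: `pair_matches` is a hypothetical helper, and the library's actual matching logic may differ in its details.

```python
def pair_matches(entity1, entity2, relation_pairs, case_sensitive=False):
    """Check whether the pair entity1-entity2 is listed in relation_pairs."""
    candidate = f"{entity1}-{entity2}"
    if case_sensitive:
        return candidate in relation_pairs
    return candidate.lower() in {pair.lower() for pair in relation_pairs}


relation_pairs = ["drug-ade", "ade-drug"]

# NER labels arrive in uppercase, relation pairs are written in lowercase.
print(pair_matches("ADE", "DRUG", relation_pairs, case_sensitive=False))  # True
print(pair_matches("ADE", "DRUG", relation_pairs, case_sensitive=True))   # False
```

With case-sensitive matching, the uppercase labels never equal the lowercase pairs, so no candidate pair gets through.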

In [None]:
ner_tagger = medical.NerModel.pretrained("ner_ade_clinical", "en", "clinical/models")\
    .setInputCols("sentences", "tokens", "embeddings")\
    .setOutputCol("ner_tags")

reModel = medical.RelationExtractionModel()\
    .pretrained("re_ade_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(10)\
    .setRelationPairs(["drug-ade, ade-drug"])\
    .setRelationPairsCaseSensitive(True)      # True: entity labels must match the pairs in setRelationPairs exactly, including case,
                                              #       so the lowercase pairs above will not match the uppercase NER labels.
                                              # False: relation pairs are matched case-insensitively.

ade_pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_model = ade_pipeline.fit(empty_data)

ner_ade_clinical download started this may take some time.
[OK!]
re_ade_clinical download started this may take some time.
[OK!]


Create a LightPipeline for annotation.

In [None]:
ade_lmodel = nlp.LightPipeline(ade_model)

In [None]:
text = "I experienced fatigue, muscle cramps, anxiety, agression and sadness after taking Lipitor but no more adverse after passing Zocor."

ade_results = ade_lmodel.fullAnnotate(text)

Show the ADE-DRUG relations by using pandas dataframe.

In [None]:
ade_results = ade_lmodel.fullAnnotate(text)

rel_df = get_relations_df(ade_results)

rel_df

Unnamed: 0,sentence,entity1_begin,entity1_end,chunk1,entity1,entity2_begin,entity2_end,chunk2,entity2,relation,confidence


As seen above, no relations are caught because of the case mismatch between the relation pairs and the NER entities.

### `setCustomLabels`

This parameter is used for setting the custom relation labels.

Let's set custom labels instead of the default ones by using the `setCustomLabels` parameter.

In [None]:
reModel = medical.RelationExtractionModel()\
    .pretrained("re_ade_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(10)\
    .setRelationPairs(["drug-ade, ade-drug"])\
    .setRelationPairsCaseSensitive(False)

# set custom labels
reModel.setCustomLabels({"1": "is_related", "0": "not_related"})


ade_pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_model = ade_pipeline.fit(empty_data)

re_ade_clinical download started this may take some time.
[OK!]


Create a LightPipeline for annotation.

In [None]:
ade_lmodel = nlp.LightPipeline(ade_model)

text ="""A 44-year-old man taking naproxen for chronic low back pain and a 20-year-old woman on oxaprozin for rheumatoid arthritis presented with tense bullae and cutaneous fragility on the face and the back of the hands."""

ade_results = ade_lmodel.fullAnnotate(text)

Showing the results in a pandas dataframe

In [None]:
print(text)

rel_df = get_relations_df(ade_results)

rel_df


A 44-year-old man taking naproxen for chronic low back pain and a 20-year-old woman on oxaprozin for rheumatoid arthritis presented with tense bullae and cutaneous fragility on the face and the back of the hands.


Unnamed: 0,sentence,entity1_begin,entity1_end,chunk1,entity1,entity2_begin,entity2_end,chunk2,entity2,relation,confidence
0,0,25,32,naproxen,DRUG,137,148,tense bullae,ADE,is_related,1.0
1,0,25,32,naproxen,DRUG,154,210,cutaneous fragility on the face and the back o...,ADE,is_related,0.99999976
2,0,87,95,oxaprozin,DRUG,137,148,tense bullae,ADE,is_related,1.0
3,0,87,95,oxaprozin,DRUG,154,210,cutaneous fragility on the face and the back o...,ADE,is_related,1.0


As seen above, the relation column now shows our custom label `is_related` instead of `1`; `not_related` would likewise replace `0`.
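
Conceptually, custom labels act as a lookup table applied to the model's raw classes. The sketch below shows that idea in plain Python; `apply_custom_labels` is a hypothetical helper, and leaving unmapped classes unchanged is an assumption of this sketch rather than documented library behavior.

```python
def apply_custom_labels(labels, custom_labels):
    """Replace each raw label with its custom name, keeping unmapped labels as-is."""
    return [custom_labels.get(label, label) for label in labels]


custom_labels = {"1": "is_related", "0": "not_related"}
predicted = ["1", "0", "1"]

renamed = apply_custom_labels(predicted, custom_labels)
print(renamed)
```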

### `setPredictionThreshold`

This parameter is used for setting the minimal activation of the target unit to encode a new relation instance.


We will set the `setPredictionThreshold()` parameter to different values and see its effect on the results.


In [None]:
ner_tagger = medical.NerModel.pretrained("ner_ade_clinical", "en", "clinical/models")\
    .setInputCols("sentences", "tokens", "embeddings")\
    .setOutputCol("ner_tags")

reModel = medical.RelationExtractionModel()\
    .pretrained("re_ade_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(10)\
    .setRelationPairs(["drug-ade, ade-drug"])\
    .setPredictionThreshold(0.9)

ade_pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_model = ade_pipeline.fit(empty_data)

ner_ade_clinical download started this may take some time.
[OK!]
re_ade_clinical download started this may take some time.
[OK!]


Create a LightPipeline for annotation.

In [None]:
ade_lmodel = nlp.LightPipeline(ade_model)

text = "I experienced fatigue, muscle cramps, anxiety, agression and sadness after taking Lipitor but no more adverse after passing Zocor."

ade_results = ade_lmodel.fullAnnotate(text)

Checking the results in a pandas dataframe

In [None]:
ade_results = ade_lmodel.fullAnnotate(text)

rel_df = get_relations_df(ade_results)

rel_df

Unnamed: 0,sentence,entity1_begin,entity1_end,chunk1,entity1,entity2_begin,entity2_end,chunk2,entity2,relation,confidence
0,0,14,20,fatigue,ADE,82,88,Lipitor,DRUG,1,0.9998622
1,0,14,20,fatigue,ADE,124,128,Zocor,DRUG,0,0.99888533
2,0,23,35,muscle cramps,ADE,82,88,Lipitor,DRUG,1,0.99999154
3,0,23,35,muscle cramps,ADE,124,128,Zocor,DRUG,0,0.9780189
4,0,38,44,anxiety,ADE,124,128,Zocor,DRUG,0,0.99998915
5,0,47,55,agression,ADE,82,88,Lipitor,DRUG,1,0.99999785
6,0,47,55,agression,ADE,124,128,Zocor,DRUG,0,0.999469
7,0,61,67,sadness,ADE,82,88,Lipitor,DRUG,1,1.0
8,0,61,67,sadness,ADE,124,128,Zocor,DRUG,0,0.99980325


As seen above, we only have the relations with a confidence higher than 0.9; the `anxiety`/`Lipitor` relation, which scored 0.86541504 in the earlier run, is filtered out.

Pipeline with `setPredictionThreshold(0.1)`:

In [None]:
ner_tagger = medical.NerModel.pretrained("ner_ade_clinical", "en", "clinical/models")\
    .setInputCols("sentences", "tokens", "embeddings")\
    .setOutputCol("ner_tags")

reModel = medical.RelationExtractionModel()\
    .pretrained("re_ade_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(10)\
    .setRelationPairs(["drug-ade, ade-drug"])\
    .setPredictionThreshold(0.1)

ade_pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ade_model = ade_pipeline.fit(empty_data)

ner_ade_clinical download started this may take some time.
[OK!]
re_ade_clinical download started this may take some time.
[OK!]


In [None]:
ade_lmodel = nlp.LightPipeline(ade_model)

text = "I experienced fatigue, muscle cramps, anxiety, agression and sadness after taking Lipitor but no more adverse after passing Zocor."

ade_results = ade_lmodel.fullAnnotate(text)

In [None]:
rel_df = get_relations_df(ade_results)

rel_df

Unnamed: 0,sentence,entity1_begin,entity1_end,chunk1,entity1,entity2_begin,entity2_end,chunk2,entity2,relation,confidence
0,0,14,20,fatigue,ADE,82,88,Lipitor,DRUG,1,0.9998622
1,0,14,20,fatigue,ADE,124,128,Zocor,DRUG,0,0.99888533
2,0,23,35,muscle cramps,ADE,82,88,Lipitor,DRUG,1,0.99999154
3,0,23,35,muscle cramps,ADE,124,128,Zocor,DRUG,0,0.9780189
4,0,38,44,anxiety,ADE,82,88,Lipitor,DRUG,1,0.86541504
5,0,38,44,anxiety,ADE,124,128,Zocor,DRUG,0,0.99998915
6,0,47,55,agression,ADE,82,88,Lipitor,DRUG,1,0.99999785
7,0,47,55,agression,ADE,124,128,Zocor,DRUG,0,0.999469
8,0,61,67,sadness,ADE,82,88,Lipitor,DRUG,1,1.0
9,0,61,67,sadness,ADE,124,128,Zocor,DRUG,0,0.99980325


As seen above, with the threshold set to 0.1 we see all relations with a confidence higher than 0.1, including the one at index 4.
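
The two runs can be mimicked by filtering on the confidence column. The rows below are copied from the table above; `filter_by_confidence` is a hypothetical helper, and whether the library compares with `>=` or `>` is an assumption here (it makes no difference for these values).

```python
# (ADE chunk, DRUG chunk, relation label, confidence), copied from the output above.
rows = [
    ("fatigue", "Lipitor", "1", 0.9998622),
    ("anxiety", "Lipitor", "1", 0.86541504),
    ("anxiety", "Zocor", "0", 0.99998915),
    ("sadness", "Lipitor", "1", 1.0),
]


def filter_by_confidence(rows, threshold):
    """Keep only the rows whose confidence reaches the threshold."""
    return [row for row in rows if row[3] >= threshold]


strict = filter_by_confidence(rows, 0.9)
lenient = filter_by_confidence(rows, 0.1)

print(len(strict), len(lenient))
print([row[:2] for row in rows if row not in strict])
```

Only the `anxiety`/`Lipitor` row falls between the two thresholds, which is why it is the single difference between the two result tables.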

### `setRelationTypePerPair`

This parameter sets the entity pairs allowed for each relation type, which limits the entities that can form that relation. For example, `{"TrAP": ["PROBLEM-TREATMENT"]}` only lets "PROBLEM" and "TREATMENT" NER entities take part in a "TrAP" relation.


Now, we will define the RE pipeline with `setRelationTypePerPair({"TrAP": ["PROBLEM-TREATMENT"]})` to see only "PROBLEM" and "TREATMENT" entities in "TrAP" relation type.
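
The restriction can be thought of as a per-relation allow-list of entity pairs. The sketch below is a conceptual illustration in plain Python, not the library's implementation: `allowed_for_relation` is a hypothetical helper, it ignores the order of the two entities and their case, and it leaves relation types without an entry unrestricted. All three choices are assumptions of this sketch.

```python
def allowed_for_relation(relation, entity1, entity2, relation_type_per_pair):
    """Check whether a predicted relation is allowed between two entity types."""
    pairs = relation_type_per_pair.get(relation)
    if pairs is None:
        return True  # no restriction configured for this relation type (assumption)
    wanted = {frozenset(pair.upper().split("-", 1)) for pair in pairs}
    return frozenset((entity1.upper(), entity2.upper())) in wanted


config = {"TrAP": ["PROBLEM-TREATMENT"]}

print(allowed_for_relation("TrAP", "TREATMENT", "PROBLEM", config))
print(allowed_for_relation("TrAP", "TEST", "PROBLEM", config))
print(allowed_for_relation("TeRP", "TEST", "PROBLEM", config))
```

The first call passes because the pair is compared as an unordered set, the second is rejected because "TEST" is not part of the configured pair, and the third passes because `TeRP` has no entry in `config`.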

In [16]:
ner_tagger = medical.NerModel.pretrained("ner_clinical", "en", "clinical/models")\
    .setInputCols("sentences", "tokens", "embeddings")\
    .setOutputCol("ner_tags")

reModel = medical.RelationExtractionModel()\
    .pretrained("re_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(10)\
    .setRelationPairsCaseSensitive(False)\
    .setRelationTypePerPair({"TrAP": ["PROBLEM-TREATMENT"]})


re_pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

re_model = re_pipeline.fit(empty_data)

ner_clinical download started this may take some time.
[OK!]
re_clinical download started this may take some time.
[OK!]


In [17]:
re_lmodel = nlp.LightPipeline(re_model)

# Triple quotes are required because the clinical note contains a line break.
text = """A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation,  associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting . Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection . She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity . Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 . Serum lipase was normal at 43 U/L . Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia . The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission . However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L . The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again . 
The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours . Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use . The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day . It was determined that all SGLT2 inhibitors should be discontinued indefinitely . She had close follow-up with endocrinology post discharge ."""

re_results = re_lmodel.fullAnnotate(text)

In [18]:
rel_df = get_relations_df(re_results)

rel_df[rel_df["relation"]=="TrAP"]

| | sentence | entity1_begin | entity1_end | chunk1 | entity1 | entity2_begin | entity2_end | chunk2 | entity2 | relation | confidence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 51 | 1 | 511 | 521 | amoxicillin | TREATMENT | 527 | 555 | a respiratory tract infection | PROBLEM | TrAP | 0.9999393 |
| 54 | 2 | 570 | 578 | metformin | TREATMENT | 616 | 619 | T2DM | PROBLEM | TrAP | 0.99999905 |
| 57 | 2 | 570 | 578 | metformin | TREATMENT | 658 | 660 | HTG | PROBLEM | TrAP | 0.99999964 |
| 59 | 2 | 582 | 590 | glipizide | TREATMENT | 616 | 619 | T2DM | PROBLEM | TrAP | 0.9999999 |
| 62 | 2 | 582 | 590 | glipizide | TREATMENT | 658 | 660 | HTG | PROBLEM | TrAP | 1.0 |
| 63 | 2 | 598 | 610 | dapagliflozin | TREATMENT | 616 | 619 | T2DM | PROBLEM | TrAP | 0.99999976 |
| 66 | 2 | 598 | 610 | dapagliflozin | TREATMENT | 658 | 660 | HTG | PROBLEM | TrAP | 0.9998598 |
| 71 | 2 | 625 | 636 | atorvastatin | TREATMENT | 658 | 660 | HTG | PROBLEM | TrAP | 0.99999547 |
| 72 | 2 | 642 | 652 | gemfibrozil | TREATMENT | 658 | 660 | HTG | PROBLEM | TrAP | 1.0 |
| 158 | 12 | 1936 | 1950 | an insulin drip | TREATMENT | 1956 | 1960 | euDKA | PROBLEM | TrAP | 0.9996302 |


As seen above, we only have "TREATMENT" and "PROBLEM" entities in "TrAP" relations.
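This observation can also be checked programmatically instead of by eye. The snippet below is a minimal sketch, not part of the library: `entity_pairs_for` is a hypothetical helper, and the sample rows are copied from the table above and reduced to the columns that matter here. It assumes the relations are available as a list of dictionaries, which is the shape that `rel_df.to_dict("records")` would produce for a pandas DataFrame.

```python
# Rows copied from the output above, reduced to the columns needed here.
sample_rows = [
    {"chunk1": "amoxicillin", "entity1": "TREATMENT",
     "chunk2": "a respiratory tract infection", "entity2": "PROBLEM", "relation": "TrAP"},
    {"chunk1": "metformin", "entity1": "TREATMENT",
     "chunk2": "T2DM", "entity2": "PROBLEM", "relation": "TrAP"},
    {"chunk1": "an insulin drip", "entity1": "TREATMENT",
     "chunk2": "euDKA", "entity2": "PROBLEM", "relation": "TrAP"},
]


def entity_pairs_for(rows, relation):
    """Return the set of order-insensitive entity-label pairs seen for `relation`."""
    return {
        frozenset((row["entity1"], row["entity2"]))
        for row in rows
        if row["relation"] == relation
    }


# Hypothetical usage: a single pair here means every "TrAP" row links the same two labels.
trap_pairs = entity_pairs_for(sample_rows, "TrAP")
print(trap_pairs)
```

Using `frozenset` makes the check independent of which entity comes first in the sentence, so a TREATMENT followed by a PROBLEM and a PROBLEM followed by a TREATMENT count as the same pair.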

This time we will set `setRelationTypePerPair({"TrAP": ["PROBLEM-TEST"]})` to see only "PROBLEM" and "TEST" entities in "TrAP" relations.

In [19]:
reModel = medical.RelationExtractionModel()\
    .pretrained("re_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(10)\
    .setRelationPairsCaseSensitive(False)\
    .setRelationTypePerPair({"TrAP": ["PROBLEM-TEST"]})


re_pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

re_model = re_pipeline.fit(empty_data)

re_clinical download started this may take some time.
[OK!]


In [20]:
re_lmodel = nlp.LightPipeline(re_model)

re_results = re_lmodel.fullAnnotate(text)

In [21]:
rel_df = get_relations_df(re_results)
rel_df[rel_df["relation"]=="TrAP"]

| | sentence | entity1_begin | entity1_end | chunk1 | entity1 | entity2_begin | entity2_end | chunk2 | entity2 | relation | confidence |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 45 | 0 | 321 | 323 | BMI | TEST | 424 | 431 | vomiting | PROBLEM | TrAP | 0.894692 |


As seen above, we only see "PROBLEM" and "TEST" entities in "TrAP" relations.

### `setMultiClass`

This parameter controls whether the annotator returns the scores of all relation labels in the metadata, rather than only the confidence of the predicted label (Default: False).



In [22]:
reModel = medical.RelationExtractionModel()\
    .pretrained("re_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(10)\
    .setMultiClass(True)


re_pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

re_model = re_pipeline.fit(empty_data)

re_clinical download started this may take some time.
[OK!]


In [23]:
re_lmodel = nlp.LightPipeline(re_model)

# Triple quotes are required because the clinical note contains a line break.
text = """A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation,  associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting . Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection . She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity . Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 . Serum lipase was normal at 43 U/L . Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia . The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission . However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L . The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again . 
The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours . Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use . The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day . It was determined that all SGLT2 inhibitors should be discontinued indefinitely . She had close follow-up with endocrinology post discharge ."""

re_results = re_lmodel.fullAnnotate(text)

Let's check the metadata of the relations:

In [24]:
re_results[0]["relations"]

[Annotation(category, 39, 153, TeRP, {'TrWP_confidence': '5.49793E-24', 'chunk2': 'subsequent type two diabetes mellitus', 'TrIP_confidence': '0.0', 'TrAP_confidence': '6.382531E-31', 'TrNAP_confidence': '2.8671916E-23', 'confidence': '1.0', 'entity2_end': '153', 'chunk1': 'gestational diabetes mellitus', 'entity1': 'PROBLEM', 'entity2_begin': '117', 'TeRP_confidence': '1.0', 'TeCP_confidence': '1.07731E-29', 'O_confidence': '5.097145E-16', 'chunk2_confidence': '0.75560004', 'entity1_begin': '39', 'sentence': '0', 'PIP_confidence': '8.428624E-23', 'direction': 'both', 'TrCP_confidence': '0.0', 'entity1_end': '67', 'entity2': 'PROBLEM', 'chunk1_confidence': '0.9205'}, []),
 Annotation(category, 39, 160, TeRP, {'TrWP_confidence': '7.218621E-33', 'chunk2': 'T2DM', 'TrIP_confidence': '0.0', 'TrAP_confidence': '5.3768856E-31', 'TrNAP_confidence': '3.5057997E-35', 'confidence': '1.0', 'entity2_end': '160', 'chunk1': 'gestational diabetes mellitus', 'entity1': 'PROBLEM', 'entity2_begin': '157

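As shown above, the metadata of each relation now contains a `<label>_confidence` score for every relation label (for example `TrAP_confidence` and `TeRP_confidence`), because we set `setMultiClass(True)`.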
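The per-label scores are stored as strings under keys that end in `_confidence`, so a little parsing helps when comparing labels. The snippet below is a minimal sketch under that assumption: `label_scores` is a hypothetical helper (not part of the library), and the sample metadata is copied from the first annotation in the output above, keeping only the confidence-related keys.

```python
# Metadata copied from the first annotation in the output above
# (only the confidence-related keys are kept).
sample_metadata = {
    "TrWP_confidence": "5.49793E-24",
    "TrIP_confidence": "0.0",
    "TrAP_confidence": "6.382531E-31",
    "TrNAP_confidence": "2.8671916E-23",
    "confidence": "1.0",
    "TeRP_confidence": "1.0",
    "TeCP_confidence": "1.07731E-29",
    "O_confidence": "5.097145E-16",
    "PIP_confidence": "8.428624E-23",
    "TrCP_confidence": "0.0",
    "chunk1_confidence": "0.9205",
    "chunk2_confidence": "0.75560004",
}

# These keys also end in "_confidence" but describe the NER chunks, not relation labels.
NON_LABEL_KEYS = {"chunk1_confidence", "chunk2_confidence"}


def label_scores(metadata):
    """Return {label: score} for every relation label, highest score first."""
    suffix = "_confidence"
    scores = {
        key[: -len(suffix)]: float(value)
        for key, value in metadata.items()
        if key.endswith(suffix) and key not in NON_LABEL_KEYS
    }
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


scores = label_scores(sample_metadata)
best_label = next(iter(scores))  # the label with the highest score
print(best_label, scores[best_label])
```

On real results, the same helper could be applied to an annotation's metadata, for example `label_scores(re_results[0]["relations"][0].metadata)`, assuming the annotation exposes its metadata dictionary through a `metadata` attribute.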

Now, we will set `setMultiClass(False)` and see the difference.

In [26]:
reModel = medical.RelationExtractionModel()\
    .pretrained("re_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(10)\
    .setMultiClass(False)


re_pipeline = nlp.Pipeline(stages=[
    documenter,
    sentencer,
    tokenizer,
    words_embedder,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    reModel
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

re_model = re_pipeline.fit(empty_data)

re_clinical download started this may take some time.
[OK!]


In [27]:
re_lmodel = nlp.LightPipeline(re_model)

# Triple quotes are required because the clinical note contains a line break.
text = """A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation,  associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting . Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection . She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity . Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 . Serum lipase was normal at 43 U/L . Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia . The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission . However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L . The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again . 
The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours . Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use . The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day . It was determined that all SGLT2 inhibitors should be discontinued indefinitely . She had close follow-up with endocrinology post discharge ."""

re_results = re_lmodel.fullAnnotate(text)

In [28]:
re_results[0]["relations"]

[Annotation(category, 39, 153, TeRP, {'chunk2': 'subsequent type two diabetes mellitus', 'confidence': '1.0', 'entity2_end': '153', 'chunk1': 'gestational diabetes mellitus', 'entity1': 'PROBLEM', 'entity2_begin': '117', 'chunk2_confidence': '0.75560004', 'entity1_begin': '39', 'sentence': '0', 'direction': 'both', 'entity1_end': '67', 'entity2': 'PROBLEM', 'chunk1_confidence': '0.9205'}, []),
 Annotation(category, 39, 160, TeRP, {'chunk2': 'T2DM', 'confidence': '1.0', 'entity2_end': '160', 'chunk1': 'gestational diabetes mellitus', 'entity1': 'PROBLEM', 'entity2_begin': '157', 'chunk2_confidence': '0.9928', 'entity1_begin': '39', 'sentence': '0', 'direction': 'both', 'entity1_end': '67', 'entity2': 'PROBLEM', 'chunk1_confidence': '0.9205'}, []),
 Annotation(category, 39, 294, TeRP, {'chunk2': 'obesity', 'confidence': '1.0', 'entity2_end': '294', 'chunk1': 'gestational diabetes mellitus', 'entity1': 'PROBLEM', 'entity2_begin': '288', 'chunk2_confidence': '0.997', 'entity1_begin': '39',