![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP/ChunkMergeModel.ipynb)

# **ChunkMerge**

This notebook will cover the different parameters and usages of `ChunkMergeModel`. This annotator provides the ability to merge chunk columns coming from two or more annotators using a model generated by `ChunkMergeApproach`. Common parameters with `ChunkMergeApproach` can be used in `ChunkMergeModel` in the same way.

**📖 Learning Objectives:**

1. Merging the chunk results of two or more annotators in a Spark NLP pipeline
2. Using `ChunkMergeModel` annotator's parameters to get desired outputs


**🔗 Helpful Links:**

- Documentation : [ChunkMerge](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators#chunkmerge)

- Python Docs : [ChunkMergeModel](https://nlp.johnsnowlabs.com/licensed/api/python/reference/autosummary/sparknlp_jsl/annotator/merge/chunk_merge/index.html#sparknlp_jsl.annotator.merge.chunk_merge.ChunkMergeModel)

- Scala Docs : [ChunkMergeModel](https://nlp.johnsnowlabs.com/licensed/api/com/johnsnowlabs/nlp/annotators/merge/ChunkMergeModel.html)

- For extended examples of usage, see the [Spark NLP Workshop repository](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/7.Clinical_NER_Chunk_Merger.ipynb).

## **📜 Background**


Chunk merge annotators merge chunk columns coming from two or more annotators (NER, ContextualParser, TextMatcher, or any other annotator that produces chunks). There are two types of chunk merge annotators:


1.   `ChunkMergeApproach`  --> Merges two or more chunk columns coming from annotators
2.   `ChunkMergeModel` --> Uses a model produced by `ChunkMergeApproach`

## **🎬 Colab Setup**

In [None]:
!pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

Please Upload your John Snow Labs License using the button below


Saving spark_nlp_for_healthcare_spark_ocr_8734_532.json to spark_nlp_for_healthcare_spark_ocr_8734_532.json


In [None]:
from johnsnowlabs import nlp, medical

nlp.install()

👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_8734_532.json
📋 Stored John Snow Labs License in /root/.johnsnowlabs/licenses/license_number_0_for_Spark-Healthcare_Spark-OCR.json
👷 Setting up  John Snow Labs home in /root/.johnsnowlabs, this might take a few minutes.
Downloading 🐍+🚀 Python Library spark_nlp-5.3.2-py2.py3-none-any.whl
Downloading 🐍+💊 Python Library spark_nlp_jsl-5.3.2-py3-none-any.whl
Downloading 🫘+🚀 Java Library spark-nlp-assembly-5.3.2.jar
Downloading 🫘+💊 Java Library spark-nlp-jsl-5.3.2.jar
🙆 JSL Home setup in /root/.johnsnowlabs
👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_8734_532.json
Installing /root/.johnsnowlabs/py_installs/spark_nlp_jsl-5.3.2-py3-none-any.whl to /usr/bin/python3
Installed 1 products:
💊 Spark-Healthcare==5.3.2 installed! ✅ Heal the planet with NLP! 


In [None]:
import pyspark.sql.functions as F
import pandas as pd

spark = nlp.start()

👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_8734_532.json
👌 Launched cpu optimized session with with: 🚀Spark-NLP==5.3.2, 💊Spark-Healthcare==5.3.2, running on ⚡ PySpark==3.4.0


## Helper function to display pipeline results

In [None]:
def get_df(light_result, chunk_column):
    chunks = []
    entities = []
    sentence = []
    begin = []
    end = []
    confidence = []
    ner_source = []

    for n in light_result[0][chunk_column]:
        begin.append(n.begin)
        end.append(n.end)
        chunks.append(n.result)
        entities.append(n.metadata['entity'])
        sentence.append(n.metadata['sentence'])
        confidence.append(n.metadata['confidence'])
        ner_source.append(n.metadata['ner_source'])

    df_result = pd.DataFrame({
        chunk_column : chunks,
        'begin': begin,
        'end': end,
        'sentence_id': sentence,
        'entities': entities,
        'ner_source': ner_source,
        'confidence': confidence
    })

    return df_result
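
The helper above reads `confidence` and `ner_source` with direct key access, while later cells call `fillna("text_matcher")` on its output, which suggests that some chunks carry no value for these fields. As a minimal sketch of a more tolerant lookup, the snippet below uses a stand-in `Annotation` tuple (an assumption for illustration, not the Spark NLP class) and `dict.get`, so that a missing key becomes `None` instead of raising `KeyError`. Whether TextMatcher chunks omit these keys or carry empty values in a given library version is also an assumption.

```python
from collections import namedtuple

# Stand-in for a Spark NLP annotation; only the fields used by get_df are modeled.
Annotation = namedtuple("Annotation", ["begin", "end", "result", "metadata"])


def chunk_rows(annotations, optional_keys=("confidence", "ner_source")):
    """Turn annotations into plain dict rows, using None for missing optional metadata."""
    rows = []
    for a in annotations:
        row = {
            "chunk": a.result,
            "begin": a.begin,
            "end": a.end,
            "sentence_id": a.metadata["sentence"],
            "entities": a.metadata["entity"],
        }
        for key in optional_keys:
            row[key] = a.metadata.get(key)  # None when the key is absent
        rows.append(row)
    return rows


# Illustrative annotations modeled on the first two rows of the notebook output.
sample = [
    Annotation(14, 19, "female", {"entity": "FEMALE_GENDER", "sentence": "0"}),
    Annotation(39, 67, "gestational diabetes mellitus",
               {"entity": "CLINICAL_PROBLEM", "sentence": "0",
                "confidence": "0.93516666", "ner_source": "clinical_ner_chunk"}),
]
rows = chunk_rows(sample)
```

A list of such rows can be passed straight to `pd.DataFrame(rows)`, after which `fillna` behaves as in the cells below.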


## **🖨️ Input/Output Annotation Types**

- Input: `CHUNK`, `CHUNK`

- Output: `CHUNK`

## **🔎 Parameters**


`ChunkMergeModel` has the same parameters as `ChunkMergeApproach` except for `setFalsePositivesResource` and `setReplaceDictResource` parameters.
- `inputCols`: The names of the columns containing the input annotations. It can read either a String column or an Array.
- `outputCol`: The name of the generated output column (`CHUNK` type). Only one column can be specified here.
- `mergeOverlapping`: (Boolean) Whether to merge overlapping matched chunks. Default `True`.
- `blackList`: (String List) If defined, list of entities to ignore. The rest will be processed.
- `whiteList`: (String List) If defined, list of entities to accept.
- `selectionStrategy`: (String) Whether to select annotations sequentially based on annotation order (`Sequential`) or with another available strategy; currently only `Sequential` and `DiverseLonger` are available. Default `DiverseLonger`.
- `orderingFeatures`: (String List) The ordering features to use for overlapping entities. Possible values are `ChunkBegin`, `ChunkLength`, `ChunkPrecedence`, and `ChunkConfidence`.
- `defaultConfidence`: (Float) Used when the `ChunkConfidence` ordering feature is included and an annotation has no confidence score. The value of this parameter is then used as the confidence score of that annotation.
- `chunkPrecedence`: (String List) Sets the precedence order when a chunk is labeled by two models.
- `chunkPrecedenceValuePrioritization`: (String List) Used when the `ChunkPrecedence` ordering feature is included. This parameter contains an Array of comma-separated values representing the desired order of prioritization for the values of the metadata fields named in `chunkPrecedence`.

All the parameters can be set using the corresponding set method in camel case. For example, `.setInputCols()`.
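
To make the interaction between `mergeOverlapping`, the `ChunkConfidence` ordering feature, and `defaultConfidence` more concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the greedy strategy, the dictionary layout, and the character offsets are assumptions chosen for illustration.

```python
def overlaps(a, b):
    """True when two chunks share at least one character (end offsets are inclusive)."""
    return a["begin"] <= b["end"] and b["begin"] <= a["end"]


def merge_by_confidence(chunks, default_confidence=0.8):
    """Keep the highest-confidence chunk from each group of overlapping chunks."""
    def conf(chunk):
        value = chunk.get("confidence")
        # Chunks without a score fall back to the default, as defaultConfidence describes.
        return default_confidence if value is None else value

    kept = []
    # Visit the most confident chunks first; ties keep their input order (stable sort).
    for chunk in sorted(chunks, key=conf, reverse=True):
        if not any(overlaps(chunk, k) for k in kept):
            kept.append(chunk)
    return sorted(kept, key=lambda c: c["begin"])


# Hypothetical chunks: two models label the same span, a matcher chunk has no score.
chunks = [
    {"text": "amoxicillin", "begin": 10, "end": 20, "entity": "TREATMENT", "confidence": 0.95},
    {"text": "amoxicillin", "begin": 10, "end": 20, "entity": "DRUG", "confidence": 0.99},
    {"text": "lady", "begin": 0, "end": 3, "entity": "female_entity", "confidence": None},
]
merged = merge_by_confidence(chunks)
```

In this sketch the `DRUG` chunk wins the overlapping span because of its higher score, and the matcher chunk survives because it overlaps nothing.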

## Saving a `ChunkMergeApproach` model

Here is a pipeline that uses 3 different annotators with `chunk` outputs. We will merge all these `chunk` columns into one `merged_ner_chunk` column.

In [None]:
# Annotator that transforms a text column from dataframe into an Annotation ready for NLP
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Sentence Detector annotator, processes various sentences per line
sentenceDetector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Tokenizer splits words in a relevant format for NLP
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings trained on PubMED dataset
word_embeddings = nlp.WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")


# 1- ner_clinical model
clinical_ner = medical.NerModel.pretrained("ner_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("clinical_ner")

clinical_ner_converter = medical.NerConverterInternal() \
    .setInputCols(["sentence", "token", "clinical_ner"]) \
    .setOutputCol("clinical_ner_chunk")


# 2- posology ner model
posology_ner = medical.NerModel.pretrained("ner_posology", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("posology_ner")

posology_ner_converter = medical.NerConverterInternal() \
    .setInputCols(["sentence", "token", "posology_ner"]) \
    .setOutputCol("posology_ner_chunk")


# 3- generate a text matcher annotator that extracts female related entities
entities = ['she', 'her', 'girl', 'woman', 'women', 'womanish', 'womanlike', 'womanly', 'madam', 'madame', 'senora', 'lady', 'miss', 'girlfriend', 'wife', 'bride', 'misses', 'mrs.', 'female']
with open('female_entities.txt', 'w') as f:
    for i in entities:
        f.write(i+'\n')

# Find female entities using TextMatcher
female_entity_extractor = nlp.TextMatcher() \
    .setInputCols(["sentence",'token'])\
    .setOutputCol("female_entities")\
    .setEntities("female_entities.txt")\
    .setCaseSensitive(False)\
    .setEntityValue('female_entity')


embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_clinical download started this may take some time.
[OK!]
ner_posology download started this may take some time.
[OK!]


Below, the `ChunkMergeApproach` annotator merges the chunks from the `ner_posology` and `ner_clinical` NER models and the `female_entities` TextMatcher output. We will then save the fitted annotator so that it can be loaded with `ChunkMergeModel` later.

In [None]:
from sparknlp.common.read_as import ReadAs

# Dictionary to rename NER labels
replace_dict = """
PROBLEM,CLINICAL_PROBLEM
female_entity,FEMALE_GENDER
"""
with open('replace_dict.csv', 'w') as f:
    f.write(replace_dict)


# Chunk Merge annotator is used to merge columns
chunk_merger = medical.ChunkMergeApproach()\
    .setInputCols("posology_ner_chunk", 'clinical_ner_chunk', "female_entities")\
    .setOutputCol('merged_ner_chunk')\
    .setMergeOverlapping(True) \
    .setOrderingFeatures(["ChunkConfidence"])\
    .setSelectionStrategy("Sequential")\
    .setDefaultConfidence(0.8)\
    .setReplaceDictResource('replace_dict.csv', read_as=ReadAs.TEXT, options={'delimiter': ','})\
    .setBlackList(["DURATION"])

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    clinical_ner_converter,
    posology_ner,
    posology_ner_converter,
    female_entity_extractor,
    chunk_merger])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)
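
The replace dictionary written in the cell above is a plain two-column, comma-delimited file that maps an original label to its new name. As a rough illustration of that mapping (not of how the library reads the resource), the sketch below parses the same text with the standard `csv` module and applies it to a few labels; `parse_replace_dict` is a hypothetical helper name.

```python
import csv
import io

# Same content as the replace_dict.csv file written in the notebook.
replace_dict = """
PROBLEM,CLINICAL_PROBLEM
female_entity,FEMALE_GENDER
"""


def parse_replace_dict(text, delimiter=","):
    """Read 'old,new' rows into a dict, skipping blank or malformed lines."""
    mapping = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) == 2:
            old, new = (cell.strip() for cell in row)
            mapping[old] = new
    return mapping


label_map = parse_replace_dict(replace_dict)

# Labels without an entry keep their original name.
relabeled = [label_map.get(label, label) for label in ["PROBLEM", "TEST", "female_entity"]]
```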

In [None]:
sample_text = """A 28 year old female with a history of gestational diabetes mellitus diagnosed eight years prior to
presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis
three years prior to presentation , associated with an acute hepatitis , and obesity with a body mass index
( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting.
Two weeks prior to presentation , The lady was treated with a five-day course of amoxicillin for a respiratory tract infection .
She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG .
The woman had been on dapagliflozin for six months at the time of presentation ."""

light_model = nlp.LightPipeline(model)

light_result = light_model.fullAnnotate(sample_text)

In [None]:
get_df(light_result, 'merged_ner_chunk').fillna("text_matcher")

| | merged_ner_chunk | begin | end | sentence_id | entities | ner_source | confidence |
|---|---|---|---|---|---|---|---|
| 0 | female | 14 | 19 | 0 | FEMALE_GENDER | text_matcher | text_matcher |
| 1 | gestational diabetes mellitus | 39 | 67 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.93516666 |
| 2 | subsequent type two diabetes mellitus | 117 | 153 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.76208 |
| 3 | T2DM | 157 | 160 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9934 |
| 4 | HTG-induced pancreatitis | 186 | 209 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.98039997 |
| 5 | an acute hepatitis | 263 | 280 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.95486665 |
| 6 | obesity | 288 | 294 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.997 |
| 7 | a body mass index | 301 | 317 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.83547497 |
| 8 | BMI | 321 | 323 | 0 | TEST | clinical_ner_chunk | 0.6753 |
| 9 | polyuria | 380 | 387 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9901 |


Then save the fitted `ChunkMergeApproach` annotator, which is the last stage of the pipeline model, to disk.

In [None]:
model.stages[-1].write().overwrite().save('models/custom_merge_model')

Now reuse the saved model in another pipeline by loading it with `ChunkMergeModel.load`.

In [None]:
# ChunkMergeModel annotator loading a saved model.
# This cell runs without setting the columns again; they can still be set explicitly:
#     .setInputCols("posology_ner_chunk", 'clinical_ner_chunk', "female_entities")
#     .setOutputCol('merged_ner_chunk')
chunk_merger_model = medical.ChunkMergeModel.load('models/custom_merge_model')

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    clinical_ner_converter,
    posology_ner,
    posology_ner_converter,
    female_entity_extractor,
    chunk_merger_model])

empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
light_model = nlp.LightPipeline(model)
light_result = light_model.fullAnnotate(sample_text)

get_df(light_result, 'merged_ner_chunk').fillna("text_matcher")

| | merged_ner_chunk | begin | end | sentence_id | entities | ner_source | confidence |
|---|---|---|---|---|---|---|---|
| 0 | female | 14 | 19 | 0 | FEMALE_GENDER | text_matcher | text_matcher |
| 1 | gestational diabetes mellitus | 39 | 67 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.93516666 |
| 2 | subsequent type two diabetes mellitus | 117 | 153 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.76208 |
| 3 | T2DM | 157 | 160 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9934 |
| 4 | HTG-induced pancreatitis | 186 | 209 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.98039997 |
| 5 | an acute hepatitis | 263 | 280 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.95486665 |
| 6 | obesity | 288 | 294 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.997 |
| 7 | a body mass index | 301 | 317 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.83547497 |
| 8 | BMI | 321 | 323 | 0 | TEST | clinical_ner_chunk | 0.6753 |
| 9 | polyuria | 380 | 387 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9901 |


The results are the same as those of the previous pipeline.

`ChunkMergeModel` accepts the parameters it shares with `ChunkMergeApproach` in the same way.

## Overriding saved model's parameters

The parameters stored with a saved model can be changed after the model is loaded.

Let's set the `mergeOverlapping` parameter of the previously saved model to `False` to show all NER labels, including overlapping ones.

In [None]:
# ChunkMergeModel annotator loading  a saved model.
chunk_merger_model = medical.ChunkMergeModel.load('models/custom_merge_model')\
    .setMergeOverlapping(False)
    # .setInputCols("posology_ner_chunk", 'clinical_ner_chunk', "female_entities")\
    # .setOutputCol('merged_ner_chunk')

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    clinical_ner_converter,
    posology_ner,
    posology_ner_converter,
    female_entity_extractor,
    chunk_merger_model])

empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
light_model = nlp.LightPipeline(model)
light_result = light_model.fullAnnotate(sample_text)

get_df(light_result, 'merged_ner_chunk').fillna("text_matcher")

| | merged_ner_chunk | begin | end | sentence_id | entities | ner_source | confidence |
|---|---|---|---|---|---|---|---|
| 0 | female | 14 | 19 | 0 | FEMALE_GENDER | text_matcher | text_matcher |
| 1 | gestational diabetes mellitus | 39 | 67 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.93516666 |
| 2 | subsequent type two diabetes mellitus | 117 | 153 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.76208 |
| 3 | T2DM | 157 | 160 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9934 |
| 4 | HTG-induced pancreatitis | 186 | 209 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.98039997 |
| 5 | an acute hepatitis | 263 | 280 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.95486665 |
| 6 | obesity | 288 | 294 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.997 |
| 7 | a body mass index | 301 | 317 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.83547497 |
| 8 | BMI | 321 | 323 | 0 | TEST | clinical_ner_chunk | 0.6753 |
| 9 | polyuria | 380 | 387 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9901 |


## Using other parameters

Parameters shared with `ChunkMergeApproach` can be used in `ChunkMergeModel` in the same way. Now we will set some of them directly on `ChunkMergeModel`.


`setWhiteList` keeps only the listed NER labels. Here we have to pass the original labels produced by the NER models (for example `PROBLEM`), not the names they are renamed to in the `ChunkMergeModel` output (for example `CLINICAL_PROBLEM`).

In [None]:
# Chunk Merge with White List to include NER labels
chunk_merger_model = medical.ChunkMergeModel.load('models/custom_merge_model')\
    .setMergeOverlapping(True)\
    .setWhiteList(['DRUG', "PROBLEM"])


nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    clinical_ner_converter,
    posology_ner,
    posology_ner_converter,
    female_entity_extractor,
    chunk_merger_model])

empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
light_model = nlp.LightPipeline(model)
light_result = light_model.fullAnnotate(sample_text)

get_df(light_result, 'merged_ner_chunk')

| | merged_ner_chunk | begin | end | sentence_id | entities | ner_source | confidence |
|---|---|---|---|---|---|---|---|
| 0 | gestational diabetes mellitus | 39 | 67 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.93516666 |
| 1 | subsequent type two diabetes mellitus | 117 | 153 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.76208 |
| 2 | T2DM | 157 | 160 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9934 |
| 3 | HTG-induced pancreatitis | 186 | 209 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.98039997 |
| 4 | an acute hepatitis | 263 | 280 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.95486665 |
| 5 | obesity | 288 | 294 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.997 |
| 6 | a body mass index | 301 | 317 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.83547497 |
| 7 | polyuria | 380 | 387 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9901 |
| 8 | polydipsia | 391 | 400 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9929 |
| 9 | poor appetite | 404 | 416 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9979 |
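
The output above shows `CLINICAL_PROBLEM` even though the white list names `PROBLEM`, which is consistent with filtering on the original label and renaming afterwards. The pure-Python sketch below illustrates that order of operations under stated assumptions: `filter_then_rename` is a hypothetical helper, the `(text, label)` pairs are illustrative, and the library's internal order is inferred from the output rather than confirmed.

```python
def filter_then_rename(chunks, white_list=None, black_list=None, label_map=None):
    """Filter (text, label) pairs on their original labels, then apply the rename map."""
    white = set(white_list or [])
    black = set(black_list or [])
    label_map = label_map or {}
    result = []
    for text, label in chunks:
        if white and label not in white:
            continue  # a non-empty white list keeps only the listed labels
        if label in black:
            continue  # black-listed labels are always dropped
        result.append((text, label_map.get(label, label)))
    return result


# Illustrative chunks carrying their original (pre-rename) labels.
chunks = [
    ("female", "female_entity"),
    ("obesity", "PROBLEM"),
    ("BMI", "TEST"),
    ("metformin", "DRUG"),
]
label_map = {"PROBLEM": "CLINICAL_PROBLEM", "female_entity": "FEMALE_GENDER"}

kept = filter_then_rename(chunks, white_list=["DRUG", "PROBLEM"], label_map=label_map)
# Passing the renamed label matches nothing, because filtering sees original labels.
missed = filter_then_rename(chunks, white_list=["CLINICAL_PROBLEM"], label_map=label_map)
```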


Now let's prioritize overlapping chunks according to the source NER model and the label it assigned.

In [None]:
chunk_merger_model = medical.ChunkMergeModel.load('models/custom_merge_model')\
    .setMergeOverlapping(True) \
    .setOrderingFeatures(["ChunkPrecedence"]) \
    .setChunkPrecedence('ner_source,entity')\
    .setChunkPrecedenceValuePrioritization(["clinical_ner_chunk,TREATMENT", "posology_ner_chunk,DRUG"])


nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    clinical_ner_converter,
    posology_ner,
    posology_ner_converter,
    female_entity_extractor,
    chunk_merger_model])

empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
light_model = nlp.LightPipeline(model)
light_result = light_model.fullAnnotate(sample_text)

get_df(light_result, 'merged_ner_chunk').fillna("text_matcher")

| | merged_ner_chunk | begin | end | sentence_id | entities | ner_source | confidence |
|---|---|---|---|---|---|---|---|
| 0 | female | 14 | 19 | 0 | FEMALE_GENDER | text_matcher | text_matcher |
| 1 | gestational diabetes mellitus | 39 | 67 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.93516666 |
| 2 | subsequent type two diabetes mellitus | 117 | 153 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.76208 |
| 3 | T2DM | 157 | 160 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9934 |
| 4 | HTG-induced pancreatitis | 186 | 209 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.98039997 |
| 5 | an acute hepatitis | 263 | 280 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.95486665 |
| 6 | obesity | 288 | 294 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.997 |
| 7 | a body mass index | 301 | 317 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.83547497 |
| 8 | BMI | 321 | 323 | 0 | TEST | clinical_ner_chunk | 0.6753 |
| 9 | polyuria | 380 | 387 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9901 |
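
The precedence settings used in this cell pair a list of metadata fields (`ner_source,entity`) with an ordered list of value combinations. One way to picture this is as a rank lookup, sketched below in plain Python. This is an illustration only: `precedence_rank` is a hypothetical helper, the candidate chunks are made up, and ranking unlisted combinations last is an assumption rather than documented library behavior.

```python
def precedence_rank(chunk, fields, prioritization):
    """Return the position of the chunk's field values in the prioritization list."""
    key = ",".join(chunk[field] for field in fields)
    try:
        return prioritization.index(key)
    except ValueError:
        # Assumption: combinations that are not listed rank after all listed ones.
        return len(prioritization)


fields = "ner_source,entity".split(",")
prioritization = ["clinical_ner_chunk,TREATMENT", "posology_ner_chunk,DRUG"]

# Two hypothetical chunks that cover the same span of text.
candidates = [
    {"text": "amoxicillin", "ner_source": "posology_ner_chunk", "entity": "DRUG"},
    {"text": "amoxicillin", "ner_source": "clinical_ner_chunk", "entity": "TREATMENT"},
]

# The lowest rank wins the overlapping span.
winner = min(candidates, key=lambda c: precedence_rank(c, fields, prioritization))
```

With this prioritization, the `TREATMENT` chunk from `clinical_ner_chunk` is preferred over the `DRUG` chunk from `posology_ner_chunk` for the same span.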


Let's restrict the `ChunkMergeModel` output to a single NER label with `setWhiteList`. As before, we have to pass the original label produced by the NER model (`PROBLEM`), not the name it is renamed to in the `ChunkMergeModel` output (`CLINICAL_PROBLEM`).

In [None]:
chunk_merger_model = medical.ChunkMergeModel.load('models/custom_merge_model')\
    .setMergeOverlapping(True)\
    .setWhiteList(['PROBLEM'])


nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    clinical_ner_converter,
    posology_ner,
    posology_ner_converter,
    female_entity_extractor,
    chunk_merger_model])

empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
light_model = nlp.LightPipeline(model)
light_result = light_model.fullAnnotate(sample_text)

get_df(light_result, 'merged_ner_chunk').fillna("text_matcher")

| | merged_ner_chunk | begin | end | sentence_id | entities | ner_source | confidence |
|---|---|---|---|---|---|---|---|
| 0 | gestational diabetes mellitus | 39 | 67 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.93516666 |
| 1 | subsequent type two diabetes mellitus | 117 | 153 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.76208 |
| 2 | T2DM | 157 | 160 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9934 |
| 3 | HTG-induced pancreatitis | 186 | 209 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.98039997 |
| 4 | an acute hepatitis | 263 | 280 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.95486665 |
| 5 | obesity | 288 | 294 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.997 |
| 6 | a body mass index | 301 | 317 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.83547497 |
| 7 | polyuria | 380 | 387 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9901 |
| 8 | polydipsia | 391 | 400 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9929 |
| 9 | poor appetite | 404 | 416 | 0 | CLINICAL_PROBLEM | clinical_ner_chunk | 0.9979 |
