![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/35.Voice_of_Patient_Models.ipynb)

# **Voice of Patient Models**

This notebook describes the pretrained Voice of Patient (VOP) models, which extract healthcare-related terms from text written in patients' own words, and gives an example for each type of model.

## Healthcare NLP for Data Scientists Course

If you are not familiar with the components used in this notebook, see the [Healthcare NLP for Data Scientists Udemy Course](https://www.udemy.com/course/healthcare-nlp-for-data-scientists/) and the [MOOC Notebooks](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP), which cover each component.

## Setup

In [None]:
import json
import os

from google.colab import files

if 'spark_jsl.json' not in os.listdir():
  license_keys = files.upload()
  os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)
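
Before the install cells run, it can help to confirm that the license file contains the keys this notebook uses later (`SECRET`, `PUBLIC_VERSION`, `JSL_VERSION`). The helper below is a minimal sketch: `missing_license_keys` is a hypothetical name, not part of any John Snow Labs library, and a real license file may hold additional keys.

```python
# Hypothetical helper (not part of Spark NLP): report which of the keys this
# notebook relies on are absent or empty in the loaded license dictionary.
REQUIRED_KEYS = ("SECRET", "PUBLIC_VERSION", "JSL_VERSION")

def missing_license_keys(license_keys, required=REQUIRED_KEYS):
    """Return the required keys that are missing or empty, in order."""
    return [key for key in required if not license_keys.get(key)]

# Illustrative values only; a real license file holds real credentials.
example = {"SECRET": "xxxx", "PUBLIC_VERSION": "6.1.3"}
missing = missing_license_keys(example)
print(missing)
```

Calling `missing_license_keys(license_keys)` right after `json.load` would surface an incomplete file before `pip install` fails with a less obvious error.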

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.5.1 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [3]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp_jsl.pretrained import InternalResourceDownloader

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline, PipelineModel

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

spark

Spark NLP Version : 6.1.3
Spark NLP_JSL Version : 6.1.1


## **List of Pretrained Models**

In [4]:
df = pd.DataFrame()
for model_type in ['MedicalNerModel', 'AssertionDLModel', 'MedicalBertForSequenceClassification','GenericClassifierModel']:
    model_list = sorted(list(set([model[0] for model in InternalResourceDownloader.returnPrivateModels(model_type) if 'vop' in model[0]])))
    if len(model_list) > 0:
      if model_type == "MedicalNerModel":
        model_list = list(filter(lambda x: "wip" not in x, model_list))
      df = pd.concat([df, pd.DataFrame(model_list, columns = [model_type])], axis = 1)

df.fillna('')

Unnamed: 0,MedicalNerModel,AssertionDLModel,MedicalBertForSequenceClassification
0,ner_vop,assertion_vop_clinical,bert_sequence_classifier_vop_adverse_event
1,ner_vop_anatomy,assertion_vop_clinical_large,bert_sequence_classifier_vop_adverse_event_onnx
2,ner_vop_anatomy_emb_clinical_large,assertion_vop_clinical_medium,bert_sequence_classifier_vop_drug_side_effect
3,ner_vop_anatomy_emb_clinical_medium,,bert_sequence_classifier_vop_drug_side_effect_onnx
4,ner_vop_anatomy_langtest,,bert_sequence_classifier_vop_hcp_consult
5,ner_vop_clinical_dept,,bert_sequence_classifier_vop_hcp_consult_onnx
6,ner_vop_clinical_dept_emb_clinical_large,,bert_sequence_classifier_vop_self_report
7,ner_vop_clinical_dept_emb_clinical_medium,,bert_sequence_classifier_vop_self_report_onnx
8,ner_vop_clinical_dept_langtest,,bert_sequence_classifier_vop_side_effect
9,ner_vop_demographic,,bert_sequence_classifier_vop_side_effect_onnx
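
The selection logic in the cell above can be tried without a license by running it on a plain list of names. The sketch below keeps names containing `vop`, optionally drops work-in-progress (`wip`) variants, and pads the columns to equal length. `select_vop_models` is a hypothetical helper, and the names `ner_vop_wip` and `ner_jsl` are included only to exercise the filters.

```python
from itertools import zip_longest

def select_vop_models(names, drop_wip=False):
    """Keep unique names containing 'vop'; optionally drop 'wip' variants."""
    selected = sorted({name for name in names if "vop" in name})
    if drop_wip:
        selected = [name for name in selected if "wip" not in name]
    return selected

# Sample input: some names come from the table above, while 'ner_vop_wip'
# and 'ner_jsl' are illustrative and only exercise the two filters.
ner_names = ["ner_vop", "ner_vop_anatomy", "ner_vop_wip", "ner_jsl", "ner_vop"]
assertion_names = ["assertion_vop_clinical"]

ner_column = select_vop_models(ner_names, drop_wip=True)
assertion_column = select_vop_models(assertion_names)

# Pad the shorter column with empty strings, similar to df.fillna('').
rows = list(zip_longest(ner_column, assertion_column, fillvalue=""))
print(rows)
```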


## NER Models

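The VOP NER models cover different entity groups at different levels of granularity. The examples below compare `ner_vop`, which extracts 31 entity types, with the narrower `ner_vop_treatment` and `ner_vop_test` models.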

In [5]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")\
    .setSplitChars(["-", "\\/"])

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")\
    .setInputCols(["sentence","token"])\
    .setOutputCol("embeddings")


## ner_vop_treatment
ner_vop_treatment = MedicalNerModel.pretrained("ner_vop_treatment", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner_vop_treatment")

ner_converter_vop_treatment = NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner_vop_treatment"]) \
    .setOutputCol("ner_chunk_vop_treatment")

## ner_vop
ner_vop = MedicalNerModel.pretrained("ner_vop", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner_vop")

ner_converter_vop = NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner_vop"]) \
    .setOutputCol("ner_chunk_vop")

## ner_vop_test
ner_vop_test = MedicalNerModel.pretrained("ner_vop_test", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner_vop_test")

ner_converter_vop_test = NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner_vop_test"]) \
    .setOutputCol("ner_chunk_vop_test")

ner_pipeline = Pipeline(
    stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        ner_vop_treatment,
        ner_converter_vop_treatment,
        ner_vop,
        ner_converter_vop,
        ner_vop_test,
        ner_converter_vop_test
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

ner_model = ner_pipeline.fit(empty_data)

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_vop_treatment download started this may take some time.
Approximate size to download 3.6 MB
[OK!]
ner_vop download started this may take some time.
Approximate size to download 3.7 MB
[OK!]
ner_vop_test download started this may take some time.
Approximate size to download 3.6 MB
[OK!]


In [6]:
ner_vop_labels = sorted(list(set([label.split('-')[-1] for label in ner_vop.getClasses() if label != 'O'])))

len(ner_vop_labels)

31

In [7]:
label_df = pd.DataFrame()
for column in range((len(ner_vop_labels)//10)+1):
  label_df = pd.concat([label_df, pd.DataFrame(ner_vop_labels, columns = [''])[column*10:(column+1)*10].reset_index(drop= True)], axis = 1)

label_df.fillna('')

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,AdmissionDischarge,Employment,PsychologicalCondition,VitalTest
1,Age,Form,RelationshipStatus,
2,Allergen,Frequency,Route,
3,BodyPart,Gender,Substance,
4,ClinicalDept,HealthStatus,SubstanceQuantity,
5,DateTime,InjuryOrPoisoning,Symptom,
6,Disease,Laterality,Test,
7,Dosage,MedicalDevice,TestResult,
8,Drug,Modifier,Treatment,
9,Duration,Procedure,Vaccine,


In [8]:
ner_vop_treatment_labels = sorted(list(set([label.split('-')[-1] for label in ner_vop_treatment.getClasses() if label != 'O'])))

print(ner_vop_treatment_labels)

['Dosage', 'Drug', 'Duration', 'Form', 'Frequency', 'Procedure', 'Route', 'Treatment']


In [9]:
ner_vop_test_labels = sorted(list(set([label.split('-')[-1] for label in ner_vop_test.getClasses() if label != 'O'])))

print(ner_vop_test_labels)

['Measurements', 'Test', 'TestResult', 'VitalTest']
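
The three label cells above share one idiom: they drop the outside tag `O` and use `label.split('-')[-1]` to strip the tag prefix (such as `B-` or `I-`) from each class. The sketch below isolates that step on a hand-written tag list; the input is an assumption built from the entity names printed above, not actual `getClasses()` output.

```python
def entity_types(classes):
    """Collapse IOB-style tags (B-X, I-X) into sorted unique entity names."""
    return sorted({tag.split("-")[-1] for tag in classes if tag != "O"})

# Hand-written tags for illustration; a real model returns its own list.
sample_classes = ["O", "B-Test", "I-Test", "B-TestResult", "I-VitalTest", "B-Measurements"]
labels = entity_types(sample_classes)
print(labels)
```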


In [10]:
sample_text_1 = '''Hello, I am a 20-year-old woman who was diagnosed with hyperthyroidism around a month ago. For approximately four months, I've been experiencing symptoms such as feeling light-headed, battling poor digestion, dealing with anxiety attacks, depression, a sharp pain on my left side chest, an elevated heart rate, and a significant loss of weight. Due to these conditions, I was admitted to the hospital and just got discharged recently. During my hospital stay, a number of different tests were carried out by various physicians who initially struggled to pinpoint my actual medical condition. These tests included numerous blood tests, a brain MRI, an ultrasound scan, and an endoscopy. At long last, I was examined by a homeopathic doctor who finally diagnosed me with hyperthyroidism, indicating my TSH level was at a low 0.15 while my T3 and T4 levels were normal. Additionally, I was found to be deficient in vitamins B12 and D. Hence, I've been on a regimen of vitamin D supplements once a week and a daily dose of 1000 mcg of vitamin B12. I've been undergoing homeopathic treatment for the last 40 days and underwent a second test after a month which showed my TSH level increased to 0.5. While I'm noticing a slight improvement in my feelings of weakness and depression, over the last week, I've encountered two new challenges: difficulty breathing and a dramatically increased heart rate. I'm now at a crossroads where I am unsure if I should switch to allopathic treatment or continue with homeopathy. I understand that thyroid conditions take a while to improve, but I'm wondering if both treatments would require the same duration for recovery. Several of my acquaintances have recommended transitioning to allopathy and warn against taking risks, given the potential of developing severe complications. Please forgive any errors in my English and thank you for your understanding.'''

sample_text_2 = '''Following a visit to the nephrology department for a routine kidney function check-up, I underwent a urine test. The results revealed that I was suffering from chronic kidney disease, prompting the initiation of necessary medication for its control.'''

sample_text_3 = '''My grandmother was identified with high cholesterol and had to alter her daily habits. She also has to consume statins and eat a low-sodium diet to maintain her cholesterol levels. It's required a significant adaptation, but she's managing quite well.'''

In [11]:
data = spark.createDataFrame(pd.DataFrame([sample_text_1, sample_text_2, sample_text_3], columns = ['text']))

In [12]:
results = ner_model.transform(data).collect()

In [13]:
results[0]['ner_chunk_vop']

[Row(annotatorType='chunk', begin=14, end=24, result='20-year-old', metadata={'sentence': '0', 'chunk': '0', 'ner_source': 'ner_chunk_vop', 'entity': 'Age', 'confidence': '0.85653335'}, embeddings=[]),
 Row(annotatorType='chunk', begin=26, end=30, result='woman', metadata={'sentence': '0', 'chunk': '1', 'ner_source': 'ner_chunk_vop', 'entity': 'Gender', 'confidence': '0.9999'}, embeddings=[]),
 Row(annotatorType='chunk', begin=55, end=69, result='hyperthyroidism', metadata={'sentence': '0', 'chunk': '2', 'ner_source': 'ner_chunk_vop', 'entity': 'Disease', 'confidence': '0.9747'}, embeddings=[]),
 Row(annotatorType='chunk', begin=78, end=88, result='a month ago', metadata={'sentence': '0', 'chunk': '3', 'ner_source': 'ner_chunk_vop', 'entity': 'DateTime', 'confidence': '0.6047333'}, embeddings=[]),
 Row(annotatorType='chunk', begin=109, end=119, result='four months', metadata={'sentence': '1', 'chunk': '4', 'ner_source': 'ner_chunk_vop', 'entity': 'Duration', 'confidence': '0.6612'}, em
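
Each element of `ner_chunk_vop` carries `begin`, `end`, `result`, and a `metadata` dictionary, so the list can be flattened and filtered by confidence. The sketch below mimics the first four rows printed above with a `namedtuple` so that it runs without Spark. `Chunk`, `confident_chunks`, and the 0.8 threshold are illustrative assumptions.

```python
from collections import namedtuple

# Stand-in for the Spark Row objects shown above (only the fields used here).
Chunk = namedtuple("Chunk", ["begin", "end", "result", "metadata"])

chunks = [
    Chunk(14, 24, "20-year-old", {"entity": "Age", "confidence": "0.85653335"}),
    Chunk(26, 30, "woman", {"entity": "Gender", "confidence": "0.9999"}),
    Chunk(55, 69, "hyperthyroidism", {"entity": "Disease", "confidence": "0.9747"}),
    Chunk(78, 88, "a month ago", {"entity": "DateTime", "confidence": "0.6047333"}),
]

def confident_chunks(rows, threshold=0.8):
    """Return (text, entity, confidence) for chunks at or above the threshold.

    Confidence is stored as a string in the metadata, so it is cast to float.
    """
    return [
        (row.result, row.metadata["entity"], float(row.metadata["confidence"]))
        for row in rows
        if float(row.metadata["confidence"]) >= threshold
    ]

kept = confident_chunks(chunks)
for text, entity, confidence in kept:
    print(f"{entity:<10}{text:<18}{confidence:.2f}")
```

Because the rows expose the same attribute names, `confident_chunks(results[0]['ner_chunk_vop'])` should work on the collected output as well.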

In [14]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()

In [15]:
from google.colab import widgets

t = widgets.TabBar(["ner_vop_treatment", "ner_vop_test", "ner_vop"])

with t.output_to(0):
    visualiser.display(results[2], label_col='ner_chunk_vop_treatment')

with t.output_to(1):
    visualiser.display(results[1], label_col='ner_chunk_vop_test')

with t.output_to(2):
    visualiser.display(results[0], label_col='ner_chunk_vop')


## Assertion Model

<div align="center">

|    | Model name | Predicted entities |
|---:|:-----------|:-------------------|
| 1 | [assertion_vop_clinical](https://nlp.johnsnowlabs.com/2023/08/17/assertion_vop_clinical_en.html) | Hypothetical_Or_Absent, Present_Or_Past, SomeoneElse |
| 2 | [assertion_vop_clinical_medium](https://nlp.johnsnowlabs.com/2023/08/17/assertion_vop_clinical_medium_en.html) | Hypothetical_Or_Absent, Present_Or_Past, SomeoneElse |
| 3 | [assertion_vop_clinical_large](https://nlp.johnsnowlabs.com/2023/08/17/assertion_vop_clinical_large_en.html) | Hypothetical_Or_Absent, Present_Or_Past, SomeoneElse |


</div>

The [assertion status model](https://nlp.johnsnowlabs.com/2023/08/17/assertion_vop_clinical_en.html) predicts whether an NER chunk refers to a finding that the patient has or had (`Present_Or_Past`), to a family member or another person (`SomeoneElse`), or to something that is mentioned but not actually present (`Hypothetical_Or_Absent`).

In [16]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner = MedicalNerModel.pretrained("ner_vop", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")\
    .setBlackList(['DATETIME',  'GENDER', 'AGE', 'SUBSTANCEQUANTITY','FORM', 'ADMISSIONDISCHARGE', 'TESTRESULT', 'TEST',
                  'MEDICALDEVICE','CLINICALDEPT','DRUG', 'ROUTE', 'DURATION',"DOSAGE",'FREQUENCY', 'BODYPART',
                   ])

assertion = AssertionDLModel.pretrained("assertion_vop_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

pipeline = Pipeline(
    stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        word_embeddings,
        ner,
        ner_converter,
        assertion
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

asr_pipe = pipeline.fit(empty_data)

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_vop download started this may take some time.
Approximate size to download 3.7 MB
[OK!]
assertion_vop_clinical download started this may take some time.
Approximate size to download 919.9 KB
[OK!]


In [17]:
assertion.getClasses()


['Hypothetical_Or_Absent', 'Present_Or_Past', 'SomeoneElse']

In [18]:
sample_text = '''Hello, I am a 20-year-old woman who was diagnosed with hyperthyroidism around a month ago. For approximately four months, I've been experiencing symptoms such as feeling light-headed, battling poor digestion, dealing with anxiety attacks, depression, a sharp pain on my left side chest, an elevated heart rate, and a significant loss of weight. Due to these conditions, I was admitted to the hospital and just got discharged recently. During my hospital stay, a number of different tests were carried out by various physicians who initially struggled to pinpoint my actual medical condition. These tests included numerous blood tests, a brain MRI, an ultrasound scan, and an endoscopy. At long last, I was examined by a homeopathic doctor who finally diagnosed me with hyperthyroidism, indicating my TSH level was at a low 0.15 while my T3 and T4 levels were normal. Additionally, I was found to be deficient in vitamins B12 and D. Hence, I've been on a regimen of vitamin D supplements once a week and a daily dose of 1000 mcg of vitamin B12. I've been undergoing homeopathic treatment for the last 40 days and underwent a second test after a month which showed my TSH level increased to 0.5. While I'm noticing a slight improvement in my feelings of weakness and depression, over the last week, I've encountered two new challenges: difficulty breathing and a dramatically increased heart rate. I'm now at a crossroads where I am unsure if I should switch to allopathic treatment or continue with homeopathy. I understand that thyroid conditions take a while to improve, but I'm wondering if both treatments would require the same duration for recovery. Several of my acquaintances have recommended transitioning to allopathy and warn against taking risks, given the potential of developing severe complications. Please forgive any errors in my English and thank you for your understanding.'''

lp = LightPipeline(asr_pipe)

lr = lp.fullAnnotate([sample_text])[0]

In [19]:
from sparknlp_display import AssertionVisualizer

vis = AssertionVisualizer()

vis.display(lr, 'ner_chunk', 'assertion')
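
To inspect the same result as data rather than as a rendered visualization, the chunk and assertion annotations can be paired up. This is a sketch under the assumption that the two lists are aligned one-to-one and that each annotation exposes `result`, `begin`, `end`, and `metadata`. `pair_assertions` is a hypothetical helper, and `SimpleNamespace` objects with made-up values stand in for real annotations so the snippet runs without Spark.

```python
from types import SimpleNamespace

def pair_assertions(light_result, chunk_col="ner_chunk", assertion_col="assertion"):
    """Zip chunk and assertion annotations into plain dictionaries."""
    return [
        {
            "chunk": chunk.result,
            "begin": chunk.begin,
            "end": chunk.end,
            "entity": chunk.metadata["entity"],
            "assertion": status.result,
        }
        for chunk, status in zip(light_result[chunk_col], light_result[assertion_col])
    ]

# Made-up stand-ins for annotations; real values come from lp.fullAnnotate(...).
fake_result = {
    "ner_chunk": [
        SimpleNamespace(result="hyperthyroidism", begin=55, end=69, metadata={"entity": "Disease"}),
        SimpleNamespace(result="severe complications", begin=0, end=19, metadata={"entity": "Disease"}),
    ],
    "assertion": [
        SimpleNamespace(result="Present_Or_Past"),
        SimpleNamespace(result="Hypothetical_Or_Absent"),
    ],
}

table = pair_assertions(fake_result)
present = [row["chunk"] for row in table if row["assertion"] == "Present_Or_Past"]
print(present)
```

If the assumptions hold, `pair_assertions(lr)` can be applied to the `lr` object from the cell above, and `pd.DataFrame(table)` gives a tabular view.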

## Classification Model

In [20]:
document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_vop_side_effect", "en", "clinical/models")\
    .setInputCols(["document",'token'])\
    .setOutputCol("prediction")

pipeline = Pipeline(
    stages=[
        document_assembler,
        tokenizer,
        sequenceClassifier
])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

bert_sequence_classifier_vop_side_effect download started this may take some time.
Approximate size to download 387.6 MB
[OK!]


In [21]:
sample_text = '''Hello, folks! Recently, my physician prescribed a medication named "SereniCalm" for my stress issues, but instead of soothing my nerves, it transformed me into a sluggish, apathetic shadow. I found myself roaming about as if I was running on severe sleep deprivation, devoid of any emotions or vitality. It was as though my mind was stuck in a perpetual state of standby. Certainly not the kind of stress relief I was expecting, right?'''

In [22]:
classification_data = spark.createDataFrame(pd.DataFrame([sample_text], columns = ['text']))

In [23]:
classification_results = model.transform(classification_data)

In [24]:
classification_results.select("text", "prediction.result").show(truncate=False)

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|text                                                                                                                                                                                                                                                                                                                                                                                                                                               |result |
+-----------------------------------------------------------------------------------------------------------

## Pretrained NER Profiling Pipelines

We can use pretrained NER profiling pipelines for exploring all the available pretrained NER models at once.

- `ner_profiling_vop`: Returns results for the VOP NER models.

For more examples, please check [this notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.2.Pretrained_NER_Profiling_Pipelines.ipynb).





<center><b>NER Profiling VOP Model List</b>

|  |  |  |
|--------------|-----------------|-----------------|
| ner_vop_clinical_dept | ner_vop_temporal | ner_vop_test |
| ner_vop | ner_vop_problem | ner_vop_problem_reduced |
| ner_vop_demographic | ner_vop_anatomy | ner_vop_treatment |




</center>

In [25]:
from sparknlp.pretrained import PretrainedPipeline

vop_profiling_pipeline = PretrainedPipeline("ner_profiling_vop", "en", "clinical/models")

ner_profiling_vop download started this may take some time.
Approx size to download 1.8 GB
[OK!]


In [26]:
text = """Hello, I am a 20-year-old woman who was diagnosed with hyperthyroidism around a month ago.For approximately four months, I've been experiencing symptoms such as feeling light-headed, battling poor digestion, dealing with anxiety attacks, depression, a sharp pain on my left side chest, an elevated heart rate, and a significant loss of weight. Due to these conditions, I was admitted to the hospital and just got discharged recently."""

In [27]:
vop_result = vop_profiling_pipeline.fullAnnotate(text)[0]
vop_result.keys()

dict_keys(['ner_chunk_vop_problem_reduced', 'vop_clinical_dept_langtest_ner', 'vop_ner', 'ner_chunk_jsl_greedy', 'vop_problem_ner', 'vop_problem_reduced_langtest_ner', 'ner_chunk_jsl_enriched', 'vop_langtest_ner', 'vop_treatment_langtest_ner', 'document', 'ner_chunk_jsl_slim', 'jsl_langtest_ner', 'vop_temporal_langtest_ner', 'ner_chunk_vop_problem_reduced_langtest', 'vop_test_ner', 'jsl_greedy_ner', 'vop_demographic_ner', 'vop_anatomy_langtest_ner', 'ner_chunk_vop_clinical_dept_ner', 'ner_chunk_vop_problem', 'jsl_enriched_ner', 'ner_chunk_vop', 'ner_chunk_vop_clinical_dept_langtest_ner', 'ner_chunk_jsl', 'ner_chunk_vop_anatomy', 'ner_chunk_vop_langtest', 'ner_chunk_vop_treatment_langtest', 'vop_problem_langtest_ner', 'jsl_slim_ner', 'vop_anatomy_ner', 'jsl_ner', 'ner_chunk_vop_test_langtest_ner', 'vop_temporal_ner', 'ner_chunk_vop_test_ner', 'ner_chunk_vop_treatment', 'token', 'vop_test_langtest_ner', 'vop_problem_reduced_ner', 'vop_treatment_ner', 'ner_chunk_vop_temporal', 'ner_chunk_
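
The keys appear to come in two families: token-level tag columns that end in `_ner` (for example `vop_ner`) and merged chunk columns that start with `ner_chunk_`. The sketch below sorts a handful of the keys printed above into those families and keeps only the VOP ones. The naming rule is inferred from this output and may not hold for other pipelines; `split_profiling_keys` is a hypothetical helper.

```python
def split_profiling_keys(keys):
    """Separate VOP token-level tag columns from VOP chunk columns."""
    chunk_cols = sorted(k for k in keys if k.startswith("ner_chunk_vop"))
    tag_cols = sorted(k for k in keys if k.startswith("vop") and k.endswith("_ner"))
    return tag_cols, chunk_cols

# A subset of the keys shown in the output above.
keys = [
    "ner_chunk_vop_problem_reduced", "vop_ner", "ner_chunk_jsl_greedy",
    "vop_problem_ner", "document", "jsl_ner", "ner_chunk_vop", "token",
    "ner_chunk_vop_test_ner", "vop_test_ner",
]

tag_cols, chunk_cols = split_profiling_keys(keys)
print(tag_cols)
print(chunk_cols)
```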

In [30]:
def get_token_results(light_result):

    tokens = [j.result for j in light_result["token"]]
    sentences = [j.metadata["sentence"] for j in light_result["token"]]
    begins = [j.begin for j in light_result["token"]]
    ends = [j.end for j in light_result["token"]]
    model_list = [a for a in light_result.keys() if (a not in ["sentence", "token"] and "_chunk" not in a)]

    df = pd.DataFrame({'sentence':sentences, 'begin': begins, 'end': ends, 'token':tokens})

    for model_name in model_list:

        temp_df = pd.DataFrame(light_result[model_name])
        temp_df["jsl_label"] = temp_df.iloc[:,0].apply(lambda x : x.result)
        temp_df = temp_df[["jsl_label"]]

        # temp_df = get_ner_result(model_name)
        temp_df.columns = [model_name]
        df = pd.concat([df, temp_df], axis=1)

    # Keep only sentence, begin, end, token, and the token-level columns whose names start with 'vop'
    filtered_df = df.loc[:, ['sentence', 'begin', 'end', 'token'] + [col for col in df.columns if col.startswith('vop')]]

    return filtered_df

In [31]:
get_token_results(vop_result)

Unnamed: 0,sentence,begin,end,token,vop_clinical_dept_langtest_ner,vop_ner,vop_problem_ner,vop_problem_reduced_langtest_ner,vop_langtest_ner,vop_treatment_langtest_ner,...,vop_test_ner,vop_demographic_ner,vop_anatomy_langtest_ner,vop_problem_langtest_ner,vop_anatomy_ner,vop_temporal_ner,vop_test_langtest_ner,vop_problem_reduced_ner,vop_treatment_ner,vop_clinical_dept_ner
0,0,0,4,Hello,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
1,0,5,5,",",O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
2,0,7,7,I,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
3,0,9,10,am,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
4,0,12,12,a,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,2,404,407,just,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
76,2,409,411,got,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
77,2,413,422,discharged,B-AdmissionDischarge,B-AdmissionDischarge,O,O,B-AdmissionDischarge,O,...,O,O,O,O,O,O,O,O,O,B-AdmissionDischarge
78,2,424,431,recently,O,B-DateTime,O,O,B-DateTime,O,...,O,O,O,O,O,B-DateTime,O,O,O,O
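
A table this wide is easier to read once the `O` cells are dropped. The sketch below takes the last two rows of the output above, restricted to three of its columns, and lists per token which models assigned a label. `non_outside_tags` is a hypothetical helper.

```python
def non_outside_tags(row):
    """Map model column -> tag for every column whose tag is not 'O'."""
    return {col: tag for col, tag in row.items() if col != "token" and tag != "O"}

# Two rows copied from the table above, limited to a few of its columns.
rows = [
    {"token": "discharged", "vop_ner": "B-AdmissionDischarge",
     "vop_temporal_ner": "O", "vop_clinical_dept_ner": "B-AdmissionDischarge"},
    {"token": "recently", "vop_ner": "B-DateTime",
     "vop_temporal_ner": "B-DateTime", "vop_clinical_dept_ner": "O"},
]

summary = {row["token"]: non_outside_tags(row) for row in rows}
for token, tags in summary.items():
    print(token, "->", tags)
```

On the real DataFrame, a comparable view could be built by iterating over `get_token_results(vop_result).to_dict("records")`, skipping the `sentence`, `begin`, and `end` columns as well.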
