![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/07.0.Pretrained_Clinical_Pipelines.ipynb)

# Pretrained Clinical Pipelines

## Colab Setup

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [3]:
from johnsnowlabs import nlp, medical, visual
# After uploading your license, run this to install all licensed Python wheels and pre-download the jars for the Spark session JVM

nlp.install()

👌 Detected license file /content/5.2.1.spark_nlp_for_healthcare.json
🚨 Outdated Medical Secrets in license file. Version=5.2.1.PR but should be Version=5.2.1
📋 Stored John Snow Labs License in /root/.johnsnowlabs/licenses/license_number_0_for_Spark-Healthcare_Spark-OCR.json
👷 Setting up  John Snow Labs home in /root/.johnsnowlabs, this might take a few minutes.
Downloading 🐍+🚀 Python Library spark_nlp-5.2.2-py2.py3-none-any.whl
Downloading 🐍+💊 Python Library spark_nlp_jsl-5.2.1-py3-none-any.whl
Downloading 🫘+🚀 Java Library spark-nlp-assembly-5.2.2.jar
Downloading 🫘+💊 Java Library spark-nlp-jsl-5.2.1.jar
🙆 JSL Home setup in /root/.johnsnowlabs
👌 Detected license file /content/5.2.1.spark_nlp_for_healthcare.json
Installing /root/.johnsnowlabs/py_installs/spark_nlp_jsl-5.2.1-py3-none-any.whl to /usr/bin/python3
Installed 1 products:
💊 Spark-Healthcare==5.2.1 installed! ✅ Heal the planet with NLP! 


In [4]:
from johnsnowlabs import nlp, medical
import pandas as pd
from pyspark.sql import functions as F

# Automatically load license data and start a session with all the jars the user has access to
spark = nlp.start()

👌 Detected license file /content/5.2.1.spark_nlp_for_healthcare.json
👌 Launched cpu optimized session with with: 🚀Spark-NLP==5.2.2, 💊Spark-Healthcare==5.2.1, running on ⚡ PySpark==3.1.2


In [5]:
spark

## Listing Models, Pipelines and Annotators

**You can print the list of clinical pretrained models, pipelines and annotators in Spark NLP with a single line of code:**

In [6]:
# print PretrainedPipelines
medical.InternalResourceDownloader.showPrivatePipelines(lang='en')

# print models
# medical.InternalResourceDownloader.showPrivateModels(annotator="MedicalNerModel", lang='en')

# print annotators
# medical.InternalResourceDownloader.showAvailableAnnotators()

+--------------------------------------------------------------+------+---------+
| Pipeline                                                     | lang | version |
+--------------------------------------------------------------+------+---------+
| clinical_analysis                                            |  en  | 2.4.0   |
| clinical_ner_assertion                                       |  en  | 2.4.0   |
| clinical_deidentification                                    |  en  | 2.4.0   |
| explain_clinical_doc_ade                                     |  en  | 2.7.3   |
| recognize_entities_posology                                  |  en  | 3.0.0   |
| explain_clinical_doc_carp                                    |  en  | 3.0.0   |
| explain_clinical_doc_ade                                     |  en  | 3.0.0   |
| explain_clinical_doc_era                                     |  en  | 3.0.0   |
| icd10cm_snomed_mapping                                       |  en  | 3.0.2   |
| snomed_icd10cm

## Pretrained Pipelines

To save you from building a pipeline from scratch, Spark NLP also provides pretrained pipelines that are already fitted with specific annotators and transformers for various use cases.

Here is the list of clinical pre-trained pipelines:

**1.   explain_clinical_doc_carp** :

> A pipeline with `ner_clinical`, `assertion_dl`, `re_clinical` and `ner_posology`. It will extract clinical and medication entities, assign assertion status and find relationships between clinical entities.

**2.   explain_clinical_doc_era** :

> A pipeline with `ner_clinical_events`, `assertion_dl` and `re_temporal_events_clinical`. It will extract clinical entities, assign assertion status and find temporal relationships between clinical entities.

**3.   explain_clinical_doc_ade** :

> A pipeline for `Adverse Drug Events (ADE)` with `ner_ade_biobert`, `assertiondl_biobert`, `classifierdl_ade_conversational_biobert` and `re_ade_biobert`. It will classify the document, extract `ADE` and `DRUG` entities, assign assertion status to `ADE` entities, and relate them with `DRUG` entities, then assign ADE status to a text (`True` means ADE, `False` means not related to ADE).

**Letter codes in the naming convention:**

> c : ner_clinical

> e : ner_clinical_events

> r : relation extraction

> p : ner_posology

> a : assertion

> ade : adverse drug events
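The letter codes above can be read mechanically off a pipeline name. The sketch below is only an illustration: `decode_pipeline_suffix` is a hypothetical helper, not part of Spark NLP, and it assumes the suffix after the last underscore is either `ade` or a sequence of the single-letter codes listed above.

```python
# Letter codes copied from the list above; the helper itself is hypothetical.
LETTER_CODES = {
    "c": "ner_clinical",
    "e": "ner_clinical_events",
    "r": "relation extraction",
    "p": "ner_posology",
    "a": "assertion",
}


def decode_pipeline_suffix(pipeline_name):
    """Expand the suffix of an explain_clinical_doc_* name into its components."""
    suffix = pipeline_name.rsplit("_", 1)[-1]
    if suffix == "ade":  # 'ade' is one code, not three letters
        return ["adverse drug events"]
    # Raises KeyError for suffixes that are not built from the letter codes.
    return [LETTER_CODES[letter] for letter in suffix]


print(decode_pipeline_suffix("explain_clinical_doc_carp"))
print(decode_pipeline_suffix("explain_clinical_doc_era"))
```

Names such as `explain_clinical_doc_medication` do not follow the letter-code scheme, so the helper raises a `KeyError` for them rather than guessing.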

**Relation Extraction types:**

`re_clinical` >> TrIP (improved), TrWP (worsened), TrCP (caused problem), TrAP (administered), TrNAP (avoided), TeRP (revealed problem), TeCP (investigate problem), PIP (problems related)

`re_temporal_events_clinical` >> `AFTER`, `BEFORE`, `OVERLAP`
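When reading relation outputs it can help to keep these labels in a small lookup table. The sketch below copies the labels and glosses from the two lines above; `describe_relation` is a hypothetical convenience function, not a Spark NLP API.

```python
# Labels and glosses copied from the text above.
RE_CLINICAL_LABELS = {
    "TrIP": "improved",
    "TrWP": "worsened",
    "TrCP": "caused problem",
    "TrAP": "administered",
    "TrNAP": "avoided",
    "TeRP": "revealed problem",
    "TeCP": "investigate problem",
    "PIP": "problems related",
}
RE_TEMPORAL_LABELS = {"AFTER", "BEFORE", "OVERLAP"}


def describe_relation(label):
    """Return a human-readable description of a relation label."""
    if label in RE_CLINICAL_LABELS:
        return f"{label} ({RE_CLINICAL_LABELS[label]})"
    if label in RE_TEMPORAL_LABELS:
        return f"{label} (temporal)"
    return f"{label} (unknown label)"


print(describe_relation("TrAP"))
print(describe_relation("OVERLAP"))
```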

**4. explain_clinical_doc_medication:**

> A pipeline for detecting posology entities with the `ner_posology_large` NER model, assigning their assertion status with `assertion_jsl` model, and extracting relations between posology-related terminology with `posology_re` relation extraction model.


**5. explain_clinical_doc_radiology**

> A pipeline for detecting radiology entities with the `ner_radiology` NER model, assigning their assertion status with `assertion_dl_radiology` model, and extracting relations between the diagnosis, test, and findings with `re_test_problem_finding` relation extraction model.

**6. Clinical Deidentification** :

>This pipeline can be used to deidentify protected health information (PHI) in medical texts. The PHI is masked and obfuscated in the resulting text. The pipeline can mask and obfuscate `AGE`, `CONTACT`, `DATE`, `ID`, `LOCATION`, `NAME`, `PROFESSION`, `CITY`, `COUNTRY`, `DOCTOR`, `HOSPITAL`, `IDNUM`, `MEDICALRECORD`, `ORGANIZATION`, `PATIENT`, `PHONE`, `STREET`, `USERNAME`, `ZIP`, `ACCOUNT`, `LICENSE`, `VIN`, `SSN`, `DLN`, `PLATE`, `IPADDR` entities.

**7. NER Pipelines:**

> Pipelines for all the available pretrained NER models.

**8. BERT Based NER Pipelines**

> Pipelines for all the available BERT token classification models.

**9. ner_profiling_clinical and ner_profiling_biobert:**

> Pipelines for exploring all the available pretrained NER models at once.

**10. ner_model_finder**

> A pipeline trained with BERT embeddings that can be used to find the most appropriate NER model given the entity name.

**11. Resolver Pipelines**

> Pipelines for converting clinical entities to their UMLS CUI codes and medication entities to their ADE, Action, Treatment, UMLS, RxNorm, ICD9, SNOMED and NDC codes.

**12. Oncology Pipelines**

> Pipelines that include Named-Entity Recognition, Assertion Status, Relation Extraction and Entity Resolution models to extract information from oncology texts.


**Also, you can find clinical CODE MAPPING pretrained pipelines in this notebook: [Healthcare Code Mapping Notebook](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings_JSL/Healthcare/11.1.Healthcare_Code_Mapping.ipynb)**




## 1. explain_clinical_doc_carp

A pipeline with ner_clinical, assertion_dl, re_clinical and ner_posology. It will extract clinical and medication entities, assign assertion status and find relationships between clinical entities.

In [7]:
pipeline = nlp.PretrainedPipeline('explain_clinical_doc_carp', 'en', 'clinical/models')

explain_clinical_doc_carp download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [8]:
pipeline.model.stages

[DocumentAssembler_9619f8fd837c,
 SentenceDetector_c0b14c755033,
 REGEX_TOKENIZER_bcb8fead005d,
 POS_6f55785005bf,
 dependency_d5a8da6c9093,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_cd5ce67b529f,
 NER_CONVERTER_2f1dcb61b142,
 MedicalNerModel_4a303d875127,
 NER_CONVERTER_a8cff4d56af8,
 ASSERTION_DL_25881ab6309e,
 RelationExtractionModel_9c255241fec3]

In [None]:
# Load pretrained pipeline from local disk:

# pipeline_local = nlp.PretrainedPipeline.from_disk('/root/cache_pretrained/explain_clinical_doc_carp_en_4.4.4_3.0_1686978853744')

#### with annotate() and fullAnnotate()

In [9]:
text ="""A 28-year-old female with a history of gestational diabetes mellitus, used to take metformin 1000 mg two times a day, presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting .
She was seen by the endocrinology service and discharged on 40 units of insulin glargine at night, 12 units of insulin lispro with meals.
"""

annotations = pipeline.annotate(text)

annotations.keys()


dict_keys(['sentences', 'clinical_ner_tags', 'document', 'clinical_ner_chunks', 'assertion', 'clinical_relations', 'posology_ner_tags', 'tokens', 'posology_ner_chunks', 'embeddings', 'pos_tags', 'dependencies'])

In [10]:
import pandas as pd

rows = list(zip(annotations['tokens'], annotations['clinical_ner_tags'], annotations['posology_ner_tags'], annotations['pos_tags'], annotations['dependencies']))

df = pd.DataFrame(rows, columns = ['tokens','clinical_ner_tags','posology_ner_tags','POS_tags','dependencies'])

df.head(20)

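| | tokens | clinical_ner_tags | posology_ner_tags | POS_tags | dependencies |
|---|---|---|---|---|---|
| 0 | A | O | O | DD | female |
| 1 | 28-year-old | O | O | NN | female |
| 2 | female | O | O | NN | ROOT |
| 3 | with | O | O | II | history |
| 4 | a | O | O | DD | history |
| 5 | history | O | O | NN | female |
| 6 | of | O | O | II | history |
| 7 | gestational | B-PROBLEM | O | JJ | of |
| 8 | diabetes | I-PROBLEM | O | NN | mellitus |
| 9 | mellitus | I-PROBLEM | O | NN | gestational |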


In [11]:
text = 'Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain'

result = pipeline.fullAnnotate(text)[0]

chunks=[]
entities=[]
status=[]

for n,m in zip(result['clinical_ner_chunks'],result['assertion']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status})

df

| | chunks | entities | assertion |
|---|---|---|---|
| 0 | a headache | PROBLEM | present |
| 1 | anxious | PROBLEM | present |
| 2 | alopecia | PROBLEM | absent |
| 3 | pain | PROBLEM | absent |
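The chunk and assertion loop above recurs throughout this notebook, so it could be factored into a helper. The following is a minimal sketch under stated assumptions: `FakeAnnotation` is a stand-in that exposes only the attributes the loop reads (`result`, `begin`, `end`, `metadata`), and `chunk_assertion_rows` is a hypothetical name, not a library function.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAnnotation:
    """Stand-in exposing only the attributes the loop above reads."""
    result: str
    begin: int = 0
    end: int = 0
    metadata: dict = field(default_factory=dict)


def chunk_assertion_rows(chunks, assertions):
    """Pair each NER chunk with the assertion at the same position."""
    # zip() silently truncates, so fail loudly if the lists are misaligned.
    if len(chunks) != len(assertions):
        raise ValueError("chunk and assertion lists differ in length")
    return [
        {"chunks": c.result, "entities": c.metadata["entity"], "assertion": a.result}
        for c, a in zip(chunks, assertions)
    ]


demo_chunks = [
    FakeAnnotation("a headache", 12, 21, {"entity": "PROBLEM"}),
    FakeAnnotation("alopecia", 88, 95, {"entity": "PROBLEM"}),
]
demo_assertions = [
    FakeAnnotation("present", 12, 21),
    FakeAnnotation("absent", 88, 95),
]
rows = chunk_assertion_rows(demo_chunks, demo_assertions)
print(rows)
```

Passing the returned list of dicts to `pd.DataFrame(rows)` should give the same three-column table as the loop. The explicit length check is a design choice: it guards against pairing a chunk with the wrong assertion if the two columns ever differ in length.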


In [12]:
text = """
The patient was prescribed 1 unit of Advil for 5 days after meals. The patient was also
given 1 unit of Metformin daily.
He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night ,
12 units of insulin lispro with meals , and metformin 1000 mg two times a day.
"""

result = pipeline.fullAnnotate(text)[0]

chunks=[]
entities=[]
begins=[]
ends=[]

for n in result['posology_ner_chunks']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

| | chunks | begin | end | entities |
|---|---|---|---|---|
| 0 | 1 unit | 28 | 33 | DOSAGE |
| 1 | Advil | 38 | 42 | DRUG |
| 2 | for 5 days | 44 | 53 | DURATION |
| 3 | 1 unit | 95 | 100 | DOSAGE |
| 4 | Metformin | 105 | 113 | DRUG |
| 5 | daily | 115 | 119 | FREQUENCY |
| 6 | 40 units | 189 | 196 | DOSAGE |
| 7 | insulin glargine | 201 | 216 | DRUG |
| 8 | at night | 218 | 225 | FREQUENCY |
| 9 | 12 units | 229 | 236 | DOSAGE |
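Judging from the `begin` and `end` values in the table above, the `end` offset is inclusive: it points at the last character of the chunk, not one past it. The short check below recovers the first three chunks from the first sentence of the input text; `chunk_text` is a hypothetical helper written for this illustration.

```python
def chunk_text(text, begin, end):
    """Return the substring for a span whose end index is inclusive."""
    return text[begin:end + 1]


# First sentence of the input above, including the leading newline of the
# triple-quoted string (it counts as character 0).
text = "\nThe patient was prescribed 1 unit of Advil for 5 days after meals."
spans = [(28, 33), (38, 42), (44, 53)]  # begin/end pairs from the table

recovered = [chunk_text(text, begin, end) for begin, end in spans]
print(recovered)  # ['1 unit', 'Advil', 'for 5 days']
```

A plain Python slice `text[begin:end]` would drop the last character of every chunk, which is an easy mistake when highlighting entities in the original text.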


#### with transform()

In [13]:
data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus, used to take metformin 1000 mg two times a day, presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting .
She was seen by the endocrinology service and discharged on 40 units of insulin glargine at night, 12 units of insulin lispro with meals.
"""]]).toDF("text")

result = pipeline.transform(data)

result.select(F.explode(F.arrays_zip(result.tokens.result,
                                     result.clinical_ner_tags.result,
                                     result.posology_ner_tags.result,
                                     result.pos_tags.result,
                                     result.dependencies.result)).alias("cols"))\
        .select(F.expr("cols['0']").alias("tokens"),
                F.expr("cols['1']").alias("clinical_ner_tags"),
                F.expr("cols['2']").alias("posology_ner_tags"),
                F.expr("cols['3']").alias("pos_tags"),
                F.expr("cols['4']").alias("dependencies")).show()

+-----------+-----------------+-----------------+--------+------------+
|     tokens|clinical_ner_tags|posology_ner_tags|pos_tags|dependencies|
+-----------+-----------------+-----------------+--------+------------+
|          A|                O|                O|      DD|      female|
|28-year-old|                O|                O|      NN|      female|
|     female|                O|                O|      NN|        ROOT|
|       with|                O|                O|      II|     history|
|          a|                O|                O|      DD|     history|
|    history|                O|                O|      NN|      female|
|         of|                O|                O|      II|     history|
|gestational|        B-PROBLEM|                O|      JJ|          of|
|   diabetes|        I-PROBLEM|                O|      NN|    mellitus|
|   mellitus|        I-PROBLEM|                O|      NN| gestational|
|          ,|                O|                O|      NN|      

In [14]:
data = spark.createDataFrame([["""Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain.
"""]]).toDF("text")

result = pipeline.transform(data)

result.select(F.explode(F.arrays_zip(result.clinical_ner_chunks.result,
                                     result.clinical_ner_chunks.begin,
                                     result.clinical_ner_chunks.end,
                                     result.clinical_ner_chunks.metadata,
                                     result.assertion.result)).alias("cols"))\
        .select(F.expr("cols['0']").alias("chunk"),
                F.expr("cols['1']").alias("begin"),
                F.expr("cols['2']").alias("end"),
                F.expr("cols['3']['entity']").alias("ner_label"),
                F.expr("cols['4']").alias("status")).show()

+----------+-----+---+---------+-------+
|     chunk|begin|end|ner_label| status|
+----------+-----+---+---------+-------+
|a headache|   12| 21|  PROBLEM|present|
|   anxious|   56| 62|  PROBLEM|present|
|  alopecia|   88| 95|  PROBLEM| absent|
|      pain|  115|118|  PROBLEM| absent|
+----------+-----+---+---------+-------+



In [15]:
data = spark.createDataFrame([["""
The patient was prescribed 1 unit of Advil for 5 days after meals. The patient was also
given 1 unit of Metformin daily.
He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night ,
12 units of insulin lispro with meals , and metformin 1000 mg two times a day.
"""]]).toDF("text")

result = pipeline.transform(data)

result.select(F.explode(F.arrays_zip(result.posology_ner_chunks.result,
                                     result.posology_ner_chunks.begin,
                                     result.posology_ner_chunks.end,
                                     result.posology_ner_chunks.metadata)).alias("cols"))\
        .select(F.expr("cols['0']").alias("chunk"),
                F.expr("cols['1']").alias("begin"),
                F.expr("cols['2']").alias("end"),
                F.expr("cols['3']['entity']").alias("ner_label")).show()

+----------------+-----+---+---------+
|           chunk|begin|end|ner_label|
+----------------+-----+---+---------+
|          1 unit|   28| 33|   DOSAGE|
|           Advil|   38| 42|     DRUG|
|      for 5 days|   44| 53| DURATION|
|          1 unit|   95|100|   DOSAGE|
|       Metformin|  105|113|     DRUG|
|           daily|  115|119|FREQUENCY|
|        40 units|  189|196|   DOSAGE|
|insulin glargine|  201|216|     DRUG|
|        at night|  218|225|FREQUENCY|
|        12 units|  229|236|   DOSAGE|
|  insulin lispro|  241|254|     DRUG|
|      with meals|  256|265|FREQUENCY|
|       metformin|  273|281|     DRUG|
|         1000 mg|  283|289| STRENGTH|
| two times a day|  291|305|FREQUENCY|
+----------------+-----+---+---------+



## 2. explain_clinical_doc_era

> A pipeline with `ner_clinical_events`, `assertion_dl` and `re_temporal_events_clinical`. It will extract clinical entities, assign assertion status and find temporal relationships between clinical entities.



In [None]:
era_pipeline = nlp.PretrainedPipeline('explain_clinical_doc_era', 'en', 'clinical/models')

explain_clinical_doc_era download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [None]:
era_pipeline.model.stages

[DocumentAssembler_81ef1f17c7c1,
 SentenceDetector_0b67d45c215f,
 REGEX_TOKENIZER_7460db626996,
 POS_6f55785005bf,
 dependency_d5a8da6c9093,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_7cb29c8c904c,
 NER_CONVERTER_c619eb00b46c,
 RelationExtractionModel_14b00157fc1a,
 ASSERTION_DL_25881ab6309e]

In [None]:
text ="""She is admitted to The John Hopkins Hospital 2 days ago with a history of gestational diabetes mellitus diagnosed. She denied pain and any headache.
She was seen by the endocrinology service and she was discharged on 03/02/2018 on 40 units of insulin glargine,
12 units of insulin lispro, and metformin 1000 mg two times a day. She had close follow-up with endocrinology post discharge.
"""

result = era_pipeline.fullAnnotate(text)[0]


In [None]:
result.keys()

dict_keys(['sentences', 'clinical_ner_tags', 'document', 'clinical_ner_chunks', 'assertion', 'clinical_relations', 'tokens', 'embeddings', 'pos_tags', 'dependencies'])

In [None]:
chunks=[]
entities=[]
begins=[]
ends=[]

for n in result['clinical_ner_chunks']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

| | chunks | begin | end | entities |
|---|---|---|---|---|
| 0 | admitted | 7 | 14 | OCCURRENCE |
| 1 | The John Hopkins Hospital | 19 | 43 | CLINICAL_DEPT |
| 2 | 2 days ago | 45 | 54 | DATE |
| 3 | gestational diabetes mellitus | 74 | 102 | PROBLEM |
| 4 | denied | 119 | 124 | EVIDENTIAL |
| 5 | pain | 126 | 129 | PROBLEM |
| 6 | any headache | 135 | 146 | PROBLEM |
| 7 | the endocrinology service | 165 | 189 | CLINICAL_DEPT |
| 8 | discharged | 203 | 212 | OCCURRENCE |
| 9 | 03/02/2018 | 217 | 226 | DATE |


In [None]:
chunks=[]
entities=[]
status=[]

for n,m in zip(result['clinical_ner_chunks'],result['assertion']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status})

df

| | chunks | entities | assertion |
|---|---|---|---|
| 0 | admitted | OCCURRENCE | present |
| 1 | The John Hopkins Hospital | CLINICAL_DEPT | present |
| 2 | 2 days ago | DATE | present |
| 3 | gestational diabetes mellitus | PROBLEM | present |
| 4 | denied | EVIDENTIAL | absent |
| 5 | pain | PROBLEM | absent |
| 6 | any headache | PROBLEM | absent |
| 7 | the endocrinology service | CLINICAL_DEPT | present |
| 8 | discharged | OCCURRENCE | present |
| 9 | 03/02/2018 | DATE | present |


In [None]:
import pandas as pd

def get_relations_df(results, col='relations'):
    """Collect the relation annotations of the first document into a DataFrame."""
    rel_pairs = []
    for rel in results[0][col]:
        rel_pairs.append((
            rel.result,
            rel.metadata['entity1'],
            rel.metadata['entity1_begin'],
            rel.metadata['entity1_end'],
            rel.metadata['chunk1'],
            rel.metadata['entity2'],
            rel.metadata['entity2_begin'],
            rel.metadata['entity2_end'],
            rel.metadata['chunk2'],
            rel.metadata['confidence']
        ))

    rel_df = pd.DataFrame(rel_pairs, columns=['relation','entity1','entity1_begin','entity1_end','chunk1','entity2','entity2_begin','entity2_end','chunk2', 'confidence'])

    # Metadata values are strings; cast confidence to float so it can be sorted and filtered.
    rel_df.confidence = rel_df.confidence.astype(float)

    return rel_df

In [None]:
annotations = era_pipeline.fullAnnotate(text)

rel_df = get_relations_df (annotations, 'clinical_relations')

rel_df

| | relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFTER | OCCURRENCE | 7 | 14 | admitted | CLINICAL_DEPT | 19 | 43 | The John Hopkins Hospital | 0.96211 |
| 1 | OVERLAP | OCCURRENCE | 7 | 14 | admitted | DATE | 45 | 54 | 2 days ago | 0.999708 |
| 2 | BEFORE | OCCURRENCE | 7 | 14 | admitted | PROBLEM | 74 | 102 | gestational diabetes mellitus | 0.999855 |
| 3 | OVERLAP | CLINICAL_DEPT | 19 | 43 | The John Hopkins Hospital | DATE | 45 | 54 | 2 days ago | 0.857712 |
| 4 | BEFORE | CLINICAL_DEPT | 19 | 43 | The John Hopkins Hospital | PROBLEM | 74 | 102 | gestational diabetes mellitus | 0.905534 |
| 5 | OVERLAP | DATE | 45 | 54 | 2 days ago | PROBLEM | 74 | 102 | gestational diabetes mellitus | 0.912592 |
| 6 | BEFORE | EVIDENTIAL | 119 | 124 | denied | PROBLEM | 126 | 129 | pain | 1.0 |
| 7 | BEFORE | EVIDENTIAL | 119 | 124 | denied | PROBLEM | 135 | 146 | any headache | 1.0 |
| 8 | OVERLAP | PROBLEM | 126 | 129 | pain | PROBLEM | 135 | 146 | any headache | 1.0 |
| 9 | OVERLAP | CLINICAL_DEPT | 165 | 189 | the endocrinology service | OCCURRENCE | 203 | 212 | discharged | 0.587001 |
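Relation confidences in the table above range from 1.0 down to roughly 0.59, so downstream code may want to keep only the stronger predictions. The sketch below works on plain tuples copied from four of the rows; `filter_relations` is a hypothetical helper and the 0.9 default threshold is an arbitrary assumption, not a recommended value.

```python
# (relation, chunk1, chunk2, confidence) copied from rows 0, 1, 3 and 9 above.
relations = [
    ("AFTER", "admitted", "The John Hopkins Hospital", 0.96211),
    ("OVERLAP", "admitted", "2 days ago", 0.999708),
    ("OVERLAP", "The John Hopkins Hospital", "2 days ago", 0.857712),
    ("OVERLAP", "the endocrinology service", "discharged", 0.587001),
]


def filter_relations(rows, min_confidence=0.9, labels=None):
    """Keep rows at or above min_confidence, optionally restricted to labels."""
    return [
        row for row in rows
        if row[3] >= min_confidence and (labels is None or row[0] in labels)
    ]


confident = filter_relations(relations)
overlaps = filter_relations(relations, min_confidence=0.5, labels={"OVERLAP"})
print(len(confident), len(overlaps))
```

On the DataFrame itself, the equivalent filter would be `rel_df[rel_df.confidence >= 0.9]`, which works because `get_relations_df` casts the confidence column to float.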


In [None]:
annotations[0]['clinical_relations']

[Annotation(category, 7, 43, AFTER, {'chunk2': 'The John Hopkins Hospital', 'confidence': '0.9621104', 'entity2_end': '43', 'chunk1': 'admitted', 'entity1': 'OCCURRENCE', 'entity2_begin': '19', 'chunk2_confidence': '0.859', 'entity1_begin': '7', 'sentence': '0', 'direction': 'both', 'entity1_end': '14', 'entity2': 'CLINICAL_DEPT', 'chunk1_confidence': '0.9958'}, []),
 Annotation(category, 7, 54, OVERLAP, {'chunk2': '2 days ago', 'confidence': '0.99970835', 'entity2_end': '54', 'chunk1': 'admitted', 'entity1': 'OCCURRENCE', 'entity2_begin': '45', 'chunk2_confidence': '0.80329996', 'entity1_begin': '7', 'sentence': '0', 'direction': 'both', 'entity1_end': '14', 'entity2': 'DATE', 'chunk1_confidence': '0.9958'}, []),
 Annotation(category, 7, 102, BEFORE, {'chunk2': 'gestational diabetes mellitus', 'confidence': '0.99985516', 'entity2_end': '102', 'chunk1': 'admitted', 'entity1': 'OCCURRENCE', 'entity2_begin': '74', 'chunk2_confidence': '0.8622667', 'entity1_begin': '7', 'sentence': '0', '

## 3. explain_clinical_doc_ade

A pipeline for `Adverse Drug Events (ADE)` with `ner_ade_biobert`, `assertiondl_biobert`, `classifierdl_ade_conversational_biobert` and `re_ade_biobert`. It extracts `ADE` and `DRUG` clinical entities, assigns an assertion status to `ADE` entities, and assigns an ADE status to the text (`True` means ADE, `False` means not related to ADE). It also extracts relations between `DRUG` and `ADE` entities (`1` means the adverse event and the drug are related, `0` means they are not).

In [None]:
ade_pipeline = nlp.PretrainedPipeline('explain_clinical_doc_ade', 'en', 'clinical/models')

explain_clinical_doc_ade download started this may take some time.
Approx size to download 462.6 MB
[OK!]


In [None]:
result = ade_pipeline.fullAnnotate("Been taking Lipitor for 15 years , have experienced severe fatigue a lot!!! . Doctor moved me to voltaren 2 months ago , so far , have only experienced cramps")

result[0].keys()

dict_keys(['bert_sentence_embeddings', 'bert_embeddings', 'document', 'ner_chunks_ade_assertion', 'ner_tags_ade', 'relations_ade_drug', 'ner_chunks_ade', 'assertion_ade', 'tokens', 'class', 'pos_tags', 'dependencies'])

In [None]:
result[0]['class'][0].metadata

{'sentence': '0', 'False': '0.0031648206', 'True': '0.9968352'}
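The classifier metadata stores one score per class as strings, next to a `sentence` key. A small sketch of turning that dict into a predicted label and probability follows; `predicted_class` is a hypothetical helper, and it assumes every key other than `sentence` is a class label, which matches the single output shown above.

```python
def predicted_class(metadata):
    """Return (label, probability) for the highest-scoring class."""
    # Assumption: all keys except 'sentence' are class labels with string scores.
    scores = {key: float(value) for key, value in metadata.items() if key != "sentence"}
    label = max(scores, key=scores.get)
    return label, scores[label]


# Metadata copied from the output above.
metadata = {'sentence': '0', 'False': '0.0031648206', 'True': '0.9968352'}
label, probability = predicted_class(metadata)
print(label, probability)
```

Note that the label comes back as the string `'True'`, not the Python boolean, because annotation metadata keys and values are strings.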

In [None]:
text = """Been taking Lipitor for 15 years , have experienced severe fatigue a lot!!! .
Doctor moved me to voltaren 2 months ago , so far , have only experienced cramps"""

chunks = []
entities = []
begin =[]
end = []

print ('sentence:', text)
print()

result = ade_pipeline.fullAnnotate(text)

print ('ADE status:', result[0]['class'][0].result)

print ('prediction probability>> True : ', result[0]['class'][0].metadata['True'], \
        'False: ', result[0]['class'][0].metadata['False'])

for n in result[0]['ner_chunks_ade']:

    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'entities':entities,
                'begin': begin, 'end': end})

df


sentence: Been taking Lipitor for 15 years , have experienced severe fatigue a lot!!! .
Doctor moved me to voltaren 2 months ago , so far , have only experienced cramps

ADE status: True
prediction probability>> True :  0.9968352 False:  0.0031648206


| | chunks | entities | begin | end |
|---|---|---|---|---|
| 0 | Lipitor | DRUG | 12 | 18 |
| 1 | severe fatigue | ADE | 52 | 65 |
| 2 | voltaren | DRUG | 97 | 104 |
| 3 | cramps | ADE | 152 | 157 |


#### with AssertionDL

In [None]:

text = """Been taking Lipitor for 15 years , have experienced severe fatigue a lot!!! .
Doctor moved me to voltaren 2 months ago , so far , have only experienced cramps"""

print (text)

light_result = ade_pipeline.fullAnnotate(text)[0]

chunks=[]
entities=[]
status=[]
confidence=[]

for n,m in zip(light_result['ner_chunks_ade_assertion'],light_result['assertion_ade']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)
    confidence.append(m.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status, 'confidence':confidence})

df

Been taking Lipitor for 15 years , have experienced severe fatigue a lot!!! .
Doctor moved me to voltaren 2 months ago , so far , have only experienced cramps


| | chunks | entities | assertion | confidence |
|---|---|---|---|---|
| 0 | severe fatigue | ADE | conditional | 0.9931 |
| 1 | cramps | ADE | present | 0.9999 |


#### with Relation Extraction

In [None]:
text = """Been taking Lipitor for 15 years , have experienced severe fatigue a lot!!! .
Doctor moved me to voltaren 2 months ago , so far , have only experienced cramps
"""

print (text)

results = ade_pipeline.fullAnnotate(text)

rel_pairs=[]

for rel in results[0]["relations_ade_drug"]:
    rel_pairs.append((
        rel.result,
        rel.metadata['entity1'],
        rel.metadata['entity1_begin'],
        rel.metadata['entity1_end'],
        rel.metadata['chunk1'],
        rel.metadata['entity2'],
        rel.metadata['entity2_begin'],
        rel.metadata['entity2_end'],
        rel.metadata['chunk2'],
        rel.metadata['confidence']
    ))

rel_df = pd.DataFrame(rel_pairs, columns=['relation','entity1','entity1_begin','entity1_end','chunk1','entity2','entity2_begin','entity2_end','chunk2', 'confidence'])
rel_df

Been taking Lipitor for 15 years , have experienced severe fatigue a lot!!! .
Doctor moved me to voltaren 2 months ago , so far , have only experienced cramps



| | relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | DRUG | 12 | 18 | Lipitor | ADE | 52 | 65 | severe fatigue | 1.0 |
| 1 | 1 | DRUG | 12 | 18 | Lipitor | ADE | 152 | 157 | cramps | 1.0 |
| 2 | 0 | ADE | 52 | 65 | severe fatigue | DRUG | 97 | 104 | voltaren | 0.50062835 |
| 3 | 1 | DRUG | 97 | 104 | voltaren | ADE | 152 | 157 | cramps | 0.99999857 |
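Since a relation of `1` links a drug to an adverse event and `0` marks an unrelated pair, the rows above can be regrouped per drug. The sketch below is one possible way to do that on tuples copied from the table; `ades_by_drug` is a hypothetical helper, and it assumes the relation label is a string, as annotation results are.

```python
from collections import defaultdict

# (relation, entity1, chunk1, entity2, chunk2) copied from the table above.
ade_relations = [
    ("1", "DRUG", "Lipitor", "ADE", "severe fatigue"),
    ("1", "DRUG", "Lipitor", "ADE", "cramps"),
    ("0", "ADE", "severe fatigue", "DRUG", "voltaren"),
    ("1", "DRUG", "voltaren", "ADE", "cramps"),
]


def ades_by_drug(rows):
    """Map each drug to the adverse events the model marked as related."""
    grouped = defaultdict(list)
    for relation, entity1, chunk1, entity2, chunk2 in rows:
        if relation != "1":  # skip pairs predicted as unrelated
            continue
        # The drug may appear on either side of the pair.
        drug, ade = (chunk1, chunk2) if entity1 == "DRUG" else (chunk2, chunk1)
        grouped[drug].append(ade)
    return dict(grouped)


print(ades_by_drug(ade_relations))
```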


## 4. explain_clinical_doc_medication

> A pipeline for detecting posology entities with the `ner_posology_large` NER model, assigning their assertion status with `assertion_jsl` model, and extracting relations between posology-related terminology with `posology_re` relation extraction model.

In [None]:
medication_pipeline = nlp.PretrainedPipeline('explain_clinical_doc_medication', 'en', 'clinical/models')

explain_clinical_doc_medication download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [None]:
medication_pipeline.model.stages

[DocumentAssembler_5b6b25ae3a32,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_c6f98309fb6d,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_5e6f59103f25,
 NER_CONVERTER_c6fab90a28d5,
 NER_CONVERTER_3b6a5b14db49,
 ASSERTION_DL_e5e007602386,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 PosologyREModel_d7fe3a9e8310]

In [None]:
text = """The patient is a 30-year-old female with a long history of insulin dependent diabetes, type 2. She received a course of Bactrim for 14 days for UTI.
She was prescribed 5000 units of Fragmin  subcutaneously daily, and along with Lantus 40 units subcutaneously at bedtime."""

result = medication_pipeline.fullAnnotate(text)[0]

In [None]:
result.keys()

dict_keys(['assertion_ner_chunk', 'document', 'assertion', 'ner_posology_chunk', 'token', 'relations', 'embeddings_clinical', 'pos_tags', 'dependencies', 'ner_posology', 'sentence'])

In [None]:
chunks=[]
entities=[]
begins=[]
ends=[]

for n in result['ner_posology_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

| | chunks | begin | end | entities |
|---|---|---|---|---|
| 0 | insulin | 59 | 65 | DRUG |
| 1 | Bactrim | 120 | 126 | DRUG |
| 2 | for 14 days | 128 | 138 | DURATION |
| 3 | 5000 units | 168 | 177 | DOSAGE |
| 4 | Fragmin | 182 | 188 | DRUG |
| 5 | subcutaneously | 191 | 204 | ROUTE |
| 6 | daily | 206 | 210 | FREQUENCY |
| 7 | Lantus | 228 | 233 | DRUG |
| 8 | 40 units | 235 | 242 | DOSAGE |
| 9 | subcutaneously | 244 | 257 | ROUTE |


In [None]:
chunks=[]
entities=[]
status=[]

for n,m in zip(result['assertion_ner_chunk'],result['assertion']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status})

df

| | chunks | entities | assertion |
|---|---|---|---|
| 0 | insulin | DRUG | Family |
| 1 | Bactrim | DRUG | Past |
| 2 | Fragmin | DRUG | Planned |
| 3 | Lantus | DRUG | Past |


In [None]:
annotations = medication_pipeline.fullAnnotate(text)

rel_df = get_relations_df(annotations, 'relations')

rel_df

| | relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DRUG-DURATION | DRUG | 120 | 126 | Bactrim | DURATION | 128 | 138 | for 14 days | 1.0 |
| 1 | DOSAGE-DRUG | DOSAGE | 168 | 177 | 5000 units | DRUG | 182 | 188 | Fragmin | 1.0 |
| 2 | DRUG-ROUTE | DRUG | 182 | 188 | Fragmin | ROUTE | 191 | 204 | subcutaneously | 1.0 |
| 3 | DRUG-FREQUENCY | DRUG | 182 | 188 | Fragmin | FREQUENCY | 206 | 210 | daily | 1.0 |
| 4 | DRUG-DOSAGE | DRUG | 228 | 233 | Lantus | DOSAGE | 235 | 242 | 40 units | 1.0 |
| 5 | DRUG-ROUTE | DRUG | 228 | 233 | Lantus | ROUTE | 244 | 257 | subcutaneously | 1.0 |
| 6 | DRUG-FREQUENCY | DRUG | 228 | 233 | Lantus | FREQUENCY | 259 | 268 | at bedtime | 1.0 |
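The posology relations above always pair a `DRUG` with one attribute (dosage, route, frequency or duration), with the drug on either side. As a rough sketch, they can be folded into one record per drug; `medication_records` is a hypothetical helper operating on tuples copied from the table.

```python
# (entity1, chunk1, entity2, chunk2) copied from the table above.
posology_relations = [
    ("DRUG", "Bactrim", "DURATION", "for 14 days"),
    ("DOSAGE", "5000 units", "DRUG", "Fragmin"),
    ("DRUG", "Fragmin", "ROUTE", "subcutaneously"),
    ("DRUG", "Fragmin", "FREQUENCY", "daily"),
    ("DRUG", "Lantus", "DOSAGE", "40 units"),
    ("DRUG", "Lantus", "ROUTE", "subcutaneously"),
    ("DRUG", "Lantus", "FREQUENCY", "at bedtime"),
]


def medication_records(rows):
    """Fold drug/attribute pairs into one dict of attributes per drug."""
    records = {}
    for entity1, chunk1, entity2, chunk2 in rows:
        if entity1 == "DRUG":
            drug, attribute, value = chunk1, entity2, chunk2
        elif entity2 == "DRUG":
            drug, attribute, value = chunk2, entity1, chunk1
        else:
            continue  # neither side is a drug; nothing to attach
        records.setdefault(drug, {})[attribute] = value
    return records


records = medication_records(posology_relations)
print(records["Fragmin"])
```

The records are keyed by the drug text, so two separate mentions of the same drug would be merged and the last value of each attribute would win. Keying by `entity1_begin` or `entity2_begin` instead would keep mentions apart.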


In [None]:
annotations[0]['relations']

[Annotation(category, 120, 138, DRUG-DURATION, {'chunk2': 'for 14 days', 'confidence': '1.0', 'entity2_end': '138', 'chunk1': 'Bactrim', 'entity1': 'DRUG', 'entity2_begin': '128', 'chunk2_confidence': '0.79349995', 'entity1_begin': '120', 'sentence': '1', 'direction': 'both', 'entity1_end': '126', 'entity2': 'DURATION', 'chunk1_confidence': '0.9994'}, []),
 Annotation(category, 168, 188, DOSAGE-DRUG, {'chunk2': 'Fragmin', 'confidence': '1.0', 'entity2_end': '188', 'chunk1': '5000 units', 'entity1': 'DOSAGE', 'entity2_begin': '182', 'chunk2_confidence': '0.9996', 'entity1_begin': '168', 'sentence': '2', 'direction': 'both', 'entity1_end': '177', 'entity2': 'DRUG', 'chunk1_confidence': '0.80009997'}, []),
 Annotation(category, 182, 204, DRUG-ROUTE, {'chunk2': 'subcutaneously', 'confidence': '1.0', 'entity2_end': '204', 'chunk1': 'Fragmin', 'entity1': 'DRUG', 'entity2_begin': '191', 'chunk2_confidence': '0.9994', 'entity1_begin': '182', 'sentence': '2', 'direction': 'both', 'entity1_end':

## 5. explain_clinical_doc_radiology

> A pipeline for detecting radiology entities with the `ner_radiology` NER model, assigning their assertion status with `assertion_dl_radiology` model, and extracting relations between the diagnosis, test, and findings with `re_test_problem_finding` relation extraction model.

In [None]:
radiology_pipeline = nlp.PretrainedPipeline('explain_clinical_doc_radiology', 'en', 'clinical/models')

explain_clinical_doc_radiology download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [None]:
radiology_pipeline.model.stages

[DocumentAssembler_39bf0d96e8c0,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_19064791a957,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_f7f58f2addf7,
 NerConverter_3c8a46700409,
 NerConverter_417e39edbfab,
 ASSERTION_DL_614cf4bf71de,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 RelationExtractionModel_853993778cd5]

In [None]:
text = """Bilateral breast ultrasound was subsequently performed, which demonstrated an ovoid mass measuring approximately 0.5 x 0.5 x 0.4 cm in diameter located within the anteromedial aspect of the left shoulder.
This mass demonstrates isoechoic echotexture to the adjacent muscle, with no evidence of internal color flow.
This may represent benign fibrous tissue or a lipoma."""

result = radiology_pipeline.fullAnnotate(text)[0]

In [None]:
result.keys()

dict_keys(['document', 'ner_radiology_chunk', 'assertion', 'token', 'relations', 'embeddings_clinical', 'pos_tags', 'dependencies', 'assertion_radiology_chunk', 'ner_radiology', 'sentence'])

In [None]:
chunks=[]
entities=[]
begins=[]
ends=[]

for n in result['ner_radiology_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,Bilateral breast,0,15,BodyPart
1,ultrasound,17,26,ImagingTest
2,ovoid mass,78,87,ImagingFindings
3,0.5 x 0.5 x 0.4,113,127,Measurements
4,cm,129,130,Units
5,anteromedial aspect of the left shoulder,163,202,BodyPart
6,mass,210,213,ImagingFindings
7,isoechoic echotexture,228,248,ImagingFindings
8,muscle,266,271,BodyPart
9,internal color flow,294,312,ImagingFindings
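The chunk-to-DataFrame loop above recurs for several pipelines in this notebook. As a minimal sketch, it can be factored into one helper. The name `chunks_to_rows` is hypothetical (it is not part of the library), and `SimpleNamespace` only stands in for the annotation objects so the snippet runs without a Spark session. It assumes each annotation exposes `result`, `begin`, `end`, and `metadata['entity']`, as the loop above does.

```python
from types import SimpleNamespace


def chunks_to_rows(annotations):
    """Flatten chunk annotations into one dict per chunk."""
    return [
        {
            "chunks": a.result,
            "begin": a.begin,
            "end": a.end,
            "entities": a.metadata["entity"],
        }
        for a in annotations
    ]


# Stand-in objects mimicking two chunks from the output above.
demo_chunks = [
    SimpleNamespace(result="ultrasound", begin=17, end=26,
                    metadata={"entity": "ImagingTest"}),
    SimpleNamespace(result="ovoid mass", begin=78, end=87,
                    metadata={"entity": "ImagingFindings"}),
]

demo_rows = chunks_to_rows(demo_chunks)
print(demo_rows[0])
```

With a real result, `pd.DataFrame(chunks_to_rows(result['ner_radiology_chunk']))` should produce the same columns as the cell above.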


In [None]:
chunks=[]
entities=[]
status=[]

for n,m in zip(result['assertion_radiology_chunk'],result['assertion']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status})

df

Unnamed: 0,chunks,entities,assertion
0,ultrasound,ImagingTest,Confirmed
1,ovoid mass,ImagingFindings,Confirmed
2,mass,ImagingFindings,Confirmed
3,isoechoic echotexture,ImagingFindings,Confirmed
4,internal color flow,ImagingFindings,Negative
5,benign fibrous tissue,ImagingFindings,Suspected
6,lipoma,Disease_Syndrome_Disorder,Suspected


In [None]:
annotations = radiology_pipeline.fullAnnotate(text)

rel_df = get_relations_df(annotations, 'relations')

rel_df

Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
0,1,ImagingTest,17,26,ultrasound,ImagingFindings,78,87,ovoid mass,0.999569
1,1,ImagingFindings,334,354,benign fibrous tissue,Disease_Syndrome_Disorder,361,366,lipoma,0.560097


In [None]:
annotations[0]['relations']

[Annotation(category, 17, 87, 1, {'chunk2': 'ovoid mass', 'confidence': '0.99956936', 'entity2_end': '87', 'chunk1': 'ultrasound', 'entity1': 'ImagingTest', 'entity2_begin': '78', 'chunk2_confidence': '0.6095', 'entity1_begin': '17', 'sentence': '0', 'direction': 'both', 'entity1_end': '26', 'entity2': 'ImagingFindings', 'chunk1_confidence': '0.6734'}, []),
 Annotation(category, 334, 366, 1, {'chunk2': 'lipoma', 'confidence': '0.56009704', 'entity2_end': '366', 'chunk1': 'benign fibrous tissue', 'entity1': 'ImagingFindings', 'entity2_begin': '361', 'chunk2_confidence': '0.6081', 'entity1_begin': '334', 'sentence': '2', 'direction': 'both', 'entity1_end': '354', 'entity2': 'Disease_Syndrome_Disorder', 'chunk1_confidence': '0.5240666'}, [])]
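All values in the relation metadata printed above are strings, including the offsets and the confidence. The sketch below shows one way to turn a relation into a typed flat row so it can be filtered numerically. `relation_to_row` is a hypothetical helper name, and the metadata keys are assumed to match the ones in the printed output.

```python
def relation_to_row(relation, metadata):
    """Convert one relation annotation's metadata into a typed flat dict."""
    row = {"relation": relation}
    for key in ("entity1", "chunk1", "entity2", "chunk2"):
        row[key] = metadata[key]
    # Offsets and confidence arrive as strings in the metadata.
    for key in ("entity1_begin", "entity1_end", "entity2_begin", "entity2_end"):
        row[key] = int(metadata[key])
    row["confidence"] = float(metadata["confidence"])
    return row


# Metadata copied from the first relation printed above.
demo_metadata = {
    "chunk2": "ovoid mass", "confidence": "0.99956936", "entity2_end": "87",
    "chunk1": "ultrasound", "entity1": "ImagingTest", "entity2_begin": "78",
    "chunk2_confidence": "0.6095", "entity1_begin": "17", "sentence": "0",
    "direction": "both", "entity1_end": "26", "entity2": "ImagingFindings",
    "chunk1_confidence": "0.6734",
}

demo_row = relation_to_row("1", demo_metadata)
print(demo_row)
```

On a real result, `[relation_to_row(a.result, a.metadata) for a in annotations[0]['relations']]` gives rows whose `confidence` can be compared against a threshold, which is useful for weaker relations such as the second one above (about 0.56).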

## Clinical Deidentification

This pipeline can be used to deidentify protected health information (PHI) in medical texts. It returns both masked and obfuscated versions of the input text. The pipeline can mask and obfuscate the following entities: `AGE`, `CONTACT`, `DATE`, `ID`, `LOCATION`, `NAME`, `PROFESSION`, `CITY`, `COUNTRY`, `DOCTOR`, `HOSPITAL`, `IDNUM`, `MEDICALRECORD`, `ORGANIZATION`, `PATIENT`, `PHONE`, `STREET`, `USERNAME`, `ZIP`, `ACCOUNT`, `LICENSE`, `VIN`, `SSN`, `DLN`, `PLATE`, `IPADDR`.

|index|model|lang|
|-----:|:-----|----|
| 1| [clinical_deidentification](https://nlp.johnsnowlabs.com/2022/03/03/clinical_deidentification_de_3_0.html)  |de|
| 2| [clinical_deidentification](https://nlp.johnsnowlabs.com/2021/05/27/clinical_deidentification_en.html)  |en|
| 3| [clinical_deidentification_glove](https://nlp.johnsnowlabs.com/2022/03/04/clinical_deidentification_glove_en_3_0.html)  |en|
| 4| [clinical_deidentification_glove_augmented](https://nlp.johnsnowlabs.com/2022/03/22/clinical_deidentification_glove_augmented_en_3_0.html)  |en|
| 5| [clinical_deidentification](https://nlp.johnsnowlabs.com/2022/03/02/clinical_deidentification_es_2_4.html)  |es|
| 6| [clinical_deidentification_augmented](https://nlp.johnsnowlabs.com/2022/03/03/clinical_deidentification_augmented_es_2_4.html)  |es|
| 7| [clinical_deidentification](https://nlp.johnsnowlabs.com/2022/03/04/clinical_deidentification_fr_2_4.html)  |fr|
| 8| [clinical_deidentification](https://nlp.johnsnowlabs.com/2022/03/28/clinical_deidentification_it_3_0.html)  |it|
| 9| [clinical_deidentification](https://nlp.johnsnowlabs.com/2022/06/21/clinical_deidentification_pt_3_0.html)  |pt|
| 10| [clinical_deidentification](https://nlp.johnsnowlabs.com/2022/06/28/clinical_deidentification_ro_3_0.html)  |ro|
| 11| [clinical_deidentification](https://nlp.johnsnowlabs.com/2023/06/22/clinical_deidentification_ar.html)  |ar|


You can find **`German`, `Spanish`, `French`, `Italian`, `Portuguese`, `Romanian`**  and **`Arabic`** deidentification models and pretrained pipeline examples in this notebook:   [Clinical Multi Language Deidentification Notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/04.1.Clinical_Multi_Language_Deidentification.ipynb)


In [None]:
deid_pipeline = nlp.PretrainedPipeline("clinical_deidentification", "en", "clinical/models")

clinical_deidentification download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [None]:
deid_res = deid_pipeline.annotate("Record date : 2093-01-13 , David Hale , M.D .  Name : Hendrickson , Ora MR 25 years-old . # 719435 Date : 01/13/93 . Signed by Oliveira Sander . Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street. Phone 302-786-5227.")

In [None]:
deid_res.keys()

dict_keys(['masked', 'obfuscated', 'ner_chunk', 'masked_fixed_length_chars', 'sentence', 'masked_with_chars'])

In [None]:
pd.set_option("display.max_colwidth", 100)

df= pd.DataFrame(list(zip(deid_res["sentence"],
                          deid_res["masked"],
                          deid_res["masked_with_chars"],
                          deid_res["masked_fixed_length_chars"],
                          deid_res["obfuscated"])),
                 columns= ["Sentence", "Masked", "Masked with Chars", "Masked with Fixed Chars", "Obfuscated"])

df

Unnamed: 0,Sentence,Masked,Masked with Chars,Masked with Fixed Chars,Obfuscated
0,"Record date : 2093-01-13 , David Hale , M.D .","Record date : <DATE> , <DOCTOR> , M.D .","Record date : [********] , [********] , M.D .","Record date : **** , **** , M.D .","Record date : 2093-02-11 , Nobie Putnam , M.D ."
1,"Name : Hendrickson , Ora MR 25 years-old .",Name : <PATIENT> MR <AGE> years-old .,Name : [***************] MR ** years-old .,Name : **** MR **** years-old .,Name : Cecilie Lowers MR 37 years-old .
2,# 719435 Date : 01/13/93 .,# <MEDICALRECORD> Date : <DATE> .,# [****] Date : [******] .,# **** Date : **** .,# 112998 Date : 02/11/93 .
3,Signed by Oliveira Sander .,Signed by <DOCTOR> .,Signed by [*************] .,Signed by **** .,Signed by Bradley Ferris .
4,Record date : 2079-11-09 .,Record date : <DATE> .,Record date : [********] .,Record date : **** .,Record date : 2079-12-08 .
5,Cocke County Baptist Hospital . 0295 Keats Street.,<LOCATION>.,[***********************************************].,****.,701 Superior Ave.
6,Phone 302-786-5227.,Phone <PHONE>.,Phone [**********].,Phone ****.,Phone 733-031-2585.
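To make the three masked columns above easier to compare, here is a small plain-Python illustration of the output formats. It is not the pipeline's implementation: `mask_text` and the policy names are labels chosen for this sketch, and the rule that spans of one or two characters get stars without brackets is inferred from the `25` → `**` example in the table. Obfuscation (replacing PHI with realistic fake values) is not covered.

```python
def mask_text(text, spans, policy):
    """Replace each (begin, end_exclusive, label) span according to policy."""
    out = []
    cursor = 0
    for begin, end, label in sorted(spans):
        out.append(text[cursor:begin])
        length = end - begin
        if policy == "entity_labels":
            replacement = f"<{label}>"
        elif policy == "same_length_chars":
            # Brackets count toward the length; very short spans get stars only.
            if length > 2:
                replacement = "[" + "*" * (length - 2) + "]"
            else:
                replacement = "*" * length
        elif policy == "fixed_length_chars":
            replacement = "****"
        else:
            raise ValueError(f"unknown policy: {policy}")
        out.append(replacement)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


sentence = "Record date : 2093-01-13 , David Hale , M.D ."
spans = []
for surface, label in [("2093-01-13", "DATE"), ("David Hale", "DOCTOR")]:
    start = sentence.index(surface)
    spans.append((start, start + len(surface), label))

masked = mask_text(sentence, spans, "entity_labels")
masked_chars = mask_text(sentence, spans, "same_length_chars")
masked_fixed = mask_text(sentence, spans, "fixed_length_chars")
print(masked)
print(masked_chars)
print(masked_fixed)
```

The three printed strings reproduce the first row of the table: entity labels keep the PHI type visible, same-length masking preserves character offsets, and fixed-length masking hides even the length of the original value.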


## NER Pipelines





**`NER pretrained` Model List**

|index|model|index|model|index|model|index|model|
|-----:|:-----|-----:|:-----|-----:|:-----|-----:|:-----|
| 1| [jsl_ner_wip_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_ner_wip_clinical_pipeline_en_3_0.html)  | 2| [jsl_ner_wip_greedy_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_ner_wip_greedy_biobert_pipeline_en_3_0.html)  | 3| [jsl_ner_wip_greedy_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_ner_wip_greedy_clinical_pipeline_en_3_0.html)  | 4| [jsl_ner_wip_modifier_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_ner_wip_modifier_clinical_pipeline_en_3_0.html)  |
| 5| [jsl_rd_ner_wip_greedy_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_rd_ner_wip_greedy_biobert_pipeline_en_3_0.html)  | 6| [jsl_rd_ner_wip_greedy_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_rd_ner_wip_greedy_clinical_pipeline_en_3_0.html)  | 7| [ner_abbreviation_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_abbreviation_clinical_pipeline_en_3_0.html)  | 8| [ner_ade_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_ade_biobert_pipeline_en_3_0.html)  |
| 9| [ner_ade_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_ade_clinical_pipeline_en_3_0.html)  | 10| [ner_ade_clinicalbert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_ade_clinicalbert_pipeline_en_3_0.html)  | 11| [ner_ade_healthcare_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_ade_healthcare_pipeline_en_3_0.html)  | 12| [ner_anatomy_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_anatomy_biobert_pipeline_en_3_0.html)  |
| 13| [ner_anatomy_coarse_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_anatomy_coarse_biobert_pipeline_en_3_0.html)  | 14| [ner_anatomy_coarse_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_anatomy_coarse_pipeline_en_3_0.html)  | 15| [ner_anatomy_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_anatomy_pipeline_en_3_0.html)  | 16| [ner_bacterial_species_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_bacterial_species_pipeline_en_3_0.html)  |
| 17| [ner_biomarker_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_biomarker_pipeline_en_3_0.html)  | 18| [ner_biomedical_bc2gm_pipeline](https://nlp.johnsnowlabs.com/2022/06/22/ner_biomedical_bc2gm_pipeline_en_3_0.html)  | 19| [ner_bionlp_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_bionlp_biobert_pipeline_en_3_0.html)  | 20| [ner_bionlp_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_bionlp_pipeline_en_3_0.html)  |
| 21| [ner_cancer_genetics_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_cancer_genetics_pipeline_en_3_0.html)  | 22| [ner_cellular_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_cellular_biobert_pipeline_en_3_0.html)  | 23| [ner_cellular_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_cellular_pipeline_en_3_0.html)  | 24| [ner_chemicals_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_chemicals_pipeline_en_3_0.html)  |
| 25| [ner_chemprot_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_chemprot_biobert_pipeline_en_3_0.html)  | 26| [ner_chemprot_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_chemprot_clinical_pipeline_en_3_0.html)  | 27| [ner_chexpert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_chexpert_pipeline_en_3_0.html)  | 28| [ner_clinical_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_clinical_biobert_pipeline_en_3_0.html)  |
| 29| [ner_clinical_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_clinical_large_pipeline_en_3_0.html)  | 30| [ner_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_clinical_pipeline_en_3_0.html)  | 31| [ner_clinical_trials_abstracts_pipeline](https://nlp.johnsnowlabs.com/2022/06/27/ner_clinical_trials_abstracts_pipeline_en_3_0.html)  | 32| [ner_diseases_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_diseases_biobert_pipeline_en_3_0.html)  |
| 33| [ner_diseases_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_diseases_large_pipeline_en_3_0.html)  | 34| [ner_diseases_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_diseases_pipeline_en_3_0.html)  | 35| [ner_drugprot_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_drugprot_clinical_pipeline_en_3_0.html)  | 36| [ner_drugs_greedy_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_drugs_greedy_pipeline_en_3_0.html)  |
| 37| [ner_drugs_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_drugs_large_pipeline_en_3_0.html)  | 38| [ner_drugs_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_drugs_pipeline_en_3_0.html)  | 39| [ner_events_admission_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_events_admission_clinical_pipeline_en_3_0.html)  | 40| [ner_events_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_events_biobert_pipeline_en_3_0.html)  |
| 41| [ner_events_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_events_clinical_pipeline_en_3_0.html)  | 42| [ner_events_healthcare_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_events_healthcare_pipeline_en_3_0.html)  | 43| [ner_genetic_variants_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_genetic_variants_pipeline_en_3_0.html)  | 44| [ner_healthcare_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_healthcare_pipeline_en_3_0.html)  |
| 45| [ner_human_phenotype_gene_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_human_phenotype_gene_biobert_pipeline_en_3_0.html)  | 46| [ner_human_phenotype_gene_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_human_phenotype_gene_clinical_pipeline_en_3_0.html)  | 47| [ner_human_phenotype_go_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_human_phenotype_go_biobert_pipeline_en_3_0.html)  | 48| [ner_human_phenotype_go_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_human_phenotype_go_clinical_pipeline_en_3_0.html)  |
| 49| [ner_jsl_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_biobert_pipeline_en_3_0.html)  | 50| [ner_jsl_enriched_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_enriched_biobert_pipeline_en_3_0.html)  | 51| [ner_jsl_enriched_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_enriched_pipeline_en_3_0.html)  | 52| [ner_jsl_greedy_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_greedy_biobert_pipeline_en_3_0.html)  |
| 53| [ner_jsl_greedy_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_greedy_pipeline_en_3_0.html)  | 54| [ner_jsl_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_pipeline_en_3_0.html)  | 55| [ner_jsl_slim_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_slim_pipeline_en_3_0.html)  | 56| [ner_measurements_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_measurements_clinical_pipeline_en_3_0.html)  |
| 57| [ner_medmentions_coarse_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_medmentions_coarse_pipeline_en_3_0.html)  | 58| [ner_nihss_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_nihss_pipeline_en_3_0.html)  | 59| [ner_pathogen_pipeline](https://nlp.johnsnowlabs.com/2022/06/29/ner_pathogen_pipeline_en_3_0.html)  | 60| [ner_posology_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_posology_biobert_pipeline_en_3_0.html)  |
| 61| [ner_posology_experimental_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_posology_experimental_pipeline_en_3_0.html)  | 62| [ner_posology_greedy_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_posology_greedy_pipeline_en_3_0.html)  | 63| [ner_posology_healthcare_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_posology_healthcare_pipeline_en_3_0.html)  | 64| [ner_posology_large_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_posology_large_biobert_pipeline_en_3_0.html)  |
| 65| [ner_posology_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_posology_large_pipeline_en_3_0.html)  | 66| [ner_posology_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_posology_pipeline_en_3_0.html)  | 67| [ner_posology_small_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_posology_small_pipeline_en_3_0.html)  | 68| [ner_radiology_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_radiology_pipeline_en_3_0.html)  |
| 69| [ner_radiology_wip_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_radiology_wip_clinical_pipeline_en_3_0.html)  | 70| [ner_risk_factors_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_risk_factors_biobert_pipeline_en_3_0.html)  | 71| [ner_risk_factors_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_risk_factors_pipeline_en_3_0.html)  | 72| [ner_medication_pipeline](https://nlp.johnsnowlabs.com/2022/07/26/ner_medication_pipeline_en_3_0.html)|


**Let's show an example of `ner_jsl_pipeline`, which can label clinical entities with about 80 different labels.**

In [None]:
ner_pipeline = nlp.PretrainedPipeline('ner_jsl_pipeline', 'en', 'clinical/models')

ner_jsl_pipeline download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [None]:
ner_pipeline.model.stages

[DocumentAssembler_cefadf0e0f93,
 SentenceDetectorDLModel_c83c27f46b97,
 REGEX_TOKENIZER_a731b3529dca,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_c89cbceb1028,
 NER_CONVERTER_42a801d9e143]

In [None]:
text = """A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation , associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting .
Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection . She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . She had been on dapagliflozin for six months at the time of presentation . Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity .
Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 . Serum lipase was normal at 43 U/L . Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia .
The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission . However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L .
The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again . The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours .
Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use . The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day .
It was determined that all SGLT2 inhibitors should be discontinued indefinitely . She had close follow-up with endocrinology post discharge ."""

greedy_result = ner_pipeline.fullAnnotate(text)[0]

In [None]:
greedy_result.keys()

dict_keys(['document', 'ner_chunk', 'token', 'ner', 'embeddings', 'sentence'])

In [None]:
chunks=[]
entities=[]
begins=[]
ends=[]

for n in greedy_result['ner_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,28-year-old,2,12,Age
1,female,14,19,Gender
2,gestational diabetes mellitus,39,67,Diabetes
3,eight years prior,79,95,RelativeDate
4,type two diabetes mellitus,128,153,Diabetes
...,...,...,...,...
116,two times a day,2357,2371,Frequency
117,SGLT2 inhibitors,2402,2417,Drug_Ingredient
118,She,2457,2459,Gender
119,endocrinology,2486,2498,Clinical_Dept


## BERT-Based NER Pipelines


**`bert token classification pretrained` Model List**


|index|model|index|model|
|-----:|:-----|-----:|:-----|
| 1| [bert_token_classifier_drug_development_trials_pipeline](https://nlp.johnsnowlabs.com/2022/03/23/bert_token_classifier_drug_development_trials_pipeline_en_3_0.html)  | 8| [bert_token_classifier_ner_chemprot_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_chemprot_pipeline_en_3_0.html)  |
| 2| [bert_token_classifier_ner_ade_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_ade_pipeline_en_3_0.html)  | 9| [bert_token_classifier_ner_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_clinical_pipeline_en_3_0.html)  |
| 3| [bert_token_classifier_ner_anatomy_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_anatomy_pipeline_en_3_0.html)  | 10| [bert_token_classifier_ner_deid_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_deid_pipeline_en_3_0.html)  |
| 4| [bert_token_classifier_ner_bacteria_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_bacteria_pipeline_en_3_0.html)  | 11| [bert_token_classifier_ner_drugs_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_drugs_pipeline_en_3_0.html)  |
| 5| [bert_token_classifier_ner_bionlp_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_bionlp_pipeline_en_3_0.html)  | 12| [bert_token_classifier_ner_jsl_pipeline](https://nlp.johnsnowlabs.com/2022/03/23/bert_token_classifier_ner_jsl_pipeline_en_3_0.html)  |
| 6| [bert_token_classifier_ner_cellular_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_cellular_pipeline_en_3_0.html)  | 13| [bert_token_classifier_ner_jsl_slim_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_jsl_slim_pipeline_en_3_0.html)  |
| 7| [bert_token_classifier_ner_chemicals_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_chemicals_pipeline_en_3_0.html)  |


**Let's show an example of `bert_token_classifier_ner_drugs_pipeline`, which extracts drug entities (labeled `DrugChem`) from clinical texts.**

In [None]:
bert_token_pipeline = nlp.PretrainedPipeline("bert_token_classifier_ner_drugs_pipeline", "en", "clinical/models")

bert_token_classifier_ner_drugs_pipeline download started this may take some time.
Approx size to download 386.1 MB
[OK!]


In [None]:
bert_token_pipeline.model.stages

[DocumentAssembler_fbb1736f8270,
 SentenceDetectorDLModel_8aaebf7e098e,
 REGEX_TOKENIZER_204ede018690,
 BERT_FOR_TOKEN_CLASSIFICATION_3fa6213c0542,
 NER_CONVERTER_70b935c1d6d8]

In [None]:
test_sentence = """The human KCNJ9 (Kir 3.3, GIRK3) is a member of the G-protein-activated inwardly rectifying potassium (GIRK) channel family. Here we describe the genomicorganization of the KCNJ9 locus on chromosome 1q21-23 as a candidate gene forType II diabetes mellitus in the Pima Indian population. The gene spansapproximately 7.6 kb and contains one noncoding and two coding exons separated byapproximately 2.2 and approximately 2.6 kb introns, respectively. We identified14 single nucleotide polymorphisms (SNPs), including one that predicts aVal366Ala substitution, and an 8 base-pair (bp) insertion/deletion. Ourexpression studies revealed the presence of the transcript in various humantissues including pancreas, and two major insulin-responsive tissues: fat andskeletal muscle. The characterization of the KCNJ9 gene should facilitate furtherstudies on the function of the KCNJ9 protein and allow evaluation of thepotential role of the locus in Type II diabetes.BACKGROUND: At present, it is one of the most important issues for the treatment of breast cancer to develop the standard therapy for patients previously treated with anthracyclines and taxanes. With the objective of determining the usefulnessof vinorelbine monotherapy in patients with advanced or recurrent breast cancerafter standard therapy, we evaluated the efficacy and safety of vinorelbine inpatients previously treated with anthracyclines and taxanes."""

bert_result = bert_token_pipeline.fullAnnotate(test_sentence)[0]

In [None]:
bert_result.keys()

dict_keys(['document', 'ner_chunk', 'token', 'ner', 'sentence'])

In [None]:
bert_result["ner_chunk"][0]

Annotation(chunk, 92, 100, potassium, {'chunk': '0', 'confidence': '0.9902544', 'ner_source': 'ner_chunk', 'entity': 'DrugChem', 'sentence': '0'}, [])

In [None]:

chunks=[]
entities=[]
begins=[]
ends=[]

for n in bert_result['ner_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,potassium,92,100,DrugChem
1,nucleotide,471,480,DrugChem
2,anthracyclines,1124,1137,DrugChem
3,taxanes,1143,1149,DrugChem
4,vinorelbine,1203,1213,DrugChem
5,vinorelbine,1343,1353,DrugChem
6,anthracyclines,1390,1403,DrugChem
7,taxanes,1409,1415,DrugChem
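Several drugs in the table above are mentioned more than once. As a small follow-up sketch, the mentions can be collapsed into counts per distinct chunk. `count_mentions` is a hypothetical helper, and the pairs are copied from the table so the snippet runs on its own.

```python
from collections import Counter


def count_mentions(pairs):
    """Count how often each (chunk, entity) pair occurs, ignoring case."""
    return Counter((chunk.lower(), entity) for chunk, entity in pairs)


# (chunk, entity) pairs copied from the table above.
demo_pairs = [
    ("potassium", "DrugChem"), ("nucleotide", "DrugChem"),
    ("anthracyclines", "DrugChem"), ("taxanes", "DrugChem"),
    ("vinorelbine", "DrugChem"), ("vinorelbine", "DrugChem"),
    ("anthracyclines", "DrugChem"), ("taxanes", "DrugChem"),
]

mention_counts = count_mentions(demo_pairs)
for (chunk, entity), n in mention_counts.most_common():
    print(chunk, entity, n)
```

With the DataFrame from the previous cell, the same counts can be obtained with `count_mentions(zip(df['chunks'], df['entities']))`.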


## NER Profiling Pipelines

We can use pretrained NER profiling pipelines to explore all the available pretrained NER models at once. Spark NLP provides two NER profiling pipelines:

- `ner_profiling_clinical` : Returns results for clinical NER models trained with `embeddings_clinical`.
- `ner_profiling_biobert` : Returns results for clinical NER models trained with `biobert_pubmed_base_cased`.

For more examples, please check [this notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings_JSL/Healthcare/11.2.Pretrained_NER_Profiling_Pipelines.ipynb).





<center> <b>NER Profiling Clinical Model List</b>

|| | | |
|--------------|-----------------|-----------------|-----------------|
| jsl_ner_wip_clinical | jsl_ner_wip_greedy_clinical | jsl_ner_wip_modifier_clinical | jsl_rd_ner_wip_greedy_clinical |
| ner_abbreviation_clinical | ner_ade_binary | ner_ade_clinical | ner_anatomy |
| ner_anatomy_coarse | ner_bacterial_species | ner_biomarker | ner_biomedical_bc2gm |
| ner_bionlp | ner_cancer_genetics | ner_cellular | ner_chemd_clinical |
| ner_chemicals | ner_chemprot_clinical | ner_chexpert | ner_clinical |
| ner_clinical_large | ner_clinical_trials_abstracts | ner_covid_trials | ner_deid_augmented |
| ner_deid_enriched | ner_deid_generic_augmented | ner_deid_large | ner_deid_sd |
| ner_deid_sd_large | ner_deid_subentity_augmented | ner_deid_subentity_augmented_i2b2 | ner_deid_synthetic |
| ner_deidentify_dl | ner_diseases | ner_diseases_large | ner_drugprot_clinical |
| ner_drugs | ner_drugs_greedy | ner_drugs_large | ner_events_admission_clinical |
| ner_events_clinical | ner_genetic_variants | ner_human_phenotype_gene_clinical | ner_human_phenotype_go_clinical |
| ner_jsl | ner_jsl_enriched | ner_jsl_greedy | ner_jsl_slim |
| ner_living_species | ner_measurements_clinical | ner_medmentions_coarse | ner_nature_nero_clinical |
| ner_nihss | ner_pathogen | ner_posology | ner_posology_experimental |
| ner_posology_greedy | ner_posology_large | ner_posology_small | ner_radiology |
| ner_radiology_wip_clinical | ner_risk_factors | ner_supplement_clinical | nerdl_tumour_demo |

<b>NER Profiling BioBert Model List</b>

| | |
|-|-|
| ner_cellular_biobert           | ner_clinical_biobert             |
| ner_diseases_biobert           | ner_anatomy_coarse_biobert       |
| ner_events_biobert             | ner_human_phenotype_gene_biobert |
| ner_bionlp_biobert             | ner_posology_large_biobert       |
| ner_jsl_greedy_biobert         | jsl_rd_ner_wip_greedy_biobert    |
| ner_jsl_biobert                | ner_posology_biobert             |
| ner_anatomy_biobert            | jsl_ner_wip_greedy_biobert       |
| ner_jsl_enriched_biobert       | ner_chemprot_biobert             |
| ner_human_phenotype_go_biobert | ner_ade_biobert                  |
| ner_deid_biobert               | ner_risk_factors_biobert         |
| ner_deid_enriched_biobert      | ner_living_species_biobert       |


</center>

You can check the [Models Hub](https://nlp.johnsnowlabs.com/models) page for more information about these and other models.

In [None]:
clinical_profiling_pipeline = nlp.PretrainedPipeline("ner_profiling_clinical", "en", "clinical/models")

ner_profiling_clinical download started this may take some time.
Approx size to download 2.9 GB
[OK!]


In [None]:
text = '''A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation , associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting .'''

In [None]:
clinical_result = clinical_profiling_pipeline.fullAnnotate(text)[0]
clinical_result.keys()

dict_keys(['ner_vop_problem_wip_chunks', 'ner_sdoh_demographics_wip_chunks', 'ner_ade_clinical_chunks', 'ner_deid_augmented', 'ner_deid_subentity_augmented_i2b2', 'ner_posology_greedy_chunks', 'ner_vop_demographic_wip_chunks', 'ner_sdoh_community_condition_wip_chunks', 'ner_radiology_wip_clinical', 'ner_vop_slim_wip', 'ner_oncology_diagnosis_chunks', 'ner_vop_test_wip_chunks', 'ner_jsl_slim', 'ner_vop_wip', 'ner_risk_factors_chunks', 'jsl_ner_wip_clinical_chunks', 'ner_oncology_unspecific_posology_chunks', 'ner_vop_problem_reduced_wip', 'ner_vop_anatomy_wip', 'ner_vop_problem_reduced_wip_chunks', 'ner_eu_clinical_case_chunks', 'ner_oncology_test_chunks', 'ner_deid_synthetic', 'ner_oncology_posology_chunks', 'ner_oncology_tnm_chunks', 'ner_oncology_anatomy_general_chunks', 'ner_drugs_greedy', 'ner_abbreviation_clinical_chunks', 'ner_covid_trials_chunks', 'ner_human_phenotype_gene_clinical_chunks', 'ner_events_admission_clinical', 'jsl_ner_wip_greedy_clinical_chunks', 'ner_posology_greed

In [None]:
import pandas as pd

def get_token_results(light_result):
  """Build a token-level DataFrame with one label column per NER model."""

  tokens = [j.result for j in light_result["token"]]
  sentences = [j.metadata["sentence"] for j in light_result["token"]]
  begins = [j.begin for j in light_result["token"]]
  ends = [j.end for j in light_result["token"]]
  # Keep only the token-level label outputs; skip sentence, token, and chunk outputs.
  model_list = [ a for a in light_result.keys() if (a not in ["sentence", "token"] and "_chunks" not in a)]

  df = pd.DataFrame({'sentence':sentences, 'begin': begins, 'end': ends, 'token':tokens})

  for model_name in model_list:

    # Wrap the model's annotations in a DataFrame and extract each predicted label.
    temp_df = pd.DataFrame(light_result[model_name])
    temp_df["jsl_label"] = temp_df.iloc[:,0].apply(lambda x : x.result)
    temp_df = temp_df[["jsl_label"]]

    # Name the column after the model and append it to the token table.
    temp_df.columns = [model_name]
    df = pd.concat([df, temp_df], axis=1)

  return df

In [None]:
get_token_results(clinical_result)

Unnamed: 0,sentence,begin,end,token,ner_deid_augmented,ner_deid_subentity_augmented_i2b2,ner_radiology_wip_clinical,ner_vop_slim_wip,ner_jsl_slim,ner_vop_wip,...,ner_events_clinical,ner_supplement_clinical,ner_oncology_unspecific_posology,ner_genetic_variants,ner_sdoh_wip,ner_radiology,ner_eu_clinical_case,ner_posology,ner_oncology_posology,ner_covid_trials
0,0,0,0,A,O,O,O,O,O,O,...,O,O,O,O,O,O,B-patient,O,O,O
1,0,2,12,28-year-old,O,O,O,O,B-Age,B-Age,...,O,O,O,O,B-Gender,O,I-patient,O,O,B-Gender
2,0,14,19,female,O,O,O,B-Gender,B-Demographics,B-Gender,...,O,O,O,O,B-Gender,O,I-patient,O,O,B-Gender
3,0,21,24,with,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
4,0,26,26,a,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,0,409,416,appetite,O,O,I-Symptom,I-Symptom,I-Symptom,I-Symptom,...,I-PROBLEM,O,O,O,O,I-Symptom,I-clinical_condition,O,O,O
69,0,418,418,",",O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
70,0,420,422,and,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
71,0,424,431,vomiting,O,O,B-Symptom,B-Symptom,B-Symptom,B-Symptom,...,B-PROBLEM,B-CONDITION,O,O,O,B-Symptom,B-clinical_condition,O,O,O
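
With dozens of model columns, most rows are all `O`. The following is a minimal sketch, not part of the original notebook, of how such a table could be narrowed to tokens that at least one model tagged, and then to tokens on which the models disagree. The small `token_df` is rebuilt by hand from a few rows and two columns of the output above; the names `meta_cols`, `tagged`, and `disagree` are illustrative.

```python
import pandas as pd

# Hand-copied subset of the table above (two model columns only).
token_df = pd.DataFrame({
    "sentence": ["0", "0", "0", "0"],
    "begin": [0, 2, 14, 21],
    "end": [0, 12, 19, 24],
    "token": ["A", "28-year-old", "female", "with"],
    "ner_jsl_slim": ["O", "B-Age", "B-Demographics", "O"],
    "ner_vop_wip": ["O", "B-Age", "B-Gender", "O"],
})

meta_cols = ["sentence", "begin", "end", "token"]
model_cols = [c for c in token_df.columns if c not in meta_cols]

# Tokens that received a non-"O" tag from at least one model.
tagged = token_df[(token_df[model_cols] != "O").any(axis=1)]

# Among those, tokens where the models did not all predict the same tag.
disagree = tagged[tagged[model_cols].nunique(axis=1) > 1]

print(tagged[["token"] + model_cols])
print(disagree[["token"] + model_cols])
```

The same two filters should apply unchanged to the full frame returned by `get_token_results(clinical_result)`, since they rely only on the column layout.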


## NER Model Finder Pretrained Pipeline
The `ner_model_finder` pretrained pipeline is trained with BERT embeddings and can be used to find the most appropriate NER model for a given entity name.

In [None]:
finder_pipeline = nlp.PretrainedPipeline("ner_model_finder", "en", "clinical/models")

ner_model_finder download started this may take some time.
Approx size to download 148.7 MB
[OK!]


In [None]:
result = finder_pipeline.fullAnnotate("oncology")[0]
result.keys()

dict_keys(['model_names'])

The metadata of the `model_names` annotation lists the top models for the given entity (`oncology` here), along with the top models for related categories.

In [None]:
df= pd.DataFrame(zip(result["model_names"][0].metadata["all_k_resolutions"].split(":::"),
                     result["model_names"][0].metadata["all_k_results"].split(":::")),
                 columns=["category", "top_models"])

In [None]:
df.head()

Unnamed: 0,category,top_models
0,oncology therapy,"['ner_jsl', 'jsl_rd_ner_wip_greedy_clinical', 'jsl_ner_wip_modifier_clinical', 'ner_jsl_enriched..."
1,clinical department,"['ner_jsl', 'jsl_rd_ner_wip_greedy_clinical', 'jsl_ner_wip_modifier_clinical', 'ner_events_clini..."
2,biomedical unit,['ner_clinical_trials_abstracts']
3,cancer genetics,['ner_cancer_genetics']
4,anatomy,"['ner_bionlp', 'ner_medmentions_coarse', 'ner_chexpert', 'ner_anatomy_coarse', 'ner_anatomy', 'n..."
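
The `top_models` values above are printed as strings that look like Python list literals. Assuming that is how they are stored, a sketch like the one below could turn them into real lists and produce one row per model. The frame is rebuilt by hand from the two untruncated rows of the output; `finder_df` and `exploded` are illustrative names.

```python
import ast
import pandas as pd

# Hand-copied from the complete rows of the output above.
finder_df = pd.DataFrame({
    "category": ["biomedical unit", "cancer genetics"],
    "top_models": ["['ner_clinical_trials_abstracts']", "['ner_cancer_genetics']"],
})

# Assumption: each value is a string holding a Python list literal.
finder_df["top_models"] = finder_df["top_models"].apply(ast.literal_eval)

# One row per (category, model) pair, which is easier to filter or join.
exploded = finder_df.explode("top_models").reset_index(drop=True)
print(exploded)
```

`ast.literal_eval` is used instead of `eval` because it only accepts literals and does not execute arbitrary code.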


## Resolver Pipelines

**Resolver pipelines** map clinical entities in free text to standard terminology codes such as UMLS CUI, RxNorm, ICD-10-CM, and CVX. You pass in raw text, and the pipeline returns the extracted entities together with their corresponding codes.



**Resolver Pipelines Model List:**



| Pipeline Name                                                                                                                            | Entity                  | Target   |
|------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|----------|
| [umls_drug_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/26/umls_drug_resolver_pipeline_en_3_0.html)                           | Drugs                   | UMLS CUI |
| [umls_clinical_findings_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/26/umls_clinical_findings_resolver_pipeline_en_3_0.html) | Clinical Findings       | UMLS CUI |
| [umls_disease_syndrome_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/26/umls_disease_syndrome_resolver_pipeline_en_3_0.html)   | Disease and Syndromes   | UMLS CUI |
| [umls_major_concepts_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/25/umls_major_concepts_resolver_pipeline_en_3_0.html)       | Clinical Major Concepts | UMLS CUI |
| [umls_drug_substance_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/25/umls_drug_substance_resolver_pipeline_en_3_0.html)       | Drug Substances         | UMLS CUI |
| [medication_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/09/01/medication_resolver_pipeline_en.html)                           | Drugs                   | RxNorm, UMLS, NDC, SNOMED CT |
| [medication_resolver_transform_pipeline](https://nlp.johnsnowlabs.com/2022/09/01/medication_resolver_transform_pipeline_en.html)                           | Drugs                   | RxNorm, UMLS, NDC, SNOMED CT |
| [icd9_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/09/30/icd9_resolver_pipeline_en.html)                           | PROBLEM                   | ICD-9-CM |
| [cvx_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/10/12/cvx_resolver_pipeline_en.html)                           | Vaccine                   | CVX |
| [icd10cm_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/11/02/icd10cm_resolver_pipeline_en.html)                           | PROBLEM                     | ICD-10-CM |
| [abbreviation_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/abbreviation_pipeline_en.html) | ABBR | Definitions and categories |
| [icd10cm_multi_mapper_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/icd10cm_multi_mapper_pipeline_en.html) | ICD-10-CM | Billable mappings, HCC codes, cause mappings, claim mappings, SNOMED codes, UMLS codes and ICD-9 codes |
| [rxnorm_multi_mapper_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/rxnorm_multi_mapper_pipeline_en.html) | RxNorm codes | Drug brand names, RxNorm extension brand names, action mappings, treatment mappings, UMLS codes, NDC product codes and NDC package codes |
| [rxnorm_resolver_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/rxnorm_resolver_pipeline_en.html) | Drug | RxNorm codes |
| [snomed_multi_mapper_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/snomed_multi_mapper_pipeline_en.html) | SNOMED codes | ICD-10, ICD-O, UMLS |


### umls_clinical_findings_resolver_pipeline

In [None]:
pipeline= nlp.PretrainedPipeline("umls_clinical_findings_resolver_pipeline", "en", "clinical/models")
result= pipeline.fullAnnotate("HTG-induced pancreatitis associated with an acute hepatitis, and obesity")[0]

umls_clinical_findings_resolver_pipeline download started this may take some time.
Approx size to download 4.1 GB
[OK!]


In [None]:
chunks=[]
entities=[]
resolver= []

for n, m in list(zip(result['chunk'], result["umls"])):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    resolver.append(m.result)

df = pd.DataFrame({'chunks':chunks, 'ner_label':entities, 'umls_code': resolver})

df

Unnamed: 0,chunks,ner_label,umls_code
0,HTG-induced pancreatitis,PROBLEM,C1963198
1,an acute hepatitis,PROBLEM,C4750596
2,obesity,PROBLEM,C1963185


### medication_resolver_pipeline

> A pretrained resolver pipeline to extract medications and resolve their adverse reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT codes, and action/treatments in clinical text.

> Action/treatments are available for branded medication, and SNOMED codes are available for non-branded medication.

> This pipeline can be used as a LightPipeline (with `annotate`/`fullAnnotate`). For Spark `transform`, use `medication_resolver_transform_pipeline` instead.

In [None]:
med_resolver_pipeline = nlp.PretrainedPipeline("medication_resolver_pipeline", "en", "clinical/models")

medication_resolver_pipeline download started this may take some time.
Approx size to download 2.9 GB
[OK!]


In [None]:
med_resolver_pipeline.model.stages

[DocumentAssembler_681a8b875c27,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_40adfa118221,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_419f5f2e48fa,
 NER_CONVERTER_6048b586aea4,
 ENTITY_EXTRACTOR_de22d49b574d,
 MERGE_ff522a97d806,
 CHUNKER-MAPPER_7316b33a7307,
 CHUNKER-MAPPER_f4e3b6461abe,
 ChunkMapperFilterer_84c54e6ec474,
 Chunk2Doc_471d0b2055b3,
 BERT_SENTENCE_EMBEDDINGS_0bee53f1b2cc,
 ENTITY_6ce17d64b14a,
 ResolverMerger_c81324f9d40e,
 ResolverMerger_51b98e724b7a,
 CHUNKER-MAPPER_4773f87b7a2f,
 CHUNKER-MAPPER_4773f87b7a2f,
 CHUNKER-MAPPER_5ac1d410c34d,
 CHUNKER-MAPPER_aa4297e14fc3,
 CHUNKER-MAPPER_2d7b0e176787,
 CHUNKER-MAPPER_2d7b0e176787,
 Finisher_ae4620b3f966]

In [None]:
text = """The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera. The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet."""

result = med_resolver_pipeline.fullAnnotate(text)[0]

In [None]:
result.keys()

dict_keys(['ner_chunk', 'NDC_Package', 'SNOMED_CT', 'RxNorm_Chunk', 'UMLS', 'Treatment', 'NDC_Product', 'ADE', 'Action', 'sentence'])

In [None]:
chunks = []
entities = []
ndc_package = []
ndc_product = []
snomed = []
rxnorm = []
umls = []
treatment = []
ade = []
action = []



for a, b, c, d, e, f, g, h, j in zip(result['ner_chunk'], result['NDC_Package'],
                                     result['SNOMED_CT'], result['RxNorm_Chunk'],
                                     result['UMLS'], result['Treatment'],
                                     result['NDC_Product'], result['ADE'],
                                     result['Action']
                                    ):

    chunks.append(a.result)
    entities.append(a.metadata['entity'])
    ndc_package.append(b.result)
    snomed.append(c.result)
    rxnorm.append(d.result)
    umls.append(e.result)
    treatment.append(f.result)
    ndc_product.append(g.result)
    ade.append(h.result)
    action.append(j.result)


df = pd.DataFrame({'chunks':chunks, 'label':entities, 'Treatment':treatment, 'ADE':ade, 'Action':action,
                   'snomed':snomed, 'rxnorm':rxnorm, 'umls': umls, 'NDC_package':ndc_package, 'NDC_Product':ndc_product})

df

Unnamed: 0,chunks,label,Treatment,ADE,Action,snomed,rxnorm,umls,NDC_package,NDC_Product
0,Amlodopine Vallarta 10-320mg,DRUG,NONE,Gynaecomastia,NONE,425838008,722131,C1949334,00093-7693-56,00093-7693
1,Eviplera,DRUG,Osteoporosis,Anxiety,Inhibitory Bone Resorption,NONE,217010,C0720318,NONE,NONE
2,Lescol 40 MG,DRUG,Heterozygous Familial Hypercholesterolemia,NONE,Hypocholesterolemic,NONE,103919,C0353573,00078-0234-05,00078-0234
3,Everolimus 1.5 mg tablet,DRUG,NONE,Acute myocardial infarction,NONE,NONE,2056895,C4723581,00054-0604-21,00054-0604
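
In the table above, fields without a mapping appear as the string `NONE`. If missing-value handling is needed downstream, one option (a sketch, not something the notebook does) is to turn those placeholders into real missing values so that `isna()` and `dropna()` work. The frame below is rebuilt by hand from a few columns of the output; `med_df` and `med_clean` are illustrative names.

```python
import pandas as pd

# Hand-copied subset of the medication table above.
med_df = pd.DataFrame({
    "chunks": ["Amlodopine Vallarta 10-320mg", "Eviplera",
               "Lescol 40 MG", "Everolimus 1.5 mg tablet"],
    "Treatment": ["NONE", "Osteoporosis",
                  "Heterozygous Familial Hypercholesterolemia", "NONE"],
    "snomed": ["425838008", "NONE", "NONE", "NONE"],
    "NDC_Product": ["00093-7693", "NONE", "00078-0234", "00054-0604"],
})

# Replace every "NONE" placeholder with NaN; other cells are left untouched.
med_clean = med_df.mask(med_df == "NONE")

# Count how many chunks lack a value in each column.
missing_per_column = med_clean.isna().sum()
print(missing_per_column)
```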


## Oncology Pipelines

**Oncology Pretrained Pipelines List**

These pipelines combine Named Entity Recognition, Assertion Status, Relation Extraction, and Entity Resolution models to extract information from oncology texts.


|index|model|index|model|
|-----:|:-----|-----:|:-----|
| 1| [oncology_biomarker_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_biomarker_pipeline_en.html)  | 2| [oncology_general_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_general_pipeline_en.html)  |
| 3| [oncology_therapy_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_therapy_pipeline_en.html)  | 4| [oncology_diagnosis_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_diagnosis_pipeline_en.html)  |

In [None]:
oncology_pipeline = nlp.PretrainedPipeline("oncology_biomarker_pipeline", "en", "clinical/models")

oncology_biomarker_pipeline download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [None]:
oncology_pipeline.model.stages

[DocumentAssembler_dab6eeac879e,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_1f483a1f8252,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_ecf280ca65e5,
 NerConverter_929f666beebc,
 MedicalNerModel_aeadb24f76a3,
 NerConverter_4f9b8da8c4c3,
 MedicalNerModel_eb9da4b9039b,
 NerConverter_40028785be7b,
 MedicalNerModel_299a97740594,
 NerConverter_a8bf552d0249,
 MERGE_acda18b976a6,
 MERGE_12af4df6fa27,
 ASSERTION_DL_8d77f383c928,
 ASSERTION_DL_163867728788,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 RelationExtractionModel_d0af74510daa,
 RelationExtractionModel_68ebe11369b6,
 RelationExtractionModel_513eb6317779]

In [None]:
text = """Immunohistochemistry was negative for thyroid transcription factor-1 and napsin A. The test was positive for ER and PR, and negative for HER2."""

result = oncology_pipeline.fullAnnotate(text)[0]

result.keys()

dict_keys(['re_oncology_granular_wip', 'assertion_oncology_test_binary_wip', 're_oncology_wip', 'ner_oncology_biomarker_wip_chunk', 'ner_biomarker_chunk', 'ner_oncology_test_wip', 'document', 're_oncology_biomarker_result_wip', 'merged_chunk', 'ner_oncology_biomarker_wip', 'ner_biomarker', 'ner_oncology_test_wip_chunk', 'token', 'embeddings', 'pos_tags', 'assertion_oncology_wip', 'assertion_chunk', 'ner_oncology_wip', 'dependencies', 'ner_oncology_wip_chunk', 'sentence'])

**NER Results**

In [None]:
chunks=[]
entities=[]
begins=[]
ends=[]
confidence=[]
for n in result['merged_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])
    confidence.append(n.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities, 'confidence':confidence})

df

Unnamed: 0,chunks,begin,end,entities,confidence
0,Immunohistochemistry,0,19,Pathology_Test,0.9967
1,negative,25,32,Biomarker_Result,0.8323
2,thyroid transcription factor-1,38,67,Biomarker,0.296675
3,napsin A,73,80,Biomarker,0.64309996
4,positive,96,103,Biomarker_Result,0.8017
5,ER,109,110,Biomarker,0.948
6,PR,116,117,Biomarker,0.8711
7,negative,124,131,Biomarker_Result,0.8385
8,HER2,137,140,Oncogene,0.9359
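
Annotation metadata values such as `confidence` are stored as strings (the raw annotation output later in this section shows them quoted), so they need a numeric conversion before comparison. The sketch below rebuilds the table above by hand and filters it with a cutoff; `THRESHOLD` is an arbitrary illustrative value, not a recommendation from the notebook.

```python
import pandas as pd

# Hand-copied from the NER table above; confidences kept as strings on purpose.
ner_df = pd.DataFrame({
    "chunks": ["Immunohistochemistry", "negative", "thyroid transcription factor-1",
               "napsin A", "positive", "ER", "PR", "negative", "HER2"],
    "entities": ["Pathology_Test", "Biomarker_Result", "Biomarker", "Biomarker",
                 "Biomarker_Result", "Biomarker", "Biomarker", "Biomarker_Result", "Oncogene"],
    "confidence": ["0.9967", "0.8323", "0.296675", "0.64309996", "0.8017",
                   "0.948", "0.8711", "0.8385", "0.9359"],
})

# Convert the string confidences to floats so numeric comparisons work.
ner_df["confidence"] = pd.to_numeric(ner_df["confidence"])

THRESHOLD = 0.8  # hypothetical cutoff, chosen only for illustration
confident = ner_df[ner_df["confidence"] >= THRESHOLD]
low_confidence = ner_df[ner_df["confidence"] < THRESHOLD]

print(low_confidence)
```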


**Assertion Status Results**

In [None]:
chunks=[]
entities=[]
status=[]
confidence=[]

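# zip() pairs annotations by position and stops at the shorter list,
# so this table can have fewer rows than result['merged_chunk'].
for n,m in zip(result['merged_chunk'],result['assertion_oncology_test_binary_wip']):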

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)
    confidence.append(m.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status, 'confidence':confidence})

df

Unnamed: 0,chunks,entities,assertion,confidence
0,Immunohistochemistry,Pathology_Test,Medical_History,0.9926
1,negative,Biomarker_Result,Medical_History,0.9951
2,thyroid transcription factor-1,Biomarker,Medical_History,0.9951
3,napsin A,Biomarker,Medical_History,0.9926
4,positive,Biomarker_Result,Medical_History,0.9931
5,ER,Biomarker,Medical_History,0.9938
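
The assertion table has 6 rows while `merged_chunk` produced 9 chunks, because `zip` pairs items by position and stops at the shorter list. If the two lists cover different chunks, pairing by position can also attach an assertion to the wrong chunk. The sketch below shows one way to pair them by character span instead. It assumes each assertion annotation carries the `begin`/`end` offsets of the chunk it describes. `Ann` is a stand-in for the library's annotation objects, and the assertion values are invented for illustration only.

```python
from collections import namedtuple
import pandas as pd

# Stand-in for an annotation object (hypothetical, for a self-contained example).
Ann = namedtuple("Ann", ["begin", "end", "result", "metadata"])

chunks = [
    Ann(0, 19, "Immunohistochemistry", {"entity": "Pathology_Test"}),
    Ann(25, 32, "negative", {"entity": "Biomarker_Result"}),
    Ann(38, 67, "thyroid transcription factor-1", {"entity": "Biomarker"}),
]
# Illustrative assertions covering only the first and third chunk.
assertions = [
    Ann(0, 19, "Medical_History", {"confidence": "0.9926"}),
    Ann(38, 67, "Medical_History", {"confidence": "0.9951"}),
]

# Index assertions by (begin, end) so each chunk finds its own assertion, if any.
by_span = {(a.begin, a.end): a for a in assertions}

rows = []
for c in chunks:
    a = by_span.get((c.begin, c.end))
    rows.append({
        "chunks": c.result,
        "entities": c.metadata["entity"],
        "assertion": a.result if a else None,
        "confidence": a.metadata["confidence"] if a else None,
    })

aligned_df = pd.DataFrame(rows)
print(aligned_df)
```

Chunks without an assertion stay in the table with empty cells, which makes the gap visible instead of silently dropping or shifting rows.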


**Relation Extraction Results**

In [None]:
result = oncology_pipeline.fullAnnotate(text)

rel_df = get_relations_df(result, 're_oncology_wip')

rel_df[rel_df.relation!= "O"]

Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
1,is_related_to,Biomarker_Result,25,32,negative,Biomarker,38,67,thyroid transcription factor-1,0.997901
2,is_related_to,Biomarker_Result,25,32,negative,Biomarker,73,80,napsin A,0.999566
3,is_related_to,Biomarker_Result,96,103,positive,Biomarker,109,110,ER,0.98782
4,is_related_to,Biomarker_Result,96,103,positive,Biomarker,116,117,PR,0.897783
8,is_related_to,Biomarker_Result,124,131,negative,Oncogene,137,140,HER2,0.986855
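
`get_relations_df` is not defined in this section. As a rough illustration of what such a helper might do, the sketch below flattens relation annotations into a DataFrame using the metadata keys visible in the raw `re_oncology_wip` output (`chunk1`, `entity1`, `chunk2`, `entity2`, `confidence`). `Ann` and `relations_to_df` are hypothetical names, and this is not the notebook's actual implementation.

```python
from collections import namedtuple
import pandas as pd

# Stand-in for a relation annotation (hypothetical, for a self-contained example).
Ann = namedtuple("Ann", ["result", "metadata"])

# Values copied from the first two annotations printed for 're_oncology_wip'.
relations = [
    Ann("O", {"chunk1": "Immunohistochemistry", "entity1": "Pathology_Test",
              "chunk2": "negative", "entity2": "Biomarker_Result",
              "confidence": "0.97084755"}),
    Ann("is_related_to", {"chunk1": "negative", "entity1": "Biomarker_Result",
                          "chunk2": "thyroid transcription factor-1", "entity2": "Biomarker",
                          "confidence": "0.99790084"}),
]

def relations_to_df(annotations):
    """Flatten relation annotations into one row per candidate pair."""
    rows = []
    for rel in annotations:
        m = rel.metadata
        rows.append({
            "relation": rel.result,
            "entity1": m["entity1"],
            "chunk1": m["chunk1"],
            "entity2": m["entity2"],
            "chunk2": m["chunk2"],
            "confidence": float(m["confidence"]),  # metadata values are strings
        })
    return pd.DataFrame(rows)

rel_demo_df = relations_to_df(relations)

# Keep only pairs the model actually linked ("O" means no relation).
linked = rel_demo_df[rel_demo_df.relation != "O"]
print(linked)
```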


In [None]:
result[0]['re_oncology_wip']

[Annotation(category, 0, 32, O, {'chunk2': 'negative', 'confidence': '0.97084755', 'entity2_end': '32', 'chunk1': 'Immunohistochemistry', 'entity1': 'Pathology_Test', 'entity2_begin': '25', 'chunk2_confidence': '0.8323', 'entity1_begin': '0', 'sentence': '0', 'direction': 'both', 'entity1_end': '19', 'entity2': 'Biomarker_Result', 'chunk1_confidence': '0.9967'}, []),
 Annotation(category, 25, 67, is_related_to, {'chunk2': 'thyroid transcription factor-1', 'confidence': '0.99790084', 'entity2_end': '67', 'chunk1': 'negative', 'entity1': 'Biomarker_Result', 'entity2_begin': '38', 'chunk2_confidence': '0.296675', 'entity1_begin': '25', 'sentence': '0', 'direction': 'both', 'entity1_end': '32', 'entity2': 'Biomarker', 'chunk1_confidence': '0.8323'}, []),
 Annotation(category, 25, 80, is_related_to, {'chunk2': 'napsin A', 'confidence': '0.9995658', 'entity2_end': '80', 'chunk1': 'negative', 'entity1': 'Biomarker_Result', 'entity2_begin': '73', 'chunk2_confidence': '0.64309996', 'entity1_beg