![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.Pretrained_Clinical_Pipelines.ipynb)

# 11. Pretrained_Clinical_Pipelines

## Colab Setup

In [None]:
import json
import os

from google.colab import files

if 'spark_jsl.json' not in os.listdir():
  license_keys = files.upload()
  os.rename(list(license_keys.keys())[0], 'spark_jsl.json')

with open('spark_jsl.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.4.1 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

In [3]:
import json
import os

import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

from pyspark.ml import Pipeline
from pyspark.sql import SparkSession

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark

Spark NLP Version : 6.1.3
Spark NLP_JSL Version : 6.1.1


## Healthcare NLP for Data Scientists Course

If you are not familiar with the components in this notebook, you can check [Healthcare NLP for Data Scientists Udemy Course](https://www.udemy.com/course/healthcare-nlp-for-data-scientists/) and the [MOOC Notebooks](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP) for each components.

## Listing Models, Pipelines and Annotators

**You can print the list of clinical pretrained models/pipelines and annotators in Spark NLP with one-line code:**

In [4]:
from sparknlp_jsl.pretrained import InternalResourceDownloader

# print PretrainedPipelines
InternalResourceDownloader.showPrivatePipelines(lang='en')

# print models
#InternalResourceDownloader.showPrivateModels(annotator="MedicalNerModel", lang='en')

# print annotators
# InternalResourceDownloader.showAvailableAnnotators()

+--------------------------------------------------------------+------+---------+
| Pipeline                                                     | lang | version |
+--------------------------------------------------------------+------+---------+
| clinical_analysis                                            |  en  | 2.4.0   |
| clinical_ner_assertion                                       |  en  | 2.4.0   |
| clinical_deidentification                                    |  en  | 2.4.0   |
| explain_clinical_doc_ade                                     |  en  | 2.7.3   |
| recognize_entities_posology                                  |  en  | 3.0.0   |
| explain_clinical_doc_carp                                    |  en  | 3.0.0   |
| explain_clinical_doc_ade                                     |  en  | 3.0.0   |
| explain_clinical_doc_era                                     |  en  | 3.0.0   |
| icd10cm_snomed_mapping                                       |  en  | 3.0.2   |
| snomed_icd10cm

## Pretrained Pipelines

In order to save you from creating a pipeline from scratch, Spark NLP also has a pre-trained pipelines that are already fitted using certain annotators and transformers according to various use cases.

**1.   explain_clinical_doc_granular** :

> A pipeline with `ner_jsl`, `assertion_jsl` and `re_test_result_date`. It will extract clinical and medication entities, assign assertion status and find relationships between clinical entities.

**2.   explain_clinical_doc_carp** :

> A pipeline with `ner_clinical`, `assertion_dl`, `re_clinical` and `ner_posology`. It will extract clinical and medication entities, assign assertion status and find relationships between clinical entities.

**3.   explain_clinical_doc_era** :

> A pipeline with `ner_clinical_events`, `assertion_dl` and `re_temporal_events_clinical`. It will extract clinical entities, assign assertion status and find temporal relationships between clinical entities.

**4.   explain_clinical_doc_ade** :

> A pipeline for `Adverse Drug Events (ADE)` with `ner_ade_biobert`, `assertiondl_biobert`, `classifierdl_ade_conversational_biobert` and `re_ade_biobert`. It will classify the document, extract `ADE` and `DRUG` entities, assign assertion status to `ADE` entities, and relate them with `DRUG` entities, then assign ADE status to a text (`True` means ADE, `False` means not related to ADE).

**letter codes in the naming conventions:**

> c : ner_clinical

> e : ner_clinical_events

> r : relation extraction

> p : ner_posology

> a : assertion

> ade : adverse drug events

**Relation Extraction types:**

`re_clinical` >> TrIP (improved), TrWP (worsened), TrCP (caused problem), TrAP (administered), TrNAP (avoided), TeRP (revealed problem), TeCP (investigate problem), PIP (problems related)

`re_temporal_events_clinical` >> `AFTER`, `BEFORE`, `OVERLAP`

**5. explain_clinical_doc_medication:**

> A pipeline for detecting posology entities with the `ner_posology_large` NER model, assigning their assertion status with `assertion_jsl` model, and extracting relations between posology-related terminology with `posology_re` relation extraction model.


**6. explain_clinical_doc_radiology**

> A pipeline for detecting radiology entities with the `ner_radiology` NER model, assigning their assertion status with `assertion_dl_radiology` model, and extracting relations between the diagnosis, test, and findings with `re_test_problem_finding` relation extraction model.

**7. Clinical Deidentification** :

>This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. The pipeline can mask and obfuscate `AGE`, `CONTACT`, `DATE`, `ID`, `LOCATION`, `NAME`, `PROFESSION`, `CITY`, `COUNTRY`, `DOCTOR`, `HOSPITAL`, `IDNUM`, `MEDICALRECORD`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PROFESSION`, `STREET`, `USERNAME`, `ZIP`, `ACCOUNT`, `LICENSE`, `VIN`, `SSN`, `DLN`, `PLATE`, `IPADDR` entities.

**8.   explain_clinical_doc_generic** :

> A pipeline with `ner_clinical`, `assertion_dl` and `re_clinical`. It will extract clinical and medication entities, assign assertion status and find relationships between clinical entities.

- Clinical Entity Labels: `PROBLEM`, `TEST`, `TREATMENT`

- Assertion Status Labels: `Present`, `Absent`, `Possible`, `Planned`, `Past`, `Family`, `Hypotetical`, `SomeoneElse`

- Relation Extraction Labels: `TrAP`, `TeRP`, `TrIP`, `TrWP`, `TrCP`, `TrAP`, `TrNAP`, `TeCP`, `PIP`

**9.   explain_clinical_doc_oncology** :

> Pipelines include Named-Entity Recognition, Assertion Status and Relation Extraction models to extract information from oncology texts.


**10.   explain_clinical_doc_vop** :

> Pipelines include Named-Entity Recognition, Assertion Status, Relation Extraction and Entity Resolution models to extract information from clinical texts.

**11.   explain_clinical_doc_public_health** :

> Pipelines include Named-Entity Recognition, Assertion Status and Relation Extraction models to extract information from clinical texts.

**12.   explain_clinical_doc_biomarker** :

> Pipelines include Named-Entity Recognition, Text Matcher, Sentence Classifier and Relation Extraction models to extract information from clinical texts.

**13.   explain_clinical_doc_sdoh** :

> This pipeline is designed to extract all clinical/medical entities, assertion status, and relation informations which may be considered as Social Determinants of Health (SDOH) entities from text.

**14.   explain_clinical_doc_mental_health** :

> This pipeline is designed to extract all mental health-related entities, assertion status, and relation information from text.

**15.   ner_medication_generic_pipeline** :

> This pre-trained pipeline is designed to identify generic `DRUG` entities in clinical texts. It was built on top of the `ner_posology_greedy`, `ner_jsl_greedy`, `ner_drugs_large` and `drug_matcher` models to detect the entities `DRUG`, `DOSAGE`, `ROUTE` and `STRENGTH`, chunking them into a larger entity as `DRUG` when they appear together.

**16. NER Pipelines:**

> Pipelines for all the available pretrained NER models.

**17. BERT Based NER Pipelines**

> Pipelines for all the available Bert token classification models.

**18. ner_profiling_clinical and ner_profiling_biobert:**

> Pipelines for exploring all the available pretrained NER models at once.

**19. ner_model_finder**

> A pipeline trained with bert embeddings that can be used to find the most appropriate NER model given the entity name.

**20. Resolver Pipelines**

> Pipelines for converting clinical entities to their UMLS CUI codes and medication entities to their ADE, Action, Treatment, UMLS, RxNorm, ICD9, SNOMED and NDC codes.

**21. Oncology Pipelines**

> Pipelines include Named-Entity Recognition, Assertion Status, Relation Extraction and Entity Resolution models to extract information from oncology texts.


**Also, you can find clinical CODE MAPPING pretrained pipelines in this notebook: [Healthcare Code Mapping Notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.1.Healthcare_Code_Mapping.ipynb)**




# Task Based Pretrained Pipelines

|index|pipeline|description|
|-----:|:-----|:-----|
| 1| [explain_clinical_doc_generic](https://nlp.johnsnowlabs.com/2024/01/17/explain_clinical_doc_generic_en.html)  |This pipeline is designed to extract all clinical/medical entities,<br> assign assertion status to the extracted entities,<br> establish relations between the extracted entities from the clinical texts.|
| 2| [explain_clinical_doc_granular](https://nlp.johnsnowlabs.com/2024/01/26/explain_clinical_doc_granular_en.html)  |This pipeline is designed to extract all clinical/medical entities,<br> assign assertion status to the extracted entities,<br> establish relations between the extracted entities from the clinical texts.|
| 3| [explain_clinical_doc_biomarker](https://nlp.johnsnowlabs.com/2024/03/11/explain_clinical_doc_biomarker_en.html)  |This specialized biomarker pipeline can extract biomarker entities,<br>classify sentences whether they contain biomarker entities or not,<br>establish relations between the extracted biomarker and biomarker<br>results from the clinical documents.|
| 4| [explain_clinical_doc_oncology](https://nlp.johnsnowlabs.com/2024/01/18/explain_clinical_doc_oncology_en.html)  |This specialized oncology pipeline can extract oncological entities,<br>  assign assertion status to the extracted entities, <br> establish relations between the extracted entities from the clinical documents.|
| 5| [explain_clinical_doc_radiology](https://nlp.johnsnowlabs.com/2024/01/18/explain_clinical_doc_radiology_en.html) |This pipeline is designed to extract all clinical/medical entities,<br> assign assertion status to the extracted entities,<br> establish relations between the extracted entities from the clinical texts.|
| 6| [explain_clinical_doc_vop](https://nlp.johnsnowlabs.com/2024/01/16/explain_clinical_doc_vop_en.html)|This pipeline is designed to extract healthcare-related terms entities,<br> assign assertion status to the extracted entities, establish relations<br> between the extracted entities from the documents transferred from the patient’s sentences.|
| 7| [explain_clinical_doc_carp](https://nlp.johnsnowlabs.com/2023/06/17/explain_clinical_doc_carp_en.html) |A pipeline with `ner_clinical`, `assertion_dl`, `re_clinical` and `ner_posology`. <br>It will extract clinical and medication entities, <br>assign assertion status and find relationships between clinical entities.|
| 8| [explain_clinical_doc_era](https://nlp.johnsnowlabs.com/2023/06/17/explain_clinical_doc_era_en.html)  |A pipeline with `ner_clinical_events`, `assertion_dl` and `re_temporal_events_clinical`. <br>It will extract clinical entities, assign assertion status and find <br>temporal relationships between clinical entities.|
| 9| [explain_clinical_doc_ade](https://nlp.johnsnowlabs.com/2023/06/17/explain_clinical_doc_ade_en.html)|A pipeline for Adverse Drug Events (ADE) with `ner_ade_biobert`, `assertion_dl_biobert`, <br>`classifierdl_ade_conversational_biobert`, and `re_ade_biobert` . <br>It will classify the document, extract ADE and DRUG clinical<br> entities, assign assertion status to ADE entities, <br>and relate Drugs with their ADEs.|
| 10| [explain_clinical_doc_medication](https://nlp.johnsnowlabs.com/2023/06/17/explain_clinical_doc_medication_en.html)|A pipeline for detecting posology entities with the `ner_posology_large` <br>NER model, assigning their assertion status with `assertion_jsl` model<br>, and extracting relations between posology-related terminology<br> with `posology_re` relation extraction model.|
| 11| [explain_clinical_doc_risk_factors](https://nlp.johnsnowlabs.com/2024/03/25/explain_clinical_doc_risk_factors_en.html)|This pipeline is designed to extract all clinical/medical entities,<br> which may be considered as risk factors from text, <br>assign assertion status to the extracted entities, establish relations between the extracted entities.|
| 12| [explain_clinical_doc_public_health](https://nlp.johnsnowlabs.com/2024/03/19/explain_clinical_doc_public_health_en.html)|This specialized public health pipeline can extract public health-related entities,<br> assign assertion status to the extracted entities,<br> establish relations between the extracted entities <br>from the clinical documents. In this pipeline, five NER, one assertion, <br>and one relation extraction model were used to achieve those tasks.|
| 13| [explain_clinical_doc_sdoh](https://nlp.johnsnowlabs.com/2024/05/01/explain_clinical_doc_sdoh_en.html)|This pipeline is designed to extract all clinical/medical entities, <br>assertion status, and relation informations which may be considered as <br>Social Determinants of Health (SDOH) entities from text.|
| 14| [explain_clinical_doc_mental_health](https://nlp.johnsnowlabs.com/2024/05/06/explain_clinical_doc_mental_health_en.html)|This pipeline is designed to extract all mental health-related entities, <br>assertion status, and relation information from text.|
| 15| [ner_medication_generic_pipeline](https://nlp.johnsnowlabs.com/2024/04/25/ner_medication_generic_pipeline_en.html)|This pre-trained pipeline is designed to identify generic<br> `DRUG` entities in clinical texts. It was built on top of the `ner_posology_greedy`, `ner_jsl_greedy`, <br>`ner_drugs_large` and `drug_matcher` models to detect the entities <br>`DRUG`, `DOSAGE`, `ROUTE` and `STRENGTH`, chunking<br> them into a larger entity as `DRUG` when they appear together.|
| 16| [explain_clinical_doc_oncology_slim](https://nlp.johnsnowlabs.com/2025/02/06/explain_clinical_doc_oncology_slim_en.html)|This specialized oncology pipeline can extract oncological and cancer type entities,<br> assign assertion status to these entities, and establish<br> relationships between the extracted entities from clinical documents.|
| 17| [explain_clinical_doc_generic_light](https://nlp.johnsnowlabs.com/2025/06/26/explain_clinical_doc_generic_light_en.html)|This pipeline is designed to extract clinical/medical entities.<br> In this pipeline, four NER models are used to extract these Clinical Entity Labels: `PROBLEM`, `TEST`, `TREATMENT`|
| 18| [explain_clinical_doc_biomarker_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_biomarker_light_en.html)|This pipeline is designed to extract biomarker-related entities from text. <br>In this pipeline, three NER models and a Text Matcher are used to extract the biomarkers and their results.|
| 19| [explain_clinical_doc_medication_generic_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_medication_generic_light_en.html)|This pipeline is designed to extract medication entities in generic form from texts. <br>In this pipeline, two NER models and a Text Matcher are used to extract the medication entities.|
| 20| [explain_clinical_doc_medication_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_medication_light_en.html)|This pipeline is designed to extract medication entities from texts. <br>In this pipeline, two NER models and a Text Matcher are used to extract these entities.|
| 21| [explain_clinical_doc_mental_health_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_mental_health_light_en.html)|This pipeline is designed to extract mental health-related clinical/medical entities from text. <br>In this pipeline, two NER models and a Text Matcher are used to extract the clinical entities.|
| 22| [explain_clinical_doc_oncology_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_oncology_light_en.html)|This pipeline is designed to extract oncology-related clinical/medical entities. <br>In this pipeline, four NER models and two Text Matchers are used to extract the clinical entity labels.|
| 23| [explain_clinical_doc_public_health_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_public_health_light_en.html)|This pipeline is designed to extract public health-related clinical/medical entities from text. <br>In this pipeline, three NER models and a Text Matcher are used to extract the clinical entity labels.|
| 24| [explain_clinical_doc_radiology_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_radiology_light_en.html)|This pipeline is designed to extract radiology-related clinical/medical entities. <br>In this pipeline, three NER models are used to extract the clinical entity labels.|
| 25| [explain_clinical_doc_risk_factors_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_risk_factors_light_en.html)|This pipeline is designed to extract entities that may be considered as risk factors from text. <br>In this pipeline, three NER models and two Text Matchers are used to extract the related clinical/medical entity labels.|
| 26| [explain_clinical_doc_sdoh_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_sdoh_light_en.html)|This pipeline is designed to extract social determinants of health-related clinical/medical entities from text. <br>In this pipeline, three NER models and two Text Matchers are used to extract the related entities from text.|
| 27| [explain_clinical_doc_vop_light](https://nlp.johnsnowlabs.com/2025/06/27/explain_clinical_doc_vop_light_en.html)|This pipeline is designed to extract clinical/medical entities from texts written by non-healthcare professionals. <br>In this pipeline, three NER models and two Text Matchers are used to extract the clinical entity labels.|
| 28 | [explain_clinical_doc_ade_light](https://nlp.johnsnowlabs.com/2025/06/30/explain_clinical_doc_ade_light_en.html) | This pipeline is designed to extract `ADE` and `DRUG` entities and establish relations between the extracted `DRUG` and `ADE` results from the clinical documents. <br>Two NER models and a Text Matcher are used to accomplish the designated tasks. |
| 29 | [explain_clinical_doc_granular_light](https://nlp.johnsnowlabs.com/2025/06/30/explain_clinical_doc_granular_light_en.html) | This pipeline is designed to extract clinical entities and establish relations between the extracted entities. |
| 30 | [explain_clinical_doc_vop_small](https://nlp.johnsnowlabs.com/2024/09/09/explain_clinical_doc_vop_small_en.html) | This pipeline is designed to extract all clinical/medical entities, assertion status, and relation informations which may be considered as Voice Of Patient (VOP) entities from text. |
| 31 | [explain_clinical_doc_cancer_type](https://nlp.johnsnowlabs.com/2024/09/16/explain_clinical_doc_cancer_type_en.html) | This pipeline is designed to extract all clinical/medical entities, assertion status, and relation informations which may be considered as extract oncological and cancer type entities from text. |
| 32 | [explain_clinical_doc_sdoh_small](https://nlp.johnsnowlabs.com/2024/09/27/explain_clinical_doc_sdoh_small_en.html) | This pipeline is designed to extract all social determinants of health (SDOH) entities from text, assign assertion status to the extracted entities, establish relations between the extracted entities. |

### explain_clinical_doc_granular

A pipeline with ner_jsl, assertion_jsl and re_test_result_date. It will extract clinical and medication entities, assign assertion status and find relationships between clinical entities.

- Clinical Entity Labels: `Admission_Discharge`, `Age`, `Alcohol`, `Allergen`, `BMI`, `Birth_Entity`, `Blood_Pressure`, `Cerebrovascular_Disease`, `Clinical_Dept`, `Communicable_Disease`, `Date`, `Death_Entity`, `Diabetes`, `Diet`, `Direction`, `Disease_Syndrome_Disorder`, `Dosage`, `Drug_BrandName`, `Drug_Ingredient`, `Duration`, `EKG_Findings`, `Employment`, `External_body_part_or_region`, `Family_History_Header`, `Fetus_NewBorn`, `Form`, `Frequency`, `Gender`, `HDL`, `Heart_Disease`, `Height`, `Hyperlipidemia`, `Hypertension`, `ImagingFindings`, `Imaging_Technique`, `Injury_or_Poisoning`, `Internal_organ_or_component`, `Kidney_Disease`, `LDL`, `Labour_Delivery`, `Medical_Device`, `Medical_History_Header`, `Modifier`, `O2_Saturation`, `Obesity`, `Oncological`, `Overweight`, `Oxygen_Therapy`, `Pregnancy`, `Procedure`, `Psychological_Condition`, `Pulse`, `Race_Ethnicity`, `Relationship_Status`, `RelativeDate`, `RelativeTime`, `Respiration`, `Route`, `Section_Header`, `Sexually_Active_or_Sexual_Orientation`, `Smoking`, `Social_History_Header`, `Strength`, `Substance`, `Substance_Quantity`, `Symptom`, `Temperature`, `Test`, `Test_Result`, `Time`, `Total_Cholesterol`, `Treatment`, `Triglycerides`, `VS_Finding`, `Vaccine`, `Vaccine_Name`, `Vital_Signs_Header`, `Weight`

- Assertion Status Labels: `Hypothetical`, `Someoneelse`, `Past`, `Absent`, `Family`, `Planned`, `Possible`, `Present`

- Relation Extraction Labels: `is_finding_of`, `is_date_of`, `is_result_of`, `O`


In [5]:
from sparknlp.pretrained import PretrainedPipeline

In [6]:
pipeline = PretrainedPipeline("explain_clinical_doc_granular", "en", "clinical/models")

explain_clinical_doc_granular download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [7]:
pipeline.model.stages

[DocumentAssembler_303213c0081e,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_c22a01cc8d15,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_c89cbceb1028,
 NER_CONVERTER_45a759a88cab,
 NER_CONVERTER_61818d1722ef,
 ASSERTION_DL_46798c01711e,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 RelationExtractionModel_ce79d77d1bf1,
 PosologyREModel_97f30e947e43,
 AnnotationMerger_54286a4bee17]

In [8]:
# Load pretrained pipeline from local disk:

# pipeline_local = PretrainedPipeline.from_disk('/root/cache_pretrained/explain_clinical_doc_granular_en_5.2.1_3.4_1706289187844')

#### with fullAnnotate()

In [9]:
result = pipeline.fullAnnotate("""The patient admitted for gastrointestinal pathology, under working treatment.
History of prior heart murmur with echocardiogram findings as above on March 1998.
According to the latest echocardiogram, basically revealed normal left ventricular function with left atrial enlargement .
Based on the above findings, we will treat her medically with ACE inhibitors and diuretics and see how she fares.""")

In [10]:
result[0].keys()

dict_keys(['assertion_ner_chunk', 'test_result_date_relations', 'document', 'posology_relations', 'jsl_ner_chunk', 'assertion', 'all_relations', 'jsl_ner', 'token', 'embeddings', 'pos_tags', 'dependencies', 'sentence'])

**NER Results**

In [11]:
import pandas as pd
chunks=[]
entities=[]
begins=[]
ends=[]

for n in result[0]['jsl_ner_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,admitted,12,19,Admission_Discharge
1,gastrointestinal pathology,25,50,Clinical_Dept
2,heart murmur,95,106,Heart_Disease
3,echocardiogram,113,126,Test
4,March 1998,149,158,Date
5,echocardiogram,185,198,Test
6,normal,220,225,Test_Result
7,left ventricular function,227,251,Test
8,left atrial enlargement,258,280,Heart_Disease
9,her,327,329,Gender


**Assertion Status Results**

In [12]:
chunks=[]
entities=[]
status=[]
begin=[]
end=[]


for n,m in zip(result[0]['assertion_ner_chunk'],result[0]['assertion']):

    chunks.append(n.result)
    begin.append(n.begin)
    end.append(n.end)
    entities.append(n.metadata['entity'])
    status.append(m.result)


df = pd.DataFrame({'chunks':chunks, 'begin':begin, 'end':end, 'entities':entities, 'assertion':status})

df

Unnamed: 0,chunks,begin,end,entities,assertion
0,heart murmur,95,106,Heart_Disease,Past
1,echocardiogram,113,126,Test,Past
2,echocardiogram,185,198,Test,Past
3,normal,220,225,Test_Result,Present
4,left ventricular function,227,251,Test,Past
5,left atrial enlargement,258,280,Heart_Disease,Present
6,ACE inhibitors,346,359,Drug_Ingredient,Planned
7,diuretics,365,373,Drug_Ingredient,Hypothetical


**Relation Extraction Results**

In [13]:
rel_pairs=[]
for rel in result[0]["all_relations"]:
    rel_pairs.append((
        rel.result,
        rel.metadata['entity1'],
        rel.metadata['entity1_begin'],
        rel.metadata['entity1_end'],
        rel.metadata['chunk1'],
        rel.metadata['entity2'],
        rel.metadata['entity2_begin'],
        rel.metadata['entity2_end'],
        rel.metadata['chunk2'],
        rel.metadata['confidence']
    ))

rel_df = pd.DataFrame(rel_pairs, columns=['relation',
                                          'entity1',
                                          'entity1_begin',
                                          'entity1_end',
                                          'chunk1',
                                          'entity2',
                                          'entity2_begin',
                                          'entity2_end',
                                          'chunk2',
                                          'confidence'])

rel_df.confidence = rel_df.confidence.astype(float)
rel_df[rel_df.relation!="O"]

Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
0,is_finding_of,Heart_Disease,95,106,heart murmur,Test,113,126,echocardiogram,1.0
1,is_date_of,Heart_Disease,95,106,heart murmur,Date,149,158,March 1998,1.0
2,is_date_of,Test,113,126,echocardiogram,Date,149,158,March 1998,1.0
4,is_finding_of,Test,185,198,echocardiogram,Heart_Disease,258,280,left atrial enlargement,1.0
5,is_result_of,Test_Result,220,225,normal,Test,227,251,left ventricular function,1.0
6,is_finding_of,Test,227,251,left ventricular function,Heart_Disease,258,280,left atrial enlargement,1.0


#### with transform()

In [14]:
data = spark.createDataFrame([["""
The patient admitted for gastrointestinal pathology, under working treatment.
History of prior heart murmur with echocardiogram findings as above on March 1998.
According to the latest echocardiogram, basically revealed normal left ventricular function with left atrial enlargement .
Based on the above findings, we will treat her medically with ACE inhibitors and diuretics and see how she fares.
"""]]).toDF("text")

result = pipeline.transform(data)

result.select(F.explode(F.arrays_zip(result.jsl_ner_chunk.result,
                                     result.jsl_ner_chunk.begin,
                                     result.jsl_ner_chunk.end,
                                     result.jsl_ner_chunk.metadata)).alias("cols"))\
        .select(F.expr("cols['0']").alias("chunk"),
                F.expr("cols['1']").alias("begin"),
                F.expr("cols['2']").alias("end"),
                F.expr("cols['3']['entity']").alias("ner_label")).show()

+--------------------+-----+---+-------------------+
|               chunk|begin|end|          ner_label|
+--------------------+-----+---+-------------------+
|            admitted|   13| 20|Admission_Discharge|
|gastrointestinal ...|   26| 51|      Clinical_Dept|
|        heart murmur|   96|107|      Heart_Disease|
|      echocardiogram|  114|127|               Test|
|          March 1998|  150|159|               Date|
|      echocardiogram|  186|199|               Test|
|              normal|  221|226|        Test_Result|
|left ventricular ...|  228|252|               Test|
|left atrial enlar...|  259|281|      Heart_Disease|
|                 her|  328|330|             Gender|
|      ACE inhibitors|  347|360|    Drug_Ingredient|
|           diuretics|  366|374|    Drug_Ingredient|
|                 she|  388|390|             Gender|
+--------------------+-----+---+-------------------+



In [15]:
result.select(F.explode(F.arrays_zip(result.assertion_ner_chunk.result,
                                     result.assertion_ner_chunk.begin,
                                     result.assertion_ner_chunk.end,
                                     result.assertion_ner_chunk.metadata,
                                     result.assertion.result)).alias("cols"))\
        .select(F.expr("cols['0']").alias("chunk"),
                F.expr("cols['1']").alias("begin"),
                F.expr("cols['2']").alias("end"),
                F.expr("cols['3']['entity']").alias("ner_label"),
                F.expr("cols['4']").alias("status")).show()

+--------------------+-----+---+---------------+------------+
|               chunk|begin|end|      ner_label|      status|
+--------------------+-----+---+---------------+------------+
|        heart murmur|   96|107|  Heart_Disease|        Past|
|      echocardiogram|  114|127|           Test|        Past|
|      echocardiogram|  186|199|           Test|        Past|
|              normal|  221|226|    Test_Result|     Present|
|left ventricular ...|  228|252|           Test|        Past|
|left atrial enlar...|  259|281|  Heart_Disease|     Present|
|      ACE inhibitors|  347|360|Drug_Ingredient|     Planned|
|           diuretics|  366|374|Drug_Ingredient|Hypothetical|
+--------------------+-----+---+---------------+------------+



In [16]:
result.select(F.explode(F.arrays_zip(result.token.result,
                                     result.jsl_ner.result,
                                     result.pos_tags.result,
                                     result.dependencies.result)).alias("cols"))\
        .select(F.expr("cols['0']").alias("tokens"),
                F.expr("cols['1']").alias("jsl_ner"),
                F.expr("cols['2']").alias("pos_tags"),
                F.expr("cols['3']").alias("dependencies")).show()

+----------------+--------------------+--------+------------+
|          tokens|             jsl_ner|pos_tags|dependencies|
+----------------+--------------------+--------+------------+
|             The|                   O|      DD|    admitted|
|         patient|                   O|      NN|    admitted|
|        admitted|B-Admission_Disch...|     VVN|        ROOT|
|             for|                   O|      II|   pathology|
|gastrointestinal|     B-Clinical_Dept|      JJ|   pathology|
|       pathology|     I-Clinical_Dept|      NN|    admitted|
|               ,|                   O|      NN|    admitted|
|           under|                   O|      II|   treatment|
|         working|                   O|    VVGJ|   treatment|
|       treatment|                   O|      NN|    admitted|
|               .|                   O|      NN|    admitted|
|         History|                   O|      NN|        ROOT|
|              of|                   O|      II|       heart|
|       

### explain_clinical_doc_carp

A pipeline with ner_clinical, assertion_dl, re_clinical and ner_posology. It will extract clinical and medication entities, assign assertion status and find relationships between clinical entities.

In [17]:
pipeline = PretrainedPipeline('explain_clinical_doc_carp', 'en', 'clinical/models')

explain_clinical_doc_carp download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [18]:
pipeline.model.stages

[DocumentAssembler_9619f8fd837c,
 SentenceDetector_c0b14c755033,
 REGEX_TOKENIZER_3087df5b9e9d,
 POS_6f55785005bf,
 dependency_d5a8da6c9093,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_cd5ce67b529f,
 NER_CONVERTER_2f1dcb61b142,
 MedicalNerModel_4a303d875127,
 NER_CONVERTER_a8cff4d56af8,
 ASSERTION_DL_25881ab6309e,
 RelationExtractionModel_9c255241fec3]

In [19]:
text ="""A 28-year-old female with a history of gestational diabetes mellitus, used to take metformin 1000 mg two times a day, presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting .
She was seen by the endocrinology service and discharged on 40 units of insulin glargine at night, 12 units of insulin lispro with meals.
"""

annotations = pipeline.annotate(text)

annotations.keys()


dict_keys(['sentences', 'clinical_ner_tags', 'document', 'clinical_ner_chunks', 'assertion', 'clinical_relations', 'posology_ner_tags', 'tokens', 'posology_ner_chunks', 'embeddings', 'pos_tags', 'dependencies'])

In [20]:
import pandas as pd

rows = list(zip(annotations['tokens'], annotations['clinical_ner_tags'], annotations['posology_ner_tags'], annotations['pos_tags'], annotations['dependencies']))

df = pd.DataFrame(rows, columns = ['tokens','clinical_ner_tags','posology_ner_tags','POS_tags','dependencies'])

df.head(20)

Unnamed: 0,tokens,clinical_ner_tags,posology_ner_tags,POS_tags,dependencies
0,A,O,O,DD,female
1,28-year-old,O,O,NN,female
2,female,O,O,NN,ROOT
3,with,O,O,II,history
4,a,O,O,DD,history
5,history,O,O,NN,female
6,of,O,O,II,history
7,gestational,B-PROBLEM,O,JJ,of
8,diabetes,I-PROBLEM,O,NN,mellitus
9,mellitus,I-PROBLEM,O,NN,gestational


In [21]:
text = 'Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain'

result = pipeline.fullAnnotate(text)[0]

chunks=[]
entities=[]
status=[]
begin=[]
end=[]
confidence=[]

for n,m in zip(result['clinical_ner_chunks'],result['assertion']):

    chunks.append(n.result)
    begin.append(n.begin)
    end.append(n.end)
    entities.append(n.metadata['entity'])
    status.append(m.result)
    confidence.append(m.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks,'begin':begin, 'end':end, 'entities':entities, 'assertion':status, 'confidence':confidence})

df

Unnamed: 0,chunks,begin,end,entities,assertion,confidence
0,a headache,12,21,PROBLEM,present,0.9992
1,anxious,56,62,PROBLEM,present,0.8782
2,alopecia,88,95,PROBLEM,absent,0.9992
3,pain,115,118,PROBLEM,absent,0.9238


In [22]:
text = """
The patient was prescribed 1 unit of Advil for 5 days after meals. The patient was also
given 1 unit of Metformin daily.
He was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night ,
12 units of insulin lispro with meals , and metformin 1000 mg two times a day.
"""

result = pipeline.fullAnnotate(text)[0]

chunks=[]
entities=[]
begins=[]
ends=[]

for n in result['posology_ner_chunks']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,1 unit,28,33,DOSAGE
1,Advil,38,42,DRUG
2,for 5 days,44,53,DURATION
3,1 unit,95,100,DOSAGE
4,Metformin,105,113,DRUG
5,daily,115,119,FREQUENCY
6,40 units,189,196,DOSAGE
7,insulin glargine,201,216,DRUG
8,at night,218,225,FREQUENCY
9,12 units,229,236,DOSAGE


### explain_clinical_doc_oncology

> This pipeline includes Named-Entity Recognition, Assertion Status and Relation Extraction models to extract information from oncology texts.


- Clinical Entity Labels: `Adenopathy`, `Age`, `Biomarker`,`Biomarker_Result`, `Cancer_Dx`, `Cancer_Score` ,`Cancer_Surgery`, `Chemotherapy`, `Cycle_Count` ,`Cycle_Day`, `Cycle_Number`, `Date` ,`Death_Entity`, `Direction`, `Dosage` ,`Duration`, `Frequency`, `Gender` ,`Grade`, `Histological_Type`, `Hormonal_Therapy` ,`Imaging_Test`, `Immunotherapy`, `Invasion` ,`Line_Of_Therapy`, `Metastasis`, `Oncogene` ,`PROBLEM`, `Pathology_Result`, `Pathology_Test` ,`PROBLEM`, `Performance_Status`, `Race_Ethnicity` ,`Radiotherapy`, `Response_To_Treatment`, `Relative_Date` ,`Route`, `Site_Bone`, `Site_Brain` ,`Site_Breast`, `Site_Liver`, `Site_Lung` ,`Site_Lymph_Node`, `Site_Other_Body_Part`, `Smoking_Status` ,`Staging`, `Targeted_Therapy`, `Tumor_Finding` ,`Tumor_Size`, `Unspecific_Therapy`, `Radiation_Dose` ,`Anatomical_Site`, `Cancer_Therapy`, `Size_Trend` ,`Lymph_Node`, `Tumor_Description`,`Lymph_Node_Modifier`, `Posology_Information`, `Oncological`,`Weight`,`Alcohol`,`Communicable_Disease`,`BMI`,`Obesity`,`Diabetes`

- Assertion Status Labels: `Present`, `Absent`, `Possible`, `Past`, `Family`, `Hypotetical`

- Relation Extraction Labels: `is_size_of`, `is_finding_of`, `is_date_of`, `is_location_of`


In [23]:
ner_pipeline = PretrainedPipeline("explain_clinical_doc_oncology", "en", "clinical/models")

explain_clinical_doc_oncology download started this may take some time.
Approx size to download 1.8 GB
[OK!]


In [24]:
ner_pipeline.model.stages

[DocumentAssembler_c87f754f30a5,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_99be4a04da74,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_8c59079bd37d,
 NER_CONVERTER_b4412c365ed7,
 MedicalNerModel_f17b93cea1f2,
 NER_CONVERTER_257714ea4f94,
 MedicalNerModel_58233645f160,
 NER_CONVERTER_d681d37ca98d,
 MedicalNerModel_185d3fbf617b,
 NER_CONVERTER_d374ed043a37,
 MedicalNerModel_ceff3f64ab20,
 NER_CONVERTER_2b6456f7a65e,
 MedicalNerModel_003a41401784,
 NER_CONVERTER_0cba989aacd1,
 MedicalNerModel_c89cbceb1028,
 NER_CONVERTER_4392e0c2b05e,
 MedicalNerModel_d674613cb476,
 NER_CONVERTER_cb00a2bafc28,
 MedicalNerModel_fd1b709ca66a,
 NER_CONVERTER_580e14b464d0,
 MedicalNerModel_299a97740594,
 NER_CONVERTER_bcd609854cb1,
 ENTITY_EXTRACTOR_580b61c0ff9d,
 ENTITY_EXTRACTOR_97d5ccc4aacb,
 MERGE_f8bd00c11215,
 MERGE_e15189677f3c,
 ASSERTION_DL_d9d32f5f411d,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 RelationExtractionModel_68ebe11369b6,
 PosologyREModel_583a49a5b62b,
 Annotation

In [25]:
text ="""The Patient underwent a computed tomography (CT) scan of the abdomen and pelvis,
which showed a complex ovarian mass. A Pap smear performed one month later was positive for
atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension
of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes.
The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma.
Two months later, the patient was diagnosed with lung metastases.
"""

result = ner_pipeline.fullAnnotate(text)[0]


In [26]:
result.keys()

dict_keys(['entity_biomarker', 'ner_cancer_type_chunk', 'ner_oncology_unspecific_posology_chunk', 'ner_oncology_anatomy_general_chunk', 'ner_biomarker_chunk', 'ner_oncology', 'document', 're_posology_granular', 'ner_biomarker_langtest_chunk', 'merged_chunk', 'entity_cancer_dx', 'ner_biomarker', 'merged_chunk_for_assertion', 're_oncology_granular', 'ner_biomarker_langtest', 'ner_oncology_anatomy_general', 'assertion', 'all_relations', 'ner_posology_chunk', 'ner_oncology_response_to_treatment', 'ner_oncology_chunk', 'token', 'ner_oncology_tnm_chunk', 'ner_jsl_chunk', 'embeddings', 'ner_oncology_tnm', 'pos_tags', 'ner_jsl', 'dependencies', 'ner_oncology_unspecific_posology', 'ner_posology', 'ner_oncology_response_to_treatment_chunk', 'sentence', 'ner_cancer_type'])

In [27]:
import pandas as pd

chunks=[]
entities=[]
begins=[]
ends=[]

for n in result['merged_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,computed tomography,24,42,Imaging_Test
1,CT,45,46,Imaging_Test
2,abdomen,61,67,Site_Other_Body_Part
3,pelvis,73,78,Site_Other_Body_Part
4,ovarian,104,110,Site_Other_Body_Part
5,mass,112,115,Tumor_Finding
6,Pap smear,120,128,Pathology_Test
7,one month later,140,154,Relative_Date
8,atypical glandular cells,173,196,Pathology_Result
9,adenocarcinoma,213,226,Cancer_Dx


In [28]:
chunks=[]
entities=[]
status=[]

for n,m in zip(result['merged_chunk_for_assertion'],result['assertion']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status})

df

Unnamed: 0,chunks,entities,assertion
0,computed tomography,Imaging_Test,Past
1,CT,Imaging_Test,Present
2,mass,Tumor_Finding,Present
3,Pap smear,Pathology_Test,Past
4,atypical glandular cells,Pathology_Result,Present
5,adenocarcinoma,Cancer_Dx,Possible
6,pathologic specimen,Pathology_Test,Past
7,extension,Invasion,Present
8,tumor,Tumor_Finding,Present
9,enlarged,Lymph_Node_Modifier,Present


In [29]:
import pandas as pd

def get_relations_df (results, col='relations'):
  rel_pairs=[]
  for rel in results[0][col]:
      rel_pairs.append((
          rel.result,
          rel.metadata['entity1'],
          rel.metadata['entity1_begin'],
          rel.metadata['entity1_end'],
          rel.metadata['chunk1'],
          rel.metadata['entity2'],
          rel.metadata['entity2_begin'],
          rel.metadata['entity2_end'],
          rel.metadata['chunk2'],
          rel.metadata['confidence']
      ))

  rel_df = pd.DataFrame(rel_pairs, columns=['relation','entity1','entity1_begin','entity1_end','chunk1','entity2','entity2_begin','entity2_end','chunk2', 'confidence'])

  rel_df.confidence = rel_df.confidence.astype(float)

  return rel_df

In [30]:
annotations = ner_pipeline.fullAnnotate(text)

rel_df = get_relations_df (annotations, 'all_relations')

rel_df[rel_df.relation!="O"]

Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
2,is_location_of,Site_Other_Body_Part,104,110,ovarian,Tumor_Finding,112,115,mass,0.922661
3,is_finding_of,Pathology_Test,120,128,Pap smear,Cancer_Dx,213,226,adenocarcinoma,0.525421
4,is_location_of,Tumor_Finding,277,281,tumor,Site_Other_Body_Part,298,312,fallopian tubes,0.90263
5,is_location_of,Tumor_Finding,277,281,tumor,Site_Other_Body_Part,315,322,appendix,0.664927


In [31]:
annotations[0]['all_relations']

[Annotation(category, 61, 115, O, {'chunk2': 'mass', 'confidence': '0.9439166', 'entity2_end': '115', 'chunk1': 'abdomen', 'entity1': 'Site_Other_Body_Part', 'entity2_begin': '112', 'chunk2_confidence': '0.9557', 'entity1_begin': '61', 'sentence': '0', 'direction': 'both', 'entity1_end': '67', 'entity2': 'Tumor_Finding', 'chunk1_confidence': '0.9446'}, []),
 Annotation(category, 73, 115, O, {'chunk2': 'mass', 'confidence': '0.9611397', 'entity2_end': '115', 'chunk1': 'pelvis', 'entity1': 'Site_Other_Body_Part', 'entity2_begin': '112', 'chunk2_confidence': '0.9557', 'entity1_begin': '73', 'sentence': '0', 'direction': 'both', 'entity1_end': '78', 'entity2': 'Tumor_Finding', 'chunk1_confidence': '0.6514'}, []),
 Annotation(category, 104, 115, is_location_of, {'chunk2': 'mass', 'confidence': '0.922661', 'entity2_end': '115', 'chunk1': 'ovarian', 'entity1': 'Site_Other_Body_Part', 'entity2_begin': '112', 'chunk2_confidence': '0.9557', 'entity1_begin': '104', 'sentence': '0', 'direction': '

### explain_clinical_doc_ade

A pipeline for `Adverse Drug Events (ADE)` with `ner_ade_healthcare`, and `classifierdl_ade_biobert`. It will extract `ADE` and `DRUG` clinical entities, and then assign ADE status to a text(`True` means ADE, `False` means not related to ADE). Also extracts relations between `DRUG` and `ADE` entities (`1` means the adverse event and drug entities are related, `0` is not related).

In [32]:
ade_pipeline = PretrainedPipeline("explain_clinical_doc_ade", 'en', 'clinical/models')

explain_clinical_doc_ade download started this may take some time.
Approx size to download 2 GB
[OK!]


In [33]:
result = ade_pipeline.fullAnnotate("The main adverse effects of Leflunomide consist of diarrhea, nausea, liver enzyme elevation, hypertension, alopecia, and allergic skin reactions.")

result[0].keys()

dict_keys(['document', 'assertion', 'drug_ner_chunk', 'ner_posology_chunk', 'token', 'ner_chunks_ade', 'relations', 'ade_clinica_ner', 'ade_mappings', 'class', 'embeddings', 'pos_tags', 'dependencies', 'ner_posology', 'sentence', 'ner_ade_clinical_chunk', 'matcher_chunk'])

In [34]:
result[0]['class'][0].metadata

{'sentence': '0', 'Some(ADE)': '0.9999942', 'Some(noADE)': '5.7812E-6'}

**NER Results**

In [35]:
text = """Been taking Lipitor for 3 months, have experienced severe fatigue a lot!!! ,
I have only experienced cramps so far, after Doctor moved me to voltaren 2 months ago."""

import pandas as pd

chunks = []
entities = []
begin =[]
end = []

print ('sentence:', text)
print()

result = ade_pipeline.fullAnnotate(text)

print ('ADE status:', result[0]['class'][0].result)

print ('prediction probability>> True : ', result[0]['class'][0].metadata['True'], \
        'False: ', result[0]['class'][0].metadata['False'])

for n in result[0]['ner_chunks_ade']:

    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'entities':entities,
                'begin': begin, 'end': end})

df


sentence: Been taking Lipitor for 3 months, have experienced severe fatigue a lot!!! ,
I have only experienced cramps so far, after Doctor moved me to voltaren 2 months ago.

ADE status: ADE
prediction probability>> True :  None False:  None


Unnamed: 0,chunks,entities,begin,end
0,Lipitor,DRUG,12,18
1,severe fatigue,ADE,51,64
2,cramps,ADE,101,106
3,voltaren,DRUG,141,148


**Assertion Status Results**

In [36]:
import pandas as pd

text = """The side effects of 5-FU in a colon cancer patient who suffered severe mucositis,
desquamating dermatitis and prolonged myelosuppression. Last week the patient experienced anterior
lumbosacral radiculopathy and blurred vision after intrathecal methotrexate treatment."""

print (text)

light_result = ade_pipeline.fullAnnotate(text)[0]

chunks=[]
entities=[]
status=[]

for n,m in zip(light_result['ner_chunks_ade'],light_result['assertion']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status})

df

The side effects of 5-FU in a colon cancer patient who suffered severe mucositis,
desquamating dermatitis and prolonged myelosuppression. Last week the patient experienced anterior
lumbosacral radiculopathy and blurred vision after intrathecal methotrexate treatment.


Unnamed: 0,chunks,entities,assertion
0,5-FU,DRUG,Past
1,severe mucositis,ADE,Past
2,desquamating dermatitis,ADE,Past
3,myelosuppression,ADE,Past
4,anterior\nlumbosacral radiculopathy,ADE,Past
5,blurred vision,ADE,Past
6,methotrexate,DRUG,Past


**Relation Extraction Results**

In [37]:
import pandas as pd

text = """Been taking Lipitor for 3 months, have experienced severe fatigue a lot!!! ,
I have only experienced cramps so far, after Doctor moved me to voltaren 2 months ago."""

print (text)

results = ade_pipeline.fullAnnotate(text)

rel_pairs=[]

for rel in results[0]["relations"]:
    rel_pairs.append((
        rel.result,
        rel.metadata['entity1'],
        rel.metadata['entity1_begin'],
        rel.metadata['entity1_end'],
        rel.metadata['chunk1'],
        rel.metadata['entity2'],
        rel.metadata['entity2_begin'],
        rel.metadata['entity2_end'],
        rel.metadata['chunk2'],
        rel.metadata['confidence']
    ))

rel_df = pd.DataFrame(rel_pairs, columns=['relation','entity1','entity1_begin','entity1_end','chunk1','entity2','entity2_begin','entity2_end','chunk2', 'confidence'])
rel_df

Been taking Lipitor for 3 months, have experienced severe fatigue a lot!!! ,
I have only experienced cramps so far, after Doctor moved me to voltaren 2 months ago.


Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
0,1,DRUG,12,18,Lipitor,ADE,51,64,severe fatigue,1.0
1,0,ADE,101,106,cramps,DRUG,141,148,voltaren,0.81595194


### exlain_clinical_doc_medication

> A pipeline for detecting posology entities with the `ner_posology_large` NER model, assigning their assertion status with `assertion_jsl` model, and extracting relations between posology-related terminology with `posology_re` relation extraction model.

In [38]:
medication_pipeline = PretrainedPipeline('explain_clinical_doc_medication', 'en', 'clinical/models')

explain_clinical_doc_medication download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [39]:
medication_pipeline.model.stages

[DocumentAssembler_54400872e595,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_ca584bb84cf2,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_5e6f59103f25,
 NerConverter_a3da9eede8e3,
 NER_CONVERTER_9c72525c1035,
 ASSERTION_DL_8d77f383c928,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 PosologyREModel_d3003d90160f]

In [40]:
text = """The patient is a 30-year-old female with an insulin dependent diabetes, type 2. She received a course of Bactrim for 14 days for UTI.
She was prescribed  5000 units of Fragmin subcutaneously daily, and with  prescribed Lantus  40 units  subcutaneously at bedtime."""

result = medication_pipeline.fullAnnotate(text)[0]

In [41]:
result.keys()

dict_keys(['assertion_ner_chunk', 'document', 'assertion', 'ner_posology_chunk', 'token', 'relations', 'embeddings_clinical', 'pos_tags', 'dependencies', 'ner_posology', 'sentence'])

In [42]:
import pandas as pd

In [43]:
chunks=[]
entities=[]
begins=[]
ends=[]

for n in result['ner_posology_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,insulin,44,50,DRUG
1,Bactrim,105,111,DRUG
2,for 14 days,113,123,DURATION
3,5000 units,154,163,DOSAGE
4,Fragmin,168,174,DRUG
5,subcutaneously,176,189,ROUTE
6,daily,191,195,FREQUENCY
7,Lantus,219,224,DRUG
8,40 units,227,234,DOSAGE
9,subcutaneously,237,250,ROUTE


In [44]:
chunks=[]
entities=[]
status=[]

for n,m in zip(result['assertion_ner_chunk'],result['assertion']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status})

df

Unnamed: 0,chunks,entities,assertion
0,insulin,DRUG,Present
1,Bactrim,DRUG,Past
2,Fragmin,DRUG,Present
3,Lantus,DRUG,Present


In [45]:
annotations = medication_pipeline.fullAnnotate(text)

rel_df = get_relations_df(annotations, 'relations')

rel_df

Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
0,DRUG-DURATION,DRUG,105,111,Bactrim,DURATION,113,123,for 14 days,1.0
1,DOSAGE-DRUG,DOSAGE,154,163,5000 units,DRUG,168,174,Fragmin,1.0
2,DRUG-ROUTE,DRUG,168,174,Fragmin,ROUTE,176,189,subcutaneously,1.0
3,DRUG-FREQUENCY,DRUG,168,174,Fragmin,FREQUENCY,191,195,daily,1.0
4,DRUG-DOSAGE,DRUG,219,224,Lantus,DOSAGE,227,234,40 units,1.0
5,DRUG-ROUTE,DRUG,219,224,Lantus,ROUTE,237,250,subcutaneously,1.0
6,DRUG-FREQUENCY,DRUG,219,224,Lantus,FREQUENCY,252,261,at bedtime,1.0


In [46]:
annotations[0]['relations']

[Annotation(category, 105, 123, DRUG-DURATION, {'chunk2': 'for 14 days', 'confidence': '1.0', 'entity2_end': '123', 'chunk1': 'Bactrim', 'entity1': 'DRUG', 'entity2_begin': '113', 'chunk2_confidence': '0.79349995', 'entity1_begin': '105', 'sentence': '1', 'direction': 'both', 'entity1_end': '111', 'entity2': 'DURATION', 'chunk1_confidence': '0.9994'}, []),
 Annotation(category, 154, 174, DOSAGE-DRUG, {'chunk2': 'Fragmin', 'confidence': '1.0', 'entity2_end': '174', 'chunk1': '5000 units', 'entity1': 'DOSAGE', 'entity2_begin': '168', 'chunk2_confidence': '0.9996', 'entity1_begin': '154', 'sentence': '2', 'direction': 'both', 'entity1_end': '163', 'entity2': 'DRUG', 'chunk1_confidence': '0.7849'}, []),
 Annotation(category, 168, 189, DRUG-ROUTE, {'chunk2': 'subcutaneously', 'confidence': '1.0', 'entity2_end': '189', 'chunk1': 'Fragmin', 'entity1': 'DRUG', 'entity2_begin': '176', 'chunk2_confidence': '0.9993', 'entity1_begin': '168', 'sentence': '2', 'direction': 'both', 'entity1_end': '17

### explain_clinical_doc_radiology

> A pipeline for detecting radiology entities with the `ner_radiology` NER model, assigning their assertion status with `assertion_dl_radiology` model, and extracting relations between the diagnosis, test, and findings with `re_test_problem_finding` relation extraction model.

In [47]:
radiology_pipeline = PretrainedPipeline('explain_clinical_doc_radiology', 'en', 'clinical/models')

explain_clinical_doc_radiology download started this may take some time.
Approx size to download 1.7 GB
[OK!]


In [48]:
radiology_pipeline.model.stages

[DocumentAssembler_1e251400f05b,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_5350385cdb03,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_f7f58f2addf7,
 NER_CONVERTER_e89388e2f0fe,
 MedicalNerModel_32430beeafc0,
 NER_CONVERTER_e40bc4137e64,
 MedicalNerModel_5e6f59103f25,
 NER_CONVERTER_10c6f1c1f982,
 MedicalNerModel_8c59079bd37d,
 NER_CONVERTER_c69535539c93,
 MedicalNerModel_c89cbceb1028,
 NER_CONVERTER_950639e22149,
 MERGE_36b985d254fe,
 MERGE_f804cc8df33b,
 ASSERTION_DL_50d2c028e615,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 RelationExtractionModel_853993778cd5]

In [49]:
text = """Bilateral breast ultrasound was subsequently performed, which demonstrated an ovoid mass measuring approximately 0.5 x 0.5 x 0.4 cm in diameter located within the anteromedial aspect of the left shoulder.
This mass demonstrates isoechoic echotexture to the adjacent muscle, with no evidence of internal color flow.
This may represent benign fibrous tissue or a lipoma."""

result = radiology_pipeline.fullAnnotate(text)[0]

In [50]:
result.keys()

dict_keys(['radiology_ner_chunk', 'ner_chexpert_chunk', 'posology_ner', 'ner_oncology', 'document', 'ner_chexpert', 'merged_chunk', 'posology_ner_chunk', 'merged_chunk_for_assertion', 'jsl_ner_chunk', 'radiology_ner', 'assertion', 'jsl_ner', 'ner_oncology_chunk', 'token', 'relations', 'embeddings', 'pos_tags', 'dependencies', 'sentence'])

In [51]:
chunks=[]
entities=[]
begins=[]
ends=[]

for n in result['radiology_ner_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,Bilateral breast,0,15,BodyPart
1,ultrasound,17,26,Imaging_Test
2,ovoid mass,78,87,ImagingFindings
3,0.5 x 0.5 x 0.4,113,127,Measurements
4,cm,129,130,Units
5,anteromedial aspect of the left shoulder,163,202,BodyPart
6,mass,210,213,ImagingFindings
7,isoechoic echotexture,228,248,ImagingFindings
8,muscle,266,271,BodyPart
9,internal color flow,294,312,ImagingFindings


In [52]:
chunks=[]
entities=[]
status=[]

for n,m in zip(result['merged_chunk_for_assertion'],result['assertion']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status})

df

Unnamed: 0,chunks,entities,assertion
0,ovoid mass,ImagingFindings,Confirmed
1,mass,ImagingFindings,Confirmed
2,isoechoic echotexture,ImagingFindings,Confirmed
3,internal color flow,ImagingFindings,Negative
4,benign fibrous tissue,ImagingFindings,Suspected
5,lipoma,Disease_Syndrome_Disorder,Suspected


In [53]:
annotations = radiology_pipeline.fullAnnotate(text)

rel_df = get_relations_df(annotations, 'relations')

rel_df

Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
0,is_related,BodyPart,0,15,Bilateral breast,Imaging_Test,17,26,ultrasound,1.0
1,is_related,BodyPart,0,15,Bilateral breast,ImagingFindings,78,87,ovoid mass,0.999997
2,is_related,Imaging_Test,17,26,ultrasound,ImagingFindings,78,87,ovoid mass,0.999569
3,is_related,ImagingFindings,78,87,ovoid mass,Measurements,113,130,0.5 x 0.5 x 0.4 cm,1.0
4,is_related,ImagingFindings,210,213,mass,BodyPart,257,271,adjacent muscle,0.997639
5,is_related,ImagingFindings,228,248,isoechoic echotexture,BodyPart,257,271,adjacent muscle,0.999999


In [54]:
annotations[0]['relations']

[Annotation(category, 0, 26, is_related, {'chunk2': 'ultrasound', 'confidence': '1.0', 'entity2_end': '26', 'chunk1': 'Bilateral breast', 'entity1': 'BodyPart', 'entity2_begin': '17', 'chunk2_confidence': '0.6734', 'entity1_begin': '0', 'sentence': '0', 'direction': 'both', 'entity1_end': '15', 'entity2': 'Imaging_Test', 'chunk1_confidence': '0.945'}, []),
 Annotation(category, 0, 87, is_related, {'chunk2': 'ovoid mass', 'confidence': '0.99999714', 'entity2_end': '87', 'chunk1': 'Bilateral breast', 'entity1': 'BodyPart', 'entity2_begin': '78', 'chunk2_confidence': '0.6095', 'entity1_begin': '0', 'sentence': '0', 'direction': 'both', 'entity1_end': '15', 'entity2': 'ImagingFindings', 'chunk1_confidence': '0.945'}, []),
 Annotation(category, 17, 87, is_related, {'chunk2': 'ovoid mass', 'confidence': '0.99956936', 'entity2_end': '87', 'chunk1': 'ultrasound', 'entity1': 'Imaging_Test', 'entity2_begin': '78', 'chunk2_confidence': '0.6095', 'entity1_begin': '17', 'sentence': '0', 'direction'

### Clinical Deidentification

This pipeline can be used to deidentify PHI information from medical texts. The PHI information will be masked and obfuscated in the resulting text. The pipeline can mask and obfuscate `AGE`, `CONTACT`, `DATE`, `ID`, `LOCATION`, `NAME`, `PROFESSION`, `CITY`, `COUNTRY`, `DOCTOR`, `HOSPITAL`, `IDNUM`, `MEDICALRECORD`, `ORGANIZATION`, `PATIENT`, `PHONE`, `PROFESSION`, `STREET`, `USERNAME`, `ZIP`, `ACCOUNT`, `LICENSE`, `VIN`, `SSN`, `DLN`, `PLATE`, `IPADDR` entities.

|index|model|lang|
|-----:|:-----|----|
| 1 | [clinical_deidentification](https://nlp.johnsnowlabs.com/2022/09/14/clinical_deidentification_en.html) | ar, de, en, es, fr, it, pt, ro |
| 2 | [clinical_deidentification_augmented](https://nlp.johnsnowlabs.com/2022/03/03/clinical_deidentification_augmented_es_2_4.html) | es |
| 3 | [clinical_deidentification_docwise_benchmark](https://nlp.johnsnowlabs.com/2025/01/16/clinical_deidentification_docwise_benchmark_en.html) | en |
| 4 | [clinical_deidentification_docwise_benchmark_large](https://nlp.johnsnowlabs.com/2025/07/25/clinical_deidentification_docwise_benchmark_large_en.html) | en |
| 5 | [clinical_deidentification_docwise_benchmark_light](https://nlp.johnsnowlabs.com/2025/06/10/clinical_deidentification_docwise_benchmark_light_en.html) | en |
| 6 | [clinical_deidentification_docwise_benchmark_light_v2](https://nlp.johnsnowlabs.com/2025/06/11/clinical_deidentification_docwise_benchmark_light_v2_en.html) | en |
| 7 | [clinical_deidentification_docwise_benchmark_medium](https://nlp.johnsnowlabs.com/2025/07/31/clinical_deidentification_docwise_benchmark_medium_en.html) | en |
| 8 | [clinical_deidentification_docwise_benchmark_optimized](https://nlp.johnsnowlabs.com/2025/06/19/clinical_deidentification_docwise_benchmark_optimized_en.html) | en |
| 9 | [clinical_deidentification_docwise_large_wip](https://nlp.johnsnowlabs.com/2024/11/29/clinical_deidentification_docwise_large_wip_de.html) | de |
| 10 | [clinical_deidentification_docwise_medium_wip](https://nlp.johnsnowlabs.com/2024/12/03/clinical_deidentification_docwise_medium_wip_en.html) | en |
| 11 | [clinical_deidentification_docwise_wip](https://nlp.johnsnowlabs.com/2024/11/29/clinical_deidentification_docwise_wip_de.html) | de |
| 12 | [clinical_deidentification_generic](https://nlp.johnsnowlabs.com/2024/02/21/clinical_deidentification_generic_en.html) | en |
| 13 | [clinical_deidentification_generic_optimized](https://nlp.johnsnowlabs.com/2024/03/14/clinical_deidentification_generic_optimized_en.html) | en |
| 14 | [clinical_deidentification_glove](https://nlp.johnsnowlabs.com/2022/03/04/clinical_deidentification_glove_en_3_0.html) | en |
| 15 | [clinical_deidentification_glove_augmented](https://nlp.johnsnowlabs.com/2022/09/16/clinical_deidentification_glove_augmented_en.html) | en |
| 16 | [clinical_deidentification_langtest](https://nlp.johnsnowlabs.com/2024/01/10/clinical_deidentification_langtest_en.html) | en |
| 17 | [clinical_deidentification_light](https://nlp.johnsnowlabs.com/2025/01/06/clinical_deidentification_light_en.html) | en |
| 18 | [clinical_deidentification_multi_mode_output](https://nlp.johnsnowlabs.com/2024/03/27/clinical_deidentification_multi_mode_output_en.html) | en |
| 19 | [clinical_deidentification_nameAugmented_docwise](https://nlp.johnsnowlabs.com/2025/03/14/clinical_deidentification_nameAugmented_docwise_en.html) | en |
| 20 | [clinical_deidentification_nameAugmented_v2](https://nlp.johnsnowlabs.com/2024/10/03/clinical_deidentification_nameAugmented_v2_en.html) | en |
| 21 | [clinical_deidentification_nameAugmented_v3](https://nlp.johnsnowlabs.com/2025/03/13/clinical_deidentification_nameAugmented_v3_en.html) | en |
| 22 | [clinical_deidentification_obfuscation_medium](https://nlp.johnsnowlabs.com/2024/02/09/clinical_deidentification_obfuscation_medium_en.html) | en |
| 23 | [clinical_deidentification_obfuscation_small](https://nlp.johnsnowlabs.com/2024/02/09/clinical_deidentification_obfuscation_small_en.html) | en |
| 24 | [clinical_deidentification_slim](https://nlp.johnsnowlabs.com/2023/06/17/clinical_deidentification_slim_en.html) | en |
| 25 | [clinical_deidentification_subentity](https://nlp.johnsnowlabs.com/2024/02/21/clinical_deidentification_subentity_en.html) | en |
| 26 | [clinical_deidentification_subentity_enriched_ar](https://nlp.johnsnowlabs.com/2025/03/13/clinical_deidentification_subentity_enriched_ar.html) | ar |
| 27 | [clinical_deidentification_subentity_nameAugmented](https://nlp.johnsnowlabs.com/2024/03/14/clinical_deidentification_subentity_nameAugmented_en.html) | en |
| 28 | [clinical_deidentification_subentity_optimized](https://nlp.johnsnowlabs.com/2024/03/14/clinical_deidentification_subentity_optimized_en.html) | en |
| 29 | [clinical_deidentification_v2_wip](https://nlp.johnsnowlabs.com/2024/10/03/clinical_deidentification_v2_wip_en.html) | en |
| 30 | [clinical_deidentification_wip](https://nlp.johnsnowlabs.com/2023/06/17/clinical_deidentification_wip_en.html) | en |
| 31 | [clinical_deidentification_zeroshot_large](https://nlp.johnsnowlabs.com/2024/12/04/clinical_deidentification_zeroshot_large_en.html) | en |
| 32 | [clinical_deidentification_zeroshot_medium](https://nlp.johnsnowlabs.com/2024/12/04/clinical_deidentification_zeroshot_medium_en.html) | en |

You can find **`German`, `Spanish`, `French`, `Italian`, `Portuguese`, `Romanian`**  and **`Arabic`**  deidentification models and pretrained pipeline examples in this notebook:   [Clinical Multi Language Deidentification Notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/4.1.Clinical_Multi_Language_Deidentification.ipynb)


We have also NER pipelines:

| index | model | lang |
|------:|:------|:-----|
| 1  | [ner_deid_augmented_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_augmented_pipeline_en_3_0.html) | en |
| 2  | [ner_deid_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_biobert_pipeline_en_3_0.html) | en |
| 3  | [ner_deid_enriched_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_enriched_biobert_pipeline_en_3_0.html) | en |
| 4  | [ner_deid_enriched_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_enriched_pipeline_en_3_0.html) | en |
| 5  | [ner_deid_generic_augmented_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_generic_augmented_pipeline_en_3_0.html) | en |
| 6  | [ner_deid_generic_bert_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_deid_generic_bert_pipeline_ro.html) | ro |
| 7  | [ner_deid_generic_glove_pipeline](https://nlp.johnsnowlabs.com/2023/03/13/ner_deid_generic_glove_pipeline_en.html) | en |
| 8  | [ner_deid_generic_pipeline](https://nlp.johnsnowlabs.com/2023/05/31/ner_deid_generic_pipeline_ar.html) | ar, de, it, ro |
| 9  | [ner_deid_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_large_pipeline_en_3_0.html) | en |
| 10 | [ner_deid_sd_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_sd_large_pipeline_en_3_0.html) | en |
| 11 | [ner_deid_sd_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_sd_pipeline_en_3_0.html) | en |
| 12 | [ner_deid_subentity_augmented_i2b2_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_subentity_augmented_i2b2_pipeline_en_3_0.html) | en |
| 13 | [ner_deid_subentity_augmented_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deid_subentity_augmented_pipeline_en_3_0.html) | en |
| 14 | [ner_deid_subentity_bert_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_deid_subentity_bert_pipeline_ro.html) | ro |
| 15 | [ner_deid_subentity_glove_pipeline](https://nlp.johnsnowlabs.com/2023/03/13/ner_deid_subentity_glove_pipeline_en.html) | en |
| 16 | [ner_deid_subentity_pipeline](https://nlp.johnsnowlabs.com/2023/05/31/ner_deid_subentity_pipeline_ar.html) | ar, de, it, ro |
| 17 | [ner_deid_synthetic_pipeline](https://nlp.johnsnowlabs.com/2023/03/13/ner_deid_synthetic_pipeline_en.html) | en |
| 18 | [ner_deidentify_dl_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_deidentify_dl_pipeline_en_3_0.html) | en |
| 19 | [zeroshot_ner_deid_subentity_docwise_large](https://nlp.johnsnowlabs.com/2024/11/29/zeroshot_ner_deid_subentity_docwise_large_en.html) | en |
| 20 | [zeroshot_ner_deid_generic_docwise_large](https://nlp.johnsnowlabs.com/2024/11/28/zeroshot_ner_deid_generic_docwise_large_en.html) | en |
| 21 | [zeroshot_ner_deid_generic_docwise_medium](https://nlp.johnsnowlabs.com/2024/11/28/zeroshot_ner_deid_generic_docwise_medium_en.html) | en |
| 22 | [zeroshot_ner_deid_subentity_docwise_medium](https://nlp.johnsnowlabs.com/2024/11/28/zeroshot_ner_deid_subentity_docwise_medium_en.html) | en |
| 23 | [zeroshot_ner_deid_subentity_merged_medium](https://nlp.johnsnowlabs.com/2024/11/27/zeroshot_ner_deid_subentity_merged_medium_en.html) | en |
| 24 | [zeroshot_ner_deid_generic_docwise_large](https://nlp.johnsnowlabs.com/2024/11/28/zeroshot_ner_deid_generic_docwise_large_de.html) | de |
| 25 | [ner_deid_docwise_benchmark_optimized](https://nlp.johnsnowlabs.com/2025/06/17/ner_deid_docwise_benchmark_optimized_en.html) | en |
| 26 | [ner_deid_docwise_benchmark_optimized_zeroshot_partial](https://nlp.johnsnowlabs.com/2025/06/17/ner_deid_docwise_benchmark_optimized_zeroshot_partial_en.html) | en |
| 27 | [ner_deid_nameAugmented_docwise_pipeline](https://nlp.johnsnowlabs.com/2025/03/25/ner_deid_nameAugmented_docwise_pipeline_en.html) | en |
| 28 | [ner_deid_nameAugmented_pipeline_v3](https://nlp.johnsnowlabs.com/2025/03/25/ner_deid_nameAugmented_pipeline_v3_en.html) | en |
| 29 | [ner_deid_subentity_docwise_augmented_pipeline_v2](https://nlp.johnsnowlabs.com/2025/03/24/ner_deid_subentity_docwise_augmented_pipeline_v2_en.html) | en |
| 30 | [ner_deid_generic_docwise](https://nlp.johnsnowlabs.com/2024/10/24/ner_deid_generic_docwise_de.html) | de |
| 31 | [ner_deid_subentity_langtest](https://nlp.johnsnowlabs.com/2024/10/24/ner_deid_subentity_langtest_de.html) | de |

In [55]:
deid_pipeline = PretrainedPipeline("clinical_deidentification_multi_mode_output", "en", "clinical/models")

clinical_deidentification_multi_mode_output download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [56]:
deid_res = deid_pipeline.annotate("Record date : 2093-01-13 , David Hale , M.D .  Name : Hendrickson , Ora MR 25 years-old . # 719435 Date : 01/13/93 . Signed by Oliveira Sander . Record date : 2079-11-09 . Cocke County Baptist Hospital . 0295 Keats Street. Phone 302-786-5227.")

In [57]:
deid_res.keys()

dict_keys(['masked', 'obfuscated', 'ner_chunk', 'masked_fixed_length_chars', 'sentence', 'masked_with_chars'])

In [58]:
pd.set_option("display.max_colwidth", 100)

df= pd.DataFrame(list(zip(deid_res["sentence"],
                           deid_res["masked"],
                           deid_res["masked_with_chars"],
                           deid_res["masked_fixed_length_chars"],
                          deid_res["obfuscated"])),
                 columns= ["Sentence", "masked","masked_with_chars","masked_fixed_length_chars","Obfuscated"])

df

Unnamed: 0,Sentence,masked,masked_with_chars,masked_fixed_length_chars,Obfuscated
0,"Record date : 2093-01-13 , David Hale , M.D .","Record date : <DATE> , <DOCTOR> , M.D .","Record date : [********] , [********] , M.D .","Record date : **** , **** , M.D .","Record date : 2093-02-02 , Chaney Hal , M.D ."
1,"Name : Hendrickson , Ora MR 25 years-old .",Name : <PATIENT> MR <AGE> years-old .,Name : [***************] MR ** years-old .,Name : **** MR **** years-old .,Name : Liz Blackbird MR 22 years-old .
2,# 719435 Date : 01/13/93 .,# <PHONE> Date : <DATE> .,# [****] Date : [******] .,# **** Date : **** .,# 640582 Date : 02/02/93 .
3,Signed by Oliveira Sander .,Signed by <DOCTOR> .,Signed by [*************] .,Signed by **** .,Signed by Rubie Rile .
4,Record date : 2079-11-09 .,Record date : <DATE> .,Record date : [********] .,Record date : **** .,Record date : 2079-11-29 .
5,Cocke County Baptist Hospital . 0295 Keats Street.,<HOSPITAL> . <STREET>.,[***************************] . [***************].,**** . ****.,Manhattan Psychiatric Center . 5050 County Road 472.
6,Phone 302-786-5227.,Phone <PHONE>.,Phone [**********].,Phone ****.,Phone 871-639-2116.


## NER Pipelines

**`NER pretrained ` Model List**

|index|model|index|model|index|model|index|model|
|-----:|:-----|-----:|:-----|-----:|:-----|-----:|:-----|
| 1| [jsl_ner_wip_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_ner_wip_clinical_pipeline_en_3_0.html) | 2| [jsl_ner_wip_greedy_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_ner_wip_greedy_biobert_pipeline_en_3_0.html) | 3| [jsl_ner_wip_greedy_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_ner_wip_greedy_clinical_pipeline_en_3_0.html) | 4| [jsl_ner_wip_modifier_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_ner_wip_modifier_clinical_pipeline_en_3_0.html) |
| 5| [jsl_rd_ner_wip_greedy_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_rd_ner_wip_greedy_biobert_pipeline_en_3_0.html) | 6| [jsl_rd_ner_wip_greedy_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/jsl_rd_ner_wip_greedy_clinical_pipeline_en_3_0.html) | 7| [ner_abbreviation_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_abbreviation_clinical_pipeline_en_3_0.html) | 8| [ner_ade_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_ade_biobert_pipeline_en_3_0.html) |
| 9| [ner_ade_clinical_langtest_pipeline](https://nlp.johnsnowlabs.com/2023/09/09/ner_ade_clinical_langtest_pipeline_en.html) | 10| [ner_ade_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_ade_clinical_pipeline_en_3_0.html) | 11| [ner_ade_clinicalbert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_ade_clinicalbert_pipeline_en_3_0.html) | 12| [ner_ade_healthcare_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_ade_healthcare_pipeline_en_3_0.html) |
| 13| [ner_anatomy_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_anatomy_biobert_pipeline_en_3_0.html) | 14| [ner_anatomy_coarse_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_anatomy_coarse_biobert_pipeline_en_3_0.html) | 15| [ner_anatomy_coarse_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_anatomy_coarse_pipeline_en_3_0.html) | 16| [ner_anatomy_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_anatomy_pipeline_en_3_0.html) |
| 17| [ner_bacterial_species_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_bacterial_species_pipeline_en_3_0.html) | 18| [ner_biomarker_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_biomarker_pipeline_en_3_0.html) | 19| [ner_biomedical_bc2gm_pipeline](https://nlp.johnsnowlabs.com/2023/03/14/ner_biomedical_bc2gm_pipeline_en.html) | 20| [ner_bionlp_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_bionlp_biobert_pipeline_en_3_0.html) |
| 21| [ner_bionlp_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_bionlp_pipeline_en_3_0.html) | 22| [ner_cancer_genetics_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_cancer_genetics_pipeline_en_3_0.html) | 23| [ner_cellular_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_cellular_biobert_pipeline_en_3_0.html) | 24| [ner_cellular_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_cellular_pipeline_en_3_0.html) |
| 25| [ner_chemd_clinical_pipeline](https://nlp.johnsnowlabs.com/2023/03/14/ner_chemd_clinical_pipeline_en.html) | 26| [ner_chemicals_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_chemicals_pipeline_en_3_0.html) | 27| [ner_chemprot_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_chemprot_biobert_pipeline_en_3_0.html) | 28| [ner_chemprot_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_chemprot_clinical_pipeline_en_3_0.html) |
| 29| [ner_chexpert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_chexpert_pipeline_en_3_0.html) | 30| [ner_clinical_bert_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_clinical_bert_pipeline_ro.html) | 31| [ner_clinical_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_clinical_biobert_pipeline_en_3_0.html) | 32| [ner_clinical_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_clinical_large_pipeline_en_3_0.html) |
| 33| [ner_clinical_pipeline](https://nlp.johnsnowlabs.com/2023/09/02/ner_clinical_pipeline_es.html) | 34| [ner_clinical_trials_abstracts_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_clinical_trials_abstracts_pipeline_en.html) | 35| [ner_covid_trials_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_covid_trials_pipeline_en.html) | 36| [ner_diag_proc_pipeline](https://nlp.johnsnowlabs.com/2023/03/15/ner_diag_proc_pipeline_es.html) |
| 37| [ner_diseases_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_diseases_biobert_pipeline_en_3_0.html) | 38| [ner_diseases_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_diseases_large_pipeline_en_3_0.html) | 39| [ner_diseases_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_diseases_pipeline_en_3_0.html) | 40| [ner_drugprot_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_drugprot_clinical_pipeline_en_3_0.html) |
| 41| [ner_drugs_greedy_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_drugs_greedy_pipeline_en_3_0.html) | 42| [ner_drugs_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_drugs_large_pipeline_en_3_0.html) | 43| [ner_drugs_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_drugs_pipeline_en_3_0.html) | 44| [ner_eu_clinical_case_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_eu_clinical_case_pipeline_en.html) |
| 45| [ner_eu_clinical_condition_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_eu_clinical_condition_pipeline_es.html) | 46| [ner_events_admission_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_events_admission_clinical_pipeline_en_3_0.html) | 47| [ner_events_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_events_biobert_pipeline_en_3_0.html) | 48| [ner_events_clinical_langtest_pipeline](https://nlp.johnsnowlabs.com/2023/09/09/ner_events_clinical_langtest_pipeline_en.html) |
| 49| [ner_events_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_events_clinical_pipeline_en_3_0.html) | 50| [ner_events_healthcare_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_events_healthcare_pipeline_en_3_0.html) | 51| [ner_genetic_variants_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_genetic_variants_pipeline_en_3_0.html) | 52| [ner_healthcare_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_healthcare_pipeline_en_3_0.html) |
| 53| [ner_healthcare_slim_pipeline](https://nlp.johnsnowlabs.com/2023/03/15/ner_healthcare_slim_pipeline_de.html) | 54| [ner_human_phenotype_gene_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_human_phenotype_gene_biobert_pipeline_en_3_0.html) | 55| [ner_human_phenotype_gene_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_human_phenotype_gene_clinical_pipeline_en_3_0.html) | 56| [ner_human_phenotype_go_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_human_phenotype_go_biobert_pipeline_en_3_0.html) |
| 57| [ner_human_phenotype_go_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_human_phenotype_go_clinical_pipeline_en_3_0.html) | 58| [ner_jsl_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_biobert_pipeline_en_3_0.html) | 59| [ner_jsl_enriched_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_enriched_biobert_pipeline_en_3_0.html) | 60| [ner_jsl_enriched_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_enriched_pipeline_en_3_0.html) |
| 61| [ner_jsl_greedy_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_greedy_biobert_pipeline_en_3_0.html) | 62| [ner_jsl_greedy_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_greedy_pipeline_en_3_0.html) | 63| [ner_jsl_langtest_pipeline](https://nlp.johnsnowlabs.com/2023/09/09/ner_jsl_langtest_pipeline_en.html) | 64| [ner_jsl_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_pipeline_en_3_0.html) |
| 65| [ner_jsl_slim_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_jsl_slim_pipeline_en_3_0.html) | 66| [ner_living_species_300_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_living_species_300_pipeline_es.html) | 67| [ner_living_species_bert_pipeline](https://nlp.johnsnowlabs.com/2023/03/13/ner_living_species_bert_pipeline_es.html) | 68| [ner_living_species_biobert_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/ner_living_species_biobert_pipeline_en.html) |
| 69| [ner_living_species_pipeline](https://nlp.johnsnowlabs.com/2023/03/13/ner_living_species_pipeline_en.html) | 70| [ner_living_species_roberta_pipeline](https://nlp.johnsnowlabs.com/2023/03/13/ner_living_species_roberta_pipeline_es.html) | 71| [ner_measurements_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_measurements_clinical_pipeline_en_3_0.html) | 72| [ner_medication_pipeline](https://nlp.johnsnowlabs.com/2023/06/17/ner_medication_pipeline_en.html) |
| 73| [ner_medmentions_coarse_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_medmentions_coarse_pipeline_en_3_0.html) | 74| [ner_nature_nero_clinical_pipeline](https://nlp.johnsnowlabs.com/2023/03/14/ner_nature_nero_clinical_pipeline_en.html) | 75| [ner_negation_uncertainty_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_negation_uncertainty_pipeline_es.html) | 76| [ner_neoplasms_pipeline](https://nlp.johnsnowlabs.com/2023/03/15/ner_neoplasms_pipeline_es.html) |
| 77| [ner_nihss_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_nihss_pipeline_en_3_0.html) | 78| [ner_oncology_anatomy_general_healthcare_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_oncology_anatomy_general_healthcare_pipeline_en.html) | 79| [ner_oncology_anatomy_general_langtest_pipeline](https://nlp.johnsnowlabs.com/2023/09/09/ner_oncology_anatomy_general_langtest_pipeline_en.html) | 80| [ner_oncology_anatomy_general_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_oncology_anatomy_general_pipeline_en.html) |
| 81| [ner_oncology_anatomy_granular_langtest_pipeline](https://nlp.johnsnowlabs.com/2023/09/09/ner_oncology_anatomy_granular_langtest_pipeline_en.html) | 82| [ner_oncology_anatomy_granular_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_oncology_anatomy_granular_pipeline_en.html) | 83| [ner_oncology_biomarker_healthcare_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_oncology_biomarker_healthcare_pipeline_en.html) | 84| [ner_oncology_biomarker_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_oncology_biomarker_pipeline_en.html) |
| 85| [ner_oncology_demographics_langtest_pipeline](https://nlp.johnsnowlabs.com/2023/09/09/ner_oncology_demographics_langtest_pipeline_en.html) | 86| [ner_oncology_demographics_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_oncology_demographics_pipeline_en.html) | 87| [ner_oncology_diagnosis_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_oncology_diagnosis_pipeline_en.html) | 88| [ner_oncology_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_oncology_pipeline_en.html) |
| 89| [ner_oncology_posology_langtest_pipeline](https://nlp.johnsnowlabs.com/2023/09/09/ner_oncology_posology_langtest_pipeline_en.html) | 90| [ner_oncology_posology_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_oncology_posology_pipeline_en.html) | 91| [ner_oncology_response_to_treatment_langtest_pipeline](https://nlp.johnsnowlabs.com/2023/09/09/ner_oncology_response_to_treatment_langtest_pipeline_en.html) | 92| [ner_oncology_response_to_treatment_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_oncology_response_to_treatment_pipeline_en.html) |
| 93| [ner_oncology_risk_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_oncology_risk_pipeline_en.html) | 94| [ner_oncology_treatment_healthcare_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_oncology_treatment_healthcare_pipeline_en.html) | 95| [ner_oncology_treatment_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_oncology_treatment_pipeline_en.html) | 96| [ner_posology_pipeline](https://nlp.johnsnowlabs.com/2023/03/09/ner_posology_pipeline_es.html) |
| 97| [ner_procedure_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_procedure_clinical_pipeline_en_3_0.html) | 98| [ner_procedure_healthcare_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_procedure_healthcare_pipeline_en_3_0.html) | 99| [ner_procedure_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_procedure_large_pipeline_en_3_0.html) | 100| [ner_procedure_pipeline](https://nlp.johnsnowlabs.com/2023/03/15/ner_procedure_pipeline_es.html) |
| 101| [ner_protein_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_protein_biobert_pipeline_en_3_0.html) | 102| [ner_protein_biomedical_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_protein_biomedical_pipeline_en_3_0.html) | 103| [ner_protein_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_protein_pipeline_en_3_0.html) | 104| [ner_protein_secondary_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_protein_secondary_pipeline_en_3_0.html) |
| 105| [ner_proteins_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/ner_proteins_pipeline_en_3_0.html) | 106| [ner_risk_factors_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_risk_factors_pipeline_en_3_0.html) | 107| [ner_rxnorm_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_rxnorm_pipeline_en_3_0.html) | 108| [ner_sdoh_biobert_pipeline](https://nlp.johnsnowlabs.com/2023/06/17/ner_sdoh_biobert_pipeline_en.html) |
| 109| [ner_sdoh_demographic_pipeline](https://nlp.johnsnowlabs.com/2023/06/17/ner_sdoh_demographic_pipeline_en.html) | 110| [ner_sdoh_diet_pipeline](https://nlp.johnsnowlabs.com/2023/06/17/ner_sdoh_diet_pipeline_en.html) | 111| [ner_sdoh_housing_pipeline](https://nlp.johnsnowlabs.com/2023/06/17/ner_sdoh_housing_pipeline_en.html) | 112| [ner_sdoh_langtest_pipeline](https://nlp.johnsnowlabs.com/2023/09/09/ner_sdoh_langtest_pipeline_en.html) |
| 113| [ner_sdoh_mentions_pipeline](https://nlp.johnsnowlabs.com/2023/06/17/ner_sdoh_mentions_pipeline_en.html) | 114| [ner_sdoh_mentions_pipeline](https://nlp.johnsnowlabs.com/2023/03/08/ner_sdoh_mentions_pipeline_en.html) | 115| [ner_supplement_clinical_pipeline](https://nlp.johnsnowlabs.com/2023/03/14/ner_supplement_clinical_pipeline_en.html) | 116| [ner_supplement_pipeline](https://nlp.johnsnowlabs.com/2023/03/13/ner_supplement_pipeline_en.html) |
| 117| [ner_symptoms_biobert_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_symptoms_biobert_pipeline_en_3_0.html) | 118| [ner_symptoms_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_symptoms_clinical_pipeline_en_3_0.html) | 119| [ner_symptoms_healthcare_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_symptoms_healthcare_pipeline_en_3_0.html) | 120| [ner_symptoms_large_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/ner_symptoms_large_pipeline_en_3_0.html) |
| 121| [ner_vop_problem_pipeline](https://nlp.johnsnowlabs.com/2023/06/22/ner_vop_problem_pipeline_en.html) | 122| [ner_vop_problem_reduced_pipeline](https://nlp.johnsnowlabs.com/2023/06/22/ner_vop_problem_reduced_pipeline_en.html) | 123| [ner_vop_temporal_pipeline](https://nlp.johnsnowlabs.com/2023/06/22/ner_vop_temporal_pipeline_en.html) | 124| [ner_vop_test_pipeline](https://nlp.johnsnowlabs.com/2023/06/22/ner_vop_test_pipeline_en.html) |
| 125| [ner_vop_treatment_pipeline](https://nlp.johnsnowlabs.com/2023/06/22/ner_vop_treatment_pipeline_en.html) | 126| [ner_vaccine_types_pipeline](https://nlp.johnsnowlabs.com/2025/07/14/ner_vaccine_types_pipeline_en.html) | 127| [ner_admission_discharge_benchmark_pipeline](https://nlp.johnsnowlabs.com/2025/03/27/ner_admission_discharge_benchmark_pipeline_en.html) | 128| [ner_body_part_benchmark_pipeline](https://nlp.johnsnowlabs.com/2025/03/27/ner_body_part_benchmark_pipeline_en.html) |
| 129| [ner_drug_benchmark_pipeline](https://nlp.johnsnowlabs.com/2025/03/27/ner_drug_benchmark_pipeline_en.html) | 130| [ner_procedure_benchmark_pipeline](https://nlp.johnsnowlabs.com/2025/03/27/ner_procedure_benchmark_pipeline_en.html) | 131| [ner_test_benchmark_pipeline](https://nlp.johnsnowlabs.com/2025/03/27/ner_test_benchmark_pipeline_en.html) | 132| [ner_treatment_benchmark_pipeline](https://nlp.johnsnowlabs.com/2025/03/27/ner_treatment_benchmark_pipeline_en.html) |133| [ner_consumption_benchmark_pipeline](https://nlp.johnsnowlabs.com/2025/03/28/ner_consumption_benchmark_pipeline_en.html) |134| [ner_grade_stage_severity_benchmark_pipeline](https://nlp.johnsnowlabs.com/2025/03/28/ner_grade_stage_severity_benchmark_pipeline_en.html) |135| [ner_problem_benchmark_pipeline](https://nlp.johnsnowlabs.com/2025/03/28/ner_problem_benchmark_pipeline_en.html) |136| [ner_docwise_benchmark_medium](https://nlp.johnsnowlabs.com/2025/07/31/ner_docwise_benchmark_medium_en.html) |
|137| [ner_docwise_benchmark_large](https://nlp.johnsnowlabs.com/2025/07/31/ner_docwise_benchmark_large_en.html) |138| [ner_atc_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_atc_pipeline_en.html) |139| [ner_hcc_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_hcc_pipeline_en.html) |140| [ner_icd10cm_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_icd10cm_pipeline_en.html) |
|141| [ner_cpt_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_cpt_pipeline_en.html) |142| [ner_hgnc_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_hgnc_pipeline_en.html) |143| [ner_icd10pcs_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_icd10pcs_pipeline_en.html) |144| [ner_icdo_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_icdo_pipeline_en.html) |
|145| [ner_loinc_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_loinc_pipeline_en.html) |146| [ner_mesh_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_mesh_pipeline_en.html) |147| [ner_ncit_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_ncit_pipeline_en.html) |148| [ner_ndc_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_ndc_pipeline_en.html) |
|149| [ner_rxcui_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_rxcui_pipeline_en.html) |150| [ner_rxnorm_pipeline](https://nlp.johnsnowlabs.com/2025/06/24/ner_rxnorm_pipeline_en.html) |151| [ner_hcpcs_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_hcpcs_pipeline_en.html) |152| [ner_hpo_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_hpo_pipeline_en.html) |
|153| [ner_meddra_llt_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_meddra_llt_pipeline_en.html) |154| [ner_meddra_pt_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_meddra_pt_pipeline_en.html) |155| [ner_snomed_auxConcepts_findings_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_snomed_auxConcepts_findings_pipeline_en.html) |156| [ner_snomed_auxConcepts_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_snomed_auxConcepts_pipeline_en.html) |
|157| [ner_snomed_bodyStructure_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_snomed_bodyStructure_pipeline_en.html) |158| [ner_snomed_conditions_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_snomed_conditions_pipeline_en.html) |159| [ner_snomed_drug_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_snomed_drug_pipeline_en.html) |160| [ner_snomed_findings_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_snomed_findings_pipeline_en.html) |
|161| [ner_snomed_procedures_measurements_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_snomed_procedures_measurements_pipeline_en.html) |162| [ner_snomed_term_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_snomed_term_pipeline_en.html) |163| [ner_umls_clinical_drugs_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_umls_clinical_drugs_pipeline_en.html) |164| [ner_umls_clinical_findings_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_umls_clinical_findings_pipeline_en.html) |
|165| [ner_umls_disease_syndrome_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_umls_disease_syndrome_pipeline_en.html) |166| [ner_umls_drug_substance_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_umls_drug_substance_pipeline_en.html) |167| [ner_umls_major_concepts_pipeline](https://nlp.johnsnowlabs.com/2025/06/25/ner_umls_major_concepts_pipeline_en.html) |168| [ner_ade_age_meddra_test_pipeline](https://nlp.johnsnowlabs.com/2025/06/29/ner_ade_age_meddra_test_pipeline_en.html) |

**Let's show an example of `ner_jsl_pipeline` can label clinical entities with about 80 different labels.**

In [59]:
ner_pipeline = PretrainedPipeline('ner_jsl_pipeline', 'en', 'clinical/models')

ner_jsl_pipeline download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [60]:
ner_pipeline.model.stages

[DocumentAssembler_cefadf0e0f93,
 SentenceDetectorDLModel_c83c27f46b97,
 REGEX_TOKENIZER_cef816a12166,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_c89cbceb1028,
 NER_CONVERTER_42a801d9e143]

In [61]:
text = """A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation , associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting .
Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection . She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . She had been on dapagliflozin for six months at the time of presentation . Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity .
Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 . Serum lipase was normal at 43 U/L . Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia .
The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission . However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L .
The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again . The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours .
Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use . The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day .
It was determined that all SGLT2 inhibitors should be discontinued indefinitely . She had close follow-up with endocrinology post discharge ."""

greedy_result = ner_pipeline.fullAnnotate(text)[0]

In [62]:
greedy_result.keys()

dict_keys(['document', 'ner_chunk', 'token', 'ner', 'embeddings', 'sentence'])

In [63]:
import pandas as pd

chunks=[]
entities=[]
begins=[]
ends=[]

for n in greedy_result['ner_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,28-year-old,2,12,Age
1,female,14,19,Gender
2,gestational diabetes mellitus,39,67,Diabetes
3,eight years prior,79,95,RelativeDate
4,type two diabetes mellitus,128,153,Diabetes
...,...,...,...,...
116,two times a day,2357,2371,Frequency
117,SGLT2 inhibitors,2402,2417,Drug_Ingredient
118,She,2457,2459,Gender
119,endocrinology,2486,2498,Clinical_Dept


## Bert Based NER Pipelines


**`bert token classification pretrained ` Pipeline List**

| Index | Model | Index | Model | Index | Model | Index | Model |
|------:|:------|------:|:------|------:|:------|------:|:------|
| 1 | [bert_token_classifier_ade_tweet_binary_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ade_tweet_binary_pipeline_en.html) | 2 | [bert_token_classifier_disease_mentions_tweet_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_disease_mentions_tweet_pipeline_es.html) | 3 | [bert_token_classifier_ner_jsl](https://nlp.johnsnowlabs.com/2021/08/28/bert_token_classifier_ner_jsl_en.html) | 4 | [bert_token_classifier_drug_development_trials_pipeline](https://nlp.johnsnowlabs.com/2022/03/23/bert_token_classifier_drug_development_trials_pipeline_en_3_0.html) |
| 5 | [bert_token_classifier_dutch_udlassy_ner_pipeline](https://nlp.johnsnowlabs.com/2022/04/19/bert_token_classifier_dutch_udlassy_ner_pipeline_nl_3_0.html) | 6 | [bert_token_classifier_hi_en_ner_pipeline](https://nlp.johnsnowlabs.com/2022/03/22/bert_token_classifier_hi_en_ner_pipeline_hi_3_0.html) | 7 | [bert_token_classifier_negation_uncertainty_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_negation_uncertainty_pipeline_es.html) | 8 | [bert_token_classifier_ner_ade_binary_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_ade_binary_pipeline_en.html) |
| 9 | [bert_token_classifier_ner_ade_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_ade_pipeline_en_3_0.html) | 10 | [bert_token_classifier_ner_anatem_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_anatem_pipeline_en.html) | 11 | [bert_token_classifier_ner_anatomy_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_anatomy_pipeline_en_3_0.html) | 12 | [bert_token_classifier_ner_bacteria_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_bacteria_pipeline_en_3_0.html) |
| 13 | [bert_token_classifier_ner_bc2gm_gene_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_bc2gm_gene_pipeline_en.html) | 14 | [bert_token_classifier_ner_bc4chemd_chemicals_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_bc4chemd_chemicals_pipeline_en.html) | 15 | [bert_token_classifier_ner_bc5cdr_chemicals_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_bc5cdr_chemicals_pipeline_en.html) | 16 | [bert_token_classifier_ner_bc5cdr_disease_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_bc5cdr_disease_pipeline_en.html) |
| 17 | [bert_token_classifier_ner_bionlp_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_bionlp_pipeline_en_3_0.html) | 18 | [bert_token_classifier_ner_cellular_pipeline](https://nlp.johnsnowlabs.com/2022/03/09/bert_token_classifier_ner_cellular_pipeline_en_3_0.html) | 19 | [bert_token_classifier_ner_chemicals_pipeline](https://nlp.johnsnowlabs.com/2022/03/14/bert_token_classifier_ner_chemicals_pipeline_en_2_4.html) | 20 | [bert_token_classifier_ner_chemprot_pipeline](https://nlp.johnsnowlabs.com/2022/03/15/bert_token_classifier_ner_chemprot_pipeline_en_2_4.html) |
| 21 | [bert_token_classifier_ner_clinical_pipeline](https://nlp.johnsnowlabs.com/2022/03/15/bert_token_classifier_ner_clinical_pipeline_en_2_4.html) | 22 | [bert_token_classifier_ner_clinical_trials_abstracts_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_clinical_trials_abstracts_pipeline_en.html) | 23 | [bert_token_classifier_ner_deid_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_deid_pipeline_en_3_0.html) | 24 | [bert_token_classifier_drug_development_trials](https://nlp.johnsnowlabs.com/2021/12/17/bert_token_classifier_drug_development_trials_en.html) |
| 25 | [bert_token_classifier_ner_drugs_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_drugs_pipeline_en_3_0.html) | 26 | [bert_token_classifier_ner_jnlpba_cellular_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_jnlpba_cellular_pipeline_en.html) | 27 | [bert_token_classifier_ner_jsl_pipeline](https://nlp.johnsnowlabs.com/2022/03/23/bert_token_classifier_ner_jsl_pipeline_en_3_0.html) | 28 | [bert_token_classifier_ner_jsl_slim_pipeline](https://nlp.johnsnowlabs.com/2022/03/21/bert_token_classifier_ner_jsl_slim_pipeline_en_3_0.html) |
| 29 | [bert_token_classifier_ner_linnaeus_species_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_linnaeus_species_pipeline_en.html) | 30 | [bert_token_classifier_ner_living_species_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_living_species_pipeline_en.html) | 31 | [bert_token_classifier_ner_ncbi_disease_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_ncbi_disease_pipeline_en.html) | 32 | [bert_token_classifier_ner_pathogen_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_pathogen_pipeline_en.html) |
| 33 | [bert_token_classifier_ner_species_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_ner_species_pipeline_en.html) | 34 | [bert_token_classifier_pharmacology_pipeline](https://nlp.johnsnowlabs.com/2023/03/20/bert_token_classifier_pharmacology_pipeline_es.html) | 35 | [bert_token_classifier_scandi_ner_pipeline](https://nlp.johnsnowlabs.com/2022/02/15/bert_token_classifier_scandi_ner_pipeline_xx.html) |   |   |



**Let's show an example of `bert_token_classifier_ner_drugs_pipeline` can extract `DRUG` entities in clinical texts.**

In [64]:
bert_token_pipeline = PretrainedPipeline("bert_token_classifier_ner_drugs_pipeline", "en", "clinical/models")

bert_token_classifier_ner_drugs_pipeline download started this may take some time.
Approx size to download 386.1 MB
[OK!]


In [65]:
bert_token_pipeline.model.stages

[DocumentAssembler_fbb1736f8270,
 SentenceDetectorDLModel_8aaebf7e098e,
 REGEX_TOKENIZER_bd5df3943e2f,
 BERT_FOR_TOKEN_CLASSIFICATION_3fa6213c0542,
 NER_CONVERTER_70b935c1d6d8]

In [66]:
test_sentence = """The human KCNJ9 (Kir 3.3, GIRK3) is a member of the G-protein-activated inwardly rectifying potassium (GIRK) channel family. Here we describe the genomicorganization of the KCNJ9 locus on chromosome 1q21-23 as a candidate gene forType II diabetes mellitus in the Pima Indian population. The gene spansapproximately 7.6 kb and contains one noncoding and two coding exons separated byapproximately 2.2 and approximately 2.6 kb introns, respectively. We identified14 single nucleotide polymorphisms (SNPs), including one that predicts aVal366Ala substitution, and an 8 base-pair (bp) insertion/deletion. Ourexpression studies revealed the presence of the transcript in various humantissues including pancreas, and two major insulin-responsive tissues: fat andskeletal muscle. The characterization of the KCNJ9 gene should facilitate furtherstudies on the function of the KCNJ9 protein and allow evaluation of thepotential role of the locus in Type II diabetes.BACKGROUND: At present, it is one of the most important issues for the treatment of breast cancer to develop the standard therapy for patients previously treated with anthracyclines and taxanes. With the objective of determining the usefulnessof vinorelbine monotherapy in patients with advanced or recurrent breast cancerafter standard therapy, we evaluated the efficacy and safety of vinorelbine inpatients previously treated with anthracyclines and taxanes."""

bert_result = bert_token_pipeline.fullAnnotate(test_sentence)[0]

In [67]:
bert_result.keys()

dict_keys(['document', 'ner_chunk', 'token', 'ner', 'sentence'])

In [68]:
bert_result["ner_chunk"][0]

Annotation(chunk, 92, 100, potassium, {'entity': 'DrugChem', 'confidence': '0.99056387', 'ner_source': 'ner_chunk', 'chunk': '0', 'sentence': '0'}, [])

In [69]:
import pandas as pd

chunks=[]
entities=[]
begins=[]
ends=[]

for n in bert_result['ner_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities})

df

Unnamed: 0,chunks,begin,end,entities
0,potassium,92,100,DrugChem
1,nucleotide,471,480,DrugChem
2,anthracyclines,1124,1137,DrugChem
3,taxanes,1143,1149,DrugChem
4,vinorelbine,1203,1213,DrugChem
5,vinorelbine,1343,1353,DrugChem
6,anthracyclines,1390,1403,DrugChem
7,taxanes,1409,1415,DrugChem


## NER Profiling Pipelines

We can use pretrained NER profiling pipelines for exploring all the available pretrained NER models at once. In Spark NLP we have two different NER profiling pipelines;

- `ner_profiling_clinical` : Returns results for clinical NER models trained with `embeddings_clinical`.


|| | | |
|--------------|-----------------|-----------------|-----------------|
| jsl_ner_wip_clinical | jsl_ner_wip_greedy_clinical | jsl_ner_wip_modifier_clinical | jsl_rd_ner_wip_greedy_clinical |
| ner_abbreviation_clinical | ner_ade_binary | ner_ade_clinical | ner_anatomy |
| ner_anatomy_coarse | ner_bacterial_species | ner_biomarker | ner_biomedical_bc2gm |
| ner_bionlp | ner_cancer_genetics | ner_cellular | ner_chemd_clinical |
| ner_chemicals | ner_chemprot_clinical | ner_chexpert | ner_clinical |
| ner_clinical_large | ner_clinical_trials_abstracts | ner_covid_trials | ner_deid_augmented |
| ner_deid_enriched | ner_deid_generic_augmented | ner_deid_large | ner_deid_sd |
| ner_deid_sd_large | ner_deid_subentity_augmented | ner_deid_subentity_augmented_i2b2 | ner_deid_synthetic |
| ner_deidentify_dl | ner_diseases | ner_diseases_large | ner_drugprot_clinical |
| ner_drugs | ner_drugs_greedy | ner_drugs_large | ner_events_admission_clinical |
| ner_events_clinical | ner_genetic_variants | ner_human_phenotype_gene_clinical | ner_human_phenotype_go_clinical |
| ner_jsl | ner_jsl_enriched | ner_jsl_greedy | ner_jsl_slim |
| ner_living_species | ner_measurements_clinical | ner_medmentions_coarse | ner_nature_nero_clinical |
| ner_nihss | ner_pathogen | ner_posology | ner_posology_experimental |
| ner_posology_greedy | ner_posology_large | ner_posology_small | ner_radiology |
| ner_radiology_wip_clinical | ner_risk_factors | ner_supplement_clinical | nerdl_tumour_demo |




- `ner_profiling_biobert` : Returns results for clinical NER models trained with `biobert_pubmed_base_cased`.

| | |
|-|-|
| ner_cellular_biobert           | ner_clinical_biobert             |
| ner_diseases_biobert           | ner_anatomy_coarse_biobert       |
| ner_events_biobert             | ner_human_phenotype_gene_biobert |
| ner_bionlp_biobert             | ner_posology_large_biobert       |
| ner_jsl_greedy_biobert         | jsl_rd_ner_wip_greedy_biobert    |
| ner_jsl_biobert                | ner_posology_biobert             |
| ner_anatomy_biobert            | jsl_ner_wip_greedy_biobert       |
| ner_jsl_enriched_biobert       | ner_chemprot_biobert             |
| ner_human_phenotype_go_biobert | ner_ade_biobert                  |
| ner_deid_biobert               | ner_risk_factors_biobert         |
| ner_deid_enriched_biobert      | ner_living_species_biobert                                |


- `ner_profiling_oncology` :This pipeline can be used to explore all the available pretrained NER models at once for Oncology. When you run this pipeline over your text, you will end up with the predictions coming out of each pretrained clinical NER model trained with embeddings_clinical


|                 |                 |                 |                 |
| --------------- | --------------- | --------------- | --------------- |
| 1. ner_oncology_unspecific_posology      | 2. ner_oncology_tnm                      | 3. ner_oncology_therapy                  | 4. ner_oncology_test                     |
| 5. ner_oncology_response_to_treatment    | 6. ner_oncology_posology                 | 7. ner_oncology                          | 8. ner_oncology_limited_80p_for_benchmarks|
| 9. ner_oncology_diagnosis                | 10. ner_oncology_demographics             | 11. ner_oncology_biomarker                | 12. ner_oncology_anatomy_granular         |
| 13. ner_oncology_anatomy_general          |                 |                 |                 |


- `ner_profiling_vop` : This pipeline can be used to explore all the available pretrained NER models at once for Voice of Patients. When you run this pipeline over your text, you will end up with the predictions coming out of each pretrained clinical NER model trained with embeddings_clinical.


|                 |                 |                 |                 |
| --------------- | --------------- | --------------- | --------------- |
| 1. ner_oncology_unspecific_posology         | 2. ner_oncology_tnm                       | 3. ner_oncology_therapy                   | 4. ner_oncology_test                      |
| 5. ner_oncology_response_to_treatment       | 6. ner_oncology_posology                  | 7. ner_oncology                           | 8. ner_oncology_limited_80p_for_benchmarks |
| 9. ner_oncology_diagnosis                   | 10. ner_oncology_demographics             | 11. ner_oncology_biomarker                | 12. ner_oncology_anatomy_granular          |
| 13. ner_oncology_anatomy_general            |                                           |                                            |                                           |




- `ner_profiling_sdoh` :This pipeline can be used to explore all the available pretrained NER models at once for Social Determinants of Health. When you run this pipeline over your text, you will end up with the predictions coming out of each pretrained clinical NER model trained with embeddings_clinical.

|                 |                 |                 |                 |
| --------------- | --------------- | --------------- | --------------- |
| 1. ner_sdoh                                   | 2. ner_sdoh_social_environment_wip             | 3. ner_sdoh_mentions                             | 4. ner_sdoh_demographics_wip                    |
| 5. ner_sdoh_community_condition_wip           | 6. ner_sdoh_substance_usage_wip                | 7. ner_sdoh_access_to_healthcare_wip            | 8. ner_sdoh_health_behaviours_problems_wip     |
| 9. ner_sdoh_income_social_status_wip         |                                                 |                                                  |                                                  |


For more examples, please check [this notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.2.Pretrained_NER_Profiling_Pipelines.ipynb).









You can check [Models Hub](https://nlp.johnsnowlabs.com/models) page for more information about all these models and more.

In [70]:
clinical_profiling_pipeline = PretrainedPipeline("ner_profiling_clinical", "en", "clinical/models")

ner_profiling_clinical download started this may take some time.
Approx size to download 3.6 GB
[OK!]


In [71]:
text = '''A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation , associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting .'''

In [72]:
clinical_result = clinical_profiling_pipeline.fullAnnotate(text)[0]
clinical_result.keys()

dict_keys(['ner_oncology_tnm_langtest', 'ner_ade_clinical_chunks', 'ner_deid_augmented', 'ner_deid_subentity_augmented_i2b2', 'ner_posology_greedy_chunks', 'ner_risk_factors_langtest_chunks', 'ner_sdoh_substance_usage_chunks', 'ner_sdoh_demographics_chunks', 'ner_vop_clinical_dept_langtest', 'ner_human_phenotype_gene_clinical_langtest_chunks', 'ner_clinical_abbreviation_langtest_chunks', 'ner_risk_factors_langtest', 'ner_radiology_wip_clinical', 'ner_deidentify_dl', 'ner_oncology_diagnosis_chunks', 'ner_vop_v2', 'ner_vop_treatment_langtest_chunks', 'ner_jsl_slim', 'ner_vop_anatomy_langtest_chunks', 'ner_vop_clinical_dept_chunks', 'ner_ade_clinical_langtest', 'ner_vop_langtest_chunks', 'ner_risk_factors_chunks', 'jsl_ner_wip_clinical_chunks', 'ner_deid_subentity_augmented_langtest', 'ner_oncology_unspecific_posology_chunks', 'ner_oncology_demographics_langtest_chunks', 'ner_eu_clinical_case_chunks', 'ner_deid_large_langtest_chunks', 'ner_oncology_test_chunks', 'ner_vop_demographic_chunk

In [73]:
import pandas as pd

def get_token_results(light_result):

  tokens = [j.result for j in light_result["token"]]
  sentences = [j.metadata["sentence"] for j in light_result["token"]]
  begins = [j.begin for j in light_result["token"]]
  ends = [j.end for j in light_result["token"]]
  model_list = [ a for a in light_result.keys() if (a not in ["sentence", "token"] and "_chunks" not in a)]

  df = pd.DataFrame({'sentence':sentences, 'begin': begins, 'end': ends, 'token':tokens})

  for model_name in model_list:

    temp_df = pd.DataFrame(light_result[model_name])
    temp_df["jsl_label"] = temp_df.iloc[:,0].apply(lambda x : x.result)
    temp_df = temp_df[["jsl_label"]]

    # temp_df = get_ner_result(model_name)
    temp_df.columns = [model_name]
    df = pd.concat([df, temp_df], axis=1)

  return df

In [74]:
get_token_results(clinical_result)

Unnamed: 0,sentence,begin,end,token,ner_oncology_tnm_langtest,ner_deid_augmented,ner_deid_subentity_augmented_i2b2,ner_vop_clinical_dept_langtest,ner_risk_factors_langtest,ner_radiology_wip_clinical,...,ner_clinical_langtest,ner_oncology_unspecific_posology,ner_genetic_variants,ner_radiology,ner_eu_clinical_case,ner_posology,ner_oncology_posology,ner_covid_trials,ner_vop_problem_reduced_langtest,ner_posology_langtest
0,0,0,0,A,O,O,O,O,O,O,...,O,O,O,O,B-patient,O,O,O,O,O
1,0,2,12,28-year-old,O,O,O,O,O,O,...,O,O,O,O,I-patient,O,O,B-Gender,O,O
2,0,14,19,female,O,O,O,O,O,O,...,O,O,O,O,I-patient,O,O,B-Gender,O,O
3,0,21,24,with,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
4,0,26,26,a,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,0,409,416,appetite,O,O,O,O,O,I-Symptom,...,I-PROBLEM,O,O,I-Symptom,I-clinical_condition,O,O,O,I-Problem,O
69,0,418,418,",",O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
70,0,420,422,and,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
71,0,424,431,vomiting,O,O,O,O,O,B-Symptom,...,B-PROBLEM,O,O,B-Symptom,B-clinical_condition,O,O,O,B-Problem,O


## NER Model Finder Pretrained Pipeline
`ner_model_finder`  pretrained pipeline trained with bert embeddings that can be used to find the most appropriate NER model given the entity name.

In [75]:
from sparknlp.pretrained import PretrainedPipeline
finder_pipeline = PretrainedPipeline("ner_model_finder", "en", "clinical/models")

ner_model_finder download started this may take some time.
Approx size to download 148.7 MB
[OK!]


In [76]:
result = finder_pipeline.fullAnnotate("oncology")[0]
result.keys()

dict_keys(['model_names'])

From the metadata in the 'model_names' column, we'll get to the top models to the given 'oncology' entity and oncology related categories.

In [77]:
df= pd.DataFrame(zip(result["model_names"][0].metadata["all_k_resolutions"].split(":::"),
                     result["model_names"][0].metadata["all_k_results"].split(":::")),
                 columns=["category", "top_models"])

In [78]:
df.head()

Unnamed: 0,category,top_models
0,oncology therapy,"['ner_jsl', 'jsl_rd_ner_wip_greedy_clinical', 'jsl_ner_wip_modifier_clinical', 'ner_jsl_enriched..."
1,clinical department,"['ner_jsl', 'jsl_rd_ner_wip_greedy_clinical', 'jsl_ner_wip_modifier_clinical', 'ner_events_clini..."
2,biomedical unit,['ner_clinical_trials_abstracts']
3,cancer genetics,['ner_cancer_genetics']
4,anatomy,"['ner_bionlp', 'ner_medmentions_coarse', 'ner_chexpert', 'ner_anatomy_coarse', 'ner_anatomy', 'n..."



## Resolver Pipelines

We have **Resolver pipelines** for converting clinical entities to their UMLS CUI codes. You will just feed your text and it will return the corresponding UMLS codes.

**Resolver Pipelines:**

| index | pipeline | Entity | Target |
|-:|:-|:-|:-|
| 1 | [umls_disease_syndrome_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/26/umls_disease_syndrome_resolver_pipeline_en_3_0.html) | Disease and Syndromes | UMLS CUI |
| 2 | [umls_drug_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/26/umls_drug_resolver_pipeline_en_3_0.html) | Drug | UMLS CUI |
| 3 | [umls_drug_substance_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/25/umls_drug_substance_resolver_pipeline_en_3_0.html) | Drug Substance | UMLS CUI |
| 4 | [medication_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/09/01/medication_resolver_pipeline_en.html) | Drug | Adverse Reaction, RxNorm, UMLS<br>NDC, SNOMED CT |
| 5 | [medication_resolver_transform_pipeline](https://nlp.johnsnowlabs.com/2022/09/01/medication_resolver_transform_pipeline_en.html) | Drug | Adverse Reaction, RxNorm<br>UMLS, NDC, SNOMED CT |
| 6 | [umls_major_concepts_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/25/umls_major_concepts_resolver_pipeline_en_3_0.html) | Clinical Major Concepts | UMLS CUI |
| 7 | [umls_clinical_findings_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/07/26/umls_clinical_findings_resolver_pipeline_en_3_0.html) | Clinical Findings | UMLS CUI |
| 8 | [icd9_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/09/30/icd9_resolver_pipeline_en.html) | Clinical Findings | ICD-9 |
| 9 | [atc_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/17/atc_resolver_pipeline_en.html) | Drug | ATC |
| 10 | [cpt_procedures_measurements_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/17/cpt_procedures_measurements_resolver_pipeline_en.html) | Procedure, Measurement | CPT |
| 11 | [hcc_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/17/hcc_resolver_pipeline_en.html) | Clinical Findings | HCC |
| 12 | [hpo_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/17/hpo_resolver_pipeline_en.html) | Human Phenotype | HPO |
| 13 | [cvx_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/10/12/cvx_resolver_pipeline_en.html) | Vaccine | CVX |
| 14 | [icd10cm_resolver_pipeline](https://nlp.johnsnowlabs.com/2022/11/02/icd10cm_resolver_pipeline_en.html) | Problem | ICD-10 |
| 15 | [icd10pcs_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/02/02/icd10pcs_resolver_pipeline_en.html) | Procedure | ICD-10-PCS |
| 16 | [icdo_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/02/02/icdo_resolver_pipeline_en.html) | Problem | ICD-O |
| 17 | [loinc_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/02/02/loinc_resolver_pipeline_en.html) | Test | LOINC |
| 18 | [loinc_numeric_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/30/loinc_numeric_resolver_pipeline_en.html) | Test | LOINC |
| 19 | [snomed_body_structure_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/17/snomed_body_structure_resolver_pipeline_en.html) | Clinical Findings | SNOMED |
| 20 | [snomed_findings_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/17/snomed_findings_resolver_pipeline_en.html) | Clinical Findings | SNOMED CT |
| 21 | [snomed_procedures_measurements_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/31/snomed_findings_resolver_pipelin_en.html) | Procedure, Test | SNOMED |
| 22 | [mesh_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/25/mesh_resolver_pipeline_en.html) | Clinical Findings | MeSH |
| 23 | [ndc_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/25/ndc_resolver_pipeline_en.html) | Drug | NDC |
| 24 | [ncit_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/25/ncit_resolver_pipeline_en.html) | Clinical Findings | NCIt |
| 25 | [rxcui_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/02/01/rxcui_resolver_pipeline_en.html) | Clinical Findings | RxCUI |
| 26 | [hcpcs_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/30/hcpcs_resolver_pipeline_en.html) | Procedure | HCPCS |
| 27 | [hgnc_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/30/hgnc_resolver_pipeline_en.html) | Gene | HGNC |
| 28 | [icd10cm_generalised_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/01/30/icd10cm_generalised_resolver_pipeline_en.html) | Clinical Findings | ICD-10-CM |
| 29 | [abbreviation_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/abbreviation_pipeline_en.html) | ABBR | Definitions and Categories |
| 30 | [icd10cm_multi_mapper_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/icd10cm_multi_mapper_pipeline_en.html) | ICD-10-CM | Billable Mappings, HCC Codes<br>Cause Mappings, Claim Mappings<br>SNOMED Codes, UMLS Codes<br> ICD-9 Codes |
| 31 | [rxnorm_multi_mapper_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/rxnorm_multi_mapper_pipeline_en.html) | RxNorm | Drug Brand Names, RxNorm Extension Brand Names<br> Action Mappings, Treatment Mappings<br> UMLS Codes, NDC Product Codes<br> NDC Package Codes |
| 32 | [rxnorm_resolver_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/rxnorm_resolver_pipeline_en.html) | Drug | RxNorm |
| 33 | [snomed_multi_mapper_pipeline](https://nlp.johnsnowlabs.com/2023/08/16/snomed_multi_mapper_pipeline_en.html) | SNOMED Codes | ICD-10, ICD-O, UMLS |
| 34 | [icd10cm_rxnorm_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/03/07/icd10cm_rxnorm_resolver_pipeline_en.html) | Drug | ICD-10, RxNorm |
| 35 | [snomed_term_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/03/22/snomed_term_resolver_pipeline_en.html) | SNOMED Codes | SNOMED terms |
| 36 | [snomed_auxConcepts_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/03/12/snomed_auxConcepts_resolver_pipeline_en.html) | Morph Abnormality, Clinical Drug<br>Clinical Drug Form, Procedure<br> Substance, Physical Object<br> Body Structure | SNOMED Codes |
| 37 | [snomed_conditions_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/03/11/snomed_conditions_resolver_pipeline_en.html) | Clinical Findings | SNOMED Codes |
| 38 | [snomed_drug_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/03/11/snomed_drug_resolver_pipeline_en.html) | Drug | SNOMED Codes |
| 39 | [snomed_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/03/11/snomed_resolver_pipeline_en.html) | Clinical Findings, Morph Abnormality<br>Clinical Drug, Clinical Drug Form<br> Procedure, Substance<br>Physical Object, Body Structure | SNOMED Codes |
| 40 | [icd10gm_resolver_pipeline](https://nlp.johnsnowlabs.com/2023/07/01/icd10gm_resolver_pipeline_de.html) | Clinical Findings | ICD-10-GM |
| 41 | [meddra_llt_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/03/26/meddra_llt_resolver_pipeline_en.html) | Clinical Findings | MedDRA LLT |
| 42 | [meddra_pt_resolver_pipeline](https://nlp.johnsnowlabs.com/2024/03/26/meddra_pt_resolver_pipeline_en.html) | Clinical Findings | MedDRA PT |


**icd10cm_resolver_pipeline**

This pretrained pipeline extracts clinical conditions and map them to their corresponding ICD-10-CM codes. You’ll just feed your text and it will detect the related entities and return their corresponding ICD-10-CM codes.

In [79]:
resolver_pipeline = PretrainedPipeline("icd10cm_resolver_pipeline", "en", "clinical/models")

text = """A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years and anisakiasis. Also, it was reported that fetal and neonatal hemorrhage"""

result = resolver_pipeline.fullAnnotate(text)[0]

icd10cm_resolver_pipeline download started this may take some time.
Approx size to download 2.4 GB
[OK!]


In [82]:
result.keys()

dict_keys(['document', 'icd10cm', 'word_embeddings', 'jsl_ner_chunk', 'sentence_embeddings', 'jsl_ner', 'resolver_code', 'icd10cm_mapper', 'clinical_ner', 'token', 'doc_chunk', 'clinical_ner_chunk', 'chunks_fail', 'sentence', 'icd10cm_ner_chunk'])

In [83]:
import pandas as pd
chunks=[]
entities=[]
icd10cm_code=[]


for n,m in zip(result['icd10cm_ner_chunk'], result['icd10cm']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    icd10cm_code.append(m.result)


df = pd.DataFrame({'chunks':chunks,
                   'entities':entities,
                   'icd10cm_code':icd10cm_code})

df

Unnamed: 0,chunks,entities,icd10cm_code
0,gestational diabetes mellitus,PROBLEM,O24.4
1,anisakiasis,PROBLEM,B81.0
2,fetal and neonatal hemorrhage,PROBLEM,P54.5


## Oncology Pipelines

**Oncology Pretrained Pipeline List:**

| index | model | index | model |
|------:|:------|------:|:------|
| 1 | [oncology_biomarker_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_biomarker_pipeline_en.html) | 2 | [oncology_general_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_general_pipeline_en.html) |
| 3 | [oncology_therapy_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_therapy_pipeline_en.html) | 4 | [oncology_diagnosis_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_diagnosis_pipeline_en.html) |
| 5 | [explain_clinical_doc_oncology](https://nlp.johnsnowlabs.com/2024/01/18/explain_clinical_doc_oncology_en.html) |   |   |

In [None]:
oncology_pipeline = PretrainedPipeline("oncology_biomarker_pipeline", "en", "clinical/models")

In [114]:
oncology_pipeline.model.stages

[DocumentAssembler_eb57bf5cf30e,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_680255179420,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_8c59079bd37d,
 NER_CONVERTER_d79f163a7a32,
 MedicalNerModel_9fb8ec89af4a,
 NER_CONVERTER_fa7b541b0591,
 MedicalNerModel_74c49312f388,
 NER_CONVERTER_3a4f3eafbd08,
 MedicalNerModel_299a97740594,
 NER_CONVERTER_41fb99c5a464,
 ENTITY_EXTRACTOR_97d5ccc4aacb,
 MERGE_50eacd2b5f25,
 MERGE_1c1f6694a68e,
 ASSERTION_DL_d9d32f5f411d,
 ChunkFilterer_c0760fd4f0e6,
 ASSERTION_DL_163867728788,
 AssertionMerger_2841f135a8d0,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 RelationExtractionModel_68ebe11369b6,
 RelationExtractionModel_513eb6317779,
 AnnotationMerger_8e6af0218865]

In [115]:
text = """Immunohistochemistry was negative for thyroid transcription factor-1 and napsin A. The test was positive for ER and PR, and negative for HER2."""

result = oncology_pipeline.fullAnnotate(text)[0]

result.keys()

dict_keys(['ner_oncology_biomarker_chunk', 'cancer_dx', 'assertion_oncology', 'ner_biomarker_chunk', 'ner_oncology', 'document', 'merged_chunk', 'ner_biomarker', 're_oncology_granular', 'ner_oncology_biomarker', 'ner_oncology_test_chunk', 're_oncology_biomarker_result', 'all_relations', 'ner_oncology_test', 'ner_oncology_chunk', 'token', 'assertion_chunk_test', 'embeddings', 'pos_tags', 'assertion_chunk_oncology', 'assertion_merger', 'dependencies', 'assertion_oncology_test_binary', 'sentence'])

**NER Results**

In [116]:
chunks=[]
entities=[]
begins=[]
ends=[]
confidence=[]
for n in result['merged_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])
    confidence.append(n.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities, 'confidence':confidence})

df

Unnamed: 0,chunks,begin,end,entities,confidence
0,Immunohistochemistry,0,19,Pathology_Test,0.9986
1,negative,25,32,Biomarker_Result,0.9933
2,thyroid transcription factor-1,38,67,Biomarker,0.924675
3,napsin A,73,80,Biomarker,0.9865
4,positive,96,103,Biomarker_Result,0.9952
5,ER,109,110,Biomarker,0.9985
6,PR,116,117,Biomarker,0.9941
7,negative,124,131,Biomarker_Result,0.9985
8,HER2,137,140,Oncogene,0.9996


**Assertion Status Results**

In [124]:
chunks=[]
entities=[]
status=[]
confidence=[]

for n, m in zip(result['merged_chunk'], result['assertion_merger']):
    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)
    confidence.append(m.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status, 'confidence':confidence})

df

Unnamed: 0,chunks,entities,assertion,confidence
0,Immunohistochemistry,Pathology_Test,Past,1.0
1,negative,Biomarker_Result,Past,0.9908
2,thyroid transcription factor-1,Biomarker,Present,0.999
3,napsin A,Biomarker,Present,0.9999
4,positive,Biomarker_Result,Present,0.9999
5,ER,Biomarker,Present,0.9999
6,PR,Biomarker,Present,0.9999
7,negative,Biomarker_Result,Present,0.9998
8,HER2,Oncogene,Present,0.9999


**Relation Extraction Results**

In [104]:
result = oncology_pipeline.fullAnnotate(text)

rel_df = get_relations_df(result, 're_oncology_granular')

rel_df[rel_df.relation!= "O"]

Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
1,is_finding_of,Biomarker_Result,25,32,negative,Biomarker,38,67,thyroid transcription factor-1,0.923983
2,is_finding_of,Biomarker_Result,25,32,negative,Biomarker,73,80,napsin A,0.905294
3,is_finding_of,Biomarker_Result,96,103,positive,Biomarker,109,110,ER,0.923827
4,is_finding_of,Biomarker_Result,96,103,positive,Biomarker,116,117,PR,0.871721
5,is_finding_of,Biomarker_Result,96,103,positive,Oncogene,137,140,HER2,0.672547
8,is_finding_of,Biomarker_Result,124,131,negative,Oncogene,137,140,HER2,0.922905


In [106]:
result[0]['re_oncology_granular']

[Annotation(category, 0, 32, O, {'chunk2': 'negative', 'confidence': '0.7320461', 'entity2_end': '32', 'chunk1': 'Immunohistochemistry', 'entity1': 'Pathology_Test', 'entity2_begin': '25', 'chunk2_confidence': '0.9933', 'entity1_begin': '0', 'sentence': '0', 'direction': 'both', 'entity1_end': '19', 'entity2': 'Biomarker_Result', 'chunk1_confidence': '0.9986'}, []),
 Annotation(category, 25, 67, is_finding_of, {'chunk2': 'thyroid transcription factor-1', 'confidence': '0.9239829', 'entity2_end': '67', 'chunk1': 'negative', 'entity1': 'Biomarker_Result', 'entity2_begin': '38', 'chunk2_confidence': '0.924675', 'entity1_begin': '25', 'sentence': '0', 'direction': 'both', 'entity1_end': '32', 'entity2': 'Biomarker', 'chunk1_confidence': '0.9933'}, []),
 Annotation(category, 25, 80, is_finding_of, {'chunk2': 'napsin A', 'confidence': '0.905294', 'entity2_end': '80', 'chunk1': 'negative', 'entity1': 'Biomarker_Result', 'entity2_begin': '73', 'chunk2_confidence': '0.9865', 'entity1_begin': '2