![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/27.Oncology_Model.ipynb)

# **ONCOLOGY MODELS**

This notebook describes the pretrained models available for extracting oncology-related information from clinical texts, with examples for each type of model.

## Setup

In [None]:
import json
import os

from google.colab import files

license_keys = files.upload()

with open(list(license_keys.keys())[0]) as f:
    license_keys = json.load(f)

locals().update(license_keys)

os.environ.update(license_keys)
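Outside Colab, `files.upload()` is not available. The following is a minimal sketch of the same step for a local environment, assuming the license JSON sits at a path you supply. The helper name `load_license_keys` and the `DEMO_*` keys are hypothetical and exist only for illustration; they are not part of Spark NLP.

```python
import json
import os
import tempfile


def load_license_keys(path):
    """Read a license JSON file and export its entries as environment variables."""
    with open(path) as f:
        keys = json.load(f)
    # os.environ only accepts strings, so coerce every value before exporting.
    os.environ.update({name: str(value) for name, value in keys.items()})
    return keys


# Demo with a throwaway file; the key names and values are placeholders.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"DEMO_SECRET": "dummy-secret", "DEMO_VERSION": "0.0.0"}, tmp)
    demo_path = tmp.name

demo_keys = load_license_keys(demo_path)
os.remove(demo_path)
print(sorted(demo_keys))
```

With a real license file, the returned dictionary plays the role of `license_keys` in the cells that follow.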

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [3]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp_jsl.pretrained import InternalResourceDownloader

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

spark

Spark NLP Version : 4.4.4
Spark NLP_JSL Version : 4.4.4


## **List of Pretrained Models**

In [None]:
df = pd.DataFrame()
for model_type in ['MedicalNerModel', 'BertForTokenClassification', 'RelationExtractionModel', 'RelationExtractionDLModel', 'AssertionDLModel']:
    # Collect the unique oncology model names available for this annotator type
    model_list = sorted({model[0] for model in InternalResourceDownloader.returnPrivateModels(model_type) if 'oncology' in model[0]})
    if len(model_list) > 0:
        # For NER, hide work-in-progress ("wip") models
        if model_type == "MedicalNerModel":
            model_list = [name for name in model_list if "wip" not in name]
        # Add one column per annotator type; shorter columns are padded with NaN
        df = pd.concat([df, pd.DataFrame(model_list, columns=[model_type])], axis=1)

# Show empty strings instead of NaN in the padded cells
df.fillna('')

| | MedicalNerModel | RelationExtractionModel | RelationExtractionDLModel | AssertionDLModel |
|---|---|---|---|---|
| 0 | ner_oncology | re_oncology_biomarker_result_wip | redl_oncology_biobert_wip | assertion_oncology_demographic_binary_wip |
| 1 | ner_oncology_anatomy_general | re_oncology_granular_wip | redl_oncology_biomarker_result_biobert_wip | assertion_oncology_family_history_wip |
| 2 | ner_oncology_anatomy_general_healthcare | re_oncology_location_wip | redl_oncology_granular_biobert_wip | assertion_oncology_problem_wip |
| 3 | ner_oncology_anatomy_granular | re_oncology_size_wip | redl_oncology_location_biobert_wip | assertion_oncology_response_to_treatment_wip |
| 4 | ner_oncology_biomarker | re_oncology_temporal_wip | redl_oncology_size_biobert_wip | assertion_oncology_smoking_status_wip |
| 5 | ner_oncology_biomarker_healthcare | re_oncology_test_result_wip | redl_oncology_temporal_biobert_wip | assertion_oncology_test_binary_wip |
| 6 | ner_oncology_demographics | re_oncology_wip | redl_oncology_test_result_biobert_wip | assertion_oncology_treatment_binary_wip |
| 7 | ner_oncology_diagnosis | | | assertion_oncology_wip |
| 8 | ner_oncology_emb_clinical_large | | | |
| 9 | ner_oncology_emb_clinical_medium | | | |
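The listing cell pads columns of unequal length, which is why the shorter columns end in empty cells. Below is a minimal sketch of that padding with plain pandas and no Spark NLP dependency. The `catalog` dictionary is a made-up stand-in for what `InternalResourceDownloader.returnPrivateModels` would return, and `ner_oncology_wip` is a hypothetical name used only to exercise the filter.

```python
import pandas as pd

# Made-up stand-in for the downloader output: annotator type -> model names.
catalog = {
    "MedicalNerModel": ["ner_oncology_tnm", "ner_oncology", "ner_oncology_wip"],
    "BertForTokenClassification": [],
    "AssertionDLModel": ["assertion_oncology_wip"],
}

columns = []
for model_type, names in catalog.items():
    names = sorted(set(names))
    if not names:
        continue  # annotator types without models get no column
    if model_type == "MedicalNerModel":
        # Hide work-in-progress NER models, as the listing cell does
        names = [n for n in names if "wip" not in n]
    columns.append(pd.Series(names, name=model_type, dtype="object"))

# Series of different lengths are aligned on the row index; gaps become NaN,
# which fillna("") then renders as empty strings.
table = pd.concat(columns, axis=1).fillna("")
print(table)
```

An annotator type with no matching models, such as `BertForTokenClassification` in this sketch, contributes no column at all, which would explain why a requested type can be absent from the resulting table.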


**Medical NER Models and Labels**

<br>


**labels**                 | **description**                                                                                                                                                                                                                        | **ner_oncology** | **ner_oncology_anatomy_general** | **ner_oncology_anatomy_general_healthcare** | **ner_oncology_anatomy_granular** | **ner_oncology_biomarker** | **ner_oncology_biomarker_healthcare** | **ner_oncology_demographics** | **ner_oncology_diagnosis** | **ner_oncology_emb_clinical_large** | **ner_oncology_emb_clinical_medium** | **ner_oncology_limited_80p_for_benchmarks** | **ner_oncology_posology** | **ner_oncology_response_to_treatment** | **ner_oncology_test** | **ner_oncology_therapy** | **ner_oncology_tnm** | **ner_oncology_unspecific_posology** | **ner_oncology_unspecific_posology_healthcare**
:-------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------:|:--------------------------------:|:-------------------------------------------:|:---------------------------------:|:--------------------------:|:-------------------------------------:|:-----------------------------:|:--------------------------:|:-----------------------------------:|:------------------------------------:|:-------------------------------------------:|:-------------------------:|:--------------------------------------:|:---------------------:|:------------------------:|:--------------------:|:------------------------------------:|:-----------------------------------------------:
 **Adenopathy**            | Mentions of pathological findings of the lymph nodes.                                                                                                                                                                                  | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Age**                   | All mentions of ages, past or present, related to the patient or to anybody else.                                                                                                                                                      | X                |                                  |                                             |                                   |                            |                                       | X                             |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Biomarker**             | Biological molecules that indicate the presence or absence of cancer, or the type of cancer. Oncogenes are excluded from this category.                                                                                                | X                |                                  |                                             |                                   | X                          | X                                     |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Biomarker_Result**      | Terms or values that are identified as the result of a biomarker.                                                                                                                                                                      | X                |                                  |                                             |                                   | X                          | X                                     |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Cancer_Dx**             | Mentions of cancer diagnoses (such as “breast cancer”) or pathological types that are usually used as synonyms for “cancer” (e.g. “carcinoma”). <br> When anatomical references are present, they are included in the Cancer_Dx extraction. | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Cancer_Score**          | Clinical or imaging scores that are specific for cancer settings (e.g. “BI-RADS” or “Allred score”).                                                                                                                                   | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Cancer_Surgery**        | Terms that indicate surgery as a form of cancer treatment.                                                                                                                                                                             | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Chemotherapy**          | Mentions of chemotherapy drugs, or unspecific words such as “chemotherapy”.                                                                                                                                                            | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Cycle_Count**           | The total number of cycles being administered of an oncological therapy (e.g. “5 cycles”).                                                                                                                                             | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Cycle_Day**             | References to the day of the cycle of oncological therapy (e.g. “day 5”).                                                                                                                                                              | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Cycle_Number**          | The number of the cycle of an oncological therapy that is being applied (e.g. “third cycle”).                                                                                                                                          | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Date**                  | Mentions of exact dates, in any format, including day number, month and/or year.                                                                                                                                                       | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Death_Entity**          | Words that indicate the death of the patient or someone else (including family members), such as “died” or “passed away”.                                                                                                              | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Direction**             | Directional and laterality terms, such as “left”, “right”, “bilateral”, “upper” and “lower”.                                                                                                                                           | X                | X                                | X                                           | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Dosage**                | The quantity prescribed by the physician for an active ingredient.                                                                                                                                                                     | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Duration**              | Words indicating the duration of a treatment (e.g. “for 2 weeks”).                                                                                                                                                                     | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Frequency**             | Words indicating the frequency of treatment administration (e.g. “daily” or “bid”).                                                                                                                                                    | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Gender**                | Gender-specific nouns and pronouns (including words such as “him” or “she”, and family members such as “father”).                                                                                                                      | X                |                                  |                                             |                                   |                            |                                       | X                             |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Grade**                 | All pathological grading of tumors (e.g. “grade 1”) or degrees of cellular differentiation (e.g. “well-differentiated”).                                                                                                               | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Histological_Type**     | Histological variants or cancer subtypes, such as “papillary”, “clear cell” or “medullary”.                                                                                                                                            | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Hormonal_Therapy**      | Mentions of hormonal drugs used to treat cancer, or unspecific words such as “hormonal therapy”.                                                                                                                                       | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Imaging_Test**          | Imaging tests mentioned in texts, such as “chest CT scan”.                                                                                                                                                                             | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Immunotherapy**         | Mentions of immunotherapy drugs, or unspecific words such as “immunotherapy”.                                                                                                                                                          | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Invasion**              | Mentions that refer to tumor invasion, such as “invasion” or “involvement”. Metastases or lymph node involvement are excluded from this category.                                                                                      | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Line_Of_Therapy**       | Explicit references to the line of therapy of an oncological therapy (e.g. “first-line treatment”).                                                                                                                                    | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           | X                                      |                       | X                        |                      |                                      |                                                 
 **Metastasis**            | Terms that indicate a metastatic disease. Anatomical references are not included in these extractions.                                                                                                                                 | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Oncogene**              | Mentions of genes that are implicated in the etiology of cancer.                                                                                                                                                                       | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Pathology_Result**      | The findings of a biopsy from the pathology report that are not covered by another entity (e.g. “malignant ductal cells”).                                                                                                             | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Pathology_Test**        | Mentions of biopsies or tests that use tissue samples.                                                                                                                                                                                 | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Performance_Status**    | Mentions of performance status scores, such as ECOG and Karnofsky. The name of the score is extracted together with the result (e.g. “ECOG performance status of 4”).                                                                  | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Race_Ethnicity**        | The race and ethnicity categories include racial and national origin or sociocultural groups.                                                                                                                                          | X                |                                  |                                             |                                   |                            |                                       | X                             |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Radiotherapy**          | Terms that indicate the use of Radiotherapy.                                                                                                                                                                                           | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Response_To_Treatment** | Terms related to clinical progress of the patient related to cancer treatment, including “recurrence”, “bad response” or “improvement”.                                                                                                | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           | X                                      |                       | X                        |                      |                                      |                                                 
 **Relative_Date**         | Temporal references that are relative to the date of the text or to any other specific date (e.g. “yesterday” or “three years later”).                                                                                                 | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Route**                 | Words indicating the type of administration route (such as “PO” or “transdermal”).                                                                                                                                                     | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Site_Bone**             | Anatomical terms that refer to the human skeleton.                                                                                                                                                                                     | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Brain**            | Anatomical terms that refer to the central nervous system (including the brain stem and the cerebellum).                                                                                                                               | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Breast**           | Anatomical terms that refer to the breasts.                                                                                                                                                                                            | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Liver**            | Anatomical terms that refer to the liver.                                                                                                                                                                                              | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Lung**             | Anatomical terms that refer to the lungs.                                                                                                                                                                                              | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Lymph_Node**       | Anatomical terms that refer to lymph nodes, excluding adenopathies.                                                                                                                                                                    | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Other_Body_Part**  | Relevant anatomical terms that are not included in the rest of the anatomical entities.                                                                                                                                                | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Smoking_Status**        | All mentions of smoking related to the patient or to someone else.                                                                                                                                                                     | X                |                                  |                                             |                                   |                            |                                       | X                             |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Staging**               | Mentions of cancer stage such as “stage 2b” or “T2N1M0”. It also includes words such as “in situ”, “early-stage” or “advanced”.                                                                                                        | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Targeted_Therapy**      | Mentions of targeted therapy drugs, or unspecific words such as “targeted therapy”.                                                                                                                                                    | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Tumor_Finding**         | All nonspecific terms that may be related to tumors, either malignant or benign (for example: “mass”, “tumor”, “lesion”, or “neoplasm”).                                                                                               | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Tumor_Size**            | Size of the tumor, including numerical value and unit of measurement (e.g. “3 cm”).                                                                                                                                                    | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Unspecific_Therapy**    | Terms that indicate a known cancer therapy but that is not specific to any other therapy entity (e.g. “chemoradiotherapy” or “adjuvant therapy”).                                                                                      | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Radiation_Dose**        | Dose used in radiotherapy.                                                                                                                                                                                                             | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Anatomical_Site**       | Relevant anatomical terms mentioned in text.                                                                                                                                                                                           |                  | X                                | X                                           |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          |                      |                                      |                                                 
 **Cancer_Therapy**        | Mentions of cancer treatments, including chemotherapy, radiotherapy, surgery and other.                                                                                                                                                |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             | X                         |                                        |                       |                          |                      | X                                    | X                                               
 **Size_Trend**            | Terms related to the changes in the size of the tumor (such as “growth” or “reduced in size”).                                                                                                                                         |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           | X                                      |                       |                          |                      |                                      |                                                 
 **Lymph_Node**            | Mentions of lymph nodes and pathological findings of the lymph nodes.                                                                                                                                                                  |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Tumor_Description**     | Information related to tumor characteristics, such as size, presence of invasion, grade and histological type.                                                                                                                         |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Tumor**                 | All nonspecific terms that may be related to tumors, either malignant or benign (for example: “mass”, “tumor”, “lesion”, or “neoplasm”).                                                                                               |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Lymph_Node_Modifier**   | Words that refer to a lymph node being abnormal (such as “enlargement”).                                                                                                                                                               |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Posology_Information**  | Terms related to the posology of the treatment, including duration, frequencies and dosage.                                                                                                                                            |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          |                      | X                                    | X                                               



**Assertion Models and labels**

<br>

| **labels**                 | **assertion_oncology_wip** | **assertion_oncology_demographic_binary_wip** | **assertion_oncology_family_history_wip** | **assertion_oncology_problem_wip** | **assertion_oncology_response_to_treatment_wip** | **assertion_oncology_smoking_status_wip** | **assertion_oncology_test_binary_wip** | **assertion_oncology_treatment_binary_wip** |
|:--------------------------|:--------------------------:|:---------------------------------------------:|:-----------------------------------------:|:----------------------------------:|:------------------------------------------------:|:-----------------------------------------:|:--------------------------------------:|:-------------------------------------------:|
| **Present**                | X                          |                                               |                                           |                                    |                                                  | X                                         |                                        |                                             |
| **Past**                   | X                          |                                               |                                           |                                    |                                                  | X                                         |                                        |                                             |
| **Present_Or_Past**        |                            |                                               |                                           |                                    | X                                                |                                           |                                        | X                                           |
| **Absent**                 | X                          |                                               |                                           |                                    |                                                  | X                                         |                                        |                                             |
| **Someone_Else**           |                            | X                                             |                                           |                                    |                                                  |                                           |                                        |                                             |
| **Family**                 | X                          |                                               |                                           |                                    |                                                  |                                           |                                        |                                             |
| **Family_History**         |                            |                                               | X                                         | X                                  |                                                  |                                           | X                                      |                                             |
| **Hypothetical**           | X                          |                                               |                                           |                                    |                                                  |                                           |                                        |                                             |
| **Hypothetical_Or_Absent** |                            |                                               |                                           | X                                  | X                                                |                                           | X                                      | X                                           |
| **Possible**               | X                          |                                               |                                           | X                                  |                                                  |                                           | X                                      |                                             |
| **Patient**                |                            | X                                             |                                           |                                    |                                                  |                                           |                                        |                                             |
| **Medical_History**        |                            |                                               |                                           | X                                  |                                                  |                                           | X                                      |                                             |
| **Other**                  |                            |                                               | X                                         |                                    |                                                  |                                           |                                        |                                             |
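
The table above can also be kept as a small lookup structure, which is handy when deciding which assertion model to add to a pipeline for a given status. The sketch below is transcribed by hand from the table. The names `ASSERTION_LABELS` and `models_emitting` are illustrative and not part of Spark NLP.

```python
# Labels per assertion model, transcribed from the table above (illustrative structure).
ASSERTION_LABELS = {
    "assertion_oncology_wip": [
        "Present", "Past", "Absent", "Family", "Hypothetical", "Possible"],
    "assertion_oncology_demographic_binary_wip": ["Someone_Else", "Patient"],
    "assertion_oncology_family_history_wip": ["Family_History", "Other"],
    "assertion_oncology_problem_wip": [
        "Family_History", "Hypothetical_Or_Absent", "Possible", "Medical_History"],
    "assertion_oncology_response_to_treatment_wip": [
        "Present_Or_Past", "Hypothetical_Or_Absent"],
    "assertion_oncology_smoking_status_wip": ["Present", "Past", "Absent"],
    "assertion_oncology_test_binary_wip": [
        "Family_History", "Hypothetical_Or_Absent", "Possible", "Medical_History"],
    "assertion_oncology_treatment_binary_wip": [
        "Present_Or_Past", "Hypothetical_Or_Absent"],
}


def models_emitting(label):
    """Return the names of the models whose label set contains `label`, sorted."""
    return sorted(name for name, labels in ASSERTION_LABELS.items() if label in labels)


print(models_emitting("Present"))
print(models_emitting("Patient"))
```

Calling `models_emitting("Hypothetical_Or_Absent")` in the same way lists the four models that merge the hypothetical and absent statuses into one label.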


## NER Models

The NER models from the list cover different entity groups at different levels of granularity. To extract as much information as possible from oncology texts, ner_oncology is the best option, as it is the most general and most granular model. Other models may suit narrower needs better: for instance, ner_oncology_tnm is the most suitable model for extracting information related to staging.

In [None]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")\
    .setSplitChars(["-", "\/"])

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")\
    .setInputCols(["sentence","token"])\
    .setOutputCol("embeddings")

# ner_oncology

ner_oncology = MedicalNerModel.pretrained("ner_oncology","en","clinical/models")\
    .setInputCols(["sentence","token","embeddings"])\
    .setOutputCol("ner_oncology")

ner_oncology_converter = NerConverterInternal()\
    .setInputCols(["sentence","token","ner_oncology"])\
    .setOutputCol("ner_oncology_chunk")

# ner_oncology_tnm

ner_oncology_tnm = MedicalNerModel.pretrained("ner_oncology_tnm","en","clinical/models")\
    .setInputCols(["sentence","token","embeddings"])\
    .setOutputCol("ner_oncology_tnm")

ner_oncology_tnm_converter = NerConverterInternal()\
    .setInputCols(["sentence","token","ner_oncology_tnm"])\
    .setOutputCol("ner_oncology_tnm_chunk")

# ner_oncology_biomarker

ner_oncology_biomarker = MedicalNerModel.pretrained("ner_oncology_biomarker","en","clinical/models")\
    .setInputCols(["sentence","token","embeddings"])\
    .setOutputCol("ner_oncology_biomarker")

ner_oncology_biomarker_converter = NerConverterInternal()\
    .setInputCols(["sentence","token","ner_oncology_biomarker"])\
    .setOutputCol("ner_oncology_biomarker_chunk")

ner_stages = [document_assembler,
    sentence_detector,
    tokenizer,
    word_embeddings,
    ner_oncology,
    ner_oncology_converter,
    ner_oncology_tnm,
    ner_oncology_tnm_converter,
    ner_oncology_biomarker,
    ner_oncology_biomarker_converter]

ner_pipeline = Pipeline(stages=ner_stages)

empty_data = spark.createDataFrame([[""]]).toDF("text")

ner_model = ner_pipeline.fit(empty_data)

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_oncology download started this may take some time.
[OK!]
ner_oncology_tnm download started this may take some time.
[OK!]
ner_oncology_biomarker download started this may take some time.
[OK!]


In [None]:
ner_oncology_labels = sorted(list(set([label.split('-')[-1] for label in ner_oncology.getClasses() if label != 'O'])))

len(ner_oncology_labels)

49
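
The comprehension above appears to rely on `getClasses()` returning IOB-style tags (for example `B-Cancer_Dx` and `I-Cancer_Dx`) plus the `O` tag, so that keeping the text after the last hyphen leaves only the entity name. A minimal sketch of that step follows. It uses a hypothetical class list, not the real output of the model, and the helper name `entity_labels` is illustrative.

```python
# Hypothetical IOB tag list; the real one comes from ner_oncology.getClasses().
example_classes = ["O", "B-Cancer_Dx", "I-Cancer_Dx", "B-Tumor_Size", "I-Tumor_Size", "B-Age"]


def entity_labels(classes):
    """Drop the 'O' tag and the B-/I- prefixes, returning sorted unique entity names."""
    return sorted({tag.split("-")[-1] for tag in classes if tag != "O"})


print(entity_labels(example_classes))  # ['Age', 'Cancer_Dx', 'Tumor_Size']
```

Because the split keeps only the text after the last hyphen, an entity name that itself contained a hyphen would be truncated. The labels listed in this notebook use underscores, so this is not a problem here.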

In [None]:
label_df = pd.DataFrame()
for column in range((len(ner_oncology_labels)//10)+1):
  label_df = pd.concat([label_df, pd.DataFrame(ner_oncology_labels, columns = [''])[column*10:(column+1)*10].reset_index(drop= True)], axis = 1)

label_df.fillna('')

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5
0,Adenopathy,Cycle_Number,Hormonal_Therapy,Race_Ethnicity,Site_Lung
1,Age,Date,Imaging_Test,Radiation_Dose,Site_Lymph_Node
2,Biomarker,Death_Entity,Immunotherapy,Radiotherapy,Site_Other_Body_Part
3,Biomarker_Result,Direction,Invasion,Relative_Date,Smoking_Status
4,Cancer_Dx,Dosage,Line_Of_Therapy,Response_To_Treatment,Staging
5,Cancer_Score,Duration,Metastasis,Route,Targeted_Therapy
6,Cancer_Surgery,Frequency,Oncogene,Site_Bone,Tumor_Finding
7,Chemotherapy,Gender,Pathology_Result,Site_Brain,Tumor_Size
8,Cycle_Count,Grade,Pathology_Test,Site_Breast,Unspecific_Therapy
9,Cycle_Day,Histological_Type,Performance_Status,Site_Liver,
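
The cell above arranges the labels in columns of ten by slicing a DataFrame. As a rough sketch, the same layout can be produced with the standard library only. `to_columns` and the demo labels are hypothetical names used for illustration, not part of Spark NLP.

```python
from itertools import zip_longest

def to_columns(labels, rows_per_column=10):
    """Split a flat label list into display rows, filling column by column."""
    columns = [labels[i:i + rows_per_column]
               for i in range(0, len(labels), rows_per_column)]
    # zip_longest pads the last (shorter) column with empty strings.
    return [list(row) for row in zip_longest(*columns, fillvalue="")]

demo_labels = [f"Label_{i:02d}" for i in range(23)]  # hypothetical labels
grid = to_columns(demo_labels)
for row in grid:
    print(row)
```

Each printed row corresponds to one line of the table, with empty strings where the last column runs out of labels.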


In [None]:
ner_oncology_tnm_labels = sorted(list(set([label.split('-')[-1] for label in ner_oncology_tnm.getClasses() if label != 'O'])))

print(ner_oncology_tnm_labels)

['Cancer_Dx', 'Lymph_Node', 'Lymph_Node_Modifier', 'Metastasis', 'Staging', 'Tumor', 'Tumor_Description']


In [None]:
ner_oncology_biomarker_labels = sorted(list(set([label.split('-')[-1] for label in ner_oncology_biomarker.getClasses() if label != 'O'])))

print(ner_oncology_biomarker_labels)

['Biomarker', 'Biomarker_Result']
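
The three cells above derive entity names by removing the `B-`/`I-` prefix from each class with `label.split('-')[-1]`. That works for the labels shown here, but it would truncate a label that itself contains a hyphen. A minimal sketch of a stricter variant follows. `entity_labels` and the demo class list are assumptions for illustration.

```python
def entity_labels(classes):
    """Return sorted entity names from IOB tags such as 'B-Cancer_Dx'."""
    names = set()
    for tag in classes:
        if tag == "O":
            continue
        # Split only on the first hyphen so a label that itself contains
        # a hyphen is not truncated.
        prefix, _, name = tag.partition("-")
        names.add(name if prefix in {"B", "I"} else tag)
    return sorted(names)

# Hypothetical class list in the shape returned by getClasses()
demo_classes = ["O", "B-Biomarker", "I-Biomarker",
                "B-Biomarker_Result", "I-Biomarker_Result"]
print(entity_labels(demo_classes))
```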


In [None]:
sample_text_1 = '''A 65-year-old woman had a history of debulking surgery, bilateral oophorectomy with omentectomy, total anterior hysterectomy with radical pelvic lymph nodes dissection due to ovarian carcinoma (mucinous-type carcinoma, stage Ic) 1 year ago. Patient's medical compliance was poor and failed to complete her chemotherapy (cyclophosphamide 750 mg/m2, carboplatin 300 mg/m2). Recently, she noted a palpable right breast mass, 15 cm in size which nearly occupied the whole right breast in 2 months. Core needle biopsy revealed metaplastic carcinoma. Neoadjuvant chemotherapy with the regimens of Taxotere (75 mg/m2), Epirubicin (75 mg/m2), and Cyclophosphamide (500 mg/m2) was given for 6 cycles with poor response, followed by a modified radical mastectomy (MRM) with dissection of axillary lymph nodes and skin grafting. Postoperatively, radiotherapy was done with 5000 cGy in 25 fractions. The histopathologic examination revealed a metaplastic carcinoma with squamous differentiation associated with adenomyoepithelioma. Immunohistochemistry study showed that the tumor cells are positive for epithelial markers-cytokeratin (AE1/AE3) stain, and myoepithelial markers, including cytokeratin 5/6 (CK 5/6), p63, and S100 stains. Expressions of hormone receptors, including ER, PR, and Her-2/Neu, were all negative. The dissected axillary lymph nodes showed metastastic carcinoma with negative hormone receptors in 3 nodes. The patient was staged as pT3N1aM0, with histologic tumor grade III.'''

sample_text_2 = '''She underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.'''

sample_text_3 = '''In the bone- marrow (BM) aspiration, blasts accounted for 88.1% of ANCs, which were positive for CD9, CD10, CD13, CD19, CD20, CD34, CD38, CD58, CD66c, CD123, HLA-DR, cCD79a, and TdT on flow cytometry.

Measurements of serum tumor markers showed elevated level of cytokeratin 19 fragment (Cyfra21-1: 4.77 ng/mL), neuron-specific enolase (NSE: 19.60 ng/mL), and squamous cell carcinoma antigen (SCCA: 2.58 ng/mL). The results were negative for serum carbohydrate antigen 125 (CA125), carcinoembryonic antigen (CEA) and vascular endothelial growth factor (VEGF). Immunohistochemical staining showed positive staining for CK5/6, P40 and PD-L1 (+ 80% tumor cells), and negative staining for TTF-1, PD-1 and weakly positive staining for ALK. Molecular analysis indicated no EGFR mutation or ROS1 fusion.'''

In [None]:
data = spark.createDataFrame(pd.DataFrame([sample_text_1, sample_text_2, sample_text_3], columns = ['text']))

In [None]:
results = ner_model.transform(data).collect()

In [None]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()

In [None]:
from google.colab import widgets

t = widgets.TabBar(["ner_oncology_biomarker", "ner_oncology_tnm", "ner_oncology"])

with t.output_to(0):
    visualiser.display(results[2], label_col='ner_oncology_biomarker_chunk')

with t.output_to(1):
    visualiser.display(results[1], label_col='ner_oncology_tnm_chunk')

with t.output_to(2):
    visualiser.display(results[0], label_col='ner_oncology_chunk')


## Relation Extraction Models

Relation extraction (RE) models link entities that are related to each other. For oncology entities, you can use a general model (such as `re_oncology_granular_wip`) or a model for a specific relation type (e.g. `re_oncology_size_wip` to link tumors with their sizes, or `re_oncology_biomarker_result_wip` to link biomarkers with their results).
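
The relation pairs passed to `setRelationPairs` in the next cell are listed in both directions (`A-B` and `B-A`). One way to keep such lists consistent is to write each entity name once and generate both orderings. This is only a sketch: `both_directions` is a hypothetical helper, not a Spark NLP function.

```python
def both_directions(pairs):
    """Expand (entity_a, entity_b) tuples into 'A-B' and 'B-A' strings."""
    expanded = []
    for first, second in pairs:
        expanded.append(f"{first}-{second}")
        expanded.append(f"{second}-{first}")
    return expanded

size_pairs = both_directions([("Tumor_Finding", "Tumor_Size")])
print(size_pairs)
```

The resulting list has the same shape as the argument given to `setRelationPairs` for `re_oncology_size_wip`.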

In [None]:
pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("pos_tags")

dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en") \
    .setInputCols(["sentence", "pos_tags", "token"]) \
    .setOutputCol("dependencies")

re_oncology_granular_wip = RelationExtractionModel.pretrained("re_oncology_granular_wip", "en", "clinical/models") \
    .setInputCols(["embeddings", "pos_tags", "ner_oncology_chunk", "dependencies"]) \
    .setOutputCol("re_oncology_granular_wip") \
    .setRelationPairs(['Date-Cancer_Dx', 'Cancer_Dx-Date', 'Tumor_Finding-Site_Breast', 'Site_Breast-Tumor_Finding',
                       'Relative_Date-Tumor_Finding', 'Tumor_Finding-Relative_Date', 'Tumor_Finding-Tumor_Size', 'Tumor_Size-Tumor_Finding',
                       'Pathology_Test-Cancer_Dx', 'Cancer_Dx-Pathology_Test']) \
    .setMaxSyntacticDistance(10)

re_oncology_size_wip = RelationExtractionModel.pretrained("re_oncology_size_wip", "en", "clinical/models") \
    .setInputCols(["embeddings", "pos_tags", "ner_oncology_chunk", "dependencies"]) \
    .setOutputCol("re_oncology_size_wip") \
    .setRelationPairs(['Tumor_Finding-Tumor_Size', 'Tumor_Size-Tumor_Finding']) \
    .setMaxSyntacticDistance(10)

re_oncology_biomarker_result_wip = RelationExtractionModel.pretrained("re_oncology_biomarker_result_wip", "en", "clinical/models") \
    .setInputCols(["embeddings", "pos_tags", "ner_oncology_biomarker_chunk", "dependencies"]) \
    .setOutputCol("re_oncology_biomarker_result_wip") \
    .setRelationPairs(['Biomarker-Biomarker_Result', 'Biomarker_Result-Biomarker']) \
    .setMaxSyntacticDistance(10)

re_stages = ner_stages + [pos_tagger, dependency_parser, re_oncology_granular_wip, re_oncology_size_wip, re_oncology_biomarker_result_wip]

re_pipeline = Pipeline(stages=re_stages)

re_model = re_pipeline.fit(empty_data)

pos_clinical download started this may take some time.
Approximate size to download 1.5 MB
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]
re_oncology_granular_wip download started this may take some time.
[OK!]
re_oncology_size_wip download started this may take some time.
[OK!]
re_oncology_biomarker_result_wip download started this may take some time.
[OK!]


In [None]:
sample_text_4 = '''Two years ago, she noted a palpable right breast mass, 15 cm in size. Core needle biopsy revealed metaplastic carcinoma.'''

sample_text_5 = '''The patient presented a 2 cm mass in her left breast, and the tumor in her other breast was 3 cm long.'''

sample_text_6 = '''Immunohistochemical staining showed positive staining for CK5/6, P40 and PD-L1, and negative staining for TTF-1, PD-1 and weakly positive staining for ALK. Immunohistochemistry study showed that the tumor cells are positive for epithelial markers-cytokeratin and myoepithelial markers, including cytokeratin 5/6, p63, and S100 stains.'''

In [None]:
re_data = spark.createDataFrame(pd.DataFrame([sample_text_4, sample_text_5, sample_text_6], columns = ['text']))

In [None]:
re_results = re_model.transform(re_data).collect()

In [None]:
from sparknlp_display import RelationExtractionVisualizer

re_visualiser = RelationExtractionVisualizer()

In [None]:
re_t = widgets.TabBar(["re_oncology_biomarker_result_wip", "re_oncology_size_wip", "re_oncology_granular_wip"])

with re_t.output_to(0):
    re_visualiser.display(re_results[2], relation_col='re_oncology_biomarker_result_wip')

with re_t.output_to(1):
    re_visualiser.display(re_results[1], relation_col='re_oncology_size_wip')

with re_t.output_to(2):
    re_visualiser.display(re_results[0], relation_col='re_oncology_granular_wip')


## Assertion Status Models

Assertion status models identify whether an entity in the text is mentioned as present, absent, hypothetical, possible, etc. You can use the general `assertion_oncology_wip` model, or models recommended for specific entity groups (such as `assertion_oncology_problem_wip`, which is intended for problem entities like Cancer_Dx or Metastasis).
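
Because some assertion models target a specific entity group, it can help to restrict the chunks to that group before reading the results. The sketch below shows the idea on plain tuples. `keep_entities` and the demo chunks are hypothetical, and the set of problem entities contains only the two examples named above.

```python
PROBLEM_ENTITIES = {"Cancer_Dx", "Metastasis"}  # entities named in the text above

def keep_entities(chunks, allowed):
    """Keep only (text, label) chunks whose label is in `allowed`."""
    return [(text, label) for text, label in chunks if label in allowed]

# Hypothetical NER output as (chunk text, entity label) tuples
demo_chunks = [("breast cancer", "Cancer_Dx"),
               ("metastases", "Metastasis"),
               ("lungs", "Site_Lung")]
problem_chunks = keep_entities(demo_chunks, PROBLEM_ENTITIES)
print(problem_chunks)
```

Inside a pipeline, the same restriction is usually expressed on the chunk converter (its `setWhiteList` option). That usage is an assumption here, since it is not shown in this notebook.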

In [None]:
assertion_oncology_wip = AssertionDLModel.pretrained("assertion_oncology_wip", "en", "clinical/models") \
    .setInputCols(["sentence", 'ner_oncology_chunk', "embeddings"]) \
    .setOutputCol("assertion_oncology_wip")

assertion_oncology_problem_wip = AssertionDLModel.pretrained("assertion_oncology_problem_wip", "en", "clinical/models") \
    .setInputCols(["sentence", 'ner_oncology_tnm_chunk', "embeddings"]) \
    .setOutputCol("assertion_oncology_problem_wip")

assertion_oncology_treatment_binary_wip = AssertionDLModel.pretrained("assertion_oncology_treatment_binary_wip", "en", "clinical/models") \
    .setInputCols(["sentence", 'ner_oncology_chunk', "embeddings"]) \
    .setOutputCol("assertion_oncology_treatment_binary_wip")

assertion_stages = ner_stages + [assertion_oncology_wip, assertion_oncology_problem_wip, assertion_oncology_treatment_binary_wip]

assertion_pipeline = Pipeline(stages=assertion_stages)

assertion_model = assertion_pipeline.fit(empty_data)

assertion_oncology_wip download started this may take some time.
[OK!]
assertion_oncology_problem_wip download started this may take some time.
[OK!]
assertion_oncology_treatment_binary_wip download started this may take some time.
[OK!]


In [None]:
sample_text_7 = 'The patient is suspected to have colorectal cancer. Family history is positive for other cancers. The result of the biopsy was positive. A CT scan was ordered to rule out metastases.'

sample_text_8 = 'The patient was diagnosed with breast cancer. She was suspected to have metastases in her lungs. Her family history is positive for ovarian cancer.'

sample_text_9 = 'The patient underwent a mastectomy. We recommend to start radiotherapy. The patient refused to chemotherapy.'

In [None]:
assertion_data = spark.createDataFrame(pd.DataFrame([sample_text_7, sample_text_8, sample_text_9], columns = ['text']))

In [None]:
assertion_results = assertion_model.transform(assertion_data).collect()

In [None]:
from sparknlp_display import AssertionVisualizer

assertion_visualiser = AssertionVisualizer()

In [None]:
assertion_t = widgets.TabBar(["assertion_oncology_treatment_binary_wip", "assertion_oncology_problem_wip", "assertion_oncology_wip"])

with assertion_t.output_to(0):
    assertion_visualiser.display(assertion_results[2], label_col ='ner_oncology_chunk', assertion_col='assertion_oncology_treatment_binary_wip')

with assertion_t.output_to(1):
    assertion_visualiser.display(assertion_results[1], label_col ='ner_oncology_tnm_chunk', assertion_col='assertion_oncology_problem_wip')

with assertion_t.output_to(2):
    assertion_visualiser.display(assertion_results[0], label_col ='ner_oncology_chunk', assertion_col='assertion_oncology_wip')


## Pretrained NER Profiling Pipelines

We can use pretrained NER profiling pipelines for exploring all the available pretrained NER models at once.

- `ner_profiling_oncology` : Returns results for oncology NER models.

For more examples, please check [this notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.2.Pretrained_NER_Profiling_Pipelines.ipynb).





<center><b>NER Profiling Oncology Model List</b>

| | | | |
|--------------|-----------------|-----------------|-----------------|
| ner_oncology_unspecific_posology | ner_oncology_tnm | ner_oncology_therapy | ner_oncology_test |
| ner_oncology_response_to_treatment | ner_oncology_posology | ner_oncology | ner_oncology_limited_80p_for_benchmarks |
| ner_oncology_diagnosis | ner_oncology_demographics | ner_oncology_biomarker | ner_oncology_anatomy_granular |
| ner_oncology_anatomy_general | | | |



</center>

In [4]:
from sparknlp.pretrained import PretrainedPipeline

oncology_profiling_pipeline = PretrainedPipeline("ner_profiling_oncology", "en", "clinical/models")

ner_profiling_oncology download started this may take some time.
Approx size to download 2 GB
[OK!]


In [5]:
text = """The had previously undergone a left mastectomy and an axillary lymph node dissection for a left breast cancer twenty years ago.
The tumor was positive for ER and PR. Postoperatively, radiotherapy was administered to the residual breast.
The cancer recurred as a right lung metastasis 13 years later. He underwent a regimen consisting of adriamycin (60 mg/m2) and cyclophosphamide (600 mg/m2) over six courses, as first line therapy."""

In [6]:
oncology_result = oncology_profiling_pipeline.fullAnnotate(text)[0]
oncology_result.keys()

dict_keys(['ner_chunk_oncology_limited_80p_for_benchmarks', 'ner_oncology', 'ner_chunk_oncology_anatomy_general', 'document', 'ner_chunk_oncology_test', 'ner_chunk_oncology_tnm', 'ner_chunk_oncology', 'ner_chunk_oncology_therapy', 'ner_oncology_anatomy_general', 'ner_oncology_biomarker', 'ner_chunk_oncology_biomarker', 'ner_oncology_test', 'ner_oncology_response_to_treatment', 'token', 'ner_oncology_anatomy_granular', 'ner_oncology_therapy', 'ner_chunk_oncology_demographics', 'ner_chunk_oncology_response_to_treatment', 'ner_oncology_demographics', 'ner_oncology_diagnosis', 'ner_chunk_oncology_posology', 'ner_oncology_limited_80p_for_benchmarks', 'embeddings', 'ner_oncology_tnm', 'ner_chunk_oncology_diagnosis', 'ner_oncology_unspecific_posology', 'sentence', 'ner_chunk_oncology_unspecific_posology', 'ner_oncology_posology', 'ner_chunk_oncology_anatomy_granular'])
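
The keys above mix two kinds of output per model: token-level labels (`ner_oncology...`) and merged chunks (`ner_chunk_oncology...`), alongside shared columns such as `document` and `token`. A small sketch of separating them by prefix is shown below. `split_profiling_keys` is a hypothetical helper, and the demo list is a subset of the printed keys.

```python
def split_profiling_keys(keys):
    """Separate token-level label columns from chunk columns by key prefix."""
    chunk_keys = sorted(k for k in keys if k.startswith("ner_chunk_"))
    token_keys = sorted(k for k in keys
                        if k.startswith("ner_") and not k.startswith("ner_chunk_"))
    return token_keys, chunk_keys

# A subset of the keys printed above
demo_keys = ["document", "token", "sentence", "embeddings",
             "ner_oncology", "ner_chunk_oncology",
             "ner_oncology_tnm", "ner_chunk_oncology_tnm"]
token_keys, chunk_keys = split_profiling_keys(demo_keys)
print(token_keys)
print(chunk_keys)
```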

In [9]:
def get_token_results(light_result):

    tokens = [j.result for j in light_result["token"]]
    sentences = [j.metadata["sentence"] for j in light_result["token"]]
    begins = [j.begin for j in light_result["token"]]
    ends = [j.end for j in light_result["token"]]
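    # Skip the chunk-level keys (named 'ner_chunk_...'); only token-level outputs align row by row with the tokens.
    model_list = [a for a in light_result.keys() if (a not in ["sentence", "token"] and "_chunk" not in a)]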

    df = pd.DataFrame({'sentence':sentences, 'begin': begins, 'end': ends, 'token':tokens})

    for model_name in model_list:

        temp_df = pd.DataFrame(light_result[model_name])
        temp_df["jsl_label"] = temp_df.iloc[:,0].apply(lambda x : x.result)
        temp_df = temp_df[["jsl_label"]]

        # temp_df = get_ner_result(model_name)
        temp_df.columns = [model_name]
        df = pd.concat([df, temp_df], axis=1)

    # Filter columns to include only sentence, begin, end, token and all columns that start with 'ner_oncology'
    filtered_df = df.loc[:, ['sentence', 'begin', 'end', 'token'] + [col for col in df.columns if col.startswith('ner_oncology')]]

    return filtered_df

In [10]:
get_token_results(oncology_result)

Unnamed: 0,sentence,begin,end,token,ner_oncology,ner_oncology_anatomy_general,ner_oncology_biomarker,ner_oncology_test,ner_oncology_response_to_treatment,ner_oncology_anatomy_granular,ner_oncology_therapy,ner_oncology_demographics,ner_oncology_diagnosis,ner_oncology_limited_80p_for_benchmarks,ner_oncology_tnm,ner_oncology_unspecific_posology,ner_oncology_posology
0,0,0,2,The,O,O,O,O,O,O,O,O,O,O,O,O,O
1,0,4,6,had,O,O,O,O,O,O,O,O,O,O,O,O,O
2,0,8,17,previously,O,O,O,O,O,O,O,O,O,O,O,O,O
3,0,19,27,undergone,O,O,O,O,O,O,O,O,O,O,O,O,O
4,0,29,29,a,O,O,O,O,O,O,O,O,O,O,O,O,O
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
74,3,410,411,as,O,O,O,O,O,O,O,O,O,O,O,O,O
75,3,413,417,first,B-Line_Of_Therapy,O,O,O,B-Line_Of_Therapy,O,B-Line_Of_Therapy,O,O,B-Line_Of_Therapy,O,O,O
76,3,419,422,line,I-Line_Of_Therapy,O,O,O,I-Line_Of_Therapy,O,I-Line_Of_Therapy,O,O,I-Line_Of_Therapy,O,O,O
77,3,424,430,therapy,O,O,O,O,O,O,O,O,O,O,O,O,O


## General Oncology Pretrained Pipelines

**`oncology pretrained` Pipeline List**

These pipelines include Named-Entity Recognition, Assertion Status, Relation Extraction and Entity Resolution models to extract information from oncology texts.


|index|model|index|model|
|-----:|:-----|-----:|:-----|
| 1| [oncology_biomarker_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_biomarker_pipeline_en.html)  | 2| [oncology_general_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_general_pipeline_en.html)  |
| 3| [oncology_therapy_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_therapy_pipeline_en.html)  | 4| [oncology_diagnosis_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_diagnosis_pipeline_en.html)  |

In [None]:
oncology_pipeline = PretrainedPipeline("oncology_biomarker_pipeline", "en", "clinical/models")

oncology_biomarker_pipeline download started this may take some time.
Approx size to download 1.6 GB
[OK!]


In [None]:
oncology_pipeline.model.stages

[DocumentAssembler_dab6eeac879e,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_1f483a1f8252,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_ecf280ca65e5,
 NerConverter_929f666beebc,
 MedicalNerModel_aeadb24f76a3,
 NerConverter_4f9b8da8c4c3,
 MedicalNerModel_eb9da4b9039b,
 NerConverter_40028785be7b,
 MedicalNerModel_299a97740594,
 NerConverter_a8bf552d0249,
 MERGE_acda18b976a6,
 MERGE_12af4df6fa27,
 ASSERTION_DL_8d77f383c928,
 ASSERTION_DL_163867728788,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 RelationExtractionModel_d0af74510daa,
 RelationExtractionModel_68ebe11369b6,
 RelationExtractionModel_513eb6317779]
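
To see at a glance how many stages of each kind the pipeline contains, the stage identifiers can be grouped by type. The sketch below works on plain strings copied from the output above and assumes each identifier ends with `_<uid>`.

```python
from collections import Counter

# Stage identifiers copied from the output above (a subset)
stage_ids = [
    "DocumentAssembler_dab6eeac879e",
    "MedicalNerModel_ecf280ca65e5",
    "NerConverter_929f666beebc",
    "MedicalNerModel_aeadb24f76a3",
    "NerConverter_4f9b8da8c4c3",
    "ASSERTION_DL_8d77f383c928",
    "RelationExtractionModel_d0af74510daa",
]

# Drop the trailing "_<uid>" so stages of the same type share a key.
stage_types = Counter(stage_id.rsplit("_", 1)[0] for stage_id in stage_ids)
print(stage_types.most_common())
```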

In [None]:
text = """Immunohistochemistry was negative for thyroid transcription factor-1 and napsin A. The test was positive for ER and PR, and negative for HER2."""

result = oncology_pipeline.fullAnnotate(text)[0]

result.keys()

dict_keys(['re_oncology_granular_wip', 'assertion_oncology_test_binary_wip', 're_oncology_wip', 'ner_oncology_biomarker_wip_chunk', 'ner_biomarker_chunk', 'ner_oncology_test_wip', 'document', 're_oncology_biomarker_result_wip', 'merged_chunk', 'ner_oncology_biomarker_wip', 'ner_biomarker', 'ner_oncology_test_wip_chunk', 'token', 'embeddings', 'pos_tags', 'assertion_oncology_wip', 'assertion_chunk', 'ner_oncology_wip', 'dependencies', 'ner_oncology_wip_chunk', 'sentence'])

**NER Results**

In [None]:
chunks=[]
entities=[]
begins=[]
ends=[]
confidence=[]
for n in result['merged_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])
    confidence.append(n.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities, 'confidence':confidence})

df

Unnamed: 0,chunks,begin,end,entities,confidence
0,Immunohistochemistry,0,19,Pathology_Test,0.9967
1,negative,25,32,Biomarker_Result,0.8323
2,thyroid transcription factor-1,38,67,Biomarker,0.296675
3,napsin A,73,80,Biomarker,0.64309996
4,positive,96,103,Biomarker_Result,0.8017
5,ER,109,110,Biomarker,0.948
6,PR,116,117,Biomarker,0.8711
7,negative,124,131,Biomarker_Result,0.8385
8,HER2,137,140,Oncogene,0.9359
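
Annotation metadata stores values as strings, so the confidence has to be converted before it can be compared numerically. The sketch below filters out low-confidence chunks using values copied from the table above. `confident_chunks` is a hypothetical helper, and the 0.5 threshold is an arbitrary assumption, not a recommended setting.

```python
# (chunk, entity, confidence) values copied from the table above;
# confidence is kept as a string, as it appears in annotation metadata.
rows = [
    ("Immunohistochemistry", "Pathology_Test", "0.9967"),
    ("thyroid transcription factor-1", "Biomarker", "0.296675"),
    ("napsin A", "Biomarker", "0.64309996"),
    ("ER", "Biomarker", "0.948"),
    ("HER2", "Oncogene", "0.9359"),
]

def confident_chunks(rows, threshold=0.5):
    """Return rows whose confidence, parsed as float, is at least `threshold`."""
    return [(chunk, entity, float(conf))
            for chunk, entity, conf in rows
            if float(conf) >= threshold]

kept = confident_chunks(rows)
print([chunk for chunk, _, _ in kept])
```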


**Assertion Status Results**

In [None]:
chunks=[]
entities=[]
status=[]
confidence=[]

for n,m in zip(result['merged_chunk'],result['assertion_oncology_test_binary_wip']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)
    confidence.append(m.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'entities':entities, 'assertion':status, 'confidence':confidence})

df

Unnamed: 0,chunks,entities,assertion,confidence
0,Immunohistochemistry,Pathology_Test,Medical_History,0.9926
1,negative,Biomarker_Result,Medical_History,0.9951
2,thyroid transcription factor-1,Biomarker,Medical_History,0.9951
3,napsin A,Biomarker,Medical_History,0.9926
4,positive,Biomarker_Result,Medical_History,0.9931
5,ER,Biomarker,Medical_History,0.9938
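
Note that `zip` pairs items by position and stops at the shorter list. The NER table has nine chunks while this table has six rows, so pairing by position can drop rows or attach a status to the wrong chunk. A safer pattern is to match on character offsets. The sketch below assumes that each assertion annotation carries the `begin` and `end` of the chunk it describes. It uses a stand-in `Ann` tuple and hypothetical data rather than real pipeline output.

```python
from collections import namedtuple

# Minimal stand-in for an annotation: only the fields used here.
Ann = namedtuple("Ann", ["begin", "end", "result"])

def align_by_offsets(chunks, assertions):
    """Pair each assertion with the chunk that has the same (begin, end)."""
    by_span = {(c.begin, c.end): c for c in chunks}
    pairs = []
    for a in assertions:
        chunk = by_span.get((a.begin, a.end))
        if chunk is not None:
            pairs.append((chunk.result, a.result))
    return pairs

# Hypothetical data: three chunks, but only two of them have an assertion.
demo_chunks = [Ann(0, 19, "Immunohistochemistry"),
               Ann(25, 32, "negative"),
               Ann(109, 110, "ER")]
demo_assertions = [Ann(0, 19, "Medical_History"),
                   Ann(109, 110, "Medical_History")]
print(align_by_offsets(demo_chunks, demo_assertions))
```

With positional `zip`, the second assertion in this demo would have been attached to "negative" instead of "ER".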


**Relation Extraction Results**

In [None]:
import pandas as pd

def get_relations_df(results, col='relations'):
    # Collect one tuple per relation annotation from the first document in `results`.
    rel_pairs = []
    for rel in results[0][col]:
        rel_pairs.append((
            rel.result,
            rel.metadata['entity1'],
            rel.metadata['entity1_begin'],
            rel.metadata['entity1_end'],
            rel.metadata['chunk1'],
            rel.metadata['entity2'],
            rel.metadata['entity2_begin'],
            rel.metadata['entity2_end'],
            rel.metadata['chunk2'],
            rel.metadata['confidence']
        ))

    rel_df = pd.DataFrame(rel_pairs, columns=['relation','entity1','entity1_begin','entity1_end','chunk1','entity2','entity2_begin','entity2_end','chunk2', 'confidence'])

    # Metadata values are strings, so cast the confidence to float.
    rel_df.confidence = rel_df.confidence.astype(float)

    return rel_df

In [None]:
result = oncology_pipeline.fullAnnotate(text)

rel_df = get_relations_df(result, 're_oncology_wip')

rel_df[rel_df.relation!= "O"]

Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
1,is_related_to,Biomarker_Result,25,32,negative,Biomarker,38,67,thyroid transcription factor-1,0.997901
2,is_related_to,Biomarker_Result,25,32,negative,Biomarker,73,80,napsin A,0.999566
3,is_related_to,Biomarker_Result,96,103,positive,Biomarker,109,110,ER,0.98782
4,is_related_to,Biomarker_Result,96,103,positive,Biomarker,116,117,PR,0.897783
8,is_related_to,Biomarker_Result,124,131,negative,Oncogene,137,140,HER2,0.986855
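
Once the unrelated (`O`) pairs are removed, the relations can be summarized as a mapping from each result to the biomarkers it applies to. The sketch below uses triples copied from the outputs in this section. `biomarkers_by_result` is a hypothetical helper. It groups by the result text, so separate mentions of "negative" are merged.

```python
from collections import defaultdict

# (relation, chunk1, chunk2) triples copied from the outputs in this section
relations = [
    ("O", "Immunohistochemistry", "negative"),
    ("is_related_to", "negative", "thyroid transcription factor-1"),
    ("is_related_to", "negative", "napsin A"),
    ("is_related_to", "positive", "ER"),
    ("is_related_to", "positive", "PR"),
    ("is_related_to", "negative", "HER2"),
]

def biomarkers_by_result(relations):
    """Group the second chunk of each related pair under the first chunk's text."""
    grouped = defaultdict(list)
    for relation, result_text, biomarker in relations:
        if relation != "O":
            grouped[result_text].append(biomarker)
    return dict(grouped)

summary = biomarkers_by_result(relations)
print(summary)
```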


In [None]:
result[0]['re_oncology_wip']

[Annotation(category, 0, 32, O, {'chunk2': 'negative', 'confidence': '0.97084755', 'entity2_end': '32', 'chunk1': 'Immunohistochemistry', 'entity2_begin': '25', 'entity1': 'Pathology_Test', 'chunk2_confidence': '0.8323', 'entity1_begin': '0', 'direction': 'both', 'entity1_end': '19', 'chunk1_confidence': '0.9967', 'entity2': 'Biomarker_Result'}, []),
 Annotation(category, 25, 67, is_related_to, {'chunk2': 'thyroid transcription factor-1', 'confidence': '0.99790084', 'entity2_end': '67', 'chunk1': 'negative', 'entity2_begin': '38', 'entity1': 'Biomarker_Result', 'chunk2_confidence': '0.296675', 'entity1_begin': '25', 'direction': 'both', 'entity1_end': '32', 'chunk1_confidence': '0.8323', 'entity2': 'Biomarker'}, []),
 Annotation(category, 25, 80, is_related_to, {'chunk2': 'napsin A', 'confidence': '0.9995658', 'entity2_end': '80', 'chunk1': 'negative', 'entity2_begin': '73', 'entity1': 'Biomarker_Result', 'chunk2_confidence': '0.64309996', 'entity1_begin': '25', 'direction': 'both', 'e