![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/27.Oncology_Model.ipynb)

# **ONCOLOGY MODELS**

This notebook describes the pretrained models available for extracting oncology-related information from clinical texts, with examples for each type of model.

## Healthcare NLP for Data Scientists Course

If you are not familiar with the components in this notebook, see the [Healthcare NLP for Data Scientists Udemy Course](https://www.udemy.com/course/healthcare-nlp-for-data-scientists/) and the [MOOC Notebooks](https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/Spark_NLP_Udemy_MOOC/Healthcare_NLP), which cover each component.

## Setup

In [None]:
import json
import os

from google.colab import files

license_keys = files.upload()

with open(list(license_keys.keys())[0]) as f:
    license_keys = json.load(f)

locals().update(license_keys)

os.environ.update(license_keys)
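
The installation and session-start cells in this section read at least three entries from the uploaded license file: `PUBLIC_VERSION`, `JSL_VERSION`, and `SECRET`. As a minimal sketch, a check like the one below can flag an incomplete license file before the `pip install` commands fail. The helper name `missing_license_keys` and the sample values are hypothetical and not part of the Spark NLP API, and a real license file may contain further entries that this sketch does not check.

```python
# Hypothetical helper, not part of Spark NLP: shown only to illustrate the check.
REQUIRED_KEYS = ("PUBLIC_VERSION", "JSL_VERSION", "SECRET")

def missing_license_keys(keys, required=REQUIRED_KEYS):
    """Return the required entries that are absent or empty in the license dict."""
    return [name for name in required if not keys.get(name)]

# Placeholder values for illustration only; a real license file holds actual values.
sample_keys = {"PUBLIC_VERSION": "x.y.z", "JSL_VERSION": "x.y.z", "SECRET": ""}
print(missing_license_keys(sample_keys))
# → ['SECRET']
```

In the notebook, the same function could be called on `license_keys` right after `json.load(f)`.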

In [None]:
# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.4.1 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Installing Spark NLP Display Library for visualization
! pip install -q spark-nlp-display

In [3]:
import sparknlp
import sparknlp_jsl

from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp_jsl.pretrained import InternalResourceDownloader

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline,PipelineModel

import pandas as pd
pd.set_option('display.max_colwidth', 200)

import warnings
warnings.filterwarnings('ignore')

params = {"spark.driver.memory":"16G",
          "spark.kryoserializer.buffer.max":"2000M",
          "spark.driver.maxResultSize":"2000M"}

print("Spark NLP Version :", sparknlp.version())
print("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark = sparknlp_jsl.start(license_keys['SECRET'],params=params)

spark

Spark NLP Version : 6.1.3
Spark NLP_JSL Version : 6.1.1


## **List of Pretrained Models**

In [4]:
import pandas as pd

# Build one column of model names per annotator type
df = pd.DataFrame()
for model_type in ['MedicalNerModel', 'RelationExtractionModel', 'RelationExtractionDLModel',
                   'AssertionDLModel', 'MedicalBertForSequenceClassification']:

    # Keep models whose name mentions oncology, biomarkers, or treatment response
    model_list = [model[0] for model in InternalResourceDownloader.returnPrivateModels(model_type)
                  if ("oncology" in model[0]) or ("biomarker" in model[0]) or ("response" in model[0])]
    # Skip work-in-progress ("wip") models
    model_list = [model_name for model_name in model_list if "wip" not in model_name]
    # Remove duplicates and sort alphabetically
    model_list = sorted(set(model_list))
    df = pd.concat([df, pd.DataFrame(model_list, columns = [model_type])], axis = 1)

# Show empty cells instead of NaN where a column has fewer models
df.fillna('')

|   | MedicalNerModel | RelationExtractionModel | RelationExtractionDLModel | AssertionDLModel | MedicalBertForSequenceClassification |
|:--|:----------------|:------------------------|:--------------------------|:-----------------|:-------------------------------------|
| 0 | ner_biomarker | re_oncology | redl_oncology_biobert | assertion_oncology | bert_sequence_classifier_biomarker |
| 1 | ner_biomarker_langtest | re_oncology_biomarker_result | redl_oncology_biomarker_result_biobert | assertion_oncology_demographic_binary | bert_sequence_classifier_biomarker_onnx |
| 2 | ner_oncology | re_oncology_granular | redl_oncology_granular_biobert | assertion_oncology_family_history | bert_sequence_classifier_response_to_treatment |
| 3 | ner_oncology_anatomy_general | re_oncology_location | redl_oncology_location_biobert | assertion_oncology_problem | bert_sequence_classifier_response_to_treatment_onnx |
| 4 | ner_oncology_anatomy_general_healthcare | re_oncology_size | redl_oncology_size_biobert | assertion_oncology_response_to_treatment | |
| 5 | ner_oncology_anatomy_general_langtest | re_oncology_temporal | redl_oncology_temporal_biobert | assertion_oncology_smoking_status | |
| 6 | ner_oncology_anatomy_granular | re_oncology_test_result | redl_oncology_test_result_biobert | assertion_oncology_test_binary | |
| 7 | ner_oncology_anatomy_granular_langtest | | | assertion_oncology_treatment_binary | |
| 8 | ner_oncology_biomarker | | | | |
| 9 | ner_oncology_biomarker_docwise | | | | |
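
The name filter used in the listing cell can be tried without a Spark session or license. The sketch below applies the same three steps (keyword match, dropping `wip` names, removing duplicates and sorting) to a plain Python list. The function name `select_oncology_models` and the sample list are made up for illustration; they are not part of the library.

```python
KEYWORDS = ("oncology", "biomarker", "response")

def select_oncology_models(names, keywords=KEYWORDS):
    """Mirror the notebook's filter on a plain list of model names."""
    # Step 1: keep names containing at least one keyword
    matched = [n for n in names if any(k in n for k in keywords)]
    # Step 2: drop work-in-progress models
    stable = [n for n in matched if "wip" not in n]
    # Step 3: remove duplicates and sort alphabetically
    return sorted(set(stable))

# Illustrative input only; the real list comes from InternalResourceDownloader.
sample_names = [
    "ner_oncology",
    "ner_oncology_wip",   # dropped by the "wip" filter
    "ner_biomarker",
    "ner_oncology",       # duplicate, collapsed by set()
    "ner_clinical",       # no keyword match, dropped
]
print(select_oncology_models(sample_names))
# → ['ner_biomarker', 'ner_oncology']
```

Because the match is a plain substring test, any model whose name happens to contain `response` or `biomarker` is included, whether or not it is oncology-specific.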


**Medical NER Models and Labels**

<br>


**labels**                 | **description**                                                                                                                                                                                                                        | **ner_oncology** | **ner_oncology_anatomy_general** | **ner_oncology_anatomy_general_healthcare** | **ner_oncology_anatomy_granular** | **ner_oncology_biomarker** | **ner_oncology_biomarker_healthcare** | **ner_oncology_demographics** | **ner_oncology_diagnosis** | **ner_oncology_emb_clinical_large** | **ner_oncology_emb_clinical_medium** | **ner_oncology_limited_80p_for_benchmarks** | **ner_oncology_posology** | **ner_oncology_response_to_treatment** | **ner_oncology_test** | **ner_oncology_therapy** | **ner_oncology_tnm** | **ner_oncology_unspecific_posology** | **ner_oncology_unspecific_posology_healthcare**
:-------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------:|:--------------------------------:|:-------------------------------------------:|:---------------------------------:|:--------------------------:|:-------------------------------------:|:-----------------------------:|:--------------------------:|:-----------------------------------:|:------------------------------------:|:-------------------------------------------:|:-------------------------:|:--------------------------------------:|:---------------------:|:------------------------:|:--------------------:|:------------------------------------:|:-----------------------------------------------:
 **Adenopathy**            | Mentions of pathological findings of the lymph nodes.                                                                                                                                                                                  | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Age**                   | All mentions of ages, past or present, related to the patient or to anybody else.                                                                                                                                                      | X                |                                  |                                             |                                   |                            |                                       | X                             |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Biomarker**             | Biological molecules that indicate the presence or absence of cancer, or the type of cancer. Oncogenes are excluded from this category.                                                                                                | X                |                                  |                                             |                                   | X                          | X                                     |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Biomarker_Result**      | Terms or values that are identified as the result of a biomarker.                                                                                                                                                                      | X                |                                  |                                             |                                   | X                          | X                                     |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Cancer_Dx**             | Mentions of cancer diagnoses (such as “breast cancer”) or pathological types that are usually used as synonyms for “cancer” (e.g. “carcinoma”). <br> When anatomical references are present, they are included in the Cancer_Dx extraction. | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Cancer_Score**          | Clinical or imaging scores that are specific for cancer settings (e.g. “BI-RADS” or “Allred score”).                                                                                                                                   | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Cancer_Surgery**        | Terms that indicate surgery as a form of cancer treatment.                                                                                                                                                                             | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Chemotherapy**          | Mentions of chemotherapy drugs, or unspecific words such as “chemotherapy”.                                                                                                                                                            | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Cycle_Count**           | The total number of cycles being administered of an oncological therapy (e.g. “5 cycles”).                                                                                                                                             | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Cycle_Day**             | References to the day of the cycle of oncological therapy (e.g. “day 5”).                                                                                                                                                              | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Cycle_Number**          | The number of the cycle of an oncological therapy that is being applied (e.g. “third cycle”).                                                                                                                                          | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Date**                  | Mentions of exact dates, in any format, including day number, month and/or year.                                                                                                                                                       | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Death_Entity**          | Words that indicate the death of the patient or someone else (including family members), such as “died” or “passed away”.                                                                                                              | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Direction**             | Directional and laterality terms, such as “left”, “right”, “bilateral”, “upper” and “lower”.                                                                                                                                           | X                | X                                | X                                           | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Dosage**                | The quantity prescribed by the physician for an active ingredient.                                                                                                                                                                     | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Duration**              | Words indicating the duration of a treatment (e.g. “for 2 weeks”).                                                                                                                                                                     | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Frequency**             | Words indicating the frequency of treatment administration (e.g. “daily” or “bid”).                                                                                                                                                    | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Gender**                | Gender-specific nouns and pronouns (including words such as “him” or “she”, and family members such as “father”).                                                                                                                      | X                |                                  |                                             |                                   |                            |                                       | X                             |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Grade**                 | All pathological grading of tumors (e.g. “grade 1”) or degrees of cellular differentiation (e.g. “well-differentiated”).                                                                                                               | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Histological_Type**     | Histological variants or cancer subtypes, such as “papillary”, “clear cell” or “medullary”.                                                                                                                                            | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Hormonal_Therapy**      | Mentions of hormonal drugs used to treat cancer, or unspecific words such as “hormonal therapy”.                                                                                                                                       | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Imaging_Test**          | Imaging tests mentioned in texts, such as “chest CT scan”.                                                                                                                                                                             | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Immunotherapy**         | Mentions of immunotherapy drugs, or unspecific words such as “immunotherapy”.                                                                                                                                                          | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Invasion**              | Mentions that refer to tumor invasion, such as “invasion” or “involvement”. Metastases or lymph node involvement are excluded from this category.                                                                                      | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Line_Of_Therapy**       | Explicit references to the line of therapy of an oncological therapy (e.g. “first-line treatment”).                                                                                                                                    | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           | X                                      |                       | X                        |                      |                                      |                                                 
 **Metastasis**            | Terms that indicate a metastatic disease. Anatomical references are not included in these extractions.                                                                                                                                 | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Oncogene**              | Mentions of genes that are implicated in the etiology of cancer.                                                                                                                                                                       | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Pathology_Result**      | The findings of a biopsy from the pathology report that is not covered by another entity (e.g. “malignant ductal cells”).                                                                                                              | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Pathology_Test**        | Mentions of biopsies or tests that use tissue samples.                                                                                                                                                                                 | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        | X                     |                          |                      |                                      |                                                 
 **Performance_Status**    | Mentions of performance status scores, such as ECOG and Karnofsky. The name of the score is extracted together with the result (e.g. “ECOG performance status of 4”).                                                                  | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Race_Ethnicity**        | The race and ethnicity categories include racial and national origin or sociocultural groups.                                                                                                                                          | X                |                                  |                                             |                                   |                            |                                       | X                             |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Radiotherapy**          | Terms that indicate the use of Radiotherapy.                                                                                                                                                                                           | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Response_To_Treatment** | Terms related to clinical progress of the patient related to cancer treatment, including “recurrence”, “bad response” or “improvement”.                                                                                                | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           | X                                      |                       | X                        |                      |                                      |                                                 
 **Relative_Date**         | Temporal references that are relative to the date of the text or to any other specific date (e.g. “yesterday” or “three years later”).                                                                                                 | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Route**                 | Words indicating the type of administration route (such as “PO” or “transdermal”).                                                                                                                                                     | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Site_Bone**             | Anatomical terms that refer to the human skeleton.                                                                                                                                                                                     | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Brain**            | Anatomical terms that refer to the central nervous system (including the brain stem and the cerebellum).                                                                                                                               | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Breast**           | Anatomical terms that refer to the breasts.                                                                                                                                                                                            | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Liver**            | Anatomical terms that refer to the liver.                                                                                                                                                                                              | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Lung**             | Anatomical terms that refer to the lungs.                                                                                                                                                                                              | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Lymph_Node**       | Anatomical terms that refer to lymph nodes, excluding adenopathies.                                                                                                                                                                    | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Site_Other_Body_Part**  | Relevant anatomical terms that are not included in the rest of the anatomical entities.                                                                                                                                                | X                |                                  |                                             | X                                 |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Smoking_Status**        | All mentions of smoking related to the patient or to someone else.                                                                                                                                                                     | X                |                                  |                                             |                                   |                            |                                       | X                             |                            | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Staging**               | Mentions of cancer stage such as “stage 2b” or “T2N1M0”. It also includes words such as “in situ”, “early-stage” or “advanced”.                                                                                                        | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Targeted_Therapy**      | Mentions of targeted therapy drugs, or unspecific words such as “targeted therapy”.                                                                                                                                                    | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Tumor_Finding**         | All nonspecific terms that may be related to tumors, either malignant or benign (for example: “mass”, “tumor”, “lesion”, or “neoplasm”).                                                                                               | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Tumor_Size**            | Size of the tumor, including numerical value and unit of measurement (e.g. “3 cm”).                                                                                                                                                    | X                |                                  |                                             |                                   |                            |                                       |                               | X                          | X                                   | X                                    | X                                           |                           |                                        |                       |                          |                      |                                      |                                                 
 **Unspecific_Therapy**    | Terms that indicate a known cancer therapy but that is not specific to any other therapy entity (e.g. “chemoradiotherapy” or “adjuvant therapy”).                                                                                      | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           |                           |                                        |                       | X                        |                      |                                      |                                                 
 **Radiation_Dose**        | Dose used in radiotherapy.                                                                                                                                                                                                             | X                |                                  |                                             |                                   |                            |                                       |                               |                            | X                                   | X                                    | X                                           | X                         |                                        |                       | X                        |                      |                                      |                                                 
 **Anatomical_Site**       | Relevant anatomical terms mentioned in text.                                                                                                                                                                                           |                  | X                                | X                                           |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          |                      |                                      |                                                 
 **Cancer_Therapy**        | Mentions of cancer treatments, including chemotherapy, radiotherapy, surgery and other.                                                                                                                                                |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             | X                         |                                        |                       |                          |                      | X                                    | X                                               
 **Size_Trend**            | Terms related to the changes in the size of the tumor (such as “growth” or “reduced in size”).                                                                                                                                         |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           | X                                      |                       |                          |                      |                                      |                                                 
 **Lymph_Node**            | Mentions of lymph nodes and pathological findings of the lymph nodes.                                                                                                                                                                  |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Tumor_Description**     | Information related to tumor characteristics, such as size, presence of invasion, grade and histological type.                                                                                                                         |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Tumor**                 | All nonspecific terms that may be related to tumors, either malignant or benign (for example: “mass”, “tumor”, “lesion”, or “neoplasm”).                                                                                               |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Lymph_Node_Modifier**   | Words that refer to a lymph node being abnormal (such as “enlargement”).                                                                                                                                                               |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          | X                    |                                      |                                                 
 **Posology_Information**  | Terms related to the posology of the treatment, including duration, frequencies and dosage.                                                                                                                                            |                  |                                  |                                             |                                   |                            |                                       |                               |                            |                                     |                                      |                                             |                           |                                        |                       |                          |                      | X                                    | X                                               



**Assertion Models and labels**

<br>

| **labels**                 | **assertion_oncology_wip** | **assertion_oncology_demographic_binary_wip** | **assertion_oncology_family_history_wip** | **assertion_oncology_problem_wip** | **assertion_oncology_response_to_treatment_wip** | **assertion_oncology_smoking_status_wip** | **assertion_oncology_test_binary_wip** | **assertion_oncology_treatment_binary_wip** |
|:--------------------------|:--------------------------:|:---------------------------------------------:|:-----------------------------------------:|:----------------------------------:|:------------------------------------------------:|:-----------------------------------------:|:--------------------------------------:|:-------------------------------------------:|
| **Present**                | X                          |                                               |                                           |                                    |                                                  | X                                         |                                        |                                             |
| **Past**                   | X                          |                                               |                                           |                                    |                                                  | X                                         |                                        |                                             |
| **Present_Or_Past**        |                            |                                               |                                           |                                    | X                                                |                                           |                                        | X                                           |
| **Absent**                 | X                          |                                               |                                           |                                    |                                                  | X                                         |                                        |                                             |
| **Someone_Else**           |                            | X                                             |                                           |                                    |                                                  |                                           |                                        |                                             |
| **Family**                 | X                          |                                               |                                           |                                    |                                                  |                                           |                                        |                                             |
| **Family_History**         |                            |                                               | X                                         | X                                  |                                                  |                                           | X                                      |                                             |
| **Hypothetical**           | X                          |                                               |                                           |                                    |                                                  |                                           |                                        |                                             |
| **Hypothetical_Or_Absent** |                            |                                               |                                           | X                                  | X                                                |                                           | X                                      | X                                           |
| **Possible**               | X                          |                                               |                                           | X                                  |                                                  |                                           | X                                      |                                             |
| **Patient**                |                            | X                                             |                                           |                                    |                                                  |                                           |                                        |                                             |
| **Medical_History**        |                            |                                               |                                           | X                                  |                                                  |                                           | X                                      |                                             |
| **Other**                  |                            |                                               | X                                         |                                    |                                                  |                                           |                                        |                                             |
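
The table above can also be read as a lookup from model name to label set. The snippet below is a minimal sketch of that idea: the dictionary is transcribed by hand from the table, and `ASSERTION_LABELS` and `models_with_label` are hypothetical names used only for illustration (they are not part of Spark NLP).

```python
# Label sets transcribed from the assertion table above (model name -> labels).
# Hypothetical helper structure for illustration; not a Spark NLP API.
ASSERTION_LABELS = {
    "assertion_oncology_wip": {
        "Present", "Past", "Absent", "Family", "Hypothetical", "Possible"},
    "assertion_oncology_demographic_binary_wip": {"Patient", "Someone_Else"},
    "assertion_oncology_family_history_wip": {"Family_History", "Other"},
    "assertion_oncology_problem_wip": {
        "Family_History", "Hypothetical_Or_Absent", "Possible", "Medical_History"},
    "assertion_oncology_response_to_treatment_wip": {
        "Present_Or_Past", "Hypothetical_Or_Absent"},
    "assertion_oncology_smoking_status_wip": {"Present", "Past", "Absent"},
    "assertion_oncology_test_binary_wip": {
        "Family_History", "Hypothetical_Or_Absent", "Possible", "Medical_History"},
    "assertion_oncology_treatment_binary_wip": {
        "Present_Or_Past", "Hypothetical_Or_Absent"},
}


def models_with_label(label):
    """Return the sorted names of the models whose label set contains `label`."""
    return sorted(name for name, labels in ASSERTION_LABELS.items() if label in labels)


print(models_with_label("Family_History"))
```

A lookup like this can help when deciding which assertion model to attach to a given group of NER chunks.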


## NER Models

The NER models in the list above cover different entity groups at different levels of granularity. To extract as much information as possible from oncology texts, `ner_oncology` is the best option, as it is the most general and most granular model. Other models may suit narrower needs better: for instance, `ner_oncology_tnm` is the most suitable model for extracting staging-related information.

In [5]:
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")\
    .setSplitChars(["-", r"\/"])  # raw string avoids Python's invalid-escape warning; value is unchanged

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")\
    .setInputCols(["sentence","token"])\
    .setOutputCol("embeddings")

# ner_oncology

ner_oncology = MedicalNerModel.pretrained("ner_oncology","en","clinical/models")\
    .setInputCols(["sentence","token","embeddings"])\
    .setOutputCol("ner_oncology")

ner_oncology_converter = NerConverterInternal()\
    .setInputCols(["sentence","token","ner_oncology"])\
    .setOutputCol("ner_oncology_chunk")

# ner_oncology_tnm

ner_oncology_tnm = MedicalNerModel.pretrained("ner_oncology_tnm","en","clinical/models")\
    .setInputCols(["sentence","token","embeddings"])\
    .setOutputCol("ner_oncology_tnm")

ner_oncology_tnm_converter = NerConverterInternal()\
    .setInputCols(["sentence","token","ner_oncology_tnm"])\
    .setOutputCol("ner_oncology_tnm_chunk")

# ner_oncology_biomarker

ner_oncology_biomarker = MedicalNerModel.pretrained("ner_oncology_biomarker","en","clinical/models")\
    .setInputCols(["sentence","token","embeddings"])\
    .setOutputCol("ner_oncology_biomarker")

ner_oncology_biomarker_converter = NerConverterInternal()\
    .setInputCols(["sentence","token","ner_oncology_biomarker"])\
    .setOutputCol("ner_oncology_biomarker_chunk")

ner_stages = [
    document_assembler,
    sentence_detector,
    tokenizer,
    word_embeddings,
    ner_oncology,
    ner_oncology_converter,
    ner_oncology_tnm,
    ner_oncology_tnm_converter,
    ner_oncology_biomarker,
    ner_oncology_biomarker_converter
]

ner_pipeline = Pipeline(stages=ner_stages)

empty_data = spark.createDataFrame([[""]]).toDF("text")

ner_model = ner_pipeline.fit(empty_data)

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
embeddings_clinical download started this may take some time.
Approximate size to download 1.6 GB
[OK!]
ner_oncology download started this may take some time.
Approximate size to download 33 MB
[OK!]
ner_oncology_tnm download started this may take some time.
Approximate size to download 32.6 MB
[OK!]
ner_oncology_biomarker download started this may take some time.
Approximate size to download 32.7 MB
[OK!]


In [6]:
ner_oncology_labels = sorted(list(set([label.split('-')[-1] for label in ner_oncology.getClasses() if label != 'O'])))

len(ner_oncology_labels)

49
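
The label list above is obtained by stripping the IOB prefix (`B-` or `I-`) from each class and dropping the `O` tag. Below is a minimal sketch of that step on a hypothetical class list; the values are illustrative and are not the actual output of `getClasses()`. Unlike the cell above, it splits only on the first hyphen, an assumption made here so that a label containing a hyphen would not be truncated.

```python
# Hypothetical IOB classes, standing in for ner_oncology.getClasses()
classes = ["O", "B-Cancer_Dx", "I-Cancer_Dx", "B-Tumor_Size", "I-Tumor_Size", "B-Age"]


def entity_labels(iob_classes):
    """Return the sorted unique entity labels, without B-/I- prefixes or 'O'."""
    labels = set()
    for tag in iob_classes:
        if tag == "O":
            continue
        # Split only on the first hyphen: "B-Cancer_Dx" -> "Cancer_Dx"
        labels.add(tag.split("-", 1)[1])
    return sorted(labels)


print(entity_labels(classes))  # ['Age', 'Cancer_Dx', 'Tumor_Size']
```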

In [7]:
label_df = pd.DataFrame()
for column in range((len(ner_oncology_labels)//10)+1):
  label_df = pd.concat([label_df, pd.DataFrame(ner_oncology_labels, columns = [''])[column*10:(column+1)*10].reset_index(drop= True)], axis = 1)

label_df.fillna('')

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5
0,Adenopathy,Cycle_Number,Hormonal_Therapy,Race_Ethnicity,Site_Lung
1,Age,Date,Imaging_Test,Radiation_Dose,Site_Lymph_Node
2,Biomarker,Death_Entity,Immunotherapy,Radiotherapy,Site_Other_Body_Part
3,Biomarker_Result,Direction,Invasion,Relative_Date,Smoking_Status
4,Cancer_Dx,Dosage,Line_Of_Therapy,Response_To_Treatment,Staging
5,Cancer_Score,Duration,Metastasis,Route,Targeted_Therapy
6,Cancer_Surgery,Frequency,Oncogene,Site_Bone,Tumor_Finding
7,Chemotherapy,Gender,Pathology_Result,Site_Brain,Tumor_Size
8,Cycle_Count,Grade,Pathology_Test,Site_Breast,Unspecific_Therapy
9,Cycle_Day,Histological_Type,Performance_Status,Site_Liver,


In [8]:
ner_oncology_tnm_labels = sorted(list(set([label.split('-')[-1] for label in ner_oncology_tnm.getClasses() if label != 'O'])))

print(ner_oncology_tnm_labels)

['Cancer_Dx', 'Lymph_Node', 'Lymph_Node_Modifier', 'Metastasis', 'Staging', 'Tumor', 'Tumor_Description']


In [9]:
ner_oncology_biomarker_labels = sorted(list(set([label.split('-')[-1] for label in ner_oncology_biomarker.getClasses() if label != 'O'])))

print(ner_oncology_biomarker_labels)

['Biomarker', 'Biomarker_Result']


In [10]:
sample_text_1 = '''A 65-year-old woman had a history of debulking surgery, bilateral oophorectomy with omentectomy, total anterior hysterectomy with radical pelvic lymph nodes dissection due to ovarian carcinoma (mucinous-type carcinoma, stage Ic) 1 year ago. Patient's medical compliance was poor and failed to complete her chemotherapy (cyclophosphamide 750 mg/m2, carboplatin 300 mg/m2). Recently, she noted a palpable right breast mass, 15 cm in size which nearly occupied the whole right breast in 2 months. Core needle biopsy revealed metaplastic carcinoma. Neoadjuvant chemotherapy with the regimens of Taxotere (75 mg/m2), Epirubicin (75 mg/m2), and Cyclophosphamide (500 mg/m2) was given for 6 cycles with poor response, followed by a modified radical mastectomy (MRM) with dissection of axillary lymph nodes and skin grafting. Postoperatively, radiotherapy was done with 5000 cGy in 25 fractions. The histopathologic examination revealed a metaplastic carcinoma with squamous differentiation associated with adenomyoepithelioma. Immunohistochemistry study showed that the tumor cells are positive for epithelial markers-cytokeratin (AE1/AE3) stain, and myoepithelial markers, including cytokeratin 5/6 (CK 5/6), p63, and S100 stains. Expressions of hormone receptors, including ER, PR, and Her-2/Neu, were all negative. The dissected axillary lymph nodes showed metastastic carcinoma with negative hormone receptors in 3 nodes. The patient was staged as pT3N1aM0, with histologic tumor grade III.'''

sample_text_2 = '''She underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.'''

sample_text_3 = '''In the bone- marrow (BM) aspiration, blasts accounted for 88.1% of ANCs, which were positive for CD9, CD10, CD13, CD19, CD20, CD34, CD38, CD58, CD66c, CD123, HLA-DR, cCD79a, and TdT on flow cytometry.

Measurements of serum tumor markers showed elevated level of cytokeratin 19 fragment (Cyfra21-1: 4.77 ng/mL), neuron-specific enolase (NSE: 19.60 ng/mL), and squamous cell carcinoma antigen (SCCA: 2.58 ng/mL). The results were negative for serum carbohydrate antigen 125 (CA125), carcinoembryonic antigen (CEA) and vascular endothelial growth factor (VEGF). Immunohistochemical staining showed positive staining for CK5/6, P40 and PD-L1 (+ 80% tumor cells), and negative staining for TTF-1, PD-1 and weakly positive staining for ALK. Molecular analysis indicated no EGFR mutation or ROS1 fusion.'''

In [11]:
data = spark.createDataFrame(pd.DataFrame([sample_text_1, sample_text_2, sample_text_3], columns = ['text']))

In [12]:
results = ner_model.transform(data).collect()

In [13]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()

In [14]:
from google.colab import widgets

t = widgets.TabBar(["ner_oncology_biomarker", "ner_oncology_tnm", "ner_oncology"])

with t.output_to(0):
    visualiser.display(results[2], label_col='ner_oncology_biomarker_chunk')

with t.output_to(1):
    visualiser.display(results[1], label_col='ner_oncology_tnm_chunk')

with t.output_to(2):
    visualiser.display(results[0], label_col='ner_oncology_chunk')


**ner_cancer_types_wip**



| Model Name              | Description |
|-------------------------|-------------|
|[ner_cancer_types_wip](https://nlp.johnsnowlabs.com/2024/08/16/ner_cancer_types_wip_en.html)      | This Named Entity Recognition (NER) model is specifically trained to recognize 6 main cancer types, body sites, biomarkers and their results. |



In [15]:
ner_model = MedicalNerModel.pretrained('ner_cancer_types_wip', "en", "clinical/models")\
    .setInputCols(["sentence", "token","embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(['sentence', 'token', 'ner'])\
    .setOutputCol('ner_chunk')

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    word_embeddings,
    ner_model,
    ner_converter
    ])

ner_cancer_types_wip download started this may take some time.
Approximate size to download 4.7 MB
[OK!]


In [16]:
sample_texts_4 = """
Patient A, a 55-year-old female, presented with carcinoma in the left breast. A biopsy revealed an elevated HER2. The patient also showed a slightly elevated CA 15-3 level at 45 U/mL. Follow-up imaging revealed metastasis to the axillary lymph nodes, and further scans indicated small metastatic lesions in the liver.
Additionally, imaging of the patient's lower back indicated a possible sarcoma. Subsequent tests identified elevated levels of lactate dehydrogenase (LDH), with a result of 580 IU/L (normal range: 140-280 IU/L), and a biopsy confirmed metastasis to the lungs.
Routine bloodwork revealed a mild increase in B2M (Beta-2 microglobulin), suggestive of possible lymphoma, and a normal range for hemoglobin and white blood cells, ruling out leukemia. CNS involvement was ruled out as imaging did not indicate any anomalies.
For melanoma screening, a suspicious mole on the patient's arm was biopsied, and tests confirmed a BRAF V600E mutation. Further imaging revealed metastatic spread to the lungs and liver.
"""

In [17]:
light_model = LightPipeline(pipeline.fit(data))

light_result = light_model.fullAnnotate(sample_texts_4)

chunks = []
entities = []
sentence= []
begin = []
end = []
confidence = []

for n in light_result[0]['ner_chunk']:

    begin.append(n.begin)
    end.append(n.end)
    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    sentence.append(n.metadata['sentence'])
    confidence.append(n.metadata["confidence"])


df_clinical = pd.DataFrame({'chunks':chunks, 'begin': begin, 'end':end,
                   'sentence_id':sentence, 'entities':entities, 'confidence':confidence})

df_clinical

Unnamed: 0,chunks,begin,end,sentence_id,entities,confidence
0,carcinoma,49,57,0,Carcinoma_Type,0.9855
1,breast,71,76,0,Body_Site,0.9988
2,elevated,100,107,0,Biomarker_Result,0.9782
3,HER2,109,112,0,Biomarker,0.9999
4,elevated,150,157,1,Biomarker_Result,0.9904
5,CA 15-3,159,165,1,Biomarker,0.8341
6,metastasis,212,221,2,Metastasis,1.0
7,axillary lymph nodes,230,249,2,Body_Site,0.9835334
8,metastatic,286,295,2,Metastasis,0.9998
9,liver,312,316,2,Body_Site,0.9979
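
The `confidence` values are read from each chunk's metadata, where they are stored as strings, so they need a cast before any numeric filtering. Here is a minimal sketch using three rows copied from the table above and a hypothetical threshold of 0.95:

```python
# Rows copied from the table above; confidence kept as strings, as in metadata
chunks = [
    {"chunk": "carcinoma", "entity": "Carcinoma_Type", "confidence": "0.9855"},
    {"chunk": "CA 15-3", "entity": "Biomarker", "confidence": "0.8341"},
    {"chunk": "metastasis", "entity": "Metastasis", "confidence": "1.0"},
]

MIN_CONFIDENCE = 0.95  # hypothetical cut-off, tune it for your use case


def keep_confident(rows, threshold):
    """Keep rows whose confidence, cast to float, is at least the threshold."""
    return [row for row in rows if float(row["confidence"]) >= threshold]


confident = keep_confident(chunks, MIN_CONFIDENCE)
print([row["chunk"] for row in confident])  # ['carcinoma', 'metastasis']
```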


In [18]:
from sparknlp_display import NerVisualizer

visualiser = NerVisualizer()

visualiser.display(light_result[0], label_col='ner_chunk', document_col='document')

## Relation Extraction Models

Relation extraction (RE) models link entities that are related to each other. For oncology entities, you can use a general model (such as `re_oncology_granular`) or a more specific one, depending on your needs (e.g. `re_oncology_size` to link tumors to their sizes, or `re_oncology_biomarker_result` to link biomarkers to their results).
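
The models in the next cell are restricted with `setRelationPairs`, which takes `ENTITY1-ENTITY2` label pairs, and the cell lists both orders of every pair. The following is only a conceptual sketch of that kind of filtering in plain Python. The function and the sample chunks are hypothetical, and the code does not reproduce the library's internals.

```python
from itertools import permutations

# Allowed label pairs, written the same way as in setRelationPairs
relation_pairs = {"Tumor_Finding-Tumor_Size", "Tumor_Size-Tumor_Finding"}

# Hypothetical (chunk, label) pairs, as a NER stage might produce them
chunks = [("mass", "Tumor_Finding"), ("2 cm", "Tumor_Size"), ("left", "Direction")]


def candidate_pairs(ner_chunks, allowed):
    """Return ordered chunk pairs whose 'Label1-Label2' key is allowed."""
    pairs = []
    for (text1, label1), (text2, label2) in permutations(ner_chunks, 2):
        if f"{label1}-{label2}" in allowed:
            pairs.append((text1, text2))
    return pairs


print(candidate_pairs(chunks, relation_pairs))
# [('mass', '2 cm'), ('2 cm', 'mass')]
```

Listing only one direction (for example just `Tumor_Finding-Tumor_Size`) would, in this sketch, keep only the first of the two pairs.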

In [19]:
pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("pos_tags")

dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en") \
    .setInputCols(["sentence", "pos_tags", "token"]) \
    .setOutputCol("dependencies")

re_oncology_granular = RelationExtractionModel.pretrained("re_oncology_granular", "en", "clinical/models") \
    .setInputCols(["embeddings", "pos_tags", "ner_oncology_chunk", "dependencies"]) \
    .setOutputCol("re_oncology_granular") \
    .setRelationPairs(['Date-Cancer_Dx', 'Cancer_Dx-Date', 'Tumor_Finding-Site_Breast', 'Site_Breast-Tumor_Finding',
                       'Relative_Date-Tumor_Finding', 'Tumor_Finding-Relative_Date', 'Tumor_Finding-Tumor_Size', 'Tumor_Size-Tumor_Finding',
                       'Pathology_Test-Cancer_Dx', 'Cancer_Dx-Pathology_Test']) \
    .setMaxSyntacticDistance(10)

re_oncology_size = RelationExtractionModel.pretrained("re_oncology_size", "en", "clinical/models") \
    .setInputCols(["embeddings", "pos_tags", "ner_oncology_chunk", "dependencies"]) \
    .setOutputCol("re_oncology_size") \
    .setRelationPairs(['Tumor_Finding-Tumor_Size', 'Tumor_Size-Tumor_Finding']) \
    .setMaxSyntacticDistance(10)

re_oncology_biomarker_result = RelationExtractionModel.pretrained("re_oncology_biomarker_result", "en", "clinical/models") \
    .setInputCols(["embeddings", "pos_tags", "ner_oncology_biomarker_chunk", "dependencies"]) \
    .setOutputCol("re_oncology_biomarker_result") \
    .setRelationPairs(['Biomarker-Biomarker_Result', 'Biomarker_Result-Biomarker']) \
    .setMaxSyntacticDistance(10)

re_stages = ner_stages + [pos_tagger, dependency_parser, re_oncology_granular, re_oncology_size, re_oncology_biomarker_result]

re_pipeline = Pipeline(stages=re_stages)

re_model = re_pipeline.fit(empty_data)

pos_clinical download started this may take some time.
Approximate size to download 1.5 MB
[OK!]
dependency_conllu download started this may take some time.
Approximate size to download 16.7 MB
[OK!]
re_oncology_granular download started this may take some time.
Approximate size to download 261 KB
[OK!]
re_oncology_size download started this may take some time.
Approximate size to download 261.3 KB
[OK!]
re_oncology_biomarker_result download started this may take some time.
Approximate size to download 259.6 KB
[OK!]


In [20]:
sample_text_5 = '''Two years ago, she noted a palpable right breast mass, 15 cm in size. Core needle biopsy revealed metaplastic carcinoma.'''

sample_text_6 = '''The patient presented a 2 cm mass in her left breast, and the tumor in her other breast was 3 cm long.'''

sample_text_7 = '''Immunohistochemical staining showed positive staining for CK5/6, P40 and PD-L1, and negative staining for TTF-1, PD-1 and weakly positive staining for ALK. Immunohistochemistry study showed that the tumor cells are positive for epithelial markers-cytokeratin and myoepithelial markers, including cytokeratin 5/6, p63, and S100 stains.'''

In [21]:
re_data = spark.createDataFrame(pd.DataFrame([sample_text_5, sample_text_6, sample_text_7], columns = ['text']))

In [22]:
re_results = re_model.transform(re_data).collect()

In [23]:
from sparknlp_display import RelationExtractionVisualizer

re_visualiser = RelationExtractionVisualizer()

In [24]:
re_t = widgets.TabBar(["re_oncology_biomarker_result", "re_oncology_size", "re_oncology_granular"])

with re_t.output_to(0):
    re_visualiser.display(re_results[2], relation_col='re_oncology_biomarker_result')

with re_t.output_to(1):
    re_visualiser.display(re_results[1], relation_col='re_oncology_size')

with re_t.output_to(2):
    re_visualiser.display(re_results[0], relation_col='re_oncology_granular')


## Assertion Status Models

With assertion status models, you can identify whether the entities in a text are mentioned as present, absent, hypothetical, possible, etc. You can use the general `assertion_oncology` model, or models recommended for specific entity groups (such as `assertion_oncology_problem`, which should be used for problem entities like Cancer_Dx or Metastasis).

In [25]:
assertion_oncology = AssertionDLModel.pretrained("assertion_oncology", "en", "clinical/models") \
    .setInputCols(["sentence", 'ner_oncology_chunk', "embeddings"]) \
    .setOutputCol("assertion_oncology")

assertion_oncology_problem = AssertionDLModel.pretrained("assertion_oncology_problem", "en", "clinical/models") \
    .setInputCols(["sentence", 'ner_oncology_tnm_chunk', "embeddings"]) \
    .setOutputCol("assertion_oncology_problem")

assertion_oncology_treatment_binary = AssertionDLModel.pretrained("assertion_oncology_treatment_binary", "en", "clinical/models") \
    .setInputCols(["sentence", 'ner_oncology_chunk', "embeddings"]) \
    .setOutputCol("assertion_oncology_treatment_binary")

assertion_stages = ner_stages + [assertion_oncology, assertion_oncology_problem, assertion_oncology_treatment_binary]

assertion_pipeline = Pipeline(stages=assertion_stages)

assertion_model = assertion_pipeline.fit(empty_data)

assertion_oncology download started this may take some time.
Approximate size to download 1.4 MB
[OK!]
assertion_oncology_problem download started this may take some time.
Approximate size to download 1.4 MB
[OK!]
assertion_oncology_treatment_binary download started this may take some time.
Approximate size to download 1.4 MB
[OK!]


In [26]:
sample_text_8 = 'The patient is suspected to have colorectal cancer. Family history is positive for other cancers. The result of the biopsy was positive. A CT scan was ordered to rule out metastases.'

sample_text_9 = 'The patient was diagnosed with breast cancer. She was suspected to have metastases in her lungs. Her family history is positive for ovarian cancer.'

sample_text_10 = 'The patient underwent a mastectomy. We recommend to start radiotherapy. The patient refused to chemotherapy.'

In [27]:
assertion_data = spark.createDataFrame(pd.DataFrame([sample_text_8, sample_text_9, sample_text_10], columns = ['text']))

In [28]:
assertion_results = assertion_model.transform(assertion_data).collect()

In [29]:
from sparknlp_display import AssertionVisualizer

assertion_visualiser = AssertionVisualizer()

In [30]:
assertion_t = widgets.TabBar(["assertion_oncology_treatment_binary", "assertion_oncology_problem", "assertion_oncology"])

with assertion_t.output_to(0):
    assertion_visualiser.display(assertion_results[2], label_col ='ner_oncology_chunk', assertion_col='assertion_oncology_treatment_binary')

with assertion_t.output_to(1):
    assertion_visualiser.display(assertion_results[1], label_col ='ner_oncology_tnm_chunk', assertion_col='assertion_oncology_problem')

with assertion_t.output_to(2):
    assertion_visualiser.display(assertion_results[0], label_col ='ner_oncology_chunk', assertion_col='assertion_oncology')


## Classification Models

You can use classification models to determine whether a sentence contains terms related to an oncology topic, such as metastasis or therapy.

<center><b>Oncology Classifier Model List</b>

| Model Name              | Description |
|-------------------------|-------------|
|[bert_sequence_classifier_metastasis](https://nlp.johnsnowlabs.com/2024/08/02/bert_sequence_classifier_metastasis_en.html)      | This model is a metastasis classification model that can determine whether clinical sentences include terms related to metastasis or not. |
|[classifierdl_metastasis](https://nlp.johnsnowlabs.com/2024/08/09/classifierdl_metastasis_en.html)      | This model is a metastasis classification model that determines whether clinical sentences include terms related to metastasis. |
|[generic_classifier_metastasis](https://nlp.johnsnowlabs.com/2024/08/09/generic_classifier_metastasis_en.html)      | This model is a metastasis classification model that determines whether clinical sentences include terms related to metastasis. |
|[generic_logreg_classifier_metastasis](https://nlp.johnsnowlabs.com/2024/08/09/generic_logreg_classifier_metastasis_en.html)      | This model is trained with the Generic Classifier annotator and the Logistic Regression algorithm and classifies text/sentence into two categories. |
|[generic_svm_classifier_metastasis](https://nlp.johnsnowlabs.com/2024/08/09/generic_svm_classifier_metastasis_en.html)      | This model is trained with the Generic Classifier annotator and the Support Vector Machine (SVM) algorithm and classifies text/sentence into two categories.|
|[generic_classifier_oncology](https://nlp.johnsnowlabs.com/2024/08/13/generic_classifier_oncology_en.html)      | This model is an oncology classification model that determines whether clinical sentences include terms related to oncology.|
|[generic_classifier_therapy](https://nlp.johnsnowlabs.com/2024/08/16/generic_classifier_therapy_en.html)      | This model is a therapy classification model that determines whether clinical sentences include terms related to therapy.|

</center>

In [31]:
document_assembler = DocumentAssembler()\
    .setInputCol('text')\
    .setOutputCol('document')

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(['sentence'])\
    .setOutputCol('token')

sequenceClassifier = MedicalBertForSequenceClassification\
    .pretrained("bert_sequence_classifier_metastasis","en","clinical/models")\
    .setInputCols(["sentence",'token'])\
    .setOutputCol("prediction")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    sequenceClassifier
])

sample_texts = [
    ["Contrast MRI confirmed the findings of meningeal carcinomatosis."],
    ["A 62-year-old male presents with weight loss, persistent cough, and episodes of hemoptysis."],
    ["The primary tumor (T) is staged as T3 due to its size and local invasion, there is no nodal involvement (N0), and due to multiple bone and liver lesions, it is classified as M1, reflecting distant metastatic foci."] ,
    ["After all procedures done and reviewing the findings, biochemical results and screening, the TNM classification is determined."],
    ["The oncologist noted that the tumor had spread to the liver, indicating advanced stage cancer."],
    ["The patient's care plan is adjusted to focus on symptom management and slowing the progression of the disease."],
]

sample_data = spark.createDataFrame(sample_texts).toDF("text")

sentence_detector_dl_healthcare download started this may take some time.
Approximate size to download 367.3 KB
[OK!]
bert_sequence_classifier_metastasis download started this may take some time.
Approximate size to download 387.6 MB
[OK!]


In [32]:
result = pipeline.fit(sample_data).transform(sample_data)

result.selectExpr("text", "prediction.result[0]").show(truncate=80)

+--------------------------------------------------------------------------------+--------------------+
|                                                                            text|prediction.result[0]|
+--------------------------------------------------------------------------------+--------------------+
|                Contrast MRI confirmed the findings of meningeal carcinomatosis.|                   1|
|A 62-year-old male presents with weight loss, persistent cough, and episodes ...|                   0|
|The primary tumor (T) is staged as T3 due to its size and local invasion, the...|                   1|
|After all procedures done and reviewing the findings, biochemical results and...|                   0|
|The oncologist noted that the tumor had spread to the liver, indicating advan...|                   1|
|The patient's care plan is adjusted to focus on symptom management and slowin...|                   0|
+--------------------------------------------------------------------------------+--------------------+
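
In the output above, the sentences that mention metastatic spread receive the label `1` and the others receive `0`. Here is a small sketch that tallies such predictions, with the labels copied from the table and display names that are hypothetical (they are not taken from the model card):

```python
from collections import Counter

# Predictions copied from the table above, in row order
predictions = ["1", "0", "1", "0", "1", "0"]

# Hypothetical display names for the two classes
LABEL_NAMES = {"1": "metastasis-related", "0": "not metastasis-related"}

counts = Counter(LABEL_NAMES[p] for p in predictions)
print(counts["metastasis-related"], counts["not metastasis-related"])  # 3 3
```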

## Pretrained NER Profiling Pipelines

We can use pretrained NER profiling pipelines for exploring all the available pretrained NER models at once.

- `ner_profiling_oncology` : Returns results for oncology NER models.

For more examples, please check [this notebook](https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Healthcare/11.2.Pretrained_NER_Profiling_Pipelines.ipynb).





<center><b>NER Profiling Oncology Model List</b>

| | | | |
|--------------|-----------------|-----------------|-----------------|
| ner_oncology_unspecific_posology | ner_oncology_tnm | ner_oncology_therapy | ner_oncology_test |
| ner_oncology_response_to_treatment | ner_oncology_posology | ner_oncology | ner_oncology_limited_80p_for_benchmarks |
| ner_oncology_diagnosis | ner_oncology_demographics | ner_oncology_biomarker | ner_oncology_anatomy_granular |
| ner_oncology_anatomy_general | | | |



</center>

In [33]:
from sparknlp.pretrained import PretrainedPipeline

oncology_profiling_pipeline = PretrainedPipeline("ner_profiling_oncology", "en", "clinical/models")

ner_profiling_oncology download started this may take some time.
Approx size to download 2.2 GB
[OK!]


In [34]:
text = """The had previously undergone a left mastectomy and an axillary lymph node dissection for a left breast cancer twenty years ago.
The tumor was positive for ER and PR. Postoperatively, radiotherapy was administered to the residual breast.
The cancer recurred as a right lung metastasis 13 years later. He underwent a regimen consisting of adriamycin (60 mg/m2) and cyclophosphamide (600 mg/m2) over six courses, as first line therapy."""

In [35]:
oncology_result = oncology_profiling_pipeline.fullAnnotate(text)[0]
oncology_result.keys()

dict_keys(['oncology_anatomy_general_ner', 'oncology_anatomy_granular_ner', 'oncology_diagnosis_langtest_ner', 'oncology_anatomy_general_langtest_ner', 'oncology_langtest_ner', 'oncology_anatomy_granular_langtest_ner', 'ner_chunk_jsl_greedy', 'ner_chunk_jsl_enriched', 'oncology_ner', 'oncology_tnm_ner', 'ner_chunk_oncology_anatomy_general', 'document', 'ner_chunk_oncology_test', 'ner_chunk_jsl_slim', 'ner_chunk_oncology_posology_langtest', 'jsl_langtest_ner', 'oncology_test_langtest_ner', 'jsl_greedy_ner', 'oncology_biomarker_langtest_ner', 'ner_chunk_oncology_biomarker_langtest', 'ner_chunk_oncology_limited_80p_for_benchmarks_ner', 'oncology_tnm_langtest_ner', 'ner_chunk_oncology_tnm', 'ner_chunk_oncology', 'jsl_enriched_ner', 'oncology_response_to_treatment_langtest_ner', 'ner_chunk_oncology_therapy', 'ner_chunk_jsl', 'oncology_limited_80p_for_benchmarks_ner', 'oncology_therapy_ner', 'ner_chunk_oncology_therapy_langtest', 'ner_chunk_oncology_response_to_treatment_langtest_ner', 'ner_

In [36]:
def get_token_results(light_result):

    tokens = [j.result for j in light_result["token"]]
    sentences = [j.metadata["sentence"] for j in light_result["token"]]
    begins = [j.begin for j in light_result["token"]]
    ends = [j.end for j in light_result["token"]]
    model_list = [ a for a in light_result.keys() if (a not in ["sentence", "token"] and "_chunks" not in a)]

    df = pd.DataFrame({'sentence':sentences, 'begin': begins, 'end': ends, 'token':tokens})

    for model_name in model_list:

        temp_df = pd.DataFrame(light_result[model_name])
        temp_df["jsl_label"] = temp_df.iloc[:,0].apply(lambda x : x.result)
        temp_df = temp_df[["jsl_label"]]

        # temp_df = get_ner_result(model_name)
        temp_df.columns = [model_name]
        df = pd.concat([df, temp_df], axis=1)

    # Filter columns to include only sentence, begin, end, token and all columns that start with 'oncology'
    filtered_df = df.loc[:, ['sentence', 'begin', 'end', 'token'] + [col for col in df.columns if col.startswith('oncology')]]

    return filtered_df

In [37]:
get_token_results(oncology_result)

Unnamed: 0,sentence,begin,end,token,oncology_anatomy_general_ner,oncology_anatomy_granular_ner,oncology_diagnosis_langtest_ner,oncology_anatomy_general_langtest_ner,oncology_langtest_ner,oncology_anatomy_granular_langtest_ner,...,oncology_unspecific_posology_langtest_ner,oncology_response_to_treatment_ner,oncology_posology_langtest_ner,oncology_demographics_ner,oncology_biomarker_ner,oncology_unspecific_posology_ner,oncology_test_ner,oncology_therapy_langtest_ner,oncology_diagnosis_ner,oncology_demographics_langtest_ner
0,0,0,2,The,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
1,0,4,6,had,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
2,0,8,17,previously,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
3,0,19,27,undergone,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
4,0,29,29,a,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
74,3,410,411,as,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
75,3,413,417,first,O,O,O,O,B-Line_Of_Therapy,O,...,O,B-Line_Of_Therapy,O,O,O,O,O,B-Line_Of_Therapy,O,O
76,3,419,422,line,O,O,O,O,I-Line_Of_Therapy,O,...,O,I-Line_Of_Therapy,O,O,O,O,O,I-Line_Of_Therapy,O,O
77,3,424,430,therapy,O,O,O,O,I-Line_Of_Therapy,O,...,O,O,O,O,O,O,O,I-Line_Of_Therapy,O,O
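
The token-level table shows IOB tags (`B-`, `I-`, `O`) rather than whole entities. Below is a minimal sketch of how consecutive tags can be merged back into chunks, using the last four tokens and their `oncology_langtest_ner` tags from the table above. The helper is hypothetical and joins tokens with a single space, which is a simplification: the pipeline's own `ner_chunk_*` outputs rely on character offsets instead.

```python
def iob_to_chunks(tokens, tags):
    """Merge consecutive B-/I- tagged tokens into (text, label) chunks."""
    chunks = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new chunk starts: flush the previous one first
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Same label as the open chunk: extend it
            current_tokens.append(token)
        else:
            # "O" closes any open chunk
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


tokens = ["as", "first", "line", "therapy"]
tags = ["O", "B-Line_Of_Therapy", "I-Line_Of_Therapy", "I-Line_Of_Therapy"]
print(iob_to_chunks(tokens, tags))  # [('first line therapy', 'Line_Of_Therapy')]
```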


## General Oncology Pretrained Pipelines

**Oncology Pretrained Pipeline List**

These pipelines include Named-Entity Recognition, Assertion Status, Relation Extraction and Entity Resolution models to extract information from oncology texts.


|index|model|index|model|
|-----:|:-----|-----:|:-----|
| 1| [oncology_biomarker_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_biomarker_pipeline_en.html)  | 2| [oncology_general_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_general_pipeline_en.html)  |
| 3| [oncology_therapy_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_therapy_pipeline_en.html)  | 4| [oncology_diagnosis_pipeline](https://nlp.johnsnowlabs.com/2022/12/01/oncology_diagnosis_pipeline_en.html)  |

In [44]:
oncology_pipeline = PretrainedPipeline("oncology_biomarker_pipeline", "en", "clinical/models")

oncology_biomarker_pipeline download started this may take some time.
Approx size to download 1.7 GB
[OK!]


In [45]:
oncology_pipeline.model.stages

[DocumentAssembler_eb57bf5cf30e,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_680255179420,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_8c59079bd37d,
 NER_CONVERTER_d79f163a7a32,
 MedicalNerModel_9fb8ec89af4a,
 NER_CONVERTER_fa7b541b0591,
 MedicalNerModel_74c49312f388,
 NER_CONVERTER_3a4f3eafbd08,
 MedicalNerModel_299a97740594,
 NER_CONVERTER_41fb99c5a464,
 ENTITY_EXTRACTOR_97d5ccc4aacb,
 MERGE_50eacd2b5f25,
 MERGE_1c1f6694a68e,
 ASSERTION_DL_d9d32f5f411d,
 ChunkFilterer_c0760fd4f0e6,
 ASSERTION_DL_163867728788,
 AssertionMerger_2841f135a8d0,
 POS_6f55785005bf,
 dependency_e7755462ba78,
 RelationExtractionModel_68ebe11369b6,
 RelationExtractionModel_513eb6317779,
 AnnotationMerger_8e6af0218865]

In [59]:
text = """The patient presents with persistent cough and hemoptysis, suggestive of lung cancer, but no evidence of metastasis to the liver was found. Also, there is axillary lymph node involvement consistent with metastatic breast carcinoma, though the patient denies any bone pain."""

result = oncology_pipeline.fullAnnotate(text)[0]

result.keys()

dict_keys(['ner_oncology_biomarker_chunk', 'cancer_dx', 'assertion_oncology', 'ner_biomarker_chunk', 'ner_oncology', 'document', 'merged_chunk', 'ner_biomarker', 're_oncology_granular', 'ner_oncology_biomarker', 'ner_oncology_test_chunk', 're_oncology_biomarker_result', 'all_relations', 'ner_oncology_test', 'ner_oncology_chunk', 'token', 'assertion_chunk_test', 'embeddings', 'pos_tags', 'assertion_chunk_oncology', 'assertion_merger', 'dependencies', 'assertion_oncology_test_binary', 'sentence'])

**NER Results**

In [60]:
chunks=[]
entities=[]
begins=[]
ends=[]
confidence=[]
for n in result['merged_chunk']:

    chunks.append(n.result)
    begins.append(n.begin)
    ends.append(n.end)
    entities.append(n.metadata['entity'])
    confidence.append(n.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'begin':begins, 'end':ends, 'entities':entities, 'confidence':confidence})

df

Unnamed: 0,chunks,begin,end,entities,confidence
0,lung cancer,73,83,CancerDx,0.99259996
1,metastasis,105,114,Metastasis,0.9397
2,metastatic,203,212,Metastasis,0.9961
3,breast carcinoma,214,229,CancerDx,0.9949


**Assertion Status Results**

In [61]:
chunks=[]
entities=[]
status=[]
confidence=[]

for n,m in zip(result['merged_chunk'], result['assertion_merger']):

    chunks.append(n.result)
    entities.append(n.metadata['entity'])
    status.append(m.result)
    confidence.append(m.metadata['confidence'])

df = pd.DataFrame({'chunks':chunks, 'entities':entities,
                   'assertion':status, 'confidence':confidence})

df

Unnamed: 0,chunks,entities,assertion,confidence
0,lung cancer,CancerDx,Possible,0.8867
1,metastasis,Metastasis,Absent,0.9983
2,metastatic,Metastasis,Present,0.9644
3,breast carcinoma,CancerDx,Present,0.9998
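
Pairing chunks and assertions with `zip` works only while both columns stay aligned. A more defensive option is to join them on their character span. The sketch below assumes that each assertion annotation carries the same `begin` and `end` as the chunk it describes; that should be verified against the actual pipeline output. `StubAnnotation` and `join_assertions` are hypothetical names, the spans and labels come from the tables above, and the shuffled order with one missing assertion is a made-up scenario to show the difference from `zip`.

```python
from dataclasses import dataclass, field


@dataclass
class StubAnnotation:
    """Hypothetical stand-in for the annotation objects read in the notebook."""
    result: str
    begin: int
    end: int
    metadata: dict = field(default_factory=dict)


def join_assertions(chunks, assertions):
    """Attach to each chunk the assertion that covers the same character span."""
    # Assumption: an assertion shares begin/end with the chunk it refers to.
    by_span = {(a.begin, a.end): a for a in assertions}
    rows = []
    for c in chunks:
        a = by_span.get((c.begin, c.end))
        rows.append({
            'chunks': c.result,
            'entities': c.metadata['entity'],
            'assertion': a.result if a else None,
            'confidence': float(a.metadata['confidence']) if a else None,
        })
    return rows


# Chunk spans and labels copied from the tables above.
chunks = [
    StubAnnotation('lung cancer', 73, 83, {'entity': 'CancerDx'}),
    StubAnnotation('metastasis', 105, 114, {'entity': 'Metastasis'}),
    StubAnnotation('metastatic', 203, 212, {'entity': 'Metastasis'}),
    StubAnnotation('breast carcinoma', 214, 229, {'entity': 'CancerDx'}),
]

# Hypothetical scenario: assertions arrive out of order and one is missing.
assertions = [
    StubAnnotation('Present', 214, 229, {'confidence': '0.9998'}),
    StubAnnotation('Possible', 73, 83, {'confidence': '0.8867'}),
    StubAnnotation('Absent', 105, 114, {'confidence': '0.9983'}),
]

assertion_rows = join_assertions(chunks, assertions)
```

In this scenario `zip` would silently label `lung cancer` as `Present` and drop the last chunk, whereas the span join keeps every chunk and leaves the unmatched one with `None`.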


**Relation Extraction Results**

In [62]:
import pandas as pd

def get_relations_df(results, col='relations'):
    """Build a DataFrame of relations from the first fullAnnotate() result.

    `col` is the name of the output column that holds the relation annotations.
    """
    rel_pairs = []
    for rel in results[0][col]:
        rel_pairs.append((
            rel.result,
            rel.metadata['entity1'],
            rel.metadata['entity1_begin'],
            rel.metadata['entity1_end'],
            rel.metadata['chunk1'],
            rel.metadata['entity2'],
            rel.metadata['entity2_begin'],
            rel.metadata['entity2_end'],
            rel.metadata['chunk2'],
            rel.metadata['confidence']
        ))

    rel_df = pd.DataFrame(rel_pairs, columns=['relation', 'entity1', 'entity1_begin', 'entity1_end', 'chunk1',
                                              'entity2', 'entity2_begin', 'entity2_end', 'chunk2', 'confidence'])

    # Metadata values are strings; cast confidence so it can be sorted and filtered numerically
    rel_df.confidence = rel_df.confidence.astype(float)

    return rel_df

In [66]:
text = """Immunohistochemistry was negative for thyroid transcription factor-1 and napsin A. The test was positive for ER and PR, and negative for HER2."""

In [67]:
result = oncology_pipeline.fullAnnotate(text)

rel_df = get_relations_df(result, 'all_relations')

rel_df[rel_df.relation != "O"]

Unnamed: 0,relation,entity1,entity1_begin,entity1_end,chunk1,entity2,entity2_begin,entity2_end,chunk2,confidence
1,is_finding_of,Biomarker_Result,25,32,negative,Biomarker,38,67,thyroid transcription factor-1,0.923983
2,is_finding_of,Biomarker_Result,25,32,negative,Biomarker,73,80,napsin A,0.905294
3,is_finding_of,Biomarker_Result,96,103,positive,Biomarker,109,110,ER,0.923827
4,is_finding_of,Biomarker_Result,96,103,positive,Biomarker,116,117,PR,0.871721
5,is_finding_of,Biomarker_Result,96,103,positive,Oncogene,137,140,HER2,0.672547
8,is_finding_of,Biomarker_Result,124,131,negative,Oncogene,137,140,HER2,0.922905
9,is_finding_of,Biomarker_Result,25,32,negative,Biomarker,38,67,thyroid transcription factor-1,0.999216
10,is_finding_of,Biomarker_Result,25,32,negative,Biomarker,73,80,napsin A,0.976307
11,is_finding_of,Biomarker_Result,96,103,positive,Biomarker,109,110,ER,0.987886
12,is_finding_of,Biomarker_Result,96,103,positive,Biomarker,116,117,PR,0.959301
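
Because `all_relations` merges the output of more than one relation extraction model, the same entity pair can appear several times (rows 1 to 4 and rows 9 to 12 above). The table also contains a low-confidence `positive` / `HER2` pair that contradicts the sentence. A minimal post-processing sketch, assuming plain row dicts such as those returned by `rel_df.to_dict('records')`: keep the highest-confidence row per relation and span pair, then apply a confidence threshold. `dedupe_relations` and the 0.8 threshold are illustrative choices, not part of the library, and the sample rows are copied from the table above.

```python
def dedupe_relations(rows, min_confidence=0.0):
    """Keep the most confident row per (relation, span pair), then threshold."""
    best = {}
    for row in rows:
        key = (row['relation'],
               row['entity1_begin'], row['entity1_end'],
               row['entity2_begin'], row['entity2_end'])
        if key not in best or row['confidence'] > best[key]['confidence']:
            best[key] = row
    return [r for r in best.values() if r['confidence'] >= min_confidence]


cols = ('relation', 'entity1_begin', 'entity1_end', 'chunk1',
        'entity2_begin', 'entity2_end', 'chunk2', 'confidence')

# Rows copied from the relation table above (entity labels omitted for brevity).
raw = [
    ('is_finding_of', 25, 32, 'negative', 38, 67, 'thyroid transcription factor-1', 0.923983),
    ('is_finding_of', 25, 32, 'negative', 73, 80, 'napsin A', 0.905294),
    ('is_finding_of', 96, 103, 'positive', 109, 110, 'ER', 0.923827),
    ('is_finding_of', 96, 103, 'positive', 116, 117, 'PR', 0.871721),
    ('is_finding_of', 96, 103, 'positive', 137, 140, 'HER2', 0.672547),
    ('is_finding_of', 124, 131, 'negative', 137, 140, 'HER2', 0.922905),
    ('is_finding_of', 25, 32, 'negative', 38, 67, 'thyroid transcription factor-1', 0.999216),
    ('is_finding_of', 25, 32, 'negative', 73, 80, 'napsin A', 0.976307),
    ('is_finding_of', 96, 103, 'positive', 109, 110, 'ER', 0.987886),
    ('is_finding_of', 96, 103, 'positive', 116, 117, 'PR', 0.959301),
]
sample_rows = [dict(zip(cols, r)) for r in raw]

# The 0.8 threshold is an arbitrary value chosen for illustration.
kept = dedupe_relations(sample_rows, min_confidence=0.8)
```

On this sample the ten rows reduce to five: one per biomarker, with the `positive` / `HER2` pair removed by the threshold. A suitable threshold depends on the data and should be tuned rather than taken from this sketch.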


In [68]:
result[0]['all_relations']

[Annotation(category, 0, 32, O, {'chunk2': 'negative', 'confidence': '0.7320461', 'entity2_end': '32', 'chunk1': 'Immunohistochemistry', 'entity1': 'Pathology_Test', 'entity2_begin': '25', 'chunk2_confidence': '0.9933', 'entity1_begin': '0', 'sentence': '0', 'direction': 'both', 'entity1_end': '19', 'entity2': 'Biomarker_Result', 'chunk1_confidence': '0.9986'}, []),
 Annotation(category, 25, 67, is_finding_of, {'chunk2': 'thyroid transcription factor-1', 'confidence': '0.9239829', 'entity2_end': '67', 'chunk1': 'negative', 'entity1': 'Biomarker_Result', 'entity2_begin': '38', 'chunk2_confidence': '0.924675', 'entity1_begin': '25', 'sentence': '0', 'direction': 'both', 'entity1_end': '32', 'entity2': 'Biomarker', 'chunk1_confidence': '0.9933'}, []),
 Annotation(category, 25, 80, is_finding_of, {'chunk2': 'napsin A', 'confidence': '0.905294', 'entity2_end': '80', 'chunk1': 'negative', 'entity1': 'Biomarker_Result', 'entity2_begin': '73', 'chunk2_confidence': '0.9865', 'entity1_begin': '2