![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/healthcare/pipeline_parser/Parser.ipynb)


#   **📜 PipelineTracer and PipelineOutputParser**

### Starting the session

In [1]:
from johnsnowlabs import nlp
nlp.install()

Spark NLP Version : 5.3.2
Spark NLP_JSL Version : 5.4.0


spark session: <pyspark.sql.session.SparkSession object at 0x00000275E1469790>


## PipelineTracer

`PipelineTracer` is a class for tracing the stages of a pipeline and retrieving information about them.
It reports the entities, assertions, and relations used within the pipeline.
It works with both `PipelineModel` and `PretrainedPipeline`.
It can also build a parser dictionary, which is then used to create a `PipelineOutputParser`.



## **🔎 Functions**

- `printPipelineSchema`: Prints the schema of the pipeline.
- `createParserDictionary`: Returns a parser dictionary that can be used to create a `PipelineOutputParser`.
- `getPossibleEntities`: Returns a list of possible entities that the pipeline can include.
- `getPossibleAssertions`: Returns a list of possible assertions that the pipeline can include.
- `getPossibleRelations`: Returns a list of possible relations that the pipeline can include.
- `getPipelineStages`: Returns a list of `PipelineStage` objects that represent the stages of the pipeline.
- `getParserDictDirectly`: Returns a parser dictionary that can be used to create a `PipelineOutputParser`, without first creating a `PipelineTracer` object.
- `listAvailableModels`: Returns a list of available models for a given language and source.
- `showAvailableModels`: Prints a list of available models for a given language and source.

## Oncology Pipeline

In [20]:
pipe = nlp.load("en.explain_doc.clinical_oncology.pipeline")

explain_clinical_doc_oncology download started this may take some time.
Approx size to download 1.8 GB
[OK!]


In [5]:
pipe.getPossibleAssertions()

['Past', 'Family', 'Absent', 'Hypothetical', 'Possible', 'Present']

In [6]:
pipe.getPossibleEntities()

['Cycle_Number',
 'Direction',
 'Histological_Type',
 'Biomarker_Result',
 'Site_Other_Body_Part',
 'Hormonal_Therapy',
 'Death_Entity',
 'Targeted_Therapy',
 'Route',
 'Tumor_Finding',
 'Duration',
 'Pathology_Result',
 'Chemotherapy',
 'Date',
 'Radiotherapy',
 'Radiation_Dose',
 'Oncogene',
 'Cancer_Surgery',
 'Tumor_Size',
 'Staging',
 'Pathology_Test',
 'Cancer_Dx',
 'Age',
 'Site_Lung',
 'Site_Breast',
 'Site_Liver',
 'Site_Lymph_Node',
 'Response_To_Treatment',
 'Site_Brain',
 'Immunotherapy',
 'Race_Ethnicity',
 'Metastasis',
 'Smoking_Status',
 'Imaging_Test',
 'Relative_Date',
 'Line_Of_Therapy',
 'Unspecific_Therapy',
 'Site_Bone',
 'Gender',
 'Cycle_Count',
 'Cancer_Score',
 'Adenopathy',
 'Grade',
 'Biomarker',
 'Invasion',
 'Frequency',
 'Performance_Status',
 'Dosage',
 'Cycle_Day',
 'Anatomical_Site',
 'Size_Trend',
 'Posology_Information',
 'Cancer_Therapy',
 'Lymph_Node',
 'Tumor_Description',
 'Lymph_Node_Modifier',
 'Alcohol',
 'BMI',
 'Communicable_Disease',
 'Obes

In [7]:
pipe.getPossibleRelations()

['is_size_of', 'is_date_of', 'is_location_of', 'is_finding_of']

#### With custom `column_maps`

In [10]:
column_maps = pipe.createParserDictionary()
column_maps.update({"document_identifier": "clinical_deidentification"})
column_maps

{'document_identifier': 'clinical_deidentification',
 'document_text': 'document',
 'entities': ['merged_chunk', 'merged_chunk_for_assertion'],
 'assertions': ['assertion'],
 'resolutions': [],
 'relations': ['all_relations'],
 'summaries': [],
 'deidentifications': [],
 'classifications': []}
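
Because the parser dictionary is a plain Python `dict`, it can be sanity-checked before it is passed as `parser_config`. The following is a minimal sketch and not part of the library: the helper name `check_column_maps` is hypothetical, and the expected keys and their types are assumptions taken only from the dictionary printed above.

```python
# Keys observed in the createParserDictionary() output shown above.
# Treating them as the full expected schema is an assumption.
SCALAR_KEYS = {"document_identifier", "document_text"}
LIST_KEYS = {
    "entities", "assertions", "resolutions", "relations",
    "summaries", "deidentifications", "classifications",
}


def check_column_maps(column_maps):
    """Hypothetical helper: return a list of problems found in a parser dict."""
    problems = []
    for key in sorted(SCALAR_KEYS):
        if not isinstance(column_maps.get(key), str):
            problems.append(f"{key} should be a string")
    for key in sorted(LIST_KEYS):
        if not isinstance(column_maps.get(key), list):
            problems.append(f"{key} should be a list of column names")
    for key in sorted(set(column_maps) - SCALAR_KEYS - LIST_KEYS):
        problems.append(f"unexpected key: {key}")
    return problems


# The dictionary printed above, copied verbatim.
column_maps = {
    "document_identifier": "clinical_deidentification",
    "document_text": "document",
    "entities": ["merged_chunk", "merged_chunk_for_assertion"],
    "assertions": ["assertion"],
    "resolutions": [],
    "relations": ["all_relations"],
    "summaries": [],
    "deidentifications": [],
    "classifications": [],
}

print(check_column_maps(column_maps))  # prints []
```

A check like this catches a common slip, such as assigning a single column name where a list of column names is expected.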

In [11]:
res = pipe.predict("The Patient underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.Neoadjuvant chemotherapy with the regimens of Cyclophosphamide (500 mg/m2) is being given for 6 cycles with poor response",
            parser_output=True,
            parser_config=column_maps)

In [12]:
res

{'result': [{'document_identifier': 'clinical_deidentification',
   'document_text': ['The Patient underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.Neoadjuvant chemotherapy with the regimens of Cyclophosphamide (500 mg/m2) is being given for 6 cycles with poor response'],
   'entities': [{'chunk_id': '1b71b12a',
     'chunk': 'computed tomography',
     'begin': 24,
     'end': 42,
     'ner_label': 'Imaging_Test',
     'ner_source': 'ner_oncology_chunk',
     'ner_confidence': '0.9575'},
    {'chunk_id': 'ce9ac1a9',
  

In [13]:
import pandas as pd
pd.json_normalize(res['result'][0]["entities"])

| | chunk_id | chunk | begin | end | ner_label | ner_source | ner_confidence |
|---|---|---|---|---|---|---|---|
| 0 | 1b71b12a | computed tomography | 24 | 42 | Imaging_Test | ner_oncology_chunk | 0.9575 |
| 1 | ce9ac1a9 | CT | 45 | 46 | Imaging_Test | ner_oncology_chunk | 0.9565 |
| 2 | 3576c965 | abdomen | 61 | 67 | Site_Other_Body_Part | ner_oncology_chunk | 0.9446 |
| 3 | cff2288c | pelvis | 73 | 78 | Site_Other_Body_Part | ner_oncology_chunk | 0.6514 |
| 4 | 98848a68 | ovarian | 104 | 110 | Site_Other_Body_Part | ner_oncology_chunk | 0.7915 |
| 5 | d3e628e9 | mass | 112 | 115 | Tumor_Finding | ner_oncology_chunk | 0.9557 |
| 6 | 3d8b6be0 | Pap smear | 120 | 128 | Pathology_Test | ner_oncology_chunk | 0.96725 |
| 7 | 4d03018b | one month later | 140 | 154 | Relative_Date | ner_oncology_chunk | 0.8786667 |
| 8 | 8de23a92 | atypical glandular cells | 173 | 196 | Pathology_Result | ner_oncology_chunk | 0.7270667 |
| 9 | 70affced | adenocarcinoma | 213 | 226 | Cancer_Dx | ner_oncology_chunk | 0.9992 |
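
The entity records are ordinary dictionaries, so they can be filtered without any library support. The sketch below keeps only entities with a given label and a minimum confidence. The helper name `filter_entities` and the 0.75 threshold are illustrative assumptions. The sample rows are copied from the output above, with `ner_confidence` kept as a string as it appears in the raw `res` dictionary.

```python
# A few entity records copied from the parser output above.
# ner_confidence is a string in the raw result, so it is converted with float().
entities = [
    {"chunk": "computed tomography", "ner_label": "Imaging_Test", "ner_confidence": "0.9575"},
    {"chunk": "abdomen", "ner_label": "Site_Other_Body_Part", "ner_confidence": "0.9446"},
    {"chunk": "pelvis", "ner_label": "Site_Other_Body_Part", "ner_confidence": "0.6514"},
    {"chunk": "ovarian", "ner_label": "Site_Other_Body_Part", "ner_confidence": "0.7915"},
    {"chunk": "adenocarcinoma", "ner_label": "Cancer_Dx", "ner_confidence": "0.9992"},
]


def filter_entities(entities, labels=None, min_confidence=0.0):
    """Hypothetical helper: keep entities matching a label set and confidence floor."""
    kept = []
    for ent in entities:
        if labels is not None and ent["ner_label"] not in labels:
            continue
        if float(ent["ner_confidence"]) < min_confidence:
            continue
        kept.append(ent)
    return kept


body_sites = filter_entities(
    entities, labels={"Site_Other_Body_Part"}, min_confidence=0.75
)
print([ent["chunk"] for ent in body_sites])  # prints ['abdomen', 'ovarian']
```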


#### Without custom `column_maps` (`createParserDictionary`)

In [14]:
res = nlp.load("en.explain_doc.clinical_oncology.pipeline").predict(
            "The Patient underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.Neoadjuvant chemotherapy with the regimens of Cyclophosphamide (500 mg/m2) is being given for 6 cycles with poor response",
            parser_output=True
        )

explain_clinical_doc_oncology download started this may take some time.
Approx size to download 1.8 GB
[OK!]


In [15]:
res

{'result': [{'document_identifier': 'Document',
   'document_text': ['The Patient underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.Neoadjuvant chemotherapy with the regimens of Cyclophosphamide (500 mg/m2) is being given for 6 cycles with poor response'],
   'entities': [{'chunk_id': '1b71b12a',
     'chunk': 'computed tomography',
     'begin': 24,
     'end': 42,
     'ner_label': 'Imaging_Test',
     'ner_source': 'ner_oncology_chunk',
     'ner_confidence': '0.9575'},
    {'chunk_id': 'ce9ac1a9',
     'chunk': 'CT',

In [16]:
import pandas as pd
pd.json_normalize(res['result'][0])

| | document_identifier | document_text | entities | assertions | resolutions | relations | summaries | deidentifications | classifications |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Document | [The Patient underwent a computed tomography (... | [{'chunk_id': '1b71b12a', 'chunk': 'computed t... | [{'chunk_id': '1b71b12a', 'chunk': 'computed t... | [] | [{'relation': 'O', 'chunk1_id': '3576c965', 'c... | [] | [] | [] |


In [17]:
pd.json_normalize(res['result'][0]["entities"])

| | chunk_id | chunk | begin | end | ner_label | ner_source | ner_confidence |
|---|---|---|---|---|---|---|---|
| 0 | 1b71b12a | computed tomography | 24 | 42 | Imaging_Test | ner_oncology_chunk | 0.9575 |
| 1 | ce9ac1a9 | CT | 45 | 46 | Imaging_Test | ner_oncology_chunk | 0.9565 |
| 2 | 3576c965 | abdomen | 61 | 67 | Site_Other_Body_Part | ner_oncology_chunk | 0.9446 |
| 3 | cff2288c | pelvis | 73 | 78 | Site_Other_Body_Part | ner_oncology_chunk | 0.6514 |
| 4 | 98848a68 | ovarian | 104 | 110 | Site_Other_Body_Part | ner_oncology_chunk | 0.7915 |
| 5 | d3e628e9 | mass | 112 | 115 | Tumor_Finding | ner_oncology_chunk | 0.9557 |
| 6 | 3d8b6be0 | Pap smear | 120 | 128 | Pathology_Test | ner_oncology_chunk | 0.96725 |
| 7 | 4d03018b | one month later | 140 | 154 | Relative_Date | ner_oncology_chunk | 0.8786667 |
| 8 | 8de23a92 | atypical glandular cells | 173 | 196 | Pathology_Result | ner_oncology_chunk | 0.7270667 |
| 9 | 70affced | adenocarcinoma | 213 | 226 | Cancer_Dx | ner_oncology_chunk | 0.9992 |


In [18]:
pd.json_normalize(res['result'][0]["assertions"])

| | chunk_id | chunk | assertion | assertion_source |
|---|---|---|---|---|
| 0 | 1b71b12a | computed tomography | Past | assertion |
| 1 | ce9ac1a9 | CT | Past | assertion |
| 2 | d3e628e9 | mass | Present | assertion |
| 3 | 3d8b6be0 | Pap smear | Past | assertion |
| 4 | 8de23a92 | atypical glandular cells | Present | assertion |
| 5 | 70affced | adenocarcinoma | Possible | assertion |
| 6 | 71dddb8a | pathologic specimen | Past | assertion |
| 7 | 63e46bca | extension | Present | assertion |
| 8 | ac5748d2 | tumor | Present | assertion |
| 9 | 5d80c8a0 | enlarged | Present | assertion |
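
Entities and assertions share a `chunk_id`, so the assertion status can be attached to each entity with a simple lookup. This is a minimal sketch using a handful of rows copied from the outputs above. The helper name `attach_assertions` is hypothetical, and an entity without a matching assertion row in this sample receives `None`.

```python
# Sample rows copied from the entity and assertion outputs above.
entities = [
    {"chunk_id": "1b71b12a", "chunk": "computed tomography", "ner_label": "Imaging_Test"},
    {"chunk_id": "3576c965", "chunk": "abdomen", "ner_label": "Site_Other_Body_Part"},
    {"chunk_id": "d3e628e9", "chunk": "mass", "ner_label": "Tumor_Finding"},
    {"chunk_id": "70affced", "chunk": "adenocarcinoma", "ner_label": "Cancer_Dx"},
]
assertions = [
    {"chunk_id": "1b71b12a", "chunk": "computed tomography", "assertion": "Past"},
    {"chunk_id": "d3e628e9", "chunk": "mass", "assertion": "Present"},
    {"chunk_id": "70affced", "chunk": "adenocarcinoma", "assertion": "Possible"},
]


def attach_assertions(entities, assertions):
    """Hypothetical helper: left-join assertion status onto entities by chunk_id."""
    status_by_id = {row["chunk_id"]: row["assertion"] for row in assertions}
    return [
        {**ent, "assertion": status_by_id.get(ent["chunk_id"])}
        for ent in entities
    ]


merged = attach_assertions(entities, assertions)
for row in merged:
    print(row["chunk"], row["ner_label"], row["assertion"])
```

The same join can be done with `pandas.DataFrame.merge` on `chunk_id` once both lists have been passed through `pd.json_normalize`.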


In [19]:
pd.json_normalize(res['result'][0]["relations"])

| | relation | chunk1_id | chunk1 | chunk2_id | chunk2 | confidence | direction |
|---|---|---|---|---|---|---|---|
| 0 | O | 3576c965 | abdomen | d3e628e9 | mass | 0.9439166 | both |
| 1 | O | cff2288c | pelvis | d3e628e9 | mass | 0.9611397 | both |
| 2 | is_location_of | 98848a68 | ovarian | d3e628e9 | mass | 0.922661 | both |
| 3 | is_finding_of | 3d8b6be0 | Pap smear | 70affced | adenocarcinoma | 0.52542114 | both |
| 4 | is_location_of | ac5748d2 | tumor | 74e8e40b | fallopian tubes | 0.9026299 | both |
| 5 | is_location_of | ac5748d2 | tumor | 76146911 | appendix | 0.6649267 | both |
| 6 | O | ac5748d2 | tumor | dc74e652 | omentum | 0.80328876 | both |
| 7 | Chemotherapy-Dosage | c2e02074 | Neoadjuvant chemotherapy | 98f81754 | 500 mg/m2 | 1.0 | both |
| 8 | Chemotherapy-Cycle_Count | c2e02074 | Neoadjuvant chemotherapy | bb801681 | 6 cycles | 1.0 | both |
| 9 | Chemotherapy-Dosage | d5d30ff5 | Cyclophosphamide | 98f81754 | 500 mg/m2 | 1.0 | both |
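
Several relation rows carry the label `O`. Assuming that `O` marks a candidate pair for which no relation was predicted (the source does not state this), such rows can be dropped along with low-confidence ones. The sketch below uses rows copied from the output above. The helper name `keep_relations` and the 0.6 threshold are illustrative assumptions.

```python
# Sample relation rows copied from the output above.
relations = [
    {"relation": "O", "chunk1": "abdomen", "chunk2": "mass", "confidence": 0.9439166},
    {"relation": "is_location_of", "chunk1": "ovarian", "chunk2": "mass", "confidence": 0.922661},
    {"relation": "is_finding_of", "chunk1": "Pap smear", "chunk2": "adenocarcinoma", "confidence": 0.52542114},
    {"relation": "Chemotherapy-Dosage", "chunk1": "Cyclophosphamide", "chunk2": "500 mg/m2", "confidence": 1.0},
]


def keep_relations(relations, min_confidence=0.0, null_label="O"):
    """Hypothetical helper: drop null-labelled and low-confidence relations."""
    return [
        rel for rel in relations
        if rel["relation"] != null_label
        # float() also accepts string confidences, in case the raw output uses them.
        and float(rel["confidence"]) >= min_confidence
    ]


kept = keep_relations(relations, min_confidence=0.6)
print([(r["chunk1"], r["relation"], r["chunk2"]) for r in kept])
# prints [('ovarian', 'is_location_of', 'mass'), ('Cyclophosphamide', 'Chemotherapy-Dosage', '500 mg/m2')]
```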
