![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/healthcare-nlp/04.12.Deidentification_NER_Profiling_Pipeline.ipynb)

# PHI Detection Profiling Pipeline

This pipeline is designed for profiling and benchmarking various de-identification models applied to clinical texts. It integrates multiple NER models and rule-based components that are commonly used for detecting and anonymizing protected health information (PHI). The pipeline includes models trained with embeddings_clinical, zero-shot NER models, regex matchers, text matchers, and contextual parsers. By consolidating these diverse approaches, it allows comprehensive evaluation and comparison of different de-identification strategies across clinical datasets.

# Setup

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, medical

# After uploading your license run this to install all licensed Python Wheels and pre-download Jars the Spark Session JVM
nlp.install()

In [4]:
from johnsnowlabs import nlp, medical

# Automatically load license data and start a session with all jars user has access to
spark = nlp.start()

👌 Detected license file /content/6.0.3.spark_nlp_for_healthcare.json
👌 Launched [92mcpu optimized[39m session with with: 🚀Spark-NLP==6.0.3, 💊Spark-Healthcare==6.0.3, running on ⚡ PySpark==3.4.0


# Pipeline

In [5]:
phi_detection_pipeline = nlp.PretrainedPipeline('ner_profiling_deidentification', 'en', 'clinical/models')
phi_detection_pipeline.model.stages

ner_profiling_deidentification download started this may take some time.
Approx size to download 2.5 GB
[OK!]


[DocumentAssembler_5a339542eb3f,
 SentenceDetectorDLModel_6bafc4746ea5,
 REGEX_TOKENIZER_aa38e48ff4e8,
 WORD_EMBEDDINGS_MODEL_9004b1d00302,
 MedicalNerModel_dfe4c67dcc43,
 NerConverter_e6e524fe550b,
 MedicalNerModel_b611869af9b6,
 NerConverter_0f814e60a4a1,
 MedicalNerModel_32184c1db80b,
 NerConverter_bc015b5dc347,
 MedicalNerModel_adf94fcbedf5,
 NerConverter_ee19c743f77d,
 MedicalNerModel_80ef11d39c77,
 NerConverter_81f4072dd083,
 MedicalNerModel_ada39ac0d359,
 NerConverter_cea834b2e838,
 MedicalNerModel_9d4a08b1c03d,
 NerConverter_ee761261e933,
 MedicalNerModel_8aa43467dabc,
 NerConverter_4b03f0706dce,
 MedicalNerModel_5d9ebaf6a7c9,
 NerConverter_8df66db74cda,
 MedicalNerModel_2dd7df45612b,
 NerConverter_a4f11fb3190e,
 MedicalNerModel_d92d47622e85,
 NerConverter_b4e4551e94d0,
 MedicalNerModel_7a7fbb30ac62,
 NerConverter_4d7b6ae0fdcc,
 MedicalNerModel_8bdd6c4bd644,
 NerConverter_0bec839e4f5a,
 MedicalNerModel_d4076802b007,
 NerConverter_6b80df7555f5,
 MedicalNerModel_132c5bcaaa62,
 Ne

## Prediction

You can run the sample text through the pipeline with a single line of code to get predictions from all PHI detection models available in the Healthcare NLP library.

In [6]:
text= """Name : Hendrickson, Ora, Record date: 2093-01-13, Age: 25, # 719435. Dr. John Green, ID: 1231511863, IP 203.120.223.13. He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93. Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no:A334455B. Phone (302) 786-5227, 0295 Keats Street, San Francisco."""

result = phi_detection_pipeline.fullAnnotate(text)

## Evaluation

In [7]:
import pandas as pd
import plotly.graph_objects as go

class NERResultViewer:
    """
    A utility class to visualize NER (Named Entity Recognition) model outputs
    in different formats: chunks, token-level predictions, tuple results, and label distribution plots.
    """

    def __init__(self, light_result):
        """
        Initialize the class with the light_result dictionary from Spark NLP or similar output.
        """
        self.light_result = light_result

    def build_chunk_dataframe(self, results):
        """
        Helper function that builds a DataFrame from chunk results.
        """
        data = {
            'sentence': [r.metadata.get('sentence', '') for r in results],
            'begin': [r.begin for r in results],
            'end': [r.end for r in results],
            'chunks': [r.result for r in results],
            'entity': [r.metadata.get('entity') if r.metadata.get('entity') else r.metadata.get('field', '') for r in results],
            'confidence': [r.metadata.get('confidence', '') for r in results]
        }
        return pd.DataFrame(data)

    def show_chunk_results(self):
        """
        Prints chunk-level NER results as DataFrames for each model.
        """
        for key in self.light_result.keys():
            if key in ['sentence', 'token'] or "_chunks" not in key:
                continue

            model_name = "_".join(key.split("_")[:-1])
            results = self.light_result[key]

            print(f"\n{'*' * 20} {model_name} Model Results {'*' * 20}")

            if len(results) == 0:
                print("No Result For This Model")
                continue

            df = self.build_chunk_dataframe(results)
            display(df)

    def get_token_results(self):
        """
        Generate a DataFrame showing token-level predictions for all models.
        """

        # Extract base token data
        token_data = self.light_result["token"]
        df = pd.DataFrame({
            'sentence': [t.metadata["sentence"] for t in token_data],
            'begin': [t.begin for t in token_data],
            'end': [t.end for t in token_data],
            'token': [t.result for t in token_data]
        })

        # Add neural NER model predictions
        for key, model_output in self.light_result.items():
            if key in ["sentence", "token"] or "_chunks" in key or "ner" not in key:
                continue

            labels = [entry.result for entry in model_output]
            df[key] = labels

        # Add rule-based chunk predictions (like age_parser_chunks)
        for key, chunk_annotations in self.light_result.items():
            if key in ["sentence", "token"] or "ner" in key:
                continue

            chunk_df = self.build_chunk_dataframe(chunk_annotations)

            # Build an index-efficient token -> label mapping
            labels = []
            for _, token_row in df.iterrows():
                matched_entity = "O"
                for _, chunk_row in chunk_df.iterrows():
                    if (chunk_row.begin <= token_row.begin < chunk_row.end) or (token_row.begin <= chunk_row.begin < token_row.end):
                        matched_entity = chunk_row.entity
                        break
                labels.append(matched_entity)

            column_name = "_".join(key.split("_")[:-1])
            df[column_name] = labels

        return df


    def show_results_as_tuples(self):
        """
        Print the NER token predictions as (token, label) tuples for each model.
        """
        tokens = [j.result for j in self.light_result["token"]]

        for key in self.light_result.keys():
            if key == 'sentence' or "_chunks" in key:
                continue

            results = self.light_result[key]

            print(f"\n{'*'*20} {key} Model Results {'*'*20}")

            if len(results) == 0:
                print("No Result For This Model")
                continue

            labels = [r.result for r in results]
            paired = list(zip(tokens, labels))
            print(paired)

    def plot_entity_counts(self):
        """
        Plot a horizontal bar chart of entity counts (excluding "O") per model using Plotly.
        """
        # Get token-level result DataFrame
        result_df = self.get_token_results()

        # Extract only label columns (excluding token info)
        label_result_df = result_df[result_df.columns[4:]].copy()

        # Count how many non-'O' labels exist for each model
        label_count_list = [label_result_df[label_result_df[col] != "O"].shape[0] for col in label_result_df.columns]

        # Create bar plot
        fig = go.Figure(go.Bar(
            x=label_count_list,
            y=label_result_df.columns,
            orientation='h'
        ))

        fig.update_layout(
            autosize=False,
            width=1500,
            height=1500,
            margin=dict(l=50, r=50, b=100, t=100, pad=4),
            paper_bgcolor="LightSteelBlue",
            title={'text': "Counts of Labelled Entities", 'y':0.98, 'x':0.5, 'xanchor': 'center', 'yanchor': 'top'},
            titlefont=dict(size=30),
            yaxis=dict(title_text="Clinical NER Models", titlefont=dict(size=30)),
        )

        fig.show()

In [8]:
# Instantiate the class with your model output
viewer = NERResultViewer(result[0])  # Use light_result if not in a list

### Chunk Based Predictions

In [9]:
# Show chunk-level results
viewer.show_chunk_results()


******************** ner_deid_aipii Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,17,Hendrickson,NAME,0.9032
1,0,38,47,2093-01-13,SSN,0.8288
2,1,73,82,John Green,STREET,0.5083
3,1,89,98,1231511863,IDNUM,0.9761
4,1,104,117,203.120.223.13,SSN,0.8333
5,4,295,308,(302) 786-5227,PHONE,0.87525004
6,4,311,327,0295 Keats Street,STREET,0.6360667
7,4,330,342,San Francisco,CITY,0.72644997



******************** state_matcher Model Results ********************
No Result For This Model

******************** ssn_parser Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,3,246,256,333-44-6666,SSN,0.73



******************** phone_parser Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,4,295,308,(302) 786-5227,PHONE,0.73



******************** zip_matcher Model Results ********************
No Result For This Model

******************** ip_matcher Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,104,117,203.120.223.13,IP,



******************** ner_deid_large_langtest Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.9527667
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,1.0
3,0,61,66,719435,CONTACT,0.8959
4,1,73,82,John Green,NAME,0.94799995
5,1,89,98,1231511863,ID,1.0
6,2,165,176,Day Hospital,LOCATION,0.7597
7,2,196,203,01/13/93,DATE,1.0
8,3,222,238,1HGBH41JXMN109286,ID,0.9999
9,3,276,286,no:A334455B,ID,0.9955



******************** ner_deid_enriched_langtest Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.7543333
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.9952
3,0,61,66,719435,PHONE,0.3277
4,1,73,82,John Green,DOCTOR,0.9975
5,1,89,98,1231511863,IDNUM,0.9745
6,2,196,203,01/13/93,DATE,0.9997
7,3,222,238,1HGBH41JXMN109286,IDNUM,0.9491
8,4,295,308,(302) 786-5227,PHONE,0.73819995
9,4,311,327,0295 Keats Street,STREET,0.7242333



******************** medical_record_parser Model Results ********************
No Result For This Model

******************** ner_deid_subentity_augmented_langtest Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.77283335
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.9738
3,0,61,66,719435,PHONE,0.5989
4,1,73,82,John Green,DOCTOR,0.74465
5,1,89,98,1231511863,IDNUM,0.9221
6,1,104,117,203.120.223.13,PHONE,0.6832
7,2,128,138,60-year-old,AGE,1.0
8,2,165,176,Day Hospital,HOSPITAL,0.97255003
9,2,196,203,01/13/93,DATE,1.0



******************** ner_deid_sd_large Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.9457333
1,0,38,47,2093-01-13,DATE,0.998
2,0,55,56,25,AGE,0.9671
3,0,61,66,719435,ID,0.6239
4,1,73,82,John Green,NAME,0.81405
5,1,89,98,1231511863,ID,0.8346
6,2,165,176,Day Hospital,LOCATION,0.9629
7,2,196,203,01/13/93,DATE,0.9986
8,4,301,308,786-5227,CONTACT,0.9966
9,4,311,327,0295 Keats Street,LOCATION,0.60546666



******************** ner_deid_subentity_augmented Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.9226
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.9417
3,0,61,66,719435,PHONE,0.9616
4,1,73,82,John Green,DOCTOR,0.83624995
5,1,89,98,1231511863,DEVICE,0.7532
6,2,128,138,60-year-old,AGE,0.9993
7,2,165,176,Day Hospital,HOSPITAL,0.96959996
8,2,196,203,01/13/93,DATE,1.0
9,3,276,286,no:A334455B,IDNUM,0.9414



******************** date_of_death_parser Model Results ********************
No Result For This Model

******************** account_parser Model Results ********************
No Result For This Model

******************** ner_deid_generic_augmented_langtest Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.9737666
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.9968
3,0,61,66,719435,CONTACT,0.9988
4,1,73,82,John Green,NAME,0.9802
5,1,89,98,1231511863,ID,0.9989
6,2,128,138,60-year-old,AGE,0.9998
7,2,196,203,01/13/93,DATE,1.0
8,4,295,308,(302) 786-5227,CONTACT,0.944
9,4,311,327,0295 Keats Street,LOCATION,0.9934667



******************** ner_deid_augmented Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.94523335
1,0,38,47,2093-01-13,DATE,0.9999
2,0,55,56,25,AGE,0.9699
3,0,61,66,719435,CONTACT,0.7206
4,1,73,82,John Green,NAME,0.90709996
5,1,89,98,1231511863,ID,0.9623
6,1,101,117,IP 203.120.223.13,CONTACT,0.54104996
7,2,165,176,Day Hospital,LOCATION,0.98440003
8,2,196,203,01/13/93,DATE,1.0
9,3,222,238,1HGBH41JXMN109286,ID,0.9942



******************** ner_deid_subentity_augmented_i2b2 Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.99090004
1,0,38,47,2093-01-13,DATE,0.9981
2,0,55,56,25,AGE,0.9278
3,0,61,66,719435,ZIP,0.5649
4,1,73,82,John Green,DOCTOR,0.99915
5,1,87,98,: 1231511863,IDNUM,0.80425
6,1,104,117,203.120.223.13,PHONE,0.4842
7,2,196,203,01/13/93,DATE,0.9988
8,3,222,238,1HGBH41JXMN109286,IDNUM,0.6205
9,3,245,256,#333-44-6666,PHONE,0.6083



******************** plate_parser Model Results ********************
No Result For This Model

******************** url_matcher Model Results ********************
No Result For This Model

******************** ner_deid_subentity_augmented_v2 Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.8605666
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.9903
3,0,61,66,719435,PHONE,0.3717
4,1,73,82,John Green,DOCTOR,0.9954
5,1,89,98,1231511863,IDNUM,0.6217
6,1,104,117,203.120.223.13,USERNAME,0.8611
7,2,128,138,60-year-old,AGE,1.0
8,2,165,176,Day Hospital,HOSPITAL,0.9443
9,2,196,203,01/13/93,DATE,1.0



******************** ner_deid_sd Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.9279334
1,0,38,47,2093-01-13,DATE,0.8631
2,0,55,56,25,AGE,0.8409
3,0,61,66,719435,CONTACT,0.671
4,1,73,82,John Green,NAME,0.97230005
5,1,89,98,1231511863,ID,0.9977
6,1,104,117,203.120.223.13,CONTACT,0.5962
7,2,196,203,01/13/93,DATE,0.991
8,4,295,308,(302) 786-5227,CONTACT,0.92095006
9,4,311,327,0295 Keats Street,LOCATION,0.8502



******************** ner_deid_generic_docwise Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.9415333
1,0,38,47,2093-01-13,DATE,0.9999
2,0,55,56,25,AGE,0.9077
3,0,61,66,719435,CONTACT,0.9559
4,1,73,82,John Green,NAME,0.9592
5,1,89,98,1231511863,ID,0.9989
6,1,101,102,IP,NAME,0.8167
7,1,104,117,203.120.223.13,DATE,0.9628
8,2,128,138,60-year-old,AGE,0.9999
9,2,196,203,01/13/93,DATE,0.9999



******************** ner_deid_large Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.92716664
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.996
3,0,61,66,719435,CONTACT,0.969
4,1,73,82,John Green,NAME,0.80565
5,1,89,98,1231511863,ID,1.0
6,2,165,176,Day Hospital,LOCATION,0.7364
7,2,196,203,01/13/93,DATE,1.0
8,3,222,238,1HGBH41JXMN109286,ID,1.0
9,4,295,308,(302) 786-5227,CONTACT,0.98990005



******************** ner_deid_generic_augmented Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.8842667
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.9924
3,0,61,66,719435,CONTACT,0.9554
4,1,73,82,John Green,NAME,0.92685
5,1,89,98,1231511863,ID,0.9987
6,2,128,138,60-year-old,AGE,0.9991
7,2,165,176,Day Hospital,LOCATION,0.90195
8,2,196,203,01/13/93,DATE,1.0
9,3,222,238,1HGBH41JXMN109286,ID,0.9965



******************** email_matcher Model Results ********************
No Result For This Model

******************** ner_deidentify_dl Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.8725333
1,0,38,47,2093-01-13,DATE,0.9989
2,0,55,56,25,AGE,0.9934
3,0,61,66,719435,PHONE,0.9655
4,1,73,82,John Green,DOCTOR,0.9597
5,1,89,98,1231511863,IDNUM,0.9941
6,2,165,176,Day Hospital,HOSPITAL,0.96720004
7,2,196,203,01/13/93,DATE,0.9949
8,3,276,286,no:A334455B,IDNUM,0.9131
9,4,295,308,(302) 786-5227,PHONE,0.71275



******************** ner_deid_enriched Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.86520004
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.9966
3,0,61,66,719435,PHONE,0.7897
4,1,73,82,John Green,DOCTOR,0.89825
5,2,196,203,01/13/93,DATE,1.0
6,4,295,308,(302) 786-5227,PHONE,0.99074996
7,4,311,327,0295 Keats Street,STREET,0.85420007
8,4,330,342,San Francisco,CITY,0.59475



******************** ner_deid_generic_augmented_allUpperCased_langtest Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.88206667
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.886
3,0,61,66,719435,ID,0.9983
4,1,73,82,John Green,NAME,0.879
5,1,89,98,1231511863,ID,0.9985
6,2,128,138,60-year-old,AGE,1.0
7,2,196,203,01/13/93,DATE,1.0
8,3,222,243,"1HGBH41JXMN109286, SSN",NAME,0.89533335
9,3,245,256,#333-44-6666,ID,0.8229



******************** ner_deid_subentity_docwise Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.60826665
1,0,38,47,2093-01-13,DATE,0.9998
2,0,55,56,25,AGE,0.666
3,0,61,66,719435,DEVICE,0.544
4,1,73,82,John Green,DOCTOR,0.9247
5,1,89,98,1231511863,IDNUM,0.7804
6,1,104,117,203.120.223.13,DATE,0.9956
7,2,128,138,60-year-old,AGE,0.9684
8,2,196,203,01/13/93,DATE,0.9999
9,3,222,238,1HGBH41JXMN109286,IDNUM,0.6673



******************** zip_parser Model Results ********************
No Result For This Model

******************** license_parser Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,3,245,256,#333-44-6666,LICENSE,0.55
1,3,276,286,no:A334455B,LICENSE,0.64



******************** phone_matcher Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,4,295,308,(302) 786-5227,PHONE,



******************** dln_parser Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,3,276,286,no:A334455B,DLN,0.62



******************** zeroshot_ner_deid_subentity_merged_medium Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.9999984
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.99999976
3,0,61,66,719435,IDNUM,0.5574845
4,1,73,82,John Green,DOCTOR,0.99999905
5,1,89,98,1231511863,IDNUM,0.95967746
6,2,128,138,60-year-old,AGE,1.0
7,2,165,176,Day Hospital,HOSPITAL,0.9999933
8,2,196,203,01/13/93,DATE,0.9999988
9,3,222,238,1HGBH41JXMN109286,IDNUM,0.9965166



******************** vin_parser Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,3,222,238,1HGBH41JXMN109286,VIN,0.72



******************** age_parser Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,55,56,25,AGE,0.71
1,0,61,63,719,AGE,0.63
2,0,64,66,435,AGE,0.59
3,2,128,129,60,AGE,0.5



******************** ner_deid_synthetic Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",NAME,0.8466
1,0,38,47,2093-01-13,DATE,0.9999
2,0,55,56,25,AGE,0.9548
3,0,61,66,719435,ID,0.9975
4,1,73,82,John Green,NAME,0.94869995
5,1,89,98,1231511863,ID,1.0
6,2,165,176,Day Hospital,LOCATION,0.78139997
7,2,196,203,01/13/93,DATE,1.0
8,3,222,238,1HGBH41JXMN109286,ID,1.0
9,4,295,298,(302,CONTACT,0.99670005



******************** country_matcher Model Results ********************
No Result For This Model

******************** date_matcher Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,38,47,2093-01-13,DATE,
1,2,196,203,01/13/93,DATE,
2,3,247,256,33-44-6666,DATE,



******************** ner_deid_subentity_augmented_docwise Model Results ********************


Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,7,22,"Hendrickson, Ora",PATIENT,0.9607666
1,0,38,47,2093-01-13,DATE,1.0
2,0,55,56,25,AGE,0.9904
3,0,61,66,719435,PHONE,0.9801
4,1,73,82,John Green,DOCTOR,0.99845004
5,1,89,98,1231511863,IDNUM,0.6383
6,1,104,117,203.120.223.13,PHONE,0.7589
7,2,128,138,60-year-old,AGE,1.0
8,2,165,176,Day Hospital,HOSPITAL,0.86335
9,2,196,203,01/13/93,DATE,1.0



******************** date_of_birth_parser Model Results ********************
No Result For This Model


### Specific Model Results

You can get the specific model results by using `build_chunk_dataframe` method.

In [10]:
result[0]["ip_matcher_chunks"]

[Annotation(chunk, 104, 117, 203.120.223.13, {'entity': 'IP', 'ner_source': 'ip_matcher_chunks', 'chunk': '0', 'sentence': '0'}, [])]

In [11]:
viewer.build_chunk_dataframe(result[0]["ip_matcher_chunks"])

Unnamed: 0,sentence,begin,end,chunks,entity,confidence
0,0,104,117,203.120.223.13,IP,


### Token Based Predictions

In [12]:
# Get token-level DataFrame
df = viewer.get_token_results()
df

Unnamed: 0,sentence,begin,end,token,ner_deid_augmented,ner_deid_subentity_augmented_i2b2,ner_deidentify_dl,ner_deid_subentity_augmented_langtest,ner_deid_synthetic,ner_deid_aipii,...,email_matcher,zip_parser,license_parser,phone_matcher,dln_parser,vin_parser,age_parser,country_matcher,date_matcher,date_of_birth_parser
0,0,0,3,Name,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
1,0,5,5,:,O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
2,0,7,17,Hendrickson,B-NAME,B-PATIENT,B-PATIENT,B-PATIENT,B-NAME,B-NAME,...,O,O,O,O,O,O,O,O,O,O
3,0,18,18,",",I-NAME,I-PATIENT,I-PATIENT,I-PATIENT,I-NAME,O,...,O,O,O,O,O,O,O,O,O,O
4,0,20,22,Ora,I-NAME,I-PATIENT,I-PATIENT,I-PATIENT,I-NAME,O,...,O,O,O,O,O,O,O,O,O,O
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
66,4,322,327,Street,I-LOCATION,I-STREET,I-STREET,I-STREET,I-LOCATION,I-STREET,...,O,O,O,O,O,O,O,O,O,O
67,4,328,328,",",O,O,O,O,O,O,...,O,O,O,O,O,O,O,O,O,O
68,4,330,332,San,B-LOCATION,B-CITY,B-CITY,B-CITY,B-LOCATION,B-CITY,...,O,O,O,O,O,O,O,O,O,O
69,4,334,342,Francisco,I-LOCATION,I-CITY,I-CITY,I-CITY,I-LOCATION,I-CITY,...,O,O,O,O,O,O,O,O,O,O


### Review the Results In the Form of Token-Label Tuples

In [13]:
# Show (token, label) tuples
viewer.show_results_as_tuples()


******************** ner_deid_augmented Model Results ********************
[('Name', 'O'), (':', 'O'), ('Hendrickson', 'B-NAME'), (',', 'I-NAME'), ('Ora', 'I-NAME'), (',', 'O'), ('Record', 'O'), ('date', 'O'), (':', 'O'), ('2093-01-13', 'B-DATE'), (',', 'O'), ('Age', 'O'), (':', 'O'), ('25', 'B-AGE'), (',', 'O'), ('#', 'O'), ('719435', 'B-CONTACT'), ('.', 'O'), ('Dr', 'O'), ('.', 'O'), ('John', 'B-NAME'), ('Green', 'I-NAME'), (',', 'O'), ('ID', 'O'), (':', 'O'), ('1231511863', 'B-ID'), (',', 'O'), ('IP', 'B-CONTACT'), ('203.120.223.13', 'I-CONTACT'), ('.', 'O'), ('He', 'O'), ('is', 'O'), ('a', 'O'), ('60-year-old', 'O'), ('male', 'O'), ('was', 'O'), ('admitted', 'O'), ('to', 'O'), ('the', 'O'), ('Day', 'B-LOCATION'), ('Hospital', 'I-LOCATION'), ('for', 'O'), ('cystectomy', 'O'), ('on', 'O'), ('01/13/93', 'B-DATE'), ('.', 'O'), ("Patient's", 'O'), ('VIN', 'O'), (':', 'O'), ('1HGBH41JXMN109286', 'B-ID'), (',', 'O'), ('SSN', 'O'), ('#333-44-6666', 'O'), (',', 'O'), ("Driver's", 'O'), ('l

### Comparison of Model Prediction Frequencies

In this bar chart, you can see how many PHI **tokens** were captured by each model.

In [14]:
# Plot PHI TOKEN counts
viewer.plot_entity_counts()