## Evaluate a custom Presidio Analyzer using the Presidio Evaluator framework

Steps:
1. Load dataset from file
2. Simple dataset statistics
3. Define the AnalyzerEngine object (and its parameters)
4. Align the dataset's entities to Presidio's entities
5. Set up the Evaluator object and run the experiment
6. Evaluate results
7. Error analysis

Post Experiment:
- To include the experiment results in further analysis, copy the generated experiment file into `/results<dataset_name>` and run the notebook `4_Compare_models.ipynb`.

In [1]:
# install presidio evaluator via pip if not yet installed

#!pip install presidio-evaluator
#!pip install "presidio-analyzer[transformers]"

In [2]:
from pathlib import Path
from pprint import pprint
from collections import Counter
from typing import Dict, List
import json
import warnings
warnings.filterwarnings('ignore')

from presidio_evaluator import InputSample
from presidio_evaluator.evaluation import Evaluator, ModelError, Plotter
from presidio_evaluator.models import PresidioAnalyzerWrapper
from presidio_evaluator.experiment_tracking import get_experiment_tracker
from presidio_analyzer import AnalyzerEngine, AnalyzerEngineProvider, RecognizerResult

import pandas as pd

# Needed to import the local config files
import sys
import os
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(project_root)
from config.flair_recognizer import FlairRecognizer
from config.config import create_full_config, NLP_ENGINES_CONFIG_FILES

from tqdm import tqdm
import time

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", None)

%reload_ext autoreload
%autoreload 2
%matplotlib inline

Use CUDA if available. Note that this requires initializing the NLP engine separately and moving the model to the selected device.

In [3]:
# import torch
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# print("Device:", device)

## 1. Load dataset from file

In [None]:
# Use the German NER model and pick your dataset
dataset_name = "generated_size_1500_date_April_09_2025.json"
dataset = InputSample.read_dataset_json(
    Path(Path.cwd(), "data", dataset_name),
    token_model_version="de_core_news_lg"
)

print(len(dataset))

tokenizing input:   0%|          | 0/1500 [00:00<?, ?it/s]

loading model de_core_news_lg


tokenizing input: 100%|██████████| 1500/1500 [00:32<00:00, 46.82it/s]

1500





This dataset was auto-generated from German sentence templates.

In [5]:
def get_entity_counts(dataset: List[InputSample]) -> Counter:
    """Return a Counter with the number of tokens per entity tag."""
    entity_counter = Counter()
    for sample in dataset:
        for tag in sample.tags:
            entity_counter[tag] += 1
    return entity_counter


## 2. Simple dataset statistics

In [6]:
entity_counts = get_entity_counts(dataset)
print("Count per entity:")
pprint(entity_counts.most_common(), compact=True)

token_counts = [len(sample.tokens) for sample in dataset]
text_lengths = [len(sample.full_text) for sample in dataset]

print("\nMin and max number of tokens in dataset: "
      f"Min: {min(token_counts)}, Max: {max(token_counts)}")
print("Min and max sentence length in dataset: "
      f"Min: {min(text_lengths)}, Max: {max(text_lengths)}")

Count per entity:
[('O', 10309), ('STREET_ADDRESS', 3226), ('PHONE_NUMBER', 1109),
 ('AUT_LICENSE_PLATE', 936), ('PERSON', 814), ('EMAIL_ADDRESS', 287)]

Min and max number of tokens in dataset: Min: 3, Max: 28
Min and max sentence length in dataset: Min: 20, Max: 142
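
The counts above are built from `sample.tags`, which appears to hold one tag per token, so they are token counts rather than entity counts. As a small illustration, the sketch below recomputes the share of PII tokens from the printed numbers. The variable names are chosen here for illustration and are not part of the Presidio Evaluator API.

```python
from collections import Counter

# Token counts per tag, copied from the output above
entity_counts = Counter({
    "O": 10309,
    "STREET_ADDRESS": 3226,
    "PHONE_NUMBER": 1109,
    "AUT_LICENSE_PLATE": 936,
    "PERSON": 814,
    "EMAIL_ADDRESS": 287,
})

total_tokens = sum(entity_counts.values())
pii_tokens = total_tokens - entity_counts["O"]

print(f"Total tokens: {total_tokens}")
print(f"PII tokens: {pii_tokens} ({pii_tokens / total_tokens:.1%})")

# Share of each entity among the PII tokens only
for entity, count in entity_counts.most_common():
    if entity != "O":
        print(f"{entity}: {count / pii_tokens:.1%}")
```

`STREET_ADDRESS` accounts for about half of all PII tokens, so it weighs heavily in any token-level metric computed on this dataset.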


In [7]:
print("A few example sentences containing each entity:\n")
for entity in entity_counts.keys():
    samples = [sample for sample in dataset if entity in set(sample.tags)]
    if len(samples) > 1 and entity != "O":
        print(f"Entity: <{entity}> two example sentences:\n"
              f"\n1) {samples[0].full_text}"
              f"\n2) {samples[1].full_text}"
              f"\n------------------------------------\n")

A few example sentences containing each entity:

Entity: <AUT_LICENSE_PLATE> two example sentences:

1) Ich fahre ein Auto mit dem Kennzeichen K 56699 AO.
2) Kennzeichen: T 87813 NV.
------------------------------------

Entity: <EMAIL_ADDRESS> two example sentences:

1) Die E-Mail-Adresse gemäß Quelle: Kontaktieren Sie uns via andrey37@example.org.
2) Schreiben Sie an ahenk@example.com.
------------------------------------

Entity: <STREET_ADDRESS> two example sentences:

1) Standort laut Dokument: Bitte kommen Sie zu 929 Linke Spurs Suite 061
Achimmouth
, SD
 43441.
2) Standort laut Dokument: Ich wohne in Ludmila Junction Hornig Island
 Apt. 390
 Silkeshire
 Gibraltar 75807.
------------------------------------

Entity: <PERSON> two example sentences:

1) Kontaktperson: Christina Ziegert.
2) Laut Bericht: Kontaktperson: Irmela Herrmann MBA..
------------------------------------

Entity: <PHONE_NUMBER> two example sentences:

1) Hotline: +41 (0)43 859 59 41.
2) Telefonnummer laut Kon

## 3. Define the AnalyzerEngine object 
In this case, we load the YAML configuration to create our AnalyzerEngine

### 3.1 Set up the NlpEngine
The NLP engine handles text processing with spaCy and named entity recognition with a transformers model.

In [8]:
# Omitted: the NLP engine is configured in the YAML config file

### 3.2 Set up the relevant recognizers
Add and remove recognizers to fit the dataset at hand.
This means adding simple title and zip code recognizers, adding a deny list for terms that aren't considered PII but are labeled as such,
and removing all recognizers that don't map to entities in our dataset.

In [9]:
# Omitted: the recognizers are configured in the YAML config file

### 3.3 Configure the context mechanism
Configure the `LemmaContextAwareEnhancer`, which uses surrounding words to increase confidence in a detection.

In [10]:
from presidio_analyzer.context_aware_enhancers import LemmaContextAwareEnhancer

# Set up the context aware enhancer
context_enhancer = LemmaContextAwareEnhancer(context_prefix_count=10, 
                                             context_suffix_count=10)
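
The effect of the enhancer can be sketched with plain arithmetic. The snippet below is a simplified illustration, not Presidio's implementation. It assumes that a result with a supportive context word nearby gets `context_similarity_factor` added to its score, is raised to at least `min_score_with_context_similarity`, and is capped at 1.0. The defaults 0.35 and 0.4 are the values printed for the enhancer later in this notebook. The function name `boost_score` is hypothetical.

```python
def boost_score(
    score: float,
    has_context_word: bool,
    context_similarity_factor: float = 0.35,
    min_score_with_context_similarity: float = 0.4,
) -> float:
    """Return the score after a simplified context-based boost."""
    if not has_context_word:
        return score
    boosted = score + context_similarity_factor
    boosted = max(boosted, min_score_with_context_similarity)
    return min(boosted, 1.0)  # scores are capped at 1.0


# A pattern match with base score 0.4 and a supportive context word nearby
print(boost_score(0.4, has_context_word=True))
# Without a context word the score is unchanged
print(boost_score(0.4, has_context_word=False))
```

With a base score of 0.4 the boosted score is 0.75, which agrees with the `original_score` and `score` values reported for the license plate recognizer in the analyzer test further down.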

### 3.4 Create the AnalyzerEngine object

For each model, create a separate engine based on its config file.

- For `transformers` models configured to ignore the labels `PERSON` and `ORG`, add a `FlairRecognizer` to cover them.
- For `transformers` models that do not ignore `PERSON` and `ORG`, adding a `FlairRecognizer` is optional.

In [11]:
def create_analyzer_engine(config_file: Path) -> AnalyzerEngine:
    analyzer_engine = AnalyzerEngineProvider(analyzer_engine_conf_file=config_file).create_engine()
    analyzer_engine.context_aware_enhancer = context_enhancer
    return analyzer_engine

In [None]:
analyzer_engines = {}
flair_recognizers = (
    FlairRecognizer(supported_language="en"),
    FlairRecognizer(supported_language="de"),
)

# Engines whose transformers model does not detect PERSON and ORG
engines_needing_flair = ("distillbert", "piiranha")

# Build one analyzer engine per NLP engine config and store it by name
for engine_name in NLP_ENGINES_CONFIG_FILES:
    analyzer_engine = create_analyzer_engine(create_full_config(engine_name))

    # Optional: add the Flair recognizers when the transformers NLP engine
    # does not detect PERSON and ORG
    if engine_name in engines_needing_flair:
        for recognizer in flair_recognizers:
            analyzer_engine.registry.add_recognizer(recognizer)
    analyzer_engines[engine_name] = analyzer_engine

2025-04-14 17:12:51,630 SequenceTagger predicts: Dictionary with 20 tags: <unk>, O, S-ORG, S-MISC, B-PER, E-PER, S-LOC, B-ORG, E-ORG, I-PER, S-PER, B-MISC, I-MISC, E-MISC, I-ORG, B-LOC, E-LOC, I-LOC, <START>, <STOP>
2025-04-14 17:13:02,418 SequenceTagger predicts: Dictionary with 19 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-MISC, B-MISC, E-MISC, I-MISC, <START>, <STOP>


Device set to use cpu
Device set to use cpu
Device set to use cpu
Device set to use cpu
Device set to use cpu
Device set to use cpu


Optionally, remove recognizers that should not be evaluated.

In [13]:
#rec = "AUTLicensePlateRecognizer"
#analyzer_engine.registry.remove_recognizer(rec)

In [14]:
# Print the configs for the different engines
for analyzer_engine_name, analyzer_engine in analyzer_engines.items():
    pprint(f"Analyzer engine: {analyzer_engine_name}\n")
    pprint(f"Supported entities for German:")
    pprint(analyzer_engine.get_supported_entities("de"), compact=True)

    print(f"\nLoaded recognizers for German:")
    pprint([rec.name for rec in analyzer_engine.registry.get_recognizers("de", all_fields=True)], compact=True)

    print(f"\nLoaded Context Aware Enhancer:")
    print(analyzer_engine.context_aware_enhancer.__class__.__name__)
    pprint(json.dumps(analyzer_engine.context_aware_enhancer.__dict__), compact=True)

    print(f"\nLoaded NER models:")
    pprint(analyzer_engine.nlp_engine.models)

'Analyzer engine: spacy\n'
'Supported entities for German:'
['PERSON', 'LOCATION', 'EMAIL_ADDRESS', 'PHONE_NUMBER', 'ORGANIZATION',
 'AUT_LICENSE_PLATE']

Loaded recognizers for German:
['EmailRecognizer', 'PhoneRecognizer', 'AUTLicensePlateRecognizer',
 'SpacyRecognizer']

Loaded Context Aware Enhancer:
LemmaContextAwareEnhancer
('{"context_similarity_factor": 0.35, "min_score_with_context_similarity": '
 '0.4, "context_prefix_count": 10, "context_suffix_count": 10}')

Loaded NER models:
[{'lang_code': 'en', 'model_name': 'en_core_web_md'},
 {'lang_code': 'de', 'model_name': 'de_core_news_md'},
 {'lang_code': 'it', 'model_name': 'it_core_news_md'}]
'Analyzer engine: piiranha\n'
'Supported entities for German:'
['PERSON', 'LOCATION', 'EMAIL_ADDRESS', 'PHONE_NUMBER', 'EMAIL',
 'AUT_LICENSE_PLATE']

Loaded recognizers for German:
['EmailRecognizer', 'PhoneRecognizer', 'AUTLicensePlateRecognizer',
 'TransformersRecognizer', 'Flair Analytics']

Loaded Context Aware Enhancer:
LemmaContextAw

In [15]:
# Test Analyzer
for analyzer_engine_name, analyzer_engine in analyzer_engines.items():
    pprint(f"Analyzer engine: {analyzer_engine_name}")
    text="Das in der Akte vermerkte Kennzeichen ist: Nummernschild: S 74862 ZU"
    res = analyzer_engine.analyze(text=text, 
                                language="de", 
                                return_decision_process=True)
    for result in res:
        print(f"\nEntity: {result.entity_type}, Text: {text[result.start:result.end]}\n\nAnalysis explanation:")
        pprint(result.analysis_explanation)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


'Analyzer engine: spacy'

Entity: AUT_LICENSE_PLATE, Text: S 74862 ZU

Analysis explanation:
{'recognizer': 'AUTLicensePlateRecognizer', 'pattern_name': 'Austrian license plate', 'pattern': '\\b[A-Z]{1,2}[-\\s]?[0-9]{1,5}[-\\s]?[A-Z]{0,2}\\b', 'original_score': 0.4, 'score': 0.75, 'textual_explanation': 'Detected by `AUTLicensePlateRecognizer` using pattern `Austrian license plate`', 'score_context_improvement': 0.35, 'supportive_context_word': 'kennzeichen', 'validation_result': None, 'regex_flags': 26}
'Analyzer engine: piiranha'

Entity: AUT_LICENSE_PLATE, Text: S 74862 ZU

Analysis explanation:
{'recognizer': 'AUTLicensePlateRecognizer', 'pattern_name': 'Austrian license plate', 'pattern': '\\b[A-Z]{1,2}[-\\s]?[0-9]{1,5}[-\\s]?[A-Z]{0,2}\\b', 'original_score': 0.4, 'score': 0.75, 'textual_explanation': 'Detected by `AUTLicensePlateRecognizer` using pattern `Austrian license plate`', 'score_context_improvement': 0.35, 'supportive_context_word': 'kennzeichen', 'validation_result': No

## 4. Align the dataset's entities to Presidio's entities

The entity names used in the dataset may differ from the entity names Presidio returns.
For example, a dataset might label a name as PER while Presidio returns PERSON. To compare predictions with annotations and gather metrics, the entity names must be aligned. Consider changing the mapping if your dataset and/or Presidio instance supports different entity types.
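
The idea behind the alignment can be sketched on a flat list of tags. This is a simplified illustration that does not use `InputSample` objects, and `align_tags` is a hypothetical helper. How `Evaluator.align_entity_types` actually handles tags without a mapping is an assumption here: the sketch keeps them unchanged when `allow_missing_mappings` is true and raises an error otherwise.

```python
from typing import Dict, List


def align_tags(
    tags: List[str],
    entities_mapping: Dict[str, str],
    allow_missing_mappings: bool = True,
) -> List[str]:
    """Translate dataset tags to the target entity names."""
    aligned = []
    for tag in tags:
        if tag in entities_mapping:
            aligned.append(entities_mapping[tag])
        elif allow_missing_mappings:
            # Assumption: tags without a mapping are kept unchanged
            aligned.append(tag)
        else:
            raise ValueError(f"No mapping found for tag {tag!r}")
    return aligned


mapping = {"PER": "PERSON", "LOC": "LOCATION", "O": "O"}
aligned_tags = align_tags(["O", "PER", "PER", "LOC", "AUT_LICENSE_PLATE"], mapping)
print(aligned_tags)
```

In this toy run `PER` and `LOC` are renamed, while `AUT_LICENSE_PLATE` passes through unchanged because it has no entry in the toy mapping.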

In [16]:
# Copy the default mapping so the class-level default is not modified in place
entities_mapping = PresidioAnalyzerWrapper.presidio_entities_map.copy()

# Include the license plate mapping
entities_mapping.update({
    "AUT_LICENSE_PLATE": "AUT_LICENSE_PLATE"  # Map to Presidio Entity Type
})

print("Entities mapping:")
pprint(entities_mapping)

dataset = Evaluator.align_entity_types(
    dataset, 
    entities_mapping=entities_mapping, 
    allow_missing_mappings=True
)
new_entity_counts = get_entity_counts(dataset)
print("\nCount per entity after alignment:")
pprint(new_entity_counts.most_common(), compact=True)

dataset_entities = list(new_entity_counts.keys())

Entities mapping:
{'ADDRESS': 'LOCATION',
 'AGE': 'AGE',
 'AUT_LICENSE_PLATE': 'AUT_LICENSE_PLATE',
 'BIRTHDAY': 'DATE_TIME',
 'CITY': 'LOCATION',
 'CREDIT_CARD': 'CREDIT_CARD',
 'CREDIT_CARD_NUMBER': 'CREDIT_CARD',
 'DATE': 'DATE_TIME',
 'DATE_OF_BIRTH': 'DATE_TIME',
 'DATE_TIME': 'DATE_TIME',
 'DOB': 'DATE_TIME',
 'DOMAIN': 'URL',
 'DOMAIN_NAME': 'URL',
 'EMAIL': 'EMAIL_ADDRESS',
 'EMAIL_ADDRESS': 'EMAIL_ADDRESS',
 'FACILITY': 'LOCATION',
 'FIRST_NAME': 'PERSON',
 'GPE': 'LOCATION',
 'HCW': 'PERSON',
 'HOSP': 'ORGANIZATION',
 'HOSPITAL': 'ORGANIZATION',
 'IBAN': 'IBAN_CODE',
 'IBAN_CODE': 'IBAN_CODE',
 'ID': 'ID',
 'IP_ADDRESS': 'IP_ADDRESS',
 'LAST_NAME': 'PERSON',
 'LOC': 'LOCATION',
 'LOCATION': 'LOCATION',
 'NAME': 'PERSON',
 'NATIONALITY': 'NRP',
 'NORP': 'NRP',
 'NRP': 'NRP',
 'O': 'O',
 'ORG': 'ORGANIZATION',
 'ORGANIZATION': 'ORGANIZATION',
 'PATIENT': 'PERSON',
 'PATORG': 'ORGANIZATION',
 'PER': 'PERSON',
 'PERSON': 'PERSON',
 'PHONE': 'PHONE_NUMBER',
 'PHONE_NUMBER': 'PHONE

## 5. Set up the Evaluator object and run the experiment

In [None]:
total_models = len(analyzer_engines)
# Store for later reference
all_results = {}
all_evaluators = {}

# Iterate over engines with tqdm for progress
for idx, (analyzer_engine_name, analyzer_engine) in enumerate(tqdm(analyzer_engines.items(), desc="🔍 Evaluating models")):
    pprint(f"## 🔬 Evaluation {idx + 1}/{total_models} — **{analyzer_engine_name}**")
    
    # Start timing
    start_time = time.time()
    
    # Set up the experiment tracker to log the experiments for reproducibility
    experiment = get_experiment_tracker()

    # Create the evaluator object
    evaluator = Evaluator(model=analyzer_engine)

    # Track model and dataset params
    params = {"dataset_name": dataset_name, "model_name": analyzer_engine_name}
    params.update(evaluator.model.to_log())
    experiment.log_parameters(params)
    experiment.log_dataset_hash(dataset)
    experiment.log_parameter("entity_mappings", json.dumps(entities_mapping))
    
    # Evaluate (add tqdm inside if dataset is iterable)
    if hasattr(dataset, "__iter__"):
        evaluation_results = evaluator.evaluate_all(tqdm(dataset, desc=f"⚙️ Evaluating {analyzer_engine_name}"))
    else:
        evaluation_results = evaluator.evaluate_all(dataset)
    
    # Calculate score
    results = evaluator.calculate_score(evaluation_results)

    # Track experiment results
    experiment.log_metrics(results.to_log())
    entities, confmatrix = results.to_confusion_matrix()
    experiment.log_confusion_matrix(matrix=confmatrix, labels=entities)

    # end experiment
    experiment.end()
    elapsed_time = time.time() - start_time
    
    # ⬇️ Store for later
    all_results[analyzer_engine_name] = results
    all_evaluators[analyzer_engine_name] = evaluator
    
    print(f"✅ Done with {analyzer_engine_name} in {elapsed_time:.2f} seconds\n")

## 6. Evaluate results

In [None]:
model_plotters = {}
for model_name, results in all_results.items():
    plotter = Plotter(
        results=results,
        model_name=model_name,
        beta=2
    )
    plotter.plot_scores()
    model_plotters[model_name] = plotter

In [None]:
for model_name, results in all_results.items():
    pprint({"Model": model_name, "PII F":results.pii_f, "PII recall": results.pii_recall, "PII precision": results.pii_precision})

{'Model': 'piiranha',
 'PII F': 0.8354142596250606,
 'PII precision': 0.9054736416885477,
 'PII recall': 0.8195612431444241}
{'Model': 'distillbert',
 'PII F': 0.8605826397146253,
 'PII precision': 0.9075235109717869,
 'PII recall': 0.8495964783565664}
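
The `Plotter` above is created with `beta=2`, and the reported `PII F` values are consistent with the standard F-beta formula at that setting, which weights recall more heavily than precision. The sketch below recomputes the score from the printed precision and recall. `f_beta` is a helper defined here for illustration, not a Presidio Evaluator function.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall copied from the output above
reported = {
    "piiranha": (0.9054736416885477, 0.8195612431444241),
    "distillbert": (0.9075235109717869, 0.8495964783565664),
}

for model_name, (precision, recall) in reported.items():
    print(f"{model_name}: F2 = {f_beta(precision, recall):.4f}")
```

This gives roughly 0.8354 for `piiranha` and 0.8606 for `distillbert`, in line with the `PII F` values above. Because recall counts four times as much as precision in the denominator, the score sits closer to recall than to precision.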


## 7. Error analysis

Now let's look into results to understand what's behind the metrics we're getting.
Note that evaluation is never perfect. Some things to consider:

1. There's often a mismatch between the annotated span and the predicted span, which isn't necessarily a mistake. For example: `<Southern France>` compared with `Southern <France>`. In the second text, the word `Southern` was not annotated/predicted as part of the entity, but that's not necessarily an error.
2. Token-based evaluation (which is used here) counts the number of true positive / false positive / false negative tokens. Some entities are split into more tokens than others. For example, the phone number `222-444-1234` could be split into five tokens, whereas `Krishna` is a single token, so phone numbers have more influence on the metrics than names.
3. The synthetic dataset used here isn't representative of a real dataset. Consider using more realistic datasets for evaluation.
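
The token-count effect from point 2 can be made concrete with a toy example. The sketch below uses a simplified token-level recall and an illustrative tokenization of the phone number. It is not the Presidio Evaluator's scoring code, and `token_recall` is a hypothetical helper.

```python
from typing import List, Tuple

# (token, annotated tag, predicted tag); the tokenization is illustrative
Token = Tuple[str, str, str]


def token_recall(tokens: List[Token]) -> float:
    """Share of annotated (non-O) tokens predicted with the same tag."""
    annotated = [t for t in tokens if t[1] != "O"]
    correct = [t for t in annotated if t[1] == t[2]]
    return len(correct) / len(annotated)


phone = ["222", "-", "444", "-", "1234"]

# Case 1: the name is found, the phone number is missed entirely
missed_phone = [("Krishna", "PERSON", "PERSON")] + [
    (tok, "PHONE_NUMBER", "O") for tok in phone
]
# Case 2: the phone number is found, the name is missed
missed_name = [("Krishna", "PERSON", "O")] + [
    (tok, "PHONE_NUMBER", "PHONE_NUMBER") for tok in phone
]

print(f"Missed phone number: recall = {token_recall(missed_phone):.2f}")
print(f"Missed name:         recall = {token_recall(missed_name):.2f}")
```

Both cases miss exactly one entity, yet recall is about 0.17 in the first case and about 0.83 in the second, because the phone number contributes five tokens and the name only one.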

In [None]:
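for model_name, plotter in model_plotters.items():
    # Use each model's own confusion matrix, not the one left over from the last loop iteration
    entities, confmatrix = all_results[model_name].to_confusion_matrix()
    plotter.plot_confusion_matrix(entities=entities, confmatrix=confmatrix)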

In [None]:
from IPython.display import Markdown, display

for model_name, plotter in model_plotters.items():
    display(Markdown(f"Model name: {model_name}"))
    plotter.plot_most_common_tokens()

### 7a. False positives
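
As a rough illustration of what the token counts in this section (and in 7b) represent, the sketch below classifies tokens of a made-up sentence. It assumes that a false positive is a token annotated as `O` but predicted as an entity, and a false negative is a token annotated as an entity but predicted as `O`. Tokens predicted as the wrong entity type are ignored here. The data is invented for illustration and the code does not use `ModelError`.

```python
from collections import Counter
from typing import List, Tuple

# (token, annotated tag, predicted tag) for a made-up sentence
tokens: List[Tuple[str, str, str]] = [
    ("Bestellung", "O", "O"),
    ("von", "O", "PERSON"),              # false positive
    ("Edwin", "PERSON", "PERSON"),       # true positive
    ("Stadelmann", "PERSON", "PERSON"),  # true positive
    ("B.Sc.", "PERSON", "O"),            # false negative
    ("um", "O", "AUT_LICENSE_PLATE"),    # false positive
]

false_positives = Counter(
    token for token, annotation, prediction in tokens
    if annotation == "O" and prediction != "O"
)
false_negatives = Counter(
    token for token, annotation, prediction in tokens
    if annotation != "O" and prediction == "O"
)

print("FP tokens:", false_positives.most_common())
print("FN tokens:", false_negatives.most_common())
```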
#### Most common false positive tokens:

In [None]:
for model_name, results in all_results.items():
    display(Markdown(f"Model name: {model_name}"))
    ModelError.most_common_fp_tokens(results.model_errors)

Most common false positive tokens:
[('der', 204),
 ('von', 113),
 ('zur', 111),
 ('Verfügung', 100),
 ('dem', 48),
 ('Bei', 46),
 ('um', 43),
 ('Leitung', 38),
 ('AE', 37),
 ('AA', 33)]
---------------
Example sentence with each FP token:
	- Zum Abholen der Messematerialien steht Ihnen morgen zwischen 8:00 und 12:00 Uhr unser Servicefahrzeug mit dem Kennzeichen K 26653 YQ vor dem Haupteingang zur Verfügung. (`der` pred as PERSON)
	- Die Bestellung wurde von Edwin Stadelmann aufgegeben und wird morgen mit dem Lieferfahrzeug N 31256 TR zwischen 10:00 und 12:00 Uhr an die angegebene Adresse Unit 7090 Box 9592
DPO AA 63843 zugestellt. (`von` pred as PERSON)
	- Bei Rückfragen zur neuen Datenschutzrichtlinie steht Ihnen Olena Wilmsen als zertifizierter Datenschutzbeauftragter jederzeit telefonisch oder per E-Mail zur Verfügung. (`zur` pred as PERSON)
	- Zum Abholen der Messematerialien steht Ihnen morgen zwischen 8:00 und 12:00 Uhr unser Servicefahrzeug mit dem Kennzeichen K 26653 YQ vor dem


#### More FP analysis

In [None]:
for model_name, results in all_results.items():
    display(Markdown(f"Model name: {model_name}"))
    
    fps_df = ModelError.get_fps_dataframe(results.model_errors, entity=["AUT_LICENSE_PLATE"])
    # Inside a loop, the DataFrame must be displayed explicitly
    display(fps_df[["full_text", "token", "annotation", "prediction"]].head(5))

| | full_text | token | annotation | prediction |
|---|---|---|---|---|
| 0 | Der Techniker Rebekka Hettner-Koch wird mit dem Servicefahrzeug L 44458 EC morgen zwischen 13:00 und 15:00 Uhr in Unit 8541 Box 5249\nDPO AA 66549 eintreffen und ist vorab unter +41 28 982 75 85 erreichbar. | eintreffen | O | AUT_LICENSE_PLATE |
| 1 | Das neu erworbene Schulungsgelände in PSC 2842, Box 2764\nAPO AP 22588 bietet mit seinem weitläufigen Park ideale Bedingungen für Team-Building-Maßnahmen und Outdoor-Aktivitäten aller Art. | bietet | O | AUT_LICENSE_PLATE |
| 2 | Die jährliche Aktionärsversammlung findet am 15. Mai um 14:00 Uhr in unserer Hauptverwaltung 048 Schaaf Square Suite 029\nWilmsport\n, WV\n 69316 statt, Einlass ab 13:00 Uhr mit Registrierung im Foyer. | 15. | O | AUT_LICENSE_PLATE |
| 3 | Die jährliche Aktionärsversammlung findet am 15. Mai um 14:00 Uhr in unserer Hauptverwaltung 048 Schaaf Square Suite 029\nWilmsport\n, WV\n 69316 statt, Einlass ab 13:00 Uhr mit Registrierung im Foyer. | um | O | AUT_LICENSE_PLATE |
| 4 | Die jährliche Aktionärsversammlung findet am 15. Mai um 14:00 Uhr in unserer Hauptverwaltung 048 Schaaf Square Suite 029\nWilmsport\n, WV\n 69316 statt, Einlass ab 13:00 Uhr mit Registrierung im Foyer. | 14:00 | O | AUT_LICENSE_PLATE |
| 5 | Die jährliche Aktionärsversammlung findet am 15. Mai um 14:00 Uhr in unserer Hauptverwaltung 048 Schaaf Square Suite 029\nWilmsport\n, WV\n 69316 statt, Einlass ab 13:00 Uhr mit Registrierung im Foyer. | ab | O | AUT_LICENSE_PLATE |
| 6 | Die jährliche Aktionärsversammlung findet am 15. Mai um 14:00 Uhr in unserer Hauptverwaltung 048 Schaaf Square Suite 029\nWilmsport\n, WV\n 69316 statt, Einlass ab 13:00 Uhr mit Registrierung im Foyer. | 13:00 | O | AUT_LICENSE_PLATE |
| 7 | Das neu erworbene Schulungsgelände in USCGC Butte\nFPO AE 99888 bietet mit seinem weitläufigen Park ideale Bedingungen für Team-Building-Maßnahmen und Outdoor-Aktivitäten aller Art. | bietet | O | AUT_LICENSE_PLATE |
| 8 | Der Servicevertrag für das Gebäude in 8158 Saban Square\nIsmailmouth, HI 27905 wird von Prof. Philip Girschner B.Eng. koordiniert, bei technischen Notfällen wenden Sie sich bitte direkt an die Bereitschaft unter +46 (0)344 673 89. | wird | O | AUT_LICENSE_PLATE |
| 9 | Das Kundengespräch mit der Geschäftsführung findet morgen um 10:00 Uhr in 7926 Heinz-Wilhelm Tunnel Apt. 934, New Mohammedchester, Gambia 46818 statt, bitte informieren Sie Dipl.-Ing. Eduard Bauer B.Sc. unter 363.381.0675x659 bei eventuellen Verzögerungen. | um | O | AUT_LICENSE_PLATE |
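
One possible explanation for rows such as `um` and `14:00`: the analysis explanation shown earlier lists `'regex_flags': 26` for the license plate pattern, which in Python corresponds to `re.IGNORECASE | re.MULTILINE | re.DOTALL`. With case-insensitive matching, a short lowercase word followed by digits (for example `um 14`) satisfies the pattern. The sketch below checks this with the standard `re` module only. It does not run Presidio, and the sentence is shortened for illustration, so treat it as a hypothesis about the cause rather than a confirmed diagnosis.

```python
import re

# Pattern and flags copied from the analysis explanation shown earlier
PATTERN = r"\b[A-Z]{1,2}[-\s]?[0-9]{1,5}[-\s]?[A-Z]{0,2}\b"
FLAGS = re.IGNORECASE | re.MULTILINE | re.DOTALL  # 2 | 8 | 16 == 26

text = "Die Versammlung findet am 15. Mai um 14:00 Uhr statt."

# Case-insensitive matching, as configured by the flags above
matches_ignorecase = re.findall(PATTERN, text, FLAGS)
# Case-sensitive matching for comparison
matches_casesensitive = re.findall(PATTERN, text)

print("With regex_flags=26:", matches_ignorecase)
print("Case-sensitive:     ", matches_casesensitive)

# A real plate still matches without IGNORECASE
plate_matches = re.findall(PATTERN, "Kennzeichen K 26653 YQ")
print("Plate, case-sensitive:", plate_matches)
```

If this is indeed the cause, restricting the pattern to uppercase letters (or overriding the regex flags for this recognizer) would be worth testing. Whether that is possible depends on how the recognizer is defined in the YAML config, which is not shown here.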


### 7b. False negatives (FN)

#### Most common false negative examples + a few samples with FN

In [None]:
for model_name, results in all_results.items():
    display(Markdown(f"Model name: {model_name}"))
 
    ModelError.most_common_fn_tokens(results.model_errors, n=10)

Most common false negative tokens:
[('+41', 57),
 ('Prof.', 53),
 ('B.Sc.', 45),
 ('MBA', 43),
 ('FPO', 40),
 ('AE', 37),
 ('DPO', 33),
 ('AA', 33),
 ('B.Eng', 32),
 ('B.A.', 32),
 ('Univ.', 30),
 ('AP', 30),
 ('Dipl.-Ing.', 25),
 ('Dr.', 25),
 ('APO', 24)]
---------------
Example sentence with each FN token:
	- Der Techniker Rebekka Hettner-Koch wird mit dem Servicefahrzeug L 44458 EC morgen zwischen 13:00 und 15:00 Uhr in Unit 8541 Box 5249
DPO AA 66549 eintreffen und ist vorab unter +41 28 982 75 85 erreichbar. (`+41` annotated as PHONE_NUMBER)
	- Während der Systemumstellung am kommenden Wochenende ist Univ.Prof. Janina Müller als technischer Koordinator verantwortlich und unter der Notfallnummer +41 (0)38 367 49 35 rund um die Uhr erreichbar. (`Prof.` annotated as PERSON)
	- Der Personalrat hat Siglinde Hethur B.Sc. als Vermittler im aktuellen Tarifkonflikt vorgeschlagen, da sein diplomatisches Geschick schon bei früheren Verhandlungen erfolgreich war. (`B.Sc.` annotated as PERSON


#### More FN analysis

In [None]:
for model_name, results in all_results.items():
    display(Markdown(f"Model name: {model_name}"))
    
    fns_df = ModelError.get_fns_dataframe(results.model_errors, entity=["PERSON"])
    # Inside a loop, the DataFrame must be displayed explicitly
    display(fns_df[["full_text", "token", "annotation", "prediction"]].head(5))