# Augmentation
This notebook recreates Table X from the paper XX and illustrates how to use the augmenters and scoring functions included in DaCy.

In [1]:
import os
os.chdir("..")

In [2]:
import dacy
from dacy.augmenters import create_pers_augmenter, create_keyboard_augmenter, create_æøå_augmenter
from dacy.datasets import danish_names, muslim_names
from dacy.score import score, n_sents_score

import spacy
from spacy.training.augment import create_lower_casing_augmenter, dont_augment

from typing import Callable, List

from functools import partial

Start off by loading the test set and defining a function that applies the small spaCy model to the data.

In [4]:
test = dacy.datasets.dane(splits=["test"])


def apply_model(example, nlp):
    example.predicted = nlp(example.predicted.text)
    return example

# make an instance of apply_model using the spacy nlp
nlp = spacy.load("da_core_news_sm")
apply_spacy_model = partial(apply_model, nlp=nlp)
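
The `score` calls further down receive a function that takes only the example, which is why the pipeline is bound up front with `partial`. A minimal sketch of that binding follows; `fake_nlp` is a hypothetical stand-in for a spaCy pipeline and the example is a plain string rather than a spaCy `Example`.

```python
from functools import partial


def apply_model(example, nlp):
    # Stand-in: here `example` is a plain string, not a spaCy Example
    return nlp(example)


def fake_nlp(text):
    # Hypothetical "pipeline" used only for illustration
    return text.upper()


# Bind the keyword argument once; the result takes a single argument
apply_fake = partial(apply_model, nlp=fake_nlp)
print(apply_fake("hej"))
```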

Let's test how well the model performs on the original data, on data where names are changed to other Danish names, and on data where names are changed to names of Muslim origin. The name augmenter lets us specify the naming patterns that names should be augmented to. The defaults are `["fn,ln", "abbpunct,ln"]`, which means names are augmented to follow the pattern of either "first_name last_name" (e.g. Mette Frederiksen) or "abbreviated_first_name last_name" (e.g. M. Frederiksen). The available pattern elements are "fn" (first name, Mette), "ln" (last name, Frederiksen), "abb" (abbreviated, M) and "abbpunct" (abbreviated + ., M.). These patterns can be combined however you see fit. We will stick to the defaults for now.
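
To make the pattern notation concrete, here is a small sketch of how a pattern string could be expanded into a name. `format_name` is a hypothetical helper written for this illustration, not part of DaCy, and it assumes that "abb" and "abbpunct" abbreviate the first name.

```python
def format_name(first_name: str, last_name: str, pattern: str) -> str:
    """Render a name according to a comma-separated pattern such as 'abbpunct,ln'."""
    # Hypothetical mapping from pattern element to text (not DaCy's implementation)
    parts = {
        "fn": first_name,
        "ln": last_name,
        "abb": first_name[0],
        "abbpunct": first_name[0] + ".",
    }
    return " ".join(parts[element] for element in pattern.split(","))


print(format_name("Mette", "Frederiksen", "fn,ln"))        # Mette Frederiksen
print(format_name("Mette", "Frederiksen", "abbpunct,ln"))  # M. Frederiksen
```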

In [5]:
dk_name_dict = danish_names()
muslim_name_dict = muslim_names()

# Set keep_name to False to make the augmenter choose a new name from the dictionary
#   otherwise, it would simply make the name fit the pattern (e.g. make abbreviations)
# force_size ensures that the names are of the same length/format as the pattern.
dk_aug = create_pers_augmenter(dk_name_dict, force_size=True, keep_name=False)
muslim_aug = create_pers_augmenter(muslim_name_dict, force_size=True, keep_name=False)


In [6]:
scores_raw = score(test, apply_spacy_model, score_fn=["ents"])
scores_dk = score(test, apply_spacy_model, augmenter=dk_aug, score_fn=["ents"])
scores_muslim = score(test, apply_spacy_model, augmenter=muslim_aug, score_fn=["ents"])

In [7]:
scores = scores_raw + scores_dk + scores_muslim
scores.to_df()

Unnamed: 0,ents_p,ents_r,ents_f,ents_per_type_PER_p,ents_per_type_PER_r,ents_per_type_PER_f,ents_per_type_LOC_p,ents_per_type_LOC_r,ents_per_type_LOC_f,ents_per_type_MISC_p,ents_per_type_MISC_r,ents_per_type_MISC_f,ents_per_type_ORG_p,ents_per_type_ORG_r,ents_per_type_ORG_f
0,0.719262,0.629032,0.671128,0.768421,0.811111,0.789189,0.673267,0.708333,0.690355,0.68,0.561983,0.615385,0.71134,0.428571,0.534884
1,0.721881,0.632616,0.674308,0.766839,0.822222,0.793566,0.660194,0.708333,0.683417,0.708333,0.561983,0.626728,0.71134,0.428571,0.534884
2,0.696907,0.605735,0.64813,0.729282,0.733333,0.731302,0.641509,0.708333,0.673267,0.704082,0.570248,0.630137,0.69,0.428571,0.528736


Augmenting names to Danish names that fit the pattern "fn,ln" or "abbpunct,ln" actually made the task slightly easier for the model than the original test data (overall F1 of 0.674 versus 0.671). However, augmenting with Muslim names made the model perform noticeably worse than baseline: recall for PER drops from 0.811 to 0.733.

Let's see how good the model is with names that start with an abbreviation. We set `force_size` to `False` so that only the first word is augmented, and `keep_name` to `True` so that the names themselves are kept and only the first name is abbreviated.

In [9]:
abb_aug = create_pers_augmenter(dk_name_dict, patterns=["abbpunct"], force_size=False, keep_name=True)
scores_abb = score(test, apply_spacy_model, augmenter=abb_aug, score_fn=["ents"])
scores += scores_abb
scores.to_df()

Unnamed: 0,ents_p,ents_r,ents_f,ents_per_type_PER_p,ents_per_type_PER_r,ents_per_type_PER_f,ents_per_type_LOC_p,ents_per_type_LOC_r,ents_per_type_LOC_f,ents_per_type_MISC_p,ents_per_type_MISC_r,ents_per_type_MISC_f,ents_per_type_ORG_p,ents_per_type_ORG_r,ents_per_type_ORG_f
0,0.719262,0.629032,0.671128,0.768421,0.811111,0.789189,0.673267,0.708333,0.690355,0.68,0.561983,0.615385,0.71134,0.428571,0.534884
1,0.721881,0.632616,0.674308,0.766839,0.822222,0.793566,0.660194,0.708333,0.683417,0.708333,0.561983,0.626728,0.71134,0.428571,0.534884
2,0.696907,0.605735,0.64813,0.729282,0.733333,0.731302,0.641509,0.708333,0.673267,0.704082,0.570248,0.630137,0.69,0.428571,0.528736
3,0.705757,0.59319,0.644596,0.732558,0.7,0.715909,0.68,0.708333,0.693878,0.686869,0.561983,0.618182,0.704082,0.428571,0.532819
4,0.705757,0.59319,0.644596,0.732558,0.7,0.715909,0.68,0.708333,0.693878,0.686869,0.561983,0.618182,0.704082,0.428571,0.532819


Ouch, yet another drop in recall. The model does not like abbreviated names.

Let's test how DaCy fares.


In [13]:
dacy_small = dacy.load("da_dacy_small_tft-0.0.0")
dacy_medium = dacy.load("da_dacy_medium_tft-0.0.0")
dacy_large = dacy.load("da_dacy_large_tft-0.0.0")

In [12]:
def score_augmenters(dataset, augmenters: List[Callable], apply_fn: Callable):

    baseline_score = score(dataset, apply_fn=apply_fn, score_fn=["ents"])
    scores = baseline_score
    for augmenter in augmenters:
        scores += score(dataset, augmenter=augmenter, apply_fn=apply_fn, score_fn=["ents"])
    return scores

In [14]:
# use spaCy's `dont_augment` to get baseline
augmenters = [dont_augment, dk_aug, muslim_aug, abb_aug]
apply_small_dacy = partial(apply_model, nlp=dacy_small)
apply_medium_dacy = partial(apply_model, nlp=dacy_medium)
apply_large_dacy = partial(apply_model, nlp=dacy_large)

In [16]:
score_small_dacy = score_augmenters(test, augmenters, apply_small_dacy)
score_medium_dacy = score_augmenters(test, augmenters, apply_medium_dacy)
score_large_dacy = score_augmenters(test, augmenters, apply_large_dacy)

# score_small_dacy = score(test, augmenter=augmenters, apply_fn=apply_small_dacy, score_fn=["ents"])
# score_medium_dacy = score(test, augmenter=augmenters, apply_fn=apply_medium_dacy, score_fn=["ents"])
# score_large_dacy = score(test, augmenter=augmenters, apply_fn=apply_large_dacy, score_fn=["ents"])

dacy_scores = score_small_dacy + score_medium_dacy + score_large_dacy
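dacy_df = dacy_scores.to_df()
# label each row with its model; the three models contribute equally many rows
rows_per_model = len(dacy_df) // 3
dacy_df["model"] = ["small"] * rows_per_model + ["medium"] * rows_per_model + ["large"] * rows_per_model
dacy_df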

As you can see, the models obtain slightly different performance with `dk_aug` and `muslim_aug` on each run. This is because names are randomly sampled each time, and some names are easier to predict than others. To account for this, `score` includes a `k` argument which you can use to run the model `k` times for a more robust performance estimate.
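
The idea behind repeating a randomised evaluation can be sketched in plain Python. This is not how DaCy implements `k`; `repeat_runs` and `noisy_recall` are hypothetical names, and the simulated recall values are made up purely to show the averaging.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Tuple


def repeat_runs(run_once: Callable[[random.Random], float], k: int, seed: int = 0) -> Tuple[float, float]:
    """Run a randomised evaluation k times and return (mean, standard deviation)."""
    rng = random.Random(seed)
    results: List[float] = [run_once(rng) for _ in range(k)]
    spread = stdev(results) if k > 1 else 0.0
    return mean(results), spread


def noisy_recall(rng: random.Random) -> float:
    # Hypothetical stand-in for one scoring run with randomly sampled names
    return 0.73 + rng.uniform(-0.02, 0.02)


avg, spread = repeat_runs(noisy_recall, k=10)
print(f"mean={avg:.3f} sd={spread:.3f}")
```

Reporting the spread alongside the mean shows how much of a difference between two augmenters could be down to the sampled names alone.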

## Reconstruction of table from paper
Alright, enough chat - let's test how different models fare on a battery of augmentations. 

First, define our augmenters.

In [3]:
# Randomly change 5%/15% of characters to a neighbouring key
keyboard_aug_05 = create_keyboard_augmenter(doc_level=1, char_level=0.05, keyboard="QWERTY_DA")
keyboard_aug_15 = create_keyboard_augmenter(doc_level=1, char_level=0.15, keyboard="QWERTY_DA")
# Change æ=ae, ø=oe, å=aa
æøå_aug = create_æøå_augmenter(doc_level=1, char_level=1)
# Lowercase text
lower_case_aug = create_lower_casing_augmenter(level=1)
# MAKE N_SENTS SCORE

augmenters += [keyboard_aug_05, keyboard_aug_15, æøå_aug, lower_case_aug]
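
To give a feel for what the character-level augmenters do to a text, here is a rough sketch in plain Python. It does not use DaCy's implementation: `NEIGHBOURS` is a hypothetical, heavily truncated neighbour map, and the handling of upper-case Æ, Ø and Å is an assumption.

```python
import random

# Hypothetical, truncated neighbour map; a real keyboard layout covers many more keys
NEIGHBOURS = {"a": "qsz", "e": "wrd", "n": "bm", "o": "ipl"}


def keystroke_noise(text: str, char_level: float, rng: random.Random) -> str:
    """Replace each character that has known neighbours with probability `char_level`."""
    out = []
    for ch in text:
        options = NEIGHBOURS.get(ch)
        if options and rng.random() < char_level:
            out.append(rng.choice(options))
        else:
            out.append(ch)
    return "".join(out)


def spell_out_danish_letters(text: str) -> str:
    """Rewrite æ, ø, å as ae, oe, aa (upper-case handling is an assumption)."""
    table = {"æ": "ae", "ø": "oe", "å": "aa", "Æ": "Ae", "Ø": "Oe", "Å": "Aa"}
    return "".join(table.get(ch, ch) for ch in text)


rng = random.Random(0)
print(keystroke_noise("en dansk sætning", 0.15, rng))
print(spell_out_danish_letters("Blåbærgrød på Ærø"))  # Blaabaergroed paa Aeroe
```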

Make a dict that maps names to our models.

In [12]:
from danlp.models import load_bert_ner_model
from NERDA.precooked import DA_BERT_ML

danlp_bert = load_bert_ner_model()
nerda_bert = DA_BERT_ML()

model_dict = {"spacy_small" : spacy.load("da_core_news_sm"),
              "spacy_medium": spacy.load("da_core_news_md"),
              "spacy_large" : spacy.load("da_core_news_lg"),
              "dacy_small" : dacy_small,
              "dacy_medium" : dacy_medium,
              "dacy_large" : dacy_large,
              "danlp_bert" : danlp_bert,
              "nerda_bert" : nerda_bert,
              }


Using NERDA's and DaNLP's BERT models requires a slightly more involved apply function.


In [8]:
from spacy.tokens import Span


def apply_bert_model(example, bert_model):
    doc = example.predicted
    # uses spacy tokenization
    tokens, labels = bert_model.predict([t.text for t in example.predicted])
    ent = []
    for i, t in enumerate(zip(doc, labels)):
        token, label = t

        # turn IOB labels into spans
        if label == "O":
            continue
        iob, ent_type = label.split("-")
        if (i - 1 >= 0 and iob == "I" and labels[i - 1] == "O") or (
            i == 0 and iob == "I"
        ):
            iob = "B"
        if iob == "B":
            start = i
        if i + 1 >= len(labels) or labels[i + 1].split("-")[0] != "I":
            ent.append(Span(doc, start, i + 1, label=ent_type))
    doc.set_ents(ent)
    example.predicted = doc
    return example
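
The label-to-span step can be tried out without spaCy or a BERT model. The sketch below is a simplified variant of the logic above, not a drop-in replacement: `iob_to_spans` is a hypothetical helper that returns `(start, end, type)` triples with an exclusive end index, and unlike the function above it also starts a new span when an `I-` label changes entity type.

```python
from typing import List, Tuple


def iob_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Convert IOB labels to (start, end, type) triples with an exclusive end index."""
    spans = []
    start, current = None, None
    for i, label in enumerate(labels):
        prefix, _, ent_type = label.partition("-")
        # Close the open entity unless this token continues it
        if current is not None and not (prefix == "I" and ent_type == current):
            spans.append((start, i, current))
            start, current = None, None
        # Open a new entity on B, or on an I that has nothing to continue
        if prefix in ("B", "I") and current is None:
            start, current = i, ent_type
    # Close an entity that runs to the end of the sequence
    if current is not None:
        spans.append((start, len(labels), current))
    return spans


labels = ["B-PER", "I-PER", "O", "I-LOC", "B-ORG", "B-ORG"]
print(iob_to_spans(labels))
# [(0, 2, 'PER'), (3, 4, 'LOC'), (4, 5, 'ORG'), (5, 6, 'ORG')]
```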

Note: DaNLP's BERT model requires `transformers==3.5.1` (install with `pip install transformers==3.5.1 --no-deps`).

## Function to apply each model (+ create its partial apply fn) to each augmented dataset (loop through dict and make a column with the name)