# Script 2: Fine-tuning a BERT model with hyperparameter search for NER on GermEval2014
This notebook describes how to train a Transformer model for the task of named entity recognition (NER). A BERT model is trained with the native Hugging Face Transformers library. Before training, an automatic hyperparameter search tries to find better training parameters in order to improve model performance. Training follows the data split defined by the GermEval 2014 shared task.

In [2]:
# Install the required libraries
#!pip install transformers
#!pip install datasets

First, the script parameters are set. The `task` variable specifies the task the model is trained for.
`ner` stands for named entity recognition, which is a token classification problem. The variable `model_checkpoint` holds the name of the pretrained Transformer model to use. The model `'deepset/gbert-base'` is a BERT model that was trained on German text. The name can be replaced with any model checkpoint from the Hugging Face Model Hub: https://huggingface.co/models

In [3]:
task = "ner"
model_checkpoint = 'deepset/gbert-base'
batch_size=32

## Downloading the germeval2014 dataset

In addition to the Transformers library, Hugging Face provides the `Datasets` library, a collection of datasets and metrics that can be downloaded. To use it, the two functions `load_dataset` and `load_metric` are imported first.

In [4]:
from datasets import load_dataset, load_metric

Just as there is a hub for pretrained models, there is a hub for the available datasets. GermEval 2014 is listed at https://huggingface.co/datasets/germeval_14. The dataset can then be downloaded with the `load_dataset` function.

In [5]:
datasets = load_dataset("germeval_14")

Downloading and preparing dataset germ_eval14/germeval_14 (download: 9.81 MiB, generated: 17.19 MiB, post-processed: Unknown size, total: 27.00 MiB) to /home/jupyter/.cache/huggingface/datasets/germ_eval14/germeval_14/2.0.0/2a7a0c62dc3278203778c3a16bfbe257d5656aa0f4ad1e84f357f4caa904e0da...


Dataset germ_eval14 downloaded and prepared to /home/jupyter/.cache/huggingface/datasets/germ_eval14/germeval_14/2.0.0/2a7a0c62dc3278203778c3a16bfbe257d5656aa0f4ad1e84f357f4caa904e0da. Subsequent calls will reuse this data.


#### Exploring the structure of the data

The downloaded dataset is a `DatasetDict` object with the keys `train`, `validation` and `test`. The value for each key is a `Dataset` object that holds the data of the respective split.

For GermEval2014, the data consists of 31,300 records in total (24,000 for training, 2,200 for validation and 5,100 for testing) with the features `id`, `source`, `tokens`, `ner_tags` and `nested_ner_tags`. The `nested_ner_tags` are not used in this example.

In [6]:
datasets

DatasetDict({
    train: Dataset({
        features: ['id', 'source', 'tokens', 'ner_tags', 'nested_ner_tags'],
        num_rows: 24000
    })
    validation: Dataset({
        features: ['id', 'source', 'tokens', 'ner_tags', 'nested_ner_tags'],
        num_rows: 2200
    })
    test: Dataset({
        features: ['id', 'source', 'tokens', 'ner_tags', 'nested_ner_tags'],
        num_rows: 5100
    })
})

To access a single item of the dataset, select the split by its key and then the index of the desired item:

In [7]:
print(datasets["train"][0])

{'id': '0', 'ner_tags': [19, 0, 0, 0, 7, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'nested_ner_tags': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'source': 'n-tv.de vom 26.02.2005 [2005-02-26] ', 'tokens': ['Schartau', 'sagte', 'dem', '"', 'Tagesspiegel', '"', 'vom', 'Freitag', ',', 'Fischer', 'sei', '"', 'in', 'einer', 'Weise', 'aufgetreten', ',', 'die', 'alles', 'andere', 'als', 'überzeugend', 'war', '"', '.']}


The NER tags are already encoded as integer IDs:

In [8]:
datasets["train"][0]['ner_tags']

[19, 0, 0, 0, 7, 0, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

The corresponding text labels are found in the `features` attribute of the dataset. GermEval2014 has 25 distinct classes in total:

In [9]:
datasets["train"].features["ner_tags"]

Sequence(feature=ClassLabel(num_classes=25, names=['O', 'B-LOC', 'I-LOC', 'B-LOCderiv', 'I-LOCderiv', 'B-LOCpart', 'I-LOCpart', 'B-ORG', 'I-ORG', 'B-ORGderiv', 'I-ORGderiv', 'B-ORGpart', 'I-ORGpart', 'B-OTH', 'I-OTH', 'B-OTHderiv', 'I-OTHderiv', 'B-OTHpart', 'I-OTHpart', 'B-PER', 'I-PER', 'B-PERderiv', 'I-PERderiv', 'B-PERpart', 'I-PERpart'], names_file=None, id=None), length=-1, id=None)

The list of label names can be extracted as follows:

In [10]:
label_list = datasets["train"].features[f"{task}_tags"].feature.names
label_list

['O',
 'B-LOC',
 'I-LOC',
 'B-LOCderiv',
 'I-LOCderiv',
 'B-LOCpart',
 'I-LOCpart',
 'B-ORG',
 'I-ORG',
 'B-ORGderiv',
 'I-ORGderiv',
 'B-ORGpart',
 'I-ORGpart',
 'B-OTH',
 'I-OTH',
 'B-OTHderiv',
 'I-OTHderiv',
 'B-OTHpart',
 'I-OTHpart',
 'B-PER',
 'I-PER',
 'B-PERderiv',
 'I-PERderiv',
 'B-PERpart',
 'I-PERpart']
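To make the mapping concrete, the following sketch decodes the integer tags of the first training example into text labels. The label list is copied from the output above so that the snippet runs without downloading the dataset; the helper name `decode_tags` is made up for this illustration.

```python
# Label names as printed above (list index = integer tag ID)
LABEL_LIST = [
    "O", "B-LOC", "I-LOC", "B-LOCderiv", "I-LOCderiv", "B-LOCpart", "I-LOCpart",
    "B-ORG", "I-ORG", "B-ORGderiv", "I-ORGderiv", "B-ORGpart", "I-ORGpart",
    "B-OTH", "I-OTH", "B-OTHderiv", "I-OTHderiv", "B-OTHpart", "I-OTHpart",
    "B-PER", "I-PER", "B-PERderiv", "I-PERderiv", "B-PERpart", "I-PERpart",
]

def decode_tags(tag_ids, label_list=LABEL_LIST):
    """Hypothetical helper: map integer tag IDs to their text labels."""
    return [label_list[i] for i in tag_ids]

# ner_tags of datasets["train"][0] as shown above (25 tokens)
first_example_tags = [19, 0, 0, 0, 7] + [0] * 4 + [19] + [0] * 15
decoded = decode_tags(first_example_tags)

# Tokens 0 ("Schartau"), 4 ("Tagesspiegel") and 9 ("Fischer") carry entity tags
print(decoded[0], decoded[4], decoded[9])
```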

#### Visualizing random examples from the dataset
The following function randomly picks `num_examples` items from the given dataset and displays them in a pandas `DataFrame`. The `ner_tags` are decoded into their corresponding text labels.

In [11]:
# Source, adapted from: https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/text_classification.ipynb#scrollTo=X6HrpprwIrIz
from datasets import ClassLabel, Sequence
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10, seed=None):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    random.seed(seed)
    # Collect distinct random indices
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            # Draw again without re-seeding; re-seeding here would
            # reproduce the same index and never leave the loop
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        # Single class label: decode the integer ID into its name
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
        # Sequence of class labels: this branch handles the `ner_tags`
        elif isinstance(typ, Sequence) and isinstance(typ.feature, ClassLabel):
            df[column] = df[column].transform(lambda x: [typ.feature.names[i] for i in x])
    display(HTML(df.to_html()))

The function can now be used to produce a tabular view of sample data from the dataset. To do so, pass the desired split of the `datasets` object:

In [12]:
show_random_elements(datasets["train"], seed = 43)

Unnamed: 0,id,ner_tags,nested_ner_tags,source,tokens
0,1263,"[O, O, O, O, O, O, B-LOC, O, O]","[O, O, O, O, O, O, O, O, O]",http://de.wikipedia.org/wiki/Khangchenne [2009-12-25],"[Er, hielt, sich, lieber, im, fernen, Ngari, auf, .]"
1,9374,"[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]",http://www.verbaende.com/News.php4?m=56415 [2008-09-25],"[Daher, lässt, er, keine, Gelegenheit, aus, ,, die, Politik, wachzurütteln, und, zum, Handeln, zu, bewegen, .]"
2,22813,"[B-PER, O, O, O, O, O, O, O, O, B-LOC, O]","[O, O, O, O, O, O, O, O, O, O, O]",http://www.hellwegeranzeiger.de/afp/journal/doc/zuma-anc.htm [2007-12-16],"[Zuma, ist, der, Liebling, der, Armen, und, Benachteiligten, in, Südafrika, .]"
3,4716,"[O, O, O, O, O, O, O, O, B-ORGpart, O, O, O, O, O, O, O]","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]",http://www.handelsblatt.com/unternehmen/handel-dienstleister/lufthansa-fuerchtet-deutlichen-gewinnrueckgang;2251398 [2009-04-24],"[Außer, in, der, Cargo-Sparte, stehen, auch, an, dezentralen, Lufthansa-Standorten, des, Passagiergeschäfts, die, Zeichen, auf, Kurzarbeit, .]"
4,15156,"[O, O, O, O, O, B-LOCderiv, O, B-PER, I-PER, I-PER, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]",http://de.wikipedia.org/wiki/Albert_Riera [2009-10-14],"[Nationalmannschaft, Natürlich, entging, auch, dem, spanischen, Nationaltrainer, Luis, Aragonés, Rieras, Leistungssteigerung, nicht, ,, so, dass, er, mittlerweile, drei, Spiele, für, die, "", Selección, "", bestritten, hat, .]"
5,12120,"[O, O, B-ORG, O, O, O, B-LOCderiv, O, O, O]","[O, O, O, O, O, O, O, O, O, O]",welt.de vom 15.02.2005 [2005-02-15],"[Damit, stattet, Walther, künftig, auch, die, irakischen, Sicherheitskräfte, aus, .]"
6,22008,"[O, O, O, O, O, O, O, O, B-PER, I-PER, O, O, O, O]","[O, O, O, O, O, O, O, O, O, O, O, O, O, O]",http://de.wikipedia.org/wiki/Zapp_(Magazin) [2009-11-28],"[Vom, 9., März, 2003, bis, November, 2005, führte, Caren, Miosga, durch, die, Sendung, .]"
7,22886,"[O, O, O, O, O, O, O, O, O, O, B-LOCderiv, O, O, O, O]","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]",http://www.handelsblatt.com/unternehmen/banken-versicherungen/ec-karten-debakel-banken-ueben-kritik-an-zka;2527576 [2010-02-11],"[Dem, Konsensprinzip, werde, "", in, Extenso, über, die, Säulen, des, deutschen, Bankensystems, gehuldigt, "", .]"
8,3155,"[O, O, O, O, O, O, O, O, O, O, O, O, O, O, B-OTH, I-OTH, O, O, B-OTH, I-OTH, I-OTH, I-OTH, I-OTH, I-OTH, O, B-PER, I-PER, I-PER, O]","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]",http://de.wikipedia.org/wiki/Grosses_vollständiges_Universal-Lexicon_Aller_Wissenschafften_und_Künste [2009-12-29],"[Schon, der, erste, Artikel, zum, Buchstaben, „, A, “, sei, eine, Kompilation, aus, dem, Historischen, Lexicon, und, dem, Allgemeinen, Lexicon, der, Künste, und, Wissenschaften, von, Johann, Theodor, Jablonski, .]"
9,14850,"[B-PER, O, O, O, O, O, O, O, B-ORG, I-ORG, I-ORG, O]","[O, O, O, O, O, O, O, O, O, O, O, O]",http://de.wikipedia.org/wiki/Nanaimo [2009-11-26],"[Dunsmuir, arbeitete, für, diese, Gesellschaft, und, für, die, Harewood, Coal, Company, .]"


## Preprocessing the data

Preprocessing requires the tokenizer of the model being used. It can be downloaded with the following call. <br>
`AutoTokenizer.from_pretrained()` downloads the tokenizer that matches the model, together with the vocabulary that was used to pretrain the model checkpoint.

In [13]:
from transformers import  AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast = True)

The tokenizer can be applied directly to an input sequence. It returns a dictionary with the keys `input_ids`, `token_type_ids` and `attention_mask`.
Each entry in `input_ids` is the ID of the corresponding token in the model's vocabulary.
The `token_type_ids` matter for sequence-pair tasks and tell the model which part of a two-part input sequence a token belongs to.
The `attention_mask` tells the model which tokens should be attended to. If, for example, an input sequence is much shorter than the others, it is padded to the same length. The `attention_mask` then prevents attention from being computed for these padding tokens.
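The three outputs can be illustrated without the real tokenizer. The following is a minimal sketch, assuming the usual BERT layout `[CLS] A [SEP] B [SEP]` for a two-part input. The IDs 102 and 103 for `[CLS]` and `[SEP]` match the tokenizer outputs in this notebook; the padding ID 0, the function name `encode_pair` and the word IDs in the usage line are assumptions for illustration.

```python
def encode_pair(ids_a, ids_b, max_length, cls_id=102, sep_id=103, pad_id=0):
    """Toy encoder for a two-part input: [CLS] A [SEP] B [SEP] plus padding."""
    input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]
    # Segment 0 covers [CLS], A and the first [SEP];
    # segment 1 covers B and the final [SEP]
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    attention_mask = [1] * len(input_ids)

    n_pad = max_length - len(input_ids)
    if n_pad < 0:
        raise ValueError("max_length is too small for this input")
    # Padded positions get attention_mask 0 so the model does not attend to them
    return {
        "input_ids": input_ids + [pad_id] * n_pad,
        "token_type_ids": token_type_ids + [0] * n_pad,
        "attention_mask": attention_mask + [0] * n_pad,
    }

# Arbitrary token IDs for part A (2 tokens) and part B (3 tokens)
encoded = encode_pair([4485, 818], [199, 215, 143], max_length=10)
for key, values in encoded.items():
    print(key, values)
```

For a single-sequence task such as NER, only segment 0 occurs, which is why all `token_type_ids` in this notebook are 0.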

Passing a string to the tokenizer returns the output described above:

In [14]:
tokenizer("Hallo, das ist ein Satz!")

{'input_ids': [102, 4485, 818, 199, 215, 143, 5607, 3330, 103], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}

If the inputs are already split into individual words, as is the case for GermEval2014, the word list can be passed to the tokenizer together with the parameter `is_split_into_words=True`:

In [15]:
tokenizer(["Hallo",",","das", "ist", "ein","Satz","der","in","Wörter","aufgeteilt","wurde","!"],is_split_into_words=True)

{'input_ids': [102, 4485, 818, 199, 215, 143, 5607, 125, 153, 14253, 14070, 325, 3330, 103], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

#### The alignment problem

Transformer tokenizers are usually subword tokenizers, so even the words in this list may be split further into several tokens:

In [16]:
example = datasets["train"][5]
print(example["tokens"])

['ARD-Programmchef', 'Günter', 'Struve', 'war', 'wegen', 'eines', 'vierwöchigen', 'Urlaubs', 'für', 'eine', 'Stellungnahme', 'nicht', 'erreichbar', '.']


In [17]:
tokenized_input = tokenizer(example["tokens"],is_split_into_words=True)
tokens = tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])
print(tokens)

['[CLS]', 'ARD', '-', 'Programm', '##chef', 'Günter', 'Stru', '##ve', 'war', 'wegen', 'eines', 'vier', '##wöch', '##igen', 'Urlaubs', 'für', 'eine', 'Stellungnahme', 'nicht', 'erreichbar', '.', '[SEP]']


Here, for example, "Struve" was split into 2 subtokens and "vierwöchigen" even into 3.
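BERT tokenizers use WordPiece, which splits a word greedily into the longest vocabulary entries it can find, marking continuation pieces with `##`. The following is a simplified sketch of that greedy matching. The tiny vocabulary is hypothetical and was chosen so that the two words from the example split the same way as in the output above; the real tokenizer additionally normalizes text and uses the full model vocabulary.

```python
# Hypothetical mini vocabulary, for illustration only
TOY_VOCAB = {"Stru", "##ve", "vier", "##wöch", "##igen", "war"}

def wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of a single word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first and shrink it step by step
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation of a word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches: the whole word becomes the unknown token
            return [unk_token]
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece("Struve", TOY_VOCAB))
print(wordpiece("vierwöchigen", TOY_VOCAB))
print(wordpiece("war", TOY_VOCAB))
```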

Because the special tokens [CLS] and [SEP] are inserted and words are split into subwords, the list of tokens is longer than the list of tags. The alignment from the dataset is therefore broken:

In [18]:
len(example[f"{task}_tags"]),len(tokenized_input['input_ids'])

(14, 22)

To solve this problem and restore the alignment, the object returned by the tokenizer provides the `word_ids()` method.
- It returns a list that is exactly as long as the list of input IDs.
- It maps special tokens to `None` and every other token to the index of the original word it belongs to. A `0`, for example, marks a subtoken as part of the first word of the input sequence.

In [19]:
tokenized_input.word_ids()

[None, 0, 0, 0, 0, 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 9, 10, 11, 12, 13, None]

The alignment between the `ner_tags` and the `input_ids` can now be restored. As a result, the list of labels and the list of `input_ids` have the same length.

In [20]:
word_ids = tokenized_input.word_ids()
aligned_labels = [-100 if i is None else example[f"{task}_tags"][i] for i in word_ids]
print(len(aligned_labels), len(tokenized_input["input_ids"]))

22 22


The list comprehension above sets the label of the inserted special tokens to -100, the index that PyTorch's cross-entropy loss ignores by default. All other `input_ids` receive the label of the word they belong to.
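What "ignored" means can be shown with a small stand-alone computation. The following is a pure-Python sketch of a mean cross-entropy that skips positions labelled -100; it imitates the idea behind the `ignore_index` of PyTorch's loss and is not PyTorch's implementation. The function name and the toy logits are made up, and returning 0.0 when every position is ignored is a simplification of this sketch.

```python
import math

def cross_entropy_ignore(logits, labels, ignore_index=-100):
    """Mean cross-entropy over all positions whose label is not ignore_index."""
    losses = []
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # e.g. [CLS] and [SEP] do not contribute to the loss
        # Numerically stable negative log-softmax of the true class
        top = max(scores)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        losses.append(log_sum_exp - scores[label])
    return sum(losses) / len(losses) if losses else 0.0

# Four token positions, three classes (toy values)
logits = [
    [2.0, 0.1, -1.0],  # [CLS], label -100
    [1.5, 0.2, 0.3],   # first word, class 0
    [0.1, 0.4, 2.2],   # second word, class 2
    [0.0, 3.0, 0.0],   # [SEP], label -100
]
labels = [-100, 0, 2, -100]

loss = cross_entropy_ignore(logits, labels)
loss_without_special_tokens = cross_entropy_ignore(logits[1:3], labels[1:3])
print(loss, loss_without_special_tokens)
```

Both values are identical: whatever the model predicts at the special-token positions has no influence on the loss.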

Another strategy is to set the label only for the first token of each word and to assign -100 to the remaining subtokens of that word. To use this strategy, set the following flag to `False`:

In [21]:
label_all_tokens = True
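The effect of the flag can be seen on the example sentence in isolation. The sketch below hard-codes the `word_ids` shown above and the integer tags of that sentence (`B-ORGpart` = 11, `B-PER` = 19, `I-PER` = 20, `O` = 0), so it runs without the tokenizer. The function name `align_labels` is hypothetical.

```python
def align_labels(word_ids, word_tags, label_all_tokens):
    """Expand word-level tags to token level using the tokenizer's word IDs."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)                # special token
        elif word_id != previous:
            aligned.append(word_tags[word_id])  # first subtoken of a word
        else:
            # further subtokens: repeat the tag or mask it with -100
            aligned.append(word_tags[word_id] if label_all_tokens else -100)
        previous = word_id
    return aligned

word_ids = [None, 0, 0, 0, 0, 1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 8, 9, 10, 11, 12, 13, None]
word_tags = [11, 19, 20] + [0] * 11  # 14 words

all_tokens = align_labels(word_ids, word_tags, label_all_tokens=True)
first_only = align_labels(word_ids, word_tags, label_all_tokens=False)
print(all_tokens[:8])  # [-100, 11, 11, 11, 11, 19, 20, 20]
print(first_only[:8])  # [-100, 11, -100, -100, -100, 19, 20, -100]
```

With `label_all_tokens=False`, exactly one label per original word remains, so the loss and the metrics count each word once regardless of how many subtokens it was split into.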

#### Preprocessing function

The following function first tokenizes the given records and then aligns the `ner_tags` with the tokenized `input_ids`. The aligned labels are stored under a new `labels` key. The function returns the tokenizer output extended by this `labels` entry.

In [22]:
# Source: https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/token_classification.ipynb#scrollTo=n9qywopnIrJH
def tokenize_and_align_labels(examples):
    # Tokenize the pre-split input words
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    
    for i, label in enumerate(examples[f"{task}_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            # Special tokens have the word ID None. Setting their label to -100
            # makes the loss function ignore them automatically.
            if word_idx is None:
                label_ids.append(-100)
            # Set the label for the first token of a word.
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])
            # Depending on the label_all_tokens flag, the following tokens of a word
            # get the same label or -100
            else:
                label_ids.append(label[word_idx] if label_all_tokens else -100)
            previous_word_idx = word_idx

        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs

The function operates on a batch of one or more examples. When several records are passed, the tokenizer inside the function returns a list of lists for each key:

In [23]:
tokenize_and_align_labels(datasets['train'][:3])

{'input_ids': [[102, 20652, 976, 30890, 1382, 249, 224, 3424, 9733, 224, 507, 2476, 818, 4086, 745, 224, 153, 369, 3485, 24523, 818, 128, 1446, 1301, 276, 11052, 30888, 285, 224, 566, 103], [102, 5213, 23835, 4022, 2605, 3697, 30881, 3736, 2201, 125, 24168, 7273, 744, 276, 13015, 6974, 818, 276, 180, 397, 11738, 169, 13095, 260, 249, 21799, 7374, 566, 103], [102, 1776, 307, 1413, 333, 249, 13681, 5650, 339, 1910, 566, 1483, 153, 1270, 387, 143, 20175, 1965, 6110, 4496, 371, 818, 215, 2149, 205, 26058, 21494, 566, 103]], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

To apply the function to every record, use the `map` method of the `datasets` object. It applies the function to all splits at once, so the training, validation and test sets are preprocessed with a single line of code.
 - With `batched=True`, the function receives several records per call, which lets the tokenizer process them together.

In [24]:
tokenized_dataset = datasets.map(tokenize_and_align_labels,batched=True)
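Conceptually, a batched `map` cuts the columns of a split into slices, calls the function once per slice and concatenates the results. The following is a simplified pure-Python sketch of that idea, not the library's implementation. It assumes the default batch size of 1,000 rows; `batched_map` and `count_tokens` are hypothetical names, and `count_tokens` merely stands in for `tokenize_and_align_labels`.

```python
def batched_map(columns, fn, batch_size=1000):
    """Apply fn to slices of a dict of equally long lists and merge the results."""
    n_rows = len(next(iter(columns.values())))
    merged = {}
    n_calls = 0
    for start in range(0, n_rows, batch_size):
        # Each batch is a dict of lists, the same shape the real function receives
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        result = fn(batch)
        n_calls += 1
        for name, values in result.items():
            merged.setdefault(name, []).extend(values)
    return merged, n_calls

def count_tokens(batch):
    # Stand-in for the preprocessing function: one output value per input row
    return {"n_tokens": [len(tokens) for tokens in batch["tokens"]]}

toy_split = {"tokens": [["Hallo", "Welt", "!"]] * 2500}
out, calls = batched_map(toy_split, count_tokens)
print(len(out["n_tokens"]), calls)
```

Under the same assumption, the 24,000 training rows would be handled in 24 calls, the 2,200 validation rows in 3 and the 5,100 test rows in 6.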


Preprocessing adds the outputs of the tokenizer to the dataset as features.
The following outputs show that `attention_mask`, `input_ids`, `labels` and `token_type_ids` have been added as features.

In [25]:
tokenized_dataset

DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'id', 'input_ids', 'labels', 'ner_tags', 'nested_ner_tags', 'source', 'token_type_ids', 'tokens'],
        num_rows: 24000
    })
    validation: Dataset({
        features: ['attention_mask', 'id', 'input_ids', 'labels', 'ner_tags', 'nested_ner_tags', 'source', 'token_type_ids', 'tokens'],
        num_rows: 2200
    })
    test: Dataset({
        features: ['attention_mask', 'id', 'input_ids', 'labels', 'ner_tags', 'nested_ner_tags', 'source', 'token_type_ids', 'tokens'],
        num_rows: 5100
    })
})

In [26]:
show_random_elements(tokenized_dataset["train"],seed=43,num_examples=2)

Unnamed: 0,attention_mask,id,input_ids,labels,ner_tags,nested_ner_tags,source,token_type_ids,tokens
0,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]",1263,"[102, 279, 5637, 251, 6949, 223, 12495, 106, 196, 1127, 30883, 216, 566, 103]","[-100, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, -100]","[O, O, O, O, O, O, B-LOC, O, O]","[O, O, O, O, O, O, O, O, O]",http://de.wikipedia.org/wiki/Khangchenne [2009-12-25],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]","[Er, hielt, sich, lieber, im, fernen, Ngari, auf, .]"
1,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]",9374,"[102, 6053, 2876, 180, 855, 7169, 260, 818, 128, 2017, 17643, 5335, 30007, 136, 386, 15898, 205, 8452, 566, 103]","[-100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -100]","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, O]",http://www.verbaende.com/News.php4?m=56415 [2008-09-25],"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]","[Daher, lässt, er, keine, Gelegenheit, aus, ,, die, Politik, wachzurütteln, und, zum, Handeln, zu, bewegen, .]"


## Fine-tuning the model with hyperparameter search

A label-ID mapping is created for constructing the model. This mapping is then stored in the model configuration. It allows the model to output predicted labels as text instead of their numeric encoding.

In [27]:
id2label = {}
label2id= {}
for i,l in enumerate(label_list):
    id2label[str(i)] = l
    label2id[l]=str(i)
print(id2label)
print(label2id)

{'0': 'O', '1': 'B-LOC', '2': 'I-LOC', '3': 'B-LOCderiv', '4': 'I-LOCderiv', '5': 'B-LOCpart', '6': 'I-LOCpart', '7': 'B-ORG', '8': 'I-ORG', '9': 'B-ORGderiv', '10': 'I-ORGderiv', '11': 'B-ORGpart', '12': 'I-ORGpart', '13': 'B-OTH', '14': 'I-OTH', '15': 'B-OTHderiv', '16': 'I-OTHderiv', '17': 'B-OTHpart', '18': 'I-OTHpart', '19': 'B-PER', '20': 'I-PER', '21': 'B-PERderiv', '22': 'I-PERderiv', '23': 'B-PERpart', '24': 'I-PERpart'}
{'O': '0', 'B-LOC': '1', 'I-LOC': '2', 'B-LOCderiv': '3', 'I-LOCderiv': '4', 'B-LOCpart': '5', 'I-LOCpart': '6', 'B-ORG': '7', 'I-ORG': '8', 'B-ORGderiv': '9', 'I-ORGderiv': '10', 'B-ORGpart': '11', 'I-ORGpart': '12', 'B-OTH': '13', 'I-OTH': '14', 'B-OTHderiv': '15', 'I-OTHderiv': '16', 'B-OTHpart': '17', 'I-OTHpart': '18', 'B-PER': '19', 'I-PER': '20', 'B-PERderiv': '21', 'I-PERderiv': '22', 'B-PERpart': '23', 'I-PERpart': '24'}


Creating and configuring the model with the label-ID mapping:
 - `AutoModelForTokenClassification.from_pretrained` automatically downloads the corresponding model and initializes a token classification head on top of it.
 - The warnings that appear only indicate that the head of the model was replaced and therefore has no trained weights yet.

On the assumption that better-chosen hyperparameters lead to stronger models, the following section uses automatic hyperparameter search to try to improve model performance further. <br>
The `optuna` package is used for this:

In [28]:
#!pip install optuna

In [29]:
from transformers import  AutoModelForTokenClassification, TrainingArguments, Trainer

A function is needed that re-initializes the model for every trial so that the individual trials do not influence each other:

In [30]:
def model_init():
    return AutoModelForTokenClassification.from_pretrained(
        model_checkpoint,
        num_labels=len(label_list),
        id2label=id2label,
        label2id=label2id,
    )

#### Preparing to create the `Trainer`:
First, the `TrainingArguments` have to be configured. These are the hyperparameters of the training run:
 - They include, for example, the number of epochs, the learning rate, the batch size and the weight decay.
 - `load_best_model_at_end` and `metric_for_best_model` ensure that the model with the highest F1 score is loaded at the end of training.

In [31]:
metric_name = 'eval_f1'
args = TrainingArguments(
    f"test-{task}_hyperparametersearch",
    evaluation_strategy = "epoch",
    # recommended default value for fine-tuning
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    # recommended default value for fine-tuning
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model=metric_name,
)
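These settings determine how many optimizer steps a run takes, which is useful for estimating how long a single trial of the search will run. The following back-of-the-envelope sketch assumes a single device, no gradient accumulation and a data loader that keeps the last incomplete batch; the function name is made up.

```python
import math

def optimizer_steps(num_rows, per_device_batch_size, num_epochs, num_devices=1):
    """Estimate steps per epoch and total steps (assumptions: see text above)."""
    steps_per_epoch = math.ceil(num_rows / (per_device_batch_size * num_devices))
    return steps_per_epoch, steps_per_epoch * num_epochs

# 24,000 training rows, batch size 32, 5 epochs as configured above
per_epoch, total = optimizer_steps(24000, 32, 5)
print(per_epoch, total)  # 750 3750
```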

The `DataCollator` assembles the examples into batches and adds the padding required for the inputs and labels. Each batch is padded to the length of its longest example.

In [32]:
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer)
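The padding step can be sketched in plain Python. The example below is an illustration of the idea, not the collator's code: it pads every feature to the longest example of the batch and fills the labels with -100 so that padded positions do not count towards the loss. The padding token ID 0 and the function name `pad_batch` are assumptions; the real collator also returns tensors rather than lists.

```python
def pad_batch(features, pad_token_id=0, label_pad_id=-100):
    """Pad input_ids, attention_mask and labels to the longest example in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["attention_mask"].append(f["attention_mask"] + [0] * n_pad)
        batch["labels"].append(f["labels"] + [label_pad_id] * n_pad)
    return batch

# Two toy examples of different length (token IDs are arbitrary)
features = [
    {"input_ids": [102, 279, 566, 103], "attention_mask": [1, 1, 1, 1], "labels": [-100, 0, 0, -100]},
    {"input_ids": [102, 6053, 2876, 180, 566, 103], "attention_mask": [1] * 6, "labels": [-100, 19, 20, 0, 0, -100]},
]
batch = pad_batch(features)
print(batch["input_ids"][0])
print(batch["labels"][0])
```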

For the desired metrics to be reported during training, a function that computes them has to be defined. This is the job of the `compute_metrics` function:
- The Datasets library makes it possible to download functions for computing metrics.
- `seqeval` is well suited to token classification tasks: https://github.com/chakki-works/seqeval

In [33]:
# Install seqeval if necessary
#!pip install seqeval

`load_metric` downloads the desired metric from the Datasets library:

In [34]:
metric = load_metric("seqeval")

#### Example of applying the metric

To compute the metrics, two lists have to be passed to the parameters `predictions` and `references`.

For this example, the label list for the `predictions` parameter is created by mapping the `ner_tags` to the entries of `label_list`:

In [35]:
labels = [label_list[i] for i in example[f"{task}_tags"]]

In [36]:
labels

['B-ORGpart',
 'B-PER',
 'I-PER',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O']

For the `references` parameter, the label list is modified slightly:

In [37]:
labels_ref = labels.copy()
labels_ref[-1] = 'B-ORGpart'

In [38]:
labels_ref

['B-ORGpart',
 'B-PER',
 'I-PER',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'O',
 'B-ORGpart']

The `compute` method of the metric object can then be used to compute the metric:

In [39]:
metric.compute(predictions=[labels], references=[labels_ref])

{'ORGpart': {'precision': 1.0,
  'recall': 0.5,
  'f1': 0.6666666666666666,
  'number': 2},
 'PER': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1},
 'overall_precision': 1.0,
 'overall_recall': 0.6666666666666666,
 'overall_f1': 0.8,
 'overall_accuracy': 0.9285714285714286}
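The overall numbers follow from comparing whole entities rather than single tokens. The sketch below reproduces them in plain Python for this example. It is a simplified reimplementation for illustration, not the code of `seqeval`: entities are extracted as `(type, start, end)` spans from the BIO tags, and an `I-` tag without a matching open entity is treated as the start of a new one. All function names are made up.

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from a list of BIO tags."""
    entities = set()
    start, current = None, None
    # The trailing "O" closes an entity that runs to the end of the sentence
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current
        if current is not None and not continues:
            entities.add((current, start, i - 1))
            start, current = None, None
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, etype
    return entities

def entity_prf(predictions, references):
    """Entity-level precision, recall and F1 for one sentence."""
    pred, ref = extract_entities(predictions), extract_entities(references)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

predicted = ["B-ORGpart", "B-PER", "I-PER"] + ["O"] * 11
reference = ["B-ORGpart", "B-PER", "I-PER"] + ["O"] * 10 + ["B-ORGpart"]
precision, recall, f1 = entity_prf(predicted, reference)
print(precision, recall, f1)
```

Both predicted entities are correct (precision 1.0), but only two of the three reference entities are found (recall 2/3), which gives an F1 of 0.8. The accuracy of 13/14 is instead computed per token.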

For this metric to be computed during training, it has to be wrapped in a function.
During evaluation, the `Trainer` passes this function a tuple of the `predictions` and the actual `labels`. The `predictions` are the logits of the classification head, so the index of the largest value along the last axis has to be selected to obtain the predicted class. The `compute_metrics` function takes care of this postprocessing and returns the computed metrics.

In [40]:
import numpy as np

def compute_metrics(p):
    # Unpack the tuple p
    predictions, labels = p
    # Select the class with the highest score for each token
    predictions = np.argmax(predictions, axis=2)

    # Ignore all positions labelled -100
    true_predictions = [
        [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    results = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }

- Here the per-category output is dropped and only the overall precision, recall, F1 and accuracy are returned.

The `Trainer` used for the hyperparameter search can then be created. It receives the `model_init` function, the hyperparameters, the training and evaluation datasets, the tokenizer, the data collator and the `compute_metrics` function.

In [41]:
trainer_hyper = Trainer(
    model_init = model_init,
    args=args,
    train_dataset =tokenized_dataset["train"],
    eval_dataset =tokenized_dataset["validation"],
    data_collator = data_collator,
    tokenizer = tokenizer,
    compute_metrics = compute_metrics
)

Some weights of the model checkpoint at deepset/gbert-base were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at deepset/gb

To specify which hyperparameters the search should tune, a hyperparameter space is defined.
  - In this example, values for the learning rate, the number of epochs and the batch size are searched.
  - The learning rate is sampled between 1e-7 and 1e-2 on a logarithmic scale, the number of epochs ranges from 1 to 5, and the batch size is one of 4, 8, 16, 32 or 64.

In [42]:
def my_hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-7, 1e-2, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 5),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [4, 8, 16, 32, 64]),
    }
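The `log=True` flag matters because the learning-rate range spans five orders of magnitude. Sampling on a logarithmic scale means that the logarithm of the value is drawn uniformly, so every decade is equally likely. The sketch below illustrates this with the standard library only; it is not Optuna's sampler (which by default also adapts its proposals to earlier trials), and the function name is hypothetical.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniformly distributed between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
low, high = 1e-7, 1e-2
samples = [sample_log_uniform(rng, low, high) for _ in range(10_000)]

# On a log scale the midpoint of the range is the geometric mean (about 3.2e-5)
geometric_mid = math.sqrt(low * high)
share_below_mid = sum(s < geometric_mid for s in samples) / len(samples)
print(round(share_below_mid, 2))
```

Roughly half of the samples fall below the geometric midpoint. With a linear scale, almost all samples would instead lie above 1e-3, and small learning rates would hardly ever be tried.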

For the hyperparameter search, it has to be specified how a run is scored.
The following function sums all metrics computed by the model (precision, recall, F1 and accuracy). The sum is the score for the evaluated epoch of the respective trial:

In [43]:
import copy
def my_objective(metrics) -> float:
    metrics = copy.deepcopy(metrics)
    # pop() removes the metrics that should not
    # be included in the score
    loss = metrics.pop("eval_loss", None)
    _ = metrics.pop("epoch", None)
    _ = metrics.pop("eval_runtime",None)
    _ = metrics.pop("eval_samples_per_second",None)
    return loss if len(metrics) == 0 else sum(metrics.values())
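A worked example shows what this score rewards. The sketch below applies an equivalent sum to the metrics of the last epoch of the first trial in the training log of this notebook, where precision, recall and F1 are 0 and accuracy is 0.856638. The dictionary keys assume the `eval_` prefix that the `Trainer` puts in front of the names returned by `compute_metrics`; `sum_objective` and `f1_objective` are hypothetical names, and the F1-only variant is an alternative for comparison, not part of the original notebook.

```python
def sum_objective(metrics):
    """Sum of all metrics except the loss and bookkeeping entries."""
    ignored = {"eval_loss", "epoch", "eval_runtime", "eval_samples_per_second"}
    kept = [value for key, value in metrics.items() if key not in ignored]
    return sum(kept) if kept else metrics.get("eval_loss")

def f1_objective(metrics):
    """Hypothetical alternative: score a trial by its entity-level F1 alone."""
    return metrics["eval_f1"]

# Epoch 4 of trial 0 as reported in the training log
trial_0 = {
    "eval_loss": 0.755096,
    "eval_precision": 0.0,
    "eval_recall": 0.0,
    "eval_f1": 0.0,
    "eval_accuracy": 0.856638,
    "eval_runtime": 3.2145,
    "eval_samples_per_second": 684.404,
    "epoch": 4.0,
}
print(sum_objective(trial_0))  # 0.856638
print(f1_objective(trial_0))   # 0.0
```

Because most tokens carry the tag `O`, a model that finds no entity at all still earns about 0.86 from accuracy alone. Scoring by F1 only, as in the second function, would rate such a trial at 0.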

The hyperparameter search can then be started with `hyperparameter_search` on the `Trainer` object.
- 25 trials are run.
- `direction="maximize"` ensures that the best trial is the one with the highest score on the evaluation metrics.
- The method returns the best trial together with the hyperparameters it used.

In [44]:
best_run = trainer_hyper.hyperparameter_search(
    n_trials=25,
    direction="maximize",
    hp_space=my_hp_space,
    compute_objective=my_objective,
)

[I 2021-03-15 17:54:29,335] A new study created in memory with name: no-name-33dcc524-b0a8-4737-99e8-150661333a4f
Some weights of the model checkpoint at deepset/gbert-base were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenc

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.667,0.763176,0.0,0.0,0.0,0.856638,3.4345,640.551
2,0.7196,0.754149,0.0,0.0,0.0,0.856638,3.2252,682.121
3,0.7353,0.755719,0.0,0.0,0.0,0.856638,3.2391,679.204
4,0.6982,0.755096,0.0,0.0,0.0,0.856638,3.2145,684.404


[I 2021-03-15 18:24:17,891] Trial 0 finished with value: 0.8566383136094674 and parameters: {'learning_rate': 0.0008426186969007761, 'num_train_epochs': 4, 'per_device_train_batch_size': 4}. Best is trial 0 with value: 0.8566383136094674.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.7287,0.75693,0.0,0.0,0.0,0.856638,3.2784,671.059
2,0.7335,0.763037,0.0,0.0,0.0,0.856638,3.2711,672.556
3,0.7344,0.749598,0.0,0.0,0.0,0.856638,3.2681,673.183


[I 2021-03-15 18:31:02,845] Trial 1 finished with value: 0.8566383136094674 and parameters: {'learning_rate': 0.0013129738805987784, 'num_train_epochs': 3, 'per_device_train_batch_size': 16}. Best is trial 0 with value: 0.8566383136094674.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.1235,0.127494,0.801386,0.825586,0.813306,0.965255,3.3198,662.694
2,0.0801,0.128967,0.806784,0.831442,0.818927,0.966365,3.3203,662.587
3,0.0378,0.151904,0.809425,0.855051,0.831613,0.968602,3.3075,665.15
4,0.0165,0.168313,0.823664,0.85743,0.840208,0.970044,3.296,667.482
5,0.0079,0.173456,0.824978,0.858346,0.841331,0.969841,3.4133,644.532


[I 2021-03-15 18:50:00,384] Trial 2 finished with value: 3.4944955841332046 and parameters: {'learning_rate': 6.439727896317605e-05, 'num_train_epochs': 5, 'per_device_train_batch_size': 8}. Best is trial 2 with value: 3.4944955841332046.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.112,0.104556,0.816589,0.861274,0.838336,0.96973,3.3025,666.17
2,0.0698,0.102818,0.835286,0.874268,0.854332,0.973021,3.3285,660.964


[I 2021-03-15 18:54:36,160] Trial 3 finished with value: 3.5369077433499796 and parameters: {'learning_rate': 2.1780149596884364e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16}. Best is trial 3 with value: 3.5369077433499796.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.1094,0.111921,0.818949,0.852672,0.83547,0.970303,3.3231,662.028
2,0.0691,0.113478,0.844071,0.855966,0.849977,0.971524,3.3213,662.387
3,0.0336,0.133086,0.834446,0.86896,0.851354,0.972319,3.4211,643.062
4,0.0154,0.157866,0.831704,0.871889,0.851322,0.971561,3.3118,664.3
5,0.009,0.161963,0.835298,0.868777,0.851709,0.97243,3.3129,664.064


[I 2021-03-15 19:13:27,901] Trial 4 finished with value: 3.528214424100584 and parameters: {'learning_rate': 2.923772580788808e-05, 'num_train_epochs': 5, 'per_device_train_batch_size': 8}. Best is trial 3 with value: 3.5369077433499796.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.5434,0.211821,0.639061,0.64257,0.64081,0.942382,3.3121,664.233


[I 2021-03-15 19:15:13,407] Trial 5 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,No log,0.235105,0.602752,0.617313,0.609946,0.938018,3.3249,661.667

[I 2021-03-15 19:16:43,981] Trial 6 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.7331,0.762599,0.0,0.0,0.0,0.856638,3.3447,657.763


[I 2021-03-15 19:18:55,542] Trial 7 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,No log,0.104935,0.84177,0.821742,0.831635,0.969434,3.3392,658.845
2,0.174900,0.101533,0.826449,0.871523,0.848388,0.971524,3.3176,663.136
3,0.066000,0.108212,0.826019,0.868045,0.846511,0.972078,3.3195,662.756
4,0.038800,0.11433,0.823672,0.871157,0.846749,0.971746,3.3367,659.338


[I 2021-03-15 19:24:55,627] Trial 8 finished with value: 3.5133232277943613 and parameters: {'learning_rate': 3.2215705490095944e-05, 'num_train_epochs': 4, 'per_device_train_batch_size': 64}. Best is trial 3 with value: 3.5369077433499796.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.0963,0.114137,0.828485,0.854868,0.84147,0.970414,3.4425,639.074
2,0.0712,0.116914,0.83479,0.873902,0.853898,0.972282,3.336,659.47
3,0.0474,0.130443,0.833421,0.872621,0.85257,0.972596,3.3197,662.709


[I 2021-03-15 19:47:17,691] Trial 9 finished with value: 3.53120808191204 and parameters: {'learning_rate': 1.0523471685582026e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 4}. Best is trial 3 with value: 3.5369077433499796.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,1.3419,1.240439,0.0,0.0,0.0,0.856472,3.3306,660.533


[I 2021-03-15 19:49:33,417] Trial 10 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.1385,0.141782,0.772227,0.783675,0.777909,0.961982,3.3001,666.642

[I 2021-03-15 19:57:04,178] Trial 11 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.1707,0.177468,0.732296,0.766471,0.748994,0.954567,3.3024,666.182

[I 2021-03-15 20:04:36,494] Trial 12 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.4598,0.412247,0.430605,0.310029,0.360502,0.898743,3.3069,665.276

[I 2021-03-15 20:06:52,755] Trial 13 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.0981,0.114977,0.814613,0.846816,0.830402,0.969471,3.3377,659.133

[I 2021-03-15 20:14:21,521] Trial 14 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.2118,0.116331,0.81069,0.829978,0.820221,0.966457,3.3243,661.8

[I 2021-03-15 20:16:07,005] Trial 15 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.0976,0.114056,0.828826,0.852489,0.840491,0.970063,3.3983,647.384
2,0.0768,0.117301,0.830349,0.874268,0.851743,0.972097,3.3175,663.159
3,0.0539,0.12715,0.832167,0.870242,0.850778,0.972559,3.3354,659.586


[I 2021-03-15 20:38:29,239] Trial 16 finished with value: 3.5257456757336314 and parameters: {'learning_rate': 8.439314060180965e-06, 'num_train_epochs': 3, 'per_device_train_batch_size': 4}. Best is trial 3 with value: 3.5369077433499796.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.4261,0.379645,0.457842,0.383602,0.417447,0.906379,3.3556,655.628


[I 2021-03-15 20:40:45,628] Trial 17 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.1276,0.114619,0.797086,0.831076,0.813726,0.967437,3.3227,662.109

[I 2021-03-15 20:43:02,089] Trial 18 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.6664,0.759869,0.0,0.0,0.0,0.856638,3.3308,660.505


[I 2021-03-15 20:50:35,044] Trial 19 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.2452,0.229905,0.610582,0.633602,0.621879,0.938961,3.2974,667.192

[I 2021-03-15 20:54:25,937] Trial 20 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.1073,0.111262,0.818903,0.849927,0.834127,0.969767,3.3193,662.782
2,0.067,0.110211,0.847424,0.864019,0.855641,0.972485,3.3285,660.951
3,0.0309,0.12868,0.835203,0.871889,0.853152,0.972596,3.318,663.059
4,0.0174,0.137855,0.835061,0.870059,0.8522,0.9723,3.4282,641.73


[I 2021-03-15 21:09:39,850] Trial 21 finished with value: 3.529619874046682 and parameters: {'learning_rate': 2.3743331068877387e-05, 'num_train_epochs': 4, 'per_device_train_batch_size': 8}. Best is trial 3 with value: 3.5369077433499796.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.1072,0.109901,0.821574,0.850293,0.835687,0.970081,3.3143,663.798
2,0.067,0.109003,0.844831,0.862921,0.85378,0.972504,3.3173,663.184
3,0.032,0.129621,0.835626,0.86713,0.851087,0.972559,3.3118,664.3
4,0.0165,0.138446,0.835122,0.868594,0.85153,0.972374,3.3081,665.032


[I 2021-03-15 21:25:00,385] Trial 22 finished with value: 3.527620550750897 and parameters: {'learning_rate': 2.403050728243794e-05, 'num_train_epochs': 4, 'per_device_train_batch_size': 8}. Best is trial 3 with value: 3.5369077433499796.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.1375,0.126179,0.783211,0.817899,0.800179,0.96409,3.565,617.104


[I 2021-03-15 21:28:54,912] Trial 23 pruned.

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.1183,0.112257,0.815363,0.837299,0.826185,0.968713,3.3155,663.548

[I 2021-03-15 21:32:46,242] Trial 24 pruned.


Output of the best trial:

In [45]:
best_run

BestRun(run_id='3', objective=3.5369077433499796, hyperparameters={'learning_rate': 2.1780149596884364e-05, 'num_train_epochs': 2, 'per_device_train_batch_size': 16})
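To put the result in context, the scores of the ten trials that ran to completion can be ranked. The values below are copied from the `Trial N finished with value` lines of the log. The dictionary `finished_trials` is a name introduced for this sketch and is not part of the notebook.

```python
# Final scores of all trials that were not pruned, keyed by trial number.
finished_trials = {
    0: 0.8566383136094674,
    1: 0.8566383136094674,
    2: 3.4944955841332046,
    3: 3.5369077433499796,
    4: 3.528214424100584,
    8: 3.5133232277943613,
    9: 3.53120808191204,
    16: 3.5257456757336314,
    21: 3.529619874046682,
    22: 3.527620550750897,
}

# direction="maximize": the trial with the highest score wins.
best_trial = max(finished_trials, key=finished_trials.get)
ranking = sorted(finished_trials.items(), key=lambda item: item[1], reverse=True)

print(best_trial, finished_trials[best_trial])
for trial_number, value in ranking[:3]:
    print(f"trial {trial_number}: {value:.4f}")
```

The top trials lie within about 0.01 of each other, so the margin by which trial 3 wins is small compared with the gap to the two runs that used a learning rate near 1e-3.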

Once `best_run` has been determined, a new `Trainer` can be created:

In [46]:
trainer_hyper = Trainer(
    model_init=model_init,
    args=args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

The hyperparameters of `best_run` are transferred to the trainer with the following loop:

In [47]:
for n, v in best_run.hyperparameters.items():
    setattr(trainer_hyper.args, n, v)
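The same transfer pattern can be tried out without a trainer. The following is a minimal sketch in which a `SimpleNamespace` stands in for `trainer_hyper.args`; the starting values for `learning_rate` and `num_train_epochs` are assumptions chosen for illustration, only the values in `best_hyperparameters` are taken from the `best_run` output above. The additional `hasattr` check is not part of the original loop; it is a suggestion to catch misspelled parameter names, which `setattr` alone would silently accept.

```python
from types import SimpleNamespace

# Stand-in for trainer_hyper.args (starting values are illustrative assumptions)
args_stub = SimpleNamespace(
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=32,
)

# Hyperparameters as reported by best_run above
best_hyperparameters = {
    "learning_rate": 2.1780149596884364e-05,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 16,
}

for name, value in best_hyperparameters.items():
    # Guard against typos: setattr would otherwise create a new, unused attribute
    if not hasattr(args_stub, name):
        raise AttributeError(f"unknown training argument: {name}")
    setattr(args_stub, name, value)

print(args_stub)
```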

Printing the `TrainingArguments` as a check:

In [48]:
trainer_hyper.args

TrainingArguments(output_dir=test-ner_hyperparametersearch, overwrite_output_dir=False, do_train=False, do_eval=None, do_predict=False, evaluation_strategy=EvaluationStrategy.EPOCH, prediction_loss_only=False, per_device_train_batch_size=16, per_device_eval_batch_size=32, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=2.1780149596884364e-05, weight_decay=0.01, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=2, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_steps=0, logging_dir=runs/Mar15_17-54-07_fastai3, logging_first_step=False, logging_steps=500, save_steps=500, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level=O1, fp16_backend=auto, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=test-ner_hyperparametersearch, disable_tqdm=False, remove_unused_columns=True, l

The model can then be trained with the hyperparameters found by Optuna:

In [49]:
trainer_hyper.train()

Epoch,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
1,0.112,0.104556,0.816589,0.861274,0.838336,0.96973,3.3412,658.453
2,0.0698,0.102818,0.835286,0.874268,0.854332,0.973021,3.4592,635.987




TrainOutput(global_step=3000, training_loss=0.11994199879964193, metrics={'train_runtime': 271.7675, 'train_samples_per_second': 11.039, 'total_flos': 1548978807764256, 'epoch': 2.0})
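The `objective` value of about 3.5369 reported for `best_run` can be related to the table above. Assuming the search used the default objective of the `Trainer` (which, to the best of our knowledge, sums all reported evaluation metrics except the loss when such metrics are available), the objective should equal precision + recall + F1 + accuracy. A short check with the rounded values from the epoch 2 row:

```python
# Validation metrics of epoch 2, copied from the table above (rounded values)
epoch_2 = {
    "precision": 0.835286,
    "recall": 0.874268,
    "f1": 0.854332,
    "accuracy": 0.973021,
}

# Assumed objective: plain sum of the four metrics
objective = sum(epoch_2.values())
print(round(objective, 4))  # 3.5369
```

The sum agrees with the objective of the best trial up to rounding, which suggests that the retrained model reproduces that trial.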

Evaluating the model on unseen test data:

In [51]:
trainer_hyper.evaluate(tokenized_dataset["test"])



{'eval_loss': 0.10938520729541779,
 'eval_precision': 0.8293180569724554,
 'eval_recall': 0.8430121250797703,
 'eval_f1': 0.8361090232999722,
 'eval_accuracy': 0.9718215471023742,
 'eval_runtime': 8.8705,
 'eval_samples_per_second': 574.942,
 'epoch': 2.0}
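As a rough sanity check, the number of evaluated sentences can be recovered from the reported runtime and throughput. The sketch below only multiplies the two values printed above; the resulting sizes are derived estimates, not values read from the dataset itself.

```python
# (runtime in seconds, samples per second) as reported in the outputs above
eval_runs = {
    "validation": (3.4592, 635.987),  # epoch 2 row of the training table
    "test": (8.8705, 574.942),        # trainer_hyper.evaluate on the test split
}

# samples = runtime * samples per second, rounded to whole sentences
sizes = {name: round(runtime * rate) for name, (runtime, rate) in eval_runs.items()}
print(sizes)  # {'validation': 2200, 'test': 5100}
```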

Next, precision, recall, F1 and accuracy can again be computed per category using the `predict` method. For this purpose, the following function `evaluate_all_categories` is defined. It takes a tokenized dataset as its input parameter and computes the metric for every category it contains.

In [52]:
def evaluate_all_categories(tokenized_dataset):
    # predict returns the raw model outputs, the gold label ids and the metrics
    predictions, labels, _ = trainer_hyper.predict(tokenized_dataset)
    # Pick the highest-scoring label id for every token position
    predictions = np.argmax(predictions, axis=2)

    # Remove ignored index (special tokens)
    true_predictions = [
        [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for l in label if l != -100]
        for label in labels
    ]

    # Compute the metric on the label strings
    return metric.compute(predictions=true_predictions, references=true_labels)
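The filtering step inside the function can be illustrated with toy data. In this sketch, plain Python lists stand in for the NumPy arrays returned by `predict`, and the three-entry `label_list` as well as the scores are hypothetical values chosen for illustration.

```python
# Hypothetical label inventory, for illustration only
label_list = ["O", "B-PER", "B-LOC"]

# Toy model outputs: 1 sentence, 4 token positions, 3 classes
logits = [[
    [2.0, 0.1, 0.3],  # special token, ignored during evaluation
    [0.2, 3.1, 0.1],  # "Laschet"
    [1.5, 0.2, 0.9],  # "in"
    [0.1, 0.4, 2.2],  # "Mainz"
]]
# Gold label ids; -100 marks positions that must not be scored
labels = [[-100, 1, 0, 2]]

def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)

# Equivalent of np.argmax(predictions, axis=2) for nested lists
pred_ids = [[argmax(token) for token in sentence] for sentence in logits]

# Drop ignored positions and map ids to label strings
true_predictions = [
    [label_list[p] for p, l in zip(sent_pred, sent_lab) if l != -100]
    for sent_pred, sent_lab in zip(pred_ids, labels)
]
true_labels = [[label_list[l] for l in sent_lab if l != -100] for sent_lab in labels]

print(true_predictions)  # [['B-PER', 'O', 'B-LOC']]
print(true_labels)       # [['B-PER', 'O', 'B-LOC']]
```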

In [53]:
evaluate_all_categories(tokenized_dataset["test"])

{'LOC': {'precision': 0.8846373500856654,
  'recall': 0.922847780756628,
  'f1': 0.9033386791077417,
  'number': 3357},
 'LOCderiv': {'precision': 0.8143115942028986,
  'recall': 0.9423480083857443,
  'f1': 0.8736637512147716,
  'number': 954},
 'LOCpart': {'precision': 0.7808219178082192,
  'recall': 0.6195652173913043,
  'f1': 0.6909090909090909,
  'number': 368},
 'ORG': {'precision': 0.7830866807610993,
  'recall': 0.7924689773213521,
  'f1': 0.7877498936622713,
  'number': 2337},
 'ORGderiv': {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'number': 19},
 'ORGpart': {'precision': 0.7334963325183375,
  'recall': 0.8207934336525308,
  'f1': 0.7746933505487411,
  'number': 731},
 'OTH': {'precision': 0.6712856043110085,
  'recall': 0.6855345911949685,
  'f1': 0.6783352781019059,
  'number': 1272},
 'OTHderiv': {'precision': 0.5,
  'recall': 0.6,
  'f1': 0.5454545454545454,
  'number': 75},
 'OTHpart': {'precision': 0.5078125,
  'recall': 0.3651685393258427,
  'f1': 0.42483660130718953,


The trained model performs particularly well on the main categories `LOC` and `PER`. The other two main categories, `ORG` and `OTH`, follow with F1 scores of about 0.79 and 0.68. This matches expectations, since these categories also had the most training data. The `deriv` classes score lower in some cases: `ORGderiv` reaches an F1 of 0.0 on only 19 test instances and `OTHderiv` about 0.55 on 75, whereas `LOCderiv` still reaches about 0.87.
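The F1 values in the output are the harmonic mean of precision and recall. The following sketch recomputes F1 for three categories from the precision and recall values printed above; returning 0.0 when both inputs are 0 is an assumption about how the undefined case is handled, chosen to match the `ORGderiv` row.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are 0 (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the output above
reported = {
    "LOC": (0.8846373500856654, 0.922847780756628, 0.9033386791077417),
    "OTHderiv": (0.5, 0.6, 0.5454545454545454),
    "ORGderiv": (0.0, 0.0, 0.0),
}

for name, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed={recomputed:.4f}, reported={f1:.4f}")
```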

#### Saving the trained model

In [54]:
trainer_hyper.save_model('models/deepset_finetuned/ner_deepset_hyper')

In [55]:
tokenizer.save_pretrained('models/deepset_finetuned/ner_deepset_hyper')

('models/deepset_finetuned/ner_deepset_hyper/tokenizer_config.json',
 'models/deepset_finetuned/ner_deepset_hyper/special_tokens_map.json',
 'models/deepset_finetuned/ner_deepset_hyper/vocab.txt',
 'models/deepset_finetuned/ner_deepset_hyper/added_tokens.json')

## Example application of the model

The simplest way to apply a Transformers model is the `pipeline` function. It automatically builds a pipeline that fits the given task and model. The pipeline takes care of all steps required to turn a string input into a model prediction:

In [56]:
from transformers import pipeline

In [57]:
ner = pipeline("ner", model='./models/deepset_finetuned/ner_deepset_hyper')

Some weights of BertModel were not initialized from the model checkpoint at ./models/deepset_finetuned/ner_deepset_hyper and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


The string to be processed can then be passed to the `ner` object to obtain a prediction of the entities it contains:

In [58]:
ner('''In der CDU liegen die Nerven blank. Parteichef Laschet und andere führende Christdemokraten sprechen
    nach den Niederlagen in Mainz und Stuttgart von einem Warnschuss – und attackieren die SPD. ''')

[{'word': 'CDU',
  'score': 0.9922722578048706,
  'entity': 'B-ORG',
  'index': 3,
  'start': 7,
  'end': 10},
 {'word': 'Las',
  'score': 0.9973433017730713,
  'entity': 'B-PER',
  'index': 12,
  'start': 47,
  'end': 50},
 {'word': '##che',
  'score': 0.9965707063674927,
  'entity': 'B-PER',
  'index': 13,
  'start': 50,
  'end': 53},
 {'word': '##t',
  'score': 0.9962957501411438,
  'entity': 'B-PER',
  'index': 14,
  'start': 53,
  'end': 54},
 {'word': 'Mainz',
  'score': 0.9975157380104065,
  'entity': 'B-LOC',
  'index': 25,
  'start': 129,
  'end': 134},
 {'word': 'Stuttgart',
  'score': 0.9974929690361023,
  'entity': 'B-LOC',
  'index': 27,
  'start': 139,
  'end': 148},
 {'word': 'SPD',
  'score': 0.9899235367774963,
  'entity': 'B-ORG',
  'index': 37,
  'start': 192,
  'end': 195}]

For the input above, the model identified all entity tokens correctly.
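The output lists WordPiece tokens individually, so `Laschet` appears as `Las`, `##che` and `##t`. The following is a minimal sketch of how such pieces could be merged back into whole words in a post-processing step. The helper `merge_wordpieces` is hypothetical and not part of the Transformers library; the sample entries are copied from the output above with rounded scores. Depending on the installed Transformers version, the pipeline may also offer a built-in grouping option (for example `grouped_entities` or `aggregation_strategy`), which is not used here.

```python
def merge_wordpieces(entities):
    """Merge '##' continuation pieces into the preceding entry (hypothetical helper)."""
    merged = []
    for ent in entities:
        is_continuation = (
            bool(merged)
            and ent["word"].startswith("##")
            and ent["start"] == merged[-1]["end"]
        )
        if is_continuation:
            prev = merged[-1]
            prev["word"] += ent["word"][2:]                   # strip the '##' marker
            prev["end"] = ent["end"]                          # extend the character span
            prev["score"] = min(prev["score"], ent["score"])  # keep the weakest score
        else:
            merged.append(dict(ent))                          # copy, leave the input untouched
    return merged

# Entries taken from the pipeline output above (scores rounded)
sample = [
    {"word": "CDU", "score": 0.9923, "entity": "B-ORG", "start": 7, "end": 10},
    {"word": "Las", "score": 0.9973, "entity": "B-PER", "start": 47, "end": 50},
    {"word": "##che", "score": 0.9966, "entity": "B-PER", "start": 50, "end": 53},
    {"word": "##t", "score": 0.9963, "entity": "B-PER", "start": 53, "end": 54},
    {"word": "Mainz", "score": 0.9975, "entity": "B-LOC", "start": 129, "end": 134},
]

for ent in merge_wordpieces(sample):
    print(ent["word"], ent["entity"], ent["start"], ent["end"])
```

Taking the minimum score is one conservative design choice; averaging the piece scores would be an equally reasonable alternative.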

In [59]:
ner('''Die Europäische Union hat ein Verfahren gegen Großbritannien wegen Verletzung des EU-Austrittsvertrags
    eingeleitet. Dies teilte die EU-Kommission am Montag in Brüssel mit. ''')

[{'word': 'Europäische',
  'score': 0.989363431930542,
  'entity': 'B-ORG',
  'index': 2,
  'start': 4,
  'end': 15},
 {'word': 'Union',
  'score': 0.9924749732017517,
  'entity': 'I-ORG',
  'index': 3,
  'start': 16,
  'end': 21},
 {'word': 'Großbritannien',
  'score': 0.9966283440589905,
  'entity': 'B-LOC',
  'index': 8,
  'start': 46,
  'end': 60},
 {'word': 'EU',
  'score': 0.9872016310691833,
  'entity': 'B-ORGpart',
  'index': 12,
  'start': 82,
  'end': 84},
 {'word': '-',
  'score': 0.9843431115150452,
  'entity': 'B-ORGpart',
  'index': 13,
  'start': 84,
  'end': 85},
 {'word': 'Aust',
  'score': 0.9803022146224976,
  'entity': 'B-ORGpart',
  'index': 14,
  'start': 85,
  'end': 89},
 {'word': '##ritts',
  'score': 0.9775412082672119,
  'entity': 'B-ORGpart',
  'index': 15,
  'start': 89,
  'end': 94},
 {'word': '##vertrags',
  'score': 0.9782739877700806,
  'entity': 'B-ORGpart',
  'index': 16,
  'start': 94,
  'end': 102},
 {'word': 'EU',
  'score': 0.7618938684463501,
  '
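The `start` and `end` offsets in this output refer to character positions in the input string, so complete entity mentions can be recovered by slicing the original text. The sketch below groups pieces that either touch each other or carry an `I-` tag of the same type. The helper `to_spans` is hypothetical, and the grouping rule is an assumption made for illustration; the tags and offsets are copied from the output above, and only the first line of the input is needed.

```python
text = ("Die Europäische Union hat ein Verfahren gegen Großbritannien "
        "wegen Verletzung des EU-Austrittsvertrags")

# (tag, start, end) copied from the pipeline output above
pieces = [
    ("B-ORG", 4, 15), ("I-ORG", 16, 21), ("B-LOC", 46, 60),
    ("B-ORGpart", 82, 84), ("B-ORGpart", 84, 85), ("B-ORGpart", 85, 89),
    ("B-ORGpart", 89, 94), ("B-ORGpart", 94, 102),
]

def to_spans(pieces):
    """Group pieces of the same type into (type, start, end) spans (hypothetical helper)."""
    spans = []
    for tag, start, end in pieces:
        kind = tag.split("-", 1)[1]  # 'B-ORGpart' -> 'ORGpart'
        if spans:
            prev_kind, prev_start, prev_end = spans[-1]
            touches = start == prev_end       # sub-word piece directly attached
            continues = tag.startswith("I-")  # explicit continuation tag
            if kind == prev_kind and (touches or continues):
                spans[-1] = (prev_kind, prev_start, end)
                continue
        spans.append((kind, start, end))
    return spans

entities = [(text[start:end], kind) for kind, start, end in to_spans(pieces)]
print(entities)
# [('Europäische Union', 'ORG'), ('Großbritannien', 'LOC'), ('EU-Austrittsvertrags', 'ORGpart')]
```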

In [60]:
ner('''Das sogenannte Nordirland-Protokoll im Austrittsvertrag sieht vor,
    dass einige Regeln des EU-Binnenmarkts für Nordirland weiter gelten.
    Dies soll Kontrollen an der Landgrenze zum EU-Staat Irland auf der gemeinsamen
    Insel überflüssig machen. Da Waren dennoch kontrolliert werden müssen, um EU-Standards zu wahren,
    wurden die Kontrollen auf Häfen an der Irischen See zwischen Nordirland und dem übrigen
    Großbritannien verschoben. So wurde das Problem zwischen Großbritannien und der
    EU – und insbesondere der europäischen Republik Irland – zu einem innerbritischen Problem.''')

[{'word': 'Nordirland',
  'score': 0.7542711496353149,
  'entity': 'B-LOCpart',
  'index': 3,
  'start': 15,
  'end': 25},
 {'word': '-',
  'score': 0.8626562356948853,
  'entity': 'B-LOCpart',
  'index': 4,
  'start': 25,
  'end': 26},
 {'word': 'Protokoll',
  'score': 0.8470491170883179,
  'entity': 'B-LOCpart',
  'index': 5,
  'start': 26,
  'end': 35},
 {'word': 'EU',
  'score': 0.986542820930481,
  'entity': 'B-ORGpart',
  'index': 17,
  'start': 94,
  'end': 96},
 {'word': '-',
  'score': 0.983571469783783,
  'entity': 'B-ORGpart',
  'index': 18,
  'start': 96,
  'end': 97},
 {'word': 'Binnenmarkt',
  'score': 0.9791744351387024,
  'entity': 'B-ORGpart',
  'index': 19,
  'start': 97,
  'end': 108},
 {'word': '##s',
  'score': 0.9741042852401733,
  'entity': 'B-ORGpart',
  'index': 20,
  'start': 108,
  'end': 109},
 {'word': 'Nordirland',
  'score': 0.996210515499115,
  'entity': 'B-LOC',
  'index': 22,
  'start': 114,
  'end': 124},
 {'word': 'EU',
  'score': 0.9874661564826965,

In [61]:
ner('''@_A_K_K_ @CDU @jensspahn @_FriedrichMerz Die CDU muss für den packt und die betrügerische
    Wahl karrenbauer muss absteigen auf 12% Überall sind die Regierungen gegen den packt Deutschland
    wird von irren in den Abgrund gerissen.''')

[{'word': 'CDU',
  'score': 0.9686644077301025,
  'entity': 'B-ORG',
  'index': 10,
  'start': 10,
  'end': 13},
 {'word': 'jens',
  'score': 0.7816832065582275,
  'entity': 'B-PER',
  'index': 12,
  'start': 15,
  'end': 19},
 {'word': '##sp',
  'score': 0.7370290160179138,
  'entity': 'B-PER',
  'index': 13,
  'start': 19,
  'end': 21},
 {'word': '##ahn',
  'score': 0.7971813678741455,
  'entity': 'B-PER',
  'index': 14,
  'start': 21,
  'end': 24},
 {'word': '@',
  'score': 0.7660948038101196,
  'entity': 'B-PER',
  'index': 15,
  'start': 25,
  'end': 26},
 {'word': '_',
  'score': 0.7780709266662598,
  'entity': 'B-PER',
  'index': 16,
  'start': 26,
  'end': 27},
 {'word': 'Friedrich',
  'score': 0.9596119523048401,
  'entity': 'B-PER',
  'index': 17,
  'start': 27,
  'end': 36},
 {'word': '##Mer',
  'score': 0.9146791696548462,
  'entity': 'B-PER',
  'index': 18,
  'start': 36,
  'end': 39},
 {'word': '##z',
  'score': 0.907802939414978,
  'entity': 'B-PER',
  'index': 19,
  'st

As the input above shows, the model also recognizes person entities in tweets, for example @jensspahn and @_FriedrichMerz.