# Binder for custom-ner-de

This Binder lets you train, evaluate, and apply a custom named entity recognition (NER) model using spaCy. The only prerequisite is that you know how to annotate documents in Transkribus and how to export them. That's it!

## Train a custom NER model using spaCy with German texts annotated in Transkribus

Enter the URL of an annotated PAGE XML ZIP file exported from Transkribus (for example, a public link to a file on SWITCHdrive):

In [12]:
ZIP_URL = "https://drive.switch.ch/index.php/s/nIF5agoktDJP3li/download"  # change this sample URL to your own URL
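
Before training, it can be useful to confirm that a downloaded export actually contains XML files. The following is a minimal sketch using only the Python standard library; the helper name `list_page_xml_files` and the archive layout in the demonstration are hypothetical and are not part of `custom_ner_de`.

```python
import io
import zipfile


def list_page_xml_files(zip_bytes: bytes) -> list[str]:
    """Return the sorted names of all .xml members in a ZIP archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".xml")
        )


# Build a tiny in-memory archive that stands in for a Transkribus export.
# The member names below are made up for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("doc/page/0001.xml", "<PcGts/>")
    archive.writestr("doc/page/0002.xml", "<PcGts/>")
    archive.writestr("doc/0001.jpg", b"")

xml_files = list_page_xml_files(buffer.getvalue())
print(xml_files)
```

For a real export, the bytes of the file downloaded from `ZIP_URL` would be passed in instead of the in-memory buffer.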

Optional inputs:

In [13]:
WORD_REMOVE = None  # list of words to be removed from the list of entities (false positives), e.g., WORD_REMOVE = ["Händeklatschen", "Salpeter"]
PERSON_NAMES = None  # list of persons to be added to the model, e.g., PERSON_NAMES = ["Max Mustermann", "Ada Lovelace"]
LOCATION_NAMES = None  # list of locations to be added to the model, e.g., LOCATION_NAMES = ["Basel", "Mittelerde"]

Train the model (this can take some time):

In [14]:
from custom_ner_de.client import Client
my_client = Client()
my_client.train_model(zip_url=ZIP_URL,
                      word_remove=WORD_REMOVE,
                      person_names=PERSON_NAMES,
                      location_names=LOCATION_NAMES,
                      epochs=100)

Downloading Zip file... done.
Loading gold standard... done.
Splitting gold standard into training and validation... done (training: 1531, validation: 171).
Training 100 epochs... 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 
Training completed, saved model to C:\Users\HINDER~1\AppData\Local\Temp\tmphd85it9j.
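
The log reports 1531 training and 171 validation examples, which is consistent with a 90/10 split of 1702 examples. The sketch below shows how such a split could be produced; the 90/10 ratio is inferred from the counts, and the function `split_gold_standard` is a hypothetical illustration, not the implementation used by `custom_ner_de`.

```python
import random


def split_gold_standard(examples, train_fraction=0.9, seed=0):
    """Shuffle the examples and split them into training and validation lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Everything before the cut is used for training, the rest for validation.
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 1702 placeholder examples, matching the total reported in the log.
training, validation = split_gold_standard(range(1702))
print(len(training), len(validation))  # 1531 171
```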


Save the model to the `/custom-ner-de/user_output/models/` directory:

In [15]:
my_client.save_model()

Saved model to C:\Users\hinder0000\PycharmProjects\custom-ner-de/user_output/models/2022-06-27-13-51-21.


If you want to keep the model, you must download it from this directory to your local machine (Binder will reset after you close your browser).
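
A model is saved as a directory, so packing it into a single ZIP file can make the download easier. This is a minimal sketch using `shutil.make_archive` from the standard library; the helper `zip_directory`, the placeholder file, and the temporary paths are assumptions for illustration, and the real model directory under `user_output/models/` would be passed in instead.

```python
import shutil
import tempfile
from pathlib import Path


def zip_directory(directory: str, archive_base: str) -> str:
    """Pack `directory` into `<archive_base>.zip` and return the archive path."""
    return shutil.make_archive(archive_base, "zip", root_dir=directory)


# Demonstration with a throwaway directory standing in for a saved model.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "2022-06-27-13-51-21"
    model_dir.mkdir()
    (model_dir / "meta.json").write_text("{}")  # placeholder file
    archive_path = zip_directory(str(model_dir), str(Path(tmp) / "model"))
    archive_exists = Path(archive_path).is_file()

print(archive_exists)
```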


## Evaluate a custom NER model

Optional inputs:

In [16]:
MODEL_PATH = None  # complete path to custom NER model directory (if no input is provided, the previously trained model is loaded)

Evaluate model:

In [17]:
my_client.evaluate_model(model_path=MODEL_PATH)

Loading custom NER model... C:\Users\HINDER~1\AppData\Local\Temp\tmphd85it9j loaded.
Evaluation of C:\Users\HINDER~1\AppData\Local\Temp\tmphd85it9j completed.
-- Model scores:
Precision:  0.8282828282828283
Recall:  0.7909967845659164
F1 Score:  0.8092105263157895
-- Person entity scores:
Precision:  0.8271604938271605
Recall:  0.7570621468926554
F1 Score:  0.7905604719764012
-- Location entity scores:
Precision:  0.8296296296296296
Recall:  0.835820895522388
F1 Score:  0.8327137546468402
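
The F1 score is the harmonic mean of precision and recall. As a worked check, the sketch below recomputes the three F1 values from the precision and recall figures printed above; the function `f1_score` is written here for illustration and is not taken from `custom_ner_de`.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the evaluation output above.
scores = {
    "model": (0.8282828282828283, 0.7909967845659164),
    "person": (0.8271604938271605, 0.7570621468926554),
    "location": (0.8296296296296296, 0.835820895522388),
}
for name, (precision, recall) in scores.items():
    print(f"{name}: {f1_score(precision, recall):.4f}")
```

The printed values (0.8092, 0.7906, 0.8327) agree with the reported F1 scores.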


## Apply the custom NER model to new German texts transcribed in Transkribus

Enter the URL of a plain text file (for example, a public link to a file on SWITCHdrive) to which the custom NER model should be applied:

In [18]:
TEXT_URL = "https://drive.switch.ch/index.php/s/4eBBIImulcOfMf7/download"  # change this sample URL to your own URL

Optional inputs:

In [19]:
MODEL_PATH = None  # complete path to custom NER model directory (if no input is provided, the previously trained model is loaded)

Apply the model to the text:

In [20]:
my_client.apply_model(text_url=TEXT_URL,
                      model_path=MODEL_PATH)

Downloading TXT file... done.
Loading custom NER model... C:\Users\HINDER~1\AppData\Local\Temp\tmphd85it9j loaded.
Applying custom NER model to TXT file... done.


Display the result:

In [21]:
my_client.result

Unnamed: 0,text,persons,locations
0,Stenographisches Protokoll,[Stenographisches Protokoll],[]
1,der,[],[]
2,Verhandlungen,[],[]
3,des,[],[]
4,,[],[]
...,...,...,...
16564,,[],[]
16565,,[],[]
16566,,[],[]
16567,,[],[]


Save the result to the `/custom-ner-de/user_output/results/` directory:

In [22]:
my_client.save_result2csv()

Saved result to C:\Users\hinder0000\PycharmProjects\custom-ner-de/user_output/results/2022-06-27-13-52-03.csv.


If you want to keep the result, you must download it from this directory to your local machine (Binder will reset after you close your browser).
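
Most rows in the result contain no entities, so after downloading the CSV it may help to keep only the rows where something was found. The sketch below uses the standard `csv` module on a small inline sample; the sample text, the way the list columns are serialized (`[]` for an empty list), and the helper `rows_with_entities` are assumptions, so the saved file should be checked before relying on this.

```python
import csv
import io

# Stand-in for the saved CSV; the serialization of the list columns is an assumption.
SAMPLE_CSV = """,text,persons,locations
0,Stenographisches Protokoll,['Stenographisches Protokoll'],[]
1,der,[],[]
2,Verhandlungen,[],[]
"""


def rows_with_entities(csv_text: str) -> list[dict]:
    """Return the rows whose persons or locations column is not an empty list."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["persons"] != "[]" or row["locations"] != "[]"
    ]


hits = rows_with_entities(SAMPLE_CSV)
for row in hits:
    print(row["text"], row["persons"], row["locations"])
```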