# Binder for custom-ner-de

This binder lets you train, evaluate, and apply a custom NER model using spaCy. The only prerequisite is that you know how to annotate documents in Transkribus and export them. That's it!

## Train a custom NER model using spaCy with German texts annotated in Transkribus

Enter the URL of a Zip file of annotated PAGE XML documents exported from Transkribus (for example, a public link to a file on SWITCHdrive):

In [1]:
ZIP_URL = "https://drive.switch.ch/index.php/s/nIF5agoktDJP3li/download"  # change this sample URL to your own URL
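
Before training, it can help to confirm that the link really returns a Zip archive containing XML files. The following is a minimal sketch using only the Python standard library. The helper name `list_xml_members` is hypothetical and not part of `custom_ner_de`, and the sketch assumes that the PAGE XML files in the export end in `.xml`.

```python
import io
import zipfile
from typing import List


def list_xml_members(zip_bytes: bytes) -> List[str]:
    """Return the names of all .xml files inside a Zip archive given as bytes."""
    # ZipFile raises zipfile.BadZipFile if the bytes are not a valid archive
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".xml")]


# Usage with the URL defined above (requires network access):
# from urllib.request import urlopen
# with urlopen(ZIP_URL) as response:
#     print(len(list_xml_members(response.read())), "XML files found")
```

If the list comes back empty, the link most likely points to a web page or a different export format rather than to the Zip file itself.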

Optional inputs:

In [2]:
WORD_REMOVE = None  # list of words to be removed from the list of entities (false positives), e.g., WORD_REMOVE = ["Händeklatschen", "Salpeter"]
PERSON_NAMES = None  # list of persons to be added to the model, e.g., PERSON_NAMES = ["Max Mustermann", "Ada Lovelace"]
LOCATION_NAMES = None  # list of locations to be added to the model, e.g., LOCATION_NAMES = ["Basel", "Mittelerde"]

Train the model (this can take some time):

In [3]:
from custom_ner_de.client import Client
my_client = Client()
my_client.train_model(zip_url=ZIP_URL,
                      word_remove=WORD_REMOVE,
                      person_names=PERSON_NAMES,
                      location_names=LOCATION_NAMES,
                      epochs=1)

Downloading Zip file... done.
Loading gold standard... done.
Splitting gold standard into training and validation... done (training: 1531, validation: 171).
Training 1 epochs... 1 
Training completed, saved model to C:\Users\HINDER~1\AppData\Local\Temp\tmpbpuj73ex.
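
The log reports 1531 training and 171 validation examples, which is roughly a 90/10 split. How `custom_ner_de` performs the split internally is not shown here. The sketch below only illustrates one common approach: shuffle with a fixed seed and hold out a fraction. The function name `split_gold_standard`, the 10% fraction, and the rounding rule are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_gold_standard(
    examples: Sequence[T], validation_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the examples reproducibly and return (training, validation)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_validation = math.ceil(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


training, validation = split_gold_standard(range(1702))
print(len(training), len(validation))
```

With 1702 examples and the validation size rounded up, this produces the same counts as the log, but that match does not confirm that the package uses the same procedure.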


Save the model to the `/custom-ner-de/user_output/models/` directory:

In [4]:
my_client.save_model()

Saved model to C:\Users\hinder0000\PycharmProjects\custom-ner-de/user_output/models/2022-06-27-11-46-34.


If you want to keep the model, you must download it from this directory to your local machine (Binder will reset after you close your browser).
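
Because the saved model is a directory rather than a single file, packing it into one Zip archive first can make the download easier. A minimal sketch using `shutil.make_archive` from the standard library follows. The helper name `zip_directory` is hypothetical, and the path in the usage comment is a placeholder for the directory reported by `save_model()`.

```python
import shutil
from pathlib import Path


def zip_directory(directory: str) -> str:
    """Create <directory>.zip next to the directory and return the archive path."""
    source = Path(directory).resolve()
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    # base_name is passed without an extension; make_archive appends ".zip"
    return shutil.make_archive(str(source), "zip", root_dir=str(source))


# Usage (placeholder path, replace with the directory printed by save_model()):
# zip_directory("user_output/models/<timestamp>")
```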


## Evaluate a custom NER model

Optional inputs:

In [5]:
MODEL_PATH = None  # complete path to custom NER model directory (if no input is provided, the previously trained model is loaded)

Evaluate model:

In [6]:
my_client.evaluate_model(model_path=MODEL_PATH)

Loading custom NER model... C:\Users\HINDER~1\AppData\Local\Temp\tmpbpuj73ex loaded.
Evaluation of C:\Users\HINDER~1\AppData\Local\Temp\tmpbpuj73ex completed.
-- Model scores:
Precision:  0.6191950464396285
Recall:  0.6920415224913494
F1 Score:  0.6535947712418301
-- Person entity scores:
Precision:  0.7777777777777778
Recall:  0.625
F1 Score:  0.6930693069306931
-- Location entity scores:
Precision:  0.5053191489361702
Recall:  0.7851239669421488
F1 Score:  0.6148867313915858
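
The F1 score is the harmonic mean of precision and recall. As a quick check, the sketch below recomputes it from the precision and recall values in the output above. The function name `f1_score` is illustrative and not taken from `custom_ner_de`.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Model-level precision and recall copied from the evaluation output
print(f1_score(0.6191950464396285, 0.6920415224913494))  # approximately 0.6536
```

For locations, a precision of about 0.51 next to a recall of about 0.79 means that roughly half of the predicted locations are false positives, which is the situation the optional `WORD_REMOVE` list is meant to address.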


## Apply the custom NER model to new German texts transcribed in Transkribus

Enter the URL of the plain text file to which the custom NER model should be applied (for example, a public link to a file on SWITCHdrive):

In [7]:
TEXT_URL = "https://drive.switch.ch/index.php/s/4eBBIImulcOfMf7/download"  # change this sample URL to your own URL

Optional inputs:

In [8]:
MODEL_PATH = None  # complete path to custom NER model directory (if no input is provided, the previously trained model is loaded)

Apply the model to the text:

In [9]:
my_client.apply_model(text_url=TEXT_URL,
                      model_path=MODEL_PATH)

Downloading TXT file... done.
Loading custom NER model... C:\Users\HINDER~1\AppData\Local\Temp\tmpbpuj73ex loaded.
Applying custom NER model to TXT file... done.


Display the result:

In [14]:
my_client.result

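| Unnamed: 0 | text | persons | locations |
|---|---|---|---|
| 0 | Stenographisches Protokoll | [Stenographisches Protokoll] | [] |
| 1 | der | [] | [] |
| 2 | Verhandlungen | [] | [] |
| 3 | des | [] | [] |
| 4 | | [] | [] |
| ... | ... | ... | ... |
| 16564 | | [] | [] |
| 16565 | | [] | [] |
| 16566 | | [] | [] |
| 16567 | | [] | [] |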
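
To see which names were found most often, the per-line entity lists can be tallied. This is a minimal sketch, assuming each cell of the `persons` and `locations` columns is a list of strings, as the display above suggests. `count_entities` is a hypothetical helper and not part of `custom_ner_de`.

```python
from collections import Counter
from typing import Iterable, List


def count_entities(column: Iterable[List[str]]) -> Counter:
    """Count how often each entity string occurs across all rows of a column."""
    return Counter(name for names in column for name in names)


# Sample rows shaped like the first lines of the result shown above
sample_persons = [["Stenographisches Protokoll"], [], [], [], []]
print(count_entities(sample_persons).most_common(3))

# Usage on the real result, assuming it supports column access by name:
# count_entities(my_client.result["persons"]).most_common(20)
```

Frequent entries that are clearly not persons or locations, such as "Stenographisches Protokoll" in the first row, are candidates for `WORD_REMOVE` in a later training run.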


Save the result to the `/custom-ner-de/user_output/results/` directory:

In [11]:
my_client.save_result2csv()

Saved result to C:\Users\hinder0000\PycharmProjects\custom-ner-de/user_output/results/2022-06-27-11-47-11.csv.


If you want to keep the result, you must download it from this directory to your local machine (Binder will reset after you close your browser).