# Task 1: Annotating Text with FlairNLP

Flair is a powerful sequence-tagging framework that is easy to use for prediction and just as suitable for training and fine-tuning your own models. The contextual character embeddings that you have already learned about are a central feature of Flair, but you can use most other embedding types without much hassle.

Flair is a deep learning framework and will run much faster on a machine with a CUDA-compatible GPU. Creating predictions
will usually be fast enough without a GPU, but using one is strongly recommended when training a model.
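
If you are unsure whether your machine exposes a CUDA device, a minimal sketch like the following may help. It relies on PyTorch's `torch.cuda.is_available()`; the helper name `pick_device` is made up for this example, and falling back to `"cpu"` when PyTorch is not installed yet is an assumption made for convenience.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    # Assumption: if PyTorch is not installed yet, we simply report "cpu".
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print("Running on:", pick_device())
```

If this reports `cpu`, prediction in this notebook should still work, only more slowly.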

In this notebook, you will learn how to use a pre-trained model to annotate text.

You can find more helpful tutorials at https://flairnlp.github.io/.

In [None]:
# if flair isn't installed, install it (this can take a moment)
%pip install flair

In [2]:
# import the necessary parts of the library
from flair.data import Sentence
from flair.nn import Classifier

## Loading Models
Flair can retrieve models from two sources with the `load()` method: a local directory or [Hugging Face](https://huggingface.co/). The [Flair tutorial](https://flairnlp.github.io/docs/tutorial-basics/tagging-entities) also lists abbreviated model names such as `ner-fast`, which loads a small version of the default NER model for English.

If you want to use a local model, simply pass its path instead of a model name.


In [None]:
# load a classifier
tagger = Classifier.load('ner-fast')

In [7]:
# create a sentence object that will be predicted
sentence = Sentence('Here you can see the land of East Anglia over the Humber estuary to York in Northumbria, and there was great discord among the people themselves, and they had overthrown their king Osbryht, and had accepted an illegitimate king named Ælla.')

In [8]:
# predict the annotations
tagger.predict(sentence)

In [None]:
# show the annotations
for label in sentence.get_labels():
    print(label)

As you can see, the sentence object now holds information on all predicted annotations: the indices where each annotation starts and ends, its category, and a confidence score.
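
To make these three pieces of information concrete, here is a minimal sketch that filters annotations by confidence. It deliberately does not use Flair: `Annotation` is a hypothetical stand-in record, `keep_confident` and the threshold of 0.8 are made up for illustration, and the scores are invented rather than real model output.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical stand-in for one predicted entity (not a Flair class)."""
    text: str
    start: int    # character offset where the entity begins
    end: int      # character offset just past the entity (exclusive here)
    label: str    # category such as LOC or PER
    score: float  # confidence between 0 and 1


def keep_confident(annotations, threshold=0.8):
    """Return only the annotations whose confidence reaches the threshold."""
    return [a for a in annotations if a.score >= threshold]


sentence = "Here you can see the land of East Anglia over the Humber estuary to York in Northumbria"

# Invented example values, not real model output
predicted = [
    Annotation("East Anglia", 29, 40, "LOC", 0.97),
    Annotation("Humber", 50, 56, "LOC", 0.62),
    Annotation("York", 68, 72, "LOC", 0.91),
]

for a in keep_confident(predicted):
    # The offsets let us cut the entity back out of the original string
    print(sentence[a.start:a.end], a.label, a.score)
```

A threshold like this is a simple way to trade recall for precision when you post-process predictions; a sensible value depends on your data.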

## Using pre-trained models on historical text

Now try the original Old English passage instead:

> Her for se here of East Englum ofer Humbremuþan to Eoforwicceastre on Norþhymbre, ond þær wæs micel ungeþuærnes þære þeode betweox him selfum, ond hie hæfdun hiera cyning aworpenne Osbryht, ond ungecyndne cyning underfengon Ællan;

How well are the entities recognized?

Experiment further:
How well does the recognition work when you lowercase the sentence?
How do other models perform?

Test how well the available models annotate your own data.
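
When you compare runs (original vs. lowercased text, or one model vs. another), it can help to look at the differences systematically. The following is only a sketch under stated assumptions: `compare_runs` is a made-up helper, each run is assumed to be a list of `(entity text, label)` pairs, and the example results are invented rather than real model output.

```python
def compare_runs(run_a, run_b):
    """Compare two lists of (entity text, label) pairs, ignoring case."""
    norm_a = {(text.lower(), label) for text, label in run_a}
    norm_b = {(text.lower(), label) for text, label in run_b}
    return {
        "both": sorted(norm_a & norm_b),
        "only_a": sorted(norm_a - norm_b),
        "only_b": sorted(norm_b - norm_a),
    }


# Invented results for illustration, not real model output
original = [("East Anglia", "LOC"), ("York", "LOC"), ("Ælla", "PER")]
lowercased = [("east anglia", "LOC"), ("york", "ORG")]

report = compare_runs(original, lowercased)
for key, items in report.items():
    print(key, items)
```

With a predicted sentence at hand, such pairs could be collected with something like `[(s.text, s.get_labels("ner")[0].value) for s in sentence.get_spans("ner")]`.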

## Accessing the annotations
In the next part, I will show how to access the annotations inside the sentence object. How you transform them for further use depends on what you want to do with them.

In [None]:
# annotations are stored as Span objects
annotations = sentence.get_spans("ner")
print(annotations[0])

In [None]:
# we can access all relevant information through these objects
annotation = annotations[0]

print("Start Index:", annotation.start_position)  # note this returns character index, not token index
print("End Index:", annotation.end_position)  # note this returns character index, not token index
print("Text:", annotation.text)
print("Text as Token Objects:", annotation.tokens)
print("Confidence:", annotation.get_labels("ner")[0].score)
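
The difference between character and token indices matters as soon as you combine annotations with other token-based data. The sketch below shows one way to map a character span back to token positions. It uses plain whitespace splitting as a simplifying assumption (Flair's own tokenizer may split differently), and `token_offsets` and `tokens_in_span` are made-up helper names.

```python
def token_offsets(text):
    """Split on whitespace and record each token's character offsets."""
    offsets = []
    position = 0
    for token in text.split():
        # Search from the end of the previous token so repeated words work
        start = text.index(token, position)
        end = start + len(token)
        offsets.append((token, start, end))
        position = end
    return offsets


def tokens_in_span(offsets, span_start, span_end):
    """Return the indices of tokens that overlap the character span."""
    return [
        i for i, (_, start, end) in enumerate(offsets)
        if start < span_end and end > span_start
    ]


text = "the land of East Anglia over the Humber"
offsets = token_offsets(text)
start = text.index("East Anglia")
print(tokens_in_span(offsets, start, start + len("East Anglia")))
```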

In [None]:
# for example, we could now use this to build a pseudo-TEI formatted string
tags = [{} for _ in range(len(sentence))]
for entity in sentence.get_spans("ner"):
    label = entity.get_labels("ner")[0].value
    # start (Token.idx is 1-based, so subtract 1 to get the list position)
    start = entity.tokens[0].idx - 1
    tags[start]["start"] = "<" + label + ">"
    # end
    end = entity.tokens[-1].idx - 1
    tags[end]["end"] = "</" + label + ">"

out_list = []
for token, tag in zip(sentence, tags):
    if "start" in tag:
        out_list.append(tag["start"])
    out_list.append(token.text)
    if "end" in tag:
        out_list.append(tag["end"])

out_xml = "<tei>" + " ".join(out_list) + "</tei>"

print(out_xml)
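
One caveat of building XML by string concatenation is that characters such as `&` or `<` in the text would produce malformed output. The following sketch shows the same idea with escaping, using `escape` from Python's standard `xml.sax.saxutils` module. It works on plain lists rather than Flair objects: `to_pseudo_tei` is a made-up helper, the input is invented, and entities are assumed not to overlap.

```python
from xml.sax.saxutils import escape


def to_pseudo_tei(tokens, entities):
    """Wrap entity token ranges in tags and escape the token text.

    tokens   -- list of token strings
    entities -- list of (first_token, last_token, label), 0-based and inclusive
    """
    opening = {first: label for first, _, label in entities}
    closing = {last: label for _, last, label in entities}
    parts = []
    for i, token in enumerate(tokens):
        if i in opening:
            parts.append("<" + opening[i] + ">")
        parts.append(escape(token))  # turns & < > into XML entities
        if i in closing:
            parts.append("</" + closing[i] + ">")
    return "<tei>" + " ".join(parts) + "</tei>"


# Invented input for illustration
tokens = ["Osbryht", "&", "Ælla", "met", "in", "York"]
entities = [(0, 0, "PER"), (2, 2, "PER"), (5, 5, "LOC")]
print(to_pseudo_tei(tokens, entities))
```

For real TEI output you would probably want a proper XML library and the element names defined by the TEI guidelines; this only illustrates the escaping step.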