## Tagging with Pre-Trained Sequence Tagging Models

Let's use a pre-trained model for named entity recognition (NER). This model was trained on the English CoNLL-03 task and recognizes four entity types: persons (PER), locations (LOC), organizations (ORG), and miscellaneous names (MISC).

In [1]:
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load('ner')

Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
2019-03-21 15:32:53,538 https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/models-v0.4/NER-conll03-english/en-ner-conll03-v0.4.pt not found in cache, downloading to /tmp/tmp21qyoa0b
100%|██████████| 432197603/432197603 [00:18<00:00, 23356545.83B/s]
2019-03-21 15:33:12,386 copying /tmp/tmp21qyoa0b to cache at /home/quick/.flair/models/en-ner-conll03-v0.4.pt
2019-03-21 15:33:12,592 removing temp file /tmp/tmp21qyoa0b
2019-03-21 15:33:12,593 loading file /home/quick/.flair/models/en-ner-conll03-v0.4.pt


In [3]:
sentence = Sentence('George Washington went to Washington .')

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence.to_tagged_string())

George <B-PER> Washington <E-PER> went to Washington <S-LOC> .
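
The prefixes in the tagged string follow a BIOES-style scheme: `B-` opens a multi-token entity, `E-` closes it, and `S-` marks a single-token entity. The sketch below shows how such tags can be grouped into spans. It is plain Python, not Flair's own decoding code; the function name `decode_bioes` is made up for this illustration, and the token and tag lists are transcribed by hand from the output above. For brevity it does not check that the label of an `E-` tag matches the preceding `B-` tag.

```python
from typing import List, Optional, Tuple


def decode_bioes(tokens: List[str], tags: List[Optional[str]]) -> List[Tuple[str, str]]:
    """Group BIOES-tagged tokens into (label, text) spans."""
    spans = []
    current = []  # tokens of the multi-token entity currently being built
    for token, tag in zip(tokens, tags):
        if tag is None:
            # untagged token: outside any entity, drop any unfinished span
            current = []
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":
            # single-token entity
            spans.append((label, token))
            current = []
        elif prefix == "B":
            # start of a multi-token entity
            current = [token]
        elif prefix == "I" and current:
            # inside a multi-token entity
            current.append(token)
        elif prefix == "E" and current:
            # end of a multi-token entity
            current.append(token)
            spans.append((label, " ".join(current)))
            current = []
    return spans


# tokens and tags transcribed from the tagged string above
tokens = ["George", "Washington", "went", "to", "Washington", "."]
tags = ["B-PER", "E-PER", None, None, "S-LOC", None]

print(decode_bioes(tokens, tags))
# [('PER', 'George Washington'), ('LOC', 'Washington')]
```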


In [4]:
for entity in sentence.get_spans('ner'):
    print(entity)

PER-span [1,2]: "George Washington"
LOC-span [5]: "Washington"


This indicates that "George Washington" is a person (PER) and "Washington" is a location (LOC). Each such Span has a text, a tag value, its position in the sentence, and a score that indicates how confident the tagger is that the prediction is correct. You can also get additional information, such as the position offsets of each entity in the sentence, by calling:

In [6]:
from pprint import pprint
pprint(sentence.to_dict(tag_type='ner'))

{'entities': [{'confidence': 0.9967881441116333,
               'end_pos': 17,
               'start_pos': 0,
               'text': 'George Washington',
               'type': 'PER'},
              {'confidence': 0.9993712306022644,
               'end_pos': 36,
               'start_pos': 26,
               'text': 'Washington',
               'type': 'LOC'}],
 'labels': [],
 'text': 'George Washington went to Washington .'}
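
Judging from the values above, `start_pos` and `end_pos` are character offsets into `text`, with the end offset exclusive. A minimal sketch to check this by slicing, using a dictionary copied by hand from the output (confidence values left out); the helper name `slice_entities` is made up for this example:

```python
# Dictionary copied from the output above, without the confidence values.
result = {
    "text": "George Washington went to Washington .",
    "entities": [
        {"text": "George Washington", "start_pos": 0, "end_pos": 17, "type": "PER"},
        {"text": "Washington", "start_pos": 26, "end_pos": 36, "type": "LOC"},
    ],
}


def slice_entities(result):
    """Return (type, substring) pairs cut out of the text by character offsets."""
    return [
        (entity["type"], result["text"][entity["start_pos"]:entity["end_pos"]])
        for entity in result["entities"]
    ]


sliced = slice_entities(result)
print(sliced)
# [('PER', 'George Washington'), ('LOC', 'Washington')]
```

The offsets make it possible to tell the two mentions of "Washington" apart, which the entity text alone cannot do.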


### List of Pre-Trained Sequence Tagger Models

You choose which pre-trained model to load by passing the appropriate
string to the `load()` method of the `SequenceTagger` class. The following pre-trained models
are currently provided:

#### English Models

| ID | Task | Training Dataset | Score |
| ------------- | ------------- | ------------- | ------------- |
| 'ner' | 4-class Named Entity Recognition | CoNLL-03 | **93.24** (F1) |
| 'ner-ontonotes' | 12-class Named Entity Recognition | OntoNotes | **89.52** (F1) |
| 'chunk' | Syntactic Chunking | CoNLL-2000 | **96.61** (F1) |
| 'pos' | Part-of-Speech Tagging | OntoNotes | **98.01** (Accuracy) |
| 'frame' | Semantic Frame Detection (***Experimental***) | Propbank 3.0 | **93.92** (F1) |



## Tagging a List of Sentences


Often, you may want to tag an entire text corpus. In this case, you need to split the corpus into sentences and pass a list of `Sentence` objects to the `.predict()` method.

For instance, you can use the sentence splitter of segtok to split your text:

In [10]:
# your text of many sentences
text = "This is a sentence. This is another sentence. I love Berlin. George Washington went to Washington ."

# use a library to split into sentences
from segtok.segmenter import split_single
sentences = [Sentence(sent, use_tokenizer=True) for sent in split_single(text)]

# predict tags for list of sentences
tagger: SequenceTagger = SequenceTagger.load('ner')
tagger.predict(sentences)

2019-03-21 15:39:26,035 loading file /home/quick/.flair/models/en-ner-conll03-v0.4.pt


[Sentence: "This is a sentence ." - 5 Tokens,
 Sentence: "This is another sentence ." - 5 Tokens,
 Sentence: "I love Berlin ." - 4 Tokens,
 Sentence: "George Washington went to Washington ." - 6 Tokens]

In [18]:
for sentence in sentences:
    print(sentence.to_tagged_string())
    for entity in sentence.get_spans('ner'):
        print("\t",entity)

This is a sentence .
This is another sentence .
I love Berlin <S-LOC> .
	 LOC-span [3]: "Berlin"
George <B-PER> Washington <E-PER> went to Washington <S-LOC> .
	 PER-span [1,2]: "George Washington"
	 LOC-span [5]: "Washington"
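
Once a whole corpus is tagged, a common next step is to collect the entities by type. A minimal sketch in plain Python: the `(type, text)` pairs are transcribed by hand from the spans printed above, and the helper name `group_by_type` is made up for this example. In a real pipeline the pairs would be read from each span object; the attribute names depend on the Flair version, so they are left out here.

```python
from collections import defaultdict

# (type, text) pairs transcribed by hand from the spans printed above
found = [
    ("LOC", "Berlin"),
    ("PER", "George Washington"),
    ("LOC", "Washington"),
]


def group_by_type(pairs):
    """Collect entity texts under their entity type, keeping corpus order."""
    grouped = defaultdict(list)
    for label, text in pairs:
        grouped[label].append(text)
    return dict(grouped)


print(group_by_type(found))
# {'LOC': ['Berlin', 'Washington'], 'PER': ['George Washington']}
```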


## Tagging with Pre-Trained Text Classification Models


Let's use a pre-trained model for sentiment detection. This model was trained on the IMDB movie review dataset and classifies English text as positive or negative.

In [19]:
from flair.models import TextClassifier

classifier = TextClassifier.load('en-sentiment')

sentence = Sentence('This film hurts. It is so bad that I am confused.')

# predict sentiment label
classifier.predict(sentence)

# print sentence with predicted labels
print(sentence.labels)

2019-03-21 15:41:19,848 https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/models-v0.4/TEXT-CLASSIFICATION_imdb/imdb.pt not found in cache, downloading to /tmp/tmpt6vul3wr


100%|██████████| 2794252905/2794252905 [02:05<00:00, 22265517.33B/s]

2019-03-21 15:43:25,708 copying /tmp/tmpt6vul3wr to cache at /home/quick/.flair/models/imdb.pt
2019-03-21 15:43:27,025 removing temp file /tmp/tmpt6vul3wr
2019-03-21 15:43:27,025 loading file /home/quick/.flair/models/imdb.pt


[NEGATIVE (1.0)]
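
Each predicted label pairs a class name with a confidence, printed above as `NEGATIVE (1.0)`. A rough sketch of keeping only confident predictions follows. To stay runnable without downloading the model, it uses a hypothetical stand-in class, `FakeLabel`, with `value` and `score` fields; that the real label objects expose the same two attributes is an assumption here, and the helper name `confident` and the threshold of 0.8 are arbitrary choices for illustration.

```python
from typing import List, NamedTuple


class FakeLabel(NamedTuple):
    """Hypothetical stand-in for a predicted label: class name plus confidence."""
    value: str
    score: float


def confident(labels: List[FakeLabel], threshold: float = 0.8) -> List[str]:
    """Return the class names of labels whose confidence reaches the threshold."""
    return [label.value for label in labels if label.score >= threshold]


# mirrors the output above: a single NEGATIVE label with confidence 1.0
labels = [FakeLabel("NEGATIVE", 1.0)]
print(confident(labels))
# ['NEGATIVE']

# a low-confidence prediction is filtered out
print(confident([FakeLabel("POSITIVE", 0.55)]))
# []
```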


### List of Pre-Trained Text Classification Models

You choose which pre-trained model to load by passing the appropriate
string to the `load()` method of the `TextClassifier` class. The following pre-trained models
are currently provided:

| ID | Language | Task | Training Dataset | Score |
| ------------- | ---- | ------------- | ------------- | ------------- |
| 'en-sentiment' | English | detecting positive and negative sentiment | movie reviews from [IMDB](http://ai.stanford.edu/~amaas/data/sentiment/) | **90.54** (Micro F1) |
| 'de-offensive-language' | German | detecting offensive language | [GermEval 2018 Task 1](https://projects.fzai.h-da.de/iggsa/projekt/) | **75.71** (Macro F1) |

