#### Multilingual Models

We distribute new models that can handle text in multiple languages within a single model.

The NER models are trained on four languages (English, German, Dutch and Spanish).

| ID | Task | Language | Training Dataset | Accuracy | Contributor / Notes |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| [`ner-multi`](https://huggingface.co/flair/ner-multi) | NER (4-class) | Multilingual | CoNLL-03 | **89.27** (average F1) | (4 languages) |
| [`ner-multi-fast`](https://huggingface.co/flair/ner-multi-fast) | NER (4-class) | Multilingual | CoNLL-03 | **87.91** (average F1) | (4 languages) |

You can pass text in any of these languages to the model. The NER models also work to some degree for languages they were not trained on, such as French.
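As a minimal sketch of what this might look like in practice, assuming the `flair` package is installed and the model can be downloaded from the Hugging Face hub, the helper below loads `flair/ner-multi` and tags texts in different languages. The name `tag_texts` and the example sentences are hypothetical, chosen here for illustration only.

```python
from typing import List, Tuple

# Illustrative inputs (not from the original text): one German, one French sentence.
EXAMPLE_TEXTS = [
    "George Washington ging nach Washington.",
    "George Washington est allé à Washington.",
]


def tag_texts(
    texts: List[str], model_name: str = "flair/ner-multi"
) -> List[List[Tuple[str, str]]]:
    """Return (entity text, tag) pairs for each input text.

    flair is imported lazily so this block can be loaded without it installed.
    """
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)  # downloads the model on first use
    results = []
    for text in texts:
        sentence = Sentence(text)
        tagger.predict(sentence)  # annotates the sentence in place
        results.append([(span.text, span.tag) for span in sentence.get_spans("ner")])
    return results
```

Calling `tag_texts(EXAMPLE_TEXTS)` should return one list of entity/tag pairs per sentence. No specific output is claimed here, since predictions depend on the model version.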

#### NER MULTI 4-class

In [1]:
from flair.data import Corpus, MultiCorpus
from flair.datasets import CONLL_03, CONLL_03_GERMAN, CONLL_03_DUTCH, CONLL_03_SPANISH
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
from flair.models import SequenceTagger

# 1. get the multi-language corpus
#    (only the English corpus is enabled here; uncomment the others to train on all four languages)
corpus: Corpus = MultiCorpus([
    CONLL_03(),           # English corpus
    # CONLL_03_GERMAN(),  # German corpus
    # CONLL_03_DUTCH(),   # Dutch corpus
    # CONLL_03_SPANISH(), # Spanish corpus
])

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # GloVe embeddings
    WordEmbeddings('glove'),

    # German FastText embeddings
    WordEmbeddings('de'),

    # contextual string embeddings, forward
    FlairEmbeddings('multi-forward'),

    # contextual string embeddings, backward
    FlairEmbeddings('multi-backward'),
]

# embedding stack combines GloVe, German FastText, and Flair embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

2021-05-16 22:57:26,578 Reading data from C:\Users\Bernard\.flair\datasets\conll_03
2021-05-16 22:57:26,579 Train: C:\Users\Bernard\.flair\datasets\conll_03\train.txt
2021-05-16 22:57:26,580 Dev: C:\Users\Bernard\.flair\datasets\conll_03\dev.txt
2021-05-16 22:57:26,580 Test: C:\Users\Bernard\.flair\datasets\conll_03\test.txt
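
As a quick sanity check on the stack defined above: `StackedEmbeddings` concatenates the vectors of its members, so its size should be the sum of the parts. The sketch below only does that arithmetic. The 2048 per Flair direction and the 4496 total come from the model printout later in this section; the 100 (GloVe) and 300 (German FastText) sizes are assumptions about those pretrained embeddings.

```python
# Per-embedding vector sizes. The Flair sizes are taken from the LSTM(100, 2048)
# lines in the model printout; the GloVe and FastText sizes are assumptions.
embedding_sizes = {
    "glove": 100,            # assumed size of the 'glove' word embeddings
    "de": 300,               # assumed size of the German FastText embeddings
    "multi-forward": 2048,   # hidden size of the forward language model
    "multi-backward": 2048,  # hidden size of the backward language model
}

# Concatenation means the stacked size is the sum of the member sizes.
stacked_size = sum(embedding_sizes.values())
print(stacked_size)  # 4496, the in_features value of embedding2nn in the printout
```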


In [2]:
# 5. initialize sequence tagger
from flair.models import SequenceTagger

In [3]:
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

In [4]:
# 7. run training
trainer.train('resources/taggers/ner-multi',
              train_with_dev=False,
              max_epochs=3)

2021-05-16 22:57:40,717 ----------------------------------------------------------------------------------------------------
2021-05-16 22:57:40,718 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings('glove')
    (list_embedding_1): WordEmbeddings('de')
    (list_embedding_2): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.1, inplace=False)
        (encoder): Embedding(11854, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=11854, bias=True)
      )
    )
    (list_embedding_3): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.1, inplace=False)
        (encoder): Embedding(11854, 100)
        (rnn): LSTM(100, 2048)
        (decoder): Linear(in_features=2048, out_features=11854, bias=True)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4496, out_features=4496, bias

{'test_score': 0.8791247887574491,
 'dev_score_history': [0.8299498643872771,
  0.9053001758352173,
  0.9096599560142108],
 'train_loss_history': [4.178048132960476,
  1.7326754077411155,
  1.3194100373208142],
 'dev_loss_history': [1.6980996131896973,
  0.9827048182487488,
  0.8215925693511963]}
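
The dictionary printed above (which appears to be the return value of `trainer.train`) can be inspected with plain Python. The following is a minimal sketch that copies the printed values into a variable named `result` (a name chosen here for illustration) and derives the best dev epoch and the loss trend from them.

```python
# Values copied verbatim from the dictionary printed above.
result = {
    "test_score": 0.8791247887574491,
    "dev_score_history": [0.8299498643872771, 0.9053001758352173, 0.9096599560142108],
    "train_loss_history": [4.178048132960476, 1.7326754077411155, 1.3194100373208142],
    "dev_loss_history": [1.6980996131896973, 0.9827048182487488, 0.8215925693511963],
}

# Epochs are numbered from 1; pick the one with the highest dev score.
dev_scores = result["dev_score_history"]
best_epoch = max(range(len(dev_scores)), key=dev_scores.__getitem__) + 1

# Training loss should fall from epoch to epoch if training is progressing.
losses = result["train_loss_history"]
loss_decreasing = all(a > b for a, b in zip(losses, losses[1:]))

print(f"best dev epoch: {best_epoch}, dev score: {dev_scores[best_epoch - 1]:.4f}")
print(f"test score: {result['test_score']:.4f}, train loss decreasing: {loss_decreasing}")
```

In this run the dev score is still rising and both losses are still falling at the third epoch, which suggests that `max_epochs=3` stops training before it has converged.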