0.3.0
Major Changes
Adding RoBERTa & XLNet
Welcome RoBERTa and XLNet on the FARM 🎉!
We did some intense refactoring in FARM to make it easier to add more language models. However, we will only add models where we see some decent advantages. One of the next models to follow will very likely be ALBERT ...
For now, we support RoBERTa/XLNet for (multilabel) text classification, text regression and NER. QA will follow soon.
Loading a language model is now done via the generic LanguageModel class:
Bert.load("bert-base-cased") -> LanguageModel.load("bert-base-cased")
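The idea behind this unified entry point can be pictured as a small factory that inspects the model name and dispatches to the right class. A minimal sketch of that pattern (the registry mechanism and subclasses here are illustrative, not FARM's actual internals):

```python
# Sketch of a unified model loader: one entry point that dispatches
# to the right subclass based on the model name. The registry and
# subclass bodies are placeholders, not FARM's real implementation.

class LanguageModel:
    registry = {}

    @classmethod
    def register(cls, prefix):
        def wrapper(subclass):
            cls.registry[prefix] = subclass
            return subclass
        return wrapper

    @classmethod
    def load(cls, model_name):
        # Pick the first registered class whose prefix matches the name
        for prefix, subclass in cls.registry.items():
            if model_name.startswith(prefix):
                return subclass(model_name)
        raise ValueError(f"No model class registered for '{model_name}'")

    def __init__(self, model_name):
        self.model_name = model_name


@LanguageModel.register("bert")
class Bert(LanguageModel):
    pass


@LanguageModel.register("roberta")
class Roberta(LanguageModel):
    pass


model = LanguageModel.load("bert-base-cased")
print(type(model).__name__)  # Bert
```

With this design, adding a new architecture only requires registering one more subclass; calling code keeps using `LanguageModel.load(...)` unchanged.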
Migrating to tokenizers from the transformers repo.
Pros:
- It's quite easy to add a tokenizer for any of the models implemented in transformers.
- We would rather support the development there than build something in parallel
- The additional metadata during tokenization (offsets, start_of_word) is still created via tokenize_with_metadata
- We can use encode_plus to add model specific special tokens (CLS, SEP ...)
Cons:
- We had to deprecate our argument "never_split_chars", which allowed adjusting the BasicTokenizer of BERT.
- Custom vocab is now realized by increasing vocab_size instead of replacing unused tokens
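Growing the vocabulary instead of overwriting reserved "[unused]" slots can be sketched in plain Python (this illustrates only the idea, not FARM's API):

```python
# Sketch: extending a vocabulary by appending new tokens (growing
# vocab_size) rather than replacing reserved "[unused]" slots.
# Plain-Python illustration of the concept, not FARM's actual code.

def add_custom_tokens(vocab, new_tokens):
    """Append tokens not yet in the vocab; return their ids."""
    new_ids = []
    for token in new_tokens:
        if token not in vocab:
            vocab[token] = len(vocab)  # next free id -> vocab grows
        new_ids.append(vocab[token])
    return new_ids

vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3}
ids = add_custom_tokens(vocab, ["farm", "roberta"])
print(len(vocab), ids)  # 6 [4, 5]
```

In the real setting, the model's embedding matrix has to grow with the vocab as well (in transformers this is done with `resize_token_embeddings`), which is why increasing vocab_size is the cleaner path than reusing unused slots of a fixed-size vocab.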
BertTokenizer.from_pretrained("bert-base-cased") -> Tokenizer.load("bert-base-cased")
never_split_chars is no longer supported as an argument for the Tokenizer.
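The extra metadata mentioned above (offsets, start_of_word) can be illustrated with a toy tokenizer; this is a simplified stand-in for tokenize_with_metadata, not its real implementation:

```python
# Toy illustration of tokenization metadata: for each token we record
# its character offset in the text and whether it starts a new word.
# Simplified stand-in for FARM's tokenize_with_metadata, not real code.

def tokenize_with_metadata(text):
    tokens, offsets, start_of_word = [], [], []
    pos = 0
    for word in text.split():
        pos = text.index(word, pos)  # character offset of this word
        # naive subword split: break words longer than 4 chars in two
        pieces = [word] if len(word) <= 4 else [word[:4], word[4:]]
        piece_pos = pos
        for i, piece in enumerate(pieces):
            tokens.append(piece)
            offsets.append(piece_pos)
            start_of_word.append(i == 0)  # only the first piece starts a word
            piece_pos += len(piece)
        pos += len(word)
    return {"tokens": tokens, "offsets": offsets, "start_of_word": start_of_word}

meta = tokenize_with_metadata("hello transformers")
print(meta["tokens"])         # ['hell', 'o', 'tran', 'sformers']
print(meta["offsets"])        # [0, 4, 6, 10]
print(meta["start_of_word"])  # [True, False, True, False]
```

This kind of metadata is what lets downstream tasks like NER map subword predictions back to words and character spans in the original text.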
Modelling:
- [enhancement] Add Roberta, XLNet and redesign Tokenizer #125
- [bug] fix loading of old tokenizer style #129
Data Handling:
- [bug] Fix name of squad labels in experiment config #121
- [bug] change arg in squadprocessor from labels to label_list #123
Inference:
- [enhancement] Add option to disable multiprocessing in Inferencer (#117) #128
- [bug] Fix logging verbosity in Inferencer (#117) #122
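The multiprocessing switch added in #128 can be pictured as follows; the flag name and worker function here are hypothetical, illustrating only the pattern of falling back to sequential execution:

```python
# Sketch of an inference routine with an opt-out for multiprocessing.
# The use_multiprocessing flag and _square worker are illustrative,
# not FARM's actual Inferencer API.
import multiprocessing as mp


def _square(x):
    return x * x


def run_inference(batches, use_multiprocessing=True):
    """Apply a worker to batches, optionally without a process pool.

    Disabling multiprocessing avoids pool start-up overhead for small
    workloads and makes debugging much simpler.
    """
    if not use_multiprocessing:
        return [_square(b) for b in batches]
    with mp.Pool(processes=2) as pool:
        return pool.map(_square, batches)


if __name__ == "__main__":
    print(run_inference([1, 2, 3], use_multiprocessing=False))  # [1, 4, 9]
```

Both paths return identical results; the flag only trades throughput on large inputs against predictability and easier stack traces.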