0.3.0
Major Changes
Adding RoBERTa & XLNet
Welcome RoBERTa and XLNet on the FARM 🎉!
We did some intense refactoring in FARM to make it easier to add more language models. However, we will only add models where we see some decent advantages. One of the next models to follow will very likely be ALBERT ...
For now, we support RoBERTa/XLNet for (multilabel) text classification, text regression and NER. QA will follow soon.
Loading a language model is now done via the generic LanguageModel class:
Bert.load("bert-base-cased") -> LanguageModel.load("bert-base-cased")
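The idea behind this unified entry point can be pictured as a small factory that inspects the model name and dispatches to the right class. A minimal sketch of that pattern (the registry mechanism and subclasses here are illustrative, not FARM's actual internals):

```python
# Sketch of a unified model loader: one entry point that dispatches
# to the right subclass based on the model name. The registry and
# subclass bodies are placeholders, not FARM's real implementation.

class LanguageModel:
    registry = {}

    @classmethod
    def register(cls, prefix):
        def wrapper(subclass):
            cls.registry[prefix] = subclass
            return subclass
        return wrapper

    @classmethod
    def load(cls, model_name):
        # Pick the first registered class whose prefix matches the name
        for prefix, subclass in cls.registry.items():
            if model_name.startswith(prefix):
                return subclass(model_name)
        raise ValueError(f"No model class registered for '{model_name}'")

    def __init__(self, model_name):
        self.model_name = model_name


@LanguageModel.register("bert")
class Bert(LanguageModel):
    pass


@LanguageModel.register("roberta")
class Roberta(LanguageModel):
    pass


model = LanguageModel.load("bert-base-cased")
print(type(model).__name__)  # Bert
```

With this design, adding a new architecture only requires registering one more subclass; calling code keeps using `LanguageModel.load(...)` unchanged.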
Migrating to tokenizers from the transformers repo.
Pros:
- It's quite easy to add a tokenizer for any of the models implemented in transformers.
- We would rather support the development there than build something in parallel
- The additional metadata during tokenization (offsets, start_of_word) is still created via tokenize_with_metadata
- We can use encode_plus to add model specific special tokens (CLS, SEP ...)
Cons:
- We had to deprecate our argument "never_split_chars", which allowed adjusting the BasicTokenizer of BERT.
- Custom vocab is now realized by increasing vocab_size instead of replacing unused tokens
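Growing the vocabulary instead of overwriting reserved "[unused]" slots can be sketched in plain Python (this illustrates only the idea, not FARM's API):

```python
# Sketch: extending a vocabulary by appending new tokens (growing
# vocab_size) rather than replacing reserved "[unused]" slots.
# Plain-Python illustration of the concept, not FARM's actual code.

def add_custom_tokens(vocab, new_tokens):
    """Append tokens not yet in the vocab; return their ids."""
    new_ids = []
    for token in new_tokens:
        if token not in vocab:
            vocab[token] = len(vocab)  # next free id -> vocab grows
        new_ids.append(vocab[token])
    return new_ids

vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3}
ids = add_custom_tokens(vocab, ["farm", "roberta"])
print(len(vocab), ids)  # 6 [4, 5]
```

In the real setting, the model's embedding matrix has to grow with the vocab as well (in transformers this is done with `resize_token_embeddings`), which is why increasing vocab_size is the cleaner path than reusing unused slots of a fixed-size vocab.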
BertTokenizer.from_pretrained("bert-base-cased") -> Tokenizer.load("bert-base-cased")
never_split_chars is no longer supported as an argument for the Tokenizer.
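The extra metadata mentioned above (offsets, start_of_word) can be illustrated with a toy tokenizer; this is a simplified stand-in for tokenize_with_metadata, not its real implementation:

```python
# Toy illustration of tokenization metadata: for each token we record
# its character offset in the text and whether it starts a new word.
# Simplified stand-in for FARM's tokenize_with_metadata, not real code.

def tokenize_with_metadata(text):
    tokens, offsets, start_of_word = [], [], []
    pos = 0
    for word in text.split():
        pos = text.index(word, pos)  # character offset of this word
        # naive subword split: break words longer than 4 chars in two
        pieces = [word] if len(word) <= 4 else [word[:4], word[4:]]
        piece_pos = pos
        for i, piece in enumerate(pieces):
            tokens.append(piece)
            offsets.append(piece_pos)
            start_of_word.append(i == 0)  # only the first piece starts a word
            piece_pos += len(piece)
        pos += len(word)
    return {"tokens": tokens, "offsets": offsets, "start_of_word": start_of_word}

meta = tokenize_with_metadata("hello transformers")
print(meta["tokens"])         # ['hell', 'o', 'tran', 'sformers']
print(meta["offsets"])        # [0, 4, 6, 10]
print(meta["start_of_word"])  # [True, False, True, False]
```

This kind of metadata is what lets downstream tasks like NER map subword predictions back to words and character spans in the original text.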
Modelling:
- [enhancement] Add Roberta, XLNet and redesign Tokenizer #125
- [bug] fix loading of old tokenizer style #129
Data Handling:
- [bug] Fix name of squad labels in experiment config #121
- [bug] change arg in squadprocessor from labels to label_list #123
Inference:
- [enhancement] Add option to disable multiprocessing in Inferencer (#117) #128
- [bug] Fix logging verbosity in Inferencer (#117) #122
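The multiprocessing switch added in #128 can be pictured as follows; the flag name and worker function here are hypothetical, illustrating only the pattern of falling back to sequential execution:

```python
# Sketch of an inference routine with an opt-out for multiprocessing.
# The use_multiprocessing flag and _square worker are illustrative,
# not FARM's actual Inferencer API.
import multiprocessing as mp


def _square(x):
    return x * x


def run_inference(batches, use_multiprocessing=True):
    """Apply a worker to batches, optionally without a process pool.

    Disabling multiprocessing avoids pool start-up overhead for small
    workloads and makes debugging much simpler.
    """
    if not use_multiprocessing:
        return [_square(b) for b in batches]
    with mp.Pool(processes=2) as pool:
        return pool.map(_square, batches)


if __name__ == "__main__":
    print(run_inference([1, 2, 3], use_multiprocessing=False))  # [1, 4, 9]
```

Both paths return identical results; the flag only trades throughput on large inputs against predictability and easier stack traces.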