Note that this repository is no longer maintained. See https://github.com/af-ai-center/SweBERT instead.
Swedish pre-trained models for BERT.
BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.
BERT is based on the Transformer architecture introduced in "Attention Is All You Need".
Google's academic paper, which describes BERT in detail and provides full results on a number of tasks, can be found here: https://arxiv.org/abs/1810.04805.
Google's GitHub repository with the original English models can be found here: https://github.com/google-research/bert.
Included in the downloads below are PyTorch versions of the models, based on the PyTorch implementation of BERT made available by the NLP researchers at Hugging Face.
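As a minimal sketch of how the PyTorch checkpoints can be used, the snippet below loads a downloaded model with the Hugging Face transformers library. The local directory path is an assumption; point it at wherever you unpacked the vocabulary, config, and weight files.

```python
# Sketch: loading a downloaded Swedish BERT checkpoint with Hugging Face
# transformers. The directory name is illustrative, not part of this release.
from transformers import BertTokenizer, BertModel

model_dir = "./swedish_bert_base_uncased"  # hypothetical local path

tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)

# Encode a Swedish sentence and run it through the encoder.
inputs = tokenizer("Stockholm är Sveriges huvudstad.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768) for BERT-Base
```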
BERT is a method of pre-training language representations, meaning that we train a general-purpose "language understanding" model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP.
Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages.
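To illustrate the unsupervised objective, the sketch below runs the masked-language-modelling head on a Swedish sentence: the model predicts a token hidden behind [MASK] from plain text alone. The model directory and the example sentence are assumptions for illustration.

```python
# Sketch of the masked-language-modelling objective used during pre-training.
import torch
from transformers import BertTokenizer, BertForMaskedLM

model_dir = "./swedish_bert_base_uncased"  # hypothetical local path
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertForMaskedLM.from_pretrained(model_dir)

text = "Stockholm är Sveriges [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the masked position and print the most likely replacement token.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_index].argmax().item()
print(tokenizer.decode([predicted_id]))
```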
We used Swedish Wikipedia, with approximately 2 million articles and 300 million words.
The links to the models are here (right-click, 'Save link as...' on the name):
Swedish BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
Swedish BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
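For reference, the listed hyperparameters correspond to the standard BERT-Base and BERT-Large configurations, shown below as Hugging Face BertConfig objects. The argument names follow that library; the exact vocabulary size (and hence the parameter count) depends on the released Swedish vocab.txt, so these are assumptions rather than the exact released configs.

```python
# Sketch: the listed model sizes expressed as Hugging Face BertConfig objects.
from transformers import BertConfig

swedish_base_config = BertConfig(
    num_hidden_layers=12,    # 12-layer
    hidden_size=768,         # 768-hidden
    num_attention_heads=12,  # 12-heads
)                            # ~110M parameters with the standard vocabulary size

swedish_large_config = BertConfig(
    num_hidden_layers=24,    # 24-layer
    hidden_size=1024,        # 1024-hidden
    num_attention_heads=16,  # 16-heads
    intermediate_size=4096,  # feed-forward size used by BERT-Large
)                            # ~340M parameters with the standard vocabulary size
```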
These are initial pre-trained Swedish versions of the BERT models, based on a smaller corpus than the original English versions.
If you find these models useful or if you have suggestions for how they can be improved, please submit a GitHub issue.
For personal communication related to these Swedish versions of BERT, please contact Magnus Bjelkenhed (magnus.bjelkenhed@arbetsformedlingen.se) or Mattias Bielsa (mattias.bielsa@arbetsformedlingen.se).