ZurichNLP/swissbert


arXiv: https://arxiv.org/abs/2303.13310

SwissBERT is a masked language model for processing Switzerland-related text. It has been trained on more than 21 million Swiss news articles retrieved from Swissdox@LiRI.

Architecturally, SwissBERT is a transformer encoder with a language adapter in each layer. There is one adapter per national language of Switzerland; all other parameters of the model are shared among the four languages.

The model is based on X-MOD, which has been pre-trained with language adapters in 81 languages. SwissBERT contains adapters for the national languages of Switzerland – German, French, Italian, and Romansh Grischun. In addition, it uses a Switzerland-specific subword vocabulary.
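The per-language adapters can be activated explicitly at inference time. Below is a minimal sketch using the X-MOD integration in Hugging Face transformers; the adapter codes ("de_CH", "fr_CH", "it_CH", "rm_CH") follow the naming on the model hub and should be treated as assumptions here:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ZurichNLP/swissbert")
model = AutoModel.from_pretrained("ZurichNLP/swissbert")
model.eval()

# Switch between the four national-language adapters
# (language codes are assumed from the model hub).
for lang, sentence in [
    ("de_CH", "Guten Tag!"),
    ("fr_CH", "Bonjour!"),
    ("it_CH", "Buongiorno!"),
    ("rm_CH", "Bun di!"),
]:
    model.set_default_language(lang)
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Contextual embeddings from the encoder, shape (1, seq_len, hidden_size)
        hidden = model(**inputs).last_hidden_state
    print(lang, tuple(hidden.shape))
```

Only the adapter weights differ between these four calls; the shared encoder parameters are reused for every language.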

The easiest way to use SwissBERT is via the transformers library and the Hugging Face model hub: https://huggingface.co/ZurichNLP/swissbert
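For a quick first test, the model can be loaded as a fill-mask pipeline. This sketch assumes the "de_CH" adapter name and the `<mask>` token used by the tokenizer on the model hub:

```python
from transformers import pipeline

# Load SwissBERT from the Hugging Face hub as a fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="ZurichNLP/swissbert")

# Activate the German adapter ("de_CH" is an assumed adapter name).
fill_mask.model.set_default_language("de_CH")

# Predict candidates for the masked word (top 5 by default).
predictions = fill_mask("Zürich ist die grösste Stadt der <mask>.")
for pred in predictions:
    print(pred["token_str"], round(pred["score"], 3))
```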

More information on the model design and evaluation is provided in our paper "SwissBERT: The Multilingual Language Model for Switzerland" (SwissText 2023).

License

  • This code repository: MIT license
  • Model: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Pre-training code

See pretraining

Evaluation code

SwissNER

See evaluation/swissner/notebook.ipynb

HIPE-2022

See evaluation/hipe2022/notebook.ipynb

X-Stance

See evaluation/xstance/notebook.ipynb

German–Romansh alignment

See evaluation/romansh_alignment/notebook.ipynb

Citation

@inproceedings{vamvas-etal-2023-swissbert,
    title = "{S}wiss{BERT}: The Multilingual Language Model for {S}witzerland",
    author = {Vamvas, Jannis  and
      Gra{\"e}n, Johannes  and
      Sennrich, Rico},
    editor = {Ghorbel, Hatem  and
      Sokhn, Maria  and
      Cieliebak, Mark  and
      H{\"u}rlimann, Manuela  and
      de Salis, Emmanuel  and
      Guerne, Jonathan},
    booktitle = "Proceedings of the 8th edition of the Swiss Text Analytics Conference",
    month = jun,
    year = "2023",
    address = "Neuchatel, Switzerland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.swisstext-1.6",
    pages = "54--69",
}
