Beto - Spanish version of the BERT model

**This is work in progress**

BETO: Spanish BERT

BETO is a BERT model trained on a large Spanish corpus. BETO is similar in size to BERT-Base and was trained with the Whole Word Masking technique. Below you will find TensorFlow and PyTorch checkpoints for the uncased and cased versions, as well as results on Spanish benchmarks comparing BETO with Multilingual BERT and with other (not BERT-based) models.


BETO uncased: tensorflow weights, pytorch weights, vocab, config
BETO cased: tensorflow weights, pytorch weights, vocab, config

All models use a vocabulary of about 31k BPE subwords constructed using SentencePiece.
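Because the vocabulary is made of subwords, a single word can be split into several tokens. The Whole Word Masking technique mentioned above masks all subwords of a selected word together, rather than masking subwords independently. The following is a minimal sketch of that idea, not BETO's actual training code. It assumes WordPiece-style `##` continuation markers and a `[MASK]` token, and it selects each word independently with probability `mask_prob`; the function names and the example tokens are hypothetical. It also leaves out details of BERT pretraining such as the masking budget and random or unchanged replacements.

```python
import random


def group_subwords(tokens, prefix="##"):
    """Group token indices into words.

    Assumption: a token starting with `prefix` continues the previous word
    (WordPiece-style marker); any other token starts a new word.
    """
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith(prefix) and words:
            words[-1].append(i)
        else:
            words.append([i])
    return words


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Return a copy of `tokens` where selected words are masked as a unit."""
    rng = random.Random(seed)
    masked = list(tokens)
    for word in group_subwords(tokens):
        # One random draw per word: either every subword is masked or none is.
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked


# Hypothetical tokenization of "el gatito duerme".
tokens = ["el", "gat", "##ito", "duer", "##me"]
print(group_subwords(tokens))
print(whole_word_mask(tokens, mask_prob=0.5, seed=0))
```

With this scheme, `gat` and `##ito` are always masked or kept together, so the model cannot recover a masked subword simply by looking at the rest of the same word.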


The following table shows some BETO results on the Spanish version of each task. We compare BETO (cased and uncased) with the best Multilingual BERT results that we found in the literature (as of October 2019), highlighting the results whenever BETO outperforms Multilingual BERT on the Spanish task. The table also shows some alternative methods for the same tasks (not necessarily BERT-based methods). References for all methods can be found here.

| Task | BETO-cased | BETO-uncased | Best Multilingual BERT | Other results |
|---|---|---|---|---|
| XNLI | ----- | 80.15 | 78.50 [2] | 80.80 [5], 77.80 [1], 73.15 [4] |
| POS | ----- | 98.44 | 97.10 [2] | 98.91 [6], 96.71 [3] |
| PAWS-X | ----- | 89.55 | 90.70 [8] | |
| NER-C | ----- | 81.70 | 87.38 [2] | 87.18 [3] |

Example of use

For further details on how to use BETO, you can visit the amazing 🤗Transformers repo, starting with the Quickstart section.
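As a rough illustration, the sketch below loads a BETO checkpoint with 🤗Transformers and asks it to fill in a masked word. It is a sketch under stated assumptions rather than an official example: `predict_masked_token` is a hypothetical helper, `model_path` is assumed to point to a local directory holding the PyTorch weights, vocab and config files listed above (a model hub identifier may also work), and the mask token is assumed to be the BERT default `[MASK]`. It requires the `torch` and `transformers` packages.

```python
def predict_masked_token(model_path, text, top_k=5, mask_token="[MASK]"):
    """Return the top_k candidate tokens for the single mask in `text`.

    Assumptions: `model_path` is a directory with the BETO PyTorch weights,
    vocab and config, and the tokenizer uses `[MASK]` as its mask token.
    """
    # Validate the input before the (slow) library import and model load.
    if text.count(mask_token) != 1:
        raise ValueError(f"text must contain exactly one {mask_token} token")

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForMaskedLM.from_pretrained(model_path)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    # Locate the position of the mask token in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_index = is_mask.nonzero(as_tuple=True)[0][0]

    with torch.no_grad():
        logits = model(**inputs).logits

    # Take the highest-scoring vocabulary entries at the masked position.
    top_ids = logits[0, mask_index].topk(top_k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)
```

A call might look like `predict_masked_token("path/to/beto-uncased", "la capital de chile es [MASK] .")`, where the path is a placeholder for wherever the checkpoint files were saved.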


We thank Adereso for kindly providing support for training BETO-uncased, and the Millennium Institute for Foundational Research on Data, which provided support for training BETO-cased.

