This repository shares smaller versions of multilingual transformers that keep the same representations as the original models. The idea came from a simple observation: after massively multilingual pretraining, not all embeddings are needed for fine-tuning and inference. In practice, one rarely needs a model that supports more than 100 languages, as the original mBERT does. We therefore extracted several smaller versions that handle fewer languages. Since most of the parameters of multilingual transformers are located in the embedding layer, our models are up to 64% smaller in size.
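As a rough back-of-the-envelope check of that observation (using mBERT's commonly reported vocabulary size of 119,547 tokens and hidden size of 768, values that are not stated in this README):

```python
# Approximate share of mBERT parameters located in the token embedding matrix.
vocab_size = 119_547      # bert-base-multilingual-cased vocabulary size (assumed)
hidden_size = 768         # hidden dimension of the BERT-base architecture (assumed)
total_params = 178e6      # total parameter count reported in the table below

embedding_params = vocab_size * hidden_size            # ~91.8 million
print(f"Embedding share: {embedding_params / total_params:.0%}")  # roughly half
```

Dropping the vocabulary entries of unused languages therefore removes a large fraction of the parameters without touching the transformer layers.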
The table below compares two of our extracted versions with the original mBERT. It shows the models' number of parameters, size on disk, memory footprint and accuracy on the XNLI dataset (cross-lingual transfer from English to French). These measurements were computed on a Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB of RAM).
| Model | Num parameters | Size on disk | Memory | XNLI accuracy (%) |
|---|---|---|---|---|
| bert-base-multilingual-cased | 178 million | 714 MB | 1400 MB | 73.8 |
| Geotrend/bert-base-15lang-cased | 141 million | 564 MB | 1098 MB | 74.1 |
| Geotrend/bert-base-en-fr-cased | 112 million | 447 MB | 878 MB | 73.8 |
Reducing the size of multilingual transformers facilitates their deployment on public cloud platforms. For instance, Google Cloud Platform requires the model size on disk to be under 500 MB for serverless deployments (Cloud Functions / Cloud ML).
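To check whether a given checkpoint fits that limit, one can save it locally and sum the file sizes. A minimal sketch (the output directory name `exported_model` is arbitrary):

```python
import os
from transformers import AutoModel, AutoTokenizer

# Download a reduced model and measure its on-disk footprint.
model_name = "Geotrend/bert-base-en-fr-cased"
save_dir = "exported_model"

AutoTokenizer.from_pretrained(model_name).save_pretrained(save_dir)
AutoModel.from_pretrained(model_name).save_pretrained(save_dir)

size_mb = sum(
    os.path.getsize(os.path.join(save_dir, f)) for f in os.listdir(save_dir)
) / (1024 ** 2)
print(f"{model_name}: {size_mb:.0f} MB on disk (serverless limit: 500 MB)")
```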
For more information, please refer to our paper: Load What You Need: Smaller Versions of Multilingual BERT.
*** New August 2021: smaller versions of distil-mBERT are now available! ***
| Model | Num parameters | Size on disk | Memory |
|---|---|---|---|
| distilbert-base-multilingual-cased | 134 million | 542 MB | 1200 MB |
| Geotrend/distilbert-base-en-fr-cased | 69 million | 277 MB | 740 MB |
🚀 To our knowledge, these distil-mBERT based versions are the smallest and fastest multilingual transformers available.
So far, we have generated a total of 138 models (70 extracted from mBERT and 68 extracted from distil-mBERT). These models have been uploaded to the Hugging Face Model Hub to make them easy to use: https://huggingface.co/Geotrend.
They can be downloaded easily using the transformers library:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-fr-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-en-fr-cased")
```
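A minimal usage example with the tokenizer and model loaded above (the French sentence is arbitrary): encode some text and inspect the contextual embeddings returned by the model.

```python
import torch

# Encode a short sentence and look at the output embeddings.
inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, 768)
```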
More models will be released soon.
We also share a Python script that allows users to generate their own smaller transformers based on a subset of the original vocabulary (the method is not limited to multilingual transformers); a simplified sketch of the underlying idea is shown after the argument list below:
```bash
pip install -r requirements.txt

python3 reduce_model.py \
    --source_model bert-base-multilingual-cased \
    --vocab_file vocab_5langs.txt \
    --output_model bert-base-5lang-cased \
    --convert_to_tf False
```
Where:

- `--source_model` is the multilingual transformer to reduce
- `--vocab_file` is the path to the intended vocabulary file
- `--output_model` is the name of the final reduced model
- `--convert_to_tf` tells the script whether to also generate a TensorFlow version
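The actual logic lives in `reduce_model.py` in this repository. The snippet below is only a minimal, simplified sketch of the underlying idea (keep only the embedding rows of the tokens in the reduced vocabulary), not the script itself; it skips details such as rebuilding the tokenizer or handling tokens absent from the original vocabulary.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Simplified sketch of vocabulary reduction (not the actual reduce_model.py):
# keep only the embedding rows of the tokens present in the reduced vocabulary.
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

with open("vocab_5langs.txt", encoding="utf-8") as f:       # reduced vocabulary file
    kept_tokens = [line.strip() for line in f if line.strip()]

# Map kept tokens to their original ids (out-of-vocabulary handling omitted here).
kept_ids = [tokenizer.convert_tokens_to_ids(t) for t in kept_tokens]

old_embeddings = model.get_input_embeddings().weight.data   # (original_vocab, hidden)
new_embeddings = torch.nn.Embedding(len(kept_ids), old_embeddings.size(1))
new_embeddings.weight.data = old_embeddings[kept_ids].clone()

model.set_input_embeddings(new_embeddings)                  # shrink the embedding layer
model.config.vocab_size = len(kept_ids)
model.save_pretrained("bert-base-5lang-cased")
```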
```bibtex
@inproceedings{smallermbert,
  title={Load What You Need: Smaller Versions of Multilingual BERT},
  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
  booktitle={SustaiNLP / EMNLP},
  year={2020}
}
```
Please contact amin.geotrend@gmail.com for any questions, feedback or requests.