
# M-BERT Base 69

Huggingface Model · Huggingface Base Model

## Usage

To use this model together with the original CLIP vision encoder, follow the usage instructions on the main page to download the additional linear weights. Once this is done, you can load and use the model with the following code:

```python
from mclip import multilingual_clip

model = multilingual_clip.load_model('M-BERT-Base-69')
embeddings = model(['Älgen är skogens konung!', 'Wie leben Eisbären in der Antarktis?', 'Вы знали, что все белые медведи левши?'])
print(embeddings.shape)
# Yields: torch.Size([3, 640])
```
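Since the embeddings live in the CLIP text space, a typical next step is to compare them with cosine similarity. The sketch below uses random stand-in arrays of the same shape the model returns; in real use you would substitute the output of `model([...])` from the snippet above.

```python
import numpy as np

# Stand-in for the model output: 3 sentences, 640-dimensional embeddings.
embeddings = np.random.randn(3, 640)

# L2-normalise each row, then compute all pairwise cosine similarities
# with a single matrix product.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = normed @ normed.T

print(similarity.shape)  # (3, 3); diagonal entries are 1.0
```

The same normalise-and-dot-product pattern works for text-to-image retrieval once you also have CLIP image embeddings projected into the shared space.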

## About

A bert-base-multilingual model tuned to align its embedding space, across 69 languages, with the embedding space of the CLIP text encoder that accompanies the Res50x4 vision encoder.
A full list of the 100 languages used during pre-training can be found here, and a list of the 69 languages used during fine-tuning can be found in SupportedLanguages.md.
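The alignment idea can be sketched as a simple regression: a frozen CLIP text encoder provides target embeddings, and a projection on top of M-BERT is trained so its outputs land close to those targets. The dimensions, pooling, and the use of a plain mean-squared-error loss below are assumptions for illustration, not a statement of the exact training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: 768 for M-BERT's pooled output, 640 for the
# CLIP text space that pairs with the Res50x4 vision encoder.
bert_dim, clip_dim, batch = 768, 640, 32

# Hypothetical stand-ins for the two encoders' outputs on a batch of
# sentence pairs (translated sentence -> M-BERT, English original -> CLIP).
student_features = rng.standard_normal((batch, bert_dim))  # M-BERT output
teacher_targets = rng.standard_normal((batch, clip_dim))   # frozen CLIP targets

# The tuned component: a linear projection from BERT space to CLIP space.
W = rng.standard_normal((bert_dim, clip_dim)) * 0.01
b = np.zeros(clip_dim)

predictions = student_features @ W + b
mse = np.mean((predictions - teacher_targets) ** 2)  # loss to minimise

print(predictions.shape)  # (32, 640)
```

Minimising this loss over many (sentence, translation) pairs is what pulls the multilingual encoder's outputs into CLIP's text embedding space.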

Training data pairs were generated by sampling 40k sentences per language from the combined descriptions of GCC + MSCOCO + VizWiz and translating them into the corresponding language. All translation was done with the AWS Translate service; the quality of these translations has not yet been analyzed, but one can assume it varies across the 69 languages.
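The sampling-and-translation step described above can be sketched as follows. The `translate` function here is a hypothetical stub standing in for the AWS Translate call, and the caption pool is a placeholder for the real GCC + MSCOCO + VizWiz descriptions.

```python
import random

def translate(sentence: str, target_lang: str) -> str:
    # Stub: real code would call the AWS Translate service here.
    return f"[{target_lang}] {sentence}"

# Placeholder pool standing in for the combined GCC + MSCOCO + VizWiz captions.
captions = [f"caption {i}" for i in range(100_000)]

languages = ["sv", "de", "ru"]  # three of the 69 fine-tuning languages
SAMPLES_PER_LANGUAGE = 40_000

pairs = {}
for lang in languages:
    sampled = random.sample(captions, SAMPLES_PER_LANGUAGE)
    # (English caption, translation) pairs: the English side feeds the frozen
    # CLIP text encoder, the translated side feeds the M-BERT student.
    pairs[lang] = [(s, translate(s, lang)) for s in sampled]
```

Each language thus contributes 40k aligned pairs, so the caveat about translation quality applies independently per language.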