For a general-purpose translation model on Hugging Face, I would recommend **MarianMT**.

### Reasons:
1. **Wide Language Support**: The Helsinki-NLP organization publishes more than 1,000 MarianMT checkpoints, which cover a wide range of language pairs.
2. **Trained by Helsinki-NLP**: Jörg Tiedemann (University of Helsinki) trained the models on OPUS data using the Marian C++ framework. Each model card documents the model's benchmark scores.
3. **Fast and Efficient**: Each model is a compact 6-layer encoder-decoder of roughly 300 MB, so inference is fast enough for many real-time translation applications.
4. **Customizability**: Being open-source, it allows for fine-tuning and adaptation to specific needs.

MarianMT is a robust choice for most translation tasks.

Overview:

A framework for translation models that uses the same architecture as BART. Translations should be similar, but not identical, to the output in the test set linked to in each model card. This model was contributed by sshleifer.

Implementation Notes:

* Each model is about 298 MB on disk; there are more than 1,000 models.
* The list of supported language pairs can be found at https://huggingface.co/Helsinki-NLP
* Models were originally trained by Jörg Tiedemann using the Marian C++ library, which supports fast training and translation.
* All models are transformer encoder-decoders with 6 layers in each component. Each model’s performance is documented in a model card.
* The 80 opus models that require BPE preprocessing are not supported.
* The modeling code is the same as BartForConditionalGeneration with a few minor modifications:

  * static (sinusoid) positional embeddings (MarianConfig.static_position_embeddings=True)
  * no layernorm_embedding (MarianConfig.normalize_embedding=False)
  * the model starts generating with pad_token_id (which has 0 as a token_embedding) as the prefix (BART uses a different start token)
* Code to bulk convert models can be found in convert_marian_to_pytorch.py.
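
The first modification above, static sinusoidal position embeddings, means positions are encoded by a fixed function rather than learned weights. The plain-Python sketch below builds such a table. It assumes a concatenated layout, with all sine features first and all cosine features second, which is our understanding of the Transformers Marian implementation. The original Transformer paper interleaves them instead. The function name is hypothetical.

```python
import math
from typing import List


def sinusoidal_positions(num_positions: int, dim: int) -> List[List[float]]:
    """Return a num_positions x dim table of fixed (non-learned) position encodings.

    Assumed layout: columns [0, dim // 2) hold sines and columns
    [dim // 2, 2 * (dim // 2)) hold cosines; for an odd dim the last column stays 0.0.
    """
    half = dim // 2
    table = []
    for pos in range(num_positions):
        row = [0.0] * dim
        for i in range(half):
            # Frequency falls geometrically as i grows, so later features change more slowly.
            angle = pos / (10000 ** (2 * i / dim))
            row[i] = math.sin(angle)
            row[half + i] = math.cos(angle)
        table.append(row)
    return table


table = sinusoidal_positions(num_positions=4, dim=8)
print(table[0])  # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```

Because the table depends only on position and dimension, nothing about it needs to be learned or fine-tuned.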

Naming:

* All model names use the following format: Helsinki-NLP/opus-mt-{src}-{tgt}
* The language codes used to name models are inconsistent. Two-letter codes are usually standard ISO 639-1 codes; three-letter codes may require searching for "language code {code}".
* Codes formatted like es_AR usually follow the pattern {code}_{region}; es_AR, for example, is Spanish as spoken in Argentina.
* The models were converted in two stages. The first 1,000 models use ISO-639-2 codes to identify languages; the second group uses a combination of ISO-639-5 and ISO-639-2 codes.
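
Because every name follows this pattern, you can read the source and target codes straight off the model id. Here is a minimal sketch with a hypothetical helper that is not part of Transformers. It only splits the string. It does not interpret region suffixes or underscore-joined multi-language sources such as fi_nb_no_nn_ru_sv_en, and it rejects any repository name that does not have exactly two dash-separated parts after the prefix.

```python
from typing import Tuple

PREFIX = "Helsinki-NLP/opus-mt-"


def parse_opus_mt_name(model_id: str) -> Tuple[str, str]:
    """Split 'Helsinki-NLP/opus-mt-{src}-{tgt}' into (src, tgt).

    Ids that do not match this exact two-part pattern raise ValueError.
    """
    if not model_id.startswith(PREFIX):
        raise ValueError(f"not an opus-mt model id: {model_id!r}")
    parts = model_id[len(PREFIX):].split("-")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '{{src}}-{{tgt}}' after the prefix: {model_id!r}")
    src, tgt = parts
    return src, tgt


for name in ["Helsinki-NLP/opus-mt-fr-en", "Helsinki-NLP/opus-mt-en-ROMANCE"]:
    print(name, "->", parse_opus_mt_name(name))
```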

Examples:

* Since Marian models are smaller than many other translation models available in the library, they can be useful for fine-tuning experiments and integration tests.
* Fine-tune on GPU

Multilingual Models:

* All model names use the format Helsinki-NLP/opus-mt-{src}-{tgt}.
* If a model can output multiple languages, you should specify the target language by prepending its code to each source text (for example, >>fra<< for French).
* You can see a model's supported language codes in its model card, under target constituents, as in opus-mt-en-roa.
* Note that if a model is only multilingual on the source side, like Helsinki-NLP/opus-mt-roa-en, no language codes are required.
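
Prepending the target token by hand is easy to get wrong. A code that is not in the model's list (for example >>esp<< instead of >>spa<<) will not act as a language selector, and the output language is then no longer reliably controlled. A minimal sketch of a hypothetical validating helper follows. In real use, `supported` would come from `tokenizer.supported_language_codes`.

```python
from typing import Iterable, List


def prepend_target_token(texts: Iterable[str], code: str, supported: Iterable[str]) -> List[str]:
    """Prefix each source text with the >>code<< token after checking the model supports it."""
    token = f">>{code}<<"
    if token not in set(supported):
        raise ValueError(f"{token} is not one of the model's supported language codes")
    return [f"{token} {text}" for text in texts]


# Subset of the codes that tokenizer.supported_language_codes reports for opus-mt-en-roa.
supported = [">>fra<<", ">>por<<", ">>spa<<", ">>ita<<", ">>ron<<"]
print(prepend_target_token(["And this to Spanish"], "spa", supported))
# ['>>spa<< And this to Spanish']
```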


New multilingual models from the Tatoeba-Challenge repo require three-letter language codes:

In [1]:
from transformers import MarianMTModel, MarianTokenizer

src_text = [
    ">>fra<< this is a sentence in english that we want to translate to french",
    ">>por<< This should go to portuguese",
    ">>spa<< And this to Spanish",  # "spa" is in supported_language_codes; "esp" is not
]

model_name = "Helsinki-NLP/opus-mt-en-roa"
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)

model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
[tokenizer.decode(t, skip_special_tokens=True) for t in translated]

['>>zlm_Latn<<', '>>mfe<<', '>>hat<<', '>>pap<<', '>>ast<<', '>>cat<<', '>>ind<<', '>>glg<<', '>>wln<<', '>>spa<<', '>>fra<<', '>>ron<<', '>>por<<', '>>ita<<', '>>oci<<', '>>arg<<', '>>min<<']

["c'est une phrase en anglais que nous voulons traduire en français",
 'Isto deve ir para o português',
 'esto al español']

Here is the code to see all available pretrained models on the hub:

In [2]:
from huggingface_hub import list_models

org = "Helsinki-NLP"
model_list = list_models(author=org)  # ask the Hub for this organization's models only
model_ids = [x.id for x in model_list]
suffix = [x.split("/")[1] for x in model_ids]
old_style_multi_models = [f"{org}/{s}" for s in suffix if s != s.lower()]

Old Style Multi-Lingual Models:

These are the old-style multilingual models ported from the OPUS-MT-Train repo, followed by the members of each language group:

In [3]:
['Helsinki-NLP/opus-mt-NORTH_EU-NORTH_EU',
 'Helsinki-NLP/opus-mt-ROMANCE-en',
 'Helsinki-NLP/opus-mt-SCANDINAVIA-SCANDINAVIA',
 'Helsinki-NLP/opus-mt-de-ZH',
 'Helsinki-NLP/opus-mt-en-CELTIC',
 'Helsinki-NLP/opus-mt-en-ROMANCE',
 'Helsinki-NLP/opus-mt-es-NORWAY',
 'Helsinki-NLP/opus-mt-fi-NORWAY',
 'Helsinki-NLP/opus-mt-fi-ZH',
 'Helsinki-NLP/opus-mt-fi_nb_no_nn_ru_sv_en-SAMI',
 'Helsinki-NLP/opus-mt-sv-NORWAY',
 'Helsinki-NLP/opus-mt-sv-ZH']
GROUP_MEMBERS = {
 'ZH': ['cmn', 'cn', 'yue', 'ze_zh', 'zh_cn', 'zh_CN', 'zh_HK', 'zh_tw', 'zh_TW', 'zh_yue', 'zhs', 'zht', 'zh'],
 'ROMANCE': ['fr', 'fr_BE', 'fr_CA', 'fr_FR', 'wa', 'frp', 'oc', 'ca', 'rm', 'lld', 'fur', 'lij', 'lmo', 'es', 'es_AR', 'es_CL', 'es_CO', 'es_CR', 'es_DO', 'es_EC', 'es_ES', 'es_GT', 'es_HN', 'es_MX', 'es_NI', 'es_PA', 'es_PE', 'es_PR', 'es_SV', 'es_UY', 'es_VE', 'pt', 'pt_br', 'pt_BR', 'pt_PT', 'gl', 'lad', 'an', 'mwl', 'it', 'it_IT', 'co', 'nap', 'scn', 'vec', 'sc', 'ro', 'la'],
 'NORTH_EU': ['de', 'nl', 'fy', 'af', 'da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
 'SCANDINAVIA': ['da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
 'SAMI': ['se', 'sma', 'smj', 'smn', 'sms'],
 'NORWAY': ['nb_NO', 'nb', 'nn_NO', 'nn', 'nog', 'no_nb', 'no'],
 'CELTIC': ['ga', 'cy', 'br', 'gd', 'kw', 'gv']
}
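
To pick an old-style model for a particular language, you can invert the mapping above to find every group that contains a given code. Here is a minimal sketch with a hypothetical helper, using an excerpt of GROUP_MEMBERS:

```python
from typing import Dict, List

# Excerpt of the GROUP_MEMBERS mapping above (only a few groups).
GROUP_MEMBERS: Dict[str, List[str]] = {
    "NORTH_EU": ["de", "nl", "fy", "af", "da", "fo", "is", "no", "nb", "nn", "sv"],
    "SCANDINAVIA": ["da", "fo", "is", "no", "nb", "nn", "sv"],
    "NORWAY": ["nb_NO", "nb", "nn_NO", "nn", "nog", "no_nb", "no"],
    "CELTIC": ["ga", "cy", "br", "gd", "kw", "gv"],
}


def groups_containing(code: str, groups: Dict[str, List[str]]) -> List[str]:
    """Return, sorted, the names of all groups whose member list includes `code`."""
    return sorted(name for name, members in groups.items() if code in members)


print(groups_containing("nb", GROUP_MEMBERS))  # ['NORTH_EU', 'NORWAY', 'SCANDINAVIA']
```

A code such as cy maps only to CELTIC, which points to Helsinki-NLP/opus-mt-en-CELTIC for English-to-Welsh.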

Example of translating English into several Romance languages, using old-style two-letter language codes:

In [5]:
from transformers import MarianMTModel, MarianTokenizer

src_text = [
    ">>fr<< this is a sentence in english that we want to translate to french",
    ">>pt<< This should go to portuguese",
    ">>es<< And this to Spanish",
]

model_name = "Helsinki-NLP/opus-mt-en-ROMANCE"
tokenizer = MarianTokenizer.from_pretrained(model_name)

model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
print(tgt_text)



["c'est une phrase en anglais que nous voulons traduire en français", 'Isto deve ir para o português.', 'Y esto al español']


MarianConfig:

class transformers.MarianConfig:

( vocab_size = 58101, decoder_vocab_size = None, max_position_embeddings = 1024, encoder_layers = 12, encoder_ffn_dim = 4096, encoder_attention_heads = 16, decoder_layers = 12, decoder_ffn_dim = 4096, decoder_attention_heads = 16, encoder_layerdrop = 0.0, decoder_layerdrop = 0.0, use_cache = True, is_encoder_decoder = True, activation_function = 'gelu', d_model = 1024, dropout = 0.1, attention_dropout = 0.0, activation_dropout = 0.0, init_std = 0.02, decoder_start_token_id = 58100, scale_embedding = False, pad_token_id = 58100, eos_token_id = 0, forced_eos_token_id = 0, share_encoder_decoder_embeddings = True, **kwargs )

Parameters:

* vocab_size (int, optional, defaults to 58101) — Vocabulary size of the Marian model. Defines the number of different tokens that can be represented by the input_ids passed when calling MarianModel or TFMarianModel.
* d_model (int, optional, defaults to 1024) — Dimensionality of the layers and the pooler layer.
* encoder_layers (int, optional, defaults to 12) — Number of encoder layers.
* decoder_layers (int, optional, defaults to 12) — Number of decoder layers.
* encoder_attention_heads (int, optional, defaults to 16) — Number of attention heads for each attention layer in the Transformer encoder.
* decoder_attention_heads (int, optional, defaults to 16) — Number of attention heads for each attention layer in the Transformer decoder.
* decoder_ffn_dim (int, optional, defaults to 4096) — Dimensionality of the “intermediate” (often named feed-forward) layer in decoder.
* encoder_ffn_dim (int, optional, defaults to 4096) — Dimensionality of the “intermediate” (often named feed-forward) layer in encoder.
* activation_function (str or function, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "silu" and "gelu_new" are supported.
* dropout (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
* attention_dropout (float, optional, defaults to 0.0) — The dropout ratio for the attention probabilities.
* activation_dropout (float, optional, defaults to 0.0) — The dropout ratio for activations inside the fully connected layer.
* max_position_embeddings (int, optional, defaults to 1024) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
* init_std (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
* encoder_layerdrop (float, optional, defaults to 0.0) — The LayerDrop probability for the encoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more details.
* decoder_layerdrop (float, optional, defaults to 0.0) — The LayerDrop probability for the decoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more details.
* scale_embedding (bool, optional, defaults to False) — Scale embeddings by multiplying them by sqrt(d_model).
* use_cache (bool, optional, defaults to True) — Whether or not the model should return the last key/values attentions (not used by all models).
* forced_eos_token_id (int, optional, defaults to 0) — The id of the token to force as the last generated token when max_length is reached. Usually set to eos_token_id.

This is the configuration class to store the configuration of a MarianModel. It is used to instantiate a Marian model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults yields a configuration in the style of the Marian Helsinki-NLP/opus-mt-en-de architecture. The default sizes (12 layers, d_model=1024) are larger than those of opus-mt-en-de itself, which uses 6 layers per component and a hidden size of 512.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs.

In [6]:
from transformers import MarianModel, MarianConfig

# Initializing a Marian Helsinki-NLP/opus-mt-en-de style configuration
configuration = MarianConfig()

# Initializing a model from the Helsinki-NLP/opus-mt-en-de style configuration
model = MarianModel(configuration)

# Accessing the model configuration
configuration = model.config
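
The configuration fields largely determine model size. As a rough check on the "about 298 MB" per model quoted in the implementation notes, the arithmetic sketch below estimates the parameter count and float32 size of an opus-mt-en-de-sized model. It takes the following values:

* 6 encoder and 6 decoder layers and d_model=512, from this document
* vocab_size=58101, from the MarianConfig defaults
* a feed-forward width of 2048, shared embeddings, and a per-token output bias, all assumptions

It ignores non-trainable buffers such as the sinusoidal position table, so treat the result as an estimate, not the exact checkpoint layout.

```python
# Assumed opus-mt-en-de sizes (see the lead-in for which values are assumptions).
VOCAB, D_MODEL, FFN, ENC_LAYERS, DEC_LAYERS = 58101, 512, 2048, 6, 6


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


attention = 4 * linear(D_MODEL, D_MODEL)            # q, k, v and output projections
ffn = linear(D_MODEL, FFN) + linear(FFN, D_MODEL)   # two-layer feed-forward block
encoder_layer = attention + ffn + 2 * layer_norm(D_MODEL)
decoder_layer = 2 * attention + ffn + 3 * layer_norm(D_MODEL)  # self- plus cross-attention

total = (
    VOCAB * D_MODEL            # one embedding matrix shared by encoder, decoder and LM head
    + ENC_LAYERS * encoder_layer
    + DEC_LAYERS * decoder_layer
    + VOCAB                    # assumed output bias over the vocabulary
)
print(f"{total:,} parameters ~ {total * 4 / 1e6:.0f} MB in float32")
# 73,944,309 parameters ~ 296 MB in float32
```

About 40% of the parameters sit in the shared embedding matrix, which is why vocabulary size matters so much for these small models.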

MarianTokenizer:

class transformers.MarianTokenizer:

( source_spm, target_spm, vocab, target_vocab_file = None, source_lang = None, target_lang = None, unk_token = '<unk>', eos_token = '</s>', pad_token = '<pad>', model_max_length = 512, sp_model_kwargs: Optional = None, separate_vocabs = False, **kwargs )

Parameters:

* source_spm (str) — SentencePiece file (generally has a .spm extension) that contains the vocabulary for the source language.
* target_spm (str) — SentencePiece file (generally has a .spm extension) that contains the vocabulary for the target language.
* source_lang (str, optional) — A string representing the source language.
* target_lang (str, optional) — A string representing the target language.
* unk_token (str, optional, defaults to "<unk>") — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.
* eos_token (str, optional, defaults to "</s>") — The end of sequence token.
* pad_token (str, optional, defaults to "<pad>") — The token used for padding, for example when batching sequences of different lengths.
* model_max_length (int, optional, defaults to 512) — The maximum sentence length the model accepts.
* additional_special_tokens (List[str], optional, defaults to ["<eop>", "<eod>"]) — Additional special tokens used by the tokenizer.
* sp_model_kwargs (dict, optional) — Will be passed to the SentencePieceProcessor.__init__() method. The Python wrapper for SentencePiece can be used, among other things, to set:
  * enable_sampling: Enable subword regularization.
  * nbest_size: Sampling parameters for unigram. Invalid for BPE-Dropout.
    * nbest_size = {0,1}: No sampling is performed.
    * nbest_size > 1: samples from the nbest_size results.
    * nbest_size < 0: assumes that nbest_size is infinite and samples from all hypotheses (lattice) using the forward-filtering-and-backward-sampling algorithm.
  * alpha: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout.
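
The three keys above work together. The sketch below builds an sp_model_kwargs dict that enables subword regularization and rejects nbest_size values that would silently disable it. The helper name and the alpha=0.1 default are illustrative assumptions, not recommendations. Sampling makes tokenization non-deterministic, so it is normally used only when preparing training data.

```python
from typing import Any, Dict


def subword_sampling_kwargs(nbest_size: int = -1, alpha: float = 0.1) -> Dict[str, Any]:
    """Return an sp_model_kwargs dict that enables subword regularization.

    Follows the nbest_size rules listed above: 0 or 1 disables sampling,
    values > 1 sample from the n-best segmentations, and negative values
    sample from the full lattice.
    """
    if nbest_size in (0, 1):
        raise ValueError("nbest_size of 0 or 1 performs no sampling; use > 1 or < 0")
    return {"enable_sampling": True, "nbest_size": nbest_size, "alpha": alpha}


kwargs = subword_sampling_kwargs()
print(kwargs)  # {'enable_sampling': True, 'nbest_size': -1, 'alpha': 0.1}
# Possible usage (downloads the tokenizer files):
# tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de", sp_model_kwargs=kwargs)
```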

This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

In [7]:
from transformers import MarianMTModel, MarianTokenizer

model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")
tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
src_texts = ["I am a small frog.", "Tom asked his teacher for advice."]
tgt_texts = ["Ich bin ein kleiner Frosch.", "Tom bat seinen Lehrer um Rat."]  # optional: only needed for a training loss
inputs = tokenizer(src_texts, text_target=tgt_texts, return_tensors="pt", padding=True)

# The target ids are returned as `labels`, so the seq2seq model also computes outputs.loss
outputs = model(**inputs)

build_inputs_with_special_tokens:

( token_ids_0, token_ids_1 = None )
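
The signature above comes without a description. As we understand it, MarianTokenizer adds no start token. It appends the end-of-sequence id, which is 0 by default per eos_token_id in MarianConfig, and it concatenates a pair before appending that id. A standalone sketch of that behaviour follows, with a hypothetical function name and the EOS id passed in explicitly:

```python
from typing import List, Optional


def build_inputs_with_eos(
    token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, eos_token_id: int = 0
) -> List[int]:
    """Append the EOS id to one sequence, or to the concatenation of two sequences."""
    if token_ids_1 is None:
        return token_ids_0 + [eos_token_id]
    # Pairs are not expected for translation; they are simply concatenated.
    return token_ids_0 + token_ids_1 + [eos_token_id]


print(build_inputs_with_eos([42, 7, 19]))  # [42, 7, 19, 0]
```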

MarianModel:

class transformers.MarianModel:

( config: MarianConfig )

Example:

In [8]:
from transformers import AutoTokenizer, MarianModel

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = MarianModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

inputs = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt")
decoder_inputs = tokenizer(
    "<pad> Studien haben gezeigt dass es hilfreich ist einen Hund zu besitzen",
    return_tensors="pt",
    add_special_tokens=False,
)
outputs = model(input_ids=inputs.input_ids, decoder_input_ids=decoder_inputs.input_ids)

last_hidden_states = outputs.last_hidden_state
list(last_hidden_states.shape)

[1, 26, 512]
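
In the example above, the decoder input begins with "<pad>" because Marian uses pad_token_id as its decoder start token (see the implementation notes). During training, decoder inputs are usually derived from the labels by shifting them one position to the right. The plain-Python sketch below shows that step. It is similar in spirit to the shift_tokens_right helper in the Transformers Marian modeling code but is not that code. The ids are illustrative: 58100 is the default pad_token_id and decoder_start_token_id in MarianConfig, and -100 is the conventional "ignore" label for the loss.

```python
from typing import List


def shift_right(labels: List[List[int]], pad_token_id: int, decoder_start_token_id: int) -> List[List[int]]:
    """Build decoder_input_ids: prepend the start id, drop the last label, replace -100 with pad."""
    shifted = []
    for row in labels:
        new_row = [decoder_start_token_id] + row[:-1]
        # -100 marks positions ignored by the loss; it is not a valid token id.
        shifted.append([pad_token_id if t == -100 else t for t in new_row])
    return shifted


labels = [[310, 42, 0], [77, 0, -100]]  # 0 is the default eos_token_id
print(shift_right(labels, pad_token_id=58100, decoder_start_token_id=58100))
# [[58100, 310, 42], [58100, 77, 0]]
```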

MarianMTModel:

class transformers.MarianMTModel:

( config: MarianConfig )

Parameters:

* config (MarianConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The Marian model with a language modeling head. Can be used for translation. This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.

In [9]:
from transformers import AutoTokenizer, MarianMTModel

src = "fr"  # source language
trg = "en"  # target language

model_name = f"Helsinki-NLP/opus-mt-{src}-{trg}"
model = MarianMTModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

sample_text = "où est l'arrêt de bus ?"
batch = tokenizer([sample_text], return_tensors="pt")

generated_ids = model.generate(**batch)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

"Where's the bus stop?"

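Padding many sentences into a single batch can exhaust memory. Here is a minimal sketch of translating in fixed-size batches while preserving order. Both helpers are hypothetical, and the batch size of 16 is an arbitrary default. The real translation step is shown as a comment so that the sketch runs without downloading a model.

```python
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items, in order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_all(
    texts: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    size: int = 16,
) -> List[str]:
    """Translate `texts` batch by batch and return the results in input order."""
    results: List[str] = []
    for chunk in batches(texts, size):
        results.extend(translate_batch(list(chunk)))
    return results


# With the model and tokenizer loaded above, translate_batch could be:
#   def translate_batch(chunk):
#       batch = tokenizer(chunk, return_tensors="pt", padding=True)
#       return tokenizer.batch_decode(model.generate(**batch), skip_special_tokens=True)
# A stand-in function keeps this sketch runnable without a model.
print(translate_all(["un", "deux", "trois"], lambda xs: [x.upper() for x in xs], size=2))
# ['UN', 'DEUX', 'TROIS']
```
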
MarianForCausalLM:

class transformers.MarianForCausalLM:

( config )

In [10]:
from transformers import AutoTokenizer, MarianForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
model = MarianForCausalLM.from_pretrained("Helsinki-NLP/opus-mt-fr-en", add_cross_attention=False)
assert model.config.is_decoder, f"{model.__class__} has to be configured as a decoder."
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

logits = outputs.logits
expected_shape = [1, inputs.input_ids.shape[-1], model.config.vocab_size]
list(logits.shape) == expected_shape

True

FlaxMarianModel:

class transformers.FlaxMarianModel:

( config: MarianConfig, input_shape: Tuple = (1, 1), seed: int = 0, dtype: dtype = <class 'jax.numpy.float32'>, _do_init: bool = True, **kwargs )

Parameters:

* config (MarianConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
* dtype (jax.numpy.dtype, optional, defaults to jax.numpy.float32) — The data type of the computation. Can be one of jax.numpy.float32, jax.numpy.float16 (on GPUs) and jax.numpy.bfloat16 (on TPUs). This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified, all the computation will be performed with the given dtype.

The bare Marian Model transformer outputting raw hidden-states without any specific head on top. This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax Module and refer to the Flax documentation for all matters related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

* Just-In-Time (JIT) compilation
* Automatic Differentiation
* Vectorization
* Parallelization

In [11]:
from transformers import AutoTokenizer, FlaxMarianModel

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = FlaxMarianModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

inputs = tokenizer("Hello, my dog is cute", return_tensors="jax")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state

FlaxMarianMTModel:

class transformers.FlaxMarianMTModel:

( config: MarianConfig, input_shape: Tuple = (1, 1), seed: int = 0, dtype: dtype = <class 'jax.numpy.float32'>, _do_init: bool = True, **kwargs )

Parameters:

* config (MarianConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
* dtype (jax.numpy.dtype, optional, defaults to jax.numpy.float32) — The data type of the computation. Can be one of jax.numpy.float32, jax.numpy.float16 (on GPUs) and jax.numpy.bfloat16 (on TPUs). This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If specified, all the computation will be performed with the given dtype.

The Marian Model with a language modeling head. Can be used for translation. This model inherits from FlaxPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a Flax Linen flax.linen.Module subclass. Use it as a regular Flax Module and refer to the Flax documentation for all matters related to general usage and behavior.

Finally, this model supports inherent JAX features such as:

* Just-In-Time (JIT) compilation
* Automatic Differentiation
* Vectorization
* Parallelization

In [13]:
from transformers import AutoTokenizer, FlaxMarianMTModel

model = FlaxMarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")

text = "My friends are cool but they eat too many carbs."
input_ids = tokenizer(text, max_length=64, truncation=True, return_tensors="jax").input_ids

sequences = model.generate(input_ids, max_length=64, num_beams=2).sequences

outputs = tokenizer.batch_decode(sequences, skip_special_tokens=True)
# should give *Meine Freunde sind cool, aber sie essen zu viele Kohlenhydrate.*
print(outputs[0])

Meine Freunde sind cool, aber sie essen zu viele Kohlenhydrate.
