**XGLM-564M** is a multilingual autoregressive language model with 564 million parameters, trained on a balanced corpus of 30 diverse languages totaling 500 billion sub-tokens.

It was introduced in the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668).

In [1]:
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="facebook/xglm-564M",   # multilingual: Hindi, French, Spanish, etc.
    device=0,                     # GPU 0 (e.g. in Colab); omit this argument to run on CPU
)

Device set to use cuda:0


In [2]:
# English prompt, but ask for Hindi output
# prompt_hindi = "Write the following in Hindi: Amit is a high-impact leader, sharp in focus, relentless in drive, and always a step ahead in his journey as an AI researcher"

# hindi_output = generator(
#     prompt_hindi,
#     max_new_tokens=100,
#     do_sample=True,
#     temperature=0.7,
#     top_p=0.9,
#     repetition_penalty=1.1,
#     no_repeat_ngram_size=3
# )

# print("Hindi Output:\n", hindi_output[0]['generated_text'])

prompt_hindi = "Write the following in Hindi language: India is a great country and"
hindi_output = generator(
    prompt_hindi,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    no_repeat_ngram_size=3
)
print("Hindi Output:\n", hindi_output[0]['generated_text'])

Hindi Output:
 Write the following in Hindi language: India is a great country and has been around for hundreds of years. If you are planning to visit this place, do not forget about tourism departments as it’ll be difficult if your budget doesn't allow that too much travel time! I will show... Read More » Posted by admin _ Category : Uncategorized Comments Off on Top 10 Places To Visit In India – Must Have Things For Your Vacation Travel Tips And Tricks The best part would always come from visiting places like Delhi or Mumbai which offer lots more things than just
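
In the run above, the continuation stays in English even though the prompt asks for Hindi. One rough way to catch this automatically is to measure how much of the text is written in the Devanagari script. The helper below is a minimal sketch under that assumption; `devanagari_share` is a hypothetical name, not part of any library, and a script check says nothing about whether the Hindi is fluent or correct.

```python
def devanagari_share(text):
    """Return the share of letters in `text` that are Devanagari.

    Only Devanagari characters (U+0900 to U+097F) and ASCII letters are
    counted; digits, spaces, and other scripts are ignored.
    """
    devanagari = sum(1 for ch in text if "\u0900" <= ch <= "\u097f")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = devanagari + latin
    return devanagari / total if total else 0.0


english = "India is a great country and has been around for hundreds of years."
hindi = "भारत एक महान देश है"

print(devanagari_share(english))
print(devanagari_share(hindi))
```

A share near 0 suggests the model ignored the language instruction, while a share near 1 suggests the output is at least in the right script.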


# mBART-50 many-to-many multilingual machine translation
**`mbart-large-50-many-to-many-mmt`** is a checkpoint of [**mBART-large-50**](https://huggingface.co/facebook/mbart-large-50) fine-tuned for multilingual machine translation. It was introduced in the paper [**Multilingual Translation with Extensible Multilingual Pretraining and Finetuning**](https://arxiv.org/abs/2008.00401).

The model can translate directly between any pair of **50 languages**. To translate into a target language, the **target language id** must be forced as the **first generated token**, which is done by passing the **`forced_bos_token_id`** parameter to the `generate` method.
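
As an illustration of forcing the target language id, here is a minimal sketch that calls `generate` directly instead of going through a pipeline. The helper names `is_lang_code` and `translate` are hypothetical, and the shape check on language codes is a simplification added for this example, not the tokenizer's own validation. The sketch assumes `transformers` and PyTorch are installed.

```python
import re

# mBART-50 language codes look like "en_XX" or "hi_IN": two lowercase letters,
# an underscore, two uppercase letters. This is only a shape check.
LANG_CODE = re.compile(r"[a-z]{2}_[A-Z]{2}")


def is_lang_code(code):
    """Return True if `code` has the shape of an mBART-50 language code."""
    return LANG_CODE.fullmatch(code) is not None


def translate(text, src_lang, tgt_lang,
              model_name="facebook/mbart-large-50-many-to-many-mmt"):
    """Translate `text` by forcing the target language id as the first token."""
    if not (is_lang_code(src_lang) and is_lang_code(tgt_lang)):
        raise ValueError("expected codes such as 'en_XX' or 'hi_IN'")

    # Imported here so the cheap check above runs before the heavy import.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    tokenizer.src_lang = src_lang  # tells the tokenizer the input language
    encoded = tokenizer(text, return_tensors="pt")

    # Language codes are vocabulary tokens, so their id can be looked up.
    target_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    generated = model.generate(**encoded, forced_bos_token_id=target_id)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

A call such as `translate("India is a great country.", "en_XX", "hi_IN")` downloads the full checkpoint on first use. Reloading the model on every call is wasteful; in practice the tokenizer and model would be loaded once and reused.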

* For **multilingual generation**, you need a **causal LM** trained on many languages (e.g. ***XGLM, BLOOM, mGPT***).
* These models don’t **“auto-translate”**; they *continue text in the target language only if the prompt nudges them that way.*
* You can steer the output language by:
  - Adding a language instruction to the prompt (e.g., “Write in Hindi: ...”)
  - Or using a **translation model** if you need an exact translation.
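
The first option can be sketched as a small few-shot prompt builder. The idea is to end the prompt on a cue for the target language so that the most natural continuation is text in that language. The function name `build_prompt` and the `English:` / `Hindi:` line labels are assumptions made for this illustration, and a small causal LM may still drift away from the requested language.

```python
def build_prompt(text, target_language, examples=()):
    """Build a few-shot prompt that ends on a cue for the target language.

    `examples` is a sequence of (english, translation) pairs shown to the
    model before the sentence it should continue.
    """
    lines = []
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"{target_language}: {target}")
    lines.append(f"English: {text}")
    lines.append(f"{target_language}:")  # the model continues from here
    return "\n".join(lines)


examples = [("Thank you.", "धन्यवाद।")]
prompt = build_prompt("India is a great country.", "Hindi", examples)
print(prompt)
```

The resulting string can be passed to a text-generation pipeline in place of a bare instruction.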

In [None]:
from transformers import pipeline

# Load the multilingual translation pipeline (mBART)
translator = pipeline("translation", model="facebook/mbart-large-50-many-to-many-mmt")

# English → Hindi
hindi_prompt = "India is a great country and has a rich culture."
hindi_output = translator(hindi_prompt, src_lang="en_XX", tgt_lang="hi_IN")
print("Hindi Output:\n", hindi_output[0]['translation_text'])

# English → Spanish
spanish_prompt = "Artificial intelligence is changing the future of technology."
spanish_output = translator(spanish_prompt, src_lang="en_XX", tgt_lang="es_XX")
print("\nSpanish Output:\n", spanish_output[0]['translation_text'])

# English → French
french_prompt = "The future of technology is full of exciting opportunities."
french_output = translator(french_prompt, src_lang="en_XX", tgt_lang="fr_XX")
print("\nFrench Output:\n", french_output[0]['translation_text'])


Device set to use cpu


Hindi Output:
 भारत एक महान देश है और इसकी समृद्ध संस्कृति है।

Spanish Output:
 La inteligencia artificial está cambiando el futuro de la tecnología.

French Output:
 L'avenir de la technologie est plein d'opportunités intéressantes.
