# IndicTrans2 HF Inference

This notebook shows how to run inference with the IndicTrans2 models, which were originally trained with fairseq, through the Hugging Face transformers interface.


## Setup

Please run the cells below to install the necessary dependencies.


In [None]:
%%capture
!git clone https://github.com/AI4Bharat/IndicTrans2.git

In [None]:
%%capture
%cd /content/IndicTrans2/huggingface_interface

In [None]:
%%capture
!python3 -m pip install nltk sacremoses pandas regex mock "transformers>=4.33.2" mosestokenizer
!python3 -c "import nltk; nltk.download('punkt')"
!python3 -m pip install bitsandbytes scipy accelerate datasets
!python3 -m pip install sentencepiece

!git clone https://github.com/VarunGumma/IndicTransToolkit.git
%cd IndicTransToolkit
!python3 -m pip install --editable ./
%cd ..

**IMPORTANT: Restart your runtime first, then run the cells below.**

## Inference


In [None]:
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig, AutoTokenizer
from IndicTransToolkit import IndicProcessor

BATCH_SIZE = 4
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
quantization = None
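
The loader defined in the next cell compares `quantization` against the exact strings `"4-bit"` and `"8-bit"`; any other value, including a typo such as `"4bit"`, silently falls through to full-precision loading. A minimal sketch of a guard that could run before loading, where `validate_quantization` is a hypothetical helper and not part of IndicTrans2 or transformers:

```python
# Hypothetical helper: rejects values the loader would silently ignore.
VALID_QUANTIZATION = (None, "4-bit", "8-bit")


def validate_quantization(mode):
    """Return `mode` unchanged if the loader recognizes it, else raise ValueError."""
    if mode not in VALID_QUANTIZATION:
        raise ValueError(
            f"Unsupported quantization {mode!r}; expected one of {VALID_QUANTIZATION}"
        )
    return mode


quantization = validate_quantization(None)  # full precision, as in the cell above
```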

In [None]:
def initialize_model_and_tokenizer(ckpt_dir, quantization):
    if quantization == "4-bit":
        qconfig = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
    elif quantization == "8-bit":
        # BitsAndBytesConfig has no bnb_8bit_* options; the double-quant and
        # compute-dtype settings exist only for 4-bit loading.
        qconfig = BitsAndBytesConfig(load_in_8bit=True)
    else:
        qconfig = None

    tokenizer = AutoTokenizer.from_pretrained(ckpt_dir, trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        ckpt_dir,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        quantization_config=qconfig,
    )

    if qconfig is None:
        model = model.to(DEVICE)
        if DEVICE == "cuda":
            model.half()

    model.eval()

    return tokenizer, model


def batch_translate(input_sentences, src_lang, tgt_lang, model, tokenizer, ip):
    translations = []
    for i in range(0, len(input_sentences), BATCH_SIZE):
        batch = input_sentences[i : i + BATCH_SIZE]

        # Preprocess the batch and extract entity mappings
        batch = ip.preprocess_batch(batch, src_lang=src_lang, tgt_lang=tgt_lang)

        # Tokenize the batch and generate input encodings
        inputs = tokenizer(
            batch,
            truncation=True,
            padding="longest",
            return_tensors="pt",
            return_attention_mask=True,
        ).to(DEVICE)

        # Generate translations using the model
        with torch.no_grad():
            generated_tokens = model.generate(
                **inputs,
                use_cache=True,
                min_length=0,
                max_length=256,
                num_beams=5,
                num_return_sequences=1,
            )

        # Decode the generated tokens into text

        with tokenizer.as_target_tokenizer():
            generated_tokens = tokenizer.batch_decode(
                generated_tokens.detach().cpu().tolist(),
                skip_special_tokens=True,
                clean_up_tokenization_spaces=True,
            )

        # Postprocess the translations, including entity replacement
        translations += ip.postprocess_batch(generated_tokens, lang=tgt_lang)

        del inputs
        torch.cuda.empty_cache()

    return translations
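
`batch_translate` walks the input in steps of `BATCH_SIZE`, so the final batch may be shorter than the rest. The slicing on its own can be sketched as follows; `chunked` is an illustrative name that does not appear in the notebook:

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


sentences = ["s1", "s2", "s3", "s4", "s5", "s6"]
batches = list(chunked(sentences, 4))
print(batches)  # [['s1', 's2', 's3', 's4'], ['s5', 's6']]
```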

### Indic to English Example


In [None]:
# Setup block: downloading and loading both 1B checkpoints takes a long time.
!pip install -q groq

import os

from groq import Groq

indic_en_ckpt_dir = "ai4bharat/indictrans2-indic-en-1B"  # ai4bharat/indictrans2-indic-en-dist-200M
indic_en_tokenizer, indic_en_model = initialize_model_and_tokenizer(indic_en_ckpt_dir, quantization)

# Read the Groq API key from the environment; never hard-code secrets in a notebook.
client = Groq(
    # api_key=userdata.get("GROQ_API_KEY"),  # alternative: Colab secrets
    api_key=os.environ.get("GROQ_API_KEY"),
)
en_indic_ckpt_dir = "ai4bharat/indictrans2-en-indic-1B"  # ai4bharat/indictrans2-en-indic-dist-200M
en_indic_tokenizer, en_indic_model = initialize_model_and_tokenizer(en_indic_ckpt_dir, quantization)
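
Credentials such as the Groq API key are safer read from the environment (or from Colab secrets) than typed into a cell. A minimal sketch, assuming the key is stored in a variable named `GROQ_API_KEY`; `require_env` is a hypothetical helper:

```python
import os


def require_env(name, env=None):
    """Return the value of `name` from `env` (default: os.environ) or fail loudly."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Set {name} before creating the API client")
    return value


# Usage with an explicit mapping so the sketch runs without a real key:
print(require_env("GROQ_API_KEY", {"GROQ_API_KEY": "example-value"}))
```

The returned value could then be passed as `api_key=` when constructing the client.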




In [None]:



# Relies on the models, tokenizers, and Groq client created in the setup cell above.

def translate_and_chat(input_text):
    # First translation from Marathi to English
    ip = IndicProcessor()
    hi_sents = [input_text]
    src_lang, tgt_lang = "mar_Deva", "eng_Latn"
    en_translations = batch_translate(hi_sents, src_lang, tgt_lang, indic_en_model, indic_en_tokenizer, ip)
    # Chat model interaction
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": en_translations[0],
            }
        ],
        model="llama3-70b-8192",
    )
    # Getting the chat model's output
    chat_output = chat_completion.choices[0].message.content
    # Second translation from English back to Marathi
    en_sents = [chat_output]
    src_lang, tgt_lang = "eng_Latn", "mar_Deva"
    hi_translations = batch_translate(en_sents, src_lang, tgt_lang, en_indic_model, en_indic_tokenizer, ip)
    # Return the final output
    return hi_translations[0]

# Example usage: a short Marathi greeting ("Hello, how are you?")
input_text = "नमस्कार, कसे आहात?" # Example Marathi text
output = translate_and_chat(input_text)
output



'मी फक्त एक भाषा आदर्श आहे, मला माणसांसारख्या भावना किंवा भावना नाहीत, म्हणून माझे दिवस चांगले किंवा वाईट नसतात. तथापि, मी तुम्हाला कोणतेही प्रश्न किंवा कामे करण्यास मदत करण्यासाठी येथे आहे, म्हणून कृपया मला काहीही विचारू नका! '
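
The function above chains three stages: translate into English, query the chat model, and translate the reply back. A sketch of the same flow with each stage passed in as a callable, so the wiring can be exercised without loading the models; `round_trip` and the stand-in stages are hypothetical:

```python
def round_trip(text, to_english, chat, from_english):
    """Translate to English, get a chat reply, and translate the reply back."""
    english_prompt = to_english(text)
    english_reply = chat(english_prompt)
    return from_english(english_reply)


# Stand-in stages that only tag the text, to make the order of calls visible.
result = round_trip(
    "question",
    to_english=lambda s: f"en({s})",
    chat=lambda s: f"reply({s})",
    from_english=lambda s: f"mr({s})",
)
print(result)  # mr(reply(en(question)))
```

In the notebook, the first and last stages would wrap `batch_translate` and the middle one `client.chat.completions.create`.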

### English to Indic Example


In [None]:
!pip install gradio --upgrade

import gradio as gr
from gradio import Textbox, Interface



def translate_and_chat(input_text):
    ip = IndicProcessor()
    # First translation from Marathi to English
    hi_sents = [input_text]
    src_lang, tgt_lang = "mar_Deva", "eng_Latn"
    en_translations = batch_translate(hi_sents, src_lang, tgt_lang, indic_en_model, indic_en_tokenizer, ip)

    # Chat model interaction
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": en_translations[0],
            }
        ],
        model="llama3-70b-8192",
    )

    # Getting the chat model's output
    chat_output = chat_completion.choices[0].message.content

    # Second translation from English back to Marathi
    en_sents = [chat_output]
    src_lang, tgt_lang = "eng_Latn", "mar_Deva"
    hi_translations = batch_translate(en_sents, src_lang, tgt_lang, en_indic_model, en_indic_tokenizer, ip)

    # Return the final output
    return hi_translations[0]

# Define the Gradio interface
interface = Interface(
    fn=translate_and_chat,
    inputs=Textbox(lines=2, placeholder="Type your text here...", label="Enter Text in Marathi"),
    outputs=Textbox(label="Response in Marathi"),
    title="Marathi Chatbot",
    description="Enter a sentence in Marathi to chat with the bot.",
)


# Launch the interface
interface.launch()







In [None]:
!pip install gradio --upgrade

import gradio as gr
from gradio import Textbox, Dropdown

# Flores codes to language names mapping
LANGUAGE_NAMES = {
    "asm_Beng": "Assamese",
    "awa_Deva": "Awadhi",
    "ben_Beng": "Bengali",
    "bho_Deva": "Bhojpuri",
    "brx_Deva": "Bodo",
    "doi_Deva": "Dogri",
    "eng_Latn": "English",
    "gom_Deva": "Konkani",
    "guj_Gujr": "Gujarati",
    "hin_Deva": "Hindi",
    "hne_Deva": "Chhattisgarhi",
    "kan_Knda": "Kannada",
    "kas_Arab": "Kashmiri (Arabic)",
    "kas_Deva": "Kashmiri (Devanagari)",
    "kha_Latn": "Khasi",
    "lus_Latn": "Mizo",
    "mag_Deva": "Magahi",
    "mai_Deva": "Maithili",
    "mal_Mlym": "Malayalam",
    "mar_Deva": "Marathi",
    "mni_Beng": "Manipuri (Bengali)",
    "mni_Mtei": "Manipuri (Meitei)",
    "npi_Deva": "Nepali",
    "ory_Orya": "Odia",
    "pan_Guru": "Punjabi",
    "san_Deva": "Sanskrit",
    "sat_Olck": "Santali",
    "snd_Arab": "Sindhi (Arabic)",
    "snd_Deva": "Sindhi (Devanagari)",
    "tam_Taml": "Tamil",
    "tel_Telu": "Telugu",
    "urd_Arab": "Urdu"
}

def translate_and_chat(input_text, input_language):
    ip = IndicProcessor(inference=True)

    # Get language code from the selected language name
    lang_code = [code for code, name in LANGUAGE_NAMES.items() if name == input_language][0]
    is_english = lang_code == "eng_Latn"

    # First translation from the selected language to English.
    # English input is passed through unchanged: the Indic-En model is not meant
    # to translate English into English.
    if is_english:
        en_text = input_text
    else:
        en_text = batch_translate([input_text], lang_code, "eng_Latn", indic_en_model, indic_en_tokenizer, ip)[0]

    # Chat model interaction
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": en_text,
            }
        ],
        model="llama3-70b-8192",
    )

    # Getting the chat model's output
    chat_output = chat_completion.choices[0].message.content

    # Second translation from English back to the original language
    if is_english:
        return chat_output
    translations = batch_translate([chat_output], "eng_Latn", lang_code, en_indic_model, en_indic_tokenizer, ip)

    # Return the final output
    return translations[0]

# Sort languages alphabetically for the dropdown
sorted_languages = sorted(LANGUAGE_NAMES.values())

# Define the Gradio interface with language selector
interface = gr.Interface(
    fn=translate_and_chat,
    inputs=[
        Textbox(lines=2, placeholder="Type your text here...", label="Input Text"),
        Dropdown(
            choices=sorted_languages,
            value="Marathi",
            label="Select Input Language"
        )
    ],
    outputs=Textbox(label="Response"),
    title="Multilingual Indian Language Chatbot",
    description="Select your language and enter text to chat with the bot.",
    examples=[
        ["पंतप्रधान नरेंद्र मोदींबद्दल तुम्हाला काय वाटतं?", "Marathi"],
        ["प्रधानमंत्री नरेंद्र मोदी के बारे में आप क्या सोचते हैं?", "Hindi"],
        ["What do you think about technology in India?", "English"],
        ["ভারতের প্রযুক্তি সম্পর্কে আপনি কী মনে করেন?", "Bengali"]
    ],
    theme="soft",
    flagging_mode="never",  # replaces the deprecated allow_flagging in Gradio 5
)

# Launch the interface
interface.launch(share=True)





Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://886237ff453b3be403.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
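
`translate_and_chat` finds the Flores code by scanning `LANGUAGE_NAMES` on every call. Since the display names are unique, the mapping can be inverted once instead. A minimal sketch using a three-entry excerpt of the table; `NAME_TO_CODE` and `code_for` are illustrative names:

```python
# Excerpt of the notebook's LANGUAGE_NAMES table (Flores code -> display name).
LANGUAGE_NAMES = {
    "eng_Latn": "English",
    "hin_Deva": "Hindi",
    "mar_Deva": "Marathi",
}

# Inverted once: display name -> Flores code.
NAME_TO_CODE = {name: code for code, name in LANGUAGE_NAMES.items()}


def code_for(language_name):
    """Return the Flores code for a dropdown label, or raise a clear error."""
    try:
        return NAME_TO_CODE[language_name]
    except KeyError:
        raise ValueError(f"Unknown language: {language_name!r}") from None


print(code_for("Marathi"))  # mar_Deva
```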


