# IndicTrans2 HF Inference

This example notebook shows how to run inference with our IndicTrans2 models, which were originally trained with fairseq, through the HuggingFace transformers interface.


## Setup

Please run the cells below to install the necessary dependencies.


In [1]:
%%capture
!git clone https://github.com/AI4Bharat/IndicTrans2.git

In [2]:
%%capture
%cd /content/IndicTrans2/huggingface_interface

In [3]:
%%capture
# Quote the version specifier so the shell does not treat ">" as output redirection
!python3 -m pip install nltk sacremoses pandas regex mock "transformers>=4.33.2" mosestokenizer
!python3 -c "import nltk; nltk.download('punkt')"
!python3 -m pip install bitsandbytes scipy accelerate datasets
# sacrebleu is imported by the evaluation cells further down
!python3 -m pip install sentencepiece sacrebleu

!git clone https://github.com/VarunGumma/IndicTransToolkit.git
%cd IndicTransToolkit
!python3 -m pip install --editable ./
%cd ..

**IMPORTANT: Restart your runtime before running the cells below.**
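
After the restart, a quick check can confirm that the packages from the setup cells are importable. The sketch below is only an illustration: the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical names, and the import names are assumed to match the installed packages.

```python
import importlib.util

# Import names assumed to correspond to the packages installed during setup.
REQUIRED_MODULES = ["torch", "transformers", "sentencepiece", "sacremoses", "IndicTransToolkit"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, re-running the corresponding install cell and restarting again is the likely remedy.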

## Inference


In [1]:
import torch
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig, AutoTokenizer
from IndicTransToolkit.processor import IndicProcessor

BATCH_SIZE = 4
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
quantization = None
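
The `quantization` flag is compared against the strings `"4-bit"` and `"8-bit"` in the next cell; any other value, including a typo, silently loads the model unquantized. As a sketch of a stricter alternative, the hypothetical helper `quantization_kwargs` below maps the flag to keyword arguments that could be passed to `BitsAndBytesConfig` and rejects unknown values. The compute dtype is left out so that the example runs without torch.

```python
from typing import Optional


def quantization_kwargs(mode: Optional[str]) -> Optional[dict]:
    """Map a quantization mode to BitsAndBytesConfig keyword arguments.

    Returns None when no quantization is requested.
    """
    if mode is None:
        return None
    if mode == "4-bit":
        return {"load_in_4bit": True, "bnb_4bit_use_double_quant": True}
    if mode == "8-bit":
        return {"load_in_8bit": True}
    # Unlike the notebook, unknown values are treated as an error here.
    raise ValueError(f"Unsupported quantization mode: {mode!r}")


print(quantization_kwargs("4-bit"))
print(quantization_kwargs(None))
```

Raising on unknown values is a design choice of this sketch; it makes a misspelled mode visible instead of falling back to full precision.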

In [2]:
def initialize_model_and_tokenizer(ckpt_dir, quantization):
    if quantization == "4-bit":
        qconfig = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
    elif quantization == "8-bit":
        # BitsAndBytesConfig defines double-quant and compute-dtype options
        # only for 4-bit loading, so 8-bit needs just the load flag.
        qconfig = BitsAndBytesConfig(
            load_in_8bit=True,
        )
    else:
        qconfig = None

    tokenizer = AutoTokenizer.from_pretrained(ckpt_dir, trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        ckpt_dir,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        quantization_config=qconfig,
    )

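    # Unquantized models are moved to the target device manually;
    # on GPU they are also cast to float16 to reduce memory use.
    if qconfig is None:
        model = model.to(DEVICE)
        if DEVICE == "cuda":
            model.half()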

    model.eval()

    return tokenizer, model


def batch_translate(input_sentences, src_lang, tgt_lang, model, tokenizer, ip):
    translations = []
    for i in range(0, len(input_sentences), BATCH_SIZE):
        batch = input_sentences[i : i + BATCH_SIZE]

        # Preprocess the batch and extract entity mappings
        batch = ip.preprocess_batch(batch, src_lang=src_lang, tgt_lang=tgt_lang)

        # Tokenize the batch and generate input encodings
        inputs = tokenizer(
            batch,
            truncation=True,
            padding="longest",
            return_tensors="pt",
            return_attention_mask=True,
        ).to(DEVICE)


        # Generate translations using the model
        with torch.no_grad():
            generated_tokens = model.generate(
                **inputs,
                use_cache=False,
                min_length=0,
                max_length=256,
                num_beams=5,
                num_return_sequences=1,
            )

        # Decode the generated tokens into text
        generated_tokens = tokenizer.batch_decode(
            generated_tokens,
            skip_special_tokens=True,
            clean_up_tokenization_spaces=True,
        )

        # Postprocess the translations, including entity replacement
        translations += ip.postprocess_batch(generated_tokens, lang=tgt_lang)

        del inputs
        torch.cuda.empty_cache()

    return translations
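
The loop in `batch_translate` walks the input list in steps of `BATCH_SIZE` and slices out one batch per step, so the last batch may be shorter than the others. The slicing pattern in isolation looks roughly like this; `iter_batches` is a hypothetical helper used only for illustration.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


sentences = [f"sentence {i}" for i in range(10)]
for batch in iter_batches(sentences, 4):
    print(len(batch), batch)
```

With ten inputs and a batch size of 4, this produces batches of 4, 4, and 2 items.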

### English to Indic Example


In [3]:
#ai4bharat/indictrans2-en-indic-1B

en_indic_ckpt_dir = "ai4bharat/indictrans2-en-indic-1B"
en_indic_tokenizer, en_indic_model = initialize_model_and_tokenizer(en_indic_ckpt_dir, quantization)

ip = IndicProcessor(inference=True)

en_sents = [
   """
3.	Possession:
The SELLER hereby delivers physical and lawful possession of the Said Property to the BUYER on the execution of this Deed.
4.	Title and Warranty:
The SELLER assures that the Said Property is free from all encumbrances, legal disputes, claims, or mortgages. The SELLER further agrees to indemnify the BUYER against any loss arising out of any defect in the title of the Said Property.


""",
]

src_lang, tgt_lang = "eng_Latn", "mar_Deva"
# The target language is Marathi (mar_Deva), so the result is named accordingly
mr_translations = batch_translate(en_sents, src_lang, tgt_lang, en_indic_model, en_indic_tokenizer, ip)

print(f"\n{src_lang} - {tgt_lang}")
for input_sentence, translation in zip(en_sents, mr_translations):
    print(f"{src_lang}: {input_sentence}")
    print(f"{tgt_lang}: {translation}")

# flush the models to free the GPU memory
del en_indic_tokenizer, en_indic_model


eng_Latn - mar_Deva
eng_Latn: 
3.	Possession:
The SELLER hereby delivers physical and lawful possession of the Said Property to the BUYER on the execution of this Deed.
4.	Title and Warranty:
The SELLER assures that the Said Property is free from all encumbrances, legal disputes, claims, or mortgages. The SELLER further agrees to indemnify the BUYER against any loss arising out of any defect in the title of the Said Property.



mar_Deva: 3. ताबाः विक्रेता याद्वारे या कराराच्या अंमलबजावणीवर खरेदीदाराला उक्त मालमत्तेचा प्रत्यक्ष आणि कायदेशीर ताबा देतो. 4. मालकी हक्क आणि हमीः विक्रेता खात्री देतो की उक्त मालमत्ता सर्व अडचणी, कायदेशीर वाद, दावे किंवा गहाणांपासून मुक्त आहे. विक्रेता पुढे खरेदीदाराला उक्त मालमत्तेच्या मालकी हक्कातील कोणत्याही दोषामुळे उद्भवणाऱ्या कोणत्याही नुकसानाची भरपाई करण्यास सहमत होतो.
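
The cell above ends by deleting the model and tokenizer names. Deleting the names drops the references, but PyTorch's caching allocator may still hold on to GPU memory afterwards. A fuller cleanup could look like the sketch below, where `release_memory` is a hypothetical helper and torch is treated as optional so that the snippet also runs on machines without it.

```python
import gc


def release_memory():
    """Run garbage collection and, if CUDA is available, clear PyTorch's cache.

    Returns True when the CUDA cache was cleared, False otherwise.
    """
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


print("CUDA cache cleared:", release_memory())
```

Calling it right after the `del` statement, before loading the next checkpoint, is the intended usage.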


In [4]:
import sacrebleu

# Your model-generated translations
translations = [
    "  3. ताबाः विक्रेता याद्वारे या कराराच्या अंमलबजावणीवर खरेदीदाराला उक्त मालमत्तेचा प्रत्यक्ष आणि कायदेशीर ताबा देतो. 4. मालकी हक्क आणि हमीः विक्रेता खात्री देतो की उक्त मालमत्ता सर्व अडचणी, कायदेशीर वाद, दावे किंवा गहाणांपासून मुक्त आहे. विक्रेता पुढे खरेदीदाराला उक्त मालमत्तेच्या मालकी हक्कातील कोणत्याही दोषामुळे उद्भवणाऱ्या कोणत्याही नुकसानाची भरपाई करण्यास सहमत होतो."
]

# Reference translations (ground truth)
references = [
     """3.	ताबा:
विक्रेता याद्वारे या कराराच्या अंमलबजावणीनंतर खरेदीदाराला उक्त मालमत्तेचा भौतिक आणि कायदेशीर ताबा देतो.
4.	मालकी हक्क आणि हमी:
विक्रेता खात्री देतो की सदर मालमत्ता सर्व भार, कायदेशीर वाद, दावे किंवा गहाणखतांपासून मुक्त आहे. विक्रेता सदर मालमत्तेच्या मालकी हक्कातील कोणत्याही दोषामुळे उद्भवणाऱ्या कोणत्याही नुकसानाविरुद्ध खरेदीदाराला भरपाई देण्यास सहमत आहे.


"""
]

# BLEU score (sacrebleu.corpus_bleu computes corpus-level BLEU)
bleu = sacrebleu.corpus_bleu(translations, [references])
print("BLEU score:", bleu.score)

# Same corpus-level score, rounded to two decimals
print(f"Corpus BLEU score = {bleu.score:.2f}")

# ChrF score
chrf = sacrebleu.corpus_chrf(translations, [references])
print("ChrF score:", chrf.score)


BLEU score: 48.81257925109376
Corpus BLEU score = 48.81
ChrF score: 80.77642186335538
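
ChrF scores character n-gram overlap rather than word overlap, which is why it can stay high when the wording differs slightly. To illustrate the idea, the following is a simplified character n-gram F-score. It is not sacrebleu's implementation, so its numbers will differ from those printed above; `char_ngrams` and `simple_chrf` are hypothetical names, and the defaults (n-grams up to order 6, beta of 2) are assumptions chosen to resemble the usual chrF setup.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i : i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_order=6, beta=2.0):
    """Simplified chrF: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    return 100 * (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


print(simple_chrf("मालमत्ता मुक्त आहे", "मालमत्ता मुक्त आहे"))
print(simple_chrf("abcd", "abce"))
```

Identical strings score 100 and strings with no shared characters score 0; everything else falls in between.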


In [5]:
#ai4bharat/indictrans2-en-indic-dist-200M

en_indic_ckpt_dir = "ai4bharat/indictrans2-en-indic-dist-200M"
en_indic_tokenizer, en_indic_model = initialize_model_and_tokenizer(en_indic_ckpt_dir, quantization)

ip = IndicProcessor(inference=True)

en_sents = [
   """
3.	Possession:
The SELLER hereby delivers physical and lawful possession of the Said Property to the BUYER on the execution of this Deed.
4.	Title and Warranty:
The SELLER assures that the Said Property is free from all encumbrances, legal disputes, claims, or mortgages. The SELLER further agrees to indemnify the BUYER against any loss arising out of any defect in the title of the Said Property.


""",
]

src_lang, tgt_lang = "eng_Latn", "mar_Deva"
# The target language is Marathi (mar_Deva), so the result is named accordingly
mr_translations = batch_translate(en_sents, src_lang, tgt_lang, en_indic_model, en_indic_tokenizer, ip)

print(f"\n{src_lang} - {tgt_lang}")
for input_sentence, translation in zip(en_sents, mr_translations):
    print(f"{src_lang}: {input_sentence}")
    print(f"{tgt_lang}: {translation}")

# flush the models to free the GPU memory
del en_indic_tokenizer, en_indic_model


eng_Latn - mar_Deva
eng_Latn: 
3.	Possession:
The SELLER hereby delivers physical and lawful possession of the Said Property to the BUYER on the execution of this Deed.
4.	Title and Warranty:
The SELLER assures that the Said Property is free from all encumbrances, legal disputes, claims, or mortgages. The SELLER further agrees to indemnify the BUYER against any loss arising out of any defect in the title of the Said Property.



mar_Deva: 3. मालकीः विक्रेता याद्वारे या कराराच्या अंमलबजावणीनंतर उक्त मालमत्तेचा प्रत्यक्ष आणि कायदेशीर ताबा खरेदीदाराला देतो. 4. शीर्षक आणि हमीः विक्रेता आश्वासन देतो की उक्त मालमत्ता सर्व अडचणी, कायदेशीर विवाद, दावे किंवा गहाणांपासून मुक्त आहे. विक्रेता पुढे खरेदीदाराला उक्त मालमत्तेच्या मालकीतील कोणत्याही दोषामुळे होणाऱ्या कोणत्याही नुकसानासाठी नुकसानभरपाई देण्यास सहमत होतो.
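
Each example passes the whole clause block as a single list element, so `batch_translate` handles it as one input and the line structure is lost in the output. A minimal sketch of an alternative, assuming line-level segments are acceptable translation units (an assumption, not something this notebook does), splits the block into non-empty lines first. `split_into_segments` is a hypothetical helper.

```python
def split_into_segments(text):
    """Split a multi-line block into non-empty lines with normalized whitespace."""
    segments = []
    for line in text.splitlines():
        cleaned = " ".join(line.split())  # collapses tabs and repeated spaces
        if cleaned:
            segments.append(cleaned)
    return segments


block = """
3.\tPossession:
The SELLER hereby delivers physical and lawful possession of the Said Property to the BUYER on the execution of this Deed.
4.\tTitle and Warranty:
The SELLER assures that the Said Property is free from all encumbrances, legal disputes, claims, or mortgages.
"""

for segment in split_into_segments(block):
    print(repr(segment))
```

The resulting list could be passed to `batch_translate` in place of `en_sents`, giving one translation per line.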


In [6]:
import sacrebleu

# Your model-generated translations
translations = [
    "   3. मालकीः विक्रेता याद्वारे या कराराच्या अंमलबजावणीनंतर उक्त मालमत्तेचा प्रत्यक्ष आणि कायदेशीर ताबा खरेदीदाराला देतो. 4. शीर्षक आणि हमीः विक्रेता आश्वासन देतो की उक्त मालमत्ता सर्व अडचणी, कायदेशीर विवाद, दावे किंवा गहाणांपासून मुक्त आहे. विक्रेता पुढे खरेदीदाराला उक्त मालमत्तेच्या मालकीतील कोणत्याही दोषामुळे होणाऱ्या कोणत्याही नुकसानासाठी नुकसानभरपाई देण्यास सहमत होतो."
]

# Reference translations (ground truth)
references = [
     """3.	ताबा:
विक्रेता याद्वारे या कराराच्या अंमलबजावणीनंतर खरेदीदाराला उक्त मालमत्तेचा भौतिक आणि कायदेशीर ताबा देतो.
4.	मालकी हक्क आणि हमी:
विक्रेता खात्री देतो की सदर मालमत्ता सर्व भार, कायदेशीर वाद, दावे किंवा गहाणखतांपासून मुक्त आहे. विक्रेता सदर मालमत्तेच्या मालकी हक्कातील कोणत्याही दोषामुळे उद्भवणाऱ्या कोणत्याही नुकसानाविरुद्ध खरेदीदाराला भरपाई देण्यास सहमत आहे.


"""
]

# BLEU score (sacrebleu.corpus_bleu computes corpus-level BLEU)
bleu = sacrebleu.corpus_bleu(translations, [references])
print("BLEU score:", bleu.score)

# Same corpus-level score, rounded to two decimals
print(f"Corpus BLEU score = {bleu.score:.2f}")

# ChrF score
chrf = sacrebleu.corpus_chrf(translations, [references])
print("ChrF score:", chrf.score)


BLEU score: 23.20734352594993
Corpus BLEU score = 23.21
ChrF score: 72.60613198038888


In [7]:
# ai4bharat/indictrans2-indic-indic-1B
# Note: this is an Indic-to-Indic checkpoint, while the source text below is English (eng_Latn).

en_indic_ckpt_dir = "ai4bharat/indictrans2-indic-indic-1B"
en_indic_tokenizer, en_indic_model = initialize_model_and_tokenizer(en_indic_ckpt_dir, quantization)

ip = IndicProcessor(inference=True)

en_sents = [
   """
3.	Possession:
The SELLER hereby delivers physical and lawful possession of the Said Property to the BUYER on the execution of this Deed.
4.	Title and Warranty:
The SELLER assures that the Said Property is free from all encumbrances, legal disputes, claims, or mortgages. The SELLER further agrees to indemnify the BUYER against any loss arising out of any defect in the title of the Said Property.


""",
]

src_lang, tgt_lang = "eng_Latn", "mar_Deva"
# The target language is Marathi (mar_Deva), so the result is named accordingly
mr_translations = batch_translate(en_sents, src_lang, tgt_lang, en_indic_model, en_indic_tokenizer, ip)

print(f"\n{src_lang} - {tgt_lang}")
for input_sentence, translation in zip(en_sents, mr_translations):
    print(f"{src_lang}: {input_sentence}")
    print(f"{tgt_lang}: {translation}")

# flush the models to free the GPU memory
del en_indic_tokenizer, en_indic_model


eng_Latn - mar_Deva
eng_Latn: 
3.	Possession:
The SELLER hereby delivers physical and lawful possession of the Said Property to the BUYER on the execution of this Deed.
4.	Title and Warranty:
The SELLER assures that the Said Property is free from all encumbrances, legal disputes, claims, or mortgages. The SELLER further agrees to indemnify the BUYER against any loss arising out of any defect in the title of the Said Property.



mar_Deva: 3. पदाधिकाराः या कराराच्या अंमलबजावणीवर खरेदीदाराला विक्रीदाराच्या मालमत्तेची भौतिक आणि कायदेशीर मालमत्ता विक्रेता तेथे देतो. शीर्षक आणि हमीः विक्रीदाराची खात्री आहे की खरेदीदाराची मालमत्ता सर्व जप्ती, कायदेशीर विवाद, दावे किंवा गहाणवटींपासून मुक्त आहे.


In [8]:
import sacrebleu

# Your model-generated translations
translations = [
    "    3. पदाधिकाराः या कराराच्या अंमलबजावणीवर खरेदीदाराला विक्रीदाराच्या मालमत्तेची भौतिक आणि कायदेशीर मालमत्ता विक्रेता तेथे देतो. शीर्षक आणि हमीः विक्रीदाराची खात्री आहे की खरेदीदाराची मालमत्ता सर्व जप्ती, कायदेशीर विवाद, दावे किंवा गहाणवटींपासून मुक्त आहे."
]

# Reference translations (ground truth)
references = [
     """3.	ताबा:
विक्रेता याद्वारे या कराराच्या अंमलबजावणीनंतर खरेदीदाराला उक्त मालमत्तेचा भौतिक आणि कायदेशीर ताबा देतो.
4.	मालकी हक्क आणि हमी:
विक्रेता खात्री देतो की सदर मालमत्ता सर्व भार, कायदेशीर वाद, दावे किंवा गहाणखतांपासून मुक्त आहे. विक्रेता सदर मालमत्तेच्या मालकी हक्कातील कोणत्याही दोषामुळे उद्भवणाऱ्या कोणत्याही नुकसानाविरुद्ध खरेदीदाराला भरपाई देण्यास सहमत आहे.


"""
]

# BLEU score (sacrebleu.corpus_bleu computes corpus-level BLEU)
bleu = sacrebleu.corpus_bleu(translations, [references])
print("BLEU score:", bleu.score)

# Same corpus-level score, rounded to two decimals
print(f"Corpus BLEU score = {bleu.score:.2f}")

# ChrF score
chrf = sacrebleu.corpus_chrf(translations, [references])
print("ChrF score:", chrf.score)


BLEU score: 7.108241534746647
Corpus BLEU score = 7.11
ChrF score: 46.895347129318566


In [9]:
# ai4bharat/indictrans2-indic-indic-dist-320M
# Note: this is an Indic-to-Indic checkpoint, while the source text below is English (eng_Latn).

en_indic_ckpt_dir = "ai4bharat/indictrans2-indic-indic-dist-320M"
en_indic_tokenizer, en_indic_model = initialize_model_and_tokenizer(en_indic_ckpt_dir, quantization)

ip = IndicProcessor(inference=True)

en_sents = [
   """
3.	Possession:
The SELLER hereby delivers physical and lawful possession of the Said Property to the BUYER on the execution of this Deed.
4.	Title and Warranty:
The SELLER assures that the Said Property is free from all encumbrances, legal disputes, claims, or mortgages. The SELLER further agrees to indemnify the BUYER against any loss arising out of any defect in the title of the Said Property.


""",
]

src_lang, tgt_lang = "eng_Latn", "mar_Deva"
# The target language is Marathi (mar_Deva), so the result is named accordingly
mr_translations = batch_translate(en_sents, src_lang, tgt_lang, en_indic_model, en_indic_tokenizer, ip)

print(f"\n{src_lang} - {tgt_lang}")
for input_sentence, translation in zip(en_sents, mr_translations):
    print(f"{src_lang}: {input_sentence}")
    print(f"{tgt_lang}: {translation}")

# flush the models to free the GPU memory
del en_indic_tokenizer, en_indic_model


eng_Latn - mar_Deva
eng_Latn: 
3.	Possession:
The SELLER hereby delivers physical and lawful possession of the Said Property to the BUYER on the execution of this Deed.
4.	Title and Warranty:
The SELLER assures that the Said Property is free from all encumbrances, legal disputes, claims, or mortgages. The SELLER further agrees to indemnify the BUYER against any loss arising out of any defect in the title of the Said Property.



mar_Deva: 3. Pssssion: द सेलर हे बायरच्या निष्पादनावर बायरच्या बाजूच्या मालमत्तेचे भौतिक आणि कायदेशीर मालकीचे वितरण करते. 4. शीर्षक आणि वॉरंटीः द सेलर हमी देते की बायरची मालमत्ता सर्व करार, कायदेशीर वाद, दावे किंवा गहाण ठेवण्यापासून मुक्त आहे. द सेलर पुढील करार करते. बायरच्या विरोधात कोणत्याही नुकसानीच्या जोखमीच्या बाहेर कोणत्याही सदोष मालमत्तेच्या शीर्षकासह.


In [10]:
import sacrebleu

# Your model-generated translations
translations = [
    "    3. Pssssion: द सेलर हे बायरच्या निष्पादनावर बायरच्या बाजूच्या मालमत्तेचे भौतिक आणि कायदेशीर मालकीचे वितरण करते. 4. शीर्षक आणि वॉरंटीः द सेलर हमी देते की बायरची मालमत्ता सर्व करार, कायदेशीर वाद, दावे किंवा गहाण ठेवण्यापासून मुक्त आहे. द सेलर पुढील करार करते. बायरच्या विरोधात कोणत्याही नुकसानीच्या जोखमीच्या बाहेर कोणत्याही सदोष मालमत्तेच्या शीर्षकासह."
]

# Reference translations (ground truth)
references = [
     """3.	ताबा:
विक्रेता याद्वारे या कराराच्या अंमलबजावणीनंतर खरेदीदाराला उक्त मालमत्तेचा भौतिक आणि कायदेशीर ताबा देतो.
4.	मालकी हक्क आणि हमी:
विक्रेता खात्री देतो की सदर मालमत्ता सर्व भार, कायदेशीर वाद, दावे किंवा गहाणखतांपासून मुक्त आहे. विक्रेता सदर मालमत्तेच्या मालकी हक्कातील कोणत्याही दोषामुळे उद्भवणाऱ्या कोणत्याही नुकसानाविरुद्ध खरेदीदाराला भरपाई देण्यास सहमत आहे.


"""
]

# BLEU score (sacrebleu.corpus_bleu computes corpus-level BLEU)
bleu = sacrebleu.corpus_bleu(translations, [references])
print("BLEU score:", bleu.score)

# Same corpus-level score, rounded to two decimals
print(f"Corpus BLEU score = {bleu.score:.2f}")

# ChrF score
chrf = sacrebleu.corpus_chrf(translations, [references])
print("ChrF score:", chrf.score)


BLEU score: 15.575476078956111
Corpus BLEU score = 15.58
ChrF score: 43.73000858819199


| Model                              | BLEU Score | Corpus BLEU Score | ChrF Score |
|------------------------------------|------------|-------------------|------------|
| indictrans2-en-indic-1B            | 48.81      | 48.81             | 80.78      |
| indictrans2-en-indic-dist-200M     | 23.21      | 23.21             | 72.61      |
| indictrans2-indic-indic-1B         | 7.11       | 7.11              | 46.90      |
| indictrans2-indic-indic-dist-320M  | 15.58      | 15.58             | 43.73      |
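
A summary table like the one above can be generated from the raw scores so that every value is rounded the same way. The sketch below uses the scores printed by the evaluation cells; the `scores` dictionary layout and the `format_table` helper are hypothetical, and the duplicate BLEU column is left out because both columns come from the same `corpus_bleu` call.

```python
# (BLEU, ChrF) pairs copied from the evaluation cell outputs above.
scores = {
    "indictrans2-en-indic-1B": (48.81257925109376, 80.77642186335538),
    "indictrans2-en-indic-dist-200M": (23.20734352594993, 72.60613198038888),
    "indictrans2-indic-indic-1B": (7.108241534746647, 46.895347129318566),
    "indictrans2-indic-indic-dist-320M": (15.575476078956111, 43.73000858819199),
}


def format_table(results):
    """Render a Markdown table with scores rounded to two decimals."""
    lines = ["| Model | BLEU | ChrF |", "|---|---|---|"]
    for model, (bleu, chrf) in results.items():
        lines.append(f"| {model} | {bleu:.2f} | {chrf:.2f} |")
    return "\n".join(lines)


print(format_table(scores))
```

Formatting with `:.2f` rounds rather than truncates, so 23.207 is reported as 23.21.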


