# Testing the NMT model

Congratulations! You have successfully trained your first NMT model using NVIDIA NeMo. To generate outputs for your input sentences, use the trained `.nemo` model for inference.

In [None]:
# Change the path to where the .nemo model is saved
path_to_nemo_model = "../model/AAYNBase.nemo"

## Inference on a sample file

We have a sample `test.txt` that contains a few English sentences that we can try to translate with our trained model.

In [None]:
!cat test.txt

To generate predictions, we'll use the `nmt_transformer_infer.py` script and specify the path to the `.nemo` model and the sample file.

In [None]:
!python nmt_transformer_infer.py --model=$path_to_nemo_model --srctext=test.txt --tgtout=test.output --beam_size=5

Let's look at the translated output.

In [None]:
!cat test.output

## Downloading test dataset

Manually download the dataset from [Samanantar's website](https://indicnlp.ai4bharat.org/samanantar/#mirror-links).
The mirrored link from [Google Drive](https://drive.google.com/drive/folders/1hR-8Mc7qQWsZAC-cw-nUqG8_OCqCdq-b?usp=sharing) is the most stable.

Download the `benchmarks.zip` file to a local directory and extract its contents. The folders of interest are `benchmarks/wat2020-devtest/en-hi`, `benchmarks/wat2021-devtest`, and `benchmarks/wmt-news/en-hi`. These folders contain the source sentences (`test.en`) along with their ground truth translations (`test.hi`). The ground truth translations serve as references for the sacreBLEU metric, which is used to rate the quality of the translation.

In [None]:
# Change the path to where the benchmarks folder is downloaded and unzipped
path_to_benchmarks_folder = 'benchmarks'
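
Before running inference, it can help to confirm that the extracted folder has the layout the later cells expect. The following is a minimal sketch, not part of the NeMo tooling: the helper name `missing_benchmark_files` is hypothetical, and the file list is assumed from the paths used in the inference and scoring commands in this notebook.

```python
from pathlib import Path

# Relative paths assumed from the commands used later in this notebook.
EXPECTED_FILES = [
    "wat2020-devtest/en-hi/test.en",
    "wat2020-devtest/en-hi/test.hi",
    "wat2021-devtest/test.en",
    "wat2021-devtest/test.hi",
    "wmt-news/en-hi/test.en",
    "wmt-news/en-hi/test.hi",
]


def missing_benchmark_files(root):
    """Return the expected files that are not present under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_benchmark_files(path_to_benchmarks_folder)` should return an empty list when the path is set correctly; any entries in the result point to files that were not found.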

## Calculating sacreBLEU on test datasets

We can use the same `nmt_transformer_infer.py` script to generate predictions for the three test datasets: WAT2020, WAT2021, and WMT.

Make sure that the path to the benchmarks folder is set correctly!

Here, we use the Bilingual Evaluation Understudy (BLEU) score. BLEU counts the n-grams that the candidate translation shares with the reference text: a unigram (1-gram) is a single token, and a bigram (2-gram) is a pair of adjacent tokens.
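
To make the n-gram matching concrete, here is a small illustrative sketch of the clipped n-gram precision that BLEU builds on. It assumes plain whitespace tokenization and a single reference, and the function names are hypothetical. It is not the sacreBLEU implementation: the full BLEU score also combines the precisions for several n-gram orders and applies a brevity penalty, and sacreBLEU applies its own tokenization.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (this is the "clipping").
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


candidate = "the cat sat on the mat"
reference = "the cat is on the mat"
print(clipped_precision(candidate, reference, 1))  # unigram precision: 5 of 6 match
print(clipped_precision(candidate, reference, 2))  # bigram precision: 3 of 5 match
```

In this toy pair, every candidate token except "sat" appears in the reference, while only three of the five candidate bigrams do, which is why higher-order n-grams are a stricter test of word order.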

In [None]:
# Generating predictions for WAT2020 test set
!python nmt_transformer_infer.py --model=$path_to_nemo_model --srctext=$path_to_benchmarks_folder/wat2020-devtest/en-hi/test.en --tgtout=wat2020_test_out.pre

In [None]:
# Generating predictions for WAT2021 test set
!python nmt_transformer_infer.py --model=$path_to_nemo_model --srctext=$path_to_benchmarks_folder/wat2021-devtest/test.en --tgtout=wat2021_test_out.pre

In [None]:
# Generating predictions for WMT test set
!python nmt_transformer_infer.py --model=$path_to_nemo_model --srctext=$path_to_benchmarks_folder/wmt-news/en-hi/test.en --tgtout=wmt-news_test_out.pre

In [None]:
# Installing sacrebleu library
!pip install sacrebleu

In [None]:
# Calculating sacreBLEU for WAT2020
!sacrebleu $path_to_benchmarks_folder/wat2020-devtest/en-hi/test.hi -i wat2020_test_out.pre --score-only

In [None]:
# Calculating sacreBLEU for WAT2021
!sacrebleu $path_to_benchmarks_folder/wat2021-devtest/test.hi -i wat2021_test_out.pre --score-only

In [None]:
# Calculating sacreBLEU for WMT-News
!sacrebleu $path_to_benchmarks_folder/wmt-news/en-hi/test.hi -i wmt-news_test_out.pre --score-only
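
The inference and scoring cells above repeat the same two commands for each test set. As a hedged sketch, the commands could instead be built in a loop. The `TEST_SETS` mapping and the `build_commands` helper are hypothetical names introduced for illustration; the flags and file names are copied from the cells above.

```python
# Test set name -> subfolder inside the benchmarks folder (taken from the cells above).
TEST_SETS = {
    "wat2020": "wat2020-devtest/en-hi",
    "wat2021": "wat2021-devtest",
    "wmt-news": "wmt-news/en-hi",
}


def build_commands(model_path, benchmarks_dir):
    """Return (name, inference_command, scoring_command) for each test set."""
    commands = []
    for name, subdir in TEST_SETS.items():
        src = f"{benchmarks_dir}/{subdir}/test.en"  # English source sentences
        ref = f"{benchmarks_dir}/{subdir}/test.hi"  # Hindi reference translations
        out = f"{name}_test_out.pre"                # model predictions
        infer = [
            "python", "nmt_transformer_infer.py",
            f"--model={model_path}", f"--srctext={src}", f"--tgtout={out}",
        ]
        score = ["sacrebleu", ref, "-i", out, "--score-only"]
        commands.append((name, infer, score))
    return commands


for name, infer, score in build_commands("../model/AAYNBase.nemo", "benchmarks"):
    print(name)
    print("  " + " ".join(infer))
    print("  " + " ".join(score))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` to execute it, running the inference command before the scoring command for each test set.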