# Evaluating translations with BLEU - Review

In this notebook, we will use the BLEU metric to compare the quality of two different translation approaches.

I will translate a few lines from the beginning of this chapter from English to Spanish. My translations will be taken as the reference translations. In other words, they will be used as the basis upon which the quality of the automatic translations will be determined.



In [None]:
#Sentences to Translate.
sentences = [
    "In the previous chapters, you've mainly seen how to work with OpenAI models, and you've had a very practical introduction to Hugging Face's open-source models, the use of embeddings, vector databases, and agents.",
    "These have been very practical chapters in which I've tried to gradually introduce concepts that have allowed you, or at least I hope so, to scale up your knowledge and start creating projects using the current technology stack of large language models."
    ]

In [None]:
#Spanish Translation References.
reference_translations = [
    ["En los capítulos anteriores has visto mayoritariamente como trabajar con los modelos de OpenAI, y has tenido una introducción muy práctica a los modelos Open Source de Hugging Face, al uso de embeddings, las bases de datos vectoriales, los agentes."],
    ["Han sido capítulos muy prácticos en los que he intentado ir introduciendo conceptos que te han permitido, o eso espero, ir escalando en tus conocimientos y empezar a crear proyectos usando el stack tecnológico actual de los grandes modelos de lenguaje."]
    ]

We will perform the first translation using the NLLB model, a small model specialized in performing translations, which we will retrieve from Hugging Face.

In [None]:
import transformers
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
model_id = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

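When creating the pipeline, we pass it the source and target languages of the translation. NLLB identifies each language by a code that combines the language and its script: `eng_Latn` is English in Latin script and `spa_Latn` is Spanish in Latin script.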

In [None]:
translator = pipeline('translation', model=model, tokenizer=tokenizer,
                        src_lang="eng_Latn", tgt_lang="spa_Latn")

In [None]:
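import os
# Let PyTorch fall back to the CPU for operations not supported on Apple's MPS backend.
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'

translations_nllb = []

for text in sentences:
  print("to translate: " + text)
  translation = translator(text)

  # The pipeline returns a list with one dict per input; keep only the translated text.
  translations_nllb.append(translation[0]['translation_text'])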

Now we have the translations stored in the list 'translations_nllb'.

In [None]:
translations_nllb

## Create translations with Google Translator

As a second source of translations, we will use Google Translate through the `googletrans` library, an unofficial Python client.

In [None]:
!pip install -q googletrans==3.1.0a0
from googletrans import Translator

In [None]:
translator_google = Translator()

In [None]:
translations_google = []

for text in sentences:
  print("to translate: " + text)
  translation = translator_google.translate(text, dest="es")

  # Add the translated text to the list of translations.
  translations_google.append(translation.text)
  print(translation.text)

In this list, we have the translations created by Google.

In [None]:
translations_google

## Evaluate translations with BLEU

We will use the BLEU implementation from the Evaluate library by Hugging Face.

In [None]:
#pip install -q evaluate==0.4.1
import evaluate
bleu = evaluate.load('bleu')

In [None]:
results_nllb = bleu.compute(predictions=translations_nllb, references=reference_translations)


To obtain the metrics, we pass the translated text and the reference text to the BLEU function.

Note that the translated text is a list of translations:
["Translation1", "Translation2"]

The reference texts, in contrast, are a list of lists of strings, one inner list per translation. This allows for providing multiple references per translation:

[["reference1 Translation1", "reference2 Translation1"],
["reference1 Translation2", "reference2 Translation2"]]
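To see why several references per translation can matter, here is a minimal sketch of the clipped n-gram precision that BLEU is built on. It is not the Evaluate implementation: the `ngrams` and `clipped_precision` helpers are hypothetical names, whitespace tokenization is a simplification, the brevity penalty is left out, and the Spanish sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) found in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(prediction, references, n=1):
    """Clipped n-gram precision of one prediction against several references.

    Whitespace tokenization is an assumption made to keep the sketch short.
    """
    pred_counts = ngrams(prediction.split(), n)

    # For each n-gram, keep the highest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    # A predicted n-gram is credited at most as often as it occurs in a reference.
    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in pred_counts.items())
    total = sum(pred_counts.values())
    return matched / total if total else 0.0


# Made-up example: one prediction scored against one and then two references.
prediction = "el gato está en la mesa"
references = ["el gato está sobre la mesa", "hay un gato en la mesa"]

one_ref = clipped_precision(prediction, references[:1])
two_refs = clipped_precision(prediction, references)
print(one_ref, two_refs)
```

With a single reference the word "en" goes unmatched, while the second reference covers it, so the precision rises. This is the reason the `references` argument takes one list per translation rather than a single string.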


In [None]:
results_google = bleu.compute(predictions=translations_google, references=reference_translations)

In [None]:
print(results_nllb)

In [None]:
print(results_google)
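Each result is a dictionary in which the overall score is stored under the `bleu` key, next to details such as `precisions` and `brevity_penalty`. A minimal sketch for ranking the two systems by that score could look like this; `rank_by_bleu` is a hypothetical helper, not part of the Evaluate library.

```python
def rank_by_bleu(results_by_system):
    """Return (system, score) pairs sorted from highest to lowest BLEU.

    `results_by_system` maps a system name to the dictionary returned by
    `bleu.compute`; only its "bleu" entry is read.
    """
    scores = {name: result["bleu"] for name, result in results_by_system.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this notebook it could be called as `rank_by_bleu({"nllb": results_nllb, "google": results_google})`. With only two sentences and a single reference each, small differences between the scores should be read with caution.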