# 🔤 BLEU Score Demonstration

**BLEU** (Bilingual Evaluation Understudy) is a metric for evaluating the quality of machine-translated text compared to one or more reference translations.

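BLEU is based on:
- modified (clipped) n-gram precision, which measures how many n-grams in the hypothesis also appear in a reference
- a brevity penalty for hypotheses shorter than the reference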
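
For orientation, the two components are usually combined as in the standard formulation below. The symbols are not from this notebook and are introduced here as assumptions: $p_n$ is the clipped n-gram precision for order $n$, $w_n$ is a weight (commonly $1/N$ with $N = 4$), $c$ is the hypothesis length, and $r$ is the reference length.

```latex
\[
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
\]
```

With uniform weights, the exponential term is the geometric mean of the n-gram precisions, so a single precision of zero drives the whole score to zero unless smoothing is applied.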

This notebook shows how to calculate BLEU using Hugging Face's `evaluate` library.

In [None]:
# Install if necessary
# !pip install evaluate

In [None]:
import evaluate
bleu = evaluate.load("bleu")

In [None]:
# Example: prediction vs reference translation
predictions = [
    "The cat is on the mat."
]

references = [
    ["There is a cat sitting on the mat."]  # Note: one inner list of reference strings per prediction
]

In [None]:
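# Compute BLEU; the result is a dict with the overall score under "bleu",
# plus the per-order precisions, the brevity penalty, and length statistics
results = bleu.compute(predictions=predictions, references=references)
results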

### 🧠 Notes:
- BLEU uses **n-gram precision** (up to 4-grams by default).
- It penalizes overly short translations via a **brevity penalty**.
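- Values range from 0 to 1. A prediction identical to a reference scores 1; without smoothing, the score is 0 whenever any n-gram order has no match at all. Some tools report the same value scaled to 0-100.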
- Often used for evaluating **translation** and sometimes **summarization** (less ideal for abstractive cases).
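
To make the notes above concrete, here is a minimal from-scratch sketch of sentence-level BLEU using only the standard library. It is not the `evaluate` implementation: the function names `ngrams` and `simple_bleu` are hypothetical, tokenization is plain whitespace splitting, no smoothing is applied, and the shortest reference is assumed as the reference length. Scores may therefore differ from those returned by `bleu.compute`.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(prediction, references, max_order=4):
    """Illustrative sentence-level BLEU (whitespace tokens, no smoothing)."""
    pred = prediction.split()
    refs = [ref.split() for ref in references]

    precisions = []
    for n in range(1, max_order + 1):
        pred_counts = ngrams(pred, n)
        # For each n-gram, keep its highest count in any single reference.
        max_ref_counts = Counter()
        for ref in refs:
            max_ref_counts |= ngrams(ref, n)
        # Clipping: a predicted n-gram is credited at most as often as
        # it occurs in a reference.
        clipped = sum((pred_counts & max_ref_counts).values())
        total = sum(pred_counts.values())
        precisions.append(clipped / total if total else 0.0)

    # Brevity penalty; assumption: the shortest reference sets the length.
    c = len(pred)
    r = min(len(ref) for ref in refs)
    if c == 0:
        brevity_penalty = 0.0
    elif c > r:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - r / c)

    # Geometric mean of the precisions; zero if any order has no match.
    if min(precisions) > 0:
        geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    else:
        geo_mean = 0.0

    return {
        "bleu": brevity_penalty * geo_mean,
        "precisions": precisions,
        "brevity_penalty": brevity_penalty,
    }


# Lowercased, punctuation-free variants of the sentences used in this notebook.
hypothesis = "the cat is on the mat"
reference = "there is a cat sitting on the mat"

full = simple_bleu(hypothesis, [reference])               # up to 4-grams
short = simple_bleu(hypothesis, [reference], max_order=2)  # up to bigrams
print(full["precisions"], full["bleu"])
print(short["precisions"], short["bleu"])
```

Under this whitespace tokenization the hypothesis shares no 4-gram with the reference, so the unsmoothed 4-gram score collapses to zero, while restricting the order to bigrams gives a nonzero score. This is one reason short sentences are often scored with smoothing or at corpus level.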