## **Assignment No 1**

**Group members :**

1) Akshy Kumar (CS23MTECH11022)

2) Arnab Ghosh (CS23MTECH11025)

3) Sanket Rathod (CS23MTECH11033)

## Question No 1
Implement BLEU Score metric. Pre-process the text by lower-casing the text and removing
punctuation.

In [61]:
# Importing Libraries
import string
from collections import Counter
import numpy as np

In [62]:
# Function for preprocessing a sentence.
def preprocess(sentence):
    # Converting text to lower case.
    sentence = sentence.lower()
    # Removing punctuation from the text.
    sentence = sentence.translate(str.maketrans("", "", string.punctuation))
    # Splitting the text on whitespace and returning the list of tokens.
    return sentence.split()

In [63]:
# Counting n-grams, given the list of tokens and the value of n.
def count_ngrams(tokens, n):
    ngrams = []
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        ngrams.append(ngram)
    # Returning the frequency of each n-gram in the sentence using Counter.
    return Counter(ngrams)

In [64]:
# Calculating the modified (clipped) n-gram precision.
def clipped_precision(reference_counts, candidate_counts):
    clipped_counts = {}
    for ngram, count in candidate_counts.items():
        if ngram in reference_counts:
            clipped_counts[ngram] = min(count, reference_counts[ngram])
    # If no match is found (or the candidate is empty), return precision as 0.
    if not clipped_counts or not candidate_counts:
        return 0
    # Returning the modified n-gram precision.
    return sum(clipped_counts.values()) / sum(candidate_counts.values())

In [65]:
# Calculating BLEU score.
def bleu_score(reference, candidate, n=4):
    # Preprocessing reference text
    reference_tokens = preprocess(reference)
    # Preprocessing candidate text
    candidate_tokens = preprocess(candidate)
    # Calculating the length of reference tokens and candidate tokens
    # (not used while BP is fixed at 1).
    reference_length = len(reference_tokens)
    candidate_length = len(candidate_tokens)

    # Pooling the counts of all n-grams of orders 1..n into one dictionary
    # (tuples of different lengths never collide).
    reference_counts = {ngram: count for k in range(1, n + 1) for ngram, count in count_ngrams(reference_tokens, k).items()}
    candidate_counts = {ngram: count for k in range(1, n + 1) for ngram, count in count_ngrams(candidate_tokens, k).items()}
    # Precision calculation
    precision = clipped_precision(reference_counts, candidate_counts)
    # Keeping BP value as 1 as mentioned in the question.
    bp = 1

    # exp(log(p) / n) is the n-th root of p; the 1e-10 avoids log(0).
    BLEU = bp * np.exp(np.log(precision + 1e-10) / n)
    return BLEU

In [66]:
# Preprocessing text by lower-casing the text and removing punctuation
text = "The boys were playing happily on the ground."
preprocessed_text = preprocess(text)
print(preprocessed_text)

['the', 'boys', 'were', 'playing', 'happily', 'on', 'the', 'ground']


## Question no 2
Use this implementation to find BLEU Score when

x = "The boys were playing happily on the ground." and

y = "The boys were playing football on the field."

In [67]:
# Implementing BLEU score for these texts.
reference = "The boys were playing happily on the ground."
candidate = "The boys were playing football on the field."
# Passing reference and candidate
bleu_score_value = bleu_score(reference, candidate)
print("BLEU Score:", bleu_score_value)

BLEU Score: 0.8408964152957594
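
The score above can be traced by hand. The following is a minimal sketch, assuming the same pooled counting as `bleu_score` (all n-gram orders from 1 to 4 summed into one precision, BP = 1). The helper name `ngram_counts` is hypothetical and is used only for this illustration; the sentences are written already lower-cased and without punctuation.

```python
from collections import Counter

def ngram_counts(tokens, k):
    # Hypothetical helper: frequency of every k-gram in the token list.
    return Counter(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))

reference = "the boys were playing happily on the ground".split()
candidate = "the boys were playing football on the field".split()

matched_total = 0
candidate_total = 0
for k in range(1, 5):
    ref = ngram_counts(reference, k)
    cand = ngram_counts(candidate, k)
    # Clipped matches: each candidate k-gram is credited at most as often
    # as it occurs in the reference (a Counter returns 0 for missing keys).
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    print(f"{k}-grams: {matched} of {total} matched")
    matched_total += matched
    candidate_total += total

precision = matched_total / candidate_total   # 13 / 26 = 0.5
print(precision, precision ** (1 / 4))        # 0.5 and roughly 0.8409
```

The per-order counts are 6 of 8, 4 of 7, 2 of 6 and 1 of 5, so the pooled precision is 13/26 = 0.5 and its fourth root is about 0.8409. Note that standard BLEU instead takes the geometric mean of the four per-order precisions, so it would generally give a different value.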


## Question no 3

 Can you explain why we are taking minimum in numerator in equation 1?

The expression min(count-x(n-gram), count-y(n-gram)) clips the credit given to each n-gram at the smaller of its two counts, so an n-gram is counted as a match only as many times as it appears in both the machine-translated text and the reference text.

We take the minimum in the numerator of equation 1 for the following two reasons:


* If the machine-translated text contains more occurrences of an n-gram than the reference text, only as many occurrences as the reference contains are credited. This prevents a translation from artificially inflating its precision by repeating a matching n-gram.

* If the machine-translated text contains fewer occurrences of an n-gram than the reference text, the minimum is the translation's own count, so it is credited only for the occurrences it actually produced and never for reference occurrences it is missing.


By taking the minimum count, the modified n-gram precision measures how much of the machine-translated text is supported by the reference: repetitions beyond the reference count earn no credit, and n-grams absent from the reference earn none at all.
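
The effect of clipping is easiest to see on a degenerate candidate. The following is a small sketch with hypothetical example sentences (they are not part of the assignment) comparing unigram precision with and without the minimum:

```python
from collections import Counter

# Hypothetical texts chosen only to illustrate clipping.
reference = "the cat is on the mat".split()
candidate = "the the the the".split()

reference_counts = Counter(reference)
candidate_counts = Counter(candidate)

# Without clipping: every candidate word that occurs anywhere in the
# reference is credited, no matter how often it is repeated.
unclipped = sum(
    count for word, count in candidate_counts.items() if word in reference_counts
) / len(candidate)

# With clipping: each word is credited at most as often as it occurs
# in the reference ("the" occurs twice there).
clipped = sum(
    min(count, reference_counts[word]) for word, count in candidate_counts.items()
) / len(candidate)

print(unclipped)  # 1.0
print(clipped)    # 0.5
```

Without the minimum, the meaningless candidate would receive a perfect precision of 1.0; with it, only two of the four occurrences of "the" are credited.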

## Question no 4

 Use your implementation to find BLEU Score between any 5 sentence pairs and explain what
are potential disadvantages of using the BLEU Score.

In [69]:
# Example sentence pairs
sentence_pairs = [
    ("The cat is on the mat.", "The mat is under the cat."),
    ("He's a talented musician.", "He's a skilled musician."),
    ("The movie was fantastic.", "The film was superb."),
    ("The quick brown fox jumps over the lazy dog.", "The fox jumps."),
    ("She sells seashells by the seashore.", "She sells seashells at the seashore."),
]
# Printing BLEU score for 5 sentence pairs.
for reference, candidate in sentence_pairs:
    bleu_score_value = bleu_score(reference, candidate)
    print(f"Reference: {reference}")
    print(f"Candidate: {candidate}")
    print(f"BLEU Score: {bleu_score_value}\n")

Reference: The cat is on the mat.
Candidate: The mat is under the cat.
BLEU Score: 0.7896895368070302

Reference: He's a talented musician.
Candidate: He's a skilled musician.
BLEU Score: 0.7952707288167551

Reference: The movie was fantastic.
Candidate: The film was superb.
BLEU Score: 0.6687403050600146

Reference: The quick brown fox jumps over the lazy dog.
Candidate: The fox jumps.
BLEU Score: 0.9036020036437299

Reference: She sells seashells by the seashore.
Candidate: She sells seashells at the seashore.
BLEU Score: 0.8408964152957594



**Limited to N-gram Matching:** Consider the following translations:

**Reference:** "The cat is on the mat."

**Candidate:** "The mat is under the cat."

The candidate reverses the meaning of the reference, yet it reuses most of the same words and still scores about 0.79. BLEU counts matching n-grams only and cannot tell whether the meaning is preserved.

**Insensitive to Synonyms and Paraphrases:**

**Reference:** "He's a talented musician."

**Candidate:** "He's a skilled musician."

Both sentences convey the same meaning, but BLEU penalizes the candidate translation because the synonym "skilled" does not literally match "talented".

**Insensitive to Structural Differences:**

**Reference:** "The quick brown fox jumps over the lazy dog."

**Candidate:** "The fox jumps."

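The candidate drops most of the reference's content, yet with BP fixed at 1 this implementation gives it the highest score of the five pairs (about 0.90), because precision only checks that the candidate's n-grams occur in the reference. Standard BLEU counters this with a brevity penalty for candidates that are shorter than the reference.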
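
For comparison, here is a minimal sketch of the brevity penalty as defined in standard BLEU, which this assignment fixes at 1. The function name `brevity_penalty` is hypothetical; the lengths 3 and 9 are the token counts of the candidate and reference above.

```python
import math

def brevity_penalty(candidate_length, reference_length):
    # Standard BLEU definition: no penalty when the candidate is at least
    # as long as the reference, otherwise exp(1 - r / c).
    if candidate_length == 0:
        return 0.0
    if candidate_length >= reference_length:
        return 1.0
    return math.exp(1 - reference_length / candidate_length)

# "The fox jumps." has 3 tokens; the reference has 9.
bp = brevity_penalty(3, 9)
print(bp)  # exp(-2), about 0.135
```

Multiplying the score of about 0.90 reported above by this penalty would lower it to roughly 0.12, which better reflects how much of the reference is missing.
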

**Reference Dependency:**

**Reference 1:** "The car is red."

**Reference 2:** "The car is colored red."

Depending on which reference translation is chosen, the BLEU score may vary, leading to inconsistency in evaluation.
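
This can be checked with a short, self-contained sketch. It assumes the same pooled precision as `bleu_score` (orders 1 to 4, BP = 1); the names `pooled_counts` and `pooled_score` and the candidate sentence are hypothetical and chosen only for illustration.

```python
from collections import Counter

def pooled_counts(tokens, max_n=4):
    # Counts of all n-grams of orders 1..max_n, pooled into one Counter.
    return Counter(
        tuple(tokens[i:i + k])
        for k in range(1, max_n + 1)
        for i in range(len(tokens) - k + 1)
    )

def pooled_score(reference, candidate, max_n=4):
    # Assumes a non-empty candidate; the inputs here contain no punctuation.
    ref = pooled_counts(reference.lower().split(), max_n)
    cand = pooled_counts(candidate.lower().split(), max_n)
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    precision = matched / sum(cand.values())
    return precision ** (1 / max_n)

candidate = "The car is colored red"  # hypothetical candidate
score_1 = pooled_score("The car is red", candidate)
score_2 = pooled_score("The car is colored red", candidate)
print(score_1, score_2)  # roughly 0.84 and exactly 1.0
```

The same candidate is scored as a perfect translation against one reference and noticeably lower against the other, although both references are acceptable.
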
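
**Lack of Context Awareness:**

BLEU compares surface n-grams only, so it cannot tell whether a word choice fits the surrounding context or the intended sense of an ambiguous word.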


**Difficulty in Interpretation:**

Consider a reported BLEU Score of 0.75. Without additional information or analysis, it is difficult to determine which specific aspects of the translation (word choice, word order, or length) contributed to this score.