# Assignment 1 CS5803

This notebook contains the solutions to all questions of Assignment 1 of the CS5803 Natural Language Processing course.

# TEAM MEMBERS

## Group No 2
Abhinav Kumar Jha (cs23mtech15001@iith.ac.in)         
Shriram Pradeep (cs23mtech15020@iith.ac.in)

In [1]:
import math
import string
import collections

# Question 1.1

Implement BLEU Score metric. Pre-process the text by lower-casing the text and removing punctuation.

In [2]:
class BleuScore:
    """
    Class to implement bleu score
    """
    def __init__(self,max_order=4):
        """
        Initializes variables to default values
        """
        self.results_dict = {}
        self.max_order = max_order
        self.matches_by_order = [0] * self.max_order
        self.possible_matches_by_order = [0] * self.max_order
        self.precisions = [0] * self.max_order
        self.reference_text_length = 0
        self.translation_text_length = 0
    
    def preprocess_text(self,text):
        """
        Preprocesses the text by converting it to lower case and removing punctuation
        """
        # Convert the text to lowercase
        text = text.lower()
        # Remove punctuation
        text = text.translate(str.maketrans("", "", string.punctuation))
        return text
    
    
    def generate_ngrams(self,text_list, max_order=4):
        """Extracts all n-grams up to a given maximum order from an input segment.
    
        :param text_list: list of text tokens from which n-grams will be extracted.
        :param max_order: maximum length in tokens of the n-grams returned by this method.
        
        :returns: Counter of all n-grams up to max_order in the segment, mapped to how many times each n-gram occurred.
        """
        ngram_counts = collections.Counter()
        for order in range(1, max_order + 1):
            for i in range(0, len(text_list) - order + 1):
                # For ngram to be hashable.
                ngram = tuple(text_list[i : i + order])
                ngram_counts[ngram] += 1
        return ngram_counts

    def print_bleu_score(self,bleu_info):
        """
        Prints the BLEU score values in a more readable format
        :param bleu_info: Dictionary containing the BLEU score, geometric mean, precision values and brevity penalty
        """
        print("*"*50)
        print("\nPrecision:")
        for ngram, score in bleu_info['precision'].items():
            print("{:8s}: {:.4f}".format(ngram.capitalize(), score))
        print("Geometric Mean: {:.4f}".format(bleu_info['geometric_mean']))
        print("Brevity Penalty: {:.4f}".format(bleu_info['brevity_penalty']))
        print("\nBLEU Score: {:.4f}".format(bleu_info['bleu_score']))
        
        print("*"*50)
    
    def get_bleu_score(
        self,translation_text, reference_text_list, max_order=4, smooth=True
    ):
        """Computes BLEU score of translated segments against one or more references.
        
        :param translation_text: Translation (candidate) sentence to score, as a string.
        :param reference_text_list: List of reference sentences (strings) for the translation.
        :param max_order: Maximum n-gram order to use when computing BLEU score.
        :param smooth: Boolean value for whether or not to apply smoothing
        
        :returns: BLEU score, geometric mean, n-gram precisions, and brevity penalty.
        """
    
        translation_text_list = self.preprocess_text(translation_text).split()
        reference_text_list = [self.preprocess_text(reference_sentence).split() for reference_sentence in reference_text_list]

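        # Calculate r: length of the shortest reference, accumulated across calls
        self.reference_text_length += min(len(ref) for ref in reference_text_list)
        # Calculate c: translation length, accumulated the same way as r
        self.translation_text_length += len(translation_text_list)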
        
        # Find the maximum n-gram count across references.
        # The | operator on Counters keeps the maximum count per n-gram, as in the original paper:
        # for each n-gram, we take its maximum count among all references.
        reference_ngram_counts = collections.Counter()
        for reference_text in reference_text_list:
            reference_ngram_counts |= self.generate_ngrams(reference_text, max_order)
        translation_ngram_counts = self.generate_ngrams(translation_text_list, max_order)
    
        # Clip each translation n-gram count at its maximum count in the references.
        # Clipping prevents a meaningless translation made of many repeated words from being overestimated.
        overlap = translation_ngram_counts & reference_ngram_counts
        for ngram in overlap:
            self.matches_by_order[len(ngram) - 1] += overlap[ngram]
    
        # Count all n-grams of order 1 to max_order in the translation.
        # This total is the normalizer (denominator) of the modified n-gram precision.
        for order in range(1, max_order + 1):
            possible_matches = len(translation_text_list) - order + 1
            if possible_matches > 0:
                self.possible_matches_by_order[order - 1] += possible_matches

        # Compute modified n-grams precision
        for i in range(self.max_order):
            if smooth:
                smoothed_match_value = (self.matches_by_order[i] + 0.1) if self.matches_by_order[i] == 0 else self.matches_by_order[i]
                self.precisions[i] = smoothed_match_value / self.possible_matches_by_order[i] \
                                                        if self.possible_matches_by_order[i] > 0 else 0.0
            else:
                self.precisions[i] = self.matches_by_order[i] / self.possible_matches_by_order[i] \
                                                            if self.possible_matches_by_order[i] > 0 else 0.0
        
        # Calculate geometric mean
        if min(self.precisions) > 0:
            precision_log_sum = sum((1.0 / self.max_order) * math.log(precision) for precision in self.precisions)
            geo_mean = math.exp(precision_log_sum)
        else:
            geo_mean = 0
        
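        # Compute brevity penalty: BP = min(1, exp(1 - r / c)); an empty translation gets a penalty of 0
        if self.translation_text_length == 0:
            brevity_penalty = 0.0
        else:
            brevity_penalty = min(1.0, math.exp(1 - self.reference_text_length / self.translation_text_length))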
        
        # Compute BLEU score
        bleu = geo_mean * brevity_penalty

        precisions_dict = {
            'unigram':self.precisions[0],
            'bigram':self.precisions[1],
            'trigram':self.precisions[2],
            '4-gram':self.precisions[3]
        }
        self.results_dict['bleu_score'] = bleu
        self.results_dict['geometric_mean'] = geo_mean
        self.results_dict['precision'] = precisions_dict
        self.results_dict['brevity_penalty'] = brevity_penalty
        
        return self.results_dict
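
For reference, the quantities computed by `get_bleu_score` can be summarised as below. The notation is an assumption introduced here for illustration and is not taken from the assignment sheet: $x$ is the translation, $y_j$ are the references, $G_n(x)$ is the set of distinct n-grams of order $n$ in $x$, $c$ is the translation length, $r$ is the reference length (this implementation uses the shortest reference), and $N$ is `max_order`. With `smooth=True`, a numerator of zero is replaced by 0.1.

```latex
p_n = \frac{\sum_{g \in G_n(x)} \min\bigl(\mathrm{count}_x(g),\ \max_j \mathrm{count}_{y_j}(g)\bigr)}
           {\sum_{g \in G_n(x)} \mathrm{count}_x(g)},
\qquad
\mathrm{BP} = \min\bigl(1,\ e^{\,1 - r/c}\bigr),
\qquad
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Bigl(\frac{1}{N} \sum_{n=1}^{N} \log p_n\Bigr)
```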

# Question 1.2

Use the BLEU score implementation to find the BLEU score when  
`x = The boys were playing happily on the ground.` 
and     
`y = The boys were playing football on the field.`

In [3]:
translation_text = 'The boys were playing happily on the ground.'
reference_text_list  = ['The boys were playing football on the field']

bleu_score_obj = BleuScore()
score = bleu_score_obj.get_bleu_score(
    translation_text=translation_text,
    reference_text_list=reference_text_list
)
bleu_score_obj.print_bleu_score(score)


**************************************************

Precision:
Unigram : 0.7500
Bigram  : 0.5714
Trigram : 0.3333
4-gram  : 0.2000
Geometric Mean: 0.4111
Brevity Penalty: 1.0000

BLEU Score: 0.4111
**************************************************
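
The output above can be checked by hand. The sketch below is a standalone sanity check, not part of the class; it assumes the clipped match counts 6, 4, 2 and 1 for orders 1 to 4, counted manually from the two pre-processed sentences (each has 8 tokens).

```python
import math

# Clipped matches / total candidate n-grams for orders 1..4,
# counted by hand for the sentence pair in Question 1.2.
precisions = [6 / 8, 4 / 7, 2 / 6, 1 / 5]

# Geometric mean: exponential of the average log precision.
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

# Translation and reference both have 8 tokens, so no brevity penalty applies.
translation_len, reference_len = 8, 8
brevity_penalty = min(1.0, math.exp(1 - reference_len / translation_len))

bleu = geo_mean * brevity_penalty
print(f"Geometric mean: {geo_mean:.4f}")
print(f"BLEU score: {bleu:.4f}")
```

Every order has at least one match here, so smoothing does not change any of the four precisions.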


# Question 1.3

Can you explain why we are taking minimum in numerator in equation 1?

## Answer 1.3


A machine translation system may overgenerate plausible words, producing translations that have high n-gram precision but are poor, unless the count of each candidate word is clipped by its maximum count in any reference.

For example:
```
Candidate: the the the the the the the. 

Reference 1: The cat is on the mat.
Reference 2: There is a cat on the mat.

Standard Unigram precision is 7/7.
Modified Unigram Precision = 2/7.

```

Therefore, for each value of n, we take the minimum so that the count of each candidate n-gram is clipped at its maximum count in any single reference translation. Repeating a word more often than a reference does then earns no extra credit.
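
The example above can be reproduced with `collections.Counter`. This is a minimal standalone sketch for unigrams only; the helper name `tokenize` is introduced here for illustration and mirrors the pre-processing from Question 1.1.

```python
import collections
import string

def tokenize(text):
    # Hypothetical helper: lower-case, strip punctuation, split on whitespace.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()

candidate = tokenize("the the the the the the the.")
references = [
    tokenize("The cat is on the mat."),
    tokenize("There is a cat on the mat."),
]

candidate_counts = collections.Counter(candidate)

# Maximum count of each word over all references (Counter union keeps the max).
max_reference_counts = collections.Counter()
for reference in references:
    max_reference_counts |= collections.Counter(reference)

# Standard precision: every candidate word that occurs in some reference counts.
standard_matches = sum(
    count for word, count in candidate_counts.items() if word in max_reference_counts
)

# Modified precision: Counter intersection takes the minimum, i.e. clips the count.
clipped_matches = sum((candidate_counts & max_reference_counts).values())

print(f"Standard unigram precision: {standard_matches}/{len(candidate)}")
print(f"Modified unigram precision: {clipped_matches}/{len(candidate)}")
```

The `&` operator is what implements the minimum: "the" occurs 7 times in the candidate but at most twice in a single reference, so only 2 occurrences are credited.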


# Question 1.4

Use your BLEU score implementation to find the BLEU score between any 5 sentence pairs, and explain the potential disadvantages of using the BLEU score.

In [4]:

translation_text_list = ['The cat is sitting on the mat',
                         'The cat is sleeping on the couch.',
                        'I enjoy listening to music.',
                        'The weather is beautiful today.',
                        'She is reading a book in the garden.',
                        'He ate a delicious meal at the restaurant.'
                        ]

references_text_list  = [
                            ['The cat is on the mat'],
                            ['The cat is sleeping on the sofa.'],
                            ['I like to listen to music.'],
                            ['Today weather is lovely.'],
                            ['She reads a book in the garden.'],
                            ['He enjoyed a tasty meal at the restaurant.']
                       ]


for (translation_text,reference_text) in zip(translation_text_list,references_text_list):
    bleu_score_obj = BleuScore()
    
    bleu_score = bleu_score_obj.get_bleu_score(
        translation_text=translation_text,
        reference_text_list=reference_text
    )
    print("For Translation text \n {0} \nReference Text \n {1}".format(translation_text,reference_text))
    print("Bleu Score computed is \n ")
    bleu_score_obj.print_bleu_score(bleu_score)
    print()
    print()
    

For Translation text 
 The cat is sitting on the mat 
Reference Text 
 ['The cat is on the mat']
Bleu Score computed is 
 
**************************************************

Precision:
Unigram : 0.8571
Bigram  : 0.6667
Trigram : 0.4000
4-gram  : 0.0250
Geometric Mean: 0.2749
Brevity Penalty: 1.0000

BLEU Score: 0.2749
**************************************************


For Translation text 
 The cat is sleeping on the couch. 
Reference Text 
 ['The cat is sleeping on the sofa.']
Bleu Score computed is 
 
**************************************************

Precision:
Unigram : 0.8571
Bigram  : 0.8333
Trigram : 0.8000
4-gram  : 0.7500
Geometric Mean: 0.8091
Brevity Penalty: 1.0000

BLEU Score: 0.8091
**************************************************


For Translation text 
 I enjoy listening to music. 
Reference Text 
 ['I like to listen to music.']
Bleu Score computed is 
 
**************************************************

Precision:
Unigram : 0.6000
Bigram  : 0.2500
Trigram : 0.03

### Disadvantages of BLEU Score


1. **Inability to Capture Meaning**: BLEU does not measure the semantic meaning of translations. It cannot differentiate between translations that convey the intended meaning accurately and those that do not.
2. **Limited Vocabulary Coverage**: BLEU considers only exact matches of n-grams between translations and references. It penalizes translations for using synonyms or paraphrases, which may be valid translations but are not identical to the reference text.
3. **Insensitive to Structural Differences**: BLEU treats all n-grams equally regardless of their position in the sentence. Word order is captured only locally, within n-grams of up to `max_order` tokens, so differences in overall sentence structure go unmeasured.
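
These limitations can be illustrated with a small standalone sketch. It does not use the `BleuScore` class; the helper names `ngram_counts` and `modified_precision` and the three example sentences are assumptions made up for this illustration, and only a single reference without smoothing is considered.

```python
import collections

def ngram_counts(tokens, n):
    # Count every n-gram of order n in a token list.
    return collections.Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

def modified_precision(candidate, reference, n=1):
    # Clipped n-gram matches divided by the number of candidate n-grams.
    candidate_counts = ngram_counts(candidate.lower().split(), n)
    reference_counts = ngram_counts(reference.lower().split(), n)
    total = sum(candidate_counts.values())
    matches = sum((candidate_counts & reference_counts).values())
    return matches / total if total > 0 else 0.0

reference = "the cat is on the mat"
paraphrase = "a feline sits upon the rug"   # similar meaning, different words
negation = "the cat is not on the mat"      # opposite meaning, same words
shuffled = "mat the on is cat the"          # same words, scrambled order

for name, candidate in [
    ("paraphrase", paraphrase),
    ("negation", negation),
    ("shuffled", shuffled),
]:
    p1 = modified_precision(candidate, reference, n=1)
    p2 = modified_precision(candidate, reference, n=2)
    print(f"{name:10s} unigram={p1:.4f} bigram={p2:.4f}")
```

The negated sentence scores higher than the paraphrase at both orders even though it reverses the meaning, and the scrambled sentence reaches a perfect unigram precision while its bigram precision drops to zero, which shows that word order is only visible through higher-order n-grams.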