The BLEU (Bilingual Evaluation Understudy) score is a metric used to evaluate the quality of machine-generated translations against one or more reference translations. It measures the similarity between the generated translation and the reference translations based on n-gram matches.

The BLEU score ranges from 0 to 1, with 1 meaning the generated translation exactly matches a reference. It is calculated by comparing the n-grams (contiguous sequences of words) in the generated translation with those in the reference translations. BLEU combines two components: precision, the fraction of n-grams in the generated translation that also appear in the reference translations, and a brevity penalty, which lowers the score of translations that are shorter than the reference.
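To make the two components concrete, here is a minimal pure-Python sketch of the calculation. It assumes whitespace tokenization, uniform weights over 1- to 4-grams, and no smoothing; the function names (`modified_precision`, `brevity_penalty`, `bleu`) are illustrative and are not taken from any library. It also shows one detail of the precision term: matches are clipped, so a word repeated in the generated text is credited at most as often as it occurs in a reference.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(references, hypothesis, n):
    """Clipped n-gram precision of a hypothesis against one or more references."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    if not hyp_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


def brevity_penalty(references, hypothesis):
    """1.0 if the hypothesis is at least as long as the closest reference, else exp(1 - r/c)."""
    c = len(hypothesis)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c >= r else math.exp(1 - r / c)


def bleu(references, hypothesis, max_n=4):
    """Geometric mean of the 1..max_n clipped precisions times the brevity penalty."""
    precisions = [modified_precision(references, hypothesis, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one empty n-gram order zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(references, hypothesis) * math.exp(log_mean)


reference = "the cat is on the mat".split()

# Repeating a common word does not inflate precision: only 2 of 7 tokens are credited.
print(modified_precision([reference], "the the the the the the the".split(), 1))

# A very short hypothesis is penalized even though every token matches.
print(brevity_penalty([reference], "the cat".split()))

# An exact match scores 1.0.
print(bleu([reference], reference))
```

Library implementations add options that this sketch leaves out, such as smoothing and custom weights.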

BLEU is commonly used in machine translation tasks to assess the quality of generated translations and compare different translation systems. It provides a quantitative measure of how well the generated translation aligns with the reference translations in terms of word overlap.

Trying to compare BLEU scores across different corpora and languages is strongly discouraged. Even comparing BLEU scores for the same corpus but with different numbers of reference translations can be highly misleading.
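The effect of the number of references can be seen in a small sketch. It uses only clipped unigram precision (one component of BLEU, not the full score), and the sentences and the function name `clipped_unigram_precision` are made up for illustration. The same generated sentence scores higher as soon as a second reference is added, although the sentence itself has not changed.

```python
from collections import Counter


def clipped_unigram_precision(references, hypothesis):
    """Fraction of hypothesis tokens found in the references, with clipped counts."""
    hyp_counts = Counter(hypothesis)
    # For each token, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    matched = sum(min(count, max_ref_counts[token]) for token, count in hyp_counts.items())
    return matched / len(hypothesis)


hypothesis = "the cat sat on the rug".split()
ref_a = "the cat sat on the mat".split()
ref_b = "a cat was sitting on the rug".split()

one_ref = clipped_unigram_precision([ref_a], hypothesis)
two_refs = clipped_unigram_precision([ref_a, ref_b], hypothesis)

print(one_ref)   # "rug" is unmatched with ref_a alone: 5 of 6 tokens
print(two_refs)  # ref_b supplies "rug": 6 of 6 tokens
```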

However, as a rough guideline, the following interpretation of BLEU scores (expressed as percentages rather than decimals) might be helpful.

| BLEU Score | Interpretation |
| --- | --- |
| < 10 | Almost useless |
| 10 - 19 | Hard to get the gist |
| 20 - 29 | The gist is clear, but has significant grammatical errors |
| 30 - 40 | Understandable to good translations |
| 40 - 50 | High quality translations |
| 50 - 60 | Very high quality, adequate, and fluent translations |
| > 60 | Quality often better than human |

**NOTE**
BLEU scores lie between 0 and 1, so a value of 0.6 corresponds to 60 in the table above. In practice, a score of 0.6 or 0.7 is about the best you can expect to achieve: even two human translators would likely produce different sentences for the same input and would rarely match each other exactly.
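Because the table uses percentages while the scores are on the 0 to 1 scale, a small lookup helper can avoid mix-ups. The sketch below is hypothetical (`interpret_bleu` and `BANDS` are not part of any library) and makes one assumption the table does not settle: a score that falls exactly on a boundary is placed in the higher band.

```python
# Upper bound of each band in percent, with its label from the table.
# Assumption: bands are half-open, so a boundary value goes to the higher band.
BANDS = [
    (10, "Almost useless"),
    (20, "Hard to get the gist"),
    (30, "The gist is clear, but has significant grammatical errors"),
    (40, "Understandable to good translations"),
    (50, "High quality translations"),
    (60, "Very high quality, adequate, and fluent translations"),
]


def interpret_bleu(score):
    """Map a BLEU score on the 0-1 scale to its rough interpretation."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("expected a BLEU score between 0 and 1")
    percent = score * 100
    for upper, label in BANDS:
        if percent < upper:
            return label
    return "Quality often better than human"


print(interpret_bleu(0.35))
```

As the guideline itself is only approximate, the returned label should be read as a rough indication rather than a verdict.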

In [None]:
import pandas as pd
from nltk.translate.bleu_score import sentence_bleu, corpus_bleu

In [None]:
df = pd.read_excel('pegasus-x-large-booksum-1.xlsx', nrows=1630)

columns_to_keep = ['Filename', 'Abstract', 'Claims', 'Summary']  # Add other column names here
df_subset = df[columns_to_keep]

# Treat missing cells as empty strings so concatenation and split() do not fail on NaN
abstract_and_claims = df['Abstract'].fillna('') + ' ' + df['Claims'].fillna('')
summary = df['Summary'].fillna('')

bleu_scores = []

for input_content, reference_summary in zip(abstract_and_claims, summary):
    # Split input content and reference summary into segments (e.g., paragraphs)
    input_segments = input_content.split("\n")
    reference_segments = reference_summary.split("\n")

    segment_scores = []
    # zip() pairs segments by position and stops at the shorter list
    for input_seg, reference_seg in zip(input_segments, reference_segments):
        # sentence_bleu expects token lists; raw strings would be scored
        # as character n-grams instead of word n-grams
        reference_tokens = reference_seg.split()
        input_tokens = input_seg.split()
        # Argument order is (references, hypothesis): the Summary segment is the
        # reference and the Abstract + Claims segment is the hypothesis
        segment_scores.append(sentence_bleu([reference_tokens], input_tokens))

    # Average over the segment pairs that were actually scored
    bleu_scores.append(sum(segment_scores) / len(segment_scores))

# Build the score column once; DataFrame.append was removed in pandas 2.0
bleu_scores_df = pd.DataFrame({'bleu': bleu_scores})

merged_df = pd.concat([df_subset, bleu_scores_df], axis=1)

merged_df.to_csv('bleu_scores_Final.csv', index=False)
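
The loop above averages per-segment `sentence_bleu` values. As a point of comparison, the sketch below scores two made-up sentence pairs both ways, using `corpus_bleu` (imported earlier but unused) and NLTK's `SmoothingFunction`. The sentences are invented, and the choice of `method1` smoothing is an assumption, made so that short texts with no matching 4-grams do not collapse to zero. `corpus_bleu` pools n-gram counts over all pairs before computing the score, so its result need not equal the mean of the sentence-level scores.

```python
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu, sentence_bleu

# One list of tokenized references per hypothesis (a single reference each here).
references = [
    ["the cat is on the mat".split()],
    ["there is a dog in the garden".split()],
]
hypotheses = [
    "the cat sat on the mat".split(),
    "a dog is in the garden".split(),
]

# Assumption: method1 smoothing, which replaces zero n-gram match counts by a small epsilon.
smooth = SmoothingFunction().method1

sentence_scores = [
    sentence_bleu(refs, hyp, smoothing_function=smooth)
    for refs, hyp in zip(references, hypotheses)
]
mean_sentence_bleu = sum(sentence_scores) / len(sentence_scores)

# Pools n-gram counts over all pairs before computing the score.
corpus_score = corpus_bleu(references, hypotheses, smoothing_function=smooth)

print(mean_sentence_bleu, corpus_score)
```

Which aggregation is appropriate depends on what the scores are used for; reporting the one that was used makes results easier to compare.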


