**ROUGE-1**: Measures the overlap of single words (unigrams) between the generated summary and the reference summary.

**ROUGE-2**: Measures the overlap of two consecutive words (bigrams) between the generated summary and the reference summary.

**ROUGE-L**: Measures the longest common subsequence of words, reflecting sentence-level structure and order between the generated summary and the reference summary.
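
To make the three definitions above concrete, here is a minimal pure-Python sketch of the computations. It is an illustration under simplifying assumptions, not the `rouge_score` implementation: tokens come from lowercasing and whitespace splitting, no stemming is applied, and the helper names (`ngrams`, `rouge_n`, `lcs_length`, `rouge_l`) and the two example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n):
    """Precision, recall and F1 from clipped n-gram overlap (simplified)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Precision, recall and F1 based on the LCS length (simplified)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    precision = lcs / max(len(cand), 1)
    recall = lcs / max(len(ref), 1)
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return precision, recall, f1


# Hypothetical example pair, chosen only for illustration.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"

print("ROUGE-1:", rouge_n(reference, candidate, 1))
print("ROUGE-2:", rouge_n(reference, candidate, 2))
print("ROUGE-L:", rouge_l(reference, candidate))
```

In this pair, five of the six unigrams match but only three of the five bigrams do, which shows why ROUGE-2 is usually lower than ROUGE-1 for the same summary.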

Because we are comparing extractive and abstractive text summarization, we evaluate with ROUGE rather than with precision and recall computed over whole documents or labels. ROUGE applies precision, recall, and F1 to overlapping n-grams and common subsequences, so it reflects how much of the reference summary's wording and word order a generated summary reproduces. Note that ROUGE measures lexical overlap only: it is a useful proxy for content coverage, but it does not directly assess coherence or meaning.

In [1]:
pip install rouge


Collecting rouge
  Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Installing collected packages: rouge
Successfully installed rouge-1.0.1


In [2]:
pip install rouge_score

Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=2c28629e9c5d22293bb213cacb861a9eeddc05e69071163ab965db2b5cf9e58b
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2


In [3]:
from google.colab import drive
drive.mount("/content/drive")


Mounted at /content/drive


In [4]:
import os
from rouge_score import rouge_scorer

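def calculate_rouge_scores(generated_summaries_folder, reference_summaries_folder):
    """Average ROUGE-1, ROUGE-2 and ROUGE-L precision, recall and F1 per category.

    Both folders must contain the same category subfolders, and each generated
    summary must have a reference summary with the same file name.
    Returns nine dicts keyed by subfolder name.
    """
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)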

    rouge1_precision_dict = {}
    rouge1_recall_dict = {}
    rouge1_f1_dict = {}
    rouge2_precision_dict = {}
    rouge2_recall_dict = {}
    rouge2_f1_dict = {}
    rougeL_precision_dict = {}
    rougeL_recall_dict = {}
    rougeL_f1_dict = {}

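    for subfolder in os.listdir(generated_summaries_folder):
        subfolder_generated_summaries_folder = os.path.join(generated_summaries_folder, subfolder)
        subfolder_reference_summaries_folder = os.path.join(reference_summaries_folder, subfolder)

        # Skip stray files (e.g. notebook checkpoints) that are not category folders.
        if not os.path.isdir(subfolder_generated_summaries_folder):
            continue
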
        rouge1_precision = 0
        rouge1_recall = 0
        rouge1_f1 = 0
        rouge2_precision = 0
        rouge2_recall = 0
        rouge2_f1 = 0
        rougeL_precision = 0
        rougeL_recall = 0
        rougeL_f1 = 0

        for filename in os.listdir(subfolder_generated_summaries_folder):
            reference_summary_file = os.path.join(subfolder_reference_summaries_folder, filename)
            generated_summary_file = os.path.join(subfolder_generated_summaries_folder, filename)


            with open(reference_summary_file, 'r') as f:
                reference_summary = f.read()
            with open(generated_summary_file, 'r') as f:
                generated_summary = f.read()

            # RougeScorer.score expects the reference (target) first, then the prediction.
            scores = scorer.score(reference_summary, generated_summary)
            rouge1_precision += scores['rouge1'].precision
            rouge1_recall += scores['rouge1'].recall
            rouge1_f1 += scores['rouge1'].fmeasure
            rouge2_precision += scores['rouge2'].precision
            rouge2_recall += scores['rouge2'].recall
            rouge2_f1 += scores['rouge2'].fmeasure
            rougeL_precision += scores['rougeL'].precision
            rougeL_recall += scores['rougeL'].recall
            rougeL_f1 += scores['rougeL'].fmeasure

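        # Average the summed scores over the files in this category;
        # skip empty folders to avoid dividing by zero.
        num_files = len(os.listdir(subfolder_generated_summaries_folder))
        if num_files == 0:
            continue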
        rouge1_precision_dict[subfolder] = rouge1_precision / num_files
        rouge1_recall_dict[subfolder] = rouge1_recall / num_files
        rouge1_f1_dict[subfolder] = rouge1_f1 / num_files
        rouge2_precision_dict[subfolder] = rouge2_precision / num_files
        rouge2_recall_dict[subfolder] = rouge2_recall / num_files
        rouge2_f1_dict[subfolder] = rouge2_f1 / num_files
        rougeL_precision_dict[subfolder] = rougeL_precision / num_files
        rougeL_recall_dict[subfolder] = rougeL_recall / num_files
        rougeL_f1_dict[subfolder] = rougeL_f1 / num_files

    return (
        rouge1_precision_dict, rouge1_recall_dict, rouge1_f1_dict,
        rouge2_precision_dict, rouge2_recall_dict, rouge2_f1_dict,
        rougeL_precision_dict, rougeL_recall_dict, rougeL_f1_dict
    )

generated_summaries_folder = '/content/drive/MyDrive/zip_ref/BBC News Summary/Generated_Summaries'
reference_summaries_folder = '/content/drive/MyDrive/zip_ref/BBC News Summary/Summaries'

(
    rouge1_precision_dict, rouge1_recall_dict, rouge1_f1_dict,
    rouge2_precision_dict, rouge2_recall_dict, rouge2_f1_dict,
    rougeL_precision_dict, rougeL_recall_dict, rougeL_f1_dict
) = calculate_rouge_scores(generated_summaries_folder, reference_summaries_folder)


for subfolder in rouge1_precision_dict:
    print(f"Subfolder: {subfolder}")
    print("ROUGE-1 Precision:", rouge1_precision_dict[subfolder])
    print("ROUGE-1 Recall:", rouge1_recall_dict[subfolder])
    print("ROUGE-1 F1:", rouge1_f1_dict[subfolder])
    print("ROUGE-2 Precision:", rouge2_precision_dict[subfolder])
    print("ROUGE-2 Recall:", rouge2_recall_dict[subfolder])
    print("ROUGE-2 F1:", rouge2_f1_dict[subfolder])
    print("ROUGE-L Precision:", rougeL_precision_dict[subfolder])
    print("ROUGE-L Recall:", rougeL_recall_dict[subfolder])
    print("ROUGE-L F1:", rougeL_f1_dict[subfolder])
    print()


Subfolder: business
ROUGE-1 Precision: 0.6500305964893116
ROUGE-1 Recall: 0.5313253388121801
ROUGE-1 F1: 0.5662604114648947
ROUGE-2 Precision: 0.5022548169469754
ROUGE-2 Recall: 0.41613582873905536
ROUGE-2 F1: 0.4410760113086637
ROUGE-L Precision: 0.445598215538576
ROUGE-L Recall: 0.367858279070623
ROUGE-L F1: 0.3900853479050344

Subfolder: entertainment
ROUGE-1 Precision: 0.6406259842182173
ROUGE-1 Recall: 0.5402209127355717
ROUGE-1 F1: 0.5641916400562288
ROUGE-2 Precision: 0.49915640888139806
ROUGE-2 Recall: 0.42550350750732097
ROUGE-2 F1: 0.442743424933174
ROUGE-L Precision: 0.4484502346347948
ROUGE-L Recall: 0.37928634341512446
ROUGE-L F1: 0.39513023544311643

Subfolder: politics
ROUGE-1 Precision: 0.688698391117823
ROUGE-1 Recall: 0.435677816721828
ROUGE-1 F1: 0.5130898413220609
ROUGE-2 Precision: 0.511767398368549
ROUGE-2 Recall: 0.3282674195770657
ROUGE-2 F1: 0.38415557757019864
ROUGE-L Precision: 0.4514365655040401
ROUGE-L Recall: 0.28811334855180437
ROUGE-L F1: 0.3378619503650

**Inferences made:**

*   The summarization system performs moderately in every category shown: average ROUGE-1 F1 ranges from about 0.51 to 0.57, ROUGE-2 F1 from about 0.38 to 0.44, and ROUGE-L F1 from about 0.34 to 0.40, which leaves room to improve content coverage.

*   Technical content appears to pose the greatest challenge for the summarization system, which suggests a need for techniques that handle specialized vocabulary and dense detail.

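*   Precision exceeds recall in every category shown (for example, ROUGE-1 precision of about 0.69 against recall of about 0.44 for politics). This suggests that the generated summaries contain relevant information but omit part of the reference content, so content coverage needs to improve.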
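
The precision/recall observation can be checked directly from the reported averages. The sketch below is only an illustration: the values are the ROUGE-1 figures printed above, rounded to four decimals and limited to the three categories shown, and the helper `precision_recall_gap` is a hypothetical name.

```python
# ROUGE-1 averages copied from the printed output, rounded to four decimals.
rouge1 = {
    "business": {"precision": 0.6500, "recall": 0.5313},
    "entertainment": {"precision": 0.6406, "recall": 0.5402},
    "politics": {"precision": 0.6887, "recall": 0.4357},
}


def precision_recall_gap(scores):
    """Return precision minus recall for each category."""
    return {category: s["precision"] - s["recall"] for category, s in scores.items()}


gaps = precision_recall_gap(rouge1)

# List categories from the largest gap to the smallest.
for category, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{category}: precision exceeds recall by {gap:.4f}")
```

A positive gap means a summary's words tend to appear in the reference, while a smaller share of the reference's words appear in the summary, which is the pattern expected when generated summaries are shorter than the references.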