# Load the dataset for semantic analysis

In [None]:
from datasets import load_dataset

ds = load_dataset("EdinburghNLP/xsum")

In [None]:
ds["test"][0]

# Calculate Similarity Metrics

The metric descriptions below are quoted from this paper: https://arxiv.org/html/2402.17008v1#S4

"ROUGE: ROUGE Lin (2004) is a family of metrics that score the lexical overlap between the generated text and the reference text. We used 3 variations, R-1, R-2, and R-L, which are widely adopted for evaluating text summarizing tasks. However, despite its popularity, works like Akter et al. (2022) and Bansal et al. (2022b) show that ROUGE is an unsuitable metric for comparing semantics. For this reason we also evaluate using metrics that have been designed with semantic awareness in mind.

BERTscore: While ROUGE can only convey information about lexical overlap, BERTscore is a metric that utilizes contextual embeddings from transformer models like BERT to evaluate the semantic similarity between the generated text and reference text. For this study, we compute BERTscore with the hashcode roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.36.2)-rescaled.

SEM-F1: While ROUGE and BERTscore are useful and powerful metrics, SEM-F1 was specifically designed for the SOS task. SEM-F1 leverages rigorously fine-tuned sentence encoders to evaluate the SOS task using sentence-level similarity. It differs from BERTscore as BERTscore computes token-level similarity. For this study, we compute SEM-F1 with underlying models: USE Cer et al. (2018), RoBERTa Zhuang et al. (2021), and DistilRoBERTa Sanh et al. (2019)."

In [17]:
from bert_score import score
from rouge_score import rouge_scorer 
import json


In [None]:
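# ROUGE-1 (unigram overlap) with stemming enabled.
scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)
# The signature is score(target, prediction): here the article is the target
# and the reference summary is the prediction, so precision is the share of
# summary unigrams that also appear in the article.
scores = scorer.score(ds["test"][0]["document"], ds["test"][0]["summary"])
print(scores['rouge1'])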
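To make concrete what the lexical-overlap scores measure, here is a minimal pure-Python sketch of ROUGE-N and ROUGE-L. It is an illustration only, not the `rouge_score` implementation: the `tokenize` helper is an assumption that lowercases and splits on whitespace, whereas the library also strips punctuation and can apply stemming, so the numbers will not match the library exactly. The example sentences are made up.

```python
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def rouge_n(reference, candidate, n=1):
    """Return (precision, recall, f1) for clipped n-gram overlap."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref_counts = ngrams(tokenize(reference))
    cand_counts = ngrams(tokenize(candidate))
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def rouge_l(reference, candidate):
    """Return (precision, recall, f1) based on the longest common subsequence."""
    ref, cand = tokenize(reference), tokenize(candidate)
    # Dynamic-programming table for the LCS length.
    table = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            if r == c:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    lcs = table[-1][-1]
    precision = lcs / max(len(cand), 1)
    recall = lcs / max(len(ref), 1)
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return precision, recall, f1


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print("R-1:", rouge_n(reference, candidate, n=1))
print("R-2:", rouge_n(reference, candidate, n=2))
print("R-L:", rouge_l(reference, candidate))
```

Swapping a single word leaves five of six unigrams shared but only three of five bigrams, which shows how quickly the higher-order variants penalize small wording changes even when the meaning is close.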

In [31]:
ds["test"][0]["summary"]

'There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.'

In [None]:
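# bert_score.score expects lists of candidate and reference strings; passing
# bare strings would make it iterate over them character by character.
# Sanity check: scoring the summary against itself should give F1 close to 1.
P, R, F1 = score([ds["test"][0]["summary"]], [ds["test"][0]["summary"]], lang="en")
print(F1)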

# Prompt Generation
Generate the prompt following the TELeR prompt taxonomy:

Task: Summarize the following newsletter article in exactly one sentence that captures its core message.

Explanation: You are summarizing for industry professionals who need a fast, high-level understanding of the article. Your summary should include the key topic, any notable findings or updates, and the article’s main implication or takeaway.

Limitations:

Do not exceed one sentence.
Do not use bullet points or lists.
Do not add commentary, opinion, or context not present in the original article.
Use clear, informative language appropriate for a professional audience.

Input Article:
[Insert any of the test set articles]
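The formatting limitations in the prompt can be checked mechanically on a model's output. The sketch below is one possible heuristic, not part of the original notebook: `check_limitations` is a hypothetical helper, and its sentence splitter is naive (it will miscount abbreviations such as "U.S."). It covers only the one-sentence and no-lists constraints; the constraints on added commentary and tone cannot be verified this way.

```python
import re


def check_limitations(summary):
    """Return a list of violated formatting constraints (empty if none found)."""
    problems = []
    stripped = summary.strip()

    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    if len(sentences) != 1:
        problems.append("not exactly one sentence")

    # A line starting with a bullet marker or "1." / "1)" counts as a list item.
    bullet = re.compile(r"\s*(?:[-*•]|\d+[.)])\s+")
    if any(bullet.match(line) for line in stripped.splitlines()):
        problems.append("contains bullet points or a list")

    return problems


ok = 'There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.'
print(check_limitations(ok))
print(check_limitations("First point. Second point."))
print(check_limitations("- item one\n- item two"))
```

A check like this is best treated as a filter that flags outputs for manual review before they are scored.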

In [29]:
test_set = ds["test"]
to_test=20

In [30]:
for i in range(0, min(to_test, len(test_set))):
    article_text = test_set[i]["document"]
    prompt_template = (
        "Task: Summarize the following newsletter article in exactly one sentence that captures its core message.\n\n"
        "Explanation: You are summarizing for industry professionals who need a fast, high-level understanding of the article. "
        "Your summary should include the key topic, any notable findings or updates, and the article’s main implication or takeaway.\n\n"
        "Limitations:\n\n"
        "Do not exceed one sentence.\n"
        "Do not use bullet points or lists.\n"
        "Do not add commentary, opinion, or context not present in the original article.\n"
        "Use clear, informative language appropriate for a professional audience.\n\n"
        "Input Article:\n"
        f"{article_text}"
    )
    data = {"text": prompt_template}
    with open(f"prompt_{i}.json", "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
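The files written by the loop above can be read back to confirm that each one contains the expected article. The following sketch shows that round trip under stated assumptions: `build_prompt` and `write_prompts` are hypothetical helpers, `PROMPT_HEADER` is abbreviated rather than the full instruction text, and placeholder strings stand in for `test_set[i]["document"]`. It writes to a temporary directory so it does not touch the notebook's own output files.

```python
import json
import tempfile
from pathlib import Path

# Abbreviated stand-in for the full instruction text used in the loop above.
PROMPT_HEADER = (
    "Task: Summarize the following newsletter article in exactly one sentence "
    "that captures its core message.\n\n"
    "Input Article:\n"
)


def build_prompt(article_text):
    # Hypothetical helper: keeps the template in one place.
    return PROMPT_HEADER + article_text


def write_prompts(articles, out_dir):
    """Write one prompt_<i>.json file per article and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, article_text in enumerate(articles):
        path = out_dir / f"prompt_{i}.json"
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"text": build_prompt(article_text)}, f,
                      ensure_ascii=False, indent=2)
        paths.append(path)
    return paths


# Placeholder articles; in the notebook these would come from the test set.
articles = ["First placeholder article.", "Second placeholder article."]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_prompts(articles, tmp)
    # Read every file back while the temporary directory still exists.
    loaded = [json.loads(p.read_text(encoding="utf-8"))["text"] for p in paths]

for article, prompt in zip(articles, loaded):
    print(prompt.endswith(article))
```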