# FactSum: Fact-Verification-Guided Abstractive Summarizer for News Clusters

## Loading Python Libraries

In [12]:
import pandas as pd
from datasets import load_dataset  # For the Multi-News dataset
import torch
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Loading the main model:
- **pegasus-multi_news**: A PEGASUS checkpoint (`google/pegasus-multi_news`) that has been fine-tuned on the Multi-News dataset for abstractive summarization.


In [13]:
# Model Identifier
MODEL_NAME = "google/pegasus-multi_news"

# Loading the tokenizer and model
tokenizer = PegasusTokenizer.from_pretrained(MODEL_NAME)
model = PegasusForConditionalGeneration.from_pretrained(MODEL_NAME)

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-multi_news and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
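You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

**Note:** The two weights named in this warning are PEGASUS's positional embeddings. These appear to be fixed sinusoidal tables that are recomputed at load time rather than stored in the checkpoint, so the warning is most likely benign and no extra training should be needed before inference.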


# Loading the Dataset(s):
- **Multi-News**: A dataset of news articles paired with human-written summaries, used for training and evaluating summarization models. Each example has a `document` field holding the source text and a `summary` field holding the reference summary.
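
Because each `document` covers a cluster of related articles, it can be useful to recover the individual articles before chunking. The sketch below assumes the articles are joined with the `|||||` separator used by the original Multi-News release; whether the `Awesome075/multi_news_parquet` mirror keeps that separator is not confirmed here, so check a few examples first. The helper name `split_cluster` is hypothetical.

```python
from typing import List

# Assumption: articles in one cluster are joined with this separator,
# as in the original Multi-News release. Verify against the loaded data.
ARTICLE_SEPARATOR = "|||||"


def split_cluster(document: str, separator: str = ARTICLE_SEPARATOR) -> List[str]:
    """Split one Multi-News `document` string into its source articles.

    Empty pieces (for example, from a trailing separator) are dropped and
    surrounding whitespace is stripped from each article.
    """
    return [part.strip() for part in document.split(separator) if part.strip()]


if __name__ == "__main__":
    sample = "First article text. ||||| Second article text.|||||"
    for i, article in enumerate(split_cluster(sample)):
        print(i, article)
```

If the separator is absent, the function simply returns the whole document as a single-element list, so it is safe to call on every example.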



In [14]:
multi_news = load_dataset("Awesome075/multi_news_parquet")

In [15]:
print(multi_news)

DatasetDict({
    train: Dataset({
        features: ['document', 'summary'],
        num_rows: 44972
    })
    validation: Dataset({
        features: ['document', 'summary'],
        num_rows: 5622
    })
    test: Dataset({
        features: ['document', 'summary'],
        num_rows: 5622
    })
})


In [16]:
example = multi_news["test"][0]

print("DOCUMENT:\n", example["document"][:1000], "...")  # first 1000 chars
print("\nSUMMARY:\n", example["summary"])

DOCUMENT:
 GOP Eyes Gains As Voters In 11 States Pick Governors 
 
 Enlarge this image toggle caption Jim Cole/AP Jim Cole/AP 
 
 Voters in 11 states will pick their governors tonight, and Republicans appear on track to increase their numbers by at least one, with the potential to extend their hold to more than two-thirds of the nation's top state offices. 
 
 Eight of the gubernatorial seats up for grabs are now held by Democrats; three are in Republican hands. Republicans currently hold 29 governorships, Democrats have 20, and Rhode Island's Gov. Lincoln Chafee is an Independent. 
 
 Polls and race analysts suggest that only three of tonight's contests are considered competitive, all in states where incumbent Democratic governors aren't running again: Montana, New Hampshire and Washington. 
 
 While those state races remain too close to call, Republicans are expected to wrest the North Carolina governorship from Democratic control, and to easily win GOP-held seats in Utah, North Dako

# Implementing Chunking (1024 Tokens) with a 128-Token Overlap
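
To make the window arithmetic concrete: with a 1024-token window and a 128-token overlap, each new chunk starts 1024 - 128 = 896 tokens after the previous one. The following is a minimal sketch of that plain sliding window over a list of token ids. It is not the `SemanticDocumentChunker` used in this notebook, which may split on different boundaries; the function name `chunk_token_ids` and its parameters are hypothetical.

```python
from typing import List, Sequence


def chunk_token_ids(
    token_ids: Sequence[int],
    max_tokens: int = 1024,
    overlap: int = 128,
) -> List[List[int]]:
    """Split token ids into windows of at most `max_tokens` tokens.

    Consecutive windows share `overlap` tokens, so the window start
    advances by `max_tokens - overlap` each step.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    stride = max_tokens - overlap
    chunks: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(list(token_ids[start:start + max_tokens]))
        # Stop once this window has reached the end of the sequence,
        # so no chunk consists purely of already-covered overlap.
        if start + max_tokens >= len(token_ids):
            break
        start += stride
    return chunks


if __name__ == "__main__":
    fake_ids = list(range(2000))  # stand-in for real token ids
    for chunk in chunk_token_ids(fake_ids):
        print(chunk[0], chunk[-1], len(chunk))
```

With the tokenizer loaded earlier, the ids could be obtained with `tokenizer(text, add_special_tokens=False)["input_ids"]`. In that case it may be worth setting `max_tokens` slightly below 1024 to leave room for the end-of-sequence token that is added when each chunk is encoded for the model.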

## Semantic Document Chunker Implementation


In [17]:
from semantic_document_chunker import SemanticDocumentChunker