# pegasus-large-txtsummary

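Pegasus-Large is a transformer encoder-decoder model designed for abstractive summarization, meaning it generates new sentences rather than only copying them from the input. Its pre-training objective, Gap-Sentence Generation (GSG), removes important sentences from a document and trains the model to generate them from the remaining text, which closely resembles the summarization task itself. When the PEGASUS paper was published, fine-tuned PEGASUS models reported state-of-the-art ROUGE scores on 12 summarization datasets, including CNN/Daily Mail and XSum, covering domains such as news, scientific papers, and patents. The `google/pegasus-large` checkpoint used here is the pre-trained model; fine-tuned variants such as `google/pegasus-xsum` and `google/pegasus-cnn_dailymail` are also available. The model is open source and can be loaded through the Hugging Face `transformers` library.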
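
To make the Gap-Sentence Generation idea concrete, here is a minimal sketch of how one pre-training example could be built from a document. It is an illustration, not the actual PEGASUS data pipeline: the naive regex sentence splitter, the unigram F1 score (a simple stand-in for ROUGE-1 F1), the `gap_ratio` parameter, and the function names are all assumptions made for this example. `<mask_1>` is the sentence-mask token used by the Hugging Face PEGASUS tokenizer.

```python
import re
from collections import Counter

MASK_SENT = "<mask_1>"  # sentence-level mask token in the Hugging Face PEGASUS vocabulary


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def unigram_f1(candidate, reference):
    # Unigram-overlap F1 between two strings, used here as a rough ROUGE-1 F1 proxy.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def make_gsg_example(document, gap_ratio=0.3):
    # Score each sentence against the rest of the document, mask the top-scoring
    # ones in the source, and use the masked sentences (in order) as the target.
    sentences = split_sentences(document)
    n_gaps = max(1, round(len(sentences) * gap_ratio))
    scores = [
        unigram_f1(s, " ".join(sentences[:i] + sentences[i + 1:]))
        for i, s in enumerate(sentences)
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:n_gaps])
    source = " ".join(
        MASK_SENT if i in selected else s for i, s in enumerate(sentences)
    )
    target = " ".join(sentences[i] for i in selected)
    return source, target
```

Calling `make_gsg_example(document)` returns a `(source, target)` pair: the document with the selected sentences replaced by `<mask_1>`, and the removed sentences that the model would be trained to generate.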

In [1]:
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Load the PEGASUS model and tokenizer
model_name = "google/pegasus-large"  # Other PEGASUS checkpoints can be substituted depending on the task
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

def summarize_text(text, max_input_length=None, max_summary_length=150):
    # Tokenize the input text. The input limit is kept separate from the summary length;
    # max_input_length=None falls back to the tokenizer's own model_max_length.
    inputs = tokenizer([text], max_length=max_input_length, return_tensors="pt", truncation=True)

    # Generate the summary with beam search, passing the attention mask along with the input IDs
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_summary_length,
        num_beams=4,
        early_stopping=True,
    )

    # Decode the generated token IDs back into text
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    return summary





Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-large and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [2]:
# Example usage
text = """
         The PEGASUS model is a powerful tool for abstractive text summarization.
         It uses transformer-based architecture and pre-training with gap sentences
         to generate informative and coherent summaries of input documents.
         """
summary = summarize_text(text)
print("Original Text:\n", text)
print("\nSummary:\n", summary)

Original Text:
 
         The PEGASUS model is a powerful tool for abstractive text summarization.
         It uses transformer-based architecture and pre-training with gap sentences
         to generate informative and coherent summaries of input documents.
         

Summary:
 It uses transformer-based architecture and pre-training with gap sentences to generate informative and coherent summaries of input documents.
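
Because `summarize_text` tokenizes with `truncation=True`, any text beyond the tokenizer's length limit is silently dropped. One possible workaround, sketched below, is to split a long document into sentence-aligned chunks that each fit a token budget and summarize the chunks separately. The helper name `chunk_by_token_budget`, its parameters, and the whitespace-based default token counter are assumptions made for this example; they are not part of `transformers`.

```python
import re


def chunk_by_token_budget(text, max_tokens, count_tokens=None):
    """Group consecutive sentences into chunks of at most max_tokens tokens.

    count_tokens is a hypothetical callable mapping a string to a token count.
    The default counts whitespace-separated words, which only approximates a
    real subword tokenizer. A single sentence longer than max_tokens is kept
    as its own (oversized) chunk rather than being split.
    """
    if count_tokens is None:
        count_tokens = lambda s: len(s.split())

    # Naive sentence splitter (assumption): break after '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With the tokenizer loaded earlier, a closer count could be obtained by passing `count_tokens=lambda s: len(tokenizer(s)["input_ids"])`, and each chunk could then be passed to `summarize_text`. Whether concatenating per-chunk summaries gives a good overall summary depends on the document and would need to be checked.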
