<a href="https://colab.research.google.com/github/ankit-singh26/GenerativeAi/blob/main/TextSummarizationProject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
# @title 1. Install necessary libraries
# This cell installs the Hugging Face Transformers library.
# It's crucial to run this first in your Colab notebook.
!pip install transformers

# Import the pipeline function after installation
from transformers import pipeline

# @title 2. Define the Text Summarization Function
def summarize_text(text, max_length=150, min_length=40, model_name="facebook/bart-large-cnn"):
    """
    Summarizes the given text using a pre-trained Hugging Face model.

    Args:
        text (str): The input text to be summarized.
        max_length (int): The maximum length (in tokens) of the generated summary.
        min_length (int): The minimum length (in tokens) of the generated summary.
        model_name (str): The name of the pre-trained model to use for summarization.
                          Common choices include:
                          - "facebook/bart-large-cnn" (good general-purpose)
                          - "t5-base"
                          - "google/pegasus-xsum" (more abstractive, good for very short summaries)
                          - "sshleifer/distilbart-cnn-12-6" (faster, smaller)

    Returns:
        str: The summarized text.
    """
    # Initialize the summarization pipeline
    # The first time you run this, Colab will download the model,
    # which might take a few minutes depending on the model size and your internet speed.
    summarizer = pipeline("summarization", model=model_name)

    # Generate the summary
    # do_sample=False ensures deterministic output for the same input
    summary = summarizer(text, max_length=max_length, min_length=min_length, do_sample=False)

    # The summarizer returns a list of dictionaries, so we extract the 'summary_text'
    return summary[0]['summary_text']

# @title 3. Example Usage
# You can paste your own long text here.
# For demonstration, we'll use a sample text about AI.
long_text_example = """
Artificial intelligence (AI) is rapidly transforming various industries, from healthcare to finance and transportation. In healthcare, AI-powered systems can analyze vast amounts of patient data to assist with diagnosis, predict disease outbreaks, and even help in drug discovery. For instance, machine learning algorithms can identify subtle patterns in medical images that human doctors might miss, leading to earlier and more accurate diagnoses.

In the financial sector, AI is used for fraud detection, algorithmic trading, and personalized financial advice. AI models can detect unusual transaction patterns, flag potential fraudulent activities, and analyze market trends to make informed investment decisions. Chatbots and virtual assistants powered by AI are also becoming common for customer service in banking.

The transportation industry is experiencing a revolution with autonomous vehicles. AI plays a crucial role in enabling self-driving cars to perceive their surroundings, make real-time decisions, and navigate safely. Beyond self-driving cars, AI optimizes traffic flow, manages logistics in supply chains, and enhances public transportation systems.

However, the rise of AI also brings forth ethical considerations and challenges. Concerns about job displacement due to automation, algorithmic bias, and the responsible use of AI in sensitive areas like surveillance and warfare are actively being debated. Ensuring transparency, fairness, and accountability in AI development is paramount for its beneficial integration into society. Researchers and policymakers are working on guidelines and regulations to address these complex issues and harness AI's potential while mitigating its risks.
"""

# Summarize the text with default lengths using BART
print("--- Summary with facebook/bart-large-cnn (Default Lengths) ---")
summary_bart_default = summarize_text(long_text_example)
print(summary_bart_default)
print("\n" + "="*80 + "\n")

# Summarize the text with custom lengths using BART
print("--- Summary with facebook/bart-large-cnn (Custom Lengths) ---")
summary_bart_custom = summarize_text(long_text_example, max_length=70, min_length=25)
print(summary_bart_custom)
print("\n" + "="*80 + "\n")

# You can try a different model as well (e.g., 'sshleifer/distilbart-cnn-12-6')
# This model is generally faster but might be slightly less accurate than BART-large.
print("--- Summary with sshleifer/distilbart-cnn-12-6 (Custom Lengths) ---")
summary_distilbart_custom = summarize_text(
    long_text_example,
    max_length=60,
    min_length=20,
    model_name="sshleifer/distilbart-cnn-12-6"
)
print(summary_distilbart_custom)
print("\n" + "="*80 + "\n")

# @title 4. (Optional) Try with your own text!
# Replace the placeholder text below with your article, document, or any long text.
my_own_text = """
Paste your very long text here! For example, an article about a new scientific discovery,
a historical event, or a detailed product description. The longer and more complex
the text, the more useful summarization becomes. Remember that very, very long texts
might hit model token limits, but for most articles, this should work well.
"""

if my_own_text.strip() != "Paste your very long text here! For example, an article about a new scientific discovery, a historical event, or a detailed product description. The longer and more complex the text, the more useful summarization becomes. Remember that very, very long texts might hit model token limits, but for most articles, this should work well.":
    print("--- Summary of Your Custom Text ---")
    my_summary = summarize_text(my_own_text, max_length=100, min_length=30)
    print(my_summary)
else:
    print("No custom text provided in 'my_own_text' variable. Skipping custom text summarization.")

--- Summary with facebook/bart-large-cnn (Default Lengths) ---


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

Device set to use cpu


Artificial intelligence (AI) is rapidly transforming various industries, from healthcare to finance and transportation. Concerns about job displacement due to automation, algorithmic bias, and the responsible use of AI in sensitive areas like surveillance and warfare are actively being debated.


--- Summary with facebook/bart-large-cnn (Custom Lengths) ---


Device set to use cpu


Artificial intelligence (AI) is rapidly transforming various industries, from healthcare to finance and transportation. In healthcare, AI-powered systems can analyze vast amounts of patient data to assist with diagnosis, predict disease outbreaks, and even help in drug discovery.


--- Summary with sshleifer/distilbart-cnn-12-6 (Custom Lengths) ---


config.json: 0.00B [00:00, ?B/s]

pytorch_model.bin:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

Device set to use cpu


 In healthcare, AI-powered systems can analyze vast amounts of patient data to assist with diagnosis, predict disease outbreaks, and even help in drug discovery . In the financial sector, AI is used for fraud detection, algorithmic trading, and personalized financial advice . Chatbots and virtual assistants powered


--- Summary of Your Custom Text ---


Device set to use cpu
Your max_length is set to 100, but your input_length is only 75. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=37)


Paste your very long text here! For example, an article about a new scientific discovery. The longer and more complexthe text, the more useful summarization becomes.
