
#### Category 2: Summarization (The "Productivity" Agents) 

#### TL;DR for News Articles

Goal: Paste the URL or the text of a long news article and get a short summary.

Tech: DistilBART (`sshleifer/distilbart-cnn-12-6`) for speed.


NAME: ANANYAA SREE SP

SRN: PES2UG23CS066

SECTION: A

In [29]:
# Import libraries
from transformers import pipeline
import textwrap
import shutil
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

#  Load summarization pipeline 
# Fast DistilBART model for summarization
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

#  Helper functions 
def is_valid_url(url):
    try:
        result = urlparse(url)
        return all([result.scheme, result.netloc])
    except ValueError:  # urlparse raises ValueError on malformed input
        return False

def extract_text_from_url(url, max_chars=5000):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        return "", f"Error fetching URL: {e}"

    soup = BeautifulSoup(response.text, "html.parser")
    paragraphs = soup.find_all("p")
    
    if paragraphs:
        article_text = " ".join(p.get_text() for p in paragraphs)
    else:
        # Fallback: get all text
        article_text = soup.get_text(separator=" ", strip=True)
    
    return article_text[:max_chars], None  # truncate for summarizer

def summarize_article(article_text, max_length=180, min_length=90):
    if not article_text.strip():
        return "No content to summarize."
    
    summary = summarizer(
        article_text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
        truncation=True,  # clip inputs that exceed the model's token limit
    )
    return summary[0]["summary_text"]

def display_summary(summary):
    # Wrap to the terminal width, falling back to 80 columns if it cannot be detected
    terminal_width = shutil.get_terminal_size(fallback=(80, 20)).columns
    print("\nTL;DR Summary:\n")
    print(textwrap.fill(summary, width=terminal_width))

#  Input (text or URL)
user_input = input("Paste article text or URL:\n").strip()

if is_valid_url(user_input):
    print("Fetching article from URL...")
    article_text, error = extract_text_from_url(user_input)
    if error:
        print(error)
else:
    article_text = user_input

#  Generate and display summary
summary = summarize_article(article_text)
display_summary(summary)


Device set to use cpu


Paste article text or URL:
 AI Transforming Healthcare  NEW YORK — Artificial Intelligence (AI) is rapidly transforming the healthcare industry, revolutionizing diagnostics, treatment planning, and patient care. Hospitals and research institutions worldwide are increasingly deploying AI algorithms to detect diseases earlier and  more accurately than traditional methods. For instance, AI models trained on medical imaging data can now identify cancers such as breast and lung cancer with precision rivaling that of expert radiologists.  In addition to diagnostics, AI-powered tools are being used to personalize treatment plans for patients. Algorithms can analyze vast amounts of patient data — from genetic information to medical history — and suggest optimal treatment strategies tailored to each individual.  Pharmaceutical companies are also leveraging AI to accelerate drug discovery, reducing the time and cost required to bring new medicines to market.  Despite its promise, AI in healthcar


TL;DR Summary:

 Artificial Intelligence (AI) is rapidly transforming the healthcare industry, revolutionizing diagnostics, treatment
planning, and patient care . Hospitals and research institutions worldwide are increasingly deploying AI algorithms to
detect diseases earlier and more accurately than traditional methods . Pharmaceutical companies are also leveraging AI
to accelerate drug discovery, reducing the time and cost required to bring new medicines to market . Ethical concerns
regarding patient privacy, data security, and algorithmic bias remain high on the agenda .


Note:
- The summarizer works only for URLs whose article content is freely accessible.
- It may not work properly for pages behind paywalls, login-protected pages, or sites that block scraping.
- In such cases, only the content present in the returned HTML is summarized, which may include ads, subscription prompts, or incomplete text.
- Example of a URL that works: https://www.androidcentral.com/apps-software/google-messages-update-adds-slick-new-link-and-youtube-video-previews
- **Article text that is copied and pasted directly is summarized properly.**
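The limitations listed above can be partly detected before the text reaches the summarizer. The following is a minimal sketch, not part of the notebook: the helper `looks_incomplete`, the `PAYWALL_MARKERS` phrases, and the `min_words` threshold are all hypothetical choices made for illustration, and real sites will need different values.

```python
# Hypothetical helper: the marker phrases and the word threshold are
# illustrative assumptions, not values taken from the notebook.
PAYWALL_MARKERS = (
    "subscribe to continue",
    "sign in to read",
    "log in to continue",
    "already a subscriber",
)

def looks_incomplete(article_text, min_words=150):
    """Return a reason string if the scraped text looks unusable, else None."""
    words = article_text.split()
    if len(words) < min_words:
        return f"Only {len(words)} words extracted; the page may block scraping."
    lowered = article_text.lower()
    for marker in PAYWALL_MARKERS:
        if marker in lowered:
            return f"Found paywall-style prompt: '{marker}'."
    return None
```

In the notebook, this check could run on the result of `extract_text_from_url` so that a warning is printed alongside the summary instead of silently summarizing a subscription prompt.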

Project description

This project builds a TL;DR summarization system for news articles using Natural Language Processing (NLP). Users can either provide the URL of a long news article or paste the article text directly. When a URL is provided, the system fetches the page with `requests`, extracts the text of its paragraph (`<p>`) elements with BeautifulSoup, and truncates it to 5,000 characters. The extracted or pasted text is then passed to a Hugging Face summarization pipeline running `sshleifer/distilbart-cnn-12-6`, a distilled variant of BART chosen for its speed.
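Because `extract_text_from_url` keeps only the first `max_chars` characters, the end of a long article never reaches the model. One possible alternative, sketched here as an assumption rather than as the notebook's method, is to split the text at sentence boundaries, summarize each chunk, and join the results. The names `chunk_text` and `summarize_long` and the 3,000-character default are hypothetical.

```python
import re

def chunk_text(text, max_chars=3000):
    """Split text into chunks of roughly max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

def summarize_long(text, summarize_fn, max_chars=3000):
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    return " ".join(summarize_fn(chunk) for chunk in chunk_text(text, max_chars))
```

Passing the notebook's `summarize_article` as `summarize_fn` would produce one partial summary per chunk. The combined result can be longer than a single-pass summary, so the length limits may need tuning.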
The project demonstrates a practical application of transformer-based models to a real-world problem: information overload in news consumption. By generating concise summaries of lengthy articles, the system helps users grasp the key ideas without reading the entire text. It also highlights practical considerations such as the limits of scraped web content, text preprocessing, and producing readable output for terminal or notebook environments.
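On the point of readable output: the sample summary shown earlier contains a leading space and a space before each full stop. A small post-processing step could tidy this before display. The sketch below is an assumption, not part of the notebook; `tidy_summary` is a hypothetical name, and the rule is deliberately simple, so it would also close up a leading-decimal number such as " .5".

```python
import re

def tidy_summary(summary):
    """Collapse repeated whitespace and remove spaces before punctuation."""
    text = re.sub(r"\s+", " ", summary).strip()
    return re.sub(r"\s+([.,;:!?])", r"\1", text)
```

It could be applied to the return value of `summarize_article` just before `display_summary` wraps the text to the terminal width.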
