<a href="https://colab.research.google.com/github/camillan/llm-learning/blob/main/summarization_of_microplastics_articles.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [7]:
urls = [
    "https://www.eatingwell.com/how-to-limit-microplastics-in-your-food-11713723",
    "https://www.weforum.org/stories/2025/04/impact-microplastics-environment-health/",
    "https://marinedebris.noaa.gov/what-marine-debris/microplastics",
    "https://en.wikipedia.org/wiki/Microplastics",
    "https://www.ucsf.edu/news/2024/02/427161/how-to-limit-microplastics-dangers"
]


In [8]:
import requests
from bs4 import BeautifulSoup

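def get_article_text(url):
    """Fetch a page and return the text of all its <p> tags as one string.

    Returns "" if the request or parsing raises. HTTP error statuses are not
    treated as failures, so an error page's text is returned like any other.
    """
    try:
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.content, "html.parser")
        # Combine all paragraph tags into one string
        paragraphs = [p.get_text() for p in soup.find_all("p")]
        return " ".join(paragraphs)
    except Exception as e:
        print(f"❌ Error fetching {url}: {e}")
        return ""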
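
`get_article_text` returns whatever `<p>` text the response contains, so a block page or error page served in place of the article would reach the summarizer as if it were the article. Below is a minimal sketch of a guard against that. The helper name `looks_like_article` and the 40-word threshold are assumptions made for illustration; neither is part of the original notebook, and the threshold would need tuning against real pages.

```python
def looks_like_article(text, min_words=40):
    """Heuristic: treat very short extractions as 'not an article'.

    min_words is an arbitrary, hypothetical threshold. Error and block pages
    tend to contain only a line or two of paragraph text, while articles
    contain many sentences.
    """
    return len(text.split()) >= min_words


# A short error-style string is rejected; a long run of prose is accepted.
print(looks_like_article("Reference #18.984ddb17"))
print(looks_like_article("Microplastics are tiny plastic particles. " * 20))
```

In the summarization loop, a check like this could stand in for the bare `if article_text:` test, so that pages failing it are reported rather than summarized.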


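The summarization cell below keeps only the first 4000 characters of each article, so anything past that point never reaches the model. One possible alternative is to split the text into several pieces that each fit the budget and summarize them one by one. The sketch below shows only the splitting step. The helper name `chunk_text` is hypothetical, and the sentence splitting is a simple regular expression, not a full sentence tokenizer.

```python
import re


def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars characters.

    Breaks at sentence ends where possible. A single sentence longer than
    max_chars is cut at the character limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", max_chars=10))
```

Each chunk could then be passed to the summarizer in turn and the partial summaries joined. Whether that gives better results than a single truncated input is not something the notebook tests.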
In [9]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

for url in urls:
    print(f"\n🔗 {url}")
    article_text = get_article_text(url)

    if article_text:
        # BART accepts at most 1024 tokens. 4000 characters is only a rough
        # proxy for that limit, so truncation=True also lets the tokenizer
        # cut any input that still runs over.
        article_text = article_text[:4000]
        summary = summarizer(
            article_text, max_length=100, min_length=30, do_sample=False, truncation=True
        )[0]["summary_text"]
        print("📝 Summary:")
        print(summary)
    else:
        print("No article text found.")


Device set to use cuda:0

🔗 https://www.eatingwell.com/how-to-limit-microplastics-in-your-food-11713723

Your max_length is set to 100, but your input_length is only 87. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=43)

📝 Summary:
Microplastics are tiny plastic particles less than 5 millimeters in diameter. The majority of plastic waste is left to accumulate in landfills and the environment. There’s not enough evidence to directly link microplastics to a disease. Follow these steps to reduce your exposure.

🔗 https://www.weforum.org/stories/2025/04/impact-microplastics-environment-health/
📝 Summary:
Reference #18.984ddb17.1744501627.c3c5962b.https://errors.edgesuite.net/18.8.4.1.1/errors-errors-error-reporting.html?referer=http://www.eBay.com/search?q=eBay%20Search%20Ebay%20Results%20Home%20and%20Google%20Street%20

🔗 https://marinedebris.noaa.gov/what-marine-debris/microplastics
📝 Summary:
Microplastics are small plastic pieces or fibers that are smaller than 5 mm in size. Because they are so small, wildlife can mistake microplastics for food. Microplastics can attract and carry pollutants that are in the water. They can also release the chemicals that are added to plastics.

🔗 https://en.wikipedia.o