In [1]:
import json

with open("database_100.json", 'r') as f:
    data = json.load(f)

for i, item in enumerate(data):
    print(f'[{i}] - {item["title"]}')

[0] - There's a shortage of truckers, but TuSimple thinks it has a solution: no driver needed - CNN
[1] - Bioservo's robotic 'Ironhand' could protect factory workers from injuries - CNN
[2] - This swarm of robots gets smarter the more it works - CNN
[3] - Two years later, remote work has changed millions of careers - CNN
[4] - Burger King partner 'refuses' to close 800 Russian locations - CNN
[5] - White House 'appalled' at Axios over Ukraine article - CNN
[6] - How Kohl's became such a mess - CNN
[7] - Budweiser's slogan wasn't always the 'King of Beers' - CNN
[8] - India's young investors prefer crypto to gold and 'boring' stocks - CNN
[9] - Adar Poonawalla: He vaccinates half the world's babies. Ending the pandemic proved much harder - CNN
[10] - Gravity could solve renewable energy's biggest problem - CNN
[11] - This Indian dairy-tech startup has created a step counter for cows  - CNN
[12] - How an Indian company is transforming palm leaves into tableware - CNN
[13] - A robot is ki

In [3]:
!pip install transformers pandas nltk numpy

!pip install torch torchvision torchaudio



In [2]:
import pandas as pd
import nltk
from transformers import pipeline
import torch
import time

nltk.download('punkt')

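# No-op as written: this re-applies the current thread count.
# Pass an explicit integer to actually change the number of CPU threads.
torch.set_num_threads(torch.get_num_threads())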

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [19]:
from transformers import AutoTokenizer

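def llm_summarize(text: str, model_name: str = "google/flan-t5-base", max_length: int = 500, min_length: int = 50) -> str:
    """Summarize `text` with a Hugging Face summarization pipeline; return an empty string on failure."""
    try:
        tokenizer = AutoTokenizer.from_pretrained(model_name)

        # Note: the pipeline is rebuilt on every call, so callers that time this
        # function also measure model loading.
        summarizer = pipeline(
            "summarization",
            model=model_name,
            tokenizer=tokenizer,
            num_beams=4,
            do_sample=True,  # sampling makes the output vary between runs
            device=0 if torch.cuda.is_available() else -1,  # fall back to CPU when no GPU is present
        )

        # The instruction is prepended to the article and counts toward the model's
        # input limit; truncation=True cuts off whatever does not fit.
        prompt = f"Summarize the provided article content, focusing on capturing the key events, central themes, or critical insights that define the article’s narrative. Ensure the summary is vivid, engaging, and informative, highlighting the most impactful moments or ideas that would resonate with a reader.\n{text}"
        # Generation length limits are set per call.
        summary = summarizer(prompt, truncation=True, max_length=max_length, min_length=min_length)
    except Exception as e:
        print(f"Error in llm_summarize: {e}")
        return ""

    return summary[0]["summary_text"]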

In [20]:
def summarize_text(text: str, model_name: str = "google/flan-t5-base", max_length: int = 500, min_length: int = 50) -> str:
    return llm_summarize(text, model_name=model_name, max_length=max_length, min_length=min_length)
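
llm_summarize builds a new pipeline on every call, which is why "Device set to use cuda:0" appears before each summary in the output further down. One way to avoid reloading is to memoize the loader by model name. The following is a minimal sketch of that idea, not the notebook's actual code: `make_cached_loader`, `fake_load` and `get_summarizer` are hypothetical names, and `fake_load` stands in for a function that would call `pipeline("summarization", model=model_name, ...)`.

```python
from functools import lru_cache
from typing import Callable, TypeVar

T = TypeVar("T")


def make_cached_loader(load_fn: Callable[[str], T]) -> Callable[[str], T]:
    """Wrap load_fn so each model name is loaded at most once per process."""

    @lru_cache(maxsize=None)
    def cached(model_name: str) -> T:
        return load_fn(model_name)

    return cached


# Hypothetical stand-in for an expensive loader; it records every real load.
load_calls = []


def fake_load(model_name: str) -> dict:
    load_calls.append(model_name)
    return {"model": model_name}


get_summarizer = make_cached_loader(fake_load)

first = get_summarizer("Falconsai/text_summarization")
second = get_summarizer("Falconsai/text_summarization")  # served from the cache
get_summarizer("google/pegasus-xsum")

print(len(load_calls))   # 2
print(first is second)   # True
```

With a cache like this, per-article timings would reflect generation only, since each model would be loaded on its first use and reused afterwards.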

In [None]:
# Summarize the sample text using different models
model_names = [
    "Falconsai/text_summarization",
    "google/pegasus-xsum"
]

output = {
    "models": model_names,
    "content": []
}

for item in data[:50]:
    article_output = {
        "title": item["title"],
        "content": item["content"],
    }

    for model in model_names:
        print(f"Summarizing for article: {item['title']}")
        print(f"\tSummarize using {model}:")

        start_time = time.perf_counter()  # monotonic clock, suited to measuring durations
        summary = summarize_text(item["content"], model_name=model)
        end_time = time.perf_counter()
        elapsed_time = end_time - start_time

        print(summary)
        print(f"Time taken for {model}: {elapsed_time:.2f} seconds\n")

        article_output[model] = {
            "content": summary,
            "time": f"{elapsed_time:.2f}"
        }

    output["content"].append(article_output)

with open("summarized_articles.json", "w") as f:
    json.dump(output, f, indent=2)
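
The loop stores each model's elapsed time as a formatted string under `article_output[model]["time"]`. As a rough sketch of how those values could be compared afterwards, the helper below averages them per model. `mean_time_per_model` is a hypothetical name, and the sample dictionary only mirrors the structure written to summarized_articles.json; its titles and timings are taken from the first two articles in the output, while the content and summary strings are placeholders.

```python
from statistics import mean


def mean_time_per_model(output: dict) -> dict:
    """Average the recorded "time" strings for each model listed in output["models"]."""
    times = {name: [] for name in output["models"]}
    for article in output["content"]:
        for name in output["models"]:
            if name in article:
                times[name].append(float(article[name]["time"]))
    # Skip models that have no recorded timings.
    return {name: mean(values) for name, values in times.items() if values}


# Sample data in the same shape as the `output` dict built by the loop.
sample_output = {
    "models": ["Falconsai/text_summarization", "google/pegasus-xsum"],
    "content": [
        {
            "title": "There's a shortage of truckers, but TuSimple thinks it has a solution: no driver needed - CNN",
            "content": "(article text)",
            "Falconsai/text_summarization": {"content": "(summary)", "time": "1.73"},
            "google/pegasus-xsum": {"content": "(summary)", "time": "7.05"},
        },
        {
            "title": "Bioservo's robotic 'Ironhand' could protect factory workers from injuries - CNN",
            "content": "(article text)",
            "Falconsai/text_summarization": {"content": "(summary)", "time": "1.91"},
            "google/pegasus-xsum": {"content": "(summary)", "time": "6.19"},
        },
    ],
}

averages = mean_time_per_model(sample_output)
for name, seconds in averages.items():
    print(f"{name}: {seconds:.2f} s")
```

Because the recorded times include model loading on every call, averages computed this way describe load plus generation, not generation alone.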

Summarizing for article: There's a shortage of truckers, but TuSimple thinks it has a solution: no driver needed - CNN
	Summarize using Falconsai/text_summarization:


Device set to use cuda:0


Yara Birkeland is what its builders call the world's first zero-emission, autonomous cargo ship . The ship is scheduled to make its first journey between two Norwegian towns before the end of the year . In China, a new Maglev high-speed train rolls off the production line in Qingdao, east China's Shandong Province, on July 20 .
Time taken for Falconsai/text_summarization: 1.73 seconds

Summarizing for article: There's a shortage of truckers, but TuSimple thinks it has a solution: no driver needed - CNN
	Summarize using google/pegasus-xsum:


Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-xsum and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0


In Case You Missed It: A round-up of interesting transport-related links shared over the past seven days, as compiled by the BBC's transport team and shared with you by our writers and editors., below are some of the latest transport-related links shared over the past seven days, as compiled by the BBC's transport team and shared with you by our writers and editors.
Time taken for google/pegasus-xsum: 7.05 seconds

Summarizing for article: Bioservo's robotic 'Ironhand' could protect factory workers from injuries - CNN
	Summarize using Falconsai/text_summarization:


Device set to use cuda:0


The "Ironhand" glove strengthens the wearer's grip, meaning they don't have to use as much force to perform repetitive tasks . The Swedish company describes the system as a "soft exoskeleton" The robots running our warehouses are an increasingly familiar presence in warehouses . At the south-east London warehouse run by British online supermarket Ocado, 3,000 robots fulfill shopping orders.
Time taken for Falconsai/text_summarization: 1.91 seconds

Summarizing for article: Bioservo's robotic 'Ironhand' could protect factory workers from injuries - CNN
	Summarize using google/pegasus-xsum:


Device set to use cuda:0


A battery-powered glove, robots that spring to life in warehouses, and a tentacle-like gripper are among the latest stories featured in the latest issue of the BBC News website. The latest issue of the BBC News website features some of the most striking stories from the past week.
Time taken for google/pegasus-xsum: 6.19 seconds

Summarizing for article: This swarm of robots gets smarter the more it works - CNN
	Summarize using Falconsai/text_summarization:


Device set to use cuda:0


In a Hong Kong warehouse, a swarm of autonomous robots works 24/7. They're not just working hard, they're working smart; as they operate, they get better at their job. At the south-east London warehouse run by British online supermarket Ocado, 3,000 robots fulfill shopping orders . Scroll through to see more robots that are revolutionizing warehouses.
Time taken for Falconsai/text_summarization: 1.66 seconds

Summarizing for article: This swarm of robots gets smarter the more it works - CNN
	Summarize using google/pegasus-xsum:


Device set to use cuda:0


If you would like to be considered for this position, please send us your resume and a brief summary of why you would like to be considered for this role to careers@bbc.co.uk or write to Careers, BBC World Service, 4 Cowdenbeath Avenue, London, EC1A 2BN.
Time taken for google/pegasus-xsum: 6.65 seconds

Summarizing for article: Two years later, remote work has changed millions of careers - CNN
	Summarize using Falconsai/text_summarization:


Device set to use cuda:0


Chelsea Pruitt, 31, moved from California to Alabama . At the time, she started working remotely . She started working for long-term housing rental company Zeus Living in January 2020 . Now, she's headed to Alabama to start a new life .
Time taken for Falconsai/text_summarization: 1.26 seconds

Summarizing for article: Two years later, remote work has changed millions of careers - CNN
	Summarize using google/pegasus-xsum:


Device set to use cuda:0


The World Health Organization (WHO) is asking the public to share their stories of how the H1N1 pandemic affected their lives and their work, using the hashtag #H1N1Stories on Twitter or using the hashtag #H1N1Summary on Instagram.
Time taken for google/pegasus-xsum: 5.62 seconds

Summarizing for article: Burger King partner 'refuses' to close 800 Russian locations - CNN
	Summarize using Falconsai/text_summarization:


Device set to use cuda:0


Burger King is trying to suspend its operations in Russia, but that's proving difficult . A business partner controlling 800 restaurants has "refused" to close them, the company says . Burger King has a joint venture partnership with businessman Alexander Kolobov in Russia .
Time taken for Falconsai/text_summarization: 1.33 seconds

Summarizing for article: Burger King partner 'refuses' to close 800 Russian locations - CNN
	Summarize using google/pegasus-xsum:


Device set to use cuda:0


Sharpen your writing skills with this short online writing course from the University of Wisconsin-Madison's School of Journalism and Mass Communication., which will teach you how to deliver a compelling and effective summary of a news story to a group of readers.
Time taken for google/pegasus-xsum: 6.88 seconds

Summarizing for article: White House 'appalled' at Axios over Ukraine article - CNN
	Summarize using Falconsai/text_summarization:


Device set to use cuda:0


Axios report based on fabricated letter purportedly written by Ukraine's top national security official . Ukraine ambassador Oksana Markarova said she believed the letter was "falsified" White House and CIA officials said they had no record of receiving the letter from Oleksiy Danilov .
Time taken for Falconsai/text_summarization: 1.49 seconds

Summarizing for article: White House 'appalled' at Axios over Ukraine article - CNN
	Summarize using google/pegasus-xsum:


Device set to use cuda:0


If you would like to receive theReliable Sources newsletter, you can sign up for a free trial right here.... Read full story at theReliableSources.tumblr.com or follow us on Facebook and Twitter.. Read full story at theReliableSources.tumblr.com or follow us on Facebook and Twitter..
Time taken for google/pegasus-xsum: 6.42 seconds

Summarizing for article: How Kohl's became such a mess - CNN
	Summarize using Falconsai/text_summarization:


Device set to use cuda:0
