In [2]:
import os
import json
import requests
from sentence_transformers import SentenceTransformer
from transformers import pipeline


In [3]:
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu


In [5]:

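# Read the NewsAPI key from the environment instead of hard-coding it in the notebook.
# NEWSAPI_KEY is an assumed variable name; set it before running this cell.
API_KEY = os.environ.get("NEWSAPI_KEY", "")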
PREFERENCES_FILE = "preferences.json"


In [6]:
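def fetch_news(topic):
    """Return the NewsAPI articles for `topic`, or [] if the response has none."""
    url = "https://newsapi.org/v2/everything"
    # Pass the query through `params` so requests URL-encodes topics with spaces or symbols.
    response = requests.get(url, params={"q": topic, "apiKey": API_KEY}, timeout=10)
    return response.json().get("articles", [])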

def get_embedding(text):
    return embedder.encode(text)


def summarize_text(text, summary_type="brief"):
    max_length = 50 if summary_type == "brief" else 150
    summary = summarizer(text, max_length=max_length, min_length=25, do_sample=False)
    return summary[0]["summary_text"]

def load_preferences():
    if os.path.exists(PREFERENCES_FILE):
        with open(PREFERENCES_FILE, "r") as file:
            return json.load(file)
    return {"saved_topics": [], "search_history": []}

def save_preferences(preferences):
    with open(PREFERENCES_FILE, "w") as file:
        json.dump(preferences, file, indent=4)

def save_topic(topic):
    preferences = load_preferences()
    if topic not in preferences["saved_topics"]:
        preferences["saved_topics"].append(topic)
        save_preferences(preferences)
        print(f"Saved topic: {topic}")
    else:
        print("Topic already saved.")

def view_saved_topics():
    preferences = load_preferences()
    print("Saved Topics:", preferences["saved_topics"])

def view_search_history():
    preferences = load_preferences()
    print("Search History:", preferences["search_history"])

def search_news(topic):
    preferences = load_preferences()
    preferences["search_history"].append(topic)
    save_preferences(preferences)

    articles = fetch_news(topic)
    if not articles:
        print("No articles found.")
        return

    print(f"Found {len(articles)} articles on '{topic}'\n")

    for idx, article in enumerate(articles[:5]):
        # Either field may be missing or null; skip the summaries when there is no text.
        text = article.get("content") or article.get("description")
        print(f"{idx+1}. {article['title']}")
        if text:
            print(f"   Brief: {summarize_text(text, 'brief')}")
            print(f"   Detailed: {summarize_text(text, 'detailed')}")
        print("-" * 80)
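
The notebook loads `embedder` and defines `get_embedding`, but nothing calls them. One possible use, offered as an assumption and not as the author's design, is to rank fetched articles by how close their titles are to the topic. The names `cosine_similarity` and `rank_articles`, and the `embed` parameter, are hypothetical and do not appear in the original code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences (0.0 if either is all zeros)."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_articles(topic, articles, embed):
    """Return `articles` sorted by title similarity to `topic`, most similar first.

    `embed` is any callable mapping a string to a vector; in this notebook
    that could be `get_embedding` (assumption).
    """
    topic_vec = embed(topic)
    scored = [
        (cosine_similarity(topic_vec, embed(article.get("title") or "")), article)
        for article in articles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for _, article in scored]
```

Inside `search_news`, this could be called as `rank_articles(topic, articles, get_embedding)[:5]` before the summarization loop.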

In [7]:
def main():
    while True:
        print("\n1. Search News")
        print("2. Save Topic")
        print("3. View Saved Topics")
        print("4. View Search History")
        print("5. Exit")
        choice = input("Choose an option: ")

        if choice == "1":
            topic = input("Enter topic: ")
            search_news(topic)
        elif choice == "2":
            topic = input("Enter topic to save: ")
            save_topic(topic)
        elif choice == "3":
            view_saved_topics()
        elif choice == "4":
            view_search_history()
        elif choice == "5":
            break
        else:
            print("Invalid choice. Try again.")

if __name__ == "__main__":
    main()



1. Search News
2. Save Topic
3. View Saved Topics
4. View Search History
5. Exit
Choose an option: 1
Enter topic: sport
Found 99 articles on 'sport'

1. The Corvette ZR1’s 233-mph run had to start in a virtual world


Your max_length is set to 150, but your input_length is only 65. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=32)


   Brief: The Corvette ZR1s 233-mph run had to start in a virtual world. It turns out it takes a little Fortran to make a faster Corvette.
   Detailed: The Corvette ZR1s 233-mph run had to start in a virtual world. It turns out it takes a little Fortran to make a faster Corvette.
--------------------------------------------------------------------------------
2. Lexus’ sporty RZ is the latest EV with fake engine noises to simulate gear-shifting


Your max_length is set to 150, but your input_length is only 58. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=29)


   Brief: Lexus is updating its RZ electric vehicle lineup. New versions include a simulated manual gearbox and a steer-by-wire system.
   Detailed: Lexus is updating its RZ electric vehicle lineup. New versions include a simulated manual gearbox and a steer-by-wire system.
--------------------------------------------------------------------------------
3. Britain set to host men's Tour de France Grand Depart in 2027


Your max_length is set to 150, but your input_length is only 54. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=27)


   Brief: The Tour de France is set to return to Great Britain in 2027. Britain last hosted the start of the world's most famous cycling race in Yorkshire in 2…


Your max_length is set to 50, but your input_length is only 23. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=11)


   Detailed: The Tour de France is set to return to Great Britain in 2027. Britain last hosted the start of the world's most famous cycling race in Yorkshire in 2…
--------------------------------------------------------------------------------
4. Geraint Thomas: Over the Finish Line


Your max_length is set to 150, but your input_length is only 23. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=11)


   Brief: Geraint Thomas talks about his decision to retire from cycling. The Welshman has won three Tour de France titles and a World Cup. Thomas also won the Tour in 2012 and 2013.
   Detailed: Geraint Thomas talks about his decision to retire from cycling. The Welshman has won three Tour de France titles and a World Cup. Thomas also won the Tour in 2012 and 2013.
--------------------------------------------------------------------------------
5. 'More a procession than a title race' - when could Liverpool seal title?


Your max_length is set to 150, but your input_length is only 52. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=26)


   Brief: Liverpool have increased their lead to 13 points. No team in English top-flight history, going all the way back to 1888-89, have been…
   Detailed: Liverpool have increased their lead to 13 points. No team in English top-flight history, going all the way back to 1888-89, have been…
--------------------------------------------------------------------------------

1. Search News
2. Save Topic
3. View Saved Topics
4. View Search History
5. Exit
Choose an option: 4
Search History: ['sport']

1. Search News
2. Save Topic
3. View Saved Topics
4. View Search History
5. Exit
Choose an option: 2
Enter topic to save: save file
Saved topic: save file

1. Search News
2. Save Topic
3. View Saved Topics
4. View Search History
5. Exit
Choose an option: 3
Saved Topics: ['save file']

1. Search News
2. Save Topic
3. View Saved Topics
4. View Search History
5. Exit
Choose an option: 5
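
The run above logs a `max_length` warning for every article because the text being summarized is shorter than the requested summary length, and the brief and detailed summaries usually come out identical for the same reason. A minimal sketch of one way to handle this, assuming that capping `max_length` at the input's token count is acceptable: `clamp_lengths` and `summarize_capped` are hypothetical helpers, and the summarizer is passed in as an argument so the block stands alone.

```python
def clamp_lengths(input_length, max_length, min_length=25):
    """Cap max_length at the input token count and keep min_length below it."""
    max_len = max(2, min(max_length, input_length))
    min_len = min(min_length, max_len - 1)
    return max_len, min_len

def summarize_capped(summarizer, text, summary_type="brief"):
    """Like summarize_text, but never asks for a summary longer than the input."""
    requested = 50 if summary_type == "brief" else 150
    # Count input tokens with the pipeline's own tokenizer.
    input_length = len(summarizer.tokenizer(text)["input_ids"])
    max_len, min_len = clamp_lengths(input_length, requested)
    result = summarizer(text, max_length=max_len, min_length=min_len, do_sample=False)
    return result[0]["summary_text"]
```

With the notebook's pipeline this would be called as `summarize_capped(summarizer, text, "detailed")`. It should quiet the warning, but it cannot make the detailed summary longer than the source text allows.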
