<a href="https://colab.research.google.com/github/Ahmed230460/News-Summarization-and-Search-Application/blob/main/full_code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [54]:
!pip install langchain_groq chromadb newsapi-python



In [55]:
!pip install langchain_groq --upgrade



In [12]:
!pip install langchain_groq langchain_community --upgrade

Collecting langchain_community
  Downloading langchain_community-0.3.19-py3-none-any.whl.metadata (2.4 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)
  Downloading pydantic_settings-2.8.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain_community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading mypy_extensions-1.0.0-py3-no

In [5]:
!pip install newsapi-python



In [9]:
!pip install groq chromadb



In [11]:
!pip install sentence-transformers



In [19]:
!pip install langchain requests




In [17]:
!pip install groq langchain_groq chromadb newsapi-python langchain_community sentence-transformers --upgrade

Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch>=1.11.0->sentence-transformers)
 

In [41]:
import os
import groq
import chromadb
from sentence_transformers import SentenceTransformer

# Load API Key
GROQ_API_KEY = os.getenv("GROQ_API_KEY", "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")

class EmbeddingEngine:
    def __init__(self):
        # Initialize Groq client
        self.client = groq.Client(api_key=GROQ_API_KEY)
        # Initialize Sentence-Transformers model for embeddings
        self.embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
        # Initialize ChromaDB
        self.chroma_client = chromadb.PersistentClient(path="./vector_db")
        self.collection = self.chroma_client.get_or_create_collection(name="news_embeddings")

    def generate_summary(self, text):
        """Use Groq LLM to generate a structured summary of the text."""
        try:
            response = self.client.chat.completions.create(
                model="mixtral-8x7b-32768",
                messages=[{"role": "system", "content": "Summarize this text in a structured format."},
                          {"role": "user", "content": text}],
                temperature=0.5
            )
            return response.choices[0].message.content.strip()
        except Exception as e:
            print(f"Error generating summary with Groq: {e}")
            return text  # Fallback to original text

    def generate_embedding(self, text):
        """Generate embeddings from structured text summary."""
        summary = self.generate_summary(text)  # Get structured summary from Groq
        embedding = self.embedding_model.encode(summary).tolist()  # Convert to embedding
        return embedding

    def store_embedding(self, article):
        """Store an article's embedding in ChromaDB."""
        embedding = self.generate_embedding(article["content"])
        if embedding:
            self.collection.add(
                ids=[article["url"]],
                embeddings=[embedding],
                metadatas=[{
                    "title": article["title"],
                    "description": article["description"],
                    "url": article["url"]
                }]
            )
            print(f" Stored: {article['title']}")

    def retrieve_similar(self, query, top_k=3):
        """Retrieve similar articles based on query embedding."""
        query_embedding = self.generate_embedding(query)
        if query_embedding:
            results = self.collection.query(
                query_embeddings=[query_embedding],
                n_results=top_k
            )
            return results.get("metadatas", [[]])[0]
        return []

In [53]:
import requests
import os
class NewsRetriever:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://newsapi.org/v2/everything"

    def fetch_news(self, query, max_results=5):
        """Fetches news articles related to a query from NewsAPI."""
        params = {
            "q": query,
            "apiKey": self.api_key,
            "language": "en",
            "pageSize": max_results
        }
        response = requests.get(self.base_url, params=params)

        if response.status_code == 200:
            articles = response.json().get("articles", [])
            return [
                {
                    "title": article["title"],
                    "description": article["description"],
                    "content": article["content"] or article["description"],
                    "url": article["url"]
                }
                for article in articles
                if article["content"]  # Filter out articles with no content
            ]
        else:
            print(f"Error fetching news: {response.status_code}, {response.text}")
            return []

# Test the module
if __name__ == "__main__":
    retriever = NewsRetriever("134f287e31a54aac9d3de1790b50f2da")
    topic = input("Enter a topic: ")
    articles = retriever.fetch_news(topic)
    print(json.dumps(articles, indent=2))


Enter a topic: AI
[
  {
    "title": "T-Mobile\u2019s parent company is making an \u2018AI Phone\u2019 with Perplexity Assistant",
    "description": "Deutsche Telekom is building a new Perplexity chatbot-powered \u201cAI Phone,\u201d the companies announced at Mobile World Congress (MWC) in Barcelona today. The new device will be revealed later this year and run \u201cMagenta AI,\u201d which gives users access to Perplexity A\u2026",
    "content": "The Magenta AI push will also offer Perplexity and other AI apps for existing smartphones on T-Mobile.\r\nThe Magenta AI push will also offer Perplexity and other AI apps for existing smartphones on T-\u2026 [+1527 chars]",
    "url": "https://www.theverge.com/news/623164/t-mobile-ai-phone-perplexity-assistant-mwc-2025"
  },
  {
    "title": "Anthropic\u2019s plan to win the AI race",
    "description": "Anthropic is one of the world\u00e2\u0080\u0099s leading AI model providers, especially in areas like coding. But its AI assistant, Claud

In [44]:
import json
import os

class UserPreferences:
    def __init__(self, file_path="user_preferences.json", max_history=10):
        self.file_path = file_path
        self.max_history = max_history
        self.data = {"favorite_topics": [], "search_history": []}
        self._load_preferences()

    def _load_preferences(self):
        """Load user preferences from JSON file."""
        if os.path.exists(self.file_path):
            with open(self.file_path, "r") as file:
                self.data = json.load(file)

    def save_preferences(self):
        """Save user preferences to JSON file."""
        with open(self.file_path, "w") as file:
            json.dump(self.data, file, indent=4)

    def add_favorite_topic(self, topic):
        """Add a topic to the favorites list if not already added."""
        if topic not in self.data["favorite_topics"]:
            self.data["favorite_topics"].append(topic)
            self.save_preferences()

    def add_search_history(self, topic, articles):
        """Store search history with summaries (limit to last 10 searches)."""
        new_entry = {
            "topic": topic,
            "articles": articles
        }

        # Remove duplicates before adding new entry
        self.data["search_history"] = [
            entry for entry in self.data["search_history"] if entry["topic"] != topic
        ]

        # Insert new search at the beginning
        self.data["search_history"].insert(0, new_entry)

        # Keep only the last `max_history` searches
        self.data["search_history"] = self.data["search_history"][:self.max_history]

        self.save_preferences()

    def get_favorite_topics(self):
        """Retrieve favorite topics."""
        return self.data["favorite_topics"]

    def get_search_history(self):
        """Retrieve search history."""
        return self.data["search_history"]

    def clear_search_history(self):
        """Clear search history."""
        self.data["search_history"] = []
        self.save_preferences()

In [34]:
import requests
import os

class Summarizer:
    def __init__(self, api_key):
        self.api_key = api_key
        self.api_url = "https://api.groq.com/openai/v1/chat/completions"  # Correct API URL

    def summarize(self, text, summary_type="brief"):
        """Generates a summary using Groq API."""
        prompt = f"Summarize the following article. Make it {summary_type}.\n\n{text}"

        payload = {
            "model": "llama3-8b-8192",  # Use available model
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.3,
            "max_tokens": 150 if summary_type == "brief" else 300,
        }

        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        try:
            response = requests.post(self.api_url, json=payload, headers=headers)
            response_json = response.json()

            # Debugging: Print full response if 'choices' is missing
            if "choices" not in response_json:
                print("API Response Error:", response_json)
                return None

            # Extract response text
            return response_json["choices"][0]["message"]["content"].strip()
        except Exception as e:
            print(f"Error generating summary: {e}")
            return None


In [47]:
if __name__ == "__main__":
    API_KEY = "134f287e31a54aac9d3de1790b50f2da"
    GROQ_API_KEY = "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"

    news_retriever = NewsRetriever(API_KEY)
    summarizer = Summarizer(GROQ_API_KEY)
    engine = EmbeddingEngine()
    user_prefs = UserPreferences()

    while True:
        print("\nMENU")
        print("1 - Search for news")
        print("2 - View search history")
        print("3 - View favorite topics")
        print("4 - Clear search history")
        print("5 - Exit")
        choice = input("Select an option: ").strip()

        if choice == "1":
            topic = input("Enter a topic: ").strip()
            num_articles = input("How many articles do you want to retrieve? (Default 3): ").strip()
            num_articles = int(num_articles) if num_articles.isdigit() else 3

            print("\nFetching news articles... Please wait.")

            articles = news_retriever.fetch_news(topic, max_results=num_articles)
            if not articles:
                print("No articles found for this topic.")
                continue

            stored_articles = []
            print("\nSearch Results:")
            for idx, article in enumerate(articles, 1):
                engine.store_embedding(article)
                brief_summary = summarizer.summarize(article["content"], "brief")
                detailed_summary = summarizer.summarize(article["content"], "detailed")

                print("\n" + "=" * 50)
                print(f"Article {idx}: {article['title']}")
                print("-" * 50)
                print(f"Source: {article.get('source', 'Unknown')}")
                print(f"Published Date: {article.get('published_date', 'Unknown')}")
                print(f"Brief Summary: {brief_summary}")
                print(f"Detailed Summary:\n{detailed_summary[:400]}...")  # Truncate for readability
                print("=" * 50)

                # Store for history
                stored_articles.append({
                    "title": article["title"],
                    "brief_summary": brief_summary,
                    "detailed_summary": detailed_summary
                })

            user_prefs.add_search_history(topic, stored_articles)

            fav = input("\nDo you want to add this topic to favorites? (yes/no): ").strip().lower()
            if fav == "yes":
                user_prefs.add_favorite_topic(topic)

        elif choice == "2":
            history = user_prefs.get_search_history()
            if history:
                print("\nSearch History:")
                for idx, entry in enumerate(history, 1):
                    print("\n" + "=" * 50)
                    print(f"Search {idx}: {entry['topic']}")
                    for i, article in enumerate(entry["articles"], 1):
                        print(f"\nArticle {i}: {article['title']}")
                        print(f"Brief Summary: {article['brief_summary']}")
                        print(f"Detailed Summary: {article['detailed_summary'][:200]}...")  # Truncated
                    print("=" * 50)
            else:
                print("\nNo search history found.")

        elif choice == "3":
            favorites = user_prefs.get_favorite_topics()
            print("\nFavorite Topics:")
            print(favorites if favorites else "No favorite topics yet.")

        elif choice == "4":
            confirm = input("Are you sure you want to clear search history? (yes/no): ").strip().lower()
            if confirm == "yes":
                user_prefs.clear_search_history()
                print("Search history cleared.")

        elif choice == "5":
            print("Exiting...")
            break

        else:
            print("Invalid choice! Please select again.")


MENU
1 - Search for news
2 - View search history
3 - View favorite topics
4 - Clear search history
5 - Exit
Select an option: 4
Are you sure you want to clear search history? (yes/no): yes
Search history cleared.

MENU
1 - Search for news
2 - View search history
3 - View favorite topics
4 - Clear search history
5 - Exit
Select an option: 2

No search history found.

MENU
1 - Search for news
2 - View search history
3 - View favorite topics
4 - Clear search history
5 - Exit
Select an option: 1
Enter a topic: robotics
How many articles do you want to retrieve? (Default 3): 2

Fetching news articles... Please wait.

Search Results:




 Stored: Google’s Gemini Robotics AI Model Reaches Into the Physical World

Article 1: Google’s Gemini Robotics AI Model Reaches Into the Physical World
--------------------------------------------------
Source: Unknown
Published Date: Unknown
Brief Summary: Google is collaborating with robotics companies Agility Robotics, Boston Dynamics, and Enchanted Tools to develop legged robots.
Detailed Summary:
The article reports that Google is currently collaborating with several robotics companies to develop advanced robotic technology. The companies mentioned include Agility Robotics, Boston Dynamics, and Enchanted Tools.

Agility Robotics is a company that specializes in designing and building legged robots, which are robots that use legs to move around and navigate their environment. Boston Dynamic...




 Stored: Google DeepMind’s new AI models help robots perform physical tasks, even without training

Article 2: Google DeepMind’s new AI models help robots perform physical tasks, even without training
--------------------------------------------------
Source: Unknown
Published Date: Unknown
Brief Summary: Gemini Robotics enhances robots' dexterity, enabling them to perform precise tasks such as folding paper.
Detailed Summary:
The article discusses Gemini Robotics, a company that specializes in developing robots that are more dexterous and capable of performing precise tasks. One example of this is the ability to fold a piece of paper, which requires a high level of precision and dexterity.

Gemini Robotics' technology allows robots to perform tasks that were previously thought to be the exclusive domain of humans, such...

Do you want to add this topic to favorites? (yes/no): yes

MENU
1 - Search for news
2 - View search history
3 - View favorite topics
4 - Clear search history
5 - Ex



 Stored: T-Mobile’s parent company is making an ‘AI Phone’ with Perplexity Assistant

Article 1: T-Mobile’s parent company is making an ‘AI Phone’ with Perplexity Assistant
--------------------------------------------------
Source: Unknown
Published Date: Unknown
Brief Summary: T-Mobile will offer Magenta AI, Perplexity, and other AI apps for existing smartphones, as part of its push to integrate AI technology into its services.
Detailed Summary:
The article does not provide a lot of information, but it can be summarized as follows:

T-Mobile is planning to launch a new initiative called Magenta AI, which will offer AI-powered features and apps to its customers. As part of this initiative, T-Mobile will provide Perplexity and other AI apps to existing smartphone users on its network. This means that customers who already have a smartphone ...

Do you want to add this topic to favorites? (yes/no): yes

MENU
1 - Search for news
2 - View search history
3 - View favorite topics
4 - Clear s

In [None]:
4