# News Retrieval RAG Agents
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MervinPraison/PraisonAI/blob/main/cookbooks/notebooks/news_retreival_rag_agents.ipynb)

## Dependencies

# Install dependencies without output
%pip install langchain_community > /dev/null
%pip install praisonai[crewai] > /dev/null
%pip install duckduckgo_search > /dev/null
%pip install requests > /dev/null

## Tools

from google.colab import userdata

import requests
import os
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from praisonai_tools import BaseTool
from duckduckgo_search import DDGS


class InternetSearchTool(BaseTool):
    name: str = "InternetSearchTool"
    description: str = "Search Internet for relevant information based on a query or latest news"

    def _run(self, query: str):
        ddgs = DDGS()
        results = ddgs.text(keywords=query, region='wt-wt', safesearch='moderate', max_results=5)
        return results


class NewsDBTool(BaseTool):
    name: str = "NewsDBTool"
    description: str = "Fetches and stores news articles based on a specified query in a local database"

    def _run(self, query: str):
        NEWSAPI_KEY = userdata.get('NEWSAPI_KEY') or "ENTER NEWSAPI_KEY HERE"
        base_url = "https://newsapi.org/v2/everything"

        params = {
            'q': query,
            'sortBy': 'publishedAt',
            'apiKey': NEWSAPI_KEY,
            'language': 'en',
            'pageSize': 5,
        }

        # A timeout keeps the tool from hanging forever if NewsAPI does not respond
        response = requests.get(base_url, params=params, timeout=30)
        if response.status_code != 200:
            return "Failed to retrieve news."

        articles = response.json().get('articles', [])
        if not articles:
            return "No articles found."

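        all_splits = []
        # Build the splitter once and reuse it for every article
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

        for article in articles:
            # Download the full article page and split it into overlapping chunks
            loader = WebBaseLoader(article['url'])
            docs = loader.load()
            splits = text_splitter.split_documents(docs)
            all_splits.extend(splits)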

        if all_splits:
            embedding_function = OpenAIEmbeddings()
            vectorstore = Chroma.from_documents(all_splits, embedding=embedding_function, persist_directory="./chroma_db")
            vectorstore.persist()
            return "News articles processed and stored in the database."
        else:
            return "No content available for processing."



class RetrieveNewsTool(BaseTool):
    name: str = "RetrieveNewsTool"
    description: str = "Retrieves news articles from the local database using a search query"

    def _run(self, query: str):
        embedding_function = OpenAIEmbeddings()
        vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embedding_function)

        # similarity_search returns a list of matching documents, not a retriever object
        results = vectorstore.similarity_search(query)

        if results:
            return [{"content": doc.page_content, "metadata": doc.metadata} for doc in results]
        else:
            return "No relevant articles found in the database."
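
To make the store-then-retrieve flow of `NewsDBTool` and `RetrieveNewsTool` concrete, here is a minimal, dependency-free sketch. It is not how LangChain or Chroma work internally: `chunk_text` uses plain fixed-size character windows (the real `RecursiveCharacterTextSplitter` also prefers to break on separators such as paragraphs), and `embed` is a toy bag-of-words stand-in for `OpenAIEmbeddings`. All function names below are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Simplification: no attempt is made to break on paragraph or
    sentence boundaries, unlike RecursiveCharacterTextSplitter.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous chunk
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def embed(text):
    """Toy embedding (assumption): lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(chunks, query, k=4):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]


# "Store" step: chunk an article; "retrieve" step: rank chunks for a query
article = "Climate change is intensifying hurricanes. " * 60
chunks = chunk_text(article, chunk_size=1000, chunk_overlap=200)
top = similarity_search(chunks, "climate change hurricanes", k=2)
print(len(chunks), "chunks stored;", len(top), "returned")
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still seen whole by at least one chunk.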


## YAML Prompt

agent_yaml = """
framework: "crewai"
topic: "News Analysis and Reporting"
roles:
  news_search_agent:
    role: "News Researcher"
    backstory: |
      You are an experienced News Researcher, skilled in finding and summarizing key points from the latest news articles.
    goal: |
      Conduct in-depth searches on the latest news, summarize key points for each article, and make the information available for further review.
    verbose: true
    allow_delegation: true
    tools:
      - "NewsDBTool"
      - "RetrieveNewsTool"
    tasks:
      search_and_summarize_news_task:
        description: |
          Search for the latest news articles based on a specified topic, process them, and generate key points for each article.
        expected_output: |
          A list of key points from recent articles related to the specified topic, ready for further summarization.
  writer_agent:
    role: "News Content Writer"
    backstory: |
      As a News Content Writer, you specialize in transforming raw news data into detailed summaries, adding depth and clarity.
    goal: |
      Review the collected news data, verify its relevance, search for additional context, and create comprehensive summaries for each topic.
    verbose: true
    allow_delegation: true
    tools:
      - "RetrieveNewsTool"
      - "InternetSearchTool"
    tasks:
      analyze_and_write_news_task:
        description: |
          Identify each topic, verify relevance using the retrieved data, then conduct additional web searches to add depth. Summarize each topic in detail.
        expected_output: |
          An in-depth summary of each identified news topic, incorporating all relevant information retrieved from the database and online searches.
dependencies: []
"""
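
The tool names listed under `tools:` in the YAML match the class names defined in the Tools cell. A quick pre-flight check can catch a misspelled tool name before the crew runs. The sketch below assumes the YAML has already been parsed into a plain dict (for example with `yaml.safe_load`); `find_unregistered_tools` is a hypothetical helper, not part of PraisonAI, and the dict only mirrors the `roles` and `tools` keys shown above.

```python
def find_unregistered_tools(config, registered):
    """Return {role_key: [tool names not in `registered`]} for roles with problems."""
    problems = {}
    for role_key, role in config.get("roles", {}).items():
        missing = [t for t in role.get("tools", []) if t not in registered]
        if missing:
            problems[role_key] = missing
    return problems


# Dict mirroring the relevant part of agent_yaml (assumed already parsed)
config = {
    "roles": {
        "news_search_agent": {"tools": ["NewsDBTool", "RetrieveNewsTool"]},
        "writer_agent": {"tools": ["RetrieveNewsTool", "InternetSearchTool"]},
    }
}

# Names of the tool classes that will be passed to PraisonAI
registered = {"NewsDBTool", "RetrieveNewsTool", "InternetSearchTool"}

problems = find_unregistered_tools(config, registered)
if problems:
    print("Unregistered tools:", problems)
else:
    print("All tools referenced in the YAML are registered.")
```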

## Main

import os
from praisonai import PraisonAI
from google.colab import userdata

# Create a PraisonAI instance with the agent_yaml content
praisonai = PraisonAI(agent_yaml=agent_yaml, tools=[NewsDBTool, RetrieveNewsTool, InternetSearchTool])

# Add OPENAI_API_KEY Secrets to Google Colab on the Left Hand Side 🔑 or Enter Manually Below
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY') or "ENTER OPENAI_API_KEY HERE"
os.environ["OPENAI_MODEL_NAME"] = "gpt-5-nano"

# Run PraisonAI
result = praisonai.run()

# Print the result
print(result) # 10/10


[2024-11-02 17:51:51][DEBUG]: == Working Agent: News Researcher
[2024-11-02 17:51:51][INFO]: == Starting Task: Search for the latest news articles based on a specified topic, process them, and generate key points for each article.


> Entering new CrewAgentExecutor chain...
I need to conduct a search for the latest news articles on a specified topic to gather key points for further summarization. Since the topic is not specified yet, I will first define a general topic to search for.

For this exercise, I'll choose the topic "climate change" as it is currently a relevant and pressing issue.

Action: NewsDBTool
Action Input: {"query": "climate change"}

Returning anything other than `self` from a top level model validator isn't supported when validating via `__init__`.
See the `model_validator` docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators) for more details.
  warn_deprecated(


News articles processed and stored in the database.

Thought: I have successfully stored news articles related to "climate change" in the database. Now, I will retrieve these articles to summarize the key points from each one.

Action: RetrieveNewsTool
Action Input: {"query": "climate change"}

[{'content': 'I’m hearing so many stories like that from the survivors of the climate-intensified hurricanes that trashed western North Carolina, coastal Florida, and other parts of the Southeastern USA. From the victims of a climate-intensified catastrophe that has wrecked whole towns and torn out roads, flattened forests, washed away homes and put parts of Asheville underwater. I don’t want any more disasters like that, and I’m a climate activist to try to keep nature from getting more violent and destructive, which it will if we keep being violent and destructive toward the climate. But I do want us to know who we are, and how hungry we are for meaning, purpose, and connection, and sometimes disaster lets us see that.', 'metadata': {'language': 'en-US', 'source': 'https://lithub.com/rebecca-solnit-we-the-people-can-make-a-better-future-for-all/', 'title': 'Rebecca Solnit: We, the People, Can Make a Better Future ‹ Literary  Hub'}}, {'content': 'When it comes to the climate w

[{'content': 'I’m hearing so many stories like that from the survivors of the climate-intensified hurricanes that trashed western North Carolina, coastal Florida, and other parts of the Southeastern USA. From the victims of a climate-intensified catastrophe that has wrecked whole towns and torn out roads, flattened forests, washed away homes and put parts of Asheville underwater. I don’t want any more disasters like that, and I’m a climate activist to try to keep nature from getting more violent and destructive, which it will if we keep being violent and destructive toward the climate. But I do want us to know who we are, and how hungry we are for meaning, purpose, and connection, and sometimes disaster lets us see that.', 'metadata': {'language': 'en-US', 'source': 'https://lithub.com/rebecca-solnit-we-the-people-can-make-a-better-future-for-all/', 'title': 'Rebecca Solnit: We, the People, Can Make a Better Future ‹ Literary  Hub'}}, {'content': 'Polls offer the false promise 

[{'title': "Thousands march in New York to demand that Biden 'end fossil fuels' - NPR", 'href': 'https://www.npr.org/2023/09/17/1200027693/thousands-expected-to-march-in-new-york-to-demand-that-biden-end-fossil-fuels', 'body': 'Thousands of activists, indigenous groups, students and others take to the streets of New York for the March to End Fossil Fuels protest on Sunday, Sept. 17, 2023 in New York. Spencer Platt/Getty ...'}, {'title': 'Judge sides with young activists in first-of-its-kind climate change ...', 'href': 'https://www.npr.org/2023/08/14/1193780700/montana-climate-change-trial-ruling', 'body': 'In a landmark climate change trial, a Montana judge sides with youth activists The ruling follows a first-of-its-kind trial in the U.S., adding to a small number of legal decisions around the ...'}, {'title': "The Most Important Climate Stories of 2023 Aren't All Bad News", 'href': 'https://www.scientificamerican.com/article/the-most-important-climate-stories-of-2023-arent-a

[{'title': 'How Global Companies Use AI to Prevent Supply Chain Disruptions', 'href': 'https://hbr.org/2023/11/how-global-companies-use-ai-to-prevent-supply-chain-disruptions', 'body': 'Our recent research at Walmart, Tyson Foods, Koch Industries, Maersk, Siemens, and Unilever revealed how these global companies are using advanced AI technologies to plan for and adapt to supply ...'}, {'title': 'AI in Logistics: A Sea of Potential Amidst Waves of Change', 'href': 'https://www.freightos.com/ai-in-logistics-a-sea-of-potential-amidst-waves-of-change/', 'body': 'A significant share (84%) believe AI might trim the workforce. Specifically, 53% anticipate a limited reduction, while 31% foresee substantial cuts. Enterprises, more than SMBs, sense this looming change, with 38% vs. 26% bracing for significant reductions. Notably, a handful of SMBs (7%) harbor hopes of AI birthing new roles, a sentiment ...'}, {'title': 'Logistics Trends 2023/2024: Which Direction for AI?', 'href': 'https

None
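
The Tools and Main cells read API keys through `google.colab.userdata`, which is only importable inside Google Colab. For running the same code elsewhere, a minimal sketch of a lookup that falls back to environment variables is shown below. `get_secret` is a hypothetical helper name, and catching every exception from the Colab lookup is a deliberately broad assumption made to keep the sketch short.

```python
import os


def get_secret(name, default=None):
    """Look up a secret in Colab's userdata first, then in the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the Colab secret lookup failed
        pass
    return os.environ.get(name, default)


# Usage: resolve the key once, then export it for libraries that read the environment
openai_key = get_secret("OPENAI_API_KEY", "ENTER OPENAI_API_KEY HERE")
os.environ["OPENAI_API_KEY"] = openai_key
```

The same helper could replace the `userdata.get('NEWSAPI_KEY')` call inside `NewsDBTool`, so that both keys are resolved the same way.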
