# 🎯 News Agent

In this notebook, we build a **News Agent** that searches the web for recent news on a given topic, fetches article content, and generates a concise markdown brief with 3 key points and sources.


## ⚙️ Environment Setup

This project was developed with **Python 3.10.9**.  
You can install the required libraries from the included **`requirements.txt`** file.



### Create a Virtual Environment

**MacOS / Linux:**
```bash
python3 -m venv env
source env/bin/activate
```

**Windows (PowerShell or Command Prompt):**

```bash
python -m venv env
.\env\Scripts\activate
```
If you have multiple Python versions, you may need to provide the full path to the interpreter, for example:

```bash
C:\Users\YourName\AppData\Local\Programs\Python\Python310\python.exe -m venv env
```

### Install Dependencies
```bash
pip install -r requirements.txt
```
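
After installing, a quick import check can confirm the environment is ready. This is a minimal sketch: the module list is inferred from the imports used later in this notebook (plus `lxml`, which the HTML parsing step names as its parser), and `missing_modules` is a hypothetical helper, not part of the project.

```python
from importlib.util import find_spec

# Import names inferred from this notebook's import cell (assumption: your
# requirements.txt installs the same packages).
REQUIRED = ["requests", "bs4", "ddgs", "dotenv", "smolagents", "lxml"]


def missing_modules(names):
    """Return the subset of import names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported as missing, re-run the install command inside the activated virtual environment.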

## 🔑 API Keys Required

To run this notebook, you must provide your own **API key** (for example, from [OpenAI](https://platform.openai.com/api-keys)).

Alternatively, you can configure it to use another model provider such as [Hugging Face](https://huggingface.co/docs/hub/security-tokens), [Claude](https://www.anthropic.com/api), or [LLaMA](https://www.llama.com/products/llama-api/).

<p style="padding:15px; border-width:3px; border-color:#e0f0e0; border-style:solid; border-radius:6px"> 🚨
&nbsp; <b>Different Run Results:</b> The output generated by AI chat models can vary with each execution due to their dynamic, probabilistic nature. Don't be surprised if your results differ from those shown in the video.</p>


## 🛠️ What We Use in This Project

- **[smolagents](https://huggingface.co/docs/smolagents/index)** → a lightweight framework from Hugging Face for building AI agents.  
- **CodeAgent** → a type of agent in smolagents that can write and execute Python code, and call tools we define.  
- **Setup Tools** → functions wrapped with `@tool` so the agent can:  
  - `news_search`: search DuckDuckGo for fresh articles  
  - `fetch_text`: fetch and clean article text  
  - `summarize_markdown`: generate a presentation-ready summary  
- **Large Language Model (LLM)** → e.g., `gpt-4o-mini` (OpenAI) or a Hugging Face model, used for reasoning and summarization.  

In [None]:
# Imports
import os
import re
import textwrap
import time
from typing import Dict, List

import requests
from bs4 import BeautifulSoup
from ddgs import DDGS
from dotenv import load_dotenv
from smolagents import CodeAgent, OpenAIServerModel, tool

In [None]:
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY") # Get your API key
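
`os.getenv` returns `None` when the variable is missing, so a typo in the `.env` file only surfaces later as an authentication error. A small guard can fail early with a clearer message. The sketch below is an assumption-laden illustration: `require_env` is a hypothetical helper and `DEMO_API_KEY` is a placeholder variable, not a real key.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "in your shell before running the notebook."
        )
    return value


# Example with a placeholder value (hypothetical variable, not a real key).
os.environ["DEMO_API_KEY"] = "sk-demo"
print(require_env("DEMO_API_KEY"))
```

In the notebook itself, the same idea would apply to `OPENAI_API_KEY` after `load_dotenv()` has run.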

## 🔧 Setup tools


📝 **Step-by-step breakdown for `news_search`:**
- Takes a search query (string) and an optional number of results (default 5), and returns a list of dictionaries containing news results.
- Opens a `DDGS` context manager and iterates over the news results for the query, with the requested number clamped to between 1 and 10.
- Checks that each result is a dictionary with the required fields (`title` and `url`) and extracts them.
- Stops collecting once enough results have been gathered.
- Returns the collected results, or a single error dictionary (`{"error": "NO_RESULTS"}`) if none were found.

In [None]:
@tool
def news_search(query: str, number: int = 5) -> List[Dict[str, str]]:
    """
    Search the web for recent news with DuckDuckGo.

    Args:
        query (str): The topic to search (e.g., "Agentic AI 2025").
        number (int, optional): Number of news results to return (default=5, max=10).

    Returns:
        List[Dict[str, str]]: A list of results with 'title' and 'url'.
    """
    output = []
    # DuckDuckGo is a privacy-focused search engine and web browser that blocks ad tracking and doesn't store personal data
    # DDGS (DuckDuckGo Search) is a Python library that provides a simple interface to search DuckDuckGo without requiring API keys.
    limit = min(max(number, 1), 10)  # Clamp the requested count to 1..10
    with DDGS() as ddgs:
        for result in ddgs.news(query, max_results=limit):
            if isinstance(result, dict) and "title" in result and "url" in result:
                output.append({"title": result["title"], "url": result["url"]})
            if len(output) >= limit:
                break
    return output if output else [{"error": "NO_RESULTS"}]
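
The filtering and clamping logic can be tried without a network connection. The sketch below is a simplified stand-in for the tool: `collect_results` is a hypothetical helper, and the sample data is made up to mimic the shape of search results.

```python
from typing import Dict, List


def collect_results(raw_results, number: int = 5) -> List[Dict[str, str]]:
    """Keep up to `number` (clamped to 1..10) results that have a title and URL."""
    limit = min(max(number, 1), 10)
    output = []
    for result in raw_results:
        if isinstance(result, dict) and "title" in result and "url" in result:
            output.append({"title": result["title"], "url": result["url"]})
        if len(output) >= limit:
            break
    return output if output else [{"error": "NO_RESULTS"}]


# Made-up sample data standing in for search results.
sample = [
    {"title": "Story A", "url": "https://example.com/a", "source": "Example"},
    {"title": "No URL here"},
    "not a dict",
    {"title": "Story B", "url": "https://example.com/b"},
]
print(collect_results(sample, number=2))
print(collect_results([], number=3))
```

Malformed entries are dropped silently, and an empty input yields the `NO_RESULTS` error entry rather than an empty list.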

📝 **Step-by-step breakdown for `fetch_text`:**
- Takes a URL (string) and an optional timeout (default 10 seconds), and returns the extracted text as a string.
- Makes an HTTP GET request to the URL; on any request error it returns a string starting with `ERROR_FETCH`.
- Parses the HTML by creating a `BeautifulSoup` object.
- Extracts text from the `<article>` tag if present, otherwise from the whole page.
- Removes excess whitespace and newlines for cleaner text.
- Truncates the text to 8000 characters and returns it.

In [None]:
@tool
def fetch_text(url: str, timeout: int = 10) -> str:
    """
    Fetch and clean visible text from a web page.

    Args:
        url (str): The full URL of the page to fetch.
        timeout (int, optional): Timeout in seconds (default=10).

    Returns:
        str: Extracted text (truncated if very long) or an error message.
    """
    try:
        resp = requests.get(     # Sets "User-Agent" header to mimic a browser.
            url, timeout=timeout, headers={"User-Agent": "Mozilla/5.0"}
        )
        # Throw an exception for HTTP error codes (404, 500, etc.)
        resp.raise_for_status()
    except Exception as e:
        return f"ERROR_FETCH: {e}"

    # BeautifulSoup is a Python library that makes it easy to scrape information from web pages. 
    soup = BeautifulSoup(resp.text, "lxml")
    # Try to find "article" tag 
    article = soup.find("article")
    # If found, extract text from the article only
    # If no article tag, extracts text from entire page. 
    text = (
        article.get_text(separator="\n")
        if article
        else soup.get_text(separator="\n")
    )

    text = re.sub(r"\n{2,}", "\n", text)  # Replace multiple newlines with a single newline
    text = re.sub(r"[ \t]{2,}", " ", text)  # Replace runs of spaces/tabs with a single space (\s would also swallow newlines)
    return text[:8000] # Limits text to 8000 characters to prevent overwhelming the AI model
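
The whitespace cleanup can be checked on a small string. This is a minimal sketch with a hypothetical helper, `clean_text`; here the second pattern targets only spaces and tabs so that line breaks survive.

```python
import re


def clean_text(text: str, max_chars: int = 8000) -> str:
    """Collapse repeated newlines and repeated spaces/tabs, then truncate."""
    text = re.sub(r"\n{2,}", "\n", text)    # runs of newlines -> one newline
    text = re.sub(r"[ \t]{2,}", " ", text)  # runs of spaces/tabs -> one space
    return text[:max_chars]


raw = "Headline\n\n\nFirst   paragraph.\n\nSecond \t paragraph."
print(clean_text(raw))
```

The `max_chars` parameter mirrors the 8000-character cap used by the tool and is exposed here only to make the truncation easy to experiment with.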


📝 **Step-by-step breakdown for `summarize_markdown`:**
- Takes a topic (string), bullets (list of strings), and sources (list of dictionaries), and returns a formatted markdown string.
- Keeps only the first 3 bullets; if none are given, uses the placeholder "(no bullet)".
- Keeps only the first 5 sources, extracting the title and URL from each and adding them to the `refs` list.
- Builds the markdown from the topic, bullets, and sources.
- Returns the formatted markdown string.

In [None]:
@tool
def summarize_markdown(
    topic: str, bullets: List[str], sources: List[Dict[str, str]]
) -> str:
    """
    Format a 1-slide markdown with bullets and sources.

    Args:
        topic (str): The news topic.
        bullets (List[str]): Bullet points (will be trimmed to 3).
        sources (List[Dict[str, str]]): Each dict has 'title' and 'url'.

    Returns:
        str: Markdown string suitable for slides.
    """
    bullets = bullets[:3] if bullets else ["(no bullet)"]
    md_bullets = "\n".join(f"- {bullet}" for bullet in bullets)

    refs = []
    for source in sources[:5]:
        title = source.get("title", "source")
        url = source.get("url", "")
        refs.append(f"- [{title}]({url})" if url else f"- {title}")
    refs_md = "\n".join(refs) if refs else "- (no sources)"

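    # Build the markdown line by line. Dedenting an indented f-string does not
    # work here: the interpolated multi-line lists have no leading indentation,
    # so textwrap.dedent finds no common margin and the headings stay indented.
    md = "\n".join(
        [
            f"# Daily Brief: {topic}",
            "",
            "## 3 Things That Matter",
            md_bullets,
            "",
            "## Sources",
            refs_md,
        ]
    )
    return md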
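
To see the intended layout of the brief without installing `smolagents`, the formatting can be reproduced in a plain function. The sketch below is a standalone approximation: `format_brief` is a hypothetical name, and the topic, bullets, and sources are made-up sample values.

```python
from typing import Dict, List


def format_brief(topic: str, bullets: List[str], sources: List[Dict[str, str]]) -> str:
    """Standalone sketch of the layout: title, up to 3 bullets, up to 5 sources."""
    bullets = bullets[:3] if bullets else ["(no bullet)"]
    refs = [
        f"- [{s.get('title', 'source')}]({s['url']})" if s.get("url")
        else f"- {s.get('title', 'source')}"
        for s in sources[:5]
    ] or ["- (no sources)"]
    lines = [f"# Daily Brief: {topic}", "", "## 3 Things That Matter"]
    lines += [f"- {b}" for b in bullets]
    lines += ["", "## Sources"] + refs
    return "\n".join(lines)


brief = format_brief(
    "Agentic AI",
    ["Point one", "Point two", "Point three", "Dropped fourth point"],
    [
        {"title": "Example story", "url": "https://example.com/story"},
        {"title": "Untitled link"},
    ],
)
print(brief)
```

The fourth bullet is dropped, and a source without a URL is listed as plain text instead of a link.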

## 🤖 Agents

In [None]:
SYSTEM_INSTRUCTIONS = """
You are NewsAgent. Plan and execute:
1) Call news_search with a concise query; get ~5 items.
2) For the 3 most relevant items, call fetch_text to get article text.
3) From the combined evidence, produce exactly 3 crisp, non-overlapping bullets:
   - each bullet: one concrete development/insight, specific, no duplication
   - avoid speculation; be precise
4) Return ONLY via summarize_markdown(topic, bullets, sources).

Constraints:
- If fetch_text fails for a URL, skip it.
- Never fabricate sources; only use the URLs returned by news_search.
"""
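
In the notebook, the LLM decides how to carry out these steps at run time. For intuition only, steps 1 and 2 and the "skip failed fetches" constraint can be approximated by a deterministic loop. The sketch below is a simplification under stated assumptions: `gather_evidence`, `fake_search`, and `fake_fetch` are hypothetical names, the stubs replace the real tools, and "most relevant" is reduced to "first results that fetch successfully".

```python
from typing import Callable, Dict, List, Tuple


def gather_evidence(
    query: str,
    search: Callable[[str], List[Dict[str, str]]],
    fetch: Callable[[str], str],
    top_k: int = 3,
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Search, then fetch text for up to `top_k` results, skipping failures."""
    sources, texts = [], []
    for item in search(query):
        if "url" not in item:
            continue  # e.g. an error entry instead of a real result
        text = fetch(item["url"])
        if text.startswith("ERROR_FETCH"):
            continue  # constraint: skip URLs that fail to fetch
        sources.append(item)
        texts.append(text)
        if len(sources) >= top_k:
            break
    return sources, texts


def fake_search(query: str) -> List[Dict[str, str]]:
    """Stub returning made-up results in the shape of the search tool's output."""
    return [
        {"title": "A", "url": "https://example.com/a"},
        {"title": "B", "url": "https://example.com/b"},
        {"title": "C", "url": "https://example.com/c"},
        {"title": "D", "url": "https://example.com/d"},
    ]


def fake_fetch(url: str) -> str:
    """Stub that simulates one failing URL."""
    if url.endswith("/b"):
        return "ERROR_FETCH: simulated timeout"
    return f"Article text from {url}"


sources, texts = gather_evidence("Agentic AI", fake_search, fake_fetch)
print([s["title"] for s in sources])
```

Because only URLs returned by the search step ever reach `sources`, the "never fabricate sources" constraint holds by construction in this sketch; the agent has to be instructed to respect it.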

📝 **Step-by-step breakdown for `build_agent`:**
- Creates and returns a configured AI agent for news processing.
- `OpenAIServerModel` sets up the connection to OpenAI's API.
- `CodeAgent` creates an AI agent that can execute code and call our three tools, limited to 5 steps.

In [None]:
def build_agent():
    """Create and configure a CodeAgent with news tools.
    
    Returns:
        agent: CodeAgent 
        
    """
    model = OpenAIServerModel(
        model_id="gpt-4o-mini",  # Specifies which GPT model to use
        api_base="https://api.openai.com/v1",  # OpenAI's API endpoint URL
        api_key=os.getenv("OPENAI_API_KEY"),
    )

    agent = CodeAgent(
        model=model,
        tools=[news_search, fetch_text, summarize_markdown],  # Our three custom tools
        add_base_tools=False,  # Disables default tools to keep behavior predictable
        max_steps=5,  # Caps the number of steps the agent may take
    )
    return agent

### 🕹️ Main Orchestrator

The main orchestrator function coordinates the entire news briefing process: it builds the agent, assembles the prompt, runs the agent, and prints the result.

In [None]:
def run(topic: str):
    """Generate a news brief for the given topic and print the result.

    Args:
        topic (str): The news topic (e.g., "Agentic AI 2025").
    """
    agent = build_agent()
    today = time.strftime("%Y-%m-%d")
    prompt = f"""{SYSTEM_INSTRUCTIONS}

    Now complete this task:

    Topic: "{topic}"
    Date hint: {today}
    Instructions to follow:
    - Use a short query like: "{topic} {today} latest"
    - After collecting sources and evidence, call summarize_markdown(topic, bullets, sources).
    - Return only the markdown produced by summarize_markdown.
    """
    result = agent.run(prompt)
    print("\n" + "=" * 80 + "\nFINAL MARKDOWN\n" + "=" * 80 + "\n")
    print(result)

In [None]:
topic = (
    input("Topic (e.g., AI policy, Agentic AI, AI Ethics): ").strip()
    or "Agentic AI"
)
run(topic)