<a href="https://colab.research.google.com/github/AndyJihang/Building-Code-Agents-with-Hugging-Face-smolagents/blob/main/Build_a_deep_research_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Deep Research Agent (Smolagents)

This Colab builds a multi-step web research agent with Smolagents that answers technical questions and returns a concise synthesis with citations.

## What it does
 - Wraps an LLM (OpenAI via OpenAIServerModel, or HF) as the agent’s brain.
 - Adds two web tools: web_search (discover sources) and visit_webpage (fetch + convert HTML→Markdown, trim, handle errors/timeouts).
 - Orchestrates a loop: plan → search → fetch → extract → dedupe → rank → synthesize → cite.
 - Produces a ~N-word answer plus 3+ credible sources.
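
In the cells below, ranking is left to the model's own judgment. If an explicit rank step were wanted, one simple stand-in is keyword overlap between the query and each fetched page. The sketch below assumes that approach; `relevance_score` and `rank_pages` are hypothetical helper names, not Smolagents functions.

```python
import re


def relevance_score(query: str, text: str) -> float:
    """Fraction of distinct query terms that also appear in the text (0.0 to 1.0)."""
    def tokenize(s: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", s.lower()))

    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(text)) / len(query_terms)


def rank_pages(query: str, pages: dict[str, str]) -> list[tuple[str, float]]:
    """Return (url, score) pairs sorted from most to least relevant."""
    scored = [(url, relevance_score(query, text)) for url, text in pages.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keyword overlap is crude (it ignores synonyms and word order), so it is best read as a placeholder for whatever scoring the agent or an embedding model would provide.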

## Quality & safety
 - URL de-duplication, relevance scoring, and cap on pages per query.
 - Simple allow/deny lists to avoid low-quality or NSFW domains; retries + timeouts.
 - Configurable knobs: MAX_SOURCES, MAX_PAGES_PER_SOURCE, MAX_STEPS, MAX_TOKENS.
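
The cells shown here do not spell out these safeguards, so the following is only a sketch of how URL de-duplication, a deny list, and a source cap might fit together. `normalize_url`, `is_denied`, `filter_urls`, the `DENY_DOMAINS` entry, and the `MAX_SOURCES` value are all illustrative assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values; tune them for your own queries.
MAX_SOURCES = 5
DENY_DOMAINS = {"example-spam.test"}


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def is_denied(url: str, deny_domains=DENY_DOMAINS) -> bool:
    """True if the host is a denied domain or a subdomain of one."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in deny_domains)


def filter_urls(urls, max_sources=MAX_SOURCES, deny_domains=DENY_DOMAINS):
    """Drop denied and duplicate URLs, keep first-seen order, cap the count."""
    seen, kept = set(), []
    for url in urls:
        key = normalize_url(url)
        if key in seen or is_denied(key, deny_domains):
            continue
        seen.add(key)
        kept.append(key)
        if len(kept) >= max_sources:
            break
    return kept
```

A list of search-result URLs could be passed through `filter_urls` before any page is fetched, so the cap applies to unique, permitted sources only.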

In [1]:
# 1) Install
!pip -q install "smolagents[openai,toolkit]" duckduckgo-search markdownify requests

In [3]:
# 2) Imports & model
import re
import requests
from markdownify import markdownify
from smolagents import ToolCallingAgent, OpenAIServerModel, DuckDuckGoSearchTool, tool
from google.colab import userdata

OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")

model = OpenAIServerModel(
    model_id="gpt-4o-mini",        # or gpt-4o
    api_key=OPENAI_API_KEY,
    temperature=0.2,
)
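
The Quality & safety notes mention retries and timeouts. The page-reading tool defined in the next cell sets a timeout but makes a single attempt, so a retry layer would have to be added separately. The helper below is one hedged way to do that; `with_retries` and its parameters are hypothetical names, and the fetcher is passed in as a callable so the sketch depends only on the standard library.

```python
import time


def with_retries(fetch, url, attempts=3, backoff=1.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as error:  # a real tool would catch narrower exception types
            last_error = error
            if attempt < attempts - 1:
                sleep(backoff * (2 ** attempt))  # waits backoff, 2*backoff, 4*backoff, ...
    raise last_error
```

Inside a page-reading tool, `fetch` could be something like `lambda u: requests.get(u, timeout=20).text`, which keeps the per-request timeout while allowing a few attempts.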

In [4]:
# 3) Page-reading tool (HTML -> compact Markdown)
@tool
def visit_webpage(url: str) -> str:
    """Visits a webpage and returns cleaned Markdown.
    Args:
        url: page URL to fetch
    """
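    try:
        response = requests.get(url, timeout=20)
        response.raise_for_status()  # treat 4xx/5xx responses as errors instead of parsing error pages
        md = markdownify(response.text).strip()
        # collapse runs of 3+ newlines, then cap the output at 40,000 characters
        md = re.sub(r"\n{3,}", "\n\n", md)
        return md[:40000]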
    except Exception as e:
        return f"[visit_webpage error] {e}"

search = DuckDuckGoSearchTool()  # built-in search tool

# 4) “Deep researcher” instructions (citations + iteration rules)
INSTRUCTIONS = """
You are DeepResearcher.
- Plan -> search -> open pages -> extract key points -> iterate if needed.
- Always provide a concise final answer with a 'SOURCES:' section listing the exact URLs you used.
- Prefer diverse sources. Avoid duplicates (same domain) unless necessary.
- Quote sparingly; summarize in your own words.
- If uncertain, say what’s uncertain and what further search you’d do.
"""

agent = ToolCallingAgent(
    tools=[search, visit_webpage],
    model=model,
    instructions=INSTRUCTIONS,
    max_steps=10,   # raise for deeper dives
)

# 5) Run an example query
q = "Summarize the main differences between FlashAttention and xFormers attention. 150 words. Include 3 sources."
print(agent.run(q))


FlashAttention and xFormers are two advanced attention mechanisms used in transformer models, each with distinct characteristics. FlashAttention is designed for speed and memory efficiency, utilizing an IO-aware approach that minimizes memory reads and writes, resulting in significant performance improvements—up to 70% faster than traditional methods. It is particularly effective for long sequences, allowing for larger context windows without running into memory issues (Dao et al., 2022). In contrast, xFormers focuses on modularity and flexibility, providing a composable architecture for building attention mechanisms. While it also aims to reduce memory usage, it may not achieve the same level of speed as FlashAttention in certain scenarios (Pikoo, 2025). Overall, FlashAttention excels in performance, while xFormers offers greater adaptability for various applications.

**SOURCES:**
1. https://www.libhunt.com/compare-flash-attention-vs-xformers
2. https://arxiv.org/abs/2205.14135
3. ht