# 🧪 Lab Task: The "Smart Search" Agent

Your task is to build a conversational agent that answers user questions using a single, powerful `web_search` tool.

This tool is "smart": before it calls the search API, your Python implementation must internally call a small LLM (mock or real) to rewrite the query. The main LLM never sees this rewrite step.

## Standard Task

### Define the Single Function Tool:

From the LLM's perspective, it only knows about one tool: `web_search(user_query: str) -> List[SearchResult]`

The `user_query` will be the user's raw, conversational text (e.g., "Why is the sky blue?").

### Implement the "Smart" Function:

Your Python code that defines `web_search` must execute this internal sequence:

**Step 1 (Internal):** Take the `user_query` string.

**Step 2 (Internal):** Pass this string to a smaller LLM (a real API or a mock) with a hard-coded instruction such as, "You are a search query expert. Convert this question into 3-5 optimal keywords."

**Step 3 (Internal):** Receive the `rewritten_query` (e.g., "Rayleigh scattering atmosphere") from that internal call.

**Step 4 (Internal):** Use this `rewritten_query` to call the actual search API (e.g., Google, Bing, or a mock JSON file).

**Step 5 (Return):** Return the final list of `SearchResult` objects to the main LLM.
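
The five steps above can be sketched offline, with no API keys. This is a minimal sketch under stated assumptions: `mock_rewrite`, `mock_search`, `STOPWORDS`, and `MOCK_INDEX` are hypothetical stand-ins for the small LLM and the search API, and the `example.com` URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchResult:
    title: str
    snippet: str
    url: str


# Hypothetical stand-in for the small LLM: drop filler words, keep up to 5 keywords.
STOPWORDS = {"why", "is", "the", "a", "an", "what", "what's", "how", "to", "from", "of", "in"}


def mock_rewrite(user_query: str) -> str:
    words = [w.strip("?.,!").lower() for w in user_query.split()]
    keywords = [w for w in words if w and w not in STOPWORDS]
    return " ".join(keywords[:5])


# Hypothetical stand-in for the search API: a tiny in-memory index.
MOCK_INDEX = [
    SearchResult(
        "Rayleigh scattering",
        "Why the sky looks blue: shorter wavelengths scatter more.",
        "https://example.com/rayleigh",
    ),
    SearchResult(
        "Trains Warsaw-Berlin",
        "Direct trains run daily.",
        "https://example.com/trains",
    ),
]


def mock_search(query: str) -> List[SearchResult]:
    # Keep every entry whose title or snippet contains at least one query term.
    terms = query.lower().split()
    hits = []
    for result in MOCK_INDEX:
        haystack = f"{result.title} {result.snippet}".lower()
        if any(term in haystack for term in terms):
            hits.append(result)
    return hits


def web_search(
    user_query: str,
    rewrite: Callable[[str], str] = mock_rewrite,
    search: Callable[[str], List[SearchResult]] = mock_search,
) -> List[SearchResult]:
    rewritten_query = rewrite(user_query)  # Steps 1-3: internal rewrite
    return search(rewritten_query)         # Steps 4-5: search and return


for result in web_search("Why is the sky blue?"):
    print(result.title, result.url)
```

Passing `rewrite` and `search` as parameters keeps the tool's public signature unchanged from the main LLM's point of view while letting you swap the mocks for real API calls later.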

### Implement the Agent Logic:

- The LLM's main prompt should instruct it to use the `web_search` tool whenever it needs external information.
- The LLM will call the tool with the user's messy query.
- It will receive the clean search results back (unaware of the internal rewrite).
- It must then use these results to synthesize a final, helpful answer.
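
One way to structure this logic is a loop that keeps executing requested tools until the model replies with plain text. The sketch below is an illustration only: `run_agent`, `fake_model`, and `fake_web_search` are hypothetical names, and the reply dictionaries use a simplified shape rather than the OpenAI SDK response objects.

```python
import json


def run_agent(user_prompt, call_model, tool_impls, max_rounds=4):
    """Run tool calls until the model replies with plain text.

    call_model is a hypothetical callable: it takes the message list and returns
    {"content": str or None, "tool_calls": [{"id", "name", "arguments"}, ...]}.
    This is a simplified shape, not the OpenAI SDK response object.
    """
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_rounds):
        reply = call_model(messages)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]

        # Record the assistant turn once, then answer every tool call it made.
        messages.append(
            {"role": "assistant", "content": reply.get("content"), "tool_calls": tool_calls}
        )
        for call in tool_calls:
            func = tool_impls[call["name"]]
            result = func(**json.loads(call["arguments"]))
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    raise RuntimeError("model did not produce a final answer")


def fake_model(messages):
    """Scripted stand-in for the main LLM: search once, then answer."""
    last = messages[-1]
    if last["role"] == "tool":
        top = json.loads(last["content"])[0]
        return {"content": f"Answer based on: {top['title']}", "tool_calls": []}
    call = {
        "id": "call_1",
        "name": "web_search",
        "arguments": json.dumps({"user_query": last["content"]}),
    }
    return {"content": None, "tool_calls": [call]}


def fake_web_search(user_query):
    """Stand-in tool that returns one canned result."""
    return [{"title": "Trains Warsaw-Berlin", "snippet": "...", "url": "https://example.com/trains"}]


answer = run_agent(
    "What's the best way to get from Warsaw to Berlin?",
    fake_model,
    {"web_search": fake_web_search},
)
print(answer)
```

Looking tools up in a dictionary keyed by name means a second tool can be registered later without changing the loop itself.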

### Example Flow:

**User:** "What's the best way to get from Warsaw to Berlin?"

**LLM (call 1):** `web_search(user_query="What's the best way to get from Warsaw to Berlin?")`

**Your Code (Internal Steps 1-3):** Calls rewriter → gets back "Warsaw to Berlin travel options train bus flight"

**Your Code (Internal Step 4):** Calls search API with the new query → gets back `[{title: "Trains Warsaw-Berlin...", snippet: "...", url: "..."}, ...]`

**Your Code (returns to LLM):** `[{title: "Trains Warsaw-Berlin...", snippet: "...", url: "..."}, ...]`

**LLM (final answer):** "You can travel from Warsaw to Berlin by train, bus, or plane. According to 'Trains Warsaw-Berlin...', the train takes about 6 hours..."


---


## 🚀 Advanced Task

### Add a Second Tool for Summarization:

Now, add a new, separate tool that the LLM is aware of: `scrape_and_summarize(urls: List[str]) -> str`

This function (which you'll implement) takes a list of URLs and returns a block of (mocked or real) text summarizing their content.
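
A mocked version of this tool might look like the sketch below. It assumes a hypothetical `fetch` callable that maps a URL to HTML (or `None`), uses only the standard library's `html.parser`, and substitutes simple truncation for a real summarizer. The cap of three URLs and the `max_chars` parameter are illustrative choices.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join(" ".join(parser.chunks).split())


def scrape_and_summarize(
    urls: List[str],
    fetch: Callable[[str], Optional[str]],
    max_chars: int = 200,
) -> str:
    parts = []
    for url in urls[:3]:  # cap the work even if the model passes many URLs
        html = fetch(url)
        if html:
            # Truncation is a stand-in for a real summarization step.
            parts.append(html_to_text(html)[:max_chars])
    if not parts:
        return "Could not read the pages."
    return "\n\n".join(parts)


# Hypothetical canned pages used in place of real HTTP requests.
PAGES = {
    "https://example.com/a": (
        "<html><head><style>p{}</style><script>var x = 1;</script></head>"
        "<body><h1>Title</h1><p>Hello   world</p></body></html>"
    ),
}

print(scrape_and_summarize(["https://example.com/a", "https://example.com/missing"], PAGES.get))
```

Injecting `fetch` lets the same function run against canned pages in tests and against a real HTTP client later.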

### Modify the Agent Logic:

- Update the LLM's main prompt. Instruct it to first call `web_search` as before.
- Then, it must analyze the search snippets.
- If the snippets are too short or don't fully answer the question, it should make a second, follow-up call to `scrape_and_summarize` using the top 2-3 URLs from the first call.
- This tests the LLM's ability to chain tools and make decisions based on intermediate results.

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
from dataclasses import dataclass
from openai import OpenAI
from bs4 import BeautifulSoup
import requests
import json
import os

In [3]:
openai_api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key = openai_api_key)

In [4]:
def small_llm_rewrite(user_query):
    system_prompt = (
        "You are a search query expert. "
        "Read the user question. "
        "Return only 3 to 6 short keywords. "
        "Do not write sentences. "
        "Do not add explanations."
    )

    user_prompt = (
        "User question:\n"
        f"{user_query}\n\n"
        "Return 3 to 6 keywords separated by spaces."
    )

    response = client.chat.completions.create(
        model = "gpt-4.1-mini",
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        max_completion_tokens = 30
    )

    # content can be None (for example on a refusal), so guard before stripping
    text = (response.choices[0].message.content or "").strip()
    return text
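
The rewriter's reply is free-form text, so it may arrive quoted, comma-separated, longer than requested, or empty. A small normalization step could guard the search call against that. The helper below is a hypothetical addition (`clean_keywords` is not part of the notebook) and falls back to the original question when nothing usable comes back.

```python
import re


def clean_keywords(raw, fallback, max_keywords=6):
    """Normalize the rewriter's reply into a space-separated keyword string."""
    text = (raw or "").strip().strip("\"'")
    # Split on commas and whitespace, dropping empty pieces.
    tokens = [t for t in re.split(r"[,\s]+", text) if t]
    if not tokens:
        return fallback  # nothing usable: search with the original question
    return " ".join(tokens[:max_keywords])


print(clean_keywords("Rayleigh, scattering, atmosphere", "Why is the sky blue?"))
print(clean_keywords("", "Why is the sky blue?"))
```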

In [5]:
you_api_key = os.getenv("YOU_SEARCH_API_KEY")
you_search_url = "https://api.ydc-index.io/search"

@dataclass
class SearchResult:
    title: str
    snippet: str
    url: str

def call_you_search_api(query):
    headers = {
        "X-API-Key": you_api_key
    }
    params = {
        "query": query,
        "count": 20
    }

    resp = requests.get(you_search_url, headers = headers, params = params, timeout = 10)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of silently returning no results
    data = resp.json()
    hits = data.get("hits", [])

    results = []

    for item in hits:
        title = item.get("title", "")
        url = item.get("url", "")

        desc = item.get("description", "")
        snippets = item.get("snippets", [])
        snippet_text = " ".join(snippets) if snippets else desc

        results.append(SearchResult(
            title = title,
            snippet = snippet_text,
            url = url
        ))

    return results

In [6]:
def web_search(user_query):
    rewritten_query = small_llm_rewrite(user_query)
    print(f"rewrote query to: {rewritten_query}")
    results = call_you_search_api(rewritten_query)
    return results

In [None]:
def scrape_page_text(url):
    try:
        resp = requests.get(
            url,
            timeout = 8,
            headers = {"User-Agent": "Mozilla/5.0"}
        )

        if resp.status_code != 200:
            print("could not fetch:", url, resp.status_code)
            return ""

        soup = BeautifulSoup(resp.text, "html.parser")

        for tag in soup(["script", "style", "noscript", "header", "footer", "nav", "form"]):
            tag.decompose()

        text = soup.get_text(separator = " ")
        text = " ".join(text.split())

        if not text:
            print("empty content:", url)
            return ""

        print("fetched", url)
        return text[:4000]
    except Exception as e:
        print("error scraping:", url, e)
        return ""

def scrape_and_summarize(urls):
    texts = []
    for url in urls:
        t = scrape_page_text(url)
        if t:
            texts.append(t)

    if not texts:
        return "Could not read the pages."

    joined = "\n\n".join(texts)

    response = client.chat.completions.create(
        model = "gpt-4.1-mini",
        messages = [
            {"role": "system", "content": "Read the content and write a short clear summary."},
            {"role": "user", "content": joined}
        ],
        temperature = 0.3,
        max_completion_tokens = 256
    )

    return response.choices[0].message.content.strip()

In [8]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for information relevant to the user query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "user_query": {
                        "type": "string",
                        "description": "Original user question."
                    }
                },
                "required": ["user_query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "scrape_and_summarize",
            "description": "Fetch the given URLs and return a short summary.",
            "parameters": {
                "type": "object",
                "properties": {
                    "urls": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "List of URLs from search results."
                    }
                },
                "required": ["urls"]
            }
        }
    }
]

In [9]:
system_prompt = """
You are SmartSearchAgent.

You can use two tools:
1. web_search(user_query: str)
2. scrape_and_summarize(urls: List[str])

When you need external information:
First call web_search with the user query.
Look at the SearchResult objects.
If snippets are enough, answer based on them.
If snippets are not enough, choose 2 or 3 URLs.
Call scrape_and_summarize with these URLs.
Use the summary and the snippets to answer.
Use simple and clear language.
"""

In [10]:
def call_oai(user_prompt):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]

    first = client.chat.completions.create(
        model = "gpt-4.1",
        messages = messages,
        tools = tools,
        tool_choice = "auto" # {"type": "function", "function": {"name": "web_search"}}
    )

    msg = first.choices[0].message

    if msg.tool_calls:
        for tool_call in msg.tool_calls:
            name = tool_call.function.name
            args = json.loads(tool_call.function.arguments)

            if name == "web_search":
                print("using web search tool...")
                search_results = web_search(args["user_query"])
                payload = [res.__dict__ for res in search_results]

                messages.append({
                    "role": "assistant",
                    "tool_calls": msg.tool_calls
                })
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "name": "web_search",
                    "content": json.dumps(payload),
                })

                second = client.chat.completions.create(
                    model = "gpt-4.1",
                    messages = messages,
                    tools = tools,
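                    # Forcing scrape_and_summarize makes the agent scrape on every query.
                    # Use "auto" to let the model decide from the snippets, as the system prompt describes.
                    tool_choice = {"type": "function", "function": {"name": "scrape_and_summarize"}}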
                )

                msg2 = second.choices[0].message

                if msg2.tool_calls:
                    for tc in msg2.tool_calls:
                        if tc.function.name == "scrape_and_summarize":
                            print("\nusing scrape and summarize tool...")
                            args2 = json.loads(tc.function.arguments)
                            summary = scrape_and_summarize(args2["urls"])

                            messages.append({
                                "role": "assistant",
                                "tool_calls": msg2.tool_calls
                            })
                            messages.append({
                                "role": "tool",
                                "tool_call_id": tc.id,
                                "name": "scrape_and_summarize",
                                "content": summary
                            })

                            final = client.chat.completions.create(
                                model = "gpt-4.1",
                                messages = messages,
                            )

                            return final.choices[0].message.content

                return msg2.content

    return msg.content


In [11]:
query = "why is the stock market down in November 2025?"
result = call_oai(query) # should use web_search tool
print('\n==================================================\n')
print(result)

using web search tool...
rewrote query to: stock market November 2025 decline causes factors

using scrape and summarize tool...
fetched https://en.wikipedia.org/wiki/2025_stock_market_crash
fetched https://stockstotrade.com/stock-market-crash/
fetched https://www.cnbc.com/2025/11/03/stock-market-today-live-updates.html


The main reason the stock market was down in November 2025 is because of major policy changes, especially new tariffs introduced by U.S. President Donald Trump in April 2025. These tariffs affected nearly all sectors of the U.S. economy and led to intensified trade wars with China, Canada, and Mexico. The sudden and sweeping nature of these tariffs caused panic selling and sharp declines in global stock markets, especially in technology and AI-focused stocks, which were seen as overpriced and vulnerable.

Other contributing factors included:
- Concerns about overpriced tech stocks (like Nvidia, Apple, and Tesla) that had been leading the market.
- Ongoing volatility a