# Take-Home Project: Agent Task Experimentation

## 🎯 Goal

- Choose a task involving **multi-turn interaction/tool use**
- Implement an **agent scaffold** (using a raw API or one of the provided frameworks)
- Create a **small set of test prompts**
- Define a **reward function** to evaluate your agent's performance
- **Test** your agent with **multiple models/prompts**
- Analyze agent outputs, identify a consistent **problem**, and either:
  - Adjust the **setup** (prompts/tools), or
  - Adjust your **evaluations** to capture or address the issue

---

## 🧠 Ideas for Agent Tasks

- A **search agent** for your favorite blog or website
- Any agent you can describe by finishing the sentence “An agent which…”
- An agent to play a **simple board or card game**
- A **code agent** that only uses a specific library (e.g., iterates on a matplotlib plot until it “looks right”)
- A **terminal-based chat agent** that can hand off to a human or confirm actions

---

## 🏆 Ideas for Reward Functions

- **Format checks** using regex
- **Deterministic checks** (e.g., parsing math answers, running code, solving puzzles)
- **Embedding or text similarity** compared to a “ground truth”
- **LLM judges** that:
  - Have access to the ground truth
  - Evaluate a set of fuzzy criteria and assign scores
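Here is a minimal sketch of the first two ideas. It assumes the `<think>`/`<tool>`/`<answer>` tag format used by the agent below. `format_reward` is a regex format check: it passes a turn that contains a well-formed `<tool>` JSON call or an `<answer>`, but not both. `keyword_recall` is a simple deterministic check that scores the fraction of required terms present in an answer. Both names are hypothetical and are not part of any library.

```python
import json
import re


def format_reward(response: str) -> float:
    """1.0 if the turn has a valid <tool> JSON call or an <answer> (not both), else 0.0."""
    tool = re.search(r"<tool>(.*?)</tool>", response, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)

    # Exactly one of the two block types must be present
    if (tool is None) == (answer is None):
        return 0.0
    if tool is not None:
        try:
            call = json.loads(tool.group(1))
        except json.JSONDecodeError:
            return 0.0
        # The call must have the {"name": ..., "args": {...}} shape
        if not (isinstance(call, dict) and "name" in call and isinstance(call.get("args"), dict)):
            return 0.0
    return 1.0


def keyword_recall(answer: str, required_terms: list[str]) -> float:
    """Fraction of required terms found in the answer (case-insensitive substring match)."""
    if not required_terms:
        raise ValueError("required_terms must not be empty")
    text = answer.lower()
    hits = sum(1 for term in required_terms if term.lower() in text)
    return hits / len(required_terms)
```

Format rewards catch broken turns cheaply. Keyword recall is crude because it ignores step order and paraphrases, so it works best alongside an LLM judge for the fuzzy criteria.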

---

## ✅ Tips

- **Start simple**: Get a minimal version working before adding complexity
- If your agent "just works" with a strong model, try it with a **weaker model** and see where it breaks

---

## 🌟 Bonus Goals

- Make a **parallel-friendly** version using `asyncio` and error handling
- Implement **Best-of-N selection** and test if your reward function aligns with your judgment
- Try a **multi-agent setup** or a **Client/Server setup** (e.g., MCP, A2A)
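The first two bonus goals could be combined in one helper, sketched here under some assumptions. `best_of_n` runs an async agent `n` times and uses an `asyncio.Semaphore` to cap concurrency. It calls `gather(..., return_exceptions=True)` so that failures are collected rather than having the first exception escape, then keeps the output with the highest reward. `fake_agent` and the length-based reward in `demo` are hypothetical stand-ins for `agent_loop` and a real reward function.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def best_of_n(
    run_agent: Callable[[str], Awaitable[str]],
    reward: Callable[[str], float],
    question: str,
    n: int = 4,
    max_concurrency: int = 2,
) -> tuple[str, float] | None:
    """Run the agent n times concurrently and return (best_output, best_reward)."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one_run() -> str:
        # At most `max_concurrency` runs call the model at the same time
        async with semaphore:
            return await run_agent(question)

    # return_exceptions=True puts exceptions in the result list instead of
    # raising the first one, so a single failed run does not lose the others
    outputs = await asyncio.gather(*(one_run() for _ in range(n)), return_exceptions=True)

    scored = [(out, reward(out)) for out in outputs if not isinstance(out, BaseException)]
    if not scored:
        return None  # every run failed
    return max(scored, key=lambda pair: pair[1])


async def demo() -> tuple[str, float] | None:
    """Toy run with a hypothetical fake agent; one of the four runs fails."""
    replies = iter(["short", RuntimeError("API timeout"), "a much longer answer", "mid answer"])

    async def fake_agent(question: str) -> str:
        reply = next(replies)
        await asyncio.sleep(0)  # yield to the event loop like a real API call would
        if isinstance(reply, Exception):
            raise reply
        return reply

    # Toy reward for illustration only: longer answers score higher
    return await best_of_n(fake_agent, lambda out: float(len(out)), "Mango Sago steps", n=4)


# In the notebook (with nest_asyncio applied): print(asyncio.run(demo()))
```

To check whether the reward function agrees with your own judgment, compare the selected output against your own pick on a handful of questions.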

---

## 📦 Deliverables

Submit **a repo**, **notebook**, or **short writeup** with the following:

- A description of your **task and setup**
- What **approaches** you tried
- Any **roadblocks** you ran into
- Which **evaluation methods** worked best
- What’s the **smallest model** that still performed decently well


```python
from dotenv import load_dotenv

load_dotenv()

import weave

weave.init("agents-engineering-course")

print("All Good!")
```

```text
All Good!
```


```python
# Basic search tool
# To install: pip install tavily-python requests markdownify
import os

import requests
from markdownify import markdownify
from tavily import TavilyClient

TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")


def search(query: str) -> str:
    """Search the web and return the top result's page as Markdown.

    Args:
        query: The search query string.

    Returns:
        The Markdown version of the top search result's page, or a string
        starting with "Error:" if the search or the page download fails.

    Examples:
        {"query": "Rava Pongal Recipe"}
        {"query": "Thattu Idli Recipe"}
        {"query": "Cheese Garlic Bread Recipe"}
    """
    try:
        client = TavilyClient(api_key=TAVILY_API_KEY)
        hits = client.search(query=query).get("results", [])
        if not hits:
            return f"Error: no search results for {query!r}"

        # e.g. https://hebbarskitchen.com/rava-pongal-recipe-semolina-pongal/
        top_url = hits[0]["url"]

        # Download the page once; raise_for_status() raises on 4xx/5xx codes
        page = requests.get(top_url, timeout=30)
        page.raise_for_status()
        return markdownify(page.text)

    except Exception as e:
        return f"Error: {e}"


results = search("Hebbars Kitchen Rava Pongal")

print(results)
```

```text
Rava Pongal Recipe | Sooji Ka Pongal



[Facebook](https://www.facebook.com/HebbarsKitchen/ "Facebook")

[Instagram](https://www.instagram.com/hebbars.kitchen/ "Instagram")

[Mail](mailto:hebbars.kitchen@gmail.com "Mail")

[Pinterest](https://www.pinterest.com/hebbarskitchen/ "Pinterest")

[RSS](https://hebbarskitchen.com/feed/ "RSS")

[Twitter](https://twitter.com/HebbarsKitchen "Twitter")

[Youtube](https://www.youtube.com/channel/UCPPIsrNlEkaFQBk-4uNkOaw "Youtube")



* [Home](https://hebbarskitchen.com/)
* [Breakfast](https://hebbarskitchen.com/recipes/breakfast-recipes/)
  + [Dosa](https://hebbarskitchen.com/recipes/south-indian-dosa-recipes/)
  + [Idli](https://hebbarskitchen.com/recipes/south-indian-idli-recipes/)
  + [Chutney](https://hebbarskitchen.com/recipes/indian-chutney-recipes/)
  + [Sandwich](https://hebbarskitchen.com/recipes/sandwich-recipes/)
* [Rice](https://hebbarskitchen.com/recipes/indian-rice-recipes/)
  + [Biryani](https://hebbarskitchen.com/recipes/biryani/)
```
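The search tool uses whatever URL Tavily ranks first. Here that was the Hebbar's Kitchen page, because the query named the site. A stricter choice is sketched below. It assumes the response keeps the `{"results": [{"url": ...}, ...]}` shape the tool already reads. `pick_result_url` is a hypothetical helper that returns the first result hosted on a given domain, or `None`. Tavily's `search` also accepts an `include_domains` filter, which may be the simpler route. Check the client documentation for your installed version.

```python
from __future__ import annotations

from urllib.parse import urlparse


def pick_result_url(response: dict, domain: str) -> str | None:
    """Return the first result URL hosted on `domain` or one of its subdomains."""
    for result in response.get("results", []):
        url = result.get("url", "")
        host = urlparse(url).hostname or ""
        if host == domain or host.endswith("." + domain):
            return url
    return None  # no result from the preferred site
```

Inside `search`, this would replace the `[0]["url"]` lookup with something like `top_url = pick_result_url(client.search(query=query), "hebbarskitchen.com")`, plus a check for `None`.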

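```python
import requests
from markdownify import markdownify


def fetch(url: str) -> str:
    """Download a web page and return it as Markdown.

    Args:
        url: The absolute URL of the page to fetch.

    Returns:
        The page converted to Markdown, or a string starting with "Error:"
        if the request fails or the server returns a 4xx/5xx status code.
    """
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return markdownify(response.text)

    except Exception as e:
        return f"Error: {e}"


print(fetch("https://hebbarskitchen.com/mango-sago-recipe-mango-tapioca-recipe/"))
```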

```text
Mango Sago Recipe | Mango Tapioca | Mango Sabudana Dessert



[Facebook](https://www.facebook.com/HebbarsKitchen/ "Facebook")

[Instagram](https://www.instagram.com/hebbars.kitchen/ "Instagram")

[Mail](mailto:hebbars.kitchen@gmail.com "Mail")

[Pinterest](https://www.pinterest.com/hebbarskitchen/ "Pinterest")

[RSS](https://hebbarskitchen.com/feed/ "RSS")

[Twitter](https://twitter.com/HebbarsKitchen "Twitter")

[Youtube](https://www.youtube.com/channel/UCPPIsrNlEkaFQBk-4uNkOaw "Youtube")



* [Home](https://hebbarskitchen.com/)
* [Breakfast](https://hebbarskitchen.com/recipes/breakfast-recipes/)
  + [Dosa](https://hebbarskitchen.com/recipes/south-indian-dosa-recipes/)
  + [Idli](https://hebbarskitchen.com/recipes/south-indian-idli-recipes/)
  + [Chutney](https://hebbarskitchen.com/recipes/indian-chutney-recipes/)
  + [Sandwich](https://hebbarskitchen.com/recipes/sandwich-recipes/)
* [Rice](https://hebbarskitchen.com/recipes/indian-rice-recipes/)
  + [Biryani](https://hebbarskitchen.c
```
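Both outputs open with a run of social and navigation links before any recipe text, and every fetched page goes to the model in full. Here is a minimal sketch of trimming that, assuming link-only lines are navigation. The hypothetical `trim_markdown` drops lines that consist only of a Markdown link (optionally bulleted) and collapses repeated blank lines. It then caps the result at `max_chars` characters. The 8000-character default is an arbitrary placeholder, not a tuned value.

```python
import re

# A line that is nothing but one Markdown link, optionally after a list bullet (*, +, -)
LINK_ONLY_LINE = re.compile(r"^\s*(?:[*+-]\s+)?\[[^\]]*\]\([^)]*\)\s*$")


def trim_markdown(markdown: str, max_chars: int = 8000) -> str:
    """Drop link-only lines, collapse blank-line runs, and cap the text length."""
    kept: list[str] = []
    for line in markdown.splitlines():
        if LINK_ONLY_LINE.match(line):
            continue  # navigation / social-media link
        if not line.strip() and kept and not kept[-1].strip():
            continue  # already have one blank line here
        kept.append(line)
    return "\n".join(kept).strip()[:max_chars]
```

To apply it to every tool call, wrap the return value in `fetch`: `return trim_markdown(markdownify(response.text))`. The heuristic also removes link-only lines inside the recipe body, such as lists of related recipes, so spot-check a few pages.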

```python
# Parsers
import re
import json


def parse_thinking_from_response(response: str) -> str | None:
    """Return the text inside the first <think>...</think> block, if any."""
    # re.DOTALL lets '.' match newlines, so multi-line thoughts are captured.
    # The lazy '.*?' stops at the first closing </think> tag.
    thinking = re.search(r'<think>(.*?)</think>', response, re.DOTALL)
    if thinking:
        return thinking.group(1)
    return None


def parse_tool_from_response(response: str) -> dict | None:
    """Return the parsed JSON inside the first <tool>...</tool> block, if any.

    Raises json.JSONDecodeError if the block does not contain valid JSON.
    """
    tool_call = re.search(r'<tool>(.*?)</tool>', response, re.DOTALL)
    if tool_call:
        return json.loads(tool_call.group(1))
    return None


def parse_answer_from_response(response: str) -> str | None:
    """Return the text inside the first <answer>...</answer> block, if any."""
    answer = re.search(r'<answer>(.*?)</answer>', response, re.DOTALL)
    if answer:
        return answer.group(1)
    return None
```
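`parse_tool_from_response` raises `json.JSONDecodeError` on malformed JSON and accepts any JSON value, so the only way to recover from one sloppy turn is a retry. A more lenient variant is sketched below; it is an option, not a required change. The hypothetical `parse_tool_lenient` strips an optional Markdown code fence (for example one tagged `json`) that some models add inside the tags. It returns `None` instead of raising, and it only accepts the `{"name": ..., "args": {...}}` shape that the system prompt asks for.

```python
from __future__ import annotations

import json
import re


def parse_tool_lenient(response: str) -> dict | None:
    """Parse the first <tool> block; return None instead of raising on bad input."""
    match = re.search(r"<tool>(.*?)</tool>", response, re.DOTALL)
    if match is None:
        return None
    body = match.group(1).strip()
    # Remove an optional Markdown code fence (three backticks, optionally tagged json)
    body = re.sub(r"^`{3}(?:json)?\s*|\s*`{3}$", "", body)
    try:
        call = json.loads(body)
    except json.JSONDecodeError:
        return None
    # Accept only the {"name": str, "args": dict} shape the system prompt asks for
    if isinstance(call, dict) and isinstance(call.get("name"), str) and isinstance(call.get("args"), dict):
        return call
    return None
```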

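```python
import asyncio


async def call_tool(tool_call: dict) -> str:
    """Run the tool named in a parsed tool call and return its output."""
    name = tool_call.get("name")
    args = tool_call.get("args", {})

    if name == "fetch":
        # fetch() is blocking (requests); run it in a worker thread so it does
        # not stall the event loop when several agent loops run concurrently
        return await asyncio.to_thread(fetch, args["url"])

    return f"Error: Tool {name} not found"
```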
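`call_tool` hard-codes each tool in an `if` chain. A common alternative, shown here as a sketch, is a registry dict that maps tool names to functions. Adding a tool (for example the `search` function defined earlier) then takes one dict entry. If the model uses the wrong argument names, the error comes back as a string instead of an exception. `fake_fetch` and `call_tool_from_registry` are hypothetical names; `fake_fetch` stands in for `fetch` so the block runs offline.

```python
import asyncio
from typing import Callable


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for the notebook's fetch() so this sketch runs offline."""
    return f"# Page at {url}"


# Tool name -> plain (blocking) Python function
TOOLS: dict[str, Callable[..., str]] = {"fetch": fake_fetch}


async def call_tool_from_registry(tool_call: dict) -> str:
    """Look up a tool by name and run it in a worker thread with the model's args."""
    name = tool_call.get("name")
    func = TOOLS.get(name)
    if func is None:
        return f"Error: Tool {name} not found"
    try:
        # to_thread keeps blocking HTTP calls off the event loop
        return await asyncio.to_thread(func, **tool_call.get("args", {}))
    except TypeError as e:
        # Unknown or missing argument names (or args that are not a dict)
        return f"Error: bad arguments for {name}: {e}"
```

In the notebook, the registry would map `"fetch"` to the real `fetch` and could add `"search": search`. The system prompt would also need to describe any new tool.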

```python
import asyncio
import os

import nest_asyncio
from openai import AsyncOpenAI

# Jupyter already runs an event loop; nest_asyncio lets asyncio.run() work inside it
nest_asyncio.apply()

SYSTEM_PROMPT = """
You are an assistant who answers questions about recipes on Hebbar's Kitchen.
* Start from the dessert recipes index: https://hebbarskitchen.com/recipes/dessert-recipes/
  Fetch it and follow the recipe URLs listed on that page.
* The dessert index has 15 pages. To open another page, use this URL format:
    https://hebbarskitchen.com/recipes/dessert-recipes/page/<page_number>/
* If you can't find the relevant information on the current page, fetch the next page.
* Always cite the final URL you used to answer the question.
* Do not fetch the same URL more than once.

You have access to the following tools:
- fetch(url: str) -> str: returns the Markdown version of the page at url

For example, you can call the tool like this:
<tool>
{
    "name": "fetch",
    "args": {
        "url": "The URL you need"
    }
}
</tool>
You may call one tool per turn, for up to 5 turns, before giving your final answer.

In each turn, respond in the following format:

<think>
[your thoughts here]
</think>
<tool>
JSON with the following fields:
- name: The name of the tool to call
- args: A dictionary of arguments to pass to the tool (must be valid JSON)
</tool>

When you are done, give your final answer in the following format:

<answer>
[your final answer here]
</answer>
"""

MAX_TURNS = 5    # must match the turn limit stated in SYSTEM_PROMPT
MAX_RETRIES = 5  # attempts per turn to get a parsable response


@weave.op
async def agent_loop(question: str) -> str:
    # oai = AsyncOpenAI()
    # model_name = "gpt-4.1-nano"

    oai = AsyncOpenAI(base_url=os.getenv("DEEPINFRA_API_LINK"), api_key=os.getenv("DEEPINFRA_API_KEY"))
    model_name = "deepseek-ai/DeepSeek-V3-0324"

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

    for turn in range(1, MAX_TURNS + 1):
        # Every failed attempt (API error, invalid JSON, no <tool>/<answer>)
        # uses up a retry, so this inner loop always terminates
        content, tool_call, answer = "", None, None
        for _ in range(MAX_RETRIES):
            try:
                completion = await oai.chat.completions.create(model=model_name, messages=messages)
                content = completion.choices[0].message.content or ""
                tool_call = parse_tool_from_response(content)
                answer = parse_answer_from_response(content)
                if tool_call or answer:
                    break
            except Exception as e:
                print(f"Error: {e}")
        else:
            # for-else: only reached when no attempt hit `break`
            raise ValueError(f"No valid tool call or answer after {MAX_RETRIES} attempts")

        print("=== Turn", turn, "===")

        thinking = parse_thinking_from_response(content)
        if thinking:
            print(f"Thinking: {thinking.strip()}")

        # Record the model's full reply as an assistant message (not a user message)
        messages.append({"role": "assistant", "content": content})

        if tool_call:
            try:
                tool_result = await call_tool(tool_call)
            except Exception as e:
                # Show the failure to the model instead of silently dropping it
                tool_result = f"Error: {e}"
            print(f"Tool call: {tool_call}")
            print(f"Tool result: {tool_result[:100]}")
            messages.append({"role": "user", "content": f"Tool result:\n{tool_result}"})
        else:
            return answer

    return "Max Turns Reached"


async def main():
    question = "List all the step-by-step instructions for Mango Sago"
    # gather() returns a list, so more questions can be added later
    result = await asyncio.gather(agent_loop(question))
    return result


result = asyncio.run(main())

print(result)
```

```text
=== Turn 1 ===
Thinking: To find the step-by-step instructions for Mango Sago, I need to first locate the recipe on Hebbar's Kitchen. Since the main URL for dessert recipes is provided, I will start by fetching the first page of dessert recipes to see if Mango Sago is listed there. If not, I will proceed to the next pages until I find it.
Tool call: {'name': 'fetch', 'args': {'url': 'https://hebbarskitchen.com/recipes/dessert-recipes/'}}
Tool result: dessert recipes | indian sweet & dessert recipes | easy dessert & pudding



[Facebook](https://www.


KeyboardInterrupt: 
```
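The run above was interrupted before it produced an answer, so there is nothing to score yet. The system prompt does state one checkable requirement, though: cite the final URL. Here is a hedged sketch of a deterministic reward for it. The hypothetical `citation_reward` extracts URLs from the answer and returns 1.0 when the answer cites `expected_url`, or any hebbarskitchen.com page if no expected URL is given. It returns 0.5 for a different hebbarskitchen.com page and 0.0 otherwise. The 0.5 partial credit is an arbitrary choice.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")
SITE = "hebbarskitchen.com"


def _on_site(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return host == SITE or host.endswith("." + SITE)


def citation_reward(answer: str, expected_url: str | None = None) -> float:
    """Score the URL citation in a final answer.

    1.0: cites expected_url (or, if expected_url is None, any hebbarskitchen.com page)
    0.5: cites a different hebbarskitchen.com page
    0.0: cites no hebbarskitchen.com page
    """
    # Strip sentence punctuation that the regex picks up at the end of a URL
    urls = [u.rstrip(".,;:") for u in URL_PATTERN.findall(answer)]
    site_urls = [u for u in urls if _on_site(u)]
    if not site_urls:
        return 0.0
    if expected_url is None:
        return 1.0
    target = expected_url.rstrip("/")
    return 1.0 if any(u.rstrip("/") == target for u in site_urls) else 0.5
```

Once the loop completes, this could be paired with a step-completeness check (deterministic or LLM-judged) to score the Mango Sago question.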