# Tool Calling Agent with ResponsesAgent

This notebook demonstrates building a production-ready tool-calling agent using MLflow's `ResponsesAgent` interface.

## Table of Contents
1. Understanding Tool Calling
2. Creating Tool Definitions
3. Building the Tool-Calling Agent
4. Testing the Tool-Calling Agent
5. Creating a Deployable Agent File
6. Logging and Serving

## Setup

In [None]:
import os
import json
from typing import Any, Callable, Generator
from uuid import uuid4
from dotenv import load_dotenv

import mlflow
import openai
import backoff
from pydantic import BaseModel

from openai import OpenAI
from mlflow.entities import SpanType
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
    ResponsesAgentStreamEvent,
)

# Load environment
load_dotenv()
assert "OPENAI_API_KEY" in os.environ, "Set OPENAI_API_KEY in .env"

# Set experiment
mlflow.set_experiment("Tool_Calling_Agent")

print("✅ Setup complete!")

## 1. Understanding Tool Calling

### What is Tool Calling?

Tool calling (also known as function calling) allows LLMs to:
1. Recognize when they need external information
2. Generate structured function calls
3. Receive and process function results
4. Generate final responses

### Tool Calling Flow:

```
User: "What's the weather in Boston?"
   ↓
LLM: Generates function call → get_weather(location="Boston")
   ↓
Tool: Executes → Returns "72°F, Sunny"
   ↓
LLM: Processes result → "The weather in Boston is 72°F and sunny."
```
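
In the OpenAI Responses item format that the agent in this notebook works with, the flow above corresponds to a short list of items. The sketch below builds that list by hand, with no API call; the `call_id` value, the coordinates, and the tool result are made-up placeholders.

```python
import json

# Illustrative conversation items; call_id and values are placeholders.
conversation = [
    {"role": "user", "content": "What's the weather in Boston?"},
    # 1. The model requests a tool call; arguments arrive as a JSON string.
    {
        "type": "function_call",
        "call_id": "call_123",
        "name": "get_weather",
        "arguments": json.dumps({"latitude": 42.36, "longitude": -71.06}),
    },
    # 2. The application runs the tool and reports the result under the same call_id.
    {
        "type": "function_call_output",
        "call_id": "call_123",
        "output": json.dumps({"temperature": 72, "condition": "sunny"}),
    },
    # 3. The model turns the tool result into a final answer.
    {
        "type": "message",
        "role": "assistant",
        "content": [
            {"type": "output_text", "text": "The weather in Boston is 72°F and sunny."}
        ],
    },
]

call, output = conversation[1], conversation[2]
print(json.loads(call["arguments"]))          # parsed tool arguments
print(call["call_id"] == output["call_id"])   # the output is matched to the call by call_id
```

The `call_id` link is what lets the model associate each tool result with the call that produced it.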

### ResponsesAgent Tool Calling Advantages:

✅ Returns intermediate tool calls (for transparency)

✅ Structured tool execution tracking

✅ Works with retry logic (added in this notebook with the `backoff` library)

✅ Full tracing of tool invocations

✅ OpenAI-compatible tool specifications

## 2. Creating Tool Definitions

In [None]:
class ToolInfo(BaseModel):
    """
    Represents a tool that the agent can use.
    
    Attributes:
        name: Tool identifier
        spec: OpenAI-compatible tool specification
        exec_fn: Python function that implements the tool
    """
    name: str
    spec: dict
    exec_fn: Callable


# Define actual tool implementations
def get_weather(latitude: float, longitude: float) -> dict:
    """
    Mock weather API - in production, this would call a real API.
    """
    # Simulate API call
    return {
        "temperature": 72,
        "unit": "fahrenheit",
        "condition": "sunny",
        "location": f"({latitude}, {longitude})"
    }


def calculate(operation: str, x: float, y: float) -> float:
    """
    Perform a basic arithmetic operation.

    Raises ValueError for an unknown operation or division by zero;
    the agent's execute_tool turns these into an error string for the LLM.
    """
    operations = {
        "add": lambda a, b: a + b,
        "subtract": lambda a, b: a - b,
        "multiply": lambda a, b: a * b,
        "divide": lambda a, b: a / b,
    }

    if operation not in operations:
        raise ValueError(f"Unknown operation {operation}")
    if operation == "divide" and y == 0:
        raise ValueError("Division by zero")

    return operations[operation](x, y)


# Create tool specifications
TOOLS = [
    ToolInfo(
        name="get_weather",
        spec={
            "type": "function",
            "name": "get_weather",
            "description": "Get current temperature and weather conditions for coordinates.",
            "parameters": {
                "type": "object",
                "properties": {
                    "latitude": {
                        "type": "number",
                        "description": "Latitude coordinate"
                    },
                    "longitude": {
                        "type": "number",
                        "description": "Longitude coordinate"
                    },
                },
                "required": ["latitude", "longitude"],
                "additionalProperties": False,
            },
            "strict": True,
        },
        exec_fn=get_weather,
    ),
    ToolInfo(
        name="calculate",
        spec={
            "type": "function",
            "name": "calculate",
            "description": "Perform mathematical calculations: add, subtract, multiply, divide.",
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {
                        "type": "string",
                        "enum": ["add", "subtract", "multiply", "divide"],
                        "description": "Mathematical operation to perform"
                    },
                    "x": {
                        "type": "number",
                        "description": "First number"
                    },
                    "y": {
                        "type": "number",
                        "description": "Second number"
                    },
                },
                "required": ["operation", "x", "y"],
            },
        },
        exec_fn=calculate,
    ),
]

print(f"✅ Defined {len(TOOLS)} tools: {[t.name for t in TOOLS]}")
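
Because each tool pairs a hand-written spec with a Python function, the two can drift apart. The following is a minimal sketch of a consistency check, assuming the spec layout used above (`parameters.properties`); `check_spec_matches`, `demo_weather`, and `demo_spec` are hypothetical names introduced only for this illustration.

```python
import inspect


def check_spec_matches(spec: dict, fn) -> list[str]:
    """Return a list of mismatches between a tool spec and its function signature."""
    declared = set(spec["parameters"]["properties"])
    actual = set(inspect.signature(fn).parameters)
    problems = []
    for name in sorted(declared - actual):
        problems.append(f"spec declares '{name}' but the function does not accept it")
    for name in sorted(actual - declared):
        problems.append(f"function accepts '{name}' but the spec does not declare it")
    return problems


# Stand-in tool so the example is self-contained
def demo_weather(latitude: float, longitude: float) -> dict:
    return {"temperature": 72}


# Deliberately wrong spec: "lon" does not match the "longitude" parameter
demo_spec = {
    "type": "function",
    "name": "demo_weather",
    "parameters": {
        "type": "object",
        "properties": {"latitude": {"type": "number"}, "lon": {"type": "number"}},
        "required": ["latitude", "lon"],
    },
}

for problem in check_spec_matches(demo_spec, demo_weather):
    print(problem)
```

In this notebook the same check could be run over `TOOLS` by passing each `tool.spec` and `tool.exec_fn`; an empty list means the names line up.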

## 3. Building the Tool-Calling Agent

In [None]:
class ToolCallingAgent(ResponsesAgent):
    """
    Production-ready tool-calling agent with the ResponsesAgent interface.
    
    Features:
    - Automatic tool execution
    - Iterative tool calling (agent can call multiple tools)
    - Full tracing of tool invocations
    - Retry logic with exponential backoff
    - Structured error handling
    """
    
    def __init__(self, model: str, tools: list[ToolInfo], system_prompt: str = None):
        """Initialize agent with model and tools."""
        self.model = model
        self.client = OpenAI()
        self._tools_dict = {tool.name: tool for tool in tools}
        self.system_prompt = system_prompt or "You are a helpful assistant with access to tools."
    
    def get_tool_specs(self) -> list[dict]:
        """Return tool specifications for OpenAI API."""
        return [tool_info.spec for tool_info in self._tools_dict.values()]
    
    @mlflow.trace(span_type=SpanType.TOOL)
    def execute_tool(self, tool_name: str, args: dict) -> Any:
        """
        Execute a tool with given arguments.
        
        This is traced separately for observability.
        """
        try:
            result = self._tools_dict[tool_name].exec_fn(**args)
            return result
        except Exception as e:
            return f"Error executing tool: {str(e)}"
    
    @backoff.on_exception(backoff.expo, openai.RateLimitError)
    @mlflow.trace(span_type=SpanType.LLM)
    def call_llm(self, input_messages) -> dict:
        """
        Call the LLM with retry logic.

        Uses exponential backoff for rate limits. Only the first output item is
        returned, so parallel tool calls are disabled to avoid dropping calls.
        """
        response = self.client.responses.create(
            model=self.model,
            input=input_messages,
            tools=self.get_tool_specs(),
            parallel_tool_calls=False,
        )
        return response.output[0].model_dump(exclude_none=True)
    
    def handle_tool_call(self, tool_call: dict[str, Any]) -> ResponsesAgentStreamEvent:
        """
        Execute a tool call and return the result as a stream event.
        """
        # Parse arguments
        args = json.loads(tool_call["arguments"])
        
        # Execute tool
        result = self.execute_tool(tool_name=tool_call["name"], args=args)
        
        # Convert to string for LLM
        result_str = json.dumps(result) if isinstance(result, dict) else str(result)
        
        # Create tool output event
        tool_call_output = {
            "type": "function_call_output",
            "call_id": tool_call["call_id"],
            "output": result_str,
        }
        
        return ResponsesAgentStreamEvent(
            type="response.output_item.done",
            item=tool_call_output
        )
    
    def call_and_run_tools(
        self,
        input_messages: list,
        max_iter: int = 10,
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        """
        Iteratively call LLM and execute tools until completion.
        
        This implements the agent loop:
        1. Call LLM
        2. If tool call -> execute and continue
        3. If text response -> done
        4. Repeat until max iterations
        """
        for iteration in range(max_iter):
            last_msg = input_messages[-1]
            
            # Check if we're done (assistant message with text)
            if (
                last_msg.get("type") == "message"
                and last_msg.get("role") == "assistant"
            ):
                return
            
            # Execute tool if last message is a tool call
            if last_msg.get("type") == "function_call":
                tool_call_res = self.handle_tool_call(last_msg)
                input_messages.append(tool_call_res.item)
                yield tool_call_res
            else:
                # Call LLM
                llm_output = self.call_llm(input_messages=input_messages)
                input_messages.append(llm_output)
                yield ResponsesAgentStreamEvent(
                    type="response.output_item.done",
                    item=llm_output,
                )
        
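        # The loop may end right after the final answer arrived; only warn otherwise
        last_msg = input_messages[-1]
        if last_msg.get("type") == "message" and last_msg.get("role") == "assistant":
            return

        # Max iterations reached without a final assistant message
        yield ResponsesAgentStreamEvent(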
            type="response.output_item.done",
            item={
                "id": str(uuid4()),
                "content": [{
                    "type": "output_text",
                    "text": "Maximum iterations reached. Task may be incomplete.",
                }],
                "role": "assistant",
                "type": "message",
            },
        )
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        """Non-streaming prediction."""
        outputs = [
            event.item
            for event in self.predict_stream(request)
            if event.type == "response.output_item.done"
        ]
        return ResponsesAgentResponse(
            output=outputs,
            custom_outputs=request.custom_inputs
        )
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict_stream(
        self, request: ResponsesAgentRequest
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        """Streaming prediction with tool execution."""
        # Prepare input messages
        input_messages = [
            {"role": "system", "content": self.system_prompt}
        ] + [i.model_dump() for i in request.input]
        
        # Run agent loop
        yield from self.call_and_run_tools(input_messages=input_messages)


print("✅ ToolCallingAgent class defined!")
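
The agent loop can be exercised without any network access by swapping the model for a scripted stand-in. The sketch below is a simplified, standalone version of the same loop logic, not the class above; `run_loop` and `scripted_llm` are hypothetical helpers, and the scripted model always requests one `add` call before answering.

```python
import json
from typing import Callable


def run_loop(
    messages: list[dict],
    llm: Callable[[list[dict]], dict],
    tools: dict[str, Callable],
    max_iter: int = 10,
) -> list[dict]:
    """Alternate between the LLM and tools until an assistant message appears."""
    for _ in range(max_iter):
        last = messages[-1]
        if last.get("type") == "message" and last.get("role") == "assistant":
            break  # final answer reached
        if last.get("type") == "function_call":
            # Run the requested tool and append its result under the same call_id
            result = tools[last["name"]](**json.loads(last["arguments"]))
            messages.append({
                "type": "function_call_output",
                "call_id": last["call_id"],
                "output": json.dumps(result),
            })
        else:
            messages.append(llm(messages))
    return messages


def scripted_llm(messages: list[dict]) -> dict:
    """Hypothetical stand-in for the model: request one tool call, then answer."""
    if messages[-1].get("type") == "function_call_output":
        text = f"Result: {messages[-1]['output']}"
        return {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": text}],
        }
    return {
        "type": "function_call",
        "call_id": "call_1",
        "name": "add",
        "arguments": json.dumps({"x": 2, "y": 3}),
    }


history = run_loop(
    [{"role": "user", "content": "What is 2 + 3?"}],
    llm=scripted_llm,
    tools={"add": lambda x, y: x + y},
)
print([m.get("type", m.get("role")) for m in history])
print(history[-1]["content"][0]["text"])
```

A scripted model like this makes the termination conditions easy to unit test before spending API calls on the real agent.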

## 4. Testing the Tool-Calling Agent

In [None]:
# Enable tracing
mlflow.openai.autolog()

# Create agent
agent = ToolCallingAgent(
    model="gpt-4o-mini",
    tools=TOOLS,
    system_prompt="You are a helpful assistant with access to weather and calculator tools."
)

print("Testing tool-calling agent...\n")

# Test 1: Weather tool
print("=" * 60)
print("Test 1: Weather Query")
print("=" * 60)
response1 = agent.predict({
    "input": [{
        "role": "user",
        "content": "What's the weather at coordinates 42.3601, -71.0589 (Boston)?"
    }]
})

# Output items may be pydantic objects, so convert the response to plain dicts first
items1 = response1.model_dump(exclude_none=True)["output"]

print(f"\nNumber of output items: {len(items1)}")
for i, item in enumerate(items1):
    print(f"\nItem {i+1}: {item.get('type')}")
    if item.get('type') == 'function_call':
        print(f"  Tool: {item.get('name')}")
        print(f"  Args: {item.get('arguments')}")
    elif item.get('type') == 'function_call_output':
        print(f"  Result: {item.get('output')}")
    elif item.get('type') == 'message':
        print(f"  Response: {item['content'][0]['text']}")

# Test 2: Calculator tool
print("\n" + "=" * 60)
print("Test 2: Math Calculation")
print("=" * 60)
response2 = agent.predict({
    "input": [{
        "role": "user",
        "content": "What is 125 multiplied by 48?"
    }]
})

# Convert to plain dicts before indexing into the output items
for item in response2.model_dump(exclude_none=True)["output"]:
    if item.get('type') == 'message':
        print(f"\nFinal Response: {item['content'][0]['text']}")

print("\n✅ Tool-calling agent tests complete!")

## 5. Creating a Deployable Agent File

In [None]:
%%writefile tool_calling_agent.py
"""Production tool-calling agent with ResponsesAgent."""

import os
import json
from typing import Any, Callable, Generator
from uuid import uuid4

import mlflow
import openai
import backoff
from pydantic import BaseModel
from openai import OpenAI

from mlflow.entities import SpanType
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
    ResponsesAgentStreamEvent,
)


class ToolInfo(BaseModel):
    name: str
    spec: dict
    exec_fn: Callable


def get_weather(latitude: float, longitude: float) -> dict:
    """Mock weather API."""
    return {
        "temperature": 72,
        "unit": "fahrenheit",
        "condition": "sunny",
    }


def calculate(operation: str, x: float, y: float) -> float:
    """Perform calculations; raises ValueError on bad input (execute_tool reports it)."""
    ops = {
        "add": lambda a, b: a + b,
        "subtract": lambda a, b: a - b,
        "multiply": lambda a, b: a * b,
        "divide": lambda a, b: a / b,
    }
    if operation not in ops:
        raise ValueError(f"Unknown operation {operation}")
    if operation == "divide" and y == 0:
        raise ValueError("Division by zero")
    return ops[operation](x, y)


class ToolCallingAgent(ResponsesAgent):
    def __init__(self, model: str, tools: list[ToolInfo], system_prompt: str = None):
        self.model = model
        self.client = OpenAI()
        self._tools_dict = {tool.name: tool for tool in tools}
        self.system_prompt = system_prompt or "You are a helpful assistant."
    
    def get_tool_specs(self) -> list[dict]:
        return [tool_info.spec for tool_info in self._tools_dict.values()]
    
    @mlflow.trace(span_type=SpanType.TOOL)
    def execute_tool(self, tool_name: str, args: dict) -> Any:
        try:
            return self._tools_dict[tool_name].exec_fn(**args)
        except Exception as e:
            return f"Error: {str(e)}"
    
    @backoff.on_exception(backoff.expo, openai.RateLimitError)
    @mlflow.trace(span_type=SpanType.LLM)
    def call_llm(self, input_messages) -> dict:
        # Only the first output item is used, so request one tool call at a time
        response = self.client.responses.create(
            model=self.model,
            input=input_messages,
            tools=self.get_tool_specs(),
            parallel_tool_calls=False,
        )
        return response.output[0].model_dump(exclude_none=True)
    
    def handle_tool_call(self, tool_call: dict) -> ResponsesAgentStreamEvent:
        args = json.loads(tool_call["arguments"])
        result = self.execute_tool(tool_call["name"], args)
        result_str = json.dumps(result) if isinstance(result, dict) else str(result)
        
        return ResponsesAgentStreamEvent(
            type="response.output_item.done",
            item={
                "type": "function_call_output",
                "call_id": tool_call["call_id"],
                "output": result_str,
            }
        )
    
    def call_and_run_tools(
        self, input_messages: list, max_iter: int = 10
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        for _ in range(max_iter):
            last_msg = input_messages[-1]
            
            if (
                last_msg.get("type") == "message"
                and last_msg.get("role") == "assistant"
            ):
                return
            
            if last_msg.get("type") == "function_call":
                tool_res = self.handle_tool_call(last_msg)
                input_messages.append(tool_res.item)
                yield tool_res
            else:
                llm_out = self.call_llm(input_messages)
                input_messages.append(llm_out)
                yield ResponsesAgentStreamEvent(
                    type="response.output_item.done", item=llm_out
                )
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        outputs = [
            event.item
            for event in self.predict_stream(request)
            if event.type == "response.output_item.done"
        ]
        return ResponsesAgentResponse(
            output=outputs, custom_outputs=request.custom_inputs
        )
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict_stream(
        self, request: ResponsesAgentRequest
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        input_messages = [
            {"role": "system", "content": self.system_prompt}
        ] + [i.model_dump() for i in request.input]
        yield from self.call_and_run_tools(input_messages)


# Define tools
TOOLS = [
    ToolInfo(
        name="get_weather",
        spec={
            "type": "function",
            "name": "get_weather",
            "description": "Get weather for coordinates.",
            "parameters": {
                "type": "object",
                "properties": {
                    "latitude": {"type": "number"},
                    "longitude": {"type": "number"},
                },
                "required": ["latitude", "longitude"],
            },
        },
        exec_fn=get_weather,
    ),
    ToolInfo(
        name="calculate",
        spec={
            "type": "function",
            "name": "calculate",
            "description": "Perform math operations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "operation": {
                        "type": "string",
                        "enum": ["add", "subtract", "multiply", "divide"],
                    },
                    "x": {"type": "number"},
                    "y": {"type": "number"},
                },
                "required": ["operation", "x", "y"],
            },
        },
        exec_fn=calculate,
    ),
]

# Enable tracing
mlflow.openai.autolog()

# Create and set model
AGENT = ToolCallingAgent(
    model="gpt-4o-mini",
    tools=TOOLS,
    system_prompt="You are a helpful assistant with weather and calculator tools."
)
mlflow.models.set_model(AGENT)

## 6. Logging and Serving

In [None]:
# Log the tool-calling agent
with mlflow.start_run(run_name="tool_calling_agent") as run:
    model_info = mlflow.pyfunc.log_model(
        python_model="tool_calling_agent.py",
        artifact_path="agent",
        pip_requirements=[
            "mlflow",
            "openai",
            "pydantic>=2.0.0",
            "backoff",
        ],
    )
    
    print("✅ Tool-calling agent logged!")
    print(f"Model URI: {model_info.model_uri}")
    print(f"\nTo serve: mlflow models serve -m {model_info.model_uri} -p 5001")
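
Once the server from the `mlflow models serve` command is running (with `OPENAI_API_KEY` set in its environment, since the agent file creates an `OpenAI()` client on load), it can be queried over HTTP. The sketch below only builds the request. It assumes the scoring server's `/invocations` endpoint accepts the same `{"input": [...]}` payload that `agent.predict` takes in this notebook; `build_invocation_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_invocation_request(question: str, port: int = 5001) -> urllib.request.Request:
    """Build (but do not send) a POST request for the local scoring server."""
    # Assumed payload shape: mirrors the dict passed to agent.predict above
    payload = {"input": [{"role": "user", "content": question}]}
    return urllib.request.Request(
        url=f"http://127.0.0.1:{port}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_invocation_request("What is 125 multiplied by 48?")
print(req.full_url)
print(req.data.decode("utf-8"))

# With the server running, the request could be sent like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Keeping request construction separate from sending makes the payload easy to inspect before any call reaches the server.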

## Summary

### What We Built:

1. ✅ Tool specifications (OpenAI compatible)
2. ✅ Tool execution framework
3. ✅ Iterative agent loop (LLM → Tool → LLM)
4. ✅ Full tracing and observability
5. ✅ Deployable agent file, logged to MLflow and ready to serve

### Key Concepts:

- **Tool Definition**: Spec + Implementation
- **Agent Loop**: Iterate until completion
- **Transparency**: Return all intermediate steps
- **Tracing**: Track every tool execution

### Production Considerations:

- Add authentication for sensitive tools
- Implement rate limiting
- Add tool result validation
- Monitor tool usage and costs
- Handle tool failures gracefully
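
Two of the items above, argument validation and graceful failure handling, can be sketched in a few lines. This is a simplified illustration rather than a full JSON Schema validator: it checks only required keys, unexpected keys, and `enum` values, and the names `validate_args`, `safe_execute`, `divide_spec`, and `divide` are hypothetical.

```python
from typing import Any, Callable


def validate_args(spec: dict, args: dict) -> list[str]:
    """Check required keys, unexpected keys, and enum values against a tool spec."""
    params = spec.get("parameters", {})
    props = params.get("properties", {})
    errors = [
        f"missing required argument '{key}'"
        for key in params.get("required", [])
        if key not in args
    ]
    errors += [f"unexpected argument '{key}'" for key in args if key not in props]
    for key, value in args.items():
        allowed = props.get(key, {}).get("enum")
        if allowed is not None and value not in allowed:
            errors.append(f"'{key}' must be one of {allowed}")
    return errors


def safe_execute(spec: dict, fn: Callable[..., Any], args: dict) -> dict:
    """Run a tool and always return a structured result the LLM can read."""
    errors = validate_args(spec, args)
    if errors:
        return {"ok": False, "error": "; ".join(errors)}
    try:
        return {"ok": True, "result": fn(**args)}
    except Exception as exc:  # report the failure instead of crashing the agent loop
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


# Hypothetical tool used only for this example
divide_spec = {
    "parameters": {
        "type": "object",
        "properties": {"x": {"type": "number"}, "y": {"type": "number"}},
        "required": ["x", "y"],
    }
}


def divide(x: float, y: float) -> float:
    return x / y


print(safe_execute(divide_spec, divide, {"x": 6, "y": 3}))  # valid call
print(safe_execute(divide_spec, divide, {"x": 6}))          # rejected before execution
print(safe_execute(divide_spec, divide, {"x": 6, "y": 0}))  # tool raised, error reported
```

Returning a structured `ok`/`error` result gives the model a consistent shape to react to, which is one way to let it retry with corrected arguments.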