# MLflow ResponsesAgent - Basics

This notebook introduces MLflow's **ResponsesAgent** class (`mlflow.pyfunc.ResponsesAgent`) and explains why it is useful for building production-ready AI agents.

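## Table of Contents
1. What is ResponsesAgent?
2. Why Do We Need ResponsesAgent?
3. Key Features
4. Basic ResponsesAgent Implementation
5. Understanding Request and Response Schemas
6. Helper Methods in ResponsesAgent
7. Logging a ResponsesAgent
8. Loading and Testing the Logged Model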

## Setup

In [2]:
import os
from dotenv import load_dotenv
import mlflow
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
    ResponsesAgentStreamEvent,
)
from typing import Generator

# Load environment variables
load_dotenv()

# Set MLflow tracking
mlflow.set_experiment("ResponseAgent_Basics")

print(f"MLflow version: {mlflow.__version__}")

MLflow version: 3.8.1


## 1. What is ResponsesAgent?

**ResponsesAgent** is a specialized subclass of MLflow's `PythonModel` that provides a **framework-agnostic** interface for serving generative AI agents, with support for tool calling, multi-agent flows, and streaming.

### Key Characteristics:
- **Subclass of PythonModel**: Inherits all MLflow model capabilities
- **Framework-agnostic**: Works with any agent framework (LangGraph, LangChain, custom implementations)
- **OpenAI API Compatible**: Follows OpenAI's Responses API standard
- **Structured I/O**: Uses well-defined request/response schemas
- **Production-ready**: Built for deployment and serving

## 2. Why Do We Need ResponsesAgent?

### Problems with Traditional Approaches:

1. **Framework Lock-in**: Agents built with specific frameworks are hard to migrate
2. **No Standardized I/O**: Different agents use different message formats
3. **Tool Calling Complexity**: Handling function calls requires custom logic
4. **Multi-agent Support**: Traditional models don't support multi-agent scenarios
5. **Deployment Challenges**: Difficult to serve and scale custom agents
6. **Observability Gaps**: Hard to track intermediate steps in agent execution

### How ResponsesAgent Solves These:

✅ **Framework Independence**: Wrap any agent implementation

✅ **Standard Interface**: OpenAI Responses API compatibility

✅ **Built-in Tool Support**: Native function calling capabilities

✅ **Multi-agent Ready**: Support for complex agent interactions

✅ **Easy Deployment**: Log once, deploy anywhere with MLflow

✅ **Full Tracing**: Integrated with MLflow tracing for observability

## 3. Key Features

### Feature Comparison Table

| Feature | ChatModel | ChatAgent | ResponsesAgent |
|---------|-----------|-----------|----------------|
| Basic Chat | ✅ | ✅ | ✅ |
| Tool Calling | ❌ | ✅ | ✅ |
| Multi-turn Dialog | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ |
| Multi-agent | ❌ | ❌ | ✅ |
| OpenAI Compatible | ❌ | ⚠️ Partial | ✅ Full |
| Intermediate Outputs | ❌ | ⚠️ Limited | ✅ Full |
| Annotations | ❌ | ❌ | ✅ |
| Custom Outputs | ❌ | ❌ | ✅ |

### ResponsesAgent Advantages:

1. **Multiple Output Messages**: Return intermediate tool calls and results
2. **Multi-agent Orchestration**: Coordinate between multiple AI agents
3. **Full OpenAI Compatibility**: Works with OpenAI SDKs and UIs
4. **Flexible Output**: Custom outputs via `custom_outputs` field
5. **Enhanced Tracing**: Better observability for debugging

## 4. Basic ResponsesAgent Implementation

Let's create a simple ResponsesAgent that demonstrates the core concepts.

In [8]:
from mlflow.entities.span import SpanType
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
)


class SimpleResponsesAgent(ResponsesAgent):
    """
    A basic ResponsesAgent that echoes user messages with a greeting.
    
    This demonstrates:
    - Basic ResponsesAgent structure
    - Request/Response handling
    - Text output creation
    - Custom outputs
    """
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        """
        Main prediction method.
        
        Args:
            request: ResponsesAgentRequest with user input
            
        Returns:
            ResponsesAgentResponse with agent output
        """
        # Extract user message from request
        user_message = request.input[-1].content
        
        # Create response text
        response_text = f"Hello! You said: '{user_message}'. How can I help you today?"
        
        # Create structured response
        return ResponsesAgentResponse(
            output=[
                self.create_text_output_item(
                    text=response_text,
                    id="msg_1",
                )
            ],
            # Add custom metadata
            custom_outputs={
                "agent_type": "simple_echo",
                "processed_at": "2026-01-29",
            },
        )


# Test the agent
print("Testing SimpleResponsesAgent...\n")
agent = SimpleResponsesAgent()

# Create a test request
test_request = {
    "input": [{"role": "user", "content": "What is MLflow?"}],
    "context": {"user_id": "test_user", "session_id": "session_123"},
}

# Get response
response = agent.predict(test_request)
response




Testing SimpleResponsesAgent...



ResponsesAgentResponse(tool_choice=None, truncation=None, id=None, created_at=None, error=None, incomplete_details=None, instructions=None, metadata=None, model=None, object='response', output=[OutputItem(type='message', id='msg_1', content=[{'text': "Hello! You said: 'What is MLflow?'. How can I help you today?", 'type': 'output_text'}], role='assistant')], parallel_tool_calls=None, temperature=None, tools=None, top_p=None, max_output_tokens=None, previous_response_id=None, reasoning=None, status=None, text=None, usage=None, user=None, custom_outputs={'agent_type': 'simple_echo', 'processed_at': '2026-01-29'})

In [9]:
# The output item is a Pydantic model (attribute access), but its content entries are plain dicts (key access)
output_item = response.output[0]
print(f"Agent Response: {output_item.content[0]['text']}")
print(f"Custom Outputs: {response.custom_outputs}")
print(f"\nOutput Item Type: {output_item.type}")
print(f"Output Item ID: {output_item.id}")
print(f"Output Item Role: {output_item.role}")

Agent Response: Hello! You said: 'What is MLflow?'. How can I help you today?
Custom Outputs: {'agent_type': 'simple_echo', 'processed_at': '2026-01-29'}

Output Item Type: message
Output Item ID: msg_1
Output Item Role: assistant


## 5. Understanding Request and Response Schemas

### ResponsesAgentRequest Schema

```python
{
    "input": [  # List of messages
        {
            "role": "user" | "assistant" | "system",
            "content": "message text"
        }
    ],
    "context": {  # Optional context data
        "user_id": "...",
        "session_id": "...",
        # Any custom fields
    },
    "tools": [...]  # Optional tool definitions
}
```
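
As a minimal sketch of how a multi-turn conversation maps onto the request shape above, the snippet below builds such a dict with plain Python. The helper name `build_request` and the `(role, text)` tuple format are illustrative assumptions, not part of MLflow.

```python
from typing import Any, Optional


def build_request(
    history: list[tuple[str, str]],
    context: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a request dict in the shape shown above (hypothetical helper).

    `history` is a list of (role, text) pairs, oldest first.
    """
    allowed_roles = {"user", "assistant", "system"}
    messages = []
    for role, text in history:
        if role not in allowed_roles:
            raise ValueError(f"unsupported role: {role!r}")
        messages.append({"role": role, "content": text})
    if not messages:
        raise ValueError("at least one message is required")

    request: dict[str, Any] = {"input": messages}
    if context:
        # Copy so later changes to the caller's dict do not leak into the request
        request["context"] = dict(context)
    return request


request = build_request(
    [
        ("user", "What is MLflow?"),
        ("assistant", "A platform for managing the ML lifecycle."),
        ("user", "Does it support agents?"),
    ],
    context={"user_id": "test_user", "session_id": "session_123"},
)
print(request["input"][-1]["content"])
```

The last entry in `input` is the newest message, which is why the agents in this notebook read `request.input[-1]`.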

### ResponsesAgentResponse Schema

```python
{
    "output": [  # List of output items
        {
            "type": "message",
            "id": "msg_1",
            "role": "assistant",
            "content": [
                {
                    "type": "output_text",
                    "text": "response text"
                }
            ]
        }
    ],
    "custom_outputs": {  # Optional custom data
        "key": "value"
    }
}
```
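
Because `output` can hold several items of different types, client code usually has to pick out the assistant text. The following is a rough sketch that assumes the response is available as a plain dict in the shape above; `extract_text` is a hypothetical helper, not an MLflow function.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Join every output_text part found in message items (hypothetical helper)."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool calls, reasoning, and other item types
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "\n".join(parts)


sample_response = {
    "output": [
        {
            "type": "function_call",
            "id": "fc_1",
            "call_id": "call_123",
            "name": "calculator",
            "arguments": '{"operation": "add", "x": 5, "y": 3}',
        },
        {
            "type": "message",
            "id": "msg_1",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "The sum is 8."}],
        },
    ],
    "custom_outputs": {"key": "value"},
}

text = extract_text(sample_response)
print(text)
```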

## 6. Helper Methods in ResponsesAgent

ResponsesAgent provides convenient helper methods for creating output items:

In [None]:
# Example of all helper methods
class DemoAgent(ResponsesAgent):
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        outputs = []
        
        # 1. Text output
        outputs.append(
            self.create_text_output_item(
                text="This is a text response",
                id="text_1"
            )
        )
        
        # 2. Function call (tool invocation)
        outputs.append(
            self.create_function_call_item(
                id="fc_1",
                call_id="call_123",
                name="calculator",
                arguments='{"operation": "add", "x": 5, "y": 3}'
            )
        )
        
        # 3. Function call output (tool result)
        outputs.append(
            self.create_function_call_output_item(
                call_id="call_123",
                output="8"
            )
        )
        
        # 4. Reasoning (chain-of-thought)
        outputs.append(
            self.create_reasoning_item(
                id="reason_1",
                reasoning_text="Let me think step by step...",
            )
        )
        
        return ResponsesAgentResponse(output=outputs)

# Note: This is just a demonstration of available methods
print("✅ Helper methods demonstrated above")
print("\nAvailable helper methods:")
print("1. create_text_output_item()")
print("2. create_function_call_item()")
print("3. create_function_call_output_item()")
print("4. create_reasoning_item()")
print("5. create_text_delta() - for streaming")
print("6. create_annotation_added() - for streaming")
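
A function call item and its result are linked only through the shared `call_id`. As an illustration of how a consumer might rejoin them, the sketch below works on plain dicts that mirror the items built in `DemoAgent`; the helper name `pair_tool_calls` and the returned structure are assumptions made for this example.

```python
import json
from typing import Any


def pair_tool_calls(output_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Match each function_call item with its function_call_output via call_id."""
    results_by_call_id = {
        item["call_id"]: item["output"]
        for item in output_items
        if item.get("type") == "function_call_output"
    }
    paired = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue
        paired.append(
            {
                "name": item["name"],
                # arguments arrive as a JSON string, so decode them here
                "arguments": json.loads(item["arguments"]),
                # None when the tool result has not been emitted yet
                "result": results_by_call_id.get(item["call_id"]),
            }
        )
    return paired


items = [
    {
        "type": "function_call",
        "id": "fc_1",
        "call_id": "call_123",
        "name": "calculator",
        "arguments": '{"operation": "add", "x": 5, "y": 3}',
    },
    {"type": "function_call_output", "call_id": "call_123", "output": "8"},
]
pairs = pair_tool_calls(items)
print(pairs)
```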

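## 7. Logging a ResponsesAgent

A ResponsesAgent is logged with MLflow's "Models from Code" approach: the agent lives in a standalone Python file that calls `mlflow.models.set_model()`, and `log_model` receives the path to that file.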

In [None]:
# Save our agent to a Python file
agent_code = '''
import mlflow
from mlflow.entities.span import SpanType
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
)


class SimpleResponsesAgent(ResponsesAgent):
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        user_message = request.input[-1].content
        response_text = f"Echo: {user_message}"
        
        return ResponsesAgentResponse(
            output=[
                self.create_text_output_item(
                    text=response_text,
                    id="msg_1",
                )
            ]
        )


# Set model for MLflow
mlflow.models.set_model(SimpleResponsesAgent())
'''

# Write to file
with open("simple_agent.py", "w") as f:
    f.write(agent_code)

# Log the model
with mlflow.start_run(run_name="simple_responses_agent"):
    model_info = mlflow.pyfunc.log_model(
        python_model="simple_agent.py",
        name="agent",  # MLflow 3.x: `name` replaces the deprecated `artifact_path`
        # Signature and metadata are auto-inferred for ResponsesAgent
    )
    
    print("✅ Model logged successfully!")
    print(f"Model URI: {model_info.model_uri}")
    print("\nAuto-generated metadata:")
    # metadata may be None, so fall back to an empty dict before the lookup
    print(f"  Task: {(model_info.metadata or {}).get('task', 'N/A')}")
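
Since the code file must call `set_model` for Models from Code to find the agent, a quick check before logging can catch a missing call early. This is only a sketch using the standard library `ast` module; `calls_set_model` is a hypothetical helper, and it inspects the source text without importing or running it.

```python
import ast


def calls_set_model(source: str) -> bool:
    """Return True if the source contains a call to a function named set_model."""
    tree = ast.parse(source)  # raises SyntaxError if the file is not valid Python
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Covers both `mlflow.models.set_model(...)` and a bare `set_model(...)`
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name == "set_model":
            return True
    return False


good = "import mlflow\nmlflow.models.set_model(object())\n"
bad = "import mlflow\nprint('no model registered')\n"
print(calls_set_model(good), calls_set_model(bad))
```

In the notebook, the same check could be run on the `agent_code` string before writing it to `simple_agent.py`.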

## 8. Loading and Testing the Logged Model

In [None]:
# Load the model
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

# Test it
test_input = {
    "input": [{"role": "user", "content": "Hello, ResponseAgent!"}],
    "context": {"user_id": "user_123"},
}

response = loaded_model.predict(test_input)
print(f"Loaded model response: {response}")

## Summary

### What We Learned:

1. ✅ **What ResponsesAgent is**: A framework-agnostic agent interface
2. ✅ **Why we need it**: Standardization, deployment, multi-agent support
3. ✅ **Key features**: OpenAI compatibility, tool calling, streaming
4. ✅ **Basic implementation**: Creating and using a ResponsesAgent
5. ✅ **Logging and loading**: The Models from Code approach

### Next Steps:
- Proceed to notebook 03 for LangGraph integration
- Learn about tool calling with ResponsesAgent
- Explore streaming responses
- Deploy agents for production use