# ResponsesAgent with ChatCompletions API

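This notebook shows how to wrap OpenAI's ChatCompletions API in an MLflow `ResponsesAgent`: incoming Responses-style requests are converted to ChatCompletions messages, the model is called, and the result is converted back to the Responses format.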

## Table of Contents
1. ChatCompletions vs Responses API
2. Basic ChatCompletions Integration
3. Understanding Format Conversion Utilities
4. Multi-turn Conversations
5. Advanced: Context and Custom Inputs
6. Complete Production Agent

## Setup

In [None]:
import os
from dotenv import load_dotenv
from typing import Generator

import mlflow
from mlflow.entities.span import SpanType
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
    ResponsesAgentStreamEvent,
    output_to_responses_items_stream,
    to_chat_completions_input,
)
from openai import OpenAI

# Load environment
load_dotenv()
assert "OPENAI_API_KEY" in os.environ, "Please set OPENAI_API_KEY in .env file"

# Set experiment
mlflow.set_experiment("ChatCompletions_ResponseAgent")

print(f"MLflow version: {mlflow.__version__}")
print("✅ Setup complete!")

## 1. ChatCompletions vs Responses API

OpenAI offers two main APIs for conversational AI:

### ChatCompletions API
- **Mature and stable**: Available since GPT-3.5
- **Wide support**: Most libraries and tools support it
- **Simple format**: messages with role and content
- **Streaming**: Supports token-by-token streaming

```python
# ChatCompletions format
{
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ]
}
```

### Responses API (Newer)
- **Richer output**: Supports multiple output types
- **Built-in tools**: Native tool/function support
- **Annotations**: Link citations and references
- **Multi-agent**: Better support for agent workflows

```python
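# Responses API format: "input" belongs to the request, "output" to the response
# (shown side by side here for comparison)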
{
    "input": [{"role": "user", "content": "Hello!"}],
    "output": [{"type": "message", "content": [...]}]
}
```

### MLflow ResponsesAgent
- Uses **Responses API format** for I/O
- Provides **conversion utilities** for ChatCompletions
- Enables **framework flexibility**

## 2. Basic ChatCompletions Integration

Let's build a ResponsesAgent that uses OpenAI's ChatCompletions API.

In [None]:
class ChatCompletionsAgent(ResponsesAgent):
    """
    ResponsesAgent that uses OpenAI ChatCompletions API.
    
    This demonstrates:
    - Converting ResponsesAgent input to ChatCompletions format
    - Calling ChatCompletions API
    - Converting response back to ResponsesAgent format
    """
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.model = model
        self.client = OpenAI()
    
    @mlflow.trace(span_type=SpanType.LLM)
    def call_llm(self, messages: list):
        """
        Call OpenAI ChatCompletions API.
        
        Returns the raw ChatCompletion object (not a dict) for conversion.
        """
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
        )
        return response
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        """
        Main prediction method.
        
        Steps:
        1. Convert ResponsesAgent input to ChatCompletions format
        2. Call ChatCompletions API
        3. Convert response to ResponsesAgent format
        """
        # Step 1: Convert input format
        messages = to_chat_completions_input(
            [i.model_dump() for i in request.input]
        )
        
        # Step 2: Call LLM
        response = self.call_llm(messages)
        
        # Step 3: Convert response to ResponsesAgent format
        # (content can be None when the model returns no text, so fall back to "")
        assistant_message = response.choices[0].message.content or ""
        
        return ResponsesAgentResponse(
            output=[
                self.create_text_output_item(
                    text=assistant_message,
                    id="msg_1",
                )
            ],
            custom_outputs={
                "model": self.model,
                "usage": {
                    "prompt_tokens": response.usage.prompt_tokens,
                    "completion_tokens": response.usage.completion_tokens,
                    "total_tokens": response.usage.total_tokens,
                }
            }
        )
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict_stream(
        self, request: ResponsesAgentRequest
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        """
        Streaming prediction using ChatCompletions stream.
        """
        # Convert input format
        messages = to_chat_completions_input(
            [i.model_dump() for i in request.input]
        )
        
        # Stream from ChatCompletions
        stream = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            stream=True,
        )
        
        # Convert stream to ResponsesAgent format
        yield from output_to_responses_items_stream(
            chunk.to_dict() for chunk in stream
        )


# Enable autologging
mlflow.openai.autolog()

# Test the agent
print("Testing ChatCompletions agent...\n")
agent = ChatCompletionsAgent(model="gpt-4o-mini")

response = agent.predict({
    "input": [{"role": "user", "content": "What is MLflow in one sentence?"}]
})

print(f"Response: {response.output[0].content[0]['text']}")
print(f"\nUsage: {response.custom_outputs['usage']}")
print("\n✅ ChatCompletions agent working!")
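
The test above reads `response.output[0].content[0]['text']`, which assumes a single output item with a single text part. A slightly more defensive sketch, operating on the dictionary form of a response, joins every text part it finds. The helper name `extract_text` is hypothetical, and the sample payload is an assumption about the shape of a text output item.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Hypothetical helper: concatenate all output_text parts of a response dict."""
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as function calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                texts.append(part.get("text", ""))
    return "".join(texts)


# Assumed shape of a response containing one text output item
sample = {
    "output": [
        {
            "type": "message",
            "id": "msg_1",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "MLflow manages the ML lifecycle."}],
        }
    ]
}
print(extract_text(sample))
```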

## 3. Understanding Format Conversion Utilities

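MLflow ships two helpers in `mlflow.types.responses` for this: `to_chat_completions_input()` converts request input into ChatCompletions messages, and `output_to_responses_items_stream()` converts a ChatCompletions chunk stream into ResponsesAgent stream events.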

In [None]:
# Demonstrate to_chat_completions_input
print("Format Conversion: to_chat_completions_input()")
print("=" * 60)

# ResponsesAgent input format
responses_input = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "How are you?"}
]

print("Input (ResponsesAgent format):")
for msg in responses_input:
    print(f"  {msg}")

# Convert to ChatCompletions format
cc_messages = to_chat_completions_input(responses_input)

print("\nOutput (ChatCompletions format):")
for msg in cc_messages:
    print(f"  {msg}")

print("\n" + "=" * 60)
print("✅ The formats are compatible for basic messages!")

In [None]:
# Demonstrate output_to_responses_items_stream
print("Format Conversion: output_to_responses_items_stream()")
print("=" * 60)

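# Simulate ChatCompletions streaming chunks (real chunks also carry an "id",
# which the converter may use as the item id of the events it emits)
fake_chunks = [
    {"id": "chatcmpl-demo", "choices": [{"delta": {"role": "assistant"}, "index": 0}]},
    {"id": "chatcmpl-demo", "choices": [{"delta": {"content": "Hello"}, "index": 0}]},
    {"id": "chatcmpl-demo", "choices": [{"delta": {"content": " world"}, "index": 0}]},
    {"id": "chatcmpl-demo", "choices": [{"delta": {"content": "!"}, "index": 0}]},
    {"id": "chatcmpl-demo", "choices": [{"delta": {}, "finish_reason": "stop", "index": 0}]},
]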

print("Converting ChatCompletions stream to ResponsesAgent events:")
print()

for event in output_to_responses_items_stream(iter(fake_chunks)):
    print(f"Event type: {event.type}")
    if hasattr(event, 'delta') and event.delta:
        print(f"  Delta: '{event.delta}'")
    if hasattr(event, 'item') and event.item:
        print(f"  Item: {event.item}")
    print()

print("=" * 60)
print("✅ Stream conversion demonstrated!")
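
Conceptually, the stream conversion has to accumulate the `delta.content` fragments of successive chunks into one message. A minimal standard-library sketch of that accumulation follows. The name `collect_text` is hypothetical, and this is not how MLflow implements `output_to_responses_items_stream()`; it only illustrates the idea.

```python
from typing import Any, Iterable


def collect_text(chunks: Iterable[dict[str, Any]]) -> str:
    """Hypothetical helper: join the text deltas of ChatCompletions chunk dicts."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks carry no choices at all
        content = choices[0].get("delta", {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


demo_chunks = [
    {"choices": [{"delta": {"role": "assistant"}, "index": 0}]},
    {"choices": [{"delta": {"content": "Hello"}, "index": 0}]},
    {"choices": [{"delta": {"content": " world"}, "index": 0}]},
    {"choices": [{"delta": {"content": "!"}, "index": 0}]},
    {"choices": [{"delta": {}, "finish_reason": "stop", "index": 0}]},
    {"choices": []},
]
print(collect_text(demo_chunks))  # Hello world!
```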

## 4. Multi-turn Conversations

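The agent is stateless: each request must carry the full conversation history in `input`, and the agent prepends its system prompt before calling ChatCompletions.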

In [None]:
class ConversationalAgent(ResponsesAgent):
    """
    Agent that handles multi-turn conversations.
    
    Demonstrates:
    - Conversation history in requests
    - System prompt injection
    - Context preservation
    """
    
    def __init__(self, model: str = "gpt-4o-mini", system_prompt: str = None):
        self.model = model
        self.client = OpenAI()
        self.system_prompt = system_prompt or "You are a helpful assistant. Be concise."
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        # Prepare messages with system prompt
        messages = [{"role": "system", "content": self.system_prompt}]
        
        # Add conversation history
        messages.extend(
            to_chat_completions_input([i.model_dump() for i in request.input])
        )
        
        # Call LLM
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
        )
        
        return ResponsesAgentResponse(
            output=[
                self.create_text_output_item(
                    text=response.choices[0].message.content,
                    id="msg_1",
                )
            ]
        )
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict_stream(
        self, request: ResponsesAgentRequest
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        messages = [{"role": "system", "content": self.system_prompt}]
        messages.extend(
            to_chat_completions_input([i.model_dump() for i in request.input])
        )
        
        stream = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            stream=True,
        )
        
        yield from output_to_responses_items_stream(
            chunk.to_dict() for chunk in stream
        )


# Test multi-turn conversation
print("Testing multi-turn conversation...\n")
conv_agent = ConversationalAgent(
    model="gpt-4o-mini",
    system_prompt="You are a Python expert. Be concise and use code examples."
)

# Turn 1
print("Turn 1:")
print("-" * 40)
response1 = conv_agent.predict({
    "input": [{"role": "user", "content": "What is a list comprehension?"}]
})
print("User: What is a list comprehension?")
print(f"Agent: {response1.output[0].content[0]['text'][:200]}...")

# Turn 2 - with history
print("\nTurn 2:")
print("-" * 40)
response2 = conv_agent.predict({
    "input": [
        {"role": "user", "content": "What is a list comprehension?"},
        {"role": "assistant", "content": response1.output[0].content[0]['text']},
        {"role": "user", "content": "Can you show a nested example?"}
    ]
})
print("User: Can you show a nested example?")
print(f"Agent: {response2.output[0].content[0]['text'][:200]}...")

print("\n✅ Multi-turn conversation working!")
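
Because the agent keeps no state between calls, the caller owns the history and resends it on every turn, as Turn 2 does above. A small sketch of that bookkeeping follows, assuming a hypothetical `ConversationHistory` helper with an arbitrary cap on retained messages; neither is part of MLflow or the OpenAI SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationHistory:
    """Hypothetical client-side history buffer for a stateless agent."""

    max_messages: int = 20  # arbitrary cap to bound prompt size
    messages: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Drop the oldest messages once the cap is exceeded
        if len(self.messages) > self.max_messages:
            self.messages = self.messages[-self.max_messages:]

    def as_request(self) -> dict:
        """Return a payload in the {"input": [...]} shape used by predict()."""
        return {"input": list(self.messages)}


history = ConversationHistory(max_messages=4)
history.add("user", "What is a list comprehension?")
history.add("assistant", "A compact way to build a list from an iterable.")
history.add("user", "Can you show a nested example?")
request = history.as_request()
print(len(request["input"]))
```

The payload from `as_request()` could then be passed to `conv_agent.predict(...)`, with the reply appended through `add("assistant", ...)` before the next turn.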

## 5. Advanced: Context and Custom Inputs

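This agent reads the optional `context` and `custom_inputs` fields of the request, builds its system prompt from the context, and echoes both back in `custom_outputs`. Note that `ResponsesAgentRequest.context` is typed as MLflow's `ChatContext` (with `conversation_id` and `user_id`), so free-form personalization data is generally safer in `custom_inputs`.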

In [None]:
class ContextAwareAgent(ResponsesAgent):
    """
    Agent that uses context data for personalization.
    
    Demonstrates:
    - Using request.context for metadata
    - Using request.custom_inputs for custom data
    - Passing data through custom_outputs
    """
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.model = model
        self.client = OpenAI()
    
    def build_system_prompt(self, context: dict) -> str:
        """
        Build dynamic system prompt based on context.
        """
        base_prompt = "You are a helpful assistant."
        
        if context:
            user_name = context.get("user_name", "User")
            preferences = context.get("preferences", {})
            
            base_prompt += f"\n\nUser's name: {user_name}"
            
            if preferences.get("concise"):
                base_prompt += "\nBe very concise in your responses."
            
            if preferences.get("technical_level"):
                level = preferences["technical_level"]
                base_prompt += f"\nAdjust technical depth to: {level}"
        
        return base_prompt
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
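        # Extract context. MLflow may validate request.context into a ChatContext
        # model rather than a dict, so normalize it before calling .get() on it.
        raw_context = getattr(request, "context", None)
        if hasattr(raw_context, "model_dump"):
            context = raw_context.model_dump(exclude_none=True)
        else:
            context = raw_context or {}
        custom_inputs = getattr(request, "custom_inputs", None) or {}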
        
        # Build dynamic system prompt
        system_prompt = self.build_system_prompt(context)
        
        # Prepare messages
        messages = [{"role": "system", "content": system_prompt}]
        messages.extend(
            to_chat_completions_input([i.model_dump() for i in request.input])
        )
        
        # Call LLM
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
        )
        
        return ResponsesAgentResponse(
            output=[
                self.create_text_output_item(
                    text=response.choices[0].message.content,
                    id="msg_1",
                )
            ],
            custom_outputs={
                "context_used": context,
                "custom_inputs_received": custom_inputs,
                "model": self.model,
            }
        )
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict_stream(
        self, request: ResponsesAgentRequest
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
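        # Normalize context to a dict (it may arrive as a ChatContext model)
        raw_context = getattr(request, "context", None)
        if hasattr(raw_context, "model_dump"):
            context = raw_context.model_dump(exclude_none=True)
        else:
            context = raw_context or {}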
        system_prompt = self.build_system_prompt(context)
        
        messages = [{"role": "system", "content": system_prompt}]
        messages.extend(
            to_chat_completions_input([i.model_dump() for i in request.input])
        )
        
        stream = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            stream=True,
        )
        
        yield from output_to_responses_items_stream(
            chunk.to_dict() for chunk in stream
        )


# Test context-aware agent
print("Testing context-aware agent...\n")
context_agent = ContextAwareAgent(model="gpt-4o-mini")

# Request with context
response = context_agent.predict({
    "input": [{"role": "user", "content": "Explain machine learning"}],
    "context": {
        "user_name": "Alice",
        "preferences": {
            "concise": True,
            "technical_level": "beginner"
        }
    },
    "custom_inputs": {
        "session_id": "abc123",
        "request_type": "explanation"
    }
})

print(f"Response: {response.output[0].content[0]['text']}")
print(f"\nCustom outputs: {response.custom_outputs}")
print("\n✅ Context-aware agent working!")
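
The agent above echoes `context` and `custom_inputs` back through `custom_outputs`. MLflow documents that `custom_outputs` values must be JSON-serializable, so it can be worth filtering what is echoed. A minimal sketch, assuming a hypothetical `json_safe` helper that silently drops entries `json.dumps` cannot encode:

```python
import json
from typing import Any


def json_safe(values: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: keep only entries that json.dumps can encode."""
    safe = {}
    for key, value in values.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue  # not serializable, leave it out of custom_outputs
        safe[key] = value
    return safe


custom_inputs = {
    "session_id": "abc123",
    "request_type": "explanation",
    "callback": print,  # a function is not JSON-serializable
}
print(json_safe(custom_inputs))
```

Dropping entries silently is a design choice made here for brevity; raising an error or logging the rejected keys may suit a production agent better.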

## 6. Complete Production Agent

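This section packages the agent as a standalone module (`chat_completions_agent.py`), logs it with MLflow from that file, and reloads the logged model for a quick smoke test.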

In [None]:
%%writefile chat_completions_agent.py
"""Production ChatCompletions agent with ResponsesAgent."""

import os
from typing import Generator

import mlflow
from mlflow.entities.span import SpanType
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
    ResponsesAgentStreamEvent,
    output_to_responses_items_stream,
    to_chat_completions_input,
)
from openai import OpenAI


class ChatCompletionsAgent(ResponsesAgent):
    """
    Production-ready ChatCompletions agent.
    
    Features:
    - OpenAI ChatCompletions integration
    - Streaming support
    - Context-aware system prompts
    - Token usage tracking
    - Full MLflow tracing
    """
    
    def __init__(
        self,
        model: str = "gpt-4o-mini",
        system_prompt: str = None,
        temperature: float = 0.7,
        max_tokens: int = None,
    ):
        self.model = model
        self.client = OpenAI()
        self.system_prompt = system_prompt or "You are a helpful AI assistant."
        self.temperature = temperature
        self.max_tokens = max_tokens
    
    def _prepare_messages(self, request: ResponsesAgentRequest) -> list:
        """Prepare messages for ChatCompletions API."""
        messages = [{"role": "system", "content": self.system_prompt}]
        messages.extend(
            to_chat_completions_input([i.model_dump() for i in request.input])
        )
        return messages
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        """Non-streaming prediction."""
        messages = self._prepare_messages(request)
        
        # Build API kwargs
        api_kwargs = {
            "model": self.model,
            "messages": messages,
            "temperature": self.temperature,
        }
        if self.max_tokens:
            api_kwargs["max_tokens"] = self.max_tokens
        
        # Call API
        response = self.client.chat.completions.create(**api_kwargs)
        
        return ResponsesAgentResponse(
            output=[
                self.create_text_output_item(
                    text=response.choices[0].message.content,
                    id="msg_1",
                )
            ],
            custom_outputs={
                "model": self.model,
                "usage": {
                    "prompt_tokens": response.usage.prompt_tokens,
                    "completion_tokens": response.usage.completion_tokens,
                    "total_tokens": response.usage.total_tokens,
                },
                # custom_inputs may be present but None, so guard before unpacking
                **(getattr(request, "custom_inputs", None) or {}),
            }
        )
    
    @mlflow.trace(span_type=SpanType.AGENT)
    def predict_stream(
        self, request: ResponsesAgentRequest
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        """Streaming prediction."""
        messages = self._prepare_messages(request)
        
        api_kwargs = {
            "model": self.model,
            "messages": messages,
            "temperature": self.temperature,
            "stream": True,
        }
        if self.max_tokens:
            api_kwargs["max_tokens"] = self.max_tokens
        
        stream = self.client.chat.completions.create(**api_kwargs)
        
        yield from output_to_responses_items_stream(
            chunk.to_dict() for chunk in stream
        )


# Enable tracing
mlflow.openai.autolog()

# Create and set model
agent = ChatCompletionsAgent(
    model="gpt-4o-mini",
    system_prompt="You are a helpful AI assistant. Be concise and informative.",
    temperature=0.7,
)
mlflow.models.set_model(agent)
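
The agent above tags every response with the fixed id `msg_1`. If downstream consumers need to tell messages apart, generating a unique id per response is one option. A minimal sketch, assuming a hypothetical `new_message_id` helper and a `msg_` prefix chosen here purely for illustration:

```python
import uuid


def new_message_id(prefix: str = "msg_") -> str:
    """Hypothetical helper: return a unique message id such as 'msg_<32 hex chars>'."""
    return f"{prefix}{uuid.uuid4().hex}"


first, second = new_message_id(), new_message_id()
print(first)
```

The result could be passed as `id=new_message_id()` in place of the literal `"msg_1"`.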

In [None]:
# Log the agent
with mlflow.start_run(run_name="chat_completions_agent") as run:
    model_info = mlflow.pyfunc.log_model(
        python_model="chat_completions_agent.py",
        name="agent",  # MLflow 3 prefers `name`; `artifact_path` is deprecated
        pip_requirements=[
            "mlflow",
            "openai",
            "pydantic>=2.0.0",
        ],
    )
    
    print("✅ ChatCompletions agent logged!")
    print(f"Model URI: {model_info.model_uri}")
    print("\nTo serve locally:")
    print(f"  mlflow models serve -m {model_info.model_uri} -p 5001")
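
Once the server is running, it can be queried over HTTP. The sketch below only builds the request. It assumes the server listens on `127.0.0.1:5001` (the port from the command above), that the scoring endpoint is `/invocations`, and that it accepts the same `{"input": [...]}` payload used with `predict()`. These are assumptions to check against the serving documentation for your MLflow version; `build_invocation_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_invocation_request(
    prompt: str, base_url: str = "http://127.0.0.1:5001"
) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for a locally served agent."""
    payload = {"input": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/invocations",  # assumed scoring endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_invocation_request("What is Python?")
print(req.get_method(), req.full_url)

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```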

In [None]:
# Test the logged model
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

response = loaded_model.predict({
    "input": [{"role": "user", "content": "What is Python?"}]
})

print(f"Loaded model response: {response['output'][0]['content'][0]['text'][:200]}...")
print("\n✅ Model loaded and tested successfully!")

## Summary

### What We Learned:

1. ✅ **ChatCompletions vs Responses API**: Understanding the differences
2. ✅ **Basic integration**: Connecting ChatCompletions to ResponsesAgent
3. ✅ **Format conversion**: Using MLflow utilities
4. ✅ **Multi-turn conversations**: Handling history
5. ✅ **Context awareness**: Using custom inputs and context
6. ✅ **Production deployment**: Logging and serving

### Key Utilities:

| Utility | Purpose |
|---------|--------|
| `to_chat_completions_input()` | Convert ResponsesAgent input to ChatCompletions format |
| `output_to_responses_items_stream()` | Convert ChatCompletions stream to ResponsesAgent events |

### Next Steps:
- Explore OpenAI Responses API integration
- Learn about deployment options
- Build advanced agents with tools