# <center>OpenAI agent pattern: evaluator optimizer agent</center>

A starter guide for building an agent that iteratively improves an output using LLM feedback, built with the `openai-agents` library.

The first output an LLM generates is often unsatisfactory. An agentic loop can improve it iteratively: ask an LLM for feedback on the output, then use that feedback to generate a better version.

In the following example, we'll build a financial report system using this pattern:
1.  **Report Agent (Generation):** Creates a report on a particular stock ticker.
2.  **Evaluator Agent (Feedback):** Evaluates the report and provides feedback on what to improve.
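
Before wiring in real agents, the control flow of the pattern can be sketched with plain functions. This is a minimal sketch, not the notebook's implementation: `refine`, `Verdict`, `toy_generate`, and `toy_evaluate` are hypothetical names, the two toy functions stand in for the LLM calls, and the `max_rounds` cap is an assumption added so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    score: str     # "pass", "needs_improvement", or "fail"
    feedback: str  # what the next draft should fix


def refine(
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str], Verdict],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Alternate generation and evaluation until the evaluator passes the draft.

    Returns the last draft and the number of rounds that were run.
    """
    feedback: Optional[str] = None
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = generate(task, feedback)
        verdict = evaluate(draft)
        if verdict.score == "pass":
            return draft, round_number
        feedback = verdict.feedback  # feed the critique into the next draft
    return draft, max_rounds


# Toy stand-ins for the generator and evaluator LLM calls (hypothetical)
def toy_generate(task: str, feedback: Optional[str]) -> str:
    base = f"Report on {task}: 3 positive catalysts."
    return (base + " 3 negative catalysts.") if feedback else base


def toy_evaluate(draft: str) -> Verdict:
    if "negative" in draft:
        return Verdict("pass", "")
    return Verdict("needs_improvement", "Add negative catalysts.")


final_draft, rounds_used = refine(toy_generate, toy_evaluate, "AAPL")
print(rounds_used, final_draft)
```

The generator and evaluator are passed in as callables so the loop itself stays independent of any particular model or library.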

### Install Libraries

In [1]:
# Install base libraries for OpenAI
!pip install -q openai openai-agents pydantic

# Install optional libraries for OpenInference/OpenTelemetry tracing
!pip install -q arize-phoenix-otel openinference-instrumentation-openai-agents openinference-instrumentation-openai

### Setup Keys

Add your OpenAI API key to the environment variable `OPENAI_API_KEY`.

Copy your Phoenix `API_KEY` from your settings page at [app.phoenix.arize.com](https://app.phoenix.arize.com).

In [3]:
import os
from getpass import getpass

os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
if not os.environ.get("PHOENIX_CLIENT_HEADERS"):
    os.environ["PHOENIX_CLIENT_HEADERS"] = "api_key=" + getpass("Enter your Phoenix API key: ")

# Reuse the key from the environment if it is already set; otherwise prompt for it
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") or getpass(
    "🔑 Enter your OpenAI API key: "
)
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
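
When this code runs non-interactively (for example as a script), `getpass` prompts are not practical. A minimal sketch of a fail-fast check, assuming both keys are expected in the environment up front; `missing_keys` is a hypothetical helper and not part of either library.

```python
import os
from typing import Mapping


def missing_keys(names: list[str], environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]


missing = missing_keys(["OPENAI_API_KEY", "PHOENIX_CLIENT_HEADERS"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```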

### Setup Tracing

In [4]:
from phoenix.otel import register

tracer_provider = register(
    project_name="openai-agents",
    endpoint="https://app.phoenix.arize.com/v1/traces",
    auto_instrument=True,
)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: openai-agents
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {'api_key': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



## Creating the agent
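
The loop in this section carries the conversation forward between rounds: `input_items` is replaced with `report.to_input_list()`, and the evaluator's feedback is appended as a new user message. The sketch below illustrates that bookkeeping with simplified role/content dictionaries. It is an assumption-laden illustration: `next_round_input` is a hypothetical helper, and the items the library actually returns carry more structure than shown here.

```python
def next_round_input(history: list[dict], report_text: str, feedback_text: str) -> list[dict]:
    """Build the next round's input: prior messages, the new report, then the feedback.

    A new list is returned so the caller's history is left unmodified.
    """
    return history + [
        {"role": "assistant", "content": report_text},
        {"role": "user", "content": f"Feedback: {feedback_text}"},
    ]


history = [{"role": "user", "content": "AAPL"}]
round_two_input = next_round_input(history, "Draft report text", "Cover capital return policies.")
for message in round_two_input:
    print(message["role"], "->", message["content"])
```

Because the earlier draft stays in the history, the report agent can revise it in light of the feedback rather than start from scratch.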



In [6]:
from textwrap import dedent
from agents import Agent, Runner, TResponseInputItem
from pydantic import BaseModel, Field
from typing import Literal

CATALYSTS = """topline revenue growth, margin expansion, moat expansion, free cash flow generation, usage, pricing, distribution, share buyback, dividend, new products, regulation, competition, management team, mergers, acquisitions, analyst ratings, trading volume, technical indicators, price momentum"""

class EvaluationFeedback(BaseModel):
    feedback: str = Field(
        description=f"What is missing from the research report on positive and negative catalysts for a particular stock ticker. Catalysts include changes in {CATALYSTS}."
    )
    score: Literal["pass", "needs_improvement", "fail"] = Field(
        description="A score on the research report. Pass if the report is complete and contains at least 3 positive and 3 negative catalysts for the right stock ticker, needs_improvement if the report is missing some information, and fail if the report is completely wrong."
    )

report_agent = Agent(
    name="Catalyst Report Agent",
    instructions=dedent("""You are a research assistant specializing in stock research. Given a stock ticker, generate a report of 3 positive and 3 negative catalysts that could move the stock price in the future in 50 words or less."""),
    model="gpt-4.1",
)

evaluation_agent = Agent(
    name="Evaluation Agent",
    instructions=dedent("""You are a senior financial analyst. You will be provided with a stock research report with positive and negative catalysts. Your task is to evaluate the report and provide feedback on what to improve."""),
    model="gpt-4.1",
    output_type=EvaluationFeedback,
)

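# Cap on generate/evaluate rounds, added as a safeguard against an endless loop;
# the value 5 is an arbitrary choice
MAX_ROUNDS = 5
rounds = 0
report_feedback = "fail"
input_items: list[TResponseInputItem] = [{"content": "AAPL", "role": "user"}]

while report_feedback != "pass" and rounds < MAX_ROUNDS:
    rounds += 1

    # Generate (or revise) the report from the conversation so far
    report = await Runner.run(report_agent, input_items)
    print("### REPORT ###")
    print(report.final_output)
    # Carry the conversation, including this report, into the next round
    input_items = report.to_input_list()

    # Ask the evaluator for structured feedback on the latest report
    evaluation = await Runner.run(evaluation_agent, str(report.final_output))
    evaluation_feedback = evaluation.final_output_as(EvaluationFeedback)
    print("### EVALUATION ###")
    print(str(evaluation_feedback))
    report_feedback = evaluation_feedback.score

    if report_feedback != "pass" and rounds < MAX_ROUNDS:
        print("Re-running with feedback")
        input_items.append({"content": f"Feedback: {evaluation_feedback.feedback}", "role": "user"})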

### REPORT ###
**AAPL (Apple Inc.)**

**Positive Catalysts:**
1. Strong iPhone/Services sales growth.
2. Launch of new product categories (e.g., AR/VR headset, AI features).
3. Expansion in emerging markets.

**Negative Catalysts:**
1. Supply chain disruptions, especially from China.
2. Regulatory scrutiny regarding App Store practices.
3. Slowing consumer demand for premium devices.
### EVALUATION ###
feedback='The research report correctly identifies three positive and three negative catalysts related to AAPL. Positive catalysts cover topline revenue growth via iPhone and Services sales, new product launches, and international expansion. Negative catalysts address supply chain, regulation, and demand trends. However, the report could be improved by including more detail and diversity in catalyst types. For example, margin expansion or contraction, free cash flow trends, usage/pricing/distribution changes, capital return policies (buybacks, dividends), competitive landscape, managemen