<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q">Community</a>
    </p>
</center>

# LangGraph: Evaluator–Optimizer Loop
In this tutorial, we'll build a code-generation feedback loop using LangGraph: a generator LLM writes code and an evaluator LLM returns a structured review. This iterative pattern is useful for refining an output over multiple steps until it meets a defined success criterion.

The workflow consists of:

- A **generator LLM** that produces or revises code based on feedback.
- An **evaluator LLM** that assigns a grade (pass or fail) and gives feedback if needed.
- A **LangGraph state machine** that loops back to the generator until the evaluator approves the result.

To make the loop observable, we've instrumented the graph with Phoenix tracing. This lets you inspect each generation and evaluation step, see what the model produced, and understand why it did (or didn't) pass.

In [None]:
!pip install langgraph langchain langchain_community "arize-phoenix" arize-phoenix-otel openinference-instrumentation-langchain



In [None]:
!pip install langchain_openai



In [None]:
import getpass
import os

from langgraph.graph import StateGraph, START, END

In [None]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key:··········


# Configure Phoenix Tracing

Go to https://app.phoenix.arize.com/ and generate an API key. This lets you send traces from your LangGraph application to Phoenix.

In [None]:
PHOENIX_API_KEY = getpass.getpass("Phoenix API Key:")
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

Phoenix API Key:··········


In [None]:
from phoenix.otel import register

tracer_provider = register(
    project_name="Evaluator-Optimizer",
    auto_instrument=True,
)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: Evaluator-Optimizer
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {'api_key': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



# Evaluator-Optimizer • Code-Writing Loop
---------------------------------------
Input  : problem_spec (natural-language description of the function/program)

Output : refined, accepted code (Python string)


In [None]:
from typing import Literal, TypedDict
from langgraph.graph import StateGraph, START, END
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_openai import ChatOpenAI




===== FINAL CODE =====

```javascript
// JavaScript code for a complicated website with features

// Function to display a welcome message
function displayWelcomeMessage() {
    alert("Welcome to our complicated website!");
}

// Function to toggle a menu
function toggleMenu() {
    var menu = document.getElementById("menu");
    if (menu.style.display === "none") {
        menu.style.display = "block";
    } else {
        menu.style.display = "none";
    }
}

// Function to validate a form
function validateForm() {
    var name = document.getElementById("name").value;
    var email = document.getElementById("email").value;

    if (name === "" || email === "") {
        alert("Please fill out all fields in the form.");
        return false;
    }

    return true;
}

// Event listener to trigger displayWelcomeMessage function on page load
document.addEventListener("DOMContentLoaded", function() {
    displayWelcomeMessage();
});

// Event listener to trigger toggleMenu function when
```

LLMs
----
- `generator_llm`: writes or rewrites the code
- `evaluator_llm`: grades the code via structured output

In [None]:
generator_llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.3)
evaluator_llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)


To enable structured, reliable feedback from the evaluator LLM, we define a Pydantic schema called Review. This schema ensures that all evaluations include both a binary grade (pass or fail) and feedback if the code needs improvement.

By binding this schema to the evaluator LLM, we get consistently structured output and can route on the grade directly in the graph. This is what closes the feedback loop: the grade decides whether to stop, and the feedback drives the next revision.

In [None]:
class Review(BaseModel):
    grade: Literal["pass", "fail"] = Field(
        description="Did the code fully solve the problem?"
    )
    feedback: str = Field(
        description="If grade=='fail', give concrete, actionable feedback."
    )

evaluator = evaluator_llm.with_structured_output(Review)
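`with_structured_output` takes care of turning the evaluator's reply into a `Review` object and rejecting anything that does not fit the schema. As a rough, stdlib-only illustration of that contract (not how LangChain implements it; the real path relies on Pydantic validation), the sketch below parses a JSON reply and enforces the same two fields. `ReviewRecord` and `parse_review` are hypothetical names, chosen so they do not shadow `Review`.

```python
import json
from dataclasses import dataclass
from typing import Literal

# Allowed grades, mirroring Literal["pass", "fail"] in the Review schema.
ALLOWED_GRADES = ("pass", "fail")


@dataclass
class ReviewRecord:
    """Plain-Python stand-in for the Pydantic Review model (hypothetical)."""
    grade: Literal["pass", "fail"]
    feedback: str


def parse_review(raw: str) -> ReviewRecord:
    """Parse an evaluator's JSON reply and reject anything off-schema."""
    data = json.loads(raw)
    grade = data.get("grade")
    if grade not in ALLOWED_GRADES:
        raise ValueError(f"grade must be one of {ALLOWED_GRADES}, got {grade!r}")
    feedback = data.get("feedback")
    if not isinstance(feedback, str):
        raise ValueError("feedback must be a string")
    return ReviewRecord(grade=grade, feedback=feedback)


ok = parse_review('{"grade": "pass", "feedback": ""}')
print(ok)  # ReviewRecord(grade='pass', feedback='')

try:
    parse_review('{"grade": "maybe", "feedback": "unsure"}')
except ValueError as err:
    print("rejected:", err)
```

Failing loudly on an off-schema grade matters here because `route` compares `grade` against the literal string `"pass"`; a reply like `"Pass"` or `"maybe"` would otherwise be treated silently as a failure.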


# LangGraph Shared State
Defines the shared state for the evaluator-optimizer loop, tracking the problem description, generated code, feedback, and evaluation grade.

In [None]:
class State(TypedDict):
    problem_spec: str
    code: str
    feedback: str
    grade: str  # pass / fail
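Each node returns only part of this state (for example, `code_generator` returns just `{"code": ...}`). When a key has no reducer annotation, as here, LangGraph replaces that key's value with whatever the node returned and keeps every key the node did not return. The sketch below emulates that merge with plain dicts; `SketchState` and `apply_update` are hypothetical names, and `total=False` is used only so the partial snapshots type-check.

```python
from typing import TypedDict


class SketchState(TypedDict, total=False):
    """Same keys as State; total=False because only problem_spec is set up front."""
    problem_spec: str
    code: str
    feedback: str
    grade: str


def apply_update(state: SketchState, update: dict) -> SketchState:
    """Emulate a key-overwriting merge: returned keys replace old values."""
    merged = dict(state)  # copy, so the previous snapshot is left untouched
    merged.update(update)
    return SketchState(**merged)


demo_state: SketchState = {"problem_spec": "Reverse a string."}
demo_state = apply_update(demo_state, {"code": "def rev(s): return s[::-1]"})  # generator round
demo_state = apply_update(demo_state, {"grade": "pass", "feedback": ""})  # evaluator round
print(sorted(demo_state))  # ['code', 'feedback', 'grade', 'problem_spec']
```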


# Node Functions: Generator & Evaluator
These nodes power the evaluator-optimizer loop. The `code_generator` node uses the generator LLM to produce or revise Python code based on the task and any prior feedback. The `code_evaluator` node uses the structured evaluator LLM to simulate a code review, mentally testing the code against the spec and returning a binary grade plus actionable feedback when needed. That feedback is passed back to the generator until the code passes.

In [None]:
def code_generator(state: State):
    """Write or refine code based on feedback."""
    prompt = (
        "You are an expert Python developer.\n"
        "Write clear, efficient, PEP 8 compliant code that solves the task below.\n"
        "If feedback is provided, revise the previous code accordingly.\n\n"
        f"### Task\n{state['problem_spec']}\n\n"
    )
    if state.get("feedback"):
        # Show the model its previous attempt so it can actually revise it.
        prompt += f"### Previous Code\n{state.get('code', '')}\n\n"
        prompt += f"### Previous Reviewer Feedback\n{state['feedback']}\n"

    msg = generator_llm.invoke(prompt)
    return {"code": msg.content}


def code_evaluator(state: State):
    """LLM reviews the code solution."""
    review = evaluator.invoke(
        [
            SystemMessage(
                content=(
                    "You are a strict code reviewer. "
                    "Run mental tests / reasoning to decide whether the code meets the spec. "
                    "If it fails, give concise actionable feedback."
                )
            ),
            HumanMessage(
                content=(
                    f"### Problem\n{state['problem_spec']}\n\n"
                    f"### Candidate Code\n{state['code']}"
                )
            ),
        ]
    )
    return {"grade": review.grade, "feedback": review.feedback}
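Because `code_generator` builds its prompt inline, the only way to inspect that prompt is to call the model. A minimal sketch, assuming you want to check prompt assembly offline, pulls it into a pure function. `build_generator_prompt` is a hypothetical helper; it includes the previous attempt on revision rounds, since the instruction to "revise the previous code" only works if the model can see that code.

```python
from typing import Optional


def build_generator_prompt(
    problem_spec: str,
    previous_code: Optional[str] = None,
    feedback: Optional[str] = None,
) -> str:
    """Assemble the generator prompt (hypothetical helper, not a LangGraph API)."""
    prompt = (
        "You are an expert Python developer.\n"
        "Write clear, efficient, PEP 8 compliant code that solves the task below.\n"
        "If feedback is provided, revise the previous code accordingly.\n\n"
        f"### Task\n{problem_spec}\n\n"
    )
    # Only revision rounds carry the earlier attempt and the reviewer's notes.
    if feedback:
        if previous_code:
            prompt += f"### Previous Code\n{previous_code}\n\n"
        prompt += f"### Previous Reviewer Feedback\n{feedback}\n"
    return prompt


first = build_generator_prompt("Reverse a string.")
second = build_generator_prompt(
    "Reverse a string.",
    previous_code="def rev(s): return s",
    feedback="The function returns its input unchanged.",
)
print("### Previous Code" in first)   # False
print("### Previous Code" in second)  # True
```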



# Routing Logic
To support iterative refinement, we define a simple routing function called `route`. After the code is evaluated, it checks the grade returned by the evaluator: if the code passes, the run ends; if it fails, control loops back to the generator for another revision. This lets the generator keep revising its output based on feedback until the evaluator accepts it.

In [None]:
def route(state: State):
    return "Accept" if state["grade"] == "pass" else "Revise"
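As written, `route` has no upper bound on revisions. If the reviewer never approves, the graph keeps cycling until LangGraph's recursion limit (25 steps by default, adjustable with `config={"recursion_limit": ...}` on `invoke`) aborts the run with a `GraphRecursionError`. One common guard is an explicit attempt budget. This sketch assumes a hypothetical `iterations` field added to the state and a hypothetical `MAX_ITERATIONS` constant; `route_with_cap` is likewise an illustrative name.

```python
from typing import Literal, TypedDict

MAX_ITERATIONS = 3  # hypothetical budget of generate/evaluate rounds


class CappedState(TypedDict, total=False):
    grade: str
    iterations: int  # hypothetical counter, incremented by the generator node


def route_with_cap(state: CappedState) -> Literal["Accept", "Revise"]:
    """Accept on a pass, or once the revision budget is spent."""
    if state.get("grade") == "pass":
        return "Accept"
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return "Accept"  # stop looping and keep the last attempt as-is
    return "Revise"


print(route_with_cap({"grade": "fail", "iterations": 1}))  # Revise
print(route_with_cap({"grade": "fail", "iterations": 3}))  # Accept
print(route_with_cap({"grade": "pass", "iterations": 1}))  # Accept
```

To wire this in, `State` would gain an `iterations: int` key, the generator would also return `"iterations": state.get("iterations", 0) + 1`, and `route_with_cap` would replace `route` in `add_conditional_edges`; the existing `{"Accept": END, "Revise": "code_generator"}` map still applies. Because a capped run can end on a failing draft, check `result_state["grade"]` before trusting the output.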


# Building the LangGraph
We now define the LangGraph workflow. Each component (the generator and the evaluator) is added as a node. Directed edges specify the flow: the graph starts with generation, then moves to evaluation. A conditional edge uses the routing function to either end the loop (on a pass) or return to the generator (on a fail). This gives the workflow self-correcting behavior, with every iteration traceable in Phoenix.

In [None]:
builder = StateGraph(State)

builder.add_node("code_generator", code_generator)
builder.add_node("code_evaluator", code_evaluator)

builder.add_edge(START, "code_generator")
builder.add_edge("code_generator", "code_evaluator")
builder.add_conditional_edges(
    "code_evaluator",
    route,
    {
        "Accept": END,
        "Revise": "code_generator",
    },
)

workflow = builder.compile()
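To make the wiring concrete, here is a toy interpreter for the same topology: a fixed edge from `code_generator` to `code_evaluator`, and a conditional edge whose router output is looked up in a path map, as in `add_conditional_edges`. It is a plain-Python sketch with stub nodes, not LangGraph's runtime, which is considerably more involved. `run_graph`, `SKETCH_END`, and the stub nodes are hypothetical, and `max_steps` loosely stands in for LangGraph's recursion limit.

```python
from typing import Callable, Dict, Tuple

SKETCH_END = "__end__"  # stand-in for LangGraph's END sentinel

Node = Callable[[dict], dict]
Router = Callable[[dict], str]


def run_graph(
    nodes: Dict[str, Node],
    edges: Dict[str, str],
    conditional: Dict[str, Tuple[Router, Dict[str, str]]],
    entry: str,
    state: dict,
    max_steps: int = 25,
) -> dict:
    """Run a node, merge its partial update, then follow a fixed or conditional edge."""
    current = entry
    for _ in range(max_steps):
        state = {**state, **nodes[current](state)}
        if current in conditional:
            router, path_map = conditional[current]
            current = path_map[router(state)]
        else:
            current = edges[current]
        if current == SKETCH_END:
            return state
    raise RuntimeError(f"did not reach the end within {max_steps} steps")


attempts = {"n": 0}


def fake_generator(state: dict) -> dict:
    attempts["n"] += 1
    return {"code": f"draft {attempts['n']}"}


def fake_evaluator(state: dict) -> dict:
    # Pretend the reviewer only accepts the third draft.
    ok = state["code"] == "draft 3"
    return {"grade": "pass" if ok else "fail", "feedback": "" if ok else "try again"}


final = run_graph(
    nodes={"code_generator": fake_generator, "code_evaluator": fake_evaluator},
    edges={"code_generator": "code_evaluator"},
    conditional={
        "code_evaluator": (
            lambda s: "Accept" if s["grade"] == "pass" else "Revise",
            {"Accept": SKETCH_END, "Revise": "code_generator"},
        )
    },
    entry="code_generator",
    state={"problem_spec": "Reverse a string."},
)
print(final["code"], final["grade"])  # draft 3 pass
```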


# Example Usage

In [None]:
problem = """
Write code for a complicated website with javascript features.
"""

result_state = workflow.invoke({"problem_spec": problem})
print("===== FINAL CODE =====\n")
print(result_state["code"])
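The generator returns raw chat text, so `result_state["code"]` typically arrives wrapped in a Markdown fence with a language tag (the sample output earlier in this notebook starts with one). A small helper can strip that fence before you save or run the code. `extract_code_block` is a hypothetical helper, not part of LangChain or LangGraph; this sketch uses a regular expression and also tolerates a missing closing fence, as happens when a reply is cut off.

```python
import re

FENCE = "`" * 3  # built at runtime so this snippet nests cleanly inside Markdown

# Opening fence + optional language tag, then a lazy body that stops at the
# first closing fence or, if the reply was truncated, at the end of the text.
_FENCED = re.compile(
    re.escape(FENCE) + r"[\w+-]*[ \t]*\n(.*?)(?:\n" + re.escape(FENCE) + r"|\Z)",
    flags=re.DOTALL,
)


def extract_code_block(text: str) -> str:
    """Return the body of the first fenced block, or the stripped text if none."""
    match = _FENCED.search(text)
    return match.group(1).rstrip() if match else text.strip()


reply = FENCE + "python\ndef add(a, b):\n    return a + b\n" + FENCE + "\nThis adds two numbers."
print(extract_code_block(reply))
```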


# Make sure to check out your traces in Phoenix!