<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q">Community</a>
    </p>
</center>

# LangGraph - Prompt Chaining

This notebook demonstrates how to use prompt chaining with LangGraph to build a multi-step email assistant. The assistant guides the writing process through three distinct stages:

- Generating an outline based on subject and bullet points

- Writing the initial draft using the outline and desired tone

- Refining tone if needed to match the specified style

This approach enables fine-grained control over the content generation process by decomposing the task into logical steps. Each stage in the graph is handled by a separate node, enabling targeted prompting, intermediate outputs, and conditional logic.

In addition, the entire workflow is instrumented with Phoenix, which provides OpenTelemetry-powered tracing and debugging. You can inspect each step’s inputs, outputs, and transitions directly in the Phoenix UI to identify bottlenecks or missteps in generation.
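
Before bringing in LangGraph, the core idea of prompt chaining can be sketched in a few lines of plain Python: the output of one prompt becomes the input of the next. This is only an illustration. `fake_llm` and `run_prompt_chain` are hypothetical names, and the stub simply wraps its prompt in brackets instead of calling a model.

```python
from typing import List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: wraps the prompt in brackets."""
    return f"[{prompt}]"


def run_prompt_chain(user_input: str, templates: List[str]) -> str:
    """Feed each step's output into the next template's {previous} slot."""
    text = user_input
    for template in templates:
        text = fake_llm(template.format(previous=text))
    return text


chained = run_prompt_chain(
    "Q1 revenue up 18%",
    ["Outline: {previous}", "Draft: {previous}"],
)
print(chained)
```

The nesting in the printed string shows that the draft step only ever sees the outline step's output, which is the property the graph below relies on.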

In [1]:
!pip install langgraph langchain langchain_community "arize-phoenix==9.0.1" arize-phoenix-otel openinference-instrumentation-langchain



This notebook is a template for prompt chaining with LangGraph: an email writer with three steps (writing an outline, writing the email, and refining the tone).

In [2]:
from langgraph.graph import StateGraph, START, END
import os, getpass

In [3]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key:··········


# Configure Phoenix Tracing

Go to https://app.phoenix.arize.com/ and generate an API key. The key lets you send traces from your LangGraph application to Phoenix.

In [4]:
PHOENIX_API_KEY = getpass.getpass("Phoenix API Key:")
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

Phoenix API Key:··········


In [5]:
from phoenix.otel import register

tracer_provider = register(
  project_name="Prompt Chaining",
  auto_instrument=True
)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: Prompt Chaining
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {'api_key': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



# LLM of choice

In [None]:
from typing_extensions import TypedDict, Literal
from IPython.display import Image, display

from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)




# Defining Graph State

The *EmailState* defines the shared memory for our email-writing agent. Each field represents the evolving state of the email — starting from the user’s initial inputs (subject, notes, tone) and gradually building up through the stages of outline generation, drafting, and final tone refinement. This state dictionary is passed between nodes to ensure context is maintained and updated incrementally throughout the workflow.

In [None]:
class EmailState(TypedDict, total=False):
    subject: str
    bullet_points: str           # raw user notes
    desired_tone: str            # "formal", "friendly", etc.
    outline: str                 # result of node 1
    draft_email: str             # result of node 2
    final_email: str             # after tone reformer (if needed)
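
To make the "updated incrementally" part concrete, here is a small sketch of how partial updates can accumulate in a `total=False` state. It assumes the default behavior in which a key returned by a node overwrites the existing value. `apply_update` is a hypothetical helper for illustration, not a LangGraph function, and the sketch imports `TypedDict` from `typing` so that it has no extra dependencies.

```python
from typing import TypedDict


class EmailState(TypedDict, total=False):
    subject: str
    outline: str
    draft_email: str


def apply_update(state: EmailState, update: EmailState) -> EmailState:
    """Return a new state with the partial update merged in (last write wins)."""
    merged: EmailState = {**state, **update}
    return merged


# total=False lets the state start with only the user's inputs.
state: EmailState = {"subject": "Quarterly Sales Recap"}
state = apply_update(state, {"outline": "1. Revenue\n2. Pricing"})
state = apply_update(state, {"draft_email": "Hi team, ..."})
print(sorted(state))
```

Each node only needs to return the keys it changes, which is why the node functions later in this notebook return small dictionaries such as `{"outline": outline}`.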


# Step-by-Step Prompt Chain: Outline → Draft → Tone Check

This workflow chains multiple LLM calls to transform raw notes into a polished email:

**generate_outline**: Converts user bullet points into a structured outline.

**write_email**: Expands the outline into a complete email draft using the desired tone.

**tone_gate**: Checks if the draft meets the requested tone using a lightweight LLM classification.

**reform_tone**: If the tone doesn't match, this node rewrites the draft while preserving the content.

Each node is modular, which enables targeted debugging and reuse across different tasks or formats. This multi-step refinement mirrors how people draft and can produce higher-quality output than a single prompt.

In [None]:
def generate_outline(state: EmailState) -> EmailState:
    """LLM call 1 – produce an outline from bullet points."""
    prompt = (
        "Create a concise outline for an email.\n"
        f"Subject: {state['subject']}\n"
        f"Bullet points:\n{state['bullet_points']}\n"
        "Return the outline as numbered points."
    )
    outline = llm.invoke(prompt).content
    return {"outline": outline}

def write_email(state: EmailState) -> EmailState:
    """LLM call 2 – write the email from the outline."""
    prompt = (
        f"Write a complete email using this outline:\n{state['outline']}\n\n"
        f"Tone: {state['desired_tone']}\n"
        "Start with a greeting, respect professional formatting, and keep it concise."
    )
    email = llm.invoke(prompt).content
    return {"draft_email": email}

def tone_gate(state: EmailState) -> Literal["Pass", "Fail"]:
    """
    Gate – LLM-based tone check:
      Pass  → the draft already matches the desired tone.
      Fail  → otherwise (another LLM call will adjust it).
    """
    prompt = (
        f"Check whether the following email matches the desired tone {state['desired_tone']}:\n\n"
        f"{state['draft_email']}\n"
        "If it does, return 'Pass'. Otherwise, return 'Fail'."
    )
    # Normalize the reply so stray quotes, periods, or casing cannot break routing.
    verdict = llm.invoke(prompt).content.strip().strip("'\".").lower()
    return "Pass" if verdict == "pass" else "Fail"

def reform_tone(state: EmailState) -> EmailState:
    """LLM call 4 (conditional) – rewrite the email to fit the desired tone."""
    prompt = (
        f"Reform the following email so it has a {state['desired_tone']} tone.\n\n"
        f"EMAIL:\n{state['draft_email']}\n\n"
        "Keep content the same but adjust phrasing, word choice, and sign‑off."
    )
    final_email = llm.invoke(prompt).content
    return {"final_email": final_email}
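
Because each node is a plain function, it can be exercised without calling a model. The sketch below is one possible way to do that, not part of the original notebook: `StubLLM` is a hypothetical stand-in that mimics the `invoke(...).content` shape used above, and this variant of `generate_outline` takes the model as a parameter (the notebook version reads the global `llm`).

```python
from types import SimpleNamespace


class StubLLM:
    """Hypothetical stand-in: returns a canned reply and records each prompt."""

    def __init__(self, reply: str):
        self.reply = reply
        self.prompts = []

    def invoke(self, prompt: str):
        self.prompts.append(prompt)
        return SimpleNamespace(content=self.reply)


def generate_outline(state: dict, llm) -> dict:
    """Same prompt as the notebook node, with the model passed in explicitly."""
    prompt = (
        "Create a concise outline for an email.\n"
        f"Subject: {state['subject']}\n"
        f"Bullet points:\n{state['bullet_points']}\n"
        "Return the outline as numbered points."
    )
    return {"outline": llm.invoke(prompt).content}


stub = StubLLM("1. Intro\n2. Ask")
update = generate_outline({"subject": "Recap", "bullet_points": "- Q1 up 18%"}, stub)
print(update)
```

Inspecting `stub.prompts` afterwards is a cheap way to confirm that the state fields end up in the prompt as intended.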


# Compiling the Email Prompt Chain with LangGraph

Here we assemble the full graph that represents our email generation pipeline. The graph begins at `outline_generator`, moves to `email_writer`, and routes to `tone_reformer` only if the tone check fails. This structure demonstrates the prompt chaining pattern with dynamic control flow that adapts to the model’s output. Once compiled, the graph can be invoked on user input and traced with Phoenix for debugging or optimization.

In [None]:
graph = StateGraph(EmailState)

graph.add_node("outline_generator", generate_outline)
graph.add_node("email_writer", write_email)
graph.add_node("tone_reformer", reform_tone)

# edges
graph.add_edge(START, "outline_generator")
graph.add_edge("outline_generator", "email_writer")
graph.add_conditional_edges(
    "email_writer",
    tone_gate,
    {"Pass": END, "Fail": "tone_reformer"},
)
graph.add_edge("tone_reformer", END)

email_chain = graph.compile()

# Optional: visualize the compiled graph.
# display(Image(email_chain.get_graph().draw_mermaid_png()))
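
The conditional edge above pairs a router function with a mapping from its return value to the next node. The plain-Python sketch below imitates that lookup to show why the router must return exactly one of the mapping's keys. It is an illustration of the idea, not LangGraph's implementation: `route`, `stub_gate`, and the local `END` sentinel are hypothetical names.

```python
from typing import Callable, Dict

END = "__end__"  # local placeholder sentinel for this sketch only


def route(router: Callable[[dict], str], mapping: Dict[str, str], state: dict) -> str:
    """Resolve the router's label to the next node name; fail loudly on unknown labels."""
    label = router(state)
    if label not in mapping:
        raise KeyError(f"router returned {label!r}, expected one of {sorted(mapping)}")
    return mapping[label]


def stub_gate(state: dict) -> str:
    """Stand-in for tone_gate that reads a precomputed flag instead of calling an LLM."""
    return "Pass" if state.get("tone_ok") else "Fail"


mapping = {"Pass": END, "Fail": "tone_reformer"}
print(route(stub_gate, mapping, {"tone_ok": True}))
print(route(stub_gate, mapping, {"tone_ok": False}))
```

A reply such as `"pass."` would not match either key, which is why it helps to normalize the gate's output before it is returned.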


# Example Usage

In [None]:
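# Get a tracer from the provider registered earlier so the run is wrapped in a parent span.
tracer = tracer_provider.get_tracer(__name__)

with tracer.start_as_current_span("llm_response") as span:
  final_state = email_chain.invoke(
      {
          "subject": "Quarterly Sales Recap & Next Steps",
          "bullet_points": "- Q1 revenue up 18%\n- Need feedback on new pricing tiers\n- Reminder: submit pipeline forecasts by Friday",
          "desired_tone": "friendly",
      }
  )

# Fall back to the draft when the tone gate passed and no rewrite was needed.
print("\n========== EMAIL ==========")
print(final_state.get("final_email", final_state["draft_email"]))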



Subject: Quarterly Sales Recap & Next Steps

Dear Team,

I hope this email finds you well! As we wrap up the first quarter, I wanted to take a moment to share our performance highlights and outline some important next steps.

I'm pleased to report that we achieved an impressive 18% increase in revenue for Q1. This is a fantastic accomplishment, and it reflects the hard work and dedication each of you has put in. Thank you for your efforts!

As we continue to refine our offerings, I would appreciate your feedback on the new pricing tiers we introduced last month. Your insights are invaluable in helping us ensure that we meet our customers' needs effectively.

Additionally, please remember to submit your pipeline forecasts by this Friday. Your timely updates are crucial for our planning and strategy moving forward.

Thank you all for your continued commitment and collaboration. I look forward to your feedback and forecasts!

Best regards,

[Your Name]  
[Your Position]  
[Your Company] 

# Make sure to view your traces in Phoenix!

# Let's add some Evaluations (Evals)

In this section we will evaluate how well each node performs its defined task.

- Outline Generation: Evaluating clarity/structure and relevance.
- Email Writing: Evaluating grammar/spelling and content coherence.
- Tone Checking: Evaluating whether tone detection correctly passed or failed.
- Tone Refinement: Evaluating if the rewritten email matches the desired tone.

In [24]:
OUTLINE_EVAL_TEMPLATE = """
You are evaluating an outline created to help write an email.

Email Subject: {subject}
Bullet Points: {bullet_points}

Generated Outline:
{outline}

Please assess the outline based on the following criteria:
1. Clarity & Structure – Is it logically organized and easy to follow?
2. Relevance – Does it fully reflect the bullet points provided?

Return one of:
- 0/2 if both are poor
- 1/2 if only one is good
- 2/2 if both are good
"""

EMAIL_EVAL_TEMPLATE = """
You are evaluating the quality of an email.

Outline Used:
{outline}

Drafted Email:
{final_email}

Evaluate the email on:
1. Grammar and Spelling – Is the email free of noticeable errors?
2. Content Coherence – Does it follow the structure and meaning of the outline?

Return one of:
- 0/2 if both are poor
- 1/2 if only one is good
- 2/2 if both are good
"""

TONE_REFORM_EVAL_TEMPLATE = """
You are evaluating whether the tone of the final email matches the target tone.

Desired Tone: {desired_tone}

Final Email:
{final_email}

Does the final email match the desired tone?

Return "yes" or "no"
"""


# Pull Spans from Phoenix

In [25]:
import phoenix as px
df = px.Client().get_spans_dataframe("name == 'LangGraph'", project_name='Prompt Chaining')
df



| context.span_id | name | span_kind | parent_id | start_time | end_time | status_code | status_message | events | context.span_id | context.trace_id | attributes.openinference.span.kind | attributes.output.mime_type | attributes.input.value | attributes.output.value | attributes.input.mime_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16bb067fdf50d19d | LangGraph | CHAIN |  | 2025-05-02 22:13:39.870811+00:00 | 2025-05-02 22:13:45.778529+00:00 | OK |  | [] | 16bb067fdf50d19d | ed9ac22d6349d9a2d88054715d435515 | CHAIN | application/json | {"subject": "Quarterly Sales Recap & Next Step... | {"subject": "Quarterly Sales Recap & Next Step... | application/json |
| 4f9867761d02d85d | LangGraph | CHAIN |  | 2025-05-12 22:05:11.196072+00:00 | 2025-05-12 22:05:19.379758+00:00 | OK |  | [] | 4f9867761d02d85d | 5113991de83b8775f3e86ad5f5b51d55 | CHAIN | application/json | {"subject": "Quarterly Sales Recap & Next Step... | {"subject": "Quarterly Sales Recap & Next Step... | application/json |


In [12]:
df.to_csv("prompt_chaining_spans.csv")

# Custom setup for email evaluation

In [26]:
import pandas as pd, json, re
from phoenix.evals import llm_classify, OpenAIModel

def unpack(row):
    blob = json.loads(row["attributes.output.value"])
    return pd.Series(
        {
            "subject":       blob.get("subject"),
            "bullet_points": blob.get("bullet_points"),
            "outline":       blob.get("outline"),
            "desired_tone":  blob.get("desired_tone"),
            "draft_email":   blob.get("draft_email"),
            "final_email_raw": blob.get("final_email"),     # may be None / ""
        }
    )

df = df.join(df.apply(unpack, axis=1))


def pick_email(row):
    fe = row["final_email_raw"]
    if fe and fe.strip():
        return fe
    return row["draft_email"]

df["final_email"] = df.apply(pick_email, axis=1)
df
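
Each `{placeholder}` in the evaluation templates is expected to be filled from a dataframe column with the same name, so a quick check before running the evals can catch a misspelled or missing column. The helper below is a sketch using only the standard library. `template_fields` and `missing_fields` are hypothetical names, and `EXAMPLE_TEMPLATE` is a shortened stand-in for the templates defined earlier.

```python
from string import Formatter


def template_fields(template: str) -> set:
    """Return the named placeholders (for example {outline}) in a format string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(template: str, columns) -> set:
    """Return placeholders that have no matching column name."""
    return template_fields(template) - set(columns)


EXAMPLE_TEMPLATE = "Desired Tone: {desired_tone}\n\nFinal Email:\n{final_email}\n"

# Pretend the dataframe only has these two columns.
gaps = missing_fields(EXAMPLE_TEMPLATE, ["desired_tone", "draft_email"])
print(gaps)
```

In the notebook this could be called as `missing_fields(OUTLINE_EVAL_TEMPLATE, df.columns)`, where an empty set means every placeholder has a column.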

# Generate Evals

In [36]:
model = OpenAIModel(model="gpt-4o")

outline_results = llm_classify(
    dataframe=df.dropna(subset=["outline"]),
    template=OUTLINE_EVAL_TEMPLATE,
    rails=["0/2", "1/2", "2/2"],
    model=model,
    provide_explanation=True,
    include_prompt=True,
)

email_results = llm_classify(
    dataframe=df.dropna(subset=["final_email"]),
    template=EMAIL_EVAL_TEMPLATE,
    rails=["0/2", "1/2", "2/2"],
    model=model,
    provide_explanation=True,
    include_prompt=True,
)

tone_results = llm_classify(
    dataframe=df.dropna(subset=["final_email"]),
    template=TONE_REFORM_EVAL_TEMPLATE,
    rails=["yes", "no"],
    model=model,
    provide_explanation=True,
    include_prompt=True,
)


| context.span_id | label | explanation | prompt | exceptions | execution_status | execution_seconds |
|---|---|---|---|---|---|---|
| 16bb067fdf50d19d | 2/2 | The outline is logically organized and easy to... | \nYou are evaluating an outline created to hel... | [] | COMPLETED | 1.637126 |
| 4f9867761d02d85d | 2/2 | The outline is logically organized and easy to... | \nYou are evaluating an outline created to hel... | [] | COMPLETED | 1.857567 |
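
The labels above are categorical, which makes them hard to average across runs. One option is to map each rail to a number before exporting. The sketch below assumes a simple linear scale, and both `LABEL_TO_SCORE` and `label_to_score` are hypothetical names. Anything outside the rails, such as an unparsable model reply, maps to `None`.

```python
LABEL_TO_SCORE = {"0/2": 0.0, "1/2": 0.5, "2/2": 1.0, "no": 0.0, "yes": 1.0}


def label_to_score(label):
    """Map a rail label to a score in [0, 1]; unknown or missing labels give None."""
    if not isinstance(label, str):
        return None
    return LABEL_TO_SCORE.get(label.strip().lower())


labels = ["2/2", "1/2", "yes", "NOT_PARSABLE"]
scores = [label_to_score(label) for label in labels]
print(scores)
```

With pandas this could be applied as `outline_results["score"] = outline_results["label"].map(label_to_score)`. Whether your Phoenix version displays a `score` column alongside `label` is something to confirm in its documentation.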


# Export Evals to Phoenix!

In [38]:
from phoenix.trace import SpanEvaluations

outline_results.drop(columns=["prompt", "exceptions", "execution_status", "execution_seconds"], inplace=True)
email_results.drop(columns=["prompt", "exceptions", "execution_status", "execution_seconds"], inplace=True)
tone_results.drop(columns=["prompt", "exceptions", "execution_status", "execution_seconds"], inplace=True)



px.Client().log_evaluations(
    SpanEvaluations(eval_name="Outline Evaluation", dataframe=outline_results)
)
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Email Quality Evaluation", dataframe=email_results)
)
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Tone Evaluation", dataframe=tone_results)
)

