# 📘 Agentic Architectures 15: Self-Improvement Loop (Self-Refine & RLHF Analogy)

Welcome to a deep dive into one of the more advanced agentic patterns: the **Self-Improvement Loop**. This architecture lets an agent evaluate its own output and iteratively refine it toward a higher standard of quality. Combined with a memory of past successes, it can also raise the agent's baseline performance over time.

This process mimics the human learning cycle of `do -> get feedback -> improve`. We will implement this through a **Self-Refine** workflow, where an agent's output is immediately evaluated by a critical sub-agent, and if it's found lacking, the original agent is tasked with revising its work based on the actionable feedback.

To make this concept tangible and detailed, we will build a **Marketing Copywriter Agent**. The workflow will be:
1.  A `JuniorCopywriter` agent generates a first draft of a marketing email.
2.  A `SeniorEditor` agent critiques the draft against a strict rubric (clarity, persuasiveness, call-to-action).
3.  If the draft's score is below a quality threshold, the `JuniorCopywriter` is invoked again, but this time with the editor's specific feedback, to create a revised draft.
4.  This loop continues until the email is approved or a maximum number of revisions is reached.

Furthermore, we will explore how this pattern forms the conceptual basis for **long-term learning**, loosely analogous to RLHF: approved outputs are saved to a memory that informs future generations, so that later first drafts start from a higher baseline.

### Definition
A **Self-Improvement Loop** is an agentic architecture where an agent's output is evaluated, either by itself or by another agent, and this evaluation is used as feedback to generate a revised, higher-quality output. When this feedback is stored and used to improve the agent's baseline performance over time, it becomes a form of continual learning.

### High-level Workflow (Self-Refine)

1.  **Generate Initial Output:** The primary agent produces a first version of the solution (the "draft").
2.  **Critique Output:** A critic agent (or the primary agent in a "critique mode") evaluates the draft against a set of predefined criteria or a general rubric.
3.  **Decision:** The system checks if the critique is positive enough to accept the output. 
4.  **Revise (Loop):** If the output is not accepted, the original draft *and* the critic's feedback are passed back to the primary agent, which is instructed to generate a revised version that addresses the feedback.
5.  **Accept:** Once the output meets the quality standard, the loop terminates, and the final version is returned.
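
The five steps above can be sketched without any framework. This is a minimal illustration, not the implementation built later in this notebook: `generate`, `critique`, and `revise` are hypothetical stand-ins for LLM calls, and the threshold of 8 and the limit of 3 drafts are assumed values.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    score: int            # 1 (poor) to 10 (excellent)
    feedback: List[str]   # actionable points for the next revision


def self_refine(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str], Review],
    revise: Callable[[str, str, List[str]], str],
    threshold: int = 8,
    max_drafts: int = 3,
) -> Tuple[str, Review, int]:
    """Run generate -> critique -> revise until accepted or out of budget."""
    draft = generate(task)                      # 1. generate initial output
    for draft_number in range(1, max_drafts + 1):
        review = critique(draft)                # 2. critique output
        if review.score >= threshold:           # 3. decision
            break                               # 5. accept
        if draft_number < max_drafts:
            draft = revise(task, draft, review.feedback)  # 4. revise (loop)
    return draft, review, draft_number


# Toy stand-ins so the sketch runs without an LLM.
def toy_generate(task: str) -> str:
    return f"Draft about {task}"


def toy_critique(draft: str) -> Review:
    # Approve only drafts that contain a call-to-action.
    if "Book a demo" in draft:
        return Review(score=9, feedback=[])
    return Review(score=4, feedback=["Add a specific call-to-action."])


def toy_revise(task: str, draft: str, feedback: List[str]) -> str:
    return draft + " Book a demo today."


final_draft, final_review, drafts_used = self_refine(
    "InsightSphere", toy_generate, toy_critique, toy_revise
)
print(drafts_used, final_review.score)  # prints: 2 9
```

The returned review always belongs to the returned draft, so a caller can tell whether the loop ended by approval or by exhausting its budget.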

### When to Use / Applications
*   **High-Quality Content Generation:** For tasks where a generic first draft is insufficient, such as writing legal documents, detailed technical reports, or persuasive marketing copy.
*   **Continual Learning & Personalization:** An agent that learns a user's preferences by generating responses, getting implicit or explicit feedback, and refining its internal strategy for the next interaction.
*   **Complex Problem Solving:** An agent can propose a plan, critique it for flaws or inefficiencies, and then revise the plan before execution.

### Strengths & Weaknesses
*   **Strengths:**
    *   **Can Substantially Increase Output Quality:** Iterative refinement with targeted feedback often produces better results than single-shot generation, provided the critic's feedback is sound.
    *   **Enables Continual Learning:** Provides a framework for the agent to get better over time, adapting to new information or feedback.
*   **Weaknesses:**
    *   **Risk of Reinforcing Biases:** If the critic agent has flawed logic or biases, the system can get stuck in a loop that reinforces its own mistakes.
    *   **Computationally Expensive:** The iterative nature means multiple LLM calls per task, increasing cost and latency.
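
To make the cost point concrete: in the workflow above, every draft is critiqued once, and every draft after the first requires one revision call. The following is a rough worst-case count, assuming one LLM call per step (the helper name `max_llm_calls` is illustrative).

```python
def max_llm_calls(max_drafts: int) -> int:
    """Worst-case LLM calls for one task in a generate/critique/revise loop.

    Assumes one call per step: 1 initial generation, one critique per
    draft, and one revision for every draft after the first.
    """
    if max_drafts < 1:
        raise ValueError("max_drafts must be at least 1")
    generations = 1
    critiques = max_drafts
    revisions = max_drafts - 1
    return generations + critiques + revisions


for drafts in (1, 2, 3):
    print(drafts, max_llm_calls(drafts))
# prints:
# 1 2
# 2 4
# 3 6
```

With a cap of three drafts, a single request can cost up to six calls, compared with one call for single-shot generation.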

## Phase 0: Foundation & Setup

Standard setup of libraries and environment variables.

In [1]:
# !pip install -q -U langchain-nebius langchain langgraph rich python-dotenv

In [2]:
import os
from typing import List, Dict, Any, Optional
from dotenv import load_dotenv

# Pydantic for data modeling
from pydantic import BaseModel, Field

# LangChain components
from langchain_nebius import ChatNebius
from langchain_core.prompts import ChatPromptTemplate

# LangGraph components
from langgraph.graph import StateGraph, END
from typing_extensions import TypedDict

# For pretty printing
from rich.console import Console
from rich.markdown import Markdown
from rich.panel import Panel

# --- API Key and Tracing Setup ---
load_dotenv()

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Agentic Architecture - Self-Improvement Loop (Nebius)"

required_vars = ["NEBIUS_API_KEY", "LANGCHAIN_API_KEY"]
for var in required_vars:
    if var not in os.environ:
        print(f"Warning: Environment variable {var} not set.")

print("Environment variables loaded and tracing is set up.")

Environment variables loaded and tracing is set up.


## Phase 1: Defining the Core Components (Generator, Critic, Reviser)

Our system needs distinct roles. We'll define the personas and structured outputs for our `JuniorCopywriter` (the generator) and our `SeniorEditor` (the critic). The `Reviser` isn't a new agent, but rather a different mode of invoking the generator, armed with feedback.

In [3]:
console = Console()
llm = ChatNebius(model="mistralai/Mixtral-8x22B-Instruct-v0.1", temperature=0.4)

# --- Pydantic Models for Structured Data ---
class MarketingEmail(BaseModel):
    """Represents a marketing email draft."""
    subject: str = Field(description="A catchy and concise subject line for the email.")
    body: str = Field(description="The full body text of the email, written in markdown.")

class Critique(BaseModel):
    """A structured critique of the marketing email draft."""
    score: int = Field(description="Overall quality score from 1 (poor) to 10 (excellent).")
    feedback_points: List[str] = Field(description="A bulleted list of specific, actionable feedback points for improvement.")
    is_approved: bool = Field(description="A boolean indicating if the draft is approved (score >= 8). This is redundant with the score but useful for routing.")

# --- 1. The Generator: Junior Copywriter ---
def get_generator_chain():
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a junior marketing copywriter. Your task is to write a first draft of a marketing email based on the user's request. Be creative, but focus on getting the core message across."),
        ("human", "Write a marketing email about the following topic:\n\n{request}")
    ])
    return prompt | llm.with_structured_output(MarketingEmail)

# --- 2. The Critic: Senior Editor ---
def get_critic_chain():
    prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a senior marketing editor and brand manager. Your job is to critique an email draft written by a junior copywriter. 
        Evaluate the draft against the following criteria:
        1.  **Catchy Subject:** Is the subject line engaging and likely to get opened?
        2.  **Clarity & Persuasiveness:** Is the body text clear, compelling, and persuasive?
        3.  **Strong Call-to-Action (CTA):** Is there a clear, single action for the user to take?
        4.  **Brand Voice:** Is the tone professional yet approachable?
        Provide a score from 1-10. A score of 8 or higher means the draft is approved for sending. Provide specific, actionable feedback to help the writer improve."""
        ),
        ("human", "Please critique the following email draft:\n\n**Subject:** {subject}\n\n**Body:**\n{body}")
    ])
    return prompt | llm.with_structured_output(Critique)

# --- 3. The Reviser (Generator in 'Revise' Mode) ---
def get_reviser_chain():
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are the junior marketing copywriter who wrote the original draft. You have just received feedback from your senior editor. Your task is to carefully revise your draft to address every single point of feedback. Produce a new, improved version of the email."),
        ("human", "Original Request: {request}\n\nHere is your original draft:\n**Subject:** {original_subject}\n**Body:**\n{original_body}\n\nHere is the feedback from your editor:\n{feedback}\n\nPlease provide the revised email.")
    ])
    return prompt | llm.with_structured_output(MarketingEmail)

print("Generator and Critic components defined successfully.")

Generator and Critic components defined successfully.


## Phase 2: Building the Self-Refinement Loop with LangGraph

Now we'll construct the graph that automates the `Generate -> Critique -> Revise` loop. The state will track the draft, critique, and the number of revisions. A conditional edge will check the critic's approval flag (which the critic sets when the score is 8 or higher) and the revision count to decide whether to exit the loop or continue with another revision.

In [4]:
# LangGraph State
class AgentState(TypedDict):
    user_request: str
    draft_email: Optional[MarketingEmail]
    critique: Optional[Critique]
    revision_number: int

# Graph Nodes
def generate_node(state: AgentState) -> Dict[str, Any]:
    console.print(Panel("📝 Junior Copywriter is generating the initial draft.", title="[yellow]Step: Generate[/yellow]", border_style="yellow"))
    chain = get_generator_chain()
    draft = chain.invoke({"request": state['user_request']})
    console.print(Panel(f"[bold]Subject:[/bold] {draft.subject}\n\n{draft.body}", title="Draft 1"))
    return {"draft_email": draft, "revision_number": 1}

def critique_node(state: AgentState) -> Dict[str, Any]:
    title = f"[yellow]Step: Critique (Revision #{state['revision_number']})[/yellow]"
    console.print(Panel(f"🧐 Senior Editor is critiquing draft #{state['revision_number']}.", title=title, border_style="yellow"))
    chain = get_critic_chain()
    # model_dump() is the Pydantic v2 replacement for the deprecated .dict()
    critique_result = chain.invoke(state['draft_email'].model_dump())
    feedback_text = "\n- ".join(critique_result.feedback_points)
    console.print(Panel(f"[bold]Score:[/bold] {critique_result.score}/10\n[bold]Feedback:[/bold]\n- {feedback_text}", title="Critique Result"))
    return {"critique": critique_result}

def revise_node(state: AgentState) -> Dict[str, Any]:
    console.print(Panel("✍️ Junior Copywriter is revising the draft based on feedback.", title="[yellow]Step: Revise[/yellow]", border_style="yellow"))
    chain = get_reviser_chain()
    feedback_str = "\n- ".join(state['critique'].feedback_points)
    revised_draft = chain.invoke({
        "request": state['user_request'],
        "original_subject": state['draft_email'].subject,
        "original_body": state['draft_email'].body,
        "feedback": feedback_str,
    })
    console.print(Panel(f"[bold]Subject:[/bold] {revised_draft.subject}\n\n{revised_draft.body}", title=f"Draft {state['revision_number'] + 1}"))
    return {"draft_email": revised_draft, "revision_number": state['revision_number'] + 1}

# Conditional Edge
def should_continue(state: AgentState) -> str:
    console.print(Panel("⚖️ Decision Point: Does the draft meet quality standards?", title="[yellow]Step: Decide[/yellow]", border_style="yellow"))
    if state['critique'].is_approved:
        console.print("[green]Conclusion: Critique APPROVED! Finishing process.[/green]")
        return "end"
    if state['revision_number'] >= 3: # Stop after 3 drafts (the initial draft plus 2 revisions)
        console.print("[red]Conclusion: Max revisions reached. Finishing with last draft.[/red]")
        return "end"
    else:
        console.print("[yellow]Conclusion: Critique requires revision. Looping back.[/yellow]")
        return "continue"

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("generate", generate_node)
workflow.add_node("critique", critique_node)
workflow.add_node("revise", revise_node)

workflow.set_entry_point("generate")
workflow.add_edge("generate", "critique")
workflow.add_conditional_edges(
    "critique",
    should_continue,
    {"continue": "revise", "end": END}
)
workflow.add_edge("revise", "critique")

self_refine_agent = workflow.compile()
print("Self-Refinement agent graph compiled successfully.")

Self-Refinement agent graph compiled successfully.
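
`should_continue` trusts the `is_approved` flag that the model fills in, which could in principle disagree with the numeric score. One possible hardening is to derive the decision from the score alone. The sketch below uses a plain dataclass standing in for `Critique`; the names `ScoredCritique`, `route`, `APPROVAL_THRESHOLD`, and `MAX_DRAFTS` are illustrative and not part of the notebook.

```python
from dataclasses import dataclass

APPROVAL_THRESHOLD = 8  # matches the "8 or higher" rule in the critic prompt
MAX_DRAFTS = 3          # same limit that should_continue uses


@dataclass
class ScoredCritique:
    score: int
    is_approved: bool  # model-reported; deliberately ignored by route()


def route(critique: ScoredCritique, revision_number: int) -> str:
    """Return "end" or "continue" based on the score and the draft budget."""
    if critique.score >= APPROVAL_THRESHOLD:
        return "end"
    if revision_number >= MAX_DRAFTS:
        return "end"
    return "continue"


# An inconsistent critique: flagged approved, but scored below the bar.
print(route(ScoredCritique(score=5, is_approved=True), revision_number=1))
# prints: continue
```

Routing on the score keeps the threshold in one place in code instead of relying on the model to apply it consistently.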


## Phase 3: Demonstration of the Self-Refinement Loop

Let's run the agent and observe the iterative refinement process. We'll ask it to write an email for a new AI product and watch as it generates, gets critiqued, and revises its own work until it meets the quality bar.

In [5]:
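def run_agent(request: str):
    initial_state = {"user_request": request}
    # stream() allows us to see the intermediate steps.
    # Each step maps a node name to that node's partial state update, so we
    # merge the updates to rebuild the full final state. Keeping only the last
    # step would drop 'draft_email', because the last node to run is 'critique'.
    final_state = dict(initial_state)
    for step in self_refine_agent.stream(initial_state):
        for node_update in step.values():
            final_state.update(node_update)
    return final_state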

request = "Write a marketing email announcing our new revolutionary AI-powered data analytics platform, 'InsightSphere'."
console.print("--- 🚀 Kicking off the Self-Refinement Process ---")
final_result = run_agent(request)

# Display the final, approved result
console.print("\n--- Final Approved Email ---")
final_email = final_result['draft_email']
final_critique = final_result['critique']
email_panel = Panel(
    f"[bold]Subject:[/bold] {final_email.subject}\n\n---\n\n{final_email.body}",
    title="[bold green]Approved Email[/bold green]",
    subtitle=f"[green]Final Score: {final_critique.score}/10[/green]",
    border_style="green"
)
console.print(email_panel)

--- 🚀 Kicking off the Self-Refinement Process ---


                  Step: Generate                   
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ 📝 Junior Copywriter is generating the initial   ┃
┃ draft.                                           ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

                             Draft 1                              
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Subject: New Product Announcement                                ┃
┃                                                                  ┃
┃ Hello,                                                           ┃
┃                                                                  ┃
┃ We are happy to announce our new product, InsightSphere. It's an ┃
┃ AI-powered data analytics platform. It can help you analyze your ┃
┃ data.                                                            ┃
┃                                                                  ┃
┃ Click here to learn more.                                        ┃
┃                                                                  ┃
┃ Thanks,                                                          ┃
┃ The Team                                                         ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

               Step: Critique (Revision #1)                
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ 🧐 Senior Editor is critiquing draft #1.                 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

                         Critique Result                          
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Score: 4/10                                                      ┃
┃ Feedback:                                                        ┃
┃ - The subject line 'New Product Announcement' is generic and      ┃
┃ uninspired. It needs to be more intriguing to grab attention.    ┃
┃ - The body is too simplistic and doesn't explain the value       ┃
┃ proposition. What problems does InsightSphere solve? Use more    ┃
┃ persuasive language.                                             ┃
┃ - 'Click here to learn more' is a weak call-to-action. Be more   ┃
┃ specific and create urgency.                                     ┃
┃ - The tone is flat. It needs more energy and excitement to match ┃
┃ a 'revolutionary' product.                                       ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

        Step: Decide         
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ⚖️ Decision Point: Does   ┃
┃ the draft meet quality   ┃
┃ standards?               ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Conclusion: Critique requires revision. Looping back.


                    Step: Revise                     
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ✍️ Junior Copywriter is revising the draft based  ┃
┃ on feedback.                                     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

                             Draft 2                              
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Subject: Unlock Your Data's True Potential with InsightSphere    ┃
┃                                                                  ┃
┃ Hi [Name],                                                       ┃
┃                                                                  ┃
┃ Are you struggling to turn massive datasets into actionable      ┃
┃ insights?                                                        ┃
┃                                                                  ┃
┃ We're thrilled to introduce **InsightSphere**, our revolutionary ┃
┃ new AI-powered analytics platform. Stop guessing and start       ┃
┃ knowing. InsightSphere surfaces hidden patterns, predicts future ┃
┃ trends, and provides the clarity you need to make smarter,       ┃
┃ data-driven decisions.                                           ┃
┃                                   

               Step: Critique (Revision #2)                
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ 🧐 Senior Editor is critiquing draft #2.                 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

                         Critique Result                          
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Score: 9/10                                                      ┃
┃ Feedback:                                                        ┃
┃ - Excellent work on the revision. The subject line is much       ┃
┃ stronger and benefit-oriented.                                   ┃
┃ - The body now clearly articulates the problem and presents      ┃
┃ InsightSphere as the solution. The language is persuasive and    ┃
┃ energetic.                                                       ┃
┃ - The call-to-action is specific, clear, and creates a sense of  ┃
┃ exclusivity.                                                     ┃
┃ - This draft is approved and ready to send.                      ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

        Step: Decide         
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ⚖️ Decision Point: Does   ┃
┃ the draft meet quality   ┃
┃ standards?               ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Conclusion: Critique APPROVED! Finishing process.



--- Final Approved Email ---


                           Approved Email                           
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Subject: Unlock Your Data's True Potential with InsightSphere    ┃
┃                                                                  ┃
┃ ---                                                              ┃
┃                                                                  ┃
┃ Hi [Name],                                                       ┃
┃                                                                  ┃
┃ Are you struggling to turn massive datasets into actionable      ┃
┃ insights?                                                        ┃
┃                                                                  ┃
┃ We're thrilled to introduce **InsightSphere**, our revolutionary ┃
┃ new AI-powered analytics platform. Stop guessing and start       ┃
┃ knowing. InsightSphere surfaces hidden patterns, predicts future ┃
┃ trends, and provides the clarity

## Phase 4: Persistent Improvement - The RLHF Analogy

The self-refine loop improves quality for a *single run*. But how do we make the agent better *over time*? We can extend our architecture to save the high-quality, approved outputs and use them as examples for future tasks. This is an application-level analogy to Reinforcement Learning from Human/AI Feedback (RLHF): instead of updating model weights, we reuse approved outputs as few-shot examples in the prompt.

We will define a simple in-memory `GoldStandardMemory` and a new generator node that uses this memory to improve its first draft.

In [6]:
class GoldStandardMemory:
    """A simple in-memory store for high-quality examples."""
    def __init__(self):
        self.examples: List[MarketingEmail] = []
        
    def add_example(self, email: MarketingEmail):
        self.examples.append(email)
        
    def get_formatted_examples(self) -> str:
        if not self.examples:
            return "No examples available yet."
        formatted = "\n\n---\n\n".join([
            f"Example Subject: {ex.subject}\nExample Body:\n{ex.body}"
            for ex in self.examples
        ])
        return formatted

# Instantiate our memory (it persists across runs within this session only)
gold_standard_memory = GoldStandardMemory()

# New generator node that uses the memory
def generate_node_with_memory(state: AgentState) -> Dict[str, Any]:
    title = "[yellow]Step: Generate[/yellow]"
    console.print(Panel("📝 Junior Copywriter is generating the initial draft (Informed by Past Successes).", title=title, border_style="yellow"))
    examples = gold_standard_memory.get_formatted_examples()
    
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a junior marketing copywriter. Your task is to write a first draft of a marketing email based on the user's request. You should learn from the style and quality of past successful examples."),
        ("human", "Here are some examples of high-quality emails that were approved by your editor:\n\n{examples}\n\nNow, write a marketing email about the following topic:\n\n{request}")
    ])
    chain = prompt | llm.with_structured_output(MarketingEmail)
    draft = chain.invoke({"request": state['user_request'], "examples": examples})
    console.print(Panel(f"[bold]Subject:[/bold] {draft.subject}\n\n{draft.body}", title=f"Draft {state.get('revision_number', 1)}"))
    return {"draft_email": draft, "revision_number": 1}

# Build the new graph with the memory-enabled generator
workflow_with_memory = StateGraph(AgentState)
workflow_with_memory.add_node("generate", generate_node_with_memory)
workflow_with_memory.add_node("critique", critique_node)
workflow_with_memory.add_node("revise", revise_node)
workflow_with_memory.set_entry_point("generate")
workflow_with_memory.add_edge("generate", "critique")
workflow_with_memory.add_conditional_edges("critique", should_continue, {"continue": "revise", "end": END})
workflow_with_memory.add_edge("revise", "critique")
self_improving_agent = workflow_with_memory.compile()
print("Persistent memory components defined successfully.")

# --- DEMONSTRATION OF LONG-TERM IMPROVEMENT ---

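# 1. Save our previous email to the memory, but only if the editor actually approved it
#    (the loop can also end by hitting the revision limit)
if final_result['critique'].is_approved:
    console.print(Panel("The high-quality, editor-approved email for 'InsightSphere' has been saved. It will now be used as a reference for future generations.", title="[bold]🏆 Saving approved email to Gold Standard Memory[/bold]", border_style="magenta"))
    gold_standard_memory.add_example(final_result['draft_email'])
else:
    console.print("[red]The previous email was not approved, so it was not saved to memory.[/red]")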

# 2. Run the agent again on a NEW task
new_request = "Write a promotional email for our new AI-powered CRM called 'Visionary'."
console.print("\n--- 🚀 Kicking off the Self-Refinement Process with Memory ---")
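# Call the memory-enabled graph directly: run_agent() streams self_refine_agent,
# which does not use the memory. invoke() returns the full final state.
new_final_result = self_improving_agent.invoke({"user_request": new_request})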

# 3. Display the new result. The key thing to notice is if it gets approved faster.
console.print("\n--- Final Approved Email (Generated with Memory) ---")
new_final_email = new_final_result['draft_email']
new_critique = new_final_result['critique']
email_panel_2 = Panel(
    f"[bold]Subject:[/bold] {new_final_email.subject}\n\n---\n\n{new_final_email.body}",
    title="[bold green]Approved Email[/bold green]",
    subtitle=f"[green]Final Score: {new_critique.score}/10[/green]",
    border_style="green"
)
console.print(email_panel_2)

Persistent memory components defined successfully.


        🏆 Saving approved email to Gold Standard Memory         
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ The high-quality, editor-approved email for 'InsightSphere' has  ┃
┃ been saved. It will now be used as a reference for future        ┃
┃ generations.                                                     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛


--- 🚀 Kicking off the Self-Refinement Process with Memory ---


            Step: Generate             
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ 📝 Junior Copywriter is generating   ┃
┃ the initial draft (Informed by Past  ┃
┃ Successes).                          ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

                             Draft 1                              
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Subject: Go From Data to Decisions, Instantly, with Visionary    ┃
┃                                                                  ┃
┃ Hi [Name],                                                       ┃
┃                                                                  ┃
┃ Is your team drowning in data but starving for wisdom?           ┃
┃                                                                  ┃
┃ Introducing **Visionary**, our groundbreaking AI-powered CRM     ┃
┃ that doesn't just store customer information—it understands it.  ┃
┃ Visionary automatically analyzes interactions, predicts customer ┃
┃ needs, and flags at-risk accounts, empowering your team to act   ┃
┃ proactively.                                                     ┃
┃                                                                  ┃
┃ Ready to transform your customer r

               Step: Critique (Revision #1)                
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ 🧐 Senior Editor is critiquing draft #1.                 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

                         Critique Result                          
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Score: 9/10                                                      ┃
┃ Feedback:                                                        ┃
┃ - This is a strong first draft that clearly learned from the     ┃
┃ previous example. The subject is benefit-driven and compelling.  ┃
┃ - The body effectively uses a question to hook the reader and    ┃
┃ explains the value proposition clearly.                          ┃
┃ - The call-to-action is specific and effective.                  ┃
┃ - The brand voice is spot on. This is approved.                  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

        Step: Decide         
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ⚖️ Decision Point: Does   ┃
┃ the draft meet quality   ┃
┃ standards?               ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Conclusion: Critique APPROVED! Finishing process.



--- Final Approved Email (Generated with Memory) ---


                           Approved Email                           
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Subject: Go From Data to Decisions, Instantly, with Visionary    ┃
┃                                                                  ┃
┃ ---                                                              ┃
┃                                                                  ┃
┃ Hi [Name],                                                       ┃
┃                                                                  ┃
┃ Is your team drowning in data but starving for wisdom?           ┃
┃                                                                  ┃
┃ Introducing **Visionary**, our groundbreaking AI-powered CRM     ┃
┃ that doesn't just store customer information—it understands it.  ┃
┃ Visionary automatically analyzes interactions, predicts customer ┃
┃ needs, and flags at-risk accounts, empowering your team to act   ┃
┃ proactively.                    

### Analysis of the Results

This detailed implementation reveals a two-layered improvement process:

1.  **Intra-Task Improvement (Self-Refine):** During the first run, the agent's initial draft was weak (score: 4/10). The logs show the specific, actionable feedback from the Senior Editor, and the revised draft addressed it well enough to earn a 9/10 and be approved. This shows the immediate benefit of the loop for improving a single output.

2.  **Inter-Task Improvement (Persistent Learning):** In the second demonstration, the agent was given a *new* task, but its generator was now armed with a high-quality example from the previous successful run. In the logged run, the agent's **first draft for the new product scored 9/10 immediately**, requiring no revisions. One run is an illustration rather than proof, but it is consistent with the approved example raising the quality of the agent's first attempt.

This second part is a loose, application-level analogy for RLHF. We are reinforcing the agent's behavior by showing it examples of what "good" looks like, which improves its first attempts on future tasks. Unlike RLHF, no model weights are updated: the improvement comes entirely from the examples placed in the prompt.
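
`GoldStandardMemory` keeps its examples in a Python list, so they disappear when the process exits. A minimal sketch of a file-backed variant follows. It is an assumption about how one might persist examples, not part of the notebook: the class name `FileBackedMemory`, the JSON layout, and the `max_examples` cap (added to keep the prompt from growing without bound) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class FileBackedMemory:
    """Stores approved emails as JSON so they survive restarts."""

    def __init__(self, path: Path, max_examples: int = 3):
        self.path = path
        self.max_examples = max_examples
        self.examples: List[Dict[str, str]] = []
        if path.exists():
            self.examples = json.loads(path.read_text(encoding="utf-8"))

    def add_example(self, subject: str, body: str) -> None:
        self.examples.append({"subject": subject, "body": body})
        # Keep only the most recent examples to bound prompt length.
        self.examples = self.examples[-self.max_examples:]
        self.path.write_text(json.dumps(self.examples, indent=2), encoding="utf-8")

    def get_formatted_examples(self) -> str:
        if not self.examples:
            return "No examples available yet."
        return "\n\n---\n\n".join(
            f"Example Subject: {ex['subject']}\nExample Body:\n{ex['body']}"
            for ex in self.examples
        )


# Usage: write with one instance, read back with a fresh one.
store_path = Path(tempfile.mkdtemp()) / "gold_standard.json"
FileBackedMemory(store_path).add_example("Hello", "Body text")
reloaded = FileBackedMemory(store_path)
print(len(reloaded.examples))  # prints: 1
```

Because `get_formatted_examples` returns the same string format as the in-memory version, a store like this could feed the same `{examples}` prompt slot.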

## Conclusion

In this notebook, we have implemented a complete **Self-Improvement Loop**. We have shown that this architecture is not just about refining a single piece of work, but is also a pattern for building agents whose first attempts improve over time.

By separating the roles of **generator** and **critic**, we create a dynamic of feedback and revision that consistently elevates the quality of the agent's output. By adding a **persistent memory** for high-quality results, we create a positive feedback loop that improves the agent's baseline capabilities, making it more efficient and effective on future tasks.

While the risk of the critic reinforcing its own biases is real and requires careful management, the potential to build agents that learn from experience is a transformative step towards more autonomous, capable, and intelligent AI systems.