<a href="https://colab.research.google.com/github/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%202%3A%20Using%20LLM%20Agents%20for%20Essay%20Generation%20and%20Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part 2: Using LLM Agents for Synthetic Data Generation

You have completed [Part 1: Guide to OpenAI in Google Colab](https://github.com/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%201%3A%20Guide%20to%20OpenAI%20in%20Google%20Colab.ipynb). Now, let’s apply those skills to a real-world use case.  

In this section, you’ll learn about **[role-based collaboration](https://www.ibm.com/think/topics/multi-agent-collaboration)**, a framework where multiple LLM agents work together by taking on distinct roles with specific purposes. Each agent performs its own task and passes the result to the next, creating a seamless workflow powered by **prompt chaining**.  

We will use generative AI to:

- **Create** two 11th-grade English essays of varying quality (Student Agent)  
- **Design** a rubric to evaluate the essays (Rubric Agent)  
- **Grade** each essay using the rubric (Grading Agent)  

By the end of this section, you’ll understand how to design and connect multiple role-based agents to simulate a realistic classroom workflow — from writing to evaluation to grading.
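To make prompt chaining concrete before any API calls, here is a minimal offline sketch. `fake_generate` and `run_chain` are hypothetical names used only for illustration (they are not part of the OpenAI SDK or this notebook's helpers); `fake_generate` simply echoes its role, so the example shows nothing more than how each agent's output becomes part of the next agent's input.

```python
from typing import Callable


def fake_generate(system_role: str, user_message: str) -> str:
    """Hypothetical stand-in for a model call: returns a tagged echo."""
    return f"[{system_role}] response to: {user_message[:40]}"


def run_chain(generate: Callable[[str, str], str], topic: str) -> dict:
    """Chain three role-based agents so that each step feeds the next."""
    essay = generate("Student", f"Write an essay on: {topic}")
    rubric = generate("Rubric", f"Create a rubric for an essay on: {topic}")
    # The grader receives BOTH earlier outputs as labeled context.
    grade = generate("Grader", f"RUBRIC:\n{rubric}\n\nESSAY TO GRADE:\n{essay}")
    return {"essay": essay, "rubric": rubric, "grade": grade}


results = run_chain(fake_generate, "Othello")
print(results["grade"])
```

Swapping `fake_generate` for a real model call keeps the chain unchanged, which is the main appeal of separating the roles from the plumbing.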


First, let's repeat the setup from the previous notebook: importing the necessary libraries and loading our API key.

**Note**: Once you have saved a key in Colab Secrets, it is stored for you, so you can reuse it in other notebooks. The first time you run the cell that calls `userdata.get`, a pop-up window will ask you to grant this notebook access.

In [27]:
import openai
from openai import OpenAI
from google.colab import userdata
import os
from typing import Tuple, Optional, Dict, Any

In [28]:
# Pull your saved secret into an environment variable
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

# Test if the key is available (without printing it)
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set. Add it via Colab Secrets (🔑) and try again.")
else:
    print("Key loaded?", True)

Key loaded? True


In [29]:
client = OpenAI()

## Text Generation Function

Let’s create **one** function for generating all outputs. It combines a **system role** (agent persona) with a **task prompt**, and can optionally append extra **sections** (e.g., a rubric and an essay) as clearly labeled blocks. This single helper can write essays, build rubrics, or grade work—just by changing the role, prompt, and sections. It also returns **usage** (token counts) so you can track how many tokens you’re using and estimate cost.

**What it does**
- Sends a single request to the model with:
  - **System role**: defines the voice/perspective (Student, Teacher, Grader, etc.).
  - **Prompt**: the task/instructions (e.g., “Evaluate the essay using the rubric…”).
  - **Sections (optional)**: a mapping of `Label → Content` appended below the prompt (e.g., `{"RUBRIC": ..., "ESSAY TO GRADE": ...}`).

**Returns**
- **Text**: the generated output as a string.
- **Usage**: a dict of token metadata from the API (input/output/total tokens), useful for logging and cost estimation.

**Why this design**
- **One function, many tasks**: swap in different roles/prompts without new code.
- **Structured context**: labeled sections keep long inputs organized.
- **Cost-aware**: the returned `usage` enables token tracking and pricing.
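As a rough illustration of the "structured context" idea, the sketch below shows how a prompt and its labeled sections can be joined into a single user message. `build_user_message` is a hypothetical helper written for this example; it mirrors the joining step inside `text_generation` but makes no API call, so you can inspect the exact text the model would receive.

```python
from typing import Optional


def build_user_message(prompt: str, sections: Optional[dict] = None) -> str:
    """Join the task prompt and optional labeled sections with blank lines."""
    parts = [prompt]
    for label, content in (sections or {}).items():
        # Each section becomes "LABEL:" followed by its content on the next line.
        parts.append(f"{label}:\n{content}")
    return "\n\n".join(parts)


message = build_user_message(
    "Evaluate the essay using the rubric.",
    {"RUBRIC": "Thesis: 20%", "ESSAY TO GRADE": "Iago manipulates Othello."},
)
print(message)
```

Printing the message this way is a cheap check that long inputs such as a rubric and an essay end up under the labels you expect.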

In [30]:
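def text_generation(
    system_role: str,
    prompt: str,
    sections: Optional[Dict[str, str]] = None,
    model: str = "gpt-4o-mini",
    temperature: float = 0.2,
) -> Tuple[str, Optional[Dict[str, Any]]]:
    """
    Send one request to the model and return (text, usage).

    Args:
        system_role: Agent persona, sent as the system message.
        prompt: Task instructions, placed at the start of the user message.
        sections: Optional mapping of label -> content appended below the prompt.
        model: Model name passed to the API.
        temperature: Sampling temperature.
    """
    # Build the user message: prompt first, then each labeled section.
    user_parts = [prompt]
    for label, content in (sections or {}).items():
        user_parts.append(f"{label}:\n{content}")
    user_msg = "\n\n".join(user_parts)

    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_msg},
        ],
        temperature=temperature,
    )
    text = (getattr(resp, "output_text", "") or "").strip()

    # Convert the SDK's usage object to a plain dict so callers can use .get().
    usage = getattr(resp, "usage", None)
    if usage is not None and not isinstance(usage, dict):
        if hasattr(usage, "model_dump"):
            usage = usage.model_dump()
        else:
            usage = getattr(usage, "__dict__", None)
    return text, usage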


In [31]:
from IPython.display import display, Markdown  # Tools for displaying formatted text in Jupyter Notebooks

def to_markdown(text):
    # Convert the provided text to Markdown format for better display in Jupyter Notebooks
    return Markdown(text)


### Cost Summarization


This helper summarizes token usage and estimates **total cost (USD)** for a model call.

**What it uses**
- The API’s `usage` payload to read **input_tokens** (prompt) and **output_tokens** (completion).
- A built-in **pricing table (per 1M tokens)** for common models (e.g., `gpt-4o-mini`, `gpt-4o`, `gpt-3.5-turbo`).  
  If the model isn’t found, it defaults to `gpt-4o-mini`.

**Returns**
- `model`: the model name used for pricing lookup  
- `input_tokens`: tokens consumed by your messages  
- `output_tokens`: tokens generated by the model  
- `total_tokens`: input + output  
- `cost_usd`: estimated cost, rounded to 6 decimals

**Why it’s useful**
- Quick **cost visibility** per call
- Consistent **logging/analytics** across notebooks
- Works with different SDK field names (`input_tokens`/`prompt_tokens`, `output_tokens`/`completion_tokens`)
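The arithmetic behind the estimate is simple enough to check by hand. The sketch below uses the `gpt-4o-mini` prices from this notebook's pricing table ($0.15 per 1M input tokens, $0.60 per 1M output tokens) and token counts that are made up purely for illustration.

```python
# Prices per 1M tokens for gpt-4o-mini, as used in this notebook's pricing table.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60

# Hypothetical token counts for a single call (illustrative values only).
input_tokens = 1_000
output_tokens = 500

# cost = input_tokens * input_rate + output_tokens * output_rate
cost_usd = (
    input_tokens * INPUT_PRICE_PER_M / 1_000_000
    + output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
)
print(f"${cost_usd:.6f}")  # prints $0.000450
```

Output tokens cost four times as much as input tokens at these rates, so long generations tend to dominate the bill even when the prompt is large.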


In [32]:
def summarize_usage_cost(usage: dict, model: str = "gpt-4o-mini") -> dict:
    """
    Summarize input/output token usage and total cost (USD).

    Args:
        usage (dict): Usage info returned by the API (includes input_tokens, output_tokens, etc.).
        model (str): Model name for pricing lookup.

    Returns:
        dict: Contains input_tokens, output_tokens, total_tokens, cost_usd.
    """
    # --- pricing table: USD per token, derived from per-1M-token prices ---
    pricing = {
        "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
        "gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000},
        "gpt-3.5-turbo": {"input": 0.50 / 1_000_000, "output": 1.50 / 1_000_000},
    }

    model_rate = pricing.get(model, pricing["gpt-4o-mini"])  # default if unknown

    # --- extract token counts safely (usage may be None or have missing fields) ---
    usage = usage or {}
    input_tokens = usage.get("input_tokens") or usage.get("prompt_tokens") or 0
    output_tokens = usage.get("output_tokens") or usage.get("completion_tokens") or 0
    total_tokens = input_tokens + output_tokens

    # --- compute cost ---
    cost_usd = (
        input_tokens * model_rate["input"]
        + output_tokens * model_rate["output"]
    )

    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
        "cost_usd": round(cost_usd, 6),
    }

## Essay Generation

Let's start by generating two essays on the same topic — **Shakespeare’s *Othello***. We’ll create one **A-level** and one **C-level** essay (see this [article](https://usahello.org/education/children/grade-levels/) for an overview of the U.S. grading system) to compare how writing quality changes when the same Student Agent follows different performance prompts.


### Student Agent

We’ll begin by creating a **Student Agent**, which represents a student completing an academic assignment. This agent dynamically adapts to different **subjects**, **grade levels**, **assignment types**, and **topics** based on the parameters passed into the `build_student_agent` function.  

The Student Agent defines the writer’s persona — realistic, thoughtful, and age-appropriate — ensuring consistent tone and perspective across assignments of varying difficulty or subject matter. By adjusting parameters such as `grade_level`, `subject`, `assignment_type`, and `topic`, you can simulate diverse classroom contexts (e.g., a 9th-grade science lab report or a 12th-grade history essay).  

This modular setup makes the Student Agent both **flexible** and **reusable**, allowing you to maintain a natural, student-like writing style across multiple educational use cases.





In [33]:
def build_student_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    """
    Build a student persona for role-based generation.
    Keeps voice/level in the system message and defers specifics to the prompt.
    """
    return (
        f"You are a {grade_level} student completing a {assignment_type} for your {subject} class "
        f"on the topic: {topic} "
        "Write in a realistic student voice—thoughtful but not overly advanced. "
        "Follow the assignment prompt’s instructions (length, structure, citation style). "
        "Use clear, organized paragraphs and avoid jargon unless requested."
    )

### 11th Grade English Student

In this example, we create a **Student Agent** representing an 11th-grade student writing an **English essay** on the topic: *“How does Iago exploit Othello’s insecurities to drive the tragedy in **Othello**?”* We’ll call this variable `english_student_11`.

This agent captures a realistic, age-appropriate writing voice — thoughtful but not overly advanced — suitable for a high school English student analyzing a complex literary text.


In [34]:
grade_level = "11th grade"
subject = "English"
assignment_type = "essay"
topic = "How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"

english_student_11 = build_student_agent(grade_level, subject, assignment_type, topic)
enlgish_student_11 = english_student_11  # alias for the spelling used in later cells

### A-Level Prompt

Now that we’ve defined the **Student Agent** for an 11th-grade English student writing about *Othello*, we can move on to creating the **A-Level Prompt**, which provides the model with detailed writing instructions. We’ll call this prompt `essay_A_level_instructions`.  

The A-Level Prompt directs the model to produce an outstanding literary analysis essay that demonstrates deep understanding, originality, and strong analytical reasoning. It requires a clear, arguable thesis placed at the end of the introduction and supported by concise quotations and insightful interpretation.  

Writing at this level should be polished and formal — avoiding casual tone, first person, and rhetorical questions. Each paragraph should begin with a strong topic sentence and connect logically to the thesis. The essay should be **450–600 words** and cite evidence in *(act.scene.line)* format (e.g., *3.3.167*), resulting in a cohesive, high-quality literary analysis suitable for an advanced high school student.
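Two of these requirements, the word range and the *(act.scene.line)* citation format, can be checked mechanically once an essay has been generated. The following is a minimal sketch; `check_essay_format` is a hypothetical helper, and it only checks form. It does not verify that a quotation is accurate or attributed to the right character.

```python
import re


def check_essay_format(essay: str, min_words: int = 450, max_words: int = 600) -> dict:
    """Report the word count and any act.scene.line citations found in an essay."""
    word_count = len(essay.split())
    # Matches citations such as (3.3.167) or a line range like (3.3.373-375).
    citations = re.findall(r"\((\d+\.\d+\.\d+(?:-\d+)?)\)", essay)
    return {
        "word_count": word_count,
        "within_length": min_words <= word_count <= max_words,
        "citations": citations,
    }


sample = 'Iago is called "honest Iago" (1.3.7) and later demands proof (3.3.373-375).'
report = check_essay_format(sample)
print(report)
```

Running the same helper on a generated essay gives a quick signal of whether the model respected the length and citation instructions before any grading happens.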


In [35]:
essay_A_level_instructions = (
    "Write an A-level literary analysis essay that follows the detailed guidelines below.\n\n"
    "What 'A-level' means: excellent or outstanding work — demonstrates a deep and "
    "original understanding of the text, presents a clear and arguable thesis, and supports claims with precise "
    "evidence and insightful analysis. Writing should be organized, polished, and stylistically sophisticated.\n\n"
    "OUTPUT REQUIREMENTS:\n"
    "- Include a clear, arguable thesis at the END of the introduction.\n"
    "- Use concise, well-chosen quotations from the play and explain HOW they support your argument "
    "(analysis > summary).\n"
    "- Ensure each paragraph has a strong topic sentence and smooth transitions; every idea should advance the thesis.\n"
    "- Maintain a precise, academic tone throughout; avoid first person, rhetorical questions, or casual phrasing.\n"
    "- Aim for ~450–600 words.\n"
    "- Cite quotations using act.scene.line format (e.g., 3.3.167).\n"
)


In [36]:
A_level_essay, A_level_usage = text_generation(
    system_role=enlgish_student_11,
    prompt=essay_A_level_instructions,
    model="gpt-4o-mini",
    temperature=0.3,
)

to_markdown(A_level_essay)

In William Shakespeare's *Othello*, the character of Iago serves as the architect of tragedy, manipulating the insecurities of Othello to orchestrate his downfall. Iago's cunning exploitation of Othello's vulnerabilities—particularly his racial insecurities, his trust in others, and his fear of inadequacy—reveals how easily a person's weaknesses can be twisted to serve malicious ends. Through Iago's machinations, Shakespeare illustrates the destructive power of manipulation and the tragic consequences of unchecked jealousy and self-doubt. Ultimately, Iago's exploitation of Othello's insecurities not only drives the plot but also underscores the play's central themes of trust, betrayal, and the fragility of human relationships.

One of the most significant insecurities that Iago exploits is Othello's racial identity. As a Moor in Venetian society, Othello is acutely aware of his outsider status, which Iago uses to his advantage. Iago frequently refers to Othello in derogatory terms, such as when he describes him as "the Moor" (1.1.58), stripping him of his individuality and reinforcing his status as an outsider. This constant reminder of Othello's race feeds into his insecurities, making him more susceptible to Iago's insinuations about Desdemona's fidelity. For instance, when Iago suggests that Desdemona's love for Othello is unnatural, he plays on Othello's fears that he is unworthy of her affection. Iago states, "Blessed fig's-end! The wine she drinks is made of grapes" (2.3.319), implying that Desdemona's love is superficial and that she could easily be swayed by someone of higher status. This manipulation deepens Othello's insecurities, leading him to doubt Desdemona's loyalty and ultimately driving him to jealousy and rage.

Moreover, Iago exploits Othello's inherent trust in those around him, particularly in Iago himself. Othello's belief in Iago as an honest and loyal friend makes him vulnerable to manipulation. Iago's reputation as "honest Iago" (1.3.7) allows him to deceive Othello without raising suspicion. When Iago plants the seed of doubt regarding Desdemona's fidelity, Othello's trust in Iago blinds him to the truth. For example, Iago's assertion that he has seen Cassio with Desdemona leads Othello to demand proof, showing how Iago's manipulation has taken root in Othello's mind (3.3.373-375). This reliance on Iago's perceived honesty ultimately proves fatal, as Othello's trust becomes a weapon against him, allowing Iago to orchestrate his tragic downfall.

Lastly, Othello's fear of inadequacy plays a crucial role in Iago's exploitation of his character. Othello's status as a military leader does not shield him from feelings of inferiority, especially in relation to his marriage to Desdemona. Iago preys on this fear, suggesting that Othello is not deserving of Desdemona's love. The pivotal moment occurs when Othello, consumed by jealousy, asserts, "I am not what I am" (1.1.65), reflecting his internal struggle with identity and self-worth. This statement encapsulates Othello's conflict; he is torn between his noble persona and the insecurities that Iago amplifies. As Othello succumbs to Iago's manipulations, his fear of inadequacy transforms into a destructive jealousy that leads to tragic consequences, including the murder of Desdemona and his own demise.

In conclusion, Iago's exploitation of Othello's insecurities is central to the tragedy of *Othello*. By manipulating Othello's racial identity, undermining his trust in others, and amplifying his fears of inadequacy, Iago orchestrates a series of events that culminate in destruction and despair. Shakespeare's portrayal of this manipulation serves as a poignant reminder of the vulnerabilities inherent in human nature and the devastating effects of betrayal. Through Iago's cunning, the play illustrates how easily trust can be shattered and how insecurities can lead to one's downfall, making *Othello* a timeless exploration of the darker aspects of the human psyche.

#### Cost Summarization

In [37]:
summarize_usage_cost(A_level_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 268,
 'output_tokens': 885,
 'total_tokens': 1153,
 'cost_usd': 0.000571}

-----

### C-Level Prompt

The **C-Level Prompt** instructs the model to produce an average or barely passing essay. A *C-level* essay demonstrates basic comprehension of the topic but lacks depth, organization, and stylistic control. It may have a weak or missing thesis, minimal or poorly integrated evidence, repetitive ideas, and noticeable grammar or usage mistakes. We will call this `essay_C_level_instructions`.

Writing at this level should sound more casual and unpolished — including filler phrases such as “I think” or “in my opinion” and avoiding strong transitions like “however” or “therefore.” The essay should be short, approximately four paragraphs (200–300 words), and contain a few minor grammatical errors that sound natural for a struggling student writer.  

Like before, we provide **explicit instructions** in the prompt to guide the model’s behavior. The detailed output requirements ensure the essay intentionally reflects weaker academic performance while still maintaining a realistic student voice.

In [38]:
essay_C_level_instructions = (
    "Write an essay responding to the question below.\n\n"
    "IMPORTANT: Produce a deliberately poor-quality **C-level** essay for teaching purposes.\n"
    "What 'C-level' means: average or barely passing work — shows some understanding "
    "but lacks depth or polish; weak thesis or none, little/no evidence, poor organization, repetition/vagueness, "
    "and noticeable grammar/style mistakes.\n\n"
    "OUTPUT REQUIREMENTS:\n"
    "- Return ONLY the essay body (no headings or bullets).\n"
    "- Do NOT provide a clear thesis; keep the main claim vague.\n"
    "- Provide minimal or poorly integrated evidence (generic statements are fine; avoid direct quotations).\n"
    "- Organization should be weak (some repetition or loosely connected ideas is acceptable).\n"
    "- Keep it short: 4 paragraphs (~200–300 words).\n"
    "- Include a few minor grammar/usage mistakes naturally (e.g., comma splices, agreement issues).\n"
    "- Casual, somewhat imprecise tone is acceptable.\n"
    "- Use a few filler phrases like 'I think' or 'in my opinion,' and avoid strong transitions like 'however' or 'therefore.'"
)


C_level_essay, C_level_usage = text_generation(
    system_role=enlgish_student_11,
    prompt=essay_C_level_instructions,
    model="gpt-4o-mini",
    temperature=0.3,
)


In [39]:
to_markdown(C_level_essay)

In *Othello*, Iago really takes advantage of Othello’s insecurities. Othello is a great general, but he has a lot of doubts about himself, especially because he is not from Venice and is black. Iago knows this and uses it to make Othello feel less confident. I think this is important because it shows how Iago manipulates Othello. He tells Othello things that make him question his wife Desdemona’s loyalty. This is a big deal because Othello loves her a lot, and when Iago plants these seeds of doubt, it really messes with Othello’s mind.

Iago also plays on Othello’s fear of being an outsider. Since Othello is not from Venice, he feels like he has to prove himself all the time. Iago uses this to make Othello think that Desdemona would want someone who is more Venetian, like Cassio. This is kind of mean, but it works. Othello starts to believe that Desdemona could betray him, which is really sad. I mean, it’s like Iago is just using Othello’s own thoughts against him, and that’s pretty cruel.

Another thing is that Othello’s jealousy is something Iago really exploits. Othello becomes more and more jealous as Iago feeds him lies about Desdemona. This jealousy makes Othello act irrationally, which is what Iago wants. I think it’s interesting how Iago doesn’t even have to do much; he just nudges Othello in the wrong direction, and Othello does the rest. It’s like Iago is a puppet master, and Othello is the puppet.

In conclusion, Iago’s manipulation of Othello’s insecurities leads to the tragedy in the play. Othello’s doubts about himself and his relationship with Desdemona are what Iago uses to create chaos. It’s really a sad story because Othello is a good person, but Iago’s tricks ruin everything. I think this shows how easily someone can be influenced by their insecurities, which is something that can happen in real life too.

#### Cost Summarization

In [40]:
summarize_usage_cost(C_level_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 300,
 'output_tokens': 452,
 'total_tokens': 752,
 'cost_usd': 0.000316}

## Rubric Generation


Great! We’ve now generated two essays on the same topic but with different performance levels. The next step is to create a **rubric** that can be used to evaluate them.


### Rubric Agent Function

The `build_rubric_agent` function creates a **teacher persona** that specializes in building grading rubrics. It takes in the **grade level**, **subject**, **assignment type**, and **topic** to generate a clear system message describing the teacher’s role.  

The system message itself only sets the role and asks for clear, student-facing language. The detailed requirements come from the separate `rubric_instructions` prompt: 4–6 criteria, each with a percentage weight and A/B/C/D/F performance levels, written in concise, observable, and consistent language. The rubric defines expectations only; it does not grade specific work.


In [41]:
def build_rubric_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    """
    Build a teacher persona for rubric generation.
    Keeps guidance in the system message; the task/prompt comes separately.
    """
    return (
        f"You are a {subject} teacher creating a grading rubric for a {assignment_type} "
        f"on the topic: {topic}, for {grade_level} students. "
        "Be clear and student-facing."
    )

In [42]:
rubric_agent = build_rubric_agent(
    grade_level="11th grade",
    subject="English",
    assignment_type="essay",
    topic="How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"
)

In [43]:
rubric_instructions = (
    "Create a detailed, student-facing grading rubric tailored to the given topic and grade level. "
    "The rubric should include **4–6 criteria** that capture key dimensions of performance "
    "(e.g., Thesis, Evidence, Analysis, Organization, Mechanics).\n\n"
    "For each criterion, provide concise and specific descriptors for **A**, **B**, **C**, **D**, and **F** levels, "
    "focusing on observable behaviors and measurable writing qualities rather than vague judgments.\n\n"
    "Return the rubric in clean Markdown format using this structure:\n\n"
    "## Rubric\n"
    "### Criterion Name\n"
    "- **Weight (%):** [assign a reasonable percentage]\n"
    "- **A:** [clear, exceptional performance descriptor]\n"
    "- **B:** [strong, competent performance descriptor]\n"
    "- **C:** [basic or developing performance descriptor]\n"
    "- **D:** [limited or below-standard performance descriptor]\n"
    "- **F:** [incomplete, missing, or unsatisfactory work]\n\n"
    "After listing all criteria, include a **Notes** section summarizing key expectations and overall grading guidance. "
    "Ensure the tone remains student-friendly, consistent, and easy to understand."
)


In [44]:
english_rubric, rubric_usage = text_generation(
    system_role=rubric_agent,
    prompt=rubric_instructions,
    model="gpt-4o-mini",
    temperature=0.3,
)

to_markdown(english_rubric)

## Rubric

### Thesis
- **Weight (%):** 20%
- **A:** Presents a clear, insightful, and original thesis that directly addresses how Iago exploits Othello's insecurities.
- **B:** States a clear thesis that addresses the prompt but may lack depth or originality.
- **C:** Presents a thesis that is somewhat clear but may be vague or only partially address the prompt.
- **D:** Thesis is unclear, weak, or does not address the prompt adequately.
- **F:** No thesis statement is present.

### Evidence
- **Weight (%):** 25%
- **A:** Provides multiple, relevant, and well-integrated textual examples that strongly support the thesis.
- **B:** Uses relevant textual examples that support the thesis, though some may lack integration or depth.
- **C:** Includes some textual evidence, but it may be limited, vague, or not effectively connected to the thesis.
- **D:** Provides minimal or irrelevant evidence that does not support the thesis.
- **F:** No evidence is provided.

### Analysis
- **Weight (%):** 25%
- **A:** Offers insightful and thorough analysis of how Iago exploits Othello's insecurities, demonstrating deep understanding of the text.
- **B:** Provides competent analysis that explains how Iago exploits Othello's insecurities, though it may lack depth or insight.
- **C:** Analysis is present but may be superficial or only partially explain the connection between Iago's actions and Othello's insecurities.
- **D:** Limited analysis that fails to connect Iago's actions to Othello's insecurities clearly.
- **F:** No analysis is provided.

### Organization
- **Weight (%):** 15%
- **A:** Essay is well-organized with a clear structure, including a strong introduction, coherent body paragraphs, and a conclusive ending.
- **B:** Generally organized with a clear structure, though some transitions or connections between ideas may be weak.
- **C:** Organization is apparent but may be inconsistent or unclear, making it difficult to follow the argument.
- **D:** Lacks clear organization, making it hard to understand the flow of ideas.
- **F:** No discernible organization is present.

### Mechanics
- **Weight (%):** 15%
- **A:** Writing is free of grammatical, spelling, and punctuation errors; demonstrates a high level of polish and professionalism.
- **B:** Few minor errors in grammar, spelling, or punctuation that do not detract from the overall clarity of the essay.
- **C:** Noticeable errors in grammar, spelling, or punctuation that may distract the reader but do not significantly impede understanding.
- **D:** Frequent errors in grammar, spelling, or punctuation that make the essay difficult to read.
- **F:** Numerous errors that severely hinder understanding.

## Notes
When writing your essay, remember to clearly state your thesis early on and support it with strong evidence from the text. Make sure to analyze that evidence thoroughly, explaining how it connects to Iago's manipulation of Othello's insecurities. Organize your essay logically, using clear transitions between ideas, and proofread for mechanical errors. Aim for clarity and depth in your writing to effectively convey your understanding of the tragedy in *Othello*. Good luck!

In [45]:
summarize_usage_cost(rubric_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 275,
 'output_tokens': 679,
 'total_tokens': 954,
 'cost_usd': 0.000449}
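
Because the grading step expects category points to total 100, it can be worth confirming that the rubric's weights actually sum to 100 before using it. The sketch below is one way to do that, assuming the rubric follows the Markdown structure requested in `rubric_instructions`; `extract_rubric_weights` is a hypothetical helper, and the sample text is an abbreviated copy of the rubric above.

```python
import re


def extract_rubric_weights(rubric_md: str) -> dict:
    """Map each '### Criterion' heading to its 'Weight (%)' value."""
    weights = {}
    current = None
    for line in rubric_md.splitlines():
        heading = re.match(r"###\s+(.+)", line.strip())
        if heading:
            current = heading.group(1).strip()
            continue
        weight = re.search(r"\*\*Weight \(%\):\*\*\s*(\d+(?:\.\d+)?)", line)
        if weight and current is not None:
            weights[current] = float(weight.group(1))
    return weights


sample_rubric = """## Rubric
### Thesis
- **Weight (%):** 20%
### Evidence
- **Weight (%):** 25%
### Analysis
- **Weight (%):** 25%
### Organization
- **Weight (%):** 15%
### Mechanics
- **Weight (%):** 15%
"""
weights = extract_rubric_weights(sample_rubric)
print(weights, "total:", sum(weights.values()))
```

In the notebook you could pass `english_rubric` instead of `sample_rubric`; if the model formats a rubric differently, the regular expressions would need adjusting.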

----

## Essay Grading

Now we can use the LLM to **grade the essays**. This step lets us check whether each generated essay aligns with its intended performance level (A or C) and gives us structured, rubric-based feedback.

### Grading Agent

Next, we’ll create a **Grading Agent**, which also takes on the teacher persona but focuses specifically on evaluating student work. This agent uses the rubric and grading instructions to assess the essay’s strengths and weaknesses across multiple categories, assigning both a letter grade and a numerical score.

In [46]:
def build_grading_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    """
    Build a teacher persona for grading an essay.
    Keeps guidance in the system message; the task/prompt comes separately.
    """
    return (
        f"You are a {subject} teacher carefully grading a {assignment_type} "
        f"on the topic: {topic}, for {grade_level} students. "
        "Be fair, consistent, and concise."
    )

In [47]:
grading_agent = build_grading_agent(
    grade_level="11th grade",
    subject="English",
    assignment_type="essay",
    topic="How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"
)

In [48]:
grading_instructions = (
    "Evaluate the essay using the provided rubric.\n\n"
    "OUTPUT FORMAT (Markdown):\n"
    "## Overall Grade\n"
    "- **Level:** A|B|C|D|F\n"
    "- **Score:** [numeric total]/100\n"
    "- **Summary (2–3 sentences):** …\n\n"
    "## Category Breakdown\n"
    "For each category from the rubric (in the same order), include:\n"
    "- **Level:** A|B|C|D|F\n"
    "- **Score:** [earned points]/[category total]\n"
    "- **Justification (1–2 sentences):** specific and tied directly to the rubric\n\n"
    "Rules:\n"
    "- Mirror the rubric categories exactly and in order.\n"
    "- Be concise, evidence-based, and professional.\n"
    "- Do not invent categories that are not in the rubric.\n"
    "- Ensure total points across all categories add up to 100."
)


### Grading A Level Essay

In [49]:
A_level_essay_graded, A_level_essay_graded_usage = text_generation(
    system_role=grading_agent,
    prompt=grading_instructions,
    sections={"RUBRIC": english_rubric, "ESSAY TO GRADE": A_level_essay},
    temperature=0.2,
)

to_markdown(A_level_essay_graded)

## Overall Grade
- **Level:** A
- **Score:** 95/100
- **Summary:** This essay presents a clear and insightful thesis regarding Iago's exploitation of Othello's insecurities, supported by relevant textual evidence and thorough analysis. The organization is strong, and the writing is polished, making it an exemplary response to the prompt.

## Category Breakdown
- **Thesis**
  - **Level:** A
  - **Score:** 20/20
  - **Justification:** The thesis is clear, insightful, and directly addresses how Iago exploits Othello's insecurities, setting a strong foundation for the essay.

- **Evidence**
  - **Level:** A
  - **Score:** 23/25
  - **Justification:** The essay provides multiple relevant textual examples that are well-integrated and support the thesis effectively, though a bit more depth in one or two examples could enhance the argument.

- **Analysis**
  - **Level:** A
  - **Score:** 23/25
  - **Justification:** The analysis is thorough and insightful, demonstrating a deep understanding of how Iago manipulates Othello's insecurities. It connects Iago's actions to Othello's vulnerabilities effectively.

- **Organization**
  - **Level:** A
  - **Score:** 15/15
  - **Justification:** The essay is well-organized with a clear structure, including a strong introduction, coherent body paragraphs, and a conclusive ending that ties back to the thesis.

- **Mechanics**
  - **Level:** A
  - **Score:** 14/15
  - **Justification:** The writing is mostly free of grammatical, spelling, and punctuation errors, demonstrating a high level of polish, with only minor issues that do not detract from clarity.

Overall, this essay effectively addresses the prompt and showcases a strong understanding of the text and its themes. Great job!

In [50]:
summarize_usage_cost(A_level_essay_graded_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 1795,
 'output_tokens': 403,
 'total_tokens': 2198,
 'cost_usd': 0.000511}

#### Analysis

The grading agent assigned this essay an **A-level score**, matching the level the prompt targeted. The feedback highlights the essay’s strengths: a clear and insightful thesis, well-chosen textual evidence, and deep analytical engagement with Iago’s manipulation of Othello’s insecurities. The small deductions in the **Evidence** (23/25), **Analysis** (23/25), and **Mechanics** (14/15) categories add nuance, recognizing that even excellent essays can improve slightly in depth and polish.

This evaluation suggests that the grading agent applied the rubric as intended, identifying both the essay’s advanced qualities and minor areas for refinement. The result is consistent with the **A-level prompt** producing writing at a top-tier standard, although a single essay is not enough to show that the rubric and grading instructions are well calibrated across performance levels.
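
The grader's arithmetic can also be checked programmatically. The sketch below assumes the output follows the `**Score:** earned/possible` pattern requested in `grading_instructions`, with the overall score appearing first; `check_grade_totals` is a hypothetical helper, and the sample is an abbreviated copy of the graded output above.

```python
import re


def check_grade_totals(graded_md: str) -> dict:
    """Compare the overall score with the sum of the category scores."""
    # Every '**Score:** earned/possible' pair, in order of appearance.
    pairs = [
        (int(earned), int(possible))
        for earned, possible in re.findall(r"\*\*Score:\*\*\s*(\d+)\s*/\s*(\d+)", graded_md)
    ]
    if not pairs:
        raise ValueError("No '**Score:**' lines found.")
    (overall, out_of), categories = pairs[0], pairs[1:]
    earned_sum = sum(e for e, _ in categories)
    possible_sum = sum(p for _, p in categories)
    return {
        "overall": overall,
        "category_sum": earned_sum,
        "possible_sum": possible_sum,
        "consistent": overall == earned_sum and out_of == possible_sum,
    }


sample_grade = """## Overall Grade
- **Level:** A
- **Score:** 95/100

## Category Breakdown
- **Thesis**
  - **Score:** 20/20
- **Evidence**
  - **Score:** 23/25
- **Analysis**
  - **Score:** 23/25
- **Organization**
  - **Score:** 15/15
- **Mechanics**
  - **Score:** 14/15
"""
print(check_grade_totals(sample_grade))
```

A `consistent` value of `False` would indicate that the model's overall score does not match its own category breakdown, which is worth catching before trusting the grade.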



### Grading C Level Essay

In [51]:
C_level_essay_graded, C_level_essay_graded_usage = text_generation(
    system_role=grading_agent,
    prompt=grading_instructions,
    sections={"RUBRIC": english_rubric, "ESSAY TO GRADE": C_level_essay},
    temperature=0.2,
)

to_markdown(C_level_essay_graded)

## Overall Grade
- **Level:** B
- **Score:** 80/100
- **Summary:** The essay presents a clear thesis regarding Iago's exploitation of Othello's insecurities, supported by relevant examples. However, the analysis lacks depth and insight, and the organization could be improved for better clarity.

## Category Breakdown
- **Thesis**
  - **Level:** B
  - **Score:** 16/20
  - **Justification:** The thesis is clear and addresses the prompt, but it lacks depth and originality in its expression.

- **Evidence**
  - **Level:** B
  - **Score:** 20/25
  - **Justification:** The essay provides relevant textual examples that support the thesis, though some examples could be better integrated into the argument.

- **Analysis**
  - **Level:** C
  - **Score:** 18/25
  - **Justification:** While the analysis identifies how Iago exploits Othello's insecurities, it is somewhat superficial and lacks deeper insight into the implications of these actions.

- **Organization**
  - **Level:** B
  - **Score:** 12/15
  - **Justification:** The essay is generally organized with a clear structure, but transitions between ideas could be smoother to enhance the flow of the argument.

- **Mechanics**
  - **Level:** B
  - **Score:** 14/15
  - **Justification:** The writing is mostly free of grammatical errors, with only minor issues that do not significantly detract from clarity.

Overall, the essay demonstrates a solid understanding of the topic but would benefit from deeper analysis and more polished organization.

In [52]:
summarize_usage_cost(C_level_essay_graded_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 1362,
 'output_tokens': 348,
 'total_tokens': 1710,
 'cost_usd': 0.000413}
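
With all five calls complete, the per-call summaries can be added up to see what the whole workflow cost. This is a small sketch using the token counts and costs printed in this notebook; the dictionary labels are chosen here for readability and are not produced by `summarize_usage_cost`.

```python
# Per-call summaries copied from the outputs printed in this notebook.
call_summaries = {
    "A-level essay": {"total_tokens": 1153, "cost_usd": 0.000571},
    "C-level essay": {"total_tokens": 752, "cost_usd": 0.000316},
    "Rubric": {"total_tokens": 954, "cost_usd": 0.000449},
    "A-level grading": {"total_tokens": 2198, "cost_usd": 0.000511},
    "C-level grading": {"total_tokens": 1710, "cost_usd": 0.000413},
}

total_tokens = sum(s["total_tokens"] for s in call_summaries.values())
total_cost = round(sum(s["cost_usd"] for s in call_summaries.values()), 6)
print(f"{total_tokens} tokens, ${total_cost:.6f}")  # prints 6767 tokens, $0.002260
```

The two grading calls use the most input tokens because each one carries both the rubric and an essay as context.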

#### Analysis

Even though we requested a C-level essay, the grading agent gave this essay a B-level score! This suggests that our **C-level prompt** may not have produced writing that was weak enough to clearly align with average or below-average performance. It’s also possible that the **rubric** or **grading instructions** were slightly too lenient.  

This highlights an important part of prompt design — even small wording changes can shift how the model interprets quality. To improve alignment, we could make the C-level prompt more restrictive (e.g., emphasize poor organization, missing evidence, and vague claims) or adjust the grading rubric to enforce stricter distinctions between B- and C-level criteria.  
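
When iterating on prompts, it helps to detect this kind of mismatch automatically rather than by reading each result. The sketch below assumes the grader's output begins with the `**Level:**` line requested in `grading_instructions`; `extract_overall_level` and `level_matches` are hypothetical helpers written for this example.

```python
import re
from typing import Optional


def extract_overall_level(graded_md: str) -> Optional[str]:
    """Return the first '**Level:** X' letter in the grader's output, if any."""
    match = re.search(r"\*\*Level:\*\*\s*([ABCDF])\b", graded_md)
    return match.group(1) if match else None


def level_matches(intended: str, graded_md: str) -> bool:
    """True when the grader's overall level equals the intended level."""
    return extract_overall_level(graded_md) == intended.upper()


sample_output = "## Overall Grade\n- **Level:** B\n- **Score:** 80/100\n"
print(level_matches("C", sample_output))  # prints False
```

A check like this could run after every prompt revision, so you can see at a glance whether a stricter C-level prompt or a tighter rubric closes the gap.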


## Moving Forward

You have now seen how LLMs can be used to create agents that **write, evaluate, and grade essays** through **role-based collaboration**. Each agent (Student, Rubric, and Grading) performs a distinct function, and together they form a connected workflow powered by **prompt chaining**. This approach allows the output of one agent to serve as input for another, mirroring how humans collaborate in structured academic tasks like writing and assessment.

Now, the rest of the notebook is up to you — it’s your turn to experiment! Try extending what you’ve learned in creative ways. For example, you can:

- **Fix mismatches** between generated essay quality and rubric grading  
- Generate **D- or F-level** essays to simulate weaker student writing  
- **Adjust existing prompts** to modify tone, structure, or grading criteria  
- **Create additional agents** such as a Peer Reviewer, Rubric Validator, or Format Checker  
- **Redesign the rubric** to focus on different skills or subject areas  

Exploring these variations will help you deepen your understanding of how **prompt design**, **role specialization**, and **multi-agent collaboration** influence the quality, reasoning, and consistency of LLM-generated outputs.
