<a href="https://colab.research.google.com/github/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%202%3A%20Using%20LLM%20Agents%20for%20Synthetic%20Data%20Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part 2: Using LLM Agents for Synthetic Data Generation

**Objective**  
Generate a synthetic dataset of student essays that vary by **quality**, **grade level**, and **subject area**.

**Overview**  
This notebook demonstrates a **simplified agentic workflow** for synthetic data generation.  
The agents and steps presented here illustrate one possible approach — adaptable to other contexts or goals.

---

You’ve completed [Part 1: Guide to OpenAI in Google Colab](https://github.com/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%201%3A%20Guide%20to%20OpenAI%20in%20Google%20Colab.ipynb). Now, let’s apply those skills to a practical, classroom-inspired use case.  

In this section, we’ll explore **[role-based collaboration](https://www.ibm.com/think/topics/multi-agent-collaboration)**: a framework in which multiple LLM agents work together by taking on distinct roles with defined responsibilities. Each agent performs a specific task, and its output can be fed into a later agent's prompt (for example, the Grading Agent receives both an essay and the rubric). This technique is called **prompt chaining**.  

We’ll use generative AI to:

- **Create** two 11th-grade English essays of differing quality (**Student Agent**)  
- **Design** a rubric to evaluate the essays (**Rubric Agent**)  
- **Grade** the essays using the rubric (**Grading Agent**)  
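
The chaining pattern can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation: `fake_llm` is a hypothetical offline stand-in for a real model call, and `run_chain` is an illustrative name. The point is the data flow, where the grader's prompt is built from the outputs of the earlier agents.

```python
from typing import Callable, Dict

# Hypothetical signature for a model call: (system_role, prompt) -> text.
LLM = Callable[[str, str], str]


def fake_llm(system_role: str, prompt: str) -> str:
    """Offline stub that reports which role answered; swap in a real API call."""
    return f"[{system_role}] response to: {prompt.splitlines()[0]}"


def run_chain(llm: LLM, topic: str) -> Dict[str, str]:
    """Student -> Rubric -> Grader, feeding earlier outputs into later prompts."""
    essay = llm("Student", f"Write an essay on: {topic}")
    rubric = llm("Teacher (rubric)", f"Create a rubric for an essay on: {topic}")
    # The grader sees both earlier outputs as labeled blocks in its prompt.
    grade = llm(
        "Teacher (grader)",
        f"Grade the essay.\n\nRUBRIC:\n{rubric}\n\nESSAY:\n{essay}",
    )
    return {"essay": essay, "rubric": rubric, "grade": grade}


results = run_chain(fake_llm, "Iago and Othello")
print(results["grade"])
```

In the notebook itself, the role of `fake_llm` is played by the `text_generation` helper defined in Section 2.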

By the end of this notebook, you’ll understand how to design and connect multiple role-based agents to simulate a realistic educational workflow — from essay writing to evaluation and grading.


## Table of Contents

1. **[Set Up](#set-up)**

2. **[Text Generation Function](#text-generation-function)**

3. **[Cost Summarization](#cost-summarization)**

4. **[Essay Generation](#essay-generation)**

    - Student Agent

    - 11th Grade English Student

    - A-Level Prompt

    - C-Level Prompt

5.  **[Rubric Generation](#rubric-generation)**

    - Rubric Agent Function

6.  **[Essay Grading](#essay-grading)**

    - Grading Agent

    - Grading A Level Essay

    - Grading C Level Essay

7. **[Putting It All Together](#putting-it-all-together)**

<a name="set-up"></a>

## 1. Set Up

First, let's repeat the setup from the previous notebook: import the necessary libraries and load your API key.

**Note**: Once you add a key in Colab Secrets, it is saved, so you can reuse it in other notebooks. The first time a notebook runs `userdata.get(...)` (cell 2 below), a pop-up will ask you to grant that notebook access.

In [1]:
import openai
from openai import OpenAI
from google.colab import userdata
import os
from typing import Tuple, Optional, Dict, Any

In [2]:
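# Pull your saved secret into an environment variable.
# userdata.get raises an error if the secret is missing or the notebook lacks access.
try:
    api_key = userdata.get("OPENAI_API_KEY")
except Exception as err:
    raise RuntimeError("OPENAI_API_KEY is not available. Add it via Colab Secrets (🔑), grant access, and try again.") from err

# Guard against an empty secret value (without printing the key)
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is empty. Update it in Colab Secrets (🔑) and try again.")

os.environ["OPENAI_API_KEY"] = api_key
print("Key loaded?", True)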

Key loaded? True


In [3]:
client = OpenAI()

<a name="text-generation-function"></a>


## 2. Text Generation Function

Let’s create **one** function for generating all outputs. It combines a **system role** (agent persona) with a **task prompt**, and can optionally append extra **sections** (e.g., a rubric and an essay) as clearly labeled blocks. This single helper can write essays, build rubrics, or grade work—just by changing the role, prompt, and sections. It also returns **usage** (token counts) so you can track how many tokens you’re using and estimate cost.

**What it does**
- Sends a single request to the model with:
  - **System role**: defines the voice/perspective (Student, Teacher, Grader, etc.).
  - **Prompt**: the task/instructions (e.g., “Evaluate the essay using the rubric…”).
  - **Sections (optional)**: a mapping of `Label → Content` appended below the prompt (e.g., `{"RUBRIC": ..., "ESSAY TO GRADE": ...}`).

**Returns**
- **Text**: the generated output as a string.
- **Usage**: a dict of token metadata from the API (input/output/total tokens), useful for logging and cost estimation.

**Why this design**
- **One function, many tasks**: swap in different roles/prompts without new code.
- **Structured context**: labeled sections keep long inputs organized.
- **Cost-aware**: the returned `usage` enables token tracking and pricing.
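
The message-assembly step described above can be isolated into a pure function so you can inspect a prompt before paying for a call. `build_user_message` is a hypothetical helper that mirrors the loop inside `text_generation`; it makes no API request.

```python
from typing import Optional


def build_user_message(prompt: str, sections: Optional[dict] = None) -> str:
    """Mirror of text_generation's message assembly (hypothetical, no API call)."""
    parts = [prompt]
    # Each section becomes a labeled block "LABEL:\n<content>", in dict insertion order
    for label, content in (sections or {}).items():
        parts.append(f"{label}:\n{content}")
    # Blocks are separated by a blank line
    return "\n\n".join(parts)


msg = build_user_message(
    "Evaluate the essay using the provided rubric.",
    {"RUBRIC": "- Thesis (20%)", "ESSAY TO GRADE": "In Othello, ..."},
)
print(msg)
```

Because Python dicts preserve insertion order, the labeled blocks appear in the order you pass them, which is why the grading calls list `RUBRIC` before `ESSAY TO GRADE`.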

In [4]:
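def text_generation(
    system_role: str,
    prompt: str,
    sections: dict | None = None,
    model: str = "gpt-4o-mini",
    temperature: float = 0.2,
):
    """Send one system + user request; return (text, usage_dict)."""
    # Build the user message: task prompt first, then each labeled section block
    user_parts = [prompt]
    for label, content in (sections or {}).items():
        user_parts.append(f"{label}:\n{content}")
    user_msg = "\n\n".join(user_parts)

    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": system_role},  # agent persona
            {"role": "user", "content": user_msg},        # task + labeled context
        ],
        temperature=temperature,
    )
    # output_text joins the response's text output into one string
    text = (getattr(resp, "output_text", "") or "").strip()

    # usage is a pydantic model in the SDK; convert it to a plain dict for logging/costing
    usage = getattr(resp, "usage", None)
    if usage is not None and not isinstance(usage, dict):
        usage = usage.model_dump() if hasattr(usage, "model_dump") else vars(usage)
    return text, usage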


In [5]:
from IPython.display import display, Markdown  # Tools for displaying formatted text in Jupyter Notebooks

def to_markdown(text):
    # Convert the provided text to Markdown format for better display in Jupyter Notebooks
    return Markdown(text)


<a name="cost-summarization"></a>

## 3. Cost Summarization


This helper summarizes token usage and estimates **total cost (USD)** for a model call.

**What it uses**
- The API’s `usage` payload to read **input_tokens** (prompt) and **output_tokens** (completion).
- A built-in **pricing table (USD per 1M tokens)** for `gpt-5`, `gpt-5-mini`, and `gpt-4o-mini`.  
  If the model isn’t in the table, it falls back to `gpt-4o-mini` pricing.

**Returns**
- `model`: the model name used for pricing lookup  
- `input_tokens`: tokens consumed by your messages  
- `output_tokens`: tokens generated by the model  
- `total_tokens`: input + output  
- `cost_usd`: estimated cost, rounded to 6 decimals

**Why it’s useful**
- Quick **cost visibility** per call
- Consistent **logging/analytics** across notebooks
- Works with different SDK field names (`input_tokens`/`prompt_tokens`, `output_tokens`/`completion_tokens`)

<br>

**Note**: For full prices, see the official [API Pricing](https://openai.com/api/pricing/).
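
As a worked example, here is the arithmetic for the A-level essay call reported later in this notebook (268 input tokens and 843 output tokens), using the `gpt-4o-mini` rates from the pricing table below (0.15 USD input and 0.60 USD output per 1M tokens):

$$
\text{cost} = \frac{268 \times 0.15 + 843 \times 0.60}{10^{6}} = \frac{40.2 + 505.8}{10^{6}} = 0.000546 \ \text{USD}
$$

Output tokens dominate here: each costs 4× as much as an input token, and this call produced roughly three times as many of them.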

In [6]:
def summarize_usage_cost(usage: dict | None, model: str = "gpt-4o-mini") -> dict:
    """
    Summarize input/output token usage and total cost (USD).

    Args:
        usage (dict | None): Usage info returned by the API (input_tokens/output_tokens,
            or prompt_tokens/completion_tokens from Chat Completions).
        model (str): Model name for pricing lookup.

    Returns:
        dict: Contains model, input_tokens, output_tokens, total_tokens, cost_usd.
    """
    # --- pricing table: list prices in USD per 1M tokens, stored as per-token rates ---
    pricing = {
        "gpt-5": {"input": 1.25 / 1_000_000, "output": 10.00 / 1_000_000},
        "gpt-5-mini": {"input": 0.25 / 1_000_000, "output": 2.00 / 1_000_000},
        "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
    }

    model_rate = pricing.get(model, pricing["gpt-4o-mini"])  # fall back to gpt-4o-mini rates if unknown

    # --- extract token counts safely (missing fields or usage=None count as 0) ---
    usage = usage or {}
    input_tokens = usage.get("input_tokens") or usage.get("prompt_tokens") or 0
    output_tokens = usage.get("output_tokens") or usage.get("completion_tokens") or 0
    total_tokens = input_tokens + output_tokens

    # --- compute cost ---
    cost_usd = input_tokens * model_rate["input"] + output_tokens * model_rate["output"]

    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
        "cost_usd": round(cost_usd, 6),
    }

<a name="essay-generation"></a>

## 4. Essay Generation

Let's start by generating two essays on the same topic — **Shakespeare’s *Othello***. We’ll create one **A-level** and one **C-level** essay (see this [article](https://usahello.org/education/children/grade-levels/) for an overview of the U.S. grading system) to compare how writing quality changes when the same Student Agent follows different performance prompts.


### Student Agent

We’ll begin by creating a **Student Agent**, which represents a student completing an academic assignment. This agent dynamically adapts to different **subjects**, **grade levels**, **assignment types**, and **topics** based on the parameters passed into the `build_student_agent` function.  

The Student Agent defines the writer’s persona — realistic, thoughtful, and age-appropriate — ensuring consistent tone and perspective across assignments of varying difficulty or subject matter. By adjusting parameters such as `grade_level`, `subject`, `assignment_type`, and `topic`, you can simulate diverse classroom contexts (e.g., a 9th-grade science lab report or a 12th-grade history essay).  

This modular setup makes the Student Agent both **flexible** and **reusable**, allowing you to maintain a natural, student-like writing style across multiple educational use cases.
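
A minimal sketch of driving the Student Agent from a parameter grid, assuming you want to vary grade level and subject area as in the Objective. The Biology topic and `"lab report"` assignment type are illustrative placeholders, not part of this notebook's workflow; each resulting dict could be passed as `build_student_agent(**params)` once that function is defined in the next cell.

```python
from itertools import product

# Hypothetical classroom contexts: subject -> (assignment_type, topic)
grade_levels = ["9th grade", "11th grade"]
subject_assignments = {
    "English": ("essay", "How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"),
    "Biology": ("lab report", "How does temperature affect enzyme activity?"),
}

# One parameter dict per (grade level, subject) combination
persona_params = [
    {"grade_level": grade, "subject": subject, "assignment_type": kind, "topic": topic}
    for grade, (subject, (kind, topic)) in product(grade_levels, subject_assignments.items())
]

for params in persona_params:
    print(params["grade_level"], params["subject"], params["assignment_type"])
    # In the notebook: persona = build_student_agent(**params)
```

Keeping the grid as plain dicts makes it easy to log exactly which context produced each synthetic essay.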





In [7]:
def build_student_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    """
    Build a student persona for role-based generation.
    Keeps voice/level in the system message and defers specifics to the prompt.
    """
    return (
        f"You are a {grade_level} student completing a {assignment_type} for your {subject} class "
        f"on the topic: {topic} "
        "Write in a realistic student voice—thoughtful but not overly advanced. "
        "Follow the assignment prompt’s instructions (length, structure, citation style). "
        "Use clear, organized paragraphs and avoid jargon unless requested."
    )

### 11th Grade English Student

In this example, we create a **Student Agent** representing an 11th-grade student writing an **English essay** on the topic: *“How does Iago exploit Othello’s insecurities to drive the tragedy in **Othello**?”* We’ll call this variable `english_student_11`.

This agent captures a realistic, age-appropriate writing voice — thoughtful but not overly advanced — suitable for a high school English student analyzing a complex literary text.


In [8]:
grade_level = "11th grade"
subject = "English"
assignment_type = "essay"
topic = "How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"

english_student_11 = build_student_agent(grade_level, subject, assignment_type, topic)

### A-Level Prompt

Now that we’ve defined the **Student Agent** for an 11th-grade English student writing about *Othello*, we can move on to creating the **A-Level Prompt**, which provides the model with detailed writing instructions. We’ll call this prompt `essay_A_level_instructions`.  

The A-Level Prompt directs the model to produce an outstanding literary analysis essay that demonstrates deep understanding, originality, and strong analytical reasoning. It requires a clear, arguable thesis placed at the end of the introduction and supported by concise quotations and insightful interpretation.  

Writing at this level should be polished and formal — avoiding casual tone, first person, and rhetorical questions. Each paragraph should begin with a strong topic sentence and connect logically to the thesis. The essay should be **450–600 words** and cite evidence in *(act.scene.line)* format (e.g., *3.3.167*), resulting in a cohesive, high-quality literary analysis suitable for an advanced high school student.
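
Because the model may not follow length and citation instructions exactly, a quick post-generation check can help when building a dataset. This is a hypothetical helper (`check_a_level_constraints` is not part of the notebook); it counts whitespace-separated words and extracts parenthetical act.scene.line citations, optionally with a line range.

```python
import re

# Parenthetical act.scene.line citations, e.g. (3.3.167) or (2.1.247-249)
CITATION_RE = re.compile(r"\(\d+\.\d+\.\d+(?:[-–]\d+)?\)")


def check_a_level_constraints(essay: str, min_words: int = 450, max_words: int = 600) -> dict:
    """Hypothetical check for the A-level prompt's length and citation requirements."""
    word_count = len(essay.split())
    citations = [c[1:-1] for c in CITATION_RE.findall(essay)]  # strip the parentheses
    return {
        "word_count": word_count,
        "length_ok": min_words <= word_count <= max_words,
        "citations": citations,
    }


sample = 'Iago calls jealousy "the green-eyed monster" (3.3.165-167) and mocks the match (2.1.247-249).'
print(check_a_level_constraints(sample, min_words=5, max_words=50))
```

After generation, `check_a_level_constraints(A_level_essay)` would report whether the essay landed in the 450–600 word range; note that it checks citation format only, not whether the quotation matches the cited line.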


In [9]:
essay_A_level_instructions = (
    "Write an A-level literary analysis essay that follows the detailed guidelines below.\n\n"
    "What 'A-level' means: excellent or outstanding work — demonstrates a deep and "
    "original understanding of the text, presents a clear and arguable thesis, and supports claims with precise "
    "evidence and insightful analysis. Writing should be organized, polished, and stylistically sophisticated.\n\n"
    "OUTPUT REQUIREMENTS:\n"
    "- Include a clear, arguable thesis at the END of the introduction.\n"
    "- Use concise, well-chosen quotations from the play and explain HOW they support your argument "
    "(analysis > summary).\n"
    "- Ensure each paragraph has a strong topic sentence and smooth transitions; every idea should advance the thesis.\n"
    "- Maintain a precise, academic tone throughout; avoid first person, rhetorical questions, or casual phrasing.\n"
    "- Aim for ~450–600 words.\n"
    "- Cite quotations using act.scene.line format (e.g., 3.3.167).\n"
)


In [10]:
A_level_essay, A_level_usage = text_generation(
    system_role=english_student_11,
    prompt=essay_A_level_instructions,
    model="gpt-4o-mini",
    temperature=0.3,
)

to_markdown(A_level_essay)

In Shakespeare's *Othello*, the tragic downfall of the titular character is intricately tied to his insecurities, which Iago exploits to orchestrate his manipulation. Othello, a Moor and a general in the Venetian army, is portrayed as a noble and capable leader; however, his status as an outsider in Venetian society and his deep-seated insecurities regarding his race and his marriage to Desdemona render him vulnerable to Iago's machinations. Iago's calculated exploitation of Othello's vulnerabilities not only drives the plot forward but also highlights the themes of jealousy, trust, and the destructive power of manipulation. Ultimately, Iago's manipulation reveals how deeply ingrained insecurities can lead to one's tragic demise, illustrating that the greatest threats often come from within.

Iago's manipulation begins with his understanding of Othello's insecurities regarding his racial identity. Othello is acutely aware that he is different from the predominantly white Venetian society, which leads him to doubt his worthiness and place within it. Iago capitalizes on this vulnerability by insinuating that Desdemona's love for Othello is unnatural and that she could easily be swayed by someone of her own race. He suggests, “Blessed fig's-end! The wine she drinks is made of grapes: if she had been blessed, she would never have loved the Moor” (2.1.247-249). This statement not only undermines Othello's confidence but also plants the seed of doubt regarding Desdemona's fidelity. By framing Othello's love as something that defies societal norms, Iago exacerbates Othello's insecurities, making him more susceptible to jealousy and suspicion.

Moreover, Iago exploits Othello's insecurities about his marriage. Othello's love for Desdemona is profound, yet his self-doubt leads him to question her loyalty. Iago's manipulation is evident when he uses the handkerchief, a token of love given to Desdemona by Othello, as a symbol of infidelity. When Iago tells Othello, “Trifles light as air are to the jealous confirmations strong as proofs of holy writ” (3.3.322-324), he emphasizes how jealousy can distort reality. This statement reveals that Iago understands how to twist Othello's perception of love and trust. The handkerchief becomes a pivotal piece of evidence that Iago uses to convince Othello of Desdemona's betrayal, further fueling Othello's insecurities and leading him down a path of tragic misunderstanding.

Additionally, Iago's manipulation is not limited to Othello's personal insecurities; it also extends to his professional identity. As a respected general, Othello is aware that any sign of weakness could jeopardize his position. Iago exploits this by suggesting that Othello's reputation is at stake due to Desdemona's alleged infidelity. When Iago states, “O, beware, my lord, of jealousy! It is the green-eyed monster which doth mock the meat it feeds on” (3.3.165-167), he warns Othello about the dangers of jealousy while simultaneously inciting it within him. This paradoxical advice reveals Iago's cunning nature and his ability to manipulate Othello’s insecurities regarding his authority and reputation. As Othello becomes increasingly consumed by jealousy, he loses sight of his rationality and judgment, leading to his tragic downfall.

In conclusion, Iago's exploitation of Othello's insecurities regarding his race, marriage, and professional identity serves as the catalyst for the tragedy that unfolds in Shakespeare's *Othello*. Through calculated manipulation, Iago transforms Othello's noble qualities into tragic flaws, ultimately leading to a catastrophic end. The play serves as a poignant reminder of how deeply personal insecurities can be weaponized, resulting in devastating consequences. Othello's tragedy is not merely a story of jealousy and betrayal; it is a profound exploration of the vulnerabilities that lie within us all and the catastrophic potential they hold when exploited by those we trust.

#### Cost Summarization

In [11]:
summarize_usage_cost(A_level_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 268,
 'output_tokens': 843,
 'total_tokens': 1111,
 'cost_usd': 0.000546}

-----

### C-Level Prompt

The **C-Level Prompt** instructs the model to produce an average or barely passing essay. A *C-level* essay demonstrates basic comprehension of the topic but lacks depth, organization, and stylistic control. It may have a weak or missing thesis, minimal or poorly integrated evidence, repetitive ideas, and noticeable grammar or usage mistakes. We will call this `essay_C_level_instructions`.

Writing at this level should sound more casual and unpolished — including filler phrases such as “I think” or “in my opinion” and avoiding strong transitions like “however” or “therefore.” The essay should be short, approximately four paragraphs (200–300 words), and contain a few minor grammatical errors that sound natural for a struggling student writer.  

Like before, we provide **explicit instructions** in the prompt to guide the model’s behavior. The detailed output requirements ensure the essay intentionally reflects weaker academic performance while still maintaining a realistic student voice.
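
The C-level requirements are also easy to spot-check. The helper below is a hypothetical sketch, not part of the notebook: it counts paragraphs (blocks separated by blank lines), words, and filler phrases, and flags any of the strong transitions the prompt asks the model to avoid.

```python
import re

FILLERS = ["i think", "in my opinion"]
STRONG_TRANSITIONS = ["however", "therefore"]


def check_c_level_style(essay: str) -> dict:
    """Hypothetical check for the C-level prompt's style requirements."""
    lower = essay.lower()
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    return {
        "paragraphs": len(paragraphs),
        "word_count": len(essay.split()),
        "filler_count": sum(lower.count(f) for f in FILLERS),
        # Whole-word matches only, so "whatever" would not count as "however"
        "strong_transitions": [t for t in STRONG_TRANSITIONS if re.search(rf"\b{t}\b", lower)],
        "has_quotation": '"' in essay or "“" in essay,
    }


print(check_c_level_style("I think Iago is bad.\n\nIn my opinion he lies. I think so."))
```

Running `check_c_level_style(C_level_essay)` on the generated essay would give a rough signal of whether the model followed the "deliberately weaker" instructions.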

In [12]:
essay_C_level_instructions = (
    "Write an essay responding to the assignment topic.\n\n"
    "IMPORTANT: Produce a deliberately poor-quality **C-level** essay for teaching purposes.\n"
    "What 'C-level' means: average or barely passing work — shows some understanding "
    "but lacks depth or polish; weak thesis or none, little/no evidence, poor organization, repetition/vagueness, "
    "and noticeable grammar/style mistakes.\n\n"
    "OUTPUT REQUIREMENTS:\n"
    "- Return ONLY the essay body (no headings or bullets).\n"
    "- Do NOT provide a clear thesis; keep the main claim vague.\n"
    "- Provide minimal or poorly integrated evidence (generic statements are fine; avoid direct quotations).\n"
    "- Organization should be weak (some repetition or loosely connected ideas is acceptable).\n"
    "- Keep it short: 4 paragraphs (~200–300 words).\n"
    "- Include a few minor grammar/usage mistakes naturally (e.g., comma splices, agreement issues).\n"
    "- Casual, somewhat imprecise tone is acceptable.\n"
    "- Use a few filler phrases like 'I think' or 'in my opinion,' and avoid strong transitions like 'however' or 'therefore.'"
)


C_level_essay, C_level_usage = text_generation(
    system_role=english_student_11,
    prompt=essay_C_level_instructions,
    model="gpt-4o-mini",
    temperature=0.3,
)


In [13]:
to_markdown(C_level_essay)

In *Othello*, Iago is a character who really takes advantage of Othello’s insecurities. Othello is a Moor and he feels like he doesn’t fit in with the other Venetians. I think this makes him more vulnerable to Iago’s manipulation. Iago knows that Othello is insecure about his race and status, and he uses this to his advantage. For example, Iago keeps planting doubts in Othello’s mind about Desdemona’s faithfulness. This is important because it shows how Iago can twist Othello’s feelings and make him question everything.

Another way Iago exploits Othello’s insecurities is by pretending to be his friend. He acts like he is looking out for Othello, but really he is just trying to ruin him. Othello trusts Iago, which is a big mistake. Iago’s manipulation makes Othello feel even more insecure about Desdemona. This is a big deal because it leads Othello to make bad decisions. Iago keeps pushing Othello to think that Desdemona is cheating, which drives Othello to jealousy and rage.

I think it’s also worth mentioning that Iago plays on Othello’s feelings of inadequacy. Othello is a great general, but he still doubts himself. Iago uses this self-doubt to make Othello feel like he is not good enough for Desdemona. This really messes with Othello’s mind and makes him act irrationally. It’s like Iago knows exactly how to get under Othello’s skin, which is kind of scary.

In conclusion, Iago’s exploitation of Othello’s insecurities is a major part of the tragedy in the play. Othello’s trust in Iago and his own self-doubt lead him to make choices that ultimately destroy him. I think this shows how dangerous it can be when someone takes advantage of another person’s weaknesses. Overall, Iago’s manipulation is what drives the tragic events in *Othello*.

#### Cost Summarization

In [14]:
summarize_usage_cost(C_level_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 300,
 'output_tokens': 423,
 'total_tokens': 723,
 'cost_usd': 0.000299}

<a name="rubric-generation"></a>

## 5. Rubric Generation


Great! We’ve now generated two essays on the same topic but with different performance levels. The next step is to create a **rubric** that can be used to evaluate them.


### Rubric Agent Function

The `build_rubric_agent` function creates a **teacher persona** for building grading rubrics. It takes the **grade level**, **subject**, **assignment type**, and **topic** and returns a short system message describing the teacher’s role.  

The detailed requirements come from the separate `rubric_instructions` prompt: a **student-facing rubric** with 4–6 weighted criteria and A/B/C/D/F performance levels, written in concise, observable, and consistent language. The rubric only defines expectations; it does not grade any specific work.


In [15]:
def build_rubric_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    """
    Build a teacher persona for rubric generation.
    Keeps guidance in the system message; the task/prompt comes separately.
    """
    return (
        f"You are a {subject} teacher creating a grading rubric for a {assignment_type} "
        f"on the topic: {topic}, for {grade_level} students. "
        "Be clear and student-facing."
    )

In [16]:
rubric_agent = build_rubric_agent(
    grade_level="11th grade",
    subject="English",
    assignment_type="essay",
    topic="How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"
)

In [17]:
rubric_instructions = (
    "Create a detailed, student-facing grading rubric tailored to the given topic and grade level. "
    "The rubric should include **4–6 criteria** that capture key dimensions of performance "
    "(e.g., Thesis, Evidence, Analysis, Organization, Mechanics).\n\n"
    "For each criterion, provide concise and specific descriptors for **A**, **B**, **C**, **D**, and **F** levels, "
    "focusing on observable behaviors and measurable writing qualities rather than vague judgments.\n\n"
    "Return the rubric in clean Markdown format using this structure:\n\n"
    "## Rubric\n"
    "### Criterion Name\n"
    "- **Weight (%):** [assign a reasonable percentage]\n"
    "- **A:** [clear, exceptional performance descriptor]\n"
    "- **B:** [strong, competent performance descriptor]\n"
    "- **C:** [basic or developing performance descriptor]\n"
    "- **D:** [limited or below-standard performance descriptor]\n"
    "- **F:** [incomplete, missing, or unsatisfactory work]\n\n"
    "After listing all criteria, include a **Notes** section summarizing key expectations and overall grading guidance. "
    "Ensure the tone remains student-friendly, consistent, and easy to understand."
)


In [18]:
english_rubric, rubric_usage = text_generation(
    system_role=rubric_agent,
    prompt=rubric_instructions,
    model="gpt-4o-mini",
    temperature=0.3,
)

to_markdown(english_rubric)

## Rubric

### Thesis
- **Weight (%):** 20%
- **A:** Presents a clear, insightful, and original thesis that directly addresses how Iago exploits Othello’s insecurities.
- **B:** States a clear thesis that addresses the prompt, with some insight into Iago's manipulation of Othello.
- **C:** Presents a basic thesis that addresses the topic but lacks depth or clarity regarding Iago’s exploitation of Othello.
- **D:** States a vague or unclear thesis that does not adequately address the prompt or is off-topic.
- **F:** No thesis statement or a thesis that is completely irrelevant to the topic.

### Evidence
- **Weight (%):** 25%
- **A:** Integrates multiple, relevant textual examples that effectively support the thesis and illustrate Iago’s manipulation.
- **B:** Uses relevant textual examples that support the thesis, though may lack depth or variety.
- **C:** Provides some textual evidence, but it may be limited, not always relevant, or insufficiently connected to the thesis.
- **D:** Includes minimal or irrelevant evidence that does not support the thesis or is poorly integrated.
- **F:** No evidence provided or evidence that is entirely unrelated to the topic.

### Analysis
- **Weight (%):** 25%
- **A:** Offers insightful and thorough analysis of how Iago exploits Othello’s insecurities, demonstrating a deep understanding of the text.
- **B:** Provides competent analysis that explains how Iago manipulates Othello, but may lack depth or thoroughness.
- **C:** Presents basic analysis that touches on Iago’s exploitation of Othello but lacks clarity or depth.
- **D:** Offers limited analysis that does not clearly connect evidence to the thesis or lacks understanding of the text.
- **F:** No analysis provided or analysis that is completely off-topic or incorrect.

### Organization
- **Weight (%):** 15%
- **A:** Essay is well-organized with a clear structure, including a strong introduction, logical transitions, and a cohesive conclusion.
- **B:** Organization is clear, with a logical structure and transitions, though some areas may be slightly unclear.
- **C:** Basic organization is present, but the essay may lack clear transitions or a cohesive flow.
- **D:** Poorly organized, with unclear structure and weak transitions that hinder understanding.
- **F:** No clear organization; the essay is difficult to follow or lacks a coherent structure.

### Mechanics
- **Weight (%):** 15%
- **A:** Contains few or no grammatical, spelling, or punctuation errors; writing is polished and professional.
- **B:** Contains some minor errors, but they do not interfere with understanding; writing is generally clear.
- **C:** Contains several errors that may distract the reader but do not significantly impede understanding.
- **D:** Frequent errors in grammar, spelling, or punctuation that make the writing difficult to read.
- **F:** Numerous errors that severely hinder understanding; writing is unprofessional and unclear.

## Notes
- **Key Expectations:** Your essay should clearly address how Iago exploits Othello’s insecurities, supported by relevant textual evidence and insightful analysis. Ensure your thesis is strong and your essay is well-organized. Pay attention to mechanics to present your ideas clearly.
- **Overall Grading Guidance:** Aim for clarity, depth, and coherence in your writing. Use specific examples from the text to support your points, and make sure to analyze those examples thoroughly. Good luck!

In [19]:
summarize_usage_cost(rubric_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 275,
 'output_tokens': 720,
 'total_tokens': 995,
 'cost_usd': 0.000473}
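
Since the grading step asks for category points that add up to 100, it is worth confirming that the rubric's weights do. A minimal sketch, assuming the rubric follows the Markdown structure requested in `rubric_instructions` (`### Criterion` headings followed by a `**Weight (%):**` line); `rubric_weights` is a hypothetical helper.

```python
import re


def rubric_weights(rubric_md: str) -> dict:
    """Map each '### Criterion' heading to its '**Weight (%):** N%' value."""
    weights, current = {}, None
    for raw in rubric_md.splitlines():
        line = raw.strip()
        if line.startswith("### "):
            current = line[4:].strip()
        elif line.startswith("## "):
            current = None  # e.g. "## Notes" ends the criteria
        else:
            m = re.search(r"\*\*Weight \(%\):\*\*\s*(\d+(?:\.\d+)?)", line)
            if m and current:
                weights[current] = float(m.group(1))
    return weights


sample_rubric = "\n".join([
    "## Rubric",
    "### Thesis", "- **Weight (%):** 20%",
    "### Evidence", "- **Weight (%):** 25%",
    "### Analysis", "- **Weight (%):** 25%",
    "### Organization", "- **Weight (%):** 15%",
    "### Mechanics", "- **Weight (%):** 15%",
    "## Notes", "- Good luck!",
])
weights = rubric_weights(sample_rubric)
print(weights, sum(weights.values()))
```

In the notebook, `rubric_weights(english_rubric)` applies the same check to the generated rubric; if the model drifts from the requested format, the returned dict will be incomplete rather than wrong, which is easy to detect.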

----

<a name="essay-grading"></a>


## 6. Essay Grading

Now we can use the LLM to **grade the essays**. This step lets us check whether each generated essay matches its intended performance level (A or C) and produces structured, rubric-based feedback.

### Grading Agent

Next, we’ll create a **Grading Agent**, which also takes on the teacher persona but focuses specifically on evaluating student work. This agent uses the rubric and grading instructions to assess the essay’s strengths and weaknesses across multiple categories, assigning both a letter grade and a numerical score.

In [20]:
def build_grading_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    """
    Build a teacher persona for grading essays.
    Keeps guidance in the system message; the task/prompt comes separately.
    """
    return (
        f"You are a {subject} teacher carefully grading a {assignment_type} "
        f"on the topic: {topic}, for {grade_level} students. "
        "Be fair, consistent, and concise."
    )

In [21]:
grading_agent = build_grading_agent(
    grade_level="11th grade",
    subject="English",
    assignment_type="essay",
    topic="How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"
)

In [22]:
grading_instructions = (
    "Evaluate the essay using the provided rubric.\n\n"
    "OUTPUT FORMAT (Markdown):\n"
    "## Overall Grade\n"
    "- **Level:** A|B|C|D|F\n"
    "- **Score:** [numeric total]/100\n"
    "- **Summary (2–3 sentences):** …\n\n"
    "## Category Breakdown\n"
    "For each category from the rubric (in the same order), include:\n"
    "- **Level:** A|B|C|D|F\n"
    "- **Score:** [earned points]/[category total]\n"
    "- **Justification (1–2 sentences):** specific and tied directly to the rubric\n\n"
    "Rules:\n"
    "- Mirror the rubric categories exactly and in order.\n"
    "- Be concise, evidence-based, and professional.\n"
    "- Do not invent categories that are not in the rubric.\n"
    "- Ensure total points across all categories add up to 100."
)


### Grading A Level Essay

In [23]:
A_level_essay_graded, A_level_essay_graded_usage = text_generation(
    system_role=grading_agent,
    prompt=grading_instructions,
    sections={"RUBRIC": english_rubric, "ESSAY TO GRADE": A_level_essay},
    temperature=0.2,
)

to_markdown(A_level_essay_graded)

## Overall Grade
- **Level:** A
- **Score:** 95/100
- **Summary:** This essay presents a clear and insightful thesis regarding how Iago exploits Othello's insecurities, supported by relevant textual evidence and thorough analysis. The organization and mechanics are strong, making for a compelling and coherent argument.

## Category Breakdown
- **Thesis**
  - **Level:** A
  - **Score:** 20/20
  - **Justification:** The thesis is clear, insightful, and directly addresses how Iago exploits Othello's insecurities, setting a strong foundation for the essay.

- **Evidence**
  - **Level:** A
  - **Score:** 23/25
  - **Justification:** The essay integrates multiple relevant textual examples that effectively support the thesis, though a bit more variety in evidence could enhance depth.

- **Analysis**
  - **Level:** A
  - **Score:** 24/25
  - **Justification:** The analysis is insightful and demonstrates a deep understanding of how Iago manipulates Othello's insecurities, connecting evidence to the thesis effectively.

- **Organization**
  - **Level:** A
  - **Score:** 15/15
  - **Justification:** The essay is well-organized with a clear structure, logical transitions, and a cohesive conclusion that reinforces the main argument.

- **Mechanics**
  - **Level:** A
  - **Score:** 13/15
  - **Justification:** The writing is polished and professional, with only minor errors that do not interfere with understanding.

Overall, this essay effectively addresses the prompt with clarity and depth, showcasing a strong grasp of the text and its themes.

In [24]:
summarize_usage_cost(A_level_essay_graded_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 1794,
 'output_tokens': 357,
 'total_tokens': 2151,
 'cost_usd': 0.000483}
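You can reproduce the `cost_usd` value by hand. The sketch below assumes that `summarize_usage_cost` uses OpenAI's published gpt-4o-mini rates of $0.15 per million input tokens and $0.60 per million output tokens. These rates reproduce the figures shown here, but check the function's own constants and current pricing before relying on them. `estimate_cost_usd` is a hypothetical helper and is not part of the notebook.

```python
# Assumed gpt-4o-mini prices in USD per token ($0.15 and $0.60 per 1M tokens).
# These are assumptions; check them against summarize_usage_cost and current pricing.
INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 0.60 / 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one API call, rounded to 6 decimals like the summary above."""
    cost = input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN
    return round(cost, 6)


# Token counts taken from the A-level grading call above.
print(estimate_cost_usd(1794, 357))  # 0.000483
```

Output tokens cost four times as much as input tokens at these rates. That is why the 357-token feedback accounts for nearly half the cost of this call, even though it is much shorter than the rubric-plus-essay input.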

#### Analysis

The grading agent assigned this essay an **A-level score**, matching the level requested for a high-performing literary analysis. The feedback highlights the essay’s strengths: a clear and insightful thesis, well-chosen textual evidence, and deep analytical engagement with Iago’s manipulation of Othello’s insecurities. The small deductions in the **Evidence**, **Analysis**, and **Mechanics** categories show some nuance, recognizing that even excellent essays can gain a little depth and polish.

This evaluation suggests that the grading agent applied the rubric sensibly, identifying both the essay’s advanced qualities and its minor areas for refinement. In this run, the **A-level prompt** produced writing consistent with top-tier academic standards, and the rubric and grading instructions recognized it as such. Whether they also separate the lower levels reliably is a separate question, which the C-level essay tests.
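The grading instructions ask that category points add up to 100, so it is worth checking that the overall score actually equals the sum of the breakdown. The sketch below assumes the raw grading text keeps the `**Score:** earned/possible` format shown above, with the overall score listed first. `check_score_consistency` is a hypothetical helper, and the model is not guaranteed to follow this format on every run.

```python
import re

# Matches lines such as "- **Score:** 23/25" in the grading agent's Markdown output.
SCORE_PATTERN = re.compile(r"\*\*Score:\*\*\s*(\d+)\s*/\s*(\d+)")


def check_score_consistency(graded_markdown: str) -> dict:
    """Compare the overall score with the sum of the category scores.

    Assumes the first Score line is the overall grade and the rest are categories.
    """
    matches = [(int(e), int(p)) for e, p in SCORE_PATTERN.findall(graded_markdown)]
    if len(matches) < 2:
        raise ValueError("Expected an overall score followed by category scores.")
    (overall_earned, overall_possible), categories = matches[0], matches[1:]
    category_earned = sum(earned for earned, _ in categories)
    category_possible = sum(possible for _, possible in categories)
    return {
        "overall": overall_earned,
        "category_sum": category_earned,
        "points_available": category_possible,
        # Consistent only if the totals agree and the rubric sums to 100 points.
        "consistent": overall_earned == category_earned
        and overall_possible == category_possible == 100,
    }


# Trimmed copy of the A-level grading output above.
sample_grading = """## Overall Grade
- **Level:** A
- **Score:** 95/100

## Category Breakdown
- **Thesis**
  - **Score:** 20/20
- **Evidence**
  - **Score:** 23/25
- **Analysis**
  - **Score:** 24/25
- **Organization**
  - **Score:** 15/15
- **Mechanics**
  - **Score:** 13/15
"""

print(check_score_consistency(sample_grading))
# {'overall': 95, 'category_sum': 95, 'points_available': 100, 'consistent': True}
```

In the notebook, you would pass `A_level_essay_graded` instead of the trimmed sample. A `False` result is a cheap signal that a grading call should be rerun or inspected.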



### Grading C Level Essay

In [25]:
C_level_essay_graded, C_level_essay_graded_usage = text_generation(
    system_role=grading_agent,
    prompt=grading_instructions,
    sections={"RUBRIC": english_rubric, "ESSAY TO GRADE": C_level_essay},
    temperature=0.2,
)

to_markdown(C_level_essay_graded)

## Overall Grade
- **Level:** B
- **Score:** 80/100
- **Summary:** The essay presents a clear thesis regarding Iago's exploitation of Othello's insecurities, supported by relevant examples. However, the analysis could be more in-depth, and the organization could be improved for better clarity.

## Category Breakdown
- **Thesis**
  - **Level:** B
  - **Score:** 16/20
  - **Justification:** The thesis is clear and addresses the prompt, indicating Iago's manipulation of Othello's insecurities, but lacks a more nuanced insight into the implications of this exploitation.

- **Evidence**
  - **Level:** B
  - **Score:** 20/25
  - **Justification:** The essay uses relevant textual examples to support the thesis, but the variety and depth of examples could be enhanced to strengthen the argument.

- **Analysis**
  - **Level:** B
  - **Score:** 20/25
  - **Justification:** The analysis explains how Iago manipulates Othello, but it could benefit from deeper exploration of the consequences of Iago's actions and their impact on Othello's character development.

- **Organization**
  - **Level:** C
  - **Score:** 10/15
  - **Justification:** The essay has a basic structure, but transitions between ideas are somewhat abrupt, which affects the overall flow and coherence of the argument.

- **Mechanics**
  - **Level:** B
  - **Score:** 14/15
  - **Justification:** The writing is generally clear with only minor grammatical errors that do not significantly impede understanding, demonstrating a polished style.

In [26]:
summarize_usage_cost(C_level_essay_graded_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 1374,
 'output_tokens': 356,
 'total_tokens': 1730,
 'cost_usd': 0.00042}
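Each essay in a larger dataset needs at least one grading call. The two usage summaries above therefore give a rough sense of what grading a 100-essay dataset might cost. The back-of-the-envelope sketch below uses a hypothetical helper, `project_cost`. It covers grading only, not essay generation or rubric creation, and the real figure will vary with essay length because the essay is part of the input.

```python
# Usage summaries copied from the two grading calls above
# (same dictionary shape that summarize_usage_cost returns).
grading_usages = [
    {"model": "gpt-4o-mini", "input_tokens": 1794, "output_tokens": 357,
     "total_tokens": 2151, "cost_usd": 0.000483},
    {"model": "gpt-4o-mini", "input_tokens": 1374, "output_tokens": 356,
     "total_tokens": 1730, "cost_usd": 0.00042},
]


def project_cost(usages: list[dict], n_calls: int) -> float:
    """Scale the average observed cost per call to n_calls (a rough estimate)."""
    average = sum(u["cost_usd"] for u in usages) / len(usages)
    return average * n_calls


print(f"${project_cost(grading_usages, 100):.3f}")  # $0.045
```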

#### Analysis

Even though we requested a C-level essay, the grading agent gave it a B-level score (80/100)! This suggests that our **C-level prompt** may not have produced writing weak enough to align clearly with average or below-average performance. It’s also possible that the **rubric** or **grading instructions** were slightly too lenient. For example, this essay’s **Mechanics** score (14/15, labeled B) is higher than the A-level essay’s Mechanics score (13/15).  
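With a larger dataset, it is easier to catch mismatches like this one automatically than to read every grade. The sketch below assumes that the raw grading text keeps the `**Level:** X` format shown above, with the overall level listed first. `overall_level` and `level_matches` are hypothetical helpers and are not part of the notebook.

```python
import re

# Matches "**Level:** B"; the first match is taken as the overall grade,
# following the output format shown above.
LEVEL_PATTERN = re.compile(r"\*\*Level:\*\*\s*([A-F])\b")


def overall_level(graded_markdown: str) -> str:
    """Return the first letter level found in the grading agent's output."""
    match = LEVEL_PATTERN.search(graded_markdown)
    if match is None:
        raise ValueError("No '**Level:**' line found in the grading output.")
    return match.group(1)


def level_matches(intended: str, graded_markdown: str) -> bool:
    """True when the assigned overall level equals the level the Student Agent was asked for."""
    return overall_level(graded_markdown) == intended.strip().upper()


# Trimmed copy of the C-level grading output above.
sample_output = "## Overall Grade\n- **Level:** B\n- **Score:** 80/100\n"
print(overall_level(sample_output))       # B
print(level_matches("C", sample_output))  # False
```

In the notebook, `level_matches("C", C_level_essay_graded)` should likewise return `False` for the run above, as long as the raw text uses the same format.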

This highlights an important part of prompt design: a letter-grade label alone may not be enough to steer the model’s sense of quality, and small wording changes can shift how the model interprets it. To improve alignment, we could make the C-level prompt more explicit (e.g., require poor organization, thin evidence, and vague claims) or adjust the grading rubric to enforce stricter distinctions between B- and C-level criteria.  
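One way to make the C-level prompt more explicit is to spell out the concrete traits of each quality level instead of naming only a letter grade. In the sketch below, the descriptor wording, `QUALITY_DESCRIPTORS`, and `build_student_prompt` are all hypothetical. They are not the notebook’s actual Student Agent prompt.

```python
# Hypothetical descriptors: each level lists concrete, visible traits so the
# Student Agent has less room to drift toward polished, B-level prose.
QUALITY_DESCRIPTORS = {
    "A": "a clear, insightful thesis; varied, well-integrated evidence; deep analysis; "
         "smooth transitions; polished mechanics",
    "B": "a clear but conventional thesis; relevant evidence; analysis that sometimes "
         "slips into summary; mostly logical organization; minor errors",
    "C": "a vague or overly broad thesis; only one or two loosely connected quotations; "
         "claims that summarize the plot rather than analyze it; abrupt or missing "
         "transitions; noticeable grammar and spelling errors",
    "D": "no identifiable thesis; little or no textual evidence; mostly plot summary or "
         "opinion; disorganized paragraphs; frequent errors that obscure meaning",
}


def build_student_prompt(level: str, grade: int, subject: str, topic: str) -> str:
    """Assemble a Student Agent prompt that names the target level's traits explicitly."""
    traits = QUALITY_DESCRIPTORS[level]
    return (
        f"Write an essay for a grade {grade} {subject} class on the following topic: {topic}\n"
        f"The essay should be {level}-level work with {traits}.\n"
        "Make each of these traits visible in the essay, and do not write above or below this level."
    )


print(build_student_prompt("C", 11, "English", "How does Iago exploit Othello's insecurities?"))
```

You could pass the returned string as the `prompt` argument of `text_generation`, along with the Student Agent’s `system_role`. Listing weaknesses rather than just a letter grade also gives the Grading Agent concrete features to look for.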


<a name="putting-it-all-together"></a>


## 7. Putting It All Together

You have now seen how LLMs can be used to create agents that **write essays, design rubrics, and grade essays** through **role-based collaboration**. The next step is to **scale up essay generation** to match your desired dataset size, for example by generating **100 essays** across different topics or performance levels.

1. **Select 10 use cases** — combinations of subject areas and grade levels — and generate **10 essays per use case** with varying quality levels (e.g., 2 A-level, 4 B-level, 2 C-level, 2 D-level).  
2. **Create a pandas DataFrame** where each row represents a unique essay. Include columns for all relevant inputs (e.g., grade level, subject, prompt, intended essay quality) plus a column for the essay text.  
3. **Run the Grading Agent** on each essay to check whether the grades it assigns align with the intended essay-quality labels.  
4. **Analyze discrepancies:**  
   - Do the LLM-assigned grades differ from the expected quality levels?  
   - What might explain these differences — workflow design issues, LLM limitations, or data variation?  
   - How could you refine your approach to reduce these discrepancies?  
5. **Iterate and refine** your workflow based on these insights.
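Steps 1 through 4 can be sketched as a generation plan followed by an agreement check. The sketch below is a minimal version that uses only the standard library. The subjects, grade levels, and helper names (`build_generation_plan`, `agreement_report`) are hypothetical, and only the 2/4/2/2 quality mix comes from step 1.

```python
from collections import Counter
from itertools import product

# Hypothetical use cases: 5 subjects x 2 grade levels = 10 combinations.
SUBJECTS = ["English", "History", "Biology", "Economics", "Civics"]
GRADE_LEVELS = [8, 11]
# Quality mix per use case from step 1: 2 A, 4 B, 2 C, 2 D (10 essays).
QUALITY_MIX = {"A": 2, "B": 4, "C": 2, "D": 2}


def build_generation_plan() -> list[dict]:
    """Return one row per essay to generate; text and grades are filled in later."""
    rows = []
    for subject, grade in product(SUBJECTS, GRADE_LEVELS):
        for level, count in QUALITY_MIX.items():
            for _ in range(count):
                rows.append({
                    "essay_id": len(rows),
                    "subject": subject,
                    "grade_level": grade,
                    "intended_quality": level,
                    "prompt": None,            # Student Agent prompt for this row
                    "essay_text": None,        # filled by the Student Agent
                    "assigned_quality": None,  # filled by the Grading Agent
                })
    return rows


def agreement_report(rows: list[dict]) -> dict:
    """Share of graded essays whose assigned level matches the intended one, plus mismatch counts."""
    graded = [r for r in rows if r["assigned_quality"] is not None]
    mismatches = Counter(
        (r["intended_quality"], r["assigned_quality"])
        for r in graded
        if r["intended_quality"] != r["assigned_quality"]
    )
    agreement = 1 - sum(mismatches.values()) / len(graded) if graded else None
    return {"graded": len(graded), "agreement": agreement, "mismatches": dict(mismatches)}


plan = build_generation_plan()
print(len(plan))  # 100
```

After grading, set each row’s `assigned_quality` from the Grading Agent’s output and call `agreement_report(plan)`. Mismatch keys such as `("C", "B")` show which intended levels drift, which is the kind of discrepancy step 4 asks about. `pandas.DataFrame(plan)` turns the rows into the DataFrame described in step 2.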

---

Now, **experiment with your own workflow**: explore different agent structures, prompts, or model settings (such as `temperature`) to see how they affect dataset quality and consistency.
