<a href="https://colab.research.google.com/github/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%202%20Using%20LLM%20Agents%20for%20Essay%20Generation%20and%20Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part 2: Using LLM Agents for Synthetic Data Generation

You have completed [Part 1: Guide to OpenAI in Google Colab](https://github.com/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%201%3A%20Guide%20to%20OpenAI%20in%20Google%20Colab.ipynb). Now, let’s apply those skills to a real-world use case.  

In this section, you’ll learn about **[role-based collaboration](https://www.ibm.com/think/topics/multi-agent-collaboration)**, a framework where multiple LLMs work together by taking on distinct roles with specific purposes. Each agent performs its own task and passes the result to the next, creating a seamless workflow powered by **prompt chaining**.  

We will use generative AI to:

- **Create** an 11th-grade English essay (Student Agent)  
- **Design** a rubric to evaluate the essay (Rubric Agent)  
- **Grade** the essay using the rubric (Grading Agent)  

By the end of this section, you’ll understand how to design and connect multiple role-based agents to simulate a realistic classroom workflow — from writing to evaluation to grading.
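
To make the idea of prompt chaining concrete before any API calls, here is a minimal sketch in plain Python. The three functions are hypothetical stand-ins for the agents (they return canned strings instead of calling a model); the point is only that each step's output becomes part of the next step's input.

```python
# Hypothetical stand-ins for the three agents; no model is called here.
def student_agent(topic: str) -> str:
    return f"ESSAY about: {topic}"


def rubric_agent(topic: str) -> str:
    return f"RUBRIC for: {topic}"


def grading_agent(rubric: str, essay: str) -> str:
    # The grader's input is built from the two earlier outputs (the "chain").
    return f"GRADE based on [{rubric}] and [{essay}]"


topic = "jealousy in Othello"
essay = student_agent(topic)            # step 1: write
rubric = rubric_agent(topic)            # step 2: define expectations
report = grading_agent(rubric, essay)   # step 3: grade using both
print(report)
```

In the rest of this notebook, each stand-in is replaced by a real model call with its own system role and prompt.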


First, let's repeat the setup steps from the previous notebook: importing the necessary libraries and loading our API key.

**Note**: Once you have saved a key in Colab Secrets, it is stored automatically, so you can reuse it in other notebooks. The first time you run the second cell, a pop-up window will ask you to grant this notebook access to the key.

In [1]:
import openai
from openai import OpenAI
from google.colab import userdata
import os
from typing import Tuple, Optional, Dict, Any

In [2]:
# Pull your saved secret into an environment variable
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

# Test if the key is available (without printing it)
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set. Add it via Colab Secrets (🔑) and try again.")
else:
    print("Key loaded?", True)

Key loaded? True


## Generating Essays

Let's start by generating two essays on the same topic — **Shakespeare’s *Othello***. We’ll create one **A-level** and one **C-level** essay to compare how writing quality changes when the same Student Agent follows different performance prompts.

The function we use, `text_generation`, produces a single response per call. Its main inputs are `system_role`, `prompt`, `model`, and `temperature`. Two further values, which we pass later to `build_student_agent` when building the system role, set the essay's context:

- **`topic`** — The essay question or theme.  
  This defines *what* the student is writing about. For example:  
  *“Discuss how jealousy shapes Othello’s decisions and relationships.”*

- **`grade_level`** — A label that identifies the educational level or target audience for the essay.  
  Here, it helps maintain context, such as *“11th grade,”* so the model writes with an appropriate voice and complexity. If you are not familiar with the U.S. education system, **please read** this [article](https://usahello.org/education/children/grade-levels/) explaining U.S. grade levels.

By generating both essays on the same topic, we can clearly see how **prompt design** shapes differences in reasoning depth, structure, and tone.

We will also use a small helper, `to_markdown`, to render the model's output as formatted Markdown in the notebook.


In [3]:
from typing import Tuple, Optional, Dict, Any

def text_generation(
    system_role: str,
    prompt: str,
    sections: dict | None = None,   # e.g., {"RUBRIC": rubric_md, "ESSAY TO GRADE": essay_text}
    model: str = "gpt-4o-mini",
    temperature: float = 0.2,
    **api_kwargs,                   # pass through extra API params (seed, top_p, etc.)
) -> Tuple[str, Optional[Dict[str, Any]]]:
    """
    Generic text generation helper.
    Returns: (text, usage_dict)

    - `prompt` contains the task/instructions (e.g., "Evaluate the essay using the rubric...").
    - `sections` is an optional mapping of label -> content appended as clearly marked blocks.
    - `usage_dict` mirrors the API's usage payload (may be None).
    """
    user_parts = [prompt]
    for label, content in (sections or {}).items():
        user_parts.append(f"{label}:\n{content}")
    user_msg = "\n\n".join(user_parts)

    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_msg},
        ],
        temperature=temperature,
        **api_kwargs,
    )

    text = (getattr(resp, "output_text", "") or "").strip()

    # Normalize usage to a dict when possible
    usage = getattr(resp, "usage", None)
    if usage is not None and not isinstance(usage, dict):
        usage = getattr(usage, "__dict__", None)

    return text, usage
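
The way `sections` is folded into the user message can be seen in isolation. This sketch copies only the string-assembly lines from `text_generation` into a hypothetical helper named `build_user_message` (not part of the notebook), so the result can be inspected without an API call.

```python
from typing import Dict, Optional


def build_user_message(prompt: str, sections: Optional[Dict[str, str]] = None) -> str:
    # Same assembly logic as in text_generation: the prompt comes first,
    # then each section is appended as a "LABEL:\ncontent" block.
    parts = [prompt]
    for label, content in (sections or {}).items():
        parts.append(f"{label}:\n{content}")
    return "\n\n".join(parts)


msg = build_user_message(
    "Evaluate the essay using the rubric.",
    {"RUBRIC": "(rubric text)", "ESSAY TO GRADE": "(essay text)"},
)
print(msg)
```

When `sections` is omitted, the user message is just the prompt, which is how the essay and rubric calls in this notebook use the function.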

In [4]:
from IPython.display import display, Markdown  # Tools for displaying formatted text in Jupyter Notebooks

def to_markdown(text):
    # Convert the provided text to Markdown format for better display in Jupyter Notebooks
    return Markdown(text)


In [5]:
def summarize_usage_cost(usage: dict, model: str = "gpt-4o-mini") -> dict:
    """
    Summarize input/output token usage and total cost (USD).

    Args:
        usage (dict): Usage info returned by the API (includes input_tokens, output_tokens, etc.).
        model (str): Model name for pricing lookup.

    Returns:
        dict: Contains model, input_tokens, output_tokens, total_tokens, cost_usd.
    """
    # --- pricing table (USD per token, converted from per-1M-token prices) ---
    pricing = {
        "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
        "gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000},
        "gpt-3.5-turbo": {"input": 0.50 / 1_000_000, "output": 1.50 / 1_000_000},
    }

    model_rate = pricing.get(model, pricing["gpt-4o-mini"])  # default if unknown

    # --- extract token counts safely ---
    input_tokens = usage.get("input_tokens") or usage.get("prompt_tokens", 0)
    output_tokens = usage.get("output_tokens") or usage.get("completion_tokens", 0)
    total_tokens = input_tokens + output_tokens

    # --- compute cost ---
    cost_usd = (
        input_tokens * model_rate["input"]
        + output_tokens * model_rate["output"]
    )

    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
        "cost_usd": round(cost_usd, 6),
    }

## Essay Generation

As shown in [Part 1](https://github.com/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%201%3A%20Guide%20to%20OpenAI%20in%20Google%20Colab.ipynb) of this course series, we will be using system/user role prompting. Our goal is to generate essays of varying performance levels, with one consistent Student Agent representing the writer’s identity, and the user prompt controlling the essay’s quality.

The Student Agent defines the persona — a realistic high school student writing for an English class — while the prompt specifies the assignment requirements and the expected performance level. By adjusting the temperature and prompt details, we can simulate essays ranging from polished and insightful (A-level) to basic and error-prone (C-level).

This setup allows us to maintain a consistent writing voice while producing diverse samples for grading, evaluation, or fine-tuning models that assess writing quality.

### Understanding Letter Grades

In many U.S. schools and universities, essays are graded on a **letter scale** from **A** to **F**. For more information, please read [Grading System in USA](https://www.canamgroup.com/blog/grading-system-in-usa).

First, as always, let's create a client object.

In [6]:
client = OpenAI()

### Student Agent

The **Student Agent** represents a student completing an academic assignment. It adapts dynamically to different subjects, grade levels, assignment types, and topics based on the parameters passed into the `build_student_agent` function.  

This role defines the writer’s persona — realistic, thoughtful, and age-appropriate — ensuring consistency across assignments of varying quality levels or subjects. By adjusting variables such as `grade_level`, `subject`, `assignment_type`, and `topic`, you can easily simulate different student contexts (for example, a 9th-grade science lab report or a 12th-grade history essay).  

This modular approach makes the Student Agent flexible and reusable across multiple educational scenarios while maintaining a natural, student-like writing style.


In [7]:
def build_student_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    """
    Build a student persona for role-based generation.
    Keeps voice/level in the system message and defers specifics to the prompt.
    """
    return (
        f"You are a {grade_level} student completing a {assignment_type} for your {subject} class "
        f"on the topic: {topic} "
        "Write in a realistic student voice—thoughtful but not overly advanced. "
        "Follow the assignment prompt’s instructions (length, structure, citation style). "
        "Use clear, organized paragraphs and avoid jargon unless requested."
    )
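
As a quick illustration of that flexibility, the sketch below builds the 9th-grade science persona mentioned above. It re-declares a condensed copy of `build_student_agent` (with the style guidance shortened) so the block runs on its own, and the topic string is a made-up example rather than part of the workshop.

```python
def build_student_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    # Condensed copy of the notebook's function; the style guidance is shortened.
    return (
        f"You are a {grade_level} student completing a {assignment_type} for your {subject} class "
        f"on the topic: {topic} "
        "Write in a realistic student voice."
    )


science_student_9 = build_student_agent(
    grade_level="9th grade",
    subject="science",
    assignment_type="lab report",
    # Hypothetical topic, chosen only for illustration.
    topic="How does temperature affect the rate at which sugar dissolves in water?",
)
print(science_student_9)
```

Only the four arguments change; the function body and the rest of the pipeline stay the same.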

### 11th Grade English Student

In this example, we create a **Student Agent** representing an 11th-grade student writing an **English essay** on the topic: *“How does Iago exploit Othello’s insecurities to drive the tragedy in **Othello**?”* We’ll call this variable `english_student_11`.

The agent is built using the `build_student_agent` function, which defines the student’s persona based on grade level, subject, assignment type, and topic. This creates a realistic, age-appropriate writing voice that is thoughtful but not overly advanced.


In [8]:
grade_level = "11th grade"
subject = "English"
assignment_type = "essay"
topic = "How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"

english_student_11 = build_student_agent(grade_level, subject, assignment_type, topic)
enlgish_student_11 = english_student_11  # alias: later cells refer to this misspelled name

### A-Level Prompt

Now that we’ve defined the **Student Agent** for an 11th-grade English student writing about *Othello*, we can move on to creating the **A-Level Prompt**, which provides the model with detailed writing instructions.  

The A-Level Prompt directs the model to produce an outstanding literary analysis essay that demonstrates deep understanding, originality, and strong analytical reasoning. It requires a clear, arguable thesis at the end of the introduction, supported by concise quotations and insightful interpretation.  

Writing at this level should be polished and formal — avoiding casual tone, first person, and rhetorical questions. Each paragraph should begin with a strong topic sentence and connect logically to the thesis. The essay should be **450–600 words**, with citations formatted as *(act.scene.line)* (e.g., *3.3.167*), ensuring a cohesive, high-quality literary analysis suitable for an advanced high school student.


In [9]:
essay_prompt_A_level = (
    "Write an A-level literary analysis essay that follows the detailed guidelines below.\n\n"
    "What 'A-level' means: excellent or outstanding work — demonstrates a deep and "
    "original understanding of the text, presents a clear and arguable thesis, and supports claims with precise "
    "evidence and insightful analysis. Writing should be organized, polished, and stylistically sophisticated.\n\n"
    "OUTPUT REQUIREMENTS:\n"
    "- Include a clear, arguable thesis at the END of the introduction.\n"
    "- Use concise, well-chosen quotations from the play and explain HOW they support your argument "
    "(analysis > summary).\n"
    "- Ensure each paragraph has a strong topic sentence and smooth transitions; every idea should advance the thesis.\n"
    "- Maintain a precise, academic tone throughout; avoid first person, rhetorical questions, or casual phrasing.\n"
    "- Aim for ~450–600 words.\n"
    "- Cite quotations using act.scene.line format (e.g., 3.3.167).\n"
)


In [10]:
A_level_essay, A_level_usage = text_generation(
    system_role=enlgish_student_11,
    prompt=essay_prompt_A_level,
    model="gpt-4o-mini",
    temperature=0.3,
)

to_markdown(A_level_essay)

In William Shakespeare's *Othello*, the character of Iago masterfully exploits Othello's insecurities, particularly regarding his race and status, to orchestrate a tragic downfall. Iago's manipulation is not merely a product of malice; it is a calculated strategy that reveals the vulnerabilities of Othello's character. By preying on Othello's deep-seated fears of inadequacy and jealousy, Iago catalyzes a chain of events that ultimately leads to Othello's tragic demise. Thus, Iago's exploitation of Othello's insecurities serves as a critical mechanism driving the tragedy of the play.

From the outset, Iago recognizes Othello's insecurities about his outsider status in Venetian society. Othello, a Moor and a military leader, grapples with feelings of alienation and self-doubt. Iago capitalizes on this vulnerability by suggesting that Desdemona, Othello's wife, is unfaithful. He insinuates that Othello's race makes him unworthy of Desdemona's love, stating, "Blessed fig's-end! The wine she drinks is made of grapes: if she be fair and wise, fairness and wit, the one's for use, the other useth it" (2.1.247-249). Here, Iago implies that Desdemona's beauty and intelligence are incompatible with Othello's racial identity, thereby planting seeds of doubt in Othello's mind. This manipulation not only highlights Iago's cunning but also underscores Othello's internal struggle with his racial identity, which Iago exploits to provoke jealousy.

Moreover, Iago's strategic use of language further exacerbates Othello's insecurities. He often employs suggestive imagery and ambiguous statements that leave Othello questioning his own perceptions. For instance, when Iago tells Othello to "look to your wife; observe her well with Cassio" (3.3.198), he subtly implies infidelity without providing concrete evidence. This insinuation plays on Othello's fears, leading him to obsess over the possibility of Desdemona's betrayal. The psychological manipulation Iago employs reveals how easily Othello can be swayed by doubt, illustrating the fragility of his self-esteem. As Othello becomes increasingly consumed by jealousy, he loses sight of his rationality, demonstrating how Iago's exploitation of his insecurities drives him toward tragedy.

Iago's manipulation reaches its zenith when he presents Othello with the handkerchief, a symbol of love and fidelity. By planting the handkerchief in Cassio's possession, Iago creates a tangible representation of Othello's worst fears. Othello's reaction to the handkerchief—“O, the more angel she, and you the blacker devil!” (5.2.125)—reveals the tragic culmination of Iago's exploitation. Othello's internalization of Iago's insinuations leads him to equate his racial identity with evil, further deepening his insecurities. This moment encapsulates the tragic irony of the play: Othello, a noble figure, succumbs to the very insecurities that Iago has instilled in him, leading to his ultimate downfall.

In conclusion, Iago's exploitation of Othello's insecurities is central to the tragedy of *Othello*. By manipulating Othello's fears of inadequacy and jealousy, Iago orchestrates a series of events that culminate in Othello's tragic demise. The interplay between Iago's cunning and Othello's vulnerabilities not only drives the plot but also serves as a poignant commentary on the destructive power of manipulation and the fragility of self-identity. Ultimately, Shakespeare crafts a narrative that reveals how deeply ingrained insecurities can lead to one's undoing, making *Othello* a timeless exploration of the human condition.
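
Because the A-level prompt states measurable requirements (a 450–600 word target and act.scene.line citations), a generated essay can be spot-checked in code. The helper below is a hypothetical sketch, not part of the notebook. Its regular expression is an assumption about how citations appear, based on the parenthesized forms in the essay above such as (3.3.198) and (2.1.247-249).

```python
import re

# Assumed citation shape: (act.scene.line) with an optional line range.
CITATION_RE = re.compile(r"\(\d+\.\d+\.\d+(?:-\d+)?\)")


def check_essay(text: str, min_words: int = 450, max_words: int = 600) -> dict:
    # Whitespace-separated tokens are used as a rough word count.
    words = len(text.split())
    return {
        "word_count": words,
        "within_length": min_words <= words <= max_words,
        "citations": CITATION_RE.findall(text),
    }


# Short sample text; in the notebook you would pass A_level_essay instead.
sample = 'Iago tells Othello to "look to your wife" (3.3.198) and later (2.1.247-249).'
result = check_essay(sample)
print(result)
```

This only checks length and citation format. It cannot confirm that a quoted line actually appears at the cited location in the play, so quotations still need to be verified against the text.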

In [11]:
summarize_usage_cost(A_level_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 268,
 'output_tokens': 800,
 'total_tokens': 1068,
 'cost_usd': 0.00052}
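
The reported cost can be reproduced by hand. The sketch below repeats the arithmetic from `summarize_usage_cost` for the token counts above, using the same `gpt-4o-mini` rates hard-coded in that function's pricing table (these rates are the notebook's assumptions and may not match current pricing).

```python
# Token counts reported for the A-level essay.
input_tokens, output_tokens = 268, 800

# Rates from the notebook's pricing table, converted to USD per token.
input_rate = 0.15 / 1_000_000
output_rate = 0.60 / 1_000_000

input_cost = input_tokens * input_rate      # about 0.0000402 USD
output_cost = output_tokens * output_rate   # about 0.00048 USD
cost_usd = round(input_cost + output_cost, 6)
print(cost_usd)
```

Most of the cost comes from output tokens, which are priced four times higher than input tokens in this table.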

-----

### C-Level Prompt

The **C-Level Prompt** instructs the model to produce an average or barely passing essay. A *C-level* essay demonstrates basic comprehension of the topic but lacks depth, organization, and stylistic control. It may have a weak or missing thesis, minimal or poorly integrated evidence, repetitive ideas, and noticeable grammar or usage mistakes.  

Writing at this level should sound more casual and unpolished — including filler phrases such as “I think” or “in my opinion” and avoiding strong transitions like “however” or “therefore.” The essay should be short, approximately four paragraphs (200–300 words), and contain a few minor grammatical errors that sound natural for a struggling student writer.  

Like before, we provide **explicit instructions** in the prompt to guide the model’s behavior. The detailed output requirements ensure the essay intentionally reflects weaker academic performance while still maintaining a realistic student voice.

In [12]:
essay_prompt_C_level = (
    "Write an essay responding to the question below.\n\n"
    "IMPORTANT: Produce a deliberately poor-quality **C-level** essay for teaching purposes.\n"
    "What 'C-level' means: average or barely passing work — shows some understanding "
    "but lacks depth or polish; weak thesis or none, little/no evidence, poor organization, repetition/vagueness, "
    "and noticeable grammar/style mistakes.\n\n"
    "OUTPUT REQUIREMENTS:\n"
    "- Return ONLY the essay body (no headings or bullets).\n"
    "- Do NOT provide a clear thesis; keep the main claim vague.\n"
    "- Provide minimal or poorly integrated evidence (generic statements are fine; avoid direct quotations).\n"
    "- Organization should be weak (some repetition or loosely connected ideas is acceptable).\n"
    "- Keep it short: 4 paragraphs (~200–300 words).\n"
    "- Include a few minor grammar/usage mistakes naturally (e.g., comma splices, agreement issues).\n"
    "- Casual, somewhat imprecise tone is acceptable.\n"
    "- Use a few filler phrases like 'I think' or 'in my opinion,' and avoid strong transitions like 'however' or 'therefore.'"
)


C_level_essay, C_level_usage = text_generation(
    system_role=enlgish_student_11,
    prompt=essay_prompt_C_level,
    model="gpt-4o-mini",
    temperature=0.3,
)


In [13]:
to_markdown(C_level_essay)

In *Othello*, Iago is a character who really knows how to take advantage of Othello’s insecurities. Othello, being a Moor and an outsider in Venetian society, has a lot of doubts about himself. Iago uses this to make Othello feel even worse about himself. For example, he keeps suggesting that Desdemona, Othello’s wife, is unfaithful. This really gets to Othello because he already feels insecure about his race and status. I think this is a big part of why Othello starts to believe Iago. 

Iago also plays on Othello's trust in him. Othello sees Iago as a friend, which is kind of ironic because Iago is actually the one who is betraying him. Othello thinks that Iago is honest, but in reality, Iago is just using that trust to manipulate him. This is important because it shows how easily Othello can be led astray. I mean, if he had questioned Iago more, maybe things would have turned out differently. But instead, Othello just keeps listening to Iago, which is not good for him.

Another thing is that Iago knows how to make Othello doubt Desdemona. He plants seeds of doubt in Othello's mind, which grows into something much bigger. Othello becomes obsessed with the idea that Desdemona is cheating on him, and this makes him act irrationally. It’s like Iago is playing a game, and Othello is just a pawn. I think this shows how powerful Iago is, but it also shows how weak Othello is in dealing with his own feelings.

In conclusion, Iago really takes advantage of Othello’s insecurities and trust issues. This leads to a lot of misunderstandings and ultimately tragedy. Othello’s downfall is really tied to how Iago manipulates him, and it’s sad to see how easily Othello falls for it. I guess it just goes to show that sometimes people can be their own worst enemies, especially when they let others play with their minds.
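
The C-level requirements that are easy to count (four paragraphs, roughly 200–300 words, a few filler phrases, no strong transitions) can be spot-checked in code. This is a minimal sketch with a hypothetical helper, `check_c_level`, which is not part of the notebook; the phrase lists contain only the examples named in the prompt.

```python
import re

FILLERS = ("i think", "in my opinion")
STRONG_TRANSITIONS = ("however", "therefore")


def check_c_level(text: str) -> dict:
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    lowered = text.lower()
    return {
        "paragraphs": len(paragraphs),
        "word_count": len(text.split()),
        "filler_count": sum(lowered.count(f) for f in FILLERS),
        "strong_transitions": [
            t for t in STRONG_TRANSITIONS if re.search(rf"\b{t}\b", lowered)
        ],
    }


# Short sample text; in the notebook you would pass C_level_essay instead.
sample = "I think Iago is bad.\n\nIn my opinion Othello trusts him too much. However, he is sad."
report = check_c_level(sample)
print(report)
```

Counts like these say nothing about reasoning depth or thesis quality, so they complement the rubric-based grading rather than replace it.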

In [14]:
summarize_usage_cost(C_level_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 300,
 'output_tokens': 442,
 'total_tokens': 742,
 'cost_usd': 0.00031}

## Rubric Generation


Great! We’ve now generated two essays on the same topic but with different performance levels. The next step is to create a **rubric** that can be used to evaluate them.

Rubric generation reuses the same `text_generation` helper that produced the essays. It still takes a `system_role` and a `prompt`, and the topic and grade level are again supplied through the system role. Only the persona and the task change:

- The system role is a **teacher persona** built by `build_rubric_agent` instead of a student persona.
- The prompt (`rubric_instructions`) asks for a grading rubric instead of an essay.

These changes tell the model to generate an **evaluation framework** rather than an essay. The output describes the criteria, expectations, and performance distinctions (for example, what separates A-level from C-level work) for that specific topic or assignment.

### Rubric Agent Function

The `build_rubric_agent` function creates a **teacher persona** that specializes in building grading rubrics. It takes in the **grade level**, **subject**, **assignment type**, and **topic** to generate a clear system message describing the teacher’s role.  

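The system message keeps the persona short: it names the subject, assignment type, topic, and grade level, and asks the teacher to be clear and student-facing. The detailed requirements (4–6 criteria, A/B/C/D performance levels, and concise, observable descriptors) come from the `rubric_instructions` prompt. The rubric only defines expectations; it does not grade specific work.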


In [15]:
def build_rubric_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    """
    Build a teacher persona for rubric generation.
    Keeps guidance in the system message; the task/prompt comes separately.
    """
    return (
        f"You are a {subject} teacher creating a grading rubric for a {assignment_type} "
        f"on the topic: {topic}, for {grade_level} students. "
        "Be clear and student-facing."
    )

In [16]:
rubric_agent = build_rubric_agent(
    grade_level="11th grade",
    subject="English",
    assignment_type="essay",
    topic="How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"
)

In [17]:
rubric_instructions = (
    "Create a grading rubric tailored to the topic and grade. Include 4–6 criteria.\n"
    "For each criterion, provide concise A/B/C/D descriptors using observable behaviors.\n"
    "Return the rubric in clean Markdown format with the following structure:\n\n"
    "## Rubric\n"
    "### Criterion Name\n"
    "- **Weight (%):** \n"
    "- **A:** \n"
    "- **B:** \n"
    "- **C:** \n"
    "- **D:** \n\n"
    "Include a short notes section at the end summarizing key expectations or grading guidance."
)


In [18]:
english_rubric, rubric_usage = text_generation(
    system_role=rubric_agent,
    prompt=rubric_instructions,
    model="gpt-4o-mini",
    temperature=0.3,
)

to_markdown(english_rubric)

## Rubric

### Thesis Statement
- **Weight (%):** 20
- **A:** Clear, insightful thesis that directly addresses how Iago exploits Othello’s insecurities; sets a strong foundation for the essay.
- **B:** Clear thesis that addresses the prompt but may lack depth or specificity; provides a solid foundation.
- **C:** Thesis is present but vague or only partially addresses the prompt; lacks clarity in direction.
- **D:** No clear thesis or completely off-topic; does not address the prompt.

### Evidence and Analysis
- **Weight (%):** 30
- **A:** Provides multiple relevant textual examples that are thoroughly analyzed; demonstrates deep understanding of the text.
- **B:** Provides relevant examples with some analysis; shows a good understanding of the text but may lack depth.
- **C:** Provides few examples; analysis is superficial or only loosely connected to the thesis.
- **D:** Lacks relevant examples or analysis; fails to engage with the text meaningfully.

### Organization and Structure
- **Weight (%):** 20
- **A:** Well-organized essay with clear, logical flow; effective use of paragraphs and transitions enhances readability.
- **B:** Generally organized with a logical flow; some transitions may be awkward but overall structure is clear.
- **C:** Organization is present but may be confusing; paragraphs may lack clear focus or transitions.
- **D:** Poorly organized; lacks clear structure and flow, making it difficult to follow the argument.

### Language and Style
- **Weight (%):** 15
- **A:** Uses varied and sophisticated language; tone is appropriate for an academic essay; few to no grammatical errors.
- **B:** Language is clear and appropriate; some variety in sentence structure; few grammatical errors.
- **C:** Language is basic or repetitive; tone may not be fully appropriate; several grammatical errors present.
- **D:** Language is unclear or inappropriate; frequent grammatical errors hinder understanding.

### Conclusion
- **Weight (%):** 15
- **A:** Strong conclusion that effectively summarizes key points and reinforces the thesis; leaves a lasting impression.
- **B:** Clear conclusion that summarizes main points but may not fully reinforce the thesis or provide insight.
- **C:** Conclusion is present but weak; may simply restate points without adding depth or insight.
- **D:** No conclusion or a conclusion that fails to summarize or connect to the thesis.

### Notes
- Ensure your essay directly addresses the prompt and provides a clear argument supported by textual evidence.
- Pay attention to the organization of your ideas; each paragraph should focus on a single point that supports your thesis.
- Use varied language and maintain an academic tone throughout your essay.
- Proofread for grammatical errors and clarity before submission.
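
Since the rubric follows a fixed Markdown structure, its weights can be extracted and sanity-checked before grading. This is a minimal sketch with a hypothetical helper, `parse_rubric_weights`; it assumes every criterion is a `###` heading followed by a `- **Weight (%):**` line, as in the output above. The sample string holds only the headings and weights from that output.

```python
import re


def parse_rubric_weights(rubric_md: str) -> dict:
    # Match a "### Name" heading whose next bullet is the weight line.
    pattern = re.compile(
        r"^### (.+?)\s*\n- \*\*Weight \(%\):\*\*\s*(\d+)", re.MULTILINE
    )
    return {name: int(weight) for name, weight in pattern.findall(rubric_md)}


sample = """## Rubric

### Thesis Statement
- **Weight (%):** 20
### Evidence and Analysis
- **Weight (%):** 30
### Organization and Structure
- **Weight (%):** 20
### Language and Style
- **Weight (%):** 15
### Conclusion
- **Weight (%):** 15
### Notes
- Proofread before submission.
"""

weights = parse_rubric_weights(sample)
print(weights)
print("Total weight:", sum(weights.values()))
```

The `### Notes` section is skipped because it has no weight line. In the notebook you would pass `english_rubric`, and a total other than 100 would be a signal to regenerate or adjust the rubric.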

In [19]:
summarize_usage_cost(rubric_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 158,
 'output_tokens': 565,
 'total_tokens': 723,
 'cost_usd': 0.000363}

----

## Essay Grading

Now we can use the LLM to grade the essays. This lets us check whether each generated essay actually matches the quality level we asked for.

### Grading Agent

We create one more agent, this time a teacher persona that grades the essays.

In [20]:
def build_grading_agent(grade_level: str, subject: str, assignment_type: str, topic: str) -> str:
    """
    Build a teacher persona for grading an essay.
    Keeps guidance in the system message; the task/prompt comes separately.
    """
    return (
        f"You are a {subject} teacher carefully grading a {assignment_type} "
        f"on the topic: {topic}, for {grade_level} students. "
        "Be fair, consistent, and concise."
    )

In [21]:
grading_agent = build_grading_agent(
    grade_level="11th grade",
    subject="English",
    assignment_type="essay",
    topic="How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"
)

In [22]:
grading_instructions = (
    "Evaluate the essay using the provided rubric. Output in Markdown only.\n\n"
    "SCORING SCALES:\n"
    "- Levels (both overall and per-category):\n"
    "  - A = Excellent (90–100)\n"
    "  - B = Good (80–89)\n"
    "  - C = Satisfactory (70–79)\n"
    "  - D = Below Average (60–69)\n"
    "  - F = Failing (0–59)\n"
    "- Numeric Scores: Use integers 0–100. Choose a score consistent with the level’s band.\n"
    "- Overall Score: Compute a weighted average of category scores using rubric weights; if no weights are given, weight categories equally. Round to nearest integer.\n\n"
    "JUSTIFICATION RULES:\n"
    "- Tie every justification to observable features in the essay and to the rubric’s descriptors.\n"
    "- Be concise (1–2 sentences per category) and avoid vague language.\n"
    "- Do not introduce criteria that are not in the rubric.\n\n"
    "OUTPUT FORMAT (Markdown):\n"
    "## Overall Grade\n"
    "- **Level:** A|B|C|D|F\n"
    "- **Score:** 0–100\n"
    "- **Summary (2–3 sentences):** Brief rationale referencing key strengths/weaknesses relative to the rubric.\n\n"
    "## Category Breakdown (mirror rubric order)\n"
    "For each category from the rubric, include:\n"
    "- **Level:** A|B|C|D|F (use the bands above)\n"
    "- **Score:** 0–100 (consistent with the level band)\n"
    "- **Justification (1–2 sentences):** Reference specific evidence and the rubric language.\n\n"
    "CONSTRAINTS:\n"
    "- Mirror the rubric categories exactly and in the same order.\n"
    "- If rubric weights are present, respect them; otherwise assume equal weights.\n"
    "- Keep the tone professional and concise.\n"
    "- Do not add extra sections or prose outside the specified format."
)
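
The level bands in the grading prompt map directly to a small lookup. The function below is a hypothetical helper (not used in the notebook) that mirrors those bands, which can be handy for checking that a score the model reports is consistent with the level it assigns.

```python
def score_to_level(score: int) -> str:
    # Bands copied from the SCORING SCALES section of the grading prompt.
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


for s in (95, 85, 75, 65, 40):
    print(s, score_to_level(s))
```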



### Grading the A-Level Essay

In [23]:
A_level_essay_graded, A_level_essay_graded_usage = text_generation(
    system_role=grading_agent,
    prompt=grading_instructions,
    sections={"RUBRIC": english_rubric, "ESSAY TO GRADE": A_level_essay},
    temperature=0.2,
)

to_markdown(A_level_essay_graded)

## Overall Grade
- **Level:** A
- **Score:** 95
- **Summary:** The essay presents a clear and insightful thesis that effectively addresses how Iago exploits Othello's insecurities. It is well-supported with relevant textual evidence and demonstrates a deep understanding of the play's themes and characters.

## Category Breakdown

### Thesis Statement
- **Level:** A
- **Score:** 95
- **Justification:** The thesis is clear and insightful, directly addressing how Iago exploits Othello's insecurities and setting a strong foundation for the essay.

### Evidence and Analysis
- **Level:** A
- **Score:** 95
- **Justification:** The essay provides multiple relevant textual examples that are thoroughly analyzed, demonstrating a deep understanding of the text and effectively linking back to the thesis.

### Organization and Structure
- **Level:** A
- **Score:** 90
- **Justification:** The essay is well-organized with a clear logical flow; effective use of paragraphs and transitions enhances readability, though a few transitions could be smoother.

### Language and Style
- **Level:** A
- **Score:** 90
- **Justification:** The language is varied and sophisticated, maintaining an appropriate academic tone with few grammatical errors, contributing to the overall clarity of the argument.

### Conclusion
- **Level:** A
- **Score:** 95
- **Justification:** The conclusion effectively summarizes key points and reinforces the thesis, leaving a lasting impression on the reader and providing insight into the play's themes.

In [24]:
summarize_usage_cost(A_level_essay_graded_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 1795,
 'output_tokens': 318,
 'total_tokens': 2113,
 'cost_usd': 0.00046}

### Grading the C-Level Essay

In [25]:
C_level_essay_graded, C_level_essay_graded_usage = text_generation(
    system_role=grading_agent,
    prompt=grading_instructions,
    sections={"RUBRIC": english_rubric, "ESSAY TO GRADE": C_level_essay},
    temperature=0.2,
)

to_markdown(C_level_essay_graded)

## Overall Grade
- **Level:** B
- **Score:** 85
- **Summary:** The essay presents a clear thesis and relevant examples, but lacks depth in analysis and organization. While it effectively addresses how Iago exploits Othello's insecurities, the insights could be more thoroughly developed.

## Category Breakdown

### Thesis Statement
- **Level:** B
- **Score:** 85
- **Justification:** The thesis clearly addresses the prompt by stating that Iago exploits Othello’s insecurities, but it could benefit from more specificity regarding the mechanisms of this exploitation.

### Evidence and Analysis
- **Level:** B
- **Score:** 80
- **Justification:** The essay provides relevant examples of Iago's manipulation, but the analysis is somewhat superficial and lacks deeper exploration of the implications of these actions on Othello's character.

### Organization and Structure
- **Level:** C
- **Score:** 75
- **Justification:** The organization is present, but the flow between ideas is occasionally awkward, and some paragraphs lack clear focus, making it harder to follow the argument.

### Language and Style
- **Level:** B
- **Score:** 80
- **Justification:** The language is generally clear and appropriate for an academic essay, though it lacks some variety in sentence structure and contains a few informal phrases that detract from the overall tone.

### Conclusion
- **Level:** B
- **Score:** 85
- **Justification:** The conclusion effectively summarizes the main points and reinforces the thesis, but it could provide more insight into the broader implications of Othello's tragedy.

In [26]:
summarize_usage_cost(C_level_essay_graded_usage)

{'model': 'gpt-4o-mini',
 'input_tokens': 1437,
 'output_tokens': 338,
 'total_tokens': 1775,
 'cost_usd': 0.000418}
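
The grading prompt asks for the overall score to be a weighted average of the category scores, and this is easy to verify. The sketch below recomputes it from the rubric weights and the category scores reported above; `weighted_overall` is a hypothetical helper, not part of the notebook.

```python
# Weights (%) from the generated rubric.
WEIGHTS = {
    "Thesis Statement": 20,
    "Evidence and Analysis": 30,
    "Organization and Structure": 20,
    "Language and Style": 15,
    "Conclusion": 15,
}


def weighted_overall(scores: dict, weights: dict = WEIGHTS) -> int:
    # Weighted average of category scores, rounded to the nearest integer.
    total_weight = sum(weights.values())
    return round(sum(scores[c] * weights[c] for c in weights) / total_weight)


# Category scores reported by the Grading Agent for each essay.
a_scores = {"Thesis Statement": 95, "Evidence and Analysis": 95,
            "Organization and Structure": 90, "Language and Style": 90, "Conclusion": 95}
c_scores = {"Thesis Statement": 85, "Evidence and Analysis": 80,
            "Organization and Structure": 75, "Language and Style": 80, "Conclusion": 85}

print("A-level essay:", weighted_overall(a_scores))
print("C-level essay:", weighted_overall(c_scores))
```

By this calculation the A-level essay comes to 93 rather than the reported 95, and the C-level essay to 81 rather than the reported 85. Both stay within the reported letter bands, but the gap suggests the model estimated the overall score instead of computing it, so arithmetic like this is better done in code than delegated to the model.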

## Forward

You have now seen how we can use LLMs to generate synthetic essays through **role-based prompting**. In this notebook, we defined a single **Student Agent** — an 11th-grade student writing for an English class — whose role establishes a consistent, realistic writing voice. By clearly defining the agent’s perspective and pairing it with detailed prompts, we can guide the model to produce essays of varying quality and tone.  

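However, instructing the model to write at a certain quality level does not guarantee that the output reflects it. In this notebook, the essay prompted as C-level received an overall B (85) from the Grading Agent. Judging essay quality is also partly subjective, so review the generated essays and grades yourself rather than taking the labels at face value.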

Now, the rest of the notebook is up to you — it’s your turn to experiment! Try extending what you’ve learned in creative ways. For example, you can:

- Generate **D- or F-level** essays to simulate weaker student writing  
- **Adjust the prompts** to explore different subjects or essay types, or refine the existing prompts if the essays don't actually reflect A- or C-level quality  
- **Modify the Student Agent** to represent different grade levels or academic disciplines  
- Experiment with new **output requirements** such as word limits, tone, or structure  

Exploring these variations will help you better understand how **role definition** and **prompt design** shape the quality, depth, and realism of LLM-generated writing.

