<a href="https://colab.research.google.com/github/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%202%3A%20Using%20LLM%20Agents%20for%20Essay%20Generation%20and%20Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part 2: Using LLM Agents for Essay Generation and Evaluation

You have completed [Part 1: Guide to OpenAI in Google Colab](https://github.com/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%201%3A%20Guide%20to%20OpenAI%20in%20Google%20Colab.ipynb). Now, let’s apply those skills to a real-world use case.  

In this section, you’ll learn about **[role-based collaboration](https://www.ibm.com/think/topics/multi-agent-collaboration)**, a framework where multiple LLM agents work together by taking on distinct roles with specific purposes. Each agent performs its own task and passes the result to the next, creating a workflow powered by **prompt chaining**.  

We will use generative AI to:

- **Create** an 11th-grade English essay (Student Agent)  
- **Design** a rubric to evaluate the essay (Rubric Agent)  
- **Grade** the essay using the rubric (Grading Agent)  

By the end of this section, you’ll understand how to design and connect multiple role-based agents to simulate a realistic classroom workflow — from writing to evaluation to grading.


First, let's repeat the steps from the previous notebook: importing the necessary libraries and loading our API key.

**Note**: Once you have saved a key in Colab Secrets, it stays available to your other notebooks. The first time you run the cell that calls `userdata.get`, a pop-up window will ask you to grant this notebook access.

In [20]:
import openai
from openai import OpenAI
from google.colab import userdata
import os

In [21]:
# Pull your saved secret into an environment variable
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

# Test if the key is available (without printing it)
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set. Add it via Colab Secrets (🔑) and try again.")
else:
    print("Key loaded?", True)

Key loaded? True
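
If you later run this notebook outside Colab, `google.colab` will not be importable. The sketch below shows one way to handle that, assuming the key may already be set in the environment. `resolve_api_key` is a hypothetical helper for illustration; it is not part of the workshop code or the OpenAI SDK.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    secret_getter: Optional[Callable[[str], Optional[str]]] = None,
    name: str = "OPENAI_API_KEY",
) -> str:
    """Return the key from `env` if present, otherwise ask `secret_getter`.

    `secret_getter` is any callable that behaves like Colab's `userdata.get`.
    It is injected so the function also works where google.colab is missing.
    """
    key = env.get(name)
    if not key and secret_getter is not None:
        key = secret_getter(name)
    if not key:
        raise RuntimeError(f"{name} is not set. Add it via Colab Secrets or your shell.")
    return key
```

In Colab, this could be called as `os.environ["OPENAI_API_KEY"] = resolve_api_key(secret_getter=userdata.get)`.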


## Generating Essays

Let's start by generating two essays on the same topic — **Shakespeare’s *Othello***. We’ll create one **A-level** and one **C-level** essay to compare how writing quality changes when the same Student Agent follows different performance prompts.

The function we use, `generate_essay`, produces a single essay at a time.  
In addition to the main inputs (`system_role`, `prompt`, `model`, and `temperature`), it also takes two key arguments:

- **`topic`** — The essay question or theme.  
  This defines *what* the student is writing about. For example:  
  *“Discuss how jealousy shapes Othello’s decisions and relationships.”*

- **`grade_level`** — A label that identifies the educational level or target audience for the essay.  
  Here, it helps maintain context, such as *“11th grade,”* so the model writes with an appropriate voice and complexity. If you are not familiar with the US education system, this [article](https://usahello.org/education/children/grade-levels/) explains US grade levels.

By generating both essays on the same topic, we can clearly see how **prompt design** shapes differences in reasoning depth, structure, and tone.

We will also define a small helper, `to_markdown`, that renders the model's output as formatted Markdown in the notebook.


In [22]:
def generate_essay(
    system_role: str,
    prompt: str,
    topic: str,
    grade_level: str,
    model: str = "gpt-4o-mini",
    temperature: float = 0.2,
) -> str:
    """
    Generate a synthetic essay using the OpenAI Responses API.
    Keep it simple: pass output requirements inside `prompt`.
    """
    user_content = (
        f"{prompt}\n\n"
        f"ESSAY TOPIC / QUESTION:\n{topic}\n\n"
        f"TARGET GRADE LEVEL: {grade_level}\n"
    )

    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_content},
        ],
        temperature=temperature,
    )
    return resp.output_text

In [23]:
from IPython.display import display, Markdown  # Tools for displaying formatted text in Jupyter Notebooks

def to_markdown(text):
    # Convert the provided text to Markdown format for better display in Jupyter Notebooks
    return Markdown(text)


## Essay Generation

As shown in [Part 1](https://github.com/Sam-Gartenstein/GenAI-Engineering-Workshop/blob/main/Part%201%3A%20Guide%20to%20OpenAI%20in%20Google%20Colab.ipynb) of this course series, we will be using system/user role prompting. Our goal is to generate essays of varying performance levels, with one consistent Student Agent representing the writer’s identity, and the user prompt controlling the essay’s quality.

The Student Agent defines the persona, a realistic high school student writing for an English class, while the prompt specifies the assignment requirements and the expected performance level. By adjusting the prompt details (and, optionally, the temperature), we can simulate essays ranging from polished and insightful (A-level) to basic and error-prone (C-level). Both essays below use the same temperature (0.3), so the prompt alone drives the difference.

This setup allows us to maintain a consistent writing voice while producing diverse samples for grading, evaluation, or fine-tuning models that assess writing quality.

### Understanding Letter Grades

In many U.S. schools and universities, essays are graded on a **letter scale** from **A** to **F**. For more information, please read [Grading System in USA](https://www.canamgroup.com/blog/grading-system-in-usa).

First, as always, let's create a client object.

In [24]:
client = OpenAI()

### Student Agent

The Student Agent represents an 11th-grade student writing an essay for an English class. This role defines the writer’s voice — thoughtful but not overly advanced — ensuring consistency across essays of different quality levels.


In [25]:
student_agent = (
    "You are an 11th-grade high school student writing an essay for your English class. "
    "Write in a realistic student voice — thoughtful but not overly advanced. "
    "Use natural language, as if written by a real student rather than an adult academic."
)

### A-Level Prompt

The **A-Level Prompt** instructs the model to produce an outstanding literary analysis essay that meets the highest standards of academic writing. An *A-level* essay demonstrates deep understanding, originality, and strong analytical reasoning. It presents a clear, arguable thesis — placed at the end of the introduction — and supports that argument with carefully selected textual evidence and insightful interpretation.  

Writing at this level should be polished, precise, and logically structured. The essay must avoid casual tone, first-person language, and rhetorical questions, maintaining a formal academic voice throughout. Each paragraph should begin with a clear topic sentence and connect cohesively to the thesis, creating a unified, persuasive argument.  

In our prompt, the model is **explicitly guided** to follow these conventions. It is asked to use short, purposeful quotations from the play and analyze *how* they support the central argument, emphasizing analysis over summary. The target length is approximately **450–600 words**, with citations formatted as *(act.scene.line)* (e.g., *3.3.167*). These clear and specific instructions ensure that the model’s output resembles a well-crafted, high-performing literary analysis essay appropriate for an advanced high school student.


In [26]:
A_level_prompt = (
    "Write an A-level literary analysis essay that follows the detailed guidelines below.\n\n"
    "What 'A-level' means: excellent or outstanding work — demonstrates a deep and "
    "original understanding of the text, presents a clear and arguable thesis, and supports claims with precise "
    "evidence and insightful analysis. Writing should be organized, polished, and stylistically sophisticated.\n\n"
    "OUTPUT REQUIREMENTS:\n"
    "- Include a clear, arguable thesis at the END of the introduction.\n"
    "- Use concise, well-chosen quotations from the play and explain HOW they support your argument "
    "(analysis > summary).\n"
    "- Ensure each paragraph has a strong topic sentence and smooth transitions; every idea should advance the thesis.\n"
    "- Maintain a precise, academic tone throughout; avoid first person, rhetorical questions, or casual phrasing.\n"
    "- Aim for ~450–600 words.\n"
    "- Cite quotations using act.scene.line format (e.g., 3.3.167).\n"
)


In [27]:
topic = "How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"
grade_level = "11th grade"

A_level_essay = generate_essay(
    system_role=student_agent,
    prompt=A_level_prompt,
    topic=topic,
    grade_level=grade_level,
    model="gpt-4o-mini",
    temperature=0.3,
)


In [28]:
to_markdown(A_level_essay)

**Title: The Manipulation of Insecurity in Shakespeare's *Othello***

In William Shakespeare's *Othello*, the themes of jealousy and manipulation are intricately woven into the fabric of the narrative, primarily through the character of Iago. Iago's cunning exploitation of Othello's insecurities serves as a catalyst for the tragic events that unfold. By preying on Othello's vulnerabilities regarding his race, status, and relationship with Desdemona, Iago orchestrates a downfall that ultimately leads to the destruction of Othello's character and life. This essay argues that Iago's manipulation of Othello's insecurities not only drives the tragedy but also highlights the destructive power of deceit and self-doubt.

Iago's exploitation of Othello's racial insecurities is one of the most prominent ways he manipulates the Moor. Othello, a black man in a predominantly white Venetian society, grapples with feelings of alienation and inadequacy. Iago recognizes this vulnerability and uses it to his advantage. For instance, when Iago refers to Othello as "the Moor" (1.3.333), he reduces Othello to his race, emphasizing his otherness and reinforcing societal prejudices. This constant reminder of Othello's racial identity serves to undermine his confidence, making him more susceptible to Iago's manipulations. By exploiting these insecurities, Iago effectively plants the seeds of doubt in Othello's mind regarding his worthiness and place within Venetian society.

Moreover, Iago capitalizes on Othello's insecurities about his status and experience as a military leader. Despite his accomplishments, Othello often feels the need to prove himself, particularly in his relationship with Desdemona. Iago exploits this by suggesting that Desdemona's love is fickle and that she may be unfaithful. He states, "Blessed fig's-end! The wine she drinks is made of grapes" (2.3.319), implying that Desdemona's affections are superficial and that she is unworthy of Othello's devotion. This manipulation feeds into Othello's insecurities about his worthiness as a husband, making him question Desdemona's loyalty. Iago's insinuations create a chasm between Othello and Desdemona, ultimately leading Othello to act out of jealousy and rage rather than reason.

Iago's manipulation extends to Othello's relationship with Desdemona, further exacerbating Othello's insecurities. Othello's love for Desdemona is profound, yet it is also marked by a fear of inadequacy. Iago exploits this fear by suggesting that Desdemona is involved with Cassio, Othello's lieutenant. He strategically uses the handkerchief, a symbol of Othello's love, as a tool of deception. When Iago states, "I will in Cassio's lodging lose this napkin" (3.3.438), he orchestrates a scenario that leads Othello to believe in Desdemona's infidelity. This manipulation not only intensifies Othello's insecurities but also drives him to a point of irrationality, culminating in tragic consequences. The handkerchief, once a token of love, becomes a symbol of betrayal, illustrating how Iago's machinations distort Othello's perception of reality.

In conclusion, Iago's exploitation of Othello's insecurities regarding race, status, and love is central to the tragic trajectory of *Othello*. Through cunning manipulation and deceit, Iago transforms Othello's vulnerabilities into instruments of his downfall. The tragedy of Othello is not merely a tale of jealousy but a profound exploration of how insecurities can be weaponized, leading to devastating consequences. Shakespeare's portrayal of Iago's manipulation serves as a timeless reminder of the destructive power of deceit and the fragility of human trust.

-----
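
The A-level prompt asks for roughly 450–600 words and citations in act.scene.line format. A minimal sketch of how those two requirements could be checked automatically is shown here. `check_a_level_format` is a hypothetical helper, and it checks format only: it cannot tell whether a quotation or line number is accurate.

```python
import re

# Matches act.scene.line citations such as (3.3.167)
CITATION_RE = re.compile(r"\((\d+)\.(\d+)\.(\d+)\)")


def check_a_level_format(essay: str, min_words: int = 450, max_words: int = 600) -> dict:
    """Report the word count and the number of act.scene.line citations."""
    word_count = len(essay.split())  # whitespace-separated tokens
    citations = CITATION_RE.findall(essay)
    return {
        "word_count": word_count,
        "within_length": min_words <= word_count <= max_words,
        "citation_count": len(citations),
    }
```

Calling `check_a_level_format(A_level_essay)` gives a quick signal of whether the model respected the length and citation instructions. Note that the count includes the title line if the model adds one.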

### C-Level Prompt

The **C-Level Prompt** instructs the model to produce an average or barely passing essay. A *C-level* essay demonstrates basic comprehension of the topic but lacks depth, organization, and stylistic control. It may have a weak or missing thesis, minimal or poorly integrated evidence, repetitive ideas, and noticeable grammar or usage mistakes.  

Writing at this level should sound more casual and unpolished — including filler phrases such as “I think” or “in my opinion” and avoiding strong transitions like “however” or “therefore.” The essay should be short, approximately four paragraphs (200–300 words), and contain a few minor grammatical errors that sound natural for a struggling student writer.  

Like before, we provide **explicit instructions** in the prompt to guide the model’s behavior. The detailed output requirements ensure the essay intentionally reflects weaker academic performance while still maintaining a realistic student voice.

In [29]:
C_level_prompt = (
    "Write an essay responding to the question below.\n\n"
    "IMPORTANT: Produce a deliberately poor-quality **C-level** essay for teaching purposes.\n"
    "What 'C-level' means: average or barely passing work — shows some understanding "
    "but lacks depth or polish; weak thesis or none, little/no evidence, poor organization, repetition/vagueness, "
    "and noticeable grammar/style mistakes.\n\n"
    "OUTPUT REQUIREMENTS:\n"
    "- Return ONLY the essay body (no headings or bullets).\n"
    "- Do NOT provide a clear thesis; keep the main claim vague.\n"
    "- Provide minimal or poorly integrated evidence (generic statements are fine; avoid direct quotations).\n"
    "- Organization should be weak (some repetition or loosely connected ideas is acceptable).\n"
    "- Keep it short: 4 paragraphs (~200–300 words).\n"
    "- Include a few minor grammar/usage mistakes naturally (e.g., comma splices, agreement issues).\n"
    "- Casual, somewhat imprecise tone is acceptable.\n"
    "- Use a few filler phrases like 'I think' or 'in my opinion,' and avoid strong transitions like 'however' or 'therefore.'"
)


C_level_essay = generate_essay(
    system_role=student_agent,
    prompt=C_level_prompt,
    topic=topic,
    grade_level=grade_level,
    model="gpt-4o-mini",
    temperature=0.3,
)

In [30]:
to_markdown(C_level_essay)

In the play *Othello*, Iago is a character who takes advantage of Othello's insecurities. Othello is a Moor and feels like an outsider in Venice, which makes him vulnerable. I think Iago knows this and uses it to manipulate Othello. He plays on Othello's doubts about his worthiness and his relationship with Desdemona. This is important because it shows how Iago can twist Othello's mind and make him believe things that aren't true.

One way Iago exploits Othello's insecurities is by planting seeds of doubt. He suggests that Desdemona might be unfaithful, which really gets to Othello. It's like Iago knows exactly what to say to make Othello question everything. Othello's jealousy grows, and it seems like he can't control it. This jealousy is a big part of the tragedy because it leads Othello to make terrible decisions. Iago's manipulation is really effective because Othello already has these feelings of inadequacy.

Also, Iago is really good at pretending to be Othello's friend. He acts like he cares about Othello, but really, he's just using him. This makes it easier for Iago to get into Othello's head. Othello trusts Iago, which is a big mistake. I think this trust is what makes the tragedy even worse because Othello believes Iago's lies without questioning them. It’s like he’s blind to the truth.

In conclusion, Iago's exploitation of Othello's insecurities is a major factor in the tragedy of the play. Othello's feelings of being an outsider and his jealousy are manipulated by Iago, which leads to a lot of chaos. I think this shows how powerful manipulation can be, especially when someone is already struggling with their own issues. Overall, it's a sad story about trust and betrayal.

----
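
The C-level prompt sets a different group of surface requirements: four paragraphs, filler phrases, and no strong transitions. The sketch below checks those. `check_c_level_format` is a hypothetical helper, and it assumes paragraphs are separated by blank lines.

```python
import re

FILLERS = ("i think", "in my opinion")
STRONG_TRANSITIONS = ("however", "therefore")


def check_c_level_format(essay: str) -> dict:
    """Count paragraphs, words, filler phrases, and discouraged transitions."""
    # Paragraphs are assumed to be separated by one or more blank lines
    paragraphs = [p for p in re.split(r"\n\s*\n", essay.strip()) if p.strip()]
    lowered = essay.lower()
    return {
        "paragraph_count": len(paragraphs),
        "word_count": len(essay.split()),
        "filler_count": sum(lowered.count(f) for f in FILLERS),
        "strong_transitions": [
            t for t in STRONG_TRANSITIONS if re.search(rf"\b{t}\b", lowered)
        ],
    }
```

Running `check_c_level_format(C_level_essay)` shows whether the output stayed within the requested shape. It does not measure grammar mistakes, which would need a separate tool.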

## Rubric Generation

Great! We’ve now generated two essays on the same topic but with different performance levels. The next step is to create a **rubric** that can be used to evaluate them.

This rubric function works almost exactly like the `generate_essay` function. It still takes in the same core arguments — `system_role`, `prompt`, `topic`, and `grade_level`. The only difference is in how the user message is structured.

Inside the `user_content`, instead of including `"ESSAY TOPIC / QUESTION:\n{topic}\n\n"` we replace it with

`"RUBRIC TOPIC / CONTEXT:\n{topic}\n\n"`.

This small change tells the model that it should generate an **evaluation framework** rather than an essay. The output will describe the criteria, expectations, and performance distinctions (for example, what separates A-level from C-level work) for that specific topic or assignment.
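
Because the two functions differ only in that label, the message-building step could be shared. The sketch below shows one possible refactor. `build_messages` and its `context_label` parameter are hypothetical names, not part of the notebook.

```python
def build_messages(
    system_role: str,
    prompt: str,
    context_label: str,
    topic: str,
    grade_level: str,
) -> list:
    """Build the system/user message list that is passed as `input=` to the API call."""
    user_content = (
        f"{prompt}\n\n"
        f"{context_label}:\n{topic}\n\n"
        f"TARGET GRADE LEVEL: {grade_level}\n"
    )
    return [
        {"role": "system", "content": system_role},
        {"role": "user", "content": user_content},
    ]
```

With this helper, the essay function would pass `"ESSAY TOPIC / QUESTION"` as the label and the rubric function would pass `"RUBRIC TOPIC / CONTEXT"`. The notebook keeps two separate functions, which is arguably easier to read in a tutorial.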


In [31]:
def generate_essay_rubric(
    system_role: str,
    prompt: str,          # include OUTPUT REQUIREMENTS here (categories, A–F bands, format)
    topic: str,           # e.g., the essay question or assignment name
    grade_level: str,     # e.g., "11th grade"
    model: str = "gpt-4o-mini",
    temperature: float = 0.1,
) -> str:
    """
    Generate a grading rubric using the OpenAI Responses API.
    Keep it simple: pass rubric requirements inside `prompt`.
    """
    user_content = (
        f"{prompt}\n\n"
        f"RUBRIC TOPIC / CONTEXT:\n{topic}\n\n"
        f"TARGET GRADE LEVEL: {grade_level}\n"
    )

    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_content},
        ],
        temperature=temperature,
    )
    return resp.output_text

### First Teacher Agent and Prompt

Now, we can create our first **Teacher Agent**, which will be responsible for designing grading rubrics. Since this agent focuses on rubric generation rather than essay writing, we’ll call it the `rubric_agent`.

The `rubric_agent` represents a **high school English teacher** who creates clear, student-facing grading rubrics. This agent interprets essay prompts and outlines performance expectations across categories such as Thesis, Evidence, Analysis, Organization, and Mechanics.

To generate a rubric, we pass in a short instructional prompt that tells the model to create a structured grading guide. The output should include 4–5 evaluation categories, each with descriptions for performance levels **A through F**, written concisely in Markdown format. The temperature is kept low (0.1) to ensure consistent and structured output.


In [32]:
rubric_agent = (
    "You are a high school English teacher. "
    "Design clear, student-facing grading rubrics."
)

prompt = (
    "Create a rubric for the essay prompt below.\n\n"
    "OUTPUT REQUIREMENTS:\n"
    "- Use 4–5 categories (e.g., Thesis, Evidence, Analysis, Organization, Mechanics).\n"
    "- For each category, define performance levels: A, B, C, D, F.\n"
    "- Keep descriptions concise and specific.\n"
    "- Format in Markdown with headings and bullet points."
)

In [33]:
topic = "How does Iago exploit Othello’s insecurities to drive the tragedy in *Othello*?"

othello_rubric = generate_essay_rubric(
    system_role=rubric_agent,
    prompt=prompt,
    topic=topic,
    grade_level="11th grade",
    model="gpt-4o-mini",
    temperature=0.1,
)

In [34]:
to_markdown(othello_rubric)

# Essay Grading Rubric: Iago's Exploitation of Othello's Insecurities

## Categories

### 1. Thesis
- **A**: Clear, insightful thesis that directly addresses the prompt and presents a unique perspective.
- **B**: Clear thesis that addresses the prompt but lacks depth or originality.
- **C**: Thesis is present but vague or only partially addresses the prompt.
- **D**: Weak thesis that does not clearly address the prompt.
- **F**: No thesis present.

### 2. Evidence
- **A**: Uses multiple, relevant textual examples that effectively support the thesis.
- **B**: Uses relevant textual examples, but may lack variety or depth.
- **C**: Uses some textual examples, but they are not always relevant or well-integrated.
- **D**: Few examples provided, and they are mostly irrelevant or poorly integrated.
- **F**: No textual evidence provided.

### 3. Analysis
- **A**: Insightful analysis that connects evidence to the thesis and explores implications deeply.
- **B**: Clear analysis that connects evidence to the thesis but lacks depth in exploration.
- **C**: Basic analysis that connects some evidence to the thesis but is often superficial.
- **D**: Minimal analysis; connections between evidence and thesis are unclear or weak.
- **F**: No analysis provided.

### 4. Organization
- **A**: Well-structured essay with clear, logical progression of ideas and effective transitions.
- **B**: Generally organized with a logical flow, but may have minor issues with transitions.
- **C**: Some organization present, but ideas may be jumbled or lack clear transitions.
- **D**: Poorly organized; ideas are difficult to follow and lack logical progression.
- **F**: No discernible organization.

### 5. Mechanics
- **A**: Virtually no errors in grammar, punctuation, or spelling; writing is polished.
- **B**: Few minor errors that do not interfere with understanding.
- **C**: Some errors in grammar, punctuation, or spelling that occasionally hinder clarity.
- **D**: Frequent errors that significantly interfere with understanding.
- **F**: Numerous errors that make the essay difficult to read.

---

### Total Score Interpretation
- **A (90-100)**: Exceptional understanding and execution of the prompt.
- **B (80-89)**: Good understanding with minor issues.
- **C (70-79)**: Satisfactory understanding but lacks depth.
- **D (60-69)**: Poor understanding; significant issues present.
- **F (below 60)**: Unsatisfactory; fails to meet basic requirements.
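
Since the rubric feeds the grading step, it can help to confirm that it has the requested shape before using it. The sketch below assumes the heading and bullet format shown in the output above (`### 1. Thesis`, `- **A**: ...`); another run may format the rubric differently. `validate_rubric` is a hypothetical helper.

```python
import re

EXPECTED_LEVELS = ["A", "B", "C", "D", "F"]


def validate_rubric(rubric_md: str) -> dict:
    """Map each numbered category to True if it lists levels A, B, C, D, F in order."""
    found = {}
    current = None
    for line in rubric_md.splitlines():
        heading = re.match(r"^###\s+\d+\.\s+(.+?)[ \t]*$", line)
        if heading:
            current = heading.group(1)
            found[current] = []
            continue
        if line.startswith("#"):
            current = None  # any other heading ends the current category
            continue
        level = re.match(r"^-\s+\*\*([A-F])\*\*:", line)
        if level and current is not None:
            found[current].append(level.group(1))
    return {name: levels == EXPECTED_LEVELS for name, levels in found.items()}
```

For example, `report = validate_rubric(othello_rubric)` followed by `4 <= len(report) <= 5 and all(report.values())` checks both the category count and the level bands requested in the prompt.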

## Essay Grading

Now, let's get to the heart of this task: using the LLM to grade the essay. In addition to the usual arguments, this function takes the essay text and the rubric.

In [35]:
def grade_essay_with_rubric_md(
    system_role: str,
    prompt: str,          # tell it HOW to grade + the desired Markdown format
    rubric_md: str,       # the rubric text (Markdown or plain text)
    essay_text: str,      # the student essay
    grade_level: str,     # e.g., "11th grade"
    model: str = "gpt-4o-mini",
    temperature: float = 0.2, # low temperature for consistency
) -> str:
    """
    Grade an essay using the provided rubric and return a concise Markdown report.
    The model should mirror whatever categories exist in rubric_md.
    """
    user_content = (
        f"{prompt}\n\n"
        f"RUBRIC (Markdown allowed):\n{rubric_md}\n\n"
        f"ESSAY TO GRADE:\n{essay_text}\n\n"
        f"TARGET GRADE LEVEL: {grade_level}\n"
    )

    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_content},
        ],
        temperature=temperature,
    )
    return resp.output_text


### Second Teacher Agent and Prompt

Let's define the grading agent! The **Grading Agent** is a high school English teacher whose job is to evaluate an essay **using the provided rubric**. This agent prioritizes fairness, consistency, and clarity. Feedback is professional and concise.

The grading prompt instructs the model to return a **Markdown report** with two sections:

- **Overall Grade**: includes a performance level (A–F), a numeric score (0–100), and a brief 2–3 sentence summary.
- **Category Breakdown**: follows the rubric’s categories **exactly and in order**, giving each a level (A–F), a numeric score (0–100), and a 1–2 sentence justification tied to the rubric criteria.

Rules enforced by the prompt:
- Mirror the rubric categories verbatim and keep the same order.
- Base judgments on evidence from the essay and the rubric (no invented criteria).
- Keep explanations brief, specific, and directly connected to the rubric.
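
The first rule can be verified after the fact rather than taken on trust. The sketch below compares category headings in the rubric and in the grading report, assuming both use numbered `###` headings such as `### 1. Thesis`. Both function names are hypothetical.

```python
import re


def category_names(markdown_text: str) -> list:
    """Return names from headings of the form '### 1. Thesis', in document order."""
    return re.findall(r"^###\s+\d+\.\s+(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE)


def mirrors_rubric(rubric_md: str, report_md: str) -> bool:
    """True when the report lists the same categories as the rubric, in the same order."""
    return category_names(rubric_md) == category_names(report_md)
```

Once a report exists, `mirrors_rubric(othello_rubric, report)` returns `False` if the grader renamed, reordered, dropped, or invented a category.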

In [36]:
grading_agent = (
    "You are a high school English teacher and careful grader. "
    "Use the provided rubric to evaluate the essay fairly and consistently. "
    "Your feedback should be clear, concise, and professional."
)

prompt = (
    "Evaluate the essay using the provided rubric.\n\n"
    "OUTPUT FORMAT (Markdown):\n"
    "## Overall Grade\n"
    "- **Level:** A|B|C|D|F\n"
    "- **Score:** 0–100\n"
    "- **Summary (2–3 sentences):** …\n\n"
    "## Category Breakdown\n"
    "For each category from the rubric (in the same order), include:\n"
    "- **Level:** A|B|C|D|F\n"
    "- **Score:** 0–100\n"
    "- **Justification (1–2 sentences):** specific, tied to the rubric\n\n"
    "Rules:\n"
    "- Mirror the rubric categories exactly and in order.\n"
    "- Be concise and evidence-based.\n"
    "- Do not invent categories that are not in the rubric."
)

#### Grading A Level Essay

In [37]:
A_essay_eval = grade_essay_with_rubric_md(
    system_role=grading_agent,
    prompt=prompt,
    rubric_md=othello_rubric,
    essay_text=A_level_essay,
    grade_level="11th grade",
)
to_markdown(A_essay_eval)

## Overall Grade
- **Level:** A
- **Score:** 95
- **Summary:** This essay presents a clear and insightful thesis that effectively addresses the prompt regarding Iago's exploitation of Othello's insecurities. The analysis is deep and well-supported by relevant textual evidence, demonstrating a strong understanding of the play's themes.

## Category Breakdown

### 1. Thesis
- **Level:** A
- **Score:** 95
- **Justification:** The thesis is clear and insightful, directly addressing the prompt while presenting a unique perspective on Iago's manipulation of Othello's insecurities.

### 2. Evidence
- **Level:** A
- **Score:** 90
- **Justification:** The essay uses multiple relevant textual examples that effectively support the thesis, although there could be slightly more variety in the examples used.

### 3. Analysis
- **Level:** A
- **Score:** 95
- **Justification:** The analysis is insightful and connects the evidence to the thesis, exploring the implications of Iago's manipulation in depth.

### 4. Organization
- **Level:** A
- **Score:** 95
- **Justification:** The essay is well-structured with a clear progression of ideas and effective transitions, making it easy to follow.

### 5. Mechanics
- **Level:** A
- **Score:** 95
- **Justification:** The writing is polished with virtually no errors in grammar, punctuation, or spelling, contributing to overall clarity.

-----

#### Analysis

The grading output shows that the LLM effectively followed the rubric and grading prompt, producing a clear and well-structured evaluation. It begins with an overall grade that includes the level, score, and summary, followed by a detailed category breakdown that mirrors the rubric. Each section—Thesis, Evidence, Analysis, Organization, and Mechanics—uses consistent formatting and concise justifications tied directly to the rubric.

The model assigns A-level scores in every category, with a slightly lower 90 for Evidence. Its feedback is professional and mostly positive; the one suggestion is that the examples could show more variety. The model graded the essay as an **A**, matching the essay’s intended quality level. This is consistent with the LLM having followed both the essay-generation and grading instructions, although one graded essay is a small sample for judging reliability.
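
To compare reports numerically, the levels and scores can be pulled out of the Markdown. The sketch below assumes the report follows the requested format exactly, with `- **Level:**` and `- **Score:**` bullets and plain integer scores. `parse_report` and `category_mean` are hypothetical helpers.

```python
import re
from statistics import mean


def parse_report(report_md: str) -> dict:
    """Extract level and score for each section of a grading report."""
    sections = {}
    current = None
    for line in report_md.splitlines():
        heading = re.match(r"^#{2,3}\s+(?:\d+\.\s+)?(.+?)[ \t]*$", line)
        if heading:
            current = heading.group(1)
            sections[current] = {}
            continue
        field = re.match(r"^-\s+\*\*(Level|Score):\*\*\s+(\S+)", line)
        if field and current is not None:
            key, value = field.groups()
            sections[current][key.lower()] = int(value) if key == "Score" else value
    # Drop headings that carried no level or score (e.g. "Category Breakdown")
    return {name: data for name, data in sections.items() if data}


def category_mean(parsed: dict) -> float:
    """Average the category scores, leaving out the overall grade."""
    scores = [d["score"] for name, d in parsed.items()
              if name != "Overall Grade" and "score" in d]
    return mean(scores)
```

For the A-level report above, the five category scores (95, 90, 95, 95, 95) average 94, close to the overall score of 95. The prompt does not tell the model how to combine category scores, so the two numbers need not match.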


#### Grading C Level Essay

In [38]:
C_essay_eval = grade_essay_with_rubric_md(
    system_role=grading_agent,
    prompt=prompt,
    rubric_md=othello_rubric,
    essay_text=C_level_essay,
    grade_level="11th grade",
)
to_markdown(C_essay_eval)

## Overall Grade
- **Level:** C
- **Score:** 75
- **Summary:** The essay presents a satisfactory understanding of Iago's exploitation of Othello's insecurities but lacks depth and a clear thesis. While some relevant examples are provided, the analysis is basic and the organization could be improved.

## Category Breakdown

### 1. Thesis
- **Level:** C
- **Score:** 70
- **Justification:** The thesis is present but vague, primarily stating that Iago exploits Othello's insecurities without offering a unique perspective or deeper insight into the implications.

### 2. Evidence
- **Level:** C
- **Score:** 70
- **Justification:** The essay uses some relevant textual examples, such as Iago's manipulation of Othello's jealousy, but lacks variety and depth in the evidence presented.

### 3. Analysis
- **Level:** C
- **Score:** 65
- **Justification:** The analysis connects some evidence to the thesis but is often superficial, failing to explore the deeper implications of Iago's manipulation and its effects on Othello.

### 4. Organization
- **Level:** C
- **Score:** 70
- **Justification:** The essay has some organization, but the ideas can feel jumbled, and transitions between points are not always clear, making the flow of the argument less effective.

### 5. Mechanics
- **Level:** B
- **Score:** 80
- **Justification:** There are few minor errors in grammar and punctuation, but they do not significantly hinder understanding, indicating a generally polished writing style.

-----

#### Analysis

Once again, the LLM followed the rubric and grading prompt to produce a structured evaluation. The model graded the essay as a **C**, matching the intended performance level from the prompt. The feedback highlights key weaknesses, such as a vague thesis, limited analytical depth, and loosely connected ideas, while rating Mechanics somewhat higher (B, 80). One inconsistency stands out: Analysis is labeled **C** but scored 65, which falls in the D band (60-69) of the rubric's score interpretation. Overall, the model separated the two essays by quality as intended, but level and score pairs are worth checking before relying on the numbers.
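
A quick way to catch level and score pairs that disagree is to compare each score with the band for its letter. The sketch below reuses the bands from the rubric's "Total Score Interpretation" section. Applying those overall bands to individual categories is an assumption, since the rubric defines them only for the total score. `band_mismatches` is a hypothetical helper.

```python
# Score bands copied from the rubric's "Total Score Interpretation" section
BANDS = {"A": (90, 100), "B": (80, 89), "C": (70, 79), "D": (60, 69), "F": (0, 59)}


def band_mismatches(results: list) -> list:
    """Return the (category, level, score) rows whose score is outside the level's band."""
    mismatches = []
    for category, level, score in results:
        low, high = BANDS[level]
        if not low <= score <= high:
            mismatches.append((category, level, score))
    return mismatches


# Levels and scores transcribed from the C-level report above
c_report = [
    ("Overall Grade", "C", 75),
    ("Thesis", "C", 70),
    ("Evidence", "C", 70),
    ("Analysis", "C", 65),
    ("Organization", "C", 70),
    ("Mechanics", "B", 80),
]
print(band_mismatches(c_report))  # prints [('Analysis', 'C', 65)]
```

One possible fix is to state the score bands in the grading prompt and ask the model to keep each score inside the band for its level.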


## Moving Forward

You have now seen how we can use LLMs to create agents that write, evaluate, and grade essays through **role-based collaboration**. Each agent (the Student, Rubric, and Grading Agents) performs a distinct function, and together they form a connected workflow powered by **prompt chaining**. This process allows outputs from one model call to serve as inputs for another, mirroring how humans collaborate in structured tasks like writing and assessment.

Now, the rest of the notebook is up to you — it’s your turn to experiment! Try extending what you’ve learned in creative ways. For example, you can:

- Generate **D- or F-level** essays to simulate weaker student writing  
- **Adjust the existing prompts** to modify tone, structure, or grading criteria  
- **Create additional agents** such as a Peer Reviewer, Rubric Validator, or Format Checker  
- Modify the **grading rubric** to target different skills or subject areas  

Exploring these variations will help you deepen your understanding of how **prompt design**, **role specialization**, and **collaboration** shape the quality, reasoning, and reliability of LLM-generated outputs.
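
As a starting point for these experiments, the chain used in this notebook can be wrapped in one loop. The sketch below takes the three agents as plain callables so it can be tried with stand-ins before spending API calls. `run_grading_chain` and its parameter names are hypothetical.

```python
from typing import Callable, Dict


def run_grading_chain(
    level_prompts: Dict[str, str],
    write_essay: Callable[[str], str],
    make_rubric: Callable[[], str],
    grade: Callable[[str, str], str],
) -> Dict[str, Dict[str, str]]:
    """Chain the agents: one rubric, then an essay and a grading report per level."""
    rubric = make_rubric()  # Rubric Agent runs once so every essay uses the same rubric
    results = {}
    for level, level_prompt in level_prompts.items():
        essay = write_essay(level_prompt)  # Student Agent output ...
        report = grade(rubric, essay)      # ... becomes Grading Agent input
        results[level] = {"essay": essay, "report": report}
    return results


# Possible wiring with the functions defined in this notebook (makes API calls):
# results = run_grading_chain(
#     {"A": A_level_prompt, "C": C_level_prompt},
#     write_essay=lambda p: generate_essay(student_agent, p, topic, grade_level),
#     make_rubric=lambda: othello_rubric,
#     grade=lambda r, e: grade_essay_with_rubric_md(grading_agent, prompt, r, e, grade_level),
# )
```

Adding a D- or F-level prompt then only requires one more entry in the `level_prompts` dictionary.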
