# 03 - LLM Judges

This notebook demonstrates using an LLM as a judge to score how well a summary matches an original article.

**Prerequisites:** Before running, set the `OPENAI_API_KEY` environment variable or put it in a `.env` file that `load_dotenv()` can find.

In [None]:
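import os

from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel, Field

# Load variables from a local .env file (if one exists) into the environment.
load_dotenv()

# The client authenticates with the key read from the environment.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))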
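
`os.getenv` returns `None` when the variable is missing, so a forgotten key only surfaces later as an authentication error. A minimal sketch of failing early instead; `require_env` is a hypothetical helper, not part of the notebook or of any library:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # Hypothetical error message; adjust to your own setup instructions.
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file.")
    return value
```

Assuming this helper is defined after `load_dotenv()`, the client could then be built with `OpenAI(api_key=require_env("OPENAI_API_KEY"))`.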

## Data: FT Article

The source article against which the candidate summary will be judged.

In [None]:
ft_article = """
https://www.ft.com/content/1cbb0d4a-1491-4403-90b5-b660b64a95f6

Deutsche Bank reports record quarterly profits
Currency and bond trading help drive earnings at German lender

Florian M\u00fcller in Frankfurt

Published JAN 29 2026

Updated JAN 29 2026, 12:11

Deutsche Bank reported record quarterly profits on Thursday as prosecutors continued searching its offices as part of a money-laundering investigation linked to Russian billionaire Roman Abramovich.

Pre-tax profits at Germany\u2019s biggest lender surged to \u20ac2bn in the quarter, a more than threefold increase from a year ago, powered by rising revenues at its fixed-income and currencies business.

The quarter capped a record year for Deutsche and underlined the progress chief executive Christian Sewing has made in overhauling the bank. For the whole of 2025, the bank\u2019s pre-tax profits rose more than 80 per cent to \u20ac9.7bn.

After years weighed down by litigation over the mis-selling of mortgage-backed securities and the troubled acquisition of retail lender Postbank, Deutsche has returned to profitability under Sewing, sharply reducing legal provisions last year.

Some of the gloss was taken off the results after German prosecutors on Wednesday raided the bank\u2019s Frankfurt headquarters as part of a money-laundering investigation linked to Abramovich. Prosecutors said the searches were continuing on Thursday, with the data transfer still under way.

The probe by prosecutors is connected to the bank\u2019s examinations of transactions between 2013 and 2018 and whether it was too slow to file a suspicious activity report, Sewing said. It is looking into business ties to companies linked to Abramovich, according to a person familiar with the matter.

Abramovich, who was placed under EU sanctions in 2022, has denied any wrongdoing. Deutsche said it was co-operating fully with authorities.

Shares in Deutsche, which have surged more than 60 per cent over the past 12 months, dipped 1.8 per cent in morning trading in Frankfurt.


Sewing said its quarterly results provided \u201cthe strongest possible foundation\u201d for the next phase of Deutsche\u2019s strategy, which targets a return on tangible equity of more than 13 per cent and a cost-income ratio below 60 per cent by 2028.

Revenues at the bank\u2019s fixed-income and currencies business were up 6 per cent, while those at its advisory business fell 9 per cent in the period.

The bank also met strategic targets for the three-year period to 2025, posting a return on tangible equity of 10.3 per cent for 2025.

Deutsche said it had secured regulatory approval for a further \u20ac1bn share buyback and proposed a dividend of \u20ac1 per share, taking total shareholder distributions to 50 per cent of last year\u2019s profit.

Chief financial officer James von Moltke said that while revenues in fixed income and currencies were expected to be flat year on year in the current quarter, management was encouraged by a strong start to January. He is due to hand over to Raja Akram in March.

Separately, Deutsche Bank\u2019s asset manager DWS said late on Wednesday that it had raised its midterm profit target and would pay a special dividend after keeping a tight rein on costs and refraining from large acquisitions.
"""

## Judge Model & Prompt

Define the structured output model and system prompt for the LLM judge.

In [None]:
class JudgeAnswer(BaseModel):
    score: int = Field(description="Integer 1-5, 1 being the worst score and 5 being the best score")
    score_description: str = Field(description="One sentence paraphrasing why the chosen score fits.")
    feedback: str = Field(description="2-4 sentences giving concrete, actionable feedback.")


SYSTEM_PROMPT = """
You are a strict grading assistant.

You will be given:
- An article.
- A summary of that article, which you must grade.

You must:
- Compare the summary with the article.
- Assign a score from 1 to 5 according to the scale below.
- Provide short, specific feedback.

SCORING SCALE:
1 = Very poor: mostly incorrect, off-topic, or nonsensical.
2 = Poor: major misunderstandings, significant errors or omissions.
3 = Fair: partially correct, but missing important details or somewhat unclear.
4 = Good: mostly correct, minor issues or missing small details.
5 = Excellent: fully correct, complete, clear, and faithful to the article.

Return:
  "score": <integer 1-5>,
  "score_description": "<one sentence paraphrasing why the chosen score fits>",
  "feedback": "<2-4 sentences giving concrete, actionable feedback>"
"""
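
Because `score` is typed as a plain `int`, a value outside 1-5 would still pass type validation; the range is stated only in the field description and the prompt. A minimal sketch of an explicit range check, using only the standard library; `ParsedJudgeAnswer` and `check_score_range` are hypothetical names that mirror the fields of `JudgeAnswer` for illustration:

```python
from dataclasses import dataclass


@dataclass
class ParsedJudgeAnswer:
    """Hypothetical stand-in with the same three fields as JudgeAnswer."""
    score: int
    score_description: str
    feedback: str


def check_score_range(answer, low: int = 1, high: int = 5):
    """Return `answer` unchanged if its score is an int in [low, high], else raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(answer.score, bool) or not isinstance(answer.score, int):
        raise TypeError(f"score must be an int, got {type(answer.score).__name__}")
    if not low <= answer.score <= high:
        raise ValueError(f"score {answer.score} is outside the {low}-{high} scale")
    return answer
```

The check only reads the `score` attribute, so under that assumption it should also work on a parsed `JudgeAnswer` instance.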

## Judge Function

In [None]:
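def judge_answer(article: str, summary: str) -> JudgeAnswer:
    """Ask the LLM judge to score `summary` against `article`."""
    # Build the prompt without indentation so no stray whitespace reaches the model.
    user_prompt = f"Article:\n{article}\n\nSummary:\n{summary}\n"
    completion = client.beta.chat.completions.parse(
        model="gpt-5.1",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        # The SDK parses the reply into a JudgeAnswer instance.
        response_format=JudgeAnswer,
    )
    answer = completion.choices[0].message.parsed
    if answer is None:
        # `parsed` is None when the model refuses instead of answering.
        raise ValueError("The judge returned no parsable answer.")
    return answer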

## Run the Judge

Provide a candidate summary and let the LLM judge evaluate it against the article.

In [None]:
summary = """
Deutsche Bank reported record quarterly pre-tax profit of \u20ac2bn, more than triple a year earlier, 
mainly driven by growth in its fixed-income and currencies trading business. 
For 2025 as a whole, pre-tax profit jumped over 80% to \u20ac9.7bn, 
and the bank met its strategic target with a 10.3% return on tangible equity. 
The strong results were overshadowed somewhat by ongoing raids by German prosecutors at its Frankfurt headquarters, 
tied to a money-laundering investigation involving past transactions linked to Roman Abramovich 
and possible delays in filing a suspicious activity report; Deutsche says it is fully co-operating. 
The bank announced a further \u20ac1bn share buyback and a \u20ac1 per share dividend, 
returning about half of last year\u2019s profit to shareholders, 
while its shares\u2014up more than 60% over 12 months\u2014fell 1.8% on the day.
"""
llm_judge = judge_answer(ft_article, summary)
print(f"Score: {llm_judge.score}")
print(f"Description: {llm_judge.score_description}")
print(f"Feedback: {llm_judge.feedback}")
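
A single call yields a single score, and a judge model may not return the same score on every run. A minimal sketch of repeating the judgement and summarising the scores; `aggregate_scores` and `judge_repeatedly` are hypothetical helpers, and the stub below stands in for real API calls so the example runs offline:

```python
import statistics
from typing import Callable, Sequence


def aggregate_scores(scores: Sequence[int]) -> dict:
    """Summarise repeated judge scores with mean, median, and max-min spread."""
    if not scores:
        raise ValueError("at least one score is required")
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "spread": max(scores) - min(scores),
    }


def judge_repeatedly(judge: Callable[[], int], runs: int = 3) -> dict:
    """Call `judge` `runs` times and aggregate the integer scores it returns."""
    return aggregate_scores([judge() for _ in range(runs)])


# Stub judge returning made-up scores; replace it with a real call when online.
fake_scores = iter([4, 5, 4])
result = judge_repeatedly(lambda: next(fake_scores), runs=3)
print(result)
```

In this notebook, the `judge` argument could be `lambda: judge_answer(ft_article, summary).score`; a large spread would suggest that the prompt or the scale needs tightening.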