# Deploying AI
## Assignment 1: Evaluating Summaries

A key application of LLMs is to summarize documents. In this assignment, we will not only summarize documents, but also evaluate the quality of the summary and return the results using structured outputs.

**Instructions:** please complete the sections below stating any relevant decisions that you have made and showing the code substantiating your solution.

## Select a Document

Please select one out of the following articles:

+ [Managing Oneself, by Peter Drucker](https://www.thecompleteleader.org/sites/default/files/imce/Managing%20Oneself_Drucker_HBR.pdf) (PDF)
+ [The GenAI Divide: State of AI in Business 2025](https://www.artificialintelligence-news.com/wp-content/uploads/2025/08/ai_report_2025.pdf) (PDF)
+ [What is Noise?, by Alex Ross](https://www.newyorker.com/magazine/2024/04/22/what-is-noise) (Web)

# Load Secrets

In [1]:
%load_ext dotenv
%dotenv ../05_src/.secrets

## Load Document

Depending on your choice, use the appropriate set of functions below. Make sure that you understand the extracted content and whether you need to perform any additional operations (such as joining page content).

### PDF

You can load a PDF by following the instructions in [LangChain's documentation](https://docs.langchain.com/oss/python/langchain/knowledge-base#loading-documents). Notice that the output of the loading procedure is a collection of pages. You can join the pages by using the code below.

```python
document_text = ""
for page in docs:
    document_text += page.page_content + "\n"
```
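An equivalent and slightly more idiomatic way to join the pages is `str.join`, which avoids rebuilding the string on every iteration. The sketch below uses `SimpleNamespace` objects as hypothetical stand-ins for LangChain `Document` pages; only the `page_content` attribute is assumed.

```python
from types import SimpleNamespace


def join_pages(docs):
    """Concatenate the text of each loaded page, separating pages with a newline."""
    # Each LangChain Document exposes its text through `page_content`
    return "\n".join(page.page_content for page in docs)


# Hypothetical stand-ins for the pages returned by PyPDFLoader.load()
pages = [SimpleNamespace(page_content="Page one."), SimpleNamespace(page_content="Page two.")]
document_text = join_pages(pages)
print(document_text)
```

Unlike the loop, this version does not leave a trailing newline after the last page, which makes no difference once the text is cleaned.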

### Web

LangChain also provides a set of web loaders, including the [WebBaseLoader](https://docs.langchain.com/oss/python/integrations/document_loaders/web_base). You can use this function to load web pages.

In [2]:
#user_url = input('please enter your url')
user_url_1 = 'https://www.thecompleteleader.org/sites/default/files/imce/Managing%20Oneself_Drucker_HBR.pdf'
user_url_2 = 'https://www.newyorker.com/magazine/2024/04/22/what-is-noise'
#user_url = 'meaningless.com'

In [3]:
import re

import requests
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader


def load_document(user_url):
    '''Loads the content of the URL with the appropriate LangChain loader and returns it as text.'''
    try:
        # Check the content type; stream=True defers downloading the body
        response = requests.get(user_url, stream=True, timeout=10)
        content_type = response.headers.get('Content-Type', '').lower()
        response.close()

        # If the page is a PDF, load it page by page
        if 'pdf' in content_type:
            docs = PyPDFLoader(user_url).load()
        # If the page is HTML, use the web loader
        elif 'html' in content_type:
            docs = WebBaseLoader(user_url).load()
        # If neither, report the content type instead of loading
        else:
            return f"Other ({content_type})"

        # Join the text of all loaded documents (pages) into a single string
        return "\n".join(doc.page_content for doc in docs)

    except requests.exceptions.RequestException as e:
        return f"Encountered an error: {e}"


def clean_text(text):
    """Collapse newlines, carriage returns, tabs, and repeated spaces into single spaces."""
    # \s matches all of these whitespace characters, so one substitution is enough
    return re.sub(r'\s+', ' ', text).strip()


USER_AGENT environment variable not set, consider setting it to identify your requests.


In [4]:
content = load_document(user_url_1)
clean_content = clean_text(content)

clean_content

'www.hbr.org B EST OF HBR 1999 Managing Oneself by Peter F . Drucker • Included with this full-text Harvard Business Review article: The Idea in Brief—the core idea The Idea in Practice—putting the idea to work 1 Article Summary 2 Managing Oneself A list of related materials, with annotations to guide further exploration of the article’s ideas and applications 12 Further Reading Success in the knowledge economy comes to those who know themselves—their strengths, their values, and how they best perform. Reprint R0501KThis document is authorized for use only by Sharon Brooks (SHARON@PRICE-ASSOCIATES.COM). Copying or posting is an infringement of copyright. Please contact customerservice@harvardbusiness.org or 800-988-0886 for additional copies. B EST OF HBR 1999 Managing Oneself page 1 The Idea in Brief The Idea in Practice COPYRIGHT © 2004 HARVARD BUSINESS SCHOOL PUBLISHING CORPORATION. ALL RIGHTS RESERVED. We live in an age of unprecedented oppor- tunity: If you’ve got ambition, drive,

In [5]:
len(clean_content)

50920
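The extracted text is 50,920 characters long. OpenAI's rule of thumb for common English text is roughly four characters per token, which puts the article at about 12,700 tokens, comfortably inside GPT-4o's 128k-token context window. The helper below is only a heuristic sketch; the exact count comes from the `usage` field of the API response or from a tokenizer such as `tiktoken`.

```python
import math


def estimate_tokens(text_length, chars_per_token=4.0):
    """Rough token estimate from a character count (heuristic, not exact)."""
    return math.ceil(text_length / chars_per_token)


# 50,920 characters were extracted from the Drucker PDF above
print(estimate_tokens(50920))  # 12730
```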

## Generation Task

Using the OpenAI SDK, please create a **structured output** with the following specifications:

+ Use a model that is NOT in the GPT-5 family.
+ Output should be a Pydantic BaseModel object. The fields of the object should be:

    - Author
    - Title
    - Relevance: a statement, no longer than one paragraph, that explains why this article is relevant for an AI professional in their professional development.
    - Summary: a concise and succinct summary no longer than 1000 tokens.
    - Tone: the tone used to produce the summary (see below).
    - InputTokens: number of input tokens (obtain this from the response object).
    - OutputTokens: number of tokens in output (obtain this from the response object).
       
+ The summary should be written using a specific and distinguishable tone, for example, "Victorian English", "African-American Vernacular English", "Formal Academic Writing", "Bureaucratese" ([the obscure language of bureaucrats](https://tumblr.austinkleon.com/post/4836251885)), "Legalese" (legal language), or any other distinguishable style of your preference. Make sure that the style is something you can identify.
+ In your implementation please make sure to use the following:

    - Instructions and context should be stored separately, and the context should be added dynamically. Do not hard-code your prompt; instead, use formatted strings or an equivalent technique.
    - Use the developer (instructions) prompt and the user prompt.
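One way to keep instructions and context apart is to store each prompt as a template constant and fill in the variable parts only when the request is built. The template names and the `build_messages` helper below are hypothetical; the two returned values are shaped for the `instructions` (developer) and `input` (user) arguments of the Responses API.

```python
# Hypothetical prompt templates: instructions live here, context is injected later
DEV_TEMPLATE = (
    "You are an experienced AI practitioner. Write every answer in a "
    "clearly recognizable {tone} style."
)
USER_TEMPLATE = (
    "Summarize the article below in at most {max_tokens} tokens.\n"
    "<article>\n{article}\n</article>"
)


def build_messages(article, tone, max_tokens=1000):
    """Return (developer instructions, user input) with the context filled in at call time."""
    instructions = DEV_TEMPLATE.format(tone=tone)
    user_content = USER_TEMPLATE.format(max_tokens=max_tokens, article=article)
    return instructions, [{"role": "user", "content": user_content}]


instructions, user_input = build_messages("Success comes to those who know themselves.", "Victorian English")
print(instructions)
print(user_input[0]["content"])
```

Because the article is passed as a value to `str.format`, any braces inside the article text are inserted literally and do not break the template.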


In [6]:
from openai import OpenAI
from pydantic import BaseModel, Field, constr
from typing import Annotated

In [7]:
# Token limits for the summary

SUMMARY_TOKENS_MAX = 1000
SUMMARY_TOKENS_MIN = 990

In [8]:
# Defining prompt generation functions

def construct_user_prompt(article, summary_tokens=SUMMARY_TOKENS_MAX):
    '''Build the user prompt, injecting the article text (context) at call time.'''
    prompt = f"""
        Given the following article, do the following:
        1. Identify the article's author.
        2. Identify the article's title.
        3. Write a statement, no longer than one paragraph, that explains why this
           article is relevant for an AI professional in their professional development.
        4. Summarize the article concisely and succinctly in no more than {summary_tokens} tokens.

        The article is the following:
        <article>
        {article}
        </article>
    """

    return prompt


def construct_dev_prompt(tone='Victorian English'):
    '''Build the developer (instructions) prompt for the requested tone.'''
    prompt = f"""
            You are a professional AI practitioner with a vast amount of experience in the field.
            Your responses must strictly adhere to the {tone} tone and be clearly recognizable as that tone,
            meaning you will only use vocabulary that is aligned with {tone}. Include the tone you used in the response.
    """

    return prompt

In [9]:
user_prompt = construct_user_prompt(clean_content, SUMMARY_TOKENS_MAX)
dev_prompt = construct_dev_prompt('Victorian English')


In [10]:
# Constructing the Pydantic output object
# Length limits are stated in the descriptions: constr() nested inside Annotated was not
# enforced, its limits counted characters rather than tokens, and OpenAI's strict
# structured outputs may reject string length keywords.
class Summary(BaseModel):

    author: Annotated[
        str,
        Field(description="Full name of the author of the article")
    ]

    title: Annotated[
        str,
        Field(description="Title of the article")
    ]

    relevance: Annotated[
        str,
        Field(description="One paragraph explaining why this article is relevant for an AI professional in their professional development.")
    ]

    summary: Annotated[
        str,
        Field(description=f"A concise and succinct summary no longer than {SUMMARY_TOKENS_MAX} tokens.")
    ]

    tone: Annotated[
        str,
        Field(description="The tone used to produce the summary")
    ]

    # Filled in after the call from response.usage, not by the model
    InputTokens: Annotated[
        int | None,
        Field(description="Number of input tokens (obtained from the response object)")
    ] = None

    OutputTokens: Annotated[
        int | None,
        Field(description="Number of output tokens (obtained from the response object)")
    ] = None


In [25]:
# Putting everything together and calling OpenAI

client = OpenAI()

response = client.responses.parse(
    model="gpt-4o",
    instructions=dev_prompt,
    input=[
        {"role": "user", "content": user_prompt}
    ],
    text_format=Summary,
    temperature=1.2
)

article_summary = response.output_parsed


In [24]:
# Post-processing: inject the token usage from the response into the output object
article_summary.InputTokens = response.usage.input_tokens
article_summary.OutputTokens = response.usage.output_tokens

In [None]:
# Putting the output in a dictionary to inspect it
response_dict = article_summary.model_dump()

for key,val in response_dict.items():
    print(f"**{key}**: {val}")

**author**: Peter F. Drucker
**title**: Managing Oneself
**relevance**: The article is profoundly relevant to an AI professional as it underscores the importance of self-awareness and self-management in a rapidly evolving field. It stresses on understanding one's strengths, work style, and values which are crucial for navigating and excelling in the dynamic realm of AI.
**summary**: Peter F. Drucker expounds upon the imperative of self-management in the modern knowledge economy, where individuals must act as their own chief executives. He posits that success relies on a deep comprehension of one's strengths, weaknesses, values, and preferred work styles. Drucker introduces feedback analysis as a tool to discover true strengths, urging individuals to concentrate on these areas for peak performance. He emphasizes that knowledge workers must consistently manage their personal development and contribute meaningfully to their organizations, fostering a sense of purpose through understanding

In [None]:
summary = response_dict['summary']
summary

"Peter F. Drucker expounds upon the imperative of self-management in the modern knowledge economy, where individuals must act as their own chief executives. He posits that success relies on a deep comprehension of one's strengths, weaknesses, values, and preferred work styles. Drucker introduces feedback analysis as a tool to discover true strengths, urging individuals to concentrate on these areas for peak performance. He emphasizes that knowledge workers must consistently manage their personal development and contribute meaningfully to their organizations, fostering a sense of purpose through understanding and aligning with the values of their work environment."

# Evaluate the Summary

Use the DeepEval library to evaluate the **summary** as follows:

+ Summarization Metric:

    - Use the [Summarization metric](https://deepeval.com/docs/metrics-summarization) with a **bespoke** set of assessment questions.
    - Please use, at least, five assessment questions.

+ G-Eval metrics:

    - In addition to the standard summarization metric above, please implement three evaluation metrics: 
    
        - [Coherence or clarity](https://deepeval.com/docs/metrics-llm-evals#coherence)
        - [Tonality](https://deepeval.com/docs/metrics-llm-evals#tonality)
        - [Safety](https://deepeval.com/docs/metrics-llm-evals#safety)

    - For each one of the metrics above, implement five assessment questions.

+ The output should be structured and contain one key-value pair to report the score and another pair to report the explanation:

    - SummarizationScore
    - SummarizationReason
    - CoherenceScore
    - CoherenceReason
    - ...
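After `metric.measure(test_case)` runs, each DeepEval metric exposes its result through the `score` and `reason` attributes, so the required key-value pairs can be collected without parsing the printed summary. The sketch assumes the metrics have already been measured; the `collect_scores` helper and the `SimpleNamespace` stand-ins are hypothetical and let the example run without API calls.

```python
from types import SimpleNamespace


def collect_scores(measured_metrics):
    """Flatten {label: metric} into {"<Label>Score": ..., "<Label>Reason": ...}."""
    report = {}
    for label, metric in measured_metrics.items():
        report[f"{label}Score"] = metric.score
        report[f"{label}Reason"] = metric.reason
    return report


# Stand-ins for measured DeepEval metrics (values are illustrative only)
fake_metrics = {
    "Summarization": SimpleNamespace(score=0.5, reason="Misses key questions."),
    "Coherence": SimpleNamespace(score=0.87, reason="Clear and direct."),
}
report = collect_scores(fake_metrics)
print(report)
```

With real metrics, the dictionary would map labels such as `"Summarization"` and `"Coherence"` to the metric objects after each has been measured on the test case.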

In [None]:
from deepeval import evaluate
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
from deepeval.metrics import SummarizationMetric, GEval


In [26]:
test_case = LLMTestCase(input=clean_content, actual_output=summary)
metric = SummarizationMetric(
    threshold=0.7,
    model="gpt-4o",
    assessment_questions=[
        "Is it up to us to know when to change course?",
        "Is a deep comprehension of one's strengths necessary for success?",
        "Do many people know how they get things done?",
        "Do knowledge workers have to manage their personal development?",
        "Is Peter F. Drucker the name of the author for this article?"
    ]
)


evaluate(test_cases=[test_case], metrics=[metric])




Metrics Summary

  - ❌ Summarization (score: 0.5, threshold: 0.7, strict: False, evaluation model: gpt-4o, reason: The score is 0.50 because the summary contains significant contradictions and extra information not present in the original text. It incorrectly attributes the invention of feedback analysis to Drucker and misrepresents the focus on weaknesses. Additionally, it introduces concepts like fostering a sense of purpose that are not mentioned in the original text. Furthermore, the summary fails to address specific questions that the original text can answer, such as the role of personal decision-making in changing course and the authorship of the article., error: None)

For test case:

  - input: www.hbr.org B EST OF HBR 1999 Managing Oneself by Peter F . Drucker • Included with this full-text Harvard Business Review article: The Idea in Brief—the core idea The Idea in Practice—putting the idea to work 1 Article Summary 2 Managing Oneself A list of related materials, with anno

EvaluationResult(test_results=[TestResult(name='test_case_0', success=False, metrics_data=[MetricData(name='Summarization', threshold=0.7, success=False, score=0.5, reason='The score is 0.50 because the summary contains significant contradictions and extra information not present in the original text. It incorrectly attributes the invention of feedback analysis to Drucker and misrepresents the focus on weaknesses. Additionally, it introduces concepts like fostering a sense of purpose that are not mentioned in the original text. Furthermore, the summary fails to address specific questions that the original text can answer, such as the role of personal decision-making in changing course and the authorship of the article.', strict_mode=False, evaluation_model='gpt-4o', error=None, evaluation_cost=0.07087000000000002, verbose_logs='Truths (limit=None):\n[\n    "The article \'Managing Oneself\' was written by Peter F. Drucker.",\n    "The article was included in the Best of Harvard Business
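According to the DeepEval documentation, the summarization score is the minimum of an alignment score (whether the summary's claims are supported by, rather than contradicting or adding to, the original text) and a coverage score (whether the summary answers the assessment questions the same way the original does). The function below is a simplified sketch of that formula over hypothetical yes/no verdicts, not DeepEval's implementation; it shows why one weak component caps the final score.

```python
def summarization_score(alignment_verdicts, coverage_verdicts):
    """min(alignment, coverage), each computed as the fraction of 'yes' verdicts."""

    def fraction_yes(verdicts):
        # Empty verdict lists are treated as 0 in this sketch
        return sum(v == "yes" for v in verdicts) / len(verdicts) if verdicts else 0.0

    alignment = fraction_yes(alignment_verdicts)  # summary claims supported by the source
    coverage = fraction_yes(coverage_verdicts)    # assessment questions answered the same way
    return min(alignment, coverage)


# Hypothetical verdicts: 4 of 5 claims supported, 3 of 5 questions covered
print(summarization_score(["yes"] * 4 + ["no"], ["yes"] * 3 + ["no"] * 2))  # 0.6
```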

# G-Eval: Coherence

In [29]:

clarity_metric = GEval(
    name="Clarity",
    evaluation_steps=[
        "Evaluate whether the response uses clear and direct language.",
        "Check if the explanation avoids jargon or explains it when used.",
        "Assess whether complex ideas are presented in a way that's easy to follow.",
        "Identify any vague or confusing parts that reduce understanding."
    ],
    evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
    model="gpt-4o-mini",
)

test_case = LLMTestCase(input=clean_content, actual_output=summary)
evaluate(test_cases=[test_case], metrics=[clarity_metric])




Metrics Summary

  - ✅ Clarity [GEval] (score: 0.8679178705669168, threshold: 0.5, strict: False, evaluation model: gpt-4o-mini, reason: The response uses clear and direct language, effectively summarizing Drucker's key points about self-management and feedback analysis. It avoids jargon and presents complex ideas in an accessible manner. However, it could benefit from slightly more detail on how to implement the feedback analysis method, which may enhance understanding for readers unfamiliar with the concept., error: None)

For test case:

  - input: www.hbr.org B EST OF HBR 1999 Managing Oneself by Peter F . Drucker • Included with this full-text Harvard Business Review article: The Idea in Brief—the core idea The Idea in Practice—putting the idea to work 1 Article Summary 2 Managing Oneself A list of related materials, with annotations to guide further exploration of the article’s ideas and applications 12 Further Reading Success in the knowledge economy comes to those who know th

EvaluationResult(test_results=[TestResult(name='test_case_0', success=True, metrics_data=[MetricData(name='Clarity [GEval]', threshold=0.5, success=True, score=0.8679178705669168, reason="The response uses clear and direct language, effectively summarizing Drucker's key points about self-management and feedback analysis. It avoids jargon and presents complex ideas in an accessible manner. However, it could benefit from slightly more detail on how to implement the feedback analysis method, which may enhance understanding for readers unfamiliar with the concept.", strict_mode=False, evaluation_model='gpt-4o-mini', error=None, evaluation_cost=0.0017788499999999998, verbose_logs='Criteria:\nNone \n \nEvaluation Steps:\n[\n    "Evaluate whether the response uses clear and direct language.",\n    "Check if the explanation avoids jargon or explains it when used.",\n    "Assess whether complex ideas are presented in a way that\'s easy to follow.",\n    "Identify any vague or confusing parts th
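The assignment also asks for Tonality and Safety metrics with five assessment steps each (the clarity metric above lists four). A sketch, assuming the same `GEval` constructor arguments used in the clarity cell; the step wording is illustrative, the tonality steps assume the requested tone was Victorian English, and `build_geval_metrics` is a hypothetical helper.

```python
# Five evaluation steps per G-Eval metric (wording is illustrative)
TONALITY_STEPS = [
    "Identify the tone the summary is meant to use (Victorian English).",
    "Check whether vocabulary and sentence structure are consistent with that tone.",
    "Flag any passages that slip into modern or neutral phrasing.",
    "Assess whether the tone is maintained from the first sentence to the last.",
    "Judge whether a reader could recognize the tone without being told.",
]
SAFETY_STEPS = [
    "Check whether the summary contains harmful, hateful, or offensive content.",
    "Verify that it does not reveal personal or sensitive information about individuals.",
    "Identify any advice that could cause harm if followed.",
    "Check that the summary does not encourage illegal or unethical behavior.",
    "Assess whether the summary avoids biased or discriminatory generalizations.",
]


def build_geval_metrics(model="gpt-4o-mini"):
    """Build Tonality and Safety G-Eval metrics (deepeval and an API key are needed to run them)."""
    from deepeval.metrics import GEval
    from deepeval.test_case import LLMTestCaseParams

    return [
        GEval(name="Tonality", evaluation_steps=TONALITY_STEPS,
              evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT], model=model),
        GEval(name="Safety", evaluation_steps=SAFETY_STEPS,
              evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT], model=model),
    ]
```

The metrics could then be passed alongside the others, for example `evaluate(test_cases=[test_case], metrics=build_geval_metrics())`.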

# Enhancement

Of course, evaluation is important, but we want our system to self-correct.  

+ Use the context, summary, and evaluation that you produced in the steps above to create a new prompt that enhances the summary.
+ Evaluate the new summary using the same function.
+ Report your results. Did you get a better output? Why? Do you think these controls are enough?

You can approach this in two ways: ask the model to critique its own summary (self-critique), or ask the model to reason about how to improve the response (see slide 60 in the prompt engineering module). Either approach can be run automatically or manually.


Please, do not forget to add your comments.
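One manual way to close the loop is to feed the evaluator's reasons back into a revision prompt and then re-run the same metrics on the new summary. The `build_revision_prompt` and `compare_scores` helpers below are hypothetical sketches, not part of DeepEval or the OpenAI SDK.

```python
REVISION_TEMPLATE = (
    "Here is an article, a summary of it, and an evaluator's critique.\n"
    "<article>\n{article}\n</article>\n"
    "<summary>\n{summary}\n</summary>\n"
    "<critique>\n{critique}\n</critique>\n"
    "Rewrite the summary so that it addresses every point in the critique, "
    "keeps the same tone, and adds nothing that the article does not state."
)


def build_revision_prompt(article, summary, evaluation_reasons):
    """Combine context, previous summary, and evaluator feedback into a new user prompt."""
    critique = "\n".join(f"- {name}: {reason}" for name, reason in evaluation_reasons.items())
    return REVISION_TEMPLATE.format(article=article, summary=summary, critique=critique)


def compare_scores(before, after):
    """Return the score change per metric (positive means the revision improved)."""
    return {name: round(after[name] - before[name], 4) for name in before if name in after}


prompt = build_revision_prompt("Full article text", "Old summary", {"Summarization": "Omits authorship."})
print(prompt)
print(compare_scores({"Summarization": 0.5}, {"Summarization": 0.75}))  # {'Summarization': 0.25}
```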


# Submission Information

🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly.

## Submission Parameters

- The Submission Due Date is indicated in the [readme](../README.md#schedule) file.
- The branch name for your repo should be: assignment-1
- What to submit for this assignment:
    + This Jupyter Notebook (assignment_1.ipynb) should be populated and should be the only change in your pull request.
- What the pull request link should look like for this assignment: `https://github.com/<your_github_username>/production/pull/<pr_id>`
    + Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support staff review your submission easily.

## Checklist

+ Created a branch with the correct naming convention.
+ Ensured that the repository is public.
+ Reviewed the PR description guidelines and adhered to them.
+ Verified that the link is accessible in a private browser window.

If you encounter any difficulties or have questions, please don't hesitate to reach out to our team via our Slack. Our Technical Facilitators and Learning Support staff are here to help you navigate any challenges.
