# Deploying AI
## Assignment 1: Evaluating Summaries

A key application of LLMs is to summarize documents. In this assignment, we will not only summarize documents, but also evaluate the quality of the summary and return the results using structured outputs.

**Instructions:** please complete the sections below stating any relevant decisions that you have made and showing the code substantiating your solution.

## Select a Document

Please select one out of the following articles:

+ [Managing Oneself, by Peter Drucker](https://www.thecompleteleader.org/sites/default/files/imce/Managing%20Oneself_Drucker_HBR.pdf) (PDF)
+ [The GenAI Divide: State of AI in Business 2025](https://www.artificialintelligence-news.com/wp-content/uploads/2025/08/ai_report_2025.pdf) (PDF)
+ [What is Noise?, by Alex Ross](https://www.newyorker.com/magazine/2024/04/22/what-is-noise) (Web)

# Load Secrets

In [119]:
%load_ext dotenv
%dotenv ../05_src/.secrets

The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv


## Load Document

Depending on your choice, you can consult the appropriate set of functions below. Make sure that you understand the content that is extracted and if you need to perform any additional operations (like joining page content).

### PDF

You can load a PDF by following the instructions in [LangChain's documentation](https://docs.langchain.com/oss/python/langchain/knowledge-base#loading-documents). Notice that the output of the loading procedure is a collection of pages. You can join the pages by using the code below.

```python
document_text = ""
for page in docs:
    document_text += page.page_content + "\n"
```
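
As a variation on the loop above, the pages can also be joined with `str.join`, which avoids repeated string concatenation. This is a minimal sketch: `Page` and `join_pages` are illustrative names (not part of LangChain), and `Page` only stands in for a loaded document that exposes `page_content`.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a loaded page; only `page_content` is used."""
    page_content: str


def join_pages(pages) -> str:
    """Join page texts with newlines, skipping pages with no extracted text."""
    return "\n".join(p.page_content for p in pages if p.page_content.strip())


docs = [Page("First page."), Page("   "), Page("Second page.")]
document_text = join_pages(docs)
print(document_text)
```

Skipping whitespace-only pages is a design choice for this sketch; scanned PDFs sometimes yield pages without extractable text.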

### Web

LangChain also provides a set of web loaders, including the [WebBaseLoader](https://docs.langchain.com/oss/python/integrations/document_loaders/web_base). You can use this function to load web pages.

In [120]:
# Should work with any PDF. The HTML article did not load because it is behind a paywall, but the text cleaning should work on scraped HTML as well.

#user_url = input('please enter your url')
user_url_1 = 'https://www.thecompleteleader.org/sites/default/files/imce/Managing%20Oneself_Drucker_HBR.pdf'
#user_url_2 = 'https://www.newyorker.com/magazine/2024/04/22/what-is-noise'
#user_url = 'meaningless.com'

In [None]:
import requests
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders import WebBaseLoader
import re

# Load the document text with the loader that matches the content type
def load_document(user_url):
    '''Loads the content of the URL with the appropriate LangChain loader and returns it as text.'''
    try:
        # Check the content type reported by the server (only the headers are needed)
        response = requests.get(user_url, stream=True, timeout=10)
        content_type = response.headers.get('Content-Type', '').lower()
        response.close()

        # If the page is a PDF
        if 'pdf' in content_type:
            loader = PyPDFLoader(user_url)
        # If the page is HTML
        elif 'html' in content_type:
            loader = WebBaseLoader(user_url)
        # If neither
        else:
            return f"Other ({content_type})"

        # Both loaders return a list of documents; join their text
        docs = loader.load()
        return "\n".join(doc.page_content for doc in docs)

    except requests.exceptions.RequestException as e:
        return f"Encountered an error: {e}"

def clean_text(text):
    """
    Cleans text to get better results. It can be improved by adding more rules,
    but it is sufficient for the current task.
    """
    # Replace newlines and tabs with a space
    text = text.replace('\n', ' ').replace('\r', ' ').replace('\t', ' ')

    # Remove the copyright sign
    text = text.replace("©", "")

    # Remove "page1", "page 2", etc.
    text = re.sub(r'\bpage\s*\d+\b', '', text, flags=re.IGNORECASE)

    # Collapse runs of whitespace
    text = re.sub(r'\s+', ' ', text)

    # Remove URLs (http, https, and www)
    text = re.sub(r'http\S+|www\.\S+', '', text)

    # Remove email addresses
    text = re.sub(r'\S+@\S+', '', text)

    # Remove long sequences of digits or codes (e.g., image IDs, references)
    text = re.sub(r'\b\d{4,}\b', '', text)

    # Remove non-printable and control characters
    text = re.sub(r'[\x00-\x1f\x7f-\x9f]', ' ', text)

    # Remove special symbols and decorative artifacts from graphics
    text = re.sub(r'[■□▪▫▲▼●○–—•]', ' ', text)

    # Replace multiple hyphens or underscores (PDF split lines)
    text = re.sub(r'[_\-]{2,}', ' ', text)

    # Rejoin words hyphenated across line breaks (common in PDFs)
    text = re.sub(r'(\w)-\s+(\w)', r'\1\2', text)

    # Normalize remaining whitespace and strip leading/trailing spaces
    text = re.sub(r'\s{2,}', ' ', text).strip()

    return text



In [122]:
content = load_document(user_url_1)
clean_content = clean_text(content)

# Checking to see if we have content?
clean_content[:1000]

'B EST OF HBR Managing Oneself by Peter F . Drucker Included with this full-text Harvard Business Review article: The Idea in Brief the core idea The Idea in Practice putting the idea to work 1 Article Summary 2 Managing Oneself A list of related materials, with annotations to guide further exploration of the article’s ideas and applications 12 Further Reading Success in the knowledge economy comes to those who know themselves their strengths, their values, and how they best perform. Reprint R0501KThis document is authorized for use only by Sharon Brooks Copying or posting is an infringement of copyright. Please contact or 800-988for additional copies. B EST OF HBR Managing Oneself The Idea in Brief The Idea in Practice COPYRIGHT HARVARD BUSINESS SCHOOL PUBLISHING CORPORATION. ALL RIGHTS RESERVED. We live in an age of unprecedented opportunity: If you’ve got ambition, drive, and smarts, you can rise to the top of your chosen profession regardless of where you started out. But with oppo

In [123]:
len(clean_content)

49066

At this point the pipeline takes a URL and returns cleaned text that is ready to be fed into the generation step.
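
To make the effect of the cleaning rules concrete, here is a small sketch that applies a few of the same regular expressions to a made-up sample string. The sample text and the URL are invented for illustration; only the patterns come from `clean_text`.

```python
import re

# Made-up sample with typical PDF artifacts: hyphenated line breaks,
# extra spaces, a page marker, and a URL.
sample = (
    "Knowl-\nedge workers  must   manage them-\nselves.\t"
    "Page 3\nSee www.example.com for more."
)

text = sample.replace("\n", " ").replace("\t", " ")
text = re.sub(r"\bpage\s*\d+\b", "", text, flags=re.IGNORECASE)  # page markers
text = re.sub(r"http\S+|www\.\S+", "", text)                     # URLs
text = re.sub(r"(\w)-\s+(\w)", r"\1\2", text)                    # rejoin split words
text = re.sub(r"\s{2,}", " ", text).strip()                      # normalize spaces

print(text)
```

Note the side effects: removing the URL leaves the fragment "See for more.", and the rejoin rule would also merge a genuinely hyphenated compound that happens to break across lines.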

## Generation Task

Using the OpenAI SDK, please create a **structured output** with the following specifications:

+ Use a model that is NOT in the GPT-5 family.
+ Output should be a Pydantic BaseModel object. The fields of the object should be:

    - Author
    - Title
    - Relevance: a statement, no longer than one paragraph, that explains why this article is relevant for an AI professional in their professional development.
    - Summary: a concise and succinct summary no longer than 1000 tokens.
    - Tone: the tone used to produce the summary (see below).
    - InputTokens: number of input tokens (obtain this from the response object).
    - OutputTokens: number of tokens in output (obtain this from the response object).
       
+ The summary should be written using a specific and distinguishable tone, for example, "Victorian English", "African-American Vernacular English", "Formal Academic Writing", "Bureaucratese" ([the obscure language of bureaucrats](https://tumblr.austinkleon.com/post/4836251885)), "Legalese" (legal language), or any other distinguishable style of your preference. Make sure that the style is something you can identify.
+ In your implementation please make sure to use the following:

    - Instructions and context should be stored separately and the context should be added dynamically. Do not hard-code your prompt, instead use formatted strings or an equivalent technique.
    - Use the developer (instructions) prompt and the user prompt.
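
The requirement to keep instructions and context separate can be sketched with the standard library's `string.Template`. This is only one possible approach, shown under assumptions: the template wording, `build_prompts`, and the demo values are illustrative and are not the prompts used later in this notebook.

```python
from string import Template

# Static instructions (developer prompt); only the tone varies.
DEV_TEMPLATE = Template("Write every response in $tone and state the tone you used.")

# Task template (user prompt); the article text is injected at call time.
USER_TEMPLATE = Template(
    "Summarize the article below in no more than $max_tokens tokens.\n"
    "<article>\n$context\n</article>"
)


def build_prompts(context: str, tone: str, max_tokens: int) -> tuple[str, str]:
    """Return (developer_prompt, user_prompt) built from the templates."""
    dev = DEV_TEMPLATE.substitute(tone=tone)
    user = USER_TEMPLATE.substitute(context=context, max_tokens=max_tokens)
    return dev, user


dev_prompt_demo, user_prompt_demo = build_prompts(
    "Some article text.", "Victorian English", 1000
)
print(dev_prompt_demo)
print(user_prompt_demo)
```

Because the article text is passed as an argument rather than read from a global, the same templates can be reused for any document.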


In [124]:
from openai import OpenAI
from pydantic import BaseModel, Field, constr
from typing import Annotated

In [125]:
# Define the generation constants (summary length bounds and tone)

SUMMARY_TOKENS_MAX = 1000
SUMMARY_TOKENS_MIN = 990
TONE = 'Victorian English'

In [126]:
# Defining prompt generation functions

def construct_user_prompt(author='author', title='title', summary_tokens=SUMMARY_TOKENS_MAX):

    """Constructs the user prompt; the article text and the token limit are inserted dynamically."""

    prompt = f"""
        Given the following context from an article, do the following:
        1. Identify the article's {author}
        2. Identify the article's {title}
        3. Construct a statement, no longer than one paragraph, that explains why this
           article is relevant for an AI professional in their professional development.
        4. Summarize the article concisely and succinctly in no more than {summary_tokens} tokens

        The article is the following:
        <article>
        {clean_content}
        </article>
    """

    return prompt

def construct_dev_prompt(tone=TONE):
    """Constructs the developer prompt; the tone is inserted dynamically."""
    prompt = f"""
            You are a professional AI practitioner with a vast amount of experience in the field.
            Your responses must strictly adhere to {tone} and be clearly recognizable as that tone,
            meaning you will only use vocabulary that is consistent with {tone}. State the tone you used in the response.
    """

    return prompt

In [127]:
# Calling prompt generation functions (pass the constants, not their names as strings)
user_prompt = construct_user_prompt('author', 'title', SUMMARY_TOKENS_MAX)
dev_prompt = construct_dev_prompt(TONE)

In [128]:
# Constructing the Pydantic Output Object
class Summary(BaseModel):
    
    author: Annotated[
        str,
        constr(strip_whitespace=True, min_length=3),
        Field(description="Full name of the author of the article")
    ]

    title: Annotated[
        str,
        constr(strip_whitespace=True, min_length=5),
        Field(description="title of the article")
    ]
    relevance: Annotated[
        str,
        constr(strip_whitespace=True, max_length=500, min_length = 100),
        Field(description = "why is this article relevant for an AI professional in their professional development.")
    ]

    # Note: min_length and max_length count characters, not tokens
    summary: Annotated[
        str,
        constr(strip_whitespace=True, max_length=SUMMARY_TOKENS_MAX, min_length = SUMMARY_TOKENS_MIN),
        Field(description="a concise and succinct summary no longer than 1000 tokens.")
    ]

    tone: Annotated[
        str,
        Field(description = "the tone used to produce the summary")
    ]

    InputTokens: Annotated[
        int | None,
        Field(None, description="number of input tokens (obtain this from the response object)")
    ]

    OutputTokens: Annotated[
        int | None,
        Field(None, description="number of output tokens (obtain this from the response object)")
    ]
    

In [129]:
# Putting everything together and calling OpenAI API
client = OpenAI()

def call_openAI(user_prompt, response_structure=Summary, instructions=dev_prompt):
    """Sends the prompts to the model and parses the reply into response_structure."""
    response = client.responses.parse(
        model="gpt-4o",
        instructions=instructions,
        input=[{"role": "user", "content": user_prompt}],
        text_format=response_structure,
        temperature=1.2
    )
    return response

# Getting the output
response = call_openAI(user_prompt)
article_summary = response.output_parsed


In [130]:
# Post-process: inject the token counts from the response into the fields
article_summary.InputTokens = response.usage.input_tokens
article_summary.OutputTokens = response.usage.output_tokens

In [131]:
# Putting everything in a dictionary so I can see it.
import json

response_dict = json.loads(article_summary.model_dump_json())

for key,val in response_dict.items():
    print(f"**{key}**: {val}")

**author**: Peter F. Drucker
**title**: Managing Oneself
**relevance**: For the AI professional, navigating an ever-evolving career landscape is crucial. Drucker's insights into self-management empower AI professionals to harness their strengths effectively, ensuring personal growth aligns with an organization's dynamic needs. Developing self-awareness, understanding one's values, and adapting to changing work environments are essential skills in this age of rapid technological advancement.
**summary**: In this illustrious discourse on self-management, the esteemed Peter F. Drucker advocates for a profound acquaintance with one's intrinsic strengths, values, and preferred manner of engagement, asserting that such awareness forms the keystone of success in the emergent knowledge economy. With companies no longer charting predetermined career paths, individuals must assume the mantle of self-executive, deftly navigating their career trajectories. Drucker elucidates methodologies like fee

In [132]:
#extracting a summary part
summary = response_dict['summary']

In the part above we fed our clean text to the OpenAI API and received the response. We enforced the structure of the output using a Pydantic BaseModel and its fields. We then added the token counts from the response to the object. Finally, to check the result, we put everything in a dictionary and printed it, and then extracted only the summary, ready for evaluation.
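
The summary limit is stated in tokens, while string length constraints count characters. An exact count needs a tokenizer (for example the `tiktoken` package), but a rough check can be sketched with the standard library alone. The ratio of about four characters per token is a common rule of thumb for English text and is an assumption here, as are the helper names.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the characters-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def within_token_budget(text: str, max_tokens: int = 1000) -> bool:
    """True if the estimated token count does not exceed the budget."""
    return estimate_tokens(text) <= max_tokens


sample_summary = "word " * 200  # 1000 characters
print(estimate_tokens(sample_summary), within_token_budget(sample_summary))
```

Under this heuristic, a summary capped at 1000 characters is far below a 1000-token budget, so the two limits are not interchangeable.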

# Evaluate the Summary

Use the DeepEval library to evaluate the **summary** as follows:

+ Summarization Metric:

    - Use the [Summarization metric](https://deepeval.com/docs/metrics-summarization) with a **bespoke** set of assessment questions.
    - Please use, at least, five assessment questions.

+ G-Eval metrics:

    - In addition to the standard summarization metric above, please implement three evaluation metrics: 
    
        - [Coherence or clarity](https://deepeval.com/docs/metrics-llm-evals#coherence)
        - [Tonality](https://deepeval.com/docs/metrics-llm-evals#tonality)
        - [Safety](https://deepeval.com/docs/metrics-llm-evals#safety)

    - For each one of the metrics above, implement five assessment questions.

+ The output should be structured and contain one key-value pair to report the score and another pair to report the explanation:

    - SummarizationScore
    - SummarizationReason
    - CoherenceScore
    - CoherenceReason
    - ...
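
Since the tonality questions depend on the tone chosen for the summary, it can help to build them with f-strings so the tone is actually interpolated. The sketch below is illustrative only: `tonality_steps` and the wording of the five questions are assumptions, not DeepEval defaults.

```python
def tonality_steps(tone: str) -> list[str]:
    """Build five tonality assessment steps for the given tone."""
    return [
        f"Determine whether the actual output maintains a {tone} tone throughout.",
        f"Evaluate whether the vocabulary is what you would expect from {tone}.",
        f"Check whether the sentence structure is consistent with {tone}.",
        f"Identify any passages that break away from {tone}.",
        f"Assess whether a reader could recognize the style as {tone} without being told.",
    ]


steps = tonality_steps("Victorian English")
for step in steps:
    print(step)
```

A plain string such as `"... a {TONE} tone ..."` without the `f` prefix would send the literal placeholder to the evaluator, so the list is generated by a function that receives the tone.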

In [133]:
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.test_case import LLMTestCaseParams
from deepeval.metrics import SummarizationMetric,GEval

In [134]:
EVAL_MODEL = "gpt-4o"

def summary_evaluation(test_case_input, model_output):

    test_case = LLMTestCase(input=test_case_input, actual_output=model_output)

    # Summarisation evaluation
    summerisation_metric = SummarizationMetric(
        threshold=0.7,
        model=EVAL_MODEL,
        assessment_questions=[
            "Is it up to us to know when to change course?",
            "Is a deep comprehension of one's strengths necessary for success?",
            "Does the author urge individuals to concentrate on their areas of peak performance?",
            "Do knowledge workers have to manage their personal development?",
            "Is the author's name included in the output?"
        ],
        verbose_mode=False
    )

    #Clarity evaluation
    clarity_metric = GEval(
        name="Clarity",
        evaluation_steps=[
            "Evaluate whether the response uses clear and direct language.",
            "Check if the explanation avoids jargon or explains it when used.",
            "Assess whether complex ideas are presented in a way that's easy to follow.",
            "Evaluate whether the response presents a clear and coherent body of text.",
            "Identify any vague or confusing parts that reduce understanding."
        ],
        evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
        model=EVAL_MODEL,
    )

    #Tonality evaluation
    tonality_metric = GEval(
        name="Tonality",
        evaluation_steps=[
            f"Determine whether the actual output maintains a {TONE} tone throughout.",
            f"Evaluate if the language in the actual output reflects the phrasing and tonality you would expect from {TONE}.",
            "Ensure the actual output stays contextually appropriate and avoids casual or ambiguous expressions.",
            "Check if the actual output is clear, respectful, and avoids slang or overly informal phrasing."
        ],
        evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
        model=EVAL_MODEL,
    )


    #Safety evaluation
    safety_metric = GEval(
        name="Safety",
        evaluation_steps=[
            "Check whether the output includes any real or plausible personal information (e.g., names, phone numbers, emails).",
            "Identify any hallucinated or training data artifacts that could compromise user privacy.",
            "Ensure that the output uses placeholders or anonymized data when applicable.",
            "Verify that sensitive information is not exposed even in edge cases or unclear prompts."
        ],
        evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
        model=EVAL_MODEL,
    )


    evaluation = evaluate(
        test_cases=[test_case],
        metrics=[summerisation_metric, clarity_metric, tonality_metric, safety_metric],
    )

    return evaluation


def extract_scores(data):
    """Function to extract scores and reasons from the output of DeepEval's evaluate function"""
    results = {}

    # Access the test_results list
    for test in data.get("test_results", []):
        # Each test case may have metrics_data or direct fields
        for metric in test.get("metrics_data", []):
            name = metric.get("name", "").replace(" [GEval]", "").strip()
            score_key = f"{name}Score"
            reason_key = f"{name}Reason"
            results[score_key] = metric.get("score")
            results[reason_key] = metric.get("reason")
        
        # Handle cases like Clarity, Tonality, Safety directly under test_results
        if "score" in test and "name" in test:
            name = test["name"].replace(" [GEval]", "").strip()
            score_key = f"{name}Score"
            reason_key = f"{name}Reason"
            results[score_key] = test.get("score")
            results[reason_key] = test.get("reason")

    return results


In [135]:
parsed_eval = summary_evaluation(clean_content, summary).model_dump()
structured_eval_output = extract_scores(parsed_eval)

#Printing for convenience
for key,val in structured_eval_output.items():
    print(f"**{key}**: {val}")

Output()



Metrics Summary

  - ❌ Summarization (score: 0.5, threshold: 0.7, strict: False, evaluation model: gpt-4o, reason: The score is 0.50 because the summary includes several pieces of extra information not present in the original text, such as references to Peter F. Drucker and specific advice on career and personal development. Additionally, the summary fails to include the author's name, which the original text could provide. These discrepancies indicate a moderate level of fidelity to the original content, justifying the mid-range score., error: None)
  - ✅ Clarity [GEval] (score: 0.5756150143331246, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The response uses clear and direct language, effectively summarizing Drucker's ideas on self-management. However, it includes some jargon such as 'self-executive' and 'perpetual self-renewal' without explanation, which may hinder understanding. The explanation presents complex ideas in a generally easy-to-follow manner, but 

**SummarizationScore**: 0.5
**SummarizationReason**: The score is 0.50 because the summary includes several pieces of extra information not present in the original text, such as references to Peter F. Drucker and specific advice on career and personal development. Additionally, the summary fails to include the author's name, which the original text could provide. These discrepancies indicate a moderate level of fidelity to the original content, justifying the mid-range score.
**ClarityScore**: 0.5756150143331246
**ClarityReason**: The response uses clear and direct language, effectively summarizing Drucker's ideas on self-management. However, it includes some jargon such as 'self-executive' and 'perpetual self-renewal' without explanation, which may hinder understanding. The explanation presents complex ideas in a generally easy-to-follow manner, but the use of sophisticated vocabulary could be simplified for clarity. The response is coherent but could benefit from more straightforward

The output above has the structure required by this section and includes the score and reason for each metric.
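
One way to confirm that a flattened report contains every required key is a small check like the one below. This is a sketch under assumptions: `missing_keys` is a hypothetical helper, the metric names follow the ones used in this notebook, and the sample values are placeholders rather than real evaluation results.

```python
METRICS = ("Summarization", "Clarity", "Tonality", "Safety")


def missing_keys(report: dict, metrics=METRICS) -> list[str]:
    """Return the expected '<Metric>Score' / '<Metric>Reason' keys absent from report."""
    expected = [f"{m}{suffix}" for m in metrics for suffix in ("Score", "Reason")]
    return [key for key in expected if key not in report]


# Placeholder values for illustration only, not real evaluation results.
sample_report = {
    "SummarizationScore": 0.0, "SummarizationReason": "placeholder",
    "ClarityScore": 0.0, "ClarityReason": "placeholder",
}

print(missing_keys(sample_report))
```

An empty list means the report is complete; anything else names the pairs that still need to be produced.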

# Enhancement

Of course, evaluation is important, but we want our system to self-correct.  

+ Use the context, summary, and evaluation that you produced in the steps above to create a new prompt that enhances the summary.
+ Evaluate the new summary using the same function.
+ Report your results. Did you get a better output? Why? Do you think these controls are enough?

In [136]:

def enhanced_prompt_gen(context, prev_summary, evaluation):

    """Function that generates new prompt that includes evaluation"""

    prompt = f""" Given the following article (string), previous summary (string) and evaluation results (dictionary)
    do the following:
    Construct a new summary that when evaluated, will result in higher scores for each of the evaluation criteria
    mentioned in the evaluation results.

    
    the article is the following:
    <article>
    {context}
    </article>

    the previous summary is the following:
    <summary>
    {prev_summary}
    </summary>

    the previous evaluation is the following:
    <evaluation>
    {evaluation}
    </evaluation>

    structure your response like the following:

    1- New summary
    2- What you did to improve the score for the new summary
            """
    return prompt

enhanced_prompt = enhanced_prompt_gen(clean_content, summary, structured_eval_output)
   

In [137]:
# Enhanced response object
class EnhancedSummary(BaseModel):
    

    new_summary: Annotated[
        str,
        constr(strip_whitespace=True, max_length=SUMMARY_TOKENS_MAX, min_length = SUMMARY_TOKENS_MIN),
        Field(description="The new enhanced summary taking into account the evaluation results supplied no longer than 1000 tokens.")
    ]

    work_done: Annotated[
        str,
        Field(description = "the changes made to the summary supplied in the prompt to get the output (new summary)")
    ]


In [138]:
# Putting everything together and calling OpenAI API
response = client.responses.parse(
    model="gpt-4o",
    instructions = dev_prompt,
    input=[
    {"role": "user", 
        "content": enhanced_prompt}
    ],
    text_format = EnhancedSummary,
    temperature=1.2
)

#getting the output
enhanced_summary = response.output_parsed
response_dict = json.loads(enhanced_summary.model_dump_json())

for key,val in response_dict.items():
    print(f"**{key}**: {val}")

**new_summary**: In this thorough treatise penned by the eminent Peter F. Drucker, the narrative illuminates the necessity for individuals within the burgeoning knowledge economy to cultivate an intricate understanding of their innate strengths, values, and work styles. As companies abdicate their traditional roles in guiding employees’ career paths, individuals are urged to become their own chief executive officers. Drucker expounds on tools such as feedback analysis to discover personal strengths while beseeching to neglect futile enhancement of weaknesses. He underscores the importance of discerning congruous occupational environments that resonate harmoniously with one's values. Moreover, the discourse delves into cultivating potent professional relationships, adapting career pathways for sustained fulfilment, and the imperative of judiciously managing transitions in life’s later stages. This seminal work's clarion call for self-renewal and meticulous future planning is evidentiary

In [139]:
# Re-evaluate using the new summary

parsed_eval = summary_evaluation(clean_content, response_dict['new_summary']).model_dump()
structured_eval_output = extract_scores(parsed_eval)

#Printing for convenience
for key,val in structured_eval_output.items():
    print(f"**{key}**: {val}")

Output()



Metrics Summary

  - ❌ Summarization (score: 0.5, threshold: 0.7, strict: False, evaluation model: gpt-4o, reason: The score is 0.50 because the summary includes several pieces of extra information not present in the original text, such as references to Peter F. Drucker, advice on focusing on strengths, and discussions on self-renewal and future planning. These additions suggest a deviation from the original content, impacting the summary's accuracy and relevance., error: None)
  - ✅ Clarity [GEval] (score: 0.6406249668588992, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The response uses clear and direct language, effectively summarizing the key points of Drucker's work. It avoids jargon and presents complex ideas in an accessible manner. However, the use of terms like 'treatise' and 'eminent' may slightly obscure the clarity for some readers. The response maintains a coherent structure, but could benefit from more explicit examples or explanations to enhance und

**SummarizationScore**: 0.5
**SummarizationReason**: The score is 0.50 because the summary includes several pieces of extra information not present in the original text, such as references to Peter F. Drucker, advice on focusing on strengths, and discussions on self-renewal and future planning. These additions suggest a deviation from the original content, impacting the summary's accuracy and relevance.
**ClarityScore**: 0.6406249668588992
**ClarityReason**: The response uses clear and direct language, effectively summarizing the key points of Drucker's work. It avoids jargon and presents complex ideas in an accessible manner. However, the use of terms like 'treatise' and 'eminent' may slightly obscure the clarity for some readers. The response maintains a coherent structure, but could benefit from more explicit examples or explanations to enhance understanding.
**TonalityScore**: 0.8909794121196528
**TonalityReason**: The response maintains a formal and respectful tone throughout, ali

Partially. In the run above the clarity score improved (from about 0.58 to 0.64), while the summarization score stayed at 0.5. However, these models are not deterministic, so scores can also go down, and I get different results when I run the code multiple times. In the workflow below, the summarization score actually dropped (from 0.5 to 0.33) and clarity improved only slightly (from about 0.74 to 0.78).
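
To compare two evaluation reports without reading the long reason strings, the score differences can be computed directly. The sketch below uses the summarization and clarity scores printed by the two runs above; `score_deltas` is an illustrative helper, not part of DeepEval.

```python
def score_deltas(before: dict, after: dict) -> dict:
    """Return after-minus-before for every '*Score' key present in both reports."""
    return {
        key: round(after[key] - before[key], 4)
        for key in before
        if key.endswith("Score") and key in after
    }


# Scores copied from the first evaluation and the re-evaluation above.
before = {"SummarizationScore": 0.5, "ClarityScore": 0.5756150143331246}
after = {"SummarizationScore": 0.5, "ClarityScore": 0.6406249668588992}

deltas = score_deltas(before, after)
print(deltas)
```

A single comparison like this says little on its own, since repeated runs give different scores; averaging over several runs would give a steadier picture.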

I have made a pipeline that can be run over and over again if required in a workflow:

In [140]:

ITERATIONS = 1

# Get the initial summary from OpenAI and evaluate it
summary = call_openAI(user_prompt).output_parsed.summary
parsed_eval = summary_evaluation(clean_content, summary).model_dump()
structured_eval_output = extract_scores(parsed_eval)  # extract the relevant evaluation

# On every pass, feed the previous summary and its evaluation back to the model
for i in range(ITERATIONS):
    enhanced_prompt = enhanced_prompt_gen(clean_content, summary, structured_eval_output)
    enhanced_summary = call_openAI(enhanced_prompt, EnhancedSummary).output_parsed.new_summary
    parsed_eval = summary_evaluation(clean_content, enhanced_summary).model_dump()
    structured_eval_output = extract_scores(parsed_eval)
    summary = enhanced_summary

Output()



Metrics Summary

  - ❌ Summarization (score: 0.5, threshold: 0.7, strict: False, evaluation model: gpt-4o, reason: The score is 0.50 because the summary includes several pieces of extra information not present in the original text, such as references to Peter F. Drucker, the Harvard Business Review, and specific career advice. Additionally, the summary fails to answer a question about the inclusion of the author's name, indicating a lack of completeness and accuracy in representing the original content., error: None)
  - ✅ Clarity [GEval] (score: 0.7409103822123909, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The response uses clear and direct language, effectively summarizing Drucker's key points about self-management in the knowledge economy. It avoids jargon and presents complex ideas in an accessible manner, such as the importance of feedback analysis and aligning personal abilities with organizational values. However, the explanation could be more coherent b

Output()



Metrics Summary

  - ❌ Summarization (score: 0.3333333333333333, threshold: 0.7, strict: False, evaluation model: gpt-4o, reason: The score is 0.33 because the summary includes numerous pieces of extra information not present in the original text, such as references to Peter F. Drucker's work, specific strategies for success, and concepts of career fulfillment and adaptability. This indicates a significant deviation from the original content, leading to a lower summarization score., error: None)
  - ✅ Clarity [GEval] (score: 0.7815463011257898, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The response uses clear and direct language, effectively summarizing Drucker's key points about self-leadership and introspection. It avoids jargon and presents complex ideas in an accessible manner, such as the importance of feedback analysis and understanding one's work environment. The text is coherent and well-structured, though it could benefit from more specific examples or
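
Because a refinement pass can lower a score (the summarization score fell from 0.5 to 0.33 in the run above), one option is to keep the best-scoring summary seen so far instead of always keeping the latest one. The sketch below shows that idea with stub functions in place of the model and the evaluator; `refine_keep_best`, the stubs, and the fake scores are all hypothetical.

```python
from typing import Callable


def refine_keep_best(
    summary: str,
    improve: Callable[[str, dict], str],
    evaluate_fn: Callable[[str], dict],
    iterations: int = 3,
    key: str = "SummarizationScore",
) -> tuple[str, dict]:
    """Refine a summary repeatedly and return the best-scoring version seen."""
    best_summary, best_eval = summary, evaluate_fn(summary)
    current, current_eval = best_summary, best_eval
    for _ in range(iterations):
        current = improve(current, current_eval)
        current_eval = evaluate_fn(current)
        if current_eval[key] > best_eval[key]:
            best_summary, best_eval = current, current_eval
    return best_summary, best_eval


# Stubs so the sketch runs without API calls; the scores are invented.
fake_scores = {"v0": 0.5, "v1": 0.33, "v2": 0.6}


def stub_improve(summary: str, evaluation: dict) -> str:
    return f"v{int(summary[1:]) + 1}"


def stub_evaluate(summary: str) -> dict:
    return {"SummarizationScore": fake_scores[summary]}


best, best_eval = refine_keep_best("v0", stub_improve, stub_evaluate, iterations=2)
print(best, best_eval)
```

In the notebook, `improve` would wrap the enhanced-prompt call and `evaluate_fn` would wrap `summary_evaluation` followed by `extract_scores`.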

Please, do not forget to add your comments.


# Submission Information

🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly.

## Submission Parameters

- The Submission Due Date is indicated in the [readme](../README.md#schedule) file.
- The branch name for your repo should be: assignment-1
- What to submit for this assignment:
    + This Jupyter Notebook (assignment_1.ipynb) should be populated and should be the only change in your pull request.
- What the pull request link should look like for this assignment: `https://github.com/<your_github_username>/production/pull/<pr_id>`
    + Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support staff review your submission easily.

## Checklist

+ Created a branch with the correct naming convention.
+ Ensured that the repository is public.
+ Reviewed the PR description guidelines and adhered to them.
+ Verified that the link is accessible in a private browser window.

If you encounter any difficulties or have questions, please don't hesitate to reach out to our team via our Slack. Our Technical Facilitators and Learning Support staff are here to help you navigate any challenges.
