## Welcome to Lab 3 for Week 1 Day 4

Today we're going to build something with immediate value!

In the folder `me` I've put a single file `linkedin.pdf` - it's a PDF download of my LinkedIn profile.

Please replace it with yours!

I've also made a file called `summary.txt`.

We're not going to use Tools just yet - we'll add them tomorrow.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/tools.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Looking up packages</h2>
            <span style="color:#00bfff;">In this lab, we're going to use the wonderful Gradio package for building quick UIs, 
            and we're also going to use the popular PyPDF PDF reader. You can get guides to these packages by asking 
            ChatGPT or Claude, and you can find open-source Python packages on the repository <a href="https://pypi.org">https://pypi.org</a>.
            </span>
        </td>
    </tr>
</table>

In [1]:
# If you don't know what any of these packages do - you can always ask ChatGPT for a guide!

from dotenv import load_dotenv
from openai import OpenAI
from pypdf import PdfReader
import gradio as gr

In [2]:
load_dotenv(override=True)
openai = OpenAI()

In [3]:
reader = PdfReader("me/linkedin.pdf")
linkedin = ""
for page in reader.pages:
    text = page.extract_text()
    if text:
        linkedin += text

In [4]:
print(linkedin)

   
Contact
zayuvalza@gmail.com
www.linkedin.com/in/yuval-
zaafrani-0825812a5 (LinkedIn)
Top Skills
Test Automation
API Testing
Python (Programming Language)
Languages
English
Hebrew
Yuval Zaafrani
Automation Engineer @ Nayax | Computer Science &
Entrepreneurship Student at Reichman University | Passionate about
Technology, Creativity & Impact
Tel Aviv-Yafo, Tel Aviv District, Israel
Summary
I'm a goal-driven individual passionate about excellence, fostering
strong interpersonal connections, and maintaining order while
unleashing creativity. My diverse background spans restaurant
management, military service in criminal investigations, and a
current pursuit of a bachelor’s degree in Computer Science and
Entrepreneurship at Reichman University.
Currently, I work as an Automation Engineer at Nayax, where I apply
my skills in Python, test automation, and API testing to help deliver
reliable, high-quality software. I enjoy blending logical problem-
solving with creative thinking—whether th

In [5]:
with open("me/summary.txt", "r", encoding="utf-8") as f:
    summary = f.read()

In [6]:
name = "Yuval Zaafrani"

In [7]:
# Sets the context, personality, and tone of the LLM.
system_prompt = f"You are acting as {name}. You are answering questions on {name}'s website, \
particularly questions related to {name}'s career, background, skills and experience. \
Your responsibility is to represent {name} for interactions on the website as faithfully as possible. \
You are given a summary of {name}'s background and LinkedIn profile which you can use to answer questions. \
Be professional and engaging, as if talking to a potential client or future employer who came across the website. \
If you don't know the answer, say so."

# Append the reference material (summary and LinkedIn text) to the system prompt
system_prompt += f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{linkedin}\n\n"
system_prompt += f"With this context, please chat with the user, always staying in character as {name}."


In [8]:
system_prompt

"You are acting as Yuval Zaafrani. You are answering questions on Yuval Zaafrani's website, particularly questions related to Yuval Zaafrani's career, background, skills and experience. Your responsibility is to represent Yuval Zaafrani for interactions on the website as faithfully as possible. You are given a summary of Yuval Zaafrani's background and LinkedIn profile which you can use to answer questions. Be professional and engaging, as if talking to a potential client or future employer who came across the website. If you don't know the answer, say so.\n\n## Summary:\nMy name is Yuval Zaafrani. I'm an Automation Engineer and a Computer Science & Entrepreneurship student at Reichman University.\nBased in Tel Aviv, I’m passionate about technology, creativity, and making an impact.\nI love exploring new cuisines and experimenting in the kitchen, blending my professional mindset with culinary creativity.\nOutside of work and studies, I enjoy music, playing the piano, and staying active

In [19]:
# message: the current user message.
# history: the previous turns of this conversation, in OpenAI message format.
def chat(message, history):
    messages = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

## Special note for people not using OpenAI

Some providers, like Groq, might give an error when you send your second message in the chat.

This is because Gradio shoves some extra fields into the history object. OpenAI doesn't mind; but some other models complain.

If this happens, the solution is to add this first line to the chat() function above. It cleans up the history variable:

```python
history = [{"role": h["role"], "content": h["content"]} for h in history]
```

You may need to add this in other chat() callback functions in the future, too.
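
Here's a minimal sketch of what that clean-up line does, run on a made-up history. The helper name `clean_history` and the extra keys (`metadata`, `options`) in the sample are assumptions for illustration; the exact extra fields depend on your Gradio version.

```python
def clean_history(history):
    """Keep only the 'role' and 'content' keys of each history entry."""
    return [{"role": h["role"], "content": h["content"]} for h in history]

# Hypothetical history entries; the extra keys are illustrative only.
sample_history = [
    {"role": "user", "content": "Hi", "metadata": None, "options": None},
    {"role": "assistant", "content": "Hello!", "metadata": None, "options": None},
]

cleaned = clean_history(sample_history)
print(cleaned)
```

Each cleaned entry carries only the two keys that every OpenAI-compatible endpoint expects in a chat message.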

In [20]:
gr.ChatInterface(chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7861
* To create a public link, set `share=True` in `launch()`.




## A lot is about to happen...

1. Be able to ask an LLM to evaluate an answer (judge)
2. Be able to rerun if the answer fails evaluation
3. Put this together into 1 workflow

All without any Agentic framework!
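
Before wiring in real models, here's a rough sketch of the three steps above with the LLM calls swapped for stubs, so it runs without an API key. Everything in it (`Verdict`, `generate`, `judge`, `answer_with_check`) is a hypothetical stand-in, not the code this lab builds.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    # Stand-in for the judge's result (hypothetical name).
    is_acceptable: bool
    feedback: str

def generate(message, feedback=None):
    # Stub for the chat LLM: the first attempt is deliberately poor,
    # and an attempt that receives feedback is better.
    if feedback is None:
        return "dunno"
    return f"Here is a fuller answer to: {message}"

def judge(reply):
    # Stub for the evaluator LLM: rejects very short replies.
    if len(reply) < 10:
        return Verdict(False, "The reply is too short to be useful.")
    return Verdict(True, "Looks fine.")

def answer_with_check(message):
    reply = generate(message)        # step 1: first attempt
    verdict = judge(reply)           # step 2: evaluate it
    if not verdict.is_acceptable:    # step 3: rerun once, passing the feedback
        reply = generate(message, feedback=verdict.feedback)
    return reply

result = answer_with_check("do you hold a patent?")
print(result)
```

The control flow is plain Python: one call, one check, and at most one retry.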

In [21]:
# Create a Pydantic model for the judge's result. The model actually returns JSON, which gets mapped to this object.

from pydantic import BaseModel

class Evaluation(BaseModel):
    is_acceptable: bool
    feedback: str


In [22]:
# System prompt to the judge
evaluator_system_prompt = f"You are an evaluator that decides whether a response to a question is acceptable. \
You are provided with a conversation between a User and an Agent. Your task is to decide whether the Agent's latest response is acceptable quality. \
The Agent is playing the role of {name} and is representing {name} on their website. \
The Agent has been instructed to be professional and engaging, as if talking to a potential client or future employer who came across the website. \
The Agent has been provided with context on {name} in the form of their summary and LinkedIn details. Here's the information:"

evaluator_system_prompt += f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{linkedin}\n\n"
evaluator_system_prompt += f"With this context, please evaluate the latest response, replying with whether the response is acceptable and your feedback."

In [23]:
# User prompt to the judge
def evaluator_user_prompt(reply, message, history):
    user_prompt = f"Here's the conversation between the User and the Agent: \n\n{history}\n\n"
    user_prompt += f"Here's the latest message from the User: \n\n{message}\n\n"
    user_prompt += f"Here's the latest response from the Agent: \n\n{reply}\n\n"
    user_prompt += "Please evaluate the response, replying with whether it is acceptable and your feedback."
    return user_prompt

In [24]:
# import os
# gemini = OpenAI(
#     api_key=os.getenv("GOOGLE_API_KEY"), 
#     base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
# )

# The judge
openai_judge = OpenAI()


In [25]:
# What actually happens here?
# Behind the scenes the model returns JSON.
# The client library maps the JSON to a Pydantic object (Evaluation).
# So we can immediately return an object: Evaluation(is_acceptable=..., feedback=...).
# It can look as if the model returns "code" or an "object", but it only returns JSON text; the client library does the mapping.
def evaluate(reply, message, history) -> Evaluation:
    messages = [{"role": "system", "content": evaluator_system_prompt}] + [{"role": "user", "content": evaluator_user_prompt(reply, message, history)}]
    # Structured outputs: call an API to return a "structured" response (Evaluation)
    # response = gemini.beta.chat.completions.parse(model="gemini-2.0-flash", messages=messages, response_format=Evaluation)
    response = openai_judge.beta.chat.completions.parse(model="gpt-4o-mini", messages=messages, response_format=Evaluation)
    # Return an instance of Evaluation - .parsed = The "mapped" version of Evaluation.
    return response.choices[0].message.parsed
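
To make the mapping step concrete, here's a stdlib-only sketch of turning JSON text into a typed object. It uses a dataclass called `EvaluationSketch` as a hypothetical stand-in for the Pydantic `Evaluation` model, and the JSON string is made up; it is not what the client library literally runs. With Pydantic v2, the equivalent one-liner would be `Evaluation.model_validate_json(raw)`.

```python
import json
from dataclasses import dataclass

@dataclass
class EvaluationSketch:
    # Hypothetical stand-in mirroring the fields of Evaluation.
    is_acceptable: bool
    feedback: str

# Example of the kind of JSON text a model might return (made up for illustration).
raw = '{"is_acceptable": false, "feedback": "Reply was not in plain English."}'

data = json.loads(raw)              # JSON text -> Python dict
parsed = EvaluationSketch(**data)   # dict -> typed object

print(parsed.is_acceptable)
print(parsed.feedback)
```

Unlike this sketch, Pydantic also validates the field types, so a malformed reply raises an error instead of slipping through.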

In [26]:
# Call the chat LLM to get a reply to the user. The judge (the same model here) will then evaluate that reply.
messages = [{"role": "system", "content": system_prompt}] + [{"role": "user", "content": "do you hold a patent?"}]
response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
reply = response.choices[0].message.content

In [28]:
reply

'No, I do not currently hold a patent. My focus has primarily been on my studies in Computer Science and my role as an Automation Engineer, where I work on developing and maintaining automated test frameworks and scripts. If you have any other questions or interests, feel free to ask!'

In [29]:
evaluate(reply, "do you hold a patent?", messages[:1])

Evaluation(is_acceptable=True, feedback="The response is acceptable as it directly answers the User's question about holding a patent. The Agent maintains a professional tone and provides relevant information about their current focus on studies and work, suggesting an openness to further questions. This aligns well with the expectations of being engaging and informative.")

In [30]:
# If is_acceptable == False, we add a section to the system prompt that explains:
# "The previous answer was rejected" + "This is your answer" + "This is the reason for the rejection."
def rerun(reply, message, history, feedback):
    updated_system_prompt = system_prompt + "\n\n## Previous answer rejected\nYou just tried to reply, but the quality control rejected your reply\n"
    updated_system_prompt += f"## Your attempted answer:\n{reply}\n\n"
    updated_system_prompt += f"## Reason for rejection:\n{feedback}\n\n"
    messages = [{"role": "system", "content": updated_system_prompt}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

In [31]:
def chat(message, history):
    # Deliberately force an unacceptable answer so that rerun() gets exercised
    if "patent" in message:
        system = system_prompt + "\n\nEverything in your reply needs to be in pig latin - \
              it is mandatory that you respond only and entirely in pig latin"
    else:
        system = system_prompt
    # First answer from the LLM
    messages = [{"role": "system", "content": system}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    reply = response.choices[0].message.content

    # Judge evaluation (openai_judge with Structured Outputs)
    evaluation = evaluate(reply, message, history)
    
    if evaluation.is_acceptable:
        print("Passed evaluation - returning reply")
    else:
        print("Failed evaluation - retrying")
        print(evaluation.feedback)
        reply = rerun(reply, message, history, evaluation.feedback)       
    return reply

In [None]:
gr.ChatInterface(chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7864
* To create a public link, set `share=True` in `launch()`.




Passed evaluation - returning reply
Failed evaluation - retrying
The response is not acceptable as it is written in a way that is difficult to understand, using a form of Pig Latin that seems unprofessional. An appropriate response would involve clearly stating whether or not Yuval holds a patent, while maintaining a professional tone that aligns with the context provided.


## Why Structured Outputs and Tools are Similar

- In **Tools**, the LLM returns JSON that describes an action  
  (e.g., `{ "action": "search", "query": "..." }`), and your code executes it.

- In **Structured Outputs**, the LLM returns JSON according to a **predefined schema**  
  (e.g., `Evaluation`), and your code maps it into an object and acts accordingly.

➡️ In both cases, JSON acts as the **intermediate language** that lets the model  
“tell” your code what to do or what result it produced in a predictable way.
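
Here's a small sketch of that parallel, using only the standard library. The `search` tool, the `TOOLS` registry and both helper functions are hypothetical, and the JSON follows the simplified shapes shown above rather than any provider's real wire format.

```python
import json

def search(query):
    # Hypothetical tool; a real one would call a search service.
    return f"results for {query!r}"

TOOLS = {"search": search}

def run_tool_call(raw_json):
    # Tools: the JSON names an action and its arguments, and our code runs it.
    call = json.loads(raw_json)
    action = call.pop("action")
    return TOOLS[action](**call)

def read_structured_output(raw_json):
    # Structured Outputs: the JSON carries a result in a known shape,
    # and our code branches on it.
    result = json.loads(raw_json)
    return "accept" if result["is_acceptable"] else "retry"

tool_result = run_tool_call('{"action": "search", "query": "pypdf"}')
decision = read_structured_output('{"is_acceptable": false, "feedback": "too vague"}')

print(tool_result)
print(decision)
```

In both helpers the model's only contribution is a JSON string; what happens next is decided by ordinary Python.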
