## Evaluate answers given by another model

In the folder `me` I've put a single file, `linkedin.pdf`: a PDF export of my LinkedIn profile.

I've also created a file called `summary.txt` with a short summary of my background.


In [3]:
from dotenv import load_dotenv
from openai import OpenAI
from pypdf import PdfReader
import gradio as gr

In [4]:
load_dotenv(override=True)
openai = OpenAI()

In [6]:
reader = PdfReader("me/linkedin.pdf")
linkedin = ""
for page in reader.pages:
    text = page.extract_text()
    if text:
        linkedin += text
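The loop above appends each page's text directly to the previous one with no separator, and the system prompt printed later shows non-breaking spaces (`\xa0`) carried over from the PDF. A minimal sketch of a page-joining helper, assuming newline-separated pages and plain spaces are acceptable in the prompt; `join_pages` is a hypothetical name, not part of pypdf:

```python
def join_pages(page_texts):
    """Join per-page strings, skipping empty or None entries (like the `if text:` check above).

    Pages are separated by a newline so the last word of one page cannot run into
    the first word of the next, and non-breaking spaces (U+00A0) become plain spaces.
    """
    return "\n".join(text.replace("\u00a0", " ") for text in page_texts if text)


# Usage with pypdf, assuming the same file as above:
# from pypdf import PdfReader
# linkedin = join_pages(page.extract_text() for page in PdfReader("me/linkedin.pdf").pages)
print(repr(join_pages(["Contact\u00a0\u00a0", "", "Top Skills"])))
```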

In [4]:
print(linkedin)

   
Contact
dominikzurawski15@gmail.com
www.linkedin.com/in/dominik-
zurawski (LinkedIn)
github.com/DominikZurawski
(Other)
Top Skills
LLM
PyTorch
Django
Certifications
Python for Data Science
C Language. Expert Level.
Dive Into Ansible - Beginner to
Expert in Ansible - DevOps
Advanced CSS and Sass: Flexbox,
Grid, Animations and More!
Advanced Programming in the
C Language
Dominik Żurawski
Python Programmer at RemmedVR
Poland
Experience
RemmedVR
Python Programmer
February 2020 - August 2020 (7 months)
Warsaw, Masovian Voivodeship, Poland
- Developing software for vision systems
- Configuration and assembly of optoelectronic measurement stations
- Performing measurements and data analysis
Capgemini
Programmer Analyst Trainee
November 2018 - April 2019 (6 months)
Warsaw, Masovian Voivodeship, Poland
Working with the business process design tools
IBM Case Manager and IBM Case Navigator, and writing scripts in
JavaScript. 
Responsibilities: 
-Participation in meetings and worksh

In [7]:
with open("me/summary.txt", "r", encoding="utf-8") as f:
    summary = f.read()

In [8]:
name = "Dominik Żurawski"

In [9]:
system_prompt = f"You are acting as {name}. You are answering questions on {name}'s website, \
particularly questions related to {name}'s career, background, skills and experience. \
Your responsibility is to represent {name} for interactions on the website as faithfully as possible. \
You are given a summary of {name}'s background and LinkedIn profile which you can use to answer questions. \
Be professional and engaging, as if talking to a potential client or future employer who came across the website. \
If you don't know the answer, say so."

system_prompt += f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{linkedin}\n\n"
system_prompt += f"With this context, please chat with the user, always staying in character as {name}."
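The same `## Summary` / `## LinkedIn Profile` section is appended again to the evaluator prompt further down. A small hypothetical helper, `context_block`, is one way to keep both prompts in sync; it reproduces the f-string used above and is a sketch, not part of the original notebook:

```python
def context_block(summary: str, linkedin: str) -> str:
    """Return the shared '## Summary' / '## LinkedIn Profile' section used in both prompts."""
    return f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{linkedin}\n\n"


# Produces the same text as the f-string appended to system_prompt above, e.g.:
# system_prompt += context_block(summary, linkedin)
# evaluator_system_prompt += context_block(summary, linkedin)
print(context_block("Short bio.", "Contact ..."))
```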


In [8]:
system_prompt

"You are acting as Dominik Żurawski. You are answering questions on Dominik Żurawski's website, particularly questions related to Dominik Żurawski's career, background, skills and experience. Your responsibility is to represent Dominik Żurawski for interactions on the website as faithfully as possible. You are given a summary of Dominik Żurawski's background and LinkedIn profile which you can use to answer questions. Be professional and engaging, as if talking to a potential client or future employer who came across the website. If you don't know the answer, say so.\n\n## Summary:\nMy name is Dominik Żurawski. I'm an entrepreneur, software engineer and data scientist. I'm from Poland.\nI love all foods, particularly Mediterranean food and Asian food.\n\n## LinkedIn Profile:\n\xa0 \xa0\nContact\ndominikzurawski15@gmail.com\nwww.linkedin.com/in/dominik-\nzurawski (LinkedIn)\ngithub.com/DominikZurawski\n(Other)\nTop Skills\nLLM\nPyTorch\nDjango\nCertifications\nPython for Data Sc

In [10]:
def chat(message, history):
    messages = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content
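With `type="messages"`, Gradio passes `history` as a list of dicts with `role` and `content` keys, which is why it can be concatenated straight into the OpenAI message list. Depending on the Gradio version, those dicts may carry extra fields (for example `metadata`) that the OpenAI API may reject. A sketch using a hypothetical `build_messages` helper that keeps only `role` and `content` as a precaution:

```python
def build_messages(system_prompt: str, message: str, history: list) -> list:
    """Assemble an OpenAI chat message list from a Gradio 'messages'-style history."""
    # Keep only role/content; extra keys that some Gradio versions add (e.g. "metadata")
    # are dropped in case the API rejects unknown fields.
    clean_history = [{"role": turn["role"], "content": turn["content"]} for turn in history]
    return (
        [{"role": "system", "content": system_prompt}]
        + clean_history
        + [{"role": "user", "content": message}]
    )


example_history = [
    {"role": "user", "content": "Hi!", "metadata": None},
    {"role": "assistant", "content": "Hello, how can I help?", "metadata": None},
]
for m in build_messages("You are acting as ...", "What do you work on?", example_history):
    print(m["role"], "->", m["content"])
```

Inside `chat`, the first line would then become `messages = build_messages(system_prompt, message, history)`.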

In [None]:
gr.ChatInterface(chat, type="messages").launch()

## Evaluating

1. Be able to ask an LLM to evaluate an answer
2. Be able to rerun if the answer fails evaluation
3. Combine these steps into a single workflow


In [11]:
# Create a Pydantic model for the Evaluation

from pydantic import BaseModel

class Evaluation(BaseModel):
    is_acceptable: bool
    feedback: str
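Pydantic is used here so the evaluator's reply can be requested in, and validated against, a fixed schema. To show what that validation amounts to, here is a stdlib-only sketch that checks a JSON string against the same two fields; `EvaluationFallback` and `parse_evaluation` are hypothetical names, and with Pydantic v2 the closest equivalent would be `Evaluation.model_validate_json(raw)`. A manual check like this could serve as a fallback for a provider that does not support structured outputs:

```python
import json
from dataclasses import dataclass


@dataclass
class EvaluationFallback:
    is_acceptable: bool
    feedback: str


def parse_evaluation(raw: str) -> EvaluationFallback:
    """Parse and type-check a JSON evaluation like {"is_acceptable": true, "feedback": "..."}."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError subclass) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("evaluation must be a JSON object")
    if not isinstance(data.get("is_acceptable"), bool):
        raise ValueError("'is_acceptable' must be a boolean")
    if not isinstance(data.get("feedback"), str):
        raise ValueError("'feedback' must be a string")
    return EvaluationFallback(data["is_acceptable"], data["feedback"])


print(parse_evaluation('{"is_acceptable": false, "feedback": "Reply is in pig latin."}'))
```

Note that these checks are stricter than Pydantic's default mode, which would coerce a string such as `"true"` into a boolean.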


In [14]:
evaluator_system_prompt = f"You are an evaluator that decides whether a response to a question is acceptable. \
You are provided with a conversation between a User and an Agent. Your task is to decide whether the Agent's latest response is of acceptable quality. \
The Agent is playing the role of {name} and is representing {name} on their website. \
The Agent has been instructed to be professional and engaging, as if talking to a potential client or future employer who came across the website. \
The Agent has been provided with context on {name} in the form of their summary and LinkedIn details. Here's the information:"

evaluator_system_prompt += f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{linkedin}\n\n"
evaluator_system_prompt += f"With this context, please evaluate the latest response, replying with whether the response is acceptable and your feedback."

In [15]:
def evaluator_user_prompt(reply, message, history):
    user_prompt = f"Here's the conversation between the User and the Agent: \n\n{history}\n\n"
    user_prompt += f"Here's the latest message from the User: \n\n{message}\n\n"
    user_prompt += f"Here's the latest response from the Agent: \n\n{reply}\n\n"
    user_prompt += f"Please evaluate the response, replying with whether it is acceptable and your feedback."
    return user_prompt
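`evaluator_user_prompt` interpolates `history` directly, so the evaluator sees the Python repr of a list of dicts. In the single-turn test further down, the history passed is `messages[:1]`, i.e. only the system message, so the whole persona prompt is repeated inside the evaluator's user prompt. A sketch of a hypothetical `format_history` helper that renders a readable transcript and drops system messages, assuming the evaluator only needs the context it already gets in its own system prompt:

```python
def format_history(history: list) -> str:
    """Render chat history as 'User: ...' / 'Agent: ...' lines, skipping system messages."""
    speakers = {"user": "User", "assistant": "Agent"}
    lines = [
        f"{speakers[turn['role']]}: {turn['content']}"
        for turn in history
        if turn["role"] in speakers  # drops "system" (and any other role)
    ]
    return "\n".join(lines)


print(format_history([
    {"role": "system", "content": "You are acting as ..."},
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello, how can I help?"},
]))
```

The first line of `evaluator_user_prompt` could then interpolate `{format_history(history)}` instead of `{history}`.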

In [16]:
import os
gemini = OpenAI(
    api_key=os.getenv("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
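`os.getenv` returns `None` when `GOOGLE_API_KEY` is missing, and the OpenAI client may then fall back to `OPENAI_API_KEY`, so the misconfiguration would only surface later as an authentication error from Google's endpoint. A minimal sketch of a stricter lookup; `require_env` is a hypothetical helper, not part of `python-dotenv` or the OpenAI SDK:

```python
import os


def require_env(var_name: str) -> str:
    """Return an environment variable's value, raising a clear error if it is unset or empty."""
    value = os.getenv(var_name)
    if not value:
        raise RuntimeError(f"{var_name} is not set; add it to .env and rerun load_dotenv(override=True)")
    return value


# Usage with the same client setup as above:
# gemini = OpenAI(api_key=require_env("GOOGLE_API_KEY"),
#                 base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
```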

In [17]:
def evaluate(reply, message, history) -> Evaluation:

    messages = [{"role": "system", "content": evaluator_system_prompt}] + [{"role": "user", "content": evaluator_user_prompt(reply, message, history)}]
    response = gemini.beta.chat.completions.parse(model="gemini-2.0-flash", messages=messages, response_format=Evaluation)
    return response.choices[0].message.parsed

In [18]:
messages = [{"role": "system", "content": system_prompt}] + [{"role": "user", "content": "do you hold a patent?"}]
response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
reply = response.choices[0].message.content

In [17]:
reply

'I do not currently hold a patent. My focus has primarily been on software engineering, data science, and entrepreneurial activities. If you have any questions about my projects or experience, feel free to ask!'

In [18]:
evaluate(reply, "do you hold a patent?", messages[:1])

Evaluation(is_acceptable=True, feedback='The response is acceptable. It is a truthful answer based on the information available and ends with an invitation to ask more questions which is a good way to keep the conversation going.')

In [19]:
def rerun(reply, message, history, feedback):
    updated_system_prompt = system_prompt + f"\n\n## Previous answer rejected\nYou just tried to reply, but the quality control rejected your reply\n"
    updated_system_prompt += f"## Your attempted answer:\n{reply}\n\n"
    updated_system_prompt += f"## Reason for rejection:\n{feedback}\n\n"
    messages = [{"role": "system", "content": updated_system_prompt}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

In [20]:
def chat(message, history):
    # Deliberate test: questions mentioning "patent" get an extra instruction to
    # answer in pig latin, so the evaluator has something to reject.
    if "patent" in message:
        system = system_prompt + (
            "\n\nEverything in your reply needs to be in pig latin - "
            "it is mandatory that you respond only and entirely in pig latin"
        )
    else:
        system = system_prompt
    messages = [{"role": "system", "content": system}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
    reply = response.choices[0].message.content

    # Ask the evaluator model (Gemini) to judge the reply
    evaluation = evaluate(reply, message, history)

    if evaluation.is_acceptable:
        print("Passed evaluation - returning reply")
    else:
        print("Failed evaluation - retrying")
        print(evaluation.feedback)
        # Regenerate once with the normal system prompt plus the rejected answer and feedback
        reply = rerun(reply, message, history, evaluation.feedback)
    return reply
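In `chat`, the rerun reply is returned without being evaluated again. A sketch of a bounded loop that re-evaluates each retry, assuming generation, evaluation and rerun are passed in as functions; `answer_with_retries`, `Verdict`, `generate_fn` and `max_retries` are hypothetical names, and the default limit of 2 retries is an arbitrary choice. The notebook's `evaluate` and `rerun` already match the expected signatures, while the first OpenAI call would need wrapping in a `(message, history)` function:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Verdict:
    # Stand-in with the same fields as the notebook's Pydantic `Evaluation`,
    # so the loop can be exercised without any API calls.
    is_acceptable: bool
    feedback: str


def answer_with_retries(
    message: str,
    history: list,
    generate_fn: Callable[[str, list], str],
    evaluate_fn: Callable[[str, str, list], Any],
    rerun_fn: Callable[[str, str, list, str], str],
    max_retries: int = 2,
) -> str:
    """Generate a reply, then evaluate and regenerate it up to max_retries times."""
    reply = generate_fn(message, history)
    for _ in range(max_retries):
        verdict = evaluate_fn(reply, message, history)
        if verdict.is_acceptable:
            return reply
        # Feed the rejected reply and the evaluator's feedback back into generation
        reply = rerun_fn(reply, message, history, verdict.feedback)
    # Retries exhausted: the reply returned here has not been evaluated
    return reply


# Example with stubs: the first draft fails evaluation, the rerun passes
def draft_reply(message, history):
    return "draft"


def judge(reply, message, history):
    return Verdict(reply == "fixed", "too informal")


def fix_reply(reply, message, history, feedback):
    return "fixed"


print(answer_with_retries("do you hold a patent?", [], draft_reply, judge, fix_reply))
```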

In [None]:
gr.ChatInterface(chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7862
* To create a public link, set `share=True` in `launch()`.




Passed evaluation - returning reply
Failed evaluation - retrying
The Agent's response is in Pig Latin. The Agent is supposed to be representing Dominik Żurawski and acting professionally. This response is unprofessional and unhelpful.
