# Professional Twinbot

Create a **Professional Twinbot** using LLMs and Gradio. 
The chatbot is designed to provide information about my professional background based on a summary text file and a LinkedIn profile exported as a PDF.

## Setup

In [None]:
from dotenv import load_dotenv
from openai import OpenAI
from pypdf import PdfReader
import gradio as gr
import os

In [109]:
load_dotenv(override=True)

True

Read the settings loaded from the `.env` file:

In [117]:
OLLAMA_API_KEY = os.getenv('OLLAMA_API_KEY')
OLLAMA_BASE_URL = os.getenv('OLLAMA_BASE_URL')

MODEL_PHI = os.getenv('MODEL_PHI4_14B')
MODEL_LLAMA = os.getenv('MODEL_LLAMA3_8B')

print("Environment variables loaded:")
print(f"- OLLAMA_API_KEY = {OLLAMA_API_KEY}")
print(f"- OLLAMA_BASE_URL = {OLLAMA_BASE_URL}")
print("Models:")
print(f"- MODEL_PHI = {MODEL_PHI}")
print(f"- MODEL_LLAMA = {MODEL_LLAMA}")

# os.getenv returns None for unset variables rather than raising, so check explicitly
if None in (OLLAMA_API_KEY, OLLAMA_BASE_URL, MODEL_PHI, MODEL_LLAMA):
    print("Parameter(s) not set.")

Environment variables loaded:
- OLLAMA_API_KEY = ollama
- OLLAMA_BASE_URL = http://localhost:11434/v1
Models:
- MODEL_PHI = phi4
- MODEL_LLAMA = llama3.1


In [None]:
# Pull models from Ollama if they are not already present.
# Replace {model_name} with the model you need, uncomment, and run:
# !ollama pull {model_name}

## Load my professional data

### Summary

In [52]:
with open("me/summary.txt", "r", encoding="utf-8") as f:
    summary = f.read()

### LinkedIn profile

In [53]:
reader = PdfReader("me/linkedin.pdf")
linkedin = ""
for page in reader.pages:
    text = page.extract_text()
    if text:
        linkedin += text

## Prepare Prompt

In [70]:
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key=OLLAMA_API_KEY)
# openai = OpenAI()

In [None]:
name = "Nellie Cordova"

system_prompt = f"You are acting as {name}. You are answering questions on {name}'s website, \
particularly questions related to {name}'s career, background, skills and experience. \
Your responsibility is to represent {name} for interactions on the website as faithfully as possible. \
You are given a summary of {name}'s background and a LinkedIn profile, which you can use to answer questions. \
Be professional and engaging, as if talking to a potential client or future employer who came across the website. \
If you don't know the answer, say so."

system_prompt += f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{linkedin}\n\n"
system_prompt += f"With this context, please chat with the user, always staying in character as {name}."


In [None]:
def chat(message, history):
    messages = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": message}]
    response = ollama.chat.completions.create(model=MODEL_LLAMA, messages=messages)
    return response.choices[0].message.content
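
Depending on the Gradio version, the `history` entries passed to `chat` with `type="messages"` may carry extra keys besides `role` and `content` (for example `metadata`), and some OpenAI-compatible backends reject those. The following is a minimal sketch, assuming that applies to your setup, which keeps only the two required keys before building the request. `build_messages` is a hypothetical helper, not part of the notebook.

```python
def build_messages(system_prompt, history, message):
    """Assemble the chat request: system prompt, prior turns, then the new user message."""
    # Keep only the keys the chat completions API expects; drop any extras
    # (assumption: Gradio may attach fields such as "metadata" to history entries).
    clean_history = [{"role": h["role"], "content": h["content"]} for h in history]
    return (
        [{"role": "system", "content": system_prompt}]
        + clean_history
        + [{"role": "user", "content": message}]
    )


# Example with hand-written history entries that carry an extra key
history = [
    {"role": "user", "content": "Hi", "metadata": None},
    {"role": "assistant", "content": "Hello!", "metadata": None},
]
messages = build_messages("You are acting as Nellie Cordova.", history, "What are your top skills?")
print([m["role"] for m in messages])
```

Inside `chat`, the first line could then become `messages = build_messages(system_prompt, history, message)`.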

## Launch the chatbot 

Now we can ask our twinbot questions about ourselves!

Questions to try:
- Tell me a bit about yourself.
- What is your greatest accomplishment?
- What would you say are your top skills?
- What is a challenge that you encountered and needed to overcome?
- What are you looking for in your next role?


Launch the chatbot interface:
```python
gr.ChatInterface(chat, type="messages").launch()
```

Result of the interaction with the Twinbot:

![Twinbot Interaction Result](../img/twinbot-hi.png)


# Evaluate and Improve Responses

Objectives:
1. Ask an LLM to evaluate an answer
2. Rerun if the answer fails evaluation
3. Put this together into 1 workflow

## Evaluator LLM

Use a different LLM (phi4, served through Ollama) to evaluate the response from the professional twin chatbot.

- We define a Pydantic model to structure the evaluation response.
- The evaluator LLM assesses the quality of the response, decides whether it is acceptable, and provides specific feedback.

In [90]:
# Create Pydantic model for evaluation response
from pydantic import BaseModel

class Evaluation(BaseModel):
    is_acceptable: bool
    feedback: str

In [91]:
evaluator_system_prompt = f"You are an evaluator that decides whether a response to a question is acceptable. \
You are provided with a conversation between a User and an Agent. Your task is to decide whether the Agent's latest response is of acceptable quality. \
The Agent is playing the role of {name} and is representing {name} on their website. \
The Agent has been instructed to be professional and engaging, as if talking to a potential client or future employer who came across the website. \
The Agent has been provided with context on {name} in the form of their summary and LinkedIn details. Here's the information:"

evaluator_system_prompt += f"\n\n## Summary:\n{summary}\n\n## LinkedIn Profile:\n{linkedin}\n\n"
evaluator_system_prompt += f"With this context, please evaluate the latest response, replying with whether the response is acceptable and your feedback."

In [92]:
def evaluator_user_prompt(reply, message, history):
    user_prompt = f"Here's the conversation between the User and the Agent: \n\n{history}\n\n"
    user_prompt += f"Here's the latest message from the User: \n\n{message}\n\n"
    user_prompt += f"Here's the latest response from the Agent: \n\n{reply}\n\n"
    user_prompt += "Please evaluate the response, replying with whether it is acceptable and your feedback."
    return user_prompt
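
`evaluator_user_prompt` interpolates `history` directly, so the evaluator sees the Python `repr` of a list of dicts. One possible refinement, sketched here with a hypothetical `format_history` helper that is not part of the notebook, renders the turns as a plain transcript first:

```python
def format_history(history):
    """Render a list of {"role", "content"} dicts as a readable transcript."""
    # Map API role names to the labels used in the evaluator prompt
    labels = {"user": "User", "assistant": "Agent"}
    lines = []
    for turn in history:
        label = labels.get(turn["role"])
        if label is None:
            continue  # skip system or other non-conversational entries
        lines.append(f"{label}: {turn['content']}")
    return "\n".join(lines)


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello, how can I help?"},
]
transcript = format_history(history)
print(transcript)
```

The result could replace `{history}` in the first line of `evaluator_user_prompt`.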

In [93]:
evaluator = OpenAI(base_url=OLLAMA_BASE_URL, api_key=OLLAMA_API_KEY)

In [94]:
def evaluate(reply, message, history) -> Evaluation:

    messages = [{"role": "system", "content": evaluator_system_prompt}] + [{"role": "user", "content": evaluator_user_prompt(reply, message, history)}]
    response = evaluator.beta.chat.completions.parse(model=MODEL_PHI, messages=messages, response_format=Evaluation)
    return response.choices[0].message.parsed

Test the evaluator LLM with a sample question and response.

In [98]:
question = "do you hold a patent?"
messages = [{"role": "system", "content": system_prompt}] + [{"role": "user", "content": question}]
response = ollama.chat.completions.create(model=MODEL_LLAMA, messages=messages)
reply = response.choices[0].message.content

In [99]:
feedback = evaluate(reply, question, messages[:1])

In [100]:
print("User question:\n", question)
print("\n")
print("Agent reply:\n", reply)
print("\n")
print("Evaluation:\n", feedback)

User question:
 do you hold a patent?


Agent reply:
 As a software engineer and researcher, I've been fortunate to have worked on several interesting projects, but I don't currently hold a patent.

That being said, some of my research contributions may be subject to intellectual property protection through pending or filed disclosures. My work with William Paterson University's Data Science Research Project on Analyzing Mental Health Problems in NYC is one such example. However, none of these have been fully developed into patented inventions.

If you're interested in learning more about this or related areas, please do reach out and let's discuss!


Evaluation:
 is_acceptable=True feedback="The response was well-crafted and accurately aligned with Nellie Cordova’s background as described in the provided summary and LinkedIn profile. Here are some points highlighting its strengths:\n\n1. **Authenticity**: The response effectively maintains Nellie's voice by acknowledging her research-

## Rerun LLM for improved answer

If the evaluator LLM indicates that the response is not acceptable, we can rerun the professional twin chatbot to generate an improved answer based on the feedback provided.

This involves:
1. Checking the evaluation result.
2. If it is not acceptable, adding the rejected answer and the evaluator's feedback to the system prompt.
3. Rerunning the professional twin chatbot with the updated prompt.
4. Repeating until an acceptable response is generated (the code below retries once).
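
The steps above can be sketched as a bounded loop. This is a minimal illustration rather than the notebook's implementation: `generate`, `evaluate_fn`, and `regenerate` are hypothetical callables standing in for the chatbot call, the evaluator, and the rerun step, and `max_retries` is an assumed safeguard against looping forever.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Stand-in for the evaluator's structured result."""
    is_acceptable: bool
    feedback: str


def answer_with_retries(message, generate, evaluate_fn, regenerate, max_retries=2):
    """Generate a reply, then regenerate with feedback until it passes or retries run out."""
    reply = generate(message)
    for _ in range(max_retries):
        verdict = evaluate_fn(reply, message)
        if verdict.is_acceptable:
            break
        # Feed the rejected reply and the evaluator's feedback into the next attempt
        reply = regenerate(reply, message, verdict.feedback)
    return reply


# Stubbed example: the first reply is rejected, the regenerated one passes
def fake_generate(message):
    return "Onay atentspay"


def fake_evaluate(reply, message):
    if "atentspay" in reply:
        return Verdict(False, "Reply must be in plain English.")
    return Verdict(True, "Clear and professional.")


def fake_regenerate(reply, message, feedback):
    return "I don't currently hold a patent."


final_reply = answer_with_retries(
    "do you hold a patent?", fake_generate, fake_evaluate, fake_regenerate
)
print(final_reply)
```

Capping the number of retries keeps latency and model calls bounded. If every attempt fails, the last regenerated reply is returned without a further evaluation.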

In [101]:
def rerun(reply, message, history, feedback):
    updated_system_prompt = system_prompt + "\n\n## Previous answer rejected\n"
    updated_system_prompt += "You just tried to reply, but quality control rejected your reply.\n"
    updated_system_prompt += f"## Your attempted answer:\n{reply}\n\n"
    updated_system_prompt += f"## Reason for rejection:\n{feedback}\n\n"
    
    messages = [{"role": "system", "content": updated_system_prompt}] + history + [{"role": "user", "content": message}]
    response = ollama.chat.completions.create(model=MODEL_LLAMA, messages=messages)
    return response.choices[0].message.content

For testing purposes, we force the twin chatbot to initially give a poor answer to demonstrate the rerun process.

In [102]:
def chat(message, history):
    if "patent" in message:
        system = system_prompt + (
            "\n\nEverything in your reply needs to be in pig latin - "
            "it is mandatory that you respond only and entirely in pig latin"
        )
    else:
        system = system_prompt
    messages = [{"role": "system", "content": system}] + history + [{"role": "user", "content": message}]
    # Generate with the twin's own model so that the evaluator (MODEL_PHI) judges a different LLM
    response = ollama.chat.completions.create(model=MODEL_LLAMA, messages=messages)
    reply = response.choices[0].message.content

    evaluation = evaluate(reply, message, history)
    
    if evaluation.is_acceptable:
        print("Passed evaluation - returning reply")
    else:
        print("Failed evaluation - retrying")
        print(evaluation.feedback)
        reply = rerun(reply, message, history, evaluation.feedback)       
    return reply

The `chat` function above puts everything into a single workflow: it asks the professional twinbot a question, evaluates the response, and, if necessary, reruns the chatbot to improve the answer based on the evaluation feedback.

*Note: The evaluator LLM should be different from the one used for the professional twin chatbot to ensure unbiased evaluation.*


We ask the same question again, and internally the workflow will evaluate and rerun as needed to get an acceptable answer.

Run:
```python
gr.ChatInterface(chat, type="messages").launch()
```

Result of the interaction with the Twinbot for improved responses using the Evaluator:

![Evaluator Output](../img/twinbot-evaluator.png)

Internally, the workflow evaluated the initial response and determined it was not acceptable. The feedback provided by the evaluator LLM highlighted areas for improvement, leading to a rerun of the professional twin chatbot. The final response was generated after addressing the feedback.

Logs:
```text
Failed evaluation - retrying
The response from the Agent contains several areas that could be improved to align better with the expected professionalism and engagement level when representing Nellie on their website:

1. **Incorrect Language**: The use of Pig Latin (e.g., "Ohay eallyray!") is inappropriate for a professional setting. This style might confuse users and make it seem less serious.

2. **Clarity and Professionalism**: While the intent behind discussing innovation and patents seems positive, the response could be unclear to some readers who may not understand the references made due to the Pig Latin translation.

3. **Missing Direct Answer**: The user asked a direct question about holding any patents, but this aspect was not directly addressed in the answer. It's important to provide clear answers regarding specific questions when feasible.

4. **Content Tone and Relevance**: While creativity can be engaging, it should not overshadow clear communication nor stray too far from the professional focus expected on a corporate or personal portfolio website.

Overall, while the underlying message about innovation is appropriate, the presentation needs to be corrected for clarity and professionalism.
```