## Expert Knowledge Worker

### A question answering agent that is an expert knowledge worker
### To be used by employees of **Insurellm**, an Insurance Tech company
### The agent needs to be accurate and the solution should be low cost.

This project will use RAG (Retrieval Augmented Generation) to ensure our question-answering assistant has high accuracy.

This first implementation will use **a simple, brute-force type of RAG**.

### Sidenote: Business applications of this week's projects

RAG is perhaps the most immediately applicable technique of anything that we cover in the course! In fact, there are commercial products that do precisely what we build this week: nuanced querying across large databases of information, such as company contracts or product specs. 

* **RAG gives you a quick-to-market, low cost mechanism for adapting an LLM to your business area.**

In [55]:
# imports

import os
import glob
from dotenv import load_dotenv
import gradio as gr
from openai import OpenAI

In [56]:
# price is a factor for our company, so we're going to use a low cost model

MODEL = "gpt-4o-mini"

In [57]:
# Load environment variables from a file called .env

load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
openai = OpenAI()

In [58]:
# With massive thanks to student Dr John S. for fixing a bug in the below for Windows users!

context = {}

# Collect every file (any name or extension) under knowledge-base/employees/ into the employees list
employees = glob.glob("knowledge-base/employees/*")
print(employees)  # e.g. ['knowledge-base/employees/Alex Chen.md', ...]

for employee in employees:
    # Split the path on spaces, take the last word ("Chen.md"),
    # and drop the 3-character ".md" extension to get the key "Chen"
    name = employee.split(' ')[-1][:-3]

    # Read the UTF-8 encoded file into a string
    with open(employee, "r", encoding="utf-8") as f:
        doc = f.read()

    # Store the document under the employee's last name
    context[name] = doc


['knowledge-base/employees/Alex Chen.md', 'knowledge-base/employees/Oliver Spencer.md', 'knowledge-base/employees/Emily Tran.md', 'knowledge-base/employees/Jordan Blake.md', 'knowledge-base/employees/Avery Lancaster.md', 'knowledge-base/employees/Maxine Thompson.md', 'knowledge-base/employees/Samantha Greene.md', 'knowledge-base/employees/Alex Thomson.md', 'knowledge-base/employees/Samuel Trenton.md', 'knowledge-base/employees/Alex Harper.md', 'knowledge-base/employees/Jordan K. Bishop.md', 'knowledge-base/employees/Emily Carter.md']


In [59]:
context.keys()

dict_keys(['Chen', 'Spencer', 'Tran', 'Blake', 'Lancaster', 'Thompson', 'Greene', 'Thomson', 'Trenton', 'Harper', 'Bishop', 'Carter'])
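
A side note on the filename-to-key step above: slicing off the last three characters assumes every file ends in `.md`. As a minimal sketch (not the notebook's method), `pathlib` can derive the same key without counting characters. The helper name `last_name_key` and its `windows` flag are made up for illustration; in a real notebook `Path(path).stem` would normally be enough.

```python
from pathlib import PurePosixPath, PureWindowsPath

def last_name_key(path_str, windows=False):
    """Return the last word of the file stem, e.g. 'Chen' for '.../Alex Chen.md'."""
    # The Pure* classes parse paths without touching the file system,
    # so this sketch behaves the same on any operating system.
    path_cls = PureWindowsPath if windows else PurePosixPath
    stem = path_cls(path_str).stem   # file name without its extension
    return stem.split(" ")[-1]       # last space-separated word

print(last_name_key("knowledge-base/employees/Alex Chen.md"))
print(last_name_key("knowledge-base\\employees\\Jordan K. Bishop.md", windows=True))
```
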

In [60]:
context["Chen"]

'# HR Record\n\n# Alex Chen\n\n## Summary\n- **Date of Birth:** March 15, 1990  \n- **Job Title:** Backend Software Engineer  \n- **Location:** San Francisco, California  \n\n## Insurellm Career Progression\n- **April 2020:** Joined Insurellm as a Junior Backend Developer. Focused on building APIs to enhance customer data security.\n- **October 2021:** Promoted to Backend Software Engineer. Took on leadership for a key project developing a microservices architecture to support the company\'s growing platform.\n- **March 2023:** Awarded the title of Senior Backend Software Engineer due to exemplary performance in scaling backend services, reducing downtime by 30% over six months.\n\n## Annual Performance History\n- **2020:**  \n  - Completed onboarding successfully.  \n  - Met expectations in delivering project milestones.  \n  - Received positive feedback from the team leads.\n\n- **2021:**  \n  - Achieved a 95% success rate in project delivery timelines.  \n  - Awarded "Rising Star" a

In [61]:
products = glob.glob("knowledge-base/products/*")

for product in products:
    # Take the file name after the last path separator and drop the ".md" extension
    name = product.split(os.sep)[-1][:-3]
    with open(product, "r", encoding="utf-8") as f:
        doc = f.read()
    context[name] = doc

In [62]:
context.keys()

dict_keys(['Chen', 'Spencer', 'Tran', 'Blake', 'Lancaster', 'Thompson', 'Greene', 'Thomson', 'Trenton', 'Harper', 'Bishop', 'Carter', 'Rellm', 'Markellm', 'Homellm', 'Carllm'])

## A system message like the following is effective at preventing hallucination

In [63]:
system_message = "You are an expert in answering accurate questions about Insurellm, the Insurance Tech company. Give brief, accurate answers. If you don't know the answer, say so. Do not make anything up if you haven't been provided with relevant context."

In [64]:
# Collect every document whose title appears in the message; these are later added to the user prompt so the LLM can give a more accurate response
def get_relevant_context(message):
    relevant_context = []
    for context_title, context_details in context.items():
        if context_title.lower() in message.lower():
            relevant_context.append(context_details)
    return relevant_context          

In [65]:
get_relevant_context("Who is cat")

[]

In [66]:
get_relevant_context("Who is lancaster?")

["# Avery Lancaster\n\n## Summary\n- **Date of Birth**: March 15, 1985  \n- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  \n- **Location**: San Francisco, California  \n\n## Insurellm Career Progression\n- **2015 - Present**: Co-Founder & CEO  \n  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  \n\n- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  \n  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  \n\n- **2010 - 2013**: Business Analyst at Edge Analytics  \n  Prior to joining Innovate, Avery worked as a Business Analyst, focusing on market trends and consumer preferen

In [67]:
get_relevant_context("Who is Avery and what is carllm?")

['# Product Summary\n\n# Carllm\n\n## Summary\n\nCarllm is an innovative auto insurance product developed by Insurellm, designed to streamline the way insurance companies offer coverage to their customers. Powered by cutting-edge artificial intelligence, Carllm utilizes advanced algorithms to deliver personalized auto insurance solutions, ensuring optimal coverage while minimizing costs. With a robust infrastructure that supports both B2B and B2C customers, Carllm redefines the auto insurance landscape and empowers insurance providers to enhance customer satisfaction and retention.\n\n## Features\n\n- **AI-Powered Risk Assessment**: Carllm leverages artificial intelligence to analyze driver behavior, vehicle conditions, and historical claims data. This enables insurers to make informed decisions and set competitive premiums that reflect true risk profiles.\n\n- **Instant Quoting**: With Carllm, insurance companies can offer near-instant quotes to customers, enhancing the customer exper
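
Because the lookup above is a plain substring test, a short key such as `Tran` would also match inside an unrelated word like "transfer". One possible refinement, shown here only as a sketch, is to require a whole-word match with a regular expression. The function name `get_relevant_context_words` and the `toy_context` dictionary are made up for illustration and are not part of the notebook.

```python
import re

def get_relevant_context_words(message, context):
    """Return documents whose title appears in the message as a whole word."""
    relevant = []
    for title, details in context.items():
        # \b marks a word boundary; re.escape guards against special characters in titles
        pattern = r"\b" + re.escape(title) + r"\b"
        if re.search(pattern, message, flags=re.IGNORECASE):
            relevant.append(details)
    return relevant

# Toy stand-in for the real context dictionary
toy_context = {"Tran": "Emily Tran record", "Carllm": "Carllm summary"}

print(get_relevant_context_words("How do I transfer a policy?", toy_context))
print(get_relevant_context_words("What is carllm?", toy_context))
```
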

In [68]:
def add_context(message):
    relevant_context = get_relevant_context(message)
    # print(relevant_context)
    if relevant_context:
        # Option 1: append each document in a loop
        # (relevant_context is a list, so it can't be concatenated to the message string directly)
        # message += "\n\nThe following additional context might be relevant in answering this question:\n\n"
        # for relevant in relevant_context:
        #     message += relevant + "\n\n"

        # Option 2: join the list of strings into a single string, then append it to the message
        message += "\n\nThe following additional context might be relevant in answering this question:\n\n"
        message += "\n\n".join(relevant_context)

    return message

In [69]:
print(add_context("Who is Alex Lancaster?"))

Who is Alex Lancaster?

The following additional context might be relevant in answering this question:

# Avery Lancaster

## Summary
- **Date of Birth**: March 15, 1985  
- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  
- **Location**: San Francisco, California  

## Insurellm Career Progression
- **2015 - Present**: Co-Founder & CEO  
  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  

- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  
  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  

- **2010 - 2013**: Business Analyst at Edge Analytics  
  Prior to joining Innova
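
The two options in `add_context` build almost the same prompt. The sketch below uses toy documents and made-up helper names (`add_context_loop`, `add_context_join`) to show the one difference: the loop version leaves a trailing blank line after the last document, and the join version does not.

```python
HEADER = "\n\nThe following additional context might be relevant in answering this question:\n\n"

def add_context_loop(message, docs):
    """Option 1: append each document followed by a blank line."""
    if docs:
        message += HEADER
        for doc in docs:
            message += doc + "\n\n"
    return message

def add_context_join(message, docs):
    """Option 2: join the documents with blank lines, then append once."""
    if docs:
        message += HEADER + "\n\n".join(docs)
    return message

docs = ["doc A", "doc B"]  # toy documents
loop_result = add_context_loop("Q?", docs)
join_result = add_context_join("Q?", docs)

# The loop version equals the join version plus one trailing "\n\n"
print(loop_result == join_result + "\n\n")
# With no matching documents the message is returned unchanged
print(add_context_join("Q?", []))
```
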

In [70]:
def chat(message, history):
    messages = [{"role": "system", "content": system_message}] + history
    message = add_context(message)
    messages.append({"role": "user", "content": message})

    stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)

    print("history:", history)
    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        yield response  # Gradio captures each yield, updates the UI, and saves the last one as the final assistant message.
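
To see what the streaming loop in `chat` yields without calling the API, here is a minimal sketch that replaces the OpenAI stream with stand-in objects. `fake_stream` and `accumulate` are hypothetical helpers; the `SimpleNamespace` chunks only imitate the `chunk.choices[0].delta.content` shape used above and are not real client objects.

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Yield stand-in chunks shaped like chunk.choices[0].delta.content."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def accumulate(stream):
    """Same accumulation pattern as chat(): yield the growing response each step."""
    response = ""
    for chunk in stream:
        # content can be None (assumed here for the final chunk), hence the `or ""`
        response += chunk.choices[0].delta.content or ""
        yield response

partials = list(accumulate(fake_stream(["Hello", ", ", "world", None])))
print(partials)
```

Each yielded value is the full response so far, not just the newest piece, which is why the Gradio UI can simply display the latest value.
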

## Now we will bring this up in Gradio using the Chat interface

A quick and easy way to prototype a chat with an LLM

In [71]:
view = gr.ChatInterface(chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7864
* To create a public link, set `share=True` in `launch()`.


history: []
history: [{'role': 'user', 'metadata': None, 'content': 'hi there', 'options': None}, {'role': 'assistant', 'metadata': None, 'content': 'Hello! How can I assist you today?', 'options': None}]
history: [{'role': 'user', 'metadata': None, 'content': 'hi there', 'options': None}, {'role': 'assistant', 'metadata': None, 'content': 'Hello! How can I assist you today?', 'options': None}, {'role': 'user', 'metadata': None, 'content': 'what is Homellm', 'options': None}, {'role': 'assistant', 'metadata': None, 'content': 'Homellm is an innovative home insurance product developed by Insurellm that uses advanced AI technology to enhance coverage for homeowners. It serves both B2B and B2C segments and allows insurers to provide personalized, data-driven policies. Key features include AI-powered risk assessment, dynamic pricing models, instant claim processing, predictive maintenance alerts, multi-channel integration, and a customer portal. Pricing varies based on the size of the insu
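
The logged history shows that Gradio adds `metadata` and `options` keys to each turn, and `chat` passes them through unchanged, which worked in this run. If a model API turned out to be stricter about message fields (an assumption, not something shown in this notebook), a small helper could reduce each turn to `role` and `content` before sending. `strip_history` is a hypothetical name.

```python
def strip_history(history):
    """Keep only the 'role' and 'content' keys of each history entry."""
    return [{"role": turn["role"], "content": turn["content"]} for turn in history]

# Sample turns copied from the logged history above
sample_history = [
    {"role": "user", "metadata": None, "content": "hi there", "options": None},
    {"role": "assistant", "metadata": None, "content": "Hello! How can I assist you today?", "options": None},
]

print(strip_history(sample_history))
```
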