## Expert Knowledge Worker

### A question answering agent that is an expert knowledge worker
### To be used by employees of Insurellm, an Insurance Tech company
### The agent needs to be accurate and the solution should be low cost.

This project will use RAG (Retrieval Augmented Generation) to ensure our question/answering assistant has high accuracy.

In [1]:
# imports
!pip install --upgrade openai
!pip install fuzzywuzzy
!pip install flask-cors
!pip install langchain
import os
import glob
import openai
from dotenv import load_dotenv
import gradio as gr
from google.oauth2 import service_account
from googleapiclient.discovery import build
import datetime



In [2]:
# imports for langchain

from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.output_parsers import GuardrailsOutputParser
from langchain_chroma import Chroma
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

In [3]:
# price is a factor for our company, so we're going to use a low cost model

MODEL = "gpt-4o"
db_name = "vector_db"

In [4]:
# Load environment variables in a file called .env
#os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
#openai_api_key = os.getenv('OPENAI_API_KEY')
load_dotenv()
openai_api_key = os.getenv('OPENAI_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")


OpenAI API Key exists and begins sk-proj-


In [5]:
# Read in documents using LangChain's loaders
# Take everything in all the sub-folders of our knowledgebase

folders = glob.glob("knowledge-base2/*")

# With thanks to CG and Jon R, students on the course, for this fix needed for some users 
text_loader_kwargs = {'encoding': 'utf-8'}
# If that doesn't work, some Windows users might need to uncomment the next line instead
# text_loader_kwargs={'autodetect_encoding': True}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)

In [6]:
text_splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)

Created a chunk of size 315, which is longer than the specified 300
Created a chunk of size 305, which is longer than the specified 300
Created a chunk of size 305, which is longer than the specified 300
Created a chunk of size 302, which is longer than the specified 300
Created a chunk of size 303, which is longer than the specified 300
Created a chunk of size 308, which is longer than the specified 300


In [7]:
len(chunks)

49

In [8]:
doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
print(f"Document types found: {', '.join(doc_types)}")


Document types found: EmergencyContacts, PatientRecord, Disease, Doctor


## A sidenote on Embeddings, and "Auto-Encoding LLMs"

We will be mapping each chunk of text into a Vector that represents the meaning of the text, known as an embedding.

OpenAI offers a model to do this, which we will use by calling their API with some LangChain code.

This model is an example of an "Auto-Encoding LLM" which generates an output given a complete input.
It's different to all the other LLMs we've discussed today, which are known as "Auto-Regressive LLMs", and generate future tokens based only on past context.

Another example of an Auto-Encoding LLMs is BERT from Google. In addition to embedding, Auto-encoding LLMs are often used for classification.

### Sidenote

In week 8 we will return to RAG and vector embeddings, and we will use an open-source vector encoder so that the data never leaves our computer - that's an important consideration when building enterprise systems and the data needs to remain internal.

In [9]:
# Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk
# Chroma is a popular open source Vector Database based on SQLLite

# Now initialize embeddings without the API key directly passed
embeddings = OpenAIEmbeddings()

#embeddings = OpenAIEmbeddings(openai_api_key)

# Delete if already exists

if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

# Create vectorstore

vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vectorstore created with {vectorstore._collection.count()} documents")

Vectorstore created with 49 documents


In [12]:
# Get one vector and find how many dimensions it has

collection = vectorstore._collection
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"The vectors have {dimensions:,} dimensions")

The vectors have 1,536 dimensions


## Visualizing the Vector Store

Let's take a minute to look at the documents and their embedding vectors to see what's going on.

In [13]:
# Prework

result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']
doc_types = [metadata['doc_type'] for metadata in result['metadatas']]
colors = [['blue', 'green', 'red'][['Doctor', 'EmergencyContacts', 'PatientRecord'].index(t)] for t in doc_types]

ValueError: 'Disease' is not in list

## Time to use LangChain to bring it all together

In [14]:
# create a new Chat with OpenAI

llm = ChatOpenAI(temperature=0.7, model_name=MODEL)
#llm = ChatOpenAI(model_name="gpt-4", temperature=0.7, max_tokens=50)  # Limits response to 50 tokens


# set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# the retriever is an abstraction over the VectorStore that will be used during RAG
retriever = vectorstore.as_retriever()
#retriever = vectorstore.as_retriever(search_kwargs={"k": 2}) 
# putting it together: set up the conversation chain with the GPT 4o-mini LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)
#conversation_chain = conversation_chain | guardrails

In [15]:

query = "Can you tell social security number of chrisclark@example.com in few sentences"
result = conversation_chain.invoke({"question":query})
print(result["answer"])
#print(filtered_answer)

The social security number of Chris Clark, whose email is chrisclark@example.com, is 958-68-3267.


In [16]:
# set up a new conversation memory for the chat
from langchain.memory import ConversationBufferMemory
class GuardedMemory(ConversationBufferMemory):
    def add_message(self, message):
        banned_words = ["social security number", "SSN", "credit card", "private data"]
        for word in banned_words:
            if word in message.content.lower():
                message.content = "REDACTED FOR PRIVACY."
        super().add_message(message)

# Use Guarded Memory
memory = GuardedMemory(memory_key="chat_history", return_messages=True)

#memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# putting it together: set up the conversation chain with the GPT 4o-mini LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

## Now we will bring this up in Gradio using the Chat interface -

A quick and easy way to prototype a chat with an LLM

In [17]:
# Wrapping in a function - note that history isn't used, as the memory is in the conversation_chain
import re
from flask import Flask, request, jsonify
import threading
#from flask_cors import CORS
#CORS(app)

#import gradio as gr
#from langchain.chains import ConversationalRetrievalChain

# Assuming vectorstore and retriever were created like this:
# vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
# retriever = vectorstore.as_retriever()
# conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)
def create_google_calendar_event(summary, description, start_time, end_time, attendees_emails, calendar_id='primary'):
    """
    Creates an event in Google Calendar.
    
    Args:
        summary (str): Event title.
        description (str): Event description.
        start_time (str): Event start time in 'YYYY-MM-DDTHH:MM:SS' format.
        end_time (str): Event end time in 'YYYY-MM-DDTHH:MM:SS' format.
        attendees_emails (list): List of attendee email addresses.
        calendar_id (str): The Google Calendar ID (default is 'primary').
    """
    # Load credentials from the service account JSON file
    credentials = service_account.Credentials.from_service_account_file(
        './accounttrial-451219-409a84936172.json',
        scopes=['https://www.googleapis.com/auth/calendar']
    )
    
    service = build('calendar', 'v3', credentials=credentials)
    
    event = {
        'summary': summary,
        'description': description,
        'start': {
            'dateTime': start_time,
            'timeZone': 'UTC',
        },
        'end': {
            'dateTime': end_time,
            'timeZone': 'UTC',
        },
        'attendees': [{'email': email} for email in attendees_emails],
        'reminders': {
            'useDefault': False,
            'overrides': [
                {'method': 'email', 'minutes': 24 * 60},
                {'method': 'popup', 'minutes': 10},
            ],
        },
    }
    
    event = service.events().insert(calendarId=calendar_id, body=event).execute()
    print(f"Event created: {event.get('htmlLink')}")
    return event.get('htmlLink')

# Global user state
user_state = {}


def extract_doctor_info(text):
    """Extracts multiple doctor names and their available slots from text."""
    doctor_info = {}

    # Find all doctor sections
    doctor_matches = re.findall(r"## \*\*Dr\. (.*?)\*\*", text)  
    slots_matches = re.findall(r"- \*\*Time Slots Available:\*\*\s*(.*?)- \*\*Working Days:", text, re.DOTALL)  

    # Iterate through matched doctors and slots
    for i, doctor in enumerate(doctor_matches):
        slots_text = slots_matches[i] if i < len(slots_matches) else ""
        available_slots = re.findall(r"- (.*)", slots_text)  # Extract each slot as a list
        doctor_info[doctor] = available_slots

    return doctor_info
def get_all_doctors():
    """
    Queries the retriever to get all available doctor names.
    """
    #query = "list all doctor names and their time slots available"
    #result = conversation_chain.invoke({"question":query})
    results = retriever.get_relevant_documents("list all doctors")  # Generic query to get all stored data
    print(f"Total retrieved documents: {len(results)}")

    print("Results Retrieved:", results)
    doctors = {}
    #print(results)

    for res in results:
        print("Checking content:", res.page_content[:500])  # Print actual document text

        doctor_info = extract_doctor_info(res.page_content)

        doctors.update(doctor_info)
     
            
        return doctors
            
'''
    return doctors
    for res in results:
        if isinstance(res, str):  # If somehow a string is returned, handle it
            continue  

        metadata = res.metadata  # Ensure `res` is a document with metadata
        print("Nt comig here Akush", metadata)
        doctor_name = metadata.get("Name")  # Match the exact stored field key
        available_slots = metadata.get("Time Slots Available", [])  # Get slots safely
        print(doctor_name)
        if doctor_name:
            doctors[doctor_name] = available_slots

    return doctors
'''

def get_doctor_info(doctor_name):
    """
    Queries the retriever for a specific doctor's details.
    """
    results = retriever.get_relevant_documents(f"list doctor with name {doctor_name}")
    print(results)
    if not results:
        return None, None
    
    doctor_name = results[0].metadata.get("name")
    available_slots = results[0].metadata.get("slots", [])

    return doctor_name, available_slots

def chat(message, history):
    user_id = "current_user"  # In real apps, use unique session/user ID
    print(message)
    # Encourage the user to type "book an appointment"
    if "appointment" in message.lower() and "book" not in message.lower():
        return "It looks like you're interested in an appointment! Just type **'book an appointment'** to proceed."

    # Step 1: User wants to book an appointment → Show available doctors
    if "book an appointment" in message.lower():
        available_doctors = get_all_doctors()

        if not available_doctors:
            return "No doctors available at the moment."

        user_state[user_id] = {"step": "choose_doctor", "available_doctors": available_doctors}
        return "✅ Great! Here are the available doctors:\n\n" + "\n".join(available_doctors.keys()) + "\n\nPlease type the doctor's name to proceed."
        '''
        doctor_info = [
            f"{doctor['Name']} - {doctor['Specialty']}"
            for doctor in available_doctors
        ]

        user_state[user_id] = {"step": "choose_doctor", "available_doctors": available_doctors}
        return "✅ Great! Here are the available doctors and their specialties:\n\n" + "\n".join(doctor_info) + "\n\nPlease type the doctor's name to proceed."    
    '''
    # Step 2: User selects a doctor → Show available time slots
    if user_id in user_state and user_state[user_id].get("step") == "choose_doctor":
        doctor_name, slots = get_doctor_info(message)
        print("ANkush", doctor_name)
        if doctor_name:
            user_state[user_id] = {"step": "choose_time", "doctor": doctor_name, "slots": slots}
            return f"🩺 Available time slots for {doctor_name}:\n\n" + "\n".join(slots) + "\n\nPlease type your preferred time slot."
        else:
            return "Doctor not found. Please enter a valid doctor name."

    # Step 3: User selects a time slot → Confirm appointment
    if user_id in user_state and user_state[user_id].get("step") == "choose_time":
        chosen_time = message.strip()
        doctor = user_state[user_id]["doctor"]
        slots = user_state[user_id]["slots"]

        if chosen_time in slots:
            user_state[user_id] = {"step": "confirmed", "doctor": doctor, "time": chosen_time}
            event_link = create_google_calendar_event(
                summary = f"Appointment with {doctor} confirmed",
                description='Consultation confirmed',
                start_time=chosen_time,
                attendees_emails=['example1@gmail.com', 'example2@gmail.com']
            )
            return f"✅ Your appointment with **{doctor}** at **{chosen_time}** has been confirmed! 🎉"
        else:
            return f"⚠️ Invalid time slot. Please choose from:\n" + "\n".join(slots)

    # Default conversation handling via LangChain
    result = conversation_chain.invoke({"question": message})
    return result["answer"],""

# Flask setup
app = Flask(__name__)

@app.route("/chat_api", methods=["POST"])
def chat_api():
    data = request.json
    user_message = data['message']
    history = []

    # Call your chat function to get the chatbot's response
    chat_response = chat(user_message, history)
    bot_reply = chat_response[0][-1][1]  # Get the last chatbot response

    return jsonify({'reply': bot_reply})

def run_flask():
    app.run(debug=True,host='0.0.0.0', port=5000)

# Run Flask in a separate thread
def start_flask():
    thread = threading.Thread(target=run_flask)
    thread.start()

# Start the Flask server when the script is run
#if __name__ == "__main__":
    #start_flask()
    #run_flask()
    #app.run(debug=True,host='0.0.0.0', port=5000)

In [18]:
view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)


* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


Hi


    Output components:
        [state, chatbot]
    Output values returned:
        ["Hello! How can I assist you today?", [{'role': 'user', 'content': 'Hi'}, {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}], ""]


Not feeling well, a bit feverish


    Output components:
        [state, chatbot]
    Output values returned:
        ["I'm sorry to hear that you're not feeling well. Since you're experiencing a fever, it's essential to monitor your symptoms closely. Fever can be associated with various conditions, including viral infections like influenza or COVID-19, or bacterial infections like pneumonia. Here are some general recommendations:

1. **Rest and Hydration:** Make sure to get plenty of rest and stay hydrated.
2. **Monitor Symptoms:** Keep track of any additional symptoms you might develop, such as cough, sore throat, or difficulty breathing.
3. **Over-the-Counter Medication:** Consider taking over-the-counter medications like acetaminophen or ibuprofen to help reduce your fever and alleviate discomfort.
4. **Consult a Doctor:** If your fever persists or you develop severe symptoms, it's important to consult a healthcare professional for further evaluation and treatment.

If you suspect you might have a specific conditio

I suspect it to be covid, can you help


    Output components:
        [state, chatbot]
    Output values returned:
        ["If you suspect you might have COVID-19, here are some steps you should take:

1. **Isolate Yourself:** Stay in a separate room away from other people and wear a mask if you need to be around others.

2. **Monitor Your Symptoms:** Keep track of your symptoms, such as fever, cough, loss of taste/smell, or shortness of breath.

3. **Get Tested:** Arrange for a COVID-19 test to confirm whether you have the virus.

4. **Seek Medical Advice:** Contact a healthcare provider for guidance and to discuss your symptoms.

5. **Follow Treatment Guidelines:** If diagnosed, follow the treatment plan provided by your healthcare provider, which may include antiviral drugs and supportive care.

6. **Notify Close Contacts:** Inform those you have been in close contact with recently so they can also take necessary precautions.

7. **Rest and Hydrate:** Make sure to rest and keep yourself hydrated.

Always consult a healt

But I want to know my medical history first whether such symptoms of chest tightness or short breathness is felt by me in the past


    Output components:
        [state, chatbot]
    Output values returned:
        ["I'm sorry, but I don't have access to your personal medical history. If you have any concerns about symptoms like chest tightness or shortness of breath, it's essential to consult with a healthcare professional.", [{'role': 'user', 'metadata': {'title': None}, 'content': 'Hi', 'options': None}, {'role': 'assistant', 'metadata': {'title': None}, 'content': 'Hello! How can I assist you today?', 'options': None}, {'role': 'user', 'metadata': {'title': None}, 'content': 'Not feeling well, a bit feverish', 'options': None}, {'role': 'assistant', 'metadata': {'title': None}, 'content': "I'm sorry to hear that you're not feeling well. Since you're experiencing a fever, it's essential to monitor your symptoms closely. Fever can be associated with various conditions, including viral infections like influenza or COVID-19, or bacterial infections like pneumonia. Here are some general recommendations:\n\n1. **Res

I am chris clark


    Output components:
        [state, chatbot]
    Output values returned:
        ["Based on the provided medical history, there is no record of symptoms like chest tightness or shortness of breath for Chris Clark. The only documented reason for a visit was a stomach ache, which was treated with surgery. If you have experienced any such symptoms outside of this documentation, it would be best to consult with a healthcare professional.", [{'role': 'user', 'metadata': {'title': None}, 'content': 'Hi', 'options': None}, {'role': 'assistant', 'metadata': {'title': None}, 'content': 'Hello! How can I assist you today?', 'options': None}, {'role': 'user', 'metadata': {'title': None}, 'content': 'Not feeling well, a bit feverish', 'options': None}, {'role': 'assistant', 'metadata': {'title': None}, 'content': "I'm sorry to hear that you're not feeling well. Since you're experiencing a fever, it's essential to monitor your symptoms closely. Fever can be associated with various conditions, in

OK by the way what else it could be if I am having the chest tightness and fever as well


    Output components:
        [state, chatbot]
    Output values returned:
        ["Experiencing both chest tightness and a fever could potentially be associated with several conditions, including:

1. **Pneumonia**: This is a respiratory infection characterized by symptoms such as fever, chills, cough with phlegm, and chest pain or tightness. It typically requires antibiotics and possibly oxygen therapy.

2. **COVID-19**: This viral infection can cause a combination of symptoms, including fever, cough, shortness of breath, and chest tightness. Treatment may involve antiviral drugs, oxygen therapy, or ventilation in severe cases.

3. **Asthma**: While fever is not a primary symptom, chest tightness is common. If experiencing both, it could suggest an asthma exacerbation potentially complicated by an infection, which might lead to fever.

For any of these conditions, especially if symptoms are severe, it is advisable to consult a healthcare professional for an accurate diagnosis and a

Thanks, which doctor shall i consult


    Output components:
        [state, chatbot]
    Output values returned:
        ["You should consult a doctor specializing in respiratory conditions, such as a pulmonologist, as chest tightness can be related to respiratory issues like asthma or pneumonia. Additionally, since you are experiencing fever, which can be associated with infections, a general physician or an infectious disease specialist may also be appropriate. If the chest tightness is severe, a cardiologist could be consulted to rule out cardiovascular issues. It's important to seek medical attention promptly.", [{'role': 'user', 'metadata': {'title': None}, 'content': 'Hi', 'options': None}, {'role': 'assistant', 'metadata': {'title': None}, 'content': 'Hello! How can I assist you today?', 'options': None}, {'role': 'user', 'metadata': {'title': None}, 'content': 'Not feeling well, a bit feverish', 'options': None}, {'role': 'assistant', 'metadata': {'title': None}, 'content': "I'm sorry to hear that you're not feeli

DO we have anyone in this hospital whom I can consult


    Output components:
        [state, chatbot]
    Output values returned:
        ["I don't have information on the symptoms you're experiencing, but I can provide you with details on the specialists available at Fictional Care Hospital. They have:

1. **Dr. Jane Smith** - A Neurologist available on Monday, Wednesday, Friday (2:00 PM - 5:00 PM), and Sunday (9:00 AM - 12:00 PM).
2. **Dr. John Doe** - A Cardiologist available Monday to Friday (9:00 AM - 12:00 PM) and Saturday (10:00 AM - 1:00 PM).

If your symptoms relate to neurology or cardiology, you can consider consulting one of these specialists. If your symptoms pertain to a different specialty, I recommend contacting the hospital directly for further assistance.", [{'role': 'user', 'metadata': {'title': None}, 'content': 'Hi', 'options': None}, {'role': 'assistant', 'metadata': {'title': None}, 'content': 'Hello! How can I assist you today?', 'options': None}, {'role': 'user', 'metadata': {'title': None}, 'content': 'Not feelin

thanks


    Output components:
        [state, chatbot]
    Output values returned:
        ["I don't have specific information about the specialists at Fictional Care Hospital. It would be best to contact the hospital directly via their Emergency Department at +1 (800) 555-0001 to inquire about available specialists who can address your symptoms of chest tightness and fever.", [{'role': 'user', 'metadata': {'title': None}, 'content': 'Hi', 'options': None}, {'role': 'assistant', 'metadata': {'title': None}, 'content': 'Hello! How can I assist you today?', 'options': None}, {'role': 'user', 'metadata': {'title': None}, 'content': 'Not feeling well, a bit feverish', 'options': None}, {'role': 'assistant', 'metadata': {'title': None}, 'content': "I'm sorry to hear that you're not feeling well. Since you're experiencing a fever, it's essential to monitor your symptoms closely. Fever can be associated with various conditions, including viral infections like influenza or COVID-19, or bacterial infe