## Expert Knowledge Worker

* **A question-answering agent** that is an expert knowledge worker
* **To be used by employees of Insurellm**, an Insurance Tech company
* The **agent needs to be accurate** and the **solution should be low cost**.

This project will use **RAG (Retrieval-Augmented Generation)** so that our question-answering assistant answers from Insurellm's own documents, which is key to high accuracy.

### Sidenote: Business applications of this week's projects

**RAG is perhaps the most immediately applicable technique of anything that we cover in the course!** In fact, there are commercial products that do precisely what we build this week: nuanced querying across large databases of information, such as company contracts or product specs. RAG gives you a quick-to-market, low-cost mechanism for adapting an LLM to your business area.

### Imports for LangChain, Chroma and Plotly

```python
import os
from glob import glob
from dotenv import load_dotenv
import gradio as gr

# LangChain components
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain.chains import ConversationalRetrievalChain
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.embeddings import HuggingFaceEmbeddings

# Numerical and plotting libraries
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import plotly.io as pio
pio.renderers.default = "notebook"    # I'm using my machine, not Colab
```

### Load environment variables

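```python
load_dotenv(override=True)
# Only the OpenAI key is used below; os.getenv returns None (instead of raising KeyError) if a key is absent
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
huggingface_api_key = os.getenv('HUGGINGFACE_API_KEY')
```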

```python
MODEL = "gpt-4o-mini"
db_name = "vector_db"
```

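## Read in documents using LangChain's loaders and add metadata to each document
Load every Markdown file in each sub-folder of our knowledge base, and record the sub-folder name (e.g. `products`) as the document's `doc_type`. The chunks created later inherit this metadata.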

```python
folders = glob("knowledge-base/*")

def add_metadata(doc, doc_type):
    # Tag each document with the sub-folder it came from
    doc.metadata["doc_type"] = doc_type
    return doc

text_loader_kwargs = {'encoding': 'utf-8'}
documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    # Recursively load every .md file in this folder as plain text
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    documents.extend([add_metadata(doc, doc_type) for doc in folder_docs])
```
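To make the loading step concrete, here is a stdlib-only sketch of roughly what the loop above produces, assuming the `knowledge-base/<doc_type>/**/*.md` layout. It returns plain `(text, metadata)` tuples rather than LangChain `Document` objects, and `load_knowledge_base` is a hypothetical helper, not part of LangChain.

```python
from pathlib import Path
import tempfile


def load_knowledge_base(root):
    """Hypothetical helper: return (text, metadata) pairs for every .md file
    under root/<doc_type>/, tagging each with its top-level folder name."""
    docs = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files directly under root
        doc_type = folder.name  # same idea as os.path.basename(folder)
        for path in sorted(folder.rglob("*.md")):  # recursive, like glob="**/*.md"
            text = path.read_text(encoding="utf-8")
            docs.append((text, {"source": str(path), "doc_type": doc_type}))
    return docs


# Usage: build a tiny throwaway knowledge base and load it
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "products").mkdir()
    (Path(root) / "products" / "carllm.md").write_text("# Carllm", encoding="utf-8")
    (Path(root) / "employees" / "team").mkdir(parents=True)
    (Path(root) / "employees" / "team" / "avery.md").write_text("# Avery", encoding="utf-8")
    loaded = load_knowledge_base(root)
    for text, meta in loaded:
        print(meta["doc_type"], "->", text)
```

Sorting the folders and files is only there to make the sketch deterministic; the notebook's loop processes folders in whatever order `glob` returns them.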

### Split the documents into chunks

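```python
# Split on blank lines (the default separator), aiming for ~1000-character chunks that overlap by 200 characters
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

print(f"Total number of chunks: {len(chunks)}")
print(f"Document types found: {set(doc.metadata['doc_type'] for doc in documents)}")
```

```
Created a chunk of size 1088, which is longer than the specified 1000
Total number of chunks: 123
Document types found: {'products', 'employees', 'contracts', 'company'}
```

The warning is expected: `CharacterTextSplitter` only cuts at its separator (a blank line, `"\n\n"`, by default), so a paragraph longer than `chunk_size` ends up as a single chunk that exceeds the limit.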
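The following is a simplified, hypothetical sketch of separator-based splitting with overlap, written to show how an oversized chunk like the 1,088-character one can arise. It follows the same general idea as `CharacterTextSplitter` (split on a separator, greedily merge pieces up to `chunk_size`, carry trailing pieces over as overlap), but it is not LangChain's implementation.

```python
def split_text(text, chunk_size, chunk_overlap, separator="\n\n"):
    """Hypothetical simplified splitter: split on `separator`, then greedily merge
    pieces into chunks of at most chunk_size characters where possible."""
    pieces = [p for p in text.split(separator) if p]

    def joined_len(parts):
        return len(separator.join(parts))

    chunks, current = [], []
    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Drop pieces from the front until what remains fits in the overlap
            # budget and leaves room for the next piece
            while current and (joined_len(current) > chunk_overlap
                               or joined_len(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# A piece longer than chunk_size cannot be cut, so it becomes an oversized chunk
demo_chunks = split_text("aaaa\n\nbbbb\n\n" + "c" * 12, chunk_size=10, chunk_overlap=4)
for chunk in demo_chunks:
    flag = "  (longer than chunk_size)" if len(chunk) > 10 else ""
    print(len(chunk), repr(chunk) + flag)

# With short pieces, the trailing piece of one chunk is repeated at the start of the next
demo_overlap = split_text("aaa\n\nbbb\n\nccc", chunk_size=8, chunk_overlap=3)
print(demo_overlap)
```

The overlap means a sentence near a chunk boundary appears in both neighbouring chunks, so retrieval is less likely to lose context that straddles a split.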


## A sidenote on Embeddings, and "Auto-Encoding LLMs"

* We will be mapping each chunk of text into a vector that represents the **meaning of the text**, known as an **embedding**.

* `OpenAI` offers a model to do this, which we will use by calling their API with some `LangChain` code.

* This model is an example of an **"Auto-Encoding LLM"**, which produces its output from the complete input at once.
  It's different from all the other LLMs we've discussed today, which are known as **"Auto-Regressive LLMs"** and generate **future tokens based only on past context**.

Another example of an **Auto-Encoding LLM** is `BERT` from Google. In addition to embedding, **auto-encoding LLMs are often used for classification**.
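Embeddings are useful because texts with similar meanings map to nearby vectors. A common way to measure this is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration (the OpenAI embeddings in this notebook have 1,536 dimensions), and the vector store itself may use a different distance function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings" (not produced by any real model)
car_insurance = [0.9, 0.1, 0.0]
auto_policy = [0.8, 0.2, 0.1]
employee_bio = [0.0, 0.1, 0.9]

print(round(cosine_similarity(car_insurance, auto_policy), 3))
print(round(cosine_similarity(car_insurance, employee_bio), 3))
```

Cosine similarity ignores vector length and compares only direction, which is why it is a popular choice for comparing embeddings.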

### Sidenote
In week 8 we will return to **RAG and vector embeddings**, and we will use an **open-source vector encoder** so that the **data never leaves our computer**. That's an important consideration when building enterprise systems whose data needs to remain internal.

### Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk
If you would rather use the **free Vector Embeddings from HuggingFace sentence-transformers**, then replace `embeddings = OpenAIEmbeddings()` with:

```python
# Runs locally; requires the sentence-transformers package
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
```

```python
embeddings = OpenAIEmbeddings()
```

### Create vectorstore
Check whether a **Chroma datastore** already exists; if so, delete its collection so we start from scratch.

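```python
# Delete any existing collection so re-running the notebook doesn't add duplicate chunks
if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

# Embed every chunk and store it, with its metadata, in a persistent Chroma collection
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vectorstore created with {vectorstore._collection.count()} documents")
```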

Vectorstore created with 123 documents
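Conceptually, `Chroma.from_documents` embeds each chunk and stores the vector next to the chunk's text and metadata; a query is then embedded the same way and compared against the stored vectors. Here is a toy, in-memory sketch of that idea. `ToyVectorStore` and `keyword_embed` are hypothetical stand-ins: the "embedding" just counts a few keywords, and Chroma actually uses an approximate nearest-neighbour (HNSW) index rather than this brute-force scan.

```python
import math


def keyword_embed(text, vocab=("car", "home", "employee", "contract")):
    """Hypothetical stand-in for an embedding model: counts of a few keywords."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class ToyVectorStore:
    """Brute-force, in-memory store of (vector, text, metadata) records."""

    def __init__(self, embed):
        self.embed = embed
        self.records = []

    def add(self, text, metadata):
        # Embed once at insertion time and keep text + metadata alongside the vector
        self.records.append((self.embed(text), text, metadata))

    def search(self, query, k=4):
        # Embed the query the same way, then rank every stored record by similarity
        query_vector = self.embed(query)
        ranked = sorted(self.records,
                        key=lambda record: cosine_similarity(query_vector, record[0]),
                        reverse=True)
        return [(text, metadata) for _, text, metadata in ranked[:k]]


store = ToyVectorStore(keyword_embed)
store.add("car insurance portal for car insurers", {"doc_type": "products"})
store.add("home insurance portal", {"doc_type": "products"})
store.add("employee handbook for every employee", {"doc_type": "employees"})

best_text, best_meta = store.search("which car product fits", k=1)[0]
print(best_text, best_meta)
```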


### Let's investigate the vectors
Get one vector and find how many dimensions it has.

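```python
# `_collection` is the underlying chromadb collection (a private attribute of the LangChain wrapper)
collection = vectorstore._collection
count = collection.count()

# Fetch a single stored embedding and measure its length
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"There are {count:,} vectors with {dimensions:,} dimensions in the vector store")
```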

There are 123 vectors with 1,536 dimensions in the vector store
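For a rough sense of scale, the raw vectors are small. The sketch below assumes each dimension is stored as a 4-byte (32-bit) float and ignores index, text and metadata overhead, so it estimates the vectors alone and is not a measurement of the `vector_db` folder.

```python
def raw_vector_bytes(count, dimensions, bytes_per_value=4):
    """Estimated raw size of `count` vectors, assuming 4-byte floats by default."""
    return count * dimensions * bytes_per_value


size = raw_vector_bytes(123, 1536)
print(f"{size:,} bytes (about {size / 1_000_000:.2f} MB)")  # 755,712 bytes (about 0.76 MB)
```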


## Time to use LangChain to bring it all together
### Create a new Chat with OpenAI
* set up the **conversation memory** for the chat
* create the **retriever**: this is an **abstraction over the VectorStore** that will be used during RAG
* putting it together: set up the **conversation chain with the `gpt-4o-mini` LLM, the vector store and memory**

### The newer approach recommended by LangChain
* Use `RunnableWithMessageHistory` for memory
* Directly pass the `retriever` and `LLM`; **memory is managed outside the chain**!
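At a high level, `ConversationalRetrievalChain` handles a follow-up question in steps: when there is chat history, it uses the LLM to rewrite the history plus the new question into a standalone question, retrieves chunks for that question, then asks the LLM to answer from those chunks. `RunnableWithMessageHistory` loads and saves the history around that call. The sketch below shows the flow with plain functions; `rag_turn`, `invoke_with_history` and the three stub steps are hypothetical stand-ins that replace the LLM and vector-store calls with simple string tricks.

```python
def rag_turn(question, history, rephrase, retrieve, answer):
    """One turn of conversational RAG. Reads the history but never modifies it."""
    standalone = rephrase(history, question) if history else question
    docs = retrieve(standalone)
    return answer(standalone, docs)


histories = {}  # session_id -> list of (role, text); memory lives outside rag_turn


def invoke_with_history(question, session_id, **steps):
    history = histories.setdefault(session_id, [])  # load (or create) this session's memory
    reply = rag_turn(question, history, **steps)
    history.append(("human", question))  # save the turn after the chain has run
    history.append(("ai", reply))
    return reply


# Hypothetical stubs standing in for the LLM and the retriever
def rephrase(history, question):
    last_human = [text for role, text in history if role == "human"][-1]
    return f"{last_human} {question}"  # a real LLM would write a proper standalone question


def retrieve(query):
    knowledge = {"carllm": "Carllm is a portal for auto insurance companies."}
    return [doc for key, doc in knowledge.items() if key in query.lower()]


def answer(question, docs):
    return " ".join(docs) if docs else "I don't know."


steps = dict(rephrase=rephrase, retrieve=retrieve, answer=answer)
first = invoke_with_history("What is Carllm?", "s1", **steps)
follow_up = invoke_with_history("Who is it for?", "s1", **steps)  # rewritten using history
no_memory = invoke_with_history("Who is it for?", "s2", **steps)  # fresh session, no history
print(first, follow_up, no_memory, sep="\n")
```

The follow-up only finds the right chunk because the history turns "Who is it for?" into a question that mentions Carllm; the fresh session has nothing to go on.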

### Set up memory and create the chain with memory

```python
# Define LLM, retriever and conversation chain
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)
retriever = vectorstore.as_retriever()  # returns the 4 most similar chunks by default
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)

# Create Message History Factory: one ChatMessageHistory per session id
message_histories = {}
def get_message_history(session_id):
    if session_id not in message_histories:
        message_histories[session_id] = ChatMessageHistory()
    return message_histories[session_id]

# Wrap Chain with Memory
chain_with_history = RunnableWithMessageHistory(
    conversation_chain,
    get_message_history,
    input_messages_key="question",
    history_messages_key="chat_history",
    output_messages_key="answer",  # the chain returns a dict; save its "answer" to the history
)
```

### Use the chain in RAG
Query your documents and print the result

```python
session_id = "user-session-id"
config = {"configurable": {"session_id": session_id}}
```

```python
query = 'Can you describe Insurellm in a few sentences'
result = chain_with_history.invoke({"question": query}, config=config)
print(result['answer'])
```

Insurellm is an innovative insurance tech startup founded in 2015 by Avery Lancaster, focused on disrupting the insurance industry with its unique products. The company offers four main software products: Carllm for auto insurance, Homellm for home insurance, Rellm for the reinsurance sector, and Marketllm, a marketplace connecting consumers with insurance providers. With 200 employees and over 300 clients worldwide, Insurellm is committed to providing reliable and innovative solutions in the insurance tech space.


```python
query = 'What products does Insurellm offer?'
result = chain_with_history.invoke({"question": query}, config=config)
print(result['answer'])
```

Insurellm offers four insurance software products:

1. Carllm - a portal for auto insurance companies
2. Homellm - a portal for home insurance companies
3. Rellm - an enterprise platform for the reinsurance sector
4. Marketllm - a marketplace for connecting consumers with insurance providers


```python
query = 'Who received the prestigious IIOTY award in 2023?'
result = chain_with_history.invoke({"question": query}, config=config)
print(result['answer'])
```

I don't know.


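The answer `"I don't know."` is not correct. By default, the retriever passes only the 4 most similar chunks to the LLM, so the chunk that mentions the award was probably not among them. We'll investigate later...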
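The sketch below illustrates that likely failure mode with made-up similarity scores: if the chunk that actually answers the question ranks fifth, a top-4 retriever never shows it to the LLM, while a larger `k` does. All chunk names and scores here are hypothetical.

```python
def top_k(scores, k):
    """Return the ids of the k highest-scoring chunks, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Hypothetical similarity scores between the award question and six chunks
scores = {
    "company-overview": 0.82,
    "company-careers": 0.80,
    "products-rellm": 0.79,
    "products-marketllm": 0.77,
    "employees-performance-review": 0.74,  # hypothetically, the chunk holding the answer
    "contracts-renewal-terms": 0.41,
}

print(top_k(scores, 4))   # default-sized retrieval: the answer chunk is cut off
print(top_k(scores, 25))  # k larger than the collection simply returns every chunk
```

Generic chunks about the company can outscore the one specific chunk that names the award winner, which is why a small `k` can miss it.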
### Alternative
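If you'd like to run a model locally with `Ollama`, you can point `ChatOpenAI` at Ollama's OpenAI-compatible endpoint:

```python
# The api_key is required by the client but ignored by Ollama
llm = ChatOpenAI(temperature=0.7, model_name='llama3.2', base_url='http://localhost:11434/v1', api_key='ollama')
```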

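### User Interface: Using Gradio's `ChatInterface`
A quick and easy way to prototype a chat with an LLM.

* reuse the conversation chain with the GPT-4o-mini LLM, the vector store and memory (`chain_with_history`)
* select the conversation memory through the `session_id`

### Conversation memory for the chat
Memory is keyed by `session_id`: `get_message_history` creates a fresh `ChatMessageHistory` the first time it sees an id. The default `"user-session-id"` below is the same id used in the queries above, so the Gradio chat continues that conversation, and every browser tab shares one history. Pass a different id to start with an empty history.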

### Wrapping that in a chat function

```python
def chat(question, history, session_id="user-session-id"):
    # Gradio passes its own `history`, but we rely on the chain's session memory instead
    config = {"configurable": {"session_id": session_id}}
    result = chain_with_history.invoke({"question": question}, config=config)
    return result["answer"]
```

### And in Gradio:

```python
view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)
```

* Running on local URL:  http://127.0.0.1:7892
* To create a public link, set `share=True` in `launch()`.


### Create a new Chat with OpenAI, retrieving more chunks
Let's raise the number of chunks the retriever returns (`k`) from the default of 4 to `25`, in the hope that the model can then answer the query `"Who received the prestigious IIOTY award in 2023?"` correctly.
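Raising `k` is not free: every retrieved chunk is sent to the LLM with each question, which matters given the low-cost requirement. Here is a back-of-the-envelope estimate, assuming chunks of about 1,000 characters and the common rule of thumb of roughly 4 characters per token for English text (an approximation, not an exact tokenizer count). `estimated_context_tokens` is a hypothetical helper.

```python
def estimated_context_tokens(k, chunk_chars=1000, chars_per_token=4):
    """Rough prompt tokens added by k retrieved chunks (heuristic, not a tokenizer)."""
    return k * chunk_chars // chars_per_token


for k in (4, 25):
    print(f"k={k:>2}: ~{estimated_context_tokens(k):,} tokens of retrieved context per question")
```

Some chunks exceed 1,000 characters (one of 1,088 was reported earlier), so treat these figures as an order of magnitude: going from 4 to 25 chunks multiplies the retrieved context, and its input-token cost, by roughly six.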

```python
# Define LLM, retriever and conversation chain
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)
retriever = vectorstore.as_retriever(search_kwargs={"k": 25})  # retrieve 25 chunks instead of the default 4
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)

# Create Message History Factory (a fresh dict, so earlier conversations are discarded)
message_histories = {}
def get_message_history(session_id):
    if session_id not in message_histories:
        message_histories[session_id] = ChatMessageHistory()
    return message_histories[session_id]

# Wrap Chain with Memory
chain_with_history = RunnableWithMessageHistory(
    conversation_chain,
    get_message_history,
    input_messages_key="question",
    history_messages_key="chat_history",
    output_messages_key="answer",  # the chain returns a dict; save its "answer" to the history
)
```

```python
# Same as before: `chat` looks up the new `chain_with_history` each time it is called
def chat(question, history, session_id="user-session-id"):
    config = {"configurable": {"session_id": session_id}}
    result = chain_with_history.invoke({"question": question}, config=config)
    return result["answer"]
```

```python
view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)
```

* Running on local URL:  http://127.0.0.1:7894
* To create a public link, set `share=True` in `launch()`.
