## Project 2: Expert

### A question answering agent that is an expert in a new product being launched
### The agent needs to be accurate and the solution should be low cost.

This project will use RAG (Retrieval Augmented Generation) to ensure our question-answering assistant has high accuracy.

We will be using the LangChain framework, which does most of the heavy lifting for us! We'll also be using Gradio's chat interface.

In [None]:
# imports

import os
from dotenv import load_dotenv
import gradio as gr

In [None]:
# price is a factor for our company, so we're going to use a low cost model

MODEL = "gpt-4o-mini"

In [None]:
# Load environment variables from a file called .env

load_dotenv()
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')

In [None]:
# Read in the document using LangChain's loaders

from langchain_community.document_loaders import TextLoader
text_loader = TextLoader('product.md')
loaded_data = text_loader.load()

# Split the document into chunks of 1000 characters, aiming to preserve paragraphs, and with some overlap between chunks

from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(loaded_data)

print(f"The document was divided into {len(chunks)} chunks")
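
To make the chunking step above more concrete, here is a simplified sketch of paragraph-based splitting with overlap in plain Python. It is an illustration only and not LangChain's actual implementation; the function name `split_with_overlap` and the toy text are assumptions made for this example.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200, separator="\n\n"):
    """Greedily pack paragraphs into chunks, carrying trailing paragraphs forward as overlap.

    Simplified sketch: a single paragraph longer than chunk_size is kept whole.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it
            chunks.append(separator.join(current))
            # Carry over trailing paragraphs while they fit in the overlap budget
            carried = []
            for prev in reversed(current):
                if len(separator.join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Toy document: four 40-character paragraphs, split with small limits
paragraphs = ["A" * 40, "B" * 40, "C" * 40, "D" * 40]
demo_text = "\n\n".join(paragraphs)
demo_chunks = split_with_overlap(demo_text, chunk_size=100, chunk_overlap=50)

for chunk in demo_chunks:
    print(repr(chunk[:10]), "...", len(chunk), "characters")
```

With these toy settings, each chunk begins with the paragraph that ended the previous chunk, which is the effect `chunk_overlap` is meant to have.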

## A sidenote on Embeddings, and "Auto-Encoding LLMs"

We will be mapping each chunk of text into a Vector that represents the meaning of the text, known as an embedding.

OpenAI offers a model to do this, which we will use by calling their API with some LangChain code.

This model is an example of an "Auto-Encoding LLM" which generates an output given a complete input.
It's different to all the other LLMs we've discussed today, which are known as "Auto-Regressive LLMs", and generate future tokens based only on past context.

Another example of an Auto-Encoding LLM is BERT from Google. In addition to embedding, Auto-Encoding LLMs are often used for classification.

More details in the resources.
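
As a rough illustration of what "a Vector that represents the meaning of the text" buys us: texts with similar meanings should map to vectors that point in similar directions, which is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for demonstration; real embedding vectors have far more dimensions, and these numbers do not come from any model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (invented values, not produced by a real model)
toy = {
    "pricing": [0.9, 0.1, 0.0],
    "cost": [0.8, 0.2, 0.1],
    "weather": [0.0, 0.1, 0.9],
}

sim_related = cosine_similarity(toy["pricing"], toy["cost"])
sim_unrelated = cosine_similarity(toy["pricing"], toy["weather"])

print(f"pricing vs cost:    {sim_related:.3f}")
print(f"pricing vs weather: {sim_unrelated:.3f}")
```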

In [None]:
# Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk
# FAISS ("Facebook AI Similarity Search") is a library from Meta for quickly finding similar documents using Vector Embeddings

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# A sidenote
# The OpenAIEmbeddings class in LangChain uses the OpenAI API.
# OpenAI provides an Embeddings model to turn text into embeddings
# This is an example of an "Auto-Encoding LLM" like BERT
# All other models in this class are "Auto-Regressive" and generate future tokens based on past context
# We'll ask LangChain to use FAISS to create our VectorStore, using the OpenAIEmbeddings to generate embeddings

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embedding=embeddings)
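
Conceptually, a similarity search over a vector store compares the query's vector against every stored vector and returns the closest few. The sketch below does this by brute force with squared Euclidean distance in plain Python. It is a toy stand-in for what a library like FAISS does far more efficiently; the names `top_k` and `toy_store` and the 2-dimensional vectors are assumptions for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vector, stored, k=4):
    """Return the k (distance, text) pairs closest to query_vector, nearest first."""
    ranked = sorted((squared_l2(query_vector, vec), text) for text, vec in stored)
    return ranked[:k]


# Hypothetical store of (text, vector) pairs with invented 2-dimensional vectors
toy_store = [
    ("chunk about pricing", [0.9, 0.1]),
    ("chunk about security", [0.1, 0.9]),
    ("chunk about billing", [0.8, 0.3]),
]

nearest = top_k([1.0, 0.0], toy_store, k=2)
for distance, text in nearest:
    print(f"{distance:.2f}  {text}")
```

Note that only the `k` nearest chunks are returned, so a relevant chunk that ranks just outside the top `k` never reaches the model.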

In [None]:
# Let's look at the vectors themselves for our Chunks

vectors = vectorstore.index.reconstruct_n(0, vectorstore.index.ntotal)
dimensions = vectorstore.index.d

print(f"There are {len(vectors)} vectors with {dimensions:,} dimensions in the vector store")

## Visualizing the Vector Store

Let's take a minute to look at the documents and their embedding vectors to see what's going on.

In [None]:
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
import numpy as np

# Reduce the dimensionality of the vectors using t-SNE ("t-distributed Stochastic Neighbor Embedding")
perplexity_value = min(30, len(vectors) - 1)
tsne = TSNE(n_components=2, random_state=42, perplexity=perplexity_value)
reduced_vectors = tsne.fit_transform(vectors)

# Plot the reduced vectors using Matplotlib
plt.figure(figsize=(10, 10))
plt.scatter(reduced_vectors[:, 0], reduced_vectors[:, 1], s=10)

# Label each point with the first 30 characters of its chunk (comment these lines out for a cleaner plot)
for i, txt in enumerate(chunks):
    plt.annotate(txt.page_content[:30], (reduced_vectors[i, 0], reduced_vectors[i, 1]))

plt.title("FAISS Vector Store Visualization with t-SNE")
plt.xlabel("Dimension 1")
plt.ylabel("Dimension 2")
plt.show()

## Time to use LangChain to bring it all together

In [None]:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# create a new Chat with OpenAI
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)

# set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# putting it together: set up the conversation chain with the LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=vectorstore.as_retriever(), memory=memory)
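
For intuition, the core of a retrieval chain can be thought of as two steps: fetch the chunks most relevant to the question, then place them in the prompt as context. The sketch below mimics that flow without any LLM or embeddings, using word overlap as a crude stand-in for vector search. The function names, the prompt wording and the "ExampleApp" text are all invented for illustration and are not LangChain's internals. The real chain also uses the chat history to rephrase follow-up questions, which this sketch leaves out.

```python
def toy_retrieve(question, chunks, k=1):
    """Rank chunks by how many words they share with the question (stand-in for vector search)."""
    question_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(question_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved_chunks):
    """Stuff the retrieved chunks into a single prompt string as context."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical product documentation chunks (invented for this example)
toy_chunks = [
    "ExampleApp syncs notes across devices.",
    "ExampleApp is free for personal use.",
    "Support is available by email.",
]

toy_question = "Is ExampleApp free for personal use?"
retrieved = toy_retrieve(toy_question, toy_chunks, k=1)
prompt = build_prompt(toy_question, retrieved)
print(prompt)
```

The model only ever sees the chunks that retrieval selected, so the quality of the answer depends heavily on the right chunk being among them.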

In [None]:
# Let's try a simple question

query = "Please explain what WealthAI is in a couple of sentences"
result = conversation_chain.invoke({"question": query})
answer = result["answer"]
print(answer)

## Now we will bring this up in Gradio using the Chat interface

A quick and easy way to prototype a chat with an LLM.

In [None]:
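# Wrapping that in a function
# Gradio passes the chat history as a second argument; we ignore it because the chain's memory already tracks the conversation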

def expert(question, history):
    result = conversation_chain.invoke({"question": question})
    return result["answer"]

In [None]:
# And in Gradio:

view = gr.ChatInterface(expert).launch()

In [None]:
from langchain_core.callbacks import StdOutCallbackHandler

# Recreate the conversation chain, this time with a callback that prints the chain's activity to stdout
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm, 
    retriever=vectorstore.as_retriever(), 
    memory=memory,
    callbacks=[StdOutCallbackHandler()]
)

# Try it out
query = "Does WealthAI offer tax advice?"
result = conversation_chain.invoke({"question": query})
answer = result["answer"]
print("\nAnswer:", answer)

## Exercises

Break this example by adding tax information at the bottom of the product documentation, so that the wrong chunk is provided to the model and it answers the tax question incorrectly.

Then find a way to fix the break.