<div style="font-size: 13px; line-height: 1.4; margin: 0; padding: 0;">
<h5 style="margin-bottom: 0.2em;">
This notebook documents my theoretical study alongside the lab exercises conducted on <b>July 8, 2025</b>.
</h5>
</div>

### <u><b>LAB EXERCISES:</b></u> **WEEK 5**

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b style="font-size: 16px;">Overview:</b> This week focuses on building an <b>expert knowledge worker</b> using <b>Retrieval-Augmented Generation (RAG)</b> to accurately answer questions for an insurance tech company. The labs guide you through implementing a simple RAG pipeline that retrieves relevant information from a knowledge base to <b>ground LLM responses</b>, with a strong emphasis on <b>practical, low-cost deployment</b> for enterprise applications.
</div>


#### <code>**day1.ipynb**</code>

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b style="font-size: 16px;">Abstract:</b> Build a simple <b>RAG (Retrieval-Augmented Generation)</b> pipeline to create an expert knowledge worker for <b>Insurellm</b>. Load context from files and answer questions with high accuracy. Focus: Low-cost, brute-force retrieval and grounding LLM responses in company-specific data.
</div>


<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<h4 style="margin-bottom: 0.4em;"><b>Expert Knowledge Worker</b></h4>

<b>Overview:</b><br>
A question-answering agent designed as an <b>expert knowledge worker</b> to assist employees at <b>Insurellm</b>, an Insurance Tech company. This system prioritizes <b>accuracy</b> while maintaining a <b>low-cost</b> deployment.<br>

<b>What is RAG?</b><br>
<b>Retrieval-Augmented Generation (RAG)</b> is an AI architecture that enhances LLM outputs by retrieving relevant information from external sources (e.g., documents or databases) and injecting it into the model's prompt. This improves factual accuracy and reduces hallucinations.<br>

<b>Approach:</b><br>
The solution leverages <b>RAG</b> to improve response accuracy by grounding answers in retrieved knowledge. The initial implementation adopts a <b>simple brute-force RAG</b> mechanism to demonstrate feasibility and performance.<br>

<b style="font-size: 13.5px;">Sidenote: Business Application Relevance</b><br>
RAG is arguably the most practical technique covered in this course. Many commercial applications already use similar pipelines to perform <b>context-aware retrieval over large document stores</b> — such as insurance contracts, financial policies, or product specs. It offers a <b>quick-to-market, cost-effective</b> strategy to enhance LLM utility in enterprise environments.
</div>


In [None]:
# Run in Anaconda Prompt (for conda users):
# conda install -c conda-forge python-dotenv gradio openai langchain langchain-community langchain-openai langchain-chroma scikit-learn plotly sentence-transformers langchain-huggingface faiss-cpu matplotlib

# pip users:
# pip install python-dotenv gradio openai langchain langchain-community langchain-openai langchain-chroma scikit-learn plotly sentence-transformers langchain-huggingface faiss-cpu matplotlib

In [None]:
import os
import glob
from dotenv import load_dotenv
import gradio as gr
from openai import OpenAI

In [None]:
MODEL = 'gpt-4o-mini'
# MODEL_LLAMA_HF = 'meta-llama/Meta-Llama-3.1-8B-Instruct'  
# MODEL_GEMINI = 'gemini-1.5-flash'  
# MODEL_LLAMA_LOCAL = 'llama3.2'  

# Example: 
# To use local Llama3.2 with OpenAI-compatible API (Ollama), use:
# openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

In [None]:
# Load environment variables from a file called .env

load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
openai = OpenAI()

In [None]:
# With massive thanks to student Dr John S. for fixing a bug in the below for Windows users!

context = {}

employees = glob.glob("knowledge-base/employees/*")

for employee in employees:
    name = employee.split(' ')[-1][:-3]
    doc = ""
    with open(employee, "r", encoding="utf-8") as f:
        doc = f.read()
    context[name]=doc
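
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
The key derivation above depends on the file name containing a space and ending in a three-character extension. Below is a minimal sketch of a more portable variant, assuming file names of the form <code>First Last.md</code>. The helper name <code>last_name_key</code> is hypothetical and not part of the course code.
</div>

```python
from pathlib import PureWindowsPath

def last_name_key(path_str: str) -> str:
    """Return the last space-separated word of the file name, without its extension."""
    # PureWindowsPath treats both "/" and "\\" as separators, so the same
    # string is parsed identically on Windows, macOS and Linux.
    stem = PureWindowsPath(path_str).stem
    return stem.split(" ")[-1]

print(last_name_key("knowledge-base/employees/Alex Lancaster.md"))
```

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
Using the stem instead of slicing off the last three characters also keeps the key correct if an extension of a different length ever appears in the folder.
</div>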

In [None]:
context["Lancaster"]

In [None]:
products = glob.glob("knowledge-base/products/*")

for product in products:
    name = product.split(os.sep)[-1][:-3]
    doc = ""
    with open(product, "r", encoding="utf-8") as f:
        doc = f.read()
    context[name]=doc

In [None]:
context.keys()

In [None]:
system_message = "You are an expert in accurately answering questions about Insurellm, the Insurance Tech company. Give brief, accurate answers. If you don't know the answer, say so. Do not make anything up if you haven't been provided with relevant context."

In [None]:
def get_relevant_context(message):
    relevant_context = []
    for context_title, context_details in context.items():
        if context_title.lower() in message.lower():
            relevant_context.append(context_details)
    return relevant_context          
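
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
The substring test above matches a title anywhere in the message, including inside a longer word. As a sketch of one possible refinement, the variant below matches titles only as whole words. The function name <code>get_relevant_context_whole_word</code> and the toy dictionary are assumptions made for illustration, not part of the lab.
</div>

```python
import re

def get_relevant_context_whole_word(message, context):
    """Return the documents whose title appears in the message as a whole word."""
    relevant = []
    for title, details in context.items():
        # \b anchors the title at word boundaries; re.escape protects special characters
        pattern = r"\b" + re.escape(title) + r"\b"
        if re.search(pattern, message, flags=re.IGNORECASE):
            relevant.append(details)
    return relevant

# Toy stand-in for the real context dictionary
toy_context = {"Lancaster": "Profile of Lancaster", "Carllm": "Carllm product summary"}
print(get_relevant_context_whole_word("Who is lancaster?", toy_context))
print(get_relevant_context_whole_word("Tell me about Lancastershire", toy_context))
```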

In [None]:
get_relevant_context("Who is lancaster?")

In [None]:
get_relevant_context("Who is Avery and what is carllm?")

In [None]:
def add_context(message):
    relevant_context = get_relevant_context(message)
    if relevant_context:
        message += "\n\nThe following additional context might be relevant in answering this question:\n\n"
        for relevant in relevant_context:
            message += relevant + "\n\n"
    return message

In [None]:
print(add_context("Who is Alex Lancaster?"))

In [None]:
def chat(message, history):
    messages = [{"role": "system", "content": system_message}] + history
    message = add_context(message)
    messages.append({"role": "user", "content": message})

    stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)

    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        yield response
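
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
The <code>chat</code> function yields the cumulative response after every chunk, so the interface can redraw the growing reply each time. The sketch below reproduces only that accumulation pattern with a fake stream and makes no API call. <code>fake_stream</code> and <code>accumulate</code> are hypothetical names.
</div>

```python
def fake_stream():
    """Stand-in for a streaming response: yields text deltas, one of them None."""
    yield from ["Insure", "llm ", None, "is an insurance tech company."]

def accumulate(deltas):
    """Yield the cumulative text after each delta, treating None as an empty string."""
    response = ""
    for delta in deltas:
        response += delta or ""
        yield response

snapshots = list(accumulate(fake_stream()))
print(snapshots[-1])
```

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
The <code>or ""</code> guard matters because a streamed chunk can carry no text at all, in which case its content is <code>None</code> rather than a string.
</div>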

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<h4 style="margin-bottom: 0.4em;"><b>Now we will bring this up in Gradio using the Chat interface</b></h4>
A quick and easy way to prototype a chat with an LLM.
</div>


In [None]:
view = gr.ChatInterface(chat, type="messages").launch()

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<h5 style="margin-bottom: 0.2em;"><b>Questions to Test</b></h5>
<ul style="margin: 0.4em 0; padding-left: 1.5em;">
  <li>Who is Alex Lancaster?</li>
  <li>What is the CarLLM product?</li>
  <li>Who are the employees in the Insurellm knowledge base?</li>
  <li>What does the Insurellm company do?</li>
  <li>Tell me about Avery's role at Insurellm.</li>
  <li>What insurance products does Insurellm offer?</li>
  <li>Who is responsible for product development?</li>
  <li>What is the main feature of the CarLLM product?</li>
  <li>Who can I contact for claims support?</li>
  <li>List all products mentioned in the knowledge base.</li>
</ul>
</div>


<br>

<br>

#### <code>**day2.ipynb**</code>

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b style="font-size: 16px;">Abstract:</b> Automate document loading and chunking from the knowledge base. Focus: Text preprocessing and chunking, preparing data for vector storage in the next step.
</div>


In [None]:
import os
import glob
from dotenv import load_dotenv
import gradio as gr

In [None]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter

In [None]:
MODEL = "gpt-4o-mini"
db_name = "vector_db"

In [None]:
load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')

In [None]:
folders = glob.glob("knowledge-base/*")

# With thanks to CG and Jon R, students on the course, for this fix needed for some users 
text_loader_kwargs = {'encoding': 'utf-8'}
# If that doesn't work, some Windows users might need to uncomment the next line instead
# text_loader_kwargs={'autodetect_encoding': True}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)

In [None]:
len(documents)

In [None]:
documents[24]

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)
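
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
To build intuition for <code>chunk_size</code> and <code>chunk_overlap</code>, here is a simplified sliding-window chunker. It is not LangChain's algorithm: <code>CharacterTextSplitter</code> first splits on a separator and then merges the pieces, whereas this sketch cuts fixed-size character windows. The function name <code>sliding_window_chunks</code> is hypothetical.
</div>

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return windows

demo_chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)
```

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
Each window repeats the last <code>chunk_overlap</code> characters of the previous one, which is why a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.
</div>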

In [None]:
len(chunks)

In [None]:
chunks[6]

In [None]:
doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
print(f"Document types found: {', '.join(doc_types)}")

In [None]:
for chunk in chunks:
    if 'CEO' in chunk.page_content:
        print(chunk)
        print("_________")

<br>

<br>

#### <code>**day3.ipynb**</code>

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b style="font-size: 16px;">Abstract:</b> Generate vector embeddings for each document chunk using either <b>OpenAI</b> or <b>HuggingFace</b> models. Store them in a <b>Chroma</b> vector database. Visualize the vector space in 2D and 3D to gain insight into how your knowledge is represented.
</div>


In [None]:
import os
import glob
from dotenv import load_dotenv
import gradio as gr

In [None]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go

In [None]:
MODEL = "gpt-4o-mini"
db_name = "vector_db"

In [None]:
load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')

In [None]:
# Read in documents using LangChain's loaders
# Take everything in all the sub-folders of our knowledgebase

folders = glob.glob("knowledge-base/*")

# With thanks to CG and Jon R, students on the course, for this fix needed for some users 
text_loader_kwargs = {'encoding': 'utf-8'}
# If that doesn't work, some Windows users might need to uncomment the next line instead
# text_loader_kwargs={'autodetect_encoding': True}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b style="font-size: 16px;">⚠️ Please Note:</b><br>
In the next cell, we split the text into chunks.<br>
Two students reported that the operation caused their computers to crash. They resolved the issue by adjusting the chunking parameters:<br>
<ul style="margin: 0.5em 0; padding-left: 1.5em;">
  <li><code>chunk_size</code>: from <code>1000</code> to <code>2000</code></li>
  <li><code>chunk_overlap</code>: from <code>200</code> to <code>400</code></li>
</ul>
This change should not be necessary in most cases, but if you encounter similar issues, feel free to apply it.<br>
<span style="color: gray;"><i>Note:</i> LangChain may issue a warning about chunk sizes exceeding 1000 — this can be safely ignored.</span><br>
<i>Special thanks to Steven W and Nir P for reporting and resolving this issue 🙏</i>
</div>


In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

In [None]:
len(chunks)

In [None]:
doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
print(f"Document types found: {', '.join(doc_types)}")

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<h5 style="margin-bottom: 0.2em; font-size: 16px;"><b>Sidenote on Embeddings and Auto-Encoding LLMs</b></h5>

We will be mapping each chunk of text into a vector that represents its meaning — this is called an <b>embedding</b>.<br>

To do this, we will use OpenAI’s embedding model via API calls wrapped in LangChain code. This model is an example of an <b>Auto-Encoding LLM</b>, which reads the entire input at once and produces a single fixed-size output vector. It differs from <b>Auto-Regressive LLMs</b> (like GPT), which generate output token by token, each conditioned only on the tokens before it.<br>

One well-known Auto-Encoding model is <b>BERT</b> from Google. In addition to generating embeddings, these models are also commonly used for classification tasks.<br>

<b>Sidenote:</b><br>
In <b>Week 8</b>, we’ll return to RAG and vector embeddings using an <b>open-source vector encoder</b> so that data processing remains completely local. This is a crucial consideration for enterprise applications where data privacy and internal compliance are essential.
</div>
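
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
One common way to compare two embeddings is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration. Real embedding vectors are far longer, and the variable names here are hypothetical.
</div>

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the first two are meant to be about similar topics
car_policy = [0.9, 0.1, 0.0]
auto_cover = [0.8, 0.2, 0.1]
staff_bio = [0.0, 0.2, 0.9]

print(cosine_similarity(car_policy, auto_cover) > cosine_similarity(car_policy, staff_bio))
```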


<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b>Option 1:</b> Use <code>embeddings = OpenAIEmbeddings()</code><br>
This sets up the embedding model that will convert text chunks into vector representations using OpenAI’s API.
</div>

In [None]:
# Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk

embeddings = OpenAIEmbeddings()

# If you would rather use the free Vector Embeddings from HuggingFace sentence-transformers
# Then replace embeddings = OpenAIEmbeddings()
# with:
# from langchain_huggingface import HuggingFaceEmbeddings
# embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [None]:
# Check if a Chroma Datastore already exists - if so, delete the collection to start from scratch

if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

In [None]:
# Create our Chroma vectorstore!

vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vectorstore created with {vectorstore._collection.count()} documents")

In [None]:
# Get one vector and find how many dimensions it has

collection = vectorstore._collection
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"The vectors have {dimensions:,} dimensions")

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<h5 style="margin-bottom: 0.2em; font-size: 16px;"><b>Visualizing the Vector Store</b></h5>
Let’s take a moment to examine the documents and their corresponding embedding vectors to better understand how the system is organizing and retrieving information.
</div>


In [None]:
# Prework

result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']
doc_types = [metadata['doc_type'] for metadata in result['metadatas']]
colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]
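
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
The list-and-<code>index</code> lookup above raises a <code>ValueError</code> if a folder name is not one of the four listed types. A small sketch of a more forgiving mapping follows. The <code>faq</code> type and the <code>demo_</code> names are hypothetical and only there to show the fallback.
</div>

```python
# Hypothetical metadata values; 'faq' stands in for a folder the mapping does not know
demo_doc_types = ["products", "employees", "faq", "company"]

demo_color_map = {"products": "blue", "employees": "green", "contracts": "red", "company": "orange"}

# dict.get returns the fallback color instead of raising for unknown types
demo_colors = [demo_color_map.get(t, "gray") for t in demo_doc_types]
print(demo_colors)
```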

In [None]:
# We humans find it easier to visualize things in 2D!
# Reduce the dimensionality of the vectors to 2D using t-SNE
# (t-distributed stochastic neighbor embedding)

tsne = TSNE(n_components=2, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 2D scatter plot
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization',
    xaxis_title='x', yaxis_title='y',  # 2D axes are set on the layout directly; scene applies to 3D plots
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [None]:
# Let's try 3D!

tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b>Option 2:</b> Use <code>embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")</code><br>
This sets up the embedding model using Hugging Face's <code>sentence-transformers</code>, allowing you to generate vector representations of text locally without relying on external APIs.
</div>


In [None]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go
import os

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vectorstore created with {vectorstore._collection.count()} documents")

collection = vectorstore._collection
embedding_result = collection.get(limit=1, include=["embeddings"])
embeddings_array = embedding_result.get("embeddings")
if embeddings_array is not None and len(embeddings_array) > 0 and embeddings_array[0] is not None:
    sample_embedding = embeddings_array[0]
    dimensions = len(sample_embedding)
    print(f"The vectors have {dimensions:,} dimensions")
else:
    print("No embeddings found in the collection.")

result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']
doc_types = [metadata['doc_type'] for metadata in result['metadatas']]
colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]

tsne = TSNE(n_components=2, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization',
    xaxis_title='x', yaxis_title='y',  # 2D axes are set on the layout directly; scene applies to 3D plots
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

<br>

<br>

#### <code>**day4.ipynb**</code>

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b style="font-size: 16px;">Abstract:</b> Connect the vector store to a <b>Conversational Retrieval Chain</b>, allowing the LLM to answer questions using retrieved context. Demonstrate how to integrate memory and retrieval for accurate, context-aware responses.
</div>


In [None]:
import os
import glob
import gradio as gr
import numpy as np
import plotly.graph_objects as go

from dotenv import load_dotenv
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from sklearn.manifold import TSNE
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

In [None]:
# price is a factor for our company, so we're going to use a low cost model
MODEL = "gpt-4o-mini"
db_name = "vector_db"

In [None]:
# Load environment variables from a file called .env
load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')

In [None]:
# Read in documents using LangChain's loaders
# Take everything in all the sub-folders of our knowledgebase

folders = glob.glob("knowledge-base/*")

# With thanks to CG and Jon R, students on the course, for this fix needed for some users 
text_loader_kwargs = {'encoding': 'utf-8'}
# If that doesn't work, some Windows users might need to uncomment the next line instead
# text_loader_kwargs={'autodetect_encoding': True}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

In [None]:
len(chunks)

In [None]:
doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
print(f"Document types found: {', '.join(doc_types)}")

In [None]:
# Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk
# Chroma is a popular open source Vector Database based on SQLite

embeddings = OpenAIEmbeddings()

# If you would rather use the free Vector Embeddings from HuggingFace sentence-transformers
# Then replace embeddings = OpenAIEmbeddings()
# with:
# from langchain_huggingface import HuggingFaceEmbeddings
# embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Delete if already exists
if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

# Create vectorstore
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vectorstore created with {vectorstore._collection.count()} documents")

In [None]:
# Get one vector and find how many dimensions it has
collection = vectorstore._collection
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"The vectors have {dimensions:,} dimensions")

In [None]:
# Prework
result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']
doc_types = [metadata['doc_type'] for metadata in result['metadatas']]
colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]

In [None]:
# We humans find it easier to visualize things in 2D!
# Reduce the dimensionality of the vectors to 2D using t-SNE
# (t-distributed stochastic neighbor embedding)

tsne = TSNE(n_components=2, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 2D scatter plot
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization',
    xaxis_title='x', yaxis_title='y',  # 2D axes are set on the layout directly; scene applies to 3D plots
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [None]:
# Let's try 3D!

tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<h5 style="margin-bottom: 0.3em; font-size: 14px;">Now it’s time to use <b>LangChain</b> to bring everything together</h5>
</div>


In [None]:
# create a new Chat with OpenAI
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)

# set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# the retriever is an abstraction over the VectorStore that will be used during RAG
retriever = vectorstore.as_retriever()

# putting it together: set up the conversation chain with the GPT 4o-mini LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [None]:
query = "Can you describe Insurellm in a few sentences"
result = conversation_chain.invoke({"question":query})
print(result["answer"])

In [None]:
# set up a new conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# putting it together: set up the conversation chain with the GPT 4o-mini LLM, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [None]:
# Wrapping in a function - note that history isn't used, as the memory is in the conversation_chain

def chat(message, history):
    result = conversation_chain.invoke({"question": message})
    return result["answer"]
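
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
The reason <code>history</code> can be ignored is that the chain object carries its own memory between calls. The toy classes below imitate only that ownership pattern. They are hypothetical stand-ins, not LangChain classes, and they perform no retrieval and call no model.
</div>

```python
class ToyBufferMemory:
    """Keeps every question/answer pair in order, like a plain transcript."""
    def __init__(self):
        self.turns = []

    def save_turn(self, question, answer):
        self.turns.append((question, answer))

class ToyChain:
    """Stand-in for a conversational chain: owns its memory and updates it on each call."""
    def __init__(self, memory):
        self.memory = memory

    def invoke(self, inputs):
        question = inputs["question"]
        answer = f"(turn {len(self.memory.turns) + 1}) You asked: {question}"
        self.memory.save_turn(question, answer)
        return {"answer": answer}

toy_chain = ToyChain(ToyBufferMemory())

def toy_chat(message, history):
    # history is ignored on purpose: the chain's own memory already tracks the conversation
    return toy_chain.invoke({"question": message})["answer"]

print(toy_chat("Who is the CEO?", history=[]))
print(toy_chat("And what is their background?", history=[]))
```

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
One consequence of this design is that the memory is shared by everything that calls the same chain object, which is why the notebook creates a fresh memory and chain before launching the chat interface.
</div>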

In [None]:
# And in Gradio:

view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  <h5 style="margin-bottom: 0.2em;"><b>Questions to Test</b></h5>
  <ul style="margin: 0.4em 0; padding-left: 1.5em;">
    <li>What awards has Insurellm or its employees received?</li>
    <li>Describe the responsibilities of the claims support team.</li>
    <li>Which employee is the CEO of Insurellm?</li>
    <li>How does Insurellm use AI in its products?</li>
    <li>What are the main differences between CarLLM and other insurance products?</li>
    <li>Who leads the product development team?</li>
    <li>Can you summarize the company’s mission or vision?</li>
    <li>What is the process for filing an insurance claim with Insurellm?</li>
    <li>Which products are designed for automotive insurance?</li>
    <li>Who should I contact for technical support?</li>
    <li>What recent innovations has Insurellm introduced?</li>
    <li>Are there any notable partnerships or collaborations mentioned in the knowledge base?</li>
    <li>What are the eligibility criteria for Insurellm’s insurance products?</li>
    <li>How does Insurellm ensure data privacy for its customers?</li>
    <li>What is the background of Alex Lancaster?</li>
  </ul>
</div>


<br>

<br>

#### <code>**day4.5.ipynb**</code>

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b style="font-size: 16px;">Abstract:</b> Experiment with swapping <b>Chroma</b> for <b>FAISS</b> as the vector database backend. Explore a second open-source vector store while keeping the same retrieval workflow.
</div>


<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  This exercise demonstrates how to swap out <b>Chroma</b> for <b>FAISS</b> (Facebook AI Similarity Search) as the vector store backend.<br>
  FAISS is an open-source library developed by Facebook AI Research for efficient similarity search on dense vectors.<br>

</div>
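
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
Conceptually, the simplest form of similarity search compares the query vector with every stored vector and keeps the closest ones. The sketch below does this with squared Euclidean distance in plain Python. It illustrates the idea only: FAISS itself uses heavily optimized implementations, and the names <code>nearest_neighbors</code> and <code>toy_index</code> are hypothetical.
</div>

```python
def nearest_neighbors(query, vectors, k=1):
    """Return (index, squared L2 distance) pairs for the k vectors closest to the query."""
    distances = []
    for i, vec in enumerate(vectors):
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        distances.append((i, dist))
    distances.sort(key=lambda pair: pair[1])  # smallest distance first
    return distances[:k]

# Three toy 2-dimensional vectors standing in for stored embeddings
toy_index = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
print(nearest_neighbors([0.9, 1.2], toy_index, k=2))
```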


In [None]:
import os
import glob

import numpy as np
import plotly.graph_objects as go
from sklearn.manifold import TSNE
from dotenv import load_dotenv
import gradio as gr

from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

In [None]:
# price is a factor for our company, so we're going to use a low cost model

MODEL = "gpt-4o-mini"
db_name = "vector_db"

In [None]:
# Load environment variables from a file called .env

load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')

In [None]:
# Read in documents using LangChain's loaders
# Take everything in all the sub-folders of our knowledgebase

folders = glob.glob("knowledge-base/*")

# With thanks to CG and Jon R, students on the course, for this fix needed for some users 
text_loader_kwargs = {'encoding': 'utf-8'}
# If that doesn't work, some Windows users might need to uncomment the next line instead
# text_loader_kwargs={'autodetect_encoding': True}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)

In [None]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

In [None]:
len(chunks)

In [None]:
doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
print(f"Document types found: {', '.join(doc_types)}")

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  <h5 style="margin-bottom: 0.2em; font-size: 15px;"><b>A Sidenote on Embeddings and Auto-Encoding LLMs</b></h5>
  We will be mapping each chunk of text into a vector that captures its semantic meaning — this is called an <b>embedding</b>.<br>

  OpenAI provides an embedding model that we'll use through their API, integrated with LangChain code.<br>

  This model is an example of an <b>Auto-Encoding LLM</b>, which processes the entire input to produce a single embedding vector. This differs from <b>Auto-Regressive LLMs</b> (like GPT), which generate output token by token based on prior context.<br>

  A well-known auto-encoding model is <b>BERT</b> by Google. In addition to producing embeddings, such models are widely used for tasks like classification and entity recognition.<br>

  <!-- <b>Sidenote:</b> In <b>Week 8</b>, we’ll revisit RAG and embeddings, using an open-source vector encoder locally so that no data is sent to external APIs — a key requirement in many enterprise deployments. -->
</div>


In [None]:
# Put the chunks of data into a Vector Store that associates a Vector Embedding with each chunk
# Chroma is a popular open source Vector Database based on SQLite; below we swap it for FAISS

embeddings = OpenAIEmbeddings()

# Create vectorstore

# BEFORE
# vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)

# AFTER
vectorstore = FAISS.from_documents(chunks, embedding=embeddings)

total_vectors = vectorstore.index.ntotal
dimensions = vectorstore.index.d

print(f"There are {total_vectors} vectors with {dimensions:,} dimensions in the vector store")

In [None]:
# Prework
vectors = []
documents = []
doc_types = []
colors = []
color_map = {'products':'blue', 'employees':'green', 'contracts':'red', 'company':'orange'}

for i in range(total_vectors):
    vectors.append(vectorstore.index.reconstruct(i))
    doc_id = vectorstore.index_to_docstore_id[i]
    document = vectorstore.docstore.search(doc_id)
    documents.append(document.page_content)
    doc_type = document.metadata['doc_type']
    doc_types.append(doc_type)
    colors.append(color_map[doc_type])
    
vectors = np.array(vectors)

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  <h5 style="margin-bottom: 0.2em;"><b>Visualizing the Vector Store</b></h5>
  Let's take a moment to examine the documents and their corresponding embedding vectors to better understand how the system organizes and retrieves information.<br>

  <i>Sidenote:</i> What we’re really visualizing here is the <b>distribution of vector embeddings</b> generated by <code>OpenAIEmbeddings</code> and read back from the <code>FAISS</code> index. The vector store only holds these embeddings and does not change them, so the plots should show the same clusters whether the vectors are stored in <b>FAISS</b> or <b>Chroma</b>.
</div>


In [None]:
# We humans find it easier to visualize things in 2D!
# Reduce the dimensionality of the vectors to 2D using t-SNE
# (t-distributed stochastic neighbor embedding)

tsne = TSNE(n_components=2, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 2D scatter plot
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D FAISS Vector Store Visualization',
    # `scene` only applies to 3D plots; 2D axis titles are set directly on the layout
    xaxis_title='x',
    yaxis_title='y',
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [None]:
# Let's try 3D!

tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D FAISS Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<h5 style="margin-bottom: 0.3em; font-size: 14px;">Now it’s time to use <b>LangChain</b> to bring everything together</h5>
</div>
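
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  Before wiring up the real chain, it may help to see roughly what a conversational retrieval chain does on each turn: fold the chat history into a standalone question, retrieve the top <code>k</code> chunks, stuff them into a prompt, and ask the LLM. The sketch below is a simplified stand-in, not LangChain's implementation; <code>fake_llm</code>, <code>fake_retrieve</code>, <code>KNOWLEDGE</code>, and the prompt wording are all hypothetical.
</div>

```python
def condense_question(question, chat_history, llm):
    """Fold earlier turns into a standalone question (skipped on the first turn)."""
    if not chat_history:
        return question
    transcript = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
    return llm(f"{transcript}\n\nRephrase as a standalone question: {question}")


def answer_with_rag(question, chat_history, retrieve, llm, k=4):
    """One conversational RAG turn: condense, retrieve k chunks, stuff, answer."""
    standalone = condense_question(question, chat_history, llm)
    chunks = retrieve(standalone, k)
    context = "\n\n".join(chunks)
    answer = llm(f"Answer using only this context:\n{context}\n\nQuestion: {standalone}")
    chat_history.append((question, answer))  # plays the role of the memory object
    return answer


# Hypothetical stand-ins so the sketch runs without an API key or vector store.
KNOWLEDGE = [
    "Insurellm is an insurance tech company.",
    "Carllm is a product for auto insurance.",
    "The office is closed on public holidays.",
]


def fake_retrieve(query, k):
    """Rank chunks by words shared with the query (a stand-in for vector search)."""
    words = set(query.lower().split())
    ranked = sorted(KNOWLEDGE, key=lambda c: len(words & set(c.lower().split())), reverse=True)
    return ranked[:k]


seen_prompts = []


def fake_llm(prompt):
    """Record the prompt and return a canned reply instead of calling a model."""
    seen_prompts.append(prompt)
    return "stub answer"


history = []
reply = answer_with_rag("What is Insurellm", history, fake_retrieve, fake_llm, k=1)
print(seen_prompts[-1])
```

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  Raising <code>k</code> stuffs more chunks into the prompt, which can improve recall at the cost of more tokens per call.
</div>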


In [None]:
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# the retriever is an abstraction over the VectorStore that will be used during RAG
retriever = vectorstore.as_retriever()

# putting it together: set up the conversation chain with the chat model named in MODEL, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [None]:
query = "Can you describe Insurellm in a few sentences?"
result = conversation_chain.invoke({"question":query})
print(result["answer"])

In [None]:
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  <h5 style="margin-bottom: 0.2em;"><b>Bringing It to Life with Gradio</b></h5>
  Now we will bring this up in <b>Gradio</b> using the <code>ChatInterface</code> — a quick and easy way to prototype a conversational interface with an LLM.
</div>
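
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  The <code>chat</code> callback in the next cell ignores Gradio's <code>history</code> argument because the chain's memory object already tracks the conversation. One side effect is that every browser session talking to this single chain shares that memory. If per-session history were wanted instead, the history Gradio passes in could be converted into question/answer pairs. With <code>type="messages"</code>, each history entry is a dict with <code>role</code> and <code>content</code> keys. The helper name <code>history_to_pairs</code> is hypothetical, and the sketch assumes text-only messages.
</div>

```python
def history_to_pairs(history):
    """Convert Gradio 'messages'-style history into (user, assistant) tuples."""
    pairs = []
    pending_user = None
    for message in history:
        if message["role"] == "user":
            pending_user = message["content"]
        elif message["role"] == "assistant" and pending_user is not None:
            pairs.append((pending_user, message["content"]))
            pending_user = None
    return pairs


# Example history in the shape Gradio passes when type="messages".
example_history = [
    {"role": "user", "content": "What is Insurellm?"},
    {"role": "assistant", "content": "An insurance tech company."},
    {"role": "user", "content": "What products does it sell?"},
    {"role": "assistant", "content": "Several insurance platforms."},
]
pairs = history_to_pairs(example_history)
print(pairs)
```

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  Whether these pairs can be handed directly to a chain built without memory depends on the chain's expected input format, so treat that step as an assumption to verify.
</div>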


In [None]:
def chat(message, history):
    result = conversation_chain.invoke({"question": message})
    return result["answer"]

In [None]:
view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)

<div style="font-size: 14px; line-height: 1.5;">
  <h4 style="margin-bottom: 0.5em;"><b>Comparison: Chroma vs FAISS (Facebook AI Similarity Search)</b></h4>
  <table style="border-collapse: collapse; width: 100%; font-size: 14px;">
    <thead>
      <tr>
        <th style="border: 1px solid #ccc; padding: 8px;">Criteria</th>
        <th style="border: 1px solid #ccc; padding: 8px;">Chroma</th>
        <th style="border: 1px solid #ccc; padding: 8px;">FAISS</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Origin</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Open-source, developed by the ChromaDB team</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Open-source, developed by Facebook AI Research (FAIR)</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Supported OS</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Cross-platform (Windows, Linux, macOS)</td>
        <td style="border: 1px solid #ccc; padding: 8px;">CPU build runs on Linux, macOS, and Windows; the GPU build requires CUDA, so in practice Linux (GPU setup on Windows is difficult)</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Installation</td>
        <td style="border: 1px solid #ccc; padding: 8px;"><code>pip install chromadb</code> or via LangChain</td>
        <td style="border: 1px solid #ccc; padding: 8px;"><code>pip install faiss-cpu</code> (for CPU) or <code>faiss-gpu</code> (GPU, requires CUDA; Linux)</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">LangChain Integration</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Yes, via <code>langchain_chroma.Chroma</code></td>
        <td style="border: 1px solid #ccc; padding: 8px;">Yes, via <code>langchain_community.vectorstores.FAISS</code> (older releases: <code>langchain.vectorstores.FAISS</code>)</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Data Storage</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Supports disk persistence, multiple collections</td>
        <td style="border: 1px solid #ccc; padding: 8px;">In-memory; can save/load FAISS binary files</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Scalability</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Good for small to medium apps, can be used in production</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Very strong for large-scale data, optimized for high-performance search</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">GPU Support</td>
        <td style="border: 1px solid #ccc; padding: 8px;">❌ Not supported</td>
        <td style="border: 1px solid #ccc; padding: 8px;">✅ Supported (Linux with CUDA + faiss-gpu)</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Search API</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Easy to use, supports filters, metadata, various queries</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Optimized for vector search (nearest neighbor), lacks advanced filter/metadata support</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Key Features</td>
        <td style="border: 1px solid #ccc; padding: 8px;">- Simple to use<br>- Good metadata management<br>- Filter support</td>
        <td style="border: 1px solid #ccc; padding: 8px;">- Extremely fast vector search<br>- Supports many ANN algorithms<br>- GPU support available</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Compatibility</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Good with Windows, ideal for learning/lab</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Not recommended for GPU use on Windows; best on Linux for large-scale production</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Use Cases</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Prototyping, demos, small to medium apps, filter-required cases</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Large-scale production, millions of vectors, high-speed requirements</td>
      </tr>
      <tr>
        <td style="border: 1px solid #ccc; padding: 8px;">Drawbacks</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Not optimized for huge datasets, lacks GPU support</td>
        <td style="border: 1px solid #ccc; padding: 8px;">Complex GPU setup on Windows, weak metadata/filter handling</td>
      </tr>
    </tbody>
  </table>
</div>


<br>

<br>

#### <code>**day5.ipynb**</code>

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b style="font-size: 16px;">Abstract:</b> Build a complete <b>Gradio chat interface</b> to interact with the given knowledge base. This final integration delivers a low-cost, fully interactive Q&A assistant suitable for enterprise use.
</div>


<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
<b>Note:</b> The following model uses the <code>test-base</code> knowledge base.
</div>


In [None]:
import os
import glob

from dotenv import load_dotenv
import gradio as gr
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go

from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import HuggingFaceEmbeddings

In [None]:
MODEL = "gpt-4o-mini"
db_name = "vector_db"

In [None]:
load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')

In [None]:
system_message = (
    "You are an expert at answering questions about the knowledge base accurately. "
    "Always answer in English. If you don't know the answer, say so. "
    "Do not make anything up if you haven't been provided with relevant context."
)

In [None]:
folders = glob.glob("test-base/*")

def add_metadata(doc, doc_type):
    doc.metadata["doc_type"] = doc_type
    return doc

text_loader_kwargs = {'encoding': 'utf-8'}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    documents.extend([add_metadata(doc, doc_type) for doc in folder_docs])

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

print(f"Total number of chunks: {len(chunks)}")
print(f"Document types found: {set(doc.metadata['doc_type'] for doc in documents)}")
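
<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  To show what <code>chunk_size</code> and <code>chunk_overlap</code> mean, here is a simplified fixed-width sliding window. It is only an illustration: <code>CharacterTextSplitter</code> splits on a separator and then merges pieces, so its real chunk boundaries differ. The function name <code>sliding_window_chunks</code> is hypothetical.
</div>

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width chunks where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
```

<div style="font-size: 14px; line-height: 1.5; margin: 0; padding: 0;">
  The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which helps retrieval find it.
</div>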

In [None]:
embeddings = OpenAIEmbeddings()

if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vectorstore created with {vectorstore._collection.count()} documents")

In [None]:
# Let's investigate the vectors

collection = vectorstore._collection
count = collection.count()

embedding_result = collection.get(limit=1, include=["embeddings"])
embeddings_array = embedding_result.get("embeddings")
if embeddings_array is not None and len(embeddings_array) > 0 and embeddings_array[0] is not None:
    sample_embedding = embeddings_array[0]
    dimensions = len(sample_embedding)
    print(f"There are {count:,} vectors with {dimensions:,} dimensions in the vector store")
else:
    print("No embeddings found in the collection.")

In [None]:
# Prework (with thanks to Jon R for identifying and fixing a bug in this!)

result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']
metadatas = result['metadatas']
doc_types = [metadata['doc_type'] for metadata in metadatas if metadata is not None]
color_map = {
    'algorithms': 'blue',
    'applications': 'green',
    'datasets': 'red',
    'researchers': 'orange'
}
colors = [color_map.get(t, 'gray') for t in doc_types if t is not None]

In [None]:
tsne = TSNE(n_components=2, random_state=42, perplexity=2)
reduced_vectors = tsne.fit_transform(vectors)
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])
fig.update_layout(
    title='2D Chroma Vector Store Visualization',
    # `scene` only applies to 3D plots; 2D axis titles are set directly on the layout
    xaxis_title='x',
    yaxis_title='y',
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)
fig.show()

In [None]:
tsne = TSNE(n_components=3, random_state=42, perplexity=2)
reduced_vectors = tsne.fit_transform(vectors)
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])
fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)
fig.show()

In [None]:
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
retriever = vectorstore.as_retriever()
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [None]:
query = "Who is Geoffrey Hinton?"
result = conversation_chain.invoke({"question": query})
print(result["answer"])

In [None]:
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [None]:
def chat(question, history):
    result = conversation_chain.invoke({"question": question})
    return result["answer"]

In [None]:
# And in Gradio:

view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)

In [None]:
# Let's investigate what gets sent behind the scenes

from langchain_core.callbacks import StdOutCallbackHandler

llm = ChatOpenAI(temperature=0.7, model_name=MODEL)

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

retriever = vectorstore.as_retriever()

conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory, callbacks=[StdOutCallbackHandler()])

query = "Who is Geoffrey Hinton?"
result = conversation_chain.invoke({"question": query})
answer = result["answer"]
print("\nAnswer:", answer)

In [None]:
# create a new Chat with OpenAI
llm = ChatOpenAI(temperature=0.7, model_name=MODEL)

# set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# the retriever is an abstraction over the VectorStore that will be used during RAG; k is how many chunks to use
retriever = vectorstore.as_retriever(search_kwargs={"k": 25})

# putting it together: set up the conversation chain with the chat model named in MODEL, the vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [None]:
def chat(question, history):
    result = conversation_chain.invoke({"question": question})
    return result["answer"]

In [None]:
view = gr.ChatInterface(chat, type="messages").launch(inbrowser=True)