# What is RAG (Retrieval-Augmented Generation)?

RAG is a technique for improving an LLM's answers by supplying external, trusted, up-to-date information at question time.
<br>
<br>The core idea:

Instead of relying only on what the model memorized during training, you feed it the right knowledge at the right time.

## What we have seen so far: simple techniques for improving prompts

• Multi-shot prompting
<br>You include several examples in the prompt to show the model how you want it to respond.

• Tool use
<br>The model calls APIs, databases, or functions, extending what it can do.

• Additional context
<br>You put relevant text, rules, or data directly in the prompt.

These techniques work, but they do not scale well when you have a lot of content or knowledge that changes frequently.
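
As a reminder of what multi-shot prompting looks like in practice, here is a minimal sketch that places worked examples ahead of the real question in an OpenAI-style message list. The helper name `build_multishot_messages` and the example texts are hypothetical and only serve as illustration.

```python
# Hypothetical few-shot examples; the feedback texts are made up for illustration
EXAMPLES = [
    ("Classify the sentiment: 'The claim was paid in two days.'", "positive"),
    ("Classify the sentiment: 'Nobody answered my emails.'", "negative"),
]


def build_multishot_messages(system_message, examples, question):
    """Return an OpenAI-style message list with the examples placed before the real question."""
    messages = [{"role": "system", "content": system_message}]
    for user_text, assistant_text in examples:
        # Each example is shown to the model as a past user/assistant exchange
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


messages = build_multishot_messages(
    "You label customer feedback as positive or negative.",
    EXAMPLES,
    "Classify the sentiment: 'The app keeps crashing.'",
)
for m in messages:
    print(m["role"], "|", m["content"])
```

Every example adds tokens to every request, which is one reason this approach stops scaling once the material to convey grows large.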

# How we get to RAG

If you want the LLM to answer with accurate, current, specialized knowledge, the first step is to build an organized repository of information: a knowledge base.

The knowledge base can live in:

<br>• a document store
<br>• a vector database
<br>• a relational database
<br>• a knowledge graph
<br>• a combination of these

Why is RAG so relevant today?

<br>• It reduces hallucination
<br>• It grounds answers in the organization's actual data
<br>• Knowledge can be updated by re-indexing documents, with no model retraining
<br>• It scales well to large document collections
<br>• It works across domains (legal, finance, engineering, support, etc.)

In [68]:
# imports

import os
import glob
from dotenv import load_dotenv
import gradio as gr
from openai import OpenAI

In [70]:
# price is a factor for our company, so we're going to use a low cost model

MODEL = "gpt-4o-mini"

In [71]:
# Load environment variables from a file called .env

load_dotenv(override=True)
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
openai = OpenAI()

In [72]:
context = {}

employees = glob.glob("knowledge-base/employees/*")

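for employee in employees:
    # Key each record by last name, e.g. ".../Avery Lancaster.md" -> "Lancaster"
    name = os.path.splitext(os.path.basename(employee))[0].split(' ')[-1]
    with open(employee, "r", encoding="utf-8") as f:
        doc = f.read()
    context[name] = doc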

In [73]:
context["Lancaster"]

"# Avery Lancaster\n\n## Summary\n- **Date of Birth**: March 15, 1985  \n- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  \n- **Location**: San Francisco, California  \n\n## Insurellm Career Progression\n- **2015 - Present**: Co-Founder & CEO  \n  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  \n\n- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  \n  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  \n\n- **2010 - 2013**: Business Analyst at Edge Analytics  \n  Prior to joining Innovate, Avery worked as a Business Analyst, focusing on market trends and consumer preferenc

In [74]:
context

{'Carter': '# HR Record\n\n# Emily Carter\n\n## Summary\n- **Date of Birth:** August 12, 1990  \n- **Job Title:** Account Executive  \n- **Location:** Austin, Texas  \n\n## Insurellm Career Progression\n- **2021-Present:** Account Executive  \n  - Responsibilities include managing a portfolio of B2B clients, conducting sales presentations, and ensuring customer satisfaction.  \n  - Achievements: \n    - Exceeded annual sales target by 30% in 2022.\n    - Instrumental in acquiring 15 new corporate clients in half a year.\n  \n- **2019-2021:** Sales Coordinator  \n  - Supported the sales team with administrative tasks, lead generation, and customer follow-ups.\n  - Achievements:\n    - Implemented a new lead tracking system that improved workflow efficiency by 25%.  \n    - Received "Employee of the Month" award twice for outstanding contribution to team goals.  \n\n- **2017-2019:** Marketing Intern  \n  - Assisted with market research and campaign development for social media outreach. 

In [75]:
products = glob.glob("knowledge-base/products/*")

for product in products:
    # Key each product by its file name without the extension, e.g. "Carllm"
    name = os.path.splitext(os.path.basename(product))[0]
    with open(product, "r", encoding="utf-8") as f:
        doc = f.read()
    context[name] = doc

In [76]:
context.keys()

dict_keys(['Carter', 'Harper', 'Bishop', 'Trenton', 'Blake', 'Greene', 'Thomson', 'Tran', 'Chen', 'Lancaster', 'Spencer', 'Thompson', 'Markellm', 'Carllm', 'Homellm', 'Rellm'])

In [77]:
system_message = "You are an expert in answering accurate questions about Insurellm, the Insurance Tech company. Give brief, accurate answers. If you don't know the answer, say so. Do not make anything up if you haven't been provided with relevant context."

In [79]:
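def get_relevant_context(message):
    # Naive retrieval: return every document whose key (a last name or a product name)
    # appears as a case-insensitive substring of the message
    relevant_context = []
    for context_title, context_details in context.items():
        if context_title.lower() in message.lower():
            relevant_context.append(context_details)
    return relevant_context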

In [78]:
for context_title, context_details in context.items():
    print(context_title)
    print(context_details)

Carter
# HR Record

# Emily Carter

## Summary
- **Date of Birth:** August 12, 1990  
- **Job Title:** Account Executive  
- **Location:** Austin, Texas  

## Insurellm Career Progression
- **2021-Present:** Account Executive  
  - Responsibilities include managing a portfolio of B2B clients, conducting sales presentations, and ensuring customer satisfaction.  
  - Achievements: 
    - Exceeded annual sales target by 30% in 2022.
    - Instrumental in acquiring 15 new corporate clients in half a year.
  
- **2019-2021:** Sales Coordinator  
  - Supported the sales team with administrative tasks, lead generation, and customer follow-ups.
  - Achievements:
    - Implemented a new lead tracking system that improved workflow efficiency by 25%.  
    - Received "Employee of the Month" award twice for outstanding contribution to team goals.  

- **2017-2019:** Marketing Intern  
  - Assisted with market research and campaign development for social media outreach.  
  - Achievements:  
    - 

In [80]:
get_relevant_context("Who is lancaster?")

["# Avery Lancaster\n\n## Summary\n- **Date of Birth**: March 15, 1985  \n- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  \n- **Location**: San Francisco, California  \n\n## Insurellm Career Progression\n- **2015 - Present**: Co-Founder & CEO  \n  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  \n\n- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  \n  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  \n\n- **2010 - 2013**: Business Analyst at Edge Analytics  \n  Prior to joining Innovate, Avery worked as a Business Analyst, focusing on market trends and consumer preferen

In [81]:
get_relevant_context("Who is Avery and what is carllm?")

['# Product Summary\n\n# Carllm\n\n## Summary\n\nCarllm is an innovative auto insurance product developed by Insurellm, designed to streamline the way insurance companies offer coverage to their customers. Powered by cutting-edge artificial intelligence, Carllm utilizes advanced algorithms to deliver personalized auto insurance solutions, ensuring optimal coverage while minimizing costs. With a robust infrastructure that supports both B2B and B2C customers, Carllm redefines the auto insurance landscape and empowers insurance providers to enhance customer satisfaction and retention.\n\n## Features\n\n- **AI-Powered Risk Assessment**: Carllm leverages artificial intelligence to analyze driver behavior, vehicle conditions, and historical claims data. This enables insurers to make informed decisions and set competitive premiums that reflect true risk profiles.\n\n- **Instant Quoting**: With Carllm, insurance companies can offer near-instant quotes to customers, enhancing the customer exper
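
The call above returns only the Carllm document: "Avery" is a first name, and the employee records are keyed by last name, so nothing matches. Substring matching can also fail in the other direction, because a key such as `Chen` is found inside an unrelated word. The sketch below reproduces both effects on a toy dictionary. The placeholder texts and the `whole_word_match` variant are hypothetical additions, not part of the notebook.

```python
import re

# Toy stand-in for the notebook's `context` dict; the values are placeholders
toy_context = {
    "Lancaster": "(employee record for Lancaster)",
    "Chen": "(employee record for Chen)",
    "Carllm": "(product summary for Carllm)",
}


def substring_match(message, context):
    """Same rule as get_relevant_context: the key may appear anywhere in the message."""
    return [title for title in context if title.lower() in message.lower()]


def whole_word_match(message, context):
    """Hypothetical variant: the key must appear as a whole word."""
    return [
        title
        for title in context
        if re.search(rf"\b{re.escape(title)}\b", message, flags=re.IGNORECASE)
    ]


print(substring_match("Who is Avery and what is carllm?", toy_context))  # ['Carllm']
print(substring_match("Is the kitchen covered?", toy_context))           # ['Chen']
print(whole_word_match("Is the kitchen covered?", toy_context))          # []
```

Whole-word matching removes the false positive but still cannot connect "Avery" to the right record. That gap is what embedding-based retrieval is meant to close.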

In [82]:
def add_context(message):
    relevant_context = get_relevant_context(message)
    if relevant_context:
        message += "\n\nThe following additional context might be relevant in answering this question:\n\n"
        for relevant in relevant_context:
            message += relevant + "\n\n"
    return message

In [83]:
print(add_context("Who is Alex Lancaster?"))

Who is Alex Lancaster?

The following additional context might be relevant in answering this question:

# Avery Lancaster

## Summary
- **Date of Birth**: March 15, 1985  
- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  
- **Location**: San Francisco, California  

## Insurellm Career Progression
- **2015 - Present**: Co-Founder & CEO  
  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  

- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  
  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  

- **2010 - 2013**: Business Analyst at Edge Analytics  
  Prior to joining Innova

In [84]:
def chat(message, history):
    messages = [{"role": "system", "content": system_message}] + history
    message = add_context(message)
    messages.append({"role": "user", "content": message})

    stream = openai.chat.completions.create(model=MODEL, messages=messages, stream=True)

    response = ""
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        yield response

In [85]:
view = gr.ChatInterface(chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7862
* To create a public link, set `share=True` in `launch()`.


In [86]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import CharacterTextSplitter

LangChain is a framework for building applications that use language models (LLMs) in a structured, modular way.
It makes it easier to build systems more complex than “prompt → response” by combining LLMs with tools, databases, processing chains, agents, RAG, and external integrations.

LangChain is useful for:

<br>• Orchestrating prompts and complex pipelines.
<br>• Connecting LLMs to data sources using RAG (retrieval-augmented generation).
<br>• Using tools (search, APIs, code execution, additional models).
<br>• Building agents that decide which actions to take.
<br>• Managing memory and conversational context.
<br>• Standardizing integrations with many LLM providers (OpenAI, Anthropic, Google, HuggingFace, etc.).

It became widely adopted because it simplifies building advanced AI applications, especially when you need to combine LLM + logic + data + tools.

Fun facts:
- Open-source framework created in October 2022.
- LangChain v1 was released in October 2025. https://blog.langchain.com/langchain-langgraph-1dot0/

Pros

- Speeds up development (time to market).
- Strong adoption in companies.
- Large ecosystem (loaders, chains, templates).

Cons

- LLM APIs are becoming simpler and more standardized, reducing the need for a large abstraction layer.
- LangChain is still a heavy abstraction compared to LiteLLM.
- Parts of the codebase are considered “legacy” and perform poorly.

In [87]:
db_name = "vector_db"

In [88]:
# Read in documents using LangChain's loaders
# Take everything in all the sub-folders of our knowledge base

folders = glob.glob("knowledge-base/*")

text_loader_kwargs = {'encoding': 'utf-8'}

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)

In [89]:
len(documents)

31

In [90]:
documents[24]

Document(metadata={'source': 'knowledge-base/employees/Avery Lancaster.md', 'doc_type': 'employees'}, page_content="# Avery Lancaster\n\n## Summary\n- **Date of Birth**: March 15, 1985  \n- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  \n- **Location**: San Francisco, California  \n\n## Insurellm Career Progression\n- **2015 - Present**: Co-Founder & CEO  \n  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  \n\n- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  \n  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.  \n\n- **2010 - 2013**: Business Analyst at Edge Analytics  

In [91]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

Created a chunk of size 1088, which is longer than the specified 1000
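
The warning above appears because `CharacterTextSplitter` splits only on its separator (a blank line, `"\n\n"`, by default) and does not cut inside a piece, so a single paragraph longer than `chunk_size` is kept whole. The sketch below imitates that behavior in plain Python. It is a simplified illustration, not LangChain's implementation: it ignores `chunk_overlap`, and the function name `pack_paragraphs` is made up.

```python
def pack_paragraphs(text, chunk_size, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    A piece that is longer than chunk_size on its own is emitted whole.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece still fits, or it starts a new chunk (even if oversized)
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "short one\n\n" + "x" * 30 + "\n\nshort two\n\nshort three"
for chunk in pack_paragraphs(text, chunk_size=25):
    print(len(chunk), repr(chunk[:12]))
```

The 30-character paragraph comes out as its own chunk even though the limit is 25, which is the same situation the warning reports.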


In [92]:
len(chunks)

123

In [93]:
chunks[6]

Document(metadata={'source': 'knowledge-base/contracts/Contract with TechDrive Insurance for Carllm.md', 'doc_type': 'contracts'}, page_content='## Renewal\n\n1. **Automatic Renewal**: This contract shall automatically renew for additional one-year terms unless either party provides written notice of non-renewal at least 30 days prior to the contract expiration.\n   \n2. **Pricing Review**: The pricing for any renewal period shall be discussed 60 days prior to the end of the term and agreed upon in writing.\n\n## Features\n\n1. **Included Features**: Under the Professional Tier, TechDrive Insurance will have access to the following features of Carllm:\n   - AI-Powered Risk Assessment\n   - Instant Quoting\n   - Customizable Coverage Plans\n   - Fraud Detection\n   - Customer Insights Dashboard\n   - Mobile Integration\n   - Automated Customer Support\n\n2. **System Requirements**: TechDrive Insurance must ensure that their existing systems meet the technical requirements to integrate w

In [94]:
doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
print(f"Document types found: {', '.join(doc_types)}")

Document types found: products, company, employees, contracts


Chunking decides what your model “remembers” and how relevant your retrievals are.
Get it wrong, and even the best LLM will hallucinate.
Get it right, and your RAG becomes consistent, precise, and reliable.

Here are 10 practical chunking techniques 👇

1️⃣ Fixed Chunking
<br>→ Split text into equal-sized segments
<br>→ Simple and fast, best for structured or uniform content

2️⃣ Overlapping Chunking
<br>→ Include a few sentences from the previous chunk
<br>→ Preserves context and improves continuity across sections

3️⃣ Semantic Chunking
<br>→ Use embeddings to split text by meaning instead of length
<br>→ Ideal for unstructured data like blogs, articles, or transcripts

4️⃣ Advanced Semantic Chunking
<br>→ Add clustering or transformer-based segmentation for smarter splits
<br>→ Great for long, domain-heavy or multi-topic documents

5️⃣ Recursive Character Chunking
<br>→ Split on a hierarchy of separators (paragraphs, then lines, then words) until each chunk fits the size limit
<br>→ Keeps natural boundaries where possible while respecting the chosen chunk size

6️⃣ Sentence-Based Chunking
<br>→ Split at sentence boundaries to preserve linguistic integrity
<br>→ Best for summarization and factual QA systems

7️⃣ Context-Enriched Chunking
<br>→ Add neighboring text (before and after) to enrich each chunk
<br>→ Improves recall and multi-step reasoning accuracy

8️⃣ Paragraph Chunking
<br>→ Use natural paragraph structure to form coherent retrieval units
<br>→ Works best for PDFs, reports, and research papers

9️⃣ Agentic Chunking
<br>→ Let an LLM agent decide chunk boundaries dynamically
<br>→ Perfect for topic-aware or intent-driven retrieval systems

🔟 Token-Based Chunking
<br>→ Split text precisely by token count for model alignment
<br>→ Maximizes context usage without exceeding token limits

https://www.linkedin.com/posts/naresh-edagotti-6a71a1233_chunking-techniques-ugcPost-7393839920022261760-W9NJ/?utm_source=share&utm_medium=member_desktop&rcm=ACoAACFA4lABpBtnDJID8j6ghljN0WD9SblPAks
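
To make the first two techniques concrete, here is a minimal character-based sketch of fixed-size chunking with an optional overlap. The function name `chunk_with_overlap` is hypothetical, and real splitters usually prefer to cut at separators rather than at arbitrary character positions.

```python
def chunk_with_overlap(text, chunk_size, overlap):
    """Split text into windows of chunk_size characters.

    Each window shares `overlap` characters with the previous one.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
print(chunk_with_overlap(sample, chunk_size=10, overlap=0))
# ['abcdefghij', 'klmnopqrst', 'uvwxyz']
print(chunk_with_overlap(sample, chunk_size=10, overlap=3))
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

With an overlap of 3, the last three characters of each chunk are repeated at the start of the next one, so a fact that straddles a boundary still appears intact in at least one chunk.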

In [95]:
for chunk in chunks:
    if 'CEO' in chunk.page_content:
        print(chunk)
        print("_________")

page_content='## Support

1. **Customer Support**: Velocity Auto Solutions will have access to Insurellm’s customer support team via email or chatbot, available 24/7.  
2. **Technical Maintenance**: Regular maintenance and updates to the Carllm platform will be conducted by Insurellm, with any downtime communicated in advance.  
3. **Training & Resources**: Initial training sessions will be provided for Velocity Auto Solutions’ staff to ensure effective use of the Carllm suite. Regular resources and documentation will be made available online.

---

**Accepted and Agreed:**  
**For Velocity Auto Solutions**  
Signature: _____________________  
Name: John Doe  
Title: CEO  
Date: _____________________  

**For Insurellm**  
Signature: _____________________  
Name: Jane Smith  
Title: VP of Sales  
Date: _____________________' metadata={'source': 'knowledge-base/contracts/Contract with Velocity Auto Solutions for Carllm.md', 'doc_type': 'contracts'}
_________
page_content='3. **Regular U

# Auto-Encoding vs Auto-Regressive LLMs

<br>Auto-regressive LLMs

- Predict the next token based only on the preceding tokens.
- Work “left to right”.
- The basis of generative models (e.g. GPT, Llama).

<br>Auto-encoding LLMs
- Process the whole input at once.
- Focus on understanding, not on generating text.
- Excellent for classification, extraction, similarity, and producing embeddings.
- Examples: BERT; OpenAI's embedding models (used in this notebook through `OpenAIEmbeddings`) fill the same role.
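
One way to picture the difference, assuming Transformer-style models, is the attention mask: an auto-regressive model lets each token attend only to itself and earlier tokens, while an auto-encoding model lets every token attend to the whole input. The sketch below only builds the two masks. It is an illustration and is not tied to any specific model.

```python
def attention_mask(num_tokens, causal):
    """Return a num_tokens x num_tokens matrix; entry [i][j] is 1 if token i may attend to token j."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


print("auto-regressive (causal) mask:")
for row in attention_mask(4, causal=True):
    print(row)

print("auto-encoding (bidirectional) mask:")
for row in attention_mask(4, causal=False):
    print(row)
```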

# Auto-Encoding LLMs and Embeddings
Main applications

- Sentiment analysis
- Text classification
- Semantic similarity
- Converting text → vectors (embeddings)

Properties of embeddings

- They represent the “meaning” of the input as a numeric vector.
- They can represent anything from tokens, words, and sentences up to whole documents.
- They usually have hundreds or thousands of dimensions.
- Similar vectors → similar meanings.
- They allow semantic arithmetic such as: King – Man + Woman ≈ Queen.
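
The last two properties can be illustrated with cosine similarity on hand-made vectors. The three-dimensional vectors below are invented for the example (the axes stand for royalty, male, female) and are not real embeddings. Real models learn hundreds or thousands of dimensions, and the analogy holds only approximately.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors with axes (royalty, male, female); not real embeddings
toy = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
}

# king - man + woman, computed component by component
target = [k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"])]

# The word whose vector points in the direction closest to the target
best = max(toy, key=lambda word: cosine_similarity(target, toy[word]))
print(best)
```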

## Encoder / Embedding Models
### Role

- Convert text into vectors.
- These vectors are stored in a vector database for semantic search.

Popular models

- word2vec (2013)
- BERT (2018)
- OpenAI text-embedding-3-small / large
- Google gemini-embedding-001
- all-MiniLM-L6-v2 (HuggingFace/Sentence Transformers)

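### Vector stores

#### Open-source

- Chroma
- Qdrant
- FAISS (a similarity-search library rather than a full database; in-memory, very fast)

#### Managed and scalable

- Pinecone
- Weaviate (open-source core, also offered as a managed service)

#### Mainstream databases with vector support

- Postgres (via the pgvector extension)
- MongoDB
- Elastic

Important:

The choice of vector database is independent of the choice of embedding model.
Deciding factors: cost, performance, latency, scale.

With that in mind, a typical RAG indexing flow is:

- Load all the documents
- Split them into chunks
- Generate embeddings (encoder)
- Store them in a vector store

At question time, the query is embedded with the same model, the nearest chunks are retrieved, and they are added to the prompt.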
In [96]:
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go

from langchain_huggingface import HuggingFaceEmbeddings


In [105]:
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
## embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

In [106]:
# Check if a Chroma Datastore already exists - if so, delete the collection to start from scratch

if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

In [107]:
# Chroma is a popular open-source vector database built on SQLite

vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vectorstore created with {vectorstore._collection.count()} documents")

Vectorstore created with 123 documents


In [108]:
collection = vectorstore._collection
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"The vectors have {dimensions:,} dimensions")

The vectors have 3,072 dimensions


## Visualizing the Vector Store

Let's take a minute to look at the documents and their embedding vectors to see what's going on.

In [109]:
# Prework

result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']
doc_types = [metadata['doc_type'] for metadata in result['metadatas']]
colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]

In [110]:
# We humans find it easier to visualize things in 2D!
# Reduce the dimensionality of the vectors to 2D using t-SNE
# (t-distributed stochastic neighbor embedding)

tsne = TSNE(n_components=2, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 2D scatter plot
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization',
    # `scene` only applies to 3D plots; a 2D figure takes the axis titles directly
    xaxis_title='x',
    yaxis_title='y',
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [111]:
# Let's try 3D!

tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [112]:
retriever = vectorstore.as_retriever()
llm = ChatOpenAI(temperature=0, model_name=MODEL)

## These LangChain objects implement the method invoke()

In [113]:
retriever.invoke("Who is Avery?")

[Document(id='4eff401b-24cf-436b-adc2-fddd7e83f971', metadata={'source': 'knowledge-base/employees/Avery Lancaster.md', 'doc_type': 'employees'}, page_content='# Avery Lancaster\n\n## Summary\n- **Date of Birth**: March 15, 1985  \n- **Job Title**: Co-Founder & Chief Executive Officer (CEO)  \n- **Location**: San Francisco, California  \n\n## Insurellm Career Progression\n- **2015 - Present**: Co-Founder & CEO  \n  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk management expertise that have catapulted the company into the mainstream insurance market.  \n\n- **2013 - 2015**: Senior Product Manager at Innovate Insurance Solutions  \n  Before launching Insurellm, Avery was a leading Senior Product Manager at Innovate Insurance Solutions, where she developed groundbreaking insurance products aimed at the tech sector.'),
 Document(id=

In [114]:
llm.invoke("Who is Avery?")

AIMessage(content='The name "Avery" can refer to various individuals, characters, or concepts depending on the context. It could be a first name or surname, and it is used by many people across different fields, including entertainment, sports, literature, and more. If you have a specific Avery in mind—such as a celebrity, fictional character, or historical figure—please provide more details, and I can give you more information!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 85, 'prompt_tokens': 11, 'total_tokens': 96, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_11f3029f6b', 'id': 'chatcmpl-Ckw1QAKdT027QFqW7D2tc7JOp30RY', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': No

## Time to put this together!

In [115]:
SYSTEM_PROMPT_TEMPLATE = """
You are a knowledgeable, friendly assistant representing the company Insurellm.
You are chatting with a user about Insurellm.
If relevant, use the given context to answer any question.
If you don't know the answer, say so.
Context:
{context}
"""

In [116]:
from langchain_core.messages import SystemMessage, HumanMessage

In [117]:
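def answer_question(question: str, history):
    # `history` is accepted to match Gradio's ChatInterface signature but is not used:
    # each question is answered independently of earlier turns
    docs = retriever.invoke(question)  # fetch the chunks closest to the question
    context = "\n\n".join(doc.page_content for doc in docs)
    system_prompt = SYSTEM_PROMPT_TEMPLATE.format(context=context)
    response = llm.invoke([SystemMessage(content=system_prompt), HumanMessage(content=question)])
    return response.content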
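
Note that `answer_question` receives `history` but never uses it, so a follow-up such as "When did she co-found the company?" reaches the model without the earlier turns. One possible way to carry them along is sketched below. The helper `build_messages` and its `max_messages` parameter are hypothetical, and the sketch assumes `history` arrives as a list of `{"role": ..., "content": ...}` dicts, the shape Gradio passes when `ChatInterface` is created with `type="messages"`.

```python
def build_messages(system_prompt, history, question, max_messages=6):
    """Assemble an OpenAI-style message list: system prompt, recent history, then the new question."""
    # Keep only the most recent messages so the prompt does not grow without bound
    recent = history[-max_messages:] if max_messages > 0 else []
    recent = [{"role": m["role"], "content": m["content"]} for m in recent]
    return (
        [{"role": "system", "content": system_prompt}]
        + recent
        + [{"role": "user", "content": question}]
    )


history = [
    {"role": "user", "content": "Who is Avery Lancaster?"},
    {"role": "assistant", "content": "She is the CEO of Insurellm."},
]
messages = build_messages("Context: ...", history, "When did she co-found the company?")
for m in messages:
    print(m["role"], "|", m["content"])
```

LangChain chat models generally accept this dict format in `invoke()`, but that is worth verifying against the installed version. The retrieval query may also need the earlier turns, since "she" on its own gives the retriever little to match.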

In [118]:
answer_question("Who is Averi Lancaster?", [])

'Avery Lancaster is the Co-Founder and Chief Executive Officer (CEO) of Insurellm, an insurance tech company she co-founded in 2015. Under her leadership, Insurellm has become a leading provider in the insurance technology sector, known for its innovative products and strategies. Avery has a background in product management and business analysis, having previously worked at Innovate Insurance Solutions and Edge Analytics. She is recognized for her commitment to diversity and inclusion, community engagement, and fostering a positive work environment.'

In [119]:
gr.ChatInterface(answer_question, type="messages").launch()



* Running on local URL:  http://127.0.0.1:7863
* To create a public link, set `share=True` in `launch()`.


