## Setup

**Dependencies**

We'll use HuggingFace BGE-M3 embeddings and a Pinecone vector store in this walkthrough, but everything shown here works with any Embeddings, VectorStore, or Retriever.

We'll use the following packages:

In [None]:
%%capture --no-stderr
%pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-chroma beautifulsoup4

In [None]:
!pip install -qU langchain-groq InstructorEmbedding sentence-transformers==2.2.2 pinecone-client langchain-pinecone langchain-cohere python-dotenv


In [1]:
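#from google.colab import userdata
import getpass
import os
from dotenv import load_dotenv

# Load API keys (e.g. PINECONE_API_KEY, GROQ_API_KEY, COHERE_API_KEY, TAVILY_API_KEY) from a local .env file
load_dotenv()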

True

In [2]:
import bs4
from langchain import hub
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import DataFrameLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings, HuggingFaceBgeEmbeddings, HuggingFaceInstructEmbeddings
from langchain_pinecone import PineconeVectorStore

USER_AGENT environment variable not set, consider setting it to identify your requests.



In [3]:
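import torch

# BGE-M3 is a multilingual embedding model
model_name = "BAAI/bge-m3"
# Use the GPU when available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_kwargs = {'device': device}
# Unit-length vectors, so inner-product search is equivalent to cosine similarity
encode_kwargs = {'normalize_embeddings': True}
hf_embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)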

In [4]:
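# Connect to an existing Pinecone index (requires PINECONE_API_KEY)
index_name = "tce-pe-idx"
vectorstore = PineconeVectorStore(index_name=index_name, embedding=hf_embeddings)
# Retrieve the top 25 chunks per query
retriever = vectorstore.as_retriever(search_kwargs={"k": 25})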

In [5]:
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.1-70b-versatile")

## How to add message history

Passing conversation state into and out of a chain is vital when building a chatbot. The RunnableWithMessageHistory class lets us add message history to certain types of chains. It wraps another Runnable and manages the chat message history for it: it loads previous messages in the conversation BEFORE passing the input to the Runnable, and it saves the generated response as a message AFTER calling the Runnable. This class also enables multiple conversations by saving each conversation under a session_id; it expects a session_id to be passed in the config when calling the runnable, and uses it to look up the relevant conversation history.

In order to properly set this up there are two main things to consider:

1. How to store and load messages? (this is the get_session_history function defined below)

2. What is the underlying Runnable you are wrapping and what are its inputs/outputs? (this is the runnable passed to RunnableWithMessageHistory, as well as any additional parameters you pass to RunnableWithMessageHistory to align the inputs/outputs)
Let's walk through these pieces (and more) below.

## How to store and load messages

A key part of this is storing and loading messages. When constructing RunnableWithMessageHistory you need to pass in a get_session_history function. This function should take in a session_id and return a BaseChatMessageHistory object.

**What is session_id?**

session_id is an identifier for the session (conversation) thread that these input messages correspond to. This allows you to maintain several conversations/threads with the same chain at the same time.

**What is BaseChatMessageHistory?**

BaseChatMessageHistory is a class that can load and save message objects. It will be called by RunnableWithMessageHistory to do exactly that. These classes are usually initialized with a session id.
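To make the contract concrete, here is a minimal sketch of an object with the same responsibilities as a BaseChatMessageHistory (load and save the messages of one session), plus a session-history factory. `ToyChatHistory`, `get_toy_session_history`, and `_histories` are hypothetical names; the real base class lives in `langchain_core.chat_history`, and its exact method set may differ between versions.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # hypothetical (role, content) pair


class ToyChatHistory:
    """Hypothetical stand-in for a BaseChatMessageHistory: stores one session's messages."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.messages: List[Message] = []

    def add_messages(self, new_messages: List[Message]) -> None:
        # Save: append messages to this session
        self.messages.extend(new_messages)

    def clear(self) -> None:
        # Forget this session's messages
        self.messages = []


# Hypothetical registry: one history object per session_id
_histories: Dict[str, ToyChatHistory] = {}


def get_toy_session_history(session_id: str) -> ToyChatHistory:
    # Create the history on first use, then return the same object for that session_id
    if session_id not in _histories:
        _histories[session_id] = ToyChatHistory(session_id)
    return _histories[session_id]


get_toy_session_history("user-1").add_messages([("human", "hello"), ("ai", "hi there")])
print(len(get_toy_session_history("user-1").messages))  # 2
print(len(get_toy_session_history("user-2").messages))  # 0
```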

Let's create a get_session_history function to use for this example. To keep things simple, we will use SQLChatMessageHistory backed by a local SQLite database file (memory.db).

In [6]:
from langchain_community.chat_message_histories import SQLChatMessageHistory

def get_session_history(session_id):
    return SQLChatMessageHistory(session_id, "sqlite:///memory.db")
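SQLChatMessageHistory persists each message as a row tagged with its session id, which is why one memory.db file can hold many conversations. The sketch below reproduces that idea with the standard-library `sqlite3` module and an in-memory database; the `messages` table layout and the helpers `save_message` and `load_messages` are assumptions for illustration, not the schema or API LangChain actually uses.

```python
import sqlite3

# In-memory database for the sketch; the notebook itself uses the file memory.db
conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per message, keyed by session_id
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " session_id TEXT NOT NULL,"
    " role TEXT NOT NULL,"
    " content TEXT NOT NULL)"
)


def save_message(session_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def load_messages(session_id: str) -> list:
    # Return only this session's messages, in insertion order
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()


save_message("1", "human", "first question")
save_message("1", "ai", "first answer")
save_message("2", "human", "Hello")
print(load_messages("1"))
print(load_messages("2"))  # [('human', 'Hello')]
```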

## What is the runnable you are trying to wrap?

RunnableWithMessageHistory can only wrap certain types of Runnables. Specifically, it can be used for any Runnable that takes as input one of:

- a sequence of BaseMessages
- a dict with a key that takes a sequence of BaseMessages
- a dict with a key that takes the latest message(s) as a string or sequence of BaseMessages, and a separate key that takes historical messages

and returns as output one of:

- a string that can be treated as the contents of an AIMessage
- a sequence of BaseMessages
- a dict with a key that contains a sequence of BaseMessages

Let's take a look at some examples to see how it works.
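The accepted input and output shapes can be summarized with two small normalizing functions. These are hypothetical helpers written for illustration (LangChain performs a similar normalization internally, but not with these names or this exact logic); messages are again modeled as `(role, content)` tuples.

```python
from typing import Any, List, Optional, Tuple

Message = Tuple[str, str]  # hypothetical (role, content) pair


def new_input_messages(value: Any, input_messages_key: Optional[str] = None) -> List[Message]:
    """Pull the latest message(s) out of any of the accepted input shapes."""
    if isinstance(value, dict):
        value = value[input_messages_key]  # dict input: read the configured key
    if isinstance(value, str):
        return [("human", value)]  # a bare string becomes one human message
    return list(value)  # already a sequence of messages


def output_messages(value: Any, output_messages_key: Optional[str] = None) -> List[Message]:
    """Turn any of the accepted output shapes into the messages to save."""
    if isinstance(value, dict):
        value = value[output_messages_key]  # dict output: read the configured key
    if isinstance(value, str):
        return [("ai", value)]  # a string is treated as the AI message content
    return list(value)


print(new_input_messages({"input": "hi", "language": "pt"}, "input"))  # [('human', 'hi')]
print(output_messages({"answer": "hello", "context": []}, "answer"))   # [('ai', 'hello')]
```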

## Dictionary input, message(s) output


Besides just wrapping a raw model, the next step up is wrapping a prompt + LLM. This now changes the input to be a dictionary (because the input to a prompt is a dictionary). This adds two bits of complication.

First: a dictionary can have multiple keys, but we only want to save ONE as input. To do this, we now need to specify which key to save as the input.

Second: once we load the messages, we need to know where to put them in the dictionary, that is, which key to store them under. Therefore, we need to specify a key to hold the loaded messages.
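Concretely, for dictionary input the wrapper has to (1) read the new message from the input key and (2) inject the loaded history under a separate key before calling the prompt. Here is a plain-Python sketch of that step; `prepare_prompt_input` is a hypothetical helper, and the default key names "input" and "chat_history" are chosen to match the chain built in this notebook.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # hypothetical (role, content) pair


def prepare_prompt_input(
    inputs: Dict[str, object],
    history: List[Message],
    input_messages_key: str = "input",
    history_messages_key: str = "chat_history",
) -> Tuple[Dict[str, object], Message]:
    """Return the dict passed to the prompt and the single message that should be saved."""
    # Only the value under input_messages_key is saved; other keys pass through untouched
    to_save: Message = ("human", str(inputs[input_messages_key]))
    # Loaded history goes under its own key so the prompt's MessagesPlaceholder can find it
    prompt_input = {**inputs, history_messages_key: list(history)}
    return prompt_input, to_save


past = [("human", "Hi"), ("ai", "Hello!")]
prompt_input, to_save = prepare_prompt_input({"input": "And the 2022 accounts?", "year": 2022}, past)
print(sorted(prompt_input))  # ['chat_history', 'input', 'year']
print(to_save)               # ('human', 'And the 2022 accounts?')
```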

Putting it all together, that ends up looking something like:

In [18]:
from langchain_community.retrievers import TavilySearchAPIRetriever

# Web-search retriever returning the top 3 Tavily results (requires TAVILY_API_KEY)
tavily_retriever = TavilySearchAPIRetriever(k=3)


In [17]:
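from langchain.chains import create_history_aware_retriever
from langchain.retrievers import EnsembleRetriever
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


### Contextualize question ###
# English: "Given a chat history and the latest user question, which may reference
# context in the chat history, formulate a standalone question that can be understood
# without the chat history. Do NOT answer the question; just reformulate it if needed
# and otherwise return it as is."
contextualize_q_system_prompt = """Dado um histórico de conversações e a última pergunta do usuário \
que pode fazer referência ao contexto no histórico de conversações, formule uma pergunta autónoma \
que pode ser entendida sem o histórico de conversação. NÃO responda à pergunta, \
apenas reformula-a se necessário e, caso contrário, devolve-a como está."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# Rerank the Pinecone candidates with Cohere's multilingual reranker (requires COHERE_API_KEY)
compressor = CohereRerank(model="rerank-multilingual-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# Assumption: ensemble_retriever combines the reranked Pinecone results with Tavily web search
# (EnsembleRetriever uses equal weights when none are given)
ensemble_retriever = EnsembleRetriever(retrievers=[compression_retriever, tavily_retriever])

# Rewrites the question using the chat history, then retrieves with the standalone question
history_aware_retriever = create_history_aware_retriever(
    llm, ensemble_retriever, contextualize_q_prompt
)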

In [19]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

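# English: "You are a virtual question-answering assistant that helps users find information
# related to the judgment of accounts by the Court of Accounts. The Court of Accounts is
# responsible for judging the accounts of mayors and of the state governor. You are an
# assistant responsible for helping return answers about the results of public-account
# judgments. Generate the text in Brazilian Portuguese from the context."
qa_system_prompt = (
    "Você é um assistente virtual de perguntas e respostas que ajuda os usuários a encontrar informações "
    "relacionadas ao julgamento de contas do Tribunal de Contas. "
    "O Tribunal de Contas é responsável por julgar as contas dos prefeitos "
    "e do governador do estado. Você é um assistente responsável por ajudar "
    "a retornar as respostas relacionadas aos resultados dos julgamentos das "
    "contas públicas. Gere o texto em português brasileiro a partir do contexto."
    "\n\n"
    "{context}"
)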


qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

# def format_docs(docs):
#     return "\n\n".join(doc.page_content for doc in docs)

# rag_chain = (
#     {"context": history_aware_retriever | format_docs,
#      "input": RunnablePassthrough(),
#      "chat_history": get_session_history}
#     | qa_prompt
#     | llm
#     | StrOutputParser()
# )


In [20]:
from langchain_core.runnables.history import RunnableWithMessageHistory

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

In [23]:
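SESSION_ID = "1"

# Simple chat loop: type "q" to quit
while True:
    print("-" * 100)
    userinput = input("Digite aqui: ")  # "Type here: "

    if userinput == "q":
        break

    # Invoke the history-aware wrapper (not the bare rag_chain) so chat_history is loaded and saved
    response = conversational_rag_chain.invoke(
        {"input": userinput},
        config={"configurable": {"session_id": SESSION_ID}},
    )

    print("Resposta: ", response["answer"])  # "Answer: "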

----------------------------------------------------------------------------------------------------


HTTPError: 400 Client Error: Bad Request for url: https://api.tavily.com/search

In [None]:
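async def chat_tce(session_id="1"):
    while True:
        print("-" * 100)
        userinput = input("Digite aqui: ")  # "Type here: "

        if userinput == "q":
            break

        print("Resposta: ", end="", flush=True)  # "Answer: "
        # Stream the user's question through the history-aware chain; each chunk is a dict,
        # and only chunks with an "answer" key carry generated text
        async for chunk in conversational_rag_chain.astream(
            {"input": userinput},
            config={"configurable": {"session_id": session_id}},
        ):
            if "answer" in chunk:
                print(chunk["answer"], end="", flush=True)
        print()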
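With create_retrieval_chain, streamed chunks are typically dicts that each carry one of the output keys (input, chat_history, context, or a piece of answer), so only the answer pieces should be printed and concatenated. The sketch below demonstrates that filtering against a fake async stream; `fake_stream`, `collect_answer`, and the chunk contents are made up for illustration and stand in for `conversational_rag_chain.astream(...)`. It uses `asyncio.run`, so run it as a script; inside Jupyter, replace `asyncio.run(...)` with `await ...`.

```python
import asyncio
from typing import AsyncIterator, Dict


async def fake_stream() -> AsyncIterator[Dict[str, object]]:
    # Hypothetical chunks shaped like a retrieval chain's streamed output
    for chunk in (
        {"input": "Who judges the accounts?"},
        {"context": ["doc-1", "doc-2"]},
        {"answer": "The Court"},
        {"answer": " of Accounts."},
    ):
        yield chunk


async def collect_answer(stream: AsyncIterator[Dict[str, object]]) -> str:
    parts = []
    async for chunk in stream:
        if "answer" in chunk:  # skip input/context chunks
            parts.append(str(chunk["answer"]))
    return "".join(parts)


print(asyncio.run(collect_answer(fake_stream())))  # The Court of Accounts.
```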

In [None]:
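import asyncio

# asyncio.run() starts a new event loop: this works in a plain Python script,
# but raises RuntimeError inside Jupyter, where an event loop is already running
asyncio.run(chat_tce())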


In [None]:
# In Jupyter/IPython, await the coroutine on the notebook's already-running event loop instead
await chat_tce()