# Context-Aware Chatbot Using LangChain and RAG

I am using a Wikipedia corpus to build a context-aware chatbot. For the chatbot to be fully aware of the user's context and to respond reliably to queries, it needs a good LLM, tokenizer and vector database. Because of resource limitations (GPUs, training-time quota), this chatbot will not work as well as expected.

Load the dataset as a stream and take only the first 200 articles, because of limits on tuning time.

In [1]:
from datasets import load_dataset

wikipedia_corpus = load_dataset(
    "wikimedia/wikipedia",
    "20231101.en",
    streaming=True
)

dataset = wikipedia_corpus["train"].take(200)
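With `streaming=True`, `load_dataset` returns an iterable dataset that yields rows on demand, so `.take(200)` reads only the first 200 articles instead of downloading the whole dump first. A pure-Python analogy of that laziness, assuming a hypothetical generator `fake_wikipedia_stream` in place of the real Hugging Face stream (its size is arbitrary):

```python
from itertools import islice

rows_read = 0  # counts how many rows the stream actually produced


def fake_wikipedia_stream(n_articles=1_000_000):
    """Hypothetical stand-in for the remote corpus: yields one article at a time."""
    global rows_read
    for i in range(n_articles):
        rows_read += 1
        yield {"title": f"Article {i}", "text": f"Body of article {i}"}


# Analogous to dataset.take(200): stop after 200 rows instead of reading everything
first_200 = list(islice(fake_wikipedia_stream(), 200))
print(len(first_200), rows_read)  # 200 200
```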

Print the title and the first 100 characters of two articles from the corpus.

In [2]:
for sample in dataset.take(2):
    print(sample["title"])
    print(sample["text"][:100])

Anarchism
Anarchism is a political philosophy and movement that is skeptical of all justifications for authori
Albedo
Albedo (; ) is the fraction of sunlight that is diffusely reflected by a body. It is measured on a s


Check which LangChain packages are installed.

In [3]:
!pip list | grep "langchain"

langchain                                1.2.0
langchain-core                           1.2.1


Install the text splitters, the LangChain community integrations, ChromaDB and LangGraph.

In [4]:
!pip install langchain-text-splitters langchain-community chromadb langgraph --quiet

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/67.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m67.3/67.3 kB[0m [31m5.4 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m51.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m21.7/21.7 MB[0m [31m84.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m278.2/278.2 kB[0m [31m20.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m81.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m51.2 MB/s[0m eta [36m0:00:00[0

Convert the corpus into LangChain `Document` objects, keeping each article's title as metadata, for further processing.

In [5]:
from langchain_core.documents import Document

documents = [
    Document(
        page_content=row["text"],
        metadata={"title": row["title"]}
    )
    for row in dataset
]

Split the documents with `RecursiveCharacterTextSplitter` into chunks of at most 500 characters, with a 100-character overlap between consecutive chunks. The chunk size was not chosen based on the size or type of the data. A semantic splitter would likely produce better chunks, but it is more time- and resource-consuming.

In [6]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)

chunks = text_splitter.split_documents(documents)

In [10]:
len(chunks)

19005
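A recursive character splitter tries the coarsest separator first (paragraphs, then lines, then words, then single characters) and only falls back to a finer separator for pieces that are still too long; the pieces are then packed into chunks, with a tail of the previous chunk repeated as overlap. The following is a simplified, dependency-free sketch of that idea. `recursive_split` and `_merge` are hypothetical helpers, not LangChain functions, and LangChain's actual merging rules and edge cases differ:

```python
def _merge(pieces, sep, chunk_size, chunk_overlap):
    """Greedily pack pieces into chunks of at most chunk_size characters,
    carrying a tail of up to chunk_overlap characters into the next chunk."""
    chunks, window = [], []

    def length(parts):
        return len(sep.join(parts))

    for piece in pieces:
        if window and length(window + [piece]) > chunk_size:
            chunks.append(sep.join(window))
            # Drop pieces from the front until the remaining tail fits the
            # overlap budget and leaves room for the next piece.
            while window and (length(window) > chunk_overlap
                              or length(window + [piece]) > chunk_size):
                window.pop(0)
        window.append(piece)
    if window:
        chunks.append(sep.join(window))
    return [c for c in chunks if c.strip()]


def recursive_split(text, chunk_size=500, chunk_overlap=100,
                    separators=("\n\n", "\n", " ", "")):
    """Simplified sketch of recursive character splitting."""
    # Use the first (coarsest) separator that occurs in the text; "" means characters.
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            break
    finer = separators[i + 1:]
    pieces = list(text) if sep == "" else text.split(sep)

    chunks, fitting = [], []
    for piece in pieces:
        if len(piece) <= chunk_size:
            fitting.append(piece)
            continue
        # Piece is too large: flush what we have, then recurse with finer separators.
        if fitting:
            chunks.extend(_merge(fitting, sep, chunk_size, chunk_overlap))
            fitting = []
        if finer:
            chunks.extend(recursive_split(piece, chunk_size, chunk_overlap, finer))
        else:
            chunks.append(piece)
    if fitting:
        chunks.extend(_merge(fitting, sep, chunk_size, chunk_overlap))
    return chunks


words = [f"word{i}" for i in range(300)]
sketch_chunks = recursive_split(" ".join(words), chunk_size=500, chunk_overlap=100)
print(len(sketch_chunks), max(len(c) for c in sketch_chunks))
```

The overlap means the first words of each chunk repeat the last words of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk more often than without overlap.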

Download the `sentence-transformers/all-MiniLM-L6-v2` embedding model from Hugging Face.

In [7]:
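# Note: recent LangChain releases deprecate this class in favor of
# `from langchain_huggingface import HuggingFaceEmbeddings` (package `langchain-huggingface`).
from langchain_community.embeddings import HuggingFaceEmbeddings

# Small, fast sentence-transformers model that maps text to 384-dimensional vectors
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)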

Store the vectors in ChromaDB. FAISS is not supported on the TPU runtime in Google Colab, so I chose Chroma instead.

In [8]:
from langchain_community.vectorstores import Chroma

# Build the vector store using Chroma
vectorstore = Chroma.from_documents(
    documents=chunks,   # your pre-chunked documents
    embedding=embeddings
)

# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
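Conceptually, `as_retriever(search_kwargs={"k": 3})` embeds the query and returns the 3 chunks whose vectors are most similar to it. The toy sketch below assumes a bag-of-words `embed` function in place of `all-MiniLM-L6-v2` and brute-force cosine similarity in place of Chroma's index; Chroma's actual distance metric and approximate nearest-neighbour search differ. `embed`, `cosine`, `top_k` and `toy_chunks` are hypothetical names:

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=3):
    """Brute-force nearest neighbours: score every chunk, keep the k best."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "anarchism is a political philosophy",
    "albedo is the fraction of sunlight reflected",
    "anarchism rejects hierarchy",
    "the moon has low albedo",
]
print(top_k("what is anarchism", toy_chunks, k=2))
# ['anarchism is a political philosophy', 'anarchism rejects hierarchy']
```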

Download the LLM (`google/flan-t5-base`) and wrap it in a Hugging Face pipeline for use with LangChain.

In [9]:
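from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline

# FLAN-T5 is an encoder-decoder model, so it uses the text2text-generation task;
# max_length caps the length of the generated answer in tokens.
hf_pipeline = pipeline("text2text-generation", model="google/flan-t5-base", max_length=512)
# Wrap the transformers pipeline so LangChain can use it as an LLM
llm = HuggingFacePipeline(pipeline=hf_pipeline)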

Device set to use cuda:0


In [12]:
import langchain
print(langchain.__version__)

1.2.0


In [13]:
from langchain_core.runnables import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

Design a very simple prompt template. In practice, the template should be written for the chatbot's specific requirements or application.

In [42]:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    """You are a helpful assistant.

Conversation so far:
{chat_history}

Context:
{context}

Question:
{question}

Answer:"""
)
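`PromptTemplate.from_template` uses Python f-string-style placeholders by default, so the model ultimately receives one plain string. A sketch of the rendered prompt using `str.format` on the same template text; the values filled in are made-up examples, not real retriever output:

```python
TEMPLATE = """You are a helpful assistant.

Conversation so far:
{chat_history}

Context:
{context}

Question:
{question}

Answer:"""

rendered = TEMPLATE.format(
    chat_history="human: What is Anarchism words",       # example history line
    context="Anarchism is a political philosophy and movement.",  # example chunk
    question="Summarize it",
)
print(rendered)
```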


The functions below format the retrieved documents and the chat history as plain strings for the RAG chain.

In [43]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [48]:
def format_history(messages):
    return "\n".join(
        f"{m.type}: {m.content}" for m in messages
    )
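Both helpers only join strings. The sketch below uses minimal stand-in classes: `StubDocument` and `StubMessage` are hypothetical and only mimic the `page_content`, `type` and `content` attributes of LangChain's `Document` and message classes (a human message has `type == "human"`, an AI message `type == "ai"`). The history renders as role-prefixed lines, which is likely why the second answer in the test below starts with `ai:`: the small model copies a prefix pattern it sees in the prompt.

```python
from dataclasses import dataclass


@dataclass
class StubDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


@dataclass
class StubMessage:
    """Hypothetical stand-in for a LangChain chat message ("human" or "ai")."""
    type: str
    content: str


def format_docs(docs):
    # Separate retrieved chunks with a blank line
    return "\n\n".join(doc.page_content for doc in docs)


def format_history(messages):
    # One "role: text" line per message
    return "\n".join(f"{m.type}: {m.content}" for m in messages)


history = [
    StubMessage("human", "What is Anarchism words"),
    StubMessage("ai", "Anarchism is a political philosophy."),
]
print(format_history(history))
# human: What is Anarchism words
# ai: Anarchism is a political philosophy.
```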

In [49]:
from operator import itemgetter

from langchain_core.runnables import RunnableLambda

# The dict runs three branches on the same input ({"question", "chat_history"}):
#   context      - retrieve the top-3 chunks for the question and join them into one string
#   question     - pass the question through unchanged
#   chat_history - render the message history as "type: content" lines
rag_chain = (
    {
        "context": itemgetter("question") | retriever | RunnableLambda(format_docs),
        "question": itemgetter("question"),
        "chat_history": itemgetter("chat_history") | RunnableLambda(format_history),
    }
    | prompt  # fill the PromptTemplate
    | llm     # generate the answer with FLAN-T5
)
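In LCEL, a dict inside a chain is coerced to a `RunnableParallel`, which runs each branch on the same input, and `|` feeds one step's output into the next. A plain-Python analogue with stub components; `parallel`, `sequence`, `stub_retriever`, `render_prompt` and `stub_llm` are all hypothetical, not LangChain APIs:

```python
from operator import itemgetter


def parallel(branches):
    """Analogue of an LCEL dict / RunnableParallel: run every branch on the
    same input and collect the results under the same keys."""
    return lambda inputs: {key: branch(inputs) for key, branch in branches.items()}


def sequence(*steps):
    """Analogue of `step1 | step2 | ...`: feed each step's output into the next."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


seen_queries = []  # records what the stub retriever was asked


def stub_retriever(query):
    # Hypothetical stand-in for the Chroma retriever
    seen_queries.append(query)
    return ["Anarchism is a political philosophy."]


def render_prompt(values):
    # Shortened version of the notebook's PromptTemplate
    return (f"Conversation so far:\n{values['chat_history']}\n\n"
            f"Context:\n{values['context']}\n\n"
            f"Question:\n{values['question']}\n\nAnswer:")


def stub_llm(prompt_text):
    # Hypothetical stand-in for FLAN-T5: echoes its prompt
    return "ECHO: " + prompt_text


chain = sequence(
    parallel({
        "context": sequence(itemgetter("question"), stub_retriever, "\n\n".join),
        "question": itemgetter("question"),
        "chat_history": sequence(itemgetter("chat_history"), "\n".join),
    }),
    render_prompt,
    stub_llm,
)

answer = chain({
    "question": "Summarize it",
    "chat_history": ["human: What is Anarchism words", "ai: Anarchism is a political philosophy."],
})
print(seen_queries)  # ['Summarize it']
```

One consequence is visible in the sketch: the `context` branch only receives `question`, so for a follow-up such as "Summarize it" the retriever searches for that phrase alone, without the earlier turns. That may partly explain the second answer in the test below.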

In [50]:
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

In [51]:
rag_with_memory = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="chat_history",
)
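`RunnableWithMessageHistory` looks up the history for the `session_id` passed in the config, hands it to the chain under `history_messages_key` (`chat_history` here), and afterwards appends the new question and the answer to that history. A rough plain-Python analogue; `with_message_history`, `sketch_store` and `stub_chain` are hypothetical names, not LangChain APIs:

```python
from collections import defaultdict

sketch_store = defaultdict(list)  # session_id -> list of (role, text) turns


def with_message_history(chain):
    """Hypothetical analogue of RunnableWithMessageHistory."""
    def invoke(inputs, session_id):
        history = sketch_store[session_id]            # like get_session_history(session_id)
        answer = chain({**inputs, "chat_history": list(history)})
        history.append(("human", inputs["question"]))  # record the input message...
        history.append(("ai", answer))                 # ...and the model's reply
        return answer
    return invoke


def stub_chain(inputs):
    # Hypothetical stand-in for rag_chain: reports how much history it received
    return f"seen {len(inputs['chat_history'])} earlier messages"


chat = with_message_history(stub_chain)
first = chat({"question": "What is Anarchism?"}, session_id="user1")
second = chat({"question": "Summarize it"}, session_id="user1")
third = chat({"question": "Hello"}, session_id="user2")
print(first, "|", second, "|", third)
# seen 0 earlier messages | seen 2 earlier messages | seen 0 earlier messages
```

Because each `session_id` gets its own history list, two users never see each other's turns, which is the same isolation the notebook's `store` dictionary provides.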

## Testing

Below, the same user (session `user1`) asks a question and then a follow-up. Because of the resource limitations, the answers may not be very context-aware.

In [54]:
rag_with_memory.invoke(
    {"question": "What is Anarchism words"},
    config={"configurable": {"session_id": "user1"}}
)

'Anarchism is a political philosophy and movement that is skeptical of all justifications for authority and seeks to abolish the institutions it claims maintain unnecessary coercion and hierarchy, typically including nation-states, and capitalism.'

In [55]:
rag_with_memory.invoke(
    {"question": "Summarize it"},
    config={"configurable": {"session_id": "user1"}}
)

'ai: This article is a plot summary.'