## **Behind the Curtain of LangChain, and What If RAG Fails?**

In [1]:
import os
import glob
from dotenv import load_dotenv
import gradio as gr

In [2]:
from langchain.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
import numpy as np
from sklearn.manifold import TSNE
import plotly.graph_objects as go
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain


In [3]:
load_dotenv(override=True)

MODEL = 'gpt-4o-mini'
db_name = "vector_db"

In [4]:
# Step 1. Loading the documents using LangChain loaders

text_loader_kwargs = {'encoding': 'utf-8'}
documents = []

folders = glob.glob("knowledge-base/*")
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata['doc_type'] = doc_type
        documents.append(doc)

In [5]:
# Step 2. Chunking

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

Created a chunk of size 1088, which is longer than the specified 1000
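
Why does a 1088-character chunk appear when `chunk_size=1000`? As far as I understand it, `CharacterTextSplitter` only cuts at its separator (a blank line, `"\n\n"`, by default), so a single paragraph longer than the limit is kept whole and a warning is printed. The sketch below is a simplified stand-in written for illustration, not LangChain's actual implementation: the function name `split_on_separator` is made up, it ignores overlap, and the sample text is synthetic.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    Simplified illustration only: no overlap, and not LangChain's real code.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece still fits, or it is the first piece of a new
            # chunk and cannot be cut any further, even if it is too long.
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Synthetic text: three paragraphs, the middle one longer than chunk_size.
text = "\n\n".join(["a" * 400, "b" * 1088, "c" * 300])
lengths = [len(chunk) for chunk in split_on_separator(text, chunk_size=1000)]
print(lengths)  # [400, 1088, 300]
```

The middle paragraph has no separator inside it, so it comes out as one oversized chunk, which matches the warning above.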


In [6]:
# Step 3. Embedding and putting the chunks in Vector Database

embeddings = OpenAIEmbeddings()

# If the DB already exists, delete its collection
if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

# Create Vector Store
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vector DB Created with {vectorstore._collection.count()} documents")



Vector DB Created with 123 documents


In [7]:
# Fetch one stored embedding and check how many dimensions it has

collection = vectorstore._collection
sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"The vectors have {dimensions} dimensions.")

The vectors have 1536 dimensions.


In [8]:
# The final step 

# Create a new chat model with OpenAI
llm = ChatOpenAI(temperature=0.7, model=MODEL)

# set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# The retriever is an abstraction over the vector store that will be used during RAG
retriever = vectorstore.as_retriever()

# Putting it all together: set up the conversation chain with the LLM (MODEL, i.e. gpt-4o-mini), vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)



In [10]:
def chat(query, history):
    result = conversation_chain.invoke({'question': query})
    return result['answer']

gr.ChatInterface(fn=chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.




## **RAG FAILED !!!**

> When I asked the LLM, "Who received the prestigious IIOTY award in 2023?", it answered "I don't know".
> However, the HR record for Maxine Thompson in the employees folder clearly states that she received the IIOTY 2023 award.

> So let's see which chunks were actually provided to the LLM as context, using `Callbacks` in LangChain.

In [12]:
from langchain_core.callbacks import StdOutCallbackHandler

llm = ChatOpenAI(temperature=0.7, model=MODEL)

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

retriever = vectorstore.as_retriever()

conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory, callbacks=[StdOutCallbackHandler()])


query = "Who received the prestigious IIOTY 2023 award in 2023?"
result = conversation_chain.invoke({'question': query})
answer = result['answer']
print("ANSWER: ", answer)




> Entering new ConversationalRetrievalChain chain...


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System: Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
## Annual Performance History
- **2020:**  
  - Completed onboarding successfully.  
  - Met expectations in delivering project milestones.  
  - Received positive feedback from the team leads.

- **2021:**  
  - Achieved a 95% success rate in project delivery timelines.  
  - Awarded "Rising Star" at the annual company gala for outstanding contributions.  

- **2022:**  
  - Exceeded goals by optimizing existing backend code, improving system performance by 25%.  
  - Conducted training sessions for junior developers, fostering knowledge sharing.  

- **2023:**  
  - Led a major overhaul of the API internal ar

### Mitigation Strategies

#### 1. How many chunks to use?

In [15]:
# The final step 

# Create a new chat model with OpenAI
llm = ChatOpenAI(temperature=0.7, model=MODEL)

# set up the conversation memory for the chat
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

# The retriever is an abstraction over the vector store that will be used during RAG; the k most relevant chunks are provided as context
retriever = vectorstore.as_retriever(search_kwargs={'k': 25})

# Putting it all together: set up the conversation chain with the LLM (MODEL, i.e. gpt-4o-mini), vector store and memory
conversation_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)

In [14]:
def chat(query, history):
    result = conversation_chain.invoke({'question': query})
    return result['answer']

gr.ChatInterface(fn=chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7861
* To create a public link, set `share=True` in `launch()`.
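
To see why raising `k` can rescue a missed answer, here is a toy top-k retrieval in plain Python. Everything in it is invented for illustration: the chunk names, the 2-D "embeddings" and the query vector are hypothetical, and real embeddings here have 1536 dimensions. As far as I know, the retriever returns 4 chunks unless `search_kwargs` says otherwise, so a chunk ranked fifth never reaches the prompt.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the names of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 2-D vectors; "award_notes" stands for the chunk holding the answer.
toy_chunks = {
    "performance_history": [1.0, 0.1],
    "compensation": [1.0, 0.2],
    "career_summary": [1.0, 0.3],
    "company_overview": [1.0, 0.4],
    "award_notes": [1.0, 0.5],
    "product_faq": [1.0, 1.0],
}
query = [1.0, 0.0]

print(top_k(query, toy_chunks, k=4))  # "award_notes" is ranked 5th, so it is missing
print(top_k(query, toy_chunks, k=5))  # now it is included
```

The trade-off is that a larger `k` puts more text into every prompt, so each call costs more tokens.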




#### 2. Use callbacks to inspect which chunks are being provided to the model
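
Besides reading the callback output, a lighter option is to ask the retriever directly and print one line per chunk. The helper below is a hypothetical sketch (`summarize_docs` is my own name, not a LangChain function). It only assumes that each retrieved item has `page_content` and `metadata`, as LangChain `Document` objects do, and it uses stand-in objects so that it runs on its own.

```python
from types import SimpleNamespace


def summarize_docs(docs, preview_chars=80):
    """Return one line per retrieved chunk: rank, doc_type, source and a short preview."""
    lines = []
    for rank, doc in enumerate(docs, start=1):
        meta = doc.metadata
        # Collapse whitespace so each preview stays on a single line.
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(
            f"{rank}. [{meta.get('doc_type', '?')}] {meta.get('source', '?')}: {preview}"
        )
    return lines


# Stand-in for retrieved documents (made-up source path); in this notebook
# the list would come from something like `retriever.invoke(query)`.
fake_docs = [
    SimpleNamespace(
        page_content="## Annual Performance History\n- **2020:** ...",
        metadata={"doc_type": "employees", "source": "example.md"},
    ),
]
print("\n".join(summarize_docs(fake_docs)))
```

If the chunk you expected is not in the list, the problem is in retrieval (chunking, `k`, or the embeddings) rather than in the LLM.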

#### 3. Size of chunks and overlap size
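
Overlap matters because a fact that straddles a chunk boundary can end up cut in two, so no single chunk contains it. The sketch below uses a plain fixed-size character window, which is an assumption made for illustration and not how `CharacterTextSplitter` works (it splits on separators). The function name `sliding_chunks` and the filler text are made up.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


fact = "IIOTY 2023"
# Filler text placed so that the fact straddles the boundary at character 40.
text = "x" * 35 + fact + "y" * 55

no_overlap = sliding_chunks(text, chunk_size=40, chunk_overlap=0)
with_overlap = sliding_chunks(text, chunk_size=40, chunk_overlap=10)

print(any(fact in chunk for chunk in no_overlap))    # False
print(any(fact in chunk for chunk in with_overlap))  # True
```

Without overlap the fact is split between the first two chunks. With an overlap of 10 characters the second window starts early enough to hold it whole.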

#### 4. Passing the documents as a whole instead of as chunks (never think of it)