Project Name: Buffett-Wisdom RAG (Retrieval-Augmented Generation)

Student Name: Abdullah Rayyan

System ID: 2023001064

Course: B.Tech CSE (3rd Year), Sharda University

Subject: AI / Machine Learning Project


In [None]:
# Install LangChain, the OpenAI integration (used for embeddings), ChromaDB (vector store), and the PDF/token helpers
!pip install -q langchain langchain-community langchain-openai langchain-text-splitters chromadb pypdf tiktoken


In [None]:
from langchain_community.document_loaders import PyPDFLoader
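# Current import path; the fallback covers older LangChain releases
try:
    from langchain_text_splitters import RecursiveCharacterTextSplitter
except ImportError:
    from langchain.text_splitter import RecursiveCharacterTextSplitter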

# 1. Load the PDF (Make sure the filename matches what you uploaded)
loader = PyPDFLoader("2023ltr.pdf")
data = loader.load()

# 2. Split the text into chunks
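# Assignment Requirement: "Reason for chosen strategy"
# RecursiveCharacterTextSplitter tries separators in order (paragraph breaks,
# then line breaks, then spaces) and only cuts mid-word as a last resort, so
# chunks tend to end at natural boundaries. The 100-character overlap keeps
# text near a chunk boundary present in both neighbouring chunks.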
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)
chunks = text_splitter.split_documents(data)

print(f"Total chunks created: {len(chunks)}")
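To make chunk_size and chunk_overlap concrete, here is a simplified fixed-window splitter. It is only a sketch under a stated simplification, not LangChain's implementation: RecursiveCharacterTextSplitter first looks for paragraph, line, and word separators, whereas this hypothetical helper (fixed_window_chunks) cuts purely by character count. It shows how a 1000-character window with a 100-character overlap advances by 900 characters per chunk.

```python
def fixed_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores paragraph, line, and word boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # with the defaults, each window starts 900 chars later
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return pieces


sample = "x" * 2500
pieces = fixed_window_chunks(sample)
print(len(pieces), [len(p) for p in pieces])  # 3 [1000, 1000, 700]
```

With the real splitter, chunk boundaries move to the nearest separator, so actual chunk lengths vary and are often shorter than 1000 characters.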

In [None]:
!pip install -q langchain-text-splitters

Objective: To develop a robust RAG system capable of extracting and synthesizing financial insights from dense corporate documents. Specifically, this application parses Warren Buffett's annual shareholder letters (this notebook uses the 2023 letter, 2023ltr.pdf) to answer complex investment questions with responses grounded in the letter's own text.

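The Challenge: Standalone LLMs can hallucinate and lack up-to-date, document-specific knowledge. RAG grounds the model's responses in a specific knowledge base (the PDF) by retrieving relevant passages at query time and passing them to the model as context. This improves factual accuracy and avoids the computational cost of retraining or fine-tuning the model on the documents.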

In [None]:
from langchain_community.document_loaders import PyPDFLoader
# This is the updated import path
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load the PDF
# IMPORTANT: '2023ltr.pdf' must already be uploaded to the working directory (in Colab, via the folder icon on the left)
loader = PyPDFLoader("2023ltr.pdf")
data = loader.load()

# 2. Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100
)
chunks = text_splitter.split_documents(data)

print(f"Total chunks created: {len(chunks)}")

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# 1. Initialize the Embedding Model (needs the OPENAI_API_KEY environment variable)
# Requirement: "Embedding model used" -> OpenAI text-embedding-3-small
# Reason: it is OpenAI's low-cost embedding model and produces 1536-dimensional dense vectors.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# 2. Create the Vector Database
# Embeds every chunk and persists the vectors to ./chroma_db.
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

print("Vector Database 'ChromaDB' is ready and persisted!")

In [None]:
!pip install -q sentence-transformers

The application follows a standard modular RAG pipeline:

Data Ingestion: Loading PDF documents using PyPDFLoader.

Preprocessing: Splitting text with RecursiveCharacterTextSplitter, which prefers to break at paragraph and line boundaries, to preserve semantic flow (chunk size: 1000 characters, overlap: 100).

Vectorization: Generating 384-dimensional dense embeddings with the HuggingFace all-MiniLM-L6-v2 model.

Storage: Using ChromaDB as the vector store for fast similarity search, persisted to ./chroma_db.

Retrieval: Fetching the most similar chunks (top-k) via the LangChain VectorStoreRetriever; k = 3 in the RetrievalQA cells, otherwise the retriever's default of 4.

Generation: Using Mistral-7B-Instruct-v0.2 (run locally through a transformers pipeline) to generate the final response from the retrieved context.
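The Storage and Retrieval steps boil down to a nearest-neighbour search over chunk embeddings. The sketch below is illustrative only: it uses tiny hand-made 3-dimensional vectors (real all-MiniLM-L6-v2 vectors have 384 dimensions) and a brute-force scan ranked by cosine similarity, whereas ChromaDB uses an approximate HNSW index with L2 distance by default. The helpers cosine_similarity and top_k_by_cosine are hypothetical, not LangChain or Chroma APIs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, chunk_vecs, k=3):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Made-up toy "embeddings" for four chunks (illustration only).
chunk_vecs = [
    [0.9, 0.1, 0.0],  # chunk 0: points mostly along axis 0
    [0.1, 0.9, 0.0],  # chunk 1: points mostly along axis 1
    [0.8, 0.2, 0.1],  # chunk 2: also close to axis 0
    [0.0, 0.1, 0.9],  # chunk 3: points mostly along axis 2
]
query_vec = [1.0, 0.0, 0.0]  # a query "about" the axis-0 topic

print(top_k_by_cosine(query_vec, chunk_vecs, k=2))
```

For unit-length vectors, squared Euclidean distance equals 2 - 2 × cosine similarity, so ranking by smallest L2 distance and by largest cosine similarity gives the same order.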

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

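# Use a free, open-source sentence-transformers model that runs locally (384-dim vectors)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# If the OpenAI cell above already populated ./chroma_db, its 1536-dim vectors cannot share
# a collection with these 384-dim vectors (Chroma rejects the mismatch), so drop it first.
if "vector_db" in globals():
    vector_db.delete_collection()

# Create the Vector Database exactly as before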
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

print("Vector Database created for FREE using HuggingFace!")

In [None]:
!pip install -q langchain-huggingface huggingface_hub

In [None]:
import os
from langchain_huggingface import HuggingFaceEndpoint
from langchain.chains import RetrievalQA

# 1. Set your Hugging Face token (read at runtime; never hard-code a secret in the notebook)
from getpass import getpass
if "HUGGINGFACEHUB_API_TOKEN" not in os.environ:
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face token: ")

# 2. Initialize the free LLM (Mistral-7B-Instruct-v0.2, an instruction-tuned 7B model)
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=512,
    temperature=0.1, # Low temperature for factual answers
    huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"]  # look up by variable name, not token value
)

# 3. Create the RAG Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_db.as_retriever(search_kwargs={"k": 3})
)

# 4. Assignment Requirement: 3 Test Queries
queries = [
    "What is Warren Buffett's advice on investing for the long term?",
    "What does the document say about Berkshire's insurance business?",
    "How does Buffett define a 'wonderful' company?"
]

for i, q in enumerate(queries, 1):
    print(f"\n--- Query {i} ---")
    print(f"Question: {q}")
    response = qa_chain.invoke(q)
    print(f"Answer: {response['result']}")

In [None]:
!pip install -q langchain-classic


In [None]:
import os
from langchain_huggingface import HuggingFaceEndpoint
# Change from langchain.chains to langchain_classic.chains
from langchain_classic.chains import RetrievalQA

# 1. Set your Hugging Face token (read at runtime; never hard-code a secret in the notebook)
from getpass import getpass
if "HUGGINGFACEHUB_API_TOKEN" not in os.environ:
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face token: ")

# 2. Initialize the free LLM (Mistral-7B-Instruct-v0.2, an instruction-tuned 7B model)
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=512,
    temperature=0.1, # Low temperature for factual answers
    huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"]  # look up by variable name, not token value
)

# 3. Create the RAG Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_db.as_retriever(search_kwargs={"k": 3})
)

# 4. Assignment Requirement: 3 Test Queries
queries = [
    "What is Warren Buffett's advice on investing for the long term?",
    "What does the document say about Berkshire's insurance business?",
    "How does Buffett define a 'wonderful' company?"
]

for i, q in enumerate(queries, 1):
    print(f"\n--- Query {i} ---")
    print(f"Question: {q}")
    response = qa_chain.invoke(q)
    print(f"Answer: {response['result']}")

In [None]:
import os
from getpass import getpass
from langchain_huggingface import HuggingFaceEndpoint

# 1. Set your token correctly: read it at runtime instead of pasting it into the notebook
my_hf_token = os.environ.get("HUGGINGFACEHUB_API_TOKEN") or getpass("Hugging Face token: ")

# 2. Initialize the Free LLM
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=512,
    temperature=0.1,
    huggingfacehub_api_token=my_hf_token  # Pass the variable directly here
)

# 3. Create the RAG Chain (using create_retrieval_chain; in LangChain 1.x these helpers live in langchain-classic)
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# Define a simple prompt for the AI
prompt = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}""")

# Combine everything
document_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(vector_db.as_retriever(), document_chain)

# 4. Run your test query
response = rag_chain.invoke({"input": "What is Warren Buffett's advice on investing for the long term?"})
print(response["answer"])

In [None]:
# 1. IMPORTS (in LangChain 1.x these chain helpers moved to the langchain-classic package)
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# 2. DEFINE THE SYSTEM PROMPT
system_prompt = (
    "You are a professional assistant. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "\n\n"
    "Context: {context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

# 3. BUILD THE RAG CHAIN
# This uses the 'llm' and 'vector_db' you created in previous steps
combine_docs_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(vector_db.as_retriever(), combine_docs_chain)

# 4. RUN YOUR TEST QUERIES
test_queries = [
    "What is Warren Buffett's advice on investing for the long term?",
    "What does the document say about Berkshire's insurance business?",
    "How does Buffett define a 'wonderful' company?"
]

print("--- GENERATING ANSWERS ---")
for q in test_queries:
    response = rag_chain.invoke({"input": q})
    print(f"\nQuestion: {q}")
    print(f"Answer: {response['answer']}")
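The "stuff" strategy used by create_stuff_documents_chain (and by RetrievalQA with chain_type="stuff") concatenates every retrieved chunk into the {context} slot of the prompt. The plain-Python sketch below mimics that step with str.format; the helper stuff_context is hypothetical, and the "\n\n" separator is an assumption based on LangChain's default document separator.

```python
def stuff_context(chunk_texts: list[str], separator: str = "\n\n") -> str:
    """Concatenate retrieved chunk texts into one context string."""
    return separator.join(text.strip() for text in chunk_texts)


# Same wording as the system prompt in the cell above.
sketch_prompt = (
    "You are a professional assistant. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "\n\n"
    "Context: {context}"
)

retrieved = [  # placeholder chunk texts, not quotes from the letter
    "First retrieved chunk text.",
    "Second retrieved chunk text.",
]
filled = sketch_prompt.format(context=stuff_context(retrieved))
print(filled)
```

Because every retrieved chunk is pasted into a single prompt, the prompt grows with k (here a few chunks of at most 1000 characters each), which is one reason to keep k small with this strategy.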

In [None]:
# 1. SETUP LLM, EMBEDDINGS AND VECTOR DB (re-creates every object the chain needs, which fixes the NameError)
import os
from getpass import getpass
from langchain_huggingface import HuggingFaceEndpoint, HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# AUTHENTICATION (read the token at runtime instead of hard-coding it)
my_hf_token = os.environ.get("HUGGINGFACEHUB_API_TOKEN") or getpass("Hugging Face token: ")

# UPDATED LLM DEFINITION (an explicit task fixes the task-mismatch error)
llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    task="text-generation",  # This is the key fix
    max_new_tokens=512,
    temperature=0.1,
    huggingfacehub_api_token=my_hf_token
)

# DEFINE EMBEDDINGS (must be the same model that built ./chroma_db)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# RE-LOAD VECTOR DB (points to the directory you created earlier)
vector_db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)

# 2. BUILD THE RAG CHAIN (the prompt and vector_db must exist before the chain is assembled)
system_prompt = (
    "You are a professional assistant. Use the following context to answer. "
    "If you don't know, say you don't know.\n\nContext: {context}"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

combine_docs_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(vector_db.as_retriever(), combine_docs_chain)

# 3. RUN TEST
response = rag_chain.invoke({"input": "What is the main advice in this document?"})
print(f"Answer: {response['answer']}")

In [None]:
# 1. INSTALL UPDATED INTEGRATIONS
!pip install -q -U langchain-huggingface langchain-chroma langchain-classic

import os
from getpass import getpass
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace, HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# 2. SETUP (read your token at runtime instead of hard-coding it)
my_hf_token = os.environ.get("HUGGINGFACEHUB_API_TOKEN") or getpass("Hugging Face token: ")

# Use the ChatHuggingFace wrapper to fix the 'conversational' task error
base_llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    max_new_tokens=512,
    temperature=0.1,
    huggingfacehub_api_token=my_hf_token
)
llm = ChatHuggingFace(llm=base_llm)

# Load Embeddings & Vector DB
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)

# 3. DEFINE RAG LOGIC
system_prompt = (
    "You are a helpful assistant. Use the following context to answer the question. "
    "Context: {context}"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

# Create the Modern Chain
combine_docs_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(vector_db.as_retriever(), combine_docs_chain)

# 4. EXECUTE & TEST
print("--- RUNNING ASSIGNMENT QUERIES ---")
query = "What is the main investing advice in this document?"
response = rag_chain.invoke({"input": query})
print(f"Answer: {response['answer']}")

In [None]:
# ==========================================
# RAG CHAIN WITH STREAMING DISABLED (workaround attempt for the 'stream' error)
# ==========================================
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain

# Base endpoint, unchanged from the previous cell
base_llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    max_new_tokens=512,
    temperature=0.1,
    huggingfacehub_api_token=my_hf_token, # Uses the token from your previous cell
)

# Disabling streaming is intended to avoid the Post() 'stream' keyword error
llm = ChatHuggingFace(llm=base_llm, disable_streaming=True)

# Re-link the components
combine_docs_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(vector_db.as_retriever(), combine_docs_chain)

# Run the final assignment tests
print("--- RUNNING ASSIGNMENT QUERIES ---")
query = "What is the main investing advice in this document?"
response = rag_chain.invoke({"input": query})
print(f"Answer: {response['answer']}")

In [None]:
!pip install -q transformers accelerate

In [None]:
import torch
from transformers import pipeline
from langchain_huggingface import HuggingFacePipeline
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain

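# 1. SETUP LOCAL LLM PIPELINE
# Running the model locally avoids the hosted-API 'Client.post' error entirely.
# Note: the bfloat16 weights alone need roughly 15 GB of GPU memory.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# Initialize the transformers pipeline
pipe = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    max_new_tokens=512,
    repetition_penalty=1.1,
    return_full_text=False,  # return only the generated answer, not the echoed prompt
)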

# Wrap it for LangChain
llm = HuggingFacePipeline(pipeline=pipe)

# 2. RE-BUILD THE RAG CHAIN
# (Using your existing 'prompt' and 'vector_db' from previous cells)
combine_docs_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(vector_db.as_retriever(), combine_docs_chain)

# 3. RUN THE TEST
print("--- RUNNING ASSIGNMENT QUERIES ---")
query = "What is the main investing advice in this document?"
response = rag_chain.invoke({"input": query})
print(f"Answer: {response['answer']}")
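Loading the model locally means its weights must fit in GPU memory. A back-of-the-envelope estimate is parameters × bytes per parameter. The sketch assumes roughly 7.24 billion parameters for Mistral-7B and ignores activations and the KV cache, so real usage is higher; weight_memory_gb is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


MISTRAL_7B_PARAMS = 7.24e9  # assumption: approximate parameter count

for dtype_name, nbytes in [("float32", 4), ("bfloat16", 2)]:
    print(f"{dtype_name:>8}: ~{weight_memory_gb(MISTRAL_7B_PARAMS, nbytes):.1f} GB")
```

Halving the bytes per parameter is why the pipeline loads the model with torch_dtype=torch.bfloat16: it roughly halves the footprint compared with float32.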

Future Enhancements:

Multi-Query Retrieval: Implementing query expansion (generating several rephrasings of each question) to improve the diversity of retrieved documents.

Reranking: Adding a Cross-Encoder reranker so that the most relevant chunks are prioritized for the LLM.

Integration: Deploying the RAG chain behind a Streamlit or Gradio web interface for user interaction.
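For multi-query retrieval, each rephrased query returns its own ranked list of chunks, and those lists have to be merged. One common merging rule, swapped in here purely as an illustration (LangChain's MultiQueryRetriever instead takes a plain unique union), is reciprocal rank fusion: each document scores the sum of 1 / (k + rank) over the lists it appears in, with k = 60 as the customary constant. The function name and the chunk IDs below are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); results are sorted by total score, highest first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical retrieval results for three rephrasings of one user question.
results_per_query = [
    ["chunk_7", "chunk_2", "chunk_9"],
    ["chunk_2", "chunk_7", "chunk_4"],
    ["chunk_2", "chunk_5", "chunk_7"],
]
fused = reciprocal_rank_fusion(results_per_query)
print([doc_id for doc_id, _ in fused])
```

Documents that appear near the top of several lists (chunk_2 and chunk_7 here) rise above documents that appear only once, which is the diversity-plus-consensus effect multi-query retrieval aims for.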