# RAG Pipeline for Q&A over a Text File

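This notebook implements a Retrieval-Augmented Generation (RAG) pipeline that answers questions about a single text file. It uses OpenAI for generation when an API key is available and falls back to a local Transformers model otherwise. The steps are: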

1.  **Install** required libraries.
2.  **Load** an `OPENAI_API_KEY` (if available).
3.  **Load** a source `.txt` file.
4.  **Chunk, Embed, & Store** the text in a Chroma vector database.
5.  **Build** a LangChain RAG chain to answer questions.
6.  **Run** an interactive chat loop.
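
As a rough orientation, the retrieve-then-generate flow in the steps above can be sketched without any of the libraries. This is a toy illustration only: word overlap stands in for embedding similarity, the three sample chunks are made up, and the names `tokens`, `retrieve`, and `build_prompt` are hypothetical helpers, not part of LangChain or Chroma.

```python
import re

def tokens(s: str) -> set[str]:
    """Lower-cased alphabetic words of a string."""
    return set(re.findall(r"[a-z]+", s.lower()))

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question (stand-in for vector search)."""
    q = tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into a single prompt for the generator."""
    joined = "\n\n".join(context)
    return f"Answer only from the context.\n\nContext:\n{joined}\n\nQuestion: {question}"

# Made-up sample chunks, for illustration only
chunks = [
    "The Netherlands is known for canals and tulip fields.",
    "Chroma stores embedding vectors on disk.",
    "Cycling is popular in Dutch cities.",
]
question = "What is the Netherlands known for?"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In the notebook itself, the ranking is done by Chroma over sentence-transformer embeddings and the prompt is sent to the LLM; the sketch only shows the order in which the pieces are wired together.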


In [1]:
## 1) Install dependencies
import sys
print(sys.version)

# Core libs (python-dotenv is needed for the load_dotenv call in step 2)
!pip -q install langchain langchain-community chromadb sentence-transformers python-dotenv

# For optional local LLM fallback
!pip -q install transformers accelerate

# For OpenAI
!pip -q install langchain-openai


3.10.19 (main, Oct 21 2025, 16:37:10) [Clang 20.1.8 ]


In [2]:
## 2) Load API Key
from pathlib import Path
from dotenv import load_dotenv

# *** UPDATE THIS PATH to your .env file ***
env_path = Path("/Volumes/Untitled/Lessons_By_Week/Project_Rag/Final_Codes/ATT81022.env")
load_dotenv(dotenv_path=env_path)


True
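
The hard-coded absolute path above has to be edited on every machine. One possible alternative, sketched here with the standard library only, is to search the current folder and its parents for a `.env` file. The helper name `find_env_file` is hypothetical; python-dotenv also ships a `find_dotenv` function that serves a similar purpose.

```python
from pathlib import Path
from typing import Optional

def find_env_file(start: Path, name: str = ".env") -> Optional[Path]:
    """Return the first file called `name` in `start` or any parent folder, else None."""
    start = start.resolve()
    for folder in [start, *start.parents]:
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None

# Search upward from the notebook's working directory
env_path = find_env_file(Path.cwd())
print("Found .env at:", env_path)
```

If a path is found, it can be passed to `load_dotenv(dotenv_path=env_path)` exactly as in the cell above.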

In [3]:
## 3) Set Constants & Check Key
import os
from pathlib import Path

# Path where Chroma (vector DB) will be persisted
CHROMA_DIR = "./chroma"
COLLECTION = "uploaded_text"

# --- Optional: OpenAI ---
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "").strip()
USE_OPENAI = bool(OPENAI_API_KEY)

if USE_OPENAI:
    print("✅ Using OpenAI for generation.")
else:
    print("ℹ️ OPENAI_API_KEY not set; will use local Transformers fallback.")

Path(CHROMA_DIR).mkdir(parents=True, exist_ok=True)
print("CHROMA_DIR =", Path(CHROMA_DIR).resolve())
print("COLLECTION  =", COLLECTION)


✅ Using OpenAI for generation.
CHROMA_DIR = /Volumes/Untitled/Lessons_By_Week/Project_Rag/Final_Codes/chroma
COLLECTION  = uploaded_text


In [4]:
## 4) Load Text Document

# *** UPDATE THIS PATH to your .txt file ***
uploaded_path = "/Volumes/Untitled/Youtube_QA_Rag_System/Working_Pipelines/text/RAG_TEXT.txt"
from pathlib import Path

p = Path(uploaded_path).expanduser()
assert p.exists(), f"File not found: {p}"

text = p.read_text(encoding="utf-8", errors="ignore")
print(f"Loaded {len(text):,} characters from:", p.resolve())


Loaded 10,135 characters from: /Volumes/Untitled/Youtube_QA_Rag_System/Working_Pipelines/text/RAG_TEXT.txt


In [5]:
## 5) Define LLM (Generator)

generator = None

if USE_OPENAI:
    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    generator = llm
    print("Using ChatOpenAI: gpt-4o-mini")
else:
    # Local Transformers text2text generation via HF pipeline
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
    print("Loading local model: google/flan-t5-base...")
    model_id = "google/flan-t5-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    hf_pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

    class HFText2TextLLM:
        def __call__(self, prompt_text: str) -> str:
            out = hf_pipe(prompt_text, max_new_tokens=256, truncation=True)
            return out[0]["generated_text"]
    
    generator = HFText2TextLLM()
    print("Using local Transformers: flan-t5-base")


Using ChatOpenAI: gpt-4o-mini


In [6]:
## 6) Chunk, Embed, and Store in Vector DB

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings

# 1) Chunk the text
splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " ", ""],
)
docs = [Document(page_content=c, metadata={"source": str(p.name)}) 
        for c in splitter.split_text(text)]
print(f"Chunks created: {len(docs)}")

# 2) Embedding function
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    encode_kwargs={"normalize_embeddings": True},
)

# 3) Create (or re-open) the Chroma collection
vs = Chroma(
    collection_name=COLLECTION,
    persist_directory=CHROMA_DIR,
    embedding_function=embeddings,
)

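# 4) Add docs with stable IDs, so re-running this cell overwrites the same
#    chunks in the persisted collection instead of appending duplicates
chunk_ids = [f"{p.name}-{i}" for i in range(len(docs))]
vs.add_documents(docs, ids=chunk_ids)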
print("✅ Stored in Chroma at:", Path(CHROMA_DIR).resolve())

# 5) Create the retriever
retriever = vs.as_retriever(search_kwargs={"k": 5})
print("\n✅ Created 'retriever' variable.")


Chunks created: 55


✅ Stored in Chroma at: /Volumes/Untitled/Lessons_By_Week/Project_Rag/Final_Codes/chroma

✅ Created 'retriever' variable.
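
To build intuition for `chunk_size=300` and `chunk_overlap=50`, here is a simplified fixed-width sliding window. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break at the listed separators, so its chunks are usually shorter than 300 characters. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 300, chunk_overlap: int = 50) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "".join(str(i % 10) for i in range(1000))  # 1,000 characters
chunks = sliding_chunks(sample)
print(len(chunks))
# The last 50 characters of one chunk are the first 50 of the next
print(chunks[0][-50:] == chunks[1][:50])
```

Applied to a 10,135-character file, this fixed-width window would produce 41 chunks. The notebook reports 55, which is consistent with the recursive splitter cutting earlier at paragraph and sentence boundaries.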


In [7]:
## 6.5) Upgrade to Multi-Query Retriever
from langchain.retrievers.multi_query import MultiQueryRetriever
import logging

# Optional: Turn on logging so you can see the different questions the AI generates
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

if USE_OPENAI:
    # This uses the LLM (gpt-4o-mini) to generate variations of the question
    # and retrieve documents for all variations.
    retriever = MultiQueryRetriever.from_llm(
        retriever=vs.as_retriever(search_kwargs={"k": 5}),
        llm=llm
    )
    print("✅ Multi-Query Retriever (OpenAI) is active.")
    print("   (The system will now generate variations of your question for better search results.)")

else:
    # Fallback: Multi-query requires a strong instruction-following LLM.
    # Smaller local models (like flan-t5) often fail the strict formatting requirements.
    retriever = vs.as_retriever(search_kwargs={"k": 5})
    print("ℹ️ Using standard retriever (Local Model).")

✅ Multi-Query Retriever (OpenAI) is active.
   (The system will now generate variations of your question for better search results.)
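
The idea behind multi-query retrieval can be sketched in plain Python: run the search once per question variant, then return the de-duplicated union of the hits. The sketch below uses a made-up lookup table in place of the vector store, and `multi_query_retrieve`, `fake_index`, and `fake_search` are hypothetical names.

```python
from typing import Callable

def multi_query_retrieve(queries: list[str], search: Callable[[str], list[str]]) -> list[str]:
    """Run every query variant and return the unique hits in first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for q in queries:
        for doc in search(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# Made-up stand-in for a vector search: maps a query to the chunks it would return
fake_index = {
    "Is the Netherlands a good place to live or visit?": ["chunk-canals", "chunk-cycling"],
    "What are the pros and cons of living in the Netherlands?": ["chunk-cycling", "chunk-tulips"],
}

def fake_search(query: str) -> list[str]:
    return fake_index.get(query, [])

print(multi_query_retrieve(list(fake_index), fake_search))
```

With `k=5` per variant, the merged result can contain more than five chunks, so the context passed to the LLM may be larger than with the standard retriever.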


In [13]:
## 7) Build Conversational RAG Chain (With Memory)

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# --- 1. Contextualize Question ---
# This prompt helps the LLM understand follow-up questions (e.g., "What about him?")
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# --- 2. Answer Question ---
# This is the prompt that actually answers the user
qa_system_prompt = (
    "You are a helpful assistant. Answer the question only from the provided context. "
    "If the answer isn't present, say something like 'I don't see that in the file', but vary the wording so it isn't too obvious that you are an AI. "
    "You may engage in friendly conversation, but never fabricate facts outside the context. "
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# --- 3. Build the Chain ---
if USE_OPENAI:
    # Create a retriever that can handle history
    history_aware_retriever = create_history_aware_retriever(
        llm, retriever, contextualize_q_prompt
    )
    
    # Create the document combining chain
    question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
    
    # Combine them
    rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

    # --- 4. Memory Management ---
    store = {}

    def get_session_history(session_id: str) -> BaseChatMessageHistory:
        if session_id not in store:
            store[session_id] = ChatMessageHistory()
        return store[session_id]

    conversational_chain = RunnableWithMessageHistory(
        rag_chain,
        get_session_history,
        input_messages_key="input",
        history_messages_key="chat_history",
        output_messages_key="answer",
    )
    print("✅ Conversational RAG (OpenAI + Memory) is ready.")

else:
    # Local Fallback (Simplified memory for local testing)
    # Note: Local models often struggle with the complex history-rewriting step
    print("ℹ️ Memory is disabled for local fallback to ensure stability.")
    conversational_chain = None 
    # We will handle the fallback logic in the ask function

✅ Conversational RAG (OpenAI + Memory) is ready.
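
The memory mechanism is a per-session message store: `RunnableWithMessageHistory` looks up the history for the given `session_id`, injects it under `chat_history`, and appends the new question and answer after each call. A minimal standard-library analogue of that bookkeeping is sketched below. The names `history_store`, `record_turn`, and `history_for` are hypothetical and only mirror the pattern, not the LangChain classes.

```python
from collections import defaultdict

# session_id -> list of (role, text) messages, kept in memory only
history_store: dict[str, list[tuple[str, str]]] = defaultdict(list)

def record_turn(session_id: str, question: str, answer: str) -> None:
    """Append one question/answer exchange to the session's history."""
    history_store[session_id].append(("human", question))
    history_store[session_id].append(("ai", answer))

def history_for(session_id: str) -> list[tuple[str, str]]:
    """Return a copy of the messages recorded for this session."""
    return list(history_store[session_id])

record_turn("user_session_1", "Is the Netherlands good?", "I don't see that in the file.")
print(history_for("user_session_1"))   # two messages
print(history_for("another_session"))  # empty: sessions are isolated
```

Because the store is an ordinary dictionary, the history disappears when the kernel restarts, and a different `session_id` starts with an empty conversation.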


## 8) Ask Questions

Run the cells below to interact with your RAG pipeline.


In [14]:
## 8 & 9) Chat with Memory

# Fixed session ID for this run (change it to start a fresh conversation)
session_id = "user_session_1"

def ask_with_memory(question: str):
    if not question.strip():
        return "Please enter a non-empty question."
    
    if USE_OPENAI:
        # Invoke with session_id so it remembers previous turns
        response = conversational_chain.invoke(
            {"input": question},
            config={"configurable": {"session_id": session_id}}
        )
        return response["answer"]
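    else:
        # Fallback for the local model (stateless): no chain is built in this branch,
        # so retrieve the context and prompt the generator directly.
        context = "\n\n".join(d.page_content for d in retriever.invoke(question))
        prompt_text = (
            "Answer the question only from the provided context.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        return generator(prompt_text)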

print(f"🧠 Chatbot with Memory is ready (Session ID: {session_id}).")
print("Try asking a question, then a follow-up like 'Tell me more about that'.")

try:
    while True:
        q = input("\nAsk a question (or press Enter to exit): ").strip()
        if not q:
            break
        print("\n--- Answer ---")
        print(ask_with_memory(q))
except KeyboardInterrupt:
    print("\nChat session ended.")

🧠 Chatbot with Memory is ready (Session ID: user_session_1).
Try asking a question, then a follow-up like 'Tell me more about that'.

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['Is the Netherlands a good place to live or visit?  ', 'What are the pros and cons of living in the Netherlands?  ', 'How does the quality of life in the Netherlands compare to other countries?']


I don't see that in the file, but it does mention that the Netherlands delights with its canals, tulip fields, and cycling culture, which suggests it has many appealing features. Would you like to know more about a specific aspect of the Netherlands?

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['What is the meaning or significance of the name Kayode?  ', 'Can you explain the cultural or historical background of the name Kayode?  ', 'What are the origins and interpretations associated with the name Kayode?']


Hi Kayode! It's nice to meet you. How can I assist you today?

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. Njẹ o le ṣe alaye ni ede Yoruba?', '2. Ṣe o le kọ ni Yoruba?', '3. Njẹ o le ba mi sọrọ ni Yoruba?']


I don't see that in the file, but I can help with other questions or topics if you'd like!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['What is the geographical position of the Netherlands?  ', 'Can you tell me the location of the Netherlands on a map?  ', 'In which part of Europe can I find the Netherlands?']


I don't see that in the file, but I can tell you that NL typically refers to the Netherlands. If you have more questions about it, feel free to ask!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['What was the question you asked before this one?  ', 'Can you remind me of the question you posed earlier?  ', 'What did you inquire about in your last message?']


Your previous question was about whether the Netherlands is good.

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ["Can you recall previous conversations or interactions we've had?  ", 'Are you capable of retaining information from our past discussions?  ', 'Do you have the functionality to remember our earlier exchanges?']


I don't have memory in the way humans do, but I can keep track of our conversation while we're chatting. Once the conversation ends, I won't remember anything from it. How can I assist you further?
