# RAG Pipeline for Q&A over a Text File

This notebook implements a clean Retrieval-Augmented Generation (RAG) pipeline.

1.  **Install** required libraries.
2.  **Load** an `OPENAI_API_KEY` (if available).
3.  **Load** a source `.txt` file.
4.  **Chunk, Embed, & Store** the text in a Chroma vector database.
5.  **Build** a LangChain RAG chain to answer questions.
6.  **Run** an interactive chat loop.


In [1]:
## 1) Install dependencies
import sys
print(sys.version)

# Core libs
!pip -q install langchain langchain-community langchain-text-splitters chromadb sentence-transformers python-dotenv

# For optional local LLM fallback
!pip -q install transformers accelerate

# For OpenAI
!pip -q install langchain-openai


3.10.19 (main, Oct 21 2025, 16:37:10) [Clang 20.1.8 ]


In [2]:
## 2) Load API Key
from pathlib import Path
from dotenv import load_dotenv

# *** UPDATE THIS PATH to your .env file ***
env_path = Path("/Volumes/Untitled/Lessons_By_Week/Project_Rag/Final_Codes/ATT81022.env")
load_dotenv(dotenv_path=env_path)


True

In [3]:
## 3) Set Constants & Check Key
import os
from pathlib import Path

# Path where Chroma (vector DB) will be persisted
CHROMA_DIR = "./chroma"
COLLECTION = "uploaded_text"

# --- Optional: OpenAI ---
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "").strip()
USE_OPENAI = bool(OPENAI_API_KEY)

if USE_OPENAI:
    print("✅ Using OpenAI for generation.")
else:
    print("ℹ️ OPENAI_API_KEY not set; will use local Transformers fallback.")

Path(CHROMA_DIR).mkdir(parents=True, exist_ok=True)
print("CHROMA_DIR =", Path(CHROMA_DIR).resolve())
print("COLLECTION  =", COLLECTION)


✅ Using OpenAI for generation.
CHROMA_DIR = /Volumes/Untitled/Lessons_By_Week/Project_Rag/Final_Codes/chroma
COLLECTION  = uploaded_text


In [4]:
## 4) Load Text Document

# *** UPDATE THIS PATH to your .txt file ***
uploaded_path = "/Volumes/Untitled/Youtube_QA_Rag_System/Working_Pipelines/text/RAG_TEXT.txt"
from pathlib import Path

p = Path(uploaded_path).expanduser()
assert p.exists(), f"File not found: {p}"

text = p.read_text(encoding="utf-8", errors="ignore")
print(f"Loaded {len(text):,} characters from:", p.resolve())


Loaded 27,327 characters from: /Volumes/Untitled/Youtube_QA_Rag_System/Working_Pipelines/text/RAG_TEXT.txt


In [5]:
## 5) Define LLM (Generator)

generator = None

if USE_OPENAI:
    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    generator = llm
    print("Using ChatOpenAI: gpt-4o-mini")
else:
    # Local Transformers text2text generation via HF pipeline
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
    print("Loading local model: google/flan-t5-base...")
    model_id = "google/flan-t5-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    hf_pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

    class HFText2TextLLM:
        def __call__(self, prompt_text: str) -> str:
            out = hf_pipe(prompt_text, max_new_tokens=256, truncation=True)
            return out[0]["generated_text"]
    
    generator = HFText2TextLLM()
    print("Using local Transformers: flan-t5-base")


  from .autonotebook import tqdm as notebook_tqdm


Using ChatOpenAI: gpt-4o-mini
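
The cell above leaves `generator` as one of two different kinds of object: a chat model that exposes `.invoke()` and returns a message with a `.content` attribute, or a plain callable that returns a string. The sketch below shows one way to hide that difference behind a single helper. `generate_text`, `FakeChatModel`, and `FakeLocalModel` are hypothetical names used only for illustration; the fakes stand in for the real models so the snippet runs without any API key.

```python
class FakeChatModel:
    """Stand-in for a chat model: .invoke() returns an object with .content."""

    class _Message:
        def __init__(self, content: str) -> None:
            self.content = content

    def invoke(self, prompt: str) -> "FakeChatModel._Message":
        return self._Message(f"chat: {prompt}")


class FakeLocalModel:
    """Stand-in for the local wrapper: calling it returns a plain string."""

    def __call__(self, prompt_text: str) -> str:
        return f"local: {prompt_text}"


def generate_text(generator, prompt: str) -> str:
    """Return generated text from either kind of generator."""
    if hasattr(generator, "invoke"):
        result = generator.invoke(prompt)
        # Chat models return a message object; fall back to the raw result otherwise.
        return getattr(result, "content", result)
    return generator(prompt)


print(generate_text(FakeChatModel(), "hi"))
print(generate_text(FakeLocalModel(), "hi"))
```

With a helper like this, later cells would not need to branch on `USE_OPENAI` just to extract the answer text.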


In [None]:
## 6) Chunk, Embed, and Store in Vector DB

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings
# TODO: wrap the chunking step below in a function so the code is easier to read.
# TODO: add text-to-speech to the pipeline.
# 1) Chunk the text
splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " ", ""],
)
docs = [Document(page_content=c, metadata={"source": str(p.name)}) 
        for c in splitter.split_text(text)]
print(f"Chunks created: {len(docs)}")

# 2) Embedding function
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    encode_kwargs={"normalize_embeddings": True},
)

# 3) Create (or re-open) the Chroma collection
vs = Chroma(
    collection_name=COLLECTION,
    persist_directory=CHROMA_DIR,
    embedding_function=embeddings,
)

# 4) Add docs
vs.add_documents(docs)
print("✅ Stored in Chroma at:", Path(CHROMA_DIR).resolve())

# 5) Create the retriever
retriever = vs.as_retriever(search_kwargs={"k": 5})
print("\n✅ Created 'retriever' variable.")


Chunks created: 116
✅ Stored in Chroma at: /Volumes/Untitled/Lessons_By_Week/Project_Rag/Final_Codes/chroma

✅ Created 'retriever' variable.
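
To make the `chunk_size=300` and `chunk_overlap=50` settings concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification for illustration only: `RecursiveCharacterTextSplitter` also prefers to break at the listed separators, so its chunk boundaries will differ. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 300, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 700-character sample (not the notebook's source file).
sample = "".join(str(i % 10) for i in range(700))
chunks = sliding_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the tail of one chunk is repeated at the head of the next, which helps a sentence that straddles a boundary remain retrievable from at least one chunk.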


In [None]:
## 6.5.1 Multi-Query Retriever

from langchain.retrievers.multi_query import MultiQueryRetriever
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

# Multi-query retrieval needs the OpenAI `llm` from section 5 to generate query variations.
retriever = MultiQueryRetriever.from_llm(
    retriever=vs.as_retriever(search_kwargs={"k": 5}),
    llm=llm
)
print("✅ Multi-Query Retriever active.")

✅ Multi-Query Retriever active.
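
Conceptually, a multi-query retriever runs several rewordings of the question and returns the de-duplicated union of the hits. The sketch below illustrates only that merge step under simplifying assumptions: the query variants are hard-coded rather than generated by an LLM, and a toy keyword search (`keyword_search`, hypothetical) stands in for the Chroma vector store. `multi_query_retrieve` and `CORPUS` are likewise illustrative names.

```python
from typing import Callable


def multi_query_retrieve(
    queries: list[str],
    search: Callable[[str], list[str]],
) -> list[str]:
    """Run every query variant and return the unique hits in first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for query in queries:
        for doc in search(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Toy corpus standing in for the stored chunks (assumption, not the real data).
CORPUS = [
    "Footage Ingest scans directories for errors.",
    "ConformPulls automates requests for original camera files.",
    "VFX Pulls manages color and framing.",
]


def keyword_search(query: str) -> list[str]:
    """Return documents sharing at least one whitespace-separated word with the query."""
    words = set(query.lower().split())
    return [doc for doc in CORPUS if words & set(doc.lower().split())]


variants = ["footage ingest", "scans errors", "camera files"]
results = multi_query_retrieve(variants, keyword_search)
print(results)
```

Two of the three variants hit the same document, yet it appears once in the merged list; this is why multi-query retrieval can widen recall without flooding the prompt with repeats.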


In [None]:
## 6.5) (Not used in my code) Deprecated - Upgrade to Multi-Query Retriever
from langchain.retrievers.multi_query import MultiQueryRetriever
import logging

# Optional: Turn on logging so you can see the different questions the AI generates
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

if USE_OPENAI:
    # This uses the LLM (gpt-4o-mini) to generate variations of the question
    # and retrieve documents for all variations.
    retriever = MultiQueryRetriever.from_llm(
        retriever=vs.as_retriever(search_kwargs={"k": 9}),
        llm=llm
    )
    print("✅ Multi-Query Retriever (OpenAI) is active.")
    print("   (The system will now generate variations of the user's questions for better search results.)")

else:
    # Fallback: Multi-query requires a strong instruction-following LLM.
    # Smaller local models (like flan-t5) often fail the strict formatting requirements.
    retriever = vs.as_retriever(search_kwargs={"k": 5})
    print("ℹ️ Using standard retriever (Local Model).")

✅ Multi-Query Retriever (OpenAI) is active.
   (The system will now generate variations of the user's questions for better search results.)


In [35]:
## 7) Build Conversational RAG Chain (With Memory)

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory



# --- 1. Contextualize Question ---
# This prompt helps the LLM understand follow-up questions (e.g., "What about him?")
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. If there is no chat history, "
    "politely tell the user that none exists."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# --- 2. Answer Question ---
# This is the prompt that actually answers the user
qa_system_prompt = (
    "Your name is Chiity. "
    "You are here to help discuss the tools: give a summary of the tools in the text. **Be concise.** "
    "Do not call it a text; call it a knowledge base. "
    "Open by briefly summarizing what you are here for. "
    "Do not tell anyone you were trained by OpenAI. "
    "You are a helpful assistant. Answer the question only from the provided context. "
    "You may engage in friendly conversation, but never fabricate facts outside the context. "
    "Analyze the text and always give good summaries and any available action points. "
    "Limit token use and keep answers very concise."
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# --- 3. Build the Chain ---
if USE_OPENAI:
    # Create a retriever that can handle history
    history_aware_retriever = create_history_aware_retriever(
        llm, retriever, contextualize_q_prompt
    )
    
    # Create the document combining chain
    question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
    
    # Combine them
    rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

    # --- 4. Memory Management ---
    store = {}

    def get_session_history(session_id: str) -> BaseChatMessageHistory:
        if session_id not in store:
            store[session_id] = ChatMessageHistory()
        return store[session_id]

    conversational_chain = RunnableWithMessageHistory(
        rag_chain,
        get_session_history,
        input_messages_key="input",
        history_messages_key="chat_history",
        output_messages_key="answer",
    )
    print("✅ Conversational RAG (OpenAI + Memory) is ready.")

else:
    # Local Fallback (Simplified memory for local testing)
    # Note: Local models often struggle with the complex history-rewriting step
    print("ℹ️ Memory is disabled for local fallback to ensure stability.")
    conversational_chain = None
    # We will handle the fallback logic in the ask function

✅ Conversational RAG (OpenAI + Memory) is ready.
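
The `store` dictionary and `get_session_history` function above keep one message history per session ID. The following dependency-free sketch shows the same idea in miniature; `SessionMemory` is a hypothetical class, the sample turns are made up, and the real chain stores LangChain message objects rather than `(role, text)` tuples.

```python
from collections import defaultdict


class SessionMemory:
    """Keeps an ordered list of (role, text) turns for each session ID."""

    def __init__(self) -> None:
        self._store: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def history(self, session_id: str) -> list[tuple[str, str]]:
        """Return the turns for a session, creating an empty history if needed."""
        return self._store[session_id]

    def record(self, session_id: str, question: str, answer: str) -> None:
        """Append one question/answer exchange to the session's history."""
        self._store[session_id].append(("human", question))
        self._store[session_id].append(("ai", answer))


memory = SessionMemory()
memory.record("user_session_1", "My name is Lukman.", "Nice to meet you, Lukman!")
memory.record("user_session_2", "hello", "Hi!")

# Each session sees only its own turns.
print(memory.history("user_session_1"))
print(memory.history("user_session_2"))
```

Because histories are keyed by session ID, reusing the same ID across runs of the chat loop continues the conversation, while a new ID starts from an empty history.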


In [20]:
## 7.5) Simple Hallucination Check (Cosine Similarity)

# 0.0 = no match, 1.0 = perfect match.
# 0.2 is a permissive threshold; a stricter value such as 0.7 rejects more borderline matches.
SIMILARITY_THRESHOLD = 0.2

def get_relevant_context_only(question: str):
    """
    Retrieves docs but stops if they aren't similar enough to the question.
    Returns: (is_safe: bool, context_text: str)
    """
    # Get docs with their scores
    results = vs.similarity_search_with_relevance_scores(question, k=3)
    
    if not results:
        return False, "No documents found."
        
    # Check the top score
    top_doc, top_score = results[0]
    
    print(f"🔍 Interaction Check: Top Score = {top_score:.4f}")
    
    if top_score < SIMILARITY_THRESHOLD:
        print("⚠️ Score too low - Preventing Hallucination.")
        return False, None
        
    # If safe, format the docs into a string
    context_text = "\n\n".join([f"{doc.page_content}" for doc, _ in results])
    return True, context_text

# --- Updated Ask Function ---
def ask_safe(question: str):
    is_safe, context = get_relevant_context_only(question)
    
    if not is_safe:
        return "I'm sorry, I don't have enough information in the uploaded text to answer that confidently."
    
    # If safe, proceed with the standard LLM generation
    # We inject the verified context manually
    prompt_text = (
        f"System: You are a helpful assistant. Use the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Answer:"
    )
    
    if USE_OPENAI:
        from langchain_core.messages import HumanMessage
        return llm.invoke([HumanMessage(content=prompt_text)]).content
    else:
        return generator(prompt_text)
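
The gate in `get_relevant_context_only` compares the best retrieval score with a threshold. As a rough illustration of that idea, the sketch below computes cosine similarity directly on small hand-written vectors and applies the same "reject if below threshold" rule. `cosine_similarity` and `passes_gate` are hypothetical helpers, and the vectors are toy values; the scores returned by `similarity_search_with_relevance_scores` are normalized by the vector store and may not equal raw cosine similarity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def passes_gate(query_vec: list[float], doc_vecs: list[list[float]], threshold: float = 0.2) -> bool:
    """Return True only if the best-matching document scores at or above the threshold."""
    if not doc_vecs:
        return False
    top_score = max(cosine_similarity(query_vec, d) for d in doc_vecs)
    return top_score >= threshold


query = [1.0, 0.0]
print(passes_gate(query, [[1.0, 1.0]]))   # partially aligned document
print(passes_gate(query, [[0.0, 1.0]]))   # orthogonal document
```

A check like this only guards against answering from unrelated context; it does not verify that the generated answer is faithful to the context that was retrieved.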

## 8) Ask Questions

Run the cells below to interact with your RAG pipeline.


In [36]:
## 8 & 9) Chat with Memory

# Fixed session ID for this run (change it to start a fresh conversation)
session_id = "user_session_1"

def ask_with_memory(question: str):
    if not question.strip():
        return "Please enter a non-empty question."
    
    if USE_OPENAI:
        # Invoke with session_id so it remembers previous turns
        response = conversational_chain.invoke(
            {"input": question},
            config={"configurable": {"session_id": session_id}}
        )
        return response["answer"]
    else:
        # Fallback for the local model (stateless): conversational_chain is None here,
        # so reuse ask_safe() from section 7.5 instead of a chain.
        return ask_safe(question)

print(f"🧠 Chatbot with Memory is ready (Session ID: {session_id}).")
print("Try asking a question, then a follow-up like 'Tell me more about that'.")

try:
    while True:
        q = input("\nAsk a question (or press Enter to exit): ").strip()
        if not q:
            break
        print("\n--- Answer ---")
        print(ask_with_memory(q))
except KeyboardInterrupt:
    print("\nChat session ended.")

🧠 Chatbot with Memory is ready (Session ID: user_session_1).
Try asking a question, then a follow-up like 'Tell me more about that'.

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. Hi there, how can I assist you today?  ', '2. What information or help are you looking for?  ', '3. Can you tell me what you need assistance with?']


Hello! How can I assist you today? If you have questions about the tools in the knowledge base, feel free to ask!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ["1. What assistance can I provide you with today, even though I don't have a personal name?  ", '2. How may I assist you today, considering I go by the name Assistant?  ', '3. In what ways can I help you today, despite not having a personal name?']


My name is Chiity. I'm here to help you with information about the tools in the knowledge base. How can I assist you?

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ["1. Can you clarify what specific information or topic you need assistance with, since I don't have access to a chat history or summary?", "2. What particular subject or question do you have in mind that I can help you with, given that I don't have access to previous conversations?", "3. Since I can't access past chat histories or summaries, what specific details or topics are you interested in discussing?"]


The knowledge base provides information on various tools within the Content Hub Media Production Suite, specifically focusing on VFX Pulls and Footage Ingest. Key tools include:

1. **VFX Pulls**: Facilitates color and framing management for visual effects, allowing post-production teams to manage media efficiently.
2. **ConformPolls**: Automates the request for original camera files, reducing delays.
3. **Footage Ingest**: Scans directories for errors and organizes media formats before uploads, enabling tracking of progress from anywhere.

These tools enhance collaboration and streamline the media management process for creative teams. If you have any specific questions about these tools, let me know!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ["1. Can you provide your name or any other details you'd like to share? I'm here to help with any questions you may have!", "2. If you're comfortable, could you tell me your name or ask any other questions you might have?", "3. Would you like to share your name or discuss something else? I'm available to assist with any inquiries!"]


I'm sorry, but I don't have that information. How can I assist you further?

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. Hello Lukman! What can I do for you today?  ', '2. Hi Lukman! How may I help you at this moment?  ', '3. Greetings Lukman! What assistance do you need from me today?']


Nice to meet you, Lukman! How can I assist you today?

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. How does Conform Pulls automate the request process for original camera files in post-production workflows?', '2. What features does Conform Pulls offer to enhance efficiency in obtaining camera files for visual effects and editing teams?', '3. Can you explain the benefits of using Conform Pulls for streamlining file requests in post-production projects?']


ConformPulls is a tool within the Content Hub Media Production Suite designed to streamline the conform process for media production teams. Key features include:

- **Automated Media Management**: It allows post teams to request original camera files effortlessly, reducing delays caused by human availability.
- **Framing Management**: It helps customize framing for renders and delivers updated framing instructions for downstream use.
- **Seamless Workflow**: Once an EDL is submitted, all media is automatically located, trimmed, and consolidated, ready for finishing.
- **Look File Management**: Provides directories with look files for each plate delivered, ensuring accurate color management.

Overall, ConformPulls enhances efficiency and reduces errors in the media production process. If you have more questions, feel free to ask!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main functionalities offered by ConformPulls in media management?', 'Can you provide a summary of the key features and benefits of using ConformPulls for media requests and workflow?', 'How does ConformPulls enhance the efficiency of media handling and color management in production?']


Sure! Here's a concise list of ConformPulls features:

1. Automated media requests for original camera files.
2. Reduces delays in the conform process.
3. Customizable framing management for renders.
4. Automatic media location, trimming, and consolidation after EDL submission.
5. Provides look files for accurate color management.

Let me know if you need more information!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. How many questions have I asked up to this point, and can I ask more if needed?  ', "2. I've already submitted six questions; what should I do if I have additional inquiries?  ", "3. Can you tell me how many questions I've asked so far and if I can continue asking more?"]


You have asked a total of five questions so far. If you have more questions or need assistance, feel free to ask!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. What additional support or information can I provide you with today, Lukman?  ', '2. How else can I help you today, Lukman?  ', '3. Is there anything specific you would like assistance with today, Lukman?']


Your name is Lukman. How can I assist you further?

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ["1. I don't have access to previous chat logs, but I'm here to assist you with any inquiries or subjects you want to explore!", "2. Unfortunately, I can't view past conversations, but feel free to ask me anything or bring up topics you're interested in!", "3. I can't access the chat history, but I'm ready to help you with any questions or discussions you have in mind!"]


Your last three messages were:

1. "give me a concise list."
2. "how many questions did i ask."
3. "what is my name."

If you have more questions or need assistance, just let me know!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['What are the main functionalities of the Footage Ingest tool in the Content Hub Media Production Suite?', 'Can you explain how Footage Ingest improves the media file management process for production teams?', 'What features does the Footage Ingest tool offer to enhance the efficiency of uploading and organizing media files?']


Footage Ingest is a tool within the Content Hub Media Production Suite designed to streamline the integration of media into the Netflix ecosystem. Key features include:

1. **Error Scanning**: Scans directories for errors and identifies media formats before uploads.
2. **Automatic Organization**: Organizes content automatically based on identified media formats.
3. **Progress Tracking**: Allows users to track upload progress from anywhere using Content Hub.
4. **Customizable Workflows**: Tailors framing and color workflows to specific show needs.
5. **Efficient Media Management**: Facilitates seamless uploads and prepares media for post-production.

This tool enhances the efficiency of managing media from production to post-production. Let me know if you need more details!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. How does Footage Ingest manage the process of uploading and preparing media for system integration, including error scanning and content organization?', '2. Can you explain the features of Footage Ingest related to media uploads, such as error detection, content organization, and upload progress tracking?', '3. What are the key functionalities of Footage Ingest when it comes to handling media uploads and ensuring they are ready for integration into the system?']


Yes, the Footage Ingest tool handles uploads by allowing users to upload original camera files and EDLs (Edit Decision Lists) directly into the system. It automates the process of locating, trimming, and consolidating media for post-production. If you have more questions, feel free to ask!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['What are the key features and functionalities of Footage Ingest?', 'Can you provide a brief overview of how Footage Ingest manages media uploads and organization?', 'What processes does Footage Ingest use to ensure efficient media handling and error checking during uploads?']


Apologies for that! Here's a more concise summary of Footage Ingest:

- **Handles uploads** of original camera files and EDLs.
- **Scans for errors** and organizes media formats.
- **Tracks progress** from anywhere via Content Hub.
- **Prepares media** for post-production efficiently.

Let me know if you need anything else!

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. What are the key components and functions of Edit Decision Lists (EDLs) in video editing projects?', '2. How do Edit Decision Lists (EDLs) assist in the post-production process of video editing?', '3. Can you explain the role of EDLs in organizing and managing video clips during the editing workflow?']


EDLs, or Edit Decision Lists, are files that contain a list of edits made to a video project, including information about which clips to use, their order, and timing. They facilitate the editing and conforming process in post-production.

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['What is the purpose of an Edit Decision List (EDL) in video editing, and what information does it typically contain?', 'Can you explain how an EDL is utilized in the post-production process of video projects?', 'What details are included in an Edit Decision List, and how does it assist editors in assembling a video?']


An EDL, or Edit Decision List, is a file that outlines the sequence of edits in a video project. It specifies which clips to use, their order, and timing, serving as a guide for editors during the post-production process.

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. What additional support or information can I provide for you, Lukman?  ', '2. How else can I help you today, Lukman?  ', '3. Is there anything specific you would like assistance with, Lukman?']


Your name is Lukman. How can I assist you further?

--- Answer ---


INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide more information or assistance regarding your name, Lukman?  ', '2. What additional questions do you have, Lukman, that I can help you with?  ', '3. If you need support or have inquiries, Lukman, what would you like to know?']


Your name is Lukman. If you have more questions or need assistance, feel free to ask!


In [None]:
#!/usr/bin/env python
# coding: utf-8

# ## 1) Install dependencies
import sys
print(sys.version)

# In a .py script, you would typically run these from your terminal first:
# !pip -q install langchain langchain-community chromadb sentence-transformers
# !pip -q install transformers accelerate
# !pip -q install langchain-openai


# ## 2) Load API Key
from pathlib import Path
from dotenv import load_dotenv

# *** UPDATE THIS PATH to your .env file ***
env_path = Path("/Volumes/Untitled/Lessons_By_Week/Project_Rag/Final_Codes/ATT81022.env")
load_dotenv(dotenv_path=env_path)


# ## 3) Set Constants & Check Key
import os
from pathlib import Path

# Path where Chroma (vector DB) will be persisted
CHROMA_DIR = "./chroma"
COLLECTION = "uploaded_text"

# --- Optional: OpenAI ---
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "").strip()
USE_OPENAI = bool(OPENAI_API_KEY)

if USE_OPENAI:
    print("✅ Using OpenAI for generation.")
else:
    print("ℹ️ OPENAI_API_KEY not set; will use local Transformers fallback.")

Path(CHROMA_DIR).mkdir(parents=True, exist_ok=True)
print("CHROMA_DIR =", Path(CHROMA_DIR).resolve())
print("COLLECTION  =", COLLECTION)


# ## 4) Load Text Document

# *** UPDATE THIS PATH to your .txt file ***
uploaded_path = "/Volumes/Untitled/Youtube_QA_Rag_System/Working_Pipelines/text/RAG_TEXT.txt"
from pathlib import Path

p = Path(uploaded_path).expanduser()
assert p.exists(), f"File not found: {p}"

text = p.read_text(encoding="utf-8", errors="ignore")
print(f"Loaded {len(text):,} characters from:", p.resolve())


# ## 5) Define LLM (Generator)

generator = None

if USE_OPENAI:
    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    generator = llm
    print("Using ChatOpenAI: gpt-4o-mini")
else:
    # Local Transformers text2text generation via HF pipeline
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
    print("Loading local model: google/flan-t5-base...")
    model_id = "google/flan-t5-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    hf_pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

    class HFText2TextLLM:
        def __call__(self, prompt_text: str) -> str:
            out = hf_pipe(prompt_text, max_new_tokens=256, truncation=True)
            return out[0]["generated_text"]
    
    generator = HFText2TextLLM()
    print("Using local Transformers: flan-t5-base")


# ## 6) Chunk, Embed, and Store in Vector DB

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings

# 1) Chunk the text
splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " ", ""],
)
docs = [Document(page_content=c, metadata={"source": str(p.name)}) 
        for c in splitter.split_text(text)]
print(f"Chunks created: {len(docs)}")

# 2) Embedding function
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    encode_kwargs={"normalize_embeddings": True},
)

# 3) Create (or re-open) the Chroma collection
vs = Chroma(
    collection_name=COLLECTION,
    persist_directory=CHROMA_DIR,
    embedding_function=embeddings,
)

# 4) Add docs
vs.add_documents(docs)
print("✅ Stored in Chroma at:", Path(CHROMA_DIR).resolve())

# 5) Create the retriever
retriever = vs.as_retriever(search_kwargs={"k": 5})
print("\n✅ Created 'retriever' variable.")


# ## 7) Build RAG Chain

from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

def format_docs(docs):
    out = []
    for i, d in enumerate(docs):
        src = d.metadata.get("source", "")
        out.append(f"[{i}] {d.page_content}\n(source: {src})")
    return "\n\n".join(out)

SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer the question **only** from the provided context. "
    "However, you may engage in general friendly conversation. "
    "If the answer isn't present, say: 'I don't see that in the file.' "
    "Keep the last 2 questions in mind for retrieval. "
    "Give full context so the user understands."
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "{system}"),
    ("human", "Question: {question}\n\nContext:\n{context}\n\nAnswer succinctly:"),
])

# Build chain
if USE_OPENAI:
    chain = (
        RunnableParallel({
            "context": (retriever | format_docs),
            "question": RunnablePassthrough(),
            "system": (lambda _: SYSTEM_PROMPT),
        })
        | prompt
        | generator
        | StrOutputParser()
    )
    print("✅ RAG chain (OpenAI) is ready.")
else:
    # Emulate the same behavior for the local model in a function
    def answer_local(question: str) -> str:
        # retriever.invoke() replaces the deprecated get_relevant_documents()
        ctx = format_docs(retriever.invoke(question))
        full_prompt = (
            f"{SYSTEM_PROMPT}\n\n"
            f"Question: {question}\n\n"
            f"Context:\n{ctx}\n\n"
            f"Answer succinctly:"
        )
        return generator(full_prompt)

    chain = answer_local
    print("✅ RAG function (Local Transformers) is ready.")


# ## 8) Ask Questions (Interactive)

# Define the 'ask' function
def ask(question: str):
    if not question.strip():
        return "Please enter a non-empty question."
    if callable(chain) and not hasattr(chain, "invoke"):
        # Local HF function path
        return chain(question)
    # OpenAI path via LangChain
    return chain.invoke(question)


# ---
# ## 9) Evaluation Module
# ---

# 1. Define your Test Set
# *** UPDATE THIS TEST SET with questions and answers relevant to your document ***

evaluation_test_set = [
    {
        "question": "What is the file about?",
        "ground_truth_answer": "The file is a comprehensive guide for travelers exploring Europe, focusing on its diverse cultural and natural landscapes, history, art, and the blend of tradition and modernity."
    },
    {
        "question": "What is the Netherlands known for?",
        "ground_truth_answer": "The Netherlands is known for its canals, tulip fields, and cycling culture. Key attractions include Amsterdam's Rijksmuseum and Anne Frank House."
    },
    {
        "question": "Where is the Netherlands located?",
        "ground_truth_answer": "The Netherlands is located in Western Europe."
    },
    {
        "question": "What is the capital of Spain?",
        "ground_truth_answer": "The document mentions Spain's cultural heritage, architecture, and historical sites, but it does not explicitly state the capital city."
    }
]

print(f"Loaded {len(evaluation_test_set)} evaluation questions.")


# 2. Define the Evaluation Function (LLM-as-Judge)

EVAL_PROMPT_TEMPLATE = """
You are an expert evaluator for a Question-Answering system. 
Your goal is to assess whether the 'Generated Answer' correctly and faithfully answers the 'User Question' based *only* on the 'Ground Truth Answer'.

RULES:
- If the 'Generated Answer' is consistent with, and supported by, the 'Ground Truth Answer', respond with **CORRECT**.
- If the 'Generated Answer' contradicts, fabricates information, or misses the main point of the 'Ground Truth Answer', respond with **INCORRECT**.
- If the 'Generated Answer' is something like 'I don't see that in the file' and the 'Ground Truth Answer' also indicates the information is missing, this is **CORRECT**.

--- EXAMPLES ---
User Question: What is the capital of France?
Ground Truth Answer: Paris is the capital of France.
Generated Answer: The capital of France is Paris.
Assessment: CORRECT

User Question: What is the capital of France?
Ground Truth Answer: Paris is the capital of France.
Generated Answer: I don't see that in the file.
Assessment: INCORRECT

User Question: What is the capital of Mars?
Ground Truth Answer: The document does not mention the capital of Mars.
Generated Answer: I don't see that in the file.
Assessment: CORRECT
--- END EXAMPLES ---

Provide only the final assessment ('CORRECT' or 'INCORRECT').

--- TASK ---
User Question: {question}
Ground Truth Answer: {ground_truth}
Generated Answer: {generated_answer}

Assessment:"""

eval_prompt = ChatPromptTemplate.from_template(EVAL_PROMPT_TEMPLATE)

# Note: We re-use the 'generator' (LLM) from Cell 5 as our judge
if USE_OPENAI:
    evaluation_chain = (
        eval_prompt
        | generator 
        | StrOutputParser()
    )
    print("✅ LLM-as-Judge (OpenAI) is ready.")
else:
    # Local models require the full prompt to be built manually
    def eval_local(inputs: dict) -> str:
        full_prompt = eval_prompt.format(**inputs)
        return generator(full_prompt)
    
    evaluation_chain = eval_local
    print("✅ LLM-as-Judge (Local Transformers) is ready.")


def evaluate_pipeline():
    print("Running evaluation...")
    print("="*30)
    
    correct