In [None]:
# Read this explanation of the code flow first, then go to the code in the second cell.

# ---

# # **Deep Explanation of the Code Flow**  

# This code implements a **Retrieval-Augmented Generation (RAG) pipeline** using **LangChain, ChromaDB, a local LLaMA model served through Ollama, and the Tavily web search API**. The goal is to **retrieve context, generate an answer, and validate it** against stored documents and external search results, checking both **factual grounding** and whether the answer **resolves the question**.  

# ---

# ## **1. Setting Up the Environment**  
# - The code starts by **installing necessary dependencies** for LangChain, vector databases (ChromaDB), web search (Tavily), and the local LLaMA model.  
# - **API keys and environment variables** are set to enable LangChain tracing and authentication with Tavily.  
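
# A minimal sketch of the key setup above, assuming the keys are exported in the shell instead of being pasted into the notebook. `require_env` is a hypothetical helper, not part of LangChain or Tavily; only the variable name comes from the code in the second cell.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set {name} before running the pipeline")
    return value

# Demo value so the sketch runs on its own; a real key would come from the shell.
os.environ.setdefault("TAVILY_API_KEY", "demo-key")
tavily_key = require_env("TAVILY_API_KEY")
```

# Failing early on a missing key is usually easier to debug than an authentication error in the middle of a web search.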

# ---

# ## **2. Document Ingestion & Processing**  
# This step ensures **high-quality knowledge retrieval** by:  
# 1. **Scraping documents** from predefined URLs using `FireCrawlLoader`.  
# 2. **Splitting documents into chunks** using `RecursiveCharacterTextSplitter` to fit within model context limits.  
# 3. **Filtering metadata** so that only scalar values (`str`, `int`, `float`, `bool`) remain, since ChromaDB cannot store nested metadata.  
# 4. **Storing processed documents** in **ChromaDB** as embedding vectors, enabling fast and efficient retrieval.  
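
# The chunking and metadata-filtering steps above can be sketched in plain Python. This is a simplified stand-in, not what `RecursiveCharacterTextSplitter` does internally: it cuts fixed-size character windows, whereas the notebook measures chunks in tokens. Both function names are hypothetical.

```python
from typing import Any

def split_into_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def keep_scalar_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Drop lists, dicts, None and other non-scalar metadata values."""
    return {k: v for k, v in metadata.items() if isinstance(v, (str, int, float, bool))}

chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=1)
clean = keep_scalar_metadata({"title": "a", "tags": ["x"], "page": 3, "extra": None})
```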

# ---

# ## **3. Retrieving Documents for a User Query**  
# When a user submits a question, the system follows these steps:  
# 1. **ChromaDB is queried** for documents relevant to the question.  
# 2. The top retrieved document is sent to a **retrieval grader** (a `ChatOllama` chain in JSON mode) that returns a binary score:  
#    - If **relevant** (`"yes"`), the document is used as context for answering.  
#    - If **not relevant** (`"no"`), or if nothing was retrieved, a **web search using Tavily** supplies the context instead.  

# This keeps clearly irrelevant context out of the answer prompt.  
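
# The routing decision above can be sketched with stub functions in place of the real grader and search. This is a variation under stated assumptions: `choose_context`, `keyword_grader`, and `fake_search` are hypothetical names, and the sketch grades every retrieved chunk in order, whereas the notebook grades only the first one.

```python
from typing import Callable, Sequence, Tuple

def choose_context(
    question: str,
    retrieved_texts: Sequence[str],
    grade: Callable[[str, str], dict],
    search: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (context, source), falling back to web search when nothing is relevant."""
    for text in retrieved_texts:
        if grade(question, text).get("score") == "yes":
            return text, "vectorstore"
    return search(question), "web"

# Stub grader and search so the sketch runs without Ollama or Tavily.
def keyword_grader(question: str, document: str) -> dict:
    overlap = set(question.lower().split()) & set(document.lower().split())
    return {"score": "yes" if overlap else "no"}

def fake_search(question: str) -> str:
    return f"web results for: {question}"

context, source = choose_context(
    "reduce llm cost", ["Tips to reduce LLM cost"], keyword_grader, fake_search
)
```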

# ---

# ## **4. Generating an Answer Using the Retrieved Context**  
# Once a relevant document is found:  
# 1. The **retrieved document and question** are passed into a **RAG prompt**.  
# 2. The local **LLaMA model (`ChatOllama`) generates an answer**.  
# 3. The generated response is then **checked for hallucination**.  

# ---

# ## **5. Hallucination Detection**  
# Since **LLMs can hallucinate (generate incorrect answers)**, the pipeline includes:  
# - A **hallucination checker** that verifies whether the generated answer **stays true to the retrieved context**.  
# - If hallucination is detected:  
#   1. The system **performs another web search**.  
#   2. The model **regenerates the answer** using **newly retrieved information**.  

# This reduces the chance that unsupported claims reach the user. Note that in the code in the second cell the regenerated answer is not checked for hallucination a second time.  
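
# One way to make the regenerate-on-hallucination step re-check its own output is a bounded retry loop. This is a sketch, not the notebook's method: `generate_grounded` and the stubs are hypothetical, and the real checker would be an LLM chain, not a substring test.

```python
from typing import Callable

def generate_grounded(
    question: str,
    context: str,
    generate: Callable[[str, str], str],
    is_grounded: Callable[[str, str], bool],
    search: Callable[[str], str],
    max_retries: int = 2,
) -> str:
    """Regenerate from fresh search results until the answer is grounded or retries run out."""
    answer = generate(question, context)
    for _ in range(max_retries):
        if is_grounded(context, answer):
            return answer
        context = search(question)
        answer = generate(question, context)
    return answer if is_grounded(context, answer) else "I don't know"

# Stubs so the sketch runs without a model: "generate" copies the first sentence
# of the context, and "grounded" means the answer appears verbatim in the context.
def echo_generate(question: str, context: str) -> str:
    return context.split(".")[0]

def substring_check(context: str, answer: str) -> bool:
    return bool(answer) and answer in context

result = generate_grounded(
    "What is RAG?",
    "",
    echo_generate,
    substring_check,
    search=lambda q: "RAG combines retrieval with generation. More text.",
)
```

# Capping the retries keeps a persistently failing check from triggering an unbounded number of web searches.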

# ---

# ## **6. Final Answer Verification**  
# Even after passing the **hallucination check**, the generated answer is further evaluated using a **final answer grader**.  
# - This step ensures that the generated response actually **resolves the user’s query**.  
# - The answer is graded as:  
#   - `"yes"` → If the answer sufficiently resolves the query.  
#   - `"no"` → If the answer does not fully address the query.  

# If the answer is **not useful**, the system **performs another web search and regenerates the response**.  

# This final step makes it more likely that the model provides a **clear, direct, and actionable response** rather than vague or unhelpful information.  
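
# Grader output comes from an LLM, so the `"yes"`/`"no"` verdict may arrive with odd casing, a missing key, or as invalid JSON. A defensive parser could look like the sketch below; `parse_binary_score` is a hypothetical helper, and treating anything unparseable as "no" is an assumption, not the notebook's behavior.

```python
import json
from typing import Any

def parse_binary_score(raw: Any, key: str = "score") -> bool:
    """Return True only when the grader output clearly says 'yes' under `key`."""
    try:
        data = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return str(data.get(key, "")).strip().lower() == "yes"

useful = parse_binary_score('{"score": "yes"}')
hallucinated = parse_binary_score({"hallucination": "No"}, key="hallucination")
```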

# ---

# ## **7. Final Answer Delivery**  
# - After the three checks (retrieval grading → hallucination check → answer grading), the resulting answer is returned to the user. Each check runs once; a failed check triggers one web search and one regeneration.  
# - This process is designed to keep the response **grounded in the context, relevant, and directly responsive to the question**.  

# ---

# ## **Core Strengths of This Approach**  
# 1. **Combining Static & Dynamic Knowledge:**  
#    - Uses **stored documents (ChromaDB)** for efficiency.  
#    - Uses **real-time web search (Tavily)** for fresh, up-to-date information.  

# 2. **Multi-Level Filtering & Validation:**  
#    - **Retrieval Grader:** Ensures **only relevant** documents are used.  
#    - **Hallucination Checker:** Prevents **factually incorrect** answers.  
#    - **Final Answer Grader:** Ensures that the answer is **actually useful**.  

# 3. **Minimizing LLM Hallucination:**  
#    - The RAG prompt instructs the model to say **“I don’t know”** when it is unsure, instead of generating misleading information.  

# ---

# ### **Final Takeaway**  
# This pipeline aims for **verifiable, context-aware responses** by combining **retrieval-based and generative AI techniques**. Its **multi-step validation** catches many irrelevant, unsupported, or unhelpful answers before they reach the user, which makes for a more **reliable AI assistant**. 🚀

In [None]:
# Install dependencies
!pip install -U langchain-nomic langchain_community tiktoken langchainhub chromadb langchain langgraph tavily-python gpt4all firecrawl-py

import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.document_loaders import FireCrawlLoader
from langchain_community.vectorstores.utils import filter_complex_metadata
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from tavily import TavilyClient

# Set environment variables for LangChain tracing and API keys
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = "xxxxxxx"  # Replace with actual API key
os.environ['TAVILY_API_KEY'] = "xxxxx"  # Replace with actual API key

# Define local LLaMA model
local_llm = "llama3"  # Must already be pulled in Ollama (e.g. `ollama pull llama3`)

# URLs for scraping to extract information for document storage
urls = [
    "https://www.ai-jason.com/learning-ai/how-to-reduce-llm-cost",
    "https://www.ai-jason.com/learning-ai/gpt5-llm",
    "https://www.ai-jason.com/learning-ai/how-to-build-ai-agent-tutorial-3",
]

# Load documents using FireCrawlLoader from the provided URLs
docs = [FireCrawlLoader(api_key="xxxxx", url=url, mode="scrape").load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

# Split documents into smaller chunks for efficient retrieval
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=250, chunk_overlap=0)
doc_splits = text_splitter.split_documents(docs_list)

# Filter out complex metadata to keep only relevant data for storage
filtered_docs = []
for doc in doc_splits:
    if isinstance(doc, Document) and hasattr(doc, 'metadata'):
        clean_metadata = {k: v for k, v in doc.metadata.items() if isinstance(v, (str, int, float, bool))}
        filtered_docs.append(Document(page_content=doc.page_content, metadata=clean_metadata))

# Store the processed documents into ChromaDB for retrieval
vectorstore = Chroma.from_documents(
    documents=filtered_docs,
    collection_name="rag-chroma",
    embedding=GPT4AllEmbeddings()
)
retriever = vectorstore.as_retriever()

# Define a grading prompt to assess document relevance to user queries
llm = ChatOllama(model=local_llm, format="json", temperature=0)
prompt = PromptTemplate(
    template="""
    You are a grader assessing the relevance of a retrieved document to a user question. 
    Give a binary score 'yes' or 'no' to indicate whether the document is relevant.
    Provide the binary score as JSON with a single key 'score'.
    Document: {document}
    Question: {question}
    """,
    input_variables=["question", "document"]
)
retrieval_grader = prompt | llm | JsonOutputParser()


# Define the RAG (Retrieval-Augmented Generation) prompt to answer user queries
rag_prompt = PromptTemplate(
    template="""
    You are an assistant for question-answering tasks.
    Use the retrieved context to answer the question. If unsure, say "I don't know".
    Answer concisely in three sentences maximum.
    Question: {question}
    Context: {context}
    """,
    input_variables=["question", "context"]
)
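# The answer is free text, so use a model without JSON mode and a plain string parser
from langchain_core.output_parsers import StrOutputParser
answer_llm = ChatOllama(model=local_llm, temperature=0)
rag_chain = rag_prompt | answer_llm | StrOutputParser()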


# Define hallucination detection to verify if model-generated answers align with context
hallucination_prompt = PromptTemplate(
    template="""
    You are checking if the model-generated answer is factually correct based on the retrieved context.
    If the answer contains information not found in the context, classify it as hallucinated.
    Provide a binary score 'yes' or 'no' as JSON with a single key 'hallucination'.
    Context: {context}
    Answer: {answer}
    """,
    input_variables=["context", "answer"]
)
hallucination_checker = hallucination_prompt | llm | JsonOutputParser()



# Define a final answer grader to check if the answer resolves the question
answer_grader_prompt = PromptTemplate(
    template="""
    You are a grader assessing whether an answer is useful to resolve a question. 
    Give a binary score 'yes' or 'no' to indicate whether the answer is useful.
    Provide the binary score as JSON with a single key 'score'.
    Here is the answer: 
    
    {generation}
    
    Here is the question: {question}
    """,
    input_variables=["generation", "question"]
)
answer_grader = answer_grader_prompt | llm | JsonOutputParser()


# Web search function using Tavily API in case no relevant documents are found
def web_search(query):
    # TavilyClient() reads TAVILY_API_KEY from the environment
    client = TavilyClient()
    results = client.search(query=query, max_results=3)
    hits = results.get("results", [])
    if not hits:
        return "No additional information found."
    return "\n".join(hit["content"] for hit in hits)
    

# Function to process user questions and return an accurate response
def answer_question(question):
    docs = retriever.invoke(question)
    doc_txt = docs[0].page_content if docs else ""
    
    if not docs or retrieval_grader.invoke({"question": question, "document": doc_txt})['score'] == "no":
        print("No relevant document found in ChromaDB. Performing web search...")
        doc_txt = web_search(question)
    
    answer = rag_chain.invoke({"context": doc_txt, "question": question})
    
    if hallucination_checker.invoke({"context": doc_txt, "answer": answer})['hallucination'] == "yes":
        print("Detected hallucination. Fetching more reliable sources...")
        doc_txt = web_search(question)
        answer = rag_chain.invoke({"context": doc_txt, "question": question})
    
    if answer_grader.invoke({"question": question, "generation": answer})['score'] == "no":
        print("Answer did not resolve the question. Performing another web search...")
        doc_txt = web_search(question)
        answer = rag_chain.invoke({"context": doc_txt, "question": question})
    
    return answer

# Example usage
question = "Where to buy iPhone 5?"
response = answer_question(question)
print(response)
