In [None]:
### **Deep Explanation of the Code Flow**

Read this explanation first, then go to the code in the second cell.

This code implements a **Retrieval-Augmented Generation (RAG) pipeline** using **LangChain, ChromaDB, a local LLaMA model, and Tavily web search API**. The purpose of this pipeline is to **retrieve, validate, and generate answers** based on existing documents and external search results while ensuring factual correctness.

---

## **1. Setting Up the Environment**
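- The code starts by installing the necessary libraries: **LangChain components, ChromaDB (vector database), FireCrawl (web crawling), Tavily (web search API), and GPT4All (local embeddings)**. The local LLaMA model itself is not installed by `pip`; it is accessed through `ChatOllama`, so it has to be available to Ollama separately.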
- Environment variables are set for LangChain tracing and API keys for authentication with LangChain and Tavily.
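
The code cell writes placeholder keys directly into `os.environ`. As a minimal sketch, assuming the real keys are exported in the shell before the notebook starts, a small helper can fail early when one is missing. The name `require_env` is hypothetical and is not part of LangChain or Tavily.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo value only; in practice the key would be exported outside the notebook.
os.environ.setdefault("TAVILY_API_KEY", "demo-key")
tavily_key = require_env("TAVILY_API_KEY")
print("Tavily key loaded:", bool(tavily_key))
```

Failing at start-up is usually easier to debug than an authentication error raised in the middle of a retrieval run.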

---

## **2. Document Ingestion & Processing**
- The system is designed to **scrape data from predefined URLs** using `FireCrawlLoader`, which fetches web content.
- The retrieved data is then **split into smaller text chunks** using `RecursiveCharacterTextSplitter`. This step is critical because long documents may not fit into the model’s context window.
- The code then **filters complex metadata**, keeping only relevant and simple metadata fields (e.g., text, numbers, and boolean values).
- These processed document chunks are **stored in ChromaDB**, where they are converted into embedding vectors using `GPT4AllEmbeddings` for fast retrieval.
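
The chunking and metadata steps above can be sketched without any external library. This is an illustration only: it counts whitespace-separated words as a stand-in for the tiktoken tokens that `RecursiveCharacterTextSplitter.from_tiktoken_encoder` uses, and the function names `split_into_chunks` and `keep_simple_metadata` are hypothetical.

```python
from typing import Any, Dict, List

# Value types kept by the metadata filter: text, numbers, and booleans.
ALLOWED_TYPES = (str, int, float, bool)


def split_into_chunks(text: str, chunk_size: int = 250, chunk_overlap: int = 0) -> List[str]:
    """Split text into windows of `chunk_size` words (words approximate tokens)."""
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def keep_simple_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Drop metadata values that are not plain strings, numbers, or booleans."""
    return {k: v for k, v in metadata.items() if isinstance(v, ALLOWED_TYPES)}


sample_text = " ".join(f"word{i}" for i in range(600))
chunks = split_into_chunks(sample_text, chunk_size=250)
print([len(c.split()) for c in chunks])

raw_metadata = {"title": "Post", "page": 3, "tags": ["a", "b"], "author": None}
print(keep_simple_metadata(raw_metadata))
```

Lists, nested dictionaries, and `None` values are dropped here because the code cell keeps only `str`, `int`, `float`, and `bool` metadata values before storing chunks in ChromaDB.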

---

## **3. Retrieving Documents for a User Query**
- When a user asks a question, the system first attempts to **retrieve relevant documents from ChromaDB**.
- The top-ranked retrieved document is then **graded using a retrieval grader model** (`ChatOllama` LLM), which determines whether the document is relevant.
  - If **no relevant document is found**, the system **performs a web search** using Tavily API to fetch external information.
  - If a document is found but **marked as irrelevant**, it is discarded, and web search is used as a fallback.
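
The routing rule described above can be sketched as a small function with the grader and the search passed in as callables. The stubs below are hypothetical stand-ins for the real grader chain and the Tavily call, so the example runs without a model or network access.

```python
from typing import Callable, List, Tuple


def choose_context(
    question: str,
    retrieved: List[str],
    grade: Callable[[str, str], str],
    search: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (context, source) following the grade-then-fallback rule."""
    if retrieved and grade(question, retrieved[0]) == "yes":
        return retrieved[0], "vectorstore"
    # Either nothing was retrieved or the grader rejected the top document.
    return search(question), "web_search"


def stub_grade(question: str, document: str) -> str:
    """Toy grader: 'yes' if the question and document share any word."""
    shared = set(question.lower().split()) & set(document.lower().split())
    return "yes" if shared else "no"


def stub_search(question: str) -> str:
    """Toy search: returns a placeholder string instead of calling Tavily."""
    return f"web results for: {question}"


print(choose_context("what is rag", ["rag combines retrieval and generation"], stub_grade, stub_search))
print(choose_context("what is rag", ["a recipe for soup"], stub_grade, stub_search))
print(choose_context("what is rag", [], stub_grade, stub_search))
```

Returning the source alongside the context is an addition for illustration; it makes it easy to log which path produced the answer.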

---

## **4. Generating an Answer Using the Retrieved Context**
- The system uses **RAG (Retrieval-Augmented Generation)** by feeding the retrieved document and user query into a **prompt template**.
- The local LLaMA model (`ChatOllama`) generates a response using the retrieved document’s content.
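
To show what the model actually receives, here is a minimal sketch that fills the same prompt wording with plain `str.format` instead of LangChain's `PromptTemplate`. The helper name `build_rag_prompt` is hypothetical.

```python
RAG_TEMPLATE = """You are an assistant for question-answering tasks.
Use the retrieved context to answer the question. If unsure, say "I don't know".
Answer concisely in three sentences maximum.
Question: {question}
Context: {context}
"""


def build_rag_prompt(question: str, context: str) -> str:
    """Insert the user question and the retrieved context into the template."""
    return RAG_TEMPLATE.format(question=question, context=context)


print(build_rag_prompt(
    question="What does the retriever return?",
    context="The retriever returns the chunks most similar to the query.",
))
```

The model sees only this filled-in text, which is why the quality of the retrieved context bounds the quality of the answer.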

---

## **5. Verifying for Hallucination**
- Since LLMs can **hallucinate (generate incorrect or non-factual information)**, the generated answer is checked against the retrieved document.
- If **the generated answer contains information that is not found in the retrieved document**, it is marked as **hallucinated**.
  - If hallucination is detected, the system **performs another web search** and regenerates a new answer using the newly fetched context.
  - If the answer is **not hallucinated**, it is considered **factually valid** and returned to the user.
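
The check-and-regenerate step can be sketched as follows, again with hypothetical stubs in place of the LLM chains and the Tavily search. As in the code cell, the regenerated answer is returned without a second check.

```python
from typing import Callable


def answer_with_check(
    question: str,
    context: str,
    generate: Callable[[str, str], str],
    is_hallucinated: Callable[[str, str], bool],
    search: Callable[[str], str],
) -> str:
    """Generate an answer; if it is flagged, fetch new context and regenerate once."""
    answer = generate(question, context)
    if is_hallucinated(context, answer):
        context = search(question)
        answer = generate(question, context)
    return answer


def stub_generate(question: str, context: str) -> str:
    """Toy model: repeats the context, or guesses when the context is empty."""
    return context if context else "unsupported guess"


def stub_is_hallucinated(context: str, answer: str) -> bool:
    """Toy checker: flags any answer that does not appear in the context."""
    return answer not in context


def stub_search(question: str) -> str:
    """Toy search: returns a fixed string instead of calling Tavily."""
    return "Fresh context from the web."


print(answer_with_check("q", "Stored context.", stub_generate, stub_is_hallucinated, stub_search))
print(answer_with_check("q", "", stub_generate, stub_is_hallucinated, stub_search))
```

A stricter variant could loop until the checker passes or a retry limit is reached; that is a design choice this sketch leaves out.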

---

## **6. Final Answer Delivery**
- After a single pass through the stages (retrieval → relevance grading → optional web search → generation → hallucination check, with at most one regeneration), the final response is returned.
- This is intended to keep the **answer grounded in the retrieved or searched context**. Note that a regenerated answer is not checked a second time.

---

## **Core Strengths of This Approach**
1. **Combining Static & Dynamic Knowledge:**  
   - Uses **pre-indexed documents (ChromaDB)** for efficiency.  
   - Uses **real-time web search (Tavily)** to fetch fresh data when needed.

2. **Multi-Level Filtering & Validation:**  
   - **Retrieval Grader** ensures that irrelevant documents are discarded.  
   - **Hallucination Checker** ensures that incorrect answers are corrected.  
   - **Web Search Fallback** ensures information is always available.

3. **Minimizing LLM Hallucination:**  
   - The system does **not allow the model to generate answers freely** without checking if the context supports it.
   - If **no valid answer is found, it explicitly states “I don’t know” instead of guessing.**

---

### **Final Takeaway**
This pipeline aims to provide a **reliable, context-aware, and verifiable question-answering system**. It checks that the **answers are backed by the retrieved sources**, which reduces the risk of misinformation and hallucination. 🚀

In [None]:
# Install dependencies
!pip install -U langchain-nomic langchain_community tiktoken langchainhub chromadb langchain langgraph tavily-python gpt4all firecrawl-py

import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.document_loaders import FireCrawlLoader
from langchain_community.vectorstores.utils import filter_complex_metadata
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from tavily import TavilyClient

# Set environment variables for LangChain tracing and API keys
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = "xxxxxxx"  # Replace with actual API key
os.environ['TAVILY_API_KEY'] = "xxxxx"  # Replace with actual API key

# Define local LLaMA model
local_llm = "llama3"  # Model name as known to Ollama; pull it first (e.g. `ollama pull llama3`)

# URLs for scraping to extract information for document storage
urls = [
    "https://www.ai-jason.com/learning-ai/how-to-reduce-llm-cost",
    "https://www.ai-jason.com/learning-ai/gpt5-llm",
    "https://www.ai-jason.com/learning-ai/how-to-build-ai-agent-tutorial-3",
]

# Load documents using FireCrawlLoader from the provided URLs
docs = [FireCrawlLoader(api_key="xxxxx", url=url, mode="scrape").load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

# Split documents into smaller chunks for efficient retrieval
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=250, chunk_overlap=0)
doc_splits = text_splitter.split_documents(docs_list)

# Filter out complex metadata to keep only relevant data for storage
filtered_docs = []
for doc in doc_splits:
    if isinstance(doc, Document) and hasattr(doc, 'metadata'):
        clean_metadata = {k: v for k, v in doc.metadata.items() if isinstance(v, (str, int, float, bool))}
        filtered_docs.append(Document(page_content=doc.page_content, metadata=clean_metadata))

# Store the processed documents into ChromaDB for retrieval
vectorstore = Chroma.from_documents(
    documents=filtered_docs,
    collection_name="rag-chroma",
    embedding=GPT4AllEmbeddings()
)
retriever = vectorstore.as_retriever()

# Define a grading prompt to assess document relevance to user queries
llm = ChatOllama(model=local_llm, format="json", temperature=0)
prompt = PromptTemplate(
    template="""
    You are a grader assessing the relevance of a retrieved document to a user question. 
    Give a binary score 'yes' or 'no' to indicate whether the document is relevant.
    Provide the binary score as JSON with a single key 'score'.
    Document: {document}
    Question: {question}
    """,
    input_variables=["question", "document"]
)
retrieval_grader = prompt | llm | JsonOutputParser()

# Define hallucination detection to verify if model-generated answers align with context
hallucination_prompt = PromptTemplate(
    template="""
    You are checking if the model-generated answer is factually correct based on the retrieved context.
    If the answer contains information not found in the context, classify it as hallucinated.
    Provide a binary score 'yes' or 'no' as JSON with a single key 'hallucination'.
    Context: {context}
    Answer: {answer}
    """,
    input_variables=["context", "answer"]
)
hallucination_checker = hallucination_prompt | llm | JsonOutputParser()

# Web search function using Tavily API in case no relevant documents are found
def web_search(query):
    client = TavilyClient()
    results = client.search(query=query, max_results=3)  # Tavily's parameter is max_results
    return "\n".join([r['content'] for r in results['results']]) if 'results' in results else "No additional information found."

# Define the RAG (Retrieval-Augmented Generation) prompt to answer user queries
rag_prompt = PromptTemplate(
    template="""
    You are an assistant for question-answering tasks.
    Use the retrieved context to answer the question. If unsure, say "I don't know".
    Answer concisely in three sentences maximum.
    Question: {question}
    Context: {context}
    """,
    input_variables=["question", "context"]
)
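# The RAG prompt asks for a prose answer, so use a plain-text model and parser
# instead of the JSON-mode `llm` and JsonOutputParser used by the graders.
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser

answer_llm = ChatOllama(model=local_llm, temperature=0)
rag_chain = rag_prompt | answer_llm | StrOutputParser()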

# Function to process user questions and return an accurate response
def answer_question(question):
    # Retrieve relevant documents from ChromaDB
    docs = retriever.invoke(question)
    doc_txt = docs[0].page_content if docs else ""
    
    # Check if the retrieved document is relevant; if not, perform a web search
    if not docs or retrieval_grader.invoke({"question": question, "document": doc_txt})['score'] == "no":
        print("No relevant document found in ChromaDB. Performing web search...")
        doc_txt = web_search(question)
    
    # Generate an answer using the retrieved or searched context
    answer = rag_chain.invoke({"context": doc_txt, "question": question})
    
    # Verify if the generated answer is hallucinated
    if hallucination_checker.invoke({"context": doc_txt, "answer": answer})['hallucination'] == "yes":
        print("Detected hallucination. Fetching more reliable sources...")
        doc_txt = web_search(question)
        answer = rag_chain.invoke({"context": doc_txt, "question": question})
    
    return answer

# Example usage
question = "Where to buy iPhone 5?"
response = answer_question(question)
print(response)
