#RAG pipeline that mitigates hallucinations using Gemini + LangChain

#🧠 Objective:
Mitigate hallucinations in RAG by:

- Using hybrid retrieval (dense + sparse)
- Filtering and re-ranking retrieved content
- Applying a fact-checking model
- Using Gemini for reasoning with trusted content only

#🔧 Step-by-Step Implementation
#✅ Step 1: Install Required Libraries

In [3]:
!pip install langchain langchain-google-genai chromadb tiktoken rank_bm25 langchain-community


#✅ Create hallucination_rag.txt Programmatically
Run this cell before the loading step so that the sample input file exists:

In [6]:
# Create the sample input file
with open("hallucination_rag.txt", "w") as f:
    f.write("""
    LangChain is a framework to build LLM-powered apps.
    It supports modular chains and agents.

    Retrieval-Augmented Generation (RAG) improves LLM accuracy by retrieving documents as context.

    Hallucinations occur when LLMs invent facts. LangChain supports hybrid retrievers and fact-checking to reduce this.

    Google's Gemini is a powerful model for generation and reasoning tasks.
    """)


# The file is loaded and split in Step 2, once TextLoader has been imported.


#✅ Step 2: Implementing a Hybrid Retriever
We'll combine a dense retriever (Gemini embeddings) with a sparse retriever (BM25).

In [7]:
# Imports
import getpass
import os

from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from rank_bm25 import BM25Okapi

# API key setup: read the key from the environment, or prompt for it,
# instead of hard-coding it in the notebook
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API key: ")

# Load and split documents
loader = TextLoader("hallucination_rag.txt")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Dense embedding retriever
dense_embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = Chroma.from_documents(chunks, embedding=dense_embeddings)
dense_retriever = vectorstore.as_retriever()

# Sparse (BM25) retriever setup
tokenized_corpus = [doc.page_content.split() for doc in chunks]
bm25 = BM25Okapi(tokenized_corpus)
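
A note on the sparse side: `str.split()` keeps case and punctuation, so a query token such as `hallucinations?` never matches the corpus token `Hallucinations`. The sketch below shows one possible normalizing tokenizer. The `tokenize` helper is an assumption added for illustration and is not part of the original notebook.

```python
import re

def tokenize(text):
    """Lowercase the text and return its runs of letters, digits, and underscores."""
    return re.findall(r"\w+", text.lower())

query = "How does LangChain reduce hallucinations?"
chunk = "Hallucinations occur when LLMs invent facts."

# Whitespace splitting keeps case and punctuation, so no token is shared.
print(set(query.split()) & set(chunk.split()))

# After normalization both sides contain the token "hallucinations".
print(set(tokenize(query)) & set(tokenize(chunk)))
```

If this is adopted, the same function has to be applied on both sides: when building the index (`BM25Okapi([tokenize(doc.page_content) for doc in chunks])`) and when scoring (`bm25.get_scores(tokenize(query))`).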

#✅ Step 3: Implementing Context Filtering & Ranking
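Retrieve from both methods, merge the results, and drop duplicate chunks. The function in this step does not score the merged list: dense results are placed first, so when the dense retriever already returns top_k chunks, sparse-only hits are cut by the final slice.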

In [8]:
def hybrid_retrieve(query, top_k=4):
    # Dense retrieval (invoke() replaces the deprecated get_relevant_documents())
    dense_results = dense_retriever.invoke(query)

    # Sparse retrieval
    query_tokens = query.split()
    bm25_scores = bm25.get_scores(query_tokens)
    top_sparse_idxs = sorted(range(len(bm25_scores)), key=lambda i: bm25_scores[i], reverse=True)[:top_k]
    sparse_results = [chunks[i] for i in top_sparse_idxs]

    # Combine and filter duplicates
    combined = list({doc.page_content: doc for doc in dense_results + sparse_results}.values())
    return combined[:top_k]
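
The merge above keeps chunks in retrieval order. A token-overlap re-ranker could be layered on top of it; the following is a minimal sketch, assuming plain strings as input. The `rerank_by_overlap` name, the `min_overlap` threshold, and the scoring rule (fraction of query tokens found in the chunk) are illustrative choices, not the notebook's own method.

```python
import re

def tokenize(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))

def rerank_by_overlap(query, texts, top_k=4, min_overlap=0.0):
    """Sort texts by the fraction of query tokens they contain, best first."""
    query_tokens = tokenize(query)
    if not query_tokens:
        # Nothing to score against, so keep the incoming order.
        return list(texts)[:top_k]
    scored = []
    for text in texts:
        overlap = len(query_tokens & tokenize(text)) / len(query_tokens)
        if overlap >= min_overlap:  # filter out weakly related chunks
            scored.append((overlap, text))
    # The sort is stable, so ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

query = "How does LangChain reduce hallucinations?"
candidates = [
    "Google's Gemini is a powerful model for generation and reasoning tasks.",
    "Hallucinations occur when LLMs invent facts.",
    "LangChain supports hybrid retrievers and fact-checking to reduce hallucinations.",
]
for text in rerank_by_overlap(query, candidates, top_k=2):
    print(text)
```

To use this with the notebook's `Document` objects, one option is to score `doc.page_content` and sort the documents by that score instead of sorting raw strings.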


#✅ Step 4: Implementing a Fact-Checking Model (using Gemini)

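We'll use Gemini to generate the answer from the retrieved context only. The prompt tells the model to reply "Insufficient information." when the context does not cover the question; this cell does not run a separate pass that checks the finished answer against the context.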



In [10]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0.3)

def generate_answer(context_docs, question):
    context = "\n".join([doc.page_content for doc in context_docs])
    prompt = f"""
    Given the following context:\n{context}\n\n
    Answer the question: "{question}"
    Only use information from the context. If unsure, say "Insufficient information."
    """
    return llm.invoke(prompt)
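
The cell above only instructs the model to stay within the context. A separate verification pass, as the step title suggests, could look like the sketch below. The prompt wording, the `build_check_prompt` and `is_supported` helpers, and the one-word verdict format are all assumptions for illustration. `fake_llm` is a stand-in so that the example runs without an API key.

```python
def build_check_prompt(context, answer):
    """Build a prompt asking the model for a one-word support verdict."""
    return (
        "You are a strict fact-checker.\n"
        f"Context:\n{context}\n\n"
        f"Answer to verify:\n{answer}\n\n"
        "Is every claim in the answer supported by the context? "
        "Reply with exactly one word: SUPPORTED or UNSUPPORTED."
    )

def is_supported(ask_llm, context, answer):
    """Return True only if the model's verdict starts with SUPPORTED."""
    verdict = ask_llm(build_check_prompt(context, answer))
    # startswith() matters here: "UNSUPPORTED" contains "SUPPORTED" as a substring.
    return verdict.strip().upper().startswith("SUPPORTED")

# Hypothetical stand-in for a real model call, so the sketch runs offline.
def fake_llm(prompt):
    return "UNSUPPORTED" if "Paris" in prompt else "Supported."

context = "LangChain is a framework to build LLM-powered apps."
print(is_supported(fake_llm, context, "LangChain is a framework for LLM apps."))
print(is_supported(fake_llm, context, "LangChain was founded in Paris."))
```

With the Gemini client defined in this step, `ask_llm` could be `lambda p: llm.invoke(p).content`. A low temperature is a reasonable choice for the checker, since the verdict should be as deterministic as possible.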


#✅ Step 5: Testing the Pipeline

In [11]:
query = "What is LangChain and how does it help mitigate hallucinations?"

# Step 1: Retrieve hybrid context
retrieved_docs = hybrid_retrieve(query)

# Step 2: Generate a response
answer = generate_answer(retrieved_docs, query)

# Output: print only the text of the returned message
print("Final Answer:\n", answer.content)


Final Answer:
 LangChain is a framework to build LLM-powered apps.  It supports hybrid retrievers and fact-checking to reduce hallucinations (which occur when LLMs invent facts).
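
The test above runs retrieval and generation only. One way to tie the steps together is to return the answer only when a verification step accepts it, and to fall back to "Insufficient information." otherwise. This is a sketch under assumptions: `answer_with_check` and the three callables it takes are hypothetical, and the `fake_*` functions are offline stand-ins rather than real retriever or model calls.

```python
FALLBACK = "Insufficient information."

def answer_with_check(question, retrieve, generate, verify):
    """Run retrieve -> generate -> verify; fall back if any stage fails."""
    docs = retrieve(question)
    if not docs:
        return FALLBACK  # nothing trusted to answer from
    answer = generate(docs, question)
    return answer if verify(docs, answer) else FALLBACK

# Hypothetical offline stand-ins for the notebook's retriever and model calls.
corpus = ["LangChain is a framework to build LLM-powered apps."]

def fake_retrieve(question):
    return [text for text in corpus if "langchain" in question.lower()]

def fake_generate(docs, question):
    return docs[0]

def fake_verify(docs, answer):
    return answer in docs

print(answer_with_check("What is LangChain?", fake_retrieve, fake_generate, fake_verify))
print(answer_with_check("What is Gemini?", fake_retrieve, fake_generate, fake_verify))
```

In the notebook, `retrieve` could be `hybrid_retrieve`, and `generate` could wrap `generate_answer` and return the `.content` of its result.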
