# Project Title: Retrieval-Augmented QA with LangChain, Chroma & Wikipedia

Problem Statement:   

Traditional search systems often fail to deliver context-rich answers from unstructured documents. There's a growing need for modular, explainable RAG pipelines that integrate local knowledge with external sources like Wikipedia for more accurate and relevant responses.

Project Overview:

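This notebook demonstrates a lightweight RAG-based QA system built with LangChain, Chroma, and Hugging Face models. It runs semantic search over an uploaded text file and Wikipedia articles: documents are split into 500-character chunks with a 50-character overlap, embedded with all-MiniLM-L6-v2, and indexed in Chroma. For each question, the four most similar chunks are retrieved and passed to a locally run flan-t5-base model, which generates the answer.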
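To make the chunking step concrete, here is a minimal sketch of fixed-size character chunking with overlap, using the same sizes as this notebook (500 and 50). It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, which this sketch does not do. The helper name `chunk_text` and the sample text are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical sample text, 1200 characters long
sample = "".join(chr(ord("a") + i % 26) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # prints [500, 500, 300]
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.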


Key Features:  

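- Semantic search using all-MiniLM-L6-v2 embeddings
- Wikipedia integration via the `wikipedia` Python package (through LangChain's `WikipediaLoader`)
- Vector store with Chroma (built in memory in the cell below; pass `persist_directory` to `Chroma.from_documents` for persistent storage)
- Modular LangChain pipeline for easy extension
- Runs in Colab with GPU support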
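As a rough illustration of what semantic search means here, the sketch below ranks stored vectors by cosine similarity to a query vector and keeps the top k. The 3-dimensional toy vectors stand in for the 384-dimensional embeddings that all-MiniLM-L6-v2 produces, and the chunk names are made up. Chroma's actual index structure and distance metric may differ, so treat this as a conceptual sketch rather than a description of its internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: chunk id -> embedding
toy_index = {
    "chunk-delhi": [0.9, 0.1, 0.0],
    "chunk-cricket": [0.0, 1.0, 0.0],
    "chunk-capital": [1.0, 0.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print([doc_id for doc_id, _ in results])  # prints ['chunk-capital', 'chunk-delhi']
```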

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
from google.colab import files
uploaded = files.upload()

Saving extracted_text.txt to extracted_text.txt


In [None]:
!pip install wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11678 sha256=be218b48f9efe7106cb40d492d3e697cdac7f8f1f500948b32c3563f4af4b4be
  Stored in directory: /root/.cache/pip/wheels/8f/ab/cb/45ccc40522d3a1c41e1d2ad53b8f33a62f394011ec38cd71c6
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [None]:
!pip install langchain langchain-community langchain_openai langchain_chroma

Collecting langchain-community
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain_openai
  Downloading langchain_openai-0.3.29-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain_chroma
  Downloading langchain_chroma-0.2.5-py3-none-any.whl.metadata (1.1 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.10.1-py3-none-any.whl.metadata (3.4 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.1-py3-none-any.whl.metadata (9.4 kB)
Collecting langchain-core<1.0.0,>=0.3.72 (from langchain)
  Downloading langchain_core-0.3.74-py3-none-any.whl.metadata (5.8 kB)
Collecting chromadb>=1.0.9 (from langchain_chroma)
  Downloading chromadb-1.0.16-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.4 kB

# Base Model

In [None]:
# 📦 Imports
import os
import gradio as gr
from transformers import pipeline
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Integration classes live in langchain_community / langchain_chroma;
# the old langchain.* import paths for them are deprecated.
from langchain_community.llms import HuggingFacePipeline
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.document_loaders import TextLoader, WikipediaLoader
from langchain_chroma import Chroma

# 📄 Load documents: local + Wikipedia
local_loader = TextLoader("extracted_text.txt")  # 🔁 Replace with your actual file
wiki_loader = WikipediaLoader(query="New Delhi is the Capital of India", load_max_docs=2)

docs = local_loader.load() + wiki_loader.load()

# ✂️ Split documents into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = splitter.split_documents(docs)

# 🔍 Embed and store in Chroma
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(split_docs, embedding_model)

# 🤖 Load local model with transformers
local_pipeline = pipeline("text2text-generation", model="google/flan-t5-base", max_length=256, truncation=True)
llm = HuggingFacePipeline(pipeline=local_pipeline)

# 🔗 Create RAG chain
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, return_source_documents=True)

# 🧠 Define response function
def generate_response(question):
    try:
        response = rag_chain.invoke({"query": question})
        answer = response["result"]
        sources = response.get("source_documents", [])

        if not sources:
            return "**Answer (fallback):**\nThis question is outside the scope of the provided documents."

        source_text = "\n\n".join([doc.page_content for doc in sources])
        return f"**Answer:**\n{answer}\n\n**Sources:**\n{source_text}"
    except Exception as e:
        return f"❌ Error: {str(e)}"



Device set to use cuda:0
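`RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places all retrieved chunks into a single prompt ahead of the question. The sketch below shows that idea in plain Python. The template wording, the `build_stuff_prompt` helper, and the character budget are assumptions for illustration, not LangChain's actual prompt. With four 500-character chunks the context is roughly 2,000 characters, and because the pipeline above sets `truncation=True`, input beyond the model's limit may be cut off silently, so a budget of this kind is worth considering.

```python
def build_stuff_prompt(question: str, chunks: list[str], max_context_chars: int = 2000) -> str:
    """Concatenate retrieved chunks (up to a character budget) and append the question."""
    context_parts = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_context_chars:
            break  # stop before exceeding the budget; later chunks are dropped
        context_parts.append(chunk)
        used += len(chunk)

    context = "\n\n".join(context_parts)
    # Illustrative template only; not the exact wording LangChain uses.
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is the capital of India?",
    ["New Delhi is the capital of India.", "Chunks are ordered by similarity."],
)
print(prompt)
```

Since the retriever returns chunks in similarity order, dropping from the end of the list discards the least relevant context first.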


Gradio interface over the local file and Wikipedia

In [None]:
# 🎛️ Launch Gradio interface
iface = gr.Interface(
    fn=generate_response,
    inputs=gr.Textbox(lines=2, placeholder="Enter your question here..."),
    outputs="markdown",
    title="🌐 RAG QA with Local + Wikipedia Knowledge",
    description="Ask questions based on local documents and Wikipedia. Powered by LangChain, Chroma, and Hugging Face's flan-t5-base running locally."
)

iface.launch(share=True, debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://f0ceab5b9450506adc.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://f0ceab5b9450506adc.gradio.live
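One caveat about the fallback branch in `generate_response`: a retriever configured with `k=4` returns the four nearest chunks regardless of how similar they are, so `sources` is unlikely to be empty as long as the index contains documents, and the fallback message will rarely appear. A minimal sketch of a score-based fallback follows. The function names, the `(text, score)` input shape, and the 0.3 cutoff are all hypothetical. LangChain's `as_retriever` also accepts `search_type="similarity_score_threshold"` with a `score_threshold` entry in `search_kwargs`; whether the scores Chroma returns are well calibrated for a given cutoff would need to be checked on real queries.

```python
FALLBACK = "This question is outside the scope of the provided documents."


def select_relevant(scored_chunks, min_score=0.3):
    """Keep (text, score) pairs whose similarity score reaches min_score."""
    return [(text, score) for text, score in scored_chunks if score >= min_score]


def answer_or_fallback(scored_chunks, answer, min_score=0.3):
    """Format the answer with its sources, or fall back if nothing is relevant enough."""
    relevant = select_relevant(scored_chunks, min_score)
    if not relevant:
        return f"**Answer (fallback):**\n{FALLBACK}"
    sources = "\n\n".join(text for text, _ in relevant)
    return f"**Answer:**\n{answer}\n\n**Sources:**\n{sources}"


# Hypothetical retrieval results: (chunk text, similarity score)
on_topic = [("New Delhi is the capital of India.", 0.82), ("Unrelated chunk.", 0.05)]
off_topic = [("Unrelated chunk.", 0.05)]

print(answer_or_fallback(on_topic, "New Delhi"))
print(answer_or_fallback(off_topic, "Paris"))
```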


