## Wikipedia RAG Chatbot

This project implements a Retrieval-Augmented Generation (RAG) chatbot that dynamically fetches content from Wikipedia, processes it, and enables interactive question-answering using a custom prompt pipeline.

Key Features:

Dynamic Knowledge Base: The chatbot loads a Wikipedia article based on the user’s first query and builds a context-aware retrieval system around it.

Text Chunking & Embedding: The article is split into manageable chunks using LangChain’s RecursiveCharacterTextSplitter and embedded using HuggingFaceEmbeddings (all-MiniLM-L6-v2).

Vector Database with Pinecone: Embeddings are stored in a Pinecone serverless index, enabling fast semantic search and retrieval.

Groq LLM Integration: Uses llama3-70b-8192 via Groq for high-speed, cost-efficient inference with retrieved context.

Custom Prompt-based RAG: Combines retrieved text and conversation history using a handcrafted prompt before passing it to the LLM.

Chat Memory Support: Maintains conversational memory across turns using LangChain’s ConversationBufferMemory.

To use this application, you must provide your own API keys for:

🧠 Groq (for LLaMA3-based language generation)
📦 Pinecone (for vector similarity search and retrieval)

In [65]:
!pip install python-dotenv



In [37]:
!pip install langchain langchain-community langchain-pinecone pinecone-client sentence-transformers


In [5]:
!pip install -U langchain wikipedia langchain-pinecone langchain-groq tiktoken streamlit langchain_ollama




In [61]:
!pip install -U langchain-huggingface



## Imports

In [71]:
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_huggingface import HuggingFaceEmbeddings
from dotenv import load_dotenv

In [72]:
load_dotenv()

True

In [73]:
groq_api_key = os.getenv("GROQ_API_KEY")
pinecone_api_key = os.getenv("PINECONE_API_KEY")
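
`os.getenv` returns `None` when a key is missing, and the failure then surfaces much later inside the Pinecone or Groq client. A minimal sketch of an early check is shown below; the helper name `require_env` is hypothetical and not part of the original notebook. It assumes the keys live in a `.env` file that `load_dotenv()` has already read.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and an empty string
        raise RuntimeError(f"Missing environment variable: {name}")
    return value
```

With this in place, the cell above could read `groq_api_key = require_env("GROQ_API_KEY")`, so a missing key is reported at the point where it is loaded.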

## Extracting Articles from Wikipedia

In [74]:
import wikipedia

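def fetch_article(title):
    """Return the full text of a Wikipedia article, or None if the lookup fails."""
    try:
        # .content is the full article body, not only the summary
        return wikipedia.page(title).content
    except wikipedia.exceptions.WikipediaException:
        # Covers missing pages and ambiguous titles
        return None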


In [13]:
transcript = fetch_article("mahatma gandhi")

## Text Chunking & Embedding

In [14]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
doc = Document(page_content=transcript)
chunks = splitter.split_documents([doc])

In [15]:
chunks[100]

Document(metadata={}, page_content='Friends and comrades, the light has gone out of our lives, and there is darkness everywhere, and I do not quite know what to tell you or how to say it. Our beloved leader, Bapu as we called him, the father of the nation, is no more. Perhaps I am wrong to say that; nevertheless, we will not see him again, as we have seen him for these many years, we will not run to him for advice or seek solace from him, and that is a terrible blow, not only for me, but for millions and millions in this country.')
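
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified fixed-window splitter. This is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and sentences, whereas the hypothetical `sliding_window_chunks` below cuts at fixed character offsets. It does show how consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.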

In [63]:
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

In [41]:
vectors = embeddings.embed_query("who is gandhi?")
len(vectors)

384

## Creating a Pinecone vector database

In [75]:
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=pinecone_api_key)

index_name = "db"

# Embeddings are computed locally with all-MiniLM-L6-v2, so the index
# dimension must match that model's 384-dimensional output.
if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

In [76]:
index = pc.Index(index_name)

In [77]:
from langchain_pinecone import PineconeVectorStore
vector_store = PineconeVectorStore(index=index, embedding=embeddings)
# Embed and upsert the chunks; otherwise the index stays empty.
vector_store.add_documents(chunks)

In [78]:
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 4})
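
Conceptually, a similarity search with `k=4` ranks the stored vectors against the query vector and keeps the four best matches. The sketch below illustrates that idea in plain Python, assuming the index uses a cosine metric. The function names are hypothetical, and Pinecone itself relies on an approximate nearest-neighbour index, not a full scan like this one.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.0], docs, k=2))  # [1, 2]
```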

In [79]:
from langchain_groq import ChatGroq

llm = ChatGroq(
    api_key=groq_api_key,
    model_name="llama3-70b-8192"
)

In [64]:
# from langchain.prompts import PromptTemplate
#
# prompt = PromptTemplate(
#     template="""
# You are a helpful assistant.
# Answer ONLY from the provided transcript context.
# If the context is insufficient, just say you don't know.
#
# Context:
# {context}
#
# Question:
# {question}
# """,
#     input_variables=["context", "question"]
# )
#
# question = input("ask questions")
# retrieved_docs = retriever.invoke(question)
# context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)
# final_prompt = prompt.format(context=context_text, question=question)
# response = llm.invoke(final_prompt)
#
# print("Bot:", response.content)


## Creating a memory-based chatbot

In [80]:
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

prompt = PromptTemplate(
    input_variables=["chat_history", "context", "question"],
    template="""
You are a helpful assistant having a conversation with a user.

Chat history:
{chat_history}

Use the following context from the Wikipedia article to answer:
{context}

Question:
{question}

If the context is not relevant, just say "I don't know."
""")

def rag_with_memory(question, retriever, llm, prompt_template, memory):
    retrieved_docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in retrieved_docs)

    chat_history = ""
    for msg in memory.chat_memory.messages:
        if msg.type == "human":
            chat_history += f"Human: {msg.content}\n"
        elif msg.type == "ai":
            chat_history += f"AI: {msg.content}\n"

    final_prompt = prompt_template.format(
        chat_history=chat_history.strip(),
        context=context,
        question=question
    )

    response = llm.invoke(final_prompt)

    memory.chat_memory.add_user_message(question)
    memory.chat_memory.add_ai_message(response.content)

    return response.content
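
`rag_with_memory` rebuilds the full history on every turn, so the prompt grows with the conversation and can eventually exceed the model's context window. One possible mitigation is to keep only the most recent turns. The sketch below works on plain `(role, content)` tuples instead of LangChain message objects; the helper `format_recent_history` and its `max_turns` parameter are hypothetical.

```python
def format_recent_history(turns: list[tuple[str, str]], max_turns: int = 3) -> str:
    """Format the last max_turns question/answer pairs as 'Human: ...' / 'AI: ...' lines."""
    if max_turns <= 0:
        return ""
    labels = {"human": "Human", "ai": "AI"}
    recent = turns[-2 * max_turns:]  # one turn = one human message + one AI message
    return "\n".join(
        f"{labels[role]}: {content}" for role, content in recent if role in labels
    )


history = [("human", "q1"), ("ai", "a1"), ("human", "q2"), ("ai", "a2")]
print(format_recent_history(history, max_turns=1))
# Human: q2
# AI: a2
```

Inside `rag_with_memory`, the tuples could be built as `[(msg.type, msg.content) for msg in memory.chat_memory.messages]`, replacing the manual string concatenation.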


In [81]:
print("🤖 Ask me anything! Type 'exit' to quit.\n")
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break
    answer = rag_with_memory(user_input, retriever, llm, prompt, memory)
    print("Bot:", answer)


🤖 Ask me anything! Type 'exit' to quit.

