# 🤖 Q&A Chatbot with LangChain

Build a Q&A chatbot that answers questions from your own documents, using LangChain for retrieval and an OpenAI or Google Gemini model to generate the answers.

**Features:**
- 📄 Process PDF, TXT, DOCX files
- 🔍 Vector-based semantic search
- 💬 Context-aware answers
- 🎯 Support for OpenAI and Google Gemini models
- 📊 Source citations

## 1. Installation

Install all required libraries:

In [1]:
%pip install langchain langchain-community langchain-openai langchain-google-genai -q
%pip install chromadb sentence-transformers -q
%pip install pypdf docx2txt tiktoken -q

print("✅ All packages installed successfully!")

Note: you may need to restart the kernel to use updated packages.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-generativeai 0.8.5 requires google-ai-generativelanguage==0.6.15, but you have google-ai-generativelanguage 0.9.0 which is incompatible.

✅ All packages installed successfully!





## 2. Import Libraries

In [1]:
import os
from getpass import getpass
import warnings
warnings.filterwarnings('ignore')

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader, PyPDFLoader, Docx2txtLoader
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


## 3. API Key Configuration

Choose your preferred LLM provider and enter your API key:

**Get API Keys:**
- OpenAI: https://platform.openai.com/api-keys
- Google Gemini: https://aistudio.google.com/app/apikey (Free tier available!)

In [4]:
PROVIDER = 'gemini'  # set to 'openai' or 'gemini'

if PROVIDER == 'openai':
    api_key = getpass("Enter your OpenAI API key: ")
    os.environ['OPENAI_API_KEY'] = api_key
    print("✅ OpenAI API key configured")
elif PROVIDER == 'gemini':
    api_key = getpass("Enter your Google Gemini API key: ")
    os.environ['GOOGLE_API_KEY'] = api_key
    print("✅ Gemini API key configured")

✅ Gemini API key configured
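
If the key is already exported in your shell, prompting for it again is unnecessary. The following is a minimal sketch, assuming a hypothetical helper named `resolve_api_key` (it is not part of LangChain) that checks the environment first and only falls back to a prompt. The environment variable names are the ones used in the cell above.

```python
import os
from getpass import getpass

# Environment variable names taken from the configuration cell above.
ENV_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GOOGLE_API_KEY"}


def resolve_api_key(provider, env=None, prompt=getpass):
    """Return the API key for `provider`, prompting only if it is not set yet.

    Hypothetical helper for illustration. `env` and `prompt` are injectable
    so the function can be exercised without a terminal.
    """
    env = os.environ if env is None else env
    if provider not in ENV_VARS:
        raise ValueError(f"Unsupported provider: {provider!r}")

    var_name = ENV_VARS[provider]
    key = env.get(var_name, "").strip()
    if not key:
        # Nothing in the environment: ask once and store the answer.
        key = prompt(f"Enter your {provider} API key: ").strip()
        if not key:
            raise ValueError("No API key provided")
        env[var_name] = key
    return key
```

In the notebook this could be called as `resolve_api_key(PROVIDER)`, which leaves the key in `os.environ` for the LLM clients to pick up.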


## 4. Document Loader Class

Create a flexible document loader that supports multiple file formats:

In [5]:
class DocumentProcessor:
    def __init__(self):
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len
        )
    
    def load_document(self, file_path):
        file_extension = file_path.lower().split('.')[-1]
        print(f"📄 Loading {file_extension.upper()} file...")
        
        if file_extension == 'pdf':
            loader = PyPDFLoader(file_path)
        elif file_extension == 'txt':
            loader = TextLoader(file_path)
        elif file_extension == 'docx':
            # Docx2txtLoader reads only the zip-based .docx format, not legacy .doc files.
            loader = Docx2txtLoader(file_path)
        else:
            raise ValueError(f"Unsupported file format: {file_extension}")
        
        documents = loader.load()
        print(f"✅ Loaded {len(documents)} document(s)")
        return documents
    
    def load_from_text(self, text):
        from langchain_core.documents import Document
        return [Document(page_content=text, metadata={"source": "text_input"})]
    
    def split_documents(self, documents):
        chunks = self.text_splitter.split_documents(documents)
        print(f"✅ Split into {len(chunks)} chunks")
        return chunks

doc_processor = DocumentProcessor()
print("✅ Document processor initialized")

✅ Document processor initialized
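
The extension check in `load_document` can also be written as a lookup table, which keeps the list of supported formats in one place. This is a sketch only: `LOADER_NAMES` and `pick_loader_name` are hypothetical names, and the table stores loader names as strings so the example runs without LangChain installed. Legacy `.doc` is left out because `Docx2txtLoader` targets the `.docx` format.

```python
from pathlib import Path

# Extension -> loader class name, mirroring the branches in load_document.
LOADER_NAMES = {
    ".pdf": "PyPDFLoader",
    ".txt": "TextLoader",
    ".docx": "Docx2txtLoader",
}


def pick_loader_name(file_path):
    """Return the loader name for a path, based on its final extension."""
    # Path.suffix handles names with several dots, e.g. "Q3.summary.PDF".
    suffix = Path(file_path).suffix.lower()
    if suffix not in LOADER_NAMES:
        raise ValueError(f"Unsupported file format: {suffix or '(no extension)'}")
    return LOADER_NAMES[suffix]


print(pick_loader_name("reports/Q3.summary.PDF"))
```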


## 5. Create Sample Document

In [6]:
sample_text = """Python Programming Language

Python is a high-level, interpreted programming language known for its simplicity and readability. 
Created by Guido van Rossum and first released in 1991, Python has become one of the most popular 
programming languages in the world.

Key Features:
1. Easy to Learn: Python's syntax is clear and intuitive, making it an excellent choice for beginners.
2. Versatile: Python can be used for web development, data science, machine learning, automation, and more.
3. Large Community: Python has a vast ecosystem of libraries and frameworks.
4. Object-Oriented: Python supports multiple programming paradigms.

Popular Use Cases:
- Data Science and Machine Learning
- Web Development with Django and Flask
- Automation and Scripting
- Scientific Computing

Python 3 is the current and recommended version."""

with open('sample_python_doc.txt', 'w', encoding='utf-8') as f:
    f.write(sample_text)

print("✅ Sample document created: sample_python_doc.txt")

✅ Sample document created: sample_python_doc.txt


## 6. Load and Process Document

In [7]:
document_path = 'sample_python_doc.txt'
documents = doc_processor.load_document(document_path)
chunks = doc_processor.split_documents(documents)

print("\n📝 First chunk preview:")
print("=" * 70)
print(chunks[0].page_content[:300] + "...")

📄 Loading TXT file...
✅ Loaded 1 document(s)
✅ Split into 1 chunks

📝 First chunk preview:
Python Programming Language

Python is a high-level, interpreted programming language known for its simplicity and readability. 
Created by Guido van Rossum and first released in 1991, Python has become one of the most popular 
programming languages in the world.

Key Features:
1. Easy to Learn: Pyt...
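
The split step reports a single chunk, which is what you would expect when the whole sample text fits within `chunk_size=1000` characters. To see how `chunk_size` and `chunk_overlap` interact on longer input, here is a simplified fixed-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks), and `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


long_text = "x" * 2500
print([len(c) for c in sliding_window_chunks(long_text)])  # → [1000, 1000, 900]
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still seen whole by at least one chunk.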


## 7. Create Vector Store

In [8]:
print("🔄 Creating embeddings...")

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

print(f"✅ Vector store created with {len(chunks)} chunks")

🔄 Creating embeddings...
✅ Vector store created with 1 chunks
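
Conceptually, the vector store maps each chunk to an embedding vector and answers a query by returning the chunks whose vectors lie closest to the query vector. The sketch below illustrates that ranking with cosine similarity over made-up 3-dimensional vectors. It is a toy model under stated assumptions: the vectors are invented, `top_k` is a hypothetical helper, and Chroma's actual index structure and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=3):
    """Return the k best (score, text) pairs from a list of (text, vector) entries."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Invented vectors for illustration; a real embedding model produces many more dimensions.
index = [
    ("Python is a programming language", [0.9, 0.1, 0.0]),
    ("Flask is a web framework", [0.6, 0.7, 0.1]),
    ("Bananas are yellow", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.2, 0.0]

for score, text in top_k(query, index, k=2):
    print(f"{score:.3f}  {text}")
```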


## 8. Initialize LLM

In [12]:
if PROVIDER == 'openai':
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
    print("✅ OpenAI LLM initialized")
elif PROVIDER == 'gemini':
    llm = ChatGoogleGenerativeAI(model="gemini-flash-latest", temperature=0)
    print("✅ Gemini LLM initialized")

✅ Gemini LLM initialized


## 9. Create Q&A Chain

In [13]:
prompt_template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know.

Context: {context}

Question: {question}

Answer:"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)

print("✅ Q&A chain created successfully!")

✅ Q&A chain created successfully!
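
With `chain_type="stuff"`, the retrieved chunks are placed together into the `{context}` slot of the prompt before the LLM is called. The sketch below shows roughly what that assembly looks like using plain string formatting. `build_stuffed_prompt` is a hypothetical helper, and the blank-line separator between chunks is an assumption rather than a documented guarantee.

```python
PROMPT_TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know.

Context: {context}

Question: {question}

Answer:"""


def build_stuffed_prompt(chunk_texts, question, separator="\n\n"):
    """Join the retrieved chunk texts and fill the prompt template."""
    context = separator.join(chunk_texts)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt_text = build_stuffed_prompt(
    ["Python was first released in 1991.", "Python 3 is the recommended version."],
    "When was Python first released?",
)
print(prompt_text)
```

Because every retrieved chunk lands in a single prompt, `k` and `chunk_size` together determine how much text the model receives per question.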


## 10. Ask Questions

In [14]:
def ask_question(question):
    print(f"\n❓ Question: {question}")
    print("=" * 70)
    result = qa_chain.invoke({"query": question})
    print("\n💡 Answer:")
    print(result['result'])
    return result

ask_question("What is Python?")


❓ Question: What is Python?

💡 Answer:
Python is a high-level, interpreted programming language known for its simplicity and readability. It was created by Guido van Rossum and first released in 1991.


{'query': 'What is Python?',
 'result': 'Python is a high-level, interpreted programming language known for its simplicity and readability. It was created by Guido van Rossum and first released in 1991.',
 'source_documents': [Document(metadata={'source': 'sample_python_doc.txt'}, page_content="Python Programming Language\n\nPython is a high-level, interpreted programming language known for its simplicity and readability. \nCreated by Guido van Rossum and first released in 1991, Python has become one of the most popular \nprogramming languages in the world.\n\nKey Features:\n1. Easy to Learn: Python's syntax is clear and intuitive, making it an excellent choice for beginners.\n2. Versatile: Python can be used for web development, data science, machine learning, automation, and more.\n3. Large Community: Python has a vast ecosystem of libraries and frameworks.\n4. Object-Oriented: Python supports multiple programming paradigms.\n\nPopular Use Cases:\n- Data Science and Machine Learning\

In [15]:
ask_question("What are the key features of Python?")


❓ Question: What are the key features of Python?

💡 Answer:
The key features of Python are:

1.  **Easy to Learn:** Python's syntax is clear and intuitive, making it an excellent choice for beginners.
2.  **Versatile:** Python can be used for web development, data science, machine learning, automation, and more.
3.  **Large Community:** Python has a vast ecosystem of libraries and frameworks.
4.  **Object-Oriented:** Python supports multiple programming paradigms.


{'query': 'What are the key features of Python?',
 'result': "The key features of Python are:\n\n1.  **Easy to Learn:** Python's syntax is clear and intuitive, making it an excellent choice for beginners.\n2.  **Versatile:** Python can be used for web development, data science, machine learning, automation, and more.\n3.  **Large Community:** Python has a vast ecosystem of libraries and frameworks.\n4.  **Object-Oriented:** Python supports multiple programming paradigms.",
 'source_documents': [Document(metadata={'source': 'sample_python_doc.txt'}, page_content="Python Programming Language\n\nPython is a high-level, interpreted programming language known for its simplicity and readability. \nCreated by Guido van Rossum and first released in 1991, Python has become one of the most popular \nprogramming languages in the world.\n\nKey Features:\n1. Easy to Learn: Python's syntax is clear and intuitive, making it an excellent choice for beginners.\n2. Versatile: Python can be used for web 

In [16]:
ask_question("What is Python used for?")


❓ Question: What is Python used for?

💡 Answer:
Python is used for:
*   Web development (with Django and Flask)
*   Data science and machine learning
*   Automation and scripting
*   Scientific computing


{'query': 'What is Python used for?',
 'result': 'Python is used for:\n*   Web development (with Django and Flask)\n*   Data science and machine learning\n*   Automation and scripting\n*   Scientific computing',
 'source_documents': [Document(metadata={'source': 'sample_python_doc.txt'}, page_content="Python Programming Language\n\nPython is a high-level, interpreted programming language known for its simplicity and readability. \nCreated by Guido van Rossum and first released in 1991, Python has become one of the most popular \nprogramming languages in the world.\n\nKey Features:\n1. Easy to Learn: Python's syntax is clear and intuitive, making it an excellent choice for beginners.\n2. Versatile: Python can be used for web development, data science, machine learning, automation, and more.\n3. Large Community: Python has a vast ecosystem of libraries and frameworks.\n4. Object-Oriented: Python supports multiple programming paradigms.\n\nPopular Use Cases:\n- Data Science and Machine Le
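
The result dictionary also carries `source_documents`, which can be turned into the source citations listed under Features instead of being dumped as a raw dictionary. A minimal sketch, assuming each source object exposes `metadata` and `page_content` as in the output above. `FakeDocument` is a stand-in so the example runs without LangChain, and `format_sources` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in with the two attributes used below; not the LangChain class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_sources(result, preview_chars=80):
    """Build one citation line per distinct source chunk in a result dict."""
    lines = []
    seen = set()
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        text = " ".join(doc.page_content.split())  # collapse newlines and runs of spaces
        preview = text[:preview_chars]
        if (source, preview) in seen:
            continue  # skip repeats, e.g. the same chunk indexed more than once
        seen.add((source, preview))
        suffix = "..." if len(text) > preview_chars else ""
        lines.append(f"[{len(lines) + 1}] {source}: {preview}{suffix}")
    return lines


demo_result = {
    "result": "Python is a programming language.",
    "source_documents": [
        FakeDocument("Python Programming Language\n\nPython is a high-level language.",
                     {"source": "sample_python_doc.txt"}),
    ],
}
for line in format_sources(demo_result):
    print(line)
```

Inside `ask_question`, the same function could be applied to `result` right after the answer is printed.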

## 11. Complete Chatbot Class

In [17]:
class DocumentQAChatbot:
    def __init__(self, provider='gemini'):
        self.provider = provider
        self.doc_processor = DocumentProcessor()
        self.vectorstore = None
        self.qa_chain = None
        self.llm = None
    
    def load_document(self, file_path):
        documents = self.doc_processor.load_document(file_path)
        chunks = self.doc_processor.split_documents(documents)
        embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
        self.vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings)
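        print("✅ Document loaded and indexed!")
    
    def initialize_llm(self):
        # The retriever needs an indexed document, so load_document() must run first.
        if self.vectorstore is None:
            raise RuntimeError("Call load_document() before initialize_llm().")
        
        if self.provider == 'openai':
            self.llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
        elif self.provider == 'gemini':
            # Same model name as in section 8.
            self.llm = ChatGoogleGenerativeAI(model="gemini-flash-latest", temperature=0)
        else:
            raise ValueError(f"Unsupported provider: {self.provider}")
        
        prompt = PromptTemplate(
            template="Context: {context}\nQuestion: {question}\nAnswer:",
            input_variables=["context", "question"]
        )
        
        self.qa_chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            retriever=self.vectorstore.as_retriever(),
            return_source_documents=True,
            chain_type_kwargs={"prompt": prompt}
        )
        print("✅ Chatbot ready!")
    
    def ask(self, question):
        # Fail with a clear message if the chain has not been built yet.
        if self.qa_chain is None:
            raise RuntimeError("Call initialize_llm() before ask().")
        result = self.qa_chain.invoke({"query": question})
        return result['result']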

print("✅ Chatbot class defined")

✅ Chatbot class defined
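
The class exposes a single `ask(question)` method, so a conversation is just a loop over questions. The sketch below keeps that loop separate from the chatbot so it can be tried with a stub. `chat_loop` and `echo_ask` are hypothetical names, and the stop words are an arbitrary choice.

```python
def chat_loop(ask, questions, stop_words=("quit", "exit")):
    """Send each question to `ask` until a stop word appears.

    Returns the list of (question, answer) pairs collected so far.
    """
    history = []
    for raw in questions:
        question = raw.strip()
        if not question:
            continue  # ignore empty input
        if question.lower() in stop_words:
            break
        history.append((question, ask(question)))
    return history


def echo_ask(question):
    """Stub that stands in for the chatbot's ask() method."""
    return f"(stub answer to: {question})"


for q, a in chat_loop(echo_ask, ["What is Python?", "", "quit", "ignored"]):
    print(f"Q: {q}\nA: {a}")
```

With the class above, after calling `load_document()` and `initialize_llm()` on an instance, its `ask` method can be passed in place of `echo_ask`.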


## Summary

You have built a Q&A chatbot with:
- Document processing (PDF, TXT, DOCX)
- Vector-based semantic search
- Google Gemini or OpenAI integration

It is ready to use with your own documents.