A production-grade Retrieval-Augmented Generation (RAG) chatbot built with Python, FastAPI, ChromaDB, and OpenAI. This chatbot allows you to ask questions about your documents and get accurate answers based on the content.
- 🚀 Fast & Modern API: Built with FastAPI for high performance
- 📚 Multi-format Support: Handles PDF, TXT, and Markdown files
- 🧠 Smart Chunking: Intelligent text chunking with overlap for better context
- 🔍 Vector Search: Uses ChromaDB for efficient similarity search
- 💬 GPT-4o-mini: Powered by OpenAI's fast, low-cost gpt-4o-mini model for accurate answers
- 🎨 Beautiful UI: Includes a modern chat interface
- 📝 Production Ready: Comprehensive logging, error handling, and validation
- Python 3.8+
- FastAPI - Web framework
- ChromaDB - Vector database for embeddings
- OpenAI API - Embeddings (text-embedding-3-small) and Chat (GPT-4o-mini)
- PyPDF - PDF processing
- Uvicorn - ASGI server
RAG Chatbot/
├── app.py # FastAPI application with endpoints
├── ingest.py # Document ingestion pipeline
├── rag.py # RAG retrieval and answer generation
├── test_rag.py # Test script
├── requirements.txt # Python dependencies
├── README.md # This file
├── .env # Environment variables (create this)
├── docs/ # Place your documents here
└── chroma_db/ # ChromaDB storage (auto-created)
cd "RAG Chatbot"python -m venv venv
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

pip install -r requirements.txt

Create a `.env` file in the project root:
touch .env

Add your OpenAI API key to the `.env` file:
OPENAI_API_KEY=sk-your-api-key-here
Get your API key: Visit OpenAI Platform
Place your documents in the docs/ folder:
# Create docs folder if it doesn't exist
mkdir -p docs
# Add your documents (PDF, TXT, or MD files)
cp /path/to/your/document.pdf docs/

Run the ingestion pipeline to process documents and store them in ChromaDB:
python ingest.py

This will:
- Read all documents from the `docs/` folder
- Split them into chunks (500-1000 characters)
- Create embeddings using OpenAI
- Store everything in ChromaDB
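In outline, the overlap-based splitting works like this (a minimal sketch using the `CHUNK_SIZE` and `CHUNK_OVERLAP` defaults from the configuration section; the actual splitter in `ingest.py` may differ, e.g. by respecting sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 750, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks, each sharing `overlap` characters
    with the previous chunk so context is not lost at chunk boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping the overlap
    return chunks

# A 2000-character document yields 4 overlapping chunks with these defaults
print(len(chunk_text("x" * 2000)))  # → 4
```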
Expected output:
2024-01-01 12:00:00 - __main__ - INFO - Starting document ingestion pipeline...
2024-01-01 12:00:00 - __main__ - INFO - Processing file: document.pdf
2024-01-01 12:00:00 - __main__ - INFO - Created 15 chunks from document.pdf
2024-01-01 12:00:00 - __main__ - INFO - Total chunks created: 15
2024-01-01 12:00:00 - __main__ - INFO - Creating embeddings...
2024-01-01 12:00:01 - __main__ - INFO - Successfully ingested 15 chunks into ChromaDB
2024-01-01 12:00:01 - __main__ - INFO - Ingestion pipeline completed successfully!
Start the FastAPI server:
uvicorn app:app --reload

The server will start at http://localhost:8000
Open your browser and go to:
http://localhost:8000
You'll see a beautiful chat interface where you can ask questions!
Send POST requests to the /ask endpoint:
curl -X POST "http://localhost:8000/ask" \
-H "Content-Type: application/json" \
-d '{"question": "What is this document about?"}'Response format:
{
"answer": "Based on the provided documents...",
"context": [
{
"text": "Relevant chunk of text...",
"source": "document.pdf",
"chunk_index": "0"
}
]
}

Run the test script to see the RAG system in action:
python test_rag.py

Once the server is running, visit:
- Interactive API docs: http://localhost:8000/docs
- Alternative docs: http://localhost:8000/redoc
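If you'd rather call the API from Python than curl, a small stdlib-only client might look like this (the `ask` helper is illustrative, not part of the project):

```python
import json
import urllib.request

def ask(question: str, base_url: str = "http://localhost:8000") -> dict:
    """POST a question to the /ask endpoint and return the parsed JSON reply."""
    payload = json.dumps({"question": question}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server running:
# result = ask("What is this document about?")
# print(result["answer"])
```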
Returns the chat UI (HTML page)
Health check endpoint
Response:
{
"status": "healthy",
"service": "RAG Chatbot API",
"version": "1.0.0"
}

Ask a question to the chatbot.
Request body:
{
"question": "Your question here"
}

Response:
{
"answer": "The answer based on your documents",
"context": [
{
"text": "Relevant text chunk",
"source": "filename.pdf",
"chunk_index": "0"
}
]
}

You can modify these settings in the respective files:
DOCS_FOLDER = "./docs" # Where to read documents from
CHROMA_DB_PATH = "./chroma_db" # Where to store the database
CHUNK_SIZE = 750 # Characters per chunk (500-1000)
CHUNK_OVERLAP = 100 # Overlap between chunks
EMBEDDING_MODEL = "text-embedding-3-small" # OpenAI embedding model
TOP_K = 5 # Number of chunks to retrieve
CHAT_MODEL = "gpt-4o-mini" # OpenAI chat model

Documents → Read & Parse → Chunk Text → Create Embeddings → Store in ChromaDB
- Reads PDF, TXT, and MD files
- Splits documents into overlapping chunks for better context
- Uses OpenAI's `text-embedding-3-small` to create vector embeddings
- Stores chunks with metadata in ChromaDB
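The embed-and-store step can be sketched roughly as follows. This is a simplified outline, not the exact code in `ingest.py`: the collection name `"documents"` and the chunk dictionary fields are assumptions, and it requires the `openai` and `chromadb` packages plus `OPENAI_API_KEY` in the environment.

```python
def embed_and_store(chunks: list[dict], db_path: str = "./chroma_db") -> None:
    """Embed text chunks with OpenAI and persist them in a ChromaDB collection.

    Each chunk dict is assumed to carry "text", "source", and "chunk_index" keys.
    """
    from openai import OpenAI  # deferred imports: pip install openai chromadb
    import chromadb

    # One batched embeddings call for all chunk texts
    client = OpenAI()
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=[c["text"] for c in chunks],
    )
    embeddings = [item.embedding for item in response.data]

    # Persist chunks, embeddings, and metadata to the on-disk ChromaDB store
    db = chromadb.PersistentClient(path=db_path)
    collection = db.get_or_create_collection("documents")
    collection.add(
        ids=[f'{c["source"]}-{c["chunk_index"]}' for c in chunks],
        documents=[c["text"] for c in chunks],
        embeddings=embeddings,
        metadatas=[{"source": c["source"], "chunk_index": c["chunk_index"]}
                   for c in chunks],
    )
```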
Question → Create Embedding → Search ChromaDB → Retrieve Top-K → Generate Answer
- Converts question to embedding
- Finds most similar chunks using vector search
- Passes relevant chunks to GPT-4o-mini
- Generates answer using only the provided context
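Under the hood, the vector search amounts to ranking chunk embeddings by similarity to the query embedding. A toy illustration using cosine similarity (ChromaDB's actual index is far more efficient, but the ranking idea is the same):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_emb: list[float],
                   chunk_embs: list[list[float]],
                   k: int = 5) -> list[int]:
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_embs)),
                    key=lambda i: cosine_similarity(query_emb, chunk_embs[i]),
                    reverse=True)
    return ranked[:k]

# The chunk pointing the same way as the query ranks first
print(retrieve_top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [1.0, 0.0]], k=2))  # → [2, 1]
```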
The chatbot is instructed to:
- Use ONLY the provided context to answer questions
- Say "I don't know" if information isn't in the documents
- Be concise and accurate
- Not make up information
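A system prompt enforcing these rules might look like the following sketch (illustrative; not necessarily the exact prompt used in `rag.py`):

```python
SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer using ONLY the context provided below. "
    "If the answer is not in the context, say \"I don't know\". "
    "Be concise and accurate, and never invent information."
)

def build_messages(question: str, chunks: list[str]) -> list[dict]:
    """Assemble the chat messages: retrieved chunks go into the system
    message as context, the user's question goes in as-is."""
    context = "\n\n".join(chunks)
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nContext:\n{context}"},
        {"role": "user", "content": question},
    ]
```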
- Make sure you created a `.env` file in the project root
- Add your API key: `OPENAI_API_KEY=sk-...`
- Restart the server after adding the key
- Run the ingestion pipeline first: `python ingest.py`
- Make sure documents are in the `docs/` folder
- Check that your documents are in the `docs/` folder
- Supported formats: `.pdf`, `.txt`, `.md`
- Check file permissions
# Use a different port
uvicorn app:app --reload --port 8001

To update the knowledge base with new documents:
- Add new files to the `docs/` folder
- Run the ingestion pipeline again: `python ingest.py`
This will delete the old ChromaDB collection and create a new one with all documents.
python test_rag.py

The application uses Python logging. Check the console output for detailed logs.
To support additional file types, modify the `read_document()` function in `ingest.py`:
def read_document(file_path: str) -> str:
extension = Path(file_path).suffix.lower()
if extension == '.pdf':
return read_pdf(file_path)
elif extension in ['.txt', '.md']:
return read_text_file(file_path)
elif extension == '.docx': # Add your custom handler
return read_docx(file_path)
# ...
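For example, `.docx` support could be added with a handler like this one (assumes the third-party python-docx package; `read_docx` is a hypothetical helper, not part of the project):

```python
def read_docx(file_path: str) -> str:
    """Extract plain text from a .docx file, one paragraph per line."""
    from docx import Document  # deferred import: pip install python-docx
    doc = Document(file_path)
    return "\n".join(paragraph.text for paragraph in doc.paragraphs)
```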
- Chunk Size: Adjust `CHUNK_SIZE` based on your documents
  - Smaller chunks (500): More precise, but may miss context
  - Larger chunks (1000): More context, but less precise
- Top-K: Adjust `TOP_K` for retrieval
  - More chunks: Better context, but more tokens/cost
  - Fewer chunks: Faster, cheaper, but may miss information
- Embeddings: The `text-embedding-3-small` model is cost-effective
  - For better quality: Use `text-embedding-3-large`
- Chat Model: `gpt-4o-mini` is fast and affordable
  - For better reasoning: Use `gpt-4o`
Approximate costs (as of November 2024):
- Ingestion (one-time per document):
  - Embeddings: $0.020/1M tokens ($0.00002/1K tokens)
  - Example: 100 pages (~75K tokens) ≈ $0.0015
- Each Question:
  - Query embedding: negligible (≈ $0.00001)
  - GPT-4o-mini: $0.150/1M input tokens, $0.600/1M output tokens
  - Example: ≈ $0.001 per question
This project is open source and available for personal and commercial use.
For issues or questions:
- Check the troubleshooting section
- Review the logs for error messages
- Ensure all dependencies are installed
- Verify your OpenAI API key is valid
- Add authentication for production use
- Implement caching for faster responses
- Add support for more document types
- Deploy to cloud (AWS, GCP, Azure)
- Add conversation history
- Implement streaming responses
Built with ❤️ using Python, FastAPI, ChromaDB, and OpenAI