A production-ready Retrieval-Augmented Generation (RAG) system that allows users to upload PDFs and chat with them using AI. Built with 100% free and open-source technologies.
- 📄 PDF Upload: Upload and process multiple PDF documents
- 💬 AI Chat Interface: Ask questions and get intelligent answers
- 🔍 Vector Search: Semantic search using ChromaDB
- 📚 Source Citations: See which documents answers come from
- ⚡ Streaming Responses: Real-time AI responses with typewriter effect
- 🎨 Modern UI: Beautiful glassmorphism design with dark theme
- 🆓 100% Free: Uses free tiers and open-source models
- Frontend: Streamlit
- LLM: Google Gemini 1.5 Flash (Free Tier)
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Vector DB: ChromaDB (Local)
- Framework: LangChain
- PDF Processing: PyPDF2
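To make the vector-search step concrete, here is a minimal sketch of top-k retrieval by cosine similarity. It is not this project's code: a toy word-count embedding stands in for all-MiniLM-L6-v2 and a plain list stands in for ChromaDB, so the example runs with the standard library alone. The names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


chunks = [
    "refunds are issued within 30 days",
    "shipping takes five business days",
    "contact support by email",
]
results = top_k("how long do refunds take", chunks, k=1)
```

A real embedding model would also match paraphrases such as "money back", which word counts cannot; the ranking logic stays the same.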
- Python 3.9 or higher
- Google API Key (free from Google AI Studio)
- Clone or download the project, then change into its directory:
    cd rag-knowledge-base
- Create and activate a virtual environment:
    python -m venv venv
    # On Windows
    venv\Scripts\activate
    # On macOS/Linux
    source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables:
    - Copy .env.example to .env
    - Add your Google API key:
      GOOGLE_API_KEY=your_api_key_here
- Run the application:
    streamlit run app.py
- Open your browser to http://localhost:8501
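A common startup failure is a missing key or a placeholder left in `.env`. The sketch below shows one way to catch that early using only the standard library. How the app actually loads `.env` is not stated here, so `parse_env` and `require_api_key` are hypothetical helpers, not functions from this project.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(env: dict[str, str]) -> str:
    """Return GOOGLE_API_KEY, or raise if it is missing or still the placeholder."""
    key = env.get("GOOGLE_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("Set GOOGLE_API_KEY in .env before running the app")
    return key


sample = "# local settings\nGOOGLE_API_KEY=abc123\n"
api_key = require_api_key(parse_env(sample))
```

In practice the text would come from reading the `.env` file; failing fast here gives a clearer message than an authentication error on the first chat request.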
- Upload PDFs: Click the upload button and select your PDF files
- Process Documents: Click "Process Documents" to analyze your PDFs
- Start Chatting: Ask questions about your documents in the chat interface
- View Sources: Expand source citations to see where answers came from
- Manage Documents: Use the sidebar to view and delete documents
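Behind a "Process Documents" step, RAG systems typically split the extracted text into overlapping chunks before embedding them, so that a sentence cut at a boundary still appears whole in one chunk. The following is a minimal sketch of that idea; the function name `chunk_text` and the default sizes are assumptions, since the project's actual chunking settings are not given here.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


pieces = chunk_text("abcdefghij", size=4, overlap=1)
# pieces == ["abcd", "defg", "ghij"]
```

Larger overlaps improve recall at chunk boundaries but increase the number of embeddings stored.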
rag-knowledge-base/
├── app.py # Main Streamlit application
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
├── .streamlit/
│ └── config.toml # Streamlit configuration
├── src/
│ ├── components/
│ │ ├── chat_interface.py # Chat UI component
│ │ ├── pdf_uploader.py # PDF upload component
│ │ └── sidebar.py # Sidebar component
│ ├── core/
│ │ ├── embeddings.py # Embedding generation
│ │ ├── pdf_processor.py # PDF processing
│ │ ├── rag_pipeline.py # RAG logic
│ │ └── vector_store.py # ChromaDB integration
│ └── utils/
│ ├── config.py # Configuration
│ └── session_state.py # State management
└── assets/
└── styles.py # Custom CSS styles
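The RAG step that sits between retrieval and the LLM call is usually prompt assembly: retrieved chunks are numbered and tagged with their source so the answer can cite them. The sketch below illustrates that pattern under assumptions; `Chunk` and `build_prompt` are hypothetical names, and the contents of rag_pipeline.py may differ.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # file name of the PDF the chunk came from
    page: int


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({c.source}, p. {c.page}) {c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered context and cite sources like [1].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


prompt = build_prompt(
    "What is the refund window?",
    [Chunk("Refunds are issued within 30 days.", "policy.pdf", 3)],
)
```

Keeping the source and page next to each chunk is what lets a UI show expandable citations alongside the answer.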
- Glassmorphism Effects: Modern semi-transparent UI elements
- Smooth Animations: Fade-ins, hover effects, and transitions
- Gradient Accents: Beautiful color gradients throughout
- Dark Theme: Professional dark mode with proper contrast
- Responsive Design: Works on desktop, tablet, and mobile
- Push to GitHub:
    git init
    git add .
    git commit -m "Initial commit"
    git remote add origin your-repo-url
    git push -u origin main
- Deploy on Streamlit Cloud:
    - Go to share.streamlit.io
    - Connect your GitHub repository
    - Select app.py as the main file
    - Add GOOGLE_API_KEY in Secrets Management
    - Click Deploy!
- Share your URL with recruiters!
- Visit Google AI Studio
- Click "Create API Key"
- Copy the key and paste it into your .env file
Free Tier Limits (Gemini 1.5 Flash; Google may change these):
- 15 requests per minute
- 1,500 requests per day
- Perfect for demos and personal projects!
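To stay under a per-minute quota like the one above, a client can track recent request times and pause when the window is full. This is a minimal sliding-window sketch, not part of the project; the class name `RateLimiter` and its injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls: int = 15, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget requests that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self) -> None:
        """Register that a request was just sent."""
        self.calls.append(self.clock())
```

Before each LLM request, call `wait_time()`, sleep for the returned number of seconds if it is nonzero, then call `record()`. The daily cap would need a separate counter that persists across restarts.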
This project demonstrates:
- ✅ Modern AI Engineering: RAG, vector databases, LLM integration
- ✅ Production-Ready Code: Clean architecture, error handling, caching
- ✅ UX Excellence: Streaming responses, source citations, beautiful UI
- ✅ Full-Stack Skills: Frontend (Streamlit), backend (Python), AI/ML
- ✅ Deployment: Cloud-ready with free hosting
Feel free to fork, improve, and submit pull requests!
MIT License - feel free to use this project for your portfolio!
Add screenshots or a demo video here after deployment!
Built with ❤️ for YC Recruiters
Powered by Google Gemini, ChromaDB, and LangChain