A full-stack AI application that allows users to upload documents (PDF or text) and ask questions about them using a Retrieval-Augmented Generation (RAG) pipeline.
This project implements a production-style RAG system that combines:
- Document ingestion (PDF and text)
- Text chunking and embedding generation
- Vector similarity search
- LLM-based question answering
Users can interact through a simple web interface to query their own documents, similar to a private ChatGPT.
- Python
- FastAPI
- FAISS (vector database)
- OpenAI API (embeddings + LLM)
- Docker
- Streamlit
- Requests
- Upload PDF or text documents
- Intelligent text chunking for improved retrieval
- Semantic search using vector embeddings (FAISS)
- LLM-powered question answering (RAG pipeline)
- Interactive UI with Streamlit
- Persistent vector storage (FAISS index + documents)
- Rate limiting for API protection
- Basic metrics endpoint
- Logging and error handling
- Fully dockerized (API + UI)
Create a .env file:
```
OPENAI_API_KEY=your_api_key_here
```

Then build and start the services:

```
docker-compose up --build
```

- API docs: http://localhost:8000/docs
- UI: http://localhost:8501
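A minimal `docker-compose.yml` for this two-service layout might look like the following. The service names, build contexts, and volume path are illustrative assumptions, not taken from the repository:

```yaml
services:
  api:
    build: ./api            # assumed build context for the FastAPI service
    ports:
      - "8000:8000"
    env_file: .env          # supplies OPENAI_API_KEY
    volumes:
      - ./data:/app/data    # assumed mount for the persistent FAISS index
  ui:
    build: ./ui             # assumed build context for the Streamlit app
    ports:
      - "8501:8501"
    depends_on:
      - api
```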
- Health Check
`GET /v1/health`

Response:

```json
{
  "status": "healthy"
}
```

- Upload Document

`POST /v1/upload`

Upload a `.txt` or `.pdf` file. The document is processed, chunked, embedded, and stored in the vector database.
- Ask Question
`POST /v1/ask`

Request:

```json
{
  "question": "What is this document about?"
}
```

Response:

```json
{
  "answer": "..."
}
```

- Metrics

`GET /v1/metrics`

Returns basic usage statistics.
1. Documents are uploaded and converted into text
2. Text is split into overlapping chunks
3. Each chunk is transformed into an embedding
4. Embeddings are stored in a FAISS vector index
5. User queries are embedded and matched against stored chunks
6. Relevant context is retrieved and passed to an LLM
7. The LLM generates a grounded answer based on the context
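The pipeline above can be sketched end to end in a few functions. The embedding here is a toy bag-of-words stand-in so the example runs offline; the real system calls the OpenAI embeddings API and stores the vectors in FAISS:

```python
import math

def chunk_text(text: str, size: int = 6, overlap: int = 2) -> list[str]:
    """Split text into overlapping word chunks (the chunking step)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words embedding; stands in for the OpenAI embeddings API."""
    tokens = text.lower().replace(".", " ").replace("?", " ").split()
    return [float(tokens.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], vocab: list[str], k: int = 1) -> list[str]:
    """Rank chunks by similarity to the query (FAISS does this at scale)."""
    qv = embed(query, vocab)
    return sorted(chunks, key=lambda c: cosine(embed(c, vocab), qv), reverse=True)[:k]

doc = "FAISS stores vectors. Streamlit renders the UI. FastAPI serves requests."
chunks = chunk_text(doc)
vocab = sorted(set(doc.lower().replace(".", " ").split()))
context = retrieve("Which library stores vectors?", chunks, vocab, k=1)
# `context` is what would be prepended to the LLM prompt to ground the answer.
```

In the real pipeline only the similarity backend changes: FAISS indexes the embedding vectors and performs the nearest-neighbor search that `retrieve` does naively here.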
- API versioning (/v1/...)
- Rate limiting to prevent abuse
- Logging of requests and errors
- Persistent storage of vector index
- Separation of services (API + UI)
- Environment-based configuration
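The rate limiting listed above is commonly implemented as a token bucket per client. A stdlib-only sketch of the idea follows; the project's actual limiter and its parameters are not specified here, so treat the numbers as placeholders:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, capacity: int = 5, rate: float = 1.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=3, rate=0.5)
results = [bucket.allow() for _ in range(5)]
# The first 3 calls in the burst pass; the rest are rejected until tokens refill.
```

In a FastAPI app this check would typically run in a dependency or middleware keyed by client IP or API key, returning HTTP 429 when `allow()` is false.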
- Authentication (API keys / JWT)
- Streaming responses (real-time answers)
- Advanced UI (chat interface with history)
- Hybrid search (keyword + semantic)
- Model switching (local vs API-based)
- CI/CD pipeline
- Monitoring (Prometheus + Grafana)
- Frontend: https://llm-rag-api-frontend.onrender.com
- Backend: https://llm-rag-api.onrender.com/docs
This project demonstrates how to build real-world AI applications such as:
- Document Q&A systems
- Internal knowledge assistants
- Customer support copilots
- AI-powered search engines
Built as part of a Machine Learning / AI Engineering portfolio project.
This project showcases the ability to:
- Design and implement RAG systems
- Work with LLMs in production-like environments
- Build full-stack AI applications
- Deploy scalable, modular systems using modern tools