RamonRL/llm-rag-api

RAG AI Assistant — Chat with Your Documents

A full-stack AI application that allows users to upload documents (PDF or text) and ask questions about them using a Retrieval-Augmented Generation (RAG) pipeline.


Overview

This project implements a production-style RAG system that combines:

  • Document ingestion (PDF and text)
  • Text chunking and embedding generation
  • Vector similarity search
  • LLM-based question answering

Users can interact through a simple web interface to query their own documents, similar to a private ChatGPT.


Tech Stack

Backend

  • Python
  • FastAPI
  • FAISS (vector database)
  • OpenAI API (embeddings + LLM)
  • Docker

Frontend

  • Streamlit
  • Requests

Features

  • Upload PDF or text documents
  • Intelligent text chunking for improved retrieval
  • Semantic search using vector embeddings (FAISS)
  • LLM-powered question answering (RAG pipeline)
  • Interactive UI with Streamlit
  • Persistent vector storage (FAISS index + documents)
  • Rate limiting for API protection
  • Basic metrics endpoint
  • Logging and error handling
  • Fully dockerized (API + UI)

Running Locally

1. Set environment variables

Create a .env file:

OPENAI_API_KEY=your_api_key_here
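Inside the API, the key can then be read from the environment. A minimal sketch of environment-based configuration (the repo's actual loading code may differ, e.g. it may use python-dotenv):

```python
import os

def get_openai_key() -> str:
    """Read the OpenAI API key from the environment (populated from .env
    by Docker Compose or a dotenv loader)."""
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```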

2. Run with Docker

docker-compose up --build

3. Access the app

Once the containers are running, open the UI and API in your browser at the ports defined in docker-compose.yml (Streamlit typically serves on http://localhost:8501, and FastAPI on http://localhost:8000, with interactive docs at /docs).

API Endpoints

  • Health Check
GET /v1/health

Response:

{
  "status": "healthy"
}

  • Upload Document
POST /v1/upload

Upload a .txt or .pdf file. The document is processed, chunked, embedded, and stored in the vector database.
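For example, uploading a file with the Requests library (the base URL and the multipart field name `file` are assumptions here; check the FastAPI-generated docs for the exact schema):

```python
import requests

API_URL = "http://localhost:8000"  # assumed default; see docker-compose.yml

def upload_document(path: str, base_url: str = API_URL) -> dict:
    """POST a .txt or .pdf file to /v1/upload and return the JSON response."""
    with open(path, "rb") as f:
        resp = requests.post(f"{base_url}/v1/upload", files={"file": f}, timeout=120)
    resp.raise_for_status()
    return resp.json()
```

Usage, once the stack is running: `upload_document("report.pdf")`.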

  • Ask Question
POST /v1/ask

Request:

{
  "question": "What is this document about?"
}

Response:

{
  "answer": "..."
}
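Querying the endpoint from Python looks like this (again assuming the default local port):

```python
import requests

API_URL = "http://localhost:8000"  # assumed default; see docker-compose.yml

def ask(question: str, base_url: str = API_URL) -> str:
    """POST a question to /v1/ask and return the generated answer string."""
    resp = requests.post(f"{base_url}/v1/ask", json={"question": question}, timeout=60)
    resp.raise_for_status()
    return resp.json()["answer"]
```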

  • Metrics
GET /v1/metrics

Returns basic usage statistics.

How It Works

  1. Documents are uploaded and converted into text
  2. Text is split into overlapping chunks
  3. Each chunk is transformed into an embedding
  4. Embeddings are stored in a FAISS vector index
  5. User queries are embedded and matched against stored chunks
  6. Relevant context is retrieved and passed to an LLM
  7. The LLM generates a grounded answer based on the context
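Step 2 (overlapping chunks) can be sketched in a few lines; the chunk size and overlap values below are illustrative assumptions, not the repo's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so a sentence cut at
    one chunk boundary still appears whole in the neighboring chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far the window advances each iteration
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```

Overlap trades a little storage for retrieval quality: without it, an answer spanning a chunk boundary might never be retrieved intact.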

Production-Oriented Features

  • API versioning (/v1/...)
  • Rate limiting to prevent abuse
  • Logging of requests and errors
  • Persistent storage of vector index
  • Separation of services (API + UI)
  • Environment-based configuration
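The rate limiting above can be as simple as a sliding-window counter per client. A minimal in-memory sketch (the repo may instead use a FastAPI middleware or a library such as slowapi):

```python
import time
from collections import defaultdict

class RateLimiter:
    """Sliding-window limiter: allow at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(list)  # client id -> request timestamps

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        # Keep only timestamps that still fall inside the current window.
        hits = [t for t in self._hits[client_id] if now - t < self.window]
        self._hits[client_id] = hits
        if len(hits) >= self.limit:
            return False  # over the limit: caller should return HTTP 429
        hits.append(now)
        return True
```

An in-memory limiter resets on restart and is per-process; a shared store (e.g. Redis) would be needed once the API scales beyond one container.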

Future Improvements

  • Authentication (API keys / JWT)
  • Streaming responses (real-time answers)
  • Advanced UI (chat interface with history)
  • Hybrid search (keyword + semantic)
  • Model switching (local vs API-based)
  • CI/CD pipeline
  • Monitoring (Prometheus + Grafana)

Use Case

This project demonstrates how to build real-world AI applications such as:

  • Document Q&A systems
  • Internal knowledge assistants
  • Customer support copilots
  • AI-powered search engines

Author

Built as part of a Machine Learning / AI Engineering portfolio project.

Motivation

This project showcases the ability to:

  • Design and implement RAG systems
  • Work with LLMs in production-like environments
  • Build full-stack AI applications
  • Deploy scalable, modular systems using modern tools
