An asynchronous Retrieval-Augmented Generation (RAG) API built with FastAPI, Redis Queue, and LangChain. This system processes PDF documents and answers questions based on their content using OpenAI's GPT models and vector similarity search.
- Asynchronous Processing: Jobs are queued and processed in the background, keeping the API responsive
- PDF Document Processing: Automatically chunks and indexes PDF documents
- Vector Search: Uses Qdrant for efficient similarity search
- Job Tracking: Monitor job status and retrieve results when ready
- Scalable Architecture: Separate API and worker processes for better resource management
- FastAPI: Web framework for the API
- Redis Queue (RQ): Background job processing, backed by Valkey (a Redis-compatible server)
- LangChain: Document processing and RAG orchestration
- Qdrant: Vector database for similarity search
- OpenAI: Embeddings and chat completions
- PyPDF: PDF document loading
- Python 3.8+
- Redis/Valkey server
- Docker (for Qdrant)
- OpenAI API key
- Clone the repository:
git clone https://github.com/10Vaibhav/Async-RAG-API.git
cd Async-RAG-API
- Create a virtual environment:
python -m venv .venv
.venv\Scripts\activate # Windows
source .venv/bin/activate # macOS/Linux
- Install dependencies:
pip install -r requirements.txt
- Create a .env file:
OPENAI_API_KEY=your_openai_api_key_here
- Start Qdrant and Valkey using Docker:
docker-compose up -d
- Verify Valkey is running:
python -c "from redis import Redis; r = Redis(host='localhost', port=6379); print(r.ping())"
This should print True if Valkey is running correctly.
First, place your PDF file in the project directory and update the path in index.py:
python index.py
This will:
- Load the PDF
- Split it into chunks
- Create embeddings
- Store vectors in Qdrant
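To make the "split it into chunks" step concrete, here is a minimal pure-Python sketch of overlapping fixed-size chunking. It is not the code in index.py, which is not shown here and most likely relies on a LangChain text splitter; the function name `split_into_chunks` and the default sizes are assumptions chosen for illustration only.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval quality at the cost of storing some text twice.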
The worker processes background jobs:
python run_worker.py
In a separate terminal:
python main.py
The API will be available at http://127.0.0.1:8000
Submit a query:
POST http://127.0.0.1:8000/chat?query=What is Node.js?
Response:
{
"status": "queued",
"job_id": "e3bc9337-5026-4e07-9ddf-cdf3ffa55d09"
}
Check job status:
GET http://127.0.0.1:8000/job-status?job_id=e3bc9337-5026-4e07-9ddf-cdf3ffa55d09
Response:
{
"result": "Node.js is a JavaScript runtime built on Chrome's V8 engine..."
}
Health check endpoint.
Response:
{
"status": "Server is up and running!"
}
Submit a query for processing (POST /chat).
Parameters:
query (string): The question to ask about the PDF content
Response:
{
"status": "queued",
"job_id": "string"
}
Retrieve the result of a processed job (GET /job-status).
Parameters:
job_id (string): The job ID returned from /chat
Response:
{
"result": "string or null"
}
- Document Indexing: PDFs are loaded, split into chunks, converted to embeddings, and stored in Qdrant
- Query Submission: The user submits a question via the /chat endpoint
- Job Queuing: The query is added to Redis Queue and a job ID is returned immediately
- Background Processing: A worker picks up the job and:
  - Searches for relevant chunks in Qdrant
  - Builds context from retrieved chunks
  - Sends context + query to OpenAI
  - Stores the result
- Result Retrieval: The user polls /job-status with the job ID to get the answer
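Seen from the client side, the submit-then-poll flow above can be sketched with the standard library alone. The endpoint paths and the `job_id` / `result` fields come from the examples in this README; the helper names (`build_url`, `http_json`, `wait_for_result`, `ask`), the polling interval, and the attempt limit are assumptions made for this sketch, and it treats a null `result` as "not ready yet".

```python
import json
import time
from typing import Callable, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"


def build_url(path: str, **params: str) -> str:
    """Build a URL with properly encoded query parameters."""
    return f"{BASE_URL}{path}?{urlencode(params)}"


def http_json(url: str, method: str = "GET") -> dict:
    """Send a request and decode the JSON response body."""
    request = Request(url, method=method)
    with urlopen(request, timeout=10) as response:
        return json.load(response)


def wait_for_result(
    job_id: str,
    fetch: Callable[[str], dict] = http_json,
    interval: float = 1.0,
    max_attempts: int = 30,
) -> Optional[str]:
    """Poll /job-status until a non-null result arrives or attempts run out."""
    url = build_url("/job-status", job_id=job_id)
    for _ in range(max_attempts):
        result = fetch(url).get("result")
        if result is not None:
            return result
        time.sleep(interval)
    return None  # the job did not finish within the polling budget


def ask(query: str) -> Optional[str]:
    """Submit a query to /chat, then poll for its answer."""
    job = http_json(build_url("/chat", query=query), method="POST")
    return wait_for_result(job["job_id"])
```

With the API and worker running, `ask("What is Node.js?")` would submit the question and block until an answer is available or the attempt limit is reached. Encoding the query with `urlencode` matters because the raw question contains spaces and a `?`.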
Qdrant:
- URL: http://localhost:6333
- Collection: pdf_rag
Valkey/Redis:
- Host: localhost
- Port: 6379
OpenAI:
- Embedding Model: text-embedding-3-large
- Chat Model: gpt-4
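If you want to change these values without editing the source, one option is to read them from environment variables and fall back to the defaults listed above. The sketch below is an assumption about how that could look, not how the project currently loads its configuration; the `Settings` class and every environment variable name in it (`QDRANT_URL`, `REDIS_PORT`, and so on) are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the configuration values listed in this README.
    qdrant_url: str = "http://localhost:6333"
    qdrant_collection: str = "pdf_rag"
    redis_host: str = "localhost"
    redis_port: int = 6379
    embedding_model: str = "text-embedding-3-large"
    chat_model: str = "gpt-4"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (hypothetical names)."""
    defaults = Settings()
    return Settings(
        qdrant_url=env.get("QDRANT_URL", defaults.qdrant_url),
        qdrant_collection=env.get("QDRANT_COLLECTION", defaults.qdrant_collection),
        redis_host=env.get("REDIS_HOST", defaults.redis_host),
        redis_port=int(env.get("REDIS_PORT", defaults.redis_port)),
        embedding_model=env.get("EMBEDDING_MODEL", defaults.embedding_model),
        chat_model=env.get("CHAT_MODEL", defaults.chat_model),
    )
```

Passing the environment as a parameter keeps the function easy to test: `load_settings({})` yields the defaults, and `load_settings({"REDIS_PORT": "6380"})` overrides only the port.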
This project uses SimpleWorker from RQ, which runs jobs in the worker process itself and therefore works on Windows (standard RQ workers call os.fork(), which is not available on Windows).
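The platform difference can be checked directly from Python. The following is a hedged sketch of how a worker script might pick a worker class based on whether `os.fork` exists; it is not the contents of run_worker.py, which always uses SimpleWorker. The function names and the `"default"` queue name are assumptions, and `redis` and `rq` are imported lazily so the detection helper works without them installed.

```python
import os
from typing import Sequence


def fork_available() -> bool:
    """Return True on platforms that provide os.fork (not Windows)."""
    return hasattr(os, "fork")


def worker_class_name() -> str:
    """Name of the RQ worker class that suits the current platform."""
    return "Worker" if fork_available() else "SimpleWorker"


def run_worker(
    queue_names: Sequence[str] = ("default",),  # assumed queue name
    host: str = "localhost",
    port: int = 6379,
) -> None:
    """Start a worker that blocks and processes jobs from the given queues."""
    # Imported here so the helpers above do not require redis/rq.
    from redis import Redis
    from rq import Queue, SimpleWorker, Worker

    connection = Redis(host=host, port=port)
    queues = [Queue(name, connection=connection) for name in queue_names]
    worker_cls = Worker if fork_available() else SimpleWorker
    worker_cls(queues, connection=connection).work()
```

Using SimpleWorker everywhere, as this project does, is the simpler choice. The trade-off is that a job that crashes the interpreter takes the worker down with it, because there is no forked child process to isolate it.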
Worker fails with "module 'os' has no attribute 'fork'":
- Make sure you're using run_worker.py, which uses SimpleWorker
Qdrant connection errors:
- Ensure Docker is running and the Qdrant container is up:
docker-compose up -d
OpenAI API errors:
- Verify your API key in .env
- Check that your OpenAI account has credits
Valkey/Redis connection errors:
- Ensure the Valkey server is running on localhost:6379
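When chasing connection errors, it can help to confirm that both services are listening before looking at application code. This is a small standard-library sketch that only tests whether a TCP connection can be opened; the ports are the ones listed in this README, while the helper names `port_open` and `check_services` are made up for this example. An open port does not prove the service behind it is healthy.

```python
import socket
from typing import Dict, Mapping, Tuple

# Hosts and ports taken from the configuration described in this README.
SERVICES: Dict[str, Tuple[str, int]] = {
    "Qdrant": ("localhost", 6333),
    "Valkey": ("localhost", 6379),
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(
    services: Mapping[str, Tuple[str, int]] = SERVICES,
) -> Dict[str, bool]:
    """Map each service name to whether its port is reachable."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}
```

Calling `check_services()` returns a dictionary such as `{"Qdrant": True, "Valkey": False}`, which points at the container that needs attention.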
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.