Google Drive Connected Retrieval-Augmented Generation (RAG) System
Drive-RAG is a modular RAG application that connects directly to a shared Google Drive folder or file, processes documents, stores embeddings in a vector database, and enables conversational question answering over your own documents.
The goal of this project is to provide a simple yet extensible RAG framework that allows users to:
- Ingest documents directly from Google Drive
- Automatically chunk and embed documents
- Store embeddings in a vector database
- Perform retrieval-augmented question answering with streaming responses
- Inspect retrieved context alongside generated answers
The system is designed to be modular and extensible, making it easy to experiment with different databases, embedding models, and LLM providers.
This project uses uv for dependency and environment management.
Install uv:

```bash
curl -Ls https://astral.sh/uv/install.sh | sh
```

Then install the project dependencies:

```bash
uv sync
```

This will:
- Create a virtual environment
- Install all project dependencies
Start the application with:

```bash
python src/main.py
```

The Gradio UI will open in your browser.
1. Provide a shared Google Drive link.
2. The system:
   - Downloads documents
   - Loads and parses them
   - Splits them into chunks
   - Generates embeddings
   - Stores them in a vector database (ChromaDB by default)
3. Ask questions in the chat interface.
4. The system retrieves relevant chunks and streams LLM-generated answers.
5. Retrieved context is displayed alongside the response.
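The ingestion steps above can be sketched in miniature. Note that `chunk_text` and `embed` are illustrative names, not the repo's actual API, and the in-memory dict stands in for the ChromaDB collection a real run would write to:

```python
# Hypothetical sketch of the chunk -> embed -> store flow.
# Names and parameters are assumptions, not the project's real interfaces.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def embed(chunk: str) -> list[float]:
    """Stand-in embedder; a real one would call an embedding model."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 997)]

# "Store" each chunk alongside its vector; a real pipeline would upsert
# ids, embeddings, and documents into a ChromaDB collection instead.
document = "Drive-RAG ingests documents from Google Drive. " * 20
store = {i: (chunk, embed(chunk)) for i, chunk in enumerate(chunk_text(document))}
```

Overlapping chunks help a retrieved passage keep enough surrounding context to be answerable on its own.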
The main components:

- Downloader → Google Drive integration
- Loader → Document parsing
- Chunker → Text segmentation
- Embedder → Embedding model abstraction
- Vector DB → ChromaDB
- LLM Interface → OpenAI SDK
- UI → Gradio
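One way these component boundaries could look as code is a pair of small abstract interfaces. This is a hedged sketch, not the repo's actual class hierarchy; the `Embedder` and `VectorDB` names and method signatures are assumptions:

```python
# Illustrative interfaces only; the actual abstractions in src/ may differ.
from abc import ABC, abstractmethod

class Embedder(ABC):
    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class VectorDB(ABC):
    @abstractmethod
    def add(self, ids: list[str], embeddings: list[list[float]],
            documents: list[str]) -> None: ...
    @abstractmethod
    def query(self, embedding: list[float], k: int) -> list[str]: ...

# Swapping ChromaDB for another store then means implementing VectorDB
# for the new backend without touching the rest of the pipeline.
class InMemoryDB(VectorDB):
    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float], str]] = []

    def add(self, ids, embeddings, documents):
        self._rows.extend(zip(ids, embeddings, documents))

    def query(self, embedding, k):
        # Rank stored rows by squared Euclidean distance to the query vector.
        def dist(row):
            return sum((a - b) ** 2 for a, b in zip(row[1], embedding))
        return [doc for _, _, doc in sorted(self._rows, key=dist)[:k]]
```

Because the pipeline only depends on the interfaces, each piece (downloader, embedder, store, LLM client) can be replaced independently.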
The architecture is modular and designed for easy extension.
- Add support for additional vector databases (e.g., Neo4j)
- Add support for multiple LLM providers/models
- Add a test UI that:
  - Accepts a `test.json` file
  - Runs RAG queries
  - Calculates retrieval and generation metrics
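The planned `test.json` evaluation has no fixed schema yet; one possible shape, together with a simple retrieval metric, might look like this (the field names and the `hit_rate` function are assumptions for illustration):

```python
# Hypothetical test.json schema and a basic retrieval metric sketch.
import json

test_cases = json.loads("""
[
  {"question": "What is Drive-RAG?",
   "expected_chunks": ["c1", "c3"],
   "reference_answer": "A Google Drive connected RAG system."}
]
""")

def hit_rate(retrieved_ids: list[str], expected_ids: list[str]) -> float:
    """Fraction of expected chunks that appear among the retrieved ones."""
    if not expected_ids:
        return 0.0
    hits = sum(1 for cid in expected_ids if cid in retrieved_ids)
    return hits / len(expected_ids)

# e.g. if the retriever returned c1 and c2 for the first test case:
score = hit_rate(["c1", "c2"], test_cases[0]["expected_chunks"])  # 0.5
```

Generation metrics (e.g. comparing streamed answers against `reference_answer`) would sit alongside retrieval metrics like this one.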