Drive-RAG

A Google Drive-connected Retrieval-Augmented Generation (RAG) system.

Drive-RAG is a modular RAG application that connects directly to a shared Google Drive folder or file, processes documents, stores embeddings in a vector database, and enables conversational question answering over your own documents.

The goal of this project is to provide a simple yet extensible RAG framework that allows users to:

  • Ingest documents directly from Google Drive
  • Automatically chunk and embed documents
  • Store embeddings in a vector database
  • Perform retrieval-augmented question answering with streaming responses
  • Inspect retrieved context alongside generated answers

The system is designed to be modular and extensible, making it easy to experiment with different databases, embedding models, and LLM providers.


🚀 Installation

This project uses uv for dependency and environment management.

1️⃣ Install uv (if not installed)

curl -Ls https://astral.sh/uv/install.sh | sh

2️⃣ Create the virtual environment and install dependencies

uv sync

This will:

  • Create a virtual environment
  • Install all project dependencies

▶️ Running the Project

Start the application with:

python src/main.py

The Gradio UI will open in your browser.


🧠 How It Works

  1. Provide a shared Google Drive link.

  2. The system:

    • Downloads documents
    • Loads and parses them
    • Splits them into chunks
    • Generates embeddings
    • Stores them in a vector database (ChromaDB by default)
  3. Ask questions in the chat interface.

  4. The system retrieves relevant chunks and streams LLM-generated answers.

  5. Retrieved context is displayed alongside the response.
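The flow above can be illustrated with a minimal, self-contained sketch. The chunker and "embedding" here are deliberate stand-ins (fixed-size character splits and hashed bag-of-words vectors), chosen only so the example runs without dependencies; the real project uses an embedding model and ChromaDB:

```python
# Toy sketch of the ingest-and-retrieve flow; function names and the
# hashed-word "embedding" are illustrative, not the project's actual code.
import math
import zlib

def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each word into a bucket of a fixed-size vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.strip(".,?!").encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the query (dot product)."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [text for text, _ in ranked[:k]]

# Ingest: chunk each document and store (chunk, embedding) pairs.
docs = ["ChromaDB stores vector embeddings.", "Gradio builds chat interfaces."]
store = [(c, embed(c)) for doc in docs for c in chunk_text(doc)]

# Query: in the real system the retrieved chunks become the LLM's context.
context = retrieve("Which database stores embeddings?", store, k=1)
print(context[0])
```

Swapping the toy pieces for a real embedder and vector store changes only the `embed` and `retrieve` internals; the ingest/query shape stays the same.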


🏗 Architecture Overview

  • Downloader → Google Drive integration
  • Loader → Document parsing
  • Chunker → Text segmentation
  • Embedder → Embedding model abstraction
  • Vector DB → ChromaDB
  • LLM Interface → OpenAI SDK
  • UI → Gradio

Each component lives in its own module, so databases, embedding models, and LLM providers can be swapped independently.
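The wiring between these components can be sketched with Python Protocols. The interface names and method signatures below are illustrative assumptions, not the project's actual API:

```python
# Hypothetical component interfaces showing how the modules could be swapped.
from typing import Iterable, Protocol

class Embedder(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class VectorDB(Protocol):
    def add(self, chunks: list[str], vectors: list[list[float]]) -> None: ...
    def query(self, vector: list[float], k: int) -> list[str]: ...

class LLM(Protocol):
    def stream(self, prompt: str) -> Iterable[str]: ...

def answer(question: str, embedder: Embedder, db: VectorDB, llm: LLM) -> Iterable[str]:
    """Retrieve context for the question and stream the model's answer."""
    [qvec] = embedder.embed([question])
    context = "\n".join(db.query(qvec, k=4))
    yield from llm.stream(f"Context:\n{context}\n\nQuestion: {question}")
```

Because `answer` depends only on the Protocols, a ChromaDB-backed `VectorDB` or an OpenAI-backed `LLM` can be replaced by any other implementation with the same methods.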


📌 TODO

  • Add support for additional vector databases (e.g., Neo4j)
  • Add support for multiple LLM providers/models
  • Add a test UI that:
    • Accepts a test.json file
    • Runs RAG queries
    • Calculates retrieval and generation metrics
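The retrieval-metrics step in the TODO could, for example, compute recall@k over test cases. The test.json schema shown here is an assumption, since the format is not yet defined:

```python
# Hypothetical retrieval metric for the planned test UI.
import json

def recall_at_k(results: list[dict], k: int = 4) -> float:
    """Fraction of test cases whose expected source appears in the top-k hits."""
    hits = sum(1 for r in results if r["expected_source"] in r["retrieved"][:k])
    return hits / len(results)

# Example records as they might look after running the RAG queries:
results = json.loads("""[
  {"question": "q1", "expected_source": "a.pdf", "retrieved": ["a.pdf", "b.pdf"]},
  {"question": "q2", "expected_source": "c.pdf", "retrieved": ["b.pdf", "d.pdf"]}
]""")
print(recall_at_k(results, k=2))
```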
