Pathway Dataquest is a full-stack web application designed for intelligent document management and analysis. It allows users to connect various data sources (local files, Google Drive, S3), interact with documents through a chat interface, and automatically track and analyze changes in document content. The backend is powered by a Retrieval-Augmented Generation (RAG) pipeline using the Pathway framework, enabling natural language queries on your document library.
- Conversational Q&A: Chat with your documents to find information, get summaries, and ask complex questions.
- Multi-Source Connectivity: Connect and sync documents from your local filesystem, Google Drive, and AWS S3 buckets.
- Automated Change Detection: Automatically detects, tracks, and displays changes between document versions, highlighting additions and modifications.
- Risk & Policy Analysis: A specialized AI agent identifies high-impact clauses in legal and policy documents related to legal obligations, safety protocols, and financial penalties.
- Interactive Dashboard: A modern, responsive UI to view recent files, analyze document activity, and explore changes.
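To make the change-detection feature more concrete, here is a minimal sketch of how line-level additions and modifications between two versions of a document could be classified. It uses Python's standard `difflib` module. The function name `summarize_changes` and the shape of its return value are assumptions made for illustration; this is not the project's actual implementation.

```python
import difflib


def summarize_changes(old_text: str, new_text: str) -> dict[str, list[str]]:
    """Classify line-level differences between two document versions.

    Hypothetical helper for illustration only; not taken from the project.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)

    changes: dict[str, list[str]] = {"added": [], "removed": [], "modified": []}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            # Lines that exist only in the new version.
            changes["added"].extend(new_lines[j1:j2])
        elif tag == "delete":
            # Lines that exist only in the old version.
            changes["removed"].extend(old_lines[i1:i2])
        elif tag == "replace":
            # Lines whose wording changed; report the new wording.
            changes["modified"].extend(new_lines[j1:j2])
    return changes


if __name__ == "__main__":
    old = "Fee: 10 USD\nTerm: 1 year"
    new = "Fee: 20 USD\nTerm: 1 year\nPenalty: 5%"
    print(summarize_changes(old, new))
```

A real pipeline would likely diff parsed document chunks rather than raw lines, but the same opcode categories (insert, delete, replace) apply.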
Frontend:
- Framework: Next.js
- Language: TypeScript
- UI Components: shadcn/ui
- Styling: Tailwind CSS
- API Communication: Axios
Backend:
- Framework: FastAPI
- Language: Python
- Core Engine: Pathway (for RAG and data processing pipelines)
- Database: MongoDB (for storing conversations and source configurations)
- Server: Uvicorn
DevOps & Tooling:
- Package Management: npm (frontend), uv (backend)
- Containerization: Docker (docker-compose.yaml)
Prerequisites:
- Node.js and npm
- Python 3.12+ and uv
- Docker and Docker Compose
- A running MongoDB instance
- Clone the repository:
    git clone https://github.com/AMythicDev/pathway-dataquest/
    cd pathway-dataquest
- Set Up Environment Variables: Create a .env file in the project root. The backend requires a MongoDB connection URI, a Gemini API key, and a secret key for encrypting data source credentials. You can generate a suitable encryption key with this Python snippet:
    from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())
  Your .env file should look like this:
    # .env
    ENCRYPTION_KEY=your_generated_secret_key
    GEMINI_API_KEY=<your_gemini_api_key>
    MONGODB_URI=<mongodb_connection_uri>
- Install Frontend Dependencies:
    npm install
- Install Backend Dependencies:
    uv sync
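As a sanity check for the environment step above, the following sketch verifies that the three variables from the sample .env are present and that `ENCRYPTION_KEY` has the shape of a Fernet key (32 bytes, URL-safe base64 encoded). The helper name `check_env` is hypothetical and the script uses only the standard library. How the backend actually loads and validates its configuration is not shown in this README.

```python
import base64
import binascii
import os

# Variable names taken from the sample .env above.
REQUIRED_VARS = ("ENCRYPTION_KEY", "GEMINI_API_KEY", "MONGODB_URI")


def check_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the env looks usable.

    Hypothetical helper for illustration only; not part of the project.
    """
    problems = [
        f"{name} is missing or empty" for name in REQUIRED_VARS if not env.get(name)
    ]

    key = env.get("ENCRYPTION_KEY")
    if key:
        try:
            raw = base64.urlsafe_b64decode(key.encode())
        except (binascii.Error, ValueError):
            problems.append("ENCRYPTION_KEY is not valid URL-safe base64")
        else:
            # Fernet keys are exactly 32 bytes before base64 encoding.
            if len(raw) != 32:
                problems.append("ENCRYPTION_KEY must decode to exactly 32 bytes")
    return problems


if __name__ == "__main__":
    for problem in check_env(dict(os.environ)):
        print(problem)
```

Running this before starting the backend gives an early, readable error instead of a failure deep inside the application. It does not load the .env file itself; it only inspects variables that are already exported.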
This project can be run using Docker Compose or by starting the frontend and backend services manually.
- Create required folders:
    mkdir model/state model/out
- Start the Backend Server: The FastAPI server is defined in model/main.py.
    uvicorn model.main:app --host 0.0.0.0 --port 8000 --reload
- Start the Frontend Development Server: In a separate terminal, run the Next.js app.
    npm run dev
  The application will be accessible at http://localhost:3000.
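The manual startup steps above can also be scripted. The sketch below creates the two folders from the first step and then waits until something is listening on a given port. The helper names `ensure_dirs` and `wait_for_port` are hypothetical, and the check only confirms that a TCP port accepts connections, not that the API behind it is healthy.

```python
import socket
import time
from pathlib import Path


def ensure_dirs(base: Path, names: tuple[str, ...] = ("state", "out")) -> list[Path]:
    """Create base/<name> for each name; safe to re-run if they already exist."""
    created = []
    for name in names:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet; retry shortly.
            time.sleep(0.2)
    return False
```

For example, calling `ensure_dirs(Path("model"))` and then `wait_for_port("127.0.0.1", 8000)` mirrors the folder-creation step and checks the port used in the uvicorn command; the same port check with 3000 would cover the frontend.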
/
├── app/ # Next.js frontend pages and routing
├── components/ # React components (UI elements, charts, etc.)
├── hooks/ # Custom React hooks (e.g., use-google-drive)
├── lib/ # Frontend utilities and context providers
├── model/ # Python backend (FastAPI, Pathway RAG pipeline)
│ ├── main.py # FastAPI application entrypoint
│ ├── rag.py # Core RAG logic
│ └── connectors.py # Logic for connecting to data sources
├── public/ # Static assets (images, icons)
├── pyproject.toml # Backend Python dependencies (for uv)
├── package.json # Frontend Node.js dependencies (for npm)
└── docker-compose.yaml # Docker configuration for services