An open-source Retrieval-Augmented Generation (RAG) platform to explore and analyze the unsealed Jeffrey Epstein court documents. Built with LangChain, ChromaDB, and Streamlit.
- Open Stack: Fully open-source tools and models.
- Local & Fast: Support for local execution via Ollama or high-speed cloud inference via Groq/OpenRouter.
- Automated Ingestion: Easily download and index curated parquet data from Hugging Face.
- Strict Guardrails: Designed to stay strictly within the context of the investigative documents.
- Python 3.10+ (Recommend using a virtual environment).
- Ollama (Optional): If you want to run LLMs completely locally. Download at ollama.com.
- Windows Users: If you encounter DLL initialization errors with TensorFlow/Transformers, ensure you follow the installation steps below precisely, as the
requirements.txtincludes critical fixes fortorchandprotobuf.
Clone the repository and install dependencies:
git clone https://github.com/AbhisumatK/Epstein_Files_RAG
cd Epstein_Files_RAG
# Optional create a virtual environment
python -m venv venv
.\venv\Scripts\activate # On Windows
# install dependencies
pip install -r requirements.txtCopy the .env.example to .env and configure your providers:
cp .env.example .envFill in your API keys in .env:
- Groq API: Get yours at console.groq.com.
- OpenRouter API: Get yours at openrouter.ai.
- Ollama: No key needed, just ensure it's running.
The Epstein dataset is massive (>200GB). By default, the ingestion script downloads only the first 0.5 GB chunk for testing.
python ingest.py- Estimated Time: ~3-5 minutes for the first chunk (depending on your bandwidth).
- How to Tweaks: Open
ingest.pyand changenum_files=1to a higher number (e.g.,num_files=10for ~5GB) to index more data.
Start the Streamlit dashboard:
streamlit run app.py- Source: Nikity/Epstein-Files on Hugging Face.
- Format: Apache Parquet files containing extracted text from investigative files.
- Note: The 0.5 GB limit (one parquet file) is used to ensure quick setup and low memory usage. The full dataset contains hundreds of thousands of documents.
This application includes specialized system prompts to ensure the assistant stays strictly within the investigative context. It will refuse out-of-scope requests (like general knowledge or unrelated tasks) to maintain the integrity of the analysis.
This project is licensed under the MIT License - see the LICENSE file for details.

