Curated dataset from official House Oversight Committee release
This repository catalogs the main EPSTEIN_FILES_20K dataset and derived datasets from the Epstein estate documents released by the U.S. House Committee on Oversight and Accountability on November 12, 2025.
Original Source: House Oversight Committee Release
The processed dataset is a single CSV file with two columns:
| Column | Description |
|---|---|
| text | Full text content extracted from the document |
| filename | Modified file path maintaining directory structure |
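As an illustration of working with this two-column layout, the following minimal sketch checks that a loaded DataFrame has the expected columns and runs a case-insensitive keyword search over the text column. The helper names validate_schema and search_text are hypothetical and not part of this repository.

```python
import pandas as pd

# Column names taken from the table above
EXPECTED_COLUMNS = ["text", "filename"]


def validate_schema(df: pd.DataFrame) -> None:
    """Raise ValueError if either expected column is missing (hypothetical helper)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")


def search_text(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return rows whose text contains keyword, ignoring case (hypothetical helper)."""
    validate_schema(df)
    # fillna("") guards against empty documents read back from CSV as NaN;
    # regex=False treats the keyword as a literal string
    mask = df["text"].fillna("").str.contains(keyword, case=False, regex=False)
    return df.loc[mask, EXPECTED_COLUMNS]
```

Matching on a literal string rather than a regular expression avoids surprises when the keyword contains characters such as parentheses or dots.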
Access: Hugging Face
All JPG files were converted to text using convert_jpg_to_txt.py, preserving the original directory structure.
- Loaded all text files (converted + existing) into a pandas DataFrame
- Created two-column structure: text and filename
- Exported as a single CSV file
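The steps above can be sketched roughly as follows. This is not the repository's actual processing script: the function name build_dataset is hypothetical, and it assumes that every document is a .txt file under one root directory and that the stored filename is the path relative to that root.

```python
from pathlib import Path

import pandas as pd


def build_dataset(root: str, out_csv: str) -> pd.DataFrame:
    """Collect every .txt file under root into a text/filename DataFrame and save it."""
    root_path = Path(root)
    rows = []
    # sorted() keeps the row order deterministic across runs
    for path in sorted(root_path.rglob("*.txt")):
        rows.append({
            # errors="replace" keeps going if a file has invalid UTF-8 bytes
            "text": path.read_text(encoding="utf-8", errors="replace"),
            # Assumption: filename is the path relative to root, with "/" separators
            "filename": path.relative_to(root_path).as_posix(),
        })
    df = pd.DataFrame(rows, columns=["text", "filename"])
    # index=False so the CSV contains only the two data columns
    df.to_csv(out_csv, index=False)
    return df
```

Calling build_dataset("documents", "dataset.csv") with placeholder paths would write a CSV that the loading example can read back with pd.read_csv.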
import pandas as pd
# Load the dataset
df = pd.read_csv('path_to_dataset.csv')
# View basic information
print(f"Total documents: {len(df)}")
print(df.head())
Epstein Estate Documents Dataset
Source: U.S. House Committee on Oversight and Accountability
Available at: https://huggingface.co/datasets/tensonaut/EPSTEIN_FILES_20K
New derived datasets are organized in separate folders with individual README files.
Structure:
Dataset/
├── README.md (this file)
├── derived-dataset-1/
│   └── README.md
└── derived-dataset-2/
    └── README.md
To add a derived dataset, create a new folder with documentation following the main dataset format.
Contributions are welcome for:
- Derivative datasets
- Processing pipeline improvements
- Documentation enhancements
How to contribute:
- Fork this repository
- Create your dataset/improvement
- Document thoroughly
- Submit a pull request
- Projects Repository - Tools using this dataset
- Safety Repository - Report data quality issues
- Original Release - Source documents
Please refer to the original source for licensing and usage terms of the Epstein estate documents.
For questions or concerns:
- Open an issue
- Email: flashcrimson22@gmail.com
Community-maintained • Not affiliated with any official investigation