A powerful Python utility that recursively collects files from nested directories, making it easier to prepare data for AI projects, especially for tools like Claude that require manual file uploads.
When working with AI tools like Claude, you often need to upload multiple files for analysis. However, these files might be scattered across different folders and subfolders in your project structure. Manually navigating through directories and uploading files one by one is time-consuming and prone to errors.
This tool automates the process by:
- Recursively finding all relevant files across nested directories
- Copying them to a single location (either preserving structure or flattened)
- Making it easy to upload multiple files at once to Claude or similar platforms
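The recursive collection described above can be sketched with `os.walk` and `shutil.copy2` (a minimal illustration under stated assumptions, not the tool's actual implementation):

```python
import os
import shutil
import tempfile

def collect(root_dir, output_dir):
    """Recursively copy every file under root_dir into output_dir,
    preserving the relative folder structure (illustrative sketch)."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, root_dir)
            dst = os.path.join(output_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied

# Tiny self-contained demo in temporary directories
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for p in ("a.txt", os.path.join("sub", "b.txt")):
    open(os.path.join(root, p), "w").close()
result = sorted(collect(root, tempfile.mkdtemp()))
print(result)
```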
- Recursive File Collection: Automatically traverses through all subdirectories
- Selective Ignoring: Skip specific folders and file patterns
- Two Organization Modes:
  - Hierarchical: Preserves original folder structure
  - Flattened: Places all files in a single directory with path-encoded names
- Metadata Preservation: Maintains file timestamps and permissions
- Error Handling: Detailed reporting of successful and failed operations
- Customizable Separators: Choose how path components are joined in flattened mode
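In flattened mode, path components are joined into a single filename. A hypothetical helper like this captures the encoding (the real script's internals may differ):

```python
import os

def flatten_name(rel_path, separator="_"):
    """Encode a relative file path as one filename by joining its
    components with `separator` (hypothetical helper, mirroring the
    flattened mode described above)."""
    parts = rel_path.replace(os.sep, "/").split("/")
    return separator.join(parts)

print(flatten_name("data/processed/cleaned_data.csv"))
print(flatten_name("data/processed/cleaned_data.csv", separator="-"))
```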
```bash
# Clone the repository
git clone https://github.com/yourusername/file-collector.git

# Navigate to the directory
cd file-collector

# No additional dependencies required - uses the Python standard library!
```
```bash
python file_collector.py source_directory
```
This copies all files to a `result` directory while preserving the folder structure.
```bash
python file_collector.py source_directory --flatten
```
This copies all files to a single directory, encoding the path information in the filenames.
```bash
python file_collector.py source_directory --output-dir my_output
```
```bash
python file_collector.py source_directory --flatten --separator=-
```
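The flags shown above could be parsed with `argparse` roughly as follows (a sketch of one possible CLI definition; the script's actual argument handling is an assumption here):

```python
import argparse

def build_parser():
    """Hypothetical CLI definition matching the flags shown above."""
    parser = argparse.ArgumentParser(
        description="Recursively collect files into one directory.")
    parser.add_argument("source_directory", help="root directory to scan")
    parser.add_argument("--output-dir", default="result",
                        help="destination directory")
    parser.add_argument("--flatten", action="store_true",
                        help="flatten the folder structure")
    parser.add_argument("--separator", default="_",
                        help="separator for flattened filenames")
    return parser

args = build_parser().parse_args(["my_project", "--flatten", "--separator=-"])
print(args.source_directory, args.output_dir, args.flatten, args.separator)
```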
You can also use it as a Python module:
```python
from file_collector import copy_files

# Preserve directory structure
copied, failed = copy_files("source_directory")

# Flatten directory structure
copied, failed = copy_files("source_directory", flatten=True)

# Custom configuration
copied, failed = copy_files(
    root_dir="source_directory",
    output_dir="custom_output",
    ignore_folders={'.git', 'node_modules'},
    ignore_patterns={'*.pyc', '*.log'},
    flatten=True,
    separator="-"
)
```
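The two returned lists can drive simple reporting. The exact shape of `failed` is an assumption in this sketch ((path, error) pairs):

```python
def report(copied, failed):
    """Summarize a copy run. Assumes `copied` is a list of paths and
    `failed` a list of (path, error) pairs -- both shapes are assumptions."""
    lines = [f"✓ {path}" for path in copied]
    lines += [f"✗ {path}: {error}" for path, error in failed]
    lines.append(f"Total files copied: {len(copied)}")
    lines.append(f"Total files failed: {len(failed)}")
    return "\n".join(lines)

summary = report(["README.md", "src/main.py"],
                 [("locked.db", "Permission denied")])
print(summary)
```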
- AI Project Data Preparation
  - Quickly collect all relevant files for uploading to Claude
  - Easily gather training data from multiple directories
  - Prepare datasets for analysis
- Project Organization
  - Consolidate files from complex directory structures
  - Create flat archives of nested projects
  - Prepare files for bulk processing
- Backup and Migration
  - Collect specific file types across directories
  - Create organized backups
  - Prepare files for transfer to different systems
By default, the following folders and file patterns are ignored:
```
.git
node_modules
__pycache__
*.pyc
*.log
.DS_Store
```
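A sketch of how these defaults could be applied with `fnmatch` (folder names compared exactly, the filename matched against glob patterns; illustrative only):

```python
import fnmatch

IGNORE_FOLDERS = {".git", "node_modules", "__pycache__"}
IGNORE_PATTERNS = {"*.pyc", "*.log", ".DS_Store"}

def should_ignore(rel_path):
    """Return True if any parent folder is in IGNORE_FOLDERS or the
    filename matches any glob in IGNORE_PATTERNS (illustrative sketch)."""
    parts = rel_path.split("/")
    if any(part in IGNORE_FOLDERS for part in parts[:-1]):
        return True
    return any(fnmatch.fnmatch(parts[-1], pattern) for pattern in IGNORE_PATTERNS)

print(should_ignore("src/__pycache__/utils.cpython-39.pyc"))  # True
print(should_ignore("src/utils.py"))                          # False
```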
Let's say you have this project structure:
```
my_project/
├── data/
│   ├── raw_data.csv
│   └── processed/
│       ├── cleaned_data.csv
│       └── feature_data.csv
├── notebooks/
│   ├── analysis.ipynb
│   └── visualization.ipynb
├── src/
│   ├── __pycache__/
│   │   └── utils.cpython-39.pyc
│   ├── utils.py
│   └── main.py
└── README.md
```
Running:
```bash
python file_collector.py my_project
```
Creates:
```
result/
├── data/
│   ├── raw_data.csv
│   └── processed/
│       ├── cleaned_data.csv
│       └── feature_data.csv
├── notebooks/
│   ├── analysis.ipynb
│   └── visualization.ipynb
├── src/
│   ├── utils.py
│   └── main.py
└── README.md
```
Console output:
```
Successfully copied files:
✓ data/raw_data.csv
✓ data/processed/cleaned_data.csv
✓ data/processed/feature_data.csv
✓ notebooks/analysis.ipynb
✓ notebooks/visualization.ipynb
✓ src/utils.py
✓ src/main.py
✓ README.md

Total files copied: 8
Total files failed: 0

Files have been copied to: result/
```
Running:
```bash
python file_collector.py my_project --flatten
```
Creates:
```
result/
├── data_raw_data.csv
├── data_processed_cleaned_data.csv
├── data_processed_feature_data.csv
├── notebooks_analysis.ipynb
├── notebooks_visualization.ipynb
├── src_utils.py
├── src_main.py
└── README.md
```
Console output:
```
Successfully copied files:
✓ data_raw_data.csv
✓ data_processed_cleaned_data.csv
✓ data_processed_feature_data.csv
✓ notebooks_analysis.ipynb
✓ notebooks_visualization.ipynb
✓ src_utils.py
✓ src_main.py
✓ README.md

Total files copied: 8
Total files failed: 0

Files have been copied to: result/
Directory structure was flattened
```
Note:
- The `__pycache__` directory was automatically ignored
- `.pyc` files were skipped based on ignore patterns
- All file metadata (timestamps, permissions) was preserved
- In flattened mode, path separators were converted to underscores
Contributions are welcome! Here are some ways you can contribute:
- Add new features
- Improve documentation
- Report bugs
- Suggest improvements
This project is licensed under the MIT License - see the LICENSE file for details.
This tool was inspired by the need to streamline file preparation for AI tools like Claude and make data scientists' lives easier.
Made with ❤️ to save time and reduce tedious file management tasks.