A professional-grade Python utility for watching directories and syncing changes to remote locations with delta updates, conflict resolution, and .gitignore-style filtering.
- Real-time Directory Watching: Monitors source directory for file changes
- Delta Updates: Only syncs files that have changed (using SHA-256 hashing)
- Conflict Resolution: Multiple strategies for handling file conflicts
- Smart Filtering: .gitignore-style pattern matching for excluding files
- Deletion Syncing: Optional synchronization of file deletions
- Metadata Tracking: Persistent storage of file states for efficient syncing
- Hash-based Change Detection: Uses SHA-256 for reliable delta detection
- Event-driven Architecture: Watchdog-based file system monitoring
- Efficient Large File Handling: Chunked reading for memory efficiency
- Configurable Debouncing: Prevents redundant syncs from rapid changes
- Comprehensive Logging: File and console logging for audit trails
# Install dependencies
pip install -r requirements.txt
# Or install watchdog directly
pip install watchdog

python file_sync_utility.py /path/to/source /path/to/destination
python file_sync_utility.py /path/to/source /path/to/destination --watch
python file_sync_utility.py /path/to/source /path/to/destination --watch --sync-deletions

1. Simple one-time sync:
python file_sync_utility.py ./my_project ./backup
2. Watch mode with conflict resolution:
python file_sync_utility.py ./source ./dest --watch --conflict newest
3. Sync with custom ignore file:
python file_sync_utility.py ./source ./dest --ignore-file .mysyncignore
4. Non-recursive sync (top-level only):
python file_sync_utility.py ./source ./dest --no-recursive
5. Full production setup:
python file_sync_utility.py \
  /home/user/projects \
  /mnt/backup/projects \
  --watch \
  --sync-deletions \
  --conflict newest \
  --check-interval 2
6. Interactive conflict resolution:
python file_sync_utility.py ./source ./dest --watch --conflict prompt

positional arguments:
source Source directory to watch
destination Destination directory for syncing
optional arguments:
-h, --help show this help message and exit
--watch, -w Watch for changes (default: one-time sync)
--ignore-file, -i Path to ignore patterns file (default: .syncignore)
--conflict, -c Conflict resolution strategy
Choices: source, destination, newest, prompt
Default: newest
--sync-deletions, -d Delete files from destination that don't exist in source
--no-recursive, -nr Don't sync subdirectories
--check-interval, -t Check interval in seconds for watch mode (default: 1)
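The options above map naturally onto Python's standard `argparse` module. The sketch below is one hypothetical way to declare the same interface; the `build_parser` name, the `float` type for `--check-interval`, and storing `--no-recursive` as `recursive=False` are assumptions, not the utility's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented CLI (hypothetical helper, not the utility's own code)."""
    parser = argparse.ArgumentParser(
        description="Watch a directory and sync changes to a destination.")
    parser.add_argument("source", help="Source directory to watch")
    parser.add_argument("destination", help="Destination directory for syncing")
    parser.add_argument("--watch", "-w", action="store_true",
                        help="Watch for changes (default: one-time sync)")
    parser.add_argument("--ignore-file", "-i", default=".syncignore",
                        help="Path to ignore patterns file")
    parser.add_argument("--conflict", "-c", default="newest",
                        choices=["source", "destination", "newest", "prompt"],
                        help="Conflict resolution strategy")
    parser.add_argument("--sync-deletions", "-d", action="store_true",
                        help="Delete files from destination that don't exist in source")
    # Assumption: stored as recursive=False rather than no_recursive=True.
    parser.add_argument("--no-recursive", "-nr", dest="recursive",
                        action="store_false", help="Don't sync subdirectories")
    # Assumption: float so that fractional intervals such as 0.5 are accepted.
    parser.add_argument("--check-interval", "-t", type=float, default=1.0,
                        help="Check interval in seconds for watch mode")
    return parser
```

Calling `build_parser().parse_args()` in the entry point would then yield a namespace with `args.watch`, `args.conflict`, `args.recursive`, and so on.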
source: Always use the source file, overwriting the destination.
python file_sync_utility.py ./source ./dest --conflict source

destination: Keep the destination file; don't overwrite it.
python file_sync_utility.py ./source ./dest --conflict destination

newest: Compare modification times and keep the newer file.
python file_sync_utility.py ./source ./dest --conflict newest --watch

prompt: Ask the user about each conflict (interactive mode).
python file_sync_utility.py ./source ./dest --conflict prompt

Prompt options:
- s or source: Use the source file
- d or destination: Keep the destination file
- k or keep: Keep both (destination renamed with a timestamp)
- i or skip: Skip this file
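The four strategies can be expressed as a small dispatch function. This is a minimal sketch, not the utility's `ConflictResolver`: the function names, the `ask` parameter (used to inject answers instead of calling `input()`), the tie-breaking rule for `newest`, and the `name.YYYYmmdd-HHMMSS.ext` rename format for "keep both" are all assumptions.

```python
import os
import time
from typing import Callable

# Accepted answers at the interactive prompt, mapped to an action.
PROMPT_ANSWERS = {
    "s": "source", "source": "source",
    "d": "destination", "destination": "destination",
    "k": "keep", "keep": "keep",
    "i": "skip", "skip": "skip",
}


def keep_both(dest_path: str) -> str:
    """Rename the destination with a timestamp suffix so both versions survive."""
    root, ext = os.path.splitext(dest_path)
    renamed = f"{root}.{time.strftime('%Y%m%d-%H%M%S')}{ext}"
    os.rename(dest_path, renamed)
    return renamed


def resolve_conflict(strategy: str, source_path: str, dest_path: str,
                     ask: Callable[[str], str] = input) -> str:
    """Return 'source' (overwrite destination), 'destination' (leave it) or 'skip'."""
    if strategy in ("source", "destination"):
        return strategy
    if strategy == "newest":
        # Ties go to the source file; this tie-break is an assumption.
        if os.path.getmtime(source_path) >= os.path.getmtime(dest_path):
            return "source"
        return "destination"
    if strategy == "prompt":
        while True:  # re-ask until a recognised answer is given
            answer = ask(f"Conflict: {dest_path} [s/d/k/i]? ").strip().lower()
            action = PROMPT_ANSWERS.get(answer)
            if action == "keep":
                keep_both(dest_path)  # old destination kept under a new name
                return "source"       # caller then copies the source into place
            if action is not None:
                return action
    raise ValueError(f"unknown conflict strategy: {strategy!r}")
```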
The utility supports .gitignore-style pattern matching for excluding files and directories.
Create a .syncignore file in your source directory:
# Comments start with #
# Ignore all Python cache
__pycache__/
*.pyc
# Ignore specific directories
node_modules/
.git/
dist/
# Ignore file types
*.log
*.tmp
# Negation (include despite other rules)
!important.log
# Match anywhere in path
*.txt
# Match from root only
/root_only.txt
# Directory-only patterns
temp/

A fuller example covering common project types:
# Python development
__pycache__/
*.py[cod]
.Python
venv/
*.egg-info/
# Node.js
node_modules/
npm-debug.log
# IDE
.vscode/
.idea/
*.swp
# OS files
.DS_Store
Thumbs.db
# Build artifacts
dist/
build/
*.o
*.exe
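A simplified matcher for the rules shown above (comments, `!` negation, `/` root anchoring, trailing-`/` directory-only patterns, and basename matching anywhere in the path) can be built on the standard `fnmatch` module. This sketch is an assumption about how such matching might work, not the utility's `IgnorePatternMatcher`; like `.gitignore`, the last matching rule wins, but unlike Git it lets `!` re-include files inside ignored directories and lets `*` match across `/`.

```python
import fnmatch
from typing import Iterable, List, Tuple

# (pattern, negated, dir_only, anchored)
Rule = Tuple[str, bool, bool, bool]


class SimpleIgnoreMatcher:
    """Simplified .gitignore-style matching (hypothetical, not IgnorePatternMatcher)."""

    def __init__(self, lines: Iterable[str]) -> None:
        self.rules: List[Rule] = []
        for raw in lines:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # blank lines and comments carry no rule
            negated = line.startswith("!")
            if negated:
                line = line[1:]
            dir_only = line.endswith("/")
            anchored = line.startswith("/")
            line = line.strip("/")
            self.rules.append((line, negated, dir_only, anchored))

    @staticmethod
    def _matches(rule: Rule, rel_path: str, is_dir: bool) -> bool:
        pattern, _negated, dir_only, anchored = rule
        parts = rel_path.split("/")
        # Test every ancestor directory as well as the path itself, so that
        # "node_modules/" also ignores everything underneath node_modules.
        for i, part in enumerate(parts):
            component_is_dir = i < len(parts) - 1 or is_dir
            if dir_only and not component_is_dir:
                continue
            # Anchored patterns and patterns containing "/" match from the root;
            # everything else matches a single path component anywhere.
            candidate = "/".join(parts[: i + 1]) if (anchored or "/" in pattern) else part
            if fnmatch.fnmatchcase(candidate, pattern):
                return True
        return False

    def is_ignored(self, rel_path: str, is_dir: bool = False) -> bool:
        """rel_path uses "/" separators and is relative to the source directory."""
        ignored = False
        for rule in self.rules:  # later rules override earlier ones
            if self._matches(rule, rel_path, is_dir):
                ignored = not rule[1]
        return ignored
```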
- Initial Scan: On startup, scans source directory and calculates SHA-256 hashes
- Metadata Comparison: Compares current hashes with stored metadata
- Change Detection: Only syncs files where hashes differ
- Conflict Resolution: Applies configured strategy when both files changed
- Metadata Update: Stores new hashes for future comparisons
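The comparison step amounts to diffing freshly computed hashes against the stored metadata. A minimal sketch, assuming the metadata is a dict mapping each relative path to an entry with a `"hash"` key (the function name and data shapes are hypothetical); it also reports paths that have vanished from the source, which is what `--sync-deletions` would act on.

```python
from typing import Dict, List, Tuple

Metadata = Dict[str, dict]  # relative path -> {"hash": ..., ...}


def diff_against_metadata(current_hashes: Dict[str, str],
                          stored: Metadata) -> Tuple[List[str], List[str]]:
    """Return (paths whose hash is new or different, paths missing from source)."""
    changed = [path for path, digest in sorted(current_hashes.items())
               if stored.get(path, {}).get("hash") != digest]
    deleted = sorted(path for path in stored if path not in current_hashes)
    return changed, deleted
```

Only the first list needs copying; unchanged files cost a hash of the source file and no writes to the destination.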
Source Directory Changes
↓
Watchdog Event Detected
↓
Debounce Check (1 second default)
↓
Hash Calculation
↓
Metadata Comparison
↓
Conflict Resolution (if needed)
↓
File Copy with Metadata Preservation
↓
Metadata Update
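The debounce step collapses bursts of events for the same path into a single sync. One common way to do this, shown here as a hedged sketch rather than the utility's actual code, is a trailing-edge debounce: each event restarts a per-path timer, and a path is released only after it has been quiet for `delay` seconds. The `Debouncer` class and its injectable `clock` are hypothetical; the clock parameter just makes the behavior testable.

```python
import time
from typing import Callable, Dict, List


class Debouncer:
    """Trailing-edge debounce keyed by path (hypothetical helper)."""

    def __init__(self, delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.delay = delay
        self.clock = clock
        self.pending: Dict[str, float] = {}  # path -> time of most recent event

    def record(self, path: str) -> None:
        """Called for every file system event; restarts that path's quiet timer."""
        self.pending[path] = self.clock()

    def ready(self) -> List[str]:
        """Return (and forget) paths that have been quiet for at least `delay`."""
        now = self.clock()
        due = [p for p, t in self.pending.items() if now - t >= self.delay]
        for p in due:
            del self.pending[p]
        return sorted(due)
```

In watch mode, an event handler could call `record()` and a polling loop could call `ready()` once per check interval, syncing only the paths it returns.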
A conflict occurs when:
- File exists in both source and destination
- Both files have changed since last sync
- File hashes don't match
The utility detects this by comparing:
- Current source hash vs. stored metadata
- Current destination hash vs. stored metadata
- Source hash vs. destination hash
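Those three comparisons reduce to a small predicate. This sketch assumes hashes are hex strings, that `None` means "file absent" (destination) or "never synced" (stored), and that a missing destination or identical contents never counts as a conflict; the function name is hypothetical.

```python
from typing import Optional


def is_conflict(source_hash: str, dest_hash: Optional[str],
                stored_hash: Optional[str]) -> bool:
    """Both sides changed since the last sync and they now differ."""
    if dest_hash is None:
        return False  # nothing at the destination: a plain copy, not a conflict
    if source_hash == dest_hash:
        return False  # contents already identical
    source_changed = source_hash != stored_hash
    dest_changed = dest_hash != stored_hash
    return source_changed and dest_changed
```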
- Algorithm: SHA-256
- Chunk Size: 4096 bytes (efficient for large files)
- Memory Usage: O(1) regardless of file size
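Reading in fixed-size chunks keeps memory use constant no matter how large the file is. A minimal sketch using the standard `hashlib` module and the 4096-byte chunk size listed above; the `hash_file` name and its `algorithm` parameter are assumptions for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # bytes read per iteration


def hash_file(path: str, algorithm: str = "sha256") -> str:
    """Hex digest of a file, read incrementally so memory use stays constant."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()
```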
- Format: JSON
- Location: .file_sync_metadata.json in the source directory
- Contents: File paths, hashes, sizes, modification times
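A JSON metadata store can be as small as two functions. In this sketch only the file name comes from this README; the entry shape `{"hash", "size", "mtime"}`, the atomic write through a temporary file and `os.replace`, and falling back to empty metadata when the file is unreadable are assumptions.

```python
import json
import os
from typing import Dict

METADATA_NAME = ".file_sync_metadata.json"


def load_metadata(source_dir: str) -> Dict[str, dict]:
    """Return stored metadata, or {} if it is missing or corrupted."""
    path = os.path.join(source_dir, METADATA_NAME)
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        return {}  # first run
    except ValueError:
        return {}  # JSONDecodeError / UnicodeDecodeError: start fresh
    return data if isinstance(data, dict) else {}


def save_metadata(source_dir: str, metadata: Dict[str, dict]) -> None:
    """Write to a temp file, then swap it in so a crash never leaves half a file."""
    path = os.path.join(source_dir, METADATA_NAME)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2, sort_keys=True)
    os.replace(tmp, path)
```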
- Chunked file reading for memory efficiency
- Event debouncing (1 second default) prevents redundant syncs
- Hash-based change detection avoids unnecessary file copies
- Efficient directory scanning with early termination
- File Log: file_sync.log (persistent)
- Console Log: Real-time status updates
- Log Levels: INFO, WARNING, ERROR
- Rotation: Manual (can be extended with logging.handlers)
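Since rotation is left to `logging.handlers`, one way to add it is a `RotatingFileHandler` alongside a console handler. The logger name, format string, and size limits below are assumptions for illustration.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler


def setup_logging(log_file: str = "file_sync.log",
                  max_bytes: int = 5 * 1024 * 1024,
                  backup_count: int = 3) -> logging.Logger:
    logger = logging.getLogger("file_sync")  # assumed logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()
    fmt = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
    # File log: rolls over to file_sync.log.1, .2, ... once max_bytes is reached.
    file_handler = RotatingFileHandler(log_file, maxBytes=max_bytes,
                                       backupCount=backup_count, encoding="utf-8")
    file_handler.setFormatter(fmt)
    # Console log: real-time status updates.
    console = logging.StreamHandler(sys.stdout)
    console.setFormatter(fmt)
    logger.addHandler(file_handler)
    logger.addHandler(console)
    return logger
```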
Automatically back up your working directory to a remote location:
python file_sync_utility.py ~/projects /mnt/backup/projects --watch --sync-deletions

Prepare files for manual cloud upload (excluding large/temp files):
python file_sync_utility.py ~/documents ~/cloud_staging --ignore-file .cloudignore

Copy build outputs to a distribution directory:
python file_sync_utility.py ./dist ./releases --conflict source --sync-deletions

Sync local changes to a remote development server:
python file_sync_utility.py ./local_project /mnt/remote_server/project --watch --conflict newest

Mirror one working directory to several locations (one process per destination):
# Terminal 1
python file_sync_utility.py ~/workspace/project /mnt/location1/project --watch
# Terminal 2
python file_sync_utility.py ~/workspace/project /mnt/location2/project --watch

FileSyncUtility (Main Controller)
├── FileSyncEngine (Core Sync Logic)
│ ├── FileHasher (Hash Calculation)
│ ├── IgnorePatternMatcher (Filter Logic)
│ ├── ConflictResolver (Conflict Handling)
│ └── MetadataStore (State Persistence)
└── Observer (Watchdog)
    └── FileSyncEventHandler (Event Processing)
File Change → Event → Debounce → Hash → Metadata Check →
Conflict Resolution → Copy → Metadata Update → Log
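The per-file path through that flow (hash, compare with metadata, copy with `shutil.copy2`, record new metadata) can be sketched as follows. Conflict resolution and logging are left out for brevity, and the function names and metadata entry shape are hypothetical.

```python
import hashlib
import os
import shutil
from typing import Dict


def _sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_file(rel_path: str, source_dir: str, dest_dir: str,
              metadata: Dict[str, dict]) -> bool:
    """Copy one file if its hash changed; return True if a copy happened."""
    src = os.path.join(source_dir, rel_path)
    dst = os.path.join(dest_dir, rel_path)
    digest = _sha256(src)
    entry = metadata.get(rel_path)
    if entry and entry["hash"] == digest and os.path.exists(dst):
        return False  # unchanged since the last sync: skip the copy
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)  # copies content plus timestamps and permission bits
    st = os.stat(src)
    metadata[rel_path] = {"hash": digest, "size": st.st_size, "mtime": st.st_mtime}
    return True
```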
The utility handles various error scenarios:
- Missing source directory: Exits with error message
- Permission errors: Logs error, continues with other files
- Hash calculation failures: Logs error, skips file
- I/O errors during copy: Logs error, continues syncing
- Metadata corruption: Starts fresh with empty metadata
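The "log and continue" behavior for per-file errors usually comes down to catching `OSError` (the parent class of `PermissionError` and other I/O errors) around each file rather than around the whole run. A hedged sketch with a hypothetical `sync_tree` function that copies every file unconditionally; the logger name is also an assumption.

```python
import logging
import os
import shutil
from typing import Tuple

log = logging.getLogger("file_sync")  # assumed logger name


def sync_tree(source_dir: str, dest_dir: str) -> Tuple[int, int]:
    """Copy every file; return (copied, failed). Hypothetical helper."""
    if not os.path.isdir(source_dir):
        # A missing source is fatal: exit with an error message.
        raise SystemExit(f"Source directory not found: {source_dir}")
    copied = failed = 0
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(dest_dir, rel)
            try:
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
                copied += 1
            except OSError as exc:
                # Permission and other I/O errors: log, then carry on with the rest.
                log.error("Failed to sync %s: %s", rel, exc)
                failed += 1
    return copied, failed
```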
- One-way Sync: Only syncs from source to destination
- Local Filesystem: Designed for local/mounted filesystems (not native cloud APIs)
- No Encryption: Files copied without encryption (add separately if needed)
- No Compression: Files copied as-is (add archive step if needed)
- Platform-specific Paths: Uses OS-native path separators
Add a new resolution strategy:
# Inside the ConflictResolver class
@staticmethod
def _resolve_custom(source_path: str, dest_path: str) -> str:
    # Your custom logic here
    return 'source'  # or 'destination' or 'skip'

Change the hash algorithm:
# Pass a different algorithm name where FileHasher.hash_file is called
source_hash = self.hasher.hash_file(source_path, algorithm='md5')

Wrap the copy operation:
# In FileSyncEngine._sync_file
shutil.copy2(source_path, dest_path)
encrypt_file(dest_path)  # Your encryption function

# Create test directories
mkdir -p test_source test_dest
# Create test files
echo "Test content" > test_source/file1.txt
echo "Another file" > test_source/file2.txt
mkdir test_source/subdir
echo "Nested file" > test_source/subdir/file3.txt
# Create ignore file
cat > test_source/.syncignore << EOF
*.log
temp/
EOF
# Run sync
python file_sync_utility.py test_source test_dest --watch

Test scenarios:
- Basic sync: Create/modify files, verify they sync
- Ignore patterns: Create ignored files, verify they don't sync
- Conflict resolution: Modify same file in both locations
- Deletion sync: Delete source file, verify destination deletion
- Large files: Test with files > 1GB
- Rapid changes: Edit files quickly, verify debouncing
Performance tips:
- Adjust the check interval: increase it to reduce CPU usage
  --check-interval 5  # Check every 5 seconds
- Use ignore patterns to exclude unnecessary files (e.g. *.log, __pycache__/, node_modules/)
- Disable deletion sync if you don't need it (omit the --sync-deletions flag)
- Use a one-time sync for large initial syncs, then start watching:
  # Do the initial sync without watch
  python file_sync_utility.py source dest
  # Then start watching
  python file_sync_utility.py source dest --watch
If files are not syncing, check:
- The file is not matched by an ignore pattern
- The source directory path is correct
- The destination directory is writable
- file_sync.log for errors
If CPU usage is high:
- Increase the check interval: --check-interval 5
- Add more ignore patterns
- Check for an infinite loop (destination located inside the source directory)
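When the destination sits inside the source, files copied into it show up as new changes in the source tree, which can repeat indefinitely. A simple guard, assuming the check runs once at startup (the function name is hypothetical), is to resolve both paths and refuse to run when the destination is the source or one of its descendants.

```python
from pathlib import Path


def dest_inside_source(source: str, destination: str) -> bool:
    """True if destination is the source directory or lies anywhere beneath it."""
    src = Path(source).resolve()       # absolute path, symlinks followed
    dst = Path(destination).resolve()
    return dst == src or src in dst.parents
```

Comparing resolved paths, rather than raw strings, avoids false negatives from relative paths and false positives such as `proj` versus `proj_backup`.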
If conflicts are not resolved as expected, check:
- The conflict strategy setting: --conflict <strategy>
- File permissions
- file_sync.log for conflict messages
If the utility uses excessive memory or resources, possible causes are:
- Very large files (note that hashing reads files in chunks rather than loading them whole)
- Too many files being watched
- Metadata file corruption
Solutions:
- Split large directories
- Use ignore patterns to exclude large files
- Delete .file_sync_metadata.json and restart
To extend this utility:
- Add new conflict resolution strategies in ConflictResolver
- Implement additional hash algorithms in FileHasher
- Add pattern matching features in IgnorePatternMatcher
- Extend metadata storage in MetadataStore
- Add new event handlers in FileSyncEventHandler
This is a portfolio project demonstrating Python development skills including:
- File I/O and system operations
- Event-driven programming with watchdog
- Hash algorithms for change detection
- Configuration management
- Command-line interfaces with argparse
- Professional logging
- Object-oriented design patterns
Created as a portfolio project demonstrating proficiency in:
- Python 3 development
- File system operations
- Hash algorithms (SHA-256)
- Event-driven architecture
- Configuration management
- Error handling and logging
- CLI tool development
- Object-oriented programming
- ✅ watchdog library for file system monitoring
- ✅ File I/O with chunked reading for efficiency
- ✅ Hashing algorithms (SHA-256) for change detection
- ✅ Configuration files (JSON for metadata)
- ✅ Pattern matching (.gitignore-style)
- ✅ Conflict resolution strategies
- ✅ Delta updates (only sync changes)
- ✅ Professional logging
- ✅ Command-line interfaces
- ✅ Error handling and resilience
- ✅ Object-oriented design
- ✅ Type hints and documentation