# Module 03: File System Operations for Data Science

**Difficulty**: ⭐⭐ (Intermediate)

**Estimated Time**: 75 minutes

**Prerequisites**: 
- Completed Modules 00-02
- Understanding of pathlib basics
- Familiarity with file I/O operations

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Perform** batch file operations on hundreds of files efficiently
2. **Search** for files using patterns and filters
3. **Monitor** file system changes for data pipelines
4. **Organize** datasets with automated file management
5. **Work with** archives and compressed files
6. **Handle** file permissions and attributes on Windows

## Introduction: Why File Operations Matter for Data Scientists

Data scientists work with files constantly:
- **Datasets**: CSV, JSON, Parquet, HDF5 files
- **Models**: Pickle files, SavedModel directories, checkpoint files
- **Results**: Plots, reports, logs, predictions
- **Code**: Notebooks, scripts, configuration files

### Real-World Scenarios

**Scenario 1: Daily Data Ingestion**
- 50 CSV files arrive daily in a folder
- Need to process new files only (not already processed)
- Move processed files to archive
- Generate summary report

**Scenario 2: Model Checkpoint Management**
- Deep learning training saves checkpoints every epoch
- 100+ checkpoint files accumulate
- Need to keep only best 5 and latest 3
- Delete others to save disk space

**Scenario 3: Dataset Organization**
- Downloaded dataset has messy structure
- Images scattered across folders
- Need to reorganize by category
- Create train/validation/test splits

This module teaches you to automate all these tasks!

In [None]:
# Setup: Import required libraries
import os
import sys
from pathlib import Path
import shutil
import glob
import time
from datetime import datetime, timedelta
import json

print("Setup complete!")

## 1. Advanced pathlib Techniques

While we covered pathlib basics in Module 00, let's explore advanced features essential for data science workflows.

### Why pathlib Over os.path?

**Old way (os.path)**:
```python
import os
data_dir = os.path.join(os.getcwd(), 'data', 'raw')
files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) 
         if os.path.isfile(os.path.join(data_dir, f))]
```

**New way (pathlib)**:
```python
from pathlib import Path
data_dir = Path.cwd() / 'data' / 'raw'
files = [f for f in data_dir.iterdir() if f.is_file()]
```

**Benefits**:
- More readable and intuitive
- Object-oriented (paths are objects with methods)
- Cross-platform (handles Windows/Linux path differences)
- Chainable operations
- Better error messages
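
To make the "paths are objects" and "chainable operations" points concrete, here is a minimal sketch. The file name is hypothetical, and `PurePosixPath` is used only so the printed results are the same on every operating system; in real code you would normally use `Path`.

```python
from pathlib import PurePosixPath

# PurePosixPath keeps the output identical on Windows and Linux;
# swap in Path when working with real files.
raw_file = PurePosixPath("data/raw/sales_2024.csv")  # hypothetical file name

print(raw_file.name)    # sales_2024.csv
print(raw_file.stem)    # sales_2024
print(raw_file.suffix)  # .csv
print(raw_file.parent)  # data/raw

# Chained operations: point to a sibling "processed" folder and change the extension
processed_file = raw_file.parent.parent / "processed" / raw_file.with_suffix(".parquet").name
print(processed_file)   # data/processed/sales_2024.parquet
```

None of these calls touch the disk; they only manipulate the path itself.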

### 1.1 Finding Files with Patterns

In [None]:
# Find all files matching a pattern
# Essential for working with datasets split across multiple files

def find_files_by_pattern(directory, pattern='*'):
    """
    Find all files matching a pattern in directory (recursive).
    
    Args:
        directory: Path to search in
        pattern: Glob pattern (e.g., '*.csv', 'data_*.json')
    
    Returns:
        list: List of Path objects matching pattern
    """
    directory = Path(directory)
    
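    # rglob() searches recursively; it is equivalent to glob('**/' + pattern).
    # Keep files only, since rglob() also returns matching directories.
    return sorted(p for p in directory.rglob(pattern) if p.is_file())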

# Example: Find all .ipynb files in project
project_root = Path.cwd().parent
notebooks = find_files_by_pattern(project_root, '*.ipynb')

print(f"Found {len(notebooks)} notebook(s):")
for nb in notebooks[:5]:  # Show first 5
    # Get path relative to project root for clean display
    rel_path = nb.relative_to(project_root)
    print(f"  {rel_path}")

if len(notebooks) > 5:
    print(f"  ... and {len(notebooks) - 5} more")

### 1.2 Filtering Files by Properties

In [None]:
# Filter files by size, date, extension, etc.
# Useful for finding large datasets, recent files, etc.

def filter_files(directory, 
                 min_size=None,  # bytes
                 max_size=None,  # bytes
                 extensions=None,  # list of extensions
                 modified_after=None,  # datetime object
                 modified_before=None):  # datetime object
    """
    Filter files by multiple criteria.
    
    Args:
        directory: Directory to search
        min_size: Minimum file size in bytes
        max_size: Maximum file size in bytes
        extensions: List of extensions (e.g., ['.csv', '.json'])
        modified_after: Files modified after this datetime
        modified_before: Files modified before this datetime
    
    Returns:
        list: Filtered list of Path objects
    """
    directory = Path(directory)
    results = []
    
    for file_path in directory.rglob('*'):
        if not file_path.is_file():
            continue
        
        # Read metadata once and reuse it for the size and time checks
        file_stat = file_path.stat()
        
        # Check size (compare against None so that a limit of 0 still applies)
        file_size = file_stat.st_size
        if min_size is not None and file_size < min_size:
            continue
        if max_size is not None and file_size > max_size:
            continue
        
        # Check extension (case-insensitive on both sides)
        if extensions and file_path.suffix.lower() not in {ext.lower() for ext in extensions}:
            continue
        
        # Check modification time
        mod_time = datetime.fromtimestamp(file_stat.st_mtime)
        if modified_after and mod_time < modified_after:
            continue
        if modified_before and mod_time > modified_before:
            continue
        
        results.append(file_path)
    
    return sorted(results)

# Example: Find large files (>100KB) modified in last 7 days
seven_days_ago = datetime.now() - timedelta(days=7)
large_recent_files = filter_files(
    project_root,
    min_size=100 * 1024,  # 100KB
    modified_after=seven_days_ago
)

print(f"\nLarge files (>100KB) modified in last 7 days: {len(large_recent_files)}")
for file_path in large_recent_files[:3]:
    size_kb = file_path.stat().st_size / 1024
    mod_time = datetime.fromtimestamp(file_path.stat().st_mtime)
    print(f"  {file_path.name}: {size_kb:.1f}KB, modified {mod_time.strftime('%Y-%m-%d')}")

## 2. Batch File Operations

When working with datasets, you often need to process hundreds or thousands of files. Doing this manually is error-prone and time-consuming.

### Common Batch Operations in Data Science

1. **Renaming**: Standardize filenames (e.g., `IMG001.jpg` → `train_001.jpg`)
2. **Moving**: Organize files into folders (e.g., by date, category)
3. **Copying**: Create backups, duplicate for different experiments
4. **Deleting**: Remove old checkpoints, temporary files

### Safety Best Practices

Before batch operations:
1. ✅ **Test on small subset first** (5-10 files)
2. ✅ **Create backups** of important data
3. ✅ **Log all operations** for audit trail
4. ✅ **Use dry-run mode** to preview changes
5. ✅ **Validate** after operation completes
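
The dry-run and logging practices above can be combined in one small pattern. The following is a minimal sketch, not a definitive implementation: the `delete_files` helper and the `batch_ops` logger name are hypothetical, and the demonstration only touches throwaway files in a temporary directory.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("batch_ops")  # hypothetical logger name

def delete_files(paths, dry_run=True):
    """Delete files, or only report what would be deleted when dry_run is True."""
    affected = []
    for path in map(Path, paths):
        if not path.is_file():
            logger.warning("Skipping (not a file): %s", path)
            continue
        if dry_run:
            logger.info("Would delete: %s", path)
        else:
            path.unlink()
            logger.info("Deleted: %s", path)
        affected.append(path)
    return affected

# Demonstration on throwaway files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    tmp_files = []
    for i in range(3):
        p = Path(tmp) / f"temp_{i}.txt"
        p.write_text("scratch")
        tmp_files.append(p)

    previewed = delete_files(tmp_files, dry_run=True)
    still_there = all(p.exists() for p in tmp_files)  # dry run leaves files in place
    deleted = delete_files(tmp_files, dry_run=False)
    remaining = [p for p in tmp_files if p.exists()]

print(len(previewed), still_there, len(deleted), len(remaining))
```

Calling `logging.basicConfig(filename=..., level=logging.INFO)` before the operation would send these messages to a file of your choice, which gives you the audit trail.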

### 2.1 Batch Renaming with Patterns

In [None]:
# Batch rename files with pattern-based rules
# Common for preparing datasets for ML training

def batch_rename(directory, pattern, replacement, dry_run=True):
    """
    Rename multiple files using pattern matching.
    
    Args:
        directory: Directory containing files
        pattern: String pattern to find in filenames
        replacement: String to replace pattern with
        dry_run: If True, only show what would be renamed (default: True)
    
    Returns:
        int: Number of files renamed (or would be renamed if dry_run)
    """
    directory = Path(directory)
    renamed_count = 0
    
    print(f"{'DRY RUN: ' if dry_run else ''}Renaming files in {directory}")
    print(f"Pattern: '{pattern}' → '{replacement}'")
    print()
    
    # Snapshot the listing first so files renamed below are not visited twice
    for file_path in sorted(directory.iterdir()):
        if not file_path.is_file():
            continue
        
        # Check if pattern exists in filename
        if pattern in file_path.name:
            new_name = file_path.name.replace(pattern, replacement)
            new_path = file_path.parent / new_name
            
            # Never overwrite an existing file (checked in dry-run mode too,
            # so the preview count matches what a real run would do)
            if new_path.exists():
                print(f"  ⚠ Skipped: {file_path.name} ({new_name} already exists)")
                continue
            
            print(f"  {file_path.name} → {new_name}")
            
            if not dry_run:
                file_path.rename(new_path)
            
            renamed_count += 1
    
    print(f"\n{'Would rename' if dry_run else 'Renamed'} {renamed_count} file(s)")
    if dry_run:
        print("Set dry_run=False to actually rename files")
    
    return renamed_count

# Example: Preview renaming (dry run)
# This won't actually rename anything
notebooks_dir = Path.cwd()
batch_rename(notebooks_dir, '_', '-', dry_run=True)

### 2.2 Organizing Files by Date

In [None]:
# Organize files into date-based folders
# Useful for organizing daily data dumps or logs

def organize_by_date(source_dir, dest_dir, dry_run=True):
    """
    Organize files into YYYY/MM/DD folder structure based on modification date.
    
    Args:
        source_dir: Directory containing files to organize
        dest_dir: Destination directory for organized structure
        dry_run: If True, only show what would be done
    
    Returns:
        int: Number of files organized
    """
    source_dir = Path(source_dir)
    dest_dir = Path(dest_dir)
    organized_count = 0
    
    print(f"{'DRY RUN: ' if dry_run else ''}Organizing files by date")
    print(f"Source: {source_dir}")
    print(f"Destination: {dest_dir}")
    print()
    
    # Snapshot the listing first because files are moved out while we iterate
    for file_path in sorted(source_dir.iterdir()):
        if not file_path.is_file():
            continue
        
        # Get modification date
        mod_time = datetime.fromtimestamp(file_path.stat().st_mtime)
        
        # Create date-based path: YYYY/MM/DD/filename
        date_folder = dest_dir / str(mod_time.year) / f"{mod_time.month:02d}" / f"{mod_time.day:02d}"
        dest_file = date_folder / file_path.name
        
        print(f"  {file_path.name} → {date_folder.relative_to(dest_dir)}/")
        
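        if not dry_run:
            # Create directory structure
            date_folder.mkdir(parents=True, exist_ok=True)
            
            # Move file, but never overwrite one that is already in the destination
            if dest_file.exists():
                print(f"    ⚠ Skipped: {dest_file.name} already exists in destination")
                continue
            shutil.move(str(file_path), str(dest_file))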
        
        organized_count += 1
    
    print(f"\n{'Would organize' if dry_run else 'Organized'} {organized_count} file(s)")
    return organized_count

# Example: Preview organization (dry run)
# organize_by_date(source_dir, archive_dir, dry_run=True)
print("File organization function ready!")
print("Example usage: organize_by_date('data/raw', 'data/archive', dry_run=True)")

### 2.3 Batch Copying with Progress

In [None]:
# Copy multiple files with progress tracking
# Essential for backing up datasets or creating train/test splits

def batch_copy(source_files, dest_dir, show_progress=True):
    """
    Copy multiple files to a destination directory.
    
    Args:
        source_files: List of Path objects or strings
        dest_dir: Destination directory
        show_progress: Print progress updates
    
    Returns:
        dict: Statistics (copied, failed, skipped)
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    
    stats = {'copied': 0, 'failed': 0, 'skipped': 0}
    total = len(source_files)
    
    for i, source_file in enumerate(source_files, 1):
        source_file = Path(source_file)
        dest_file = dest_dir / source_file.name
        
        try:
            # Skip if destination exists
            if dest_file.exists():
                stats['skipped'] += 1
                if show_progress:
                    print(f"[{i}/{total}] Skipped: {source_file.name} (already exists)")
                continue
            
            # Copy file
            shutil.copy2(source_file, dest_file)  # copy2 preserves metadata
            stats['copied'] += 1
            
            if show_progress:
                size_mb = source_file.stat().st_size / (1024 * 1024)
                print(f"[{i}/{total}] Copied: {source_file.name} ({size_mb:.2f} MB)")
        
        except Exception as e:
            stats['failed'] += 1
            print(f"[{i}/{total}] Failed: {source_file.name} - {e}")
    
    # Print summary
    print(f"\nSummary:")
    print(f"  Copied: {stats['copied']}")
    print(f"  Skipped: {stats['skipped']}")
    print(f"  Failed: {stats['failed']}")
    
    return stats

# Example: Copy all notebooks to backup folder
# backup_dir = project_root / 'backup'
# batch_copy(notebooks, backup_dir)
print("Batch copy function ready!")

## 3. Working with Archives

Data scientists frequently work with compressed files:
- Downloaded datasets (often .zip or .tar.gz)
- Model backups (compress checkpoints)
- Sharing results (compress output folder)

Python's `shutil` module provides easy archive handling.
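
As a quick illustration, here is a minimal round trip: archive a small folder, unpack it elsewhere, and read a file back. It is a sketch under stated assumptions: the folder and file names are hypothetical, and everything happens inside a temporary directory that is deleted at the end.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Build a tiny directory to archive (hypothetical names)
    results_dir = tmp / "results"
    results_dir.mkdir()
    (results_dir / "metrics.json").write_text('{"accuracy": 0.9}')

    # base_name has no extension; make_archive appends it and returns the full path
    archive_file = Path(shutil.make_archive(
        base_name=str(tmp / "results_backup"),
        format="zip",
        root_dir=str(tmp),
        base_dir="results",
    ))
    archive_name = archive_file.name

    # unpack_archive picks the format from the file extension
    restore_dir = tmp / "restored"
    shutil.unpack_archive(str(archive_file), str(restore_dir))
    restored_text = (restore_dir / "results" / "metrics.json").read_text()

print(archive_name)   # results_backup.zip
print(restored_text)  # {"accuracy": 0.9}
```

Because `base_dir` is `"results"`, the archive contains a top-level `results/` folder, which is why the restored file sits at `restored/results/metrics.json`.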

### 3.1 Creating Archives

In [None]:
# Create compressed archives of directories
# Useful for backing up models or sharing datasets

def create_archive(source_dir, archive_name, format='zip'):
    """
    Create a compressed archive of a directory.
    
    Args:
        source_dir: Directory to archive
        archive_name: Name for archive (without extension)
        format: Archive format ('zip', 'tar', 'gztar', 'bztar', 'xztar')
    
    Returns:
        Path: Path to created archive
    """
    source_dir = Path(source_dir)
    
    if not source_dir.exists():
        raise FileNotFoundError(f"Source directory not found: {source_dir}")
    
    print(f"Creating {format} archive: {archive_name}")
    print(f"Source: {source_dir}")
    
    # base_name is given without an extension; shutil.make_archive appends it
    # and returns the full path of the archive file
    archive_path = shutil.make_archive(
        base_name=archive_name,
        format=format,
        root_dir=source_dir.parent,
        base_dir=source_dir.name
    )
    
    archive_path = Path(archive_path)
    size_mb = archive_path.stat().st_size / (1024 * 1024)
    
    print(f"✓ Created: {archive_path.name} ({size_mb:.2f} MB)")
    
    return archive_path

# Example: Create zip archive of notebooks directory
# archive = create_archive(notebooks_dir, 'notebooks_backup', format='zip')
print("Archive creation function ready!")

### 3.2 Extracting Archives

In [None]:
# Extract compressed archives
# Useful for working with downloaded datasets

def extract_archive(archive_path, extract_to=None):
    """
    Extract a compressed archive.
    
    Args:
        archive_path: Path to archive file
        extract_to: Directory to extract to (default: same as archive)
    
    Returns:
        Path: Path to extraction directory
    """
    archive_path = Path(archive_path)
    
    if not archive_path.exists():
        raise FileNotFoundError(f"Archive not found: {archive_path}")
    
    if extract_to is None:
        extract_to = archive_path.parent
    else:
        extract_to = Path(extract_to)
    
    extract_to.mkdir(parents=True, exist_ok=True)
    
    print(f"Extracting: {archive_path.name}")
    print(f"To: {extract_to}")
    
    # shutil.unpack_archive infers the format from the file extension
    shutil.unpack_archive(archive_path, extract_to)
    
    print(f"✓ Extracted successfully")
    
    return extract_to

# Example: Extract dataset
# extract_archive('data/raw/dataset.zip', 'data/raw/')
print("Archive extraction function ready!")

## 4. File Monitoring and Watching

Sometimes you need to detect when new files appear (e.g., for automated data pipelines).

### Use Cases
- **Data ingestion**: Process new CSV files as they arrive
- **Model monitoring**: Detect when new checkpoints are saved
- **Log monitoring**: Parse new log entries in real-time

### Simple Polling Approach

We'll implement a simple file watcher using polling. For production, consider the `watchdog` library, which relies on operating-system notifications instead of repeated directory scans.
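
The core idea behind polling is to compare two snapshots of a directory. The sketch below illustrates that idea on its own, assuming we also want to notice removed and modified files. The helper names `take_snapshot` and `diff_snapshots` are hypothetical and not part of any library.

```python
import tempfile
from pathlib import Path

def take_snapshot(directory):
    """Map each file name in directory to its (mtime_ns, size)."""
    snapshot = {}
    for path in Path(directory).iterdir():
        if path.is_file():
            info = path.stat()
            snapshot[path.name] = (info.st_mtime_ns, info.st_size)
    return snapshot

def diff_snapshots(old, new):
    """Return (added, removed, modified) file names between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
    return added, removed, modified

with tempfile.TemporaryDirectory() as tmp:
    incoming = Path(tmp)
    (incoming / "a.csv").write_text("x")
    (incoming / "b.csv").write_text("y")
    before = take_snapshot(incoming)

    (incoming / "c.csv").write_text("z")          # new file
    (incoming / "b.csv").unlink()                 # removed file
    (incoming / "a.csv").write_text("x,updated")  # changed content (size differs)
    after = take_snapshot(incoming)

added, removed, modified = diff_snapshots(before, after)
print(added, removed, modified)  # ['c.csv'] ['b.csv'] ['a.csv']
```

A polling loop would simply call `take_snapshot()` every few seconds and act on the differences.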

In [None]:
# Simple file watcher using polling
# Detects new files added to a directory

class SimpleFileWatcher:
    """
    Watch a directory for new files.
    
    Example:
        watcher = SimpleFileWatcher('data/incoming')
        new_files = watcher.check_for_new_files()
        for file in new_files:
            process_file(file)
    """
    
    def __init__(self, directory):
        """Initialize watcher for a directory."""
        self.directory = Path(directory)
        self.known_files = set()
        
        # Initialize with existing files
        if self.directory.exists():
            self.known_files = {f for f in self.directory.iterdir() if f.is_file()}
    
    def check_for_new_files(self):
        """
        Check for files added since last check.
        
        Returns:
            set: Set of new file Path objects
        """
        if not self.directory.exists():
            return set()
        
        current_files = {f for f in self.directory.iterdir() if f.is_file()}
        new_files = current_files - self.known_files
        
        # Update known files
        self.known_files = current_files
        
        return new_files
    
    def watch(self, callback, interval=1, duration=10):
        """
        Watch directory and call callback for new files.
        
        Args:
            callback: Function to call with new file path
            interval: Seconds between checks
            duration: Total seconds to watch (None for infinite)
        """
        print(f"Watching {self.directory} for new files...")
        print(f"Interval: {interval}s, Duration: {duration}s")
        print("Press Ctrl+C to stop\n")
        
        start_time = time.time()
        
        try:
            while True:
                new_files = self.check_for_new_files()
                
                for file_path in new_files:
                    print(f"New file detected: {file_path.name}")
                    callback(file_path)
                
                # Check if duration exceeded
                if duration and (time.time() - start_time) >= duration:
                    print(f"\nWatch duration completed ({duration}s)")
                    break
                
                time.sleep(interval)
        
        except KeyboardInterrupt:
            print("\nWatch stopped by user")

# Example callback function
def process_new_file(file_path):
    """Example: Process a new file."""
    print(f"  Processing: {file_path.name}")
    # Add your processing logic here

# Example usage (commented out):
# watcher = SimpleFileWatcher('data/incoming')
# watcher.watch(process_new_file, interval=2, duration=10)

print("File watcher ready!")
print("Example: watcher = SimpleFileWatcher('data/incoming')")
print("         watcher.watch(process_new_file, interval=2, duration=10)")

## 5. Practical Automation Example: Dataset Cleanup

Let's combine everything into a real-world automation script: cleaning up old model checkpoints while keeping the best ones.
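
One detail is worth sketching separately: how "best" could be decided. The example below assumes checkpoint names embed a validation metric, as in `model_epoch10_val_acc_0.95.h5`, and that a higher value is better. The regular expression and the helper names `parse_metric` and `select_best` are illustrative assumptions, so adapt them to your own naming scheme.

```python
import re

# Assumed naming convention: ..._val_acc_<number>.h5
METRIC_PATTERN = re.compile(r"val_acc_(\d+(?:\.\d+)?)\.h5$")

def parse_metric(filename):
    """Return the metric embedded in a checkpoint filename, or None if absent."""
    match = METRIC_PATTERN.search(filename)
    return float(match.group(1)) if match else None

def select_best(filenames, keep_best=5):
    """Return up to keep_best filenames with the highest parsed metric."""
    scored = [(parse_metric(name), name) for name in filenames]
    scored = [(metric, name) for metric, name in scored if metric is not None]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored[:keep_best]]

names = [
    "model_epoch01_val_acc_0.71.h5",
    "model_epoch02_val_acc_0.83.h5",
    "model_epoch03_val_acc_0.95.h5",
    "model_epoch04_val_acc_0.90.h5",
    "notes.txt",  # ignored: no metric in the name
]
print(select_best(names, keep_best=2))
# ['model_epoch03_val_acc_0.95.h5', 'model_epoch04_val_acc_0.90.h5']
```

For a metric where lower is better (a loss, for example), drop `reverse=True`.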

In [None]:
# Automated checkpoint cleanup
# Keeps only best N and latest M checkpoints, deletes rest

def cleanup_checkpoints(checkpoint_dir, 
                       keep_best=5, 
                       keep_latest=3,
                       dry_run=True):
    """
    Clean up old model checkpoints intelligently.
    
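    Checkpoint filenames are expected to contain a metric value,
    e.g., 'model_epoch10_val_acc_0.95.h5'. This simplified version does not
    parse the metric; it uses modification time as a stand-in for "best".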
    
    Args:
        checkpoint_dir: Directory containing checkpoints
        keep_best: Number of best checkpoints to keep (by metric)
        keep_latest: Number of most recent checkpoints to keep
        dry_run: If True, only show what would be deleted
    
    Returns:
        dict: Statistics about cleanup
    """
    checkpoint_dir = Path(checkpoint_dir)
    
    if not checkpoint_dir.exists():
        print(f"Checkpoint directory not found: {checkpoint_dir}")
        return {'deleted': 0, 'kept': 0}
    
    # Get all checkpoint files
    checkpoints = sorted(
        [f for f in checkpoint_dir.glob('*.h5')],
        key=lambda x: x.stat().st_mtime
    )
    
    if not checkpoints:
        print("No checkpoint files found")
        return {'deleted': 0, 'kept': 0}
    
    print(f"Found {len(checkpoints)} checkpoint(s)")
    print(f"Policy: Keep best {keep_best} + latest {keep_latest}")
    print()
    
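    # Keep latest N (guard against N == 0, because checkpoints[-0:] is the whole list)
    latest_checkpoints = set(checkpoints[-keep_latest:]) if keep_latest > 0 else set()
    
    # "Best" is approximated by modification time (proxy for training progress),
    # so this set overlaps with the latest one.
    # In a real scenario, parse the metric from the filename and rank by it.
    best_checkpoints = set(checkpoints[-keep_best:]) if keep_best > 0 else set()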
    
    # Files to keep
    keep_files = latest_checkpoints | best_checkpoints
    
    # Files to delete
    delete_files = set(checkpoints) - keep_files
    
    # Show what will be kept
    print(f"Keeping {len(keep_files)} checkpoint(s):")
    for f in sorted(keep_files, key=lambda x: x.stat().st_mtime):
        size_mb = f.stat().st_size / (1024 * 1024)
        tags = []
        if f in latest_checkpoints:
            tags.append('latest')
        if f in best_checkpoints:
            tags.append('best')
        print(f"  ✓ {f.name} ({size_mb:.2f} MB) [{', '.join(tags)}]")
    
    print()
    
    # Show/perform deletions
    if delete_files:
        total_size_mb = sum(f.stat().st_size for f in delete_files) / (1024 * 1024)
        
        print(f"{'Would delete' if dry_run else 'Deleting'} {len(delete_files)} checkpoint(s) ({total_size_mb:.2f} MB):")
        for f in sorted(delete_files, key=lambda x: x.stat().st_mtime):
            size_mb = f.stat().st_size / (1024 * 1024)
            print(f"  ✗ {f.name} ({size_mb:.2f} MB)")
            
            if not dry_run:
                f.unlink()
        
        if dry_run:
            print(f"\nDRY RUN: No files actually deleted")
            print(f"Set dry_run=False to perform cleanup")
    else:
        print("No checkpoints need deletion")
    
    return {'deleted': len(delete_files), 'kept': len(keep_files)}

# Example usage (dry run)
# cleanup_checkpoints('models/checkpoints', keep_best=5, keep_latest=3, dry_run=True)
print("Checkpoint cleanup function ready!")

## 6. Practice Exercises

### Exercise 1: Dataset Organization Tool

Create a function that organizes image files into train/val/test folders:
1. Find all .jpg and .png files in a directory
2. Split them: 70% train, 20% validation, 10% test
3. Copy (don't move) files to respective folders
4. Print statistics (count and total size per split)

**Hint**: Use `random.sample()` for random splitting, `batch_copy()` from above
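
As a starting point, here is one possible building block for the splitting step only: a reproducible shuffle followed by slicing. It is a sketch, with a hypothetical helper name and made-up file names. Finding and copying the files is still up to you.

```python
import random

def split_items(items, train_ratio=0.7, val_ratio=0.2, seed=42):
    """Shuffle items reproducibly and slice them into train/val/test lists."""
    items = list(items)
    shuffled = random.Random(seed).sample(items, k=len(items))
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test

names = [f"img_{i:03d}.jpg" for i in range(20)]  # hypothetical file names
train, val, test = split_items(names)
print(len(train), len(val), len(test))  # 14 4 2
```

Giving the remainder to the test set keeps every file in exactly one split even when the ratios do not divide the file count evenly.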

In [None]:
# Exercise 1: Your solution here
import random

def create_train_val_test_split(source_dir, output_dir, train_ratio=0.7, val_ratio=0.2, test_ratio=0.1):
    """
    Split images into train/val/test sets.
    
    Args:
        source_dir: Directory containing images
        output_dir: Output directory for splits
        train_ratio: Fraction for training set
        val_ratio: Fraction for validation set
        test_ratio: Fraction for test set
    """
    # TODO: Implement this function
    pass

# Test your function
# create_train_val_test_split('data/images', 'data/split')


### Exercise 2: Duplicate File Finder

Create a function to find duplicate files based on content:
1. Calculate hash (MD5 or SHA256) for each file
2. Group files with same hash
3. Report duplicates with sizes
4. Calculate total wasted space

**Hint**: Use `hashlib` module for hashing
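
Datasets can be large, so it helps to hash files in chunks instead of reading them fully into memory. The following sketch shows one way to do that. The helper name `file_hash` and the 1 MB chunk size are assumptions, and grouping files by hash is left for the exercise.

```python
import hashlib
import tempfile
from pathlib import Path

def file_hash(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file's contents chunk by chunk and return the hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.bin"
    copy = Path(tmp) / "a_copy.bin"
    other = Path(tmp) / "b.bin"
    first.write_bytes(b"same content")
    copy.write_bytes(b"same content")
    other.write_bytes(b"different content")

    same = file_hash(first) == file_hash(copy)
    differs = file_hash(first) != file_hash(other)

print(same, differs)  # True True
```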

In [None]:
# Exercise 2: Your solution here
import hashlib

def find_duplicates(directory):
    """
    Find duplicate files in directory tree.
    
    Args:
        directory: Directory to search
    
    Returns:
        dict: Hash -> list of duplicate file paths
    """
    # TODO: Implement this function
    pass

# Test your function
# duplicates = find_duplicates('data/')


### Exercise 3: Automated Backup System

Create an automated backup system:
1. Monitor a source directory for changes (use SimpleFileWatcher)
2. When new files appear, copy them to backup directory
3. Organize backups by date in YYYY-MM-DD folders
4. Keep only last 7 days of backups (delete older)
5. Log all operations to a file

**Hint**: Combine file watching, organization, and cleanup techniques from above
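
For the retention step, one possible building block is deciding which date-named folders have expired. The sketch below works on folder names only and deletes nothing. The helper name `expired_folders` is hypothetical, and passing `today` explicitly is a design choice that keeps the function easy to test.

```python
from datetime import date, timedelta

def expired_folders(folder_names, today, retention_days=7):
    """Return YYYY-MM-DD folder names dated more than retention_days before today."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in folder_names:
        try:
            folder_date = date.fromisoformat(name)
        except ValueError:
            continue  # ignore folders that are not named YYYY-MM-DD
        if folder_date < cutoff:
            expired.append(name)
    return sorted(expired)

names = ["2024-03-01", "2024-03-05", "2024-03-09", "misc"]
print(expired_folders(names, today=date(2024, 3, 10)))  # ['2024-03-01']
```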

In [None]:
# Exercise 3: Your solution here

class AutomatedBackupSystem:
    """
    Automated backup system with retention policy.
    """
    
    def __init__(self, source_dir, backup_dir, retention_days=7):
        # TODO: Initialize the backup system
        pass
    
    def backup_file(self, file_path):
        # TODO: Backup a file with date organization
        pass
    
    def cleanup_old_backups(self):
        # TODO: Remove backups older than retention_days
        pass
    
    def start_watching(self, interval=5):
        # TODO: Start watching for new files
        pass

# Test your system
# backup_system = AutomatedBackupSystem('data/source', 'data/backup', retention_days=7)
# backup_system.start_watching(interval=5)


### Exercise 4: Smart Archiver

Create an intelligent archiving tool:
1. Find all folders in a directory
2. For folders not modified in last 30 days:
   - Create compressed archive
   - Verify archive integrity
   - Delete original folder
3. Report space saved
4. Create an index file listing archived folders

**Hint**: Check modification time, use `create_archive()`, verify by extracting to temp location
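
The verification step might look roughly like the sketch below: extract the archive to a temporary folder and compare file lists and sizes with the original. It assumes the archive was created with the folder itself as its top-level entry, as `create_archive()` does. The helper names are hypothetical, and a stricter check could compare content hashes instead of sizes.

```python
import shutil
import tempfile
from pathlib import Path

def relative_files(root):
    """Return the sorted relative paths of all files under root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())

def archive_matches_source(archive_path, source_dir):
    """Extract the archive to a temporary folder and compare file lists and sizes."""
    source_dir = Path(source_dir)
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(str(archive_path), scratch)
        extracted_root = Path(scratch) / source_dir.name
        expected = relative_files(source_dir)
        if relative_files(extracted_root) != expected:
            return False
        return all(
            (extracted_root / rel).stat().st_size == (source_dir / rel).stat().st_size
            for rel in expected
        )

with tempfile.TemporaryDirectory() as tmp:
    experiment = Path(tmp) / "exp_001"  # hypothetical folder name
    (experiment / "logs").mkdir(parents=True)
    (experiment / "logs" / "run.txt").write_text("epoch 1 done")
    archive = shutil.make_archive(str(Path(tmp) / "exp_001"), "zip",
                                  root_dir=tmp, base_dir="exp_001")
    verified = archive_matches_source(archive, experiment)

print(verified)  # True
```

Only delete the original folder after a check like this succeeds.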

In [None]:
# Exercise 4: Your solution here

def smart_archiver(base_dir, archive_after_days=30, dry_run=True):
    """
    Archive old folders automatically.
    
    Args:
        base_dir: Directory to search
        archive_after_days: Archive folders older than this
        dry_run: Preview only if True
    
    Returns:
        dict: Statistics about archiving
    """
    # TODO: Implement smart archiving
    pass

# Test your archiver
# smart_archiver('data/experiments', archive_after_days=30, dry_run=True)


## 7. Summary

### Key Concepts

1. **Advanced pathlib**
   - `rglob()` for recursive pattern matching
   - Filtering by size, date, extension
   - Cross-platform path handling

2. **Batch Operations**
   - Always use dry-run mode first
   - Show progress for user feedback
   - Handle errors gracefully
   - Log operations for audit trail

3. **Archive Management**
   - `shutil.make_archive()` for compression
   - `shutil.unpack_archive()` for extraction
   - Format detection from the file extension

4. **File Monitoring**
   - Simple polling with set operations
   - Callback-based processing
   - Production: use `watchdog` library

5. **Practical Patterns**
   - Checkpoint cleanup (keep best + latest)
   - Date-based organization
   - Backup automation
   - Duplicate detection

### Real-World Applications

- **Data pipelines**: Auto-process incoming files
- **Model management**: Clean up old checkpoints
- **Dataset preparation**: Organize and split data
- **Backup automation**: Scheduled backups with retention
- **Space management**: Archive old experiments

### Safety Checklist

Before running file operations:
- [ ] Test on small subset first
- [ ] Use dry-run mode
- [ ] Have backups of important data
- [ ] Validate results after operation
- [ ] Log all changes for audit trail

### What's Next?

In **Module 04: Process & Service Management**, you'll learn to:
- Monitor running processes
- Manage Windows services
- Kill hung processes
- Restart failed services automatically

### Self-Assessment

Before moving on, make sure you can:
- [ ] Find files using patterns and filters
- [ ] Perform batch operations safely (with dry-run)
- [ ] Create and extract archives
- [ ] Watch directories for new files
- [ ] Organize files programmatically

---

**Continue to Module 04** when ready!