# SyftBox File Watcher Tutorial

This tutorial demonstrates how to use the SyftBox file watcher to monitor file changes in your SyftBox directories.

## Overview

The SyftBox file watcher provides two main components:

1. **syftbox_watcher.py** - A standalone REST API service for monitoring multiple SyftBox directories
2. **syftbox_monitor.py** - An integrated monitor that works directly with syft_client

Both use `syft_serve` to create FastAPI servers that expose file watching functionality.

## Prerequisites

First, let's ensure all dependencies are installed:

In [None]:
# Install dependencies if needed
import subprocess
import sys

def install_if_missing(package):
    try:
        __import__(package.replace('-', '_'))
    except ImportError:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# Check for required packages
for package in ['watchdog', 'syft-serve', 'requests']:
    install_if_missing(package)

print("✅ All dependencies are installed!")
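The check above imports each package to see whether it exists. A lighter alternative, sketched below, only looks the module up without executing it. The `IMPORT_NAMES` mapping and the `is_installed` helper are illustrative names, not part of SyftBox.

```python
import importlib.util

# Hypothetical mapping for packages whose import name differs from the pip name
IMPORT_NAMES = {"syft-serve": "syft_serve"}

def is_installed(package: str) -> bool:
    """Return True if the package's top-level module can be located (no import is run)."""
    module_name = IMPORT_NAMES.get(package, package.replace("-", "_"))
    return importlib.util.find_spec(module_name) is not None

# Report what is missing instead of installing right away
missing = [p for p in ["watchdog", "syft-serve", "requests"] if not is_installed(p)]
print("Missing packages:", missing or "none")
```

Anything listed as missing can then be passed to the `install_if_missing` function from the cell above.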

In [None]:
# Import required libraries
import syft_client as sc
import requests
import json
import time
import subprocess
from pathlib import Path
import os

## Part 1: Setting Up a SyftBox Directory

First, let's create a test SyftBox directory to monitor:

In [None]:
# Create a test SyftBox directory
test_email = "test@example.com"
syftbox_dir = Path.home() / f"SyftBox_{test_email}"

# Create directory structure
syftbox_dir.mkdir(exist_ok=True)
(syftbox_dir / "datasites").mkdir(exist_ok=True)
(syftbox_dir / "apps").mkdir(exist_ok=True)

print(f"📁 Created SyftBox directory: {syftbox_dir}")
print("\nDirectory structure:")
for item in syftbox_dir.iterdir():
    print(f"  - {item.name}/")

## Part 2: Using the Standalone File Watcher

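The standalone watcher provides a REST API for monitoring SyftBox directories. Let's start it in a subprocess. The cells below assume `syftbox_watcher.py` is in the current working directory and that the service listens on `http://localhost:8000`: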

In [None]:
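# Start the file watcher service in the background
watcher_process = None

def start_watcher():
    global watcher_process
    # Also restart if a previous process has already exited
    if watcher_process is None or watcher_process.poll() is not None:
        print("🚀 Starting SyftBox File Watcher...")
        # Discard output: pipes that are never read can fill up and block the child
        watcher_process = subprocess.Popen(
            [sys.executable, "syftbox_watcher.py"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL
        )
        # Give it time to start
        time.sleep(3)
        if watcher_process.poll() is None:
            print("✅ Watcher service started!")
        else:
            print(f"❌ Watcher exited with code {watcher_process.returncode}")
    else:
        print("⚠️  Watcher already running")

start_watcher()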
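A fixed three-second sleep can be too short on a slow machine and needlessly long on a fast one. As a minimal sketch, the hypothetical helper `wait_for_port` below polls until something accepts TCP connections on the given port. It only confirms that the port is open, not that the watcher's endpoints are ready.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a server is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait a little and retry
            time.sleep(interval)
    return False
```

Assuming the service listens on port 8000 as in the rest of this tutorial, calling `wait_for_port("localhost", 8000)` right after launching the subprocess could replace the sleep.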

### Listing Available SyftBox Directories

In [None]:
# List available SyftBox directories
response = requests.get("http://localhost:8000/list")
data = response.json()

print("📁 Available SyftBox directories:")
for box in data.get("syftboxes", []):
    status = "👀 watching" if box["is_watching"] else "⏸️  not watching"
    print(f"\n  Email: {box['email']}")
    print(f"  Path: {box['path']}")
    print(f"  Status: {status}")

### Starting File Monitoring

In [None]:
# Start monitoring the test SyftBox
response = requests.post(f"http://localhost:8000/start/{test_email}")
result = response.json()

if result["success"]:
    print(f"✅ {result['message']}")
else:
    print(f"❌ {result['error']}")

### Checking Watcher Status

In [None]:
# Check current status
response = requests.get("http://localhost:8000/status")
status = response.json()

print("📊 Watcher Status:")
print(f"  Watching: {status['watching']}")
print(f"  Event count: {status['event_count']}")
print(f"  Queue size: {status['queue_size']}")

### Creating File Changes and Monitoring Events

Let's create some file changes and see how the watcher captures them:

In [None]:
# Create some test files
print("📝 Creating test files...\n")

# Create a data file
data_file = syftbox_dir / "datasites" / "test_data.csv"
data_file.write_text("id,name,value\n1,Alice,100\n2,Bob,200")
print(f"Created: {data_file.name}")

# Create an app file
app_file = syftbox_dir / "apps" / "hello_app.py"
app_file.write_text("print('Hello from SyftBox!')")
print(f"Created: {app_file.name}")

# Modify a file
time.sleep(0.5)
data_file.write_text("id,name,value\n1,Alice,150\n2,Bob,250\n3,Charlie,300")
print(f"Modified: {data_file.name}")

# Delete a file
time.sleep(0.5)
temp_file = syftbox_dir / "temp.txt"
temp_file.write_text("temporary file")
print(f"Created: {temp_file.name}")
time.sleep(0.5)
temp_file.unlink()
print(f"Deleted: {temp_file.name}")

In [None]:
# Get recent events
time.sleep(1)  # Give watcher time to process events

response = requests.get("http://localhost:8000/events?limit=10")
events = response.json()

print(f"\n📋 Captured {len(events)} events:\n")

for event in events:
    event_type = event['event_type']
    file_name = Path(event['src_path']).name
    timestamp = event['timestamp'].split('T')[1].split('.')[0]  # Just time
    
    icon = {
        'created': '➕',
        'modified': '✏️',
        'deleted': '🗑️',
        'moved': '➡️'
    }.get(event_type, '❓')
    
    print(f"{icon} {event_type.upper()}: {file_name} at {timestamp}")
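Once the list grows, a per-type and per-file summary is often easier to read than individual lines. The sketch below works on a hand-written sample that mirrors the fields used above (`event_type`, `src_path`, `timestamp`). The sample data and the `summarize_events` helper are illustrative, not output from the watcher.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical sample shaped like the /events payload used above
sample_events = [
    {"event_type": "created", "src_path": "/tmp/SyftBox/datasites/test_data.csv",
     "timestamp": "2024-01-01T12:00:00.000001"},
    {"event_type": "modified", "src_path": "/tmp/SyftBox/datasites/test_data.csv",
     "timestamp": "2024-01-01T12:00:00.500001"},
    {"event_type": "modified", "src_path": "/tmp/SyftBox/apps/hello_app.py",
     "timestamp": "2024-01-01T12:00:01.000001"},
    {"event_type": "deleted", "src_path": "/tmp/SyftBox/temp.txt",
     "timestamp": "2024-01-01T12:00:02.000001"},
]

def summarize_events(events):
    """Count events per event type and per file name."""
    by_type = Counter(e["event_type"] for e in events)
    by_file = Counter(PurePosixPath(e["src_path"]).name for e in events)
    return by_type, by_file

by_type, by_file = summarize_events(sample_events)
for event_type, count in by_type.most_common():
    print(f"{event_type:10} {count}")
```

The same function can be applied to the `events` list returned by the request above, as long as the payload has that shape.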

### Filtering Events by Time

In [None]:
# Get events from the last 5 seconds
from datetime import datetime, timedelta

since = (datetime.now() - timedelta(seconds=5)).isoformat()
response = requests.get(f"http://localhost:8000/events?since={since}")
recent_events = response.json()

print(f"📋 Events from last 5 seconds: {len(recent_events)}")
for event in recent_events:
    print(f"  - {event['event_type']}: {Path(event['src_path']).name}")
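Formatting the timestamp straight into the URL works for naive local times, but a timezone-aware value contains a `+`, which is read as a space in a query string unless it is percent-encoded. The sketch below builds the query with the standard library. Whether the watcher accepts timezone-aware timestamps is an assumption, and `events_url` is a hypothetical helper.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:8000"  # same address the cells above use

def events_url(since: datetime) -> str:
    """Build the /events URL with a percent-encoded `since` parameter."""
    query = urlencode({"since": since.isoformat()})
    return f"{BASE_URL}/events?{query}"

since_utc = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
url = events_url(since_utc)
print(url)

# Decoding the query gives back the original ISO string
decoded = parse_qs(urlsplit(url).query)["since"][0]
print(decoded)
```

With `requests`, passing `params={"since": since}` to `requests.get` applies the same encoding.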

### Stopping the Watcher

In [None]:
# Stop watching
response = requests.post(f"http://localhost:8000/stop/{test_email}")
result = response.json()

if result["success"]:
    print(f"✅ {result['message']}")
else:
    print(f"❌ {result['error']}")

## Part 3: Using the Integrated Monitor

The integrated monitor works directly with syft_client and supports custom callbacks. Let's explore how to use it:

In [None]:
# Import the monitor components
from syftbox_monitor import SyftBoxMonitor
from pathlib import Path
from watchdog.events import FileSystemEvent

### Creating Custom Callbacks

Let's create some custom callbacks to handle file events:

In [None]:
# Define custom callbacks
event_log = []

def log_event(client: sc.GDriveUnifiedClient, event: FileSystemEvent):
    """Log all events to a list"""
    event_data = {
        'type': event.event_type,
        'path': Path(event.src_path).name,
        'time': time.strftime("%H:%M:%S")
    }
    event_log.append(event_data)
    print(f"📝 Logged: {event_data['type']} - {event_data['path']}")

def notify_csv_changes(client: sc.GDriveUnifiedClient, event: FileSystemEvent):
    """Special handler for CSV files"""
    file_path = Path(event.src_path)
    if file_path.suffix == '.csv':
        print(f"📊 CSV Alert: {file_path.name} was {event.event_type}!")

def auto_backup(client: sc.GDriveUnifiedClient, event: FileSystemEvent):
    """Auto-backup important files"""
    file_path = Path(event.src_path)
    
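    # Skip files already in a backups folder; otherwise each backup copy
    # would raise its own 'created' event and be backed up again
    if file_path.parent.name == "backups":
        return

    # Only backup specific file types
    important_extensions = ['.csv', '.json', '.yaml', '.txt']
    if file_path.suffix in important_extensions and file_path.exists():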
        backup_dir = file_path.parent / "backups"
        backup_dir.mkdir(exist_ok=True)
        
        timestamp = time.strftime("%Y%m%d_%H%M%S")
        backup_path = backup_dir / f"{file_path.stem}_{timestamp}{file_path.suffix}"
        
        import shutil
        shutil.copy2(file_path, backup_path)
        print(f"💾 Backed up: {file_path.name} → {backup_path.name}")

### Setting Up the Monitor

In [None]:
# Create a mock client for demonstration
# In real usage, you'd use: client = sc.create_gdrive_client("your@email.com")
client = sc.GDriveUnifiedClient(email=test_email)
client.my_email = test_email
client.authenticated = True
client.local_syftbox_dir = syftbox_dir

# Create monitor
monitor = SyftBoxMonitor(client)

# Register callbacks
monitor.register_callback('created', log_event)
monitor.register_callback('modified', log_event)
monitor.register_callback('deleted', log_event)
monitor.register_callback('modified', notify_csv_changes)
monitor.register_callback('created', auto_backup)
monitor.register_callback('modified', auto_backup)

print("✅ Monitor configured with callbacks")
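To make the registration calls above more concrete, here is a minimal sketch of how a callback registry of this kind can work. It is not the actual `SyftBoxMonitor` implementation. `CallbackRegistry` and `dispatch` are hypothetical names, and a `SimpleNamespace` stands in for a watchdog event.

```python
from collections import defaultdict
from types import SimpleNamespace

class CallbackRegistry:
    """Illustrative registry: maps an event type to a list of callbacks."""

    def __init__(self, client):
        self.client = client
        self._callbacks = defaultdict(list)

    def register_callback(self, event_type, callback):
        # Several callbacks may share one event type; they run in registration order
        self._callbacks[event_type].append(callback)

    def dispatch(self, event):
        """Call every callback registered for event.event_type; return how many succeeded."""
        succeeded = 0
        for callback in self._callbacks.get(event.event_type, []):
            try:
                callback(self.client, event)
                succeeded += 1
            except Exception as exc:
                # One failing callback should not prevent the others from running
                print(f"Callback {callback.__name__} failed: {exc}")
        return succeeded

seen = []
registry = CallbackRegistry(client=None)
registry.register_callback("modified", lambda c, e: seen.append(("log", e.src_path)))
registry.register_callback("modified", lambda c, e: seen.append(("csv", e.src_path)))

fake_event = SimpleNamespace(event_type="modified", src_path="/tmp/metrics.csv")
handled = registry.dispatch(fake_event)
print(handled, seen)
```

Isolating each callback in its own `try` block is a design choice in this sketch: an error in one handler is reported without silencing the rest.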

In [None]:
# Start monitoring
monitor.start()
print("\n👀 Monitor is running! Let's test it...")

### Testing the Monitor with File Operations

In [None]:
# Create and modify files to trigger callbacks
print("\n🧪 Testing file operations...\n")

# Test 1: Create a CSV file
test_csv = syftbox_dir / "datasites" / "metrics.csv"
test_csv.write_text("metric,value\naccuracy,0.95\nprecision,0.92")
time.sleep(0.5)

# Test 2: Modify the CSV
test_csv.write_text("metric,value\naccuracy,0.96\nprecision,0.93\nrecall,0.91")
time.sleep(0.5)

# Test 3: Create a JSON file
test_json = syftbox_dir / "apps" / "config.json"
test_json.write_text(json.dumps({"version": "1.0", "debug": True}, indent=2))
time.sleep(0.5)

# Test 4: Create and delete a temp file
temp_file = syftbox_dir / "temp_file.tmp"
temp_file.write_text("temporary")
time.sleep(0.5)
temp_file.unlink()
time.sleep(0.5)

In [None]:
# Check the event log
print("\n📋 Event Log Summary:")
print(f"Total events captured: {len(event_log)}\n")

for event in event_log[-10:]:  # Show last 10 events
    print(f"  [{event['time']}] {event['type']:10} - {event['path']}")
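The log may contain more entries than the number of operations performed, because a single write often surfaces as several events, depending on the platform. One way to reduce the noise is to wrap a callback so that repeats for the same event type and path are dropped within a short window. The wrapper below is a sketch. `suppress_repeats` is a hypothetical helper, and the `clock` parameter exists only to make it testable.

```python
import time
from types import SimpleNamespace

def suppress_repeats(callback, wait=0.5, clock=time.monotonic):
    """Wrap callback so repeats of the same (event type, path) within `wait` seconds are dropped."""
    last_fired = {}

    def wrapper(client, event):
        key = (event.event_type, event.src_path)
        now = clock()
        previous = last_fired.get(key)
        if previous is not None and now - previous < wait:
            return False  # too soon after the last accepted event: drop it
        last_fired[key] = now
        callback(client, event)
        return True

    return wrapper

# Demonstration with a fake clock: events at t=0.0, t=0.1 and t=1.0
ticks = iter([0.0, 0.1, 1.0])
calls = []
handler = suppress_repeats(lambda c, e: calls.append(e.src_path),
                           wait=0.5, clock=lambda: next(ticks))
event = SimpleNamespace(event_type="modified", src_path="/tmp/metrics.csv")
results = [handler(None, event) for _ in range(3)]
print(results, len(calls))
```

A callback wrapped this way can be registered like any other, for example `monitor.register_callback('modified', suppress_repeats(log_event))`.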

In [None]:
# Check backup directory
backup_dir = syftbox_dir / "datasites" / "backups"
if backup_dir.exists():
    print("\n💾 Backup Directory Contents:")
    for backup in sorted(backup_dir.iterdir()):
        print(f"  - {backup.name} ({backup.stat().st_size} bytes)")
else:
    print("\n💾 No backups created yet")

In [None]:
# Stop the monitor
monitor.stop()

## Part 4: Practical Use Cases

Here are some practical examples of how to use the file watcher:

### Use Case 1: Auto-sync to Google Drive

This example shows how to automatically sync files to Google Drive when they're added or modified:

In [None]:
def create_gdrive_sync_callback(sync_folder_id: str = None):
    """Create a callback that syncs files to Google Drive"""
    
    def sync_to_gdrive(client: sc.GDriveUnifiedClient, event: FileSystemEvent):
        file_path = Path(event.src_path)
        
        # Skip hidden files, temp files, and backups
        if (file_path.name.startswith('.') or 
            file_path.suffix == '.tmp' or
            'backups' in file_path.parts):
            return
        
        # Get relative path from SyftBox root
        syftbox_dir = client.get_syftbox_directory()
        rel_path = file_path.relative_to(syftbox_dir)
        
        print(f"\n📤 Syncing to GDrive: {rel_path}")
        
        # In a real implementation, you would:
        # 1. Create folder structure in GDrive if needed
        # 2. Upload the file using client._upload_file()
        # 3. Handle conflicts and versioning
        
        # For demo, just show what would be done
        print(f"   Would upload: {file_path.name}")
        print(f"   To folder: SyftBox/{rel_path.parent}")
        print(f"   Size: {file_path.stat().st_size} bytes")
    
    return sync_to_gdrive

# Example usage:
# monitor.register_callback('created', create_gdrive_sync_callback())
# monitor.register_callback('modified', create_gdrive_sync_callback())

### Use Case 2: Data Validation Pipeline

Automatically validate data files when they're added or modified:

In [None]:
def create_data_validator(rules: dict):
    """Create a callback that validates data files based on rules"""
    
    def validate_data(client: sc.GDriveUnifiedClient, event: FileSystemEvent):
        file_path = Path(event.src_path)
        
        # Only validate CSV files in datasites
        if file_path.suffix != '.csv' or 'datasites' not in file_path.parts:
            return
        
        print(f"\n🔍 Validating: {file_path.name}")
        
        try:
            import pandas as pd
            df = pd.read_csv(file_path)
            
            # Check rules
            issues = []
            
            # Example rules
            if 'required_columns' in rules:
                missing = set(rules['required_columns']) - set(df.columns)
                if missing:
                    issues.append(f"Missing columns: {missing}")
            
            if 'max_rows' in rules and len(df) > rules['max_rows']:
                issues.append(f"Too many rows: {len(df)} > {rules['max_rows']}")
            
            if issues:
                print("   ❌ Validation failed:")
                for issue in issues:
                    print(f"      - {issue}")
            else:
                print("   ✅ Validation passed!")
                
        except Exception as e:
            print(f"   ❌ Error reading file: {e}")
    
    return validate_data

# Example usage:
# validation_rules = {
#     'required_columns': ['id', 'name', 'value'],
#     'max_rows': 10000
# }
# monitor.register_callback('created', create_data_validator(validation_rules))
# monitor.register_callback('modified', create_data_validator(validation_rules))
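The validator above depends on pandas, which is not among the packages installed in the Prerequisites section. For the two rules shown, the standard-library `csv` module is enough. The sketch below applies the same rules and returns the issues as a list instead of printing them. `validate_csv` is a hypothetical helper.

```python
import csv
import tempfile
from pathlib import Path

def validate_csv(path, rules):
    """Return a list of rule violations for the CSV file at `path` (empty if valid)."""
    issues = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []   # reads the header row
        row_count = sum(1 for _ in reader)  # counts the remaining data rows

    missing = set(rules.get("required_columns", [])) - set(columns)
    if missing:
        issues.append(f"Missing columns: {sorted(missing)}")

    max_rows = rules.get("max_rows")
    if max_rows is not None and row_count > max_rows:
        issues.append(f"Too many rows: {row_count} > {max_rows}")
    return issues

# Try it on a temporary file with the same content as test_data.csv
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "test_data.csv"
    sample.write_text("id,name,value\n1,Alice,100\n2,Bob,200\n")
    ok_issues = validate_csv(sample, {"required_columns": ["id", "name", "value"], "max_rows": 10000})
    bad_issues = validate_csv(sample, {"required_columns": ["id", "email"], "max_rows": 1})

print(ok_issues)
print(bad_issues)
```

Returning the issues rather than printing them makes the check easy to reuse inside a callback or a test.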

### Use Case 3: Activity Dashboard

Create a simple activity dashboard that tracks file operations:

In [None]:
from collections import defaultdict
from datetime import datetime

class ActivityDashboard:
    def __init__(self):
        self.stats = defaultdict(int)
        self.recent_events = []
        self.start_time = datetime.now()
    
    def track_event(self, client: sc.GDriveUnifiedClient, event: FileSystemEvent):
        """Track file system events for dashboard"""
        file_path = Path(event.src_path)
        
        # Update stats
        self.stats[event.event_type] += 1
        self.stats[f"ext_{file_path.suffix}"] += 1
        
        # Track recent events
        self.recent_events.append({
            'time': datetime.now(),
            'type': event.event_type,
            'file': file_path.name,
            'size': file_path.stat().st_size if file_path.exists() else 0
        })
        
        # Keep only last 100 events
        self.recent_events = self.recent_events[-100:]
    
    def show_dashboard(self):
        """Display activity dashboard"""
        runtime = (datetime.now() - self.start_time).total_seconds()
        
        print("\n📊 SyftBox Activity Dashboard")
        print("=" * 40)
        print(f"Runtime: {runtime:.0f} seconds\n")
        
        print("Event Types:")
        for event_type in ['created', 'modified', 'deleted', 'moved']:
            count = self.stats.get(event_type, 0)
            print(f"  {event_type.capitalize():10} : {count}")
        
        print("\nFile Types:")
        for key, count in sorted(self.stats.items()):
            if key.startswith('ext_') and count > 0:
                ext = key[len('ext_'):] or 'no extension'
                print(f"  {ext:10} : {count}")
        
        print("\nRecent Activity (last 5):")
        for event in self.recent_events[-5:]:
            time_str = event['time'].strftime("%H:%M:%S")
            print(f"  [{time_str}] {event['type']:8} - {event['file']}")

# Example usage:
dashboard = ActivityDashboard()

# Would register like this:
# for event_type in ['created', 'modified', 'deleted', 'moved']:
#     monitor.register_callback(event_type, dashboard.track_event)

# Then periodically: dashboard.show_dashboard()
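The dashboard keeps its history bounded by re-slicing the list on every event. A `collections.deque` with `maxlen` gives the same bound without copying, as this small sketch shows:

```python
from collections import deque

# A deque with maxlen discards the oldest entry automatically when full
recent_events = deque(maxlen=100)
for i in range(250):
    recent_events.append({"file": f"file_{i}.csv"})

print(len(recent_events))         # 100
print(recent_events[0]["file"])   # file_150.csv
print(recent_events[-1]["file"])  # file_249.csv

# Deques do not support slicing, so convert to a list for "last 5"
last_five = list(recent_events)[-5:]
print([e["file"] for e in last_five])
```

If `ActivityDashboard` were switched to a deque, `show_dashboard` would need the same `list(...)` conversion before slicing.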

## Part 5: Cleanup

Let's clean up the test environment:

In [None]:
# Stop the watcher service
if watcher_process:
    print("🛑 Stopping file watcher service...")
    watcher_process.terminate()
    watcher_process.wait()
    watcher_process = None
    print("✅ Service stopped")

# Clean up test directory
import shutil
if syftbox_dir.exists():
    print(f"\n🧹 Cleaning up {syftbox_dir}...")
    shutil.rmtree(syftbox_dir)
    print("✅ Cleanup complete")
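Calling `wait()` without a timeout blocks indefinitely if the service ignores the termination request. A more defensive shutdown, sketched below with a hypothetical `stop_process` helper, waits for a bounded time and then falls back to `kill()`. The demonstration uses a short-lived Python child process rather than the watcher itself.

```python
import subprocess
import sys

def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Terminate proc, escalating to kill() if it has not exited within `timeout` seconds."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()

# Demonstration: start a child that would sleep for a minute, then stop it
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child)
print("Child stopped with exit code:", exit_code)
```

The exit code differs between platforms, so it is better suited for logging than for control flow.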

## Summary

In this tutorial, we've learned how to:

1. **Use the Standalone File Watcher**
   - Start/stop monitoring SyftBox directories
   - Query file events via REST API
   - Filter events by time

2. **Use the Integrated Monitor**
   - Create custom callbacks for different event types
   - Implement auto-backup functionality
   - Handle specific file types differently

3. **Implement Practical Use Cases**
   - Auto-sync to Google Drive
   - Data validation pipelines
   - Activity tracking dashboards

The file watcher provides a flexible foundation for building automated workflows around your SyftBox files. You can extend it with custom callbacks to implement any file-based automation you need.

### Next Steps

- Integrate the file watcher with your syft_client workflows
- Create custom callbacks for your specific use cases
- Build automated data pipelines using file events
- Implement two-way sync with cloud storage providers