In [None]:
# 🏛️ Data Owner Tutorial - Syft Code Queue

This notebook shows **data owners** how to review, approve, and manage code execution requests on their datasites using syft-code-queue.

## What You'll Learn

- 📋 Review pending job requests
- 🔍 Inspect submitted code for safety and privacy
- ✅ Approve or reject jobs
- 🖥️ Run the queue server to process approved jobs
- 📊 Monitor job execution and results

## Workflow Overview

```
Receive Request → Review Code → Approve/Reject → Execute → Monitor Results
```

As a data owner, you have full control over what code runs on your datasite.


In [None]:
# Import required libraries
import syft_code_queue as scq
from pathlib import Path
import time
import json

print(f"📚 Syft Code Queue version: {scq.__version__}")
print("🏛️ Data Owner Tutorial - Ready to manage your queue!")


In [None]:
## 📋 Step 1: Review Pending Job Requests

As a data owner, you'll receive job submission requests from data scientists. Your job is to review these requests and decide whether to approve them based on:

- **Privacy compliance**: Does the code respect individual privacy?
- **Data safety**: Does the code only compute safe aggregate statistics?
- **Resource usage**: Is the computational requirement reasonable?
- **Research value**: Does the analysis provide valuable insights?
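
One way to make these four criteria explicit during a review is a small checklist object. The sketch below is illustrative only: `ReviewChecklist` and its field names are hypothetical and are not part of syft-code-queue.

```python
from dataclasses import dataclass, fields


@dataclass
class ReviewChecklist:
    """One flag per review criterion (hypothetical helper, not a library class)."""
    privacy_compliance: bool = False
    data_safety: bool = False
    resource_usage: bool = False
    research_value: bool = False

    def unmet(self) -> list:
        """Names of the criteria that have not been ticked off yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def decision(self) -> str:
        """Approve only when every criterion is met."""
        return "approve" if not self.unmet() else "reject"


checklist = ReviewChecklist(privacy_compliance=True, data_safety=True, resource_usage=True)
print(checklist.unmet())     # ['research_value']
print(checklist.decision())  # reject
```

Listing the unmet criteria also gives you ready-made feedback to send back with a rejection.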


In [None]:
# Create a queue server to manage jobs
try:
    server = scq.create_server(
        queue_name="my-datasite-queue",
        max_concurrent_jobs=2,
        job_timeout=600  # 10 minutes
    )
    
    print(f"🖥️ Created queue server for: {server.email}")
    print("📋 Server is ready to receive and process job requests")
    
except Exception as e:
    print(f"❌ Error creating server: {e}")
    print("💡 This might happen in demo mode without SyftBox")


In [None]:
# Check for pending job requests
try:
    pending_jobs = server.list_pending_jobs()
    
    print(f"📋 Found {len(pending_jobs)} pending job(s) for review")
    print("=" * 60)
    
    if pending_jobs:
        for i, job in enumerate(pending_jobs, 1):
            print(f"\n{i}. 📄 Job Request:")
            print(f"   📧 From: {job.requester_email}")
            print(f"   📋 Name: {job.name}")
            print(f"   📅 Submitted: {job.created_at}")
            print(f"   🏷️  Tags: {', '.join(job.tags or [])}")
            print("   📝 Description:")
            if job.description:
                # Show at most the first 200 characters of the description
                desc = (job.description[:200] + "...") if len(job.description) > 200 else job.description
                print(f"      {desc}")
            else:
                print("      No description provided")
    else:
        print("📭 No pending jobs found.")
        print("💡 Jobs will appear here when data scientists submit requests.")
        
except Exception as e:
    print(f"❌ Error checking pending jobs: {e}")
    print("💡 In demo mode, job listing is simulated")


In [None]:
## 🔍 Step 2: Inspect Submitted Code

Before approving any job, you should carefully review the submitted code to ensure it's safe and appropriate for your datasite.


In [None]:
def inspect_job_code(job):
    """Inspect the code submitted with a job."""
    print(f"🔍 Inspecting code for job: {job.name}")
    print(f"📧 Submitted by: {job.requester_email}")
    print("=" * 60)
    
    try:
        code_folder = Path(job.code_folder)
        if not code_folder.exists():
            print("❌ Code folder not found")
            return
        
        print(f"📁 Code folder: {code_folder}")
        print(f"📄 Files in submission: {[f.name for f in code_folder.iterdir()]}")
        
        # Show run.sh script
        run_script = code_folder / "run.sh"
        if run_script.exists():
            print("\n📋 Execution Script (run.sh):")
            print("-" * 40)
            print(run_script.read_text())
        
        # Show requirements.txt if it exists
        requirements = code_folder / "requirements.txt"
        if requirements.exists():
            print("\n📦 Dependencies (requirements.txt):")
            print("-" * 40)
            print(requirements.read_text())
        
        # Show Python files (top level of the folder only; sorted for a stable order)
        python_files = sorted(code_folder.glob("*.py"))
        for py_file in python_files[:3]:  # Show first 3 Python files
            print(f"\n🐍 Python Code ({py_file.name}):")
            print("-" * 40)
            content = py_file.read_text()
            # Show first 1000 characters
            if len(content) > 1000:
                print(content[:1000] + "\n... (truncated)")
            else:
                print(content)
        
        if len(python_files) > 3:
            print(f"\n📄 ... and {len(python_files) - 3} more Python files")
            
    except Exception as e:
        print(f"❌ Error inspecting code: {e}")

# If we have pending jobs, inspect the first one
if 'pending_jobs' in globals() and pending_jobs:
    print("🔍 Let's inspect the first pending job...")
    inspect_job_code(pending_jobs[0])
else:
    print("💡 No pending jobs to inspect. Code inspection will happen when jobs are submitted.")
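
Reading the files by eye can be complemented by listing what a submission imports. The following is a minimal sketch using only the standard-library `ast` module; `list_imports` and `MODULES_TO_FLAG` are hypothetical names chosen for illustration, and the flagged set is an assumption you would adapt to your own policy.

```python
import ast


def list_imports(source: str) -> set:
    """Return the top-level module names imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports such as `from . import x` have no module name.
            modules.add(node.module.split(".")[0])
    return modules


# Hypothetical set of modules a data owner may want to look at more closely.
MODULES_TO_FLAG = {"socket", "requests", "urllib", "subprocess", "shutil"}

sample = """
import pandas as pd
from urllib.request import urlopen
import os.path
"""
found = list_imports(sample)
print(sorted(found))                    # ['os', 'pandas', 'urllib']
print(sorted(found & MODULES_TO_FLAG))  # ['urllib']
```

Because the code is parsed rather than executed, this is safe to run on untrusted submissions, but it will not catch dynamic imports such as `__import__("socket")`.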


In [None]:
## ✅ Step 3: Approve or Reject Jobs

After reviewing the code, you can manually approve or reject jobs based on your assessment.


In [None]:
# Manual job approval/rejection functions
def approve_job_with_reason(job_uid, reason="Approved by data owner"):
    """Approve a job. The reason is only printed locally; it is not passed to the server."""
    try:
        success = server.approve_job(job_uid)
        if success:
            print(f"✅ Job {job_uid} approved!")
            print(f"📝 Reason: {reason}")
            return True
        else:
            print(f"❌ Failed to approve job {job_uid}")
            return False
    except Exception as e:
        print(f"❌ Error approving job: {e}")
        return False

def reject_job_with_reason(job_uid, reason="Does not meet privacy requirements"):
    """Reject a job with a reason."""
    try:
        success = server.reject_job(job_uid, reason)
        if success:
            print(f"🚫 Job {job_uid} rejected!")
            print(f"📝 Reason: {reason}")
            return True
        else:
            print(f"❌ Failed to reject job {job_uid}")
            return False
    except Exception as e:
        print(f"❌ Error rejecting job: {e}")
        return False

# Example approval criteria checklist
def review_job_safety(job):
    """Review a job against safety criteria."""
    print(f"🔒 Safety Review for: {job.name}")
    print("=" * 50)
    
    # Check tags for privacy-safe indicators
    safe_tags = {"privacy-safe", "aggregate-analysis", "statistics", "research"}
    risky_tags = {"raw-data", "individual-records", "export"}
    
    job_tags = set(job.tags)
    has_safe_tags = bool(job_tags.intersection(safe_tags))
    has_risky_tags = bool(job_tags.intersection(risky_tags))
    
    print(f"✅ Privacy-safe tags present: {has_safe_tags}")
    print(f"⚠️  Risky tags present: {has_risky_tags}")
    
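    # Basic code safety checks (simplified keyword heuristics)
    code_folder = Path(job.code_folder)
    # Tags are self-declared by the requester, so safe tags earn no credit,
    # but a declared risky tag counts against the job
    safety_score = -1 if has_risky_tags else 0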
    
    if code_folder.exists():
        # Check for dangerous operations in run.sh
        run_script = code_folder / "run.sh"
        if run_script.exists():
            run_content = run_script.read_text().lower()
            dangerous_commands = ["rm -rf", "sudo", "chmod 777", "curl", "wget"]
            found_dangerous = [cmd for cmd in dangerous_commands if cmd in run_content]
            
            if found_dangerous:
                print(f"⚠️  Potentially dangerous commands found: {found_dangerous}")
                safety_score -= len(found_dangerous)
            else:
                print("✅ No dangerous commands detected in run.sh")
                safety_score += 1
        
        # Check Python files for privacy concerns
        python_files = list(code_folder.glob("*.py"))
        if python_files:
            for py_file in python_files:
                content = py_file.read_text().lower()
                privacy_indicators = ["aggregate", "mean", "sum", "count", "std"]
                risky_indicators = ["individual", "personal", "export", "save individual"]
                
                found_privacy = [ind for ind in privacy_indicators if ind in content]
                found_risky = [ind for ind in risky_indicators if ind in content]
                
                if found_privacy:
                    safety_score += 1
                if found_risky:
                    safety_score -= 1
    
    print(f"📊 Safety Score: {safety_score}")
    
    if safety_score >= 1:
        print("✅ Recommendation: APPROVE - Code appears privacy-safe")
        return "approve"
    elif safety_score >= 0:
        print("⚠️  Recommendation: REVIEW - Manual review recommended")
        return "review"
    else:
        print("🚫 Recommendation: REJECT - Code has safety concerns")
        return "reject"

# Example: Review first pending job if available
if 'pending_jobs' in globals() and pending_jobs:
    first_job = pending_jobs[0]
    print("🔒 Performing safety review of the first pending job...")
    recommendation = review_job_safety(first_job)
    
    print(f"\n📋 Job UID for approval/rejection: {first_job.uid}")
    print("💡 Use the functions defined above to approve or reject this job")
    
    # Example usage (printed rather than executed, to prevent accidental approval)
    print("\n💭 To approve this job, run:")
    print(f"   approve_job_with_reason('{first_job.uid}', 'Code reviewed and approved for aggregate analysis')")
    print("\n💭 To reject this job, run:")
    print(f"   reject_job_with_reason('{first_job.uid}', 'Code accesses individual records - privacy concern')")
    
else:
    print("💡 No pending jobs to review. Safety review will happen when jobs are submitted.")
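
The keyword checks in `review_job_safety` use plain substring matching, so `"sum"` also matches `consumers` and `"count"` matches `accounts`. A possible refinement, sketched here with a hypothetical helper `find_whole_words`, is to match whole words only:

```python
import re


def find_whole_words(text: str, keywords: list) -> list:
    """Return the keywords that occur in `text` as whole words (case-insensitive)."""
    hits = []
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(keyword)
    return hits


code = "consumers = load_accounts()\nprint(consumers.head())"
keywords = ["sum", "count", "mean"]

print([k for k in keywords if k in code.lower()])  # naive substring check: ['sum', 'count']
print(find_whole_words(code, keywords))            # whole-word check: []
print(find_whole_words("total = df.sum()", keywords))  # ['sum']
```

Even with whole-word matching, a keyword hit says nothing about what the code does with the data, so treat the score as a prompt for manual review rather than a verdict.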


In [None]:
## 🖥️ Step 4: Run Queue Server for Job Execution

Start the queue server to automatically execute approved jobs.


In [None]:
# Start the queue server to process approved jobs
print("🖥️ Starting queue server...")
try:
    server.start()
    print("✅ Queue server started successfully!")
    print("🔄 The server will now automatically execute approved jobs")
    print("⏰ Let it run for a few seconds to process any approved jobs...")
    
    # Let it run briefly
    time.sleep(10)
    
    # Note: this tutorial does not query any server status here. In a real
    # deployment you would leave the server running continuously; we stop it
    # after a brief period only to keep the tutorial short.
    
except Exception as e:
    print(f"❌ Error starting server: {e}")
    print("💡 In demo mode, server execution is simulated")

print("\n🛑 Stopping server for tutorial purposes...")
try:
    server.stop()
    print("✅ Server stopped")
except Exception:
    print("💡 Server stop simulated in demo mode")
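
If you want the server to be stopped even when something fails while it is running, the start/stop pair can be wrapped in a context manager. This is a sketch under the assumption that `start()` returns immediately and `stop()` shuts the server down, as the cell above suggests; `running` and `FakeServer` are hypothetical names, and the fake exists only so the example runs without SyftBox.

```python
from contextlib import contextmanager


@contextmanager
def running(server):
    """Start `server` and always stop it on exit, even if the body raises."""
    server.start()
    try:
        yield server
    finally:
        server.stop()


class FakeServer:
    """Stand-in with the same start/stop methods, used only for this demo."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


fake = FakeServer()
with running(fake):
    fake.events.append("work")
print(fake.events)  # ['start', 'work', 'stop']
```

With a real server object you would write `with running(server): time.sleep(10)` in place of the separate start and stop blocks.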


In [None]:
## 🛡️ Best Practices for Data Owners

### 🔍 Code Review Guidelines

**Always Check:**
- ✅ Does the code only compute aggregate statistics?
- ✅ Are individual records protected from exposure?
- ✅ Is the computational load reasonable?
- ✅ Are the dependencies safe and trusted?
- ✅ Does the requester have legitimate research needs?

**Red Flags:**
- ❌ Code that exports raw data
- ❌ Individual record access or printing
- ❌ Network requests to external servers
- ❌ File system operations outside designated areas
- ❌ Unlimited loops or resource consumption
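
The dependency check in the list above can be partly automated by comparing `requirements.txt` against an allowlist. The sketch below is an assumption about how you might do this, not a feature of syft-code-queue: `TRUSTED_PACKAGES` and `untrusted_requirements` are hypothetical, and the parsing handles only simple requirement lines.

```python
import re

# Hypothetical allowlist; each data owner would maintain their own.
TRUSTED_PACKAGES = {"numpy", "pandas", "scipy"}


def untrusted_requirements(requirements_text: str, trusted: set) -> list:
    """Return entries from a requirements.txt body that are not in `trusted`."""
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Take the leading package name, before any version specifier or extras.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        # Lines that do not start with a package name (options, URLs) are
        # flagged verbatim so a human looks at them.
        name = match.group(0).lower() if match else line
        if name not in trusted:
            flagged.append(name)
    return flagged


sample = """
numpy>=1.24
pandas==2.1.0  # data frames
requests
-e git+https://example.com/repo.git
"""
print(untrusted_requirements(sample, TRUSTED_PACKAGES))
# ['requests', '-e git+https://example.com/repo.git']
```

Anything the parser does not recognize is flagged rather than skipped, which errs on the side of sending more entries to manual review.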

### 🔒 Security Considerations

1. **Sandboxing**: The `SafeCodeRunner` provides basic security, but manual code review is still essential
2. **Resource Limits**: Set appropriate timeouts and concurrent job limits
3. **Audit Trail**: All approvals/rejections are logged for accountability
4. **Communication**: Maintain clear communication with researchers about your policies
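
If you also want your own record of decisions, independent of whatever the queue itself logs, an append-only JSON Lines file is one simple option. This sketch uses only the standard library; `record_decision`, `read_decisions`, and the file name are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_decision(log_path, job_uid, decision, reason):
    """Append one decision as a JSON line and return the stored entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "job_uid": str(job_uid),
        "decision": decision,
        "reason": reason,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def read_decisions(log_path):
    """Load every recorded decision, oldest first."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "decisions.jsonl"
    record_decision(log_file, "job-123", "approve", "Aggregate statistics only")
    record_decision(log_file, "job-456", "reject", "Prints individual records")
    decisions = read_decisions(log_file)

print([(d["job_uid"], d["decision"]) for d in decisions])
# [('job-123', 'approve'), ('job-456', 'reject')]
```

Calling `record_decision` from inside `approve_job_with_reason` and `reject_job_with_reason` would keep the reason together with the decision it justified.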

### 📋 Approval Workflow

1. **Receive Request** → Job appears in pending queue
2. **Review Code** → Inspect all submitted files thoroughly  
3. **Assess Risk** → Check against your privacy and security criteria
4. **Make Decision** → Approve with reason or reject with feedback
5. **Monitor Execution** → Watch job progress and results
6. **Provide Results** → Completed job outputs are available to the requester

### 🤝 Communication with Researchers

- Be transparent about your approval criteria
- Provide clear feedback when rejecting requests
- Suggest improvements for resubmission
- Maintain professional research collaboration standards

### ⚙️ Server Management

- **Production Use**: Run the server continuously as a daemon
- **Monitoring**: Check logs regularly for issues
- **Maintenance**: Clean up old completed jobs periodically
- **Updates**: Keep the software updated for security patches
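
Periodic cleanup can be scripted. The sketch below assumes, purely for illustration, that each completed job lives in its own directory under a common root; the actual on-disk layout of syft-code-queue may differ, and `find_stale_dirs` and `remove_dirs` are hypothetical helpers.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path


def find_stale_dirs(root, max_age_days, now=None):
    """Return subdirectories of `root` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and p.stat().st_mtime < cutoff
    )


def remove_dirs(paths, dry_run=True):
    """Delete the given directories unless `dry_run` is set; return their names."""
    for path in paths:
        if not dry_run:
            shutil.rmtree(path)
    return [p.name for p in paths]


# Demo in a temporary directory with one old and one recent job folder.
with tempfile.TemporaryDirectory() as tmp:
    old_job = Path(tmp) / "job-old"
    new_job = Path(tmp) / "job-new"
    old_job.mkdir()
    new_job.mkdir()
    forty_days_ago = time.time() - 40 * 86400
    os.utime(old_job, (forty_days_ago, forty_days_ago))  # backdate the old job

    stale = find_stale_dirs(tmp, max_age_days=30)
    removed = remove_dirs(stale, dry_run=False)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed)    # ['job-old']
print(remaining)  # ['job-new']
```

The `dry_run=True` default means a first pass only reports what would be deleted, which is a sensible habit before removing job outputs that a requester may still need.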

---

## 🎉 You're Ready!

You now know how to:
- ✅ Review and inspect job submissions
- ✅ Make informed approval/rejection decisions  
- ✅ Run a queue server to execute approved jobs
- ✅ Follow best practices for data owner responsibilities

**Next Steps:**
- Set up your production queue server
- Define your organization's approval policies
- Train your team on the review process
- Establish communication channels with researchers

**Happy data stewarding! 🏛️📊**
