# The Data Guardian's Journey
## A Data Owner's Guide to Syft-Client

```
╔══════════════════════════════════════════════════════════════════╗
║                    THE SYFT DANCE                                ║
║              Data Owner (DO) Notebook                            ║
║                                                                  ║
║  This notebook is Part 1 of a 2-part collaboration demo.        ║
║  Run this alongside: DS_Journey.ipynb                           ║
╚══════════════════════════════════════════════════════════════════╝
```

### What You'll Do:
1. **Setup** - Install, configure credentials, and authenticate
2. **Monitor** - Start the notification monitor for real-time alerts
3. **Create Dataset** - Prepare data with mock + private versions
4. **Accept Peer** - Welcome the Data Scientist
5. **Review & Execute Jobs** - Process analysis requests

### Prerequisites:
- Google account with Google Drive
- OAuth credentials (`credentials.json`) from Google Cloud Console
- A partner running the DS notebook!

---
# ACT 1: Setup
## Scene 1.1: Install Dependencies

In [None]:
#@title Install syft-client { display-mode: "form" }
!pip install -q git+https://github.com/OpenMined/syft-client.git@beach-hands-on-demo

In [None]:
#@title Import syft-client { display-mode: "form" }
# Suppress noisy Google httplib2 warnings
import logging
logging.getLogger('googleapiclient.discovery_cache').setLevel(logging.ERROR)
logging.getLogger('google_auth_httplib2').setLevel(logging.ERROR)

import syft_client as sc
print(f"syft-client version: {sc.__version__}")

## Scene 1.2: Mount Google Drive

In [None]:
#@title Mount Google Drive { display-mode: "form" }
from google.colab import drive
drive.mount('/content/drive')

## Scene 1.3: Setup Credentials Directory

```
╔══════════════════════════════════════════════════════════════════╗
║  CREDENTIALS SETUP                                              ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  You need a `credentials.json` file from Google Cloud Console:  ║
║                                                                  ║
║  1. Go to https://console.cloud.google.com/                     ║
║  2. Create a project (or use existing)                          ║
║  3. Enable APIs: Google Drive API, Gmail API                    ║
║  4. Go to APIs & Services -> Credentials                        ║
║  5. Create OAuth 2.0 Client ID (Desktop app type)               ║
║  6. Download as `credentials.json`                              ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝
```

In [None]:
#@title Setup credentials directory { display-mode: "form" }
from pathlib import Path

# Credentials directory in Google Drive
CREDS_DIR = Path("/content/drive/MyDrive/syft-creds")
CREDS_DIR.mkdir(parents=True, exist_ok=True)

print(f"Credentials directory: {CREDS_DIR}")
print(f"  credentials.json: {'✅ exists' if (CREDS_DIR / 'credentials.json').exists() else '❌ MISSING - Please upload!'}")

In [None]:
#@title Upload credentials.json if missing { display-mode: "form" }
if not (CREDS_DIR / 'credentials.json').exists():
    print("Please upload your credentials.json file:")
    from google.colab import files
    uploaded = files.upload()
    
    if 'credentials.json' in uploaded:
        with open(CREDS_DIR / 'credentials.json', 'wb') as f:
            f.write(uploaded['credentials.json'])
        print(f"\n✅ credentials.json saved to {CREDS_DIR}")
    else:
        print("❌ ERROR: Please upload a file named 'credentials.json'")
else:
    print("✅ credentials.json already exists")

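Before moving on, it can help to confirm that the uploaded file really is a Desktop-app OAuth client. The sketch below is a hypothetical helper (`check_client_secrets` is not part of syft-client) and assumes the usual layout of a downloaded client file: a top-level `installed` section holding `client_id`, `client_secret`, `auth_uri`, and `token_uri`. A file created for a different client type typically has a different top-level section, which the helper reports.

```python
import json
from pathlib import Path

# Keys assumed to be present under "installed" in a Desktop-app client file.
REQUIRED_KEYS = {"client_id", "client_secret", "auth_uri", "token_uri"}


def check_client_secrets(path):
    """Return a list of problems found in an OAuth client file (empty if none)."""
    path = Path(path)
    if not path.exists():
        return [f"{path} does not exist"]
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict) or not isinstance(data.get("installed"), dict):
        found = ", ".join(sorted(data)) if isinstance(data, dict) else "no sections"
        return [f"missing 'installed' section (found: {found})"]
    missing = REQUIRED_KEYS - set(data["installed"])
    return [f"missing key: {key}" for key in sorted(missing)]
```

In this notebook it could be called as `check_client_secrets(CREDS_DIR / 'credentials.json')`; an empty list means the file has the expected shape.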
## Scene 1.4: Enter Your Email

In [None]:
# Your email address (Data Owner)
DO_EMAIL = input("Enter your email address (Data Owner): ").strip()
print(f"\nYou are: {DO_EMAIL}")

## Scene 1.5: Configure Token Path

In [None]:
#@title Configure token path { display-mode: "form" }
TOKEN_PATH = CREDS_DIR / "token_do.json"

print(f"Token path: {TOKEN_PATH}")
print(f"  Token exists: {'✅' if TOKEN_PATH.exists() else '⏳ (Will be created during login)'}")

## Scene 1.6: Login as Data Owner

This will prompt you to authenticate with Google if no token exists.

In [None]:
# Login as Data Owner
do_client = sc.login_do(
    email=DO_EMAIL,
    token_path=TOKEN_PATH,
)

print(f"\nLogged in as Data Owner: {do_client.email}")
print(f"   SyftBox folder: {do_client.syftbox_folder}")

## Scene 1.7: Create Drive Token for Notifications (Colab Only)

In Colab, `sc.login_do()` uses Colab's built-in auth which doesn't create a token file.
For peer notifications to work, we need to create a Drive token separately.

In [None]:
#@title Create Drive token for notifications { display-mode: "form" }
# Skip if token already exists

if not TOKEN_PATH.exists():
    print("Creating Drive token for peer notifications...")
    print("Follow the OAuth steps printed below.\n")
    
    from google_auth_oauthlib.flow import InstalledAppFlow
    import re
    from urllib.parse import unquote
    
    DRIVE_SCOPES = ['https://www.googleapis.com/auth/drive']
    flow = InstalledAppFlow.from_client_secrets_file(
        str(CREDS_DIR / 'credentials.json'), 
        DRIVE_SCOPES
    )
    flow.redirect_uri = "http://localhost:1/"
    auth_url, _ = flow.authorization_url(prompt="consent", access_type="offline")
    
    print("=" * 60)
    print("Drive OAuth Authorization Required")
    print("=" * 60)
    print("\n1. Click this link:\n")
    print(f"   {auth_url}\n")
    print("2. Sign in and grant Drive access")
    print("3. Copy the 'code' from the URL (page won't load - that's OK)")
    print("   Copy the part after 'code=' and before '&scope'\n")
    
    code = input("Enter authorization code: ").strip()
    
    # Extract code if full URL pasted
    if "code=" in code:
        match = re.search(r'code=([^&]+)', code)
        if match:
            code = match.group(1)
    code = unquote(code)
    
    flow.fetch_token(code=code)
    TOKEN_PATH.write_text(flow.credentials.to_json())
    print(f"\n✅ Drive token saved to {TOKEN_PATH}")
else:
    print(f"✅ Drive token already exists: {TOKEN_PATH}")

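The cell above accepts either the bare code or the full redirect URL. As an alternative to the regular expression, the same extraction can be sketched with the standard library's URL parser, which also handles percent-decoding of the query value. `extract_auth_code` is a hypothetical helper name used only for illustration.

```python
from urllib.parse import parse_qs, unquote, urlparse


def extract_auth_code(pasted):
    """Return the authorization code from a bare code or a pasted redirect URL."""
    pasted = pasted.strip()
    if "code=" in pasted:
        # Use the query part of a full URL, or the text after '?' otherwise.
        query = urlparse(pasted).query or pasted.split("?", 1)[-1]
        values = parse_qs(query).get("code")
        if values:
            return values[0]  # parse_qs has already percent-decoded the value
    # Bare code: decode in case it was copied in percent-encoded form.
    return unquote(pasted)


print(extract_auth_code("http://localhost:1/?code=4%2F0XXXXX&scope=abc"))
```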
---
# ACT 2: Start Notification Monitor

The notification monitor watches for:
- New peer requests
- New job submissions
- Job status changes

It sends email notifications automatically.

```
╔══════════════════════════════════════════════════════════════════╗
║  TWO OPTIONS FOR NOTIFICATIONS                                  ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  Option A: In-Notebook Monitor (below)                          ║
║     - Runs in background while notebook is active               ║
║     - Uses NotificationMonitor.from_client()                    ║
║                                                                  ║
║  Option B: Daemon Mode (in Colab Terminal)                      ║
║     - Runs independently in terminal                            ║
║     - Create config file and run: syft-notify                   ║
║     - See Appendix B for daemon setup instructions              ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝
```

## Scene 2.1: Setup Gmail for Notifications (One-time)

```
╔══════════════════════════════════════════════════════════════════╗
║  GMAIL OAUTH FLOW (Manual for Colab)                            ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  When you run NotificationMonitor.setup(), you will:            ║
║                                                                  ║
║  1. See a URL printed - CLICK IT                                ║
║  2. Sign in with your Google account                            ║
║  3. Grant Gmail send permission                                 ║
║  4. You'll be redirected to a page that WON'T LOAD (normal!)    ║
║  5. Copy the 'code' from the URL bar:                           ║
║       http://localhost:1/?code=4/0XXXXX...&scope=...            ║
║     Copy the part after 'code=' and before '&scope'             ║
║  6. Paste it back in the input box                              ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝
```

In [None]:
from syft_client.notifications import NotificationMonitor

# Check credentials status
creds_dir = NotificationMonitor.get_creds_dir()
print(f"Credentials directory: {creds_dir}")
print(f"  credentials.json: {'exists' if (creds_dir / 'credentials.json').exists() else 'MISSING'}")
print(f"  gmail_token.json: {'exists' if (creds_dir / 'gmail_token.json').exists() else 'MISSING (will be created)'}")

In [None]:
# Run Gmail setup - Follow the instructions printed!
# You'll need to click a link, authorize, and paste back the code
NotificationMonitor.setup()
print("\nGmail notification setup complete!")

## Scene 2.2: Start the Monitor

In [None]:
# Start notification monitor (runs in background)
monitor = NotificationMonitor.from_client(do_client)
monitor.start()  # Start all (jobs + peers)

print("Notification monitor started!")
print("  - Watching for new peer requests")
print("  - Watching for new job submissions")
print("  - Will send email notifications automatically")

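For intuition about what "runs in background" means, here is a minimal sketch of a generic polling watcher built on a daemon thread. It is not syft-client's `NotificationMonitor` and makes no claim about how that class works internally; `PollingWatcher`, `list_items`, and `on_new` are hypothetical names chosen for this illustration.

```python
import threading


class PollingWatcher:
    """Hypothetical illustration of background polling; not syft-client code."""

    def __init__(self, list_items, on_new, interval=5.0):
        self._list_items = list_items  # callable returning the current item ids
        self._on_new = on_new          # callback invoked once per newly seen id
        self._interval = interval
        self._seen = set()
        self._stop = threading.Event()
        self._thread = None

    def poll_once(self):
        """Invoke the callback for ids not seen before and return them sorted."""
        new_ids = sorted(set(self._list_items()) - self._seen)
        for item_id in new_ids:
            self._on_new(item_id)
        self._seen.update(new_ids)
        return new_ids

    def _run(self):
        # Event.wait doubles as a sleep that stop() can interrupt.
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval)

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```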
---
# ACT 3: Create Your Dataset
## Scene 3.1: Download Sample Data

We'll use a sample sales dataset. In real scenarios, you'd use your own data.

In [None]:
import requests
from pathlib import Path

# Sample dataset URLs
MOCK_URL = "https://raw.githubusercontent.com/OpenMined/datasets/refs/heads/main/beach/sales-dataset/mock/sales.csv"
PRIVATE_URL = "https://raw.githubusercontent.com/OpenMined/datasets/refs/heads/main/beach/sales-dataset/private/sales.csv"

# Download to temporary locations
DATA_DIR = Path("/tmp/dataset")
DATA_DIR.mkdir(parents=True, exist_ok=True)

mock_path = DATA_DIR / "sales_mock.csv"
private_path = DATA_DIR / "sales_private.csv"
readme_path = DATA_DIR / "readme.md"

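# Download mock data (fail on HTTP errors instead of saving an error page)
r = requests.get(MOCK_URL, timeout=30)
r.raise_for_status()
mock_path.write_bytes(r.content)

# Download private data
r = requests.get(PRIVATE_URL, timeout=30)
r.raise_for_status()
private_path.write_bytes(r.content)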

# Create readme
readme_path.write_text("""# Sales Dataset

This dataset contains sales transaction records.

## Columns
- `product_id`: Unique product identifier
- `quantity`: Number of units sold
- `price_per_unit`: Price per unit in USD
- `date`: Transaction date

## Mock vs Private
- **Mock data**: Synthetic data with similar structure (safe to share)
- **Private data**: Real transaction data (protected)
""")

print("Sample data downloaded:")
print(f"   Mock: {mock_path}")
print(f"   Private: {private_path}")
print(f"   Readme: {readme_path}")

In [None]:
# Preview the data
import pandas as pd

print("Mock Data Preview (this is what DS will see):")
print(pd.read_csv(mock_path).head())

print("\nPrivate Data Preview (this stays with you):")
print(pd.read_csv(private_path).head())

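Since the mock data is meant to mirror the structure of the private data, a quick header comparison can catch drift between the two files before the dataset is registered. This is a minimal sketch using only the standard library; `read_header` and `schema_mismatch` are hypothetical helper names, and the check compares column names only, not types or value ranges.

```python
import csv


def read_header(path):
    """Return the first row of a CSV file as a list of column names."""
    with open(path, newline="") as f:
        return next(csv.reader(f), [])


def schema_mismatch(mock_csv, private_csv):
    """Report columns that appear in only one of the two files."""
    mock_cols = read_header(mock_csv)
    private_cols = read_header(private_csv)
    return {
        "only_in_mock": [c for c in mock_cols if c not in private_cols],
        "only_in_private": [c for c in private_cols if c not in mock_cols],
    }
```

Calling `schema_mismatch(mock_path, private_path)` with the paths from the download cell returns two empty lists when the column names agree.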
## Scene 3.2: Create the Dataset

This registers your dataset with Syft, making the mock data discoverable by peers.

In [None]:
# Create the dataset
import shutil

DATASET_NAME = "sales-data"

# Clean up existing dataset paths first (important for re-runs)
public_dataset_path = do_client.syftbox_folder / do_client.email / "public" / "syft_datasets" / DATASET_NAME
private_dataset_path = do_client.syftbox_folder / "private" / "syft_datasets" / DATASET_NAME

for path in [public_dataset_path, private_dataset_path]:
    if path.exists():
        print(f"Cleaning up existing dataset: {path}")
        shutil.rmtree(path)

do_client.create_dataset(
    name=DATASET_NAME,
    mock_path=str(mock_path),
    private_path=str(private_path),
    summary="Sales transaction records for analysis",
    readme_path=str(readme_path),
    tags=["sales", "transactions", "demo"],
)

print(f"Dataset '{DATASET_NAME}' created!")

# Verify the dataset exists locally
dataset_path = do_client.syftbox_folder / do_client.email / "public" / "syft_datasets" / DATASET_NAME
print(f"Dataset path: {dataset_path}")
print(f"Exists: {dataset_path.exists()}")
if dataset_path.exists():
    print(f"Contents: {list(dataset_path.iterdir())}")

# Sync to Drive
do_client.sync()
print("Synced to Drive")

In [None]:
# View your datasets
do_client.datasets.get_all()

---
```
╔══════════════════════════════════════════════════════════════════╗
║  INTERMISSION - WAITING FOR DATA SCIENTIST                      ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  Your stage is set! Now tell your DS partner:                   ║
║                                                                  ║
║  "I'm ready! My email is: {DO_EMAIL}"                           ║
║                                                                  ║
║  The DS should now:                                              ║
║  1. Run their notebook up to 'Add Peer'                         ║
║  2. Add you as a peer                                           ║
║                                                                  ║
║  You'll receive an EMAIL notification when DS adds you!         ║
║                                                                  ║
║  Wait for DS to add you, then continue to ACT 4...              ║
╚══════════════════════════════════════════════════════════════════╝
```

In [None]:
print(f"\nTell your DS partner: 'I'm ready! Add me as peer: {DO_EMAIL}'")
print("\nWaiting for DS to add you as peer...")
print("   You'll receive an EMAIL when they do!")
print("   Then continue to ACT 4.")

---
# ACT 4: Accept the Peer Request
## Scene 4.1: Check for New Peers

In [None]:
# Sync and check peers
do_client.sync()
do_client.peers

## Scene 4.2: Accept the Peer

Enter the DS's email to accept their peer request.

In [None]:
# Get DS email
DS_EMAIL = input("Enter the Data Scientist's email to accept: ").strip()
print(f"\nAccepting peer request from: {DS_EMAIL}")

In [None]:
# Add DS as peer (accept the request)
do_client.add_peer(DS_EMAIL)

print(f"\nPeer request from {DS_EMAIL} accepted!")

# Notify DS that their request was accepted
monitor.notify_peer_granted(DS_EMAIL)
print(f"Notification sent to {DS_EMAIL}!")

In [None]:
# Verify peers
do_client.peers

---
```
╔══════════════════════════════════════════════════════════════════╗
║  INTERMISSION - WAITING FOR JOB SUBMISSION                      ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  The DS can now:                                                 ║
║  1. Explore your datasets                                        ║
║  2. Write analysis code                                          ║
║  3. Submit a job                                                 ║
║                                                                  ║
║  Tell DS: "Peer accepted! You can explore my data now."         ║
║                                                                  ║
║  You'll receive an EMAIL notification when DS submits a job!    ║
║                                                                  ║
║  Wait for DS to submit a job, then continue to ACT 5...         ║
╚══════════════════════════════════════════════════════════════════╝
```

In [None]:
print("Tell DS: 'Peer accepted! You can explore my data and submit jobs now.'")
print("\nWaiting for DS to submit a job...")
print("   You'll receive an EMAIL when they do!")
print("   Then continue to ACT 5.")

---
# ACT 5: Review and Execute Jobs
## Scene 5.1: View Incoming Jobs

In [None]:
# Sync to get latest jobs
do_client.sync()

# View all jobs
do_client.jobs

## Scene 5.2: Review a Job

Before approving, review what the code does.

In [None]:
# Collect pending jobs (status "inbox"); the first one is reviewed below
pending_jobs = [j for j in do_client.jobs if j.status == "inbox"]

if not pending_jobs:
    print("No pending jobs. Wait for DS to submit one.")
else:
    job = pending_jobs[0]
    print(f"Job: {job.name}")
    print(f"   From: {job.submitted_by}")
    print(f"   Status: {job.status}")
    print(f"   Location: {job.location}")
    
    # Show the submitted code
    print("\n" + "="*60)
    print("SUBMITTED CODE:")
    print("="*60)
    
    # Read the Python file (the actual code DS wrote)
    for f in job.location.iterdir():
        if f.suffix == ".py":
            print(f"\n--- {f.name} ---")
            print(f.read_text())
    
    # Also show run.sh (the wrapper script)
    run_script = job.location / "run.sh"
    if run_script.exists():
        print("\n--- run.sh ---")
        print(run_script.read_text())

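Reading the code yourself remains the actual review. As an aid, the sketch below uses Python's `ast` module to point out lines that import modules or call built-ins often associated with file, process, or network access. The flagged sets are assumptions chosen for illustration, `flag_risky_code` is a hypothetical helper, and a clean result does not mean the code is safe.

```python
import ast

# Assumed watch lists for illustration; adjust to your own review policy.
FLAGGED_MODULES = {"os", "subprocess", "socket", "shutil", "requests", "urllib"}
FLAGGED_CALLS = {"eval", "exec", "compile", "__import__"}


def flag_risky_code(source):
    """Return sorted (line, description) pairs that deserve a closer look."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] in FLAGGED_MODULES:
                findings.append((node.lineno, f"imports {name}"))
        # Only direct calls by name are caught, e.g. eval(...), not x.eval(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FLAGGED_CALLS:
                findings.append((node.lineno, f"calls {node.func.id}()"))
    return sorted(findings)
```

Inside the review loop, the text returned by `f.read_text()` for each `.py` file could be passed to `flag_risky_code` and the findings printed next to the code.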
## Scene 5.3: Approve the Job

If the code looks safe, approve it for execution.

In [None]:
# Approve the job
if pending_jobs:
    job = pending_jobs[0]
    job.approve()
    print(f"\nJob '{job.name}' approved!")
    print("DS will receive an email notification.")

## Scene 5.4: Execute the Job

Run the approved job on your private data.

In [None]:
# Execute all approved jobs
do_client.process_approved_jobs()

print("\nJob execution complete!")
print("   Results have been synced to Google Drive.")
print("   DS will receive an email notification.")

In [None]:
# View updated job status
do_client.jobs

In [None]:
# View the executed job's output
done_jobs = [j for j in do_client.jobs if j.status == "done"]
if done_jobs:
    job = done_jobs[-1]
    print(f"Job '{job.name}' output:")
    print(job.stdout)

---
# ACT 6: Finale

In [None]:
# Stop the notification monitor
monitor.stop()
print("Notification monitor stopped.")

In [None]:
print("""
╔══════════════════════════════════════════════════════════════════╗
║  CONGRATULATIONS! THE DANCE IS COMPLETE!                        ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  As a Data Owner, you successfully:                              ║
║                                                                  ║
║  - Set up credentials and notification monitoring               ║
║  - Created a dataset with mock + private data                   ║
║  - Received email notification of peer request                  ║
║  - Accepted a peer request from a Data Scientist                ║
║  - Received email notification of job submission                ║
║  - Reviewed and approved a job submission                       ║
║  - Executed the job on your private data                        ║
║  - Shared results back to the Data Scientist                    ║
║                                                                  ║
║  Your private data NEVER left your control!                     ║
║  Only the computed results were shared.                         ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝
""")

---
# Appendix A: View Notification State

In [None]:
# View notification state (what was sent)
import json
state_path = NotificationMonitor.get_creds_dir() / "notification_state.json"

if state_path.exists():
    with open(state_path) as f:
        state_data = json.load(f)
    print("Notifications sent:")
    print(json.dumps(state_data, indent=2))
else:
    print("No notification state file found")

---
# Appendix B: Daemon Mode (Alternative to In-Notebook Monitor)

Instead of running the monitor in the notebook, you can run it as a daemon in Colab's terminal.

```
╔══════════════════════════════════════════════════════════════════╗
║  DAEMON MODE SETUP                                              ║
╠══════════════════════════════════════════════════════════════════╣
║                                                                  ║
║  1. Open Colab Terminal (click terminal icon)                   ║
║                                                                  ║
║  2. Create daemon config:                                        ║
║                                                                  ║
║     cat > /content/drive/MyDrive/syft-creds/daemon.yaml << EOF  ║
║     do_email: "your-email@example.com"                          ║
║     syftbox_root: "/content/SyftBox_your-email@example.com"     ║
║     drive_token_path: "/content/drive/MyDrive/syft-creds/token_do.json"  ║
║     gmail_token_path: "/content/drive/MyDrive/syft-creds/gmail_token.json" ║
║     EOF                                                          ║
║                                                                  ║
║  3. Run the daemon:                                              ║
║     syft-notify --config /content/drive/MyDrive/syft-creds/daemon.yaml  ║
║                                                                  ║
║  4. The daemon will run in the terminal, monitoring for events  ║
║     and sending email notifications automatically.              ║
║                                                                  ║
╚══════════════════════════════════════════════════════════════════╝
```

In [None]:
# Helper: Generate daemon config for your email
daemon_config = f"""
# Daemon config for {DO_EMAIL}
# Save this to: /content/drive/MyDrive/syft-creds/daemon.yaml

do_email: "{DO_EMAIL}"
syftbox_root: "/content/SyftBox_{DO_EMAIL}"
drive_token_path: "/content/drive/MyDrive/syft-creds/token_do.json"
gmail_token_path: "/content/drive/MyDrive/syft-creds/gmail_token.json"
"""
print(daemon_config)

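The helper above only prints the config. A minimal sketch for writing it to disk follows, using the same four keys; `write_daemon_config` is a hypothetical function name, and values are quoted with `json.dumps` on the assumption that a JSON string is also a valid double-quoted YAML scalar.

```python
import json
from pathlib import Path


def write_daemon_config(config_path, do_email, syftbox_root, creds_dir):
    """Write the four daemon settings as simple 'key: "value"' YAML lines."""
    creds_dir = Path(creds_dir)
    settings = {
        "do_email": do_email,
        "syftbox_root": str(syftbox_root),
        "drive_token_path": str(creds_dir / "token_do.json"),
        "gmail_token_path": str(creds_dir / "gmail_token.json"),
    }
    # json.dumps produces a double-quoted, escaped string for each value.
    lines = [f"{key}: {json.dumps(value)}" for key, value in settings.items()]
    config_path = Path(config_path)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text("\n".join(lines) + "\n")
    return config_path
```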
---
# Appendix C: Additional Operations

In [None]:
# Approve all pending jobs
# for job in do_client.jobs:
#     if job.status == "inbox":
#         job.approve()
#         print(f"Approved: {job.name}")

# Execute all approved jobs
# do_client.process_approved_jobs()

In [None]:
# Delete a dataset
# do_client.delete_dataset(name="sales-data")