# SyftBox Data Scientist (DS) - Alice's Notebook

This notebook demonstrates the **Beaver + SyftBox integration** from the data scientist's perspective.

Key features:
- **Session-based** communication with data owners
- Data exchanges are **encrypted end-to-end** using SyftBox crypto
- Files sync automatically via the SyftBox server
- You can only decrypt data encrypted for your identity

## Setup

In [1]:
import sys
from pathlib import Path

# Add beaver to path (for development)
sys.path.insert(0, "../python/src")

import beaver
from beaver import Twin
import pandas as pd
import numpy as np

In [2]:
# Check SyftBox SDK availability
try:
    import syftbox_sdk
    print(f"✓ SyftBox SDK version: {syftbox_sdk.__version__}")
except ImportError:
    print("✗ SyftBox SDK not installed. Run: pip install syftbox-sdk")

✓ SyftBox SDK version: 0.1.5


## Connect with SyftBox Backend

Connect as a data scientist. SyftBox mode is now the default.

In [3]:
# Current directory is the client's data directory
DATA_DIR = Path.cwd()
USER_EMAIL = "client2@sandbox.local"
PEER_EMAIL = "client1@sandbox.local"

print(f"Data directory: {DATA_DIR}")
print(f"User: {USER_EMAIL}")
print(f"Peer: {PEER_EMAIL}")

Data directory: /Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local
User: client2@sandbox.local
Peer: client1@sandbox.local


In [4]:
# Connect with SyftBox backend (encrypted mode is now default)
bv = beaver.connect(
    user=USER_EMAIL,
    data_dir=DATA_DIR,
)

print(f"✓ Connected as {bv.user}")
print(f"  SyftBox enabled: {bv.syftbox_enabled}")
print(f"  Inbox path: {bv.inbox_path}")

🔄 Auto-load replies enabled for client2@sandbox.local (polling every 2.0s)
✓ Connected as client2@sandbox.local
  SyftBox enabled: True
  Inbox path: /Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/datasites/client2@sandbox.local/shared/biovault


In [5]:
# Check encryption status
if bv.backend:
    print(f"  Encryption: {'✓ Enabled' if bv.backend.uses_crypto else '✗ Disabled'}")
    print(f"  Vault path: {bv.backend.vault_path}")
    print(f"  Shared path: {bv.backend.shared_path}")

  Encryption: ✓ Enabled
  Vault path: /Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/.syc
  Shared path: /Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/datasites/client2@sandbox.local/shared/biovault


## Request a Session with Data Owner

Before accessing data, request a session with the data owner.
The data owner must approve the request before you can access their data.
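The handshake is asynchronous: the request has to sync to the owner, and the approval has to sync back. The sketch below shows the kind of polling loop that a call such as `session.wait_for_acceptance(timeout=120)` might run internally. `wait_until` and its parameters are hypothetical names used for illustration only; they are not part of the Beaver API, and the real implementation may differ.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 2.0,
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical helper: returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

Under these assumptions, `wait_until(lambda: session.is_active, timeout=120)` would block until the session becomes active or two minutes pass. `time.monotonic()` is used rather than `time.time()` so that system clock adjustments cannot shorten or extend the wait.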

In [6]:
# Request a session with the data owner
session = bv.request_session(
    peer_email=PEER_EMAIL,
    message="Requesting access for patient data analysis"
)

session

📤 Session request sent to client1@sandbox.local
   Session ID: a88409add5c9
   Use session.wait_for_acceptance() to wait for approval


In [None]:
# Wait for the data owner to accept
print("Waiting for data owner to accept...")
print("(Run the DO notebook and accept the session request)")

accepted = session.wait_for_acceptance(timeout=120)
if accepted:
    print("\n✓ Session accepted!")
    print(f"  Session ID: {session.session_id}")
    print(f"  Local folder: {session.local_folder}")
    print(f"  Peer folder: {session.peer_folder}")
else:
    print("Session not accepted within timeout")

Waiting for data owner to accept...
(Run the DO notebook and accept the session request)
⏳ Waiting for client1@sandbox.local to accept session a88409add5c9...


## Browse Peer's Published Data

Once the session is accepted, view the data the owner has published.

In [None]:
session.is_active

In [14]:
session.peer_remote_vars

DEBUG read_with_shadow:
  datasite: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/datasites/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  relative: "client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  shadow: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  shadow_root: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted"
  ✓ Created shadow parent: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted/client1@sandbox.local/shared/biovault/sessions/7493df382bf6"
✓ Cached PLAINTEXT to shadow: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"


Name,Type,ID
patient_data,"Twin[DataFrame] (3, 5)",7ca715568ebb...


In [15]:
# View data published in the session
if session.is_active:
    peer_vars = session.peer_remote_vars
    print(f"Data available from {session.peer}:")
    display(peer_vars)
else:
    print("Session not active yet")

Data available from client1@sandbox.local:


DEBUG read_with_shadow:
  datasite: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/datasites/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  relative: "client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  shadow: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  shadow_root: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted"


Name,Type,ID
patient_data,"Twin[DataFrame] (3, 5)",7ca715568ebb...


## Load a Twin from Peer

When we load a Twin:
1. The encrypted `.beaver` file is read from the session folder
2. It is decrypted using our private key (from `.syc/`)
3. Only the **public** (mock) data is available locally
4. Private data stays with the owner
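To make the public/private split concrete, here is a minimal sketch of a Twin-like container. `TwinSketch` and `for_sharing` are hypothetical names, not Beaver's actual `Twin` class. The rule that `.value` prefers the private side when it is present is an assumption; the notebook output only shows that `.value` uses the public side when private data is not available.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TwinSketch:
    """Hypothetical stand-in for a Twin: one name, two sides."""

    name: str
    public: Any                    # mock data, safe to share with peers
    private: Optional[Any] = None  # real data, held only by the owner

    @property
    def value(self) -> Any:
        """Assumed rule: use private if present, else fall back to public."""
        return self.private if self.private is not None else self.public

    def for_sharing(self) -> "TwinSketch":
        """Return a copy with the private side stripped out."""
        return TwinSketch(name=self.name, public=self.public, private=None)
```

In this sketch the owner would publish `twin.for_sharing()`, so the object a data scientist loads has `private is None` and `.value` resolves to the mock data.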

In [16]:
# Load patient data Twin from session
if session.is_active:
    try:
        patient_data = session.peer_remote_vars["patient_data"].load(auto_accept=True)
        display(patient_data)  # a bare expression inside try is not rendered
    except KeyError:
        print("patient_data not yet published by data owner")
        print("Run the DO notebook to publish data")

✓ Loaded Twin 'patient_data' from published location


DEBUG read_with_shadow:
  datasite: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/datasites/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  relative: "client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  shadow: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  shadow_root: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted"
DEBUG read_with_shadow:
  datasite: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/datasites/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/data/1081834ff8a64b65bb8f34d07231f8ef.beaver"
  relative: "client1@sandbox.local/shared/biovault/sessions/7493df382bf6/data/1081834ff8a64b65bb8f34d07231f8ef.beaver"
  shadow: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbo

In [17]:
session.peer_remote_vars["patient_data"].load()

✓ Loaded Twin 'patient_data' from published location


DEBUG read_with_shadow:
  datasite: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/datasites/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  relative: "client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  shadow: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/remote_vars.json"
  shadow_root: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/unencrypted"
DEBUG read_with_shadow:
  datasite: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/datasites/client1@sandbox.local/shared/biovault/sessions/7493df382bf6/data/1081834ff8a64b65bb8f34d07231f8ef.beaver"
  relative: "client1@sandbox.local/shared/biovault/sessions/7493df382bf6/data/1081834ff8a64b65bb8f34d07231f8ef.beaver"
  shadow: "/Users/madhavajay/dev/biovault-beaver/workspace2/sandbo

🔒 Private,not available,.request_private()
🌍 Public,patient_id name age test_result diagnosis 0 ...,← .value uses this
Owner,client1@sandbox.local
Live,⚫ Disabled
IDs: twin=a85dadf3... private=e959439d... public=ad63459e...


In [18]:
patient_data.public

Unnamed: 0,patient_id,name,age,test_result,diagnosis
0,M001,Patient A,30,6.5,negative
1,M002,Patient B,40,8.0,positive
2,M003,Patient C,35,7.0,positive


In [19]:
# Access the public (mock) value
try:
    print(f"Public data available: {patient_data.public is not None}")
    print(f"Private data available: {patient_data.private is not None}")
    display(patient_data.public)
except NameError:
    print("patient_data not loaded yet")

Public data available: True
Private data available: False


Unnamed: 0,patient_id,name,age,test_result,diagnosis
0,M001,Patient A,30,6.5,negative
1,M002,Patient B,40,8.0,positive
2,M003,Patient C,35,7.0,positive


## Define Analysis Function

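Create a function to analyze the data. The `@bv` decorator makes it privacy-aware: calling it on a Twin runs the function on the public (mock) data and returns a result whose private value can be requested later.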
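As a rough mental model of what such a decorator could do, the sketch below unwraps Twin-like arguments to their public side, runs the function, and wraps the output. `TwinLike` and `privacy_aware` are hypothetical names; this is not how `@bv` is implemented, only an illustration of the pattern under those assumptions.

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class TwinLike:
    """Hypothetical container with a public (mock) and private (real) side."""

    public: Any
    private: Optional[Any] = None


def _public_side(arg: Any) -> Any:
    """Unwrap a TwinLike to its public data; pass other values through."""
    return arg.public if isinstance(arg, TwinLike) else arg


def privacy_aware(func: Callable[..., Any]) -> Callable[..., TwinLike]:
    """Run `func` on public data only and wrap the output in a TwinLike.

    `.private` on the result stays None until the data owner runs the
    same function on the real data.
    """

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> TwinLike:
        public_args = [_public_side(a) for a in args]
        public_kwargs = {k: _public_side(v) for k, v in kwargs.items()}
        return TwinLike(public=func(*public_args, **public_kwargs))

    return wrapper


@privacy_aware
def count_rows(rows):
    """Toy analysis function used to exercise the decorator."""
    return len(rows)
```

Calling `count_rows(TwinLike(public=[1, 2, 3]))` in this sketch returns a `TwinLike` whose `.public` holds the mock answer and whose `.private` is `None`, which mirrors the `result.public` / `result.private` pattern used in the cells that follow.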

In [20]:
@bv
def compute_stats(df):
    """Compute basic statistics on patient data."""
    return {
        "count": len(df),
        "mean_age": df["age"].mean(),
        "mean_test_result": df["test_result"].mean(),
        "positive_rate": (df["diagnosis"] == "positive").mean(),
    }

In [21]:
patient_data.public

Unnamed: 0,patient_id,name,age,test_result,diagnosis
0,M001,Patient A,30,6.5,negative
1,M002,Patient B,40,8.0,positive
2,M003,Patient C,35,7.0,positive


In [22]:
# Run on public (mock) data first
try:
    result = compute_stats(patient_data)
    display(result)  # a bare expression inside try is not rendered
except NameError:
    print("patient_data not loaded yet")

In [23]:
# View public result
try:
    print("Public (mock) statistics:")
    print(result.public)
except NameError:
    print("result not computed yet")

Public (mock) statistics:
{'count': 3, 'mean_age': np.float64(35.0), 'mean_test_result': np.float64(7.166666666666667), 'positive_rate': np.float64(0.6666666666666666)}
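The mock statistics can be checked by hand against the three public rows displayed earlier. The sketch below recomputes them with the standard library only, using rows copied from the `patient_data.public` output; it does not touch Beaver or pandas.

```python
# Rows copied from the public (mock) table shown in the notebook output.
rows = [
    {"patient_id": "M001", "age": 30, "test_result": 6.5, "diagnosis": "negative"},
    {"patient_id": "M002", "age": 40, "test_result": 8.0, "diagnosis": "positive"},
    {"patient_id": "M003", "age": 35, "test_result": 7.0, "diagnosis": "positive"},
]

n = len(rows)
stats = {
    "count": n,
    "mean_age": sum(r["age"] for r in rows) / n,
    "mean_test_result": sum(r["test_result"] for r in rows) / n,
    # Fraction of rows whose diagnosis is "positive".
    "positive_rate": sum(r["diagnosis"] == "positive" for r in rows) / n,
}
print(stats)
```

The values should agree with the `compute_stats` output on the public data, which is a quick way to confirm the function behaves as intended before asking the owner to run it on real data.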


## Request Private Computation

When we request private computation:
1. The function code + inputs are serialized
2. **Encrypted** for the data owner (client1)
3. Written to the session folder
4. Synced to the peer via the SyftBox server
5. Only the data owner can decrypt and execute
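Steps 1 and 3 above can be sketched with the standard library. Everything here is hypothetical: `write_request`, the JSON envelope layout, and the `.request.json` suffix are illustrative and do not reflect the real `.beaver` file format. Encryption (step 2) is deliberately left as a commented placeholder, because the SyftBox crypto API is not shown in this notebook.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import List


def write_request(
    session_dir: Path, func_name: str, func_source: str, input_ids: List[str]
) -> Path:
    """Serialize a computation request and write it to the session folder.

    Hypothetical sketch: the payload is written in plaintext here. In the
    real flow it would be encrypted for the data owner before writing.
    """
    envelope = {
        "request_id": uuid.uuid4().hex,
        "function": func_name,
        "source": func_source,
        "inputs": input_ids,  # references to Twins, never the data itself
    }
    payload = json.dumps(envelope, sort_keys=True).encode("utf-8")
    # Placeholder for step 2: encrypt `payload` for the owner's public key.
    session_dir.mkdir(parents=True, exist_ok=True)
    path = session_dir / f"{envelope['request_id']}.request.json"
    path.write_bytes(payload)
    return path


# Usage: write a request into a temporary folder and read it back.
with tempfile.TemporaryDirectory() as tmp:
    written = write_request(
        Path(tmp) / "sessions" / "demo",
        "compute_stats",
        "def compute_stats(df): ...",
        ["twin-id-placeholder"],
    )
    print(json.loads(written.read_text(encoding="utf-8"))["function"])
```

Sending only the function source and Twin identifiers, rather than any data, is what lets the owner inspect exactly what will run before approving it.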

In [24]:
# Request private computation
try:
    result.request_private()
except NameError:
    print("result not computed yet")

📨 Sending computation request to client1@sandbox.local
   Function: compute_stats
   Result: result
✓ Sent to /Users/madhavajay/dev/biovault-beaver/workspace2/sandbox/client2@sandbox.local/datasites/client2@sandbox.local/shared/client1@sandbox.local/8696b10a2863479790c7ee1482faec40.beaver
💡 Result will auto-update when client1@sandbox.local approves


In [25]:
# Wait for the data owner to approve
print("Waiting for data owner to approve...")
bv.wait_for_message()

Waiting for data owner to approve...


KeyboardInterrupt: 

In [26]:
# Check the result
try:
    display(result)  # a bare expression inside try is not rendered
except NameError:
    print("result not computed yet")

In [None]:
# View private result (if approved)
try:
    if result.private is not None:
        print("Private (real) statistics:")
        print(result.private)
    else:
        print("Private result not yet available")
except NameError:
    print("result not computed yet")

## Check Inbox

View all received messages (decrypted).

In [None]:
bv.inbox()

## Summary

With session-based SyftBox integration:
- ✓ Session-based access control
- ✓ All communication is encrypted end-to-end
- ✓ Only authorized peers can read messages
- ✓ Computation requests are encrypted for the data owner
- ✓ Results are encrypted for the requester
- ✓ Files sync automatically via the SyftBox server