# Data Owner (DO) - Single Cell Analysis with SyftBox

This notebook demonstrates the **Beaver + SyftBox integration** from the data owner's perspective.

Key features:
- **Session-based** communication with data scientists
- Data is **encrypted at rest** using SyftBox crypto
- Private data never leaves your environment
- Files sync automatically via the SyftBox server

## Setup

In [None]:
# Install dependencies if needed
# !uv pip install scanpy anndata matplotlib scikit-misc

In [None]:
import sys
from pathlib import Path

# Add beaver to path (for development)
sys.path.insert(0, "../python/src")

import beaver
from beaver import Twin
import scanpy as sc
import anndata as ad

In [None]:
# Configuration
DATA_DIR = Path.cwd()  # Expected to be sandbox/client1@sandbox.local (run the notebook from there)
USER_EMAIL = "client1@sandbox.local"

print(f"Data directory: {DATA_DIR}")
print(f"User: {USER_EMAIL}")
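
The comment on `DATA_DIR` assumes the notebook is started from the sandbox folder named after the user. Below is a minimal sketch of a guard that checks this before connecting. It assumes the `sandbox/<email>` layout implied by that comment, and the helper `check_data_dir` is hypothetical, not part of beaver.

```python
from pathlib import Path


def check_data_dir(data_dir: Path, user_email: str) -> bool:
    """Return True if data_dir looks like sandbox/<user_email>.

    Hypothetical helper: it only checks the folder layout implied by the
    notebook comment and does not talk to SyftBox.
    """
    return data_dir.name == user_email and data_dir.parent.name == "sandbox"


# A path matching the expected layout, and one that does not
print(check_data_dir(Path("/tmp/sandbox/client1@sandbox.local"), "client1@sandbox.local"))  # True
print(check_data_dir(Path("/tmp/notebooks"), "client1@sandbox.local"))  # False
```

In the notebook, it could be called as `check_data_dir(DATA_DIR, USER_EMAIL)`, with a warning printed when it returns `False`.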

## Connect to the SyftBox Backend

In [None]:
# Connect to the SyftBox backend (encrypted mode)
bv = beaver.connect(
    user=USER_EMAIL,
    data_dir=DATA_DIR,
)

print(f"Connected as {bv.user}")
print(f"  SyftBox enabled: {bv.syftbox_enabled}")
if bv.backend:
    print(f"  Encryption: {'Enabled' if bv.backend.uses_crypto else 'Disabled'}")

In [None]:
# Register AnnData serializers so Twins holding AnnData are transferred as .h5ad files
from beaver.runtime import TrustedLoader

@TrustedLoader.register(ad.AnnData)
def anndata_serialize_file(obj, path):
    # Write the AnnData object to disk in HDF5 (.h5ad) format
    obj.write_h5ad(path)

@TrustedLoader.register(ad.AnnData)
def anndata_deserialize_file(path):
    # Read an .h5ad file back into an AnnData object
    return ad.read_h5ad(path)

## Load Single Cell Data

In [None]:
# Data paths
data_dir = Path("../notebooks/single_cell/data")
private_path = data_dir / "sc_RNAseq_adata_downsampled_to5percent.private.h5ad"
mock_path = data_dir / "sc_RNAseq_adata_downsampled_to5percent.mock.h5ad"
sim_path = data_dir / "adata_simulated.h5ad"

print(f"Private data: {private_path.exists()}")
print(f"Mock data: {mock_path.exists()}")

In [None]:
# Prepare mock data from the simulated dataset if no mock file exists yet.
# Columns are renamed so the mock's obs schema matches the private data
# (pct_counts_in_top_50_genes stands in for pct_counts_mt).
if not mock_path.exists() and sim_path.exists():
    adata_sim = sc.read(sim_path)
    adata_sim.obs.rename(
        columns={
            "pct_counts_in_top_50_genes": "pct_counts_mt",
            "group": "cell_type",
        },
        inplace=True,
    )
    adata_sim.write_h5ad(mock_path)
    print("Created mock data from simulated")

In [None]:
# Load both datasets
adata_private = sc.read(private_path)
adata_mock = sc.read(mock_path)

print(f"Private data: {adata_private.n_obs} cells x {adata_private.n_vars} genes")
print(f"Mock data: {adata_mock.n_obs} cells x {adata_mock.n_vars} genes")
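
Code a data scientist writes against the mock has to run unchanged on the private data, which is why the earlier cell renames the simulated columns. Below is a small stdlib sketch that checks whether the column names of the two datasets line up. The helper `schema_diff` is hypothetical, and the column names in the example are illustrative only.

```python
from typing import Dict, Iterable, List


def schema_diff(private_cols: Iterable[str], mock_cols: Iterable[str]) -> Dict[str, List[str]]:
    """Compare column names of the private and mock tables.

    Returns the columns missing from the mock and the columns that only exist
    in the mock, each sorted for stable output.
    """
    private_set, mock_set = set(private_cols), set(mock_cols)
    return {
        "missing_in_mock": sorted(private_set - mock_set),
        "extra_in_mock": sorted(mock_set - private_set),
    }


# Illustrative column names only
diff = schema_diff(
    ["cell_type", "n_genes", "pct_counts_mt"],
    ["cell_type", "pct_counts_mt", "pct_counts_in_top_50_genes"],
)
print(diff)
# {'missing_in_mock': ['n_genes'], 'extra_in_mock': ['pct_counts_in_top_50_genes']}
```

In the notebook, it could be called as `schema_diff(adata_private.obs.columns, adata_mock.obs.columns)`. An empty result in both lists means the `obs` schemas match by name.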

## Create Twin

In [None]:
# Create Twin with private (real) and public (mock) data
patient_sc = Twin(
    private=adata_private,
    public=adata_mock,
    owner=USER_EMAIL,
    name="patient_sc",
)

patient_sc
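
Conceptually, a Twin pairs the private object with a public stand-in of the same shape. Code can be developed against the mock, and the owner can later execute it on the real data. Here is a toy model of that idea. `ToyTwin` is hypothetical and far simpler than beaver's `Twin`. Its method names echo the `run_mock()` and `run_both()` calls used later in this notebook, but this is not their implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class ToyTwin(Generic[T]):
    """Toy stand-in for a Twin: a private value paired with a public mock."""
    private: T
    public: T

    def run_mock(self, fn: Callable[[T], Any]) -> Any:
        # Only the public (mock) side is touched
        return fn(self.public)

    def run_both(self, fn: Callable[[T], Any]) -> Tuple[Any, Any]:
        # Returns (public_result, private_result); the private result stays local
        return fn(self.public), fn(self.private)


twin = ToyTwin(private=[5, 7, 9], public=[1, 2, 3])
print(twin.run_mock(sum))  # 6
print(twin.run_both(sum))  # (6, 21)
```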

## Handle Session Requests

Data scientists must request a session before accessing data.
Review pending requests below and accept or reject each one.

In [None]:
# Check for pending session requests
requests = bv.session_requests()
requests

In [None]:
# Accept the first request (if any)
if len(requests) > 0:
    session = requests[0].accept()
    print(f"Session accepted: {session.session_id}")
    print(f"  Peer: {session.peer}")
    print(f"  Local folder: {session.local_folder}")
else:
    print("No pending session requests.")
    print("Wait for a data scientist to request a session...")

## Publish Data to Session

Once a session is accepted, publish data to the session folder.
Only the session peer can access this data.

In [None]:
# Publish Twin to session
try:
    session.remote_vars["patient_sc"] = patient_sc
    print(f"Published patient_sc to session {session.session_id}")
except NameError:
    print("No active session. Accept a session request first.")

In [None]:
# View session workspace
try:
    session.workspace()
except NameError:
    print("No active session")

## Handle Computation Requests

Check the session inbox for computation requests from the data scientist.
Preview each request on mock data, run it on the private data, review the results, and only then approve or reject the release.
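
The review flow described above can be summarized as a small function. It previews on the mock, runs on the private data locally, and lets the owner's decision gate what leaves the environment. This sketch assumes a request is just a Python callable and that review is a predicate. `review_and_release` is hypothetical. In this notebook, the corresponding steps are `run_mock()`, `run_both()`, and `approve()` on the loaded request.

```python
from typing import Any, Callable, Optional, Tuple


def review_and_release(
    fn: Callable[[Any], Any],
    mock_data: Any,
    private_data: Any,
    approve: Callable[[Any], bool],
) -> Tuple[Any, Optional[Any]]:
    """Owner-side flow: preview on mock, run on private, release only if approved.

    Returns (mock_result, private_result), or (mock_result, None) if rejected.
    """
    mock_result = fn(mock_data)        # 1. preview on mock data
    private_result = fn(private_data)  # 2. execute on private data locally
    if approve(private_result):        # 3. owner reviews the private result
        return mock_result, private_result  # 4a. approved: release it
    return mock_result, None                # 4b. rejected: nothing private leaves


# Toy review rule: only release counts of at least 10
print(review_and_release(len, [1, 2, 3], list(range(50)), lambda n: n >= 10))  # (3, 50)
print(review_and_release(len, [1, 2, 3], [1, 2], lambda n: n >= 10))  # (3, None)
```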

In [None]:
# Check session inbox for requests from peer
try:
    session.inbox()
except NameError:
    print("No active session")

In [None]:
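# Load the first computation request in the inbox.
# The commented cells below refer to it as `request_make_violin_for_result`; adjust to your request's name.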
try:
    inbox = session.inbox()
    if len(inbox) > 0:
        inbox[0].load()
    else:
        print("No requests in inbox yet")
except NameError:
    print("No active session")

In [None]:
# Run on mock data first to preview
# mock_result = request_make_violin_for_result.run_mock()
# mock_result.public_figures[0]

In [None]:
# Run on both mock and private data
# result = request_make_violin_for_result.run_both()
# result

In [None]:
# Review private results before approval
# print("Private stdout:", result.private_stdout)
# result.private_figures[0]
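
When reviewing private output, one concrete check is whether it reveals statistics about very small groups of cells. Here is a minimal sketch of such a check. It assumes the result is broken down by cell type and uses an arbitrary threshold of 10. Both the helper `small_groups` and the threshold are illustrative choices, not a beaver feature.

```python
from collections import Counter
from typing import Dict, Iterable


def small_groups(labels: Iterable[str], min_count: int = 10) -> Dict[str, int]:
    """Return every group label with fewer than min_count members."""
    counts = Counter(labels)
    return {group: n for group, n in sorted(counts.items()) if n < min_count}


labels = ["T cell"] * 25 + ["B cell"] * 12 + ["NK cell"] * 3
print(small_groups(labels))  # {'NK cell': 3}
```

The private `obs` probably has a `cell_type` column, since the mock is renamed to match it. If so, `small_groups(adata_private.obs["cell_type"])` lists the groups to look at closely in a per-cell-type figure before approving.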

In [None]:
# Approve and send result back to data scientist
# result.approve()

## Process Multiple Requests

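The helper below polls the session inbox until `timeout` seconds have passed and processes each new computation request once. It auto-approves every result, so add a manual review step before `approve()` when the private data is sensitive.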

In [None]:
# Continuously check for and process requests
import time

def process_requests(session, timeout=300, poll_interval=2):
    """Poll the session inbox and process each new computation request once.

    Runs for `timeout` seconds, checking the inbox every `poll_interval`
    seconds. Results are auto-approved; add manual review for real data.
    """
    start = time.monotonic()  # monotonic clock is safe for measuring elapsed time
    processed = set()  # envelope IDs already handled

    while time.monotonic() - start < timeout:
        for env in session.inbox():
            if env.envelope_id in processed:
                continue
            print(f"\nNew request: {env.name}")
            obj = env.load()

            # Treat objects exposing run_both() as computation requests
            if hasattr(obj, "run_both"):
                result = obj.run_both()
                print("  Computed result")

                # Auto-approve (replace with a manual review step for real data)
                result.approve()
                print("  Approved and sent")

            processed.add(env.envelope_id)

        time.sleep(poll_interval)

    print(f"\nProcessed {len(processed)} envelopes")

# Uncomment to run:
# process_requests(session)

## Summary

With session-based SyftBox integration:
- Data is encrypted at rest
- Session-based access control
- Only authorized peers can decrypt
- Private data never leaves this environment
- Files sync automatically via the SyftBox server