# Data Owner (DO) - Bob's Notebook

This notebook demonstrates the **RemoteData architecture** from the data owner's perspective.

Bob owns sensitive patient data and wants to collaborate with Alice (a data scientist) while preserving privacy.

## Setup

In [1]:
# Clean start: remove state left over from previous runs
!cd shared && rm -rf alice public bob

In [2]:
import sys
sys.path.insert(0, "../python/src")

import beaver
import pandas as pd
import numpy as np
from beaver import Twin

In [3]:
# Connect as Bob (data owner)
bv = beaver.connect("shared", user="bob")

🔄 Auto-load replies enabled for bob (polling every 2.0s)


## 1. Create Real (Private) Data

Bob has sensitive patient data with real names, ages, and test results.

In [4]:
count = 1

In [5]:
counter = Twin(
    private=1,      # Real data (stays local)
    public=count,       # Mock data (shareable)
    owner="bob",
    name="count"
)

In [6]:
counter

🔒 Private | 1 | ← .value uses this
🌍 Public | 1 | ✓
Live | ⚫ Disabled
IDs: twin=ad0927db... private=0b24796d... public=ac20f2d2...


In [7]:
counter.enable_live(interval=2.0)

🟢 Live sync enabled (read-only, every 2.0s)


In [8]:
counter

🔒 Private | 1 | ← .value uses this
🌍 Public | 1 | ✓
Live | 🟢 Enabled (read-only, 2.0s)
IDs: twin=ad0927db... private=0b24796d... public=ac20f2d2...


In [9]:
bv.remote_vars["counter"] = counter

🌍 Using PUBLIC data from Twin 'counter...'
📢 Published Twin 'counter' (public side available at: shared/public/bob/data/0fc60921907f4eb3963c3305c9d3930e.beaver)


In [10]:
counter.public += 1

In [11]:
counter

🔒 Private | 1 | ← .value uses this
🌍 Public | 2 | ✓
Live | 🟢 Enabled (read-only, 2.0s)
IDs: twin=ad0927db... private=0b24796d... public=ac20f2d2...


🌍 Using PUBLIC data from Twin 'counter...'
  📢 Re-published to public registry
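The output above suggests that assigning to `counter.public` re-publishes the public side. A minimal sketch of how such behaviour could work, assuming a property setter that calls a publish callback; `PublishingTwin` and `on_publish` are hypothetical names, not beaver's API:

```python
from typing import Any, Callable, List


class PublishingTwin:
    """Hypothetical sketch: a twin whose public side republishes on every assignment."""

    def __init__(self, private: Any, public: Any, on_publish: Callable[[Any], None]) -> None:
        self.private = private          # never passed to the publish callback
        self._public = public
        self._on_publish = on_publish

    @property
    def public(self) -> Any:
        return self._public

    @public.setter
    def public(self, new_value: Any) -> None:
        # `twin.public += 1` reads through the getter, then assigns through this
        # setter, so each augmented assignment triggers exactly one publish.
        self._public = new_value
        self._on_publish(new_value)


published: List[Any] = []
twin = PublishingTwin(private=1, public=1, on_publish=published.append)
twin.public += 1
twin.public += 1
print(twin.public, published)  # 3 [2, 3]
```

Routing writes through a setter keeps the publish step impossible to forget, at the cost of publishing on every single assignment.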


In [12]:
import time

# Increment the public side once every 2 seconds
for i in range(10):
    print(i)
    time.sleep(2)
    counter.public += 1

0
1
2
🌍 Using PUBLIC data from Twin 'counter...'
  📢 Re-published to public registry
3
🌍 Using PUBLIC data from Twin 'counter...'
  📢 Re-published to public registry
4
🌍 Using PUBLIC data from Twin 'counter...'
  📢 Re-published to public registry
5
🌍 Using PUBLIC data from Twin 'counter...'
  📢 Re-published to public registry
6
🌍 Using PUBLIC data from Twin 'counter...'
  📢 Re-published to public registry
7
🌍 Using PUBLIC data from Twin 'counter...'
  📢 Re-published to public registry
8
🌍 Using PUBLIC data from Twin 'counter...'
  📢 Re-published to public registry
9
🌍 Using PUBLIC data from Twin 'counter...'
  📢 Re-published to public registry


KeyboardInterrupt: 

In [None]:
# Real patient data (SENSITIVE)
real_data = pd.DataFrame({
    'patient_id': ['P001', 'P002', 'P003', 'P004', 'P005', 'P006', 'P007', 'P008', 'P009', 'P010'],
    'name': ['Alice Johnson', 'Bob Smith', 'Carol White', 'David Brown', 'Eve Davis', 
             'Frank Miller', 'Grace Lee', 'Henry Wilson', 'Iris Taylor', 'Jack Anderson'],
    'age': [34, 45, 29, 52, 38, 41, 36, 48, 31, 55],
    'test_result': [7.2, 8.5, 6.1, 9.3, 7.8, 8.1, 6.9, 9.0, 7.5, 8.8],
    'diagnosis': ['positive', 'positive', 'negative', 'positive', 'positive', 
                  'positive', 'negative', 'positive', 'positive', 'positive']
})

print("🔒 PRIVATE DATA (Real Patient Records):")
real_data

## 2. Create Mock (Public) Data

Bob creates anonymized mock data for Alice to develop her analysis on.

In [None]:
# Mock data - anonymized, smaller sample
mock_data = pd.DataFrame({
    'patient_id': ['M001', 'M002', 'M003', 'M004', 'M005'],
    'name': ['Patient A', 'Patient B', 'Patient C', 'Patient D', 'Patient E'],
    'age': [30, 40, 35, 50, 45],
    'test_result': [6.5, 8.0, 7.0, 9.0, 8.5],
    'diagnosis': ['negative', 'positive', 'positive', 'positive', 'positive']
})

print("🌍 PUBLIC DATA (Mock/Anonymized):")
mock_data

## 3. Create a Twin (Dual-Value Privacy Object)

A **Twin** holds both private (real) and public (mock) data.
- Bob sees and uses the **private** side locally
- When shared, only the **public** side is transmitted
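A minimal sketch of the dual-value idea, assuming a plain dataclass and JSON serialization. `MiniTwin` and `to_shareable` are hypothetical and do not reflect how beaver's `Twin` is actually implemented or serialized:

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class MiniTwin:
    """Hypothetical dual-value holder (not beaver's Twin)."""
    private: Any  # real data: stays in the owner's process
    public: Any   # mock data: the only side that is ever serialized
    owner: str
    name: str

    def to_shareable(self) -> str:
        # Build the outgoing payload from metadata plus the public side only.
        return json.dumps({"owner": self.owner, "name": self.name, "public": self.public})


twin = MiniTwin(private=[7.2, 8.5, 6.1], public=[6.5, 8.0], owner="bob", name="scores")
payload = twin.to_shareable()
print(payload)  # {"owner": "bob", "name": "scores", "public": [6.5, 8.0]}
```

Because the payload is built from an explicit allow-list of fields, the private side cannot leak unless someone adds it to that list.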

In [None]:
# Create Twin with both sides
patient_data = Twin(
    private=real_data,      # Real data (stays local)
    public=mock_data,       # Mock data (shareable)
    owner="bob",
    name="patient_data"
)

patient_data

## 4. Work with the Twin Locally

When Bob accesses `.value`, it uses the **private** side (real data).
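A later cell notes that the "Using PRIVATE data" message appears only once. One way to get that behaviour is a property that prints a notice on first access and then stays quiet. This is a hypothetical sketch; `ValueTwin` is not part of beaver:

```python
from typing import Any


class ValueTwin:
    """Hypothetical sketch: `.value` resolves to the private side locally."""

    def __init__(self, private: Any, public: Any) -> None:
        self.private = private
        self.public = public
        self._notified = False

    @property
    def value(self) -> Any:
        # Print the notice on the first access only, so repeated reads stay quiet.
        if not self._notified:
            print("Using PRIVATE data")
            self._notified = True
        return self.private


t = ValueTwin(private=[34, 45, 29], public=[30, 40])
print(len(t.value))      # prints the notice, then 3
print(sum(t.value) / 3)  # no notice this time; prints 36.0
```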

In [None]:
# Bob works with real data
print(f"Total patients: {len(patient_data.value)}")
print(f"Average age: {patient_data.value['age'].mean():.1f}")
print(f"Positive rate: {(patient_data.value['diagnosis'] == 'positive').sum() / len(patient_data.value) * 100:.1f}%")

In [None]:
# Notice: the "Using PRIVATE data" message is printed only once;
# repeated accesses don't spam the output.
patient_data.value['test_result'].mean()

## 5. Share Twin with Data Scientist

When Bob sends the Twin to Alice, **only the public side is transmitted**.
The private data never leaves Bob's environment.
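To illustrate a file-based exchange, here is a hypothetical sketch in which sending writes a JSON envelope containing only the public side into the recipient's inbox folder. The folder layout (`<root>/<user>/inbox`), the JSON format, and the `send`/`inbox` helpers are assumptions; beaver's real on-disk format (per the earlier output) uses `.beaver` files under `shared/public/...`:

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Any, Dict, List


def send(shared_root: Path, sender: str, recipient: str, name: str, public: Any) -> Path:
    """Hypothetical: drop an envelope holding only the public side into a folder."""
    inbox_dir = shared_root / recipient / "inbox"
    inbox_dir.mkdir(parents=True, exist_ok=True)
    envelope = {"envelope_id": uuid.uuid4().hex, "sender": sender, "name": name, "public": public}
    path = inbox_dir / f"{envelope['envelope_id']}.json"
    path.write_text(json.dumps(envelope))
    return path


def inbox(shared_root: Path, user: str) -> List[Dict[str, Any]]:
    """Hypothetical: read every envelope waiting in `user`'s inbox folder."""
    inbox_dir = shared_root / user / "inbox"
    if not inbox_dir.exists():
        return []
    return [json.loads(p.read_text()) for p in sorted(inbox_dir.glob("*.json"))]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Only the public side is handed to send(); the private side never touches disk.
    send(root, "bob", "alice", "scores", public=[6.5, 8.0])
    messages = inbox(root, "alice")
    print(messages[0]["name"], messages[0]["public"])  # scores [6.5, 8.0]
```

A shared folder as the transport keeps the design simple: the data owner controls exactly which bytes are written, and the recipient only ever reads what was placed there.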

In [None]:
# Send to Alice - only public side is shared!
result = bv.send(patient_data, user="alice")
print(f"\n✉️  Sent to Alice: {result.path}")
print(f"📦 Envelope ID: {result.envelope.envelope_id[:12]}...")

In [None]:
bv.inbox()

In [None]:
req = bv.inbox()[0]

In [None]:
req

In [None]:
# Load the request (this appears to define `real_result`, used in the next cells)
req.load()

In [None]:
real_result

In [None]:
r = real_result.run_both()

In [None]:
r

In [None]:
r.result

In [None]:
r.approve()

In [None]:
# counter.subscribe_live("alice")

# bv.send(counter, user="alice")

In [None]:

# # Now just update - it auto-sends!
# counter.public += 1  # Automatically sends to Alice

# Alice's side:
# counter = list(bv.inbox())[0].load(inject=False)  # 1. Load Twin

# # 2. Watch for updates (generator yields on each update)
# for updated in counter.watch_live(context=bv_alice):
#   print(f"Counter: {updated.public}")  # Real-time updates!

# 📄 Files Modified

# 1. python/src/beaver/twin.py:348-387 - Fixed duplicate updates in watch_live()
# 2. python/src/beaver/live_mixin.py:238-262 - Removed debug output
# 3. LIVE_SYNC_GUIDE.md - Created comprehensive guide

# The live sync system is now fully functional! Bob's changes automatically
# propagate to Alice in real time. 🎉

In [None]:
counter.public

In [None]:
# counter.enable_live(interval=2.0)

In [None]:
bv.send(counter, user="alice")

In [None]:
import time

# Increment the public side once every 2 seconds
for i in range(10):
    print(i)
    time.sleep(2)
    counter.public += 1

In [None]:
counter.public

In [None]:
z = real_result.run_both()

In [None]:
# Clear the private side of the result before approving
z.result.private = None

In [None]:
z.approve()

In [None]:
computation_analyze_patients_result

In [None]:
comp_result = computation_analyze_patients_result.run()

In [None]:
comp_result

In [None]:
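# ============================================================
# STEP 1: Load the request and view its details
# ============================================================
comp_request = bv.inbox()[0].load()
print(comp_request)
# ComputationRequest: analyze_patients_result
#   From: alice
#   Function: analyze_patients
#   💡 Will update Twin variable 'analyze_patients_result' on alice's side
#   Args (1):
#     [0] Twin: patient_data
#   💡 Call .run() to execute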


# ============================================================
# STEP 2: Test on MOCK data first (safe preview)
# ============================================================
# Inspect the function source
import inspect
print(inspect.getsource(comp_request.func))

# Run on mock/public data to see what it does
mock_test = comp_request.func(patient_data.public)
print("Mock result:", mock_test)
# → Shows: {'total_patients': 5, 'avg_age': 40.0, ...}


# ============================================================
# STEP 3: Run on REAL data (executes computation)
# ============================================================
comp_result = comp_request.run(context=bv)

# View the result
print(comp_result)
# ComputationResult: analyze_patients_result
#   From request by: alice
#   ✓ Result (dict): {'total_patients': 10, 'avg_age': 40.9, ...}
#   💡 Actions:
#      .approve()           - Send result back
#      .approve_with(value) - Send different value
#      .data = new_value    - Modify before approving


# ============================================================
# OPTION A: Approve and send back as-is
# (Options A-D are alternatives: choose one per request)
# ============================================================
comp_result.approve()
# ✅ Approving result for: analyze_patients_result
#    ✓ Result sent to alice's inbox
# → Alice's `real_result` Twin auto-updates!


# ============================================================
# OPTION B: Reject (just don't approve, or delete the request)
# ============================================================
# Don't call .approve() - just ignore it
# Or explicitly:
print("❌ Rejecting request - not sending result")


# ============================================================
# OPTION C: Modify result before approving
# ============================================================
# Option C1: Modify the result data directly
comp_result.data['total_patients'] = 9  # Hide one patient
comp_result.data['positive_rate'] = round(comp_result.data['positive_rate'], 1)
comp_result.approve()

# Option C2: Substitute a completely different value
modified_result = {
    'total_patients': comp_result.data['total_patients'],
    'summary': 'Limited data shared for privacy',
}
comp_result.approve_with(modified_result)


# ============================================================
# OPTION D: Inspect before deciding
# ============================================================
# Check stdout/stderr from execution
print("Stdout:", comp_result.stdout)
print("Stderr:", comp_result.stderr)

# Check whether execution failed
if comp_result.error:
    print("❌ Execution failed:", comp_result.error)
else:
    # Decide based on the result
    if comp_result.data['positive_rate'] > 50:
        print("⚠️  High positive rate - reviewing before approval")
        # ... manual review ...

    comp_result.approve()

# ============================================================
# QUICK REFERENCE
# ============================================================
# Load
comp_request = bv.inbox()[0].load()

# Test on mock
preview = comp_request.func(patient_data.public)

# Run on real
comp_result = comp_request.run(context=bv)

# Approve (`other_value` and `modified` are placeholders)
comp_result.approve()                               # Send as-is
comp_result.approve_with(other_value)               # Send a different value
comp_result.data = modified; comp_result.approve()  # Modify, then send

# The request display shows "💡 Will update Twin variable
# 'analyze_patients_result' on alice's side", so Bob knows exactly
# what will happen when he approves.
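The review workflow above boils down to: preview on mock, run on real, then release only an explicitly approved (optionally coarsened) value. A self-contained sketch of that flow, assuming pandas and a stand-in `analyze_patients` function; `ReviewQueue` and its methods are hypothetical and only mirror the shape of `run()` / `approve_with()`:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

import pandas as pd


def analyze_patients(df: pd.DataFrame) -> Dict[str, Any]:
    """Stand-in for the data scientist's submitted function (hypothetical)."""
    return {
        "total_patients": int(len(df)),
        "avg_age": float(df["age"].mean()),
        "positive_rate": float((df["diagnosis"] == "positive").mean() * 100),
    }


@dataclass
class ReviewQueue:
    """Hypothetical review-then-release step; only approved values reach the outbox."""
    outbox: List[Dict[str, Any]] = field(default_factory=list)

    def review(self, func: Callable[[pd.DataFrame], Dict[str, Any]],
               mock: pd.DataFrame, real: pd.DataFrame) -> Dict[str, Any]:
        preview = func(mock)  # safe preview on mock data
        print("Preview on mock:", preview)
        return func(real)     # real result stays local until approved

    def approve_with(self, value: Dict[str, Any]) -> None:
        self.outbox.append(value)


mock = pd.DataFrame({"age": [30, 40], "diagnosis": ["negative", "positive"]})
real = pd.DataFrame({
    "age": [34, 45, 29, 52],
    "diagnosis": ["positive", "positive", "negative", "positive"],
})

queue = ReviewQueue()
result = queue.review(analyze_patients, mock, real)
# Coarsen before release (rounding, as in Option C) and send only that.
released = {key: round(value, 1) for key, value in result.items()}
queue.approve_with(released)
print(queue.outbox)  # [{'total_patients': 4, 'avg_age': 40.0, 'positive_rate': 75.0}]
```

Keeping the real result in a local variable and appending to the outbox only through an explicit approval call makes "do nothing" the default, which matches Option B (reject by not approving).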

## 6. Enable Live Sync

In [None]:
# Enable live sync (read-only, 2 second interval)
patient_data.enable_live(mutable=False, interval=2.0)

In [None]:
# Check live status
print(f"Live enabled: {patient_data.live}")
print(f"Mutable: {patient_data.mutable}")
print(f"Sync interval: {patient_data.sync_interval}s")
print(f"Last sync: {patient_data.last_sync}")

In [None]:
# Display shows live status
patient_data
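Live sync here is described as read-only with a fixed polling interval. A minimal sketch of such a loop, assuming a background thread that re-publishes whenever the observed value changes. `PollingSync` is hypothetical, not beaver's implementation, and it compares values with `!=`, so it suits scalars and lists (a DataFrame would need `DataFrame.equals` instead):

```python
import copy
import threading
import time
from typing import Any, Callable, List

_UNSET = object()  # sentinel meaning "nothing published yet"


class PollingSync:
    """Hypothetical read-only live sync: poll a value and publish it when it changes."""

    def __init__(self, get_value: Callable[[], Any], publish: Callable[[Any], None],
                 interval: float = 2.0) -> None:
        self._get_value = get_value
        self._publish = publish
        self._interval = interval
        self._last: Any = _UNSET
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            current = self._get_value()
            if self._last is _UNSET or current != self._last:
                self._publish(current)
                self._last = copy.deepcopy(current)  # snapshot so in-place edits are detected
            # Event.wait acts as a sleep that returns early once stop() is called.
            self._stop.wait(self._interval)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


state = {"count": 1}
published: List[Any] = []
sync = PollingSync(lambda: state["count"], published.append, interval=0.05)
sync.start()
time.sleep(0.2)
state["count"] = 2  # the owner changes the value; the next poll publishes it
time.sleep(0.2)
sync.stop()
print(published)  # typically [1, 2]
```

Polling trades a little latency (up to one interval) for simplicity: no file-system watchers or callbacks are needed, and stopping is a single `Event.set()`.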

## 7. Update Data (Simulating Real-time Changes)

Let's simulate Bob adding new patient records in real time.

In [None]:
import time

# Simulate data updates
print("📝 Adding new patient records...\n")

# Add to real data
new_patient_real = pd.DataFrame([{
    'patient_id': 'P011',
    'name': 'Karen Martin',
    'age': 42,
    'test_result': 7.6,
    'diagnosis': 'positive'
}])

patient_data.private = pd.concat([patient_data.private, new_patient_real], ignore_index=True)
print(f"✓ Added to private data. Total patients: {len(patient_data.private)}")

time.sleep(3)  # Wait for sync

# Add another
new_patient_real2 = pd.DataFrame([{
    'patient_id': 'P012',
    'name': 'Leo Thompson',
    'age': 39,
    'test_result': 6.8,
    'diagnosis': 'negative'
}])

patient_data.private = pd.concat([patient_data.private, new_patient_real2], ignore_index=True)
print(f"✓ Added another. Total patients: {len(patient_data.private)}")
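The `time.sleep(3)  # Wait for sync` step implies that the sync loop notices when the data changed. One cheap way to detect that, assuming content hashing is acceptable, is to fingerprint the DataFrame with `pandas.util.hash_pandas_object`. `fingerprint` is a hypothetical helper; note that column names are not part of this hash:

```python
import hashlib

import pandas as pd


def fingerprint(df: pd.DataFrame) -> str:
    """Hash a DataFrame's values and index into a short hex digest."""
    row_hashes = pd.util.hash_pandas_object(df, index=True).values  # one uint64 per row
    return hashlib.sha256(row_hashes.tobytes()).hexdigest()[:16]


df = pd.DataFrame({"patient_id": ["P001", "P002"], "age": [34, 45]})
before = fingerprint(df)

new_row = pd.DataFrame([{"patient_id": "P003", "age": 29}])
df = pd.concat([df, new_row], ignore_index=True)
after = fingerprint(df)

print(before != after)         # True: the data changed, so a sync is due
print(fingerprint(df) == after)  # True: unchanged data keeps its fingerprint
```

Comparing a short digest is cheaper than keeping a full copy of the previous DataFrame around just to detect changes.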

## 8. Remote Variables (Named References)

Bob can publish the Twin as a **remote variable** that Alice can reference by name.
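A minimal sketch of a name-to-value registry that, like the earlier `bv.remote_vars["counter"] = counter` output ("Using PUBLIC data from Twin"), stores only the public side of a twin. `SimpleTwin` and `RemoteVarRegistry` are hypothetical stand-ins, not beaver classes:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SimpleTwin:
    private: Any
    public: Any


class RemoteVarRegistry:
    """Hypothetical name -> published-value registry (not beaver's remote_vars)."""

    def __init__(self) -> None:
        self._published: Dict[str, Any] = {}

    def __setitem__(self, name: str, obj: Any) -> None:
        # For twin-like objects, publish only the public side under the given name.
        self._published[name] = obj.public if isinstance(obj, SimpleTwin) else obj

    def __getitem__(self, name: str) -> Any:
        return self._published[name]

    def names(self) -> List[str]:
        return sorted(self._published)


remote_vars = RemoteVarRegistry()
remote_vars["patient_count"] = SimpleTwin(private=10, public=5)
print(remote_vars.names(), remote_vars["patient_count"])  # ['patient_count'] 5
```

Doing the public/private split inside `__setitem__` means callers can use plain dict syntax without remembering to strip the private side themselves.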

In [None]:
patient_data.enable_live(mutable=False, interval=2.0)

In [None]:
type(patient_data)

In [None]:
# Publish as remote variable
bv.remote_vars["patient_data"] = patient_data
print("📍 Published 'patient_data' as remote variable")

In [None]:
# View Bob's remote variables
bv.remote_vars

## 9. Create Another Twin with Auto-Mock

The `Twin.from_mock()` method can auto-generate mock data.
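No `from_mock()` cell follows, so its exact behaviour is not shown here. As a hypothetical illustration of auto-mocking (not `Twin.from_mock()` itself), this sketch keeps the real columns and dtypes, draws numeric values uniformly within the observed range, samples low-cardinality text columns from their observed categories, and replaces other text with placeholders. Sampling categories still reveals which categories exist:

```python
import numpy as np
import pandas as pd


def make_mock(real: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Hypothetical auto-mock: same columns and dtypes, synthetic values."""
    rng = np.random.default_rng(seed)
    columns = {}
    for col in real.columns:
        series = real[col]
        if pd.api.types.is_numeric_dtype(series):
            # Uniform draw inside the observed [min, max] range.
            values = rng.uniform(series.min(), series.max(), size=n)
            if pd.api.types.is_integer_dtype(series):
                columns[col] = values.round().astype(series.dtype)
            else:
                columns[col] = values.round(1)
        elif series.nunique() <= len(series) / 2:
            # Low-cardinality text (e.g. a diagnosis label): sample observed categories.
            columns[col] = rng.choice(series.unique(), size=n)
        else:
            # High-cardinality text (IDs, names): replace with placeholders.
            columns[col] = [f"{col}_{i + 1}" for i in range(n)]
    return pd.DataFrame(columns)


real = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003", "P004"],
    "age": [34, 45, 29, 52],
    "test_result": [7.2, 8.5, 6.1, 9.3],
    "diagnosis": ["positive", "positive", "negative", "positive"],
})
mock = make_mock(real, n=3)
print(mock)
```

Preserving the schema is what matters most here: Alice's analysis code must run unchanged on the mock before it is ever executed on the real data.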

## 10. Summary

Bob has:
- ✅ Created Twins with private (real) and public (mock) data
- ✅ Worked with private data locally
- ✅ Shared only public data with Alice
- ✅ Enabled live sync for real-time updates
- ✅ Published remote variables for easy reference
- ✅ Used auto-mock generation

**🎬 Now switch to `test_ds.ipynb` to see Alice's perspective!**

In [None]:
# Cleanup: disable live sync before finishing
patient_data.disable_live()