# Data Scientist (DS) - Alice's Notebook

This notebook demonstrates the **RemoteData architecture** from the data scientist's perspective.

Alice wants to analyze patient data owned by Bob, but Bob wants to preserve privacy.

**⚠️ Make sure you've run `test_do.ipynb` first!**

## Setup

In [1]:
import sys
sys.path.insert(0, "../python/src")

import beaver
import pandas as pd
from beaver import Twin

In [2]:
# Connect as Alice (data scientist)
bv = beaver.connect("shared", user="alice")

🔄 Auto-load replies enabled for alice (polling every 2.0s)


## 1. Check Inbox

Alice received data from Bob. Let's see what's in the inbox.

In [3]:
bv.peer("bob").remote_vars

Name,Type,ID
counter,Twin[int],cb2f4f8a3379...


In [4]:
bv.peer("bob").remote_vars["counter"].load()

🟢 Live sync enabled (read-only, every 2.0s)
✓ Loaded Twin 'counter' from published location


🔒 Private,not available,.request_private()
🌍 Public,1,← .value uses this
Owner,bob
Live,"🟢 Enabled (read-only, 2.0s)"
IDs: twin=ad0927db... private=12866af7... public=ac20f2d2...


In [5]:
counter

🔒 Private,not available,.request_private()
🌍 Public,1,← .value uses this
Owner,bob
Live,"🟢 Enabled (read-only, 2.0s)"
IDs: twin=ad0927db... private=12866af7... public=ac20f2d2...


In [6]:
counter.public

1

In [7]:
counter.public

1

In [8]:
counter.public_value

1

In [9]:
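import time

# Poll the public value while Bob updates it on his side
for i in range(100):
    time.sleep(2)  # wait roughly one sync interval (2.0s) between reads
    print(counter.public)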

1
1
2
2
4
5
6
7
8
9


KeyboardInterrupt: 

In [None]:
# View inbox
bv.inbox()

In [None]:
# Look at the first envelope
bv.inbox()[0]

## 2. Load the Twin (Public Side Only)

When Alice loads the Twin, she **only receives the public (mock) data**.
The private data never left Bob's environment!
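
As a rough illustration of why the private side can stay with Bob, the sketch below models a two-sided object whose serialized form simply omits the private value. `MiniTwin`, `to_wire`, and `from_wire` are hypothetical names invented for this example; they are not the beaver API, and the real transport may work differently.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MiniTwin:
    """Hypothetical stand-in for a Twin: one private side, one public side."""
    owner: str
    public: Any
    private: Optional[Any] = None

    @property
    def has_private(self) -> bool:
        return self.private is not None

    @property
    def has_public(self) -> bool:
        return self.public is not None

    def to_wire(self) -> str:
        # Only the owner and the public (mock) side are serialized;
        # the private value is deliberately left out of the payload.
        return json.dumps({"owner": self.owner, "public": self.public})

    @classmethod
    def from_wire(cls, payload: str) -> "MiniTwin":
        data = json.loads(payload)
        return cls(owner=data["owner"], public=data["public"])


# Bob's side: holds both the real and the mock values (invented numbers).
bob_twin = MiniTwin(owner="bob", public=[30, 40], private=[34, 57, 61])

# Alice's side: rebuilt from the wire payload, so no private value exists.
alice_twin = MiniTwin.from_wire(bob_twin.to_wire())
print(alice_twin.has_private, alice_twin.has_public)
```

Because the private value is never written into the payload, Alice's copy has nothing to leak, which is the property the `has_private` check below is meant to confirm.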

In [None]:
# Load the Twin from Bob
bv.inbox()[-1].load()

In [None]:
# Now patient_data is in globals
patient_data

In [None]:
# Check what Alice has access to
print(f"Has private: {patient_data.has_private}")  # False!
print(f"Has public: {patient_data.has_public}")    # True!
print(f"Owner: {patient_data.owner}")

## 3. Work with Public (Mock) Data

Alice can develop her analysis using the mock data.

In [None]:
# Alice works with mock data
print("🌍 Working with PUBLIC data:")
patient_data.value

In [None]:
# Develop analysis on mock data
print(f"Total patients (mock): {len(patient_data.value)}")
print(f"Average age (mock): {patient_data.value['age'].mean():.1f}")
print(f"Positive rate (mock): {(patient_data.value['diagnosis'] == 'positive').sum() / len(patient_data.value) * 100:.1f}%")

In [None]:
# Create analysis function
@bv
def analyze_patients(df):
    """Analysis function developed on mock data."""
    results = {
        'total_patients': len(df),
        'avg_age': df['age'].mean(),
        'avg_test_result': df['test_result'].mean(),
        'positive_count': (df['diagnosis'] == 'positive').sum(),
        'positive_rate': (df['diagnosis'] == 'positive').sum() / len(df) * 100
    }
    return results

In [None]:
# Test on mock data
mock_results = analyze_patients(patient_data.public)
print("\n📊 Analysis Results (Mock Data):")
for key, value in mock_results.items():
    print(f"  {key}: {value:.2f}")

In [None]:
type(mock_results)
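
For readers without Bob's mock DataFrame at hand, the same summary can be sketched with plain Python records. The rows below are invented for illustration, and `summarize` is a hypothetical helper that mirrors the fields of the analysis above; it also guards against an empty dataset, which would otherwise divide by zero.

```python
from statistics import mean

# Invented mock records; the field names mirror the columns used above.
mock_rows = [
    {"age": 30, "test_result": 1.0, "diagnosis": "positive"},
    {"age": 50, "test_result": 3.0, "diagnosis": "negative"},
    {"age": 40, "test_result": 2.0, "diagnosis": "negative"},
    {"age": 60, "test_result": 4.0, "diagnosis": "positive"},
]


def summarize(rows):
    """Compute the summary fields from a list of dict records."""
    if not rows:
        raise ValueError("cannot summarize an empty dataset")
    positives = sum(1 for row in rows if row["diagnosis"] == "positive")
    return {
        "total_patients": len(rows),
        "avg_age": mean(row["age"] for row in rows),
        "avg_test_result": mean(row["test_result"] for row in rows),
        "positive_count": positives,
        "positive_rate": positives / len(rows) * 100,
    }


summary = summarize(mock_rows)
for key, value in summary.items():
    print(f"  {key}: {value:.2f}")
```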

## 4. Request Private Data Access

Once Alice's analysis is ready, she can request access to the **real (private)** data.

In [None]:
real_result = analyze_patients(patient_data)

In [None]:
real_result

In [None]:
real_result.request_private()

In [None]:
real_result

In [None]:
real_result

In [None]:
real_result.value

In [None]:
real_result

In [None]:
import time

# Assumes a Twin named `count` has already been loaded into globals
for i in range(100):
    time.sleep(2)
    print(count.public)

In [None]:
bv.inbox()

In [None]:
bv.inbox()[2].load()

In [None]:
count

In [None]:
count.value

In [None]:
count.value

In [None]:
import time

# Assumes a Twin named `count` has already been loaded into globals
for i in range(100):
    time.sleep(2)
    print(count.public)

In [None]:
bv.inbox()

In [None]:
# Load the request (gets injected into globals)
comp_request = bv.inbox()[0].load()

In [None]:
# Request private access
patient_data.request_private()

💡 **Note**: The request flow is not yet fully implemented. In a complete system:
- Bob would receive Alice's request
- Bob could review the analysis code
- Bob could approve and send results (computed on private data)
- Or Bob could grant temporary access with conditions
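
The review-and-approve steps listed above could look roughly like the following. This is a minimal sketch under stated assumptions: `ComputeRequest`, `Owner`, `approve`, and `reject` are hypothetical names for illustration only, since the source says the real request flow is not yet fully implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ComputeRequest:
    """Hypothetical request: a function Alice wants run on Bob's private data."""
    requester: str
    func: Callable[[Any], Any]
    status: str = "pending"
    result: Optional[Any] = None


class Owner:
    """Hypothetical data owner who keeps the private data local."""

    def __init__(self, name: str, private_data: Any):
        self.name = name
        self._private_data = private_data

    def approve(self, request: ComputeRequest) -> ComputeRequest:
        # Run the reviewed function locally; only the result is attached
        # to the request, never the private data itself.
        request.result = request.func(self._private_data)
        request.status = "approved"
        return request

    def reject(self, request: ComputeRequest) -> ComputeRequest:
        request.status = "rejected"
        return request


def count_rows(rows):
    """Example analysis that Bob can read before approving."""
    return len(rows)


# Invented private rows that exist only on Bob's side.
bob = Owner("bob", private_data=[{"age": 34}, {"age": 57}, {"age": 61}])
request = ComputeRequest(requester="alice", func=count_rows)

approved = bob.approve(request)
print(approved.status, approved.result)
```

In this sketch Alice only ever sees `status` and `result`; whether an aggregate result is safe to release is still Bob's judgment call during review.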

## 5. Access Remote Variables

Alice can see what remote variables Bob has published.

In [None]:
# View Bob's remote variables
bv.peer("bob").remote_vars

In [None]:
bv.peer("bob").remote_vars["patient_data"].load()

In [None]:
patient_data.live

## 6. Load the Second Twin (Test Scores)

Bob also sent test scores with auto-generated mock data.

In [None]:
# # Load second Twin
# if len(bv.inbox()) > 1:
#     bv.inbox()[1].load()
#     print("\n✓ Loaded test_scores")
#     test_scores

In [None]:
# # Alice only sees the mock (first 10 items)
# if 'test_scores' in globals():
#     print(f"Test scores (mock): {test_scores.value}")
#     print(f"\nCan Alice see private? {test_scores.has_private}")  # False
#     print(f"Can Alice see public? {test_scores.has_public}")      # True

## 7. Watch for Live Updates (Demo)

If Bob enabled live sync, Alice can watch for real-time changes.

**Note**: This is a demonstration of the API. In practice, you'd need Bob to enable live sync first.
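
To make the idea of watching for changes concrete, here is a small polling generator that yields a value only when it differs from the previous read. It is a sketch, not the library's `.watch()` implementation: the function name `watch`, its `read_value`, `interval`, and `max_polls` parameters, and the fake data source are all assumptions made for this example.

```python
import time
from typing import Any, Callable, Iterator


def watch(read_value: Callable[[], Any], interval: float = 2.0,
          max_polls: int = 10) -> Iterator[Any]:
    """Poll `read_value` and yield each value that differs from the last one."""
    unset = object()  # sentinel so the first reading is always yielded
    last = unset
    for _ in range(max_polls):
        current = read_value()
        if last is unset or current != last:
            last = current
            yield current
        time.sleep(interval)


# Fake source standing in for a remote public value (invented readings).
readings = iter([1, 1, 2, 2, 4])
changes = list(watch(lambda: next(readings), interval=0.0, max_polls=5))
print(changes)
```

Repeated readings are skipped, so a consumer sees one event per change rather than one line per poll.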

In [None]:
# Check if live sync is available
print(f"Live sync enabled: {patient_data.live}")
print(f"Sync interval: {patient_data.sync_interval}s")

if patient_data.live:
    print("\n🔴 Live sync is active!")
    print(f"Last sync: {patient_data.last_sync}")
else:
    print("\n⚫ Live sync not enabled on this Twin")

In [None]:
# Example: Subscribe to changes (if live sync were enabled)
def on_data_change():
    print(f"🔔 Data updated! New row count: {len(patient_data.value)}")

# patient_data.on_change(on_data_change)
print("💡 If live sync were enabled, changes would trigger callbacks")
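
The callback pattern can be sketched independently of beaver with a small observer class. `ObservableValue` and its `update` method are hypothetical, and unlike the zero-argument callback above, this sketch assumes callbacks receive the new value.

```python
from typing import Any, Callable, List


class ObservableValue:
    """Hypothetical value holder that notifies callbacks when it changes."""

    def __init__(self, value: Any):
        self._value = value
        self._callbacks: List[Callable[[Any], None]] = []

    def on_change(self, callback: Callable[[Any], None]) -> None:
        self._callbacks.append(callback)

    def update(self, new_value: Any) -> None:
        if new_value == self._value:
            return  # unchanged: nothing to announce
        self._value = new_value
        for callback in self._callbacks:
            callback(new_value)


seen = []
row_count = ObservableValue(1)
row_count.on_change(seen.append)

row_count.update(1)  # same value, no callback
row_count.update(2)  # change, callback fires
row_count.update(2)  # same value again, no callback
row_count.update(3)  # change, callback fires
print(seen)
```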

In [None]:
# Example: Watch pattern (generator)
# for updated_value in patient_data.watch(timeout=10):
#     print(f"Updated: {len(updated_value)} patients")

print("💡 The .watch() method provides a generator pattern for monitoring changes")

## 8. Compare Twin Display Formats

Twins have rich display formats showing all relevant information.

In [None]:
# String representation
print("String representation (__str__):")
print(patient_data)

In [None]:
# Repr (same as str)
print("\nRepr representation:")
print(repr(patient_data))

In [None]:
# Jupyter HTML display (automatic in Jupyter)
patient_data  # Shows rich HTML in Jupyter

## 9. RemoteData Interface

All Twins implement the RemoteData interface with consistent methods.
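
One way to picture a shared interface like this is a structural type that any data holder can satisfy. The sketch below is an assumption-laden illustration: `RemoteDataLike` and `LocalValue` are hypothetical names, and the member list is inferred only from the calls used in this notebook, not from the library's actual definition.

```python
import uuid
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class RemoteDataLike(Protocol):
    """Hypothetical structural interface modeled on the calls in this notebook."""
    id: str
    owner: str
    var_type: str

    def has_data(self) -> bool: ...

    def get_value(self) -> Any: ...


class LocalValue:
    """Hypothetical holder that satisfies the interface without inheriting it."""

    def __init__(self, value: Any, owner: str):
        self._value = value
        self.id = uuid.uuid4().hex
        self.owner = owner
        self.var_type = type(value).__name__

    def has_data(self) -> bool:
        return self._value is not None

    def get_value(self) -> Any:
        return self._value


item = LocalValue([1, 2, 3], owner="bob")
print(isinstance(item, RemoteDataLike), item.var_type, item.id[:12])
```

With a structural check like this, code that consumes remote data can accept any object exposing the same members, whatever its concrete class.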

In [None]:
# RemoteData interface methods
print("RemoteData Interface:")
print(f"  has_data(): {patient_data.has_data()}")
print(f"  get_value(): {type(patient_data.get_value()).__name__}")
print(f"  id: {patient_data.id[:12]}...")
print(f"  twin_id: {patient_data.twin_id[:12]}...")
print(f"  owner: {patient_data.owner}")
print(f"  var_type: {patient_data.var_type}")

## 10. Summary

Alice has:
- ✅ Received Twins with only public (mock) data
- ✅ Developed analysis on mock data
- ✅ Requested private data access
- ✅ Viewed remote variables
- ✅ Learned about live sync capabilities
- ✅ Used the RemoteData interface

**Key Privacy Features:**
- 🔒 Private data **never leaves** Bob's environment
- 🌍 Alice only receives public/mock data
- 📊 Analysis can be developed on mock, then run on real data
- 🔴 Live sync enables real-time collaboration
- 💡 Request flow allows Bob to approve/review before sharing results

## 11. Full Workflow Example

Here's the complete collaborative workflow:

In [None]:
print("""
🔄 COMPLETE WORKFLOW:

1. Bob (DO):
   - Creates Twin with private (real) + public (mock) data
   - Sends Twin to Alice (only public side transmitted)
   - Optionally enables live sync

2. Alice (DS):
   - Receives Twin with only public data
   - Develops analysis using mock data
   - Validates approach, tunes parameters
   - Requests private data access

3. Bob (DO):
   - Reviews Alice's analysis code
   - Approves and runs on private data
   - Sends results back to Alice
   - Or grants temporary access

4. Alice (DS):
   - Receives results computed on real data
   - Publishes findings
   - Never had direct access to sensitive data!

🎯 BENEFITS:
- Privacy-preserving by default
- Development on mock, production on real
- Real-time collaboration with live sync
- Explicit approval workflow
- Unified interface (RemoteData)
""")