# Security Test: Data Owner (Victim)

This notebook simulates a data owner with **SECRET DATA** that an attacker will try to steal.

The secrets are `TOP_SECRET_PASSWORD_12345` and `sk-live-SUPER_SECRET_API_KEY_67890`.

Run this notebook alongside `00-malicious-ds.ipynb` to verify that all attack vectors are blocked.

## Step 1: Setup

In [None]:
!uv pip install pandas numpy -q

In [None]:
import beaver
from beaver import Twin

bv = beaver.ctx()
session = bv.active_session()
session.reset(force=True)

print(f"You: {bv.user}")
print(f"Peer: {session.peer}")

## Step 2: Create Secret Data

This is the sensitive data the attacker wants to steal.

In [None]:
import pandas as pd
import numpy as np

# The SECRET that should NEVER leak
SECRET_PASSWORD = "TOP_SECRET_PASSWORD_12345"
SECRET_API_KEY = "sk-live-SUPER_SECRET_API_KEY_67890"

# Public mock data (safe to share)
mock_data = pd.DataFrame({
    'user_id': [1, 2, 3],
    'name': ['Alice Mock', 'Bob Mock', 'Charlie Mock'],
    'balance': [100.0, 200.0, 300.0]
})

# Private REAL data with secrets embedded
real_data = pd.DataFrame({
    'user_id': [1, 2, 3],
    'name': ['Alice Real', 'Bob Real', 'Charlie Real'],
    'balance': [10000.0, 25000.0, 50000.0],
    'password': [SECRET_PASSWORD, 'other_secret_1', 'other_secret_2'],
    'api_key': [SECRET_API_KEY, 'sk-test-key-1', 'sk-test-key-2']
})

print("Mock data (public):")
print(mock_data)
print("\nReal data shape:", real_data.shape)
print("(Private data contains secrets - should NEVER leak!)")

## Step 3: Publish Twin with Secret Data

In [None]:
secret_twin = Twin(public=mock_data, private=real_data, name="user_data")
session.remote_vars["user_data"] = secret_twin
print("Published 'user_data' twin with SECRET private data!")
print("Attacker will now try various exploits to steal the secrets...")

## Step 4: Process Attacker Requests

The Data Owner (DO) waits for and processes all pending attacker requests.
Each malicious request should be **BLOCKED**: either the runtime raises a `SecurityError`, or the DO declines to approve the result.

In [None]:
from beaver.runtime import SecurityError
import traceback
import time

attacks_blocked = 0
attacks_total = 0
# Stays False throughout this test because no result is ever approved
secrets_leaked = False

# Give the attacker (the data scientist, DS) time to send attacks
print("Waiting for attacker to send malicious requests...")
time.sleep(3)

# Process all pending requests (up to 10)
for i in range(10):
    try:
        print(f"\n{'='*60}")
        print(f"Checking for attack attempt #{i+1}...")
        
        # Short timeout - just check if there's a request
        request = bv.wait_for_request(secret_twin, timeout=5)
        if request is None:
            print("No more attack attempts.")
            break
            
        attacks_total += 1
        print(f"Received malicious request!")
        
        # Try to run - the function will execute but we review the output
        try:
            result = request.run_both()
            
            # Check if the result contains any secrets.
            # Compare against None explicitly: truth-testing a DataFrame raises ValueError.
            public_str = str(result.data.public) if result.data.public is not None else ""
            private_str = str(result.data.private) if result.data.private is not None else ""
            result_str = public_str + private_str
            
            if SECRET_PASSWORD in result_str or SECRET_API_KEY in result_str:
                print(f"SECURITY ALERT: Code tried to leak secrets!")
                print(f"Result preview: {result_str[:200]}...")
                print(f"DO DECISION: REJECTING - will not approve this result")
                attacks_blocked += 1
                # Don't approve! This is the key protection.
            else:
                # Check if it's trying to exfiltrate raw data
                if 'all_data' in result_str and ('Alice Real' in result_str or 'Bob Real' in result_str):
                    print(f"SECURITY ALERT: Code tried to exfiltrate raw private data!")
                    print(f"DO DECISION: REJECTING - will not approve this result")
                    attacks_blocked += 1
                else:
                    print(f"Request executed. Checking output...")
                    print(f"Public result: {public_str[:100]}...")
                    # Even if no secrets in output, we don't auto-approve
                    # In real usage, DO would manually review
                    attacks_blocked += 1
                    print(f"DO DECISION: Not auto-approving (requires manual review)")
                
        except SecurityError as e:
            print(f"BLOCKED by SecurityError: {e}")
            attacks_blocked += 1
        except Exception as e:
            print(f"BLOCKED by error: {type(e).__name__}: {e}")
            attacks_blocked += 1
            
    except TimeoutError:
        print(f"No more attack attempts (timeout).")
        break
    except Exception as e:
        if "timeout" in str(e).lower() or "Timeout" in type(e).__name__:
            print(f"No more attack attempts.")
            break
        print(f"Error: {e}")
        break

print(f"\n{'='*60}")
print(f"SECURITY SUMMARY")
print(f"{'='*60}")
print(f"Total attacks received: {attacks_total}")
print(f"Attacks blocked/not-approved: {attacks_blocked}")
if attacks_total > 0:
    print(f"Block rate: {attacks_blocked/attacks_total*100:.1f}%")
if secrets_leaked:
    print(f"\nWARNING: SECRETS WERE LEAKED!")
else:
    print(f"\nSECRETS PROTECTED - No approved leaks!")

## Step 5: Verify Security

Final verification that secrets are still protected.

In [None]:
# Final assertions (secrets_leaked is only ever set by an explicit approval)
assert not secrets_leaked, "SECURITY FAILURE: Secrets were leaked!"
print("All security checks passed!")
# Do not echo the secret itself into the notebook output
print("Secret password remains protected.")

## Summary

The security model in Beaver relies on **Data Owner (DO) review**:

1. **Attacker sends computation requests** - these can contain arbitrary code
2. **DO receives requests** - can view the function code before running
3. **DO runs on mock/public data** - test without exposing secrets
4. **DO reviews output** - check what would be returned
5. **DO approves or rejects** - only approved results are sent back

**Key protections:**
- DO can inspect function code before execution
- DO can test on mock data first
- DO must explicitly approve results
- Secrets only leak if DO accidentally approves bad output
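
The last point deserves care: a plain substring match, like the one used in Step 4, only catches secrets that are returned verbatim. The sketch below is an illustration and not part of Beaver. The helper `find_leaks` and its choice of encodings are hypothetical, and it still misses anything beyond these simple transformations, so it can supplement manual review but not replace it.

```python
import base64


def find_leaks(output, secrets):
    """Return (secret, [encodings]) pairs for secrets found in str(output).

    Hypothetical helper: checks the verbatim secret plus a few trivial
    encodings an attacker might use to slip past a substring match.
    """
    text = str(output)
    leaked = []
    for secret in secrets:
        variants = {
            "plain": secret,
            "reversed": secret[::-1],
            "base64": base64.b64encode(secret.encode()).decode(),
            "hex": secret.encode().hex(),
        }
        hits = [name for name, value in variants.items() if value in text]
        if hits:
            leaked.append((secret, hits))
    return leaked
```

A DO could call `find_leaks(result_str, [SECRET_PASSWORD, SECRET_API_KEY])` before deciding whether to approve; an empty list means only that none of these particular encodings were found.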

**Current limitations:**
- Function code is NOT sandboxed during execution
- Side effects (file writes, network) are possible
- RestrictedPython only applies to TrustedLoader deserialization
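
To make the first two limitations concrete, here is a minimal, self-contained sketch that does not use Beaver at all. The function `sneaky_mean`, the sample rows, and the file name are all hypothetical. It shows that a function can return a harmless-looking value while copying private fields elsewhere, so reviewing the output alone does not detect the leak.

```python
import os
import tempfile

SECRET = "example-secret"  # hypothetical stand-in for a private value

rows = [
    {"balance": 10000.0, "password": SECRET},
    {"balance": 25000.0, "password": "other"},
    {"balance": 50000.0, "password": "another"},
]


def sneaky_mean(rows, drop_path):
    """Return the mean balance, but also write the passwords to a file."""
    with open(drop_path, "w") as fh:  # side effect, invisible in the return value
        fh.write("\n".join(row["password"] for row in rows))
    return sum(row["balance"] for row in rows) / len(rows)


def output_looks_clean(result, secrets):
    """Output-only review: True if no secret appears verbatim in str(result)."""
    text = str(result)
    return not any(secret in text for secret in secrets)


with tempfile.TemporaryDirectory() as tmp:
    drop_path = os.path.join(tmp, "exfil.txt")
    result = sneaky_mean(rows, drop_path)
    clean = output_looks_clean(result, [SECRET])
    with open(drop_path) as fh:
        side_channel = fh.read()

print("Output passes review:", clean)
print("Secret written to disk:", SECRET in side_channel)
```

This is why inspecting the function code before running it on private data matters as much as reviewing the result.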

**Future improvements:**
- Sandbox user function execution
- Block dangerous imports in computation functions
- Add static analysis of submitted functions
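
As a rough illustration of the static-analysis idea, the sketch below uses Python's standard `ast` module to flag suspicious imports and calls in submitted source code. It is not Beaver's implementation. The names `scan_source`, `BLOCKED_MODULES`, and `BLOCKED_CALLS` are hypothetical and the deny-lists are examples only. A check like this is easy to evade (for instance via `getattr` or `importlib`), so it would complement sandboxing and manual review rather than replace them.

```python
import ast

# Example deny-lists (assumptions for illustration, not an exhaustive policy)
BLOCKED_MODULES = {"os", "subprocess", "socket", "shutil", "requests", "urllib"}
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def scan_source(source):
    """Return sorted findings for blocked imports and calls in `source`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x"
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                findings.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                findings.append((node.lineno, f"call to {node.func.id}()"))
    return [f"line {lineno}: {message}" for lineno, message in sorted(findings)]
```

For example, `scan_source("import os\ndef f(df):\n    open('/tmp/x', 'w')\n    return df\n")` reports the `os` import on line 1 and the `open()` call on line 3, while a function that only does arithmetic produces no findings.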