# SyftObject and SyftMessage Tutorial

This notebook demonstrates how to use the file-backed `SyftObject` and `SyftMessage` classes for secure, transport-agnostic file syncing.

In [1]:
import syft_client as sc
from pathlib import Path
import tempfile
import json

## Part 1: Understanding SyftObject

`SyftObject` is a base class that provides file-backed storage with:
- Atomic operations (no partial writes)
- Concurrency control (file locking)
- Path security (prevents directory traversal)
- Streaming for large files

In [2]:
# Create a temporary directory for our examples
with tempfile.TemporaryDirectory() as tmpdir:
    # Create a basic SyftObject
    obj_path = Path(tmpdir) / "my_object"
    obj = sc.SyftObject(obj_path)
    
    # Set some metadata
    obj.set_metadata({
        "name": "Example Object",
        "type": "demo",
        "author": "alice@example.com"
    })
    
    # Read metadata back
    metadata = obj.get_metadata()
    print("Metadata:", json.dumps(metadata, indent=2))
    
    # Check directory structure
    print("\nDirectory structure:")
    for p in sorted(obj_path.rglob("*")):
        print(f"  {p.relative_to(obj_path)}")

Metadata: {
  "_schema_version": "1.0.0",
  "_updated_at": "2025-08-22T14:10:21.276327",
  "author": "alice@example.com",
  "name": "Example Object",
  "type": "demo"
}

Directory structure:
  .write_lock
  data
  metadata.yaml
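
One common way to get the "no partial writes" property listed for `SyftObject` is to write to a temporary file and then rename it over the target. The sketch below uses only the standard library and is not taken from the `syft_client` source; `atomic_write_bytes` is a hypothetical helper introduced for illustration.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, data: bytes) -> None:
    """Hypothetical helper: replace `target` in one step via a temp file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the rename stays on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, target)  # the rename is atomic on POSIX systems
    except BaseException:
        # Remove the temp file if anything failed before the rename
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "obj" / "metadata.yaml"
    atomic_write_bytes(path, b"name: Example Object\n")
    atomic_write_bytes(path, b"name: Updated Object\n")
    final_text = path.read_text()
    leftover = [p.name for p in path.parent.iterdir()]

print(final_text)
print(leftover)
```

Because the rename swaps the file in a single step, a reader sees either the old content or the new content, never a half-written file.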


In [3]:
# Working with data files
with tempfile.TemporaryDirectory() as tmpdir:
    obj = sc.SyftObject(Path(tmpdir) / "data_example")
    
    # Write some data files
    obj.write_data_file("config.json", b'{"setting": "value"}')
    obj.write_json("users.json", [{"name": "Alice"}, {"name": "Bob"}])
    
    # Read them back
    config = obj.read_data_file("config.json")
    users = obj.read_json("users.json")
    
    print("Config:", config.decode())
    print("Users:", users)
    
    # List all data files
    print("\nAll data files:")
    for f in obj.list_data_files():
        print(f"  {f.name}")

Config: {"setting": "value"}
Users: [{'name': 'Alice'}, {'name': 'Bob'}]

All data files:
  config.json
  users.json


In [4]:
# Demonstrate checksums and locking
with tempfile.TemporaryDirectory() as tmpdir:
    obj = sc.SyftObject(Path(tmpdir) / "lockable")
    
    # Add some content
    obj.write_data_file("data.txt", b"Important data")
    obj.set_metadata({"status": "draft"})
    
    # Calculate checksum
    checksum = obj.calculate_checksum()
    print(f"Checksum: {checksum}")
    
    # Lock the object
    obj.lock(finalized=True, reviewer="bob@example.com")
    
    # Check lock status
    print(f"\nIs locked: {obj.is_locked()}")
    print(f"Lock info: {json.dumps(obj.get_lock_info(), indent=2)}")
    
    # Verify integrity
    print(f"\nChecksum valid: {obj.validate_checksum()}")

Checksum: 99baed7c2e510a0a2c3528fac497371276e05f848d7ce63d84b17b72f57dca29

Is locked: True
Lock info: {
  "checksum": "99baed7c2e510a0a2c3528fac497371276e05f848d7ce63d84b17b72f57dca29",
  "locked_at": "2025-08-22T14:10:21.601665",
  "schema_version": "1.0.0",
  "finalized": true,
  "reviewer": "bob@example.com"
}

Checksum valid: True
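
The notebook does not show how `calculate_checksum()` works internally. As a rough illustration of the idea, the sketch below hashes every file under a directory in sorted order so that any change to a name or to file content changes the result. The use of SHA-256 and the helper name `directory_checksum` are assumptions, not the library's confirmed behavior.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_checksum(root: Path) -> str:
    """Hypothetical helper: one digest over relative paths and file contents."""
    digest = hashlib.sha256()
    files = sorted(p for p in root.rglob("*") if p.is_file())
    for path in files:
        rel = path.relative_to(root).as_posix()
        digest.update(rel.encode("utf-8") + b"\0")
        # Feed a fixed-length per-file digest so file boundaries stay unambiguous
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir) / "lockable"
    (root / "data").mkdir(parents=True)
    (root / "data" / "data.txt").write_bytes(b"Important data")

    before = directory_checksum(root)
    unchanged = directory_checksum(root)

    # Simulate tampering after the checksum was recorded
    (root / "data" / "data.txt").write_bytes(b"Tampered data")
    after = directory_checksum(root)

print(before == unchanged)
print(before == after)
```

A real implementation would also have to exclude the file that stores the checksum itself, otherwise recording the checksum would invalidate it.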


## Part 2: Working with SyftMessage

`SyftMessage` extends `SyftObject` for file syncing between users. It provides:
- Message creation with unique IDs
- File attachment with metadata
- Permission tracking
- Message validation

In [5]:
# Create a new message
with tempfile.TemporaryDirectory() as tmpdir:
    # Create message root directory
    outbox = Path(tmpdir) / "outbox"
    outbox.mkdir()
    
    # Create a new message
    message = sc.SyftMessage.create(
        sender_email="alice@example.com",
        recipient_email="bob@example.com",
        message_root=outbox,
        message_type="file_sync"
    )
    
    print(f"Message ID: {message.message_id}")
    print(f"Sender: {message.sender_email}")
    print(f"Recipient: {message.recipient_email}")
    print(f"Timestamp: {message.timestamp}")

Message ID: gdrive_alice@example.com_bob@example.com_1755886221_0eaf44e9
Sender: alice@example.com
Recipient: bob@example.com
Timestamp: 1755886221.937682
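
Judging only from the output above, the message ID appears to combine a prefix, both email addresses, a Unix timestamp in whole seconds, and eight hex characters. The sketch below builds an ID of that shape; `make_message_id`, its `prefix` parameter, and the use of `secrets.token_hex` are assumptions made for illustration and may differ from what `SyftMessage.create` actually does.

```python
import re
import secrets
import time


def make_message_id(sender: str, recipient: str, prefix: str = "gdrive") -> str:
    """Hypothetical helper: build an ID shaped like the one printed above."""
    timestamp = int(time.time())   # whole seconds since the epoch
    suffix = secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
    return f"{prefix}_{sender}_{recipient}_{timestamp}_{suffix}"


id_a = make_message_id("alice@example.com", "bob@example.com")
id_b = make_message_id("alice@example.com", "bob@example.com")

pattern = r"gdrive_alice@example\.com_bob@example\.com_\d+_[0-9a-f]{8}"
print(id_a)
print(bool(re.fullmatch(pattern, id_a)))
```

The random suffix keeps two messages created within the same second between the same users from colliding.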


In [6]:
# Add files to a message
with tempfile.TemporaryDirectory() as tmpdir:
    # Create some test files
    test_files = Path(tmpdir) / "test_files"
    test_files.mkdir()
    
    # Create test data
    (test_files / "data.csv").write_text("id,name,value\n1,Alice,100\n2,Bob,200")
    (test_files / "report.txt").write_text("This is a confidential report.")
    
    # Create message
    outbox = Path(tmpdir) / "outbox"
    message = sc.SyftMessage.create(
        sender_email="alice@example.com",
        recipient_email="bob@example.com",
        message_root=outbox
    )
    
    # Add files with permissions
    file1 = message.add_file(
        source_path=test_files / "data.csv",
        syftbox_path="/alice@example.com/shared/data.csv",
        permissions={
            "read": ["bob@example.com", "charlie@example.com"],
            "write": ["alice@example.com"],
            "admin": ["alice@example.com"]
        }
    )
    
    file2 = message.add_file(
        source_path=test_files / "report.txt",
        syftbox_path="/alice@example.com/private/report.txt",
        permissions={
            "read": ["bob@example.com"],
            "write": [],
            "admin": ["alice@example.com"]
        }
    )
    
    print("Added files:")
    for f in message.get_files():
        print(f"\n  {f['filename']}:")
        print(f"    Path: {f['syftbox_path']}")
        print(f"    Size: {f['file_size']} bytes")
        print(f"    Hash: {f['file_hash'][:16]}...")
        print(f"    Read: {f['permissions']['read']}")

Added files:

  data.csv:
    Path: /alice@example.com/shared/data.csv
    Size: 35 bytes
    Hash: 4acab7e77f0730f4...
    Read: ['bob@example.com', 'charlie@example.com']

  report.txt:
    Path: /alice@example.com/private/report.txt
    Size: 30 bytes
    Hash: d3777fdfd65c9f3b...
    Read: ['bob@example.com']
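
Each entry returned by `get_files()` carries the keys used above: `filename`, `syftbox_path`, `file_size`, `file_hash`, and `permissions`. The sketch below shows how such a record could be assembled for a single file. `describe_file` is a hypothetical helper, and SHA-256 is an assumption based on the 64-character checksums printed earlier.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_file(source_path: Path, syftbox_path: str, permissions: dict) -> dict:
    """Hypothetical helper: build a file record with the keys shown above."""
    data = source_path.read_bytes()
    return {
        "filename": source_path.name,
        "syftbox_path": syftbox_path,
        "file_size": len(data),
        "file_hash": hashlib.sha256(data).hexdigest(),  # assumed algorithm
        "permissions": permissions,
    }


with tempfile.TemporaryDirectory() as tmpdir:
    csv_path = Path(tmpdir) / "data.csv"
    # write_bytes avoids newline translation, so the size is the same on every OS
    csv_path.write_bytes(b"id,name,value\n1,Alice,100\n2,Bob,200")
    entry = describe_file(
        csv_path,
        "/alice@example.com/shared/data.csv",
        {"read": ["bob@example.com"], "write": [], "admin": ["alice@example.com"]},
    )

print(entry["filename"], entry["file_size"])
```

Storing the size and hash next to the target path lets a receiver check a file before placing it at `syftbox_path`.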


In [7]:
# Finalize and validate a message
with tempfile.TemporaryDirectory() as tmpdir:
    # Create and populate message
    outbox = Path(tmpdir) / "outbox"
    message = sc.SyftMessage.create(
        sender_email="alice@example.com",
        recipient_email="bob@example.com",
        message_root=outbox
    )
    
    # Add a test file
    test_file = Path(tmpdir) / "test.txt"
    test_file.write_text("Hello, Bob!")
    message.add_file(test_file, "/alice@example.com/notes/test.txt")
    
    # Add a README
    message.add_readme("""
    <html>
    <body>
        <h1>File Update</h1>
        <p>Hi Bob, here's the latest test file.</p>
    </body>
    </html>
    """)
    
    print(f"Ready before finalize: {message.is_ready}")
    
    # Finalize the message
    message.finalize()
    
    print(f"Ready after finalize: {message.is_ready}")
    
    # Validate the message
    is_valid, error = message.validate()
    print(f"\nValidation: {'✓ Valid' if is_valid else f'✗ Invalid: {error}'}")
    
    # Show directory structure
    print("\nMessage structure:")
    for p in sorted(message.path.rglob("*")):
        if p.is_file():
            print(f"  {p.relative_to(message.path)}")

Ready before finalize: False
Ready after finalize: True

Validation: ✓ Valid

Message structure:
  .write_lock
  README.html
  data/files/test.txt
  lock.json
  metadata.yaml


## Part 3: Simulating Message Transfer

Let's simulate sending a message from Alice to Bob through a shared folder.

In [8]:
import shutil

with tempfile.TemporaryDirectory() as tmpdir:
    tmpdir = Path(tmpdir)
    
    # Setup directories
    alice_outbox = tmpdir / "alice" / "outbox"
    shared_folder = tmpdir / "shared" / ".syft" / "messages"
    bob_inbox = tmpdir / "bob" / "inbox"
    
    for d in [alice_outbox, shared_folder, bob_inbox]:
        d.mkdir(parents=True)
    
    # ALICE: Create and send a message
    print("=== ALICE CREATES MESSAGE ===")
    
    # Create test file
    alice_data = tmpdir / "alice_data.csv"
    alice_data.write_text("metric,value\naccuracy,0.95\nloss,0.05")
    
    # Create message
    message = sc.SyftMessage.create(
        sender_email="alice@research.org",
        recipient_email="bob@research.org",
        message_root=alice_outbox
    )
    
    # Add file
    message.add_file(
        source_path=alice_data,
        syftbox_path="/alice@research.org/results/metrics.csv",
        permissions={
            "read": ["bob@research.org"],
            "write": ["alice@research.org"],
            "admin": ["alice@research.org"]
        }
    )
    
    # Finalize
    message.finalize()
    print(f"Created message: {message.message_id}")
    
    # "Send" by copying to shared folder
    shared_msg_path = shared_folder / message.path.name
    shutil.copytree(message.path, shared_msg_path)
    print(f"Sent to shared folder: {shared_msg_path.name}")
    
    # BOB: Receive and process the message
    print("\n=== BOB RECEIVES MESSAGE ===")
    
    # "Receive" by copying from shared folder
    bob_msg_path = bob_inbox / message.path.name
    shutil.copytree(shared_msg_path, bob_msg_path)
    
    # Load the message
    received_msg = sc.SyftMessage(bob_msg_path)
    
    # Validate
    is_valid, error = received_msg.validate()
    print(f"Message valid: {is_valid}")
    
    # Extract files
    bob_files = tmpdir / "bob" / "extracted"
    bob_files.mkdir()
    
    for file_info in received_msg.get_files():
        print(f"\nExtracting: {file_info['filename']}")
        print(f"  From: {received_msg.sender_email}")
        print(f"  Target path: {file_info['syftbox_path']}")
        print(f"  Permissions: Read={file_info['permissions']['read']}")
        
        # Extract the file
        dest = bob_files / file_info['filename']
        received_msg.extract_file(file_info['filename'], dest)
        
        # Read and display content
        print(f"  Content: {dest.read_text()}")

=== ALICE CREATES MESSAGE ===
Created message: gdrive_alice@research.org_bob@research.org_1755886222_2d86efc4
Sent to shared folder: gdrive_alice@research.org_bob@research.org_1755886222_2d86efc4

=== BOB RECEIVES MESSAGE ===
Message valid: True

Extracting: alice_data.csv
  From: alice@research.org
  Target path: /alice@research.org/results/metrics.csv
  Permissions: Read=['bob@research.org']
  Content: metric,value
accuracy,0.95
loss,0.05
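
The message records permissions for each file, but this notebook does not show where they are enforced. A receiver could check them before extracting, along the lines of the sketch below. `has_permission` is a hypothetical helper that relies only on the dictionary layout shown in the `get_files()` output.

```python
def has_permission(file_info: dict, email: str, action: str = "read") -> bool:
    """Hypothetical helper: is `email` listed for `action` on this file?"""
    allowed = file_info.get("permissions", {}).get(action, [])
    return email in allowed


# Same shape as the entries returned by get_files() in the cells above
file_info = {
    "filename": "alice_data.csv",
    "syftbox_path": "/alice@research.org/results/metrics.csv",
    "permissions": {
        "read": ["bob@research.org"],
        "write": ["alice@research.org"],
        "admin": ["alice@research.org"],
    },
}

bob_can_read = has_permission(file_info, "bob@research.org", "read")
bob_can_write = has_permission(file_info, "bob@research.org", "write")
eve_can_read = has_permission(file_info, "eve@research.org", "read")

print(bob_can_read, bob_can_write, eve_can_read)
```

Treating a missing `permissions` key or a missing action as "no access" keeps the check fail-closed.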


## Part 4: Security Features

Let's demonstrate the security features that protect against malicious inputs.

In [9]:
# Path traversal protection
with tempfile.TemporaryDirectory() as tmpdir:
    obj = sc.SyftObject(Path(tmpdir) / "secure_object")
    
    # These will be rejected
    dangerous_names = [
        "../../../etc/passwd",
        "./../sensitive.txt", 
        "subdir/../../../escape.txt",
        ".hidden_file.txt"
    ]
    
    for name in dangerous_names:
        try:
            obj.write_data_file(name, b"malicious content")
            print(f"❌ SECURITY FAIL: {name} was allowed!")
        except ValueError as e:
            print(f"✓ Blocked: {name} - {e}")
    
    # These are safe
    safe_names = ["data.txt", "report.pdf", "results_2024.csv"]
    
    for name in safe_names:
        try:
            obj.write_data_file(name, b"safe content")
            print(f"✓ Allowed: {name}")
        except ValueError as e:
            print(f"❌ Wrongly blocked: {name} - {e}")

✓ Blocked: ../../../etc/passwd - Path traversal attempt detected: ../../../etc/passwd
✓ Blocked: ./../sensitive.txt - Path traversal attempt detected: ./../sensitive.txt
✓ Blocked: subdir/../../../escape.txt - Path traversal attempt detected: subdir/../../../escape.txt
✓ Blocked: .hidden_file.txt - Hidden files not allowed: .hidden_file.txt
✓ Allowed: data.txt
✓ Allowed: report.pdf
✓ Allowed: results_2024.csv
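
For readers curious how checks like these can be written, here is a minimal standard-library sketch. It is not the `syft_client` implementation: `safe_data_path` is a hypothetical helper, and the error messages simply mirror the ones printed above. It assumes POSIX-style names and does not handle Windows backslash separators.

```python
import tempfile
from pathlib import Path, PurePosixPath


def safe_data_path(data_dir: Path, name: str) -> Path:
    """Hypothetical helper: resolve `name` inside `data_dir` or raise ValueError."""
    pure = PurePosixPath(name)
    parts = pure.parts
    if not parts:
        raise ValueError("Empty file name")
    if pure.is_absolute() or ".." in parts:
        raise ValueError(f"Path traversal attempt detected: {name}")
    if any(part.startswith(".") for part in parts):
        raise ValueError(f"Hidden files not allowed: {name}")
    # Second line of defense: the resolved path must stay under data_dir
    root = data_dir.resolve()
    target = (root / name).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        raise ValueError(f"Path traversal attempt detected: {name}") from None
    return target


names = [
    "../../../etc/passwd",
    "./../sensitive.txt",
    "subdir/../../../escape.txt",
    ".hidden_file.txt",
    "data.txt",
    "report.pdf",
]

results = {}
with tempfile.TemporaryDirectory() as tmpdir:
    data_dir = Path(tmpdir) / "data"
    data_dir.mkdir()
    for name in names:
        try:
            safe_data_path(data_dir, name)
            results[name] = "allowed"
        except ValueError as exc:
            results[name] = f"blocked: {exc}"

for name, outcome in results.items():
    print(f"{name} -> {outcome}")
```

Checking the raw path components first gives clear error messages, while the resolved-path comparison also catches escapes through symlinks.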


In [ ]:
# Concurrent access demonstration
import threading
import time

with tempfile.TemporaryDirectory() as tmpdir:
    obj = sc.SyftObject(Path(tmpdir) / "concurrent_test")
    
    # Initial value
    obj.set_metadata({"counter": 0})
    
    def increment_counter(thread_id):
        for i in range(5):
            # WRONG WAY - Race condition!
            # current = obj.get_metadata()["counter"]
            # time.sleep(0.01)  # Simulate some work
            # obj.update_metadata({"counter": current + 1})
            
            # RIGHT WAY - Atomic update
            def atomic_increment(metadata):
                current = metadata.get("counter", 0)
                time.sleep(0.01)  # Simulate some work
                metadata["counter"] = current + 1
                print(f"Thread {thread_id}: incremented to {current + 1}")
                return metadata
            
            obj.update_metadata_atomic(atomic_increment)
    
    # Run two threads concurrently
    t1 = threading.Thread(target=increment_counter, args=(1,))
    t2 = threading.Thread(target=increment_counter, args=(2,))
    
    t1.start()
    t2.start()
    
    t1.join()
    t2.join()
    
    # Check final value
    final = obj.get_metadata()["counter"]
    print(f"\nFinal counter: {final}")
    print("Expected: 10")
    print(f"Correct: {final == 10}")

### Concurrent Access: Wrong Way vs Right Way

The following example shows why you need atomic operations for concurrent updates:

In [ ]:
# WRONG WAY - Race condition demonstration
import threading
import time

with tempfile.TemporaryDirectory() as tmpdir:
    obj = sc.SyftObject(Path(tmpdir) / "race_condition_demo")
    obj.set_metadata({"counter": 0})
    
    def bad_increment(thread_id):
        for i in range(5):
            # Reading and writing are separate operations
            current = obj.get_metadata()["counter"]
            time.sleep(0.01)  # This delay makes the race condition more likely
            obj.update_metadata({"counter": current + 1})
            print(f"Thread {thread_id}: incremented to {current + 1}")
    
    threads = [threading.Thread(target=bad_increment, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    
    final = obj.get_metadata()["counter"]
    print(f"\nFinal counter: {final} (Expected: 10)")
    if final < 10:
        print("❌ Race condition causes lost updates!")
    else:
        print("No updates were lost on this run, but the race is still possible.")

In [ ]:
# RIGHT WAY - Using atomic updates
import threading
import time

with tempfile.TemporaryDirectory() as tmpdir:
    obj = sc.SyftObject(Path(tmpdir) / "atomic_demo")
    obj.set_metadata({"counter": 0})
    
    def good_increment(thread_id):
        for i in range(5):
            # The update function is called while holding an exclusive lock
            def atomic_update(metadata):
                current = metadata.get("counter", 0)
                time.sleep(0.01)  # Work inside the lock is atomic
                metadata["counter"] = current + 1
                print(f"Thread {thread_id}: incremented to {current + 1}")
                return metadata
            
            obj.update_metadata_atomic(atomic_update)
    
    threads = [threading.Thread(target=good_increment, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    
    final = obj.get_metadata()["counter"]
    print(f"\nFinal counter: {final} (Expected: 10)")
    print("✅ Atomic updates ensure correctness!")
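
To make the idea behind `update_metadata_atomic` concrete, the sketch below performs a read-modify-write cycle while holding an exclusive file lock. It is an illustration only: it assumes a POSIX system (`fcntl.flock`), stores the metadata as JSON, and uses the hypothetical names `update_json_atomic` and `bump`. Whether `syft_client` uses `flock` internally is not shown in this notebook.

```python
import fcntl  # POSIX-only module
import json
import tempfile
import threading
from pathlib import Path


def update_json_atomic(path: Path, lock_path: Path, update_fn) -> dict:
    """Hypothetical helper: read, update, and write back under one exclusive lock."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            current = json.loads(path.read_text())
            updated = update_fn(current)
            path.write_text(json.dumps(updated))
            return updated
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def bump(metadata: dict) -> dict:
    metadata["counter"] = metadata.get("counter", 0) + 1
    return metadata


with tempfile.TemporaryDirectory() as tmpdir:
    meta = Path(tmpdir) / "metadata.json"
    lock = Path(tmpdir) / ".write_lock"
    meta.write_text(json.dumps({"counter": 0}))

    def worker():
        for _ in range(50):
            update_json_atomic(meta, lock, bump)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    final_counter = json.loads(meta.read_text())["counter"]

print(f"Final counter: {final_counter} (Expected: 200)")
```

Each call opens the lock file separately, so the lock also serializes threads within one process, and the same approach works across processes.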

## Part 5: Performance with Large Files

The streaming methods process large files in chunks, so a file never has to be loaded into memory in full.

In [11]:
import io

with tempfile.TemporaryDirectory() as tmpdir:
    obj = sc.SyftObject(Path(tmpdir) / "large_files")
    
    # Create a "large" file using a stream
    # (In practice, this could be a multi-GB file)
    class LargeFileStream(io.BytesIO):
        def __init__(self, size_mb=10):
            super().__init__()  # keep the base BytesIO properly initialized
            self.size = size_mb * 1024 * 1024
            self.position = 0
        
        def read(self, size=-1):
            if size == -1:
                size = self.size - self.position
            else:
                size = min(size, self.size - self.position)
            
            if size <= 0:
                return b''
            
            # Generate data on the fly
            data = b'x' * size
            self.position += size
            return data
    
    # Write a "10MB" file using streaming
    print("Writing 10MB file using streaming...")
    stream = LargeFileStream(10)
    path = obj.write_data_file_stream("large_data.bin", stream)
    print(f"Written to: {path}")
    
    # Calculate hash without loading into memory
    print("\nCalculating hash (streaming)...")
    file_hash = obj.calculate_file_hash(path)
    print(f"Hash: {file_hash[:16]}...")
    
    # The file is processed in chunks, not loaded entirely
    print(f"\nFile size: {path.stat().st_size / 1024 / 1024:.1f} MB")
    print("✓ File processed without loading into memory")

Writing 10MB file using streaming...
Written to: /private/var/folders/d4/s582723j2hqbtw60rnn5345r0000gn/T/tmpjpi4y43k/large_files/data/large_data.bin

Calculating hash (streaming)...
Hash: 462a12a876c0364e...

File size: 10.0 MB
✓ File processed without loading into memory
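
The chunked processing behind `write_data_file_stream` and `calculate_file_hash` can be approximated with a simple read loop. The sketch below writes a stream to disk and hashes it in the same pass. `write_stream_with_hash`, the 64 KiB chunk size, and the choice of SHA-256 are assumptions for illustration, not the library's implementation.

```python
import hashlib
import io
import tempfile
from pathlib import Path


def write_stream_with_hash(stream, dest: Path, chunk_size: int = 64 * 1024):
    """Hypothetical helper: copy `stream` to `dest` in chunks, hashing as we go."""
    digest = hashlib.sha256()
    total = 0
    with open(dest, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # an empty bytes object signals end of stream
                break
            out.write(chunk)
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()


payload = b"x" * (1024 * 1024)  # 1 MiB of sample data

with tempfile.TemporaryDirectory() as tmpdir:
    dest = Path(tmpdir) / "large_data.bin"
    total, file_hash = write_stream_with_hash(io.BytesIO(payload), dest)
    size_on_disk = dest.stat().st_size

print(f"Wrote {total} bytes, hash {file_hash[:16]}...")
```

Only one chunk is held in memory at a time, so peak memory use depends on `chunk_size` rather than on the file size.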


## Summary

The `SyftObject` and `SyftMessage` classes provide:

1. **File-backed storage** - All data persists to disk
2. **Atomic operations** - No partial writes or corrupted state
3. **Concurrency safety** - Multiple processes can access safely
4. **Security** - Path traversal protection and input validation
5. **Performance** - Streaming for large files
6. **Transport-agnostic** - Works with any transport (email, GDrive, etc.)

### Important Notes on Locking

The implementation uses file-based locking to prevent concurrent access issues:
- **exclusive_access()** - Used for write operations
- **shared_access()** - Used for read operations  
- Methods ending in **_no_lock()** are internal and assume the caller already holds a lock

Keeping the `_no_lock()` variants separate prevents deadlocks when a method that already holds the lock calls another method internally.
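
The `_no_lock()` convention can be illustrated with a small in-memory class. The sketch below uses a non-reentrant `threading.Lock` as a stand-in for the file lock; the `Counter` class and its method names are hypothetical and exist only to show the pattern.

```python
import threading


class Counter:
    """Hypothetical example: public methods take the lock, _no_lock helpers assume it."""

    def __init__(self):
        self._lock = threading.Lock()  # non-reentrant, like a simple file lock
        self._value = 0

    def _get_no_lock(self) -> int:
        # Internal: the caller must already hold self._lock
        return self._value

    def get(self) -> int:
        with self._lock:
            return self._get_no_lock()

    def increment(self) -> int:
        with self._lock:
            # Calling self.get() here would try to take the lock a second
            # time and block forever, so use the _no_lock variant instead.
            self._value = self._get_no_lock() + 1
            return self._value


counter = Counter()


def worker():
    for _ in range(100):
        counter.increment()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final value: {counter.get()} (Expected: 400)")
```

Public methods acquire the lock exactly once and delegate to `_no_lock` helpers, so nested calls never try to re-acquire a lock that is already held.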

These building blocks enable secure, decentralized file syncing as described in the Beach RFC.