# Transactions and Optimistic Concurrency

Iceberg provides ACID transactions through optimistic concurrency control. Multiple writers can work at the same time without taking locks, and Iceberg detects conflicting changes when each writer tries to commit.

In this notebook:

* **ACID properties**: How Iceberg guarantees them
* **Optimistic concurrency**: No locks, detect conflicts at commit
* **Conflict scenarios**: When commits fail
* **Conflict resolution**: Retry strategies
* **Isolation levels**: Serializable isolation

## ACID in Iceberg

**A**tomicity: All-or-nothing commits
* Metadata updates are atomic (single catalog UPDATE)
* Either everything commits or nothing does

**C**onsistency: Valid states only
* Schema always valid
* All referenced files exist

**I**solation: Writers don't interfere
* Each writer sees a consistent snapshot
* Serializable isolation (strongest level)

**D**urability: Committed changes persist
* Once catalog updated, change is permanent
* Metadata and data files are durable

In [None]:
import daft
import pyarrow as pa
import json
from pathlib import Path
from pyiceberg.catalog.sql import SqlCatalog
import threading
import time
from datetime import datetime

## Optimistic Concurrency Control

Many traditional databases rely on **pessimistic locking**:
* Writer acquires lock
* Others wait
* Writer releases lock
* Problem: Locks block other writers for as long as the lock is held

Iceberg uses **optimistic locking**:
* No locks acquired
* Each writer reads current metadata
* Makes changes
* At commit: check if metadata still current
* If changed: conflict detected, retry

### The Optimistic Lock

Remember the catalog UPDATE from the metadata notebook:

```sql
UPDATE iceberg_tables
SET metadata_location = 'new.json',
    previous_metadata_location = 'current.json'
WHERE table_name = 'events'
  AND metadata_location = 'current.json'  -- Optimistic lock!
```

If another writer committed first:
* `metadata_location` is no longer `'current.json'`
* WHERE clause doesn't match
* UPDATE affects 0 rows
* Commit fails → must retry
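
The compare-and-swap behaviour of that UPDATE can be tried in isolation. The following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The three-column table layout and the `try_commit` helper are assumptions made for illustration; they are not the schema or code that PyIceberg's `SqlCatalog` actually uses.

```python
import sqlite3

# In-memory stand-in for the catalog database (simplified, assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE iceberg_tables ("
    "table_name TEXT PRIMARY KEY, "
    "metadata_location TEXT, "
    "previous_metadata_location TEXT)"
)
conn.execute("INSERT INTO iceberg_tables VALUES ('events', 'current.json', NULL)")
conn.commit()


def try_commit(conn, table_name, expected_location, new_location):
    """Swap the metadata pointer only if it still equals expected_location."""
    cursor = conn.execute(
        "UPDATE iceberg_tables "
        "SET metadata_location = ?, previous_metadata_location = ? "
        "WHERE table_name = ? AND metadata_location = ?",
        (new_location, expected_location, table_name, expected_location),
    )
    conn.commit()
    # rowcount is 1 when the WHERE clause matched, 0 when another writer won.
    return cursor.rowcount == 1


# Both writers read 'current.json' before either one commits.
writer_a_ok = try_commit(conn, "events", "current.json", "writer_a.json")
writer_b_ok = try_commit(conn, "events", "current.json", "writer_b.json")

final_location = conn.execute(
    "SELECT metadata_location FROM iceberg_tables WHERE table_name = 'events'"
).fetchone()[0]

print(writer_a_ok)     # True
print(writer_b_ok)     # False
print(final_location)  # writer_a.json
```

Writer B's UPDATE matches zero rows, so its metadata file is never published and the table still points at writer A's commit.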

In [None]:
# Setup
warehouse_path = Path('../data/warehouse_concurrency').absolute()
warehouse_path.mkdir(parents=True, exist_ok=True)
catalog_db = warehouse_path / 'catalog.db'
catalog_db.unlink(missing_ok=True)

catalog = SqlCatalog('concurrency_demo', **{'uri': f'sqlite:///{catalog_db}', 'warehouse': f'file://{warehouse_path}'})
catalog.create_namespace('demo')

# Create initial table
df_events = daft.read_json('../data/input/events.jsonl')
df_initial = df_events.limit(5000)

arrow_table = df_initial.to_arrow()
events_table = catalog.create_table('demo.events', schema=pa.schema(arrow_table.schema))
events_table.append(arrow_table)

print(f"✅ Created table with {len(arrow_table):,} records")

## Simulating Concurrent Writers

Let's simulate two writers trying to commit simultaneously.

### Scenario 1: Both append to same table

In [None]:
def writer_task(writer_id, start_idx, count, results):
    """Simulated writer that appends data"""
    try:
        # Each writer creates its own catalog connection
        local_catalog = SqlCatalog('writer', **{'uri': f'sqlite:///{catalog_db}', 'warehouse': f'file://{warehouse_path}'})
        local_table = local_catalog.load_table('demo.events')
        
        # Load data using Daft
        df_all = daft.read_json('../data/input/events.jsonl')
        df_batch = df_all.offset(start_idx).limit(count)
        
        # Small delay so both writers reach the commit at roughly the same time (not guaranteed)
        time.sleep(0.1)
        
        # Try to append
        start = time.time()
        arrow = df_batch.to_arrow()
        local_table.append(arrow)
        elapsed = time.time() - start
        
        results[writer_id] = {'success': True, 'time': elapsed, 'records': len(arrow)}
        print(f"✅ Writer {writer_id} committed {len(arrow)} records in {elapsed:.2f}s")
        
    except Exception as e:
        results[writer_id] = {'success': False, 'error': str(e)}
        print(f"❌ Writer {writer_id} failed: {e}")

# Run two writers concurrently
print("Starting two concurrent writers...\n")
results = {}
threads = [
    threading.Thread(target=writer_task, args=(1, 5000, 1000, results)),
    threading.Thread(target=writer_task, args=(2, 6000, 1000, results))
]

for t in threads:
    t.start()
for t in threads:
    t.join()

print("\nResults:")
for writer_id, result in results.items():
    if result['success']:
        print(f"  Writer {writer_id}: ✅ Success ({result['records']} records in {result['time']:.2f}s)")
    else:
        print(f"  Writer {writer_id}: ❌ Failed - {result['error']}")

# Verify total count
events_table = catalog.load_table('demo.events')
df = daft.read_iceberg(events_table)
total = daft.sql("SELECT COUNT(*) as total FROM df").collect().to_pydict()['total'][0]
print(f"\nTotal records in table: {total:,}")
print(f"Expected: {5000 + sum(r['records'] for r in results.values() if r['success']):,}")

### What Happened?

Depending on timing, you may see:

**Both succeed**: Writers committed at different times
* Writer 1 commits → updates metadata
* Writer 2 commits → sees new metadata, commits on top

**One fails**: True concurrent commit
* Both read same metadata
* Writer 1 commits first
* Writer 2's UPDATE fails (metadata changed)
* Writer 2 must retry

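With SQLite + local filesystem, both usually succeed because each commit finishes quickly and the window for overlap is small. In production with object storage such as S3, writes take longer, so the window widens and conflicts become more common.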
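
Because the threaded run depends on timing, the two outcomes described above can be hard to observe on demand. The sketch below reproduces them deterministically with a toy in-memory catalog. `ToyCatalog` and `run` are hypothetical names invented for this illustration and are not part of PyIceberg.

```python
class ToyCatalog:
    """Holds one metadata pointer and swaps it only if the caller's view is current."""

    def __init__(self):
        self.metadata_location = "v0.json"

    def commit(self, expected, new):
        if self.metadata_location != expected:
            return False  # another writer committed first
        self.metadata_location = new
        return True


def run(overlapping):
    """Run two writers; if overlapping, both read metadata before either commits."""
    catalog = ToyCatalog()
    base_1 = catalog.metadata_location        # writer 1 reads current metadata
    base_2 = catalog.metadata_location        # writer 2 reads (possibly too early)
    ok_1 = catalog.commit(base_1, "w1.json")  # writer 1 commits
    if not overlapping:
        base_2 = catalog.metadata_location    # writer 2 re-reads after writer 1's commit
    ok_2 = catalog.commit(base_2, "w2.json")  # writer 2 commits
    return ok_1, ok_2, catalog.metadata_location


print(run(overlapping=False))  # (True, True, 'w2.json')
print(run(overlapping=True))   # (True, False, 'w1.json')
```

In the overlapping case writer 2 still holds `v0.json` as its base, so its commit is rejected and it would have to reload and try again.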

## Types of Conflicts

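### Non-Conflicting Operations

Every commit races on the same metadata pointer, so any two concurrent commits can collide at the catalog. These changes can usually be re-applied on top of the other writer's commit after a refresh:
* Append to different partitions
* Add different columns (schema evolution)
* Different snapshot operations

### Conflicting Operations

These typically conflict:
* Both append to same partition (the pointer swap collides, although an append can usually be re-applied after a refresh)
* Delete + Append
* Schema evolution conflicts (both add same column)
* Compaction + Write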
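
One simplified way to reason about these cases is to ask whether the losing writer's change is still valid once the winner's commit is in place. The sketch below is a teaching model only, not Iceberg's validation code: the `Commit` record, its fields, and the `conflicts` function are hypothetical. It also says nothing about the pointer swap itself, which can fail for any pair of concurrent commits. It only models whether the change could be re-applied after a refresh.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical summary of what one commit did to the table."""
    name: str
    added: frozenset = frozenset()               # (partition, file) pairs written
    removed: frozenset = frozenset()             # (partition, file) pairs dropped or rewritten
    scanned_partitions: frozenset = frozenset()  # partitions the operation's filter read


def conflicts(mine, theirs):
    """Return True if `mine` cannot simply be re-applied after `theirs` committed."""
    # A file I wanted to remove or rewrite was already removed by the other commit.
    if mine.removed & theirs.removed:
        return True
    # The other commit added data to a partition my filter had scanned.
    added_partitions = {partition for partition, _ in theirs.added}
    return bool(mine.scanned_partitions & added_partitions)


append_day1_a = Commit("append a", added=frozenset({("day=1", "a.parquet")}))
append_day1_b = Commit("append b", added=frozenset({("day=1", "b.parquet")}))
append_day2 = Commit("append c", added=frozenset({("day=2", "c.parquet")}))
delete_day1 = Commit(
    "delete day=1",
    removed=frozenset({("day=1", "old.parquet")}),
    scanned_partitions=frozenset({"day=1"}),
)
compaction = Commit(
    "compact day=1",
    added=frozenset({("day=1", "merged.parquet")}),
    removed=frozenset({("day=1", "old.parquet")}),
)

print(conflicts(append_day1_a, append_day1_b))  # False
print(conflicts(delete_day1, append_day1_a))    # True
print(conflicts(delete_day1, append_day2))      # False
print(conflicts(compaction, delete_day1))       # True
```

In this model a plain append reads nothing, so it is always re-applicable, while a delete or a compaction depends on the files and partitions it looked at.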

### Demonstration: Delete + Append Conflict

In [None]:
def delete_task(results):
    try:
        local_catalog = SqlCatalog('deleter', **{'uri': f'sqlite:///{catalog_db}', 'warehouse': f'file://{warehouse_path}'})
        local_table = local_catalog.load_table('demo.events')
        time.sleep(0.05)
        local_table.delete("type = 'c8y_LocationUpdate'")
        results['delete'] = {'success': True}
        print("✅ Delete committed")
    except Exception as e:
        results['delete'] = {'success': False, 'error': str(e)}
        print(f"❌ Delete failed: {e}")

def append_task(results):
    try:
        local_catalog = SqlCatalog('appender', **{'uri': f'sqlite:///{catalog_db}', 'warehouse': f'file://{warehouse_path}'})
        local_table = local_catalog.load_table('demo.events')
        
        # Load data using Daft
        df_all = daft.read_json('../data/input/events.jsonl')
        df_batch = df_all.offset(7000).limit(500)
        
        time.sleep(0.05)
        arrow = df_batch.to_arrow()
        local_table.append(arrow)
        results['append'] = {'success': True, 'records': len(arrow)}
        print(f"✅ Append committed ({len(arrow)} records)")
    except Exception as e:
        results['append'] = {'success': False, 'error': str(e)}
        print(f"❌ Append failed: {e}")

print("Simulating concurrent delete + append...\n")
results = {}
threads = [
    threading.Thread(target=delete_task, args=(results,)),
    threading.Thread(target=append_task, args=(results,))
]

for t in threads:
    t.start()
for t in threads:
    t.join()

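print("\nResults:")
for name, result in results.items():
    outcome = "✅ committed" if result['success'] else f"❌ failed - {result['error']}"
    print(f"  {name}: {outcome}")
print("\nAny operation that failed would need to reload the table and retry.")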

## Conflict Resolution: Retry Strategy

When a commit fails, the application should:

1. **Reload table metadata**: Get the new current metadata
2. **Reapply changes**: Recalculate what needs to be done
3. **Retry commit**: Try again with updated metadata
4. **Exponential backoff**: Wait longer between retries

Here's a retry wrapper:

In [None]:
from pyiceberg.exceptions import CommitFailedException

def retry_on_conflict(func, max_retries=3, initial_delay=0.1,
                      retry_exceptions=(CommitFailedException,)):
    """
    Retry a function on commit conflict with exponential backoff.
    
    Args:
        func: Function to execute; it should reload the table on each call
        max_retries: Maximum number of retry attempts
        initial_delay: Initial delay in seconds
        retry_exceptions: Exception types treated as conflicts (others propagate)
    """
    delay = initial_delay
    
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_exceptions as e:
            if attempt < max_retries:
                print(f"Attempt {attempt + 1} failed: {e}")
                print(f"Retrying in {delay:.2f}s...")
                time.sleep(delay)
                delay *= 2  # Exponential backoff
            else:
                print(f"All {max_retries + 1} attempts failed")
                raise

# Example usage
def append_with_retry():
    catalog = SqlCatalog('retry_demo', **{'uri': f'sqlite:///{catalog_db}', 'warehouse': f'file://{warehouse_path}'})
    table = catalog.load_table('demo.events')
    
    # Load data using Daft
    df_all = daft.read_json('../data/input/events.jsonl')
    df_batch = df_all.offset(8000).limit(100)
    
    arrow = df_batch.to_arrow()
    table.append(arrow)
    return len(arrow)

try:
    records = retry_on_conflict(append_with_retry, max_retries=3)
    print(f"\n✅ Successfully appended {records} records (with retries if needed)")
except Exception as e:
    print(f"\n❌ Failed after retries: {e}")

## Serializable Isolation

Iceberg provides **serializable isolation** - the strongest isolation level:

* Each transaction sees a consistent snapshot
* Transactions appear to execute in serial order
* No dirty reads, non-repeatable reads, or phantoms

### How It Works

1. Writer reads snapshot X
2. Makes changes based on X
3. At commit: checks if X is still current
4. If not: conflict detected

This is stronger than the default in many databases, which is often read committed.
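
The snapshot check can be illustrated with a toy table that keeps every snapshot as an immutable tuple. This is a sketch under simplifying assumptions: `ToyTable` and its methods are invented for this example and do not mirror PyIceberg's API.

```python
class ToyTable:
    """Append-only list of immutable snapshots; the last one is current."""

    def __init__(self, rows):
        self.snapshots = [tuple(rows)]

    def current_snapshot_id(self):
        return len(self.snapshots) - 1

    def scan(self, snapshot_id):
        # Readers always see exactly the rows of the snapshot they pinned.
        return self.snapshots[snapshot_id]

    def commit(self, base_snapshot_id, new_rows):
        # Step 3: the commit is only valid if the base snapshot is still current.
        if base_snapshot_id != self.current_snapshot_id():
            raise RuntimeError("conflict: base snapshot is no longer current")
        self.snapshots.append(self.snapshots[base_snapshot_id] + tuple(new_rows))
        return self.current_snapshot_id()


table = ToyTable(["a", "b"])
reader_snapshot = table.current_snapshot_id()  # reader pins snapshot 0
writer_base = table.current_snapshot_id()      # writer reads snapshot 0
stale_base = table.current_snapshot_id()       # a second writer also reads snapshot 0

table.commit(writer_base, ["c"])               # first writer commits snapshot 1

print(table.scan(reader_snapshot))              # ('a', 'b')
print(table.scan(table.current_snapshot_id()))  # ('a', 'b', 'c')

try:
    table.commit(stale_base, ["d"])            # second writer's base is now stale
    stale_commit_failed = False
except RuntimeError:
    stale_commit_failed = True

print(stale_commit_failed)                      # True
```

The reader's view never changes underneath it, and the second writer learns at commit time that it must start again from snapshot 1.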

### Comparison

| Isolation Level | Dirty Read Possible | Non-Repeatable Read Possible | Phantom Read Possible | Provided by Iceberg |
|----------------|-----------|---------------------|--------------|----------|
| Read Uncommitted | ✅ | ✅ | ✅ | ❌ |
| Read Committed | ❌ | ✅ | ✅ | ❌ |
| Repeatable Read | ❌ | ❌ | ✅ | ❌ |
| Serializable | ❌ | ❌ | ❌ | ✅ |

## Production Considerations

### Catalog Choice Matters

* **SQLite (this demo)**: Not suitable for production
  - Single file, local only
  - Limited concurrency

* **AWS Glue**: Good for AWS
  - Managed service
  - Built-in optimistic locking
  - Optional DynamoDB lock manager in Iceberg's AWS integration

* **Nessie**: Git-like catalog
  - Branches and tags
  - Multi-table transactions
  - Good for complex workflows

* **REST Catalog**: Generic HTTP-based
  - Any backend can sit behind the HTTP API
  - Requires choosing or running a server implementation

### Best Practices

1. **Always retry on conflict**: Conflicts are expected
2. **Use exponential backoff**: Don't hammer the catalog
3. **Keep transactions short**: Less chance of conflict
4. **Partition your data**: Reduces write conflicts
5. **Monitor conflict rate**: High rate indicates design issues
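
For practice 2, a common refinement is to randomize each wait so that writers that collided once do not all retry at the same instant. The sketch below uses the "full jitter" variant, which is a swapped-in technique and not something this notebook's retry wrapper does. `backoff_delays` and its parameters are hypothetical names chosen for this example.

```python
import random


def backoff_delays(max_retries, initial_delay=0.1, max_delay=5.0, rng=None):
    """Yield one randomized wait time in seconds per retry attempt."""
    rng = rng or random.Random()
    ceiling = initial_delay
    for _ in range(max_retries):
        # Full jitter: pick uniformly between 0 and the current (capped) ceiling.
        yield rng.uniform(0, min(ceiling, max_delay))
        ceiling *= 2  # the ceiling doubles after every attempt


# A seeded generator makes the schedule reproducible for inspection.
delays = list(backoff_delays(5, rng=random.Random(42)))
for attempt, delay in enumerate(delays, start=1):
    print(f"retry {attempt}: wait {delay:.3f}s")
```

Each wait is bounded by `initial_delay * 2 ** attempt` and never exceeds `max_delay`, so the worst-case total wait stays predictable.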

## Review Questions

1. **Why is optimistic concurrency better for data lakes than pessimistic locking?**
   - Think about S3, distributed systems, and long-running operations.

2. **What would happen without conflict detection?**
   - How would two writers corrupt each other?

3. **How would you design a retry strategy for production?**
   - Backoff? Max retries? Logging?

4. **Can reads block writes in Iceberg?**
   - What about writes blocking reads?

5. **Why is serializable isolation expensive in most databases?**
   - How does Iceberg make it cheap?

6. **What's the difference between a conflict and a failure?**
   - When should you retry vs. give up?

## Hands-on Challenge

### Challenge 1: Simulate Schema Conflict

1. Two writers try to add different columns
2. Both use same initial schema
3. One succeeds, one conflicts
4. Implement retry logic

### Challenge 2: Conflict Rate Monitor

1. Run 10 concurrent writers
2. Track success vs. failure rate
3. Calculate: conflict percentage
4. Plot: conflicts over time

### Challenge 3: Partition-Based Isolation

1. Create a partitioned table
2. Writers each write to different partitions
3. Verify: no conflicts
4. Compare with unpartitioned

Use the cells below:

In [None]:
# Challenge 1: Your code here


In [None]:
# Challenge 2: Your code here


In [None]:
# Challenge 3: Your code here


## Summary

Iceberg provides robust concurrency control:

* **ACID transactions**: All-or-nothing, consistent, isolated, durable
* **Optimistic concurrency**: No locks, detect conflicts at commit
* **Serializable isolation**: Strongest consistency guarantee
* **Automatic conflict detection**: Via the conditional catalog UPDATE (compare-and-swap on the metadata location)
* **Retry-friendly**: Conflicts are expected and handleable

### Key Takeaways

1. **No locks = better scalability**: Writers don't block each other
2. **Conflicts are normal**: Build retry logic
3. **Catalog matters**: Choose production-ready catalog
4. **Partitioning reduces conflicts**: Independent partitions don't conflict
5. **Short transactions win**: Less time = less chance of conflict

### What's Next?

* **Partitioning**: How to scale to millions of files
* **Object stores**: Iceberg on S3 with efficient metadata