# Module 8: Transactions

**Goal: Shatter the illusion that you are the only one using the database.**

In previous chapters, we treated the database like a personal file cabinet. You open it, read a file, close it. But in the real world, a database is a bustling train station. Thousands of people are reading, writing, and deleting data at the same time.

If two people try to buy the last concert ticket simultaneously, who wins? If you transfer money from Savings to Checking, but the power fails halfway through, does the money vanish?

This chapter explores ACID (Atomicity, Consistency, Isolation, Durability), the set of physical laws that prevent data corruption in a concurrent world.

---

## 1. Setup and Connection
We will use Postgres for this chapter. While analytical engines (like DuckDB) support transactions, Postgres is a widely used choice for OLTP (Online Transaction Processing), where safe, concurrent updates are critical.

We will simulate multiple users by opening two distinct connections (`conn_A` and `conn_B`) to the same database.

In [None]:
import psycopg2
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import time
import threading

# Database Connection Parameters
DB_PARAMS = {
    "host": "db_int_opt",
    "port": 5432,
    "user": "admin",
    "password": "password",
    "dbname": "db_int_opt"
}

# Reset function to clear our sandbox table
def reset_bank_table():
    with psycopg2.connect(**DB_PARAMS) as conn:
        with conn.cursor() as cur:
            cur.execute("DROP TABLE IF EXISTS accounts;")
            cur.execute("""
                CREATE TABLE accounts (
                    id SERIAL PRIMARY KEY,
                    name TEXT,
                    balance INT
                );
            """)
            # Create two accounts: Alice (1000) and Bob (0)
            cur.execute("INSERT INTO accounts (name, balance) VALUES ('Alice', 1000), ('Bob', 0);")
        conn.commit()

# Actually build the table; defining the function alone creates nothing
reset_bank_table()
print("Setup Complete. Bank Table Ready.")

----

## 2. Experiment 8.1: The All-or-Nothing Rule (Atomicity)
**The Concept**: The word "atom" comes from the Greek for "uncuttable." In databases, Atomicity means a transaction is indivisible.

Imagine moving $500 from Alice to Bob. This requires two physical write operations:
1. **UPDATE** Alice: Balance - 500
2. **UPDATE** Bob: Balance + 500

If the server crashes after Step 1 but before Step 2, Alice loses money, and Bob gets nothing. Money is destroyed. Atomicity guarantees that either both happen, or neither happens.

#### Step 1: Hypothesis
If we raise a Python Exception (simulating a crash) immediately after deducting money from Alice, but before committing the transaction, what will Alice's balance be when we reconnect?

#### Step 2: The Experiment
We will start a transaction, deduct money, and then force an error.

In [None]:
reset_bank_table()

def run_failed_transfer():
    try:
        # 1. Connect (Start Transaction implicitly)
        with psycopg2.connect(**DB_PARAMS) as conn:
            with conn.cursor() as cur:
                print(f"[Transaction] Deducting $500 from Alice...")
                cur.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'Alice';")
                
                # Verify the deduction is visible inside this (still uncommitted) transaction
                cur.execute("SELECT balance FROM accounts WHERE name = 'Alice';")
                print(f"[Transaction] Alice's current uncommitted balance: ${cur.fetchone()[0]}")
                
                # 2. SIMULATE CRASH
                print("[Transaction] 💥 SYSTEM FAILURE! Raising Exception...")
                raise Exception("Power Failure Simulation")
                
                # This line never runs:
                cur.execute("UPDATE accounts SET balance = balance + 500 WHERE name = 'Bob';")
                conn.commit()
                
    except Exception as e:
        print(f"[System] Caught crash: {e}")

# Run the doomed transaction
run_failed_transfer()

# Check the reality after the dust settles
with psycopg2.connect(**DB_PARAMS) as conn:
    df_result = pd.read_sql("SELECT name, balance FROM accounts", conn)

print("\nFinal Account States:")
print(df_result)

#### Step 3: Visualization

In [None]:
plt.figure(figsize=(6, 4))
sns.barplot(data=df_result, x='name', y='balance', hue='name', palette=['red', 'blue'])
plt.title('Account Balances After Failed Transaction')
plt.axhline(y=1000, color='gray', linestyle='--', label='Original Alice Balance')
plt.ylabel('Balance ($)')
plt.legend()
plt.show()

#### Step 4: The Physics
**Why did the money come back?**

When you ran `UPDATE`, Postgres did not overwrite Alice's row. It wrote a new version of the row (balance 500) stamped with the transaction's "Transaction ID" (XID) and recorded the change in the WAL (Write-Ahead Log). Until that XID is marked as "Committed," only the transaction that wrote the new version can see it; everyone else keeps seeing the old one.

When the exception escaped the `with` block, psycopg2 issued a `ROLLBACK`. (Postgres does the same on its own if a session disconnects or the server crashes mid-transaction.) The XID was marked as aborted, so every later query ignores the new row version, and it is eventually cleaned up. To the database, the `UPDATE` never happened.
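The idea of "changes stamped with an XID that never got committed" can be sketched in a few lines of plain Python. This is a toy model under loud assumptions: `ToyTable`, its methods, and the status strings are all hypothetical names invented for illustration, and real Postgres storage is far more involved. It only shows why a rollback needs no "undo" step: readers simply skip versions whose XID is not committed.

```python
# Toy model of row versions plus a commit log.
# NOT Postgres's real storage format; every name here is hypothetical.

IN_PROGRESS, COMMITTED, ABORTED = "in_progress", "committed", "aborted"


class ToyTable:
    def __init__(self):
        self.commit_log = {}  # xid -> status
        self.versions = []    # (xid, name, balance) row versions, oldest first
        self.next_xid = 1

    def begin(self):
        xid = self.next_xid
        self.next_xid += 1
        self.commit_log[xid] = IN_PROGRESS
        return xid

    def write(self, xid, name, balance):
        # An UPDATE appends a new version stamped with the writer's XID.
        self.versions.append((xid, name, balance))

    def commit(self, xid):
        self.commit_log[xid] = COMMITTED

    def rollback(self, xid):
        # Nothing is erased: the XID is simply marked as aborted.
        self.commit_log[xid] = ABORTED

    def read(self, name):
        # Return the newest version whose writer committed.
        for xid, row_name, balance in reversed(self.versions):
            if row_name == name and self.commit_log[xid] == COMMITTED:
                return balance
        return None


table = ToyTable()

setup = table.begin()
table.write(setup, "Alice", 1000)
table.commit(setup)

doomed = table.begin()
table.write(doomed, "Alice", 500)  # the deduction
table.rollback(doomed)             # the "crash"

print(table.read("Alice"))  # 1000
```

Note that the aborted version is still sitting in `versions` after the rollback; it is just invisible. In the toy, as in the real system, cleaning up dead versions is a separate chore.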

----

## 3. Experiment 8.2: Isolation Levels (The Changing Past)
**The Concept**: Isolation controls how much one user's "in-progress" work is visible to others.

The default isolation level in Postgres is `READ COMMITTED`.
- **Rule**: You can only see data that has been committed.
- **The Catch**: If someone commits data while you are reading, your data might change right under your nose. This is called a Non-Repeatable Read.

#### Step 1: Hypothesis
We will open two connections.
- Analyst (Conn A): Reads the total balance of the bank twice in the same transaction.
- Thief (Conn B): Steals money and commits between the Analyst's two reads.

Will the Analyst see two different numbers inside the same transaction?

#### Step 2: The Experiment

In [None]:
reset_bank_table()

# Create two separate connections
conn_A = psycopg2.connect(**DB_PARAMS) # The Analyst
conn_B = psycopg2.connect(**DB_PARAMS) # The Thief

# The Analyst controls transaction boundaries manually (autocommit off);
# the Thief commits every statement immediately (autocommit on)
conn_A.autocommit = False
conn_B.autocommit = True

results = []

try:
    with conn_A.cursor() as cur_A, conn_B.cursor() as cur_B:
        
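        # 1. Analyst starts a transaction and looks at the vault
        # (with autocommit off, psycopg2 sends BEGIN before the first statement,
        # so an explicit BEGIN would only trigger a "transaction already in progress" warning)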
        cur_A.execute("SELECT SUM(balance) FROM accounts;")
        read_1 = cur_A.fetchone()[0]
        results.append({'Read Attempt': '1. Start of Report', 'Total Balance': read_1})
        print(f"[Analyst] Initial Check: ${read_1}")
        
        # 2. The Thief strikes! (On a separate connection)
        # Thief updates immediately (autocommit is True)
        print(f"[Thief] Stealing $500...")
        cur_B.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'Alice';")
        
        # 3. Analyst checks again IN THE SAME TRANSACTION
        cur_A.execute("SELECT SUM(balance) FROM accounts;")
        read_2 = cur_A.fetchone()[0]
        results.append({'Read Attempt': '2. End of Report', 'Total Balance': read_2})
        print(f"[Analyst] Final Check: ${read_2}")
        
        # Clean up
        conn_A.commit()

finally:
    conn_A.close()
    conn_B.close()

#### Step 3: Visualization

In [None]:
df_iso = pd.DataFrame(results)

plt.figure(figsize=(8, 4))
bars = plt.bar(df_iso['Read Attempt'], df_iso['Total Balance'], color=['green', 'orange'])
plt.title('Total Bank Balance Seen by Analyst (Read Committed)')
plt.ylabel('Total Balance ($)')
plt.ylim(0, 1200)

# Add text labels
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval + 20, f"${yval}", ha='center', va='bottom')

plt.show()

#### Step 4: The Physics
**Why did the data change inside a transaction?**

Postgres's default `READ COMMITTED` level promises: "I will only show you data that was committed before your specific query began."

It does not promise that data will remain constant throughout your entire transaction block.
1. **Read 1**: Postgres looked at the most recent "Snapshot" of the database.
2. **The Update**: The Thief created a new version of the row (MVCC) and committed it.
3. **Read 2**: Postgres took a new Snapshot. It saw the Thief's committed version and used it.

To fix this, you would need `REPEATABLE READ` isolation, which forces the database to keep using the first snapshot for the duration of the entire transaction.
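The difference between the two levels comes down to *when* the snapshot is taken. Below is a minimal sketch of that idea in plain Python. It is a toy model, not Postgres internals: `ToyDatabase`, `ToyTransaction`, and the "commit counter" used as a snapshot are hypothetical names and simplifications made up for this illustration.

```python
# Toy model of snapshot timing. Hypothetical names; not Postgres internals.

class ToyDatabase:
    def __init__(self):
        self.commit_counter = 0
        self.history = []  # (commit_number, total_balance), oldest first

    def commit_total(self, total):
        self.commit_counter += 1
        self.history.append((self.commit_counter, total))

    def total_as_of(self, snapshot):
        # Newest committed total at or before the given snapshot.
        visible = [total for number, total in self.history if number <= snapshot]
        return visible[-1]


class ToyTransaction:
    def __init__(self, db, isolation):
        self.db = db
        self.isolation = isolation
        self.snapshot = None

    def read_total(self):
        # READ COMMITTED: fresh snapshot for every statement.
        # REPEATABLE READ: snapshot taken once, at the first statement.
        if self.isolation == "READ COMMITTED" or self.snapshot is None:
            self.snapshot = self.db.commit_counter
        return self.db.total_as_of(self.snapshot)


def run_report(isolation):
    db = ToyDatabase()
    db.commit_total(1000)          # initial state: Alice 1000 + Bob 0
    analyst = ToyTransaction(db, isolation)
    first = analyst.read_total()
    db.commit_total(500)           # the Thief commits between the two reads
    second = analyst.read_total()
    return first, second


print(run_report("READ COMMITTED"))   # (1000, 500)
print(run_report("REPEATABLE READ"))  # (1000, 1000)
```

To try the real thing in this notebook, one option is to call `conn_A.set_session(isolation_level="REPEATABLE READ")` on the Analyst's connection before its first query and re-run the experiment; the second read should then match the first.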

----

## 4. Experiment 8.3: Phantom Rows
**The Concept**: A Phantom Read is a specific type of isolation failure where rows appear or disappear. Even if you lock the rows you are reading, a new row can be inserted by someone else that matches your search criteria.

#### Step 1: Hypothesis
- **Conn A**: Counts how many users have a balance > 0.
- **Conn B**: Inserts a new user with balance > 0.
- **Conn A**: Counts again. Will the count increase?

#### Step 2: The Experiment

In [None]:
# Reset the table so this cell gives the same counts on every run, then re-open connections
reset_bank_table()
conn_A = psycopg2.connect(**DB_PARAMS)
conn_B = psycopg2.connect(**DB_PARAMS)
conn_A.autocommit = False
conn_B.autocommit = True

phantom_results = []

try:
    with conn_A.cursor() as cur_A, conn_B.cursor() as cur_B:
        
        # 1. Analyst counts active accounts
        # (with autocommit off, psycopg2 sends BEGIN before the first statement)
        cur_A.execute("SELECT COUNT(*) FROM accounts WHERE balance > 0;")
        count_1 = cur_A.fetchone()[0]
        phantom_results.append({'Stage': 'Before Insert', 'Active Users': count_1})
        print(f"[Analyst] found {count_1} active users.")
        
        # 2. Marketing Team adds a new user
        print(f"[Marketing] Signing up Charlie...")
        cur_B.execute("INSERT INTO accounts (name, balance) VALUES ('Charlie', 500);")
        
        # 3. Analyst counts again
        cur_A.execute("SELECT COUNT(*) FROM accounts WHERE balance > 0;")
        count_2 = cur_A.fetchone()[0]
        phantom_results.append({'Stage': 'After Insert', 'Active Users': count_2})
        print(f"[Analyst] found {count_2} active users.")
        
        conn_A.commit()

finally:
    conn_A.close()
    conn_B.close()

#### Step 3: Visualization

In [None]:
# (Visual metaphor for Phantom Rows - Just kidding, we use matplotlib)

df_phantom = pd.DataFrame(phantom_results)

plt.figure(figsize=(6, 4))
plt.plot(df_phantom['Stage'], df_phantom['Active Users'], marker='o', linestyle='-', color='purple', linewidth=3)
plt.title('The Phantom Row Effect (Count Mismatch)')
plt.ylabel('Count of Users')
plt.yticks([0, 1, 2, 3, 4])
plt.grid(True)
plt.show()

#### Step 4: The Physics
**Why didn't the transaction protect us?**

Transactions generally protect rows that exist. When `Conn_A` ran the first count, it locked nothing (unless we asked it to). Even if it did lock the rows "Alice" and "Bob," it cannot lock "Charlie" because Charlie didn't exist yet.

This is the **Phantom Problem**. The "gap" between rows is wide open.
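- In `READ COMMITTED`, phantoms are allowed.
- In `REPEATABLE READ`, Postgres already hides phantoms, because the whole transaction reads from a single snapshot (the SQL standard would still permit them at this level).
- In `SERIALIZABLE` isolation (the strictest level), the database additionally uses Predicate Locks to track which ranges each transaction has read. If `Conn_B` inserts into a range that `Conn_A` was querying, and the two transactions can no longer be ordered as if one ran entirely after the other, Postgres forces one of them to fail (Serialization Failure).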
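A Serialization Failure is not a bug to be fixed; it is a signal to try again. Application code that uses `SERIALIZABLE` is usually wrapped in a retry loop. Here is a minimal sketch of that pattern. The `SerializationFailure` class below is a local stand-in so the example runs without a database (in psycopg2 the matching exception is `psycopg2.errors.SerializationFailure`), and `run_with_retry` and `flaky_transaction` are hypothetical helpers, not part of any library.

```python
import time


class SerializationFailure(Exception):
    """Local stand-in for the driver's serialization error (SQLSTATE 40001)."""


def run_with_retry(transaction_fn, max_attempts=3, base_delay=0.0):
    """Call transaction_fn until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction_fn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure
            # Back off a little longer after each failed attempt.
            time.sleep(base_delay * attempt)


# Demo: a fake transaction that fails twice, then succeeds.
attempts = []


def flaky_transaction():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"


result = run_with_retry(flaky_transaction)
print(result, len(attempts))  # committed 3
```

With a real connection, `transaction_fn` would need to roll back the failed transaction before the next attempt and re-run *all* of its statements, since any value read during the failed attempt may be stale.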

----

## Key Takeaways
1. **Atomicity** saves you from partial failures. If the power goes out, the database cleans up the mess.
2. **Isolation** is a spectrum. The default settings usually prioritize speed over perfect consistency.
3. **MVCC (Multi-Version Concurrency Control)** is how Postgres allows `Conn_A` to read while `Conn_B` writes. It keeps multiple versions of the same row (old vs. new) to serve both users simultaneously without blocking.