# Transactions with [`psycopg`](https://www.psycopg.org/psycopg3/docs/index.html)

In this notebook, we showcase how to connect to a [PostgreSQL](https://www.postgresql.org/) database, execute queries, and run transactions with different isolation levels using the [`psycopg`](https://www.psycopg.org/psycopg3/docs/index.html) package for Python. The notebook is inspired by [this tutorial](https://pynative.com/python-postgresql-transaction-management-using-commit-and-rollback/).

Copyright Marcel Maltry & Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)

# Setup

The following cell serves as setup. We will explain the syntax in more detail below. Here, we simply connect to the database, create a new table `accounts` with attributes `id` and `balance`, and add some toy data.

In [1]:
import psycopg

# dsn depends on installation type
dsn = 'dbname=postgres user=postgres host=/var/run/postgresql/'  # vagrant VM, host may be /tmp/ on other systems
#dsn = 'dbname=postgres user=postgres host=db' # docker

def reset_db(dsn):
    # Connect to postgres user's default database
    with psycopg.connect(dsn) as conn:

        # Open a cursor to perform database operations
        with conn.cursor() as cur:

            # Drop table if existing
            cur.execute("""DROP TABLE IF EXISTS accounts;""")

            # Create accounts table
            cur.execute("""CREATE TABLE accounts
                          (id int PRIMARY KEY, balance float(2));""")

            # Insert sample data into accounts table
            cur.execute("""INSERT INTO accounts
                           VALUES
                               (1, 2000.0),
                               (2, 520.0),
                               (3, 470.0),
                               (4, 1700.0),
                               (5, 2400.0);""")

            # Note: if the block raises no exception, the transaction is committed
            # implicitly when the connection's `with` block exits.
            
reset_db(dsn)

# Basics

In order to send queries to the database, we first need to establish a `connection`. We call the `connect()` function and provide some basic connection parameters such as the database name, the user, the password, and, if the database we connect to is running on a remote machine, the host.

We send queries to the database through a cursor, which is opened from an established connection via the `cursor()` method. The cursor allows us both to send queries (`execute()`) and to retrieve results (`fetchone()`, `fetchall()`). Each result row is a tuple, even if it consists of a single integer, which we have to consider when parsing the results. When we are done, we close the cursor and the connection (this happens implicitly when using the [`with`](https://docs.python.org/3/reference/compound_stmts.html#the-with-statement) statement).

The following example shows how to query the database for an entire table. We also use the cursor to obtain some additional information on the results.

In [2]:
# Connect to database as specified in dsn
with psycopg.connect(dsn) as conn:
    # Open a cursor to perform database operations
    with conn.cursor() as cur:
        # Define a SQL query
        q_accounts = """SELECT * FROM accounts;"""

        # Execute the query using the cursor
        cur.execute(q_accounts)

        # Print information on the query and its result
        print(f"The query was executed with status message \"{cur.statusmessage}\".")
        print(f"The query returned {cur.rowcount} rows that can be fetched.")
        print(f"The cursor currently points to row {cur.rownumber}.")
        print(f"The description of the query result is {cur.description}.")

        # Fetch results from cursor
        accounts = cur.fetchall()

        # Print sorted results
        print(f"The query returned the following tuples:\n{sorted(accounts)}")

The query was executed with status message "SELECT 5".
The query returned 5 rows that can be fetched.
The cursor currently points to row 0.
The description of the query result is [<Column 'id', type: int4 (oid: 23)>, <Column 'balance', type: float4 (oid: 700)>].
The query returned the following tuples:
[(1, 2000.0), (2, 520.0), (3, 470.0), (4, 1700.0), (5, 2400.0)]
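
Because every row comes back as a tuple, single-column results need unpacking before use. The sketch below works on literal rows shaped like the output above rather than on a live cursor: `rows` and `count_row` are stand-ins for what `cur.fetchall()` and `cur.fetchone()` would return, and the single-column row is an assumed example (for instance the result of a `SELECT count(*)` query), not something the notebook executes.

```python
# Stand-ins for cursor results; with a live cursor these would come from
# cur.fetchall() and cur.fetchone() respectively.
rows = [(1, 2000.0), (2, 520.0), (3, 470.0), (4, 1700.0), (5, 2400.0)]
count_row = (5,)  # assumed shape of a row returned by a single-column query

# A single-column row is still a tuple, so unpack it to get the bare value.
(count,) = count_row

# Multi-column rows unpack naturally in a loop or comprehension.
balance_by_id = {account_id: balance for account_id, balance in rows}

print(count)
print(balance_by_id[3])
```

The trailing comma in `(count,) = count_row` makes the unpacking fail loudly if the row ever has more than one column, which is usually preferable to silently indexing with `count_row[0]`.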


# Session Parameters

`psycopg` allows us to set certain session parameters that control how the next transaction of a connection is executed. In the following, we will take a closer look at:

* `read_only`: The session is set to read-only and, thus, write operations will fail with an exception.
* `autocommit`: Every statement sent to the database has an immediate effect, i.e. each statement is an individual transaction that is implicitly committed upon successful completion.
* `isolation_level`: This allows us to choose from multiple isolation levels.

Session parameters can be set explicitly, e.g. `conn.read_only = True`. Note that session parameters of a connection can only be set if no transaction is currently being performed on the connection.


## Read-Only

If we set a connection to read-only mode, write operations will not be executed and will instead raise a `ReadOnlySqlTransaction` exception. The following example demonstrates this behavior.

In [3]:
with psycopg.connect(dsn) as conn:
    conn.read_only = True
    with conn.cursor() as cur:
        try:
            # Try to insert a new tuple into the table
            cur.execute("INSERT INTO accounts VALUES (6, 100000.0);")

            # If successful, print newly added tuple
            cur.execute("SELECT * FROM accounts WHERE id=6;")
            print(cur.fetchone())
        except psycopg.errors.ReadOnlySqlTransaction:
            print("ERROR: The query failed due to the connection being read-only.")

ERROR: The query failed due to the connection being read-only.


## Auto-Commit

If `autocommit` is set to `True`, each call of `cur.execute()` is handled as an individual transaction and will either have an immediate effect or fail. The following example shows that each modification is immediately visible to other connections to the database.

In [4]:
# no auto-commit
with psycopg.connect(dsn) as conn1:
    # Open a second connection to the database (auto-commit is off for both)
    with psycopg.connect(dsn) as conn2:  
        with conn1.cursor() as cur1:
            cur1.execute("""INSERT INTO accounts VALUES (6, 237.0);""") 
        with conn2.cursor() as cur2:
            cur2.execute("""SELECT * FROM accounts WHERE id=6;""")
            print(f"Due to NO auto-commit, the tuple with id=6 is NOT visible to the other connection: {cur2.fetchall()}") 

Due to NO auto-commit, the tuple with id=6 is NOT visible to the other connection: []


In [5]:
# after connection 1 is committed, the tuple is visible
with psycopg.connect(dsn) as conn:
    with conn.cursor() as cur:
        cur.execute("""SELECT * FROM accounts WHERE id=6;""")
        print(f'{cur.fetchall()}')

[(6, 237.0)]


In [6]:
# Same scenario as above, but with both connections in auto-commit mode
with psycopg.connect(dsn) as conn1:
    # Set connection to auto-commit mode
    conn1.autocommit = True
    with psycopg.connect(dsn) as conn2:
        # Set connection to auto-commit mode
        conn2.autocommit = True        
        with conn1.cursor() as cur1:
            cur1.execute("""INSERT INTO accounts VALUES (7, 237.0);""") 
        # Use cursor from second connection to see immediate effect
        with conn2.cursor() as cur2:
            cur2.execute("""SELECT * FROM accounts WHERE id=7;""")
            print(f"Due to auto-commit, the tuple with id=7 is already visible to the other connection: {cur2.fetchall()}") 

Due to auto-commit, the tuple with id=7 is already visible to the other connection: [(7, 237.0)]


### Commit Transaction

If we set `autocommit` to `False` (this is the default setting), the first call of `execute()` on a cursor begins a new transaction ([`BEGIN`](https://www.postgresql.org/docs/current/sql-begin.html)) and `commit()` has to be called on the connection to make the changes of the transaction persistent ([`COMMIT`](https://www.postgresql.org/docs/current/sql-commit.html)). The example below transfers money from one account to another. It is equivalent to running the following transaction directly from the database shell:
```SQL
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id=3;
UPDATE accounts SET balance = balance + 100 WHERE id=1;
COMMIT;
```
We also show that as long as the transaction is not committed, its changes are not visible to other connections. Note that this also depends on the isolation level (we will get back to this).

### Show visibility of local changes:

In [7]:
with psycopg.connect(dsn) as conn1:
    # Set connection to transaction mode
    conn1.autocommit = False
    with psycopg.connect(dsn) as conn2:
        # Set connection to autocommit mode
        conn2.autocommit = True        
        with conn1.cursor() as cur1:
             with conn2.cursor() as cur2:
                # Update balance of account 3, implicitly begins a transaction
                cur1.execute("""UPDATE accounts SET balance = balance - 100 WHERE id=3;""") 
                # Update balance of account 1 within the same transaction
                cur1.execute("""UPDATE accounts SET balance = balance + 100 WHERE id=1;""")
                # Compare states visible to both transactions
                q_acc = """SELECT * FROM accounts WHERE id=1 OR id=3;"""
                cur1.execute(q_acc)
                cur2.execute(q_acc)
                print(f"Account balances observed by each connection before COMMIT:\n"\
                      f"Transaction 1: {cur1.fetchall()}\n"\
                      f"Transaction 2: {cur2.fetchall()}\n"\
                      f"Changes not yet visible to connection 2."\
                     )
                # Explicitly commit transaction 1
                conn1.commit()
                print("--Transaction 1 committed--")
                # Compare states visible to both transactions
                cur1.execute(q_acc)
                cur2.execute(q_acc)
                print(f"Account balances observed by each connection after COMMIT:\n"\
                      f"Transaction 1: {cur1.fetchall()}\n"\
                      f"Transaction 2: {cur2.fetchall()}\n"\
                      f"Changes visible to transaction 2."\
                     )

Account balances observed by each connection before COMMIT:
Transaction 1: [(1, 2100.0), (3, 370.0)]
Transaction 2: [(1, 2000.0), (3, 470.0)]
Changes not yet visible to connection 2.
--Transaction 1 committed--
Account balances observed by each connection after COMMIT:
Transaction 1: [(1, 2100.0), (3, 370.0)]
Transaction 2: [(1, 2100.0), (3, 370.0)]
Changes visible to transaction 2.


### Rollback Transactions

The next example shows a similar transaction as above. The only difference is that instead of making the changes persistent, we decide to [`ABORT`](https://www.postgresql.org/docs/current/sql-abort.html) the transaction by calling `rollback()` on the connection. It is equivalent to running the following transaction directly from the database shell:
```SQL
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id=3;
UPDATE accounts SET balance = balance + 100 WHERE id=1;
ABORT;
```
All changes performed by the aborted transaction must not become durable in the database. Note that if we `close()` a connection with an open transaction, `rollback()` is performed implicitly. In other words: if `autocommit` is set to `False`, calling `close()` both aborts the current transaction and closes the connection.

In [8]:
with psycopg.connect(dsn) as conn1:
    with psycopg.connect(dsn) as conn2:     
        with conn1.cursor() as cur1:
             with conn2.cursor() as cur2:
                # Update balance of account 3, implicitly begins a transaction
                cur1.execute("""UPDATE accounts SET balance = balance - 100 WHERE id=3;""") 
                # Update balance of account 1 within the same transaction
                cur1.execute("""UPDATE accounts SET balance = balance + 100 WHERE id=1;""")
                # Compare states visible to both transactions
                q_acc = """SELECT * FROM accounts WHERE id=1 OR id=3;"""
                cur1.execute(q_acc)
                cur2.execute(q_acc)
                print(f"Account balances observed by each connection before ABORT:\n"\
                      f"Transaction 1: {cur1.fetchall()}\n"\
                      f"Transaction 2: {cur2.fetchall()}\n"\
                      f"Changes not yet visible to transaction 2."\
                     )
                # Abort transaction 1
                conn1.rollback()
                print("--Transaction 1 aborted--")
                # Compare states visible to both transactions
                cur1.execute(q_acc)
                cur2.execute(q_acc)
                print(f"Account balances observed by each connection after ABORT:\n"\
                      f"Transaction 1: {cur1.fetchall()}\n"\
                      f"Transaction 2: {cur2.fetchall()}\n"\
                      f"Changes of transaction 1 undone."\
                     )

Account balances observed by each connection before ABORT:
Transaction 1: [(1, 2200.0), (3, 270.0)]
Transaction 2: [(1, 2100.0), (3, 370.0)]
Changes not yet visible to transaction 2.
--Transaction 1 aborted--
Account balances observed by each connection after ABORT:
Transaction 1: [(1, 2100.0), (3, 370.0)]
Transaction 2: [(1, 2100.0), (3, 370.0)]
Changes of transaction 1 undone.
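
The commit and rollback cells above can be combined into one helper that either applies both updates or none of them. The following is a minimal sketch, not part of the original notebook: the name `transfer` and its parameters are hypothetical, and it assumes `conn` is an open `psycopg` connection with `autocommit` left at `False`. Values are passed as query parameters via `%s` placeholders instead of being written into the SQL string.

```python
def transfer(conn, source_id, target_id, amount):
    """Move `amount` between two accounts as a single transaction.

    `conn` is assumed to be an open psycopg connection with autocommit off.
    Returns True if the transfer was committed, False if it was rolled back.
    """
    try:
        with conn.cursor() as cur:
            # The first execute() implicitly begins the transaction.
            cur.execute(
                "UPDATE accounts SET balance = balance - %s WHERE id = %s;",
                (amount, source_id),
            )
            # The second update runs inside the same transaction.
            cur.execute(
                "UPDATE accounts SET balance = balance + %s WHERE id = %s;",
                (amount, target_id),
            )
    except Exception:
        # Any failure undoes both updates.
        conn.rollback()
        return False
    # Both updates succeeded, so make them persistent together.
    conn.commit()
    return True
```

Inside a `with psycopg.connect(dsn) as conn:` block, calling `transfer(conn, 3, 1, 100)` would correspond to the transfer performed in the cells above. Catching the broad `Exception` keeps the sketch short; real code would likely catch `psycopg.Error` and re-raise or log the failure.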


## Isolation Levels

Furthermore, we can set the isolation level ([`SET TRANSACTION`](https://www.postgresql.org/docs/current/sql-set-transaction.html)) per session by assigning a member of `psycopg.IsolationLevel` to the connection's `isolation_level` attribute.

The following example showcases the impact of isolation levels. While transaction 1 withdraws money from a bank account, transaction 2 sets its isolation level to `REPEATABLE READ` and reads the entry for this bank account. After transaction 1 has committed, transaction 2 reads the same bank account again. However, since its isolation level is `REPEATABLE READ`, it still sees the unchanged data.

The scenario is equivalent to running the following transactions in parallel from two database shells.

Transaction 1:
```SQL
BEGIN;
SELECT * FROM accounts WHERE id=2;
UPDATE accounts SET balance = balance - 50 WHERE id=2;
COMMIT;
```

Transaction 2:
```SQL
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id=2;
-- in the meantime Transaction 1 updates account 2 and commits.
SELECT * FROM accounts WHERE id=2;
COMMIT;
```

Note that in SQL, isolation levels are set *within* a transaction block, while `psycopg` requires us to set the isolation level *before* we start a new transaction. The isolation levels supported by PostgreSQL and `psycopg` can be found [here](https://www.postgresql.org/docs/current/transaction-iso.html).

In [9]:
with psycopg.connect(dsn) as conn1:
    # conn1 keeps PostgreSQL's default isolation level, READ COMMITTED
    with psycopg.connect(dsn) as conn2:
        conn2.isolation_level = psycopg.IsolationLevel.REPEATABLE_READ
        with conn1.cursor() as cur1:
            with conn2.cursor() as cur2:
                # Compare states visible to both transactions
                q_acc = """SELECT * FROM accounts WHERE id=2;"""
                cur1.execute(q_acc)
                cur2.execute(q_acc)
                print(f"Account balance observed by each transaction:\n"\
                      f"Transaction 1: {cur1.fetchall()}\n"\
                      f"Transaction 2: {cur2.fetchall()}\n"\
                      f"Both transactions see the same balance.")
                
                # Withdraw money from account 2 and commit
                cur1.execute("""UPDATE accounts SET balance = balance - 50 WHERE id=2;""")
                conn1.commit()
                print("--Update performed, Transaction 1 committed--")

                # Compare states visible to both transactions
                q_acc = """SELECT * FROM accounts WHERE id=2;"""
                cur1.execute(q_acc)
                cur2.execute(q_acc)
                print(f"Account balance observed by each transaction:\n"\
                      f"Transaction 1: {cur1.fetchall()}\n"\
                      f"Transaction 2: {cur2.fetchall()}\n"\
                      f"Transaction 2 still sees the state from the beginning of the transaction.")

Account balance observed by each transaction:
Transaction 1: [(2, 520.0)]
Transaction 2: [(2, 520.0)]
Both transactions see the same balance.
--Update performed, Transaction 1 committed--
Account balance observed by each transaction:
Transaction 1: [(2, 470.0)]
Transaction 2: [(2, 520.0)]
Transaction 2 still sees the state from the beginning of the transaction.
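
The read-twice pattern of transaction 2 can be wrapped in a small helper to make the ordering explicit: the isolation level is set first, and only then does the first `execute()` start the transaction. This is a sketch under assumptions rather than part of the notebook: `read_twice` and the `between_reads` callback are hypothetical names, and `conn` is assumed to be an open `psycopg` connection with no transaction in progress.

```python
def read_twice(conn, account_id, between_reads, isolation_level=None):
    """Read one account balance twice within the same transaction.

    `conn` is assumed to be an open psycopg connection that is currently
    idle. `between_reads` is a hypothetical callback in which another
    connection may update the row and commit.
    """
    if isolation_level is not None:
        # Must happen before the first execute(), which starts the transaction.
        conn.isolation_level = isolation_level

    query = "SELECT balance FROM accounts WHERE id = %s;"
    with conn.cursor() as cur:
        cur.execute(query, (account_id,))
        (first,) = cur.fetchone()

        # Give a concurrent transaction the chance to change the row.
        between_reads()

        cur.execute(query, (account_id,))
        (second,) = cur.fetchone()

    # End the transaction so the next one starts from a fresh state.
    conn.commit()
    return first, second
```

Called with `isolation_level=psycopg.IsolationLevel.REPEATABLE_READ` and a callback that withdraws 50 from account 2 on a second connection, the helper would be expected to return the same balance twice, as in the output above. With the default `READ COMMITTED` level, the second value would instead reflect the committed withdrawal.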
